How do I escape a left paren in a Perl regex?

If I have a file containing some escaped parens, how can I replace all instances with an unescaped paren using Perl?
i.e. turn this:
.... foo\(bar ....
into this
.... foo(bar ....
I tried the following but received this error message:
perl -pe "s/\\\(/\(/g" ./file
Unmatched ( in regex; marked by <-- HERE in m/\\( <-- HERE / at -e line 1.

You're forgetting that backslashes mean something to the shell, too. Try using single quotes instead of double quotes. (Or put your script in a file, where you won't need to worry about shell quoting.)

Gah. From command line, no less. Way too many levels of metacharacter interpretation.
Try replacing your double quotes with single quotes, see if that helps.

cjm's answer is probably the best. If you must do it at the command line, try using quotemeta() or the metaquoting escape sequence (\Q...\E). This worked for me in a bash prompt:
perl -pe "s/\Q\(\E/(/g" ./file

Related

sed: Replace lines matching a pattern that contains forward slashes?

I know this question has been asked before, I just can't seem to get the correct syntax for my sed command.
I need to replace OPP/com.user.opp.orchest.po.services.stub-npo/npo-stub with OPP/com.user.opp.orchest.po.services.stub-ica/npo-ica
A snippet of the file I am editing follows:
config.xml
<compareType>PLAIN</compareType>
<pattern>
OPP/com.user.opp.orchest.po.services.stub-npo/npo-stub
</pattern>
<branches>
<com.sonyuser.hudson.plugins.gerrit.trigger.hudsontrigger.data.Branch>
<compareType>ANT</compareType>
<pattern>master</pattern>
</com.sonyuser.hudson.plugins.gerrit.trigger.hudsontrigger.data.Branch>
</branches>
${REPO_MIRROR}/OPP/com.user.opp.orchest.po.services.stub-npo/npo-stub
I have tried the following,
sed -i '/^\/OPP/\com.user.opp.orchest.po.services.stub-npo/\npo-stub\/OPP/\com.user.opp.orchest.po.services.stub-ica/\npo-ica/g' config.xml
In your command, you are missing the s for substitution, and the backslashes are on the wrong side of the slashes (/\ instead of \/). Also, as you said in reply to my comment, you want to replace the string anywhere in the file, so you don't need the ^ anchor in your regex. And a dot . in a regex matches any character, so the dots need to be escaped too.
You can use this command,
sed -i 's/OPP\/com\.user\.opp\.orchest\.po\.services\.stub-npo\/npo-stub/OPP\/com.user.opp.orchest.po.services.stub-ica\/npo-ica/g' yourfilename
You need to specify s command and replace the /\ with \/. There are some other typos here as well (\/ at the start is not necessary). Also, escape dots to match literal dots. A good idea is to use some other delimiter here instead of /, e.g. ,, because you have / chars in the regex and replacement parts.
You may use
sed -i 's,^OPP/com\.user\.opp\.orchest\.po\.services\.stub-npo/npo-stub,OPP/com.user.opp.orchest.po.services.stub-ica/npo-ica,' file
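A small sketch of the alternative-delimiter idea, for trying it out without touching config.xml. The function name swap_stub_path is hypothetical, the pattern is left unanchored and given the g flag so it also hits the ${REPO_MIRROR}/... line, and it filters stdin instead of using -i.

```shell
# With , as the s delimiter, the slashes in the path need no escaping.
# Dots are escaped in the pattern only; the replacement side is literal text.
swap_stub_path() {
  sed 's,OPP/com\.user\.opp\.orchest\.po\.services\.stub-npo/npo-stub,OPP/com.user.opp.orchest.po.services.stub-ica/npo-ica,g'
}

printf '%s\n' \
  'OPP/com.user.opp.orchest.po.services.stub-npo/npo-stub' \
  '${REPO_MIRROR}/OPP/com.user.opp.orchest.po.services.stub-npo/npo-stub' |
  swap_stub_path
```

Once the output looks right, the same s command can be given to sed -i with the real file name.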

Conditional in perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?
You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single-quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single-quotes. You can also switch to using double-quotes. Inside of double-quotes, you can use \x27 for an apostrophe.
Why torture yourself, just use a branch reset.
Find (?|(abcd)|ab())
Replace $1
And a couple of even better ways
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
Find (?:abcd)*+\Kab
Replace ""
These use regex wisely.
There is really no need nowadays to use the eval form of the substitution construct, s///e, in conjunction with defined(). This is especially true on the perl command line.
Good luck...

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a perl thing. Some overfeatured greps might support it with an option, but I don't recommend it for portability reasons.
inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it doesn't match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
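A self-contained way to try the -E form against the sample line. The function name filter_change and the second, non-matching input line are invented for the demonstration, and the value passed in is assumed to contain no regex metacharacters.

```shell
# Double quotes let the shell expand $1 before grep runs.
# -E enables + for "one or more"; [[:digit:]] is the POSIX spelling of \d.
filter_change() {
  grep -E "refs/changes/[[:digit:]]+/$1/"
}

VAR=285900
printf '%s\n' \
  'b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9' \
  'hypothetical-other-line refs/changes/00/285901/1' |
  filter_change "$VAR"
```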

BASH escaping double quotes within single quotes

I'm trying to write a bash function that would escape all double quotes within single quotes, eg:
'I need to escape "these" quotes with backslashes'
would become
'I need to escape \"these\" quotes with backslashes'
My take on it was:
Find pairs of single quotes in the input and extract them with grep
Pipe into sed, escape double quotes
Sed again the whole input and replace grep match with sedded match
I managed to get it working to the part of having correctly escaped quotes section, but replacing it in the whole input fails.
The script code copypaste:
# $1 - Full name, $2 - minified name
adjust_quotes ()
{
SINGLE_QUOTES=`grep -Eo "'.*'" $2`
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
sed -r "s|'.*'|$ESCAPED_QUOTES|g" "$2" > "$2.escaped"
mv "$2.escaped" $2
echo "Quotes escaped within single quotes on $2"
}
Random additional questions:
In the console, escaping the quote with only two backslashes works, but when the code is put in the script I need four. I'd love to know why.
Could I modify this code into a loop to escape all pairs of single quotes, one after another until EOF?
Thanks!
P.S. I know this would probably be easier to do in eg. python, but I really need to keep it in bash.
Using BASH string replacement:
s='I need to escape "these" quotes with backslashes'
r="${s//\"/\\\"}"
echo "$r"
I need to escape \"these\" quotes with backslashes
Here's a pure bash solution, which does the transformation on stdin, printing to stdout. It reads the entire input into memory, so it won't work with really enormous files.
escape_enclosed_quotes() (
IFS=\'
read -d '' -r -a fields
for ((i=1; i<${#fields[@]}; i+=2)); do
fields[i]=${fields[i]//\"/\\\"}
done
printf %s "${fields[*]}"
)
I deliberately enclosed the body of the function in parentheses rather than braces, in order to force the body to run in a subshell. That limits the modification of IFS to the body, as well as implicitly making the variables used local.
The function uses the read builtin to read the entire input (since the line delimiter is set to NUL with -d '') into an array (-a) using a single quote as the field separator (IFS=\'). The result is that the parts of the input surrounded with single quotes are in the odd positions of the array, so the function loops over the odd indices to do the substitution only for those fields. I use bash's find-and-replace syntax instead of deferring to an external utility like sed.
This being bash, there are a couple of gotchas:
If the file contains a NUL, the rest of the file will be ignored.
If the last line of the file does not end with a newline, and the last character of that line is a single quote, it will not be output.
Both of the above conditions are impossible in a portable text file, so it's probably OK. All the same, worth taking note.
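To see the odd-position claim for yourself, a throwaway helper like this prints each array slot with its index. It needs bash (arrays and read -d are not POSIX sh), and the name show_fields is made up for illustration.

```shell
# Same splitting as in the function above: IFS is a single quote and
# read -d '' slurps all of stdin. read returns non-zero at end of input,
# hence the "|| true".
show_fields() (
  IFS=\'
  read -d '' -r -a fields || true
  for i in "${!fields[@]}"; do
    printf '%d=[%s]\n' "$i" "${fields[i]}"
  done
)

printf %s "say 'a \"b\"' and \"d\"" | show_fields
```

Index 1 holds the text that was between the single quotes; indices 0 and 2 hold what was outside them.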
The supplementary question: why are the extra backslashes needed in
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
Answer: It has nothing to do with that line being in a script. It has to do with your use of backticks (`...`) for command substitution, and the idiosyncratic and often unpredictable handling of backslashes inside backticks. This syntax is deprecated. Do not use it. (Not even if you see someone else using it in some random example on the internet.) If you had used the recommended $(...) syntax for command substitution, it would have worked as expected:
ESCAPED_QUOTES=$(echo $SINGLE_QUOTES | sed 's|"|\\"|g')
(More information is in the Bash FAQ.)
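The two forms can be compared side by side. This is only a sketch with made-up variable names and a fixed sample string:

```shell
# Backticks: the shell removes one level of backslashes before running
# the command, even inside single quotes, so sed receives \\" only when
# four backslashes are typed here.
with_backticks=`printf '%s\n' 'say "hi"' | sed 's|"|\\\\"|g'`

# $(...): the single-quoted sed script reaches sed exactly as written.
with_dollar_paren=$(printf '%s\n' 'say "hi"' | sed 's|"|\\"|g')

printf '%s\n' "$with_backticks" "$with_dollar_paren"
```

Both variables end up holding the same escaped string, which is why the script needed four backslashes inside backticks but only two are needed with $(...).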

sed: Replacing a double quote in a quoted field within a delimited record

Given an optionally quoted, pipe delimited file with the following records:
"foo"|"bar"|123|"9" Nails"|"2"
"blah"|"blah"|456|"Guns "N" Roses"|"7"
"brik"|"brak"|789|""BB" King"|"0"
"yin"|"yang"|789|"John "Cougar" Mellencamp"|"5"
I want to replace any double quotes not next to a delimiter.
I used the following and it almost works. With one exception.
sed "s/\([^|]\)\"\([^|]\)/\1'\2/g" a.txt
The output looks like this:
"foo"|"bar"|123|"9' Nails"|"2"
"blah"|"blah"|456|"Guns 'N" Roses"|"7"
"brik"|"brak"|789|"'BB' King"|"0"
"yin"|"yang"|789|"John 'Cougar' Mellencamp"|"5"
It doesn't catch the second set of quotes if they are separated by a single character, as in Guns "N" Roses. Does anyone know why that is and how it can be fixed? In the meantime I'm just piping the output to a second regex to handle the special case. I'd prefer to do this in one pass since some of the files can be largish.
Thanks in advance.
You can use substitution twice in sed:
sed -r "s/([^|])\"([^|])/\1'\2/g; s/([^|])\"([^|])/\1'\2/g" file
"foo"|"bar"|123|"9' Nails"|"2"
"blah"|"blah"|456|"Guns 'N' Roses"|"7"
"brik"|"brak"|789|"'BB' King"|"0"
"yin"|"yang"|789|"John 'Cougar' Mellencamp"|"5"
sed kind of implements a "while" loop:
sed ':a; s/\([^|]\)"\([^|]\)/\1'\''\2/g; ta' file
The t command loops to the label a if the previous s/// command replaced something. So that will repeat the replacement until no other matches are found.
Also, perl handles your case without looping, thanks to zero-width look-ahead:
perl -pe 's/[^|]\K"(?!\||$)/'\''/g'
But that doesn't handle consecutive double quotes, so use a loop:
perl -pe 's//'\''/g while /[^|]\K"(?!\||$)/' file
You may like to use \x27 instead of the awkward '\'' method to insert a single quote in a single quoted string. Works with perl and GNU sed.
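For a quick check of the sed loop on the record that caused trouble, here is a sketch that reads stdin. The function name fix_inner_quotes is hypothetical, and the script is split into separate -e parts, which should also keep non-GNU seds happy about the label.

```shell
# The g flag cannot match overlapping spans: after ' "N' is replaced,
# the N is already consumed, so the quote right after it is missed.
# "ta" jumps back to label a whenever s/// changed something, and the
# next pass picks up the quote that was skipped.
fix_inner_quotes() {
  sed -e ':a' -e 's/\([^|]\)"\([^|]\)/\1'\''\2/g' -e 'ta'
}

printf '%s\n' '"blah"|"blah"|456|"Guns "N" Roses"|"7"' | fix_inner_quotes
```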