Can't understand perl regex to unquote meta

I came across this awesome regex:
s/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg
It does magic, but is so obscure I can't understand it. It works very well:
echo 'a\tb\nc\r\n' | perl -lpe 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg'
a b
c
Let us watch it with cat -A :
echo 'a\tb\nc\r\n' | perl -lpe 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg' | cat -A
a^Ib$
c^M$
$
I will keep it for future reference, but it would be really cool to understand it. I know the /ee modifier evaluates the RHS, but what are those qqs? Is qq a function for double quotes? I would appreciate it if someone could explain.
PS. I found this regex here

In Perl you have single and double quotes, where "$foo" is interpolated and '$foo' is literal.
The q operator lets you choose which delimiter does the job of '.
The qq operator does the same for ".
So in the awesome example, [ ... ] is being used as an interpolating quote, and Perl makes it more readable by pairing ] with [. The variable is being expanded twice, which would be deeply mysterious without that visual pairing, and plain " quotes get very confusing when mixed in with shell quoting.
A simple example to try out :
% perl -E '$foo=bar; say qq[$foo];'
bar
%

qq is the interpolating quote operator. It's the same thing as putting a string between double quotes, but can use open-close character pairs like [] here. This has the advantage that you can nest it, which you couldn't do with double quotes.

Related

Replacing all commas inside quotations marks (or the inverse)

Let's say I have the following sentence
Apples, "This, is, a test",409, James,46,90
I want to change the commas inside the quotation marks by ;. Or, alternatively, the ones outside the quotation marks by the same character ;. So far I thought of something like
perl -pe 's/(".*)\K,(?=.*")/;/g' <mystring>
However, this is only matching the last comma inside quotation marks because I am restarting the regex engine with \K. I also tried some regex's to change the commas outside quotation marks but I can't get it to work.
Note that the spaces after commas outside the quotation marks are there on purpose, so that
perl -pe 's/,\s/;/g' <mystring>
is not a valid answer.
The desired output would be
Apples, "This; is; a test",409, James,46,90
Or alternatively
Apples; "This, is, a test";409; James;46;90
Any thoughts on how to approach this problem?
I'd use an actual CSV parser instead of trying to hack something up with regular expressions. The very useful Text::AutoCSV module makes it easy to convert the comma field separators to semicolons in a one-liner:
$ echo 'Apples, "This, is, a test",409, James,46,90' |
perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_sep_char => ";")->write()'
Apples;"This, is, a test";409;James;46;90
For a non-perl solution, csvformat from csvkit is another handy tool, though it's harder to get the quoting the same:
$ echo 'Apples, "This, is, a test",409, James,46,90' |
csvformat -S -U2 -D';'
"Apples";"This, is, a test";"409";"James";"46";"90"
Or (Self promotion alert!) my tawk utility (Which also won't get the quotes the same):
$ echo 'Apples, "This, is, a test",409, James,46,90' |
tawk -csv -quoteall 'line { set F(1) $F(1); print }' OFS=";"
"Apples";" This, is, a test";"409";" James";"46";"90"

Perl Regex Command Line Issue

I'm trying to use a negative lookahead in perl in command line:
echo 1.41.1 | perl -pe "s/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g"
to get an incremented version that looks like this:
1.41.2
but it's just returning:
![0-9]+\.[0-9]+\.: event not found
I've tried it in regex101 (PCRE) and it works fine, so I'm not sure why it doesn't work here.
In Bash, ! is the "history expansion character", except when escaped with a backslash or single-quotes. (Double-quotes do not disable this; that is, history expansion is supported inside double-quotes. See Difference between single and double quotes in Bash)
So, just change your double-quotes to single-quotes:
echo 1.41.1 | perl -pe 's/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g'
and voilà:
1.41.2
I'm guessing that this expression also might work:
([0-9.]+)\.([0-9]+)
Test
perl -e'
my $name = "1.41.1";
$name =~ s/([0-9.]+)\.([0-9]+)/$1\.2/;
print "$name\n";
'
Output
1.41.2
Please see the demo here.
If you want to "increment" a number then you can't hard-code the new value; you need to capture what is there and increment that:
echo "1.41.1" | perl -pe's/[0-9]+\.[0-9]+\.\K([0-9]+)/$1+1/e'
Here the /e modifier makes the replacement side be evaluated as code, so we can add 1 to the captured number, and the result is what gets substituted. The \K drops everything matched before it from the match, so we don't need to put it back; see "Lookaround Assertions" in Extended Patterns in perlre.
The lookarounds are sometimes just the thing you want, but they increase the regex complexity (just by being there), can be tricky to get right, and hurt efficiency. They aren't needed here.
The strange output you get is because the double quotes used around the Perl program "invite" the shell to look at what's inside whereby it interprets the ! as history expansion and runs that, as explained in ruakh's post.
As an alternative to lookahead, we can use capture groups, e.g. the following will capture the version number into 3 capture groups.
(\d+)\.(\d+)\.(\d+)
If you wanted to output the captured version number as is, it would be:
\1.\2.\3
And to just replace the 3rd part with the number "2" would be:
\1.\2.2
To adapt this to the OP's question (in Perl the replacement side is better written with $1 and $2 than with \1 and \2), it would be:
$ echo 1.14.1 | perl -pe 's/(\d+)\.(\d+)\.(\d+)/$1.$2.2/'
1.14.2
$

Conditional in perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?
You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single-quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single-quotes. You can also switch to using double-quotes. Inside of double-quotes, you can use \x27 for an apostrophe.
Why torture yourself, just use a branch reset.
Find (?|(abcd)|ab())
Replace $1
And a couple of even better ways
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
Find (?:abcd)*+\Kab
Replace ""
These use regex wisely.
There is really no need nowadays to have to use the eval form of the regex substitution construct s///e in conjunction with defined(). This is especially true when using the perl command line.
Good luck...

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
"I am trying to filter all lines that contain refs/changes/\d+/$VAR/"
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a Perl thing. Some overfeatured greps support it with an option (GNU grep has -P, for example), but I don't recommend it for portability reasons.
"inside simple quotes I cannot use variable expansion"
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
"with double quotes it doesn't match anything."
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
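A self-contained way to check the -E version, as a sketch: the filter_refs name and the second input line are made up for the test, and the change number is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: $1 is the change number; lines are read from stdin.
# Double quotes let the shell expand $1 before grep sees the pattern.
filter_refs() {
  grep -E "refs/changes/[[:digit:]]+/$1/"
}

printf '%s\n' \
  'b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9' \
  '0000000000000000000000000000000000000000 refs/changes/00/285901/9' |
  filter_refs 285900
```

Only the first line is printed, because the trailing / in the pattern keeps 285900 from matching a longer or different number.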

Why do I get "-bash: syntax error near unexpected token `('" when I run my Perl one-liner?

This is driving me insane. Here's my dilemma, I have a file in which I need to make a match. Usually I use Perl and it works like a charm but in this case I am writing a shell script and for some reason it is throwing errors.
Here is what I am trying to match:
loop_loopStorage_rev='latest.integration'
I need to match loop and latest.integration.
This is my regex:
^(?!\#)(loop_.+rev).*[\'|\"](.*)[\'|\"]$
When I use this in a Perl script, $1 and $2 give me the appropriate output. If I do this:
perl -nle "print qq{$1 => $2} while /^(?!#)(loop_.+rev).+?[\'|\"](.+?)[\'|\"]$/g" non-hadoop.env
I get the error:
-bash: syntax error near unexpected token `('
I believe it has something to do with the beginning part of my regex. So my real question is would there be an easier solution using sed, egrep or awk? If so, does any one know where to begin?
Use single quotes around your arguments to prevent special processing of $, \, etc. If you need to include a single quote within, the generic solution is to use '\''. In this particular case, however, we can avoid trying to include a ' by using the equivalent \x27 in the regex pattern instead.
perl -nle'
print "$1 => $2"
while /^(?!#)(loop_.+rev).+?[\x27\"|](.+?)[\x27\"|]$/g;
' non-hadoop.env
[I added some line breaks for readability. You can actually leave them in if you want to, but you don't need to.]
Note that there are some problems with your regex pattern.
(?!\#)(loop_.+rev) is the same as (loop_.+rev) since l isn't #, so (?!\#) isn't doing whatever you think it's doing.
[\'|\"] matches ', " and |, but I think you only meant it to match ' and ". If so, you want to use [\'\"], which can be simplified to ['"].
Don't use the non-greedy modifier (the ? after +, *, etc.). It's used for optimization, not for excluding characters. In fact, the second ? in your pattern has absolutely no effect, so it's not doing what you think it's doing.
Fixed?
perl -nle'
print "$1 => $2"
while /^(loop_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g;
' non-hadoop.env
Double quotes cause Bash to replace variable references like $1 and $2 with the values of these shell variables. Use single quotes around your Perl script to avoid this (or quote every dollar sign, backtick, etc in the script).
However, you cannot escape single quotes inside single quotes easily; a common workaround in Perl strings is to use the character code \x27 instead. If you need single-quoted Perl strings, use the generalized single-quoting operator q{...}.
If you need to interpolate a shell variable name, a common trick is to use "see-saw" quoting. The string 'str'"in"'g' in the shell is equal to 'string' after quote removal; you can similarly use adjacent single-quoted and double-quoted strings to build your script ... although it does tend to get rather unreadable.
perl -nle 'print "Instance -> $1\nRevision -> $2"
while /^(?!#)('"$NAME"'_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g' non-hadoop.en
(Notice that the options -nle are not part of the script; the script is the quoted argument to the -e option. In fact perl '-nle script ...' coincidentally works, but it is decidedly unidiomatic, to the point of being confusing.)
I ended up figuring it out thanks to everyone's help. Thanks again. Here is my final command:
perl -nle 'print "$1 $2" while /^($ENV{NAME}_.+rev).+\x27(.+)\x27/g;' $ENVFILE