How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of a line that should match, assuming VAR=285900:
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt

I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a Perl thing. Some overfeatured greps might support it with an option (GNU grep has -P for Perl-compatible patterns), but I don't recommend it for portability reasons.
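As a quick check of the -E form above, here is a minimal sketch that runs it against a few made-up lines. Only the first sample line comes from the question; the other two are assumptions added to show what gets filtered out.

```shell
VAR=285900

# Sample input: only the first line is from the question, the rest are invented.
sample='b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
aaaa refs/changes/00/285901/1
bbbb refs/heads/285900/2'

# Double quotes let the shell expand $VAR before grep sees the pattern.
matches=$(printf '%s\n' "$sample" | grep -E "refs/changes/[[:digit:]]+/$VAR/")
printf '%s\n' "$matches"
```

Only the line whose change number equals $VAR should be printed.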

inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it does not match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
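To make the quoting rules above concrete, here is a small sketch that builds the same pattern three ways and prints what grep would actually receive. The variable names single, double and mixed are just illustrative.

```shell
VAR=285900

single='refs/changes/$VAR/'                          # no expansion: literal $VAR
double="refs/changes/$VAR/"                          # expanded by the shell
mixed='refs/changes/[[:digit:]]\{1,\}/'"$VAR"'/'     # single and double quotes glued together

printf '%s\n' "$single" "$double" "$mixed"
```

Printing the pattern like this is often the fastest way to see why a grep "matches nothing".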

Perl Regex Command Line Issue

I'm trying to use a negative lookahead in perl in command line:
echo 1.41.1 | perl -pe "s/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g"
to get an incremented version that looks like this:
1.41.2
but it's just returning:
![0-9]+\.[0-9]+\.: event not found
I've tried it in regex101 (PCRE) and it works fine, so I'm not sure why it doesn't work here.
In Bash, ! is the "history expansion character", except when escaped with a backslash or single-quotes. (Double-quotes do not disable this; that is, history expansion is supported inside double-quotes. See Difference between single and double quotes in Bash)
So, just change your double-quotes to single-quotes:
echo 1.41.1 | perl -pe 's/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g'
and voilà:
1.41.2
I'm guessing that this expression also might work:
([0-9.]+)\.([0-9]+)
Test
perl -e'
my $name = "1.41.1";
$name =~ s/([0-9.]+)\.([0-9]+)/$1\.2/;
print "$name\n";
'
Output
1.41.2
If you want to "increment" a number then you can't hard-code the new value but need to capture what is there and increment that
echo "1.41.1" | perl -pe's/[0-9]+\.[0-9]+\.\K([0-9]+)/$1+1/e'
Here the /e modifier makes the replacement side be evaluated as code, so we can add 1 to the captured number, and the result is then substituted. The \K drops everything matched before it, so we don't need to put it back; see "Lookaround Assertions" in Extended Patterns in perlre.
The lookarounds are sometimes just the thing you want, but they increase the regex complexity (just by being there), can be tricky to get right, and hurt efficiency. They aren't needed here.
The strange output you get is because the double quotes used around the Perl program "invite" the shell to look at what's inside whereby it interprets the ! as history expansion and runs that, as explained in ruakh's post.
As an alternative to lookahead, we can use capture groups, e.g. the following will capture the version number into 3 capture groups.
(\d+)\.(\d+)\.(\d+)
If you wanted to output the captured version number as is, it would be:
\1.\2.\3
And to just replace the 3rd part with the number "2" would be:
\1.\2.2
To adapt this to the OP's question (in Perl, $1 is preferred over \1 on the replacement side), it would be:
$ echo 1.14.1 | perl -pe 's/(\d+)\.(\d+)\.(\d+)/$1.$2.2/'
1.14.2

Conditional in perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?
You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single-quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single-quotes. You can also switch to using double-quotes. Inside of double-quotes, you can use \x27 for an apostrophe.
Why torture yourself, just use a branch reset.
Find (?|(abcd)|ab())
Replace $1
And a couple of even better ways
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
Find (?:abcd)*+\Kab
Replace ""
These use regex wisely.
There is really no need nowadays to have to use the eval form of the regex substitution construct s///e in conjunction with defined(). This is especially true when using the perl command line.
Good luck...

Find all text within square brackets using regex

Because of a PHP version change, I need to change my code from $array[stringindex] to $array['stringindex'].
So I want to find all the text using regex, and replace them all. How to find all strings that look like this? $array[stringindex].
Here's a solution in PHP:
$re = "/(\\$[[:alpha:]][[:alnum:]]+\\[)([[:alpha:]][[:alnum:]]+)(\\])/";
$str = "here is \$array[stringindex] but not \$array['stringindex'] nor \$3array[stringindex] nor \$array[4stringindex]";
$subst = "$1'$2'$3";
$result = preg_replace($re, $subst, $str);
I search for variables beginning with a letter, otherwise things like $foo[42] would be converted to $foo['42'], which might not be desirable.
Note that all the solutions here will not handle every case correctly.
Looking at the Sublime Text regex help, it would seem you could just paste (\$[[:alpha:]][[:alnum:]]+\[)([[:alpha:]][[:alnum:]]+)(\]) into the Search box and $1'$2'$3 into the Replace field. (The doubled backslashes in the PHP version are only there for PHP's string escaping.)
It depends on the tool you want to use to do the replacement.
With sed, for example, it would be something like this:
sed "s/\(\$array\)\[\([^]]*\)\]/\1['\2']/g"
If sed is allowed you could simply do:
sed -i "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g" file
Explanation:
sed "s/pattern/replace/g" is a sed command which searches for pattern and replaces it with replace. The g option means replace every match on a line, not just the first.
\(\$[^[]*\[\)\([^]]*\)\] this pattern contains two groups (delimited by \( and \)). The first is a dollar sign followed by a series of non-[ characters and then an opening square bracket. The second is a series of characters other than a closing bracket. A closing square bracket follows the second group.
\1'\2'] the replacement string: \1 inserts the first captured group (and likewise \2 the second). Basically we wrap \2 in quotes (which is what you wanted).
The -i option means that the changes are applied to the original file, which is supplied at the end.
For more information, see man sed.
This can be combined with the find command, as follows:
find . -name '*.php' -exec sed -i "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g" '{}' \;
This will apply the sed command to all PHP files found.
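Before running a substitution like this with -i across a whole tree, it can help to pipe a sample line through it first. The sketch below does that with a made-up line of PHP; it uses basic regular expressions with escaped group parentheses and has only been reasoned through for a POSIX-style sed.

```shell
# Invented sample line containing two unquoted string indexes.
line='echo $array[stringindex] . $other[key];'

# \( \) delimit groups in basic regular expressions; \$ keeps the shell from expanding $[.
fixed=$(printf '%s\n' "$line" | sed "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g")
printf '%s\n' "$fixed"
```

Note that this simple pattern would also re-quote indexes that are already quoted, so it is only safe on code that has not been converted yet.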

Why do I get "-bash: syntax error near unexpected token `('" when I run my Perl one-liner?

This is driving me insane. Here's my dilemma, I have a file in which I need to make a match. Usually I use Perl and it works like a charm but in this case I am writing a shell script and for some reason it is throwing errors.
Here is what I am trying to match:
loop_loopStorage_rev='latest.integration'
I need to match loop and latest.integration.
This is my regex:
^(?!\#)(loop_.+rev).*[\'|\"](.*)[\'|\"]$
When I use this in a Perl script, $1 and $2 give me the appropriate output. If I do this:
perl -nle "print qq{$1 => $2} while /^(?!#)(loop_.+rev).+?[\'|\"](.+?)[\'|\"]$/g" non-hadoop.env
I get the error:
-bash: syntax error near unexpected token `('
I believe it has something to do with the beginning part of my regex. So my real question is would there be an easier solution using sed, egrep or awk? If so, does any one know where to begin?
Use single quotes around your arguments to prevent special processing of $, \, etc. If you need to include a single quote within, the generic solution is to use '\''. In this particular case, however, we can avoid trying to include a ' by using the equivalent \x27 in the regex pattern instead.
perl -nle'
print "$1 => $2"
while /^(?!#)(loop_.+rev).+?[\x27\"|](.+?)[\x27\"|]$/g;
' non-hadoop.env
[I added some line breaks for readability. You can actually leave them in if you want to, but you don't need to.]
Note that there are some problems with your regex pattern.
(?!\#)(loop_.+rev) is the same as (loop_.+rev) since l isn't #, so (?!\#) isn't doing whatever you think it's doing.
[\'|\"] matches ', " and |, but I think you only meant it to match ' and ". If so, you want to use [\'\"], which can be simplified to ['"].
Don't use the non-greedy modifier (a ? after +, *, etc.) to exclude characters; it's meant for optimization. In fact, the second ? in your pattern has absolutely no effect, so it's not doing what you think it's doing.
Fixed?
perl -nle'
print "$1 => $2"
while /^(loop_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g;
' non-hadoop.env
Double quotes cause Bash to replace variable references like $1 and $2 with the values of these shell variables. Use single quotes around your Perl script to avoid this (or quote every dollar sign, backtick, etc in the script).
However, you cannot escape single quotes inside single quotes easily; a common workaround in Perl strings is to use the character code \x27 instead. If you need single-quoted Perl strings, use the generalized single-quoting operator q{...}.
If you need to interpolate a shell variable name, a common trick is to use "see-saw" quoting. The string 'str'"in"'g' in the shell is equal to 'string' after quote removal; you can similarly use adjacent single-quoted and double-quoted strings to build your script ... although it does tend to get rather unreadable.
perl -nle 'print "Instance -> $1\nRevision -> $2"
while /^(?!#)('"$NAME"'_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g' non-hadoop.en
(Notice that the options -nle are not part of the script; the script is the quoted argument to the -e option. In fact perl '-nle script ...' coincidentally works, but it is decidedly unidiomatic, to the point of confusing.)
I ended up figuring it out thanks to everyone's help. Thanks again. Here is my final command:
perl -nle 'print "$1 $2" while /^($ENV{NAME}_.+rev).+\x27(.+)\x27/g;' $ENVFILE

Is there a truly universal wildcard in Grep? [duplicate]

Really basic question here. So I'm told that a dot . matches any character EXCEPT a line break. I'm looking for something that matches any character, including line breaks.
All I want to do is to capture all the text in a website page between two specific strings, stripping the header and the footer. Something like HEADER TEXT(.+)FOOTER TEXT and then extract what's in the parentheses, but I can't find a way to include all text AND line breaks between header and footer, does this make sense? Thanks in advance!
When I need to match several characters, including line breaks, I do:
[\s\S]*?
Note I'm using a non-greedy pattern
You could do it with Perl:
$ perl -ne 'print if /HEADER TEXT/ .. /FOOTER TEXT/' file.html
To print only the text between the delimiters, use
$ perl -l -0777 -ne 'print $1 while /HEADER TEXT(.+?)FOOTER TEXT/sg' file.html
The /s switch makes the regular expression matcher treat the entire string as a single line, which means dot matches newlines, and /g means match as many times as possible.
The examples above assume you're cranking on HTML files on the local disk. If you need to fetch them first, use get from LWP::Simple:
$ perl -MLWP::Simple -le '$_ = get "http://stackoverflow.com";
print $1 while m!<head>(.+?)</head>!sg'
Please note that parsing HTML with regular expressions as above does not work in the general case! If you're working on a quick-and-dirty scanner, fine, but for an application that needs to be more robust, use a real parser.
By definition, grep looks for lines which match; it reads a line, sees whether it matches, and prints the line.
One possible way to do what you want is with sed:
sed -n '/HEADER TEXT/,/FOOTER TEXT/p' "$@"
This prints from the first line that matches 'HEADER TEXT' to the first line that matches 'FOOTER TEXT', and then iterates; the '-n' stops the default 'print each line' operation. This won't work well if the header and footer text appear on the same line.
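A small sketch of the range address in action, using invented input piped in on standard input instead of file arguments:

```shell
# Invented five-line input; the range should keep the middle three lines.
section=$(printf '%s\n' 'junk' 'HEADER TEXT' 'body' 'FOOTER TEXT' 'junk' |
    sed -n '/HEADER TEXT/,/FOOTER TEXT/p')

printf '%s\n' "$section"
```

Unlike the capture-group approaches, the range includes the header and footer lines themselves.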
To do what you want, I'd probably use perl (but you could use Python if you prefer). I'd consider slurping the whole file, and then use a suitably qualified regex to find the matching portions of the file. However, the Perl one-liner given by '@gbacon' is an almost exact transliteration into Perl of the 'sed' script above and is neater than slurping.
The man page of grep says:
grep, egrep, fgrep, rgrep - print lines matching a pattern
grep is not made for matching more than a single line. You should try to solve this task with perl or awk.
As this is tagged with 'bbedit' and BBEdit supports Perl-style pattern modifiers, you can allow the dot to match line breaks with the switch (?s).
(?s).
will match ANY character. And yes,
(?s).+
will match the whole text.
As pointed out elsewhere, grep works on single lines.
For multi-line matching (in Ruby with Regexp::MULTILINE, or in Python, awk, sed, whatever), "\s" should also match line breaks, so
HEADER TEXT(.*\s*)FOOTER TEXT
might work ...
Here's one way to do it with gawk, if you have it:
awk -vRS="FOOTER" '/HEADER/{gsub(/.*HEADER/,"");print}' file