Difference between grep and perl regex?

I have a problem with what I think is a difference between grep's regex and perl's regex. Consider the following little test:
$ cat testfile.txt
A line of text
SOME_RULE = $(BIN)
Another line of text
$ grep "SOME_RULE\s*=\s*\$(BIN)" testfile.txt
SOME_RULE = $(BIN)
$ perl -p -e "s/SOME_RULE\s*=\s*\$(BIN)/Hello/g" testfile.txt
A line of text
SOME_RULE = $(BIN)
Another line of text
As you can see, using the regex "SOME_RULE\s*=\s*$(BIN)", grep could find the match, but perl was unable to update the file using the same expression. How should I solve this problem?

Perl wants the '(' and ')' to be escaped. Also, inside double quotes the shell eats the '\' on the '$', so you need:
$ perl -p -e "s/SOME_RULE\s*=\s*\\$\(BIN\)/Hello/g" testfile.txt
(Or use single quotes, which is highly advisable in any case.)

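You need to escape ( and ), because unescaped parentheses form a capturing group in Perl.
perl -p -e 's/SOME_RULE\s*=\s*\$\(BIN\)/Hello/g' testfile.txt
You need the same escaping in an Extended Regular Expression (ERE). Use single quotes there too: inside double quotes the shell strips the backslash from \$, and an unescaped $ in an ERE is an end-of-line anchor, so the pattern could never match:
grep -E 'SOME_RULE\s*=\s*\$\(BIN\)' testfile.txt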

perl -ne '(/SOME_RULE\s*?=\s*?\$\(BIN\)/) && print' testfile.txt
If you want to modify the lines, use:
perl -pe 's/SOME_RULE\s*?=\s*?\$\(BIN\)/Hello/' testfile.txt

Perl's regex syntax is different from the POSIX basic regular expressions (BRE) that grep uses by default. In this case, you're falling foul of parentheses being metacharacters in Perl's regexes: they denote a capturing group.
You should have more success by altering the Perl regex:
s/SOME_RULE\s*=\s*\$\(BIN\)/Hello/g
which will then match the literal parentheses in the source text.

Related

Why does this regex work in grep but not sed?

I have two regular expressions:
$ grep -E '\-\- .*$' *.sql
$ sed -E '\-\- .*$' *.sql
(I am trying to grep lines in sql files that have comments and remove lines in sql files that have comments)
The grep command works using this regex; however, the sed returns the following error:
sed: -e expression #1, char 7: unterminated address regex
What am I doing incorrectly with sed?
(The space after the two hyphens is required for SQL comments, if you are unfamiliar with MySQL comments of this type.)

You're trying to use:
sed -E '\-\- .*$' *.sql
This sed command is not correct because you're not actually telling sed to do anything.
It should be:
sed -n '/-- /p' *.sql
and equivalent grep would be:
grep -- '-- ' *.sql
or even better with a fixed string search:
grep -F -- '-- ' *.sql
Here -- marks the end of grep's options, so the pattern '-- ' is not mistaken for an option.
There is no need to escape - in a regex when it is outside a bracket expression (character class), i.e. [...].
Based on the comments, it seems the OP's intent is to remove the commented sections, which start with two hyphens, from all *.sql files.
You may use this sed for that:
sed -i 's/-- .*//g' *.sql
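A sketch comparing the grep and sed commands from this answer on a made-up three-line SQL sample, run on standard input so no files are modified (the sample and the function name are hypothetical).

```shell
#!/bin/sh
# Hypothetical SQL input: a trailing comment, a whole-line comment, plain code.
sql_sample() {
  printf '%s\n' 'SELECT 1; -- trailing comment' '-- whole-line comment' 'SELECT 2;'
}

# Print lines that contain a comment: grep with a fixed string...
sql_sample | grep -F -- '-- '
# ...and the equivalent sed: -n turns off auto-print, p prints matches.
sql_sample | sed -n '/-- /p'

# Strip the comment text, as sed -i 's/-- .*//g' *.sql does in place.
sql_sample | sed 's/-- .*//'
```

The substitution leaves an empty line where the whole-line comment was, and a trailing space after SELECT 1;.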

The problem here is not the regex, the problem is that sed requires a command. The equivalent of your grep would be:
sed -n '/\-\- .*$/p'
-n suppresses sed's automatic printing of every line, the slashes wrap your regex to form the search address, and p (after the last slash) prints the matching lines.
P.S.: As Anub pointed out, escaping the hyphens (-) inside the regex is unnecessary.

You are trying to use sed's \cregexpc syntax: with \-<...> you tell sed that the delimiter character you want to use is a dash (-), but you didn't terminate the regex where it should be, as in \-<...>-. Also add the d command to delete those lines.
sed '\-\-\-.*$-d' infile
see man sed about that:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
With the default / delimiter, none of this escaping is required, so:
sed '/--.*$/d' infile
or simply:
sed '/^--/d' infile
and more accurately:
sed '/^[[:blank:]]*--/d' infile
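A sketch showing that a custom-delimiter address and the plain /.../ address select the same lines; the sample input is made up. Inside \-...- every literal hyphen has to be written as \-, which is why the original \-\- .*$ was read as an address that never ends.

```shell
#!/bin/sh
# Hypothetical input: two comment lines (one indented) and one statement.
sql_sample() {
  printf '%s\n' '-- comment' '  -- indented comment' 'SELECT 1;'
}

# \cREGEXc address with c = '-': each literal hyphen is escaped as \-.
sql_sample | sed '\-^[[:blank:]]*\-\- -d'

# Same address with the default / delimiter: no escaping needed.
sql_sample | sed '/^[[:blank:]]*-- /d'
```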

Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try to use perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?

Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
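This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a pipe character, then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered." That is how a strict POSIX basic regex (as in BSD/macOS sed) reads \| and \+; alternation and the + quantifier need -E there. GNU sed does accept \| and \+ as extensions, but then the pattern becomes an alternation between two branches, which still isn't what you want.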
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside: try not to use all-capital variable names. By convention, those indicate variables that are special to the shell or the environment, like RANDOM or IFS.
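The parameter-expansion and sed approaches above can be wrapped as two small functions to check that they agree; the function names are hypothetical, and the sed version assumes a sed that supports -E (GNU sed and BSD/macOS sed both do).

```shell
#!/bin/sh
# Parameter expansion only: no extra processes.
ticket_from_branch() {
  tmp=${1##*/}                # drop everything up to and including the last /
  printf '%s\n' "${tmp%%-*}"  # drop everything from the first - onward
}

# sed -E: capture the run of non-slash, non-dash characters after the last /.
ticket_from_branch_sed() {
  printf '%s\n' "$1" | sed -E 's#^.*/([^/-]+)-.*$#\1#'
}

ticket_from_branch "bugfix/US3280841-something-duh"
ticket_from_branch_sed "bugfix/US3280841-something-duh"
```

The two are not identical on every input: for a name without a dash, such as feature/ABC, the expansion version prints ABC, while the sed version prints the input unchanged because its pattern requires a dash.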

You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Or this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841

sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"

The Perl version just has the + in the wrong place. It should be inside the capturing parentheses:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');

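In your sed command, just add a ^ before A-Z0-9:
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')
This relies on GNU sed, which treats \| and \+ as alternation and one-or-more in basic regexes.
Alternatively, and more briefly, you can delete all lowercase letters, slashes and dashes:
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )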

Type in a shell terminal:
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
Use the s (substitute) command instead of m (match).
The same expression works unchanged with GNU sed -E in place of perl -pe.

Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'

Replace the separator between pairs of numbers

I want to replace all strings like [0-9][0-9]-[0-9][0-9] with [0-9][0-9]/[0-9][0-9] using sed.
In other words, I want to replace - with /.
If I have somewhere in my text:
09-36
32-43
54-65
I want this change:
09/36
32/43
54/65

Using GNU sed:
$ echo '09-36 32-43 54-65' | sed -r 's|\<([0-9]{2})-([0-9]{2})\>|\1/\2|g'
09/36 32/43 54/65
-r turns on extended regular expressions, which:
means the ( ) { } characters don't need to be \-escaped.
enables use of \< and \> to match only at word boundaries (if the expression should only match full lines, use ^ and $ instead, and omit the g option)
| is used as an alternative regex delimiter so that / can be used without \-escaping.
A BSD/macOS sed solution would look slightly different:
echo '09-36 32-43 54-65' | sed -E 's|[[:<:]]([0-9]{2})-([0-9]{2})[[:>:]]|\1/\2|g'

sed -e 's/\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1\/\2/g'
Might not be the most elegant version, but it works for me. The gazillion backslashes make this rather unreadable in my opinion. You might improve the readability by not using / to separate the pattern and the replacement.
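Following the suggestion about delimiters, here is a sketch of the same POSIX basic-regex substitution using | as the s delimiter, so the slash in the replacement needs no backslash. Like the original, it does not check word boundaries, so digits belonging to a longer number can still be matched; the function name is hypothetical.

```shell
#!/bin/sh
# \{2\} is the POSIX BRE interval; | delimits the s command, so / is literal.
to_slash() {
  sed 's|\([0-9]\{2\}\)-\([0-9]\{2\}\)|\1/\2|g'
}

printf '%s\n' '09-36' '32-43' '54-65' | to_slash
```

For example, 444-44 becomes 444/44; the \< \> and lookaround versions in the other answers avoid that.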

perl -C -npe 's/(?<!\d)(\d\d)-(\d\d)(?!\d)/\1\/\2/g' file
Input
维基 1-11 22-33 444-44 55-555 66-66百科
77-77
8 88-88
Output
维基 1-11 22/33 444-44 55-555 66/66百科
77/77
8 88/88
In the command above
-C enables Unicode;
-n causes Perl to run the script for each input line;
-p does the same and also prints each (possibly modified) line to standard output, so -n is redundant here;
-e accepts a Perl expression (here, a substitution).
In this mode (-npe), Perl works much like sed. The script replaces each pair of two-digit numbers separated by - with the same pair separated by a slash.
(?<!\d) and (?!\d) are negative lookbehind and lookahead assertions: they ensure the two-digit numbers are not part of longer runs of digits.
To edit the file in place, use the -i option: perl -C -i.backup -npe ....
If the input is not a file, you can pass the input to Perl via pipe, e.g.:
echo '维基 1-11 22-33 444-44 55-555 66-66百科' | \
perl -C -npe 's/(?<!\d)(\d\d)-(\d\d)(?!\d)/\1\/\2/g'

sed regex with alternative on Solaris doesn't work

Currently I'm trying to use sed with a regex on Solaris, but it doesn't work.
I need to show only the lines matching my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to express the alternation? Is it possible in perl?
Thanks for any solutions.

With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ one-or-zero times, so optionally. It may or may not be there.
The (a_) also captures the match, which is not needed. So you can use (?:a_)? instead. The ?: makes () only group what is inside (so ? applies to the whole thing), but not remember it.

with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x match whole line
-i ignore case
-E extended regex; if not available, use grep -xi '\(a_\)\?[a-z0-9]*' (\? is a GNU extension)
(a_)? match a_ zero or one time
[a-z0-9]* zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt
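If the available sed has no -E at all, one possible workaround (an assumption about the Solaris setup, not something from the answers above) is to split the alternation into two plain POSIX BRE addresses, each with its own p. The two branches here can never match the same line, since one forbids _ and the other requires a_, so no line is printed twice; the sample helper is hypothetical.

```shell
#!/bin/sh
sample_input() {
  printf '%s\n' grtad a_pitr _aupa a__as baman 12353 ai345 ki_ag -MXx2 '!!!23' '+_)#*'
}

# Two basic-regex addresses instead of one alternation; each prints its matches.
sample_input | sed -n -e '/^[a-zA-Z0-9]*$/p' -e '/^a_[a-zA-Z0-9]*$/p'
```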

How to grep file to find lines like <version>1.1.9-beta</version>?

Looking for a suggestion for cat file | grep REGEX to get the lines with <version>anything</version>.

grep -F '<version>1.1.9-beta</version>' file
-F will match your pattern as literal text
You don't need that useless cat.
If you really mean anything, try grep '<version>.*</version>' file or grep -P '<version>.*?</version>' file; however, searching XML with regex is a bad idea.

Use the -E option for extended regular expressions:
grep -E "<version>.*</version>" file
Refer to these rules for the regular expression: https://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html#Regular-Expressions
For example, to match the typical version format (3.14, or 13.14, or 0.1458) you can type:
grep -E "<version>[0-9]+\.[0-9]+</version>" file

You can do:
grep '<version>[^<]*</version>' file.xml
[^<]* will match zero or more characters up to the next <.
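A sketch of why [^<]* is safer than .* once you extract matches with grep -o (supported by GNU and BSD grep): on a line holding two version elements, the greedy .* runs to the last </version>, while [^<]* stops at the first <. The sample line is made up.

```shell
#!/bin/sh
line='<version>1.0</version><version>2.0</version>'

# Greedy .*: one match covering both elements.
printf '%s\n' "$line" | grep -o '<version>.*</version>'

# [^<]*: one match per element.
printf '%s\n' "$line" | grep -o '<version>[^<]*</version>'
```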

We Keep Coding
