Grep regex capturing issue - regex

Why doesn't this print only the capturing group?
grep -rPo 'ServerMethod\(me\.[a-zA-Z]*\.([a-zA-Z]*)\)'
It returns:
test.js:ServerMethod(me.obProcedures.SaveProcess)
test.js:ServerMethod(me.obProcedures.Commit)
but I need just:
SaveProcess
Commit
cygwin version:
2.5.2(0.297/5/3)

This happens because grep does not output capture group contents, only whole matches.
You may use the \K match reset operator and a positive lookahead instead:
grep -Po 'ServerMethod\(me\.[a-zA-Z]*\.\K[a-zA-Z]+(?=\))'
Details:
ServerMethod\(me\. - matches a literal string ServerMethod(me.
[a-zA-Z]* - 0 or more ASCII letters
\. - a literal dot
\K - omits the text matched so far from the match
[a-zA-Z]+ - 1 or more ASCII letters
(?=\)) - a positive lookahead that requires a ) immediately to the right of the current location, but does not add it to the match (as it is a non-consuming pattern).
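The command in the question used -r, which prefixes every match with the file name. The following is a minimal sketch, assuming GNU grep built with PCRE support, that combines the \K pattern with -h to suppress those prefixes. The temporary directory and the contents of test.js are made up for illustration.

```shell
# Hypothetical sample data, recreated in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' \
  'x = ServerMethod(me.obProcedures.SaveProcess);' \
  'y = ServerMethod(me.obProcedures.Commit);' > "$dir/test.js"

# -r recurse, -h suppress file names, -P PCRE, -o print only the match.
# \K drops everything matched before it; (?=\)) requires a closing paren
# without adding it to the match.
methods=$(grep -rhPo 'ServerMethod\(me\.[a-zA-Z]*\.\K[a-zA-Z]+(?=\))' "$dir")
printf '%s\n' "$methods"

rm -rf "$dir"
```

Without -h, each output line would still start with the path of the file it came from.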
Alternatively, as a PCRE grep option is not always available, use sed with grep:
grep 'ServerMethod(me\.' | sed 's/.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).*/\1/'
Here, the patterns are POSIX BRE compliant:
ServerMethod(me\. - matches a literal ServerMethod(me. text, grep gets the lines with this text
.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).* - matches a line that has
.* - any 0+ chars as many as possible
ServerMethod(me\. - a literal ServerMethod(me. text
[a-zA-Z]* - 0+ ASCII letters
\. - a literal dot
\([a-zA-Z]*\) - Capturing group 1 (referred to via \1): 0+ ASCII letters
) - a literal )
.* - any 0+ chars as many as possible
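The grep and sed pipeline above can be tried without any files by feeding it a few lines on standard input. This is a sketch only; the input lines are hypothetical stand-ins for the contents of test.js.

```shell
# Hypothetical input lines standing in for the contents of test.js.
input='ServerMethod(me.obProcedures.SaveProcess)
var unrelated = 1;
ServerMethod(me.obProcedures.Commit)'

# grep keeps only lines containing the literal prefix (in BRE a bare "("
# is literal), then sed replaces the whole line with capture group 1.
names=$(printf '%s\n' "$input" |
  grep 'ServerMethod(me\.' |
  sed 's/.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).*/\1/')
printf '%s\n' "$names"
```

The grep step matters because sed prints lines that do not match the substitution unchanged.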

regex getting words between '|'

I am trying to get the full words between two '|' characters
example string: {{person label|Jens Addle|border=red}}
here I would like to get the string: Jens Addle
I have attempted with the following:
(([A-Z]\w+))
However, this separates the result into two words and I would like to get it as a single entity.
The key is escaping the pipes, capturing what is in between, and being non-greedy about it:
\|(.+?)\|
This should put the value into $1.
This should work in your case: /\|(.*?)\|/gm, or without the flags \|(.*?)\|.
This regex matches all characters between two | characters (\| matches a literal | character, (.*?) lazily matches and captures everything in between).
You can use
\|\K[^|]*(?=\|)
(?<=\|)[^|]*(?=\|)
Details:
(?<=\|) - a location that is immediately preceded with a | char
\|\K - matches a | char and then "forgets" it
[^|]* - zero or more chars other than a | char
(?=\|) - a location that is immediately followed with a | char.
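Both patterns can be checked from the shell. The sketch below assumes GNU grep with PCRE support (-P), which is not something the question specified; the variable names are illustrative.

```shell
s='{{person label|Jens Addle|border=red}}'

# \|\K matches the opening pipe and then drops it from the reported match.
with_k=$(printf '%s\n' "$s" | grep -oP '\|\K[^|]*(?=\|)')

# The lookbehind variant only asserts the pipe and never consumes it.
with_lookbehind=$(printf '%s\n' "$s" | grep -oP '(?<=\|)[^|]*(?=\|)')

printf '%s\n%s\n' "$with_k" "$with_lookbehind"
```

For this input both variants print Jens Addle; the text after the second pipe is not matched because no pipe follows it.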
Matching one or more words between the pipe chars can be done using a capture group.
Note that [A-Z]\w+ matches at least 2 characters.
\|([A-Z]\w+(?: \w+)*)(?=\|)
\| Match |
( Capture group 1
[A-Z]\w+ Match an uppercase char A-Z and 1+ word characters
(?: \w+)* Optionally repeat matching a space and 1+ word characters
) Close group 1
(?=\|) Positive lookahead, assert | to the right
To take the format of the example string into account, you might also make the pattern a bit more specific:
{{[^|]*\|([A-Z]\w+(?: \w+)*)\|[^|]*}}

Matching first and last three characters of regex (including overlap)

I am trying to put together a regular expression that matches a word (only one per line) that starts and ends with the same three characters.
I was able to write a solution for words that are at least 6 characters long (meaning there is no overlap), but I am unsure how to do it for overlapping starts and ends such as "heheh".
This is what I have, nice and simple:
^(...).*\1$
I am inclined to believe that this might have something with lookahead and lookbehind but I am not sure.
Any help would be appreciated, thank you!
You will need lookarounds since they are non-consuming patterns, i.e. the regex index is not advanced when the lookaround pattern is matched.
For example, you may do this with GNU grep:
grep -P '^(?=(...)).+\1$' file
grep -P '^(?=(\S{3})).+\1$' file # To avoid counting in spaces
grep -P '^(?=(\w{3})).+\1$' file # Or only allowing letters/digits/underscores
grep -P '^(?=(\p{L}{3})).+\1$' file # Or only allowing letters
Details
^ - start of string
(?=(...)) - a positive lookahead with a capturing group inside that matches any 3 chars
.+ - any 1+ chars other than line break chars as many as possible
\1 - Group 1 value
$ - the end of string.
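To see the effect of the lookahead, the sketch below runs the question's original pattern and the lookahead version against the same hypothetical word list. It assumes GNU grep with PCRE support.

```shell
# Hypothetical word list, one word per line.
words='heheh
abcabc
hello
abc
entertainment'

# Original pattern: the first three chars are consumed, so the closing
# copy cannot overlap them and "heheh" is missed.
no_overlap=$(printf '%s\n' "$words" | grep -P '^(...).*\1$')

# Lookahead pattern: the first three chars are captured without being
# consumed, so .+ and \1 are free to overlap the captured prefix.
overlap=$(printf '%s\n' "$words" | grep -P '^(?=(...)).+\1$')

printf 'without lookahead:\n%s\nwith lookahead:\n%s\n' "$no_overlap" "$overlap"
```

Note that a bare three-letter line such as abc is still rejected, because .+ needs at least one character before \1.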
To extract words, you may use \w shorthand (that matches letters, digits and underscores) and word boundaries \b:
grep -oP '\b(?=(\w{3}))\w+\1\b' file
Details
\b - a word boundary (start of word here, because it is followed with word chars)
(?=(\w{3})) - a positive lookahead making sure there are 3 word chars while capturing them into Group 1
\w+ - 1+ word chars (not 0 or more because otherwise a 3-char word would be matched)
\1 - Group 1 value
\b - end of word here (as it is preceded with word chars).
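A quick check of the word-extraction variant, again assuming GNU grep with PCRE support; the sample sentence is made up.

```shell
text='the heheh and abcabc words'

# -o prints each matching word on its own line.
found=$(printf '%s\n' "$text" | grep -oP '\b(?=(\w{3}))\w+\1\b')
printf '%s\n' "$found"
```

Plain three-letter words such as the and and are skipped, since \w+ must consume at least one character before the backreference.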

How to write a regex for this?

Requirements: only grep/cut/join/regex.
I have data like this:
798 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
15386 /usr/bin/nautilus --gapplication-service
16051 /usr/bin/zeitgeist-daemon
From each row, I want to extract the text from the number up to the second space, like
798 /usr/bin/dbus-daemon
using only grep/cut/join with or without regex.
I have tried
grep -oe "[^ ][^ ]* *[a-zA-Z\]*$"
but the result isn't as expected.
You may use
# With GNU grep:
grep -oP '^\s*\K\S+\s+\S+' <<< "$s"
# With a POSIX ERE pattern:
grep -oE '[0-9][^ ]* +[^ ]+' <<< "$s"
-o - output only the matched part of each line, not the whole line
-P - use the PCRE regex engine to parse the pattern
The PCRE pattern details:
^ - start of line
\s* - 0+ whitespaces
\K - match reset operator discarding the whole text matched so far
\S+ - 1+ non-whitespace chars
\s+\S+ - 1+ whitespaces and 1+ non-whitespace chars.
The POSIX ERE pattern matches
[0-9] - a digit
[^ ]* - 0+ chars other than space
+ - 1 or more spaces
[^ ]+ - 1+ chars other than a space.
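The POSIX ERE variant can be tried on the sample rows as follows. As a sketch of an alternative that stays within the grep/cut/join requirement, cut is shown too; it assumes the fields are separated by exactly one space and that no line starts with a space, which holds for the sample data but may not hold for real ps output.

```shell
# Sample rows from the question.
s='798 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
15386 /usr/bin/nautilus --gapplication-service
16051 /usr/bin/zeitgeist-daemon'

# A digit, the rest of the first field, spaces, then the second field.
with_grep=$(printf '%s\n' "$s" | grep -oE '[0-9][^ ]* +[^ ]+')

# Assumption: single-space delimiters and no leading spaces.
with_cut=$(printf '%s\n' "$s" | cut -d' ' -f1,2)

printf '%s\n' "$with_grep"
```

The grep version tolerates leading spaces and runs of spaces between the two fields; the cut version does not.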

Extract Values Between Pattern Match

I'm trying to extract any numerical values between a pattern match in a text file.
Parsed Log File Text
> GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2
I want to pull the 25 from f25 in nmmb_2p5km.f25.conus.grib2
Attempted Code
sed -e 's/nmmb_2p5km\(.*\)grib2/\1/'
You may use
log="GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2"
sed 's/.*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.*/\1/' <<< "$log"
The .*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.* pattern matches
.* - any 0+ chars
nmmb_2p5km - a literal substring
[^0-9]* - 0+ non-digit chars
\([0-9]*\) - Capturing group 1 (later referred to with \1 from the replacement pattern): 0+ digits
[^0-9]* - 0+ non-digit chars
grib2.* - grib2 and any 0+ chars.
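When the pattern is applied to a whole log file rather than a single line, sed prints non-matching lines unchanged. A minimal sketch of one way to avoid that uses -n together with the p flag, so only lines where the substitution succeeded are printed. The second and third log lines are hypothetical.

```shell
# Hypothetical log excerpt: two matching requests and one unrelated line.
logs='GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2
GET /pub/data/index.html
GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f36.conus.grib2'

# -n suppresses automatic printing; the trailing p prints a line only
# when the substitution was applied to it.
hours=$(printf '%s\n' "$logs" |
  sed -n 's/.*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.*/\1/p')
printf '%s\n' "$hours"
```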
Alternatively, you may use grep with a PCRE pattern like
grep -Po 'nmmb_2p5km\D*\K\d+' <<< "$log"
Details
nmmb_2p5km - a literal substring
\D* - 0+ non-digit chars
\K - match reset operator discarding all text matched so far
\d+ - 1+ digits.
Using a Perl one-liner:
> export log="GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2"
> perl -e '$x = $ENV{log}; $x =~ s/(.+?)(\d+)\.conus\.(.+)/$2/; print "$x\n"'
25

php regex pregmatch remove zeros between characters

I have a string and I want to remove all leading zeros between the characters -s and the first nonzero digit.
1v-s001v => 1v-s1v
2v-s030r => 2v-s30r
3v-s021v => 3v-s21v
I'm trying with:
\w+-s0*(\d)
but it does not match the subject string.
You may use
(-s)0+(\d)
and replace with $1$2. You may replace \d with [0-9] in case the \d is not supported by your regex flavor.
Details
(-s) - Capturing group 1 (later referred to with $1 placeholder/replacement backreference from the replacement pattern): a -s substring
0+ - one or more 0 chars
(\d) - Capturing group 2 (later referred to with $2 placeholder/replacement backreference from the replacement pattern): any one digit
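The question is about PHP, where the pattern would typically be passed to preg_replace with $1$2 as the replacement. As a quick sanity check outside PHP, the same substitution can be sketched with sed -E, where group references are written \1\2 and \d is replaced with [0-9] as suggested above.

```shell
# The three sample strings from the question, one per line.
fixed=$(printf '%s\n' 1v-s001v 2v-s030r 3v-s021v |
  sed -E 's/(-s)0+([0-9])/\1\2/')
printf '%s\n' "$fixed"
```

Because 0+ must be followed by a digit, a zero that is itself the last digit of the number is kept: -s030r becomes -s30r, not -s3r.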