How to choose the right words in regular expressions? - regex

How can I use grep to get numbers (not arbitrary strings) that do not contain 3 or 7?
I tried this:
grep -o '[[:digit:]^37]*' test
but it does not work.

If you have GNU grep, you can use
grep -oP '\b[^\D37]+\b' file
grep -oP '\b[^[:^digit:]37]+\b' file is an equivalent command.
Details:
\b - a word boundary (may be replaced with (?<!\d) if you simply want to make sure there are no other digits immediately on the left)
[^ - start of a negated bracket expression that matches chars other than:
\D - any non-digit char
37 - 3 and 7
]+ - end of the bracket expression, repeat one or more times
\b - a word boundary (may be replaced with (?!\d) if you simply want to make sure there are no other digits immediately on the right).
Example:
s='123 456 857 112 i21.'
grep -oP '\b[^\D37]+\b' <<< "$s"
Output:
456
112
To use the same approach for letters, replace \D with \P{L}, or [:^digit:] with [:^alpha:].
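As a rough illustration of the lookaround variant mentioned in the details above, here is a sketch that runs it on the same sample string. It assumes GNU grep built with PCRE support (the -P option); the variable name nums is only for illustration.

```shell
s='123 456 857 112 i21.'

# Lookaround variant: a match is rejected only when a digit
# sits immediately to its left or right.
nums=$(printf '%s\n' "$s" | grep -oP '(?<!\d)[^\D37]+(?!\d)')
printf '%s\n' "$nums"
```

This prints 456, 112 and 21. Unlike the \b version, it also returns 21 from i21., because a letter on the left is not a neighbouring digit.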

Related

Bash regex for same sender and receiver with backreference

I am trying to write a regex (it has to be a regex because I need it for fail2ban) that matches when
the receiver and the sender are the same person:
echo "from=<test#test.ch> to=<test#test.ch>" | grep -E -o '([^=]*\s)[ ]*\1'
What am I doing wrong?
You might use a pattern to match the format of the string between the brackets with a backreference to that capture.
from(=<[^\s#<>]+#[^\s#<>]+>)\s*to\1
Explanation
from Match literally
( Capture group 1
=< Match literally
[^\s#<>]+ Match 1+ times any char except a whitespace char or # < >
# Match literally
[^\s#<>]+ Again match 1+ times any char except a whitespace char or # < >
> Match literally
) Close group 1
\s*to\1 Match 0+ whitespace chars, to and the backreference to group 1
Use grep -P instead of -E for Perl compatible regular expressions.
For example
echo "from=<test#test.ch> to=<test#test.ch>" | grep -oP 'from(=<[^\s#<>]+#[^\s#<>]+>)\s*to\1'
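To check that the backreference really requires both addresses to be identical, here is a small sketch, again assuming GNU grep with -P. The second input line and the variable names are made up for illustration.

```shell
pattern='from(=<[^\s#<>]+#[^\s#<>]+>)\s*to\1'
same='from=<test#test.ch> to=<test#test.ch>'
other='from=<test#test.ch> to=<other#test.ch>'   # hypothetical non-matching line

for line in "$same" "$other"; do
  # -q: no output, only the exit status tells whether the line matched
  if printf '%s\n' "$line" | grep -qP "$pattern"; then
    echo "match:    $line"
  else
    echo "no match: $line"
  fi
done
```

Only the first line is reported as a match, since \1 must repeat exactly what group 1 captured.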
A bit broader match could be capturing what is between the brackets
[^=\s]+(=<[^<>]+>)\s*[^=\s]+\1

Matching first and last three characters of regex (including overlap)

I am trying to put together a regular expression that matches a word (only one per line) that starts and ends with the same three characters.
I was able to write a solution for words that are at least 6 characters long (meaning there is no overlap), but I am unsure how to do it for overlapping starts and ends such as "heheh".
This is what I have, nice and simple:
^(...).*\1$
I am inclined to believe that this might have something with lookahead and lookbehind but I am not sure.
Any help would be appreciated, thank you!
You will need lookarounds since they are non-consuming patterns, i.e. the regex index is not advanced when the lookaround pattern is matched.
For example, you may do this with GNU grep:
grep -P '^(?=(...)).+\1$' file
grep -P '^(?=(\S{3})).+\1$' file # To avoid counting in spaces
grep -P '^(?=(\w{3})).+\1$' file # Or only allowing letters/digits/underscores
grep -P '^(?=(\p{L}{3})).+\1$' file # Or only allowing letters
Details
^ - start of the line
(?=(...)) - a positive lookahead with a capturing group inside that matches any 3 chars
.+ - any 1+ chars other than line break chars, as many as possible
\1 - Group 1 value
$ - end of the line.
To extract words, you may use \w shorthand (that matches letters, digits and underscores) and word boundaries \b:
grep -oP '\b(?=(\w{3}))\w+\1\b' file
Details
\b - a word boundary (start of word here, because it is followed with word chars)
(?=(\w{3})) - a positive lookahead making sure there are 3 word chars while capturing them into Group 1
\w+ - 1+ word chars (not 0 or more because otherwise a 3-char word would be matched)
\1 - Group 1 value
\b - end of word here (as it is preceded with word chars).
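The two patterns above can be tried on a few sample inputs as sketched below. The sample words and variable names are invented for illustration, and GNU grep with -P is assumed.

```shell
# Whole-line check: one word per line (hypothetical sample words)
whole=$(printf '%s\n' heheh hello abcabc heh aaaa |
  grep -P '^(?=(...)).+\1$')
printf '%s\n' "$whole"

# Extracting matching words from running text
words=$(printf '%s\n' 'say heheh or ionization now' |
  grep -oP '\b(?=(\w{3}))\w+\1\b')
printf '%s\n' "$words"
```

The first command prints heheh, abcabc and aaaa: the overlapping cases match because the lookahead captures the first three chars without consuming them. The 3-char word heh is rejected because .+ needs at least one char before \1. The second command prints heheh and ionization.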

How to write a regex for this?

Requirements: only grep/cut/join/regex.
I have data like this:
798 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
15386 /usr/bin/nautilus --gapplication-service
16051 /usr/bin/zeitgeist-daemon
From each row, I want to extract the text from the number up to the second space, like
798 /usr/bin/dbus-daemon
using only grep/cut/join with or without regex.
I have tried
grep -oe "[^ ][^ ]* *[a-zA-Z\]*$"
but the result isn't as expected.
You may use
# With GNU grep:
grep -oP '^\s*\K\S+\s+\S+' <<< "$s"
# With a POSIX ERE pattern:
grep -oE '[0-9][^ ]* +[^ ]+' <<< "$s"
-o - output only the matched text, not the whole line
-P - the PCRE regex engine is used to parse the pattern
The PCRE pattern details:
^ - start of line
\s* - 0+ whitespaces
\K - match reset operator discarding the whole text matched so far
\S+ - 1+ non-whitespace chars
\s+\S+ - 1+ whitespaces and 1+ non-whitespace chars.
The POSIX ERE pattern matches
[0-9] - a digit
[^ ]* - 0+ chars other than space
' +' (a space followed by +) - 1 or more spaces
[^ ]+ - 1+ chars other than a space.
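Here is a sketch that runs the POSIX ERE command on the sample rows from the question, next to a cut alternative (cut is allowed by the requirements). The cut version is an assumption-laden shortcut: it only works when rows have no leading blanks and exactly one space between the first two fields. Variable names are illustrative.

```shell
data='798 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
15386 /usr/bin/nautilus --gapplication-service
16051 /usr/bin/zeitgeist-daemon'

# POSIX ERE pattern from the answer
with_grep=$(printf '%s\n' "$data" | grep -oE '[0-9][^ ]* +[^ ]+')

# cut alternative: fields 1 and 2, split on single spaces
with_cut=$(printf '%s\n' "$data" | cut -d' ' -f1,2)

printf '%s\n' "$with_grep"
```

For this input both variables hold the same three lines, starting with 798 /usr/bin/dbus-daemon. Note that the ERE pattern is not anchored, so with -o it could return extra matches on rows that contain a digit-led token followed by a space later in the line.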

Extract Values Between Pattern Match

I'm trying to extract any numerical values between a pattern match in a text file.
Parsed Log File Text
> GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2
I want to pull the 25 from f25 in nmmb_2p5km.f25.conus.grib2
Attempted Code
sed -e 's/nmmb_2p5km\(.*\)grib2/\1/'
You may use
log="GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2"
sed 's/.*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.*/\1/' <<< "$log"
The .*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.* pattern matches
.* - any 0+ chars
nmmb_2p5km - a literal substring
[^0-9]* - 0+ non-digit chars
\([0-9]*\) - Capturing group 1 (later referred to with \1 from the replacement pattern): 0+ digits
[^0-9]* - 0+ non-digit chars
grib2.* - grib2 and any 0+ chars.
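One caveat with the sed command: lines that do not match are printed unchanged. If the log contains other lines, a possible variation is to combine -n with the p flag so that only lines where the substitution happened are printed. The second input line below is hypothetical.

```shell
log='GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2'
other='GET /index.html'   # hypothetical line without the pattern

# -n suppresses default output; the trailing p prints only substituted lines
hours=$(printf '%s\n%s\n' "$log" "$other" |
  sed -n 's/.*nmmb_2p5km[^0-9]*\([0-9]*\)[^0-9]*grib2.*/\1/p')
printf '%s\n' "$hours"
```

This prints only 25; the unrelated line is dropped.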
Alternatively, you may use grep with a PCRE pattern like
grep -Po 'nmmb_2p5km\D*\K\d+' <<< "$log"
Details
nmmb_2p5km - a literal substring
\D* - 0+ non-digit chars
\K - match reset operator discarding all text matched so far
\d+ - 1+ digits.
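To see what \K contributes, the sketch below runs the same pattern with and without it (GNU grep with -P assumed; variable names are illustrative).

```shell
log='GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2'

# Without \K the whole matched text is printed
with_prefix=$(printf '%s\n' "$log" | grep -Po 'nmmb_2p5km\D*\d+')

# With \K everything matched before it is dropped from the output
digits_only=$(printf '%s\n' "$log" | grep -Po 'nmmb_2p5km\D*\K\d+')

printf '%s\n%s\n' "$with_prefix" "$digits_only"
```

The first result is nmmb_2p5km.f25, the second is just 25.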
Using a Perl one-liner:
> export log="GET /pub/data/nccf/com/hiresw/prod/hiresw.20180921/hiresw.t00z.nmmb_2p5km.f25.conus.grib2"
> perl -e '$x = $ENV{log}; $x =~ s/(.+?)(\d+)\.conus\.(.+)/$2/; print "$x\n"'
25

Grep regex capturing issue

Why doesn't this return only the capturing group?
grep -rPo 'ServerMethod\(me\.[a-zA-Z]*\.([a-zA-Z]*)\)'
it returns :
test.js:ServerMethod(me.obProcedures.SaveProcess)
test.js:ServerMethod(me.obProcedures.Commit)
but I need just:
SaveProcess
Commit
cygwin version:
2.5.2(0.297/5/3)
This happens because grep does not return capture group contents, only whole matches.
You may use the \K match reset operator and a positive lookahead instead:
grep -Po 'ServerMethod\(me\.[a-zA-Z]*\.\K[a-zA-Z]+(?=\))'
Details:
ServerMethod\(me\. - matches a literal string ServerMethod(me.
[a-zA-Z]* - 0 or more ASCII letters
\. - a literal dot
\K - omits the text matched so far from the match
[a-zA-Z]+ - 1 or more ASCII letters
(?=\)) - a positive lookahead that requires a ) immediately to the right of the current location, but does not add it to the match (as it is a non-consuming pattern).
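As a quick comparison, the sketch below runs the original capturing-group pattern and the \K version on two made-up source lines. GNU grep with -P is assumed, and the sample text and variable names are hypothetical.

```shell
src='x = ServerMethod(me.obProcedures.SaveProcess);
y = ServerMethod(me.obProcedures.Commit);'

# Capturing group: -o still prints the whole match
whole=$(printf '%s\n' "$src" |
  grep -Po 'ServerMethod\(me\.[a-zA-Z]*\.([a-zA-Z]*)\)')

# \K plus lookahead: only the method name is left in the match
names=$(printf '%s\n' "$src" |
  grep -Po 'ServerMethod\(me\.[a-zA-Z]*\.\K[a-zA-Z]+(?=\))')

printf '%s\n' "$names"
```

The first variable holds the full ServerMethod(...) calls, while the second holds SaveProcess and Commit.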
Alternatively, as a PCRE grep option is not always available, use sed with grep:
grep 'ServerMethod(me\.' | sed 's/.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).*/\1/'
Here, the patterns are POSIX BRE compliant:
ServerMethod(me\. - matches a literal ServerMethod(me. text, grep gets the lines with this text
.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).* - matches a line that has
.* - any 0+ chars as many as possible
ServerMethod(me\. - a literal ServerMethod(me. text
[a-zA-Z]* - 0+ ASCII letters
\. - a literal dot
\([a-zA-Z]*\) - Capturing group 1 (referred to via \1): 0+ ASCII letters
) - a literal )
.* - any 0+ chars as many as possible
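If you prefer a single command, one possible variation is to let sed do the filtering itself with -n and the p flag, which prints only the lines where the substitution succeeded. The sample text below is hypothetical.

```shell
src='x = ServerMethod(me.obProcedures.SaveProcess);
y = ServerMethod(me.obProcedures.Commit);
z = OtherCall(me.a.b);'

# Same BRE pattern as above; -n with the p flag replaces the grep step
names=$(printf '%s\n' "$src" |
  sed -n 's/.*ServerMethod(me\.[a-zA-Z]*\.\([a-zA-Z]*\)).*/\1/p')
printf '%s\n' "$names"
```

This prints SaveProcess and Commit and skips the OtherCall line.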