Regex Pattern matching and extraction using grep [duplicate] - regex

This question already has answers here:
How to use sed/grep to extract text between two words?
(14 answers)
Closed 2 years ago.
I have a somewhat unusual requirement: I want to match a string in a line and extract a value from it using grep. Below is the input; I want to extract only the date from the string.
Input Host-GOOGLE-production.2015-08-01-21.migrant.deploy:{R:[{A:"0b87654nuy",RC:"JAVA".....[and the line continues]
For the above input, I wanted to write a regex that matches the date and string that comes after {A:" and before ",RC:. I know I can do this through sed and awk but I wanted to perform this task only through grep.
As a first step, to extract only the date, I tried the command below, but it didn't work.
$ grep -o --perl-regexp "(Host-GOOGLE-production.([0-9]+?-[0-9]+?-[0-9]+)?-.*)"
Desired output for the above command: 2015-08-01
Does anyone know how to extract both of these strings? Please share your thoughts. It would be nice if I get an answer/suggestion that extracts both values 2015-08-01 & 0b87654nuy in one single command using grep

I wanted to write a regex that matches the date and string that comes after {A:" and before ",RC:
You can use this grep:
grep -oP '(?<=A:").*?(?=",RC:)' file
0b87654nuy

It would be nice if I get an answer/suggestion that extracts both values 2015-08-01 & 0b87654nuy in one single command using grep
Use \K and the alternation operator to get both outputs.
grep -oP '\bHost-GOOGLE-production\.\K[0-9]+-[0-9]+-[0-9]+(?=-)|A:"\K.[^"]*(?=",RC:)'
Example:
$ echo 'Host-GOOGLE-production.2015-08-01-21.migrant.deploy:{R:[{A:"0b87654nuy",RC:"JAVA".....[and the line continues]' | grep -oP '\bHost-GOOGLE-production\.\K[0-9]+-[0-9]+-[0-9]+(?=-)|A:"\K.[^"]*(?=",RC:)'
2015-08-01
0b87654nuy
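If you want both values on one line rather than on two, a minimal sketch along these lines should work, assuming GNU grep built with PCRE support (-P). The variable names line, pattern and result are only illustrative; the pattern itself is the one from the answer above.

```shell
# Sample line copied from the question, including its literal placeholder tail.
line='Host-GOOGLE-production.2015-08-01-21.migrant.deploy:{R:[{A:"0b87654nuy",RC:"JAVA".....[and the line continues]'

# Same pattern as in the answer: \K drops the prefix matched so far,
# and -o prints the match of each alternative on its own line.
pattern='\bHost-GOOGLE-production\.\K[0-9]+-[0-9]+-[0-9]+(?=-)|A:"\K.[^"]*(?=",RC:)'

# paste -s joins the two output lines into one space-separated string.
result=$(printf '%s\n' "$line" | grep -oP "$pattern" | paste -s -d ' ' -)
printf '%s\n' "$result"
```

Keeping the pattern in a single-quoted variable avoids having to escape the double quotes it contains.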

Related

Print specific string in a line after matched word in bash [duplicate]

This question already has answers here:
How to extract string following a pattern with grep, regex or perl [duplicate]
(8 answers)
Closed 4 years ago.
Situation
There is a file called test that consists of the following text:
this is the first line
version=1.2.3.4
this is the third line
How can I print only the following via bash:
1.2.3.4
Note: I always want to print whatever follows "version=" up to the end of the line, rather than searching for 1.2.3.4 specifically.
Thank you
Using GNU grep:
grep -Po '^version=\K.*'
-P enables PCRE regex, -o displays only what is matched rather than whole lines, and the \K escape drops everything matched before it from the output.
Using sed:
sed -n 's/^version=\(.*\)/\1/p'
-n disables auto-printing, then the substitution command replaces the "version=[...]" line with only its end through a capturing group. The substitution only succeeds on the second line, which triggers the p flag to print the (transformed) line.
You can also use:
grep version file | cut -d\= -f2
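To compare the three approaches side by side, here is a small sketch that feeds them the sample text from the question through a pipe instead of a file. It assumes GNU grep for -P; the variable names are made up for the example. The cut variant is slightly tightened compared to the answer: the anchored grep and -f2- are my own adjustments.

```shell
# Sample content copied from the question.
sample='this is the first line
version=1.2.3.4
this is the third line'

# GNU grep with PCRE: \K discards the "version=" prefix from the match.
with_grep=$(printf '%s\n' "$sample" | grep -Po '^version=\K.*')

# sed: print only lines where the substitution succeeded.
with_sed=$(printf '%s\n' "$sample" | sed -n 's/^version=\(.*\)/\1/p')

# cut: anchoring the grep skips lines that merely mention "version",
# and -f2- keeps the whole remainder even if the value contains "=".
with_cut=$(printf '%s\n' "$sample" | grep '^version=' | cut -d= -f2-)

printf '%s\n' "$with_grep" "$with_sed" "$with_cut"
```

All three should print the same value for this input; they differ mainly in portability, since -P is a GNU grep extension.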

Exclude pattern in a Grep using extended regex [duplicate]

This question already has answers here:
How to invert a grep expression
(5 answers)
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 5 years ago.
I have a grep problem that is killing me.
Suppose I have a file (file.txt) with the two entries below:
pos_ADF_datasource-1450-jdbc.xml
datasource-1450-jdbc.xml
Now, if I run the grep below:
grep -E '(ADF)' file.txt
I get the following output:
pos_ADF_datasource-1450-jdbc.xml
Now I want to exclude ADF to get the other entry. It should be easy, but I have tried everything and cannot get it to work:
grep -E '(?<!ADF)' file.txt
I tried many variations, but I'm sure there is something I'm not considering that keeps my expression from working.
I need and want to use -E; I know it can be done without extended regex!
Please enlighten me!
RESOLVED:
Thanks to Wiktor for the following explanation:
POSIX ERE does not support lookarounds. Even with -P, (?<!ADF) simply matches any position that is not preceded by ADF, so it does not exclude lines that contain ADF.
There is no practical way to check with an ERE regex that a string does not contain a pattern, only that it is not equal to, or does not start or end with, a pattern. That check needs a PCRE regex: grep -P '^(?!.*ADF)' file.txt
Then I figured it out with grep -Pe:
grep -Pe "^((?!.*ADF).)*-jdbc\.xml$" file.txt
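For reference, a short sketch of the two usual ways to get the non-ADF entry, using the two file names from the question as input. With plain ERE the practical route is -v; the lookahead variant assumes GNU grep with -P. The variable names are illustrative only.

```shell
# The two entries from the question, piped in instead of read from a file.
entries='pos_ADF_datasource-1450-jdbc.xml
datasource-1450-jdbc.xml'

# ERE has no lookarounds, so invert the match with -v instead.
with_invert=$(printf '%s\n' "$entries" | grep -vE 'ADF')

# PCRE: the negative lookahead at the start of the line rejects any line
# that contains ADF anywhere; the rest of the pattern checks the suffix.
with_lookahead=$(printf '%s\n' "$entries" | grep -P '^(?!.*ADF).*-jdbc\.xml$')

printf '%s\n' "$with_invert" "$with_lookahead"
```

The -v form stays within -E, which was the original constraint; the lookahead form is only needed when the exclusion has to live inside a single pattern.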

Extract word after a known pattern in UNIX [duplicate]

This question already has answers here:
get the next word after grep matching [duplicate]
(3 answers)
Closed 7 years ago.
I have a file called in.txt which contains a whole bunch of code. I need to extract a user ID which is guaranteed to be of the form 'EID:nmb685', potentially with content before and/or after the guaranteed format. I want to extract the 'nmb685' using a bash script. I've tried some combinations of grep and sed but nothing has worked.
If your grep doesn't support -P but supports -o, you can combine grep and awk:
grep -o 'EID:\w\+' file|awk -F':' '{print $2}'
It can be done with awk alone, but this is more straightforward.
If your grep supports -P (--perl-regexp), you can use this:
grep -oP 'EID:\K\w+' file
What is being output after the ID? Is there anything consistent that you can match against?
If you know the length of the userid you can use:
grep "EID:......" in.txt > out.txt
or, if you don't, maybe something like this (matches EID: followed by any run of letters and digits and then a space):
grep "EID:[A-Za-z0-9]* " in.txt > out.txt
Not very elegant, but this works:
grep "EID:" in.txt | sed 's/\(.*EID:......\).*/\1/g' | sed 's/^.*EID://'
Select all lines with the substring "EID:"
Remove everything after "EID:" plus 6 characters
Remove everything before (and including) "EID:"
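If the ID length is not fixed, a sketch like the following avoids counting characters. The surrounding text in sample is hypothetical; only the EID:nmb685 part comes from the question. The first variant assumes GNU grep with -P, the second uses sed with a POSIX character class.

```shell
# Hypothetical input line; only 'EID:nmb685' is taken from the question.
sample='some code before EID:nmb685 and more code after'

# GNU grep: \K drops the "EID:" prefix, \w+ takes the word that follows.
with_pcre=$(printf '%s\n' "$sample" | grep -oP 'EID:\K\w+')

# sed: capture the run of letters and digits after "EID:" and print
# only lines where the substitution succeeded.
with_sed=$(printf '%s\n' "$sample" | sed -n 's/.*EID:\([[:alnum:]]*\).*/\1/p')

printf '%s\n' "$with_pcre" "$with_sed"
```

Note that the sed form keeps only the last EID: on a line, because the leading .* is greedy, while grep -o prints every occurrence.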

regex: find strings that do not begin with a certain prefix [duplicate]

This question already has an answer here:
Regular expression for a string that does not start with a sequence
(1 answer)
Closed 9 years ago.
I want to find a word in strings, but only if it isn't preceded by a certain prefix.
For example:
I'd like to find all occurrences of APP_PERFORM_TASK, but only when they are not preceded by the prefix CMD_DO("
So:
CMD_DO("APP_PERFORM_TASK") <-- OK (I don't need to know about this)
BLAH("APP_PERFORM_TASK") <-- NOT OK, this should match my search.
I tried:
(?!CMD_DO\(")APP_PERFORM_TASK
But that doesn't produce the results I need. What am I doing wrong?
Here's a quick way:
Use the --invert-match (also known as -v) flag to ignore CMD_DO and pipe the results to a second grep that only matches BLAH:
grep -v CMD_DO dummy | grep BLAH
Try replacing the negative lookahead (?!) with a negative lookbehind (?<!) in your regex:
(?<!CMD_DO\(")APP_PERFORM_TASK
Based on your comment, let's concentrate on the command-line tool grep.
Here is a grep solution without the -P switch (Perl-like regex):
grep 'APP_PERFORM_TASK' file | grep -v '^CMD_DO("'
Here is a grep solution using the -P switch and a negative lookbehind:
grep -P '(?<!^CMD_DO\(")APP_PERFORM_TASK' file
Try this (the ^ anchor is needed so that the lookahead is only tested at the start of the line):
^(?!CMD_DO\(").*APP_PERFORM_TASK.*
To handle an input line that contains both the desirable and undesirable forms, such as:
CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")
you'd need something like this in awk (using GNU awk for gensub()):
awk -v s="APP_PERFORM_TASK" 'gensub("CMD_DO\\(\""s,"","g") ~ s' file
i.e. get rid of all of the unwanted occurrences of the string, then test what's left.
An awk version
awk '/APP_PERFORM_TASK/ && !/^CMD_DO/' file
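To see the lookbehind behave on the sample lines from the question (without the arrow annotations), here is a minimal sketch assuming GNU grep with -P. The variable names are illustrative, and the unanchored lookbehind is used so that it also covers a CMD_DO(" that is not at the start of the line.

```shell
# The two forms from the question, one per line.
lines='CMD_DO("APP_PERFORM_TASK")
BLAH("APP_PERFORM_TASK")'

# Negative lookbehind: match only when the text directly before is not CMD_DO("
pattern='(?<!CMD_DO\(")APP_PERFORM_TASK'

# Only the BLAH line should survive.
matching_lines=$(printf '%s\n' "$lines" | grep -P "$pattern")

# A line with both forms: -o prints one line per accepted occurrence,
# so counting the output lines counts the matches.
mixed='CMD_DO("APP_PERFORM_TASK") BLAH("APP_PERFORM_TASK")'
match_count=$(printf '%s\n' "$mixed" | grep -oP "$pattern" | wc -l | tr -d ' ')

printf '%s\n' "$matching_lines" "$match_count"
```

Unlike the grep -v pipelines, this keeps a line that contains both forms, because the decision is made per occurrence rather than per line.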

Not operator in regex [duplicate]

This question already has answers here:
Negative matching using grep (match lines that do not contain foo)
(3 answers)
Closed 12 months ago.
I have a file and I wish to grep out all the lines that do not start with a timestamp. I tried using the following regex but it did not work:
cat myFile | grep '^(?!\[0-9\]$).*$'
Any other suggestions or something that I might be doing wrong here?
Why not simply use grep's -v option to negate the match, like this:
grep -v "<pattern>" file
Let's say you want to grep all the lines in a shell script that are not commented (do not have # at the start); then you can use:
grep -v "^\s*#" file.sh
Try this:
cat myFile | grep -P '^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d'
This assumes your timestamp is of the pattern dddd-dd-dd dd:dd:dd, but you can change it to match your timestamp if it's something else. The -P switch is needed because \d is Perl syntax; without it, use [0-9] instead.
Note: Unless you're using some kind of command chaining, grep pattern file is a simpler syntax.
BTW: Your use of a double-negative makes me unsure if you want the timestamp lines or you want the non-timestamp lines.
You don't need a not operator, just use grep as it is most easily used: finding a pattern:
grep '^[0-9]' myFile
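Putting the -v suggestion together with a timestamp pattern, a sketch could look like this. The log content is entirely made up, and the timestamp layout is the dddd-dd-dd dd:dd:dd shape assumed in the answer above; adjust the pattern to your real format. It only needs -E, so no PCRE support is required.

```shell
# Hypothetical log content, invented for this example.
log='2015-08-01 21:00:00 service started
continuation line without a timestamp
2015-08-01 21:00:05 service stopped'

# ERE for a leading "dddd-dd-dd dd:dd:dd" timestamp (assumed format).
timestamp='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'

# -v inverts the match: keep only lines that do NOT start with a timestamp.
without_timestamp=$(printf '%s\n' "$log" | grep -vE "$timestamp")

# Without -v the same pattern selects the timestamped lines; -c counts them.
timestamp_count=$(printf '%s\n' "$log" | grep -cE "$timestamp")

printf '%s\n' "$without_timestamp" "$timestamp_count"
```

Dropping -v gives the opposite selection, which covers both readings of the question.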