Ok this is driving me crazy. I have a text file with the following content:
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
etc.
I want to filter the lines where the second date is in December. I tried things like:
grep '.*(\d{4}-\d{2}-\d{2}).*(2020-12-).*' > output.txt
grep '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
grep -P '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
But nothing seems to work. Is there any way to accomplish this with either grep, egrep, sed or awk?
You need to use the -P option of grep to enable Perl-compatible regular expressions. Could you please try the following, written and tested with your shown samples.
grep -P '^("\d+",){4}"[a-zA-Z]+","\d{4}-\d{2}-\d{2}","2020-12-\d{2}"' Input_file
Explanation: the following is for explanation purposes only.
grep ##Start the grep command here.
-P ##Mention the -P option to enable PCRE regex in grep.
'^("\d+",){4} ##From the start of the line, match " digits " comma, four times.
"[a-zA-Z]+", ##Then match the quoted text field.
"\d{4}-\d{2}-\d{2}", ##Then skip over the first quoted date field.
"2020-12-\d{2}" ##Then match the second date only when it is in December 2020, which is what the OP needs.
' Input_file ##Mention the Input_file name here.
I suggest awk as an alternative, since the input data is structured in rows and columns with a common delimiter:
awk -F, '$7 ~ /-12-/' file
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
You can use either grep -P or plain egrep, since the pattern only needs ERE features:
$ cat test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
$
$ grep -P '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
$
$ egrep '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
Explanation:
^" - expect a " to start
([^"]*","){6} - scan over all chars other than ", followed by ","; repeat that 6 times
2020-12- - expect 2020-12-
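As a quick sanity check that the second date really is the 7th field (which the {6} repetition relies on), you could count the fields first; a minimal sketch, assuming no commas are embedded inside the quoted values:
$ awk -F',' '{print NF}' test.txt | sort -u
8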
The problem is in:
egrep '.*\d{4}-\d{2}-\d{2}.2020-12-.' > output.txt
^ HERE
The . just matches a single character, but you want to skip ",", so change to:
egrep '.*\d{4}-\d{2}-\d{2}.+2020-12-.' > output.txt
^^ HERE
The . becomes a .+.
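One more caveat: \d is a PCRE escape and is not reliably supported by plain egrep/ERE, so even the corrected pattern needs either grep -P or an explicit digit class. A sketch along the same lines using [0-9], assuming the input is in test.txt as above (only verified against the shown sample):
egrep '[0-9]{4}-[0-9]{2}-[0-9]{2}.+2020-12-' test.txt > output.txt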
$ acpi
Battery 0: Charging, 18%, 01:37:09 until charged
How can I grep the battery level value without the percentage character (i.e. just 18)?
This should do it but I'm getting an empty result:
acpi | grep -e '(?<=, )(.*)(?=%)'
Your regex is correct, but it needs the -P (Perl-compatible regex) option of GNU grep. You will also need -o to show only the matching text.
The correct command would be:
grep -oP '(?<=, )\d+(?=%)'
However, if you don't have gnu grep then you can also use sed like this:
sed -nE 's/.*, ([0-9]+)%.*/\1/p' file
18
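For example, fed straight from acpi (using the output shown in the question):
$ acpi | grep -oP '(?<=, )\d+(?=%)'
18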
Could you please try the following, written and tested at https://ideone.com/nzSGKs:
your_command | awk 'match($0,/Charging, [0-9]+%/){print substr($0,RSTART+10,RLENGTH-11)}'
Explanation: the following is for explanation purposes only.
your_command | ##Run the OP's command and pass its output to awk as standard input.
awk ' ##Start the awk program here.
match($0,/Charging, [0-9]+%/){ ##Use the match function to find the regex Charging, [0-9]+% in the line.
print substr($0,RSTART+10,RLENGTH-11) ##Print the matched substring, skipping the first 10 characters ("Charging, ") and dropping the trailing "%".
}'
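If your awk is GNU awk, match() also accepts a third array argument that captures the parenthesised group directly, which avoids the substr arithmetic; a minimal sketch, assuming gawk is available:
your_command | gawk 'match($0, /Charging, ([0-9]+)%/, arr){print arr[1]}'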
Using awk:
awk -F"," '{print $2+0}'
Using GNU sed:
sed -rn 's/.*, *([0-9]+)%,.*/\1/p'
You can use sed:
$ acpi | sed -nE 's/.*Charging, ([[:digit:]]*)%.*/\1/p'
18
Or, if Charging is not always in the string, you can key on the first comma instead:
$ acpi | sed -nE 's/[^,]*, ([[:digit:]]*)%.*/\1/p'
Using bash:
s='Battery 0: Charging, 18%, 01:37:09 until charged'
res="${s#*, }"
res="${res%%%*}"
echo "$res"
Result: 18.
res="${s#*, }" removes text from the beginning to the first comma+space and "${res%%%*}" removes all text from end till (and including) the last occurrence of %.
so I have a file with the following:
username=jsmith
api=3434kjklj23j4l3kj4l34j3l4j
I would like to return "jsmith" and "3434kjklj23j4l3kj4l34j3l4j" using a regular expression.
I know the regular expression for it is:
(username=)(.*) > \2
(api=)(.*) > \2
However, with grep, sed or awk I can't seem to figure out how to do it without returning the entire line.
How would you go about doing that on the command line?
awk is made for this task:
awk -F= '{print$2}' file
If the file has other entries, you can limit the output with a condition:
awk -F= '$1=="username"||$1=="api"{print$2}' file
Here is one using grep with PCRE and positive lookbehind (where supported):
$ grep -Po "((?<=^username=)|(?<=^api=)).*" file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. output everything that is preceded by username= or api= at the start of a line.
And one in awk:
$ awk 'sub(/^(username|api)=/,""){print}' file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. print the lines from which a leading username= or api= was removed first.
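Since sub() returns the number of substitutions made, the explicit {print} can even be dropped and awk's default action prints the modified line; a slightly shorter equivalent:
$ awk 'sub(/^(username|api)=/,"")' file
jsmith
3434kjklj23j4l3kj4l34j3l4j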
Since you want to see chess for an input like game=chess, here are some solutions that do not match on username= or api= at all:
cut -d"=" -f2- file
# or
sed -n 's/[^=]*=//p' file
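For example, with a key that neither command has to know about in advance:
$ echo "game=chess" | cut -d"=" -f2-
chess
$ echo "game=chess" | sed -n 's/[^=]*=//p'
chess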
Here's the answer that worked on macOS and RHEL 7.
awk -F= '$1=="username"{print$2}' testfile.txt
awk -F= '$1=="api"{print$2}' testfile.txt
testfile.txt
username=user1
api=pass1
username=user2
api =pass2
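Note that the last line of testfile.txt has a space before the = (api =pass2); if that is intentional rather than a typo, a whitespace-tolerant variant might look like this (a sketch, not specifically tested on macOS):
awk -F= '{gsub(/ /,"",$1)} $1=="api"{print $2}' testfile.txt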
Given a file jungle.txt with the following text ...
A lion sleeps in the jungle
A lion sleeps tonight
A tiger awakens in the swamp
The parrot observes
Wimoweh, wimoweh, wimoweh, wimoweh
... one could perform a grep search ...
$ grep lion jungle.txt
... or a sed search ...
$ sed -n "/lion/p" jungle.txt
... to find occurrences of a pattern ("lion" in this case).
Is there some easy way to get the number of returned lines? Or at least to know that more than one was found? As always, I've googled a lot first, but surprisingly found no answer.
Thanks!
grep can count matching lines:
grep -c 'lion' file
Output:
2
From the grep manual:
-c: Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option, count non-matching lines. (-c is specified by POSIX.)
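Note that -c counts matching lines, not individual matches; if a line could contain lion more than once and every occurrence should be counted, a common approach is:
grep -o 'lion' file | wc -l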
This might work for you (GNU sed):
sed '/lion/!d' file | sed '$=;d'
or if you prefer:
sed -n '/lion/p' file | sed -n '$='
N.B. if the file is empty or the first sed command finds nothing, the result of the second sed command is blank.
You can use awk:
awk '/lion/ {a++} END {print a+0}' file
2
But I would say that the best solution is the one posted by Cyros: grep -c 'lion' file
Just pass the grep output to the wc -l command to count the number of returned lines:
$ grep 'lion' file | wc -l
2
From wc --help
-l, --lines print the newline counts
Please refer to the file contents below.
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
How can I extract just the number 30427680 using awk or any other Unix command?
Using sed
sed -n 's/.*LN://p' < input.txt
This erases everything up to and including LN:, and prints what's left, but only if a substitution actually took place.
Using awk
awk -v FS=: '/LN:/ { print $3; }' < input.txt
This will match lines that contain LN:, use : as field separator, and print the 3rd column.
Using grep
grep -o '[0-9]\{3,\}' < input.txt
This will match sequences of 3 or more digits, and print only the matched pattern thanks to the -o.
Depending on other cases not included in your question, you might have to make the patterns more strict.
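For example, a stricter variant that only looks at the #SQ header line and only accepts digits after LN: (assuming the headers really look like the sample above):
sed -n '/^#SQ/s/.*LN:\([0-9]*\).*/\1/p' < input.txt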
Using grep with PCRE:
grep -oP 'LN:\K.*' filename
Here \K drops the already matched LN: from the reported match, so only the value after it is printed.
Just use grep:
grep -o 30427680 file
-o, --only-matching
Prints only the matching part of the lines.
Using perl:
perl -ne 'print $& if /LN:\K.*/' filename
or
perl -ne 'print $1 if /LN:(.*)/' filename
Another awk, using LN: itself as the field separator (NF>1 is only true for lines that contain LN:):
awk -F"LN:" 'NF>1 {print $2}' file