grep regex to pull out a string between two known strings - regex

I have a string of text in a file that I am parsing out, I almost got it but not sure what I am missing
basic expression I am using is
cat cred.txt | grep -m 1 -o '&CD=[^&]*'
I am getting a results of
&CD=u8AA-RaF-97gc_SdZ0J74gc_SdZ0J196gc_SdZ0J211
I do not want the &CD= part in the resulting string, how would I do that.
The string I am parsing from is:
webpage.asp?UserName=username&CD=u8AA-RaF-97gc_SdZ0J74gc_SdZ0J196gc_SdZ0J211&Country=USA

If your grep knows Perl regex:
grep -m 1 -oP '(?<=&CD=)[^&]*' cred.txt
If not:
sed '1s/.*&CD=\([^&]*\).*/\1/' cred.txt

Many ways to skin this cat.
Extend your pipe:
grep -o 'CD=[^&]*' cred.txt | cut -d= -f2
Or do a replacement in sed:
sed -r 's/.*[&?]CD=([^&]*).*/\1/' cred.txt
Or get really fancy and parse the actual QUERY_STRING in awk:
awk -F'?' '{ split($2, a, "&"); for(i in a){split(a[i], kv, "="); out[kv[1]]=kv[2];} print out["CD"];}'

Related

Grep first line which contain a date

I'm trying to fetch the first line in a log file which contain a date.
Here is an example of the log file :
SOME
LOG
2021-1-1 21:50:19.0|LOG|DESC1
2021-1-4 21:50:19.0|LOG|DESC2
2021-1-5 21:50:19.0|LOG|DESC3
2021-1-5 21:50:19.0|LOG|DESC4
In this context I need to get the following line:
2021-1-1 21:50:19.0|LOG|DESC1
An other log file example :
SOME
LOG
21-1-3 21:50:19.0|LOG|DESC1
21-1-3 21:50:19.0|LOG|DESC2
21-1-4 21:50:19.0|LOG|DESC3
21-1-5 21:50:19.0|LOG|DESC4
I need to fetch :
21-1-3 21:50:19.0|LOG|DESC1
At the moment I tried the following command :
cat /path/to/file | grep "$(date +"%Y-%m-%d")" | tail -1
cat /path/to/file | grep "$(date +"%-Y-%-m-%-d")" | tail -1
cat /path/to/file | grep -E "[0-9]+-[0-9]+-[0-9]" | tail -1
In case you are ok with awk, could you please try following. This will find the matched regex first line and exit from program, which will be faster since its NOT reading whole Input_file.
awk '
/^[0-9]{2}([0-9]{2})?-[0-9]{1,2}-[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+/{
print
exit
}' Input_file
Using sed, being not too concerned about exactly how many digits are present:
sed -En '/^[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+[.][0-9]+[|]/ {p; q}' file
$ grep -m1 '^[0-9]' file1
2021-1-1 21:50:19.0|LOG|DESC1
$ grep -m1 '^[0-9]' file2
21-1-3 21:50:19.0|LOG|DESC1
If that's not all you need then edit your question to provide more truly representative sample input/output.
A simple grep with -m 1 (to exit after finding first match):
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file1
2021-1-1 21:50:19.0|LOG|DESC1
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file2
21-1-3 21:50:19.0|LOG|DESC1
This sed works with either GNU or POSIX sed:
sed -nE '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{p;q;}' file
But awk, with the same BRE, is probably better:
awk '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{print; exit}' file

grep within nested brackets

How do I grep strings in between nested brackets using bash? Is it possible without the use of loops? For example, if I have a string like:
[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]
I wish to grep only the two target strings inside the [[]]:
TargetString1
TargetString2
I tried the following command which cannot get TargetString2
grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1
With GNU's grep P option:
grep -oP "(?<=\[\[)[\w\s]+"
The regex will match a sequence of word characters (\w+) when followed by two brackets ([[). This works for your sample string, but will not work for more complicated constructs like:
[[[[TargetString1]]TargetString2:SomethingIDontWantAfterColon[[TargetString3]]]]
where only TargetString1 and TargetString3 are matched.
To extract from nested [[]] brackets, you can use sed
#!/bin/bash
str="[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]"
echo $str | grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1
echo $str | sed 's/.*\[\([^]]*\)\].*/\1/g' #which works only if string exsit between []
Output:
TargetString1
TargetString2
You can use grep regex grep -Eo '\[\[\w+' | sed 's/\[\[//g' for doing this
[root#localhost ~]# echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -Eo '\[\[\w+' | sed 's/\[\[//g'
TargetString1
TargetString2
[root#localhost ~]#

How do I grep for all words that contain two consecutive e’s, and also contains two y’s

I want to find the set of words that contain two consecutive e’s, and also contains two y’s.
So far i got to /eeyy/
Alteration with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy
Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy
awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy
The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!

How to extract value from the string in bash?

I have an input string in the following format:
bugfix/ABC-12345-1-00
I want to extract "ABC-12345". Regex for that format in C# looks like this:
.\*\\/([A-Z]+-[0-9]+).\*
How can I do that in a bash script? I've tried sed and awk but had no success because I need to extract value from the capturing group and skip the rest.
If your grep supports -P then you could use the below grep commands.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '/\K[A-Z]+-\d+'
ABC-12345
\K keeps the text matched so far out of the overall regex match.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '(?<=/)[A-Z]+-\d+'
ABC-12345
(?<=/) Positive lookbehind which asserts that the match must be preceded by a / symbol.
Through sed,
$ echo 'bugfix/ABC-12345-1-00' | sed 's~.*/\([A-Z]\+-[0-9]\+\).*~\1~'
ABC-12345
echo "bugfix/ABC-12345-1-00"| perl -ane '/.*?([A-Z]+\-[0-9]+).*/;print $1."\n"'
You could try something like:
echo "bugfix/ABC-12345-1-00" | egrep -o '[A-Z]+-[0-9]+'
OUTPUT:
ABC-12345
If you do not like to use regex, you can use this awk:
echo "bugfix/ABC-12345-1-00" | awk -F\/ '{print $NF}'
ABC-12345-1-00
Or just this:
awk -F\/ '$0=$NF'

Match two strings in one line with grep

I am trying to use grep to match lines that contain two different strings. I have tried the following but this matches lines that contain either string1 or string2 which not what I want.
grep 'string1\|string2' filename
So how do I match with grep only the lines that contain both strings?
You can use
grep 'string1' filename | grep 'string2'
Or
grep 'string1.*string2\|string2.*string1' filename
I think this is what you were looking for:
grep -E "string1|string2" filename
I think that answers like this:
grep 'string1.*string2\|string2.*string1' filename
only match the case where both are present, not one or the other or both.
To search for files containing all the words in any order anywhere:
grep -ril \'action\' | xargs grep -il \'model\' | xargs grep -il \'view_type\'
The first grep kicks off a recursive search (r), ignoring case (i) and listing (printing out) the name of the files that are matching (l) for one term ('action' with the single quotes) occurring anywhere in the file.
The subsequent greps search for the other terms, retaining case insensitivity and listing out the matching files.
The final list of files that you will get will the ones that contain these terms, in any order anywhere in the file.
If you have a grep with a -P option for a limited perl regex, you can use
grep -P '(?=.*string1)(?=.*string2)'
which has the advantage of working with overlapping strings. It's somewhat more straightforward using perl as grep, because you can specify the and logic more directly:
perl -ne 'print if /string1/ && /string2/'
Your method was almost good, only missing the -w
grep -w 'string1\|string2' filename
You could try something like this:
(pattern1.*pattern2|pattern2.*pattern1)
The | operator in a regular expression means or. That is to say either string1 or string2 will match. You could do:
grep 'string1' filename | grep 'string2'
which will pipe the results from the first command into the second grep. That should give you only lines that match both.
And as people suggested perl and python, and convoluted shell scripts, here a simple awk approach:
awk '/string1/ && /string2/' filename
Having looked at the comments to the accepted answer: no, this doesn't do multi-line; but then that's also not what the author of the question asked for.
Don't try to use grep for this, use awk instead. To match 2 regexps R1 and R2 in grep you'd think it would be:
grep 'R1.*R2|R2.*R1'
while in awk it'd be:
awk '/R1/ && /R2/'
but what if R2 overlaps with or is a subset of R1? That grep command simply would not work while the awk command would. Lets say you want to find lines that contain the and heat:
$ echo 'theatre' | grep 'the.*heat|heat.*the'
$ echo 'theatre' | awk '/the/ && /heat/'
theatre
You'd have to use 2 greps and a pipe for that:
$ echo 'theatre' | grep 'the' | grep 'heat'
theatre
and of course if you had actually required them to be separate you can always write in awk the same regexp as you used in grep and there are alternative awk solutions that don't involve repeating the regexps in every possible sequence.
Putting that aside, what if you wanted to extend your solution to match 3 regexps R1, R2, and R3. In grep that'd be one of these poor choices:
grep 'R1.*R2.*R3|R1.*R3.*R2|R2.*R1.*R3|R2.*R3.*R1|R3.*R1.*R2|R3.*R2.*R1' file
grep R1 file | grep R2 | grep R3
while in awk it'd be the concise, obvious, simple, efficient:
awk '/R1/ && /R2/ && /R3/'
Now, what if you actually wanted to match literal strings S1 and S2 instead of regexps R1 and R2? You simply can't do that in one call to grep, you have to either write code to escape all RE metachars before calling grep:
S1=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R1')
S2=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R2')
grep 'S1.*S2|S2.*S1'
or again use 2 greps and a pipe:
grep -F 'S1' file | grep -F 'S2'
which again are poor choices whereas with awk you simply use a string operator instead of regexp operator:
awk 'index($0,S1) && index($0.S2)'
Now, what if you wanted to match 2 regexps in a paragraph rather than a line? Can't be done in grep, trivial in awk:
awk -v RS='' '/R1/ && /R2/'
How about across a whole file? Again can't be done in grep and trivial in awk (this time I'm using GNU awk for multi-char RS for conciseness but it's not much more code in any awk or you can pick a control-char you know won't be in the input for the RS to do the same):
awk -v RS='^$' '/R1/ && /R2/'
So - if you want to find multiple regexps or strings in a line or paragraph or file then don't use grep, use awk.
git grep
Here is the syntax using git grep with multiple patterns:
git grep --all-match --no-index -l -e string1 -e string2 -e string3 file
You may also combine patterns with Boolean expressions such as --and, --or and --not.
Check man git-grep for help.
--all-match When giving multiple pattern expressions, this flag is specified to limit the match to files that have lines to match all of them.
--no-index Search files in the current directory that is not managed by Git.
-l/--files-with-matches/--name-only Show only the names of files.
-e The next parameter is the pattern. Default is to use basic regexp.
Other params to consider:
--threads Number of grep worker threads to use.
-q/--quiet/--silent Do not output matched lines; exit with status 0 when there is a match.
To change the pattern type, you may also use -G/--basic-regexp (default), -F/--fixed-strings, -E/--extended-regexp, -P/--perl-regexp, -f file, and other.
Related:
How to grep for two words existing on the same line?
Check if all of multiple strings or regexes exist in a file
How to run grep with multiple AND patterns? & Match all patterns from file at once
For OR operation, see:
How do I grep for multiple patterns with pattern having a pipe character?
Grep: how to add an “OR” condition?
Found lines that only starts with 6 spaces and finished with:
cat my_file.txt | grep
-e '^ .*(\.c$|\.cpp$|\.h$|\.log$|\.out$)' # .c or .cpp or .h or .log or .out
-e '^ .*[0-9]\{5,9\}$' # numers between 5 and 9 digist
> nolog.txt
Let's say we need to find count of multiple words in a file testfile.
There are two ways to go about it
1) Use grep command with regex matching pattern
grep -c '\<\(DOG\|CAT\)\>' testfile
2) Use egrep command
egrep -c 'DOG|CAT' testfile
With egrep you need not to worry about expression and just separate words by a pipe separator.
grep ‘string1\|string2’ FILENAME
GNU grep version 3.1
Place the strings you want to grep for into a file
echo who > find.txt
echo Roger >> find.txt
echo [44][0-9]{9,} >> find.txt
Then search using -f
grep -f find.txt BIG_FILE_TO_SEARCH.txt
grep '(string1.*string2 | string2.*string1)' filename
will get line with string1 and string2 in any order
for multiline match:
echo -e "test1\ntest2\ntest3" |tr -d '\n' |grep "test1.*test3"
or
echo -e "test1\ntest5\ntest3" >tst.txt
cat tst.txt |tr -d '\n' |grep "test1.*test3\|test3.*test1"
we just need to remove the newline character and it works!
You should have grep like this:
$ grep 'string1' file | grep 'string2'
I often run into the same problem as yours, and I just wrote a piece of script:
function m() { # m means 'multi pattern grep'
function _usage() {
echo "usage: COMMAND [-inH] -p<pattern1> -p<pattern2> <filename>"
echo "-i : ignore case"
echo "-n : show line number"
echo "-H : show filename"
echo "-h : show header"
echo "-p : specify pattern"
}
declare -a patterns
# it is important to declare OPTIND as local
local ignorecase_flag filename linum header_flag colon result OPTIND
while getopts "iHhnp:" opt; do
case $opt in
i)
ignorecase_flag=true ;;
H)
filename="FILENAME," ;;
n)
linum="NR," ;;
p)
patterns+=( "$OPTARG" ) ;;
h)
header_flag=true ;;
\?)
_usage
return ;;
esac
done
if [[ -n $filename || -n $linum ]]; then
colon="\":\","
fi
shift $(( $OPTIND - 1 ))
if [[ $ignorecase_flag == true ]]; then
for s in "${patterns[#]}"; do
result+=" && s~/${s,,}/"
done
result=${result# && }
result="{s=tolower(\$0)} $result"
else
for s in "${patterns[#]}"; do
result="$result && /$s/"
done
result=${result# && }
fi
result+=" { print "$filename$linum$colon"\$0 }"
if [[ ! -t 0 ]]; then # pipe case
cat - | awk "${result}"
else
for f in "$#"; do
[[ $header_flag == true ]] && echo "########## $f ##########"
awk "${result}" $f
done
fi
}
Usage:
echo "a b c" | m -p A
echo "a b c" | m -i -p A # a b c
You can put it in .bashrc if you like.
grep -i -w 'string1\|string2' filename
This works for exact word match and matching case insensitive words ,for that -i is used
When the both strings are in sequence then put a pattern in between on grep command:
$ grep -E "string1(?.*)string2" file
Example if the following lines are contained in a file named Dockerfile:
FROM python:3.8 as build-python
FROM python:3.8-slim
To get the line that contains the strings: FROM python and as build-python then use:
$ grep -E "FROM python:(?.*) as build-python" Dockerfile
Then the output will show only the line that contain both strings:
FROM python:3.8 as build-python
If git is initialized and added to the branch then it is better to use git grep because it is super fast and it will search inside the whole directory.
git grep 'string1.*string2.*string3'
searching for two String and highlight only string1 and string2
grep -E 'string1.*string2|string2.*string1' filename | grep -E 'string1|string2'
or
grep 'string1.*string2\|string2.*string1' filename | grep -E 'string1\|string2'
ripgrep
Here is the example using rg:
rg -N '(?P<p1>.*string1.*)(?P<p2>.*string2.*)' file.txt
It's one of the quickest grepping tools, since it's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.
Use it, especially when you're working with a large data.
See also related feature request at GH-875.