How to extract a number from a string using grep and regex


I cat a file and pipe it into grep with a regular expression like this:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55"
The command displays the following output:
toto.titi[12].tata=55
Is it possible to modify my grep command so that it outputs only the number 12?

You can grab this in pure BASH using its regex capabilities:
s='toto.titi[12].tata=55'
[[ "$s" =~ ^toto\.titi\[([0-9]+)\]\.tata=[0-9]+$ ]] && echo "${BASH_REMATCH[1]}"
12
You can also use sed:
sed 's/toto.titi\[\([0-9]*\)\].tata=55/\1/' <<< "$s"
12
Or using awk:
awk -F '[\\[\\]]' '{print $2}' <<<"$s"
12
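To process the file from the question rather than a single string, sed can filter and extract in one pass: `-n` suppresses sed's default output, and the `p` flag prints only lines where the substitution succeeded. This is a minimal sketch assuming every relevant line has exactly the `toto.titi[N].tata=55` shape; the function name `extract_index` is hypothetical.

```shell
# Hypothetical helper: print the bracketed index from every stdin line of
# the form toto.titi[N].tata=55; all other lines are ignored.
extract_index() {
    # -n: print nothing by default; the trailing p prints only lines where
    # the substitution matched. Dots and [ are escaped to match literally;
    # a lone ] is an ordinary character in a basic regex.
    sed -n 's/^toto\.titi\[\([0-9][0-9]*\)]\.tata=55$/\1/p'
}

# With the file from the question: extract_index < /tmp/tmp_file
printf '%s\n' 'toto.titi[12].tata=55' 'toto.titi[7].tata=56' 'other line' | extract_index
# prints 12
```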

Use a lookbehind (requires grep -P, a GNU extension):
echo 'toto.titi[12].tata=55' | grep -oP '(?<=\[)\d+'
12
Without Perl regex, use sed to strip the "[":
echo 'toto.titi[12].tata=55' | grep -o '\[[0-9]\+' | sed 's/\[//g'
12

Pipe it to sed and use a back reference:
grep "toto.titi\[[0-9]\+\].tata=55" /tmp/tmp_file | sed 's/.*\[\([0-9]*\)\].*/\1/'
12
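In sed's default basic regular expression (BRE) syntax, plain `( )` are literal characters and `\d` is not recognized, so a capture group must be written `\( \)` and digits as `[0-9]`. With `sed -E` (extended syntax, supported by both GNU and BSD sed) the unescaped form works. A small sketch comparing the two spellings on the sample string:

```shell
s='toto.titi[12].tata=55'

# BRE (sed default): the group is written \( \); \d is not recognized,
# so digits are matched with [0-9]. A lone ] needs no escaping.
bre=$(printf '%s\n' "$s" | sed 's/.*\[\([0-9]*\)].*/\1/')

# ERE (sed -E): plain ( ) form the group and + means "one or more".
ere=$(printf '%s\n' "$s" | sed -E 's/.*\[([0-9]+)].*/\1/')

printf '%s %s\n' "$bre" "$ere"   # prints: 12 12
```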

Related

Using grep regex to select up to the first hyphen

echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | grep -oE "([^\/]+$)"
This prints just the filename, without the directory structure, but I cannot manage to print just mainbinary from that string. Suggestions?

And a sed alternative to PS.'s great grep -oP
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |sed -r 's#^.*/([^-]+).*#\1#'
mainbinary

echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |grep -oP '.*/\K[^-]+'
mainbinary
This matches everything up to the last /, discards it from the output with \K, and then keeps matching up to (but not including) the first -.

With any awk in any shell on any UNIX machine:
$ echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | awk -F'[/-]' '{print $3}'
mainbinary
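Because the input is a single string, the shell's own parameter expansion can do the same without starting grep, sed or awk: `${path##*/}` strips the longest prefix ending in `/`, and `${file%%-*}` strips the longest suffix starting at `-`. A sketch, assuming the path contains at least one `/` and the file name at least one `-` (if either is missing, that expansion leaves the string unchanged):

```shell
path='Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb'

file=${path##*/}    # mainbinary-0.1.20190424165331-0-armdef.deb
name=${file%%-*}    # mainbinary

printf '%s\n' "$name"
```

Unlike the awk answer, which takes the third field, this does not depend on how many directory levels precede the file name.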

How do I grep for all words that contain two consecutive e’s and also contain two y’s?

I want to find the set of words that contain two consecutive e’s and also contain two y’s.
So far I have come up with /eeyy/

Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
To print only the matching word, add -o:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy

Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy

An awk approach that prints every word containing both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo "$s" | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy

The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a fourth string to search for and see what that looks like!
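The awk approach also generalizes to any number of substrings without spelling out every ordering. A minimal sketch, assuming the substrings are given as one space-separated argument and contain no regex metacharacters (each one is used as a dynamic regex); the function name `words_with_all` is hypothetical. It tests each whitespace-separated word, matching the per-word requirement in the question:

```shell
# Hypothetical helper: print every word (whitespace-separated field) that
# contains all substrings listed in $1, in any order.
words_with_all() {
    awk -v pats="$1" '
    BEGIN { n = split(pats, p, " ") }   # p[1..n] holds the substrings
    {
        for (i = 1; i <= NF; i++) {
            ok = 1
            for (j = 1; j <= n; j++)
                if ($i !~ p[j]) { ok = 0; break }
            if (ok) print $i
        }
    }'
}

echo 'eennmmyy heello thees-whyy someyy' | words_with_all 'ee yy'
# prints eennmmyy and thees-whyy, one per line
```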

bash - Extract part of string

I have a string like this:
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it:
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this:
/AppointmentManagementService.xsd\d{1,2}.xsd/g

Your string is:
$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd

sed doesn't recognize \d; use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'

You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to require at least 1 and at most 2 digits in the .xsd... component, you can tighten the regex:
$ if [[ $in =~ \"(AppointmentManagementService.xsd[0-9]{1,2}.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi

Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This prints only the text between schemaLocation=" and the next ": \K drops the already-matched prefix from the output, and (?=") checks for the closing quote without including it.
Reference:
https://unix.stackexchange.com/a/13472/109046

You can also use the cut command for this:
$ echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd

s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo "$s" | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'
AppointmentManagementService.xsd6.xsd
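When the line is already in a variable, parameter expansion alone can take the value out without a pipeline. A sketch, assuming the schemaLocation attribute appears once and its value contains no double quote:

```shell
s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='

tmp=${s#*schemaLocation=\"}   # drop everything up to and including schemaLocation="
value=${tmp%%\"*}             # drop everything from the next " onward

printf '%s\n' "$value"   # AppointmentManagementService.xsd6.xsd
```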

How to display part of matched pattern in grep?

I want to extract 12 from text like "abc_12_1". I tried this:
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I cannot select just the number after the first _: the command above outputs the whole string. I am looking for a grep equivalent of this Perl pattern match:
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?

Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12

You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.

One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex:
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits
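grep -P is a GNU extension and is missing from some grep implementations. Where it is unavailable, POSIX sed can emulate the Perl capture directly: the whole line is replaced by group 1, and `-n` with `p` prints only lines that match. A sketch assuming the `letters_digits_1` shape from the question; unlike the Perl pattern it is anchored to the whole line, and the function name `middle_number` is hypothetical.

```shell
# Hypothetical helper: POSIX sed counterpart of
#   perl -e '... =~ m/[a-zA-Z]+_([0-9]+)_1/; print $1'
# \{1,\} is the basic-regex spelling of "one or more".
middle_number() {
    sed -n 's/^[a-zA-Z]\{1,\}_\([0-9]\{1,\}\)_1$/\1/p'
}

echo 'abc_12_1' | middle_number   # prints 12
```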

Cannot extract the capture group with either sed or grep

I want to extract the value from a key-value pair, but I cannot.
Example I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this prints employee_id=1234, not 1234, which is the capture group.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.

1. Using grep -Eo (egrep is deprecated):
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. Using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234
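Option 1 is the simplest, but it returns every run of digits on the line rather than the employee_id value specifically. A quick comparison on a hypothetical line that carries a second number:

```shell
# Hypothetical input: a line with a second numeric field.
line='employee_id=1234 dept=7'

# Option 1 prints every run of digits, one per line: 1234, then 7.
printf '%s\n' "$line" | grep -Eo '[0-9]+'

# Option 3 anchors on the key, so it prints only 1234.
printf '%s\n' "$line" | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
```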

To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo "$some_string" | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"

Using awk:
echo 'employee_id=1234' | awk -F= '{print $2}'
1234

Use sed -E for extended regex, where + and unescaped ( ) work without backslashes:
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
1234

You are specifically asking about sed, but in case you can use something else: any POSIX-compliant shell can do parameter expansion, which doesn't require a fork or subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
 
$ echo "var=${var} value=${value}"
var=employee_id value=1234
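For a whole file of key=value lines, the same split can be done with `read` by setting `IFS` to `=` for the read command only. POSIX assigns the leftover fields, with their separators, to the last variable, so a value that itself contains `=` stays whole. A minimal sketch; the function name `print_pairs` and the file name `config.txt` are hypothetical.

```shell
# Hypothetical helper: read key=value lines from stdin and print them as
# "key -> value". IFS is changed only for the read command itself.
print_pairs() {
    while IFS='=' read -r key value; do
        printf '%s -> %s\n' "$key" "$value"
    done
}

# With a file: print_pairs < config.txt
printf '%s\n' 'employee_id=1234' 'name=a=b' | print_pairs
# prints:
# employee_id -> 1234
# name -> a=b
```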

We Keep Coding
