I have an input string in the following format:
bugfix/ABC-12345-1-00
I want to extract "ABC-12345". Regex for that format in C# looks like this:
.\*\\/([A-Z]+-[0-9]+).\*
How can I do that in a bash script? I've tried sed and awk but had no success because I need to extract value from the capturing group and skip the rest.
If your grep supports -P then you could use the below grep commands.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '/\K[A-Z]+-\d+'
ABC-12345
\K keeps the text matched so far out of the overall regex match.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '(?<=/)[A-Z]+-\d+'
ABC-12345
(?<=/) Positive lookbehind which asserts that the match must be preceded by a / symbol.
Through sed,
$ echo 'bugfix/ABC-12345-1-00' | sed 's~.*/\([A-Z]\+-[0-9]\+\).*~\1~'
ABC-12345
echo "bugfix/ABC-12345-1-00"| perl -ane '/.*?([A-Z]+\-[0-9]+).*/;print $1."\n"'
You could try something like:
echo "bugfix/ABC-12345-1-00" | egrep -o '[A-Z]+-[0-9]+'
OUTPUT:
ABC-12345
If you do not like to use regex, you can use this awk:
echo "bugfix/ABC-12345-1-00" | awk -F\/ '{print $NF}'
ABC-12345-1-00
Or just this:
awk -F\/ '$0=$NF'
Related
How to use sed and regex to replace the text between a variable number of one token?
Example of input:
/abc/bcd/cde/
Expected output:
/../../../
Tried:
Command: echo "/abc/bcd/cde/" | sed 's/\/.*\//\/..\//g' output: /../
Using perl and look around assertions :
$ perl -pe 's|(?<=/)\w{3}(?=/)|..|g' file
/../../../
Using sed :
$ echo "/abc/bcd/cde/" | sed -E 's|[a-z]{3}|..|g'
/../../../
Replace every substring of non-slashes ([^/]\+) with two dots:
$> echo "/abc/bcd/cde/" | sed 's$[^/]\+$..$g'
# => /../../../
Base on #Gilles Quenot implementation but, capturing any alpha numeric chars between //
$ echo "/abddc/bcqsdd/cdde/" | sed -E 's|(/)?[^/]+/|\1../|g'
I want to find the set of words that contain two consecutive e’s, and also contains two y’s.
So far i got to /eeyy/
Alteration with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy
Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy
awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy
The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
I have an output like Johny-Smith, Juarez-Hugo, etc. and I need instead S, H, etc. Basically, I need the last uppercase letter in a string and that's it. If this is possible in any built in Linux tools (ex awk, sed, grep, etc.) it would be greatly appreciated.
Do you need like this ?
echo "Johny-Smith" | sed 's/^.*\([A-Z]\)[^A-Z]*$/\1/g'
Test:
$ echo "Johny-Smith-Hello Johny-Smith" | sed 's/.*\([A-Z]\)[^A-Z]*/\1/g'
S
With GNU grep and if PCRE option is available
$ echo 'Johny-Smith' | grep -oP '.*\K[A-Z]'
S
$ echo 'Juarez-Hugo' | grep -oP '.*\K[A-Z]'
H
-o prints only matched portion
-P Perl regular expression
.*\K positive lookbehind, not part of output
[A-Z] any uppercase character
with perl, see perldoc for command line options explanation
$ # prints the string within captured group
$ echo 'Johny-Smith' | perl -lne 'print /.*([A-Z])/'
S
$ echo 'Juarez-Hugo' | perl -lne 'print /.*([A-Z])/'
H
In Bash:
$ var="Johny-Smith-Hello Johny-Smith"; var="${var//[^[:upper:]]/}";echo "${var: -1}"
S
${var//[^[:upper:]]/} remove all non-upper case letter chars
echo ${var: -1} output the last one
I wanted to extract 12 from a text like "abc_12_1". I am trying like this
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I am not able to select the digit after first _ in string, the output of above command is whole string. I am looking for some alternative in grep which I have in following Perl pattern matching.
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?
Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12
You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.
One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits
I want to extract the value pair from a key-value pair syntax but I can not.
Example I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this gives employee_id=1234 and not 1234 which is actually the capture group.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.
1. Use grep -Eo: (as egrep is deprecated)
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234
To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo $some_string | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"
Using awk
echo 'employee_id=1234' | awk -F= '{print $2}'
1234
use sed -E for extended regex
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
You are specifically asking for sed, but in case you may use something else - any POSIX-compliant shell can do parameter expansion which doesn't require a fork/subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
$ echo "var=${var} value=${value}"
var=employee_id value=1234