Regex to grab group of numbers - regex

I have been working for a few hours on how to write a regex for a bash script that will only grab a group of more than 2 numbers. For example, if I had #jk2478_0.JPEG, I would only want to return 2478. I can return all of the numbers, but can't figure how to not include the 0 in the result for this example. Here is what I have so far.
i='#jk2478_0.JPEG';
f=`echo $i | sed s/[^0-9]*[^0-9]//g`
echo $f #24780

$ echo '#jk2478_0.JPEG,' | grep -E -o '[0-9]{2,}'
2478
-o prints only the matched part of each line, and {2,} requires a run of at least two digits.

Another way, using sed:
echo '#jk2478_0.JPEG,' | sed -re 's/(.*)([a-zA-Z]+)([0-9]+)(.*)/\3/'

Perhaps this?
f=$(echo "$i" | sed 's/[^0-9]*\([0-9]\{2,\}\).*/\1/')
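When a name contains several qualifying runs, grep -o prints each one on its own line, so a script usually has to pick one. A minimal sketch, assuming the first run is the one wanted; first_digit_run is a hypothetical helper name and the second sample filename is made up:

```shell
# first_digit_run: hypothetical helper, prints the first run of 2+ digits.
first_digit_run() {
    # grep -o emits every match on its own line; head keeps only the first.
    printf '%s\n' "$1" | grep -Eo '[0-9]{2,}' | head -n 1
}

first_digit_run '#jk2478_0.JPEG'        # prints 2478
first_digit_run 'img2478_0_123.JPEG'    # prints 2478 (123 is dropped)
```

If the last run is wanted instead, tail -n 1 in place of head -n 1 should do it.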

Related

How do I grep for all words that contain two consecutive e’s, and also contains two y’s

I want to find the set of words that contain two consecutive e’s, and also contains two y’s.
So far I got to /eeyy/
Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy
Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy
awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy
The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
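The question's wording, "contains two y's", can also be read as two y's anywhere in the word rather than the adjacent yy that the answers above assume. A sketch of that reading with the same awk idiom, assuming one word per line; ee_and_two_y is a hypothetical name and the sample words are made up:

```shell
# ee_and_two_y: hypothetical filter; reads one word per line on stdin.
ee_and_two_y() {
    # /ee/   -> two consecutive e's
    # /y.*y/ -> at least two y's, adjacent or not
    awk '/ee/ && /y.*y/'
}

printf '%s\n' eennmmyy heello yeey someyy keey | ee_and_two_y
# prints eennmmyy and yeey
```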

bash - Extract part of string

I have a string something like this
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it :
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this :
/AppointmentManagementService.xsd\d{1,2}.xsd/g
Your string is:
nampt@nampt-desktop:$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd
sed doesn't recognize \d, use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'
You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to require at least 1 and at most 2 digits in the .xsd... component, you can tighten the regex (the dots are escaped so they match only a literal dot):
$ if [[ $in =~ \"(AppointmentManagementService\.xsd[0-9]{1,2}\.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This prints the text between schemaLocation=" and the very next occurrence of ".
Reference:
https://unix.stackexchange.com/a/13472/109046
We can also use the cut command for this:
[root@code]# echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd
s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo $s | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'
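For a single string already held in a variable, parameter expansion is another option that the answers above do not show. A minimal sketch, assuming the attribute is always written as schemaLocation="..." with double quotes; extract_schema_location is a hypothetical name:

```shell
# extract_schema_location: hypothetical helper using only parameter expansion.
extract_schema_location() {
    local s rest
    s=$1
    # Drop everything up to and including the opening schemaLocation=" ...
    rest=${s#*schemaLocation=\"}
    # ... then drop everything from the next double quote onward.
    printf '%s\n' "${rest%%\"*}"
}

s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
extract_schema_location "$s"    # prints AppointmentManagementService.xsd6.xsd
```

No external process is started, which may matter when this runs inside a loop over many lines.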

How to extract value from the string in bash?

I have an input string in the following format:
bugfix/ABC-12345-1-00
I want to extract "ABC-12345". Regex for that format in C# looks like this:
.*\/([A-Z]+-[0-9]+).*
How can I do that in a bash script? I've tried sed and awk but had no success because I need to extract value from the capturing group and skip the rest.
If your grep supports -P then you could use the below grep commands.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '/\K[A-Z]+-\d+'
ABC-12345
\K keeps the text matched so far out of the overall regex match.
$ echo 'bugfix/ABC-12345-1-00' | grep -oP '(?<=/)[A-Z]+-\d+'
ABC-12345
(?<=/) Positive lookbehind which asserts that the match must be preceded by a / symbol.
Through sed,
$ echo 'bugfix/ABC-12345-1-00' | sed 's~.*/\([A-Z]\+-[0-9]\+\).*~\1~'
ABC-12345
echo "bugfix/ABC-12345-1-00"| perl -ane '/.*?([A-Z]+\-[0-9]+).*/;print $1."\n"'
You could try something like:
echo "bugfix/ABC-12345-1-00" | egrep -o '[A-Z]+-[0-9]+'
OUTPUT:
ABC-12345
If you would rather not use a regex, this awk prints everything after the last / (note that the output still includes the trailing -1-00):
echo "bugfix/ABC-12345-1-00" | awk -F\/ '{print $NF}'
ABC-12345-1-00
Or just this:
awk -F\/ '$0=$NF'
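One case the sed answer above does not cover is input that fails to match: a plain s command prints such lines unchanged. A sketch that prints only the captured group, assuming a sed that supports -E (GNU and BSD sed both do); extract_ticket is a hypothetical name:

```shell
# extract_ticket: hypothetical helper, prints only the capture group.
extract_ticket() {
    # -n suppresses the default output; the trailing p prints only lines
    # where the substitution succeeded, so non-matching input yields nothing.
    printf '%s\n' "$1" | sed -nE 's~^.*/([A-Z]+-[0-9]+).*$~\1~p'
}

extract_ticket 'bugfix/ABC-12345-1-00'   # prints ABC-12345
extract_ticket 'main'                    # prints nothing
```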

Bash shave a first and/or last character from string, but only if it is a certain character

In bash I need to shave a first and/or last character from string, but only if it is a certain character.
If I have | I need
/foo/bar/hah/ => foo/bar/hah
foo/bar/hah => foo/bar/hah
You can downvote me for not listing everything I've tried. But the fact is I've tried at least 35 different sed strings and bash character tricks, many of which were from Stack Overflow. I simply cannot get this to happen.
What's the problem with the simple one?
sed "s/^\///;s/\/$//"
Output is
foo/bar/hah
foo/bar/hah
In pure bash :
$ var=/foo/bar/hah/
$ var=${var%/}
$ echo ${var#/}
foo/bar/hah
$
Check bash parameter expansion
or with sed :
$ sed -r 's#(^/|/$)##g' file
How about simply this:
echo "$x" | sed -e 's:^/::' -e 's:/$::'
Further to @sputnick's answer and from this answer, here's a function that would do it:
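STR="/foo/bar/etc/"
STRB="foo/bar/etc"
trimslashes() {
    local s="$1"   # work on a local copy so the caller's STR is not overwritten
    s=${s#/}       # strip one leading slash, if present
    s=${s%/}       # strip one trailing slash, if present
    echo "$s"
}
trimslashes "$STR"
trimslashes "$STRB"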
# foo/bar/etc
# foo/bar/etc
echo '/foo/bar/hah/' | sed 's#^/##' | sed 's#/$##'
Assuming the / character is the only one you're trying to remove, sed -E 's_^[/](.*)_\1_' should do the job for the leading character (it does not touch a trailing /):
$ echo "$var1"; echo "$var2"
/foo/bar/hah
foo/bar/hah
$ echo "$var1" | sed -E 's_^[/](.*)_\1_'
foo/bar/hah
$ echo "$var2" | sed -E 's_^[/](.*)_\1_'
foo/bar/hah
If you also need to remove other characters at the start of the line, add them to the [/] class. For example, if you need to remove / or -, it would be sed -E 's_^[/-](.*)_\1_'
Here is an awk version:
echo "/foo/bar/hah/" | awk '{gsub(/^\/|\/$/,"")}1'
foo/bar/hah
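The answers above strip at most one slash from each end. If the input may carry repeated slashes, which the question does not mention and is assumed here only for illustration, the same parameter expansions can be applied in a loop; trim_all_slashes is a hypothetical name:

```shell
# trim_all_slashes: hypothetical helper, removes every leading and trailing /.
trim_all_slashes() {
    local s
    s=$1
    # Each expansion removes at most one slash; loop until nothing changes.
    while [ "${s#/}" != "$s" ]; do s=${s#/}; done
    while [ "${s%/}" != "$s" ]; do s=${s%/}; done
    printf '%s\n' "$s"
}

trim_all_slashes '//foo/bar/hah///'   # prints foo/bar/hah
trim_all_slashes 'foo/bar/hah'        # prints foo/bar/hah
```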

Can not extract the capture group with either sed or grep

I want to extract the value from a key=value pair, but I cannot.
Example I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this gives employee_id=1234 and not 1234 which is actually the capture group.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.
1. Use grep -Eo: (as egrep is deprecated)
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234
To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo $some_string | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"
Using awk
echo 'employee_id=1234' | awk -F= '{print $2}'
1234
Use sed -E for extended regex:
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
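The command in the question fails because sed defaults to basic regular expressions, where an unescaped + is a literal plus sign, so [0-9]+ never matches 1234 and the line passes through unchanged. A short sketch of two portable spellings; the function names are hypothetical:

```shell
# BRE (sed's default): groups are \( \), "one or more" is the interval \{1,\}.
value_bre() {
    printf '%s\n' "$1" | sed 's/employee_id=\([0-9]\{1,\}\)/\1/'
}

# ERE (sed -E): groups are ( ), and + means "one or more".
value_ere() {
    printf '%s\n' "$1" | sed -E 's/employee_id=([0-9]+)/\1/'
}

value_bre 'employee_id=1234'   # prints 1234
value_ere 'employee_id=1234'   # prints 1234
```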
You asked specifically about sed, but if other tools are acceptable, any POSIX-compliant shell can do this with parameter expansion, which doesn't require a fork/subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
 
$ echo "var=${var} value=${value}"
var=employee_id value=1234
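The choice of %% for the key and a single # for the value matters when the value itself contains an =. A small sketch with a made-up pair:

```shell
pair='query=a=b'   # made-up input whose value contains an =

key=${pair%%=*}     # remove the longest suffix starting at an =  -> query
value=${pair#*=}    # remove the shortest prefix ending at an =   -> a=b

# The other two operators split at the last = instead:
wrong_key=${pair%=*}      # query=a
wrong_value=${pair##*=}   # b

printf 'key=%s value=%s\n' "$key" "$value"   # prints key=query value=a=b
```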