Regex to get number after last underscore - regex

I am having trouble coming up with a regex that will get me Y from the following string: X_X_X_Y. Y is an integer, but I can validate that afterwards.

You could use shell parameter expansion:
$ s="X_X_X_Y"
$ echo "${s##*_}"
Y
Using sed:
$ sed 's/.*_//' <<< "$s"
Y
Using grep:
$ grep -oP '.*_\K.*' <<< "$s"
Y

This regex will work as long as the part you're matching is an integer:
[^_]+_[^_]+_[^_]+_(\d+)

As an alternative, if you are always tokenizing on the _ character, you can skip regex and use awk:
echo 'X_X_X_Y' | awk -F_ '{print $NF}'

Using BASH regex:
s='X_X_X_10'
[[ "$s" =~ [^_]+$ ]] && echo "${BASH_REMATCH[0]}"
10

This will print an integer at the end of the string after an underscore.
perl -e '"0_0_0_1" =~ /_([0-9]+)$/; print $1,"\n" if defined $1'
1

This might work for you:
sed 's/.*_\([0-9][0-9]*\)/\1/' file
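Since the question mentions validating the integer afterwards, here is a minimal sketch that combines the parameter-expansion approach with that check. The function name last_int is made up for illustration, and it assumes "integer" means a non-empty run of ASCII digits with no sign.

```shell
# Hypothetical helper: print the text after the last underscore,
# but only if that text is a non-empty run of digits.
last_int() {
    case $1 in
        *_*) ;;              # require at least one underscore
        *) return 1 ;;
    esac
    tail=${1##*_}            # strip the longest prefix ending in "_"
    case $tail in
        ''|*[!0-9]*) return 1 ;;        # empty, or contains a non-digit
        *) printf '%s\n' "$tail" ;;
    esac
}

last_int "X_X_X_10"                           # prints 10
last_int "X_X_X_Y" || echo "not an integer"   # prints "not an integer"
```

The return status lets a caller tell "no integer found" apart from a successful match without parsing the output.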

Related

How to parse every match of sed command

I have a string [u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3'], I would like to parse every element matched by my sed command. The element matched are in the single quote. Here is my script
#!/bin/bash
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for id in $(sed -n "s/^.*'\(.*\)'.*$/\1/ p" <<< ${ARR});
do
echo "$id"
done
I have only the first value returned.
The wildcard .* will match the longest leftmost possible string. If your intention is to match the individual substrings which are in single quotes, try
grep -o "'[^']*'" <<<"$ARR"
To remove the single quotes around the values, simply pipe to sed "s/'//g" and to loop over the lines printed by a pipe, do
... commands ... |
while read -r id; do
: things with "$id"
done
BASH can match regular expressions with the help of =~ (see man bash). Matching more than once is a bit painful but in your case we can split the input on white space and match once per item:
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for A in $ARR
do
[[ $A =~ u\'(.+)\' ]] && echo ${BASH_REMATCH[1]}
done
results in
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
Is this what you're trying to do?
$ ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
$ awk -v RS="'" '!(NR%2)' <<< "$ARR"
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
$ awk -v RS="'" '!(NR%2)' <<< "$ARR" |
while IFS= read -r id; do echo "id=$id"; done
id=SOMEVALUE1
id=SOMEVALUE1
id=SOMEVALUE1
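Putting the grep -o and while read pieces from the answers above together, a minimal sketch might look like this. The function name quoted_items is made up for illustration, the sample values are the ones from the question text, and it assumes a grep that supports -o (GNU and BSD grep do) and values that contain no single quotes or newlines.

```shell
# Sample input as described in the question.
ARR="[u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3']"

# Hypothetical helper: print each single-quoted element on its own line.
quoted_items() {
    printf '%s\n' "$1" |
        grep -o "'[^']*'" |   # one match per line, quotes included
        tr -d "'"             # drop the surrounding quotes
}

quoted_items "$ARR" |
while IFS= read -r id; do
    echo "id=$id"
done
# prints:
# id=SOMEVALUE1
# id=SOMEVALUE2
# id=SOMEVALUE3
```

Reading with while read instead of for ... in $(...) avoids word splitting and globbing on the values.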

Bash replace string between tokens

How to use sed and regex to replace the text between a variable number of one token?
Example of input:
/abc/bcd/cde/
Expected output:
/../../../
Tried:
Command: echo "/abc/bcd/cde/" | sed 's/\/.*\//\/..\//g' output: /../
Using perl and look around assertions :
$ perl -pe 's|(?<=/)\w{3}(?=/)|..|g' file
/../../../
Using sed :
$ echo "/abc/bcd/cde/" | sed -E 's|[a-z]{3}|..|g'
/../../../
Replace every substring of non-slashes ([^/]\+) with two dots:
$> echo "/abc/bcd/cde/" | sed 's$[^/]\+$..$g'
# => /../../../
Based on @Gilles Quenot's implementation, but matching any run of non-slash characters between the slashes:
$ echo "/abddc/bcqsdd/cdde/" | sed -E 's|(/)?[^/]+/|\1../|g'
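The attempt in the question fails because .* is greedy and happily runs across slashes, so the whole string is consumed by a single match. A minimal sketch of the non-slash approach, wrapped in a function (the name mask_components is made up for illustration) and written with [^/][^/]* rather than \+ so it does not rely on GNU sed extensions:

```shell
# Hypothetical helper: replace every run of non-slash characters with "..".
mask_components() {
    printf '%s\n' "$1" | sed 's|[^/][^/]*|..|g'
}

mask_components "/abc/bcd/cde/"         # prints /../../../
mask_components "/abddc/bcqsdd/cdde/"   # prints /../../../
```

Because the pattern never matches a slash, the number of tokens in the input does not matter.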

bash - Extract part of string

I have a string something like this
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it :
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this :
/AppointmentManagementService.xsd\d{1,2}.xsd/g
Your string is:
nampt@nampt-desktop:$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd
sed doesn't recognize \d, use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'
You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to require at least one and at most two digits in the .xsd... component, you can tighten the regex (the dots are escaped so they only match a literal dot):
$ if [[ $in =~ \"(AppointmentManagementService\.xsd[0-9]{1,2}\.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This outputs the text between schemaLocation=" and the next occurrence of ".
Reference:
https://unix.stackexchange.com/a/13472/109046
The cut command also works for this:
[root@code]# echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd
s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo $s | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'
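If the attribute name is known, the sed approach can be wrapped in a small function. This is only a sketch: the name attr_value is made up for illustration, and it assumes the attribute name contains no regex metacharacters and the value contains no double quotes. The name is also matched as a plain substring, so "Location" would match inside "schemaLocation".

```shell
# Hypothetical helper: print the double-quoted value of attribute $1 found in string $2.
# Prints nothing if the attribute is not present.
attr_value() {
    printf '%s\n' "$2" | sed -n 's/.*'"$1"'="\([^"]*\)".*/\1/p'
}

in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
attr_value schemaLocation "$in"   # prints AppointmentManagementService.xsd6.xsd
```

Using [^"]* instead of .* keeps the capture from running past the closing quote when more quoted attributes follow on the same line.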

How to extract a number from a string using grep and regex

I cat a file and pipe it through grep with a regular expression, like this:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55"
The command displays the following output:
toto.titi[12].tata=55
Is it possible to modify my grep command so that it outputs only the number 12?
You can grab this in pure BASH using its regex capabilities:
s='toto.titi[12].tata=55'
[[ "$s" =~ ^toto.titi\[([0-9]+)\]\.tata=[0-9]+$ ]] && echo "${BASH_REMATCH[1]}"
12
You can also use sed:
sed 's/toto.titi\[\([0-9]*\)\].tata=55/\1/' <<< "$s"
12
OR using awk:
awk -F '[\\[\\]]' '{print $2}' <<<"$s"
12
Use a lookbehind:
echo toto.titi[12].tata=55|grep -oP '(?<=\[)\d+'
12
Without Perl regex, use sed to strip the "[":
echo toto.titi[12].tata=55|grep -o "\[[0-9]\+"|sed 's/\[//g'
12
Pipe it to sed and use a back reference:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55" | sed 's/.*\[\([0-9]*\)\].*/\1/'
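The filtering and the extraction can also be done in one sed call, since sed -n with the p flag prints only the lines where the substitution succeeded. A minimal sketch, with the function name titi_index made up for illustration and the dots escaped so they match literally:

```shell
# Hypothetical helper: read lines on stdin and print the bracketed index of
# every line shaped like toto.titi[N].tata=55; all other lines are skipped.
titi_index() {
    sed -n 's/^toto\.titi\[\([0-9][0-9]*\)]\.tata=55$/\1/p'
}

printf '%s\n' 'toto.titi[12].tata=55' 'toto.titi[7].tata=99' | titi_index   # prints 12
```

With the file from the question this would be called as titi_index < /tmp/tmp_file.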

remove last 14 digits from string and a underscore, if there are 14 digits

I have a string like this:
data-c(huk24-small1);divider-bin-1.4.4;divider-conf-1.3.3-w(1,16);storage-bin-1.5.4;storage-conf-1.5.0-w(1);worker-bin-4.5.1;worker-conf-4.4.1-c(huk24)_20130620200658
where the timestamp with 14 digits and the underscore should be removed. So it should look like this:
data-c(huk24-small1);divider-bin-1.4.4;divider-conf-1.3.3-w(1,16);storage-bin-1.5.4;storage-conf-1.5.0-w(1);worker-bin-4.5.1;worker-conf-4.4.1-c(huk24)
How can I achieve this in a bash script? Note that removing should only happen, when there really is an underscore and 14 digits.
Use parameter expansion:
string=${string%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]}
Use sed:
echo $str | sed 's/_[0-9]\{14\}$//'
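Or, less strictly (this removes the shortest trailing suffix that starts with an underscore and a digit, not only a 14-digit one):
echo ${str%_[0-9]*}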
For example:
perl -plE 's/_\d{14}$//' < input > output
e.g
echo 'data-c(huk24-small1);divider-bin-1.4.4;divider-conf-1.3.3-w(1,16);storage-bin-1.5.4;storage-conf-1.5.0-w(1);worker-bin-4.5.1;worker-conf-4.4.1-c(huk24)_20130620200658' | perl -plE 's/_\d{14}$//'
produces:
data-c(huk24-small1);divider-bin-1.4.4;divider-conf-1.3.3-w(1,16);storage-bin-1.5.4;storage-conf-1.5.0-w(1);worker-bin-4.5.1;worker-conf-4.4.1-c(huk24)
Use bash regular expressions:
[[ $string =~ ^(.*)_[[:digit:]]{14}$ ]] && string=${BASH_REMATCH[1]}
With awk:
awk --re-interval -F_ -v OFS=_ '{$NF~/^[0-9]{14}$/?NF--:NF}1' <<< "$var"
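To check the "only when there really is an underscore and 14 digits" requirement, here is a minimal sketch built on the sed variant. The function name strip_timestamp is made up for illustration, and the inputs are shortened versions of the string from the question.

```shell
# Hypothetical helper: remove a trailing "_" plus exactly 14 digits, if present.
strip_timestamp() {
    printf '%s\n' "$1" | sed 's/_[0-9]\{14\}$//'
}

strip_timestamp 'worker-conf-4.4.1-c(huk24)_20130620200658'   # prints worker-conf-4.4.1-c(huk24)
strip_timestamp 'worker-conf-4.4.1-c(huk24)_2013'             # unchanged: only 4 digits
strip_timestamp 'worker-conf-4.4.1-c(huk24)_123456789012345'  # unchanged: 15 digits
```

The underscore in the pattern together with the $ anchor is what rejects the 15-digit case: the 14 digits must start directly after the underscore and end at the end of the string.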