Regular expression to extract a percentage - regex

I have strings like the following: blabla a13724bla-bla244 35%
Notice that there is always a space before the percentage. I would like to extract the percentage number (so, without the %) from these strings using the Linux shell.

Assuming you have GNU grep:
$ grep -oP '\d+(?=%)' <<< "blabla a13724bla-bla244 35%"
35

Using sed:
echo blabla a13724bla-bla244 35% | sed 's/.*[ \t][ \t]*\([0-9][0-9]*\)%.*/\1/'
If you expect to have multiple percentages in a line then:
echo blabla 20% a13724bla-bla244 35% | \
sed -e 's/[^%0-9 ]*//g;s/ */\n/g' | sed -n '/%/p'

You can try this
echo "blabla a13724bla-bla244 35%" | cut -d' ' -f3 | sed 's/\%//g'
NOTE: Assumption is the input is always in this format and percentage is 3rd token separated by space.

You may try this regular expression:
/\s(\d+%)/

Use this regular expression:
\s(\d{1,3})%
If you need it in shell, you can use sed or this perl one-liner:
echo "blah 35%" | perl -pe "s/.*\s(\d{1,3})%/\1/g"
35

If you always have a number of continuous columns maybe you should try with awk instead of a regular expresion.
cat file.txt |awk '{print $3}' |cut -d "%" -f 1
With this code you obtain the third column.

Related

unexpected result by cutting the last column with sed

echo '60 test' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60 test
the last column is not cut. but it works pretty well with
echo '60 home' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60
why?
[^\s]+ means not backslash or s repeated 1 or more times and test contains s while home does not and so the latter matches the regexp while the former doesn't.
You should have used either of these instead to match non-space:
$ echo '60 test' | sed -r 's/(.*)\s+\S+$/\1/'
60
$ echo '60 test' | sed -r 's/(.*)\s+[^[:space:]]+$/\1/'
60
As #potong suggested in a comment, to remove the last column with sed all you really need is:
sed -E 's/\s+\S+$//'
I switched from -r to -E as -r is GNU sed only while -E is GNU or OSX/BSD sed so it's generally the better option to use BUT OSX/BSD sed won't recognize \s or \S so changing from -r to -E doesn't really make the script more portable in this case, you'd have to use this instead:
sed -E 's/[[:space:]]+[^[:space:]]+//'
and then to be completely portable to all POSIX seds it'd be:
sed 's/[[:space:]]\{1,\}[^[:space:]]\{1,\}//'
or this would behave the same if there's always 2 or more fields:
sed 's/[[:space:]]*[^[:space:]]*//'
If you are just printing the first part of your string before the space without doing any other modification, you can simply use cut
echo '60 test' | cut -d' ' -f1
60
where you define your delimiter (-d) and the field (-f) you want to select.
No need to go for a complex solution using sed and doing some replacement operations.
With awk you can also print the first field:
echo '60 test' | awk '{print $1}'
60
or via grep in perl mode to have the \s taken into account
echo '60 test' | grep -oP '^.*?(?=\s)'
60

How to display part of matched pattern in grep?

I wanted to extract 12 from a text like "abc_12_1". I am trying like this
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I am not able to select the digit after first _ in string, the output of above command is whole string. I am looking for some alternative in grep which I have in following Perl pattern matching.
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?
Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12
You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.
One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits

How to use sed to identify a string in brackets?

I want to find the string in that is placed with in the brackets. How do I use sed to pull the string?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq
It can be easier with grep, if it happens to be changing the position in which the text in between brackets is located:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This is look-behind: whenever you find a string [, start fetching all the characters up to a ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F[][] '{print $2}' file
cfq
It is setting the field separators as [ and ]. And from that, prints the second one.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it is looking from the block of text in between [] and prints it back.
I found one possible solution-
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way by gnu grep :
grep -Po "\[\K[^]]*" file
with pure shell:
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'
Tested here

(GNU)Sed: how to replace any character from nth character to nth+10?

I need to replace characters from 10th to 20th in the string which looks like that:
123456789012345678901234567890
So far I've tried:
a)
Works for the 10th character ONLY:
echo "123456789012345678901234567890" | sed 's/./X/10'
b)
Doesn't work on the range:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
echo "123456789012345678901234567890" | sed 's/./X/10\,20'
echo "123456789012345678901234567890" | sed 's/./X/\{10,20\}'
echo "123456789012345678901234567890" | sed 's/./X/\{10\,20\}'
Does not work and I get error
unknown option to `s'
So - the question is - how do I make this to work:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
Try:
$ sed -r "s/^(.{9})(.{11})/\1XXXXXXXXXX/" <<< 123456789012345678901234567890
123456789XXXXXXXXXX1234567890
It is a complex sed problem, I could just find this solution:
$ sed 's/^\(.\{10\}\)\(.\{10\}\)/\1XXXXXXXXXX/' <<< 123456789012345678901234567890
1234567890XXXXXXXXXX1234567890
With awk it looks nicer:
$ awk 'BEGIN{FS=OFS=""} {for (i=10;i<=20;i++) $i="X"} {print}' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
You can do it with bash parameter substitution like this:
#!/bin/bash
s="123456789012345678901234567890"
l=${s:0:9} # Extract left part
m=${s:10:11} # Extract middle part
r=${s:20} # Extract right part
# Diddle with middle part to your heart's content and re-assemble "$l$m$r" when done
m=$(sed 's/./X/g' <<<$m)
See here for more explanation and examples.
Or, you can do this:
transform the row of letters into a column so each is on its own line
apply your edits to LINES 10 through 20 (as opposed to characters 10 through 20)
transform column of letters back into a row (by deleting linefeeds)
as shown in the one-liner below:
$ echo "123456789012345678901234567890" | sed "s/\(.\)/\1\n/g" | sed "10,20s/./X/" | tr -d "\n"
I know, that it looks ugly, but:
echo "123456789012345678901234567890" | \
sed 's/^\(.\{10\}\).\{10\}\(.*\)/\1XXXXXXXXXX\2/'
Without placing multiple X in sed command:
sed -r 's/^(.{9})(.{10,20})(.*)$/\1\n\2\n\3/' | sed -e '2s/./X/g' -e 'N;N;s/\n//g'
To replace the 10th to 20th characters, inclusive, try:
echo 123456789012345678901234567890 | sed 's/\(.\{9\}\).\{11\}/\1XXXXXXXXXX/'
123456789XXXXXXXXXX1234567890
With the GNU sed, you can use the -r switch to remove most of the backslashes:
echo 123456789012345678901234567890 | sed -r 's/(.{9}).{11}/\1XXXXXXXXXX/'
Or the naive approach also works here:
echo 123456789012345678901234567890 | sed 's/\(.........\).........../\1XXXXXXXXXX/'
This might work for you (GNU sed):
sed ':a;/.\{9\}X\{11\}/!s/\(.\{9\}X*\)./\1X/;ta' file
or with a bit of syntactic sugar:
sed -r ':a;/.{9}X{11}/!s/(.{9}X*)./\1X/;ta' file

How to extract a number from a string using grep and regex

I make a cat of a file and apply on it a grep with a regular expression like this
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55"
the command display the following output
toto.titi[12].tata=55
is it possible to modify my grep command in order to extract the number 12 as displayed output of the command?
You can grab this in pure BASH using its regex capabilities:
s='toto.titi[12].tata=55'
[[ "$s" =~ ^toto.titi\[([0-9]+)\]\.tata=[0-9]+$ ]] && echo "${BASH_REMATCH[1]}"
12
You can also use sed:
sed 's/toto.titi\[\([0-9]*\)\].tata=55/\1/' <<< "$s"
12
OR using awk:
awk -F '[\\[\\]]' '{print $2}' <<<"$s"
12
use lookahead
echo toto.titi[12].tata=55|grep -oP '(?<=\[)\d+'
12
without perl regex,use sed to replace "["
echo toto.titi[12].tata=55|grep -o "\[[0-9]\+"|sed 's/\[//g'
12
Pipe it to sed and use a back reference:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55" | sed 's/.*\[(\d*)\].*/\1/'