Regular expression to extract a percentage


I have strings like the following: blabla a13724bla-bla244 35%
Notice that there is always a space before the percentage. I would like to extract the percentage number (so, without the %) from these strings using the Linux shell.

Assuming you have GNU grep:
$ grep -oP '\d+(?=%)' <<< "blabla a13724bla-bla244 35%"
35

Using sed:
echo blabla a13724bla-bla244 35% | sed 's/.*[ \t][ \t]*\([0-9][0-9]*\)%.*/\1/'
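If you expect multiple percentages in a line, then (note the two spaces before the *, so that only runs of one or more spaces become newlines; \n in the replacement needs GNU sed):
echo "blabla 20% a13724bla-bla244 35%" | \
sed -e 's/[^%0-9 ]*//g;s/  */\n/g' | sed -n 's/%$//p'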

You can try this:
echo "blabla a13724bla-bla244 35%" | cut -d' ' -f3 | sed 's/%//'
NOTE: This assumes the input always has this format, with the percentage as the third space-separated token.

You may try this regular expression, which captures only the digits and leaves the % outside the group:
/\s(\d+)%/

Use this regular expression:
\s(\d{1,3})%
If you need it in the shell, you can use sed or this Perl one-liner:
echo "blah 35%" | perl -pe 's/.*\s(\d{1,3})%/$1/'
35

If the percentage is always in the same column, try awk instead of a regular expression:
awk '{print $3}' file.txt | cut -d '%' -f 1
This prints the third column with the trailing % removed.
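
As a portable alternative to the GNU-only grep -P, the same extraction can be sketched with POSIX sed plus grep -o. The function names extract_pct and extract_all_pcts are made up for this illustration, and the second one assumes a grep that supports -o (GNU and BSD greps do; POSIX does not require it).

```shell
# Hypothetical helper: print the number in front of the last "<space>NN%" in a string.
extract_pct() {
    # -n together with the p flag prints only lines where the substitution succeeded.
    printf '%s\n' "$1" | sed -n 's/.*[[:space:]]\([0-9][0-9]*\)%.*/\1/p'
}

# Hypothetical helper: print every percentage in a string, one per line, without the %.
extract_all_pcts() {
    printf '%s\n' "$1" | grep -o '[0-9][0-9]*%' | tr -d '%'
}

extract_pct 'blabla a13724bla-bla244 35%'           # prints 35
extract_all_pcts 'blabla 20% a13724bla-bla244 35%'  # prints 20 and 35 on separate lines
```

Note that extract_all_pcts does not enforce the leading space mentioned in the question; it simply takes any run of digits directly followed by %.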

Related

unexpected result by cutting the last column with sed

echo '60 test' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60 test
The last column is not cut, but it works fine with:
echo '60 home' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60
why?

Inside a bracket expression, \s is not a character class: [^\s]+ means "one or more characters that are neither a backslash nor s". test contains an s and home does not, so the regexp matches the latter but not the former.
To match non-space characters, use either of these instead:
$ echo '60 test' | sed -r 's/(.*)\s+\S+$/\1/'
60
$ echo '60 test' | sed -r 's/(.*)\s+[^[:space:]]+$/\1/'
60
As @potong suggested in a comment, all you really need to remove the last column with sed is:
sed -E 's/\s+\S+$//'
I switched from -r to -E because -r is GNU sed only, while -E works in both GNU and OSX/BSD sed, so it's generally the better option. However, OSX/BSD sed won't recognize \s or \S, so switching to -E alone doesn't make the script more portable in this case; you'd have to use this instead:
sed -E 's/[[:space:]]+[^[:space:]]+$//'
and to be completely portable to all POSIX seds it would be:
sed 's/[[:space:]]\{1,\}[^[:space:]]\{1,\}$//'
or this, which behaves the same as long as there are always 2 or more fields:
sed 's/[[:space:]]*[^[:space:]]*$//'
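
The bracket-expression behaviour described in this answer can be checked with a small experiment. This is only a sketch; it assumes a POSIX-style grep -E, where a backslash inside [...] is a literal character, and the function name check_words is invented for the example.

```shell
# Inside a bracket expression, \s is just the two characters "\" and "s",
# so ^[^\s]+$ accepts only lines that contain neither of them.
check_words() {
    printf '%s\n' 'test' 'home' 'a\b' | grep -E '^[^\s]+$'
}

check_words   # prints only: home
```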

If you are just printing the first part of your string before the space without doing any other modification, you can simply use cut
echo '60 test' | cut -d' ' -f1
60
where you define your delimiter (-d) and the field (-f) you want to select.
No need to go for a complex solution using sed and doing some replacement operations.
With awk you can also print the first field:
echo '60 test' | awk '{print $1}'
60
or via grep in Perl mode (-P), so that \s is recognized:
echo '60 test' | grep -oP '^.*?(?=\s)'
60

How to display part of matched pattern in grep?

I want to extract 12 from text like "abc_12_1". I am trying this:
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I cannot select just the digits after the first _; the command above outputs the whole string. I am looking for a grep equivalent of the capture group in the following Perl pattern match:
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?

Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12

You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.

One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits
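
If PCRE look-arounds are not available, a capture group in POSIX sed gives much the same effect as Perl's $1. A minimal sketch, with the function name middle_number invented for the example and the pattern taken from the question:

```shell
middle_number() {
    # Print only the captured digits, and only for input that matches the whole pattern.
    printf '%s\n' "$1" | sed -n 's/^[a-zA-Z][a-zA-Z]*_\([0-9][0-9]*\)_1$/\1/p'
}

middle_number 'abc_12_1'   # prints 12
middle_number 'abc_55_1'   # prints 55
```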

How to use sed to identify a string in brackets?

I want to find the string that is enclosed in the brackets. How do I use sed to pull it out?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq

This can be easier with grep, especially if the position of the bracketed text varies:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This uses a look-behind: wherever a [ is found, it grabs all the following characters up to the next ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
This sets the field separators to [ and ], and then prints the second field. (The separator is quoted so the shell cannot treat it as a glob pattern.)
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text between [ and ] and prints it back.

I found one possible solution-
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1

Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq

Another way, with GNU grep:
grep -Po "\[\K[^]]*" file
With pure shell (bash):
while IFS= read -r line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file

Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq

perl -lne 'print $1 if(/\[([^\]]*)\]/)'
Tested here
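
For a single string, POSIX parameter expansion avoids external tools altogether. A rough sketch; the function name between_brackets is hypothetical, and it returns only the first bracketed block:

```shell
between_brackets() {
    case $1 in
        *\[*\]*) ;;                 # require a "[" followed later by a "]"
        *) return 1 ;;
    esac
    rest=${1#*\[}                   # drop everything up to and including the first "["
    printf '%s\n' "${rest%%\]*}"    # drop the first remaining "]" and everything after it
}

between_brackets 'noop anticipatory deadline [cfq]'   # prints cfq
```

With the scheduler file from the question, this could be called as between_brackets "$(cat /sys/block/sdb/queue/scheduler)".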

(GNU)Sed: how to replace any character from nth character to nth+10?

I need to replace characters from 10th to 20th in the string which looks like that:
123456789012345678901234567890
So far I've tried:
a)
Works for the 10th character ONLY:
echo "123456789012345678901234567890" | sed 's/./X/10'
b)
Doesn't work on the range:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
echo "123456789012345678901234567890" | sed 's/./X/10\,20'
echo "123456789012345678901234567890" | sed 's/./X/\{10,20\}'
echo "123456789012345678901234567890" | sed 's/./X/\{10\,20\}'
Does not work and I get error
unknown option to `s'
So - the question is - how do I make this to work:
echo "123456789012345678901234567890" | sed 's/./X/10,20'

Try (11 X characters, because the 10th to 20th characters inclusive are 11 characters):
$ sed -r "s/^(.{9})(.{11})/\1XXXXXXXXXXX/" <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890

This is awkward in sed; the best I could find is:
$ sed 's/^\(.\{9\}\).\{11\}/\1XXXXXXXXXXX/' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
With awk it looks nicer (an empty FS splits the line into single characters in gawk; POSIX leaves this unspecified):
$ awk 'BEGIN{FS=OFS=""} {for (i=10;i<=20;i++) $i="X"} {print}' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890

You can do it with bash parameter substitution like this:
#!/bin/bash
s="123456789012345678901234567890"
l=${s:0:9} # Extract left part (characters 1-9)
m=${s:9:11} # Extract middle part (characters 10-20; offsets are 0-based)
r=${s:20} # Extract right part (characters 21 onward)
# Diddle with middle part to your heart's content and re-assemble "$l$m$r" when done
m=$(sed 's/./X/g' <<< "$m")
echo "$l$m$r"
See here for more explanation and examples.
Or, you can do this:
transform the row of letters into a column so each is on its own line
apply your edits to LINES 10 through 20 (as opposed to characters 10 through 20)
transform column of letters back into a row (by deleting linefeeds)
as shown in the one-liner below:
$ echo "123456789012345678901234567890" | sed "s/\(.\)/\1\n/g" | sed "10,20s/./X/" | tr -d "\n"

I know that it looks ugly, but:
echo "123456789012345678901234567890" | \
sed 's/^\(.\{9\}\).\{11\}\(.*\)/\1XXXXXXXXXXX\2/'

Without typing multiple X characters in the sed command (for a single input line, GNU sed):
sed -r 's/^(.{9})(.{11})(.*)$/\1\n\2\n\3/' | sed '2s/./X/g' | tr -d '\n'

To replace the 10th to 20th characters, inclusive (11 characters in total), try:
echo 123456789012345678901234567890 | sed 's/\(.\{9\}\).\{11\}/\1XXXXXXXXXXX/'
123456789XXXXXXXXXXX1234567890
With GNU sed, you can use the -r switch to remove most of the backslashes:
echo 123456789012345678901234567890 | sed -r 's/(.{9}).{11}/\1XXXXXXXXXXX/'
Or the naive approach also works here:
echo 123456789012345678901234567890 | sed 's/\(.........\).........../\1XXXXXXXXXXX/'

This might work for you (GNU sed):
sed ':a;/.\{9\}X\{11\}/!s/\(.\{9\}X*\)./\1X/;ta' file
or with a bit of syntactic sugar:
sed -r ':a;/.{9}X{11}/!s/(.{9}X*)./\1X/;ta' file
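
To avoid counting X characters by hand, the range can be parameterised. The following is a sketch using POSIX awk's substr; mask_range and its argument order are assumptions made for this example, not something taken from the answers above.

```shell
# mask_range STRING FIRST LAST: replace characters FIRST..LAST (1-based, inclusive) with X.
mask_range() {
    printf '%s\n' "$1" | awk -v first="$2" -v last="$3" '{
        mid = substr($0, first, last - first + 1)   # the slice to mask
        gsub(/./, "X", mid)                          # one X per character in the slice
        print substr($0, 1, first - 1) mid substr($0, last + 1)
    }'
}

mask_range 123456789012345678901234567890 10 20   # prints 123456789XXXXXXXXXXX1234567890
```

Because the length of the slice is computed from the two positions, the off-by-one between "10 characters" and "characters 10 through 20" cannot creep in.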

How to extract a number from a string using grep and regex

I cat a file and apply grep to it with a regular expression like this:
cat /tmp/tmp_file | grep "toto.titi\[[0-9]\+\].tata=55"
The command displays the following output:
toto.titi[12].tata=55
Is it possible to modify my grep command so that it outputs only the number 12?

You can grab this in pure BASH using its regex capabilities:
s='toto.titi[12].tata=55'
[[ "$s" =~ ^toto.titi\[([0-9]+)\]\.tata=[0-9]+$ ]] && echo "${BASH_REMATCH[1]}"
12
You can also use sed:
sed 's/toto.titi\[\([0-9]*\)\].tata=55/\1/' <<< "$s"
12
OR using awk:
awk -F '[\\[\\]]' '{print $2}' <<<"$s"
12

Use a lookbehind:
echo 'toto.titi[12].tata=55' | grep -oP '(?<=\[)\d+'
12
Without Perl regex, use grep -o and then sed to strip the "[":
echo 'toto.titi[12].tata=55' | grep -o "\[[0-9]\+" | sed 's/\[//'
12

Pipe it to sed and use a back reference (sed has no \d, and in basic regular expressions the group needs escaped parentheses):
grep "toto.titi\[[0-9]\+\].tata=55" /tmp/tmp_file | sed 's/.*\[\([0-9]*\)\].*/\1/'
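
sed can do both the filtering and the extraction in one step, so neither cat nor grep is strictly needed. A hedged sketch that reads from standard input; the function name extract_index is made up for the example:

```shell
extract_index() {
    # -n plus the p flag prints only matching lines;
    # "]" is an ordinary character outside a bracket expression, so it needs no backslash.
    sed -n 's/^toto\.titi\[\([0-9][0-9]*\)]\.tata=55$/\1/p'
}

printf '%s\n' 'toto.titi[12].tata=55' 'something else' | extract_index   # prints 12
```

For the file in the question, this would be run as extract_index < /tmp/tmp_file.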

We Keep Coding
