get number value between two strings using regex - regex

I have a string with multiple value outputs that looks like this:
SD performance read=1450kB/s write=872kB/s no error (0 0), ManufactorerID 27 Date 2014/2 CardType 2 Blocksize 512 Erase 0 MaxtransferRate 25000000 RWfactor 2 ReadSpeed 22222222Hz WriteSpeed 22222222Hz MaxReadCurrentVDDmin 3 MaxReadCurrentVDDmax 5 MaxWriteCurrentVDDmin 3 MaxWriteCurrentVDDmax 1
I would like to output only the read value (1450kB/s) using bash and sed.
I tried
sed 's/read=\(.*\)kB/\1/'
but that outputs read=1450kB but I only want the number.
Thanks for any help.

Sample input shortened for demo:
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/read=\(.*\)kB/\1/'
SD performance 1450kB/s write=872/s no error
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\(.*\)kB.*/\1/'
1450kB/s write=872
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\([0-9]*\)kB.*/\1/'
1450
Since the entire line has to be replaced, add .* before and after the search pattern.
* is greedy and matches as much as possible, so in the 2nd example the capture ran on through the write value.
Since only the digits after read= are needed, use [0-9] instead of .
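The same idea can be wrapped in a small function. This is only a sketch: extract_rate is a hypothetical helper name, and it assumes each key is followed by an integer and then kB, as in the sample line. It uses [0-9][0-9]* so that at least one digit is required.

```shell
# Hypothetical helper: print the integer kB/s figure stored under a key such as read or write.
extract_rate() {
  key=$1
  line=$2
  # -n with the p flag prints only when the substitution matched,
  # so a missing key yields empty output instead of the whole line.
  printf '%s\n' "$line" | sed -n "s/.*${key}=\([0-9][0-9]*\)kB.*/\1/p"
}

line='SD performance read=1450kB/s write=872kB/s no error'
extract_rate read "$line"
extract_rate write "$line"
```

With the sample line this prints 1450 and then 872.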

Running
sed 's/read=\(.*\)kB/\1/'
will replace only the matched read=...kB portion with the captured text and leave the rest of the line in place. If you want to replace the whole string, use
sed 's/.*read=\([0-9]*\)kB.*/\1/'
instead.
As Sundeep noticed, sed doesn't support non-greedy patterns, so the answer was updated to use [0-9]* instead.

Related

How to search for multiple words of a specific pattern and separator?

I'm trying to trim out multiple hex words from my string. I'm searching for exactly 3 words, separated by exactly 1 dash each time.
i.e. for this input:
wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar
I'd like to get this output:
wonder-indexing-service-0.20.0.jar
I was able to remove the hex words by repeating the pattern. How can I simplify it? Also, I wasn't able to change * to +, to avoid allowing empty words. Any idea how to do that?
What I've got so far:
# Good, but how can I simplify?
% echo 'wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar' | sed 's/\-[a-fA-F0-9]*\-[a-fA-F0-9]*\-[a-fA-F0-9]*//g'
wonder-indexing-service-0.20.0.jar
# Bad, I'm allowing empty words
% echo 'wonder-indexing-service-0.20.0-1605296913-49b045f-.jar' | sed 's/\-[a-fA-F0-9]*\-[a-fA-F0-9]*\-[a-fA-F0-9]*//g'
wonder-indexing-service-0.20.0.jar
Thank you!
EDIT: I had a typo in original output, thank you anubhava for pointing out.
You may use this sed:
s='wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar'
sed -E 's/(-[a-fA-F0-9]{3,})+//' <<< "$s"
wonder-indexing-service-0.20.0.jar
Breakdown:
(: Start a group
-: Match a hyphen
[a-fA-F0-9]{3,}: Match 3 or more hex characters
)+: End the group. Repeat this group 1+ times
If you want to use + in sed's default basic regex syntax, you have to escape it as \+ (a GNU extension). You can also match exactly 3 words, each preceded by a hyphen, using a quantifier, which also needs escaping:
\(-[a-fA-F0-9]\+\)\{3\}
Example
echo 'wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar' | sed 's/\(-[a-fA-F0-9]\+\)\{3\}//g'
Output
wonder-indexing-service-0.20.0.jar
If you don't want to allow a trailing - then you can match the .jar and put that back in the replacement.
echo 'wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar' | sed 's/\(-[a-fA-F0-9]\+\)\{3\}\(\.jar$\)/\2/g'
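As a sketch of how the two ideas combine, the extended-syntax form below requires exactly three non-empty hex words directly before .jar. The function name strip_hex_suffix is hypothetical, and the pattern assumes the file name always ends in .jar.

```shell
# Hypothetical helper: drop exactly three non-empty hex words that sit directly before .jar.
strip_hex_suffix() {
  # -E enables extended regex, so +, {3} and ( ) need no backslashes.
  printf '%s\n' "$1" | sed -E 's/(-[a-fA-F0-9]+){3}(\.jar)$/\2/'
}

strip_hex_suffix 'wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar'
strip_hex_suffix 'wonder-indexing-service-0.20.0-1605296913-49b045f-.jar'
```

The first call prints wonder-indexing-service-0.20.0.jar. The second input has an empty third word, so nothing matches and the name is printed unchanged.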
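Alternatively, if the part to keep is always the first four hyphen-separated fields, cut can do the trimming:
printf '%s\n' "wonder-indexing-service-0.20.0-1605296913-49b045f-19794354.jar" | cut -d'-' -f1-4 | sed 's#$#.jar#'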

regex, repeat, count group

I need some help with a regex that matches this format:
The first part of the string is an email address, followed by eight columns separated by ";".
a.test#test.com;Alex;Test;Alex A.Test;Alex;12;34;56;78
The first part I have is (.*#.*com)
These are also possible source strings:
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on
You can try this regex:
^(.*#.*com)(([^";\n]*|"[^"\n]*");){8}(([^";\n]*|"[^"\n]*"))$
If you have a different number of columns after the address, change the number between { and }.
For your data, here are the captures:
1. `a.test#test.com`
2. `56;`
3. `56`
4. `78`
If you are sure there will be no " in your strings you can use this:
^(.*#.*com)(([^;\n]*);){8}([^;\n]*)$
Edit:
OP suggested this usage:
To use the first regex with sed, you need the -i -n -E flags and must escape the " character.
The result will look like this:
sed -i -n -E "/(.*#.*com)(([^\";\n]*|\"[^\"\n]*\");){8}(([^\";\n]*|\"[^\"\n]*\"))/p"
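You can use something like
".*#.*\.com;[A-Za-z]*;[A-Za-z]*;[A-Za-z .]*;[A-Za-z]*;[0-9][0-9];[0-9][0-9];[0-9][0-9];[0-9][0-9]"
This assumes the numbers always have exactly two digits. Commas inside a bracket expression are matched literally, so [A-Z,a-z] would also accept commas.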
Using awk you can do this easily:
awk -F ';' '$1 ~ /\.com$/{print NF}' file
9
9
9
cat file
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on
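If the goal is only to check that a row has the address plus exactly eight more columns, a grep -E count is one way to sketch it. The helper name count_valid_rows is hypothetical, the pattern is a simplified variant of the regexes above, and it assumes fields never contain ";" or quotes.

```shell
# Hypothetical helper: count stdin lines made of an address field plus exactly eight more fields.
count_valid_rows() {
  # [^;]* keeps each field from running across a separator; {8} fixes the column count.
  grep -cE '^[^;]*#[^;]*com(;[^;]*){8}$'
}

printf '%s\n' \
  'a.test#test.com;Alex;;Alex A.Test;;12;34;56;78' \
  'a.test#test.com;Alex;Test;;Alex;12;34;56;' \
  'a.test#test.com;Alex;Test' | count_valid_rows
```

The first two rows have eight columns after the address (empty ones included) and the third has only two, so the count is 2.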

How can I use sed to regex string and number in bash script

I want to separate string and number in a file to get a specific number in bash script, such as:
Branches executed:75.38% of 1190
I want to get only the number
75.38
I tried the code below
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it was incorrect and failed.
How should it work? Thank you in advance for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
Give this a try:
value=$(sed "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/" afile)
This assumes afile contains only that one line; sed would print any other lines unchanged.
The value is stored in the value variable.
There are several things here that we could improve. One is that without -r (or -E) you need to escape the parentheses in sed: \(...\)
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it'll be easier to play with:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g')
echo "$new_value"
Update 2: as john pointed out, it will match only numbers that contain a decimal dot. We can fix it with an optional group: \(\.[0-9]\+\)\?.
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting all together:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g')
echo "$new_value"
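For a version that avoids the GNU-only \+ and \? escapes, the sketch below spells them out as [0-9][0-9]* and \{0,1\}. The function name first_number is hypothetical, and the assumption is that the first run of digits in the line is the number wanted.

```shell
# Hypothetical helper: print the first integer or decimal number found in a string.
first_number() {
  # [0-9][0-9]* stands in for [0-9]\+, and \{0,1\} stands in for \?.
  # -n with the p flag prints nothing at all when the line contains no digits.
  printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\).*/\1/p'
}

first_number 'Branches executed:75.38% of 1190'
first_number 'Lines executed:80% of 200'
```

This prints 75.38 and then 80, so both decimal and whole-number percentages are covered.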

regexp to filter out lines in a file

Hi, I have a big file that has two kinds of lines: ones that end with .1 and ones that end with .2. Now I have to filter out all the ones with .2.
Here are the first two lines of the file.
>AT1G53860.1 | Symbols: | Remorin family protein | chr1:20107165-20109458 REVERSE LENGTH=1329
>AT1G34370.2 | Symbols: STOP1 | C2H2 and C2HC zinc fingers superfamily protein | chr1:12551002-12552501 FORWARD LENGTH=1500
When I try to use grep -v "\.2*" test.txt > out.txt, I am getting both lines. What am I doing wrong?
Thanks
Upendra
2* means that there may be as many twos as you want -- including none of them!
I suggest being a bit more precise with your regex, or you might filter out what you don't want filtered:
grep -Ev '^>\w{9}\.2' test.txt > out.txt
So, we want:
^ -- looking from the beginning of the line,
> -- exactly one ">" char,
\w{9} -- exactly nine word characters (letters, digits, or underscores),
\. -- exactly one literal dot,
2 -- digit "2".
The argument -E selects extended regex, so that {9} works without backslash escapes (\w is a GNU extension that works with or without -E).
You don't need * in the search pattern. The following should work:
grep -v "\.2" test.txt > out.txt
EDIT
Moreover, as pointed out by drahnr, the above would match .2 anywhere in the line. Given the sample input, the pattern should be modified to match .2 only at the end of the first word in the line.
egrep -v "^>\w+\.2" test.txt > out.txt
Your file seems to be column based. You can also use awk regex to match the first column.
awk '$1!~/\.2$/' file
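A quick way to check this filter is to feed it the sample lines plus one line that has .2 later on. In this sketch drop_dot2 is a hypothetical helper name and the third input line is made-up test data, not taken from the file.

```shell
# Hypothetical helper: keep only stdin lines whose first field does not end in .2.
drop_dot2() {
  awk '$1 !~ /\.2$/'
}

printf '%s\n' \
  '>AT1G53860.1 | Symbols: | Remorin family protein | chr1:20107165-20109458 REVERSE LENGTH=1329' \
  '>AT1G34370.2 | Symbols: STOP1 | C2H2 and C2HC zinc fingers superfamily protein | chr1:12551002-12552501 FORWARD LENGTH=1500' \
  '>AT1G00001.1 | made-up line with score 0.2' \
  | drop_dot2
```

Only the AT1G34370.2 line is dropped. The made-up line survives because the test looks at the first field only.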

Simplify points in KML using regex

I am trying to cut down the file size of a kml file I have.
The coordinates for the polygons are this accurate:
-113.52106535153605,53.912817815321503,0.
I am not very good with regex, but I think it would be possible to write one that selects the eight characters before the commas. I'd run a search and replace so the result would be
-113.521065,53.9128178,0.
Any regex experts out there think this is possible?
Try this
\d{8}(?=,)
and replace with an empty string
Here is something that might work. It replaces 8 characters and the following comma with a comma: s/(.{8}),/,/g;
echo "-113.52106535153605,53.912817815321503,0." | sed 's/.\{8\},/,/g'
The g flag is needed so that every coordinate on the line is trimmed, not just the first one. So you can pipe the file you have through a sed command like this:
cat file.kml | sed 's/.\{8\},/,/g' > newfile.kml
I just had to do the same thing. This is Perl instead of sed, but it will look for a string of eight uninterrupted digits and then replace any number of uninterrupted digits after that with nothing. It worked great.
cat originalfile.kml | perl -pe 's/(?<=\d{8})\d*//g' > shortenedfile.kml
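A sed variant of the same idea is to keep a fixed number of decimal places rather than remove a fixed number of characters. This sketch assumes six decimals are enough, and trim_decimals is a hypothetical helper name. Note that it truncates rather than rounds, so its output differs slightly from the result asked for in the question.

```shell
# Hypothetical helper: truncate every decimal number on stdin to six decimal places.
trim_decimals() {
  # Capture the integer part, the dot and six decimals; the extra digits after them are dropped.
  sed -E 's/([0-9]+\.[0-9]{6})[0-9]+/\1/g'
}

echo '-113.52106535153605,53.912817815321503,0.' | trim_decimals
```

For the sample coordinate this prints -113.521065,53.912817,0. and leaves numbers with six or fewer decimals untouched.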