Convert regex from PCRE to sed to split strings

I have a version regex that works in PCRE, but I am having trouble getting it to work in sed with match groups.
Regex:
((^[[:alnum:]]+.*)-(\d+\.\d+\.\d+-VERS|\d+\.\d+\.\d+))
Input:
aaa1-bbb2-ccc3-dddd4-ffff5-1.0.0-VERS
aaa1-bbb2-ccc3-dddd4-ffff5-11.22.33-VERS
zzz1-bbb2-ccc3-1.0.1
zzz1-1.0.1-VERS
Expected output: split the strings and separate out the version string
group2="aaa1-bbb2-ccc3-dddd4-ffff5"
group3="1.0.0-VERS"
group2="aaa1-bbb2-ccc3-dddd4-ffff5"
group3="11.22.33-VERS"
group2="zzz1-bbb2-ccc3"
group3="1.0.1"
group2="zzz1"
group3="1.0.1-VERS"
The above output works as expected here.
However, the same regex does not work with sed. What am I missing?
echo "aaa1-bbb2-ccc3-dddd4-ffff5-11.22.33-VERS" | sed -E 's#((^[[:alnum:]]+.*)-(\d+\.\d+\.\d+-VERS|\d+\.\d+\.\d+))#\3 \2#p'

Why such a complicated regexp?
$ sed -E 's/(.*)-([0-9.]+(-VERS)?)$/\2\t\1/' file
1.0.0-VERS aaa1-bbb2-ccc3-dddd4-ffff5
11.22.33-VERS aaa1-bbb2-ccc3-dddd4-ffff5
1.0.1 zzz1-bbb2-ccc3
1.0.1-VERS zzz1
or:
$ sed -E 's/(.*)-([^-]+-[^-]+)$/\2\t\1/' file
1.0.0-VERS aaa1-bbb2-ccc3-dddd4-ffff5
11.22.33-VERS aaa1-bbb2-ccc3-dddd4-ffff5
ccc3-1.0.1 zzz1-bbb2
1.0.1-VERS zzz1
depending on what the output should be for input zzz1-bbb2-ccc3-1.0.1.

I think \d isn't recognised by sed. This works for me on OSX.
sed -E 's/([[:alnum:]]+.*)-([0-9]+\.[0-9]+\.[0-9]+|[0-9]+\.[0-9]+\.[0-9]+-VERS)/\1 \2/'
Input:
aaa1-bbb2-ccc3-dddd4-ffff5-11.22.33-VERS
aaa1-bbb2-ccc3-dddd4-ffff5-1.0.0-VERS
zzz1-bbb2-ccc3-1.0.1
zzz1-1.0.1-VERS
Output:
aaa1-bbb2-ccc3-dddd4-ffff5 11.22.33-VERS
aaa1-bbb2-ccc3-dddd4-ffff5 1.0.0-VERS
zzz1-bbb2-ccc3 1.0.1
zzz1 1.0.1-VERS
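A hedged sketch of the same substitution, using anchors and an optional suffix instead of the alternation. It assumes every version is exactly three dot-separated numbers, optionally followed by -VERS; split_version is a hypothetical helper name, not something from the question.

```shell
# split_version: hypothetical helper; reads names on stdin and prints
# "<name> <version>" for lines ending in N.N.N or N.N.N-VERS.
split_version() {
  # [0-9] replaces the PCRE-only \d; ^ and $ anchor the whole line.
  sed -E 's/^(.*)-([0-9]+\.[0-9]+\.[0-9]+(-VERS)?)$/\1 \2/'
}

printf '%s\n' 'aaa1-bbb2-ccc3-dddd4-ffff5-11.22.33-VERS' 'zzz1-bbb2-ccc3-1.0.1' |
  split_version
```

Lines that do not end in such a version pass through unchanged, so this is only a starting point if the input is less regular than the samples.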

As @Sundeep pointed out, \d+ does not work with sed; [0-9]+ should be used instead.
echo "aaa1-bbb2-ccc3-dddd4-ffff5-11.22.33-VERS" | sed -E 's#((^[[:alnum:]]+.*)-([0-9]+\.[0-9]+\.[0-9]+-VERS|[0-9]+\.[0-9]+\.[0-9]+))#\3 \2#g'

This might work for you (GNU sed):
sed -r 'h;s/^(([[:alnum:]]+-?)+)-(([[:digit:]]+\.?){3}(-VERS)?)/group1="\1"/p;g;s//group3="\3"/p;d' file
However a simpler regexp would be:
sed -r 'h;s/^(.*)-([0-9].*)/group1="\1"/p;g;s//group2="\2"/p;d' file
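If the hold-space trick feels opaque, the labelled output from the question can also be produced by letting sed do the split and the shell do the printing. This is only a sketch: print_groups is a hypothetical name, the labels follow the question's group2/group3 numbering, and it assumes the names contain no whitespace and every line ends in a version.

```shell
# print_groups: hypothetical helper; reads one name per line on stdin.
print_groups() {
  # Step 1: sed rewrites "name-version" as "name version".
  sed -E 's/^(.*)-([0-9]+\.[0-9]+\.[0-9]+(-VERS)?)$/\1 \2/' |
  # Step 2: the shell reads the two fields and prints the labels.
  while read -r name version; do
    printf 'group2="%s"\ngroup3="%s"\n' "$name" "$version"
  done
}

printf '%s\n' 'zzz1-bbb2-ccc3-1.0.1' 'zzz1-1.0.1-VERS' | print_groups
```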

Related

how to replace continuous pattern in text

I have text like 1|2|3||| and am trying to replace each || with |0|. My command is the following:
echo '1|2|3|||' | sed -e 's/||/|0|/g'
but the result is 1|2|3|0||; the pattern is only replaced once.
Could someone help me improve the command? Thanks.
Just do it twice:
l_replace='s#||#|0|#g'
echo '1|2|3||||||||4||5|||' | sed -e "$l_replace;$l_replace"
Using any sed or any awk in any shell on every Unix box:
$ echo '1|2|3|||' | sed -e 's/||/|0|/g; s/||/|0|/g'
1|2|3|0|0|
$ echo '1|2|3|||' | awk '{while(gsub(/\|\|/,"|0|"));}1'
1|2|3|0|0|
This might work for you (GNU sed):
sed 's/||/|0|/g;s//|0|/g' file
or:
sed ':a;s/||/|0|/g;ta' file
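The substitution needs to be applied twice because, with the g flag, sed does not rescan text it has just replaced: the trailing | of a replacement can form a new || with the | that follows it, and only a second pass sees that.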
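A small sketch that makes the two passes visible and wraps the loop form so it works for any run length. fill_empty is a hypothetical helper name; the label and branch are passed as separate -e expressions because some non-GNU seds do not accept `:a;...;ta` on one line.

```shell
# One pass with g: the | left over after the first match has no partner yet.
printf '%s\n' '1|2|3|||' | sed 's/||/|0|/g'      # prints 1|2|3|0||

# fill_empty: hypothetical helper; repeats the substitution until no ||
# is left. "t a" jumps back to label a only if the last s command matched.
fill_empty() {
  sed -e ':a' -e 's/||/|0|/' -e 'ta'
}

printf '%s\n' '1|2|3|||' | fill_empty             # prints 1|2|3|0|0|
```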

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). Note that grep -o is not specified by POSIX, although GNU and BSD grep both support it.
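The question asked for both current_count and total_count on one line. A sketch with two capture groups, assuming current_count always appears before total_count on the line and that sed supports -E (GNU and BSD sed do); extract_counts is a hypothetical helper name.

```shell
# extract_counts: hypothetical helper; prints both key/value pairs for
# every line that contains them, and nothing for lines that do not.
extract_counts() {
  sed -n -E "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p"
}

printf '%s\n' "foo" \
  "abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
  extract_counts
# prints: 'current_count': u'2', 'total_count': u'3'
```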
For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

BASH: replacing PERL with SED for in-place substitution

I would like to replace this Perl statement:
perl -pe "s|(?<=://).+?(?=/)|$2:80|"
with
sed -e "s|<regex>|$2:80|"
Since sed has a much less powerful regex engine (for example, it does not support look-arounds), the task boils down to writing a sed-compatible regex that matches only the domain name in a fully qualified URL. Examples:
http://php2-mindaugasb.c9.io/Testing/JS/displayName.js
http://php2-mindaugasb.c9.io?a=Testing.js
http://www.google.com?a=Testing.js
Should become:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
A solution like this would be ok:
sed -e "s|<regex>|http://$2:80|"
Thanks :)
Use the below sed command.
$ sed "s~//[^/?]\+\([?/]\)~//\$2:80\1~g" file
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
You need to escape the $ in the replacement part, because the command is in double quotes.
sed 's|http://[^/?]*|http://$2:80|' file
Output:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
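Both answers above keep $2 as literal text. If the intent is for the shell to substitute a real host name, as the double quotes in the original perl command suggest, a sketch could look like this. replace_host and example.org are hypothetical, and the host is assumed not to contain |, & or backslashes, which are special in the s command.

```shell
# replace_host: hypothetical helper; $1 is the new host, URLs come on stdin.
replace_host() {
  # Double quotes let the shell expand $1 before sed sees the script.
  # [^/?]* matches the host: everything after :// up to the first / or ?.
  sed "s|://[^/?]*|://$1:80|"
}

printf '%s\n' 'http://php2-mindaugasb.c9.io/Testing/JS/displayName.js' \
              'http://www.google.com?a=Testing.js' |
  replace_host example.org
```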

How to extract substring with sed but first occurence only?

Using sed -n "s/.*\(\/.*\/\).*/\1/p" on /string1/string2 produces string1, as expected.
However, using the same on /string1/string2/string3 produces string2.
How can I print only the first occurrence, that is, string1?
This does exactly what I wanted:
sed -n "s/[^/]*\(\/[a-z]*\).*/\1/p"
You can use this sed:
sed -n 's|/\([^/]*\)/.*|\1|p'
Avoid escaping / by using an alternate delimiter.
This might work for you (GNU sed):
sed -n 's/[^\/]*\(\/[^\/]*\/\).*/\1/p' file
Perl's non-greedy regex quantifiers are handy:
perl -pe 's{.*?(/.*?/).*}{$1}' <<END
foobar/string1/string2/string3
END
/string1/
If you want to use bash shell:
str="foobar/string1/string2/string3"
string1=$( IFS=/; set -- $str; echo $2 )
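The original command picks string2 because the leading .* is greedy and swallows as much as it can before the captured part. A minimal sketch of the usual sed workaround, replacing .* with [^/]* so the prefix cannot cross a slash; first_segment is a hypothetical helper name, and it prints the segment without the surrounding slashes.

```shell
# first_segment: hypothetical helper; prints the text between the first
# and second slash of each input line (or up to end of line).
first_segment() {
  # ^[^/]* skips any text before the first slash without crossing it.
  sed -n 's|^[^/]*/\([^/]*\).*|\1|p'
}

printf '%s\n' '/string1/string2/string3' | first_segment        # prints string1
printf '%s\n' 'foobar/string1/string2/string3' | first_segment  # prints string1
```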

sed: mix explicit and regex phrases

I'm trying to write a sed command to remove a specific string followed by two digits. So far I have:
sed -e 's/bizzbuzz\([0-9][0-9]\)//' file.txt
but I can't seem to get the syntax right. Any suggestions?
sed -re 's/bizzbuzz[0-9]{2}//' file.txt
and
sed -re 's/\bbizzbuzz[0-9]{2}\b//' file.txt
if the searched string has word boundaries
sed -e 's/bizzbuzz[0-9]\{2\}//' file.txt
if you don't have GNU sed
Your current approach seems like it should work fine:
$ echo 'FOO bizzbuzz56 BAR' | sed -e 's/bizzbuzz\([0-9][0-9]\)//'
FOO  BAR
As said in the other answer, the syntax seems to be fine (with unnecessary parentheses).
But maybe you want to replace all the strings found in each line? In that case, you should add a 'g' at the end of the 's' command:
sed -e 's/bizzbuzz\([0-9][0-9]\)//g' file.txt
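A short sketch combining the points above: the basic-regex interval \{2\}, which does not need -r or -E, plus the g flag. strip_bizzbuzz is a hypothetical helper name. The second word in the sample shows what happens without a word-boundary check: only the first two of three digits are removed.

```shell
# strip_bizzbuzz: hypothetical helper; removes every "bizzbuzz" followed
# by exactly two digits, leaving any further digits in place.
strip_bizzbuzz() {
  sed 's/bizzbuzz[0-9]\{2\}//g'
}

printf '%s\n' 'xbizzbuzz12y bizzbuzz345' | strip_bizzbuzz   # prints xy 5
```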