Convert vim replace to sed / awk in Bash - regex

I want to add a ; before every negative value at the end of my data which looks like this:
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;-1,13;;VAL
My trial in vim:
:%s/-\d\{0,5}\,\d\{0,2}/;&\1/g
Unfortunately, I can't call this with sed:
sed -E 's/-\d\{0,5}\,\d\{0,2}/;&\1/g'
I get the error message:
sed: 1: "s/-\d\{0,5}\,\d\{0,2}/; ...": \1 not defined in the RE
How do I convert this so that I can call it from the command line/with sed?
Thank you!

You may use
sed -E 's/-[0-9]{0,5}(,[0-9]{1,2})?/;&/g'
Note that POSIX ERE, which sed -E uses, has no \d shorthand, so the digit class is spelled [0-9]. The \1 in your attempt fails because the pattern contains no capturing group; & already refers to the whole match.
Details
- - a hyphen
[0-9]{0,5} - 0 to 5 digits
(,[0-9]{1,2})? - an optional capturing group matching 1 or 0 occurrences of
, - a comma
[0-9]{1,2} - 1 or 2 digits.
The & in the replacement pattern stands for the whole match value.
See the online sed demo:
s="29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;-1,13;;VAL"
sed -E 's/-[0-9]{0,5}(,[0-9]{1,2})?/;&/g' <<< "$s"
Output:
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;;-1,13;;VAL
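Since the title also asks about awk, here is a minimal sketch of the same idea with gsub, where & in the replacement likewise stands for the matched text. The shortened sample rows and the add_sep_awk helper name are hypothetical, and the pattern assumes at least one digit after the hyphen so that a bare - is left alone.

```shell
#!/bin/sh
# Hypothetical, shortened rows modelled on the data in the question.
line_pos='29.01.2019;KIND;NAME;;;20,00;VAL'
line_neg='29.01.2019;KIND;NAME;;;-1,13;;VAL'

# Prefix every "-<digits>[,<digits>]" value with ";".
# In gsub's replacement string, "&" is the text that was matched.
add_sep_awk() {
    printf '%s\n' "$1" | awk '{ gsub(/-[0-9]+(,[0-9]+)?/, ";&") } 1'
}

out_pos=$(add_sep_awk "$line_pos")
out_neg=$(add_sep_awk "$line_neg")
printf '%s\n%s\n' "$out_pos" "$out_neg"
```

Only the row with the negative value gains an extra ; the other row passes through unchanged.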

Related

Sed: can not replace part of the string without replacing all of it

I am trying to replace part of the string, but can not find a proper regex for sed to execute it properly.
I have a string
/abc/foo/../bar
And I would like to achieve the following result:
/abc/bar
I have tried to do it using this command:
echo $string | sed 's/\/[^:-]*\..\//\//'
But as result I am getting just /bar.
I understand that I must use a capture group, but I just do not get it.
Could you, please, help me to find out this group that could be used?
You can use
#!/bin/bash
string='/abc/foo/../bar'
sed -nE 's~^(/[^/]*)(/.*)?/\.\.(/[^/]*).*~\1\3~p' <<< "$string"
See the online demo. Details:
-n - suppresses default line output
-E - enables POSIX ERE regex syntax
^ - start of string
(/[^/]*) - Group 1: a / and then zero or more chars other than /
(/.*)? - an optional group 2: a / and then any text
/\.\. - a /.. fixed string
(/[^/]*) - Group 3: a / and then zero or more chars other than /
.* - the rest of the string.
\1\3 replaces the match with Group 1 and 3 values concatenated
p only prints the result of successful substitution.
You can use a capture group for the first part and then match until the last / to remove.
As you are using / to match in the pattern, you can opt for a different delimiter.
#!/bin/bash
string="/abc/foo/../bar"
sed 's~\(/[^/]*/\)[^:-]*/~\1~' <<< "$string"
The pattern in parts:
\( Capture group 1
/[^/]*/ Match from the first till the second / with any char other than / in between
\) Close group 1
[^:-]*/ Match optional chars other than : and - then match /
Output
/abc/bar
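To see why the original attempt returned only /bar, the following sketch runs it next to the capture-group version. The variable names attempt and fixed are only for illustration.

```shell
#!/bin/sh
string='/abc/foo/../bar'

# Original attempt: [^:-]* may also match "/", so the match starts at the
# first "/" and stretches up to "../"; all of "/abc/foo/../" is replaced.
attempt=$(printf '%s\n' "$string" | sed 's/\/[^:-]*\..\//\//')

# Capturing the first path component and writing it back with \1 keeps it.
fixed=$(printf '%s\n' "$string" | sed 's~\(/[^/]*/\)[^:-]*/~\1~')

printf '%s\n%s\n' "$attempt" "$fixed"
```

The first command prints /bar and the second prints /abc/bar.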
Using sed
$ sed 's#^\(/[^/]*\)/.*\(/\)#\1\2#' input_file
/abc/bar
or
$ sed 's#[^/]*/[^/]*/##2' input_file
/abc/bar
Using awk
string='/abc/foo/../bar'
awk -F/ '{print "/"$2"/"$NF}' <<< "$string"
#or
awk -F/ 'BEGIN{OFS=FS}{print $1,$2,$NF}' <<< "$string"
/abc/bar
Using bash
string='/abc/foo/../bar'
echo "${string%%/${string#*/*/}}/${string##*/}"
/abc/bar
Using any sed:
$ echo "$string" | sed 's:\(/[^/]*/\).*/:\1:'
/abc/bar

Extract string between underscores and dot

I have strings like these:
/my/directory/file1_AAA_123_k.txt
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:
AAA_123_k
CCC
KK_45
I found this solution that works:
string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo $tmp | sed 's/^[^_:]*[_:]//'
But I am wondering if there is a more 'elegant' solution (e.g. 1 line code).
With bash version >= 3.0 and a regex:
[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
You can use a single sed command like
sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"
See the online demo. Details:
^ - start of string
.* - any text
/ - a / char
[^_/]* - zero or more chars other than / and _
_ - a _ char
\([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the -E option) - Group 1: any zero or more chars other than /
\. - a dot
[^./]* - zero or more chars other than . and /
$ - end of string.
With -n, default line output is suppressed and p only prints the result of successful substitution.
With your shown samples and GNU grep, you could try the following:
grep -oP '.*?_\K([^.]*)' Input_file
Explanation: GNU grep's -o option prints only the matched part, and -P enables PCRE regex syntax. The regex .*?_\K([^.]*) extracts the value between the first _ and the first dot after it:
.*?_ ##Lazily match from the start of the line up to the first occurrence of _.
\K ##Forget everything matched so far, so that only the needed value is printed.
([^.]*) ##Match everything up to the first occurrence of a dot.
A simpler sed solution without any capturing group:
sed -E 's/^[^_]*_|\.[^.]*$//g' file
AAA_123_k
CCC
KK_45
If you need to process the file names one at a time (e.g., within a while read loop), you can perform two parameter expansions:
$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
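The while read loop mentioned above could look like the following minimal sketch. The extract_tag helper name is hypothetical, and the code assumes the directory part of each path contains no underscore.

```shell
#!/bin/sh
# Hypothetical helper: print the part between the first "_" and the
# first "." after it. Assumes the directory part contains no "_".
extract_tag() {
    tmp=${1#*_}                  # drop everything up to the first "_"
    printf '%s\n' "${tmp%%.*}"   # drop everything from the first "." on
}

result=$(
    printf '%s\n' \
        '/my/directory/file1_AAA_123_k.txt.2' \
        '/my/directory/file2_CCC.txt' \
        '/my/directory/file2_KK_45.txt' |
    while IFS= read -r path; do
        extract_tag "$path"
    done
)
printf '%s\n' "$result"
```

Both expansions are done by the shell itself, so no external process is started per file name.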
One idea to parse a list of file names at the same time:
$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45
Using sed
$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45
This is easy, except that it includes the initial underscore:
ls | grep -o "_[^.]*"
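If the leading underscore is unwanted, one option is to cut it off afterwards. A minimal sketch, assuming file names without a directory part and a grep that supports -o (GNU and BSD grep both do); the names variable is a stand-in for the ls output.

```shell
#!/bin/sh
# Stand-in for the output of ls.
names=$(printf '%s\n' file1_AAA_123_k.txt file2_CCC.txt file2_KK_45.txt)

# grep -o prints "_AAA_123_k" and so on; cut -c2- drops the first character.
tags=$(printf '%s\n' "$names" | grep -o '_[^.]*' | cut -c2-)
printf '%s\n' "$tags"
```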

SED invalid command code for JSON response

I am trying to get a value from a JSON from my local server (https://regex101.com/r/qeGcGu/1) on a headless Mac mini (Catalina), via sed. However, the sed commands I'd expect to work all fail:
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i.bak '"hash":"(.*?)"'
sed: 1: ""hash":"(.*?)"": invalid command code "
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i.bak '\"hash\":\"(.*?)\"'
sed: 1: "\"hash\":\"(.*?)\"": unterminated regular expression
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i '' '\"hash\":\"(.*?)\"'
sed: 1: "\"hash\":\"(.*?)\"": unterminated regular expression
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i '' '"hash":"(.*?)"'
sed: 1: ""hash":"(.*?)"": invalid command code "
The file that I am trying to get the string from is a raw json.
[{"added_on":1587102956,"amount_left":0,"auto_tmm":false,"availability":-1,"category":"radarr","completed":1218638934,"completion_on":1587108704,"dl_limit":-1,"dlspeed":0,"downloaded":1220894674,"downloaded_session":0,"eta":8640000,"f_l_piece_prio":false,"force_start":true,"hash":"87802183fc647548ec6efe18feb16149522f6aa0","last_activity":1587119220,"magnet_uri":"magnet:?xt=urn:btih:87802183fc647548ec6efe18feb16149522f6aa0&dn=Fantasia%202000%20(1999)%20%5b1080p%5d%20%5bYTS.AG%5d&tr=udp%3a%2f%2ftracker.coppersurfer.tk%3a6969%2fannounce&tr=udp%3a%2f%2f9.rarbg.com%3a2710%2fannounce&tr=udp%3a%2f%2fp4p.arenabg.com%3a1337&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.zer0day.to%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969%2fannounce&tr=udp%3a%2f%2fcoppersurfer.tk%3a6969%2fannounce","max_ratio":-1,"max_seeding_time":-1,"name":"Fantasia 2000 (1999) [1080p] [YTS.AG]","num_complete":22,"num_incomplete":4,"num_leechs":0,"num_seeds":0,"priority":0,"progress":1,"ratio":0.1782183661159947,"ratio_limit":-2,"save_path":"/Volumes/1049/Media/","seeding_time_limit":-2,"seen_complete":1587118087,"seq_dl":false,"size":1218638934,"state":"forcedUP","super_seeding":false,"tags":"","time_active":13224,"total_size":1218638934,"tracker":"udp://tracker.coppersurfer.tk:6969/announce","up_limit":-1,"uploaded":217585854,"uploaded_session":128831791,"upspeed":0}]
Actually what I want to accomplish is to get the first 6 chars from hash:
"hash":"87802183fc647548ec6efe18feb16149522f6aa0"
In this case my desired value is 878021
Could you please guide me in the correct direction?
You may use
sed -n 's/.*"hash":"\([^"]*\).*/\1/p' /tmp/json.out
Note that the file can be passed directly to the sed command; there is no need to pipe it through cat.
How it works
-n - option that suppresses the default line output (by default, sed prints every line, whether or not it matched)
s/ - substitute command (we are replacing)
.*"hash":"\([^"]*\).* - matches
.* - 0+ chars
"hash":" - "hash":" substring
\([^"]*\) - Group 1 (capturing group, \1 is used in the replacement part to refer to this value) - any 0+ chars other than "
.* - 0+ chars
\1 - the replacement is Group 1 value (it is all that remains on the matching line)
p - if there was a valid replacement print the result after replacement only.
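The command above prints the whole hash. Since the question asks for only its first 6 characters, one possible variation limits the group with an interval expression. This is a sketch on a shortened, hypothetical copy of the JSON; a dedicated JSON parser would be more robust than a regex.

```shell
#!/bin/sh
# Shortened stand-in for the contents of /tmp/json.out.
json='[{"force_start":true,"hash":"87802183fc647548ec6efe18feb16149522f6aa0","last_activity":1587119220}]'

# \([^"]\{6\}\) captures exactly six non-quote characters after "hash":"
short=$(printf '%s\n' "$json" | sed -n 's/.*"hash":"\([^"]\{6\}\).*/\1/p')
printf '%s\n' "$short"
```

For this input the command prints 878021.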

Remove certain prefix in every word separated by delimiter

How can I remove a certain prefix from every word separated by a delimiter? I want to remove the prefixes abc and def from the beginning of each word. My sed statement is very long, and I don't know whether it can be made shorter and simpler.
Sed: sed -e 's/, /,/g' -e 's/'.yaml$'//g' -e 's/^abc_//g' -e 's/^def_//g' -e 's/,abc_/,/g' -e 's/,def_/,/g'
Input: abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2
Output: mscp_def_v1,mscp_abc_v2,mscp_abc_v2,mscp_def_v2
You may use
sed -E 's/(^|,) ?(abc|def)_|(,) |\.yaml/\1\3/g'
See the online demo:
s="abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2"
sed -E 's/(^|,) ?(abc|def)_|(,) |\.yaml/\1\3/g' <<< "$s"
# => mscp_def_v1,mscp_abc_v2,mscp_abc_v2,mscp_def_v2
Details
-E option enables POSIX ERE syntax and alternation
(^|,) ?(abc|def)_|(,) |\.yaml - matches:
(^|,) ?(abc|def)_ - Group 1: start of string or comma, then an optional space, then Group 2: either abc or def, and then a _ char
| - or
(,) - Group 3: a comma, and then a space
| - or
\.yaml - .yaml substring.
The replacement is \1\3, i.e. the values of Group 1 and 3 concatenated.
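An alternative that treats each list item separately may be easier to extend than a single alternation. The following awk sketch splits the line on a comma plus optional spaces and cleans each field; the strip_prefixes function name is hypothetical.

```shell
#!/bin/sh
s='abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2'

# Split on "," followed by optional spaces, clean each field, and join
# the fields again with a plain ",".
strip_prefixes() {
    printf '%s\n' "$1" | awk -F', *' -v OFS=',' '{
        for (i = 1; i <= NF; i++) {
            sub(/^(abc|def)_/, "", $i)   # drop a leading abc_ or def_
            sub(/\.yaml/, "", $i)        # drop the .yaml part
        }
        print
    }'
}

out=$(strip_prefixes "$s")
printf '%s\n' "$out"
```

Because each prefix check is anchored to the start of a field, abc or def inside a word (as in mscp_def) is left alone.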

Regex & Sed: How to suppress the first and the second comma in a string containing exactly 9 commas?

I would like to suppress the two first commas in a string containing 10 and only 10 commas (11 Fields). I don't want to erase the commas of the 9 commas line.
I tried this:
sed '/^\([^,]*,\)\{10\}[^,]*$/s/,//1;s/,//2'
But it deletes commas even in the sentences containing less than 10 commas and it deletes the first and the third commas.
Example:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Result expected:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
You may use
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/'
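Details
^ - start of a line
([^,]*) - Group 1 (\1): any 0+ chars other than ,
,([^,]*) - , and Group 2 (\2) matching any 0+ chars other than ,
,([^,]*) - , and Group 3 (\3) matching any 0+ chars other than ,
((,[^,]*){7}) - Group 4 (\4): seven occurrences of , followed with any 0+ chars other than ,
$ - end of string.
Together with the two removed commas, this pattern matches lines with exactly 9 commas, as in the title. For lines with exactly 10 commas (11 fields), change {7} to {8}.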
See the online sed demo:
s="Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.inrombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR"
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/' <<< "$s"
# => Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.inrombach Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
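The quantifier decides which lines are touched: the pattern accounts for the two removed commas plus the repeated group. A small sketch with hypothetical single-letter fields shows the effect of {7} versus {8}:

```shell
#!/bin/sh
nine='a,b,c,d,e,f,g,h,i,j'      # 9 commas, 10 fields
ten='a,b,c,d,e,f,g,h,i,j,k'     # 10 commas, 11 fields

# {7}: 2 removed commas + 7 kept ones, so only 9-comma lines match.
out7_nine=$(printf '%s\n' "$nine" | sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/')
out7_ten=$(printf '%s\n' "$ten" | sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/')

# {8}: 2 removed commas + 8 kept ones, so only 10-comma lines match.
out8_ten=$(printf '%s\n' "$ten" | sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){8})$/\1\2\3\4/')

printf '%s\n%s\n%s\n' "$out7_nine" "$out7_ten" "$out8_ten"
```

With {7} the 10-comma line is printed unchanged, because the anchored pattern cannot match it.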
I guess you're using MacOS sed / BSD sed, try this:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /'
I used GNU sed's --posix option to emulate it, but I'm not sure it will work on your OS:
$ cat file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGI?,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
$ sed --posix -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /' file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGI?,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Note that I changed the second s command to replace the comma with a space: Leon.ing,rombach has no space after the comma, so simply stripping the , would give Leon.ingrombach.
This might work too:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/{' -e 's/,/ /' -e 's/,/ /' -e '}'
Btw, I think it's high time for you to start using GNU sed:
brew install gnu-sed
ln -s /usr/local/bin/gsed /usr/local/bin/sed
This problem is also easier to solve with awk:
awk -F, 'NF==11{sub(",","");sub(","," ")}1' file
The replacement happens only when there are 11 comma-separated fields.
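When it is unclear which lines qualify, counting the commas per line first can help. A minimal sketch with hypothetical single-letter fields, followed by the awk command from above:

```shell
#!/bin/sh
lines=$(printf '%s\n' 'a,b,c,d,e,f,g,h,i,j' 'a,b,c,d,e,f,g,h,i,j,k')

# Number of commas per line = number of fields minus one.
counts=$(printf '%s\n' "$lines" | awk -F, '{ print NF - 1 }')

# The command from the answer: only the 11-field line is changed.
edited=$(printf '%s\n' "$lines" | awk -F, 'NF==11{sub(",","");sub(","," ")}1')

printf '%s\n%s\n' "$counts" "$edited"
```

The counts are 9 and 10, and only the second line is edited.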
This might work for you (GNU sed):
sed 's/,/&/9;T;s//&/10;t;s///;s///' file
If there are not at least 9 ,'s leave line as is. If there are 10 or more ,'s leave line as is. Otherwise remove the first 2 ,'s.
An alternative:
sed -r 's/^([^,]*),([^,]*),(([^,]*,){7}[^,]*)$/\1\2\3/' file