Convert vim replace to sed / awk in Bash

I want to add a ; before every negative value at the end of my data which looks like this:
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;-1,13;;VAL
My attempt in vim:
:%s/-\d\{0,5}\,\d\{0,2}/;&\1/g
Unfortunately, I can't call this with sed:
sed -E 's/-\d\{0,5}\,\d\{0,2}/;&\1/g'
I get the error message:
sed: 1: "s/-\d\{0,5}\,\d\{0,2}/; ...": \1 not defined in the RE
How do I convert this so that I can call it from the command line/with sed?
Thank you!

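You may use
sed -E 's/-[0-9]{0,5}(,[0-9]{1,2})?/;&/g'
Note that \d is not a digit shorthand in POSIX sed regex syntax, so the [0-9] bracket expression is used instead. The \1 in the original replacement referred to a capturing group that the pattern never defined, hence the error; & alone already inserts the matched text.
Details
- - a hyphen
[0-9]{0,5} - 0 to 5 digits
(,[0-9]{1,2})? - an optional capturing group matching 1 or 0 occurrences of
, - a comma
[0-9]{1,2} - 1 or 2 digits.
The & in the replacement pattern stands for the whole match value.
See the online sed demo:
s="29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;-1,13;;VAL"
sed -E 's/-[0-9]{0,5}(,[0-9]{1,2})?/;&/g' <<< "$s"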
Output:
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;;;;;;;;;;20,00;VAL
29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;;-1,13;;VAL
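Since the title also asks about awk, the same substitution can be sketched with gsub, where & in the replacement likewise stands for the matched text. This is only a sketch: the variable names line and result are illustrative, and it assumes a negative value always starts with a hyphen followed by at least one digit.

```shell
# Sample record taken from the question (negative amount near the end).
line='29.01.2019;29.01.2019;KIND;NAME;ITEM;ITEMNUMBER;ITEMORDER;012345678;012345678901;FW02ZZZ46847351235;;;;;;;-1,13;;VAL'

# gsub() edits $0 in place; "&" re-inserts the matched text after the added ";".
# The trailing 1 is an always-true pattern that prints every line.
result=$(printf '%s\n' "$line" |
  awk '{ gsub(/-[0-9]+(,[0-9]+)?/, ";&") } 1')

printf '%s\n' "$result"
```

Lines without a negative value contain no match and pass through unchanged.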

Related

Sed: cannot replace part of the string without replacing all of it

I am trying to replace part of the string, but can not find a proper regex for sed to execute it properly.
I have a string
/abc/foo/../bar
And I would like to achieve the following result:
/abc/bar
I have tried to do it using this command:
echo $string | sed 's/\/[^:-]*\..\//\//'
But as a result I am getting just /bar.
I understand that I must use a capturing group, but I just do not get how.
Could you please help me work out the group to use?

You can use
#!/bin/bash
string='/abc/foo/../bar'
sed -nE 's~^(/[^/]*)(/.*)?/\.\.(/[^/]*).*~\1\3~p' <<< "$string"
See the online demo. Details:
-n - suppresses default line output
-E - enables POSIX ERE regex syntax
^ - start of string
(/[^/]*) - Group 1: a / and then zero or more chars other than /
(/.*)? - an optional group 2: a / and then any text
/\.\. - a /.. fixed string
(/[^/]*) - Group 3: a / and then zero or more chars other than /
.* - the rest of the string.
\1\3 replaces the match with Group 1 and 3 values concatenated
p only prints the result of successful substitution.
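If a path may contain several .. components, one possible generalization is to loop with a label and the t command until no "segment/.." pair remains. This is a sketch, not part of the answer above: the function name collapse_dotdot is made up for illustration, and the rewrite is purely textual (it does not consult the file system or resolve symlinks, and it is not meant for relative paths that start with ..).

```shell
# Hypothetical helper: textually collapse "<segment>/.." pairs in a path.
collapse_dotdot() {
  printf '%s\n' "$1" |
    # :a  defines a label
    # s   drops one "/segment/.." pair, keeping the following "/" (or nothing at the end)
    # ta  jumps back to :a as long as the substitution succeeded
    sed -E -e ':a' -e 's~/[^/]+/\.\.(/|$)~\1~' -e 'ta'
}

collapse_dotdot '/abc/foo/../bar'
collapse_dotdot '/a/b/c/../../d'
```

The first call prints /abc/bar and the second prints /a/d.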

You can use a capture group for the first part and then match until the last / to remove.
As you are using / to match in the pattern, you can opt for a different delimiter.
#!/bin/bash
string="/abc/foo/../bar"
sed 's~\(/[^/]*/\)[^:-]*/~\1~' <<< "$string"
The pattern in parts:
\( Capture group 1
/[^/]*/ Match from the first till the second / with any char other than / in between
\) Close group 1
[^:-]*/ Match optional chars other than : and - then match /
Output
/abc/bar

Using sed
$ sed 's#^\(/[^/]*\)/.*\(/\)#\1\2#' input_file
/abc/bar
or
$ sed 's#[^/]*/[^/]*/##2' input_file
/abc/bar

Using awk
string='/abc/foo/../bar'
awk -F/ '{print "/"$2"/"$NF}' <<< "$string"
#or
awk -F/ 'BEGIN{OFS=FS}{print $1,$2,$NF}' <<< "$string"
/abc/bar
Using bash
string='/abc/foo/../bar'
echo "${string%%/${string#*/*/}}/${string##*/}"
/abc/bar
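The one-line parameter expansion above is dense, so here is the same idea split into steps. The variable names tail_part, head_part and last_part are illustrative only; the sketch assumes the path has the shape /first/.../last shown in the question.

```shell
string='/abc/foo/../bar'

# Remove the shortest prefix matching */*/ (here "/abc/"), leaving foo/../bar.
tail_part=${string#*/*/}
# Remove that tail, together with its leading "/", from the end, leaving /abc.
# The quotes make the tail a literal string rather than a glob pattern.
head_part=${string%%"/$tail_part"}
# Remove the longest prefix ending in "/", leaving bar.
last_part=${string##*/}

result="$head_part/$last_part"
printf '%s\n' "$result"
```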

Using any sed:
$ echo "$string" | sed 's:\(/[^/]*/\).*/:\1:'
/abc/bar

Extract string between underscores and dot

I have strings like these:
/my/directory/file1_AAA_123_k.txt
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:
AAA_123_k
CCC
KK_45
I found this solution that works:
string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo $tmp | sed 's/^[^_:]*[_:]//'
But I am wondering if there is a more 'elegant' solution (e.g. 1 line code).

With bash version >= 3.0 and a regex:
[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"

You can use a single sed command like
sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"
See the online demo. Details:
^ - start of string
.* - any text
/ - a / char
[^_/]* - zero or more chars other than / and _
_ - a _ char
\([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the -E option) - Group 1: any zero or more chars other than /
\. - a dot
[^./]* - zero or more chars other than . and /
$ - end of string.
With -n, default line output is suppressed and p only prints the result of successful substitution.
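To see the pattern at work on all three sample paths, the BRE variant can be wrapped in a small function. The name extract_suffix and the fourth path (with an underscore in a directory name) are made up for illustration; that extra path shows why the pattern anchors on the last / before looking for the first _.

```shell
# Hypothetical wrapper around the BRE command from the answer above.
extract_suffix() {
  printf '%s\n' "$1" |
    sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p'
}

for path in \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' \
  '/my_dir/file1_AAA.txt'
do
  extract_suffix "$path"
done
```

This prints AAA_123_k, CCC, KK_45 and AAA, one per line.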

With your shown samples, you could try the following code with GNU grep.
grep -oP '.*?_\K([^.]*)' Input_file
Explanation: GNU grep's -o option prints only the matched part and -P enables PCRE regex syntax. The regex .*?_\K([^.]*) gets the value between the first _ and the first occurrence of a dot.
Explanation of regex:
.*?_ ##Matching from the start of the line up to the first occurrence of _ by using the lazy match .*?
\K ##\K forgets everything matched so far, so that only the needed value is printed.
([^.]*) ##Matching everything up to the first occurrence of a dot.

A simpler sed solution without any capturing group:
sed -E 's/^[^_]*_|\.[^.]*$//g' file
AAA_123_k
CCC
KK_45

If you need to process the file names one at a time (e.g., within a while read loop) you can perform two parameter expansions:
$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
One idea to parse a list of file names at the same time:
$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45

Using sed
$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45

This is easy, except that it includes the initial underscore:
ls | grep -o "_[^.]*"

SED invalid command code for JSON response

I am trying to get a value from a JSON response from my local server (https://regex101.com/r/qeGcGu/1) on a headless Mac mini (Catalina), via sed. However, the sed commands I'd expect to work fail:
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i.bak '"hash":"(.*?)"'
sed: 1: ""hash":"(.*?)"": invalid command code "
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i.bak '\"hash\":\"(.*?)\"'
sed: 1: "\"hash\":\"(.*?)\"": unterminated regular expression
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i '' '\"hash\":\"(.*?)\"'
sed: 1: "\"hash\":\"(.*?)\"": unterminated regular expression
usr#mcMini ~/Documents/qBitTorrent cat /tmp/json.out | sed -i '' '"hash":"(.*?)"'
sed: 1: ""hash":"(.*?)"": invalid command code "
The file that I am trying to get the string from is a raw json.
[{"added_on":1587102956,"amount_left":0,"auto_tmm":false,"availability":-1,"category":"radarr","completed":1218638934,"completion_on":1587108704,"dl_limit":-1,"dlspeed":0,"downloaded":1220894674,"downloaded_session":0,"eta":8640000,"f_l_piece_prio":false,"force_start":true,"hash":"87802183fc647548ec6efe18feb16149522f6aa0","last_activity":1587119220,"magnet_uri":"magnet:?xt=urn:btih:87802183fc647548ec6efe18feb16149522f6aa0&dn=Fantasia%202000%20(1999)%20%5b1080p%5d%20%5bYTS.AG%5d&tr=udp%3a%2f%2ftracker.coppersurfer.tk%3a6969%2fannounce&tr=udp%3a%2f%2f9.rarbg.com%3a2710%2fannounce&tr=udp%3a%2f%2fp4p.arenabg.com%3a1337&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.zer0day.to%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969%2fannounce&tr=udp%3a%2f%2fcoppersurfer.tk%3a6969%2fannounce","max_ratio":-1,"max_seeding_time":-1,"name":"Fantasia 2000 (1999) [1080p] [YTS.AG]","num_complete":22,"num_incomplete":4,"num_leechs":0,"num_seeds":0,"priority":0,"progress":1,"ratio":0.1782183661159947,"ratio_limit":-2,"save_path":"/Volumes/1049/Media/","seeding_time_limit":-2,"seen_complete":1587118087,"seq_dl":false,"size":1218638934,"state":"forcedUP","super_seeding":false,"tags":"","time_active":13224,"total_size":1218638934,"tracker":"udp://tracker.coppersurfer.tk:6969/announce","up_limit":-1,"uploaded":217585854,"uploaded_session":128831791,"upspeed":0}]
Actually what I want to accomplish is to get the first 6 chars from hash:
"hash":"87802183fc647548ec6efe18feb16149522f6aa0"
In this case my desired value is 878021
Could you please guide me in the correct direction?

You may use
sed -n 's/.*"hash":"\([^"]*\).*/\1/p' /tmp/json.out
Note that the file can be passed directly to the sed command; there is no need to pipe it through cat.
How it works
-n - option that suppresses the default line output (by default, sed prints every line, including non-matching ones)
s/ - substitute command (we are replacing)
.*"hash":"\([^"]*\).* - matches
.* - 0+ chars
"hash":" - "hash":" substring
\([^"]*\) - Group 1 (capturing group, \1 is used in the replacement part to refer to this value) - any 0+ chars other than "
.* - 0+ chars
\1 - the replacement is Group 1 value (it is all that remains on the matching line)
p - prints the result, but only if a replacement was actually made.
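The command above prints the whole hash, while the question asks for its first 6 characters. One way to get only those is to put an interval on the captured group. This is a sketch on a trimmed-down stand-in for /tmp/json.out (only three keys kept); the variable names json and short_hash are illustrative, and it assumes "hash":" occurs once per line.

```shell
# Shortened stand-in for the real JSON file from the question.
json='[{"force_start":true,"hash":"87802183fc647548ec6efe18feb16149522f6aa0","last_activity":1587119220}]'

# \{6\} limits the captured group to exactly six non-quote characters;
# the trailing .* swallows the rest of the line so only Group 1 remains.
short_hash=$(printf '%s\n' "$json" |
  sed -n 's/.*"hash":"\([^"]\{6\}\).*/\1/p')

printf '%s\n' "$short_hash"
```

With the real file, the same sed expression can be given /tmp/json.out as its argument instead of reading from a pipe.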

remove certain prefix in every word separate by delimiter

How can I remove certain prefixes from every word in a delimiter-separated list? I want to remove the prefixes abc and def from the beginning of each item. My current sed statement is very long, and I don't know whether it can be made shorter and simpler.
Sed: sed -e 's/, /,/g' -e 's/'.yaml$'//g' -e 's/^abc_//g' -e 's/^def_//g' -e 's/,abc_/,/g' -e 's/,def_/,/g'
Input: abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2
Output: mscp_def_v1,mscp_abc_v2,mscp_abc_v2,mscp_def_v2

You may use
sed -E 's/(^|,) ?(abc|def)_|(,) |\.yaml/\1\3/g'
See the online demo:
s="abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2"
sed -E 's/(^|,) ?(abc|def)_|(,) |\.yaml/\1\3/g' <<< "$s"
# => mscp_def_v1,mscp_abc_v2,mscp_abc_v2,mscp_def_v2
Details
-E option enables POSIX ERE syntax and alternation
(^|,) ?(abc|def)_|(,) |\.yaml - matches:
(^|,) ?(abc|def)_ - Group 1: start of string or comma, then an optional space, then Group 2: either abc or def, and then a _ char
| - or
(,) - Group 3: a comma, and then a space
| - or
\.yaml - .yaml substring.
The replacement is \1\3, i.e. the values of Group 1 and 3 concatenated.
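As an alternative sketch (not from the answer above), awk can split the list on "comma plus optional spaces" and clean each item separately, which some may find easier to read than the alternation. The variable names s and result are illustrative, and the sketch assumes each item contains .yaml at most once.

```shell
s='abc_mscp_def.yaml_v1, def_mscp_abc.yaml_v2, abc_mscp_abc.yaml_v2, def_mscp_def.yaml_v2'

# -F', *' splits on a comma followed by any number of spaces;
# OFS=',' joins the cleaned items back with a bare comma.
result=$(printf '%s\n' "$s" | awk -F', *' -v OFS=',' '{
  for (i = 1; i <= NF; i++) {
    sub(/^(abc|def)_/, "", $i)   # drop a leading abc_ or def_ from the item
    sub(/\.yaml/, "", $i)        # drop the first .yaml inside the item
  }
  print                          # $0 is rebuilt with OFS once a field is assigned
}')

printf '%s\n' "$result"
```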

Regex & Sed: How to suppress the first and the second comma in a string containing exactly 9 commas?

I would like to remove the first two commas in a line containing exactly 10 commas (11 fields). I don't want to remove any commas from lines with 9 commas.
I tried this:
sed '/^\([^,]*,\)\{10\}[^,]*$/s/,//1;s/,//2'
But it deletes commas even in lines containing fewer than 10 commas, and it deletes the first and the third commas.
Example:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Result expected:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR

You may use
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/'
Details
^ - start of a line
([^,]*) - Group 1 (\1): any 0+ chars other than ,
,([^,]*) - , and Group 2 (\2) matching any 0+ chars other than ,
,([^,]*) - , and Group 3 (\3) matching any 0+ chars other than ,
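((,[^,]*){7}) - Group 4 (\4): seven occurrences of , followed with any 0+ chars other than ,
$ - end of string.
Since three fields plus seven more are matched between ^ and $, only lines with exactly 9 commas (10 fields) are changed.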
See the online sed demo:
s="Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.inrombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR"
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/' <<< "$s"
# => Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.inrombach Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
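The question body asks about lines with 10 commas (11 fields) and expects the second comma to become a space, so here is a sketch of how the same idea could be adapted. The function name fix_line is made up, the interval is raised to {8}, and the 9-comma sample is written with a plain E instead of Ë to keep the example ASCII-only.

```shell
# Hypothetical helper: change only lines with exactly 10 commas (11 fields).
fix_line() {
  printf '%s\n' "$1" |
    # \1 = field 1, \2 = field 2, \3 = the remaining nine fields with their
    # eight commas; the first comma is dropped, the second becomes a space.
    sed -E 's/^([^,]*),([^,]*),(([^,]*,){8}[^,]*)$/\1\2 \3/'
}

# 10 commas: rewritten.
fix_line 'Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR'
# 9 commas: the pattern does not match, so the line is printed unchanged.
fix_line 'DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIE,06346641,0636641,NL'
```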

I guess you're using macOS sed / BSD sed, so try this:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /'
I used GNU sed's --posix option to emulate it, but I am not sure it will work on your OS:
$ cat file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
$ sed --posix -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /' file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Note that I changed the second s command to replace the comma with a space: Leon.ing,rombach has no space after the comma, so simply stripping the , would give Leon.ingrombach.
This might work too:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/{' -e 's/,//' -e 's/,/ /' -e '}'
Btw, I think it's high time for you to start using GNU sed:
brew install gnu-sed
ln -s /usr/local/bin/gsed /usr/local/bin/sed
This problem is also easier to solve with awk:
awk -F, 'NF==11{sub(",","");sub(","," ")}1' file
It replaces only when there are 11 comma-separated fields.

This might work for you (GNU sed):
sed 's/,/&/9;T;s//&/10;t;s///;s///' file
If the line has fewer than 9 commas, it is left as is. If it has 10 or more commas, it is also left as is. Otherwise (exactly 9 commas) the first 2 commas are removed.
An alternative:
sed -r 's/^([^,]*),([^,]*),(([^,]*,){7}[^,]*)$/\1\2\3/' file
