sed replace AFTER match and retain - regex

I've been racking my brains for hours on this, but it seems simple enough. I have a large list of strings similar to the ones below, and I would like to replace only the hyphens after the comma with commas:
abc-d-ef,1-2-3-4
gh-ij,1-2-3-4
into this:
abc-def,1,2,3,4
gh-ij,1,2,3,4
I can't use s/-/,/2g to replace from the second occurrence onward, because the number of hyphens before the comma varies. I also thought about using cut, but there must be a way to do this with sed, something like:
"s/\(,\).*-/\1,&/g"
Thank you

This is better suited to awk, since we can split each line into fields using the comma as the field separator:
awk 'BEGIN{FS=OFS=","} {gsub(/-/, OFS, $2)} 1' file
abc-d-ef,1,2,3,4
gh-ij,1,2,3,4
If you want a sed-only solution, use:
sed -E -e ':a' -e 's/([^,]+,[^-]+)-/\1,/g;ta' file
abc-d-ef,1,2,3,4
gh-ij,1,2,3,4
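A minimal sketch of a portable variant of the sed loop above, assuming you may need to run it with BSD sed (macOS) as well: the match is anchored at the start of the line, and the label, substitution and branch are passed as separate -e expressions. The function name commas_after_first_comma is hypothetical.

```shell
# Hypothetical helper: replace every hyphen after the first comma with a comma.
commas_after_first_comma() {
  # :a      label the top of the loop
  # s/.../  turn the first hyphen that follows the first comma into a comma
  # ta      jump back to :a as long as a substitution was made
  sed -E -e ':a' -e 's/^([^,]*,[^-]*)-/\1,/' -e 'ta' "$@"
}

printf 'abc-d-ef,1-2-3-4\ngh-ij,1-2-3-4\n' | commas_after_first_comma
```

Anchoring with ^ means each pass can only touch the part after the first comma, so the g flag is not needed; the loop simply repeats until no hyphen is left there.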

An awk proposal (the sub() call reproduces the abc-def in the expected output and is specific to this sample):
awk -F, '{sub(/d-ef/,"def"); gsub(/-/,",",$2)}1' OFS=, file
abc-def,1,2,3,4
gh-ij,1,2,3,4

Related

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags straight after <H2>City</H2> instances. So in the examples above which are separate files, I want to print out Liverpool and in the second example, Dublin.
Based on this thread, I tried:
sed -e 's/City\(.*\)\/P/\1/'
which I hoped would get me halfway there, but it just prints the entire file. Any ideas?

awk to the rescue! You need multi-character RS support, though (gawk has it):
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
Another approach:
$ awk 'c&&c--{gsub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
This finds the line after the City heading and strips the tags.

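Try using the following regex:
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
sed does not support lookarounds or lazy quantifiers, so run it with a PCRE-capable tool such as GNU grep -P (-z treats the whole file as one record so the pattern can span the newline, and tr drops the trailing NUL):
grep -zoP '(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)' file | tr -d '\0'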

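I checked, and \s does not seem to work here. Note that sed reads one line at a time, so the pattern space never contains a newline unless you append the next line with N:
sed -n '/<H2>City<\/H2>/{N;s/.*\n<P>\(.*\)<\/P>.*/\1/p;}' file
There is no need for a lookbehind (as in the answer above); that is overkill.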

With sed, you can use the n command to read the next line after your pattern, then remove the tags to output your content. Wrapping the commands in braces makes sure only the line after the heading is printed:
sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' file

I think this should work on your Mac (printf is more portable than echo -e):
printf '<H2>City</H2>\n<P>Dublin</P>\n' | awk -F'[<>]' '/City/{getline; print $3}'
Dublin
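Since each City heading lives in a separate file, a minimal sketch can process many files in one awk call and prefix each result with its file name. It assumes, as in the examples, that the <P> element sits on the line right after <H2>City</H2>; city_per_file is a hypothetical name.

```shell
# Hypothetical helper: print "file: text" for the line after each <H2>City</H2>.
city_per_file() {
  awk '
    FNR == 1 { found = 0 }                 # reset the flag at the start of each file
    found {
      gsub(/<[^>]*>/, "")                  # strip all tags from the following line
      print FILENAME ": " $0
      found = 0
    }
    /<H2>City<\/H2>/ { found = 1 }         # the next line holds the city
  ' "$@"
}

# Example usage: city_per_file *.html
```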

Finding and replacing the last space at or before nth character works with sed but not awk, what am I doing wrong?

I have a string in a test.csv file like this:
here is my string
When I use sed, it works just as I expect:
cat test.csv | sed -r 's/^(.{1,9}) /\1,/g'
here is,my string
But when I use awk, it doesn't work, and I'm not sure why:
cat test.csv | awk '{gsub(/^(.{1,9}) /,","); print}'
,my string
I need to use awk because, once I have this figured out, I will be selecting a single column to split into two columns with the added comma. I'm using extended regex with sed (-r) and was wondering whether, and how, it's supported in awk, but I don't know if that is really the problem.

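awk's sub() and gsub() do not support back references such as \1 in the replacement (only & for the whole matched text), so your gsub() replaces the entire match "here is " with a comma. If you are on GNU awk, you can use gensub() instead: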
echo "here is my string" | awk '{print gensub(/^(.{1,9}) /,"\\1,","G")}'
here is,my string
Note the doubled backslash inside the quoted replacement string. See the GNU awk manual for more on gensub().
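If GNU awk is not available, a minimal portable sketch can avoid both gensub() and regex interval expressions (which some older awks lack) by scanning backwards for the last space at or before column 10, mirroring ^(.{1,9}) followed by a space in the sed command. The function name comma_at_last_space is hypothetical.

```shell
# Hypothetical helper: replace the last space at or before column 10 with a comma.
comma_at_last_space() {
  awk '{
    # ^(.{1,9}) followed by a space means the space sits in columns 2..10
    for (i = 10; i >= 2; i--)
      if (substr($0, i, 1) == " ") {
        $0 = substr($0, 1, i - 1) "," substr($0, i + 1)
        break
      }
    print
  }' "$@"
}

echo "here is my string" | comma_at_last_space
```

To apply it to one column later, the same loop can work on a field such as $2 instead of $0; assigning to the field makes awk rebuild the line with OFS.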

awk: chop stuff off beginning of line according to regex

Say I have a few lines of output that look like this:
blah <foo> I want this
baz < nom> I want this too
bit <#hi> And this...
How do I use awk to chop off everything before, and including, the first ">" character on each line?

If > appears only once per line, a simple sed substitution will do:
sed 's/.*>//' file
If there can be several, the greedy .* above consumes everything up to the last > on the line. In that case, you are better off with:
sed 's/[^>]*>//' file

Let's not forget cut; this is exactly what it was made for:
cut -d\> -f2- file

This works if each line contains only one >:
awk -F\> '{print $2}' file
I want this
I want this too
And this...

Using awk, you can do:
awk '{sub(/^[^>]*>/, "");} 1' file
I want this
I want this too
And this...
Or using sed:
sed 's/^[^>]*>//' file
I want this
I want this too
And this...

Try this (index() finds the first >, and substr() prints everything after it):
awk '{print substr($0, index($0, ">") + 1)}' filename
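POSIX shell parameter expansion can also do this without an external tool: ${line#*>} removes the shortest prefix ending in >, that is, everything up to and including the first >. A minimal sketch (chop_through_first_gt is a hypothetical name); a shell read loop is fine for small inputs but noticeably slower than sed, awk or cut on large files.

```shell
# Hypothetical helper: strip everything up to and including the first ">" on each line.
chop_through_first_gt() {
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the -n test also handles a final line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line#*>}"   # shortest match of *> ends at the first >
  done
}

printf 'blah <foo> I want this\nbaz < nom> I want this too\n' | chop_through_first_gt
```

Lines without any > are printed unchanged, because the pattern does not match.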

Deleting lines matching a pattern from a Unix file

I have a file containing strings of the following format:
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|DELETE|REDEFINES|VARIABLE.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
I want to use something like sed or awk to delete lines containing the word REDEFINES, but NOT if the word PIC is also on the line, and NOT if the line does not end in a full stop, because that means the string has been split over two lines. So out of the 4 lines (3 strings) above, I would only want to delete 05|DELETE|REDEFINES|VARIABLE.
I thought I might be able to use some kind of negation or lookahead, but these don't seem to be available, or I can't get them to work.
Using awk, this deletes any line containing REDEFINES that follows the pattern in the example above:
awk '!/[[:print:]]*\|REDEFINES[[:print:]]*\./'
Similarly using sed:
sed '/[[:print:]]*|REDEFINES[[:print:]]*\./d'
I just can't work out how to extend it to do what I need. Is this possible in sed or awk or do I need another tool?
Any help greatly appreciated.

Using awk (a line is kept if it has no REDEFINES, contains PIC, or does not end in a full stop):
awk '!/REDEFINES/ || /PIC/ || !/\.$/' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
Using sed (written for an earlier version of the input data; -i.bak edits the file in place and keeps a backup copy):
sed -i.bak '/REDEFINES/{/PIC/!d;}' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.

You can try the command below. It prints a line if it contains PIC, does not contain REDEFINES, or does not end in a full stop (meaning the string continues on the next line). It is easy to maintain because it is straightforward to read.
awk '{ if ($0 ~ /PIC/ || $0 !~ /REDEFINES/ || $0 !~ /\.$/) print $0 }' input.txt

Why not just use grep? Negating the conditions in your question, here is what I understood:
keep the lines that end in a full stop and contain both REDEFINES and PIC.
So grep seems easy:
$ grep -E 'REDEFINES.*\.$' file | grep PIC
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
Hope this helps.

This might work for you (GNU sed):
sed -r '/REDEFINES/{/PIC|[^.]$/!d}' file
or perhaps more easily:
sed '/PIC/b;/REDEFINES.*\.$/d' file
or if you prefer:
sed '/PIC/!{/REDEFINES.*\.$/d}' file
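If the real requirement is to judge each logical string rather than each physical line, a minimal awk sketch can buffer lines until one ends in a full stop and then decide on the whole string. It assumes, as the question implies, that a line ending in . completes a string; drop_redefines_without_pic is a hypothetical name.

```shell
# Hypothetical helper: drop a whole string (possibly split over several lines)
# when it contains REDEFINES but not PIC.
drop_redefines_without_pic() {
  awk '
    { buf = buf $0 "\n" }                  # collect the physical lines of one string
    /\.$/ {                                # a trailing full stop ends the string
      if (!(buf ~ /REDEFINES/ && buf !~ /PIC/))
        printf "%s", buf
      buf = ""
    }
    END { printf "%s", buf }               # keep an unterminated last string as-is
  ' "$@"
}

# Example usage: drop_redefines_without_pic file
```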

how to select lines containing several words using sed?

I am learning to use sed on Unix.
I have a file with many lines, and I want to delete all lines except those containing the strings (e.g.) alex, eva and tom.
I thought I could use
sed '/alex|eva|tom/!d' filename
However, it doesn't work: it doesn't match the line. It only matches the literal string "alex|eva|tom".
Only
sed '/alex/!d' filename
works.
Does anyone know how to select lines containing more than one word using sed?
Also, using parentheses, as in "sed '/(alex)|(eva)|(tom)/!d' file", doesn't work, and I want the lines containing all three words.

sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk:
awk '/alex/ && /eva/ && /tom/' file

delete all lines except lines containing strings(e.g) alex, eva and tom
As worded, you're asking to preserve lines containing all of those words, but your samples preserve lines containing any of them. In case "all" wasn't a slip: a single regular expression can't express an any-order search without spelling out every permutation, but sed lets you chain several matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works with GNU sed; with BSD-based userlands, you'll have to insert newlines or pass the commands as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'

You can use:
sed -r '/alex|eva|tom/!d' filename
Or, on a Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for in-place editing:
sed -i.bak -r '/alex|eva|tom/!d' filename

You should be using \| instead of |.
Edit: this is true for GNU sed, which supports \| in basic regular expressions as an extension, but not for other variants such as BSD sed.

This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method also allows a range of matches. For example, if you wanted lines containing 2 or more of the words, use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file
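Generalising the awk answer above, a minimal sketch can keep lines that contain every word passed as an argument, in any order. It uses index() for plain substring matching, so regex metacharacters in the words are not an issue; like /tom/, it also matches tom inside longer words such as tommy. keep_lines_with_all is a hypothetical name, and it reads standard input.

```shell
# Hypothetical helper: keep only lines that contain every given word (any order).
keep_lines_with_all() {
  awk -v words="$*" '
    BEGIN { n = split(words, w, " ") }     # the arguments, one per array slot
    {
      for (i = 1; i <= n; i++)
        if (index($0, w[i]) == 0) next     # a word is missing: skip the line
      print
    }
  '
}

# Example usage: keep_lines_with_all alex eva tom < filename
```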

We Keep Coding
