Make matching example from sed manual working

I found an example in info sed stating the following:
'^\(.*\)\n\1$'
This matches a string consisting of two equal substrings separated
by a newline.
Trying to implement it in these ways didn't return any matching lines:
echo -e "test\ntest" | sed -n '/^\(.*\)\n\1$/p'
echo -e "test\ntest" | sed -n 's/^\(.*\)\n\1$/\0/p'
The sed version I use is 4.2.2.
Please suggest how this example can be tested.

This might work for you (GNU sed and bash);
<<<$'test\ntest' sed -En 'N;s/^(.*)\n\1$/\1 == \1/p;s/^(.*)\n(.*)$/\1 != \2/p'
Append the second line of the input to the first; if the two lines are the same, replace them with line1 == line2, otherwise replace them with line1 != line2.
N.B. Both substitutions need a newline in order to match. If the first substitution succeeds, it removes the newline, so the second cannot match. Likewise, if the first substitution did not happen, the second must.

To make the example work, I have to use the N command: it appends the next input line to the pattern space, so the pattern space contains a \n for the regex to match.
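As a minimal sketch of the difference (assuming a POSIX-style shell and a sed that accepts \n in the pattern, as GNU sed does; the variable names are only illustrative), the same regex can be run once without and once with N:

```shell
# Without N the pattern space holds one line at a time, so \n can never match.
without_n=$(printf 'test\ntest\n' | sed -n '/^\(.*\)\n\1$/p')

# N appends the next input line, joined by \n, before the regex is tried.
with_n=$(printf 'test\ntest\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf 'without N: [%s]\n' "$without_n"
printf 'with N:    [%s]\n' "$with_n"
```

The first variable should come back empty, while the second should hold both input lines.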

Related

“sed” command to remove a line that matches an exact string on first word

I've found an answer to my question here: "sed" command to remove a line that matches an exact string on first word
...but only partially, because that solution only works if I write my command almost exactly the way the answerer did.
They answered:
sed -i "/^maria\b/Id" file.txt
...to remove only lines that start with the word "maria", and not lines where maria is not the first word.
I want to remove a specific URL from a file, for example "cnn.com". The file also contains a bunch of localhost addresses (127.0.0.1 and 0.0.0.0), and both have some entries with a single space in front. I also don't want to remove subdomains like ads.cnn.com. So that code "should" work, but it doesn't when I chain more commands with the -e option. My command below seems to clean things up well, except that I can't get it to remove cnn.com! My file is called raw.txt
sed -r -e 's/^127.0.0.1//' -e 's/^ 127.0.0.1//' -e 's/^0.0.0.0//' -e 's/^ 0.0.0.0//' -e '/#/d' -e '/^cnn.com\b/d' -e '/::/d' raw.txt | sort | tr -d "[:blank:]" | awk '!seen[$0]++' | grep cnn.com
When I grep for cnn.com I see all the cnn's INCLUDING the one I don't want which is actually "cnn.com".
ads.cnn.com
cl.cnn.com
cnn.com <-- the one I don't want
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
If I just use that one piece of code with the cnn.com chop out it seems to work.
sed -r '/^cnn.com\b/d' raw.txt | grep cnn.com
* I'm not using the "-e" option
Result:
ads.cnn.com
cl.cnn.com
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
Nothing I do seems to work when I string commands together with the "-e" option. I need some help on getting my multiple option command kicking with SED.
Any advice?
Ubuntu 12 LTS & 16 LTS.
sed (GNU sed) 4.2.2

The . is a metacharacter in regex which means "match any one character". So you accidentally created a regex that will also match cnnPcom or cnn com or cnn\com. While it probably works for your needs, it would be better to be more explicit:
sed -r '/^cnn\.com\b/d' raw.txt
The difference here is the \ backslash before the . period. That escapes the period metacharacter so it's treated as a literal period.
As for your lines that start with a space, you can catch those in a single regex (Again escaping the period metacharacter):
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d' raw.txt
The group (^[ ]*|^) matches a line that starts with any number of spaces (^[ ]*) OR (|) simply the start of the line (^), which is then followed by your match for 127.0.0.1. Since [ ]* also matches zero spaces, ^[ ]* on its own would behave the same way.
To string these together, you can use the | OR operator inside parentheses to catch all of your matches:
sed -r '/(^[ ]*|^)(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b/d' raw.txt
Alternatively you can use a ; semicolon to separate out the different regexes:
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d; /(^[ ]*|^)cnn\.com\b/d; /(^[ ]*|^)0\.0\.0\.0\b/d;' raw.txt
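A small sketch of what escaping the dot changes, assuming GNU sed (for -E and \b) and a made-up list of hostnames that is not taken from the original raw.txt:

```shell
# Hypothetical sample data; only "cnn.com" and " cnn.com" should be deleted.
sample=$(printf '%s\n' 'cnn.com' ' cnn.com' 'cnnPcom' 'ads.cnn.com' 'xcnn.com')

# Unescaped dot: also deletes cnnPcom, and misses the line with a leading space.
unescaped=$(printf '%s\n' "$sample" | sed -E '/^cnn.com\b/d')

# Escaped dot plus optional leading spaces: deletes exactly the two cnn.com lines.
escaped=$(printf '%s\n' "$sample" | sed -E '/^ *cnn\.com\b/d')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

Subdomains such as ads.cnn.com survive both variants because the pattern is anchored with ^.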

sed doesn't understand matching on strings, only regular expressions, and it's ridiculously difficult to try to get sed to act as if it does, see Is it possible to escape regex metacharacters reliably with sed. To remove a line whose first space-separated word is "foo" is just:
awk '$1 != "foo"' file
To remove lines that start with any of "foo" or "bar" is just:
awk '($1 != "foo") && ($1 != "bar")' file
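If you have more than just a couple of words then the approach is to list them all, create a hash table indexed by them, and test whether the first word of your line is an index of the hash table. Note that split() on its own indexes the array by position (1, 2, 3, ...), so the words have to be copied into the keys of a second array:
awk 'BEGIN{split("foo bar other word",tmp); for (i in tmp) badWords[tmp[i]]} !($1 in badWords)' file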
If that's not what you want then edit your question to clarify your requirements and include concise, testable sample input and the expected output given that input.
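As a sketch of the lookup-table approach, using made-up sample lines and the hypothetical array names words and bad: the in operator tests array indices, so each word is copied into the keys of a second array after split():

```shell
filtered=$(printf '%s\n' 'cnn.com' ' cnn.com' 'ads.cnn.com' 'cnnPcom' |
  awk 'BEGIN {
         # split() fills words[1..n]; copy each word into the keys of bad[].
         n = split("cnn.com foo.example", words)
         for (i = 1; i <= n; i++) bad[words[i]] = 1
       }
       # Print only lines whose first whitespace-separated field is not a key.
       !($1 in bad)')

printf '%s\n' "$filtered"
```

Because this is a string comparison rather than a regex match, cnnPcom is kept without any escaping, and the line with a leading space is still removed since awk ignores leading whitespace when splitting fields.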

Sed or Awk or Perl substitution in a sentence

I need to make a substitution using sed or another program. I have the patterns <ehh> <mmm> <mhh> repeated at the beginning of sentences, and I need to replace them with nothing.
I am trying this:
echo "$line" | sed 's/<[a-zA-z]+>//g'
But I get the same result, nothing changes. Can anyone help?
Thank you!

For me, for the test file
<ahh> test
<mmm>test 1
the following
sed 's/^<[a-zA-Z]\+>//g' testfile
produces
test
test 1
which seems to be what you want. Note that for basic regular expressions you use \+ (a GNU extension), whereas for extended regular expressions you use + (and need to use the -r switch for sed).
NB: I added a ^ to the pattern since you said: at the beginning of the line.
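A quick sketch of the three cases, assuming GNU sed (where \+ is available in basic regular expressions and -E selects extended ones); the trailing " *" is an addition here to also drop the space after the tag:

```shell
# Basic regular expression: the quantifier has to be written \+.
bre=$(printf '<ahh> test\n' | sed 's/^<[a-zA-Z]\+> *//')

# Extended regular expression: a plain + is the quantifier.
ere=$(printf '<ahh> test\n' | sed -E 's/^<[a-zA-Z]+> *//')

# Plain + in a basic regular expression is a literal plus sign: no match.
plain=$(printf '<ahh> test\n' | sed 's/^<[a-zA-Z]+> *//')

printf '%s\n' "$bre" "$ere" "$plain"
```

The last command leaves the line unchanged, which matches the symptom described in the question.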

echo '<ehh> <mmm> <mhh>blabla bla' | \
sed 's/^\([[:space:]]*<[a-zA-Z]\{3\}>\)\{1,\}//'
This removes all leading occurrences of your pattern (including the preceding spaces).
The < and > are left unescaped on purpose: in GNU sed, \< and \> are word-boundary anchors, not literal angle brackets. If your input contains the HTML entities &lt; and &gt; instead, the & can be escaped as \& to be safe (it works without escaping on my AIX).
I don't use g, because there is only one beginning of line (^) for the pattern to anchor to; the repetition is handled by a counted group, \(\)\{1,\}, instead.

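If the lines actually contain HTML-escaped brackets and the goal is to get the text after the tag, as in:
&lt;ahh&gt; test
&lt;mmm&gt;test 1
You can do:
awk -F\; '/^&lt;[[:alpha:]]+&gt;/ {print $NF}' <<< "$line"
 test
test 1
It searches for the pattern &lt;[[:alpha:]]+&gt; at the start of the line and prints the last field of the line, with fields separated by ; (so a space after the tag is kept).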

How to match and partial substitute with sed

How can I match the substring "2153846-11" (sometimes composed of digits only, like "2153846", sometimes like "2153846-11" or "2153846_11", sometimes like "2153846-1"; always digits, and never fewer than 5 in the first group) inside the following:
"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
and substitute the matched string with the first group (before dash/underscore) removing the second one.
The final result will be:
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
The instruction should be written as a single sed command like
sed -e 's/...//g' < myfile
Thanks

You can use this sed:
sed 's/"\([0-9]*\)[_-][0-9]*"/"\1"/g' file
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"

You could try the below sed command.
$ echo '"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"' | sed -r 's/"(2153846)([_-]11)?"/"\1"/g'
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"

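Neither command above enforces the "at least 5 digits" requirement from the question. A hedged sketch that does, assuming a sed with -E (extended regular expressions, as in GNU sed) and using the sample line from the question; the variable names are only illustrative:

```shell
line='"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"'

# "([0-9]{5,})  a quoted field starting with at least 5 digits (captured)
# [-_][0-9]+"   then a dash or underscore, a numeric suffix, and the closing quote
result=$(printf '%s\n' "$line" | sed -E 's/"([0-9]{5,})[-_][0-9]+"/"\1"/g')

printf '%s\n' "$result"
```

Fields such as "2015-01-16 02:50:18.0" are left alone because they start with only four digits before the dash, and fields without a suffix need no change in the first place.
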
A sed command to swap first and last character of each line

I want to write a one liner sed command to swap first and last character of every line of file. The below shown command is not working
sed 's/\(.\)\(.+\)\(.\)/\3\2\1/' input.txt
I even tried adding start of line and end of line characters
sed 's/^\(.\)\(.+\)\(.\)$/\3\2\1/' input.txt
It doesn't seem to match anything in the file.

sed -E 's/(.)(.+)(.)/\3\2\1/' input.txt

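You need to escape the + (in a basic regular expression an unescaped + is a literal character; GNU sed accepts \+ as "one or more"):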
sed 's/^\(.\)\(.\+\)\(.\)$/\3\2\1/' input.txt

If you like to try some other, here is a gnu awk version
awk '{a=$1;$1=$NF;$NF=a}1' FS= OFS= input.txt
This sets a to the first character, then sets the first character to the last and the last to a.
It needs GNU awk, since setting FS to nothing (one field per character) is not in standard awk.

This works portably:
echo abcd | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
You can use .* for the middle group. It prints
dbca
It also works with a two-character line such as ad:
echo ad | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
da
The .+ isn't understood by every sed; for example, it didn't work on OS X. Therefore I recommend using .* or simulating .+ with ..*, like
echo ad | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/'
which prints
ad #not swapped

echo 'are' | sed 's/\(.\)\(.*\)\(.\)/\3\2\1/'
No need for ^ or $, because sed takes the longest possible match by default (so the whole line).
Use * instead of +, because with + you need a line of at least 3 characters for it to work, whereas a 2-character line should still swap start and end.
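A short sketch comparing the two middle groups on lines of different lengths (the variable names are illustrative; only basic regular expression features are used, so this should not depend on GNU sed):

```shell
# .* lets the middle group be empty, so a 2-character line still swaps.
star=$(printf '%s\n' abcd ad a | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/')

# ..* requires at least one middle character, so "ad" is left alone.
dotdot=$(printf '%s\n' abcd ad a | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/')

printf 'with .*:\n%s\n\nwith ..*:\n%s\n' "$star" "$dotdot"
```

A single-character line never matches either pattern and is printed unchanged.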

cannot match multiple occurrences of character in sed regexp

I am trying to remove As at the end of line.
alice$ cat pokusni
SALALAA
alice$ sed -n 's/\(.*\)A$/\1/p' pokusni
SALALA
one A is removed just fine
alice$ sed -n 's/\(.*\)A+$/\1/p' pokusni
alice$ sed -n 's/\(.*\)AA*$/\1/p' pokusni
SALALA
multiple occurrences not:(
I am probably doing just some very stupid mistake, any help? Thanks.

Try this one 's/\(.*[^A]\)AA*$/\1/p'
Why + does not work:
Because it is just a normal character here.
Why 's/\(.*\)AA*$/\1/p' does not work:
Because the regex engine is greedy: .* consumes as many characters as it can, including every A except the final one, which is needed for the first A in AA*. The trailing A* is then left to match nothing.
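The three behaviours can be checked side by side. This is a sketch assuming a sed where an unescaped + in a basic regular expression is a literal character (as in GNU sed); the variable names are illustrative:

```shell
# + is a literal character here, so the pattern finds no "A+" and prints nothing.
literal_plus=$(echo SALALAA | sed -n 's/\(.*\)A+$/\1/p')

# .* is greedy and swallows every A but the last one: only one A is removed.
greedy=$(echo SALALAA | sed -n 's/\(.*\)AA*$/\1/p')

# Forcing the group to end in a non-A leaves the whole run of As to AA*.
fixed=$(echo SALALAA | sed -n 's/\(.*[^A]\)AA*$/\1/p')

printf '[%s] [%s] [%s]\n' "$literal_plus" "$greedy" "$fixed"
```

Note that the [^A] variant needs at least one non-A character before the trailing run, so a line consisting only of As would not match.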

This might work for you:
sed -n 's/AA*$//p' file
This replaces an A and zero or more A's at the end of line with nothing.
N.B.
sed -n 's/A*$//p' file
would produce the correct string; however, A*$ also matches the empty string at the end of every line, so it would operate on every line and produce false positives.
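A brief sketch of that false positive, using a made-up second line (HELLO) that does not end in A:

```shell
input=$(printf '%s\n' SALALAA HELLO)

# AA*$ needs at least one trailing A, so only the first line is printed.
strict=$(printf '%s\n' "$input" | sed -n 's/AA*$//p')

# A*$ also matches the empty string at end of line, so every line is printed.
loose=$(printf '%s\n' "$input" | sed -n 's/A*$//p')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```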

Using awk
awk '{sub(/AA$/,"A")}1' pokusni
SALALA
EDIT
Correct version, removing all As from the end of the line (x is an uninitialized variable, so it stands for the empty string):
awk '{sub(/A*$/,x)}1' pokusni

You can use perl:
> echo "SALALAA" | perl -lne 'if(/(.*?)[A]+$/){print $1}else{print}'
SALAL
