how to extract these fields via sed? - regex

I'm trying to grep for individual quantities in lines like this:
foo=24.587 bar=88 fox=jobs
and extract, say, all the '88' values..the number of columns isn't consistent so awk followed by a cut wont cut it.
I tried using sed like this:
sed -e 's/.*\s\(bar=.+\)\s.*/\1/g'
and that just dumps the entire line. I'm not sure how to correct this regexp and, more importantly, why doesn't this regexp do what I expect?

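Use -r (extended regex). In sed's default basic regex syntax, + is a literal plus sign, so .+ never matched, the substitution did nothing, and the line was printed unchanged. With -r, regexen behave more like you may expect. You have to remove the backslashes from the parens, though: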
$ echo "foo=24.587 bar=88 fox=jobs" | sed -r 's/.*\s(bar=.+)\s.*/\1/g'
bar=88
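One caveat: .+ is greedy, so if more fields follow bar=, the capture swallows them too. A minimal sketch of a tighter pattern that returns only the value, assuming fields are separated by single spaces (the extra more=1 field is made up for illustration):

```shell
# Sample line with a hypothetical extra field appended
line='foo=24.587 bar=88 fox=jobs more=1'

# (^|.* ) lets bar= sit at the start of the line or after a space;
# [^ ]+ stops at the next space, so only the value is captured.
# -n plus the p flag prints only lines where the substitution succeeded.
value=$(printf '%s\n' "$line" | sed -nE 's/(^|.* )bar=([^ ]+).*/\2/p')
printf '%s\n' "$value"
```

Lines without a bar= field print nothing at all, which is usually what you want when extracting.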

“sed” command to remove a line that matches an exact string on first word

I've found an answer to my question here: "sed" command to remove a line that match an exact string on first word
...but only partially because that solution only works if I query pretty much exactly like the answer person answered.
They answered:
sed -i "/^maria\b/Id" file.txt
...to chop out only a line starting with the word "maria" in it and not maria if it's not the first word for example.
I want to chop out a specific URL in a file, for example "cnn.com", but I also have a bunch of localhost addresses (127.0.0.1 and 0.0.0.0), and some of each have a single space in front. I also don't want to chop out subdomains like ads.cnn.com. So that code "should" work, but it doesn't when I string in more commands with the -e option. My code below seems to clean things up well, except that I can't get it to whack out the cnn.com! My file is called raw.txt
sed -r -e 's/^127.0.0.1//' -e 's/^ 127.0.0.1//' -e 's/^0.0.0.0//' -e 's/^ 0.0.0.0//' -e '/#/d' -e '/^cnn.com\b/d' -e '/::/d' raw.txt | sort | tr -d "[:blank:]" | awk '!seen[$0]++' | grep cnn.com
When I grep for cnn.com I see all the cnn's INCLUDING the one I don't want which is actually "cnn.com".
ads.cnn.com
cl.cnn.com
cnn.com <-- the one I don't want
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
If I just use that one piece of code with the cnn.com chop out it seems to work.
sed -r '/^cnn.com\b/d' raw.txt | grep cnn.com
* I'm not using the "-e" option
Result:
ads.cnn.com
cl.cnn.com
cnn.dyn.cnn.com
customad.cnn.com
gdyn.cnn.com
jfcnn.com
kermit.macnn.com
metrics.cnn.com
projectcnn.com
smetrics.cnn.com
tiads.sportsillustrated.cnn.com
trumpincnn.com
victory.cnn.com
xcnn.com
Nothing I do seems to work when I string commands together with the "-e" option. I need some help on getting my multiple option command kicking with SED.
Any advice?
Ubuntu 12 LTS & 16 LTS.
sed (GNU sed) 4.2.2
The . is a metacharacter in regex which means "Match any one character". So you accidentally created a regex that will also catch cnnPcom or cnn com or cnn\com. While it probably works for your needs, it would be better to be more explicit:
sed -r '/^cnn\.com\b/d' raw.txt
The difference here is the \ backslash before the . period. That escapes the period metacharacter so it's treated as a literal period.
As for your lines that start with a space, you can catch those in a single regex (Again escaping the period metacharacter):
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d' raw.txt
The group (^[ ]*|^) matches either a line that starts with any number of spaces (^[ ]*) or (|) simply the start of the line (^), followed by your match for 127.0.0.1. Since [ ]* also matches zero spaces, ^[ ]* alone would do the same job.
To string these together, you can use the | OR operator inside parentheses to catch all of your matches:
sed -r '/(^[ ]*|^)(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b/d' raw.txt
Alternatively you can use a ; semicolon to separate out the different regexes:
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d; /(^[ ]*|^)cnn\.com\b/d; /(^[ ]*|^)0\.0\.0\.0\b/d;' raw.txt
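A possible reason the multi -e pipeline kept cnn.com: if raw.txt is in hosts-file style (an assumption, since the file isn't shown), stripping "0.0.0.0" leaves " cnn.com" with a leading space, so /^cnn.com\b/d no longer matches at the start of the line, and tr only removes the space afterwards. A minimal sketch that strips the address together with the surrounding spaces before testing for the exact name:

```shell
# Hypothetical sample of raw.txt; the real file is not shown in the question
sample='127.0.0.1 ads.cnn.com
0.0.0.0 cnn.com
 0.0.0.0 xcnn.com
# a comment
cnn.com'

cleaned=$(printf '%s\n' "$sample" | sed -E \
  -e '/#/d' \
  -e 's/^ *(127\.0\.0\.1|0\.0\.0\.0) *//' \
  -e '/^cnn\.com *$/d')
printf '%s\n' "$cleaned"
```

The commands run in order on each line, so the delete has to come after the substitution that exposes the host name.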
sed doesn't understand matching on strings, only regular expressions, and it's ridiculously difficult to try to get sed to act as if it does, see Is it possible to escape regex metacharacters reliably with sed. To remove a line whose first space-separated word is "foo" is just:
awk '$1 != "foo"' file
To remove lines that start with any of "foo" or "bar" is just:
awk '($1 != "foo") && ($1 != "bar")' file
If you have more than just a couple of words then the approach is to list them all and create a hash table indexed by them then test for the first word of your line being an index of the hash table:
awk 'BEGIN{split("foo bar other word",tmp); for (i in tmp) badWords[tmp[i]]} !($1 in badWords)' file
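A small sketch of the hash-table lookup on made-up input. Note that split() fills an array indexed 1..n, so the words have to be copied into array keys before "in" can test for them (the names list and bad are just illustrative):

```shell
# Made-up input: the first word decides whether the line is kept
input='foo one
bar two
keep three'

kept=$(printf '%s\n' "$input" | awk '
  BEGIN {
    n = split("foo bar other word", list)   # list[1]="foo", list[2]="bar", ...
    for (i = 1; i <= n; i++) bad[list[i]]   # referencing bad[w] creates the key w
  }
  !($1 in bad)                              # print lines whose first word is not a key
')
printf '%s\n' "$kept"
```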
If that's not what you want then edit your question to clarify your requirements and include concise, testable sample input and the expected output given that input.

removing unmatched lines with SED

I'm trying to remove everything but 3 separate lines with specific matching pattern and leave just the 3 lines I want
Here is my code;
sed -n '/matching pattern/matching pattern/matching pattern/p' > file.txt
If you have multiple commands on the same line, you need to separate the commands by a ;:
sed -n '/matching pattern/p;/matching pattern2/p;/matching pattern3/p' file
Alternatively you can put them onto separate lines:
sed -n '/matching pattern/p
/matching pattern2/p
/matching pattern3/p' file
Beside that, you can also use regex alternation:
sed -rn '/(pattern|pattern2|pattern3)/p' file
or (better) use grep:
grep -E '(pattern|pattern2|pattern3)' file
However, this might get messy if the patterns get longer and more complicated.
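One behavioural difference worth knowing between the variants above, shown on made-up input: with separate p commands, a line that matches two patterns is printed twice, while alternation prints it once.

```shell
# Made-up input; the first line matches both patterns
input='alpha beta
gamma
beta'

# Separate commands: each matching p prints the line again
twice=$(printf '%s\n' "$input" | sed -n '/alpha/p;/beta/p')

# Alternation: one test per line, so at most one print
once=$(printf '%s\n' "$input" | sed -nE '/alpha|beta/p')

printf 'separate:\n%s\nalternation:\n%s\n' "$twice" "$once"
```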
awk to the rescue!
awk '/pattern1/ || /pattern2/ || /pattern3/' filename
I think it's cleaner than alternatives.
Sed with Deletion
There's always more than one way to do this sort of thing, but one useful sed programming pattern is using alternation with deletion. For example:
# BSD sed
sed -E '/root|daemon|nobody/!d' /etc/passwd
# GNU sed
sed -r '/root|daemon|nobody/!d' /etc/passwd
This makes it possible to express ideas like "delete everything except for the listed terms." Even when expressions are functionally equivalent, it can be helpful to use a construct that most closely matches the idea you're trying to convey.
This might work for you (GNU sed):
sed '/pattern1/b;/pattern2/b;/pattern3/b;d' file
The normal flow of sed is to print what remains in the pattern space after processing. Therefore if the required pattern is in the pattern space let sed do its thing otherwise delete the line.
N.B. the b command is like a goto and if it has no following identifier, it means break out of any further sed commands and print (or not print if the -n option is in action) the contents of the pattern space.
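A quick sketch of the branch-then-delete idea on made-up input. Passing each command as its own -e is just a portability habit here, since some non-GNU seds are picky about what follows b on the same line:

```shell
# Lines matching a pattern branch to the end of the script and get auto-printed;
# everything else reaches the final d and is deleted.
kept=$(printf '%s\n' one two three four | sed -e '/one/b' -e '/three/b' -e 'd')
printf '%s\n' "$kept"
```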
If I understood you correctly:
sed -n '/\(pattern1\|pattern2\|pattern3\)/p' file > newfile

how to select lines containing several words using sed?

I am learning using sed in unix.
I have a file with many lines and I wanna delete all lines except lines containing strings(e.g) alex, eva and tom.
I think I can use
sed '/alex|eva|tom/!d' filename
However I find it doesn't work, it cannot match the line. It just match "alex|eva|tom"...
Only
sed '/alex/!d' filename
works.
Anyone know how to select lines containing more than 1 words using sed?
Also, using parentheses, as in "sed '/(alex)|(eva)|(tom)/!d' file", doesn't work either, and I want the lines containing all three words.
sed is an excellent tool for simple substitutions on a single line, for anything else just use awk:
awk '/alex/ && /eva/ && /tom/' file
delete all lines except lines containing strings(e.g) alex, eva and tom
As worded, you're asking to preserve lines containing all of those words, but your samples preserve lines containing any of them. Just in case "all" wasn't a misspeak: a single regular expression can't express an any-order search without spelling out every ordering, but fortunately sed lets you run multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU/anything systems, with BSD-based userlands you'll have to insert a bunch of newlines or pass them as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'
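To check the "all three words, any order" behaviour, here is a small run on made-up input:

```shell
# Made-up input: only the first two lines contain all three names
input='alex eva tom
tom, eva and alex
alex and eva only
nobody'

# Each !d deletes the line as soon as one required word is missing
all_three=$(printf '%s\n' "$input" | sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d')
printf '%s\n' "$all_three"
```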
You can use:
sed -r '/alex|eva|tom/!d' filename
OR on Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for in-place editing (the original is kept as filename.bak), so:
sed -i.bak -r '/alex|eva|tom/!d' filename
In basic regex syntax (without -r or -E) you should be using \| instead of |.
Edit: \| is a GNU extension, so this works in GNU sed but not in every variant of sed.
This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method also allows a range of matches, i.e. if you wanted lines containing 2 or more of the listed words, use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file
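How the counting works, as far as I can tell: the hold space is empty, so each G just appends one newline per matched word, and the final s/// succeeds only if enough newlines piled up. A sketch on made-up input (GNU sed assumed; -E is used in place of -r):

```shell
# Made-up input: 3, 2 and 1 of the names respectively
input='alex eva tom
alex eva
tom'

# Exactly three appended newlines: all three words present
all=$(printf '%s\n' "$input" | sed -nE '/alex/G;/eva/G;/tom/G;s/\n{3}//p')

# Two or three appended newlines: at least two of the words present
two_plus=$(printf '%s\n' "$input" | sed -nE '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p')

printf 'all three:\n%s\ntwo or more:\n%s\n' "$all" "$two_plus"
```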

Points to slashes with sed

I have text file like this format:
...
SomeText.any_text/ch SomeText2.any_3/ch 5.6e-5
SomeText.any_text/ch something.else.point.separated/ch4 5.4e5
...
Each line has three elements: two strings made of alphanumerics, underscores, dots and slashes, and one float number.
I need to replace the dots with slashes, but only in the strings.
I have tried to use sed with a regular expression like this:
sed 's/\([\w_]\+\)\(\.\)/\1\//g'
but it doesn't give the result I want.
This might work for you (GNU sed):
sed 's/[^ ]*$/\n&/;h;y/./\//;G;s/\n.*\n//' file
Explanation:
s/[^ ]*$/\n&/ insert a newline before the last field
h copy the pattern space (PS) to the hold space (HS)
y/./\// translate all .'s to /'s in the PS
G append a newline then HS to the PS
s/\n.*\n// remove everything between the first and last newlines i.e. delete the old strings
This idiom can be used to simplify changing part of a line without resorting to complicated regexps.
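Running the hold-space command on one of the sample lines from the question shows the float surviving untouched (GNU sed assumed, since \n in the replacement is a GNU feature):

```shell
line='SomeText.any_text/ch something.else.point.separated/ch4 5.4e5'

# Mark the last field, save a copy, translate every dot, then splice the
# saved (untranslated) last field back on and drop everything in between.
result=$(printf '%s\n' "$line" | sed 's/[^ ]*$/\n&/;h;y/./\//;G;s/\n.*\n//')
printf '%s\n' "$result"
```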
Your elements look like fields. Therefore, my preferred method would be to use awk:
awk '{ for (i=1; i<=2; i++) gsub(/\./, "/", $i) }1' file.txt
Results:
SomeText/any_text/ch SomeText2/any_3/ch 5.6e-5
SomeText/any_text/ch something/else/point/separated/ch4 5.4e5
You can do this in classic sed notation with a couple of loops, one to fix dots in the first field, and one to fix dots in the second field.
sed -e ':f1' -e 's/^\([^ .]*\)\./\1\//' -e 't f1' \
-e ':f2' -e 's/^\([^ ][^ ]*\) \([^ .]*\)\./\1 \2\//' -e 't f2'
The ^ anchors are crucial to this working correctly. Yes, you can write it all on one line in a single argument to sed; I prefer the clarity of separate arguments when the script is as complex as this. A typical sed script is inscrutable enough without adding any extra obstacles to comprehension.
sed ':f1;s/^\([^ .]*\)\./\1\//;t f1;:f2;s/^\([^ ][^ ]*\) \([^ .]*\)\./\1 \2\//;t f2'
For your input sample (two lines), the output is:
SomeText/any_text/ch SomeText2/any_3/ch 5.6e-5
SomeText/any_text/ch something/else/point/separated/ch4 5.4e5
If you're using GNU sed, you might need to add --posix to the options, though it seemed to behave itself correctly (so it probably recognized that I wasn't using any non-POSIX notations and therefore stuck with POSIX).
Tested on Mac OS X 10.7.5 with BSD sed and GNU sed.
awk '{gsub(/\./,"/",$1); gsub(/\./,"/",$2); print}' your_file

Converting Files with regexp Pattern in sed

I want to turn this (Mitarbeiter.csv):
Max;Mustermann;02.03.1964;501;GL;Prokurist
Monika;Mueller;02.02.1972;500;Sek;Chefsekretaerin
Michael;Maier;06.07.1985;617;Aquise;-
into this (header-content.html):
<tr><td>Max</td><td>Mustermann</td><td>501</td></tr>
<tr><td>Monika</td><td>Mueller</td><td>500</td></tr>
<tr><td>Michael</td><td>Maier</td><td>617</td></tr>
by using sed
I've tried:
sed 's#^\([^\]+\);\([^\]+\);[^\]+;\([^\]+\);.*$#<tr><td>\2</td><td>\1</td><td>\3</td></tr>\n#g' <Mitarbeiter.csv >header-content.html
but that does nothing. Output is same as Mitarbeiter.csv
awk might be a little better suited to what you're trying to do:
awk -F\; '{printf "<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n",$1,$2,$4}'
sed -r -ne 's:^([^;]+);([^;]+);[^;]+;([^;]+);.*:<tr><td>\1</td><td>\2</td><td>\3</td></tr>:p'
Or if you're using OSX or an older version of FreeBSD or NetBSD, replace the -r with -E to use extended regular expressions.
If you want to skip using ERE for portability (i.e. you're using Solaris or HP/UX or somesuch), the regexp might be:
^\([^;][^;]*\);\([^;][^;]*\);[^;]*;\([^;][^;]*\);.*
Note that these both require at least 1 character per field. If fields are allowed to be empty ... well, update your question before we spend more time on things that might not be necessary. :-)
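For reference, a sketch of the portable (basic regex) pattern dropped into a full command and run on the sample rows from the question:

```shell
csv='Max;Mustermann;02.03.1964;501;GL;Prokurist
Monika;Mueller;02.02.1972;500;Sek;Chefsekretaerin
Michael;Maier;06.07.1985;617;Aquise;-'

# Fields 1, 2 and 4 are captured; field 3 (the date) is matched but not kept.
# -n plus the p flag prints only rows that fit the pattern.
rows=$(printf '%s\n' "$csv" | sed -n \
  's:^\([^;][^;]*\);\([^;][^;]*\);[^;]*;\([^;][^;]*\);.*:<tr><td>\1</td><td>\2</td><td>\3</td></tr>:p')
printf '%s\n' "$rows"
```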
A few points,
you need the -r switch for extended regex patterns
Sed is greedy, and even -r does not support non greedy matching
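The g flag makes s replace every match on the line instead of just the first; your pattern is anchored with ^ and $, so it can match only once and you don't need it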
So your command should be:
sed -r 's#^([^;]+);([^;]+);[^;]+;([^;]+);.*$#<tr><td>\1</td><td>\2</td><td>\3</td></tr>#' < Mitarbeiter.csv > header-content.html
Note that your items cannot have a semicolon in them, as that is the field separator. If you have a true CSV file, this won't work, as it will not ignore an escaped semicolon, whether it is wrapped in quotes or preceded by an escape char.
Why would you want to use sed?
awk '{print "<tr><td>" $1 "</td><td>" $2 "</td><td>" $4 "</td></tr>"}' FS=';' Mitarbeiter.csv > header-content.html
If you insist on using sed, you can try:
$ p='\([^;]*\);'
$ sed "s#$p$p$p$p.*#<tr><td>\1</td><td>\2</td><td>\4</td></tr>#" \
Mitarbeiter.csv > header-content.html
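Because $p uses [^;]* rather than [^;]+, this version also accepts empty fields. A quick check on a made-up row whose second field is empty:

```shell
# Reusable sub-pattern: one captured field followed by its semicolon
p='\([^;]*\);'

# Hypothetical row with an empty surname field
row='Anna;;01.01.1990;700;X;Y'

# Inside double quotes the shell expands $p but leaves \1, \2 and \4 alone
html=$(printf '%s\n' "$row" | sed "s#$p$p$p$p.*#<tr><td>\1</td><td>\2</td><td>\4</td></tr>#")
printf '%s\n' "$html"
```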