sed remove lines that start with a specific pattern - regex

I'm trying to use sed with a regex pattern that works fine with grep, but with sed it doesn't match anything.
I have a text file and want to delete each line that starts with wow or waw.
This is the command I'm using, but it's not working:
sed -i '/^w\(o\|a\)w/d' text.txt
I tried using the same pattern with grep and it works fine:
grep '^w\(o\|a\)w' text.txt
Is anything wrong with the regex in the sed command?

With GNU sed, you can use
sed -i '/^w[oa]w/d' file
With FreeBSD sed, use
sed -i '' '/^w[oa]w/d' file
Here, [oa] is a bracket expression matching either o or a.
A quick demo:
sed '/^w[oa]w/d' <<< "wow 1
waw 2
wiw 3"
Output: wiw 3.
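A note on portability: \| inside a basic regex is a GNU extension, so other sed implementations may not recognize the alternation in the original pattern. The sketch below compares the bracket expression with an extended-regex alternation. It assumes a sed that accepts -E (GNU and FreeBSD sed both do); the sample input and variable names are made up for illustration.

```shell
# Hypothetical sample input: three lines, two of which should be deleted.
input='wow 1
waw 2
wiw 3'

# Bracket expression: plain POSIX basic regex, no extensions needed.
bracket_result=$(printf '%s\n' "$input" | sed '/^w[oa]w/d')

# Extended regex: with -E, ( ) and | are operators without backslashes.
ere_result=$(printf '%s\n' "$input" | sed -E '/^w(o|a)w/d')

printf 'bracket: %s\n' "$bracket_result"
printf 'ere:     %s\n' "$ere_result"
```

Both commands should leave only the wiw line. The bracket form is enough when the alternatives are single characters; the -E form generalizes to longer alternatives.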

Related

General solutions to replace string regex preceded and followed by '\n'

I have a file in CentOS which looks like following
[root@localhost nn]# cat -A excel.log
real1$
0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real2$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real3$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real4$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real5$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real6$
I would like to replace \nreal[2-6]\n with \t\t\t and have unsuccessfully tried the following:
sed -i 's/\nreal[2-6]\n/\t\t\t/g' file
It seems that sed has difficulty dealing with line breaks. Any idea how to make this regex work on CentOS?
Much appreciated!
If you want to consider perl then use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])(?:\n|\z)/\t\t\t/g' file
If you don't want the last real\d+ line to be replaced with \t\t\t, use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])\n(?!\z)/\t\t\t/g' file
(?!\z) is a negative lookahead that fails the match when the end of the input is immediately ahead.
With GNU sed, you need to use the -z option:
sed -i -z 's/\nreal[2-6]\n/\t\t\t/g' file
#      ^^
Now that you also want to handle specific alternations, you need to enable POSIX ERE syntax with either the -r or the -E option:
sed -i -Ez 's/\n(51[23]real|real[2-6])\n/\t\t\t/g' file
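The reason -z helps: sed normally reads one line at a time and strips the trailing newline, so \n never appears in the pattern space. With -z, GNU sed splits input on NUL bytes instead, so a text file without NULs is handled as one record and \n can be matched. A minimal sketch, assuming GNU sed; the shortened sample data is hypothetical.

```shell
# Hypothetical miniature of excel.log: label lines followed by tab-separated data.
joined=$(printf 'real1\n1\t2\nreal2\n3\t4\nreal3\n5\t6\n' |
  sed -z 's/\nreal[2-6]\n/\t\t\t/g')

# What we expect: the real2 and real3 label lines (with their surrounding
# newlines) are replaced by three tabs, joining the data rows.
expected=$(printf 'real1\n1\t2\t\t\t3\t4\t\t\t5\t6')

printf '%s\n' "$joined" | cat -A
```

Since the whole file is held in memory as one record, this approach is best suited to files of moderate size.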

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
To me this looks like some sort of serialized Python data. Basically, I would try to find out the origin of that data and parse it properly.
However, while hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
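The question asks for both fields on one line, which the commands above return separately. One way to get them together is to use two capture groups in a single substitution. This is only a sketch, assuming a sed with -E support and that current_count always appears before total_count on the line; the variable names are made up.

```shell
# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -n plus the p flag prints only lines where the substitution succeeded.
# Group 1 captures the current_count pair, group 2 the total_count pair.
both=$(printf '%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")

printf '%s\n' "$both"
```

If the field order can vary, two separate passes (or a real parser for the data) would be more robust than a single fixed-order pattern.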

How can I get sed to only match to the first occurrence of a character?

I'm using GNU Sed 4.2.1. I'm trying to replace the second field in the following line (the password in /etc/shadow). Awk is not an option.
username:P#s$w0rDh#$H:15986:0:365::::
I've tried
sed -i 's/\(^[a-z]*\):.*?:/\1:TEST:/'
but nothing. I've tried many variations but for some reason I can't get it to only match that field. Help?
Use [^:]* to match everything up until the next :
sed -i 's/^\([^:]*\):\([^:]*\):/\1:TEST:/'
Using sed:
sed -i.bak 's/^\([^:]*:\)[^:]*\(.*\)$/\1foo\2/' file
Using awk you can do:
awk -F: '{$2="foo"}1' OFS=: file
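For anyone wondering why the original attempt did nothing: sed's POSIX regexes have no lazy quantifier, and in a basic regex ? is a literal question mark, so .*?: only matches if the line actually contains "?:". Dropping the ? does not help either, because .* is greedy. A small sketch comparing the greedy pattern with the [^:]* approach, using the sample line from the question (variable names are illustrative):

```shell
entry='username:P#s$w0rDh#$H:15986:0:365::::'

# Greedy: .* runs to the LAST colon on the line, so every field after the
# user name is swallowed by the match.
greedy=$(printf '%s\n' "$entry" | sed 's/^\([a-z]*\):.*:/\1:TEST:/')

# Negated bracket expression: [^:]* cannot cross a colon, so the match
# stops at the end of the second field.
fixed=$(printf '%s\n' "$entry" | sed 's/^\([^:]*\):[^:]*:/\1:TEST:/')

printf 'greedy: %s\n' "$greedy"
printf 'fixed:  %s\n' "$fixed"
```

The greedy version leaves only username:TEST:, while the bracket version keeps the remaining fields intact.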

Regex matching using SED in bash

I want to match the following line with the regex in sed:
db=connect_str=DBI:SQLAnywhere:ENG=ABC1_hostname12;DBN=ABC12;UID=USERID;PWD=passwd123;LINKS=tcpip{host=10.11.12.13:1234}
The regex I am using is:
sed -n '/ABC1_.+;/p' Config/db_conn.cfg
but this does not work. On the other hand, it works if I use:
sed -n '/ABC1_.*;/p' Config/db_conn.cfg
Can someone please explain why it's not working? Also is there another way to match it?
That's because sed uses basic regular expressions (BRE) by default, where an unescaped + is a literal plus sign rather than a quantifier. In GNU sed you can escape it as \+ to get "one or more":
sed -n '/ABC1_.\+;/p' Config/db_conn.cfg
To use the regex syntax you're familiar with, try sed -r -n (extended regex), and then you can do:
sed -r -n '/ABC1_.+;/p' Config/db_conn.cfg
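To see the difference side by side, here is a small sketch run against a shortened, hypothetical version of the config line. It assumes a sed that accepts -E for extended regexes (GNU sed treats -r and -E the same). The third form, ..* (one character followed by zero or more), is the portable BRE way to say "one or more".

```shell
# Shortened stand-in for the line in Config/db_conn.cfg.
line='db=connect_str=DBI:SQLAnywhere:ENG=ABC1_hostname12;DBN=ABC12'

# BRE: + is a literal plus sign, and the line has none, so nothing prints.
bre_literal=$(printf '%s\n' "$line" | sed -n '/ABC1_.+;/p')

# ERE: + means "one or more" of the preceding atom.
ere=$(printf '%s\n' "$line" | sed -n -E '/ABC1_.+;/p')

# Portable BRE equivalent of .+ without relying on the GNU \+ extension.
posix_bre=$(printf '%s\n' "$line" | sed -n '/ABC1_..*;/p')

printf 'bre_literal: [%s]\n' "$bre_literal"
printf 'ere:         [%s]\n' "$ere"
printf 'posix_bre:   [%s]\n' "$posix_bre"
```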

Filter apache log file using regular expression

I have a big apache log file and I need to filter that and leave only (in a new file) the log from a certain IP: 192.168.1.102
I tried using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But the d command removes those entries, and I need to keep them.
Thanks.
What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt
sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines
I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.
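For comparison, here are three equivalent sed ways to keep only the matching lines, sketched against a made-up three-line log. The trailing space in the pattern is an extra assumption of mine: it requires the address to end right there, which holds for logs where the client IP is followed by a space.

```shell
# Hypothetical log excerpt; only the IP prefix matters here.
log='192.168.1.102 - - "GET /index.html"
10.0.0.7 - - "GET /about.html"
192.168.1.102 - - "GET /logo.png"'

# 1. Suppress default output with -n and print matching lines only.
kept_p=$(printf '%s\n' "$log" | sed -n '/^192\.168\.1\.102 /p')

# 2. Branch past the d command for matching lines, delete the rest.
kept_b=$(printf '%s\n' "$log" | sed -e '/^192\.168\.1\.102 /b' -e 'd')

# 3. Negate the address: delete every line that does NOT match.
kept_not_d=$(printf '%s\n' "$log" | sed '/^192\.168\.1\.102 /!d')

printf '%s\n' "$kept_p"
```

All three should print the two 192.168.1.102 lines. The -n ... p form is probably the closest sed counterpart to the grep command above.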