sed: delete characters between two strings - regex

I'd like to use sed to remove all characters between "foo=1&" and "bar=2&" for all occurrences in an xml file.
<url>http://example.com?addr=123&foo=1&s=alkjasldkffjskdk$bar=2&f=jkdng</url>
<url>http://example.com?addr=124&foo=1&k=d93ndkdisnskiisndjdjdj$bar=2&p=dnsks</url>
Here is my sed command:
sed -e '/foo=1&/,/bar=2&/d' sample.xml
When I run this, the file is unchanged.
The above is based on this example: Find "string1" and delete between that and "string2"

Use the substitution command instead of the delete command:
sed -e 's/\(foo=1&\).*\(bar=2&\)/\1\2/'

To modify the file in place, add the -i option:
sed -i -e 's/\(foo=1&\).*\(bar=2&\)/\1\2/' your_html.xml
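As a quick check, the substitution can be tried on the two sample lines from the question before touching the real file. This is only a sketch: the variable names are made up for illustration, and it prints to stdout instead of using -i. The range form /foo=1&/,/bar=2&/d selects whole lines, which is why it cannot trim text inside a line. Note also that .* is greedy, so if the markers appeared twice on one line, everything from the first foo=1& to the last bar=2& would be collapsed.

```shell
# Sample lines copied from the question (single quotes keep $ and & literal)
input='<url>http://example.com?addr=123&foo=1&s=alkjasldkffjskdk$bar=2&f=jkdng</url>
<url>http://example.com?addr=124&foo=1&k=d93ndkdisnskiisndjdjdj$bar=2&p=dnsks</url>'

# Keep both markers (\1 and \2) and drop whatever sits between them
output=$(printf '%s\n' "$input" | sed -e 's/\(foo=1&\).*\(bar=2&\)/\1\2/')
printf '%s\n' "$output"
```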

Related

Use bash to remove symbols from text file

I have a bunch of txt-files containing stuff like this:
text_i_need_to_remove{text_i_need_to_retain}
text_i need_to_remove{text_i_need_to_retain}
...
How do I remove text before curly braces (and curly braces themselves) and retain just only text_i_need_to_retain?
In vim, delete everything up to { and the } at the end of each line:
:%s/.*{\|}$//g
From bash shell, you can use text processing tools like sed and awk. Assume file is named ip.txt
1) With sed, which is pretty similar to the regex we used inside vim. The -i flag makes the change in place, i.e. it modifies the input file itself.
$ sed -i 's/.*{\|}$//g' ip.txt
2) With awk, one can again use substitution or in this case, split the line on curly brackets and use only the second column.
$ awk -F'{|}' '{print $2}' ip.txt > tmp && mv tmp ip.txt
If you have GNU awk, there is the -i inplace option for in-place editing:
$ gawk -i inplace -F'{|}' '{print $2}' ip.txt
To make changes to all files in the current directory, use
sed -i 's/.*{\|}$//g' *
Or if they have common extension, say .txt, use
sed -i 's/.*{\|}$//g' *.txt
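Before running anything with -i, the sed and awk approaches can be compared on the sample lines from the question. The sketch below is an illustration, not the answer's exact commands: it uses two plain substitutions in sed (the \| alternation is a GNU extension in basic regexes) and the bracket expression [{}] as the awk field separator, and the variable names are invented for the example.

```shell
# Sample lines from the question
sample='text_i_need_to_remove{text_i_need_to_retain}
text_i need_to_remove{text_i_need_to_retain}'

# sed: strip everything up to the last "{", then the trailing "}"
sed_out=$(printf '%s\n' "$sample" | sed 's/^.*{//;s/}$//')

# awk: split each line on "{" or "}" and keep the second field
awk_out=$(printf '%s\n' "$sample" | awk -F'[{}]' '{print $2}')

printf 'sed:\n%s\nawk:\n%s\n' "$sed_out" "$awk_out"
```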
:%s/^.*{\(.*\)}$/\1/ or in bash, sed 's/^.*{\(.*\)}$/\1/' foo.txt
\(.*\) is a capture group which feeds into \1 and looks like a lumbering zombie.
You can use this in vim:
:%s/^.*{// | %s/}$//
You can also use this script. Run it without -i first to preview the result; if everything is OK, use sed with the -i option as below:
#!/bin/bash
# Loop over the files with a glob instead of parsing ls, so names with spaces work
for item in /dir/where/my/files/are/*
do
sed -i 's/^.*{//;s/}$//' "$item"
done
The -i option makes sed replace in place.
Or simply use:
sed -i 's/^.*{//;s/}$//' /dir/where/my/files/are/*
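The preview-then-apply workflow can be sketched end to end in a scratch directory. The file names and contents are hypothetical, and the second loop writes through a temporary file rather than using -i, which is an assumption made here so the sketch does not depend on how a particular sed implements in-place editing.

```shell
# Hypothetical scratch directory with two sample files
dir=$(mktemp -d)
printf '%s\n' 'junk{keep_one}' > "$dir/a.txt"
printf '%s\n' 'more junk{keep_two}' > "$dir/b.txt"

# Step 1: preview only, the files are not modified
for item in "$dir"/*.txt
do
    sed 's/^.*{//;s/}$//' "$item"
done

# Step 2: once the preview looks right, rewrite each file via a temp file
for item in "$dir"/*.txt
do
    sed 's/^.*{//;s/}$//' "$item" > "$item.tmp" && mv "$item.tmp" "$item"
done

result=$(cat "$dir/a.txt" "$dir/b.txt")
rm -r "$dir"
```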
Perl can be used to do the substitution on all files:
perl -i -pe 's/.*\{|\}$//g' *.txt

Sed copy pattern between range only once

I am using sed to edit an SQL script. I want to copy all the lines from the first "CREATE" pattern until the first "ALTER" pattern. The issue I am having is that sed copies the lines between every pair of CREATE and ALTER instead of only the first one.
sed -n -e '/CREATE/,/ALTER/w createTables.sql' $filename
Perl to the rescue:
perl -ne 'print if /CREATE/ .. /ALTER/ && close ARGV' -- "$filename" > createTables.sql
It closes the input when the ALTER is matched, i.e. it doesn't read any further.
Using sed
sed -n '/CREATE/,/ALTER/{p;/ALTER/q}' file > createTables.sql
Or alternatively (note the newline):
sed -n '/CREATE/,/ALTER/{w createTables.sql
/ALTER/q}' file
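To see the early quit in action, the sed version can be run on a small made-up script with two CREATE/ALTER pairs. The SQL below is invented for illustration, and a semicolon is added before the closing brace as a precaution for sed implementations that require it.

```shell
# Hypothetical SQL with two CREATE ... ALTER blocks
sql='DROP TABLE a;
CREATE TABLE a (id INT);
INSERT INTO a VALUES (1);
ALTER TABLE a ADD x INT;
CREATE TABLE b (id INT);
ALTER TABLE b ADD y INT;'

# Print lines in the range, and quit as soon as the first ALTER has been printed
first_block=$(printf '%s\n' "$sql" | sed -n '/CREATE/,/ALTER/{p;/ALTER/q;}')
printf '%s\n' "$first_block"
```

Only the first block is printed because q stops sed before it reads the second CREATE line.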

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
To me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
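The answers above each pull out one field at a time. Both fields can probably be captured in a single substitution with two groups, which gets close to the output format the question asks for. This is a sketch that assumes current_count always comes before total_count on the line and that the sed in use accepts -E for extended regexes; the variable names are illustrative.

```shell
# Sample line from the question
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -n with the p flag prints only lines where the substitution succeeded;
# group 1 holds the current_count pair, group 2 the total_count pair
counts=$(printf '%s\n' "$line" |
    sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")
printf '%s\n' "$counts"
```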

Delete matching line with sed

I currently have this sed command that replaces foo.us.param=value with foo.param=value:
sed -i -e 's/\.us\./\./' file.txt
I also need it to delete any lines that contain .eu. anywhere but leave all other lines untouched. Any help would save me a long time trying to figure this out alone and would be greatly appreciated.
sed -i -e 's/\.us\./\./' -e '/\.eu\./d' file.txt
Instead of sed you could also use grep to drop the .eu. lines:
grep -v '\.eu\.'
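A dry run on a few made-up property lines shows what the combined sed command does before -i is added. The sample data and variable names are hypothetical. Note that grep -v on its own only removes the .eu. lines; it does not perform the .us. rewrite.

```shell
# Hypothetical properties: one .us. line, one .eu. line, one untouched line
props='foo.us.param=value
foo.eu.param=value
bar.param=other'

# First expression rewrites .us. to a single dot, second deletes .eu. lines
result=$(printf '%s\n' "$props" | sed -e 's/\.us\./\./' -e '/\.eu\./d')
printf '%s\n' "$result"
```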

Filter apache log file using regular expression

I have a big apache log file and I need to filter that and leave only (in a new file) the log from a certain IP: 192.168.1.102
I try using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But "/d" removes those entries, and I need to keep them.
Thanks.
What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt
sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines
I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.
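The three approaches in this thread (grep, sed -n with p, and sed with b followed by d) should select the same lines. A small comparison on invented log lines, with illustrative variable names, might look like this:

```shell
# Hypothetical access-log lines
log='192.168.1.102 - - "GET /index.html"
192.168.1.103 - - "GET /other.html"
192.168.1.102 - - "GET /about.html"'

# grep keeps matching lines
grep_out=$(printf '%s\n' "$log" | grep -e '^192\.168\.1\.102')

# sed -n suppresses default output; p prints only the matching lines
sed_p_out=$(printf '%s\n' "$log" | sed -n '/^192\.168\.1\.102/p')

# b skips the remaining commands on a match; d deletes everything else
sed_b_out=$(printf '%s\n' "$log" | sed -e '/^192\.168\.1\.102/b' -e 'd')

printf '%s\n' "$grep_out"
```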