General solutions to replace string regex preceded and followed by '\n' - regex

I have a file on CentOS that looks like the following:
[root#localhost nn]# cat -A excel.log
real1$
0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real2$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real3$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real4$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real5$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real6$
I would like to replace \nreal[2-6]\n with \t\t\t and have tried the following without success:
sed -i 's/\nreal[2-6]\n/\t\t\t/g' file
It seems that sed has difficulty dealing with line breaks. Any idea how to apply this regex on CentOS?
Much appreciated!

If you are willing to use perl, then use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])(?:\n|\z)/\t\t\t/g' file
If you do not want the last real\d+ line to be replaced with \t\t\t, use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])\n(?!\z)/\t\t\t/g' file
-0777 makes perl read the whole file as a single string, so \n can be matched like any other character. (?!\z) is a negative lookahead that fails the match when the end of the input is immediately ahead of us.

With GNU sed, you need to use the -z option:
sed -i -z 's/\nreal[2-6]\n/\t\t\t/g' file
#      ^^
Since you also want to handle specific alternations, you need to enable POSIX ERE syntax with either the -r or the -E option:
sed -i -Ez 's/\n(51[23]real|real[2-6])\n/\t\t\t/g' file
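As a rough illustration of why -z helps, the sketch below runs the substitution on a made-up sample, once with -z and once with the common N-loop idiom for gathering all lines into the pattern space. It assumes GNU sed; the sample data and variable names are hypothetical.

```shell
# Hypothetical sample standing in for excel.log.
sample() { printf 'real1\na\tb\nreal2\nc\td\nreal3\n'; }

# GNU sed only: -z makes NUL the record separator, so the whole file is one
# pattern space (including its final newline) and \n can be matched directly.
with_z=$(sample | sed -z 's/\nreal[2-6]\n/\t\t\t/g')

# Without -z: append each following line to the pattern space (N), looping
# via label "a" until the last line, then substitute once.
with_loop=$(sample | sed ':a;N;$!ba;s/\nreal[2-6]\n/\t\t\t/g')

printf '%s\n' "$with_z" | sed -n l
printf '%s\n' "$with_loop" | sed -n l
```

The two results differ at the end of the file: with the N loop the final newline is not part of the pattern space, so a trailing realN line is left alone, whereas -z replaces it too.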

Related

Linux CentOS sed command with regex issues

I have a txt file under CentOS in which I want to replace any "\t\n" with "\t\t". I tried this:
sed -i -E 's/\t\n/\t\t/g' myfile.txt
but it doesn't work. I don't know whether sed on CentOS supports regex.
Any help is appreciated!
p.s.
Input(two lines):
1\t2\t3\t$
4\t5\t6\t$
Output(one line):
1\t2\t3\t\t4\t5\t6\t\t
In Editplus, the find regex is '\t\n' and the replace is '\t\t'. Then all lines ending with '\t\n' will become one line, and each '\n' is replaced by one additional '\t'.
p.s.
my file is read like this (cat -A myfile.txt)
You may use this perl command to join lines when the previous line ends with a single tab:
perl -i -0777 -pe 's/(\S\t)\n(?!\z)/$1\t/g' excel.log
(?!\z) is a negative lookahead to fail this match for last line of the file.
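If the file contains the literal two-character sequences \t and \n (a backslash followed by a letter) rather than real tabs and newlines, you need to escape the backslashes: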
sed -i -E 's/\\t\\n/\\t\\t/g' myfile.txt

sed remove lines that starts with a specific pattern

I'm trying to use sed with a regex pattern that works fine with grep, but it doesn't match anything with sed.
I have a text file and want to delete each line that starts with (wow or waw).
This is the command I'm using, but it's not working:
sed -i '/^w\(o\|a\)w/d' text.txt
I tried using the same pattern with grep and it works fine:
grep '^w\(o\|a\)w' text.txt
Is anything wrong with the regex in the sed command?
With GNU sed, you can use
sed -i '/^w[oa]w/d' file
With FreeBSD sed, use
sed -i '' '/^w[oa]w/d' file
Here, [oa] is a bracket expression matching either o or a.
See an online sed demo:
sed '/^w[oa]w/d' <<< "wow 1
waw 2
wiw 3"
Output: wiw 3.
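If a real alternation is needed rather than a bracket expression, one option is extended regex syntax. The \| in the question is a GNU extension to basic regular expressions, while -E with (o|a) is accepted by both GNU and BSD sed. A minimal sketch on the same sample (the variable name kept is an assumption):

```shell
# -E switches sed to extended regular expressions, where ( | ) need no backslashes.
kept=$(printf 'wow 1\nwaw 2\nwiw 3\n' | sed -E '/^w(o|a)w/d')
printf '%s\n' "$kept"
# prints: wiw 3
```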

Replace a > with a " sed regular expression

I have lots of files with lines like the following:
#include "3rd-party/*lots folders*>
The problem is that it ends with > instead of ".
Is there a quick regex for sed to change that?
Basically, if the line starts with #include "3rd-party, the last character should be replaced with ".
Thanks in advance.
You can use this:
sed -i '' '/^[[:blank:]]*#include "3rd-party/s/>$/"/' file
#include "3rd-party/*lots folders*"
Basically you can use:
sed '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/' file
Explanation:
/^[[:space:]]*#include "3rd-party/ is an address, specifically a regular expression address. The subsequent command applies only to lines that start with optional whitespace followed by #include "3rd-party.
s/>[[:space:]]*$/"/ replaces > followed by optional whitespace and the end of the line with a ".
Use the -i option if you want to change the file in place:
sed -i '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/' file
On a bunch of, let's say, C files, use find and its -exec option:
find . -name '*.c' -exec sed -i '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/' {} \;
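Before running an in-place edit across a tree, it can help to try the addressed substitution on a few lines through a pipe. The sketch below does that; the function name fix_includes and the sample lines are hypothetical.

```shell
fix_includes() {
  # Address: only lines that start (after optional whitespace) with #include "3rd-party
  # Command: replace a trailing > (plus optional whitespace) with "
  sed '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/'
}

fixed=$(printf '%s\n' \
  '#include "3rd-party/a/b.h>' \
  '#include <stdio.h>' \
  'if (a > b)' | fix_includes)
printf '%s\n' "$fixed"
```

Only the first line changes; the system include and the comparison are left alone because the address does not select them.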
You can use sed to search for a pattern and perform an action on the matching lines, like this:
sed '/search_pattern/{action}' your_file
The action you want is to replace the last character of the line. The pattern >$ matches it: > is the character to replace, and $ means that it must be at the end of the line.
The sed command for this is s///, which works like s/search_pattern/replace_pattern/.
For your goal, this looks like:
sed '/#include "3rd-party/{s/>$/"/}' your_file
But since sed is a (s)tream (ed)itor, it writes the result to standard output. Use sed's -i flag to change the file in place, or redirect the output with > to a new file.
Like this:
sed -i '/#include "3rd-party/{s/>$/"/}' your_file
or like this:
sed '/#include "3rd-party/{s/>$/"/}' your_file > new_file
Please let me know if this works for you.

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'" file
(or egrep instead of grep -E). Note that grep -o is not specified by POSIX, although GNU and BSD grep both support it.
To me, this looks like some sort of serialized Python data. I would first try to find the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
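The question asks for both fields on one line, which none of the commands above produce directly. One possible sketch uses two capture groups in a single substitution. It assumes that current_count always appears before total_count on a line and that sed accepts -E (GNU and BSD sed do); the variable names are hypothetical.

```shell
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -n plus the p flag prints only lines where the substitution succeeded.
# Group 1 captures the current_count pair, group 2 the total_count pair.
both=$(printf '%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")
printf '%s\n' "$both"
# prints: 'current_count': u'2', 'total_count': u'3'
```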

BASH: replacing PERL with SED for in-place substitution

I would like to replace this perl statement:
perl -pe "s|(?<=://).+?(?=/)|$2:80|"
with
sed -e "s|<regex>|$2:80|"
Since sed has a much less powerful regex engine (for example, it does not support look-arounds), the task boils down to writing a sed-compatible regex that matches only the domain name in a fully qualified URL. Examples:
http://php2-mindaugasb.c9.io/Testing/JS/displayName.js
http://php2-mindaugasb.c9.io?a=Testing.js
http://www.google.com?a=Testing.js
Should become:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
A solution like this would be ok:
sed -e "s|<regex>|http://$2:80|"
Thanks :)
Use the below sed command.
$ sed "s~//[^/?]\+\([?/]\)~//\$2:80\1~g" file
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
You need to escape the $ in the replacement part, because the command is in double quotes and the shell would otherwise expand $2.
sed 's|http://[^/?]*|http://$2:80|' file
Output:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
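Both answers keep $2 as literal text. If $2 in the question is meant to be a shell positional parameter, the sed command has to be in double quotes so the shell expands it. The sketch below shows that reading; the function name rewrite_host and the host example.org are hypothetical, and it assumes the host value contains no | or & characters, which are special in this s||| command.

```shell
rewrite_host() {
  # $1: URL to rewrite, $2: replacement host (expanded by the shell).
  # The match includes the "://" prefix, which stands in for the look-behind,
  # and stops at the first / or ?, which stands in for the look-ahead.
  printf '%s\n' "$1" | sed "s|://[^/?]*|://$2:80|"
}

rewrite_host 'http://php2-mindaugasb.c9.io/Testing/JS/displayName.js' 'example.org'
rewrite_host 'http://www.google.com?a=Testing.js' 'example.org'
```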