Regex Pattern Replace - regex

I wanted to replace the following
<duration>89</duration>
with (this is the expected result, or at least what it should become)
\n<duration>89</duration>
So basically I want to replace every < that is not followed by / with \n<. I came up with:
sed -e 's/<[^/]/\n</g'
The only problem is that it obviously outputs
\n<uration>89</duration>
Which brings me to my question: how can I tell the regex to match on the character that follows < (requiring that it is not /) without replacing that character, so that I get my expected result?

Try this:
sed -e 's/<[^/]/\\n&/g' file
or
sed -e 's/<[^/]/\n&/g' file
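& in the replacement refers to the portion of the pattern space that matched. The first command inserts a literal backslash followed by n; the second passes \n to sed, which GNU sed interprets as a newline.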
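As a small sketch of the difference between dropping and keeping the matched character (the sample input is made up for illustration), compare a plain replacement with one that uses &:

```shell
# Hypothetical sample input with two elements on one line.
input='<a>1</a><b>2</b>'

# Without &, the character after "<" is consumed by the match and lost.
printf '%s\n' "$input" | sed 's/<[^/]/\\n</g'
# prints: \n<>1</a>\n<>2</b>

# With &, the whole match ("<a", "<b") is re-emitted after the marker.
printf '%s\n' "$input" | sed 's/<[^/]/\\n&/g'
# prints: \n<a>1</a>\n<b>2</b>
```

Closing tags are untouched in both cases because [^/] refuses to match the slash.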

It can be nicely done with awk:
echo '<duration>89</duration>' | awk '1' RS='<' ORS='\n<'
RS='<' sets the input record separator to <.
ORS='\n<' sets the output record separator to \n<.
1 always evaluates to true. A true condition without a subsequent action tells awk to print the record.

echo "<duration>89</duration>" | sed -E 's/<([^\/])/\\n<\1/g'
should do it.
Sample Run
$ echo "<duration>89</duration>
> <tag>Some Stuff</tag>"| sed -E 's/<([^\/])/\\n<\1/g'
\n<duration>89</duration>
\n<tag>Some Stuff</tag>

Your statement is almost correct, with one small problem: sed replaces the entire match, including the character matched by the [^/] condition. You need to preserve that character, so try either of the following two commands:
sed -e 's/<\([^/]\)/\n<\1/g' file
or, as pointed out by Cyrus,
sed -e 's/<[^/]/\n&/g' file
Cheers!

echo '<duration>89</duration>' | awk '{sub(/<dur/,"\\n<dur")}1'
\n<duration>89</duration>

Related

Sed : print all lines after match

I got my search result using sed:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
But it only shows the part that I cut. How can I print all lines after a match?
I'm using zcat so I cannot use awk.
Thanks.
Edited:
This is my log file:
[01/09/2015 00:00:47] INFO=54646486432154646 from=steve idfrom=55516654455457 to=jone idto=5552045646464 guid=100021623456461451463 num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new
My aim is to find all users that spam my site with their telephone numbers (using grep "pattern") then print all the lines to get all the information about each spam. The problem is there may be matches in INFO or id, so I use sed to get the text first.
Printing all lines after a match in sed:
$ sed -ne '/pattern/,$ p'
# alternatively, if you don't want to print the match:
$ sed -e '1,/pattern/ d'
Filtering lines when pattern matches between "text=" and "status=" can be done with a simple grep, no need for sed and cut:
$ grep 'text=.*pattern.* status='
You can use awk
awk '/pattern/,EOF'
N.B. Don't be fooled: EOF is not a keyword here, just an uninitialized variable, which evaluates to 0 (false). So the end condition of the range is never satisfied, and printing continues until the end of the file.
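The range forms above can be wrapped in two small helpers. This is only a sketch: the function names from_match and after_match are hypothetical, and the pattern is interpolated as-is, so it is assumed to contain no slashes or shell-special characters.

```shell
# from_match PATTERN: print the first matching line and everything after it.
from_match() {
  sed -n "/$1/,\$p"
}

# after_match PATTERN: print only the lines that follow the first match.
# Unlike sed '1,/pattern/d', this also works when the match is on line 1,
# because sed starts looking for the end of a 1,/re/ range on line 2.
after_match() {
  awk -v pat="$1" 'found { print } !found && $0 ~ pat { found = 1 }'
}

printf '%s\n' one two three | from_match two     # prints: two, three
printf '%s\n' two three four | after_match two   # prints: three, four
```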
Perhaps this could be combined with all the previous answers using awk as well.
Maybe this is what you actually want? Find lines matching "pattern" and extract the field after text= up through just before status=?
zcat file* | sed -e '/pattern/s/.*text=\(.*\)status=[^/]*/\1/'
You are not revealing what pattern actually is -- if it's a variable, you cannot use single quotes around it.
Notice that \(.*\)status=[^/]* would match up through survstatus=new in your example. That is probably not what you want? There doesn't seem to be a status= followed by a slash anywhere -- you really should explain in more detail what you are actually trying to accomplish.
Your question title says "all line after a match" so perhaps you want everything after text=? Then that's simply
sed 's/.*text=//'
i.e. replace up through text= with nothing, and keep the rest. (I trust you can figure out how to change the surrounding script into zcat file* | sed '/pattern/s/.*text=//' ... oops, maybe my trust failed.)
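To illustrate the remark about \(.*\)status= matching up through survstatus=, here is a sketch on a shortened, made-up log line in the same key=value layout as the question. It assumes the wanted text ends at the status key that is preceded by a space.

```shell
# Shortened, made-up log line in the same layout as the question's log.
line='from=steve num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new'

# Greedy \(.*\)status= runs on to the last "status=", inside "survstatus=".
printf '%s\n' "$line" | sed 's/.*text=\(.*\)status=[^/]*/\1/'
# prints: hi my number is 0 811 22 1/12 status=new surv

# Requiring the space before "status=" stops at the intended key.
printf '%s\n' "$line" | sed 's/.*text=\(.*\) status=.*/\1/'
# prints: hi my number is 0 811 22 1/12
```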
The seldom used branch command will do this for you. Until you match, use n for next then branch to beginning. After match, use n to skip the matching line, then a loop copying the remaining lines.
sed -n -e ':start' -e '/pattern/b match' -e 'n' -e 'b start' -e ':match' -e 'n' -e ':copy' -e 'p' -e 'n' -e 'b copy' file
Your pipeline is:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
Instead, change the last two segments of your pipeline so that it reads:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | awk '$1 ~ "pattern" {print $0}'

Sed or Awk or Perl substitution in a sentence

I need to make a substitution using sed or another program. I have the patterns <ehh> <mmm> <mhh> repeated at the beginning of sentences and I need to replace them with nothing.
I am trying this:
echo "$line" | sed 's/<[a-zA-z]+>//g'
But I get the same result; nothing changes. Can anyone help?
Thank you!
For me, for the test file
<ahh> test
<mmm>test 1
the following
sed 's/^<[a-zA-Z]\+>//g' testfile
produces
test
test 1
which seems to be what you want. Note that for basic regular expressions you use \+ (a GNU extension), whereas for extended regular expressions you use + (and need to pass the -r or -E switch to sed).
NB: I added a ^ to the pattern since you said "at the beginning".
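A short sketch of the basic versus extended distinction, using one of the test lines from this answer. The \{1,\} interval is the POSIX spelling of "one or more" in a basic regex, and -E is assumed to be available (it is accepted by both GNU and BSD sed).

```shell
line='<mmm>test 1'

# POSIX basic regex: "one or more" is spelled as the interval \{1,\}.
printf '%s\n' "$line" | sed 's/^<[a-zA-Z]\{1,\}>//'
# prints: test 1

# Extended regex (-E): a bare + means "one or more".
printf '%s\n' "$line" | sed -E 's/^<[a-zA-Z]+>//'
# prints: test 1

# In a basic regex a bare + is a literal plus sign, so nothing matches.
printf '%s\n' "$line" | sed 's/^<[a-zA-Z]+>//'
# prints: <mmm>test 1
```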
echo '&lt;ehh&gt; &lt;mmm&gt; &lt;mhh&gt;blabla bla' | \
sed 's/^\([[:space:]]*\&lt;[a-zA-Z]\{3\}\&gt;\)\{1,\}//'
This removes all leading occurrences of your pattern (including leading spaces), assuming the input contains the HTML-escaped form &lt;...&gt;.
I escape & to be safe because of the special meaning this character has in sed (it works without the escape on my AIX).
I don't use g: there is only one beginning of line (^), so g would not remove several occurrences of the full pattern. Instead I repeat the group with an occurrence counter, \(\)\{1,\}.
If the goal is to get the last ;-separated field from HTML-escaped lines like this:
&lt;ahh&gt; test
&lt;mmm&gt;test 1
You can do:
awk -F\; '/^&lt;[[:alpha:]]+&gt/ {print $NF}' <<< "$line"
test
test 1
It searches for lines matching the pattern ^&lt;[[:alpha:]]+&gt and prints the last field of each, using ; as the field separator.

How to match and partial substitute with sed

How can I match the substring "2153846-11" (sometimes composed of digits only, like "2153846", sometimes like "2153846-11" or "2153846_11", sometimes like "2153846-1"; always digits, and never fewer than 5 in the first group) inside the following:
"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
and substitute the matched string with the first group (before the dash/underscore), removing the second one?
The final result will be:
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
The instruction should be written as a single sed line like
sed -e 's/...//g' < myfile
Thanks
You can use this sed:
sed 's/"\([0-9]*\)[_-][0-9]*"/"\1"/g' file
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
You could try the below sed command.
$ echo '"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"' | sed -r 's/"(2153846)([_-]11)?"/"\1"/g'
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"

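For the partial-substitution question above: neither answer enforces the stated "no fewer than 5 digits in the first group". A possible variant, offered only as a sketch, uses POSIX intervals to require at least five digits before the separator and at least one digit after it.

```shell
line='"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"'

# A quoted field of 5+ digits, then - or _, then 1+ digits:
# keep only the first group of digits.
printf '%s\n' "$line" | sed 's/"\([0-9]\{5,\}\)[_-][0-9]\{1,\}"/"\1"/g'
# prints: "01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
```

Fields that already consist of digits only are left alone, since they have no suffix to remove.
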
cannot match multiple occurrences of character in sed regexp

I am trying to remove As at the end of line.
alice$ cat pokusni
SALALAA
alice$ sed -n 's/\(.*\)A$/\1/p' pokusni
SALALA
One A is removed just fine.
alice$ sed -n 's/\(.*\)A+$/\1/p' pokusni
alice$ sed -n 's/\(.*\)AA*$/\1/p' pokusni
SALALA
But multiple occurrences are not :(
I am probably making some very stupid mistake. Any help? Thanks.
Try this one: 's/\(.*[^A]\)AA*$/\1/p'
Why + does not work:
Because in a basic regular expression it is just a normal character, so A+ matches a literal A followed by a literal +.
Why 's/\(.*\)AA*$/\1/p' does not work:
Because the regex engine is greedy: .* consumes as many characters as it can, including every A except the final one required by the first A of AA*. The A* is then left to match nothing.
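To make the greedy behaviour visible, this sketch prints what the group captured, wrapped in brackets, for both patterns:

```shell
# .* grabs everything it can; AA* is left with only the final A.
printf 'SALALAA\n' | sed -n 's/\(.*\)AA*$/[\1]/p'
# prints: [SALALA]

# Forcing the group to end in a non-A leaves the whole run of As to AA*.
printf 'SALALAA\n' | sed -n 's/\(.*[^A]\)AA*$/[\1]/p'
# prints: [SALAL]
```

Note that the second pattern needs at least one non-A character on the line, so a line consisting only of As would not match.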
This might work for you:
sed -n 's/AA*$//p' file
This replaces an A and zero or more A's at the end of line with nothing.
N.B.
sed -n 's/A*$//p' file
would produce the correct string; however, because A* can also match nothing, the substitution succeeds on every line and so produces false positives.
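The false positives can be seen with two made-up input lines, only one of which ends in A:

```shell
# A*$ also matches the empty string at the end of BOB, so sed counts
# that as a successful substitution and p prints the line.
printf '%s\n' SALALAA BOB | sed -n 's/A*$//p'
# prints: SALAL and BOB

# AA*$ requires at least one A, so only the first line is printed.
printf '%s\n' SALALAA BOB | sed -n 's/AA*$//p'
# prints: SALAL
```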
Using awk
awk '{sub(/AA$/,"A")}1' pokusni
SALALA
EDIT
Correct version, removing all As from the end of the line (x is an uninitialized variable, i.e. the empty string).
awk '{sub(/A*$/,x)}1' pokusni
You can use perl:
> echo "SALALAA" | perl -lne 'if(/(.*?)[A]+$/){print $1}else{print}'
SALAL

Sed substitute recursively

echo ddayaynightday | sed 's/day//g'
The result is daynight.
Is there any way to make it substitute until there are no more matches?
My preferred form, for this case:
echo ddayaynightday | sed -e ':loop' -e 's/day//g' -e 't loop'
This is the same as everyone else's, except that it uses multiple -e commands to make the three lines and uses the t construct—which means "branch if you did a successful substitution"—to iterate.
This might work for you:
echo ddayaynightday | sed ':a;s/day//g;ta'
night
The g flag deliberately doesn't re-match against the substituted portion of the string. What you'll need to do is a bit different. Try this:
echo ddayaynightday | sed $':begin\n/day/{ s///; bbegin\n}'
Due to BSD Sed's quirkiness the embedded newlines are required. If you're using GNU Sed you may be able to get away with
sed ':begin;/day/{ s///; bbegin }'
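To see why a single pass with g is not enough, this sketch simply runs the same substitution twice. The new "day" only comes into existence once the first pass has removed its neighbours, and g never rescans text it has already passed.

```shell
# One pass: the two original occurrences are removed, a new one appears.
printf 'ddayaynightday\n' | sed 's/day//g'
# prints: daynight

# A second pass over the result removes the newly formed occurrence.
printf 'ddayaynightday\n' | sed 's/day//g' | sed 's/day//g'
# prints: night
```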
with bash:
str=ddayaynightday
while true; do tmp=${str//day/}; [[ $tmp = "$str" ]] && break; str=$tmp; done
echo "$str"
The following works:
$ echo ddayaynightday | sed ':loop;/day/{s///g;b loop}'
night
Depending on your system, the ; may not work to separate commands, so you can use the following instead:
echo ddayaynightday | sed -e ':loop' -e '/day/{s///g
b loop}'
Explanation:
:loop # Create the label 'loop'
/day/{ # if the pattern space matches 'day'
s///g # remove all occurrences of 'day' from the pattern space
b loop # go back to the label 'loop'
}
If the b loop portion of the command is not executed, the current contents of the pattern space are printed and the next line is read.
OK, here they are: while and string length in bash.
Using them, one can implement my idea:
repeat the substitution until the length of the string stops changing.
There is neither a flag nor a way to write a regex that makes a single s command "substitute until no more match".
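The length-based idea could look like the sketch below. It is one possible reading of the description, not the author's own code: it uses a while loop and ${#str} for the string length, and it delegates each pass to sed so that it also runs in a plain POSIX shell.

```shell
str=ddayaynightday

while :; do
  len=${#str}                                    # length before this pass
  str=$(printf '%s\n' "$str" | sed 's/day//g')   # one substitution pass
  [ "${#str}" -eq "$len" ] && break              # nothing removed: done
done

printf '%s\n' "$str"
# prints: night
```

Comparing lengths works here because every successful substitution shortens the string; a replacement of the same length as the match would need a direct string comparison instead.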