HP-UX: regular expression repetition does not work?

I want to pick out those rows whose 4th field is not empty.
But the following RE did not work:
^\([^,]*,\)\{3\}[^,][^,]*,.*$
Then I tried to print the captured groups, and the result confused me.
It seems that the repetition does not work.
Would anyone explain it, please?
Details (see lines 4~6):
$ cat tmp
1AAA,BBB,CCC,DDD,EEE,FFF
2AAA,BBB,CCC,DDD,EEE,FFF
3AAA,BBB,,DDD,EEE,FFF
4AAA,BBB,CCC,,EEE,FFF
5AAA,BBB,CCC,,EEE,FFF
6AAA,BBB,CCC,,EEE,FFF
7AAA,BBB,CCC,DDD,EEE,FFF
8AAA,BBB,CCC,DDD,EEE,FFF
9xxxxxxx
$ sed -n "/^\(\([^,]*,\)\{3\}\)\([^,][^,]*\)\(,.*\)$/ {s//\1/;p;}" tmp
1AAA,BBB,CCC,
2AAA,BBB,CCC,
3AAA,BBB,,
4AAA,BBB,
5AAA,BBB,
6AAA,BBB,
7AAA,BBB,CCC,
8AAA,BBB,CCC,
$ uname
HP-UX

This awk command prints all lines where the 4th field is not empty:
awk -F, '$4' file
1AAA,BBB,CCC,DDD,EEE,FFF
2AAA,BBB,CCC,DDD,EEE,FFF
3AAA,BBB,,DDD,EEE,FFF
7AAA,BBB,CCC,DDD,EEE,FFF
8AAA,BBB,CCC,DDD,EEE,FFF
It may be simpler to work with awk in this case, since it makes testing fields simple.
Here it just tests whether $4 is non-empty and, if so, prints the line.
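One caveat with the bare $4 test: awk evaluates the field for truthiness, so a field that holds just 0 is treated like an empty one. A minimal sketch of the difference, using made-up rows (not the data from the question):

```shell
# Hypothetical rows: the 4th field is "0" in row 2 and empty in row 3.
data='1AAA,BBB,CCC,DDD,EEE
2AAA,BBB,CCC,0,EEE
3AAA,BBB,CCC,,EEE'

# Truthiness test: also drops row 2, because "0" looks like numeric zero.
truthy=$(printf '%s\n' "$data" | awk -F, '$4')

# Explicit string comparison: drops only the row whose 4th field is empty.
nonempty=$(printf '%s\n' "$data" | awk -F, '$4 != ""')

printf 'truthy:\n%s\nnon-empty:\n%s\n' "$truthy" "$nonempty"
```

If the 4th column can never be a plain 0, the shorter form is fine.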

Sure, it is much simpler to do this with awk, as shown by the slim and perfectly working answer proposed by Jotne.
If you want to investigate what's wrong with sed on HP-UX, I would suggest you have a look at this conversation and try passing your data to sed not as a file argument but through its stdin: cat tmp | sed -n ... or sed -n ... < tmp.
My very first troubleshooting step, though, would be to replace your double quotes with single quotes, as with double quotes your shell may be trying to interpret $/ or *. I don't know which shell you are using.
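If the interval expression \{3\} turns out to be the culprit (I can't confirm that on HP-UX), one workaround is to write the three "field plus comma" groups out by hand and single-quote the script. This is only a sketch, run here against a shortened copy of the sample data:

```shell
# Shortened sample from the question (row 4 has an empty 4th field).
data='1AAA,BBB,CCC,DDD,EEE,FFF
3AAA,BBB,,DDD,EEE,FFF
4AAA,BBB,CCC,,EEE,FFF
9xxxxxxx'

# Three explicit "[^,]*," groups stand in for \([^,]*,\)\{3\};
# "[^,][^,]*" then requires at least one character in the 4th field.
kept=$(printf '%s\n' "$data" | sed -n '/^[^,]*,[^,]*,[^,]*,[^,][^,]*/p')
printf '%s\n' "$kept"
```

Because [^,]* can never cross a comma, the match cannot slide back to an earlier field the way the output in the question suggests.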

You could try one of the GNU sed commands below:
$ sed -nr '/^[A-Z0-9]+,[A-Z]*,[A-Z]*,[A-Z]+,[A-Z]*,[A-Z]*$/p' file
or
$ sed -nr '/^.*,.*,.*,.+,.*,.*$/p' file
1AAA,BBB,CCC,DDD,EEE,FFF
2AAA,BBB,CCC,DDD,EEE,FFF
3AAA,BBB,,DDD,EEE,FFF
7AAA,BBB,CCC,DDD,EEE,FFF
8AAA,BBB,CCC,DDD,EEE,FFF

Related

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags straight after <H2>City</H2> instances. So in the examples above which are separate files, I want to print out Liverpool and in the second example, Dublin.
Looking at this thread, I tried:
sed -e 's/City\(.*\)\/P/\1/'
which I hoped would get me halfway there, but it just prints out the entire file. Any ideas?

awk to the rescue! You need multi-character RS support, though (gawk has it):
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
Another approach:
$ awk 'c&&c--{gsub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
This finds the next record after City and strips both the opening and the closing tag.

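Try using the following regex:
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
see regex demo / explanation
sed does not support lookarounds or the (?s) flag, so run it with a PCRE-capable tool such as perl:
perl -0777 -ne 'print "$&\n" if /(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)/' file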

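I checked, and \s does not seem to work for whitespace here. Use the newline character \n instead, after joining the next line into the pattern space with N (sed reads one line at a time, so \n never matches otherwise):
sed -n '/<H2>City<\/H2>/{N;s/<H2>City<\/H2>\n<P>\(.*\)<\/P>/\1/p;}' file
There is no need to use a lookbehind (as above); that is overkill.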

With sed, you can use the n command to read the next line after your pattern. Then just remove the tags to output your content (the braces keep the substitution from printing other <P> lines in the file):
sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' file
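To see how n behaves, here is a small sketch on a made-up page that contains a second, unrelated paragraph. Grouping n and the substitution in braces is my choice here, so that only the line right after the heading is printed:

```shell
# Hypothetical page: a City heading, its value, then an unrelated paragraph.
page='<H2>City</H2>
<P>Liverpool</P>
<P>Some other paragraph</P>'

# On the heading line, n loads the next input line; the braces restrict the
# tag-stripping s command (and its p flag) to that line only.
city=$(printf '%s\n' "$page" | sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}')
printf '%s\n' "$city"
```

Without the braces, the substitution would run on every line and print the other paragraph too.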

I think this should work on your Mac:
echo -e "<H2>City</H2>\n<P>Dublin</P>" |awk -F"[<>]" '/City/{getline;print $3}'
Dublin

removing unmatched lines with SED

I'm trying to remove everything except 3 separate lines, each matching a specific pattern, and leave just the 3 lines I want.
Here is my code:
sed -n '/matching pattern/matching pattern/matching pattern/p' > file.txt

If you have multiple commands on the same line, you need to separate the commands by a ;:
sed -n '/matching pattern/p;/matching pattern2/p;/matching pattern3/p' file
Alternatively you can put them onto separate lines:
sed -n '/matching pattern/p
/matching pattern2/p
/matching pattern3/p' file
Besides that, you can also use regex alternation:
sed -rn '/(pattern|pattern2|pattern3)/p' file
or (better) use grep:
grep -E '(pattern|pattern2|pattern3)' file
However, this might get messy if the patterns get longer and more complicated.

awk to the rescue!
awk '/pattern1/ || /pattern2/ || /pattern3/' filename
I think it's cleaner than alternatives.

Sed with Deletion
There's always more than one way to do this sort of thing, but one useful sed programming pattern is using alternation with deletion. For example:
# BSD sed
sed -E '/root|daemon|nobody/!d' /etc/passwd
# GNU sed
sed -r '/root|daemon|nobody/!d' /etc/passwd
This makes it possible to express ideas like "delete everything except for the listed terms." Even when expressions are functionally equivalent, it can be helpful to use a construct that most closely matches the idea you're trying to convey.

This might work for you (GNU sed):
sed '/pattern1/b;/pattern2/b;/pattern3/b;d' file
The normal flow of sed is to print what remains in the pattern space after processing. Therefore if the required pattern is in the pattern space let sed do its thing otherwise delete the line.
N.B. the b command is like a goto and if it has no following identifier, it means break out of any further sed commands and print (or not print if the -n option is in action) the contents of the pattern space.
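A small sketch of that control flow on made-up input (the words alpha, beta and gamma stand in for the three patterns). The commands are passed as separate -e expressions, which should also work in non-GNU seds:

```shell
lines='alpha one
skip me
beta two
gamma three
skip me too'

# Each matching address branches to the end of the script (b with no label),
# so the line is auto-printed; anything that reaches the final d is deleted.
kept=$(printf '%s\n' "$lines" | sed -e '/alpha/b' -e '/beta/b' -e '/gamma/b' -e 'd')
printf '%s\n' "$kept"
```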

If I understood you correctly:
sed -n '/\(pattern1\|pattern2\|pattern3\)/p' file > newfile

Deleting lines matching a pattern from a Unix file

I have a file containing strings of the following format:
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|DELETE|REDEFINES|VARIABLE.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
I want to be able to use something like sed or awk to delete lines containing the word REDEFINES but NOT if the word PIC is also in there or if there is no full stop at the end of a line as this means the string has been split over 2 lines. So out of the 4 lines (3 strings) stated above I would only want to delete 05|DELETE|REDEFINES|VARIABLE.
I thought I might be able to use some kind of negation or lookahead, but these don't seem to be available, or I can't get them to work.
Using awk, this deletes anything containing REDEFINES in the string, following the pattern in the example above:
awk '!/[[:print:]]*\REDEFINES[[:print:]]*\./'
Similarly using sed:
sed '/[[:print:]]*|REDEFINES[[:print:]]*\./d'
I just can't work out how to extend it to do what I need. Is this possible in sed or awk or do I need another tool?
Any help greatly appreciated.

Using awk
awk -v RS= '!/REDEFINES/ || /PIC/' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
Using sed (with older input data):
sed -i.bak '/REDEFINES/{/PIC/!d;}' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.

You can try the command below. It prints the line if it contains PIC or if it does not contain REDEFINES. It is maintainable because it is not tricky and can be understood without much effort.
cat input.txt | awk '{if ($0 ~ /PIC/ || $0 !~ /REDEFINES/){print $0}}'

Why don't you just use grep? Using negations on your question, here is what I understood:
keep the lines terminated with a full-stop, containing both REDEFINES and PIC.
So grep seems easy:
$ grep -E 'REDEFINES.*\.$' file | grep PIC
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
Hope this helps.

This might work for you (GNU sed):
sed -r '/REDEFINES/{/PIC|[^.]$/!d}' file
or perhaps more easily:
sed '/PIC/b;/REDEFINES.*\.$/d' file
or if you prefer:
sed '/PIC/!{/REDEFINES.*\.$/d}' file
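The question also requires that a string split over two lines be judged as a whole. One way to sketch that in awk is to glue physical lines together until one ends in a full stop, then test the complete record. The variable name rec is mine, and the input is the four-line sample from the question:

```shell
input='05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|DELETE|REDEFINES|VARIABLE.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).'

kept=$(printf '%s\n' "$input" | awk '
{ rec = (rec == "" ? $0 : rec "\n" $0) }   # collect physical lines into one record
/\.$/ {                                    # a trailing full stop completes the record
    if (rec !~ /REDEFINES/ || rec ~ /PIC/) print rec
    rec = ""
}
END { if (rec != "") print rec }           # keep an unterminated final record
')
printf '%s\n' "$kept"
```

Only 05|DELETE|REDEFINES|VARIABLE. is dropped; the split KEEP2 record survives because PIC appears on its second line.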

how to select lines containing several words using sed?

I am learning to use sed in Unix.
I have a file with many lines and I want to delete all lines except those containing certain strings (e.g. alex, eva and tom).
I think I can use
sed '/alex|eva|tom/!d' filename
However, I find it doesn't work: it cannot match the line. It only matches the literal text "alex|eva|tom".
Only
sed '/alex/!d' filename
works.
Does anyone know how to select lines containing more than one word using sed?
Also, using parentheses, as in "sed '/(alex)|(eva)|(tom)/!d' file", doesn't work either, and I want the lines containing all three words.

sed is an excellent tool for simple substitutions on a single line; for anything else just use awk:
awk '/alex/ && /eva/ && /tom/' file

delete all lines except lines containing strings(e.g) alex, eva and tom
As worded, you're asking to preserve lines containing all those words, but your samples preserve lines containing any. Just in case "all" wasn't a misstatement: a single regular expression can't conveniently express an any-order search, but fortunately sed lets you run multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU/anything systems; with BSD-based userlands you'll have to insert a bunch of newlines or pass them as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'

You can use:
sed -r '/alex|eva|tom/!d' filename
OR on Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for in-place editing (the original is saved as filename.bak):
sed -i.bak -r '/alex|eva|tom/!d' filename

You should be using \| instead of |.
Edit: Looks like this is true for some variants of sed but not others. In basic regular expressions, \| is a GNU extension.
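To avoid depending on \|, a sketch that sticks to -E (accepted by BSD sed and recent GNU sed) for the "any word" case and to plain per-word deletes for the "all words" case. The sample lines are made up:

```shell
names='alex only
alex eva tom
nobody here
tom and eva'

# ANY of the words: extended-regex alternation needs no backslash.
any=$(printf '%s\n' "$names" | sed -E '/alex|eva|tom/!d')

# ALL of the words, in any order: delete a line as soon as one word is missing.
all=$(printf '%s\n' "$names" | sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d')

printf 'any:\n%s\nall:\n%s\n' "$any" "$all"
```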

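This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
Each matching G appends a newline (plus the empty hold space), so a line containing all three words ends in three newlines, which the substitution strips before printing. This method also allows a range of matches: if you want lines containing 2 or more of the listed words, use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file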
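The same "at least N of the words" idea can be sketched in awk by counting matches directly, which may be easier to read than counting appended newlines. The counter n and the sample lines are my own:

```shell
names='alex only
alex and eva
alex eva tom
nobody here'

# Count how many of the three words occur; keep lines with at least two.
two_plus=$(printf '%s\n' "$names" |
  awk '{ n = (/alex/ ? 1 : 0) + (/eva/ ? 1 : 0) + (/tom/ ? 1 : 0) } n >= 2')
printf '%s\n' "$two_plus"
```

Changing the final test to n == 3 would keep only the lines that contain all three words.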

Replace strings with double quotes in a XML file

I have a huge XML file with long lines (5000-10000 characters per line) containing the following text:
Pattern="abc"
and I want to replace it with
Pattern="def"
As the line sizes are huge, I have no choice but to use awk. Please suggest how this can be achieved. I tried with the below but it is not working:
CMD="{sub(\"Pattern=\"abc\"\",\"Pattern=\"def\"\"); print}"
echo "$CMD"
awk "$CMD" "Some File Name.xml"
Any help is highly appreciated.

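One suggestion with awk (it assumes Pattern="abc" is the first quoted value on the line; OFS is set to the quote character so that the other quotes on the line are preserved):
BEGIN {FS=OFS="\""}
/Pattern="abc"/{$2="def"}1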

I don't understand why you said "As the line sizes are huge, I have no choice but to use awk". AFAIK sed is no more limited on line length than awk is and since this is a simple substitution on a single line, sed is the better choice of tool:
$ cat file
Pattern="abc"
$ sed -r 's/(Pattern=")[^"]+/\1def/' file
Pattern="def"
If the pattern occurs multiple times on the line, add a "g" flag to the end of the s command.
Since you mention in your comment being stuck with a sed that can't handle long lines, let's assume you can't install GNU tools so you'll need a non-GNU awk solution like this:
$ awk '{sub(/Pattern="[^"]+/,"Pattern=\"def")}1' file
Pattern="def"
If you LITERALLY mean you only want to replace Pattern="abc" then just do:
$ awk '{sub(/Pattern="abc"/,"Pattern=\"def\"")}1' file
Pattern="def"
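Note that sub replaces only the first occurrence on each line. If Pattern="abc" can appear several times on one long line, gsub is the awk counterpart of sed's g flag. A minimal sketch on a made-up XML line:

```shell
line='<a Pattern="abc" other="x"/><b Pattern="abc"/>'

# gsub replaces every occurrence on the line, where sub stops after the first.
fixed=$(printf '%s\n' "$line" | awk '{ gsub(/Pattern="abc"/, "Pattern=\"def\"") } 1')
printf '%s\n' "$fixed"
```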

If you have bash, you can try this.
Create a file with long lines (>10,000 chars):
for ((i=0; i<2500; ++i)); do s="x$s"; done
l="${s}Pattern=\"abc\"$s"
for i in {1..5}; do echo "$l$l"; done >infile
The script (IFS= and -r keep read from trimming whitespace or interpreting backslashes):
while IFS= read -r x; do echo "${x//Pattern=\"abc\"/Pattern=\"def\"}"; done <infile
This replaces all occurrences of Pattern="abc" with Pattern="def" in each line.

We Keep Coding
