Delete all lines in file with specific regex using sed

We'd like to delete all lines that match the following "regex input" and write the result to a new file:
Hi|thisisatest|11
What we have:
check='([^[:space:]]+)|([^[:space:]]+)|([^[:space:]]+)'
sed '/$check/d' test.txt > test_new.txt
It currently does not work.
Edit:
We have the following test.txt:
Jack|Miles|44
Carl|13
Robert|Whittaker|87
John|2
Frank|65
We want to delete Jack|Miles|44 and Robert|Whittaker|87, which match the regex (if the regex is correct).

The correct BRE regex is:
check='[^[:space:]]*|[^[:space:]]*|[^[:space:]]*'
Then use it inside double quotes, so that the shell expands $check before sed sees it:
sed "/$check/d" file
Carl|13
John|2
Frank|65
By the way, awk can handle this even better without a regex. Just use | as the delimiter and keep only the lines that have exactly 2 fields:
awk -F '|' 'NF==2' file
Carl|13
John|2
Frank|65
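The original attempt also fails for a reason unrelated to the regex itself: inside single quotes the shell never expands $check, so sed looks for the literal text $check. A minimal sketch of the difference, assuming a scratch file created with mktemp (the variable names unexpanded and expanded are only for illustration):

```shell
#!/bin/sh
# Scratch copy of the sample data from the question.
tmp=$(mktemp)
printf '%s\n' 'Jack|Miles|44' 'Carl|13' 'Robert|Whittaker|87' 'John|2' 'Frank|65' > "$tmp"

check='[^[:space:]]*|[^[:space:]]*|[^[:space:]]*'

# Single quotes: sed receives the literal text $check, so no line is deleted.
unexpanded=$(sed '/$check/d' "$tmp")

# Double quotes: the shell substitutes the pattern before sed runs.
expanded=$(sed "/$check/d" "$tmp")

printf '%s\n' "$expanded"
rm -f "$tmp"
```

Double quotes are safe here because the pattern contains no / and no characters the shell would expand further; a pattern with a / in it would need a different sed delimiter.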

It is much simpler with awk:
awk -F'|' 'NF<=2' file
Carl|13
John|2
Frank|65
To write the result back to the same file:
awk -F'|' 'NF<=2' file > tmp && mv tmp file

With GNU sed:
sed -r '/\S+\|\S+\|\S+/d' file

Also with grep:
grep -P '^\w+\|\d+$' file >tmp
selects the "correct" entries from the file, e.g. word|digits,
or
grep -P '^[^|]+\|[^|]+$' file >tmp
Then rename tmp back to file.

Related

unix - pattern matching in file

I have a file with the following contents:
username=jsmith
api=3434kjklj23j4l3kj4l34j3l4j
I would like to extract "jsmith" and "3434kjklj23j4l3kj4l34j3l4j" using a regular expression.
I know the regular expressions for it are:
(username=)(.*) > \2
(api=)(.*) > \2
However, with grep, sed or awk I can't figure out how to use them without returning the entire line.
How would you do that with a command-line command?

awk is made for this task:
awk -F= '{print$2}' file
If the file has other entries, you can limit the output with a condition:
awk -F= '$1=="username"||$1=="api"{print$2}' file

Here is one using bash, PCRE and positive lookbehind (where supported):
$ grep -Po "((?<=^username=)|(?<=^api=)).*" file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. output everything that is preceded by username= or api= at the start of a line.
And one in awk:
$ awk 'sub(/^(username|api)=/,""){print}' file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e. print the lines from which a leading username= or api= has been removed.

Since you want to see chess for the input game=chess, here are some solutions that do not match username= or api= explicitly:
cut -d"=" -f2- file
# or
sed -n 's/[^=]*=//p' file
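These two differ from awk -F= '{print$2}' when the value itself contains an = sign. A small sketch of that difference, using a made-up line token=abc=def== that does not come from the question:

```shell
#!/bin/sh
# Hypothetical input line whose value contains "=" characters.
line='token=abc=def=='

# awk splits on every "=", so $2 is only the text up to the second "=".
with_awk=$(printf '%s\n' "$line" | awk -F= '{print $2}')

# cut -f2- and the sed substitution both keep everything after the first "=".
with_cut=$(printf '%s\n' "$line" | cut -d= -f2-)
with_sed=$(printf '%s\n' "$line" | sed -n 's/[^=]*=//p')

printf 'awk: %s\ncut: %s\nsed: %s\n' "$with_awk" "$with_cut" "$with_sed"
```

For values such as base64 strings or URLs with query parameters, the cut or sed form is therefore the safer choice.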

Here's the answer that worked on macOS and RHEL 7:
awk -F= '$1=="username"{print$2}' testfile.txt
awk -F= '$1=="api"{print$2}' testfile.txt
Contents of testfile.txt:
username=user1
api=pass1
username=user2
api =pass2
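Note that the last line of that sample has a space before the = sign, so $1=="api" compares "api " with "api" and skips it. If such lines should be matched as well, one possible sketch is to let the field separator absorb blanks around the = (this tolerance is an assumption about what is wanted, not something the answer states):

```shell
#!/bin/sh
# Scratch copy of the sample file, including the line with a space before "=".
cfg=$(mktemp)
printf '%s\n' 'username=user1' 'api=pass1' 'username=user2' 'api =pass2' > "$cfg"

# Exact comparison on a plain "=" separator: "api " (with the space) is not "api".
strict=$(awk -F= '$1 == "api" { print $2 }' "$cfg")

# Separator is "=" plus any spaces or tabs around it, so the key comes out as "api".
lenient=$(awk -F'[ \t]*=[ \t]*' '$1 == "api" { print $2 }' "$cfg")

printf 'strict:\n%s\nlenient:\n%s\n' "$strict" "$lenient"
rm -f "$cfg"
```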

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags that come straight after each <H2>City</H2> instance. The examples above are separate files, so I want to print Liverpool for the first one and Dublin for the second.
Looking at this thread, I tried:
sed -e 's/City\(.*\)\/P/\1/'
which I hope would get me half way there... but that just prints out the entire file. Any ideas?

awk to the rescue! You need multi-character RS support though (gawk has it):
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
Another approach:
$ awk 'c&&c--{gsub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
This finds the line after the City heading and strips the tags from it (gsub rather than sub, so that the closing tag is removed as well).

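Try the following regex:
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
sed has no support for PCRE constructs such as (?s), lazy quantifiers or lookarounds, so this needs a PCRE-capable tool, for example GNU grep. The -z option makes grep read the whole file as one record so that \n can match, and the match is then terminated by a NUL byte, which tr turns into a newline:
grep -Pzo '(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)' file | tr '\0' '\n'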

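I checked, and \s does not seem to work for the line break here. Use the newline character \n instead. Because sed reads one line at a time, append the next line to the pattern space with N first:
sed -n '/<H2>City<\/H2>/{N;s/.*\n<P>\(.*\)<\/P>.*/\1/p;}' file
There is no need to use lookbehind (as above); that is overkill.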

With sed, you can use the n command to read the next line after your pattern, then remove the tags to output your content. The braces restrict the substitution to the line that follows the heading:
sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' file

I think this should work on your Mac:
echo -e "<H2>City</H2>\n<P>Dublin</P>" |awk -F"[<>]" '/City/{getline;print $3}'
Dublin
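The same idea can be written without getline, which keeps it within plain POSIX awk. A minimal sketch, assuming the value sits on the line directly after the heading; the extra Country lines in the sample page are invented to show that only the City value is printed:

```shell
#!/bin/sh
# Scratch HTML fragment; the Country heading is a made-up addition for the demo.
page=$(mktemp)
printf '%s\n' '<HTML>' '<H2>City</H2>' '<P>Dublin</P>' '<H2>Country</H2>' '<P>Ireland</P>' > "$page"

# Once the heading has been seen, strip every tag from the next line, print it and stop.
city=$(awk '
    found { gsub(/<[^>]*>/, ""); print; exit }
    /<H2>City<\/H2>/ { found = 1 }
' "$page")

printf '%s\n' "$city"
rm -f "$page"
```

The exit makes awk stop reading after the first hit, which suits the case where the heading appears once near the top of each file.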

Deleting lines matching a pattern from a Unix file

I have a file containing strings of the following format:
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|DELETE|REDEFINES|VARIABLE.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
I want to use something like sed or awk to delete lines containing the word REDEFINES, but NOT if the word PIC is also on the line, and not if the line has no full stop at the end, as this means the string has been split over 2 lines. So out of the 4 lines (3 strings) above, I would only want to delete 05|DELETE|REDEFINES|VARIABLE.
I thought I might be able to use some kind of negation or lookahead, but these don't seem to be available or I can't get them to work.
Using awk, this deletes anything containing REDEFINES in a string following the pattern in the example above:
awk '!/[[:print:]]*\REDEFINES[[:print:]]*\./'
Similarly using sed:
sed '/[[:print:]]*|REDEFINES[[:print:]]*\./d'
I just can't work out how to extend it to do what I need. Is this possible in sed or awk or do I need another tool?
Any help greatly appreciated.

Using awk
awk -v RS= '!/REDEFINES/ || /PIC/' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
Using sed (with older input data):
sed -i.bak '/REDEFINES/{/PIC/!d;}' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.

You can try the command below. It prints a line if it contains PIC or if it does not contain REDEFINES. It is easy to maintain because it can be understood without much effort.
awk '/PIC/ || !/REDEFINES/' input.txt

Why don't you just use grep? Using negations on your question, here is what I understood:
keep the lines terminated with a full-stop, containing both REDEFINES and PIC.
So grep seems easy:
$ grep -E 'REDEFINES.*\.$' file | grep PIC
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
Hope this helps.

This might work for you (GNU sed):
sed -r '/REDEFINES/{/PIC|[^.]$/!d}' file
or perhaps more easily:
sed '/PIC/b;/REDEFINES.*\.$/d' file
or if you prefer:
sed '/PIC/!{/REDEFINES.*\.$/d}' file
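The three conditions from the question can also be spelled out one by one in awk, which may be easier to read back later. A sketch run against the four sample lines, using a scratch file and a variable named kept purely for the demo:

```shell
#!/bin/sh
# Scratch copy of the four sample lines from the question.
src=$(mktemp)
printf '%s\n' \
    '05|KEEP|REDEFINES|NO_TYPE|PIC|9.' \
    '05|DELETE|REDEFINES|VARIABLE.' \
    '05|KEEP2|REDEFINES|VARIABLE2' \
    '|PIC|9(5).' > "$src"

# Drop a line only if it has REDEFINES, has no PIC, and ends with a full stop.
kept=$(awk '/REDEFINES/ && !/PIC/ && /\.$/ { next } { print }' "$src")

printf '%s\n' "$kept"
rm -f "$src"
```

The split string survives because its first half does not end with a full stop and its second half does not contain REDEFINES.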

Removing lines from a file that don't match a pattern using sed

I want to remove all the lines from a file that don't have the form:
something.something,something,something
For example if the file was the following:
A sentence, some words
ABCD.CP3,GHD,HDID
Hello. How are you?
A.B,C,D
dbibb.yes,whoami,words
I would be left with:
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words
I have tried to branch to the end of the sed script if I match the pattern I don't want to delete but continue and delete the line if it doesn't match:
cp $file{,.tmp}
sed "/^.+\..+,.+,.+$/b; /.+/d" "$file.tmp" > $file
rm "$file.tmp"
but this doesn't seem to have any effect at all.
I suppose I could read the file line by line, check if matches the pattern, and output it to a file if it does, but I'd like to do it using sed or similar.

You can use grep successfully:
grep -E '^[^.]+\.[^,]+,[^,]+,[^,]+$' file > temp
mv temp file

grep -E '^[^.]+\.[^.]+(,[^,]+){2}$'

Instead of deleting the lines that don't satisfy the pattern, you could print the lines that match the something.something,something,something pattern.
With sed:
$ sed -n '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/p' file
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words
Use the in-place edit option -i[suffix] to save the changes:
sed -ni.bak '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/p' file
Note: -i[suffix] makes a backup if a suffix is provided.
With awk:
$ awk '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/{print}' file
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words
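As for why the attempt in the question has no effect: in a basic regular expression a plain + is an ordinary character, so .+ matches a character followed by a literal plus sign. With -E (extended regular expressions) the + means "one or more", and the delete can simply be negated with !d. A sketch on the sample data, with a slightly stricter pattern than the question's (no dots or commas inside a field, which is an assumption about the format):

```shell
#!/bin/sh
# Scratch copy of the sample file from the question.
file=$(mktemp)
printf '%s\n' \
    'A sentence, some words' \
    'ABCD.CP3,GHD,HDID' \
    'Hello. How are you?' \
    'A.B,C,D' \
    'dbibb.yes,whoami,words' > "$file"

# BRE, as in the question: "+" is literal, so nothing matches and nothing is kept.
bre=$(sed '/^.+\..+,.+,.+$/!d' "$file")

# ERE: delete every line that does NOT have the form a.b,c,d
ere=$(sed -E '/^[^.,]+\.[^.,]+,[^.,]+,[^.,]+$/!d' "$file")

printf '%s\n' "$ere"
rm -f "$file"
```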

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How can I do this in a Bash shell script?

Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash (IFS= and -r keep leading whitespace and backslashes intact):
while IFS= read -r FOO; do
    printf '"%s"\n' "$FOO"
done < inputfile
where inputfile would be a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).

I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
xargs takes standard input (redirected from your file), passes one line at a time to the {lin} placeholder, and then executes the command that follows, in this case an echo with escaped double quotes.
With GNU xargs you can use the -i option to omit the name of the placeholder (this option is deprecated in favor of -I):
xargs -i echo \"{}\" < your_filename
In both cases, your IFS must be at its default value or at least contain '\n'.

This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile

Use sed (GNU sed, since \| is a GNU extension in basic regular expressions):
sed -e 's/^\|$/"/g' file
More effort is needed if the file contains empty lines.

I think sed and awk are the best solutions, but if you want to use just the shell, here is a small script for you.
#!/bin/bash
chr="\""
file="file.txt"
# Keep a backup, then rewrite the file with every line wrapped in quotes.
cp "$file" "$file._backup"
while IFS= read -r line
do
    printf '%s\n' "${chr}${line}${chr}"
done < "$file" > newfile
mv newfile "$file"

paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file
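To see the escaping in action, here is a small sketch that combines it with dropping empty lines. The sample lines containing a quote and a backslash are invented for the demo, and deleting empty lines rather than quoting them is an assumption about the desired output:

```shell
#!/bin/sh
# Scratch input: a plain line, an empty line, a line with quotes, a line with a backslash.
input=$(mktemp)
printf '%s\n' 'abcdefg' '' 'say "hi"' 'back\slash' > "$input"

# 1) drop empty lines, 2) put a backslash before each " and \, 3) wrap the line in quotes.
quoted=$(sed -e '/^$/d' -e 's/["\]/\\&/g' -e 's/.*/"&"/' "$input")

printf '%s\n' "$quoted"
rm -f "$input"
```

The order matters: escaping first and wrapping second keeps the outer quotes from being escaped themselves.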

This answer worked for me in the Mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
Note that the text in double quotes and commas was printed to the terminal; the file itself was unaffected.

I used sed with two expressions to replace the start and the end of the line, since in my particular use case I wanted to place HTML tags around only those lines that contained particular words.
So I searched the text file inputfile for lines containing the words held in the bla variable and replaced the beginning with <P> and the end with </P> (I actually did some longer HTML tagging in the real thing, but this will serve fine as an example).
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$
