I have the following data:
https://link1.com
asndiaiusdias Rye ioajsidsauihduiashd
link1.com/image.jpg
$89.99

https://link2.com
8iqiwudhuiqhwdqwuidhuiqhwi Rye iqwdihqwuidhuiqwhduihqwi
https://link2.com/image.jpg
$22.99

https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99

https://link4.com
iashduhuasdi rye huiqwheui
https://link4.com/image.jpg
$232.99
My goal is to match "Rye" case-insensitively (so also rye, RYe or rYe) and delete the 1 line before the match and the 3 lines after it,
so result should be:
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
You can use sed, grep or awk; it does not have to be sed only, it just needs to work.
You may use this awk with an empty RS:
awk -v RS= '$3 !~ /^[rR][yY][eE]$/' file
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
$ awk -v RS= 'tolower($3) != "rye"' file
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
Or, if the output can contain multiple blocks of text and you want each of them separated by a blank line:
$ awk -v RS= -v ORS='\n\n' 'tolower($3) != "rye"' file
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
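The awk commands above test only the third field, which works for the sample because the keyword happens to be the third whitespace-separated word of each block. A minimal sketch, assuming the keyword may appear as any word of a blank-line-separated block (the function name drop_paragraphs_with_word is made up for illustration):

```shell
# Hypothetical helper: print every blank-line-separated block of file $2
# in which no whitespace-separated word equals $1 (compared case-insensitively).
drop_paragraphs_with_word() {
  awk -v RS= -v ORS='\n\n' -v word="$1" '
    {
      for (i = 1; i <= NF; i++)
        if (tolower($i) == tolower(word)) next   # keyword found: skip this block
      print                                      # no keyword: keep the block
    }
  ' "$2"
}

# Usage: drop_paragraphs_with_word rye file
```

Like the commands above, this relies on paragraph mode (an empty RS), so it only makes sense when the records really are separated by blank lines.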
Every other answer assumes that "1 line before and 3 after" actually means paragraphs:
$ perl -00 -ne 'print if !/\Wrye\W/i' input.txt
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
-00 enables paragraph mode
-n doesn't print records by default
print if !/\Wrye\W/i - prints a paragraph unless it matches
However, if 1 line before and 3 after needs to be taken literally:
$ perl -0777 -pe 's/.*\n.*\Wrye\W.*\n(.*\n){3}//ig' input.txt
https://link3.com
8iqiwudhuiqhwdqwuidhuiqhwi SOMETHING ELSE iqwdihqwuidhuiqwhduihqwi
https://link3.com/image.jpg
$42.99
-0777 read the entire file
-p print
.*\n - match a line including the end of line (note that without /s . doesn't match \n)
Note: somebody raised the DOS compatibility issue in a comment. The "." matches any character except newline, which includes \r, so .*\n also covers DOS line endings.
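For the literal reading, the same result can be had without slurping the whole file into one string. A rough sketch in awk, assuming the file can be read twice (so it will not work on a pipe) and that the keyword is a whitespace-separated word; the function name drop_around and its argument order are invented for this example:

```shell
# Hypothetical helper: delete each line containing the word $1 (any case),
# plus $2 lines before it and $3 lines after it, from file $4.
drop_around() {
  awk -v word="$1" -v before="$2" -v after="$3" '
    NR == FNR {                      # first pass: collect line numbers to drop
      for (f = 1; f <= NF; f++)
        if (tolower($f) == tolower(word)) {
          for (i = FNR - before; i <= FNR + after; i++) skip[i]
          break
        }
      next
    }
    !(FNR in skip)                   # second pass: print everything else
  ' "$4" "$4"
}

# Usage: drop_around rye 1 3 input.txt
```

The file name is passed twice on purpose: the first read only records which line numbers to drop, the second read prints the lines that were not recorded.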
Alternatively, you can use Perl for a job like this:
$ perl -i -pe 'BEGIN{undef $/;} s/.*?\n.*rye.*?\n(^.*?\n){3}//mig' input.txt
$ sed -e "/${exclude}/I,+2d" -i /path/to/file
With that in place, I easily managed to delete the line before the match as well.
Related
I need to print [PR:XXXXX] only.
Ex: [Test][PR:John][Finished][Reviewer:SE] to [PR:John] only. (PR tag)
Note: The strings other than [PR:XXXXX] may change from time to time.
Ex:
[Test][PR:Cook][Completed]
[Test][Finished][PR:Russell][Reviewer:SE]
[Dump][Reviewer:SE][Complete][PR:Arnold]
Note: There are no multi-line inputs, and each input contains exactly one PR tag.
So far I have come up with the following sed command, but it did not work:
sed "s/\[PR:[^]]*\]//"
You might use bash for this:
s='[Test][PR:Cook][Completed]'
regex='\[PR:[^]]*]'
[[ "$s" =~ $regex ]] && echo "${BASH_REMATCH[0]}"
# => [PR:Cook]
You may use grep:
grep -o '\[PR:[^]]*]'
Or, you can use this sed:
sed -n 's/.*\(\[PR:[^]]*]\).*/\1/p'
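The sed attempt in the question fails because s/…// replaces the matched tag with nothing, so it deletes the tag and keeps the rest of the line. A small sketch contrasting the two behaviours (the function names strip_pr and keep_pr are only for illustration):

```shell
# The command from the question: removes the tag, keeps everything else.
strip_pr() { sed 's/\[PR:[^]]*\]//'; }

# Capture the tag and replace the whole line with it; -n plus the p flag
# prints only the lines on which the substitution happened.
keep_pr() { sed -n 's/.*\(\[PR:[^]]*]\).*/\1/p'; }

printf '%s\n' '[Test][PR:John][Finished][Reviewer:SE]' | strip_pr
# prints: [Test][Finished][Reviewer:SE]
printf '%s\n' '[Test][PR:John][Finished][Reviewer:SE]' | keep_pr
# prints: [PR:John]
```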
Or, you can use awk
awk 'match($0,/\[PR:[^]]*]/) {print substr($0,RSTART,RLENGTH)}'
If a single line can contain more than one [PR:...] occurrence to be printed, try the following.
awk '{while(match($0,/\[PR:[^]]*\]/)){print substr($0,RSTART,RLENGTH);$0=substr($0,RSTART+RLENGTH)}}' Input_file
A simple explanation: awk's match function finds the next [PR:...] block on the line, the block is printed, and the line is then cut down to the text after that match; the loop repeats until no occurrence is left on the line.
If you only have 1 such field max per line and want a blank line printed if no such field exists on the line then using GNU awk for FPAT:
$ awk -v FPAT='[[]PR:[^]]+]' '{print $1}' file
[PR:Cook]
[PR:Russell]
[PR:Arnold]
If you can have 0 to N such fields per line, e.g.:
$ cat file
[Test][PR:Cook][Completed]
[Test][Finished][PR:Russell][Reviewer:SE]
[Test][Finished][Reviewer:SE]
[Test][Finished][PR:Jack][PR:Russell][Reviewer:SE]
[Dump][Reviewer:SE][Complete][PR:Arnold]
then here's some of the options depending on your requirements:
$ awk -v FPAT='[[]PR:[^][]+]' '{print $1}' file
[PR:Cook]
[PR:Russell]

[PR:Jack]
[PR:Arnold]
$ awk -v FPAT='[[]PR:[^][]+]' 'NF{print $1}' file
[PR:Cook]
[PR:Russell]
[PR:Jack]
[PR:Arnold]
$ awk -v FPAT='[[]PR:[^][]+]' '{for (i=1; i<=NF; i++) print $i}' file
[PR:Cook]
[PR:Russell]
[PR:Jack]
[PR:Russell]
[PR:Arnold]
$ awk -v FPAT='[[]PR:[^][]+]' '{$1=$1} 1' file
[PR:Cook]
[PR:Russell]

[PR:Jack] [PR:Russell]
[PR:Arnold]
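Use this perl command line, which removes every bracketed tag that does not start with PR: (the negative lookahead avoids keeping unrelated tags that merely begin with P, or have R or : in the second or third position):
perl -pe 's/\[(?!PR:)[^]]*\]//g' your_file
Test Below:
$ echo "[Test][Finished][PR:Russell][Reviewer:SE][PR:Rachel]"|perl -pe 's/\[(?!PR:)[^]]*\]//g'
[PR:Russell][PR:Rachel]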
Another perl:
perl -lne 'print join "", grep {/^\[PR:/} /\[.+?\]/g' file
This will accommodate multiple PR tags on one line.
Here is a shorter gnu-awk solution (using the same input file as in Ed's answer):
awk -v RS='\\[PR:[^]]+]' 'RT {print RT}' file
[PR:Cook]
[PR:Russell]
[PR:Jack]
[PR:Russell]
[PR:Arnold]
Here is a Ruby one-liner:
ruby -lne 'puts $_.scan(/\[PR.+?\]/).join("")' file
This accommodates multiple PR tags per line.
I need to match a pattern in a file AND print the following 2 lines. I am using grep -A2 for this.
But I want to ignore some lines from this first grep.
I need the output from the first 'grep -A2' to do some further processing on so piping to grep -v won't help me as far as I understand.
$ cat file.txt
stringA-hurdygurdy-andmorechars
line1
line2
stringA-hurdygurdy-stringB-andmorechars
line1
line2
stringA-hurdygurdy-andmorechars
line1
line2
I need to grep -A2 all the lines that have "stringA-hurdygurdy" but not the ones that contain stringB.
I'm trying
grep -A2 ^stringA.*[^stringB].* file.txt
You can do it using awk:
awk '/stringA/ && !/stringB/ {n = NR+2} n >= NR' file.txt
stringA-hurdygurdy-andmorechars
line1
line2
stringA-hurdygurdy-andmorechars
line1
line2
Could you please try the following, written and tested with the shown samples in GNU awk. The variable named lines holds how many lines to print after the matched pattern.
awk -v lines="2" '
/^stringA/ && !/stringB/{
count=0
found=1
print
next
}
found && ++count<=lines
' Input_file
Explanation: Adding detailed explanation for above.
awk -v lines="2" ' ##Starting awk program from here and setting lines variable value to 2.
/^stringA/ && !/stringB/{ ##Checking condition if line starts with stringA and DOES NOT contain stringB then do following.
count=0 ##Setting count variable to 0 here.
found=1 ##Setting found variable to 1 here.
print ##Printing current line here.
next ##next will skip all further statements for this line.
}
found && ++count<=lines ##Checking condition if found is SET and count (after increasing it by 1) is less than or equal to lines then print that line.
' Input_file ##Mentioning Input_file name here.
With grep -P you need a negative lookahead:
^stringA(?!.*stringB).*$[\r\n]+.*[\r\n]+.*
Use this Perl one-liner (similar to the awk solution from anubhava):
perl -lne '$line_num = $. if /stringA/ && !/stringB/; print if $line_num && $. <= ( $line_num + 2 );' file.txt
Output:
stringA-hurdygurdy-andmorechars
line1
line2
stringA-hurdygurdy-andmorechars
line1
line2
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing. Its use in this particular case as posted by the OP is optional.
$. : current input line number.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
I don't know the grep syntax but here is the regex which worked for me (.NET regex engine):
^.*$(?<=stringA.*)(?<!stringB.*)
You capture a complete line, then apply two lookbehinds to it.
I had to enable the multiline option /m.
Test it in .NET:
Regex newRegex = new Regex(
@"^.*$(?<=stringA.*)(?<!stringB.*)",
RegexOptions.Multiline
);
I'm totally a regular expression newbie and I think the problem of my code lies in the regular expression I use in match function of awk.
#!/bin/bash
...
line=$(sed -n '167p' models.html)
echo "line: $line"
cc=$(awk -v regex="[0-9]" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH+1); print pattern_match}')
echo "cc: $cc"
The result is:
line: <td><center>0.97</center></td>
cc:
In fact, I want to extract the numerical value 0.97 into variable cc.
You need to pass your shell variable $line to awk, otherwise it cannot be used within the script.
Alternatively, you can just read the file using awk (no need to involve sed at all).
If you want to match the . as well as the digits, you'll have to add that to your regular expression.
Try something like this:
cc=$(awk 'NR == 167 && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH) }' models.html)
Three things:
You need to pass the value of line into awk with -v:
awk -v line="$line" ...
Your regular expression only matches a single digit. To match a float, you want something like
[0-9]+\.[0-9]+
No need to add 1 to the match length for the substring
substr(line, RSTART, RLENGTH)
Putting it all together:
line='<td><center>0.97</center></td>'
echo "line: $line"
cc=$(awk -v line="$line" -v regex="[0-9]+[.][0-9]+" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH); print pattern_match}')
echo "cc: $cc"
Result:
line: <td><center>0.97</center></td>
cc: 0.97
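To see why RLENGTH must not be incremented, it can help to print what match() actually sets. A small sketch (the function name show_match is hypothetical) that reads lines on stdin and prints RSTART and RLENGTH, followed by the substring taken with RLENGTH and with RLENGTH+1:

```shell
# Print RSTART and RLENGTH for the first float on each input line, then the
# substring taken with RLENGTH and with RLENGTH + 1 for comparison.
show_match() {
  awk 'match($0, /[0-9]+[.][0-9]+/) {
    print RSTART, RLENGTH
    print substr($0, RSTART, RLENGTH)
    print substr($0, RSTART, RLENGTH + 1)
  }'
}

printf '%s\n' '<td><center>0.97</center></td>' | show_match
# prints:
# 13 4
# 0.97
# 0.97<
```

The extra character in the last line is the "<" that follows the number, which is what the +1 in the original attempt would have picked up.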
The basic idea is this. Suppose that you want to search a file for multiple patterns from a pipe with awk :
... | awk -f - '{...}' someFile.txt
* '...' is just short for some code
* '-f -' indicates the pattern is taken from pipe
Is there a way to know which pattern is being searched for at each instant within the awk script
(just as $1is the first field, is there something like $PATTERN that contains the current pattern
being searched for, or a way to get something like it)?
More Elaboration:
if I have 2 files:
someFile.txt containing:
1
2
4
patterns.txt containing:
1
2
3
4
running this command:
cat patterns.txt |awk -f - '{...}' someFile.txt
What should I type between the braces such that only the pattern in patterns.txt that
has not been matched in someFile.txt is printed?(in this case the number 3 in patterns.txt is not matched)
Under the requirements that patterns.txt be supplied as stdin and that the processing be done with awk:
$ cat patterns.txt | awk 'FNR==NR{p=p "\n" $0;next;} p !~ $0' someFile.txt -
3
This was tested using GNU awk.
Explanation
We want to remove from patterns.txt anything that matches a line in someFile.txt. To do this, we first read in someFile.txt and create patterns from it. Next, we print only the lines from patterns.txt that do not match any of the patterns from someFile.txt.
FNR==NR{p=p "\n" $0;next;}
NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: someFile.txt. We save all such lines in the newline-separated variable p. We then tell awk to skip the remaining commands and jump to the next line.
p !~ $0
If we got here, then we are now reading the second named file on the command line, which is - for stdin. This boolean condition evaluates to either true or false. If it is true, the line is printed. If not, it is skipped. In other words, the above is awk's cryptic shorthand for:
p !~ $0 {print $0}
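Note that p !~ $0 uses each line from stdin as a regular expression against the whole of someFile.txt, so a pattern such as 1 would also count as matched by a line containing 10. A minimal sketch, assuming the patterns are meant as literal whole lines (the function name unmatched_patterns is made up here):

```shell
# Hypothetical helper: read candidate lines on stdin and print those that do
# not appear verbatim as a line of file $1. Assumes $1 is not empty, because
# NR == FNR is used to recognise the first input.
unmatched_patterns() {
  awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$1" -
}

# Usage: cat patterns.txt | unmatched_patterns someFile.txt
```

Whether exact matching or regex matching is the right choice depends on what patterns.txt is supposed to hold; the question's sample data gives the same result (3) either way.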
cmd | awk 'NR==FNR{pats[$0]; next} {for (p in pats) if ($0 ~ p) delete pats[p]} END{ for (p in pats) print p }' - someFile.txt
Another way in awk
cat patterns.txt | awk 'NR>FNR&&!($0 in a);{a[$0]}' someFile.txt -
I am using an awk command (someawkcommand) that prints these lines (awkoutput):
>Genome1
ATGCAAAAG
CAATAA
and then I want to use this output (awkoutput) as the input of a sed command. Something like this:
someawkcommand | sed 's/awkoutput//g' file1.txt > results.txt
file1.txt:
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The final objective is to delete all lines in a file (file1.txt) containing the exact pattern found previously by awk.
The file results.txt contains (output of sed):
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
How should I write the sed command? Is there any simple way that sed will recognize the output of awk as its input?
Using GNU awk for multi-char RS:
$ cat file1
>Genome1
ATGCAAAAG
CAATAA
$ cat file2
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
$ gawk -v RS='^$' -v ORS= 'NR==FNR{rmv=$0;next} {sub(rmv,"")} 1' file1 file2
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
Things that might be non-obvious to newcomers but are very common awk idioms:
-v RS='^$' tells awk to read the whole file as one string (instead of its default of one line at a time).
-v ORS= sets the Output Record Separator to the null string (instead of its default newline) so that when the file is printed as a string awk doesn't add a newline after it.
NR==FNR is a condition that is only true for the first input file.
1 is a true condition invoking the default action of printing the current record.
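One thing to keep in mind is that sub(rmv,"") treats the contents of the first file as a regular expression, so characters such as . or * in it would not be taken literally. A rough sketch of a literal-string variant that needs only POSIX awk, using index() instead of a regex (the function name remove_block is invented, and the first file is assumed to be non-empty):

```shell
# Hypothetical helper: remove every literal occurrence of the text in file $1
# (including its line breaks) from file $2 and print the result.
remove_block() {
  awk '
    NR == FNR { block = block $0 "\n"; next }   # first file: text to remove
    { text = text $0 "\n" }                      # second file: collect it all
    END {
      while (block != "" && (pos = index(text, block)) > 0)
        text = substr(text, 1, pos - 1) substr(text, pos + length(block))
      printf "%s", text
    }
  ' "$1" "$2"
}

# Usage: remove_block file1 file2
```

Because index() looks for the block anywhere in the text, an occurrence that starts in the middle of a line would also be removed; that is acceptable for data like the sample but is an assumption worth checking.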
Here is a possible sed solution:
someawkcommand | sed -n 's_.*_/&/d;_;H;${x;s_\n__g p}' | sed -f - file1.txt
First sed command turns output from someawkcommand into a sed expression.
Concretely, it turns
>Genome1
ATGCAAAAG
CAATAA
into:
/>Genome1/d;/ATGCAAAAG/d;/CAATAA/d;
(in sed language: delete lines containing those patterns; mind that you will have to escape /,[,],*,^,$ in your awk output if there are some, with another substitution for instance).
Second sed command reads it as input expression (-f - reads sed commands from file -, i.e. gets it from pipe) and applies to file file1.txt.
Remark for other readers:
OP wants to use sed, but as noted in the comments, it may not be the easiest way to solve this question. Deleting lines with awk could be simpler. Another (easy) solution could be to use grep with the -v (invert match) and -f (read patterns from file) options, in this way:
someawkcommand | grep -v -f - file1.txt
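With plain -f, each line of the awk output is used as a regular expression and matches anywhere in a line, so a pattern such as CAA would also delete a line like CAATAA. A short sketch, assuming the goal is to drop only lines that are exactly equal to one of the awk output lines (the function name drop_exact_lines is hypothetical):

```shell
# Hypothetical helper: print the lines of file $1 that are not exactly equal
# to any line read from stdin. -F takes the patterns as fixed strings and
# -x requires a match to cover the whole line.
drop_exact_lines() {
  grep -v -x -F -f - "$1"
}

# Usage: someawkcommand | drop_exact_lines file1.txt
```

This still removes matching lines wherever they occur in the file, not only as one contiguous block.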
Edit: Following @rici's comments, here is a new command that takes the output from awk as a single multiline pattern.
Disclaimer: It gets dirty. Kids, don't do this at home. Grown-ups are strongly encouraged to consider avoiding sed for that.
someawkcommand | \
sed -n 'H;${x;s_\n__;s_\n_\\n_g;s_.*_H;${x;s/\\n//;s/&//g p}_ p}' | \
sed -n -f - file1.txt
Output from inner sed is:
H;${x;s/\n//;s/>Genome1\nATGCAAAAG\nCAATAA//g p}
Additional drawback: it will add an empty line instead of removed pattern. Can't fix it easily (problems if pattern is at beginning/end of file). Add a substitution to remove it if you really feel like it.
This can more easily be done in awk, but the usual "eliminate duplicates" code is not correct. As I understand the question, the goal is to remove entire stanzas from the file.
Here's a possible solution which assumes that the first awk script outputs a single stanza:
awk 'NR == FNR {stanza[nstanza++] = $0; next}
$0 == stanza[i] {++i; next}
/^>/ && i == nstanza {i=0; next}
i {for (j=0; j<i; ++j) print stanza[j]; i=0}
{print $0;}
' <(someawkcommand) file1.txt
This might work for you (GNU sed):
sed '1{h;s/.*/:a;$!{N;ba}/p;d};/^>/!{H;$!d};x;s/\n/\\n/g;s|.*|s/&\\n*//g|p;$s|.*|s/\\n*$//|p;x;h;d' file1 |
sed -f - file2
This builds a script from file1 and then runs it against file2.
The script slurps in file2 and then does global substitution(s) using the contents of file1. Finally it removes any blank lines at the end of the file caused by the deletion of that content.
To see the script produced from file1, remove the pipe and the second sed command.
An alternative way would be to use diff and sed:
diff -e file2 file1 | sed 's/d/p/g' | sed -nf - file2