Regex/Perl to match blocks of text that contain a string - regex

So I have a log file that looks something like this:
EVENT-header
apple
orange
peach
blueberry
EVENT-header
bike
car
blueberry
EVENT-header
reddit
hacker news
stack overflow
slashdot?
voat
What I am trying to do is extract the blocks of text (from EVENT-header to the two newlines before the next EVENT-header) that contain the word "peach".
I think this is a problem that regex would solve, but I am having trouble writing a regex that does this. Here's what I have come up with so far:
's/EVENT-header((?!\n\n).)+peach((?!\n\n).)+\n\n/&/p'
I'm not an expert at this. Is there an easy way to do it using regex/perl?

You can do this easily using paragraph mode, which makes perl read blocks of text delimited by blank lines:
perl -00 -ne'print if /peach/' logfile.log
If you prefer a full program file, it looks like this:
use strict;
use warnings;
open my $fh, '<', 'logfile.log' or die $!;
{
local $/ = '';
while ( <$fh> ) {
print if /peach/;
}
}
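To see paragraph mode without touching a real file, here is a minimal sketch that reads from an in-memory filehandle. The sample data and the matching_blocks helper are made up for illustration, and the sketch assumes the blocks really are separated by a blank line.

```perl
use strict;
use warnings;

# Hypothetical sample data: blocks separated by one blank line.
my $log = "EVENT-header\napple\npeach\n\nEVENT-header\nbike\ncar\n\nEVENT-header\nreddit\n";

# Return every blank-line-delimited block of $text that matches $pattern.
sub matching_blocks {
    my ($text, $pattern) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode: each read returns one block
    my @blocks = grep { /$pattern/ } <$fh>;
    close $fh;
    return @blocks;
}

print for matching_blocks($log, qr/peach/);
```

Because $/ is localized inside the sub, the rest of the program keeps reading line by line.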

EVENT-header\n[\s\S]*?(?=(?:\n\nEVENT-header|$))
You can use this. See the demo:
https://regex101.com/r/hR7tH4/3

There are various ways to do this, with multiline regex match being a good candidate. If the data file is as regular as it appears, specifically with each "record" separated by the marker 'EVENT-header', then you can also use the trick of setting $/ (aka $RS aka $INPUT_RECORD_SEPARATOR) to be this marker and then slurping the file into an array. You'll get an array entry for each record in the file, and then it's trivial to loop over the array, select the elements that match 'peach', and print out the entire containing record.
For example:
#!/usr/bin/perl -w
use strict;
$/='EVENT-header';
my (@entries, $entry);
my $infile = 'data.txt';
open(IN, "<$infile") or die "Aaargh: $^E\n";
@entries = <IN>;
chomp @entries;
close(IN);
foreach $entry (@entries)
{
if ($entry =~ m/peach/)
{
print "matching entry: $entry\n";
}
}

Borodin has already given the best solution to your question, but here is the code in case you don't want to use a one-liner:
#!/usr/bin/perl
use warnings;
use strict;
local $/ = ""; #to enable paragraph mode
open my $fh, "<", "input.log" or die "Unable to open file: $!";
while (my $line = <$fh>)
{
chomp $line;
if ($line =~ m/peach/)
{
print $line, "\n";
}
}
Output:
EVENT-header
apple
orange
peach
blueberry

Related

Trying to input a formatted text file and get rid of all white spaces in addition to getting every word on a separate line in perl

#!/usr/bin/perl
use strict;
use warnings;
open (my $inFile, '<', 'electricity.txt') or die $!;
while (<$inFile>) {
chomp $_;
$_ = split(/\s+/);
print $_;
}
close ($inFile);
When I run this, I get a bunch of numeric values as the output. I'm not sure why.
I ended up doing it this way...
#!/usr/bin/perl
use strict;
use warnings;
open (my $inFile, '<', 'electricity.txt') or die $!;
while (my $lines = <$inFile>) {
chomp $lines;
$lines =~ s/\s+/\n/g;
print $lines;
foreach my $line (split /\s+/, $lines) {
$count{$line}++;
}
}
close ($inFile);
Perhaps you intended to write the following:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
open my $inFile, '<', 'electricity.txt'
or die $!;
while( <$inFile> ) {
chomp;
say for split;
}
close ($inFile);
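NOTE: avoid assigning to $_ here. In list context split returns the list of fields, but in scalar context (as in $_ = split ...) it returns the number of fields, which is why you saw numbers.
NOTE: split with no arguments is the same as split ' ', which behaves like split /\s+/ except that leading whitespace is skipped instead of producing an empty first field.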
TutorialPoint: Perl - Special Variables
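The two notes above can be checked with a short sketch. The sample line is invented, and the scalar-context behaviour shown assumes a reasonably modern perl (older releases split into @_ instead).

```perl
use strict;
use warnings;

my $line = "  kilowatt hour   meter ";

# List context: split returns the fields themselves.
my @words = split ' ', $line;

# Scalar context: split returns how many fields it found.
my $count = split ' ', $line;

# An explicit /\s+/ keeps an empty leading field when the line
# starts with whitespace; the special ' ' pattern drops it.
my @with_regex = split /\s+/, $line;

print "words: @words\n";
print "count: $count\n";
print "fields with /\\s+/: ", scalar(@with_regex), "\n";
```

This is why printing the result of a scalar assignment from split shows a number per line rather than the words.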
In your answer code you forgot to declare the %count hash variable, so under use strict you will get a compilation error when you run it. Here is my way of getting the result asked for in your question.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %count = ();
my @text = split /\s+/, do { local $/; <DATA> };
$count{$_}++ for(@text);
print Dumper \%count;
print join "\n", @text;
I get the output from the above code. My question is: how can I merge the two lines of code into a single line? Please advise. I tried but couldn't get the output.
__DATA__
We study six natural decompositions of mixed states in one spatial dimension:

How to use Regex in a While If statement? Perl

I'm new to programming and I've run into an issue. We have to use Perl to write a script that opens a file, then loops through each line using a Regex - then print out the results. The opening of the file and the loop I have, but I can't figure out how to implement the Regex. It outputs 0 matched results, when the assignment outline suggests the number to be 338. If I don't use the Regex, it outputs 2987, which is the total number of lines - which is correct. So there's something incorrect with the Regex I just can't figure out. Any help would be greatly appreciated!
Here's what I have thus far:
use warnings;
use strict;
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open (fh, '<', $filename) or die $!;
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
close(fh);
print $i;
Consider this piece of code of yours:
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
You are indeed looping through the file lines, but you keep checking if the file name matches your regex. This is clearly not what you intend.
You meant:
while (my $line = <fh>) {
if ($line =~ /sshd/){
$i++;
}
}
Parentheses around the regex seem superfluous (they are meant to capture, while you are only matching).
Since expression while (<fh>) assigns the content of the line to special variable $_ (which is the default argument for regexp matching), this can be shortened as:
while (<fh>) {
$i++ if /sshd/;
}
OP's code has some errors, which I've corrected:
use warnings;
use strict;
use feature 'say';
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open my $fh, '<', $filename
or die "Couldn't open $filename";
map{ $i++ if /sshd/ } <$fh>;
close($fh);
say "Found: $i";

Issue with Perl Regex

new perl coder here.
When I copy and paste the text from a website into a text file and read from that file, my perl script works with no issues. When I use getstore to create a file from the website automatically which is what I want, the output is a bunch of |'s.
The text looks identical whether I copy and paste it or download it with getstore. I'm unable to figure out the problem. Any help would be highly appreciated.
The output that I desire is as follows:
|www\.arkinsoftware\.in|www\.askmeaboutrotary\.com|www\.assculturaleincontri\.it|www\.asu\.msmu\.ru|www\.atousoft\.com|www\.aucoeurdelanature\.
Here is the code I am using:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore("http://www.malwaredomainlist.com/hostslist/hosts.txt", "malhosts.txt");
open(my $input, "<", "malhosts.txt");
while (my $line = <$input>) {
chomp $line;
$line =~ s/.*\s+//;
$line =~ s/\./\\\./g;
print "$line\|";
}
The bunch of | characters you get comes from the comment lines at the beginning, which don't fit your pattern. So the solution is to ignore all "unfitting" lines.
So instead of
$line =~ s/.*\s+//;
use
next unless $line =~ s/^127.*\s+//;
so you would ignore every line except those starting with 127.
Here's what I'd do:
my $first = 1;
while (<$input>) {
/^127\.0\.0\.1\s+(.+?)\s*$/ or next;
print '|' if !$first;
$first = 0;
print quotemeta($1);
}
This matches your input in a more precise way, and quotemeta takes care of true regex escaping.
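As a rough illustration of the same idea on data that doesn't need a download, here is a sketch with a few invented hosts-file lines. The host_alternation helper and the sample host names are assumptions for the example, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical hosts-file lines; the real file is fetched in the question.
my @lines = (
    "# comment line\n",
    "127.0.0.1  www.example.com\n",
    "127.0.0.1  sub-domain.example.org\r\n",
);

# Build an alternation of escaped host names from hosts-file lines.
sub host_alternation {
    my @hosts;
    for (@_) {
        # Skip comments and anything else not starting with 127.0.0.1.
        next unless /^127\.0\.0\.1\s+(.+?)\s*$/;
        push @hosts, quotemeta $1;
    }
    return join '|', @hosts;
}

my $pattern = host_alternation(@lines);
print "$pattern\n";
```

Since the trailing \s* also swallows a carriage return, lines with Windows line endings (which a downloaded file may have) are handled the same way as pasted text.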
I'd probably go with something like:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
getstore( "http://www.malwaredomainlist.com/hostslist/hosts.txt",
"malhosts.txt" );
open( my $input, "<", "malhosts.txt" );
print join ( "|",
map { m/^\d/ && ! m/localhost/ ?
quotemeta ((split)[1]) : () } <$input> );
Gives:
0koryu0\.easter\.ne\.jp|1\-atraffickim\.tf|10\-trafficimj\.tf|109\-204\-26\-16\.netconnexion\.managedbroadband\.co\.uk|11\-atraasikim\.tf|11\.lamarianella\.info|12\-tgaffickvcmb\.tf| #etc.

Perl: unable to get the correct match from the file

I need help with my script. I am writing a script that will check if the username is still existing in /etc/passwd. I know this can be done on BASH but as much as possible I want to avoid using it, and just focus on writing using Perl instead.
Okay, so my problem is that, my script could not find the right match in my $password_file. I still got the No root found error even though it is still in the file.
Execution of the script.
jpd@skriv ~ $ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
jpd@skriv ~ $ ~/Copy/documents/scripts/my_perl/test.pl root
Applying pattern match (m//) to @id will act on scalar(@id) at /home/jpd/Copy/documents/scripts/my_perl/test.pl line 16.
No root found!
jpd@skriv ~ $
Also, why do I always get this "Applying pattern match..." warning?
Here's the code:
#!/usr/bin/perl
use strict;
use warnings;
my $stdin = $ARGV[0];
my $password_file = '/etc/passwd';
open (PWD, $password_file) || die "Error: $!\n";
my @lines = (<PWD>);
close PWD;
for my $line (@lines) {
my @list = split /:/, $line;
my @id = "$list[0]\n";
if (@id =~ m/$stdin/) {
die "Match found!\n";
} else {
print "No $stdin found!\n";
exit 0;
}
}
Thanks in advance! :)
Regards,
sedawkgrep
Perl Newbie
I have a few things to point out regarding your code:
Good job using use strict; and use warnings;. They should be included in EVERY perl script.
Pick meaningful variable names.
$stdin is too generic. $username does a better job of documenting the intent of your script.
Concerning your file processing:
Include use autodie; anytime you're working with files.
This pragma will automatically handle error messages, and will give you better information than just "Error: $!\n". Also, if you do want to write manual error messages, be sure to remove the newline from your message, or die won't report the line number.
Use Lexical file handles and the three argument form of open
open my $fh, '<', $password_file;
Don't load an entire file into memory unless you need to. Instead, use while loop and process the file line by line
Concerning your comparison: @id =~ m/$stdin/:
Always use a scalar to the left of the binding operator =~
The binding operator binds a scalar to a pattern. Therefore the line @id =~ m/$stdin/ evaluates @id in scalar context and actually compares the size of @id to your pattern: "1" =~ m/$stdin/. This is obviously a bug.
Be sure to escape the regular expression special characters using quotemeta or \Q...\E:
$list[0] =~ m/\Q$stdin/
Since you actually want a direct equality, don't use a regex at all, but instead use eq
You're exiting after only processing the first line of your file.
In one fork you're dying if you find a match in the first line. In your other fork, you're exiting with the assumption that no other lines are going to match either.
With these changes, I would correct your script to the following:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $username = $ARGV[0];
my $password_file = '/etc/passwd';
open my $fh, '<', $password_file;
while (<$fh>) {
chomp;
my @cols = split /:/;
if ($cols[0] eq $username) {
die "Match found!\n";
}
}
print "No $username found!\n";
#!/usr/bin/perl
use strict;
use warnings;
my $stdin = $ARGV[0];
my $password_file = '/etc/passwd';
open (PWD,"<$password_file");
my @lines = <PWD>;
my @varr = grep (m/root/, @lines);
Then check the @varr array and split it if you need to.
You'd be better off using a hash for key lookups, but with minimal modification this should work:
open my $in, '<', 'in.txt';
my $stdin = $ARGV[0];
while(<$in>){
chomp;
my @list = split(/\:/);
my ($id) = $list[0];
if ($id eq $stdin) {
die "Match found\n";
}
}
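For the hash approach mentioned above, a minimal sketch could look like this. The passwd-style lines and the user_index helper are invented for the example; a real script would read the lines from /etc/passwd instead.

```perl
use strict;
use warnings;

# Hypothetical passwd-style lines standing in for /etc/passwd.
my @passwd = (
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
);

# Map each user name (the first colon-separated field) to a true value.
sub user_index {
    my %seen;
    for my $line (@_) {
        my ($name) = split /:/, $line, 2;
        $seen{$name} = 1;
    }
    return \%seen;
}

my $users = user_index(@passwd);
for my $name (qw(root roo)) {
    print "$name: ", ($users->{$name} ? "found" : "not found"), "\n";
}
```

The lookup is an exact key match, so a partial name such as roo is not reported as found, which a bare regex match against the line would do.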

How to start matching and saving matched from exact point in a text

I have a text and I write a parser for it using regular expressions and perl.
I can match what I need using two empty lines (I use a regexp), because there is a pattern that lets me recognize blocks of text after two empty lines.
But the problem is that the whole text has an Introduction part and some text at the end that I do not need.
Here is the code which splits the text when it finds two empty lines:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'first';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp ($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) {
close(OUT);
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
}
else {
$empty = 0;}
print OUT "$line\n";
}
close(OUT);
This is an example of the text I need (it's really small :))
this is file example
I think that I need to iterate over the text until it finds the words LOREM IPSUM, using a regexp like "/^LOREM IPSUM/", because that is the point where the needed text starts (and start saving the text to a file once I reach those words).
And I need to finish iterating over the text when the word INDEX is found, or save the remaining text in a separate file.
How could I implement this? Should I use next to skip lines, or something else?
BR,
Yuliya
You'd change your while loop to something like
my $in_lorem = 0;
while (my $line = <$fh>) {
if( $line =~ /^LOREM IPSUM/ ) {
$in_lorem = 1;
next;
}
next unless $in_lorem;
# your processing goes here
}
This will skip header lines until you hit the line that starts with LOREM IPSUM, after which you will process lines.
You'd use a similar pattern for ignoring all lines after a given line match, except you wouldn't have to process any more lines, so instead of using next you'd use last. That pattern is left as an exercise to the reader. :-)
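For anyone who wants to check their attempt, here is one possible sketch that combines both patterns. The sample lines and the body_lines helper are made up for illustration, and it assumes the end marker is a line starting with INDEX, as in the question.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the file in the question.
my @input = map { "$_\n" }
    'intro text', 'LOREM IPSUM', 'block one', 'block two', 'INDEX', 'trailer';

# Return the lines strictly between the start and end markers.
sub body_lines {
    my @kept;
    my $in_lorem = 0;
    for my $line (@_) {
        last if $in_lorem && $line =~ /^INDEX/;    # stop at the end marker
        if ($line =~ /^LOREM IPSUM/) {
            $in_lorem = 1;                         # start after this line
            next;
        }
        next unless $in_lorem;                     # still in the introduction
        push @kept, $line;
    }
    return @kept;
}

print body_lines(@input);
```

In the real script the for loop would be the while loop over the filehandle, and the push would be replaced by the existing block-splitting code.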
You could use the flip flop range operator to start processing when you match LOREM IPSUM and stop when you match INDEX.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $file = 'firsttest';
open (my $fh, '<', $file) or die "Failed to open $file: $!";
while (<$fh>){
if (m/^LOREM IPSUM/ .. m/^INDEX/){
#Do your other matching, processing, etc. here
print;
last if m/^INDEX/;#Optional, to avoid reading remaining lines.
}
}