Perl Regular expression match from user input - regex

I'm just learning Perl and I'm trying to learn regular expressions at the same time. Basically I'm trying to open a log file and print out any lines that match user input to a new file. Using the following code I get no output at all if I type in the word "Clinton". But if I replace
print MYFILE if /\$string\;
with
print MYFILE if /\Clinton\;
it runs as expected. Any ideas? I know it is something simple that I am missing.
print "Enter a word to look up: ";
$string = <>;
print "You put $string";
open(LOG,"u_ex121011.log") or die "Unable to open logfile:$!\n";
open (MYFILE, '>>data2.txt');
while(<LOG>){
print MYFILE if /\Q($string)\E/;
}
close (MYFILE);
close(LOG);
print "Check data2.txt";

In Perl, unlike in some languages, the input operator doesn't silently remove a trailing newline. So your $string is actually "Clinton\n" rather than "Clinton". To fix it, use the chomp function:
$string = <>;
chomp $string;
print "You put $string\n";
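To see the effect of the trailing newline in isolation, here is a minimal sketch. The helper name line_matches and the sample log line are made up for illustration; only chomp and \Q...\E come from the answer above.

```perl
use strict;
use warnings;

# Returns 1 if $line contains $needle as a literal substring, else 0.
sub line_matches {
    my ($line, $needle) = @_;
    return $line =~ /\Q$needle\E/ ? 1 : 0;
}

my $raw  = "Clinton\n";                              # what <> hands back after typing Clinton
my $line = "GET /news/Clinton-speech.html 200\n";    # hypothetical log line

# The unchomped input only matches where "Clinton" is followed by a newline.
print "before chomp: ", line_matches($line, $raw), "\n";

chomp(my $clean = $raw);
print "after chomp:  ", line_matches($line, $clean), "\n";
```

With the newline still attached, the pattern only matches lines where the word is the last thing on the line, which is why the original script usually printed nothing.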

You should also use the 3 argument version of open.
open( my $LOG, '<', 'u_ex121011.log' ) or die "Unable to open file:$!\n";
See the documentation for open: http://perldoc.perl.org/functions/open.html

In addition to what ruakh said, you should check if the string is on the line by using the $_ variable and the =~ operator.
print MYFILE $_ if $_ =~ /\Q$string\E/; # $_ still ends in a newline, so none is added
Going off of your comment, you can split the line up surprisingly enough using split.
Here is an example of what you could do:
my @lines = split( ' ', $_ );
print MYFILE "$lines[0] $lines[1] $lines[2] $lines[3]\n";
Here is documentation of split: http://perldoc.perl.org/functions/split.html
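As a rough, self-contained illustration of the split idea, the sketch below pulls the first few whitespace-separated fields out of a line. The helper name first_fields and the sample log line are assumptions for the example, not taken from the real log file.

```perl
use strict;
use warnings;

# Returns the first $count whitespace-separated fields of a line.
sub first_fields {
    my ($line, $count) = @_;
    my @fields = split ' ', $line;          # ' ' splits on runs of whitespace
    $count = @fields if $count > @fields;   # do not read past the end
    return @fields[0 .. $count - 1];
}

my $line = "2012-10-11 00:00:01 GET /index.html 200\n";   # made-up log line
print join(' ', first_fields($line, 4)), "\n";
```

Splitting on the special pattern ' ' also skips leading whitespace and drops the trailing newline, so the line does not need to be chomped first.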

Related

Perl: How to parse through a file and print each line that matches user inputted strings?

I'm new to programming so bear with me. I'm working on a Perl script that asks the user the number of different items they want to search for and what those items are, separating them by pressing ENTER. That part works okay.
Then, the script is to open up a file, parse through, and print each line that matches with the items that the user initially listed. This is the part that I haven't been able to figure out yet. I've tried different variations of the code. I saw many people suggest using the index function but I had no luck with it. It does seem to be working when I swap $line =~ $array for $line =~ /TEXT/. I'm hoping someone here can shed some light.
Thanks in advance!
#!usr/bin/perl
use strict;
use warnings;
my $line;
my $array;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my @arrays;
print "Enter items, press enter to separate: \n";
for (1..$n) {
my $input = <STDIN>;
push @arrays, $input;
}
open (FILE, "file.txt") || die "can't open file!";
chomp(my @lines = <FILE>);
close (FILE);
foreach $array (@arrays) {
foreach $line (@lines) {
if ($line =~ $array) {
print $line, "\n";
}
}
}
@purplekushbear Welcome to Perl! In Perl there is more than one way to do it (TIMTOWTDI), so please take this in the spirit of teaching in which it is given.
First off, line one, the #! (shebang) line, is missing the leading / in the path to perl. In Linux/UNIX environments, if your script is executable, the path after the #! is used to run your program. If you run ls on /usr/bin/perl you should see it. Sometimes it is found at /bin/perl or /usr/local/bin/perl instead.
When the person mentioned that you forgot to chomp, they were referring to where you set the $input variable. Just chomp it like you did for $n and you will be OK.
As for the main part of your program, go back and read what you wanted to do; doing exactly that might be simpler. I think you have a good start on the problem: you seem to know that arrays start with @ and scalar variables use the $ sigil, and you use strict, which is great.
Here is one way to solve your problem:
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $num = <STDIN>);
my @items = ();
print "Enter items, press enter to separate: \n";
for (1 .. $num)
{
chomp(my $input = <STDIN>);
push @items, $input;
}
open (FILE, "file.txt") || die "can't open file because $!";
while (my $line = <FILE>)
{
foreach my $item (@items)
{
if ($line =~ m/$item/)
{
print $line;
last;
}
}
}
close (FILE);
Notice I used the name @items for your items instead of @arrays, which will make the code easier to understand when you come back to it someday. Always write with an eye towards maintainability. Anyway, ask if you have any questions, but since I left much of the code the same I don't think you will have much trouble figuring it out. Perldoc and Google are your friends. E.g. you can type:
perldoc -f last
to find out how last works. Have fun!
In your script you forgot to chomp the user input. You also need to use last to leave the inner loop once a pattern has matched.
Here is another way to do the same thing with a different method.
Instead of the array I use a single variable, $regex. I concatenate the user's input values into $regex, separated by | (in a regex, | behaves like "or"). While concatenating, I apply quotemeta to escape any special characters. Then I precompile the pattern in $regex with qr.
#!/usr/bin/perl
use strict;
use warnings;
print "Enter number of items: ";
chomp(my $n = <STDIN>);
my $regex;
print "Enter items, press enter to separate: \n";
for (1..$n)
{
chomp(my $input = <STDIN>);
$regex .= quotemeta($input)."|";
}
chop $regex; #remove the last pipe
$regex = qr($regex);
open my $fh, "<", "file.txt" or die "can't open file!"; # "or", not "||", so that die applies to open
while(<$fh>)
{
print if(/$regex/i);
}
Then, as user @ikegami said in his comment, you can use Perl's built-in @ARGV array instead of STDIN, for example
Your method
my @array = @ARGV;
Another method
my $regex = join "|", map { quotemeta $_ } @ARGV;
Then run the script as perl test.pl input1 input2 input3.
And always use the 3-argument form of open to open a file.
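The join/map/quotemeta approach can be wrapped in a small helper so it works with any list of search terms. This is only a sketch: the name build_regex and the sample terms are invented for illustration, and the /i flag mirrors the case-insensitive match used in the script above.

```perl
use strict;
use warnings;

# Builds one case-insensitive regex that matches any of the given literal strings.
sub build_regex {
    my @items = @_;
    my $alternation = join '|', map { quotemeta } @items;
    return qr/$alternation/i;
}

my $regex = build_regex('a.b', 'foo');   # hypothetical search terms

for my $line ('xa.by', 'aXb', 'FOO') {
    print "$line\n" if $line =~ $regex;
}
```

Because each term goes through quotemeta, the dot in a.b only matches a literal dot, so aXb is not printed.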

perl regex: searching thru entire line of file

I'm a regex newbie, and I am trying to use a regex to return a list of dates from a text file. The dates are in mm/dd/yy format, so the year '1955' would appear as '55', for example. I am trying to return all entries from years '50' to '99'.
I believe the problem I am having is that once my regex finds a match on a line, it stops right there and jumps to the next line without checking the rest of the line. For example, I have the dates 12/12/12, 10/10/57, 10/09/66 all on one line in the text file, and it only returns 10/10/57.
Here is my code thus far. Any hints or tips? Thank you
open INPUT, "< dates.txt" or die "Can't open input file: $!";
while (my $line = <INPUT>){
if ($line =~ /(\d\d)\/(\d\d)\/([5-9][0-9])/g){
print "$&\n" ;
}
}
A few points about your code
You must always use strict and use warnings 'all' at the top of all your Perl programs
You should prefer lexical file handles and the three-parameter form of open
If your regex pattern contains literal slashes then it is clearest to use a non-standard delimiter so that they don't need to be escaped
Although recent releases of Perl have fixed the issue, there used to be a significant performance hit when using $&, so it is best to avoid it, at least for now. Put capturing parentheses around the whole pattern and use $1 instead
This program will do as you ask
use strict;
use warnings 'all';
open my $fh, '<', 'dates.txt' or die "Can't open input file: $!";
while ( <$fh> ) {
print $1, "\n" while m{(\d\d/\d\d/[5-9][0-9])}g
}
output
10/10/57
10/09/66
You are printing $&, which is overwritten each time a new match is found.
To keep every match on the line, collect them in an array. In list context a /g match returns all of the captured text, so capture the whole date in a single group:
while(<$fh>) {
my @dates = /(\d\d\/\d\d\/[5-9][0-9])/g;
print "@dates\n" if(@dates);
}
You just need to change the 'if' to a 'while' and the regex will pick up where it left off:
open INPUT, "< a.dat" or die "Can't open input file: $!";
while (my $line = <INPUT>){
while ($line =~ /(\d\d)\/(\d\d)\/([5-9][0-9])/g){
print "$&\n" ;
}
}
# Output given line above
# 10/10/57
# 10/09/66
You could also capture the whole of the date into one capture variable and use a different regex delimiter to save escaping the slashes:
while ($line =~ m|(\d\d/\d\d/[5-9]\d)|g) {
print "$1\n" ;
}
...but that's a matter of taste, perhaps.
You can also use map to collect the dates in the year range 50 to 99 and store them in an array
open INPUT, "< dates.txt" or die "Can't open input file: $!";
@as = map{$_ =~ m/\d\d\/\d\d\/[5-9][0-9]/g} <INPUT>;
$, = "\n";
print @as;
Another way around it is removing the dates you don't want.
$line =~ s/\d\d\/\d\d\/[0-4]\d//g;
print $line;

Perl Regex not finding pattern within script

I'm reading the contents of a log file, performing a regex on the lines and putting the results in an array, but for some reason there is no output.
use strict;
use warnings;
my $LOGFILE = "log.file";
my $OUTFILE = "out.file";
open(LOG, "$LOGFILE") or die ("Could not open $LOGFILE: $!\n");
open(TMP, ">", "$OUTFILE") or die ("Could not open $OUTFILE: $!\n");
my @data = (<LOG> =~ /<messageBody>(.*?)<\/messageBody>/sg);
print TMP "This is a test line. \n";
foreach (@data){
print "@data\n";
print "\n=======================\n";
}
close TMP;
close LOG;
My output is a file (out.file) and the only content is "This is a test line." I know the regex works because I tried it at the prompt with:
perl -lne 'BEGIN{undef $/} while (/<messageBody>(.*?)<\/messageBody>/sg) {print $1}' log.file > test.file
What am I doing wrong?
Your data likely spans lines.
You'll therefore need to slurp the entire thing before using your regex:
my $logdata = do {local $/; <LOG>};
my #data = $logdata =~ m{<messageBody>(.*?)</messageBody>}sg;
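The underlying reason is that the left side of =~ is evaluated in scalar context, so <LOG> there reads only one line. The sketch below contrasts the two approaches using an in-memory filehandle. The sample text and the helper name message_bodies are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical log content with one message body spanning two lines.
my $text = "<messageBody>first\nhalf</messageBody>\n<messageBody>second</messageBody>\n";

# Slurps the whole filehandle, then returns every message body found in it.
# Call this in list context.
sub message_bodies {
    my ($fh) = @_;
    my $content = do { local $/; <$fh> };   # undefined $/ reads everything at once
    return $content =~ m{<messageBody>(.*?)</messageBody>}sg;
}

# Original approach: <$fh> on the left of =~ reads only the first line.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @line_by_line = (<$fh> =~ m{<messageBody>(.*?)</messageBody>}sg);
close $fh;

# Slurping approach: the regex sees the complete content.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @slurped = message_bodies($fh);
close $fh;

print scalar(@line_by_line), " match(es) when reading one line\n";
print scalar(@slurped), " match(es) after slurping\n";
```

The first line on its own contains an opening tag but no closing tag, so the one-line version finds nothing even though the pattern itself is correct.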
If you want to print the data's items, then you should do something like this:
foreach my $item (@data){
print TMP "$item\n";
print TMP "\n=======================\n";
}
$item takes each array element in turn, and each one is written to the TMP file.

textpad regular expressions

Please give me some advice on removing newline characters that come before letters, while ignoring the lines starting with >.
eg:
>gi|16802049|ref|NP_463534.1| chromosomal replication initiation protein [Listeria monocytogenes EGD-e]
MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT
GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA
KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL
LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR
IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS
QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH
EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF
>gi|16802050|ref|NP_463535.1| DNA polymerase III subunit beta [Listeria monocytogenes EGD-e]
MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE
SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN
VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE
LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL
LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS
FSGTMRPFVLRPKDAANPNEILQLITPVRTY
The sequence lines should be joined into a single line, while the newline before lines starting with '>' should not be removed. I tried
\n^[a-z]
but it also removes the first letter of each line. Is it possible to do the same without removing the first letter of each line, while ignoring lines that start with '>'? Thanks in advance. I am looking for a solution for TextPad.
You can use this regex
[\r\n]+(?=[a-zA-Z])
and replace it with empty string
OR
[\r\n]+([a-zA-Z])
and replace it with \1 or $1 whichever works
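For readers who want to try the lookahead pattern outside TextPad, here is a small Perl sketch of the same substitution. The function name join_wrapped_lines and the shortened sample records are invented for illustration.

```perl
use strict;
use warnings;

# Removes every line break that is directly followed by a letter.
# The lookahead (?=...) checks for the letter without consuming it.
sub join_wrapped_lines {
    my ($text) = @_;
    $text =~ s/[\r\n]+(?=[a-zA-Z])//g;
    return $text;
}

# Shortened, made-up records in the same layout as the question.
my $sample = ">seq1 example header\nMQSIE\nGRLFD\n>seq2 another header\nMKFVI\nSFGGI\n";
print join_wrapped_lines($sample);
```

Note that the break after each header line is also followed by a letter, so the header and its sequence end up on the same line. Only the breaks before '>' and at the end of the text are kept.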
I have solved this using regular expressions in Perl, for anyone who needs something like this in the future.
use warnings;
print "Please enter the name of the file\n";
my $n =<STDIN>;
print "Please enter the name of the output file\n";
my $n1=<STDIN>;
open(INFO,"$n") or die "cannot open";
@a = <INFO>;
#print @a;
foreach(@a)
{
$_ =~ s/\n//g;
$_ =~ s/>/\n>/g;
}
#print @a;
open (MYFILE, ">$n1");
print MYFILE @a;
close(MYFILE);
close(INFO);
It's extremely simple.

Perl regex: How to find in a file a word typed by a user

I am writing a script to read a LOG file. I want the user to type a word and then look it up and print the line (from a string) matching the word.
I'm just learning Perl so please be very specific and simple so that I can understand it.
print "Please Enter the word to find: ";
chomp ($userInput = <STDIN>);
while ($line = <INPUT>)
if ($line =~ /userInput/)
print $line;
I know that this is not perfect but I'm just learning.
You were close. You need to expand the variable in the pattern match.
print "Please Enter the word to find: ";
chomp ($userInput = <STDIN>);
while ($line = <INPUT>) {
if ($line =~ /$userInput/) { # note extra dollar sign
print $line;
}
}
Be aware that that is a pattern match, so you are searching with a string that potentially contains wildcards in it. If you want a literal string, put a \Q in front of the variable as you interpolate it: /\Q$userInput/.
Something like .*\bWORD\b.* might work (though it is not tested)
print $line if ($line =~ /.*\bWORD\b/)
@NewLearner
\b is for word boundaries
http://www.regular-expressions.info/wordboundaries.html
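A short sketch of what \b changes in practice. The helper name has_word and the sample lines are made up; \Q...\E is added so the search word is treated literally.

```perl
use strict;
use warnings;

# Returns 1 if $line contains $word as a whole word (matched literally), else 0.
sub has_word {
    my ($line, $word) = @_;
    return $line =~ /\b\Q$word\E\b/ ? 1 : 0;
}

for my $line ('the cat sat', 'concatenate', 'category error') {
    printf "%-15s %s\n", $line, has_word($line, 'cat') ? 'match' : 'no match';
}
```

Without the \b anchors all three lines would match, because "cat" also appears inside the longer words.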
If you're doing just one lookup, using a while loop is fine, though of course you'll need to fix your syntax.
You could also use grep:
print grep /$userInput/, <INPUT>;
If you want to do multiple lookups, you can either reopen the file handle (if the file is large), or store the lines in an array:
print grep /$userInput/, @array;
You'll have meta characters in your input, of course. This can be a good thing, or bad, depending on your users. For example, an experienced user would recognize the option to refine his search by entering a search term such as ^foo(?=bar), whereas other people may get very confused when they can't find the string foo+bar.
A way to escape meta characters is by using quotemeta on your input. Another is to use \Q ... \E inside your regex.
$userInput = quotemeta($userInput);
# or
print grep /\Q$userInput\E/, <INPUT>;
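The foo+bar case can be reproduced with a few in-memory lines. This is a minimal sketch with made-up data, comparing the same input used as a pattern and as a literal string.

```perl
use strict;
use warnings;

my @lines = ('foo+bar', 'foobar', 'fooobar');   # hypothetical file contents
my $input = 'foo+bar';                          # what the user typed

my @as_pattern = grep { /$input/ }     @lines;  # + means "one or more o"
my @as_literal = grep { /\Q$input\E/ } @lines;  # + is an ordinary character

print "as pattern: @as_pattern\n";
print "as literal: @as_literal\n";
```

Used as a pattern, the input finds foobar and fooobar but not the line that actually contains foo+bar, which is the confusing behavior described above.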
I believe if I were you, I would use a subroutine for the lookup. That way you can perform as many lookups as you like rather handily.
use strict;
use warnings; # ALWAYS use these
my $inputfile = 'input.log'; # hypothetical file name; point this at your log file
print "Please Enter the word to find: ";
chomp (my $userInput = <>); # <> is a more flexible handle
print lookup($userInput);
# Return every line of $inputfile that contains $word literally
sub lookup {
my $word = shift;
open my $fh, "<", $inputfile or die $!;
my @hits;
while (<$fh>) {
push @hits, $_ if /\Q$word\E/;
}
return @hits;
}