How to use a user input as a regex? - regex

I have a simple program where the user can enter a string.
After this the user can enter a regex. I need the string to be compared against this regex.
The following code does not work - the regex always fails.
And I know that it's maybe because I am comparing a string with a string and not a string with a regex.
But how would you do this?
while(1){
print "Enter a string: ";
$input = <>;
print "\nEnter a regex and see if it matches the string: ";
$regex = <>;
if($input =~ $regex){
print "\nThe regex $regex matched the string $input\n\n";
}
}

Use lexical variables instead of global ones.
You should remember that strings read by <> usually contain newlines, so it might be necessary to remove the newlines with chomp, like this:
chomp(my $input = <STDIN>);
chomp(my $regex = <STDIN>);
You might want to interpret regex special characters taken from the user literally, so that ^ will match a literal circumflex, not the beginning of the string, for example. If so, use the \Q escape sequence:
if ($input =~ /\Q$regex\E/) { ... }
Don't forget to read the Perl FAQ in your journey through Perl. It might have all the answers before you even begin to specify the question: How do I match a regular expression that's in a variable?

Write the match with //, m//, or s///; you can interpolate a variable as the pattern.
if ($input =~ /$regex/) {
print "match found\n";
}

I think you need to chomp the input and regex variables, and correct the expression so that it matches against the regex:
chomp( $input );
chomp( $regex );
if($input =~ /$regex/){
print "\nThe regex $regex matched the string $input\n\n";
}
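The advice in the answers above (chomp both values, then match either as a pattern or literally with \Q...\E) can be combined in one small routine. This is only a sketch: the name user_match and its third argument are made up for illustration, and wrapping qr// in eval to survive an invalid pattern is an addition that none of the answers mention.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $input matches the user-supplied $pattern.
# With $literal set, metacharacters in $pattern are matched literally.
sub user_match {
    my ($input, $pattern, $literal) = @_;
    chomp($input, $pattern);    # drop the trailing newlines left by <STDIN>
    my $re = $literal ? qr/\Q$pattern\E/ : eval { qr/$pattern/ };
    return 0 unless defined $re;    # invalid pattern: treat as "no match"
    return $input =~ $re ? 1 : 0;
}

print user_match("hello world\n", "^hel+o\n"), "\n";       # pattern match
print user_match("hello world\n", "^hel+o\n", 1), "\n";    # literal match
print user_match("a^b\n", "^b\n", 1), "\n";                # literal ^ found
```

The first call treats ^hel+o as a regex, the second looks for those six characters literally and finds nothing, and the third finds a literal ^b inside a^b.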

Related

Replacing a single character in a perl regex match

How can I replace the 6th "_" that appears in the regex match?
Here is the literal input to be searched. It is not representing a path to the input:
/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa
Here is my code, which parses out what I need. I just now need to replace the last matched "_" with a "/":
#!/usr/bin/perl
use strict;
use warnings;
open(IN, '<', '/Users/roblogan/Test_Database.txt') or die $!;
open(OUT, '>', '/Users/roblogan/Test_Output.txt') or die $!;
while (my $line = <IN>){
if ($line =~ m/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0_[0-9]*)/){
print OUT $1, "\n";
}
}
Current output:
m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30
Desired output:
m160505_031746_42156_c100980652550000001823221307061611_s1_p0/30
I have tried:
if ($line =~ s/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0_[0-9]*)/(m160505_031746_42156_c100980652550000001823221307061611_s1_p0\/[0-9]*)/){
Any help would be appreciated.
This Perl code will do what I think you need, determined from your subject line and example output
It finds the sixth occurrence of an underscore in the target string and, if that underscore is followed by decimal digits, it changes the underscore to a slash and removes everything following the digits
I have used the pipe character | as the delimiter for the substitute operator s/// to avoid the need to escape forward slashes
use strict;
use warnings 'all';
my $path = q{/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa};
$path =~ s|^(?:[^_]*_){5}[^_]*\K_(\d+).*|/$1|s;
print $path, "\n";
output
/Users/rob/Documents/Test/m160505_031746_42156_c100980652550000001823221307061611_s1_p0/30
From your description, the easiest way is:
$line =~ s!(m160505_031746_42156_c100980652550000001823221307061611_s1_p0)_!$1/!;
I've chosen ! as the delimiter because / is used in the replacement part.
$1 is a variable containing the text matched by the first ( ) group in the regex (I didn't want to repeat the whole thing twice).
The final _ is not included in $1 (it's outside of the parens); instead we put / in the replacement part.
See perldoc perlretut for more information.
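If the goal is phrased as in the title ("replace the 6th underscore"), a counting substitution is another option. The sketch below is a different technique from both answers: replace_nth is a made-up name, and unlike the first answer it leaves the text after the digits (_0_59.fsa) in place.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th occurrence of $char in $str.
sub replace_nth {
    my ($str, $char, $n, $replacement) = @_;
    my $count = 0;
    # /g visits every occurrence; /e evaluates the replacement as Perl code,
    # so each match is put back unchanged unless it is the $n-th one.
    $str =~ s/(\Q$char\E)/ ++$count == $n ? $replacement : $1 /ge;
    return $str;
}

my $name = 'm160505_031746_42156_c100980652550000001823221307061611_s1_p0_30_0_59.fsa';
print replace_nth($name, '_', 6, '/'), "\n";
```

Applied to the file name from the question, this turns only the underscore between p0 and 30 into a slash.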

How to find all the words that begin with a|b and end with a|b. (Ex: “adverb” and “balalaika”)

The following perl program has a regex written to serve my purpose. But, this captures results present within a string too. How can I only get strings separated by spaces/newlines/tabs?
The test data I used is present below:
http://sainikhil.me/stackoverflow/dictionaryWords.txt
use strict;
use warnings;
sub print_a_b {
my $file = shift;
my $pattern = qr/(a|b|A|B)\S*(a|b|A|B)/;
open my $fp, $file;
my $cnt = 0;
while(my $line = <$fp>) {
if($line =~ $pattern) {
print $line;
$cnt = $cnt+1;
}
}
print $cnt;
}
print_a_b @ARGV;
You could consider using an anchor like \b: word boundary
That would help apply the regexp only after and before a word.
\b(a|b|A|B)\S*(a|b|A|B)\b
Simpler, as Avinash Raj adds in the comments:
(?i)\b[ab]\S*[ab]\b
(using the case insensitive flag or modifier)
If you have multiple words in the same line then you can use word boundaries in a regex like this:
(?i)\b[ab][a-z]*[ab]\b
The pattern code is:
$pattern = qr/\b[ab][a-z]*[ab]\b/i;
However, if you want to match only lines that consist of a single such word, then you can use:
(?i)^[ab][a-z]*[ab]$
Update: for your comment "lines that begin and end with the same character", you can use this regex:
(?i)\b([a-z])[a-z]*\1\b
But if you want any character and not letters only like above you can use:
(?i)\b(.)[a-z]*\1\b
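To pull out every matching word rather than print whole lines, the word-boundary pattern can be applied with /g in list context. This is a minimal sketch: words_ab and the sample sentence are invented for illustration and are not taken from the question's dictionary file.

```perl
use strict;
use warnings;

# Hypothetical helper: all words in $line that begin and end with a or b.
sub words_ab {
    my ($line) = @_;
    my $pattern = qr/\b[ab][a-z]*[ab]\b/i;
    # In list context, /g returns every capture from every match.
    my @words = $line =~ /($pattern)/g;
    return @words;
}

my $line = "An adverb and a balalaika sat on a barstool beside Bob";
print join(", ", words_ab($line)), "\n";
```

Because of the \b anchors, substrings inside longer words are skipped, and a one-letter word such as "a" does not match because the pattern needs a separate first and last letter.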

regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches @ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want words like precious final to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more ses rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and don't escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches a word character, that is a letter, digit, or underscore, not just any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
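Putting the suggestions together (a qr// pattern instead of a double-quoted string, plus stripping line endings before matching) might look like the sketch below. The name pre_al_words and the sample lines are assumptions made for illustration; they stand in for the dictionary file from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that start with "Pre" and end with "al".
sub pre_al_words {
    my @lines = @_;
    my $pattern = qr/^Pre\w*al$/;    # qr// avoids the double-quote escaping problem
    my @found;
    for my $line (@lines) {
        (my $word = $line) =~ s/[\r\n]+$//;    # strip LF or CRLF line endings
        push @found, $word if $word =~ $pattern;
    }
    return @found;
}

# Made-up sample lines standing in for the dictionary file.
my @sample = ("Prenatal\n", "Precious\n", "Premedical\r\n", "Pre final\n");
print "$_\n" for pre_al_words(@sample);
```

Stripping \r as well as \n matters when the file has Windows line endings, which is one plausible reason a pattern can work in an editor and fail in a script.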

Regex Word Boundary in Perl not yield expected results

So I'm having an issue with pulling data from a string between 2 keywords. I understand that in regex I'm supposed to use the \b boundary tags and I've written the following for a test example, however it seems to only match the whole string instead of just the portion I want.
For example, the string: "here are more string words START OF INFORMATION SECTION some other stuff"
I am gathering text between "START" and "SECTION".
So I'm expecting "START OF INFORMATION SECTION", I believe.
This is the following snippet I have written in Perl specifically, but it doesn't yield the results I expected.
#!/usr/bin/perl
# This is perl 5, version 22, subversion 1 (v5.22.1) built for cygwin-thread-multi
use POSIX;
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
# this should provide me with the specific text between my two boundary words
$text =~ /\bSTART\b(.*?)\bSECTION\b/;
print "New String: $text\n";
Your code is simply testing whether the regex pattern matches the string, returning a true or false value to indicate whether there was a match. You discard that indicator
If there was a match then the strings captured using parentheses in the regex pattern will be assigned to the capture variables $1, $2 etc.
It's unclear what you need to do, but this program prints everything between START and SECTION: in this case OF INFORMATION
There's no need for use POSIX, but use strict and use warnings 'all' are essential
#!/usr/bin/perl
use strict;
use warnings 'all';
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
my $section = $1;
print "New String: $section\n";
}
output
Original String: here are more string words START OF INFORMATION SECTION some other stuff
New String: OF INFORMATION
You should use this
$text =~ /\b(START\b(.*?)\bSECTION)\b/;
print "New String: $1\n";
$1 is the first captured group.
As suggested by borodin
if ( $text =~ /\b(START\b(.*?)\bSECTION)\b/ ) {
my $tmp = $1;
print "New String: $tmp\n";
}
The match operator doesn't change the string it matches.
You can use either of the following to inspect the captured string:
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
my $section = $1;
print "New String: $section\n";
}
or
if ( my ($section) = $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
print "New String: $section\n";
}
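The captured text in these examples still carries the spaces that sit next to the keywords. A small variation, sketched here with a made-up helper named between, moves that whitespace outside the capture group and takes the keywords as arguments:

```perl
use strict;
use warnings;

# Hypothetical helper: text between two keywords, without surrounding spaces.
sub between {
    my ($text, $start, $end) = @_;
    # \s* on both sides of the group keeps the padding out of the capture.
    my ($inner) = $text =~ /\b\Q$start\E\b\s*(.*?)\s*\b\Q$end\E\b/;
    return $inner;    # undef when there is no match
}

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print between($text, 'START', 'SECTION'), "\n";
```

The match runs in list context, so the capture is returned directly and $text itself is left unchanged.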

Perl regex: How to find in a file a word typed by a user

I am writing a script to read a LOG file. I want the user to type a word and then look it up and print the line (from a string) matching the word.
I'm just learning Perl so please be very specific and simple so that I can understand it.
print "Please Enter the word to find: ";
chomp ($userInput = <STDIN>);
while ($line = <INPUT>)
if ($line =~ /userInput/)
print $line;
I know that this is not perfect but I'm just learning.
You were close. You need to expand the variable in the pattern match.
print "Please Enter the word to find: ";
chomp ($userInput = <STDIN>);
while ($line = <INPUT>) {
if ($line =~ /$userInput/) { # note extra dollar sign
print $line;
}
}
Be aware that that is a pattern match, so you are searching with a string that potentially contains wildcards in it. If you want a literal string, put a \Q in front of the variable as you interpolate it: /\Q$userInput/.
Something like .*\bWORD\b.* might work (though it is not tested):
print $line if ($line =~ /.*\bWORD\b/);
@NewLearner
\b is for word boundaries
http://www.regular-expressions.info/wordboundaries.html
If you're doing just one lookup, using a while loop is fine. Though of course you'll need to fix your syntax.
You could also use grep:
print grep /$userInput/, <INPUT>;
If you want to do multiple lookups, you can either reopen the file handle (if the file is large), or store it in an array:
print grep /$userInput/, @array;
You'll have meta characters in your input, of course. This can be a good thing, or bad, depending on your users. For example, an experienced user would recognize the option to refine his search by entering a search term such as ^foo(?=bar), whereas other people may get very confused when they can't find the string foo+bar.
A way to escape meta characters is by using quotemeta on your input. Another is to use \Q ... \E inside your regex.
$userInput = quotemeta($userInput);
# or
print grep /\Q$userInput\E/, <INPUT>;
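The foo+bar confusion described above is easy to reproduce. In this sketch the helper search_lines and the three log lines are invented for illustration; it runs the same search term once as a pattern and once quoted with \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: lines matching $term, as a regex or as a literal string.
sub search_lines {
    my ($term, $literal, @lines) = @_;
    my $re = $literal ? qr/\Q$term\E/ : qr/$term/;
    return grep { $_ =~ $re } @lines;
}

# Made-up log lines for the demonstration.
my @log = ("foo+bar ok\n", "foobar failed\n", "fooobar failed\n");
print "as regex:\n",   search_lines('foo+bar', 0, @log);
print "as literal:\n", search_lines('foo+bar', 1, @log);
```

As a regex, foo+bar means "fo, one or more o, then bar", so it finds foobar and fooobar but not the line containing the text foo+bar itself. The literal search finds only that line.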
I believe if I were you, I would use a subroutine for the lookup. That way you can perform as many lookups as you like rather handily.
use strict;
use warnings; # ALWAYS use these
print "Please Enter the word to find: ";
chomp (my $userInput = <>); # <> is a more flexible handle
print lookup($userInput);
sub lookup {
my $word = shift;
my $inputfile = 'input.log'; # assumed path; point this at your LOG file
open my $fh, "<", $inputfile or die $!;
my @hits;
while (<$fh>) {
push @hits, $_ if /\Q$word\E/;
}
return @hits;
}