I'm trying to make an on-the-fly pattern tester in Perl.
Basically it asks you to enter the pattern, and then gives you a >>>> prompt where you enter possible matches. If it matches it says "%%%% before matched part after match" and if not it says "%%%! string that didn't match". It's trivial to do like this:
while(<>){
chomp;
if(/$pattern/){
...
} else {
...
}
}
but I want to be able to enter the pattern like /sometext/i rather than just sometext
I think I'd use an eval block for this? How would I do such a thing?
This sounds like a job for string eval, just remember not to eval untrusted strings.
#!/usr/bin/perl
use strict;
use warnings;
my $regex = <>;
$regex = eval "qr$regex" or die $#;
while (<>) {
print /$regex/ ? "matched" : "didn't match", "\n";
}
Here is an example run:
perl x.pl
/foo/i
foo
matched
Foo
matched
bar
didn't match
^C
You can write /(?i:<pattern>)/ instead of /<pattern>/i.
This works for me:
my $foo = "My bonnie lies over the ocean";
print "Enter a pattern:\n";
while (<STDIN>) {
my $pattern = $_;
if (not ($pattern =~ /^\/.*\/[a-z]?$/)) {
print "Invalid pattern\n";
} else {
my $x = eval "if (\$foo =~ $pattern) { return 1; } else { return 0; }";
if ($x == 1) {
print "Pattern match\n";
} else {
print "Not a pattern match\n";
}
}
print "Enter a pattern:\n"
}
Related
I am trying to write a perl script that get all strings that is does not start and end with a single quote. And a string cannot be a part of comment # and each line in DATA is not necessary at the beginning of a line.
use warnings;
use strict;
my $file;
{
local $/ = undef;
$file = <DATA>;
};
my #strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",#strings);
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
I am getting no where with this regex.
The expected output is
"This is a string2"
"This is comment syntax #"
"This is string 4"
Obviously this is only an exercise, as there are been many students asking about this problem lately. Regex's will only ever get you part of the way there, as there will pretty much always be edge cases.
The following code is probably good enough for your purposes, but it doesn't even successfully parse itself because of quotes inside a qr{}. You'll have to figure out how to get strings that span lines to work on your own:
use strict;
use warnings;
my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;
my $data = do { local $/; <DATA> };
while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
my $match = $1;
if ($match =~ /^#/) {
print "Comment - $match\n";
} elsif ($match =~ /^"/) {
print "Double quote - $match\n";
} elsif ($match =~ /^'/) {
print "Single quote - $match\n";
} else {
die "Carp! something went wrong! <$match>";
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Do not know how to achieve that by using regular expression, so here is a simple hand-written lexer:
#!/usr/bin/perl
use strict;
use warnings;
sub extract_string {
my #buf = split //, shift;
while (my $peer = shift #buf) {
if ($peer eq '"') {
my $str = "$peer";
while ($peer = shift #buf) {
$str .= "$peer";
last if $peer eq '"';
}
if ($peer) {
return ($str, join '', #buf);
}
else {
return ("", "");
}
}
elsif ($peer eq '#') {
return ("", "");
}
}
}
my ($str, $buf);
while ($buf = <DATA>) {
chomp $buf;
while (1) {
($str, $buf) = extract_string $buf;
print "$str\n" if $str;
last unless $buf;
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Another option is using Perl module such as PPI.
I was trying to get the number from a string. The number can be pure digits e.g. 12334 or can be separated with underscore 12_345
I was trying with the below code but was unable to get anything from it.
my $string = "this is a 141_153_923 number : $_123_456";
if ($string =~ /\b\d*(?:\d+\_?\d+)*\d*\b/) {
print "$&\n";
}
expected output is 141_153_923
I have also tried with string 141_153_923 and it it still not returning anything even with
$string =~ /\b\d\b/
on the string 141_153_923
I hope you have the variable $_123_456 is declared in your Perl code. Otherwise you'll get an warning.
Now the regex. Try with this one:
if ($string =~ /\b(\d+(?:_\d+)*)\b/) {
Try this regex: /((?:\d+\_?)+)/.
...
my $string = "this is a 141_153_923 number : \$_123_456";
my $num;
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "first: $num\n";
}
$string = "this is a 141153923 number : \$_123_456";
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "second: $num\n";
}
...
output:
first: 141_153_923
second: 141153923
I want to search the lines of a file to see if any of them match one of a set of regexs.
something like this:
my #regs = (qr/a/, qr/b/, qr/c/);
foreach my $line (<ARGV>) {
foreach my $reg (#regs) {
if ($line =~ /$reg/) {
printf("matched %s\n", $reg);
}
}
}
but this can be slow.
it seems like the regex compiler could help. Is there an optimization like this:
my $master_reg = join("|", #regs); # this is wrong syntax. what's the right way?
foreach my $line (<ARGV>) {
$line =~ /$master_reg/;
my $matched = special_function();
printf("matched the %sth reg: %s\n", $matched, $regs[$matched]
}
}
where 'special_function' is the special sauce telling me which portion of the regex was matched.
Use capturing parentheses. Basic idea looks like this:
my #matches = $foo =~ /(one)|(two)|(three)/;
defined $matches[0]
and print "Matched 'one'\n";
defined $matches[1]
and print "Matched 'two'\n";
defined $matches[2]
and print "Matched 'three'\n";
Add capturing groups:
"pear" =~ /(a)|(b)|(c)/;
if (defined $1) {
print "Matched a\n";
} elsif (defined $2) {
print "Matched b\n";
} elsif (defined $3) {
print "Matched c\n";
} else {
print "No match\n";
}
Obviously in this simple example you could have used /(a|b|c)/ just as well and just printed $1, but when 'a', 'b', and 'c' can be arbitrarily complex expressions this is a win.
If you're building up the regex programmatically you might find it painful to have to use the numbered variables, so instead of breaking strictness, look in the #- or #+ arrays instead, which contain offsets for each match position. $-[0] is always set as long as the pattern matched at all, but higher $-[$n] will only contain defined values if the nth capturing group matched.
How can I find the number of times a word is in a block of text in Perl?
For example my text file is this:
#! /usr/bin/perl -w
# The 'terrible' program - a poorly formatted 'oddeven'.
use constant HOWMANY => 4; $count = 0;
while ( $count < HOWMANY ) {
$count++;
if ( $count == 1 ) {
print "odd\n";
} elsif ( $count == 2 ) {
print "even\n";
} elsif ( $count == 3 ) {
print "odd\n";
} else { # at this point $count is four.
print "even\n";
}
}
I want to find the number of "count" word for that text file. File is named terrible.pl
Idealy it should use regex and with minimum number of line of code.
EDIT: This is what I have tried:
use IO::File;
my $fh = IO::File->new('terrible.pl', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word ($text =~ /count/g) {
print "x";
$words{$word}++;
}
}
print $words{$word};
Here's a complete solution. If this is homework, you learn more by explaining this to your teacher than by rolling your own:
perl -0777ne "print+(##=/count/g)+0" terrible.pl
If you are trying to count how many times appears the word "count", this will work:
my $count=0;
open(INPUT,"<terrible.pl");
while (<INPUT>) {
$count++ while ($_ =~ /count/g);
}
close(INPUT);
print "$count times\n";
I'm not actually sure what your example code is but you're almost there:
perl -e '$text = "lol wut foo wut bar wut"; $count = 0; $count++ while $text =~ /wut/g; print "$count\n";'
You can use the /g modifier to continue searching the string for matches. In the example above, it will return all instances of the word 'wut' in the $text var.
You can probably use something like so:
my $fh = IO::File->new('test.txt', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word (split / /) {
$words{$word}++;
}
}
That will give you an accurate count of every "word" (defined as a group of characters separated by a space), and store it in a hash which is keyed by the word with a value of the number of the word which was seen.
perdoc perlrequick has an answer. The term you want in that document is "scalar context".
Given that this appears to be a homework question, I'll point you at the documentation instead.
So, what are you trying to do? You want the number of times something appears in a block of text. You can use the Perl grep function. That will go through a block of text without needing to loop.
If you want an odd/even return value, you can use the modulo arithmetic function. You can do something like this:
if ($number % 2) {
print "$number is odd\n"; #Returns a "1" or true
}
else {
print "$number is even\n"; #Returns a "0" or false
}
I'm trying to grab all NA phone numbers from a CSV file. The numbers can appear anywhere in each line and each line can also have multiple numbers (separated by commas). The regex I've come up with does work, at least it grabs the first phone number in the line. But despite using the "/g" flag it won't grab any of the other phone numbers. Can anyone suggest what might be wrong with my code?
#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;
my $data_file = "test.csv";
open my $FH, "<", $data_file || die "cannot open file\n";
my #lines = <$FH>;
while (#lines) {
if ((shift #lines) =~ /((\(\d{3}\)\s+|\d{3}-?|\d{3}\.?)(\d{3}-?|\d{3}\.?)\d{4})/g) {
print "$1\n";
} else {
print "No match\n";
}
}
$1 is a scalar, and thus cannot contain multiple matches. You might want to try something like this:
my #matches = ((shift #lines) =~ /((?:\(\d{3}\)\s+|\d{3}-?|\d{3}\.?)(?:\d{3}-?|\d{3}\.?)\d{4})/g);
if (#matches) {
print join("\n", #matches)."\n";
} else {
print "No match\n";
}
Or you could try something like this:
my $line = shift #lines;
if ($line =~ /((\(\d{3}\)\s+|\d{3}-?|\d{3}\.?)(\d{3}-?|\d{3}\.?)\d{4})/) {
while ($line =~ /((\(\d{3}\)\s+|\d{3}-?|\d{3}\.?)(\d{3}-?|\d{3}\.?)\d{4})/g) {
print "$1\n";
}
} else {
print "No match\n";
}