Perl if-statements / regular expressions - regex

I'm not entirely sure why my if-statements are not validating user input. Here's my code.
The statements that contain regular expressions are supposed to allow leading, and trailing white space.
sub Menu
{
&processlist;
&creating_Refs;
print "[Sort by COLUMN|sortup|sortdown| quit]:";
my $user_input = <STDIN>;
chomp($user_input);
if($user_input =~ m/[quit\s]/)
{
exit;
}
elsif($user_input eq 'sortup')
{
print "working bro\n\n";
@$VAR1 = sort sortup @$VAR1;
foreach my $ref (@$VAR1)
{
print "$ref->{PID}, $ref->{USER}, $ref->{PR}, $ref->{NI}, $ref->{VIRT}, $ref->{RES}, $ref->{SHR}, $ref->{S}, $ref->{CPU}, $ref->{MEM}, $ref->{TIME}, $ref->{COMMAND} \n";
}
}
elsif($user_input eq 'sortdown \n')
{
print "working on sortdown\n\n";
}
elsif($user_input =~ m/[sort by]+\w/)
{
}
else
{
print "Error, please re-enter command \n\n";
&Menu;
}
}

A character class like [abcd] allows any one of the characters specified in the square brackets. When you say [sort by], it is equivalent to /s|o|r|t| |b|y/, which will match any one of those characters, only once. If you want to match sort by, use /sort by/.
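In your case, m/[quit\s]/ matches any input that contains a q, u, i, t or whitespace character anywhere, so even 'sortup' triggers the exit. Match the word itself instead: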
if($user_input =~ m/quit/){
exit;
}
and to match exact words use word boundaries:
if($user_input =~ m/\bquit\b/){
exit;
}

if($user_input =~ m/quit/){
exit;
}
Also, chomp removes the trailing \n, so:
elsif($user_input eq 'sortdown \n')
will never be true. Inside single quotes, \n is also just a backslash followed by n, not a newline.
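To tie the points above together, here is a minimal sketch of a menu parser that anchors each pattern and allows leading and trailing whitespace. The parse_command helper and its return values are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original code): classify one line of
# menu input. Always returns a list: (command) or (command, argument).
sub parse_command {
    my ($input) = @_;

    # ^\s* and \s*$ allow leading/trailing whitespace, including the newline,
    # while still requiring the whole word to be present.
    return ('quit')     if $input =~ /^\s*quit\s*$/;
    return ('sortup')   if $input =~ /^\s*sortup\s*$/;
    return ('sortdown') if $input =~ /^\s*sortdown\s*$/;

    # "sort by COLUMN": the literal words, then the column name in group 1.
    return ('sortby', $1) if $input =~ /^\s*sort\s+by\s+(\w+)\s*$/i;

    return ('error');
}

my ($cmd, $arg) = parse_command("  sort by PID \n");
print "$cmd $arg\n";
```

Because the patterns are anchored, "sortup" can no longer be mistaken for "quit", and no chomp is needed before matching.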


Perl read file and extract price

A file.txt file contains the string "hello the apple cost 10.99 today"
How can I extract the 10.99 only?
This is what I have so far:
open READFILE, ("<file.txt");
while (<READFILE>)
{
if ($a = $_ =~ m/\d\.\d/)
{
print "$a\n";
}
}
However, my output shows 1 instead of 10.99. Can you please tell me what's wrong?
The 1 you were getting is the return value of the match itself: in scalar context it only tells you that the pattern matched. You need capturing parentheses to get at the matched text.
open READFILE, "<file.txt";
while (<READFILE>)
{
if (m/([\d\.]+)/)
{
my $price = $1;
print "price = $price\n";
}
}
open READFILE, "file.txt";
while (<READFILE>)
{
if ($_ =~ /(\d+\.\d+)/)
{
print "$1\n";
}
}
You are good to go
Change your code to this,
while (my $a = <DATA>) {
if ($a =~ s/.*?(\d+\.\d+).*/$1/)
{
print "$a\n";
}
}
s/regex/replacement/modifiers
s/.*?(\d+\.\d+).*/$1/ matches the whole line while capturing the decimal number in group 1, then replaces everything it matched with the contents of group 1, so only the decimal number remains in $a.
OR
while (my $a = <DATA>) {
if ($a =~ m/.*?(\d+\.\d+).*/)
{
print "$1\n";
}
}
The regex .*?(\d+\.\d+).* is matched against the input string, and on a successful match the decimal number is captured in group 1. Printing $1 then gives you the captured number.
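Another way to see the scalar-versus-capture difference is to assign the match in list context, where it returns the captured groups instead of 1. This is only a sketch; extract_price is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first decimal number in a line,
# or undef if the line contains none.
sub extract_price {
    my ($line) = @_;

    # The parentheses around $price force list context, so the match
    # returns the contents of its capture group rather than a true value.
    my ($price) = $line =~ /(\d+\.\d+)/;
    return $price;
}

print extract_price("hello the apple cost 10.99 today"), "\n";   # prints 10.99
```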

In Perl, how can I remove all spaces that are not inside double quotes " "?

I'm trying to come up with some regex that will remove all space chars from a string as long as they're not inside of double quotes (").
Example string:
some string with "text in quotes"
Result:
somestringwith"text in quotes"
So far I've come up with something like this:
$str =~ /"[^"]+"|/g;
But it doesn't seem to be giving the intended result.
I'm honestly very new at perl and haven't had too much regexp experience. So if anyone willing to answer would also be willing to provide some insight into the why and how that would be great!
Thanks!
EDIT
String will not contain escaped "'s
It should actually always be formatted like this:
Some.String = "Some Value"
Result would be
Some.String="Some Value"
Here is a technique using split to separate the quoted strings. It relies on your data being consistent and will not work with loose quotes.
use strict;
use warnings;
my @line = split /("[^"]*")/;
for (@line) {
unless (/^"/) {
s/[ \t]+//g;
}
}
print @line; # @line is altered
Basically, you split up the string in order to isolate the quoted strings. Once that is done, perform the substitution on all other strings. Since the array elements are aliased in the loop, substitutions are performed on the actual array.
You can run this script like so:
perl -n script.pl inputfile
To see the output. Or
perl -n -i.bak script.pl inputfile
To do in-place edit on inputfile, while saving backup in inputfile.bak.
With that said, I'm not sure what your edit means. Do you want to change
Some.String = "Some Value"
to
Some.String="Some Value"
Text::ParseWords is tailor-made for this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my @strings = (
q{This.string = "Hello World"},
q{That " string " and "another shoutout to my bytes"},
);
for my $s ( @strings ) {
my @words = quotewords '\s+', 1, $s;
print join('', @words), "\n";
}
Output:
This.string="Hello World"
That" string "and"another shoutout to my bytes"
Using Text::ParseWords means if you ever had to deal with quoted strings with escaped quotation marks in them, you'd be ready ;-)
Also, this sounds like you have a configuration file of some sort and you're trying to parse it. If that is the case, there are probably better solutions.
I suggest removing the quoted substrings using split and then recombining them with join after removing whitespace from the intermediate text.
Note that if the regex used for split contains captures then the captured values will also be included in the list returned.
Here's some sample code.
use strict;
use warnings;
my $source = <<END;
Some.String = "Some Value";
Other.String = "Other Value";
Last.String = "Last Value";
END
print join '', map {s/\s+//g unless /"/; $_; } split /("[^"]*")/, $source;
output
Some.String="Some Value";Other.String="Other Value";Last.String="Last Value";
I would simply loop through the string char by char. This way you can handle escaped strings too (just add an isEscaped variable).
my $text='lala "some thing with quotes " lala ... ';
my $quoteOpen = 0;
my $out;
foreach my $char (split //, $text) {
if ($char eq "\"" && $quoteOpen==0) {
$quoteOpen = 1;
$out .= $char;
} elsif ($char eq "\"" && $quoteOpen==1) {
$quoteOpen = 0;
$out .= $char;
} elsif ($char =~ /\s/ && $quoteOpen==1) {
$out .= $char;
} elsif ($char !~ /\s/) {
$out .= $char;
}
}
print "$out\n";
Splitting on double quotes, removing spaces only from the alternating fields that fall outside quotes (the first, third, and so on):
sub remove_spaces {
my $string = shift;
my @fields = split /"/, $string . ' '; # trailing space needed to keep final " in output
my $flag = 1;
return join '"', map { s/ +//g if $flag; $flag = ! $flag; $_} @fields;
}
It can be done with a regex, provided the quoted alternative is tried first:
s/("[^"]*"|[^ "]*) */$1/g
Note that this won't handle any kind of escapes inside the quotes.
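For comparison, here is a minimal sketch of a single substitution that keeps quoted runs and drops whitespace everywhere else. It assumes balanced quotes with no escapes, as stated in the question's edit; strip_unquoted_spaces is a name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whitespace that is not inside double quotes.
# Assumes quotes are balanced and never escaped.
sub strip_unquoted_spaces {
    my ($str) = @_;

    # Alternative 1 captures a complete quoted run in $1 and puts it back.
    # Alternative 2 matches whitespace outside quotes; $1 is then undefined,
    # so the match is replaced with the empty string.
    $str =~ s/("[^"]*")|\s+/defined $1 ? $1 : ''/ge;
    return $str;
}

print strip_unquoted_spaces('Some.String = "Some Value"'), "\n";
# prints Some.String="Some Value"
```

The quoted alternative has to come first so that the engine consumes a whole quoted run before it ever looks at the spaces inside it.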

How can I know which portion of a Perl regex is matched by a string?

I want to search the lines of a file to see if any of them match one of a set of regexs.
something like this:
my @regs = (qr/a/, qr/b/, qr/c/);
foreach my $line (<ARGV>) {
foreach my $reg (@regs) {
if ($line =~ /$reg/) {
printf("matched %s\n", $reg);
}
}
}
but this can be slow.
it seems like the regex compiler could help. Is there an optimization like this:
my $master_reg = join("|", @regs); # this is wrong syntax. what's the right way?
foreach my $line (<ARGV>) {
$line =~ /$master_reg/;
my $matched = special_function();
printf("matched the %sth reg: %s\n", $matched, $regs[$matched]);
}
where 'special_function' is the special sauce telling me which portion of the regex was matched.
Use capturing parentheses. Basic idea looks like this:
my @matches = $foo =~ /(one)|(two)|(three)/;
defined $matches[0]
and print "Matched 'one'\n";
defined $matches[1]
and print "Matched 'two'\n";
defined $matches[2]
and print "Matched 'three'\n";
Add capturing groups:
"pear" =~ /(a)|(b)|(c)/;
if (defined $1) {
print "Matched a\n";
} elsif (defined $2) {
print "Matched b\n";
} elsif (defined $3) {
print "Matched c\n";
} else {
print "No match\n";
}
Obviously in this simple example you could have used /(a|b|c)/ just as well and just printed $1, but when 'a', 'b', and 'c' can be arbitrarily complex expressions this is a win.
If you're building up the regex programmatically you might find it painful to have to use the numbered variables, so instead of breaking strictness, look in the @- or @+ arrays, which contain offsets for each match position. $-[0] is always set as long as the pattern matched at all, but higher $-[$n] will only contain defined values if the nth capturing group matched.
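As a rough sketch of that idea, the following builds the combined pattern from a list of regexes and scans @- to find which alternative matched. It assumes the individual patterns contain no capturing groups of their own (otherwise the group numbers shift); which_matched is a hypothetical name standing in for the question's special_function.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based index of the alternative that
# matched $line, or undef if nothing matched.
# Assumption: none of the patterns in @regs has its own capture groups.
sub which_matched {
    my ($line, @regs) = @_;

    # Wrap each pattern in its own capturing group: (re0)|(re1)|(re2)...
    my $master = join '|', map { "($_)" } @regs;
    return unless $line =~ /$master/;

    # $#+ is the number of groups in the last successful match; $-[$i] is
    # defined only for the group that actually took part in the match.
    for my $i (1 .. $#+) {
        return $i - 1 if defined $-[$i];
    }
    return;
}

my @regs = (qr/b/, qr/e/, qr/r/);
my $idx  = which_matched("pear", @regs);
print "matched regex number $idx: $regs[$idx]\n" if defined $idx;
```

Whether this is actually faster than the nested loop depends on the patterns, so it is worth benchmarking both on real data.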

How to determine number of times a word appears in text?

How can I find the number of times a word is in a block of text in Perl?
For example my text file is this:
#! /usr/bin/perl -w
# The 'terrible' program - a poorly formatted 'oddeven'.
use constant HOWMANY => 4; $count = 0;
while ( $count < HOWMANY ) {
$count++;
if ( $count == 1 ) {
print "odd\n";
} elsif ( $count == 2 ) {
print "even\n";
} elsif ( $count == 3 ) {
print "odd\n";
} else { # at this point $count is four.
print "even\n";
}
}
I want to find the number of "count" word for that text file. File is named terrible.pl
Ideally it should use a regex and the minimum number of lines of code.
EDIT: This is what I have tried:
use IO::File;
my $fh = IO::File->new('terrible.pl', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word ($text =~ /count/g) {
print "x";
$words{$word}++;
}
}
print $words{$word};
Here's a complete solution. If this is homework, you learn more by explaining this to your teacher than by rolling your own:
perl -0777ne "print+(@@=/count/g)+0" terrible.pl
If you are trying to count how many times the word "count" appears, this will work:
my $count=0;
open(INPUT,"<terrible.pl");
while (<INPUT>) {
$count++ while ($_ =~ /count/g);
}
close(INPUT);
print "$count times\n";
I'm not actually sure what your example code is but you're almost there:
perl -e '$text = "lol wut foo wut bar wut"; $count = 0; $count++ while $text =~ /wut/g; print "$count\n";'
The /g modifier lets each match continue from where the previous one left off. In the example above, every pass through the while loop finds the next instance of 'wut' in $text, so $count ends up as the number of occurrences.
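The same count can also be obtained without an explicit loop by running the /g match in list context and counting the results. This is a sketch only; count_occurrences is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $word in $text.
sub count_occurrences {
    my ($text, $word) = @_;

    # \Q...\E makes $word match literally even if it contains regex
    # metacharacters. Assigning the /g match to an empty list forces list
    # context; the outer scalar assignment yields the number of matches.
    my $count = () = $text =~ /\Q$word\E/g;
    return $count;
}

print count_occurrences("lol wut foo wut bar wut", "wut"), "\n";   # prints 3
```

Note that this counts substrings, so "count" inside "$count" is included; add \b word boundaries around the pattern if only whole words should be counted.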
You can probably use something like so:
my $fh = IO::File->new('test.txt', 'r') or die "$!\n";
my %words;
while (<$fh>) {
for my $word (split / /) {
$words{$word}++;
}
}
That will give you an accurate count of every "word" (defined as a group of characters separated by a space), and store it in a hash which is keyed by the word with a value of the number of the word which was seen.
perldoc perlrequick has an answer. The term you want in that document is "scalar context".
Given that this appears to be a homework question, I'll point you at the documentation instead.
So, what are you trying to do? You want the number of times something appears in a block of text. You can use the Perl grep function. That will go through a block of text without needing to loop.
If you want an odd/even return value, you can use the modulo arithmetic function. You can do something like this:
if ($number % 2) {
print "$number is odd\n"; #Returns a "1" or true
}
else {
print "$number is even\n"; #Returns a "0" or false
}

evaluating a user-defined regex from STDIN in Perl

I'm trying to make an on-the-fly pattern tester in Perl.
Basically it asks you to enter the pattern, and then gives you a >>>> prompt where you enter possible matches. If it matches it says "%%%% before matched part after match" and if not it says "%%%! string that didn't match". It's trivial to do like this:
while(<>){
chomp;
if(/$pattern/){
...
} else {
...
}
}
but I want to be able to enter the pattern like /sometext/i rather than just sometext
I think I'd use an eval block for this? How would I do such a thing?
This sounds like a job for string eval, just remember not to eval untrusted strings.
#!/usr/bin/perl
use strict;
use warnings;
my $regex = <>;
$regex = eval "qr$regex" or die $@;
while (<>) {
print /$regex/ ? "matched" : "didn't match", "\n";
}
Here is an example run:
perl x.pl
/foo/i
foo
matched
Foo
matched
bar
didn't match
^C
You can write /(?i:<pattern>)/ instead of /<pattern>/i.
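Building on the inline-modifier form, here is a minimal sketch that parses /pattern/flags input by hand and compiles it without a string eval. It assumes only the i, m, s and x flags are accepted; compile_user_regex is a name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn user input such as "/foo/i" into a compiled
# regex. Returns undef if the input is not of the form /pattern/flags or
# if the pattern itself is invalid.
# Assumption: only the i, m, s and x flags are supported.
sub compile_user_regex {
    my ($input) = @_;

    # Split the input into the pattern body and the trailing flags.
    my ($body, $flags) = $input =~ m{^\s*/(.*)/([imsx]*)\s*$}s
        or return;

    # Express the flags as an inline modifier group: (?i:pattern)
    my $src = length $flags ? "(?${flags}:${body})" : $body;

    # Block eval only traps a compile error in the pattern; it does not
    # execute the user's text as Perl code.
    my $re = eval { qr/$src/ };
    return $re;
}

my $re = compile_user_regex("/foo/i\n");
print "Foo ", ("Foo" =~ $re ? "matched" : "didn't match"), "\n";
```

The pattern still comes from the user, so constructs like (?{ ... }) remain disabled by default because `use re 'eval'` is not in effect.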
This works for me:
my $foo = "My bonnie lies over the ocean";
print "Enter a pattern:\n";
while (<STDIN>) {
my $pattern = $_;
if (not ($pattern =~ /^\/.*\/[a-z]?$/)) {
print "Invalid pattern\n";
} else {
my $x = eval "if (\$foo =~ $pattern) { return 1; } else { return 0; }";
if ($x == 1) {
print "Pattern match\n";
} else {
print "Not a pattern match\n";
}
}
print "Enter a pattern:\n"
}