perl regex match with numbers - regex

I am trying to match the following (the below is in an array called @abc):
goo foo tool: 1.2.1 (a3 change: 234342 # 2014/02/19 14:20:27)
with
my $match = "goo foo tool: (\d+)\.(\d+)\.(\d+) \(a3 change: \d+ # #DATE# #TIME#\)";
and in my code,
foreach (@abc) {
    print "$_\n";
    if ($_ =~ m/$match/) {
        print "$1\n";
    } else {
        print "not matched\n";
    }
}
I don't see why it prints "not matched".
Does anyone else see why?

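The #DATE# and #TIME# string constants aren't going to match the dates you have. Simply adjust the regex to actually match those values. Also build the pattern with qr// rather than a double-quoted string: inside double quotes Perl consumes the backslashes, so \d reaches the regex engine as a plain d.
my $match = qr{goo foo tool: (\d+)\.(\d+)\.(\d+) \(a3 change: \d+ \# \d+/\d+/\d+ \d+:\d+:\d+\)};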
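A minimal sketch of the corrected match, assuming the input line looks exactly like the one in the question. The sub name parse_tool_line is made up for illustration; the pattern is compiled with qr{} so the backslashes reach the regex engine intact.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three version numbers, or an empty list
# when the line does not match.
sub parse_tool_line {
    my ($line) = @_;
    # qr{} keeps \d and \. intact; a double-quoted string would not.
    my $match = qr{goo foo tool: (\d+)\.(\d+)\.(\d+) \(a3 change: \d+ \# \d+/\d+/\d+ \d+:\d+:\d+\)};
    return $line =~ $match;
}

my @abc = ('goo foo tool: 1.2.1 (a3 change: 234342 # 2014/02/19 14:20:27)');
foreach my $line (@abc) {
    if (my ($major, $minor, $patch) = parse_tool_line($line)) {
        print "$major.$minor.$patch\n";    # prints 1.2.1
    } else {
        print "not matched\n";
    }
}
```

Matching in list context returns the captures directly, which avoids reading $1 after a match that may have failed.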

Related

Partial match of strings, operator ( =~ )

I have used =~ to compare two strings of the same length in my script, to allow a don't-care condition. If a character in a string is ".", that character is ignored in the comparison. In other words, it is a partial match.
comp_test.pl :
#!/usr/bin/perl
use strict;
use warnings;
my $a=".0.0..0..1...0..........0...0......010.1..........";
my $b="10.0..0..1...0..........0...0......010.1..........";
my $c=".0.0..0..1...0..........0...0......010.1..........";
if ($a =~ $b) {
print "a and b same\n";
}
if ($a =~ $c) {
print "a and c same\n";
}
Because "." is a don't-care, the expected output is both "a and b same" and "a and c same". However, the actual output is only "a and c same". Is there a better operator for this, or would changing "." to "x" help?
This is not a perl version problem. You are doing a regular expression match. The operand on the left of the =~ is the string and the operand on the right is the regex being applied to it.
This can be used for the kind of partial matching you are doing, given that the strings are the same length and each character of the regular expression matches a character of the string, but only where there is a . on the right. Where there is a 1 or a 0 in the regular expression ($b in the case of $a =~ $b), there must be an exactly matching character in the string ($a), not a ..
To do the kind of partial match you seem to want to do, you can use a bitwise exclusive or, like so:
sub partial_match {
my ($string_1, $string_2) = @_;
return 0 if length($string_1) != length($string_2);
# bitwise exclusive or the two strings together; where there is
# a 0 in one string and a 1 in the other, the result will be "\1".
# count \1's to get the number of mismatches
my $mismatches = ( $string_1 ^ $string_2 ) =~ y/\1//;
return $mismatches == 0;
}
While . matches 1 (or any other character), 1 doesn't match . (or any other character other than 1).
The following is a fast solution. It performs best when most strings match (since it always checks the entire string).
sub is_match { ( ( $_[0] ^ $_[1] ) =~ tr/\x00\x1E\x1F//c ) == 0 }
say is_match($a, $b) ? "match" : "no match";
say is_match($b, $c) ? "match" : "no match";
How it works:
Hex of characters
=================
    30 30 31 31 2E 2E   "0011.."
    30 31 30 31 30 31   "010101"
XOR -----------------
    00 01 01 00 1E 1F
       ^^ ^^            2 mismatches
This solution even works if one of the strings is shorter than the other (since the XOR will result in 30, 31 or 2E for the extra characters).
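The XOR table above can be reproduced with a short sketch. The helper names xor_hex and count_mismatches are hypothetical, introduced only for illustration; the counting step uses the same tr/// character list as the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR two strings bytewise and return the result as
# space-separated uppercase hex pairs.
sub xor_hex {
    my ($x, $y) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $x ^ $y;
}

# Count the bytes of the XOR result that are not 0x00, 0x1E or 0x1F.
# 0x00 means "same character"; 0x1E and 0x1F mean a dot met a 0 or a 1.
sub count_mismatches {
    my ($x, $y) = @_;
    my $xored = $x ^ $y;
    return $xored =~ tr/\x00\x1E\x1F//c;
}

print xor_hex('0011..', '010101'), "\n";           # prints 00 01 01 00 1E 1F
print count_mismatches('0011..', '010101'), "\n";  # prints 2
```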
The following is a fast solution. It performs best when most strings don't match (since it stops checking as soon as a match is impossible).
sub make_matcher {
my $pat =
join '',
map { $_ eq '.' ? $_ : "[.\Q$_\E]" }
split //, $_[0];
return qr/^$pat\z/;
}
sub is_match { $_[0] =~ make_matcher($_[1]) }
say is_match($a, $b) ? "match" : "no match";
say is_match($b, $c) ? "match" : "no match";
Tested.
From my understanding, you are trying to compare the lengths of two strings. Basically, only the length of the strings needs to be compared, not the individual characters.
my $a=".0.0..0..1...0..........0...0......010.1..........";
my $b="10.0..0..1...0..........0...0......010.1..........";
my $c=".0.0..0..1...0..........0...0......010.1..........";
So the code could be:
if(length($a) == length($b))
{
print "match found";
}
else
{
print "No match";
}

Changing time format using regex in perl

I want to read a 12-hour format time from a file and replace it with the 24-hour format.
Example:
this is due at 3:15pm -> this is due 15:15
I tried saving the regex captures in variables and manipulating them later, but that didn't work. I also tried substitution (s///), but because the replacement is variable I couldn't figure it out.
Here is my code:
while (<>) {
my $line = $_;
print ("this is text before: $line \n");
if ($line =~ m/\d:\d{2}pm/g){
print "It is PM! \n";}
elsif ($line =~ m/(\d):(\d\d)am/g){
print "this is try: $line \n";
print "Its AM! \n";}
$line =~ s/($regexp)/<French>$lexicon{$1}<\/French>/g;
print "sample after : $line\n";
}
A simple script can do the work for you
$str="this is due at 3:15pm";
$str=~m/\D+(\d+):(\d+)(.*)$/;
$hour=($3 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ( ($1 == 12 ) ? $1 : $1+12 );
$min=$2;
$str=~s/at.*/$hour:$min/;
print "$str\n";
Gives output as
this is due 15:15
What does it do?
$str=~m/\D+(\d+):(\d+)(.*)$/; tries to match the string against the regex:
\D+ matches anything other than digits. Here it matches "this is due at ".
(\d+) matches one or more digits. Here it matches 3, the hours, captured in group 1 ($1).
: matches a literal :
(\d+) matches one or more digits. Here it matches 15, the minutes, captured in group 2 ($2).
(.*) matches whatever follows, here pm. Captured in group 3 ($3).
$ anchors the regex at the end of the string.
$hour=($3 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ( ($1 == 12 ) ? $1 : $1+12 ); converts to a 24-hour clock. If $3 is pm it adds 12, unless the hour is 12. If the time is am and the hour is 12, the hour becomes 0.
$str=~s/at.*/$hour:$min/; substitutes everything from at to the end of the string with $hour:$min, the values obtained from the ternary operation above.
#!/usr/bin/env perl
use strict;
use warnings;
my $values = time_12h_to_24h("11:00 PM");
sub time_12h_to_24h
{
my($t12) = @_;
my($hh,$mm,$ampm) = $t12 =~ m/^(\d\d?):(\d\d?)\s*([AP]M?)/i;
$hh = ($hh % 12) + (($ampm =~ m/AM?/i) ? 0 : 12);
return sprintf("%.2d:%.2d", $hh, $mm);
}
I found this code at the link below. Please check:
Is my pseudo code for changing time format correct?
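Building on the conversions above, here is a sketch that rewrites every 12-hour time inside a line of text using s///e. The helper names to_24h and convert_line are hypothetical, and the accepted time format (one or two hour digits, two minute digits, optional space, am/pm in any case) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: convert hour, minute and am/pm to a HH:MM string.
sub to_24h {
    my ($hh, $mm, $ampm) = @_;
    # 12 maps to 0 first, then pm adds 12 (so 12pm stays 12, 12am becomes 0).
    $hh = ($hh % 12) + (lc($ampm) eq 'pm' ? 12 : 0);
    return sprintf '%02d:%02d', $hh, $mm;
}

# Replace every time such as "3:15pm" or "12:05 AM" found in the line.
sub convert_line {
    my ($line) = @_;
    # /e evaluates the replacement as code; /g handles several times per line.
    $line =~ s/\b(\d{1,2}):(\d{2})\s*([ap]m)\b/to_24h($1, $2, $3)/gie;
    return $line;
}

print convert_line("this is due at 3:15pm"), "\n";   # prints this is due at 15:15
```

Unlike the substitution on "at.*", this keeps the surrounding text untouched and leaves lines without a time unchanged.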
Try this; it gives what you expect:
my @data = <DATA>;
foreach my $sm (@data) {
if($sm =~/(12)\.\d+(pm)/g){
print "$&\n";
}
elsif($sm =~m/(\d+(\.)?\d+)(pm)/g )
{
print $1+12,"\n";
}
}
__DATA__
Time 3.15am
Time 3.16pm
Time 5.17pm
Time 1.11am
Time 1.01pm
Time 12.11pm

Regular expression not returning first number from 3 lines

Hi, I am new to Perl programming. I wrote code to store the first number from a scalar variable using a regular expression, but I am getting the first number from the last line when I need the number from the first line.
For example, in the following code I need $num = 22, but the code returns 656.
my $num ;
my $sample = "fd 22 sdf sdf 96
dsf6 66s sd6 7777 sd
656 dd 55 ";
my @sentences = split(/\n/, $sample);
for my $line (@sentences)
{
($num )= $line =~ /([0-9]+) .*/ ;
}
print $num;
Can someone tell me what's wrong with my logic?
Your code overwrites the first match in the following iterations of the loop: 22 matches, but 656 replaces it. Just break after the first match:
($num )= $line =~ /([0-9]+) .*/ and last;
or remove the loop and match against the sample:
($num )= $sample =~ /([0-9]+)/;
I think the pattern as written won't filter out entries like "s67" in the following
my $sample = "fd 66s s67 22 sdf sdf 96
dsf6 66s 656 dd 55 ";
and so it needs something like
($num) = $line =~ /\b([0-9]+)\b.*/ and last;
Or try
($num) = $sample =~ /[0-9]+/g
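The suggestions above can be combined into one small sketch: loop over the lines, stop at the first match, and use \b so digits glued to letters are skipped. The sub name first_number is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first standalone number in a multi-line
# string, or undef when there is none.
sub first_number {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        # \b on both sides rejects digits glued to letters, such as "66s".
        if ($line =~ /\b([0-9]+)\b/) {
            return $1;    # leaving the sub here plays the role of "last"
        }
    }
    return;
}

my $sample = "fd 22 sdf sdf 96\ndsf6 66s sd6 7777 sd\n656 dd 55 ";
print first_number($sample), "\n";    # prints 22
```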

How do I get rid of this "(" using regex?

I was moving along on a regular expression and I have hit a roadblock I can't seem to get around. I am trying to get rid of "(" in the middle of a line of text using a regex. There were two, but I figured out how to get rid of the one at the end of the line; it's the one in the middle I can't hack out.
Here is a more complete snippet of the file I am searching through.
ide1:0.present = "TRUE"
ide1:0.clientDevice = "TRUE"
ide1:0.deviceType = "cdrom-raw"
ide1:0.startConnected = "FALSE"
floppy0.startConnected = "FALSE"
floppy0.clientDevice = "TRUE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntulinux"
uuid.location = "56 4d e8 67 57 18 67 04-c8 68 14 eb b3 c7 be bf"
uuid.bios = "56 4d e8 67 57 18 67 04-c8 68 14 eb b3 c7 be bf"
vc.uuid = "52 c7 14 5c a0 eb f4 cc-b3 69 e1 6d ad d8 1a e7"
Here is the entire foreach loop I am working on.
my @virtual_machines;
foreach my $vm (keys %virtual_machines) {
push @virtual_machines, $vm;
}
foreach my $vm (@virtual_machines) {
my $vmx_file = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
if ($vmx_file =~ m/^\bguestOSAltName\b\s+\S\s+\W(?<GUEST_OS> .+[^")])\W/xm) {
$virtual_machines{$vm}{"OS"} = "$+{GUEST_OS}";
} else {
$virtual_machines{$vm}{"OS"} = "N/A";
}
if ($vmx_file =~ m/^\bguestOSAltName\b\s\S\s.+(?<ARCH> \d{2}\W\bbit\b)/xm) {
$virtual_machines{$vm}{"Architecture"} = "$+{ARCH}";
} else {
$virtual_machines{$vm}{"Architecture"} = "N/A";
}
}
I think the problem is that I cannot match "(" because the expression before it is ".+", which matches everything in the line of text, be it alphanumeric, whitespace, or even symbols like hyphens.
Any ideas how I can get this to work?
This is what I am getting for an output from a hash dump.
$VAR1 = {
'NS02' => {
'ID' => '144',
'Version' => '7',
'OS' => 'Ubuntu Linux (64-bit',
'VMX' => '/vmfs/volumes/datastore2/NS02/NS02.vmx',
'Architecture' => '64-bit'
},
The part of the code block that works with ARCH works flawlessly, so what I really need is to hack off the "(64-bit)" part, if it exists, when the search runs into the (, along with the whitespace preceding the (.
What I want is to turn the above hash dump into this.
$VAR1 = {
'NS02' => {
'ID' => '144',
'Version' => '7',
'OS' => 'Ubuntu Linux',
'VMX' => '/vmfs/volumes/datastore2/NS02/NS02.vmx',
'Architecture' => '64-bit'
},
Same thing minus the (64-bit) part.
You can simplify your regex to /^guestOSAltName\s+=\s+"(?<GUEST_OS>.+)"/m. What this does:
^ forces the match to start at the beginning of a line
guestOSAltName is a string literal.
\s+ matches 1 or more whitespace characters.
(?<GUEST_OS>.+) matches all the text from after the spaces to the end of the line, catches the group and names it GUEST_OS. If the line could have comments, you might want to change .+ to [^#]+.
The "'s around the group are literal quotes.
The m at the end turns on multi-line matching.
Code:
if ($vmx_file =~ /^guestOSAltName\s+=\s+"(?<GUEST_OS>.+)"/m) {
print "$+{GUEST_OS}";
} else {
print "N/A";
}
See it here: http://ideone.com/1xH5J
So you want to match the contents of the string after guestOSAltName up to (and not including) the first ( if present?
Then replace the first line of your code sample with
if ($vmx_file =~ m/^guestOSAltName\s+=\s+"(?<GUEST_OS>[^"()]+)/xm) {
If there always is a whitespace character before a potential opening parenthesis, then you can use
if ($vmx_file =~ m/^guestOSAltName\s+=\s+"(?<GUEST_OS>[^"()]+)[ "]/xm) {
so you don't need to strip trailing whitespace if present.
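A self-contained sketch of this approach, assuming the .vmx text is already in a string. The sub name parse_guest_os and the "N/A" defaults mirror the question's code but are otherwise illustrative; the architecture pattern assumes the form "(64-bit)" seen in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the OS name and the architecture out of the text
# of a .vmx file. Missing values come back as "N/A".
sub parse_guest_os {
    my ($vmx_text) = @_;
    my ($os, $arch) = ('N/A', 'N/A');
    # Capture up to the first quote or parenthesis, then drop trailing spaces.
    if ($vmx_text =~ /^guestOSAltName\s*=\s*"([^"()]+)/m) {
        ($os = $1) =~ s/\s+\z//;
    }
    # The architecture, when present, sits in parentheses: (64-bit)
    if ($vmx_text =~ /^guestOSAltName\s*=\s*"[^"]*\((\d+-bit)\)/m) {
        $arch = $1;
    }
    return ($os, $arch);
}

my $vmx = qq{guestOSAltName = "Ubuntu Linux (64-bit)"\nguestOS = "ubuntulinux"\n};
my ($os, $arch) = parse_guest_os($vmx);
print "$os / $arch\n";    # prints Ubuntu Linux / 64-bit
```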
Something like this should work:
$match =~ s/^(.*?)\((.*?)$/$1$2/;
I generally find that .* is too powerful (as you are finding!). Two suggestions:
Be more explicit on what you are looking for
my $text = '( something ) ( something else) ' ;
$text =~ /
\(
( [\s\w]+ )
\)
/x ;
print $1 ;
Use non greedy matching
my $text = '( something ) ( something else) ' ;
$text =~ /
\(
( .*? ) # non greedy match
\)
/x ;
print $1 ;
General observation - involved regexps are far easier to read if you use the /x option as this allows spacing and comments.
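To make the difference concrete, this sketch runs a greedy and a non-greedy capture over the same sample string. The sub names greedy_capture and lazy_capture are hypothetical.

```perl
use strict;
use warnings;

# Greedy: .* runs to the last closing parenthesis in the string.
sub greedy_capture {
    my ($text) = @_;
    my ($got) = $text =~ /\((.*)\)/;
    return $got;
}

# Non-greedy: .*? stops at the first closing parenthesis.
sub lazy_capture {
    my ($text) = @_;
    my ($got) = $text =~ /\((.*?)\)/;
    return $got;
}

my $text = '( something ) ( something else) ';
print "greedy: '", greedy_capture($text), "'\n";   # ' something ) ( something else'
print "lazy:   '", lazy_capture($text), "'\n";     # ' something '
```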
Use a ? after your quantifier; the ? makes it non-greedy.
The regex is /^guestOSAltName[^"]+"(?<GUEST_OS>.+?)\s*[\("]+.*$/:
#!/usr/bin/env perl
foreach my $x ('guestOSAltName = "Ubuntu Linux (64-bit)"', 'guestOSAltName = "Microsoft Windows Server 2003, Standard Edition"') {
if ($x =~ m/^guestOSAltName[^"]+"(?<GUEST_OS>.+?)\s*[\("]+.*$/xm) {
print "$+{GUEST_OS}\n";
} else {
print "N/A\n";
}
if ($x =~ m/^guestOSAltName[^(]+\((?<ARCH>\d{2}).*/xm) {
print "$+{ARCH}\n";
} else {
print "N/A\n";
}
}
Start the demo:
$ perl t.pl
Ubuntu Linux
64
Microsoft Windows Server 2003, Standard Edition
N/A

In Perl, how can I capture a string of digits from a string containing carriage returns and line feeds?

I need to extract the text (characters and numbers) from a multiline string. Everything I have tried does not strip out the line feeds/carriage returns.
Here is the string in question:
"\r\n 50145395\r\n "
In HEX it is: 0D 0A 20 20 20 20 20 20 20 20 35 30 31 34 35 33 39 35 0D 0A 20 20 20 20
I have tried the following:
$sitename =~ m/(\d+)/g;
$sitename = $1;
and
$sitename =~ s/^\D+//g;
$sitename =~ s/\D+$//g;
and
$sitename =~ s/^\s+//g;
$sitename =~ s/\s+$//g;
In all cases I cannot get rid of any of the unwanted characters. I have run this in cygwin perl and Strawberry perl.
Thanks.
Capturing match in list context returns captured strings:
#!/usr/bin/perl
use strict; use warnings;
my $s = join('', map chr(hex), qw(
0D 0A 20 20 20 20 20 20 20 20 35 30
31 34 35 33 39 35 0D 0A 20 20 20 20
));
my ($x) = $s =~ /([A-Za-z0-9]+)/;
print "'$x'\n";
Output:
C:\Temp> uio
'50145395'
I'm not sure what you need, but here is code that extracts all words from the string:
my @words = ( $sitename =~ m/(\w+)/g );
It can also be done with split, but then you split on whitespace:
my @words = split( m/\s+/, $sitename );
The obvious one I didn't see in your post:
$sitename =~ s/\D//g;
This removes all non-digits. To remove anything but word characters, you could:
$sitename =~ s/\W//g;
There's no need for ^ or $ if your intention is to replace every non-digit. Also, with the global g option the substitution can remove one character at a time; there is no need to match runs of non-digits with \D+.
Edit: My solution was incorrect; please instead pay attention to Sinan Ünür's solution.
Do you want to remove only newlines and carriage returns? If so, this is what you want:
$sitename =~ s/[\r\n]//g;
If you want to remove all whitespace, not just newlines and carriage returns, use this instead:
$sitename =~ s/\s//g;
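A short sketch contrasting two of the approaches in this thread on the string from the question: capturing the digits in list context, and trimming surrounding whitespace. The sub names extract_digits and trim are hypothetical, and the number of spaces in the sample string is taken from the hex dump in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of digits, or undef.
sub extract_digits {
    my ($raw) = @_;
    my ($digits) = $raw =~ /(\d+)/;    # list context yields the capture
    return $digits;
}

# Hypothetical helper: trim leading and trailing whitespace; \s covers
# spaces, carriage returns and line feeds alike.
sub trim {
    my ($raw) = @_;
    $raw =~ s/^\s+|\s+$//g;
    return $raw;
}

my $sitename = "\r\n        50145395\r\n    ";
printf "'%s'\n", extract_digits($sitename);    # prints '50145395'
printf "'%s'\n", trim($sitename);              # prints '50145395'
```

Both helpers return a new value and leave the original string unchanged, so the result has to be assigned back if the cleaned value is needed later.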
$x = <<END;
this is a multiline
string. this is a multiline
string.
END
$x =~ s/\r?\n?//g;
print $x;
In the past I have done something like:
my $newline = chr(13) . chr(10);
$data =~ s/$newline/ /g;
You can check out other ascii character codes at: http://www.asciitable.com./
use strict;
my $newline = chr(13);
my $newline2 = chr(10);
my $words = "\r\n 50145395\r\n ";
foreach my $char (split //, $words) {
my $val=ord($char);
print "->$char<- ($val)\n";
}
print "$words\n";
$words =~ s/$newline//g;
$words =~ s/$newline2//g;
$words =~ s/[ ]+//g;
foreach my $char (split //, $words) {
my $val=ord($char);
print "->$char<- ($val)\n";
}
print "$words\n";
To extract all digits, strip off non-digit characters
$sitename ="\r\n 50145395\r\n ";
$sitename =~ s/\D+//g;