Convert HH:MM into decimal hours - regex

I am trying to convert timestamps from a text file in HH:MM format into decimal hours (for example, 12:30 -> 12,5) [1] using a Perl regex, to make later processing easier.
I am quite new to this topic, so I am struggling with the MM part and don't know how to convert it. Currently I have something like this:
while ( <FILE> ) {
$line = $_;
$line =~ s/([0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/$2,$1/g;
print $line;
}
[1] In my locale, the comma is used as the decimal separator, so 12,5 means twelve and a half (12.5).

I would not use a regular expression for converting. It can be done with pretty simple math. Parse out the times using your search pattern, and then pass it through something like this.
sub to_decimal {
my $time = shift;
my ($hours, $minutes) = split /:/, $time;
my $decimal = sprintf '%.02d', ($minutes / 60) * 100 ;
return join ',', $hours, $decimal;
}
If you run it in a loop like this:
for (qw(00 01 05 10 15 20 25 30 35 40 45 50 55 58 59)) {
say "$_ => " . to_decimal("12:$_");
}
You get:
00 => 12,00
01 => 12,01
05 => 12,08
10 => 12,16
15 => 12,25
20 => 12,33
25 => 12,41
30 => 12,50
35 => 12,58
40 => 12,66
45 => 12,75
50 => 12,83
55 => 12,91
58 => 12,96
59 => 12,98
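Note that the '%.02d' format truncates the fraction, which is why 12:10 comes out as 12,16 above rather than 12,17. If rounding is preferred, a minimal sketch could format the whole value with '%.2f' and then swap the decimal point for a comma. The name to_decimal_rounded is made up for illustration and is not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the answer's own sub):
# rounds to two decimals instead of truncating.
sub to_decimal_rounded {
    my ($time) = @_;
    my ($hours, $minutes) = split /:/, $time;
    # Format hours plus the fractional hour to two decimals (rounded).
    my $decimal = sprintf '%.2f', $hours + $minutes / 60;
    # Swap the decimal point for a comma, as the question's locale expects.
    $decimal =~ tr/./,/;
    return $decimal;
}

print to_decimal_rounded($_), "\n" for qw(12:10 12:30 12:59);
```

Because the hours go through sprintf as a number, a leading zero is dropped (05:17 becomes 5,28).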

perl -ple 's|(\d\d):(\d\d)|{$2/60 + $1}|eg'
Your locale should take care of the comma, I think.

This will achieve what you need. It uses an executable substitution to replace the time string with an expression in terms of the hour and minute values. tr/./,/r is used to convert all dots to commas.
use strict;
use warnings 'all';
while ( <DATA> ) {
s{ ( 0[0-9] | 1[0-9] | 2[0-3] ) : ( [0-5][0-9] ) }{
sprintf('%.2f', $1 + $2 / 60) =~ tr/./,/r
}gex;
print;
}
__DATA__
00:00
05:17
12:30
15:59
23:59
output
0,00
5,28
12,50
15,98
23,98

You only have to adjust the substitution to make it work:
$line =~ s/(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/"$1," . substr( int($2)\/60, 2)/eg;
The e modifier causes the replacement to be eval'ed, so you can write the intended result as a kind of formula that depends on the capture group contents. Note that the substr call strips the leading "0." from the string representation of the fraction.
If you need to limit yourself to a given number of fraction digits, format the result of the division using sprintf:
$line =~ s/(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/"$1," . substr( sprintf('%.2f', int($2)\/60), 2)/eg;

You could use grep -E and awk:
$ echo 12:30 | grep -Eo '(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])' | awk -F":" '{printf "%s\n", $1+$2/60}'
12.5

Assuming your LC_NUMERIC is set correctly:
while (<FILE>) {
use locale ':not_characters';
my $line = $_;
$line =~ s!\b([01][0-9]|2[0-3]):([0-5][0-9])\b!$1 + $2/60!eg;
print $line;
}

Related

How to match variable length number range 0 to $n using perl regex?

I need to match a numeric range from 0 to a number $n where $n can be any random number from 1 - 40.
For example,
if $n = 16, I need to strictly match only the numeric range from 0-16.
I tried m/([0-9]|[1-3][0-9]|40)/ but that is matching all 0-40. Is there a way to use regex to match from 0 to $n ?
The code snippet is attached for context.
$n = getNumber(); #getNumber() returns a random number from 1 to 40.
$answer = getAnswer(); #getAnswer() returns a user input.
#Check whether user enters an integer between 0 and $n
if ($answer =~ m/regex/){
print("Answer is an integer within specified range!\n");
}
I know I can probably do something like
if($answer >= 0 && $answer <=$n)
But I am just wondering if there is a regex way of doing it?
I wouldn't pull out the following trick if there's another reasonable way to solve the problem. There is, for instance, Matching Numeric Ranges with a Regular Expression.
The (?(...)true|false) construct is like a regex conditional operator, and you can use one of the regex verbs, (*FAIL), to always fail a subpattern.
For the condition, you can use a (?{...}) code block:
my $pattern = qr/
\b # anchor somehow
(\d++) # non-backtracking and greedy
(?(?{ $1 > 42 })(*FAIL))
/x;
my @numbers = map { int( rand(100) ) } 0 .. 10;
say "@numbers";
foreach my $n ( @numbers ) {
next unless $n =~ $pattern;
say "Matched $n";
}
Here's a run:
74 69 24 15 23 26 62 18 18 43 80
Matched 24
Matched 15
Matched 23
Matched 26
Matched 18
Matched 18
This is handy when the condition is complex.
I only think about this because it's an encouraged feature in Raku (and I have several examples in Learning Perl 6). Here's some Raku code in the same form, and the pattern syntax is significantly different:
#!raku
my $numbers = map { 100.rand.Int }, 0 .. 20;
say $numbers;
for @$numbers -> $n {
next unless $n ~~ / (<|w> \d+: <?{ $/ <= 42 }>) /;
say $n
}
The result is the same:
(67 43 31 41 89 14 52 71 48 64 5 21 6 31 44 27 39 94 78 15 39)
31
41
14
5
21
6
31
27
39
15
39
You can dynamically create the pattern. I've used a non-capture group (?:) here to keep the start and end of string anchors outside the list of |-ed numbers.
my $n = int rand 40;
my $answer = 42;
my $pattern = join '|', 0 .. $n;
if ($answer =~ m/^(?:$pattern)$/) {
print "Answer is an integer within specified range";
}
Please keep in mind that for your purpose this makes little sense.
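For comparison, the plain numeric check the question already mentions can be sketched as below. This is only an illustration: the helper name in_range is hypothetical, and it assumes that inputs with leading zeros (such as "007") should be rejected. A small regex first confirms the input is a non-negative integer, and the range test itself is done with ordinary arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $answer is an integer string in 0 .. $n.
sub in_range {
    my ($answer, $n) = @_;
    # The regex only validates the shape (0, or digits with no leading zero);
    # the range itself is checked numerically.
    return ( $answer =~ /\A(?:0|[1-9][0-9]*)\z/ && $answer <= $n ) ? 1 : 0;
}

for my $answer (qw(0 16 17 1x 007)) {
    printf "%-3s %s\n", $answer, in_range($answer, 16) ? "in range" : "rejected";
}
```

This keeps the pattern independent of $n, so nothing has to be rebuilt when the limit changes.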

Perl: Matching 3 pairs of numbers from 4 consecutive numbers

I am writing some code and I need to do the following:
Given a 4 digit number like "1234" I need to get 3 pairs of numbers (the first 2, the 2 in the middle, and the last 2), in this example I need to get "12" "23" and "34".
I am new to perl and don't know anything about regex. In fact, I am writing a script for personal use and I've started reading about Perl some days ago because I figured it was going to be a better language for the task at hand (need to do some statistics with the numbers and find patterns)
I have the following code but when testing I processed 6 digit numbers, because I "forgot" that the numbers I would be processing are 4 digits, so it failed with the real data, of course
foreach $item (@totaldata)
{
my $match;
$match = ($item =~ m/(\d\d)(\d\d)(\d\d)/);
if ($match)
{
($arr1[$i], $arr2[$i], $arr3[$i]) = ($item =~ m/(\d\d)(\d\d)(\d\d)/);
$processednums++;
$i++;
}
}
Thank you.
You can move the last matching position back with pos():
pos directly accesses the location used by the regexp engine to store the offset, so assigning to pos will change that offset.
my $item = 1234;
my @arr;
while ($item =~ /(\d\d)/g) {
push @arr, $1;
pos($item)--;
}
print "@arr\n"; # 12 23 34
The simplest way would be to use a global regex pattern search.
It is nearly always best to separate verification of the input data from processing, so the program below first rejects any values that are not four characters long or that contain a non-digit character.
Then the regex pattern finds all points in the string that are followed by two digits, and captures them.
use strict;
use warnings 'all';
for my $val ( qw/ 1234 6572 / ) {
next if length($val) != 4 or $val =~ /\D/;
my @pairs = $val =~ /(?=(\d\d))/g;
print "@pairs\n";
}
output
12 23 34
65 57 72
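The same overlapping pairs can also be produced without a regex. The sketch below is an illustration rather than part of the answer above: the helper name windows is hypothetical, and it simply takes every substring of a given width, which also covers the six-digit case mentioned in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: every overlapping substring of $width characters.
sub windows {
    my ($str, $width) = @_;
    # Start offsets run from 0 to the last position where a full window fits;
    # the range is empty when the string is shorter than the window.
    return map { substr $str, $_, $width } 0 .. length($str) - $width;
}

for my $val (qw(1234 123456)) {
    next if $val =~ /\D/;          # same validation idea: digits only
    my @pairs = windows($val, 2);
    print "@pairs\n";
}
```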
Here's a pretty loud example demonstrating how you can use substr() to fetch out the portions of the number, while ensuring that what you're dealing with is in fact exactly a four-digit number.
use warnings;
use strict;
my ($one, $two, $three);
while (my $item = <DATA>){
if ($item =~ /^\d{4}$/){
$one = substr $item, 0, 2;
$two = substr $item, 1, 2;
$three = substr $item, 2, 2;
print "one: $one, two: $two, three: $three\n";
}
}
__DATA__
1234
abcd
a1b2c3
4567
891011
Output:
one: 12, two: 23, three: 34
one: 45, two: 56, three: 67
foreach $item (@totaldata) {
if ( my @match = $item =~ m/(?=(\d\d))/g ) {
($heads[$i], $middles[$i], $tails[$i]) = @match;
$processednums++;
$i++;
}
}

Changing time format using regex in perl

I want to read 12-hour format times from a file and replace them with 24-hour times.
Example:
this is due at 3:15pm -> this is due 15:15
I tried saving the regex captures in variables and manipulating them later, but it didn't work. I also tried using substitution (s///), but because the replacement is variable I couldn't figure it out.
Here is my code:
while (<>) {
my $line = $_;
print ("this is text before: $line \n");
if ($line =~ m/\d:\d{2}pm/g){
print "It is PM! \n";}
elsif ($line =~ m/(\d):(\d\d)am/g){
print "this is try: $line \n";
print "Its AM! \n";}
$line =~ s/($regexp)/<French>$lexicon{$1}<\/French>/g;
print "sample after : $line\n";
}
A simple script can do the work for you:
$str="this is due at 3:15pm";
$str=~m/\D+(\d+):(\d+)(.*)$/;
$hour=($3 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12;
$min=$2;
$str=~s/at.*/$hour:$min/g;
print "$str\n";
This gives the output:
this is due 15:15
What it does:
$str=~m/\D+(\d+):(\d+)(.*)$/; tries to match the string against the regex.
\D+ matches one or more non-digit characters. Here it matches "this is due at ".
(\d+) matches one or more digits. Here it matches 3, the hours, captured in group 1 ($1).
: matches a literal colon.
(\d+) matches one or more digits. Here it matches 15, the minutes, captured in group 2 ($2).
(.*) matches whatever follows, here pm, captured in group 3 ($3).
$ anchors the regex at the end of the string.
$hour=($3 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12; converts to a 24-hour clock. If $3 is pm, it adds 12 unless the hour is 12. If the time is am and the hour is 12, the hour becomes 0.
$str=~s/at.*/$hour:$min/g; substitutes everything from "at" to the end of the string with $hour:$min, the time computed above.
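The match-then-substitute steps above can also be folded into a single substitution with the e modifier, which converts every time on a line in place and leaves the surrounding text (including "at") untouched. This is a sketch under assumptions, not the answer's own code: the helper name to_24h is hypothetical, and it assumes times look like H:MM or HH:MM followed by am or pm in either case.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every 12-hour time in a line as HH:MM (24-hour).
sub to_24h {
    my ($line) = @_;
    $line =~ s{\b(\d{1,2}):(\d{2})\s*([ap])m\b}{
        my ($h, $m, $half) = ($1, $2, lc $3);
        $h %= 12;
        $h += 12 if $half eq 'p';
        sprintf '%02d:%02d', $h, $m;
    }gie;
    return $line;
}

print to_24h("this is due at 3:15pm"), "\n";
print to_24h("open from 12:05am to 12:30PM"), "\n";
```

Taking the hour modulo 12 first handles both special cases: 12am becomes 00 and 12pm stays 12 once the 12 is added back.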
#!/usr/bin/env perl
use strict;
use warnings;
my $values = time_12h_to_24h("11:00 PM");
sub time_12h_to_24h
{
my($t12) = @_;
my($hh,$mm,$ampm) = $t12 =~ m/^(\d\d?):(\d\d?)\s*([AP]M?)/i;
$hh = ($hh % 12) + (($ampm =~ m/AM?/i) ? 0 : 12);
return sprintf("%.2d:%.2d", $hh, $mm);
}
I found this code at the link below. Please check:
Is my pseudo code for changing time format correct?
Try this; it gives what you expect:
my @data = <DATA>;
foreach my $sm(@data){
if($sm =~/(12)\.\d+(pm)/g){
print "$&\n";
}
elsif($sm =~m/(\d+(\.)?\d+)(pm)/g )
{
print $1+12,"\n";
}
}
__DATA__
Time 3.15am
Time 3.16pm
Time 5.17pm
Time 1.11am
Time 1.01pm
Time 12.11pm

How do I get rid of this "(" using regex?

I was moving along on a regex and have hit a roadblock I can't seem to get around. I am trying to get rid of a "(" in the middle of a line of text using a regex. There were two parentheses, and I figured out how to get rid of the one at the end of the line; it's the one in the middle I can't hack out.
Here is a more complete snippet of the file I am searching through.
ide1:0.present = "TRUE"
ide1:0.clientDevice = "TRUE"
ide1:0.deviceType = "cdrom-raw"
ide1:0.startConnected = "FALSE"
floppy0.startConnected = "FALSE"
floppy0.clientDevice = "TRUE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntulinux"
uuid.location = "56 4d e8 67 57 18 67 04-c8 68 14 eb b3 c7 be bf"
uuid.bios = "56 4d e8 67 57 18 67 04-c8 68 14 eb b3 c7 be bf"
vc.uuid = "52 c7 14 5c a0 eb f4 cc-b3 69 e1 6d ad d8 1a e7"
Here is the entire foreach loop I am working on.
my @virtual_machines;
foreach my $vm (keys %virtual_machines) {
push @virtual_machines, $vm;
}
foreach my $vm (@virtual_machines) {
my $vmx_file = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
if ($vmx_file =~ m/^\bguestOSAltName\b\s+\S\s+\W(?<GUEST_OS> .+[^")])\W/xm) {
$virtual_machines{$vm}{"OS"} = "$+{GUEST_OS}";
} else {
$virtual_machines{$vm}{"OS"} = "N/A";
}
if ($vmx_file =~ m/^\bguestOSAltName\b\s\S\s.+(?<ARCH> \d{2}\W\bbit\b)/xm) {
$virtual_machines{$vm}{"Architecture"} = "$+{ARCH}";
} else {
$virtual_machines{$vm}{"Architecture"} = "N/A";
}
}
I think the problem is that I cannot match the "(" because the expression before it is ".+", which matches everything in the line of text, be it alphanumeric, whitespace, or even symbols like hyphens.
Any ideas how I can get this to work?
This is what I am getting for an output from a hash dump.
$VAR1 = {
'NS02' => {
'ID' => '144',
'Version' => '7',
'OS' => 'Ubuntu Linux (64-bit',
'VMX' => '/vmfs/volumes/datastore2/NS02/NS02.vmx',
'Architecture' => '64-bit'
},
The part of the code block that deals with ARCH works flawlessly, so what I really need is to cut off the "(64-bit)" part, if it exists, when the search runs into the (, and also remove the whitespace preceding the (.
What I am wanting is to turn the above hash dump into this.
$VAR1 = {
'NS02' => {
'ID' => '144',
'Version' => '7',
'OS' => 'Ubuntu Linux',
'VMX' => '/vmfs/volumes/datastore2/NS02/NS02.vmx',
'Architecture' => '64-bit'
},
Same thing minus the (64-bit) part.
You can simplify your regex to /^guestOSAltName\s+=\s+"(?<GUEST_OS>.+)"/m. What this does:
^ forces the match to start at the beginning of a line
guestOSAltName is a string literal.
\s+ matches 1 or more whitespace characters.
(?<GUEST_OS>.+) matches all the text from after the opening quote up to the last quote on the line, captures it, and names the group GUEST_OS. If the line could have comments, you might want to change .+ to [^#]+.
The "'s around the group are literal quotes.
The m at the end turns on multi-line matching.
Code:
if ($vmx_file =~ /^guestOSAltName\s+=\s+"(?<GUEST_OS>.+)"/m) {
print "$+{GUEST_OS}";
} else {
print "N/A";
}
See it here: http://ideone.com/1xH5J
So you want to match the contents of the string after guestOSAltName up to (and not including) the first ( if present?
Then replace the first line of your code sample with
if ($vmx_file =~ m/^guestOSAltName\s+=\s+"(?<GUEST_OS>[^"()]+)/xm) {
If there always is a whitespace character before a potential opening parenthesis, then you can use
if ($vmx_file =~ m/^guestOSAltName\s+=\s+"(?<GUEST_OS>[^"()]+)[ "]/xm) {
so you don't need to strip trailing whitespace if present.
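Since the question fills both an OS field and an Architecture field, one pattern with two named groups can cover both. The following is only a sketch: the helper name parse_guest_os and the group names os and arch are made up here, and it assumes the architecture, when present, appears as "(NN-bit)" right before the closing quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (OS name, architecture) from .vmx file content.
sub parse_guest_os {
    my ($vmx) = @_;
    # os:   lazily take characters that are neither a quote nor "(";
    # arch: optional "(NN-bit)" suffix; trailing whitespace before it is skipped.
    if ($vmx =~ /^guestOSAltName\s*=\s*"(?<os>[^"(]+?)\s*(?:\((?<arch>\d+-bit)\))?"/m) {
        return ($+{os}, $+{arch} // 'N/A');
    }
    return ('N/A', 'N/A');
}

my $vmx = qq{ethernet0.addressType = "generated"\nguestOSAltName = "Ubuntu Linux (64-bit)"\nguestOS = "ubuntulinux"\n};
my ($os, $arch) = parse_guest_os($vmx);
print "OS: $os, Architecture: $arch\n";
```

The negated character class stops the OS capture at the first "(", which is the same idea as the [^"()]+ pattern above.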
Something like this should work:
$match =~ s/^(.*?)\((.*?)$/$1$2/;
I generally find that .* is too powerful (as you are finding!). Two suggestions:
Be more explicit on what you are looking for
my $text = '( something ) ( something else) ' ;
$text =~ /
\(
( [\s\w]+ )
\)
/x ;
print $1 ;
Use non greedy matching
my $text = '( something ) ( something else) ' ;
$text =~ /
\(
( .*? ) # non greedy match
\)
/x ;
print $1 ;
General observation - involved regexps are far easier to read if you use the /x option as this allows spacing and comments.
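To see why the greedy form causes trouble, it can help to run a greedy and a non-greedy capture against the same sample text. This is a small illustrative sketch reusing the sample string from the suggestions above; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $text = '( something ) ( something else) ';

# Greedy: .* runs to the LAST closing parenthesis in the string.
my ($greedy) = $text =~ /\((.*)\)/;

# Non-greedy: .*? stops at the FIRST closing parenthesis.
my ($lazy) = $text =~ /\((.*?)\)/;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The greedy capture spans both parenthesised groups, while the non-greedy one contains only the first.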
Put a ? after your quantifier. The ? makes it non-greedy.
The regex is /^guestOSAltName[^"]+"(?<GUEST_OS>.+?)\s*[\("]+.*$/:
#!/usr/bin/env perl
foreach my $x ('guestOSAltName = "Ubuntu Linux (64-bit)"', 'guestOSAltName = "Microsoft Windows Server 2003, Standard Edition"') {
if ($x =~ m/^guestOSAltName[^"]+"(?<GUEST_OS>.+?)\s*[\("]+.*$/xm) {
print "$+{GUEST_OS}\n";
} else {
print "N/A\n";
}
if ($x =~ m/^guestOSAltName[^(]+\((?<ARCH>\d{2}).*/xm) {
print "$+{ARCH}\n";
} else {
print "N/A\n";
}
}
Start the demo:
$ perl t.pl
Ubuntu Linux
64
Microsoft Windows Server 2003, Standard Edition
N/A

In Perl, how can I capture a string of digits from a string containing carriage returns and line feeds?

I need to extract the text (characters and numbers) from a multiline string. Everything I have tried does not strip out the line feeds/carriage returns.
Here is the string in question:
"\r\n 50145395\r\n "
In HEX it is: 0D 0A 20 20 20 20 20 20 20 20 35 30 31 34 35 33 39 35 0D 0A 20 20 20 20
I have tried the following:
$sitename =~ m/(\d+)/g;
$sitename = $1;
and
$sitename =~ s/^\D+//g;
$sitename =~ s/\D+$//g;
and
$sitename =~ s/^\s+//g;
$sitename =~ s/\s+$//g;
In all cases I cannot get rid of any of the unwanted characters. I have run this in cygwin perl and Strawberry perl.
Thanks.
Capturing match in list context returns captured strings:
#!/usr/bin/perl
use strict; use warnings;
my $s = join('', map chr(hex), qw(
0D 0A 20 20 20 20 20 20 20 20 35 30
31 34 35 33 39 35 0D 0A 20 20 20 20
));
my ($x) = $s =~ /([A-Za-z0-9]+)/;
print "'$x'\n";
Output:
C:\Temp> uio
'50145395'
I'm not sure what you need, but here is code that extracts all words from a string:
my @words = ( $sitename =~ m/(\w+)/g );
It can also be done with split, but then you need to split on whitespace:
my @words = split( m/\s+/, $sitename );
The obvious one I didn't see in your post:
$sitename =~ s/\D//g;
This removes all non-digits. To remove anything but word characters, you could:
$sitename =~ s/\W//g;
There's no need for ^ or $ if your intention is to replace every non-digit. Also, with the global g option the substitution can remove one character at a time, so there is no need to match runs of non-digits with \D+.
Edit: My solution was incorrect; please instead pay attention to Sinan Ünür's solution.
Do you want to remove only newlines and carriage returns? If so, this is what you want:
$sitename =~ s/[\r\n]//g;
If you want to remove all whitespace, not just carriage returns and linefeeds, use this instead:
$sitename =~ s/\s//g;
$x = <<END;
this is a multiline
string. this is a multiline
string.
END
$x =~ s/\r?\n?//g;
print $x;
In the past I have done something like:
my $newline = chr(13) . chr(10);
$data =~ s/$newline/ /g;
You can check out other ascii character codes at: http://www.asciitable.com./
use strict;
my $newline = chr(13);
my $newline2 = chr(10);
my $words = "\r\n 50145395\r\n ";
foreach my $char (split //, $words) {
my $val=ord($char);
print "->$char<- ($val)\n";
}
print "$words\n";
$words =~ s/$newline//g;
$words =~ s/$newline2//g;
$words =~ s/[ ]+//g;
foreach my $char (split //, $words) {
my $val=ord($char);
print "->$char<- ($val)\n";
}
print "$words\n";
To extract all digits, strip off non-digit characters
$sitename ="\r\n 50145395\r\n ";
$sitename =~ s/\D+//g;
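The two approaches that recur in the answers above, capturing the digits in list context and deleting everything that is not a digit, can be put side by side. The following is a small sketch; the sample string follows the hex dump in the question, and the variable names $digits and $stripped are arbitrary.

```perl
use strict;
use warnings;

my $sitename = "\r\n        50145395\r\n    ";

# Capture: in list context the match returns the captured group.
my ($digits) = $sitename =~ /(\d+)/;

# Delete: copy the string, then remove every character that is not 0-9
# (c complements the set, d deletes the matched characters).
(my $stripped = $sitename) =~ tr/0-9//cd;

print "captured: '$digits'\n";
print "stripped: '$stripped'\n";
```

The capture returns only the first run of digits, whereas the deletion joins all digits in the string, so the two differ when the input contains more than one number.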