Regex to match hours and time - regex

I'm still learning Perl regular expressions and I need to match a string that represents the time.
However, there are instances where multiple times get entered. Instead of '9AM' I will sometimes get '9AM5PM' or '09AM05PM' and so on. Fortunately, it always starts with one or two digits and ends with 'AM' or 'PM' (upper or lower case).
Here's what I have so far:
$string =~ /^((([1-9])|(1[0-2]))*(A|P)M)$/i;
Any help would be greatly appreciated!

The only problem I can see with your own code is that the hours field is optional (because you use a *) but you don't say what issues you're having.
You do have a lot of unnecessary captures. Every part of the pattern that is enclosed in parentheses will capture the corresponding part of the target string in internal variables called $1, $2, etc. Unless you really need those captures it is best to use non-capturing parentheses (?: ... ) instead of the plain ones ( ... ).
Character classes like [1-9] are a single entity and don't need enclosing in parentheses. You also haven't accounted for a leading zero on values less than ten, and you should use a character class [AP] instead of an alternation (?:A|P).
It looks like you need
/\d{1,2}[AP]M/i
But you don't say what you want to do with the times once you have found them.
This snippet of code demonstrates the functionality by putting all the times that it finds in a string into the array @times and then printing it with space separators.
use strict;
use warnings;
for my $string (qw/ 9AM 9AM5PM 09AM05PM /) {
    my @times = $string =~ /\d{1,2}[AP]M/ig;
    print "@times\n";
}
output
9AM
9AM 5PM
09AM 05PM
If you really want to verify that the hour value is in range (are you likely to come across 35pm?) then you could write
my @times = $string =~ / (?: 1[012] | 0?[1-9] ) [AP]M /igx;
Note that the /x modifier makes whitespace insignificant within regular expressions, so that it can be used to clarify the form of the pattern.
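A minimal sketch that combines both ideas: anchor the whole string so that stray input is rejected, then pull out the individual times. The sub name parse_times is made up for illustration, and rejecting out-of-range hours such as 35PM is an assumption about what you need.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of times in $string, or an empty
# list if the string is not made up entirely of valid 12-hour times.
sub parse_times {
    my ($string) = @_;
    my $time = qr/(?:1[012]|0?[1-9])[AP]M/i;

    # Validate the whole string first, then extract each time.
    return unless $string =~ /\A(?:$time)+\z/;
    return $string =~ /$time/g;
}

for my $string (qw/ 9AM 9AM5PM 09am05pm 35PM /) {
    my @times = parse_times($string);
    print "$string => ", (@times ? "@times" : "(invalid)"), "\n";
}
```

The qr// object keeps its own /i flag when it is interpolated, so the two matches inside the sub do not need to repeat it.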

You can try something like:
$string =~ /^((0?\d|1[0-2])[AP]M)+$/i;
As you can see here. Or:
$string =~ /^((0?\d|1[0-2])[AP]M){1,2}$/i;
If you want it to be just up to 2 hours together.

Related

print the matched word in perl regex

I need to print all my matched strings from a stored line in perl. I have seen various posts on this
Print the matched string using perl
Perl Regex - Print the matched value
and I experimented to first try to print the first word. But I get a build error
Use of uninitialized value $1 in concatenation (.) or string at rg.pl line 10.
I have tried with split and arrays and it works, but while printing $1, it throws error.
My code is here
#!/usr/bin/perl/
use warnings;
use strict;
#my $line = "At a far distance near the bar, was a parked car. Star were shining in the night. The boy in the car had scar and he was at war with his enemy. \n";
my $line = "At a far distance near the bar, was a parked car. \n";
if($line =~ /[a-z]ar/gi)
{ print "$1 \n"; }
$_ = $line;
I want my output for this code to be
far
and subsequently print all the words containing ar,
far
near
bar
parked
car
I even tried changing my code, as below but that didnt work, same error
if($line =~ /[a-z]ar/gi) {
my $match = $1;
print "$match \n"; }
First, you didn't capture anything, which is how the $n variables are populated. Put parentheses around what you want to be captured into $1
if ($line =~ /([a-z]ar)/i) { print "$1\n" }
I've removed the /g which is unneeded (and with potential for trouble†) here.
Next, your pattern requires and captures one letter followed by literal ar, no more no less. That won't capture near, nor will it capture parked (it'll get par only). It will not even match a word that starts with ar, since it requires that there is a letter before ar. You need to use quantifiers, to tell it how many times to match a letter. And you also want to find all matches.
One way is to scoop them all up by providing the list context and /g (global) modifier
my @words = $line =~ /([a-z]*ar[a-z]*)/gi;
print "$_\n" for @words;
The [a-z]* means to match a letter, zero-or-more times. So an optional string of letters. We also added an optional string of letters after ar. The /g makes it continue through the string after a match, to find all such patterns. In the list context the list of matches is returned.
Or, you can match in scalar context like in the first example, but in a while loop
while ($line =~ /([a-z]*ar[a-z]*)/gi) { print "$1\n" }
Here /g does something different. It matches a pattern once and returns true, the while condition is true and we print. Then it comes back and looks for a match from where it matched previously ... and keeps doing this until there are no more matches.
This is complex behavior altogether. From Regexp Quote-Like Operators in perlop
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. [...]
Read about this in more detail and in a tutorial manner in perlretut, under "Global matching."
† Note on using /g modifier in scalar context
I've used that above, in while (/.../g), which is a very common way to hop over all occurrences of the pattern in a string, each time giving us control in the while body.
While this use is intended and idiomatic, the use of /g in scalar context can bring subtle trouble when not in the loop condition: the next regex with /g on this variable will continue from the previous match, not from the string's beginning, which may be unexpected.
That "next regex" may also simply be that same expression -- in the next pass of some larger loop in which our expression happens to be, and this holds across function calls as well. Consider
use warnings;
use strict;
use feature 'say';
my $s = q(one two three);
sub func { say $1 if $_[0] =~ /(\w+)/g }; # /g may be of great consequence!
for (1..4) {
# ... perhaps much, much later ...
func($s);
}
This loop prints lines one, then two, then three, and that's that: the fourth call finds no further match and prints nothing. This (working) example is so bare bones that it is artificial, but I hope that it conveys that /g in scalar context may surprise.
For one thing, it is not uncommon to see /g on a regex in an if condition being plain wrong.
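To make the surprise concrete, here is a small sketch (the variable names are made up for illustration). The offset where the last scalar-context /g match stopped is stored with the variable and can be inspected with pos; dropping /g avoids the carry-over.

```perl
use strict;
use warnings;

my $s = 'one two three';

# Scalar-context /g remembers where it stopped; pos() reports that offset.
my @seen;
for (1 .. 2) {
    push @seen, $1 if $s =~ /(\w+)/g;   # continues from the previous match
}
print "with /g: @seen (pos is ", pos($s), ")\n";

# Clear the stored position, then match without /g: every attempt
# starts from the beginning of the string again.
pos($s) = undef;
my @fresh;
for (1 .. 2) {
    push @fresh, $1 if $s =~ /(\w+)/;
}
print "without /g: @fresh\n";
```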
For multiple matches, use a while loop. Also, I surrounded the quantity you want to capture with parentheses to indicate that it is a capture group.
while ($line =~ /([a-z]*ar[a-z]*)/gi ) {
print "$1 \n";
}

Regex: delete contents of square brackets

Is there a regular expression that can be used with search/replace to delete everything occurring within square brackets (and the brackets)?
I've tried \[.*\] which chomps extra stuff (e.g. "[chomps] extra [stuff]")
Also, the same thing with lazy matching \[.*?\] doesn't work when there is a nested bracket (e.g. "stops [chomping [too] early]!")
Try something like this:
$text = "stop [chomping [too] early] here!";
$text =~ s/\[([^\[\]]|(?0))*]//g;
print($text);
which will print (note the two spaces left behind where the bracketed text used to be):
stop  here!
A short explanation:
\[ # match '['
( # start group 1
[^\[\]] # match any char except '[' and ']'
| # OR
(?0) # recursively match group 0 (the entire pattern!)
)* # end group 1 and repeat it zero or more times
] # match ']'
The regex above will get replaced with an empty string.
You can test it online: http://ideone.com/tps8t
EDIT
As @ridgerunner mentioned, you can make the regex more efficient by making the character class [^\[\]] match once or more, making both it and the * possessive, and even by turning group 1 into a non-capturing group:
\[(?:[^\[\]]++|(?0))*+]
But a real improvement in speed might only be noticeable when working with large strings (you can test it, of course!).
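As a sketch, the possessive version can be wrapped in a small sub (strip_brackets is a made-up name). It assumes Perl 5.10 or later, since both (?0) recursion and possessive quantifiers were introduced there, and it leaves unbalanced brackets untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every balanced [...] group, nested or not.
# Needs Perl 5.10+ for (?0) recursion and possessive quantifiers.
sub strip_brackets {
    my ($text) = @_;
    $text =~ s/\[(?:[^\[\]]++|(?0))*+\]//g;
    return $text;
}

print strip_brackets("stop [chomping [too] early] here!"), "\n";
print strip_brackets("[chomps] extra [stuff]"), "\n";
```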
This is technically not possible with regular expressions because the language you're matching does not meet the definition of "regular". There are some extended regex implementations that can do it anyway using recursive expressions, among them are:
Greta:
http://easyethical.org/opensource/spider/regexp%20c++/greta2.htm#_Toc39890907
and
PCRE
http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
See "Recursive Patterns", which has an example for parentheses.
A PCRE recursive bracket match would look like this (the character class lets ordinary text appear between the brackets):
\[(?:[^\[\]]|(?R))*\]
edit:
Since you added that you're using Perl, here's a page that explicitly describes how to match balanced pairs of operators in Perl:
http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text%3f
Something like:
$string =~ m/(\[(?:[^\[\]]++|(?1))*\])/xg;
Since you're using Perl, you can use modules from the CPAN and not have to write your own regular expressions. Check out the Text::Balanced module that allows you to extract text from balanced delimiters. Using this module means that if your delimiters suddenly change to {}, you don't have to figure out how to modify a hairy regular expression, you only have to change the delimiter parameter in one function call.
If you are only concerned with deleting the contents and not capturing them to use elsewhere you can use a repeated removal from the inside of the nested groups to the outside.
my $string = "stops [chomping [too] early]!";
# remove any [...] sequence that doesn't contain a [...] inside it
# and keep doing it until there are no [...] sequences to remove
1 while $string =~ s/\[[^\[\]]*\]//g;
print $string;
The 1 while will basically do nothing while the condition is true. If a s/// matches and removes a bracketed section the loop is repeated and the s/// is run again.
This will work even if you're using an older version of Perl or another language that doesn't support the (?0) recursion extended pattern in Bart Kiers's answer.
You want to remove only things between the []s that aren't []s themselves, i.e.:
\[[^\[\]]*\]
Which is a pretty hairy mess of []s ;-)
It won't handle nested []s in a single pass though, i.e. it won't match the whole of [foo[bar]baz].

How can I match everything that is after the last occurrence of some char in a perl regular expression?

For example, return the part of the string that is after the last x in axxxghdfx445 (should return 445).
my($substr) = $string =~ /.*x(.*)/;
From perldoc perlre:
By default, a quantified subpattern is "greedy", that is, it will match
as many times as possible (given a particular starting location) while
still allowing the rest of the pattern to match.
That's why .*x will match up to the last occurrence of x.
The simplest way would be to use /([^x]*)$/
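The two patterns agree when the string contains an x but differ when it does not, which may matter for your input. A small sketch to compare them (the sub names are made up for illustration):

```perl
use strict;
use warnings;

# Greedy version: fails (returns undef) when the string has no x at all.
sub after_last_greedy {
    my ($s) = @_;
    my ($m) = $s =~ /.*x(.*)/;
    return $m;
}

# Negated-class version: returns the whole string when there is no x.
sub after_last_negated {
    my ($s) = @_;
    my ($m) = $s =~ /([^x]*)$/;
    return $m;
}

for my $string ('axxxghdfx445', 'no-delimiter', 'endsx') {
    my $greedy = after_last_greedy($string);
    printf "%-14s greedy=%s negated='%s'\n",
        $string,
        defined $greedy ? "'$greedy'" : 'undef',
        after_last_negated($string);
}
```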
The first answer is a good one,
but when talking about "something that does not contain" a character,
I like to use a regex that says so explicitly:
my ($substr) = $string =~ /.*x([^x]*)$/;
This is very useful in some cases.
The simplest way is not a regular expression, but a simple split() and taking the last element.
$string="axxxghdfx445";
@s = split /x/ , $string;
print $s[-1];
Yet another way to do it. It's not as simple as a single regular expression, but if you're optimizing for speed, this approach will probably be faster than anything using regex, including split.
my $s = 'axxxghdfx445';
my $p = rindex $s, 'x';
my $match = $p < 0 ? undef : substr($s, $p + 1);
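I'm surprised no one has mentioned the special variable that does this, $': "$'" returns everything after the matched string (perldoc perlvar). The match has to end at the last x, so the pattern needs a greedy .* in front; a plain /x/ would stop at the first x and leave 'xxghdfx445' in $'.
my $str = 'axxxghdfx445';
$str =~ /.*x/;
# $' contains '445';
print $';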
However, there is a cost (emphasis mine):
WARNING: Once Perl sees that you need one of $&, "$`", or "$'" anywhere
in the program, it has to provide them for every pattern match. This
may substantially slow your program. Perl uses the same mechanism to
produce $1, $2, etc, so you also pay a price for each pattern that
contains capturing parentheses. (To avoid this cost while retaining
the grouping behaviour, use the extended regular expression "(?: ... )"
instead.) But if you never use $&, "$`" or "$'", then patterns without
capturing parentheses will not be penalized. So avoid $&, "$'", and
"$`" if you can, but if you can't (and some algorithms really
appreciate them), once you've used them once, use them at will, because
you've already paid the price. As of 5.005, $& is not so costly as the
other two.
But wait, there's more! You get two operators for the price of one, act NOW!
As a workaround for this problem, Perl 5.10.0 introduces
"${^PREMATCH}", "${^MATCH}" and "${^POSTMATCH}", which are equivalent
to "$`", $& and "$'", except that they are only guaranteed to be
defined after a successful match that was executed with the "/p"
(preserve) modifier. The use of these variables incurs no global
performance penalty, unlike their punctuation char equivalents, however
at the trade-off that you have to tell perl when you want to use them.
my $str = 'axxxghdfx445';
# the greedy .* makes the match end at the last x
$str =~ /.*x/p;
# ${^POSTMATCH} contains '445';
print ${^POSTMATCH};
I would humbly submit that this route is the best and most straight-forward
approach in most cases, since it does not require that you do special things
with your pattern construction in order to retrieve the postmatch portion, and there
is no performance penalty.
Regular Expression : /([^x]+)$/ #assuming x is not last element of the string.

How to have a variable as regex in Perl

I think this question is repeated, but searching wasn't helpful for me.
my $pattern = "javascript:window.open\('([^']+)'\);";
$mech->content =~ m/($pattern)/;
print $1;
I want to have an external $pattern in the regular expression. How can I do this? The current one returns:
Use of uninitialized value $1 in print at main.pm line 20.
$1 was empty, so the match did not succeed. I'll make up a constant string in my example of which I know that it will match the pattern.
Declare your regular expression with qr, not as a simple string. Also, you're capturing twice, once in $pattern for the open call's parentheses, once in the m operator for the whole thing, therefore you get two results. Instead of $1, $2 etc. I prefer to assign the results to an array.
my $pattern = qr"javascript:window.open\('([^']+)'\);";
my $content = "javascript:window.open('something');";
my @results = $content =~ m/($pattern)/;
# the expression returns the list
# (
# q{javascript:window.open('something');},
# 'something'
# )
When I compile that string into a regex, like so:
my $pattern = "javascript:window.open\('([^']+)'\);";
my $regex = qr/$pattern/;
I get just what I think I should get, following regex:
(?-xism:javascript:window.open('([^']+)');)
Notice that it is looking for a capture group and not an open paren at the end of 'open'. And in that capture group, the first thing it expects is a single quote. So it will match
javascript:window.open'fum';
but not
javascript:window.open('fum');
One thing you have to learn is that in a Perl double-quoted string, "\(" is the same thing as "(": you're just telling Perl that you want a literal '(' in the string, and the backslash is gone before the regex engine ever sees it. In order to get lasting escapes, you need to double them.
my $pattern = "javascript:window.open\\('([^']+)'\\);";
my $regex = qr/$pattern/;
Actually preserves the literal ( and yields:
(?-xism:javascript:window.open\('([^']+)'\);)
Which is what I think you want.
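A small sketch to see the three variants side by side. The shortened patterns and the capture_with helper are made up for illustration; the sample text is the one used earlier in this thread.

```perl
use strict;
use warnings;

my $text = "javascript:window.open('something');";

# In a double-quoted string "\(" is just "(", so the backslash never
# reaches the regex engine and the parenthesis opens a capture group.
my $lost = "window\.open\('([^']+)'\)";

# Doubling the backslash keeps one for the regex engine.
my $doubled = "window\\.open\\('([^']+)'\\)";

# qr// hands the pattern to the regex engine as written.
my $compiled = qr/window\.open\('([^']+)'\)/;

# Hypothetical helper: returns the first capture, or undef on no match.
sub capture_with {
    my ($pattern, $string) = @_;
    my ($arg) = $string =~ /$pattern/;
    return $arg;
}

for my $pattern ($lost, $doubled, $compiled) {
    my $arg = capture_with($pattern, $text);
    print "$pattern => ", (defined $arg ? $arg : 'no match'), "\n";
}
```

Printing the pattern shows what the regex engine actually receives, which is usually the quickest way to spot a lost escape.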
As for your question, you should always test the results of a match before using it.
if ( $mech->content =~ m/($pattern)/ ) {
print $1;
}
makes much more sense. And if you want to see it regardless, then it's already implicit in that idea that it might not have a value. i.e., you might not have matched anything. In that case it's best to put alternatives
$mech->content =~ m/($pattern)/;
print $1 || 'UNDEF!';
However, I prefer to grab my captures in the same statement, like so:
my ( $open_arg ) = $mech->content =~ m/($pattern)/;
print $open_arg || 'UNDEF!';
The parens around $open_arg puts the match into a "list context" and returns the captures in a list. Here I'm only expecting one value, so that's all I'm providing for.
Finally, one of the root causes of your problems is that you do not need to specify your expression in a string in order for your regex to be "portable". You can get perl to pre-compile your expression. That way, you only care what instructions the characters are to a regex and not whether or not you'll save your escapes until it is compiled into an expression.
A compiled regex will interpolate itself into other regexes properly. Thus, you get a portable expression that interpolates just as well as a string--and specifically correctly handles instructions that could be lost in a string.
my $pattern = qr/javascript:window.open\('([^']+)'\);/;
Is all that you need. Then you can use it, just as you did. Although, putting parens around the whole thing, would return the whole matched expression (and not just what's between the quotes).
You do not need the parentheses in the match pattern. It will match the whole pattern and return that as $1, which I am guessing is not matching, but I am only guessing.
$mech->content =~ m/$pattern/;
or
$mech->content =~ m/(?:$pattern)/;
These are the clustering, non-capturing parentheses.
The way you are doing it is correct.
The solutions have been already given, I'd like to point out that the window.open call might have multiple parameters included in "" and grouped by comma like:
javascript:window.open("http://www.javascript-coder.com","mywindow","status=1,toolbar=1");
There might be spaces between the function name and parentheses, so I'd use a slightly different regex for that:
my $pattern = qr{
javascript:window.open\s*
\(
([^)]+)
\)
}x;
print $1 if $text =~ /$pattern/;
Now you have all parameters in $1 and can process them afterwards with split /,/, $stuff and so on.
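One caveat with a plain split /,/ is that it would also cut the "status=1,toolbar=1" argument in two. A minimal sketch that pulls out the quoted arguments instead, assuming every argument is double-quoted and contains no escaped quotes or closing parentheses (open_args is a made-up name):

```perl
use strict;
use warnings;

my $pattern = qr{
    javascript:window\.open\s*
    \(
        ([^)]+)
    \)
}x;

# Hypothetical helper: returns the double-quoted arguments of the call,
# or an empty list if the text does not contain a window.open call.
sub open_args {
    my ($text) = @_;
    my ($inside) = $text =~ $pattern or return;
    return $inside =~ /"([^"]*)"/g;
}

my @args = open_args(
    'javascript:window.open("http://www.javascript-coder.com","mywindow","status=1,toolbar=1");'
);
print "$_\n" for @args;
```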
It reports an uninitialized value because $1 is undefined. $1 is undefined because you have created a nested matching group by wrapping a second set of parentheses around the pattern. It will also be undefined if nothing matches your pattern.

How to return the first five digits using Regular Expressions

How do I return the first 5 digits of a string of characters in Regular Expressions?
For example, if I have the following text as input:
15203 Main Street
Apartment 3 63110
How can I return just "15203".
I am using C#.
This isn't really the kind of problem that's ideally solved by a single-regex approach -- the regex language just isn't especially meant for it. Assuming you're writing code in a real language (and not some ill-conceived embedded use of regex), you could do perhaps (examples in perl)
# Capture all the digits into an array
my @digits = $str =~ /(\d)/g;
# Then take the first five and put them back into a string
my $first_five_digits = join "", @digits[0..4];
or
# Copy the string, removing all non-digits
(my $digits = $str) =~ tr/0-9//cd;
# And cut off all but the first five
$first_five_digits = substr $digits, 0, 5;
If for some reason you really are stuck doing a single match, and you have access to the capture buffers and a way to put them back together, then wdebeaum's suggestion works just fine, but I have a hard time imagining a situation where you can do all that, but don't have access to other language facilities :)
It would depend on your flavor of regex and coding language (C#, Perl, etc.) but in C# you'd do something like
string rX = @"\D+";
input = Regex.Replace(input, rX, "");
return input.Substring(0, 5);
Note: I'm not sure about that regex (others here may have a better one), but basically since a regex itself doesn't "replace" anything, only matches patterns, you'd have to look for any non-digit characters; once you'd matched those, you'd need to replace them with your language's version of the empty string (string.Empty or "" in C#), and then grab the first 5 characters of the resulting string.
You could capture each digit separately and put them together afterwards, e.g. in Perl:
$str =~ /(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)/;
$digits = $1 . $2 . $3 . $4 . $5;
I don't think a regular expression is the best tool for what you want.
Regular expressions are to match patterns... the pattern you are looking for is "a(ny) digit"
Your logic external to the pattern is "five matches".
Thus, you either want to loop over the first five digit matches, or capture five digits and merge them together.
But look at that Perl example -- that's not one pattern -- it's one pattern repeated five times.
Can you do this via a regular expression? Just like parsing XML -- you probably could, but it's not the right tool.
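The "loop over the first five digit matches" option can be sketched in Perl as follows. The sub name first_n_digits is made up, and returning undef when there are fewer digits than requested is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first $n digits of $str, skipping
# everything else; returns undef if the string has fewer than $n digits.
sub first_n_digits {
    my ($str, $n) = @_;
    my @digits;
    while ($str =~ /(\d)/g) {       # one pattern, applied repeatedly
        push @digits, $1;
        last if @digits == $n;      # the "five matches" logic lives here
    }
    return @digits == $n ? join('', @digits) : undef;
}

my $address = "15203 Main Street\nApartment 3 63110";
print first_n_digits($address, 5), "\n";
```

The pattern itself stays as simple as "a digit", and the counting is left to ordinary code, which is the split of responsibilities described above.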
Not sure this is best solved by regular expressions since they are used for string matching and usually not for string manipulation (in my experience).
However, you could make a call to:
strInput = Regex.Replace(strInput, @"\D+", "");
to remove all non number characters and then just return the first 5 characters.
If you are wanting just a straight regex expression which does all this for you I am not sure it exists without using the regex class in a similar way as above.
A different approach -
# Copy over
$temp = $str;
# Remove all non-digits (/g is needed to remove more than the first one)
$temp =~ s/\D//g;
# Get the first 5 digits, exactly; the parentheses capture them into $1.
# ASSUMES that there will be a match.
$temp =~ /^(\d{5})/;
$first_digits = $1;
$result =~ s/^(\d{5}).*/$1/s;
This replaces text that starts with exactly five digits (\d{5}), followed by any amount of anything (.*), with $1, which is whatever was captured inside the (), that is, the first five digits. The /s modifier lets . match newlines, so the rest of a multi-line input is removed as well.
If you want the first 5 characters of any kind:
$result =~ s/^(.{5}).*/$1/s;
Use whatever programming language you are using to evaluate this,
e.g. in C#:
text = Regex.Replace(text, "^(.{5}).*", "$1", RegexOptions.Singleline);