perl regex to get trailing numbers - regex

I'm basically trying to separate a specific amount of text from the one or more digits that appear at the end. The code below works when there is one trailing digit but not when there are two or more. Shouldn't the (\d+) be getting the "12" in "P_TIME12"?
my @strs = ('P_ABC1','P_DFRES3','P_TIME12');
foreach my $str (@strs) {
if ($str =~ /^P_(\w+)(\d+)$/) {
print "word " . $1 . " digits " . $2 . "\n";
}
}
Results in
word ABC digits 1
word DFRES digits 3
word TIME1 digits 2
TIA

\w also matches digits. If the only digits are at the end, use [_a-zA-Z] instead.
Also, \w+ is greedy: it first matches the whole word and leaves nothing for \d+, so it has to backtrack one character, and that last character is enough to satisfy \d+.
If you have digits in the middle and therefore need a lazy quantifier, use ^P_(\w+?)(\d+)$
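A minimal sketch of the three variants side by side, in case it helps to see the backtracking effect. The split_with helper and the P_A1B22 input (a digit in the middle) are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a pattern and return the two captures
# joined by "|", or "no match" when the pattern does not apply.
sub split_with {
    my ($re, $str) = @_;
    return $str =~ $re ? "$1|$2" : 'no match';
}

my $greedy = qr/^P_(\w+)(\d+)$/;         # \w+ keeps everything but the last digit
my $lazy   = qr/^P_(\w+?)(\d+)$/;        # \w+? stops as early as it can
my $class  = qr/^P_([_a-zA-Z]+)(\d+)$/;  # first group may not contain digits

for my $str (qw(P_ABC1 P_TIME12 P_A1B22)) {
    printf "%-9s greedy=%s lazy=%s class=%s\n", $str,
        split_with($greedy, $str),
        split_with($lazy,   $str),
        split_with($class,  $str);
}
```

The lazy version still splits P_A1B22 at the trailing digits, while the explicit character class rejects that string outright, so which one fits depends on whether digits in the middle are valid input.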

/^P_(\D+)(\d+)$/
The character class \d matches digits; its negation \D matches everything else.
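A quick sketch of this pattern on the strings from the question. The word_and_digits helper name and the extra P_3TIME12 input are assumptions added for illustration; the latter shows that \D+ rejects a string whose first part contains a digit.

```perl
use strict;
use warnings;

# Return [word, digits] on a match, or undef when the pattern does not apply.
sub word_and_digits {
    my ($str) = @_;
    return $str =~ /^P_(\D+)(\d+)$/ ? [$1, $2] : undef;
}

for my $str (qw(P_ABC1 P_DFRES3 P_TIME12 P_3TIME12)) {
    my $parts = word_and_digits($str);
    my $line  = $parts
        ? "word $parts->[0] digits $parts->[1]\n"
        : "${str}: no match\n";
    print $line;
}
```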

If it is acceptable to also capture spaces in the first part, a simpler solution is to match anything non-greedily before the trailing digits, then match the trailing digits greedily.
This has the advantage that you can match even digits in the first part (provided that they don't appear at the end).
And spaces as well, as already said.
That is:
my @strs = qw(P_1ABC1 P_DFRES3 P_3TIME12);
foreach (@strs) {
if ( /^P_(.*?)(\d+)$/ ) {
print ">$1<", "\t\t", ">$2<", "\n"
}
}
which produces:
>1ABC< >1<
>DFRES< >3<
>3TIME< >12<

\w matches "word characters", including digits and underscore. Because \w+ is greedy, it takes every character it can, digits included, and gives back only the single trailing digit that \d+ needs in order to match.
You should be more explicit than \w, and use /^P_([A-Za-z_]+)(\d+)$/ instead.

Related

Regex perl with letters and numbers

I need to extract a string from a text file that contains both letters and numbers. The lines start like this
Report filename: ABCL00-67900010079415.rpt ______________________
All I need is the last 8 numbers so in this example that would be 10079415
while(<DATA>){
if (/Report filename/) {
my ($bagID) = ( m/(\d{8}+)./ );
print $bagID;
}
}
Right now this prints out the first 8 but I want the last 8.
You just need to escape the dot, so that the pattern matches the 8 digits that come right before the dot character.
my ($bagID) = ( m/(\d{8}+)\./ );
. is a special character in regex that matches any character. To match a literal dot, you need to escape it.
To match the last of anything, just precede it with a wildcard that will match as many characters as possible
my ($bag_id) = / .* (\d{8}) /x;
Note that I have also used the /x modifier so that the regex can contain insignificant whitespace for readability. Also, your \d{8}+ uses what is called a possessive quantifier; it is used for optimising some regex constructions and makes no difference at the end of the pattern
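For comparison, a small sketch that runs the unescaped-dot pattern and both suggested fixes against the sample line from the question. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = 'Report filename: ABCL00-67900010079415.rpt ______________________';

my ($first8)     = $line =~ /(\d{8})./;       # unescaped dot: first 8 digits found
my ($before_dot) = $line =~ /(\d{8})\./;      # 8 digits right before a literal dot
my ($last8)      = $line =~ / .* (\d{8}) /x;  # greedy .* pushes the match to the end

print "$first8 $before_dot $last8\n";
```

The two fixes agree on this line, but they are not equivalent in general: the escaped-dot version ties the match to the file extension, while the greedy .* version takes the last run of 8 digits anywhere in the line.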

Regular expression to validate a string that contains both digit and non-digit characters

Could anybody help on a regular expression that I can use to validate if a string contains both digit and non-digit characters?
I'm using "\d+\D+" but it's not working. The test cases I have are:
a1
1a
a1b
1ab
ab1
1-2
12-
-12
The test cases I listed should all result in a match, whereas 999, asdf, or _+sdf should not. I'm using JavaScript's RegExp.test().
Your current regex only matches strings of one or more digits, followed by one or more non-digits. You could use a look-ahead to check for the existence of a digit:
"(?=.*\d).*\D.*"
The (?=.*\d) part means "somewhere after this, there must be zero or more of any character followed by a digit." This allows your digit to appear anywhere in the string.
The .*\D.* part means "match zero or more of any character, then a non-digit, then zero or more of any character," which will match a non-digit at any position in the string and the rest of the characters (digits or not) around it.
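A sketch of the look-ahead pattern run against the test cases from the question. The question uses JavaScript, but this sketch is in Perl; the pattern itself is unchanged. The has_digit_and_nondigit name is made up for illustration.

```perl
use strict;
use warnings;

# True (1) when the string contains at least one digit and one non-digit.
sub has_digit_and_nondigit {
    my ($s) = @_;
    return $s =~ /(?=.*\d).*\D.*/ ? 1 : 0;
}

# The first eight inputs should match; the last three should not.
for my $s (qw(a1 1a a1b 1ab ab1 1-2 12- -12 999 asdf _+sdf)) {
    printf "%-6s %s\n", $s, has_digit_and_nondigit($s) ? 'match' : 'no match';
}
```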
You can try using lookaheads:
.*(?=.*\d)(?=.*\D).*
But maybe you don't even need a regex? Depending on the language/tool you're using, you might be able to do something like this:
Let your input string be s. If s is empty, it is invalid.
If the first character of s is a digit:
Loop through the other characters of s until you find a non-digit. If you don't find a non-digit, s is invalid.
Otherwise:
Loop through the other characters of s until you find a digit. If you don't find a digit, s is invalid.
If you found the appropriate digit/non-digit, s is valid.
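The steps above could be sketched roughly like this in Perl. The is_digit and is_valid names are hypothetical, and the digit test assumes plain ASCII digits.

```perl
use strict;
use warnings;

# ASCII digit test on a single character, without a regex.
sub is_digit {
    my ($c) = @_;
    return ($c ge '0' && $c le '9') ? 1 : 0;
}

# Look at the first character, then scan the rest of the string
# for a character of the opposite kind.
sub is_valid {
    my ($s) = @_;
    return 0 if $s eq '';                      # empty input is invalid
    my $first_is_digit = is_digit(substr($s, 0, 1));
    for my $i (1 .. length($s) - 1) {
        return 1 if is_digit(substr($s, $i, 1)) != $first_is_digit;
    }
    return 0;                                  # never found the opposite kind
}

print "$_: ", (is_valid($_) ? 'valid' : 'invalid'), "\n" for qw(a1 1-2 999 asdf);
```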
This here works for me.
It's ( match1 | match2 ) where | means OR.
(\d+[a-zA-Z]+|[a-zA-Z]+\d+)
If by digit and non-digit you mean any non-digit character, you can use the character classes \d for a digit and \D, which means [^\d]. There is no need for a lookaround here. If you mean a number and a letter, you can use the following. I'm exploding your string to get the comparison strings, and using a group with an | operator to allow a digit before a letter and vice versa.
<?php
$string = 'a1 1a a1b 1ab ab1 1-2 12- -12';
$strings = explode(' ',$string);
$pattern = '!([0-9][A-Za-z]|[A-Za-z][0-9])!';
foreach($strings as $tempString){
if(preg_match($pattern,$tempString)){
echo "$tempString matches\n";
} else {
echo "$tempString doesn't match\n";
}
}
?>
Output
a1 matches
1a matches
a1b matches
1ab matches
ab1 matches
1-2 doesn't match
12- doesn't match
-12 doesn't match
If we change to the \d\D character classes everything matches.
$pattern = '!(\d\D|\D\d)!';
Output
a1 matches
1a matches
a1b matches
1ab matches
ab1 matches
1-2 matches
12- matches
-12 matches

Perl regex: how to match strings that have a digit in between two nondigits?

How do I match strings of an array that has one or more digits in between two nondigits, and the string ends with a digit? Let's say I wanted to print out the strings that didn't match. How would I do this?
Here's what I have so far
my @array = ("OST3GIC2", "GRE1", "foo23eoo4","MAX13", "foo9fsa2");
foreach @array{
if !(grep /^+\D(+\d)+\D\d$/) {
print $_."\n";
}
Desired Output
GRE1
MAX13
thanks
You could look for:
/\D\d+\D.*\d$/
\D non-digit
\d+ one or more digits
\D a non-digit
.* anything
\d a digit
$ finally end-of-string
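A minimal sketch of how that pattern could be used to print the strings that don't match, using the array from the question. The @unmatched name is just for illustration.

```perl
use strict;
use warnings;

my @array = ("OST3GIC2", "GRE1", "foo23eoo4", "MAX13", "foo9fsa2");

# Keep only the strings that do NOT contain
# non-digit, digits, non-digit ... and end with a digit.
my @unmatched = grep { !/\D\d+\D.*\d$/ } @array;

print "$_\n" for @unmatched;
```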
If you want the non-matches directly, you can use
^\D+\d+$
If you want the matches, and then filter out, you can use
^\D*\d+\D+\d+$
my @array = ("OST3GIC2", "GRE1", "foo23eoo4","MAX13", "foo9fsa2");
print(join("\n", grep { ! /^\D+\d+\D+\d+$/ } @array) . "\n");
Meaning
! to inverse the regexp result
^\D+\d+\D+\d+$ means start with any non-digits, then any digits, then any non-digits, then end with digits
( any means at least one here )

regex: find one-digit number

I need to find all the one-digit numbers in a text.
My code:
$string = 'text 4 78 text 558 my.name@gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly, I just wanted to ask whether it is correct to use spaces.
Maybe there is some other way to distinguish one-digit number.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
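A rough sketch comparing the space-delimited pattern with the lookaround and word-boundary versions. It is written in Perl although the question's code looks like PHP; the patterns are unchanged. The all_matches helper and the short test strings are made up for illustration.

```perl
use strict;
use warnings;

# Return every match of a pattern in a string
# (the captures, if the pattern has a capture group).
sub all_matches {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

my $spaces     = qr/ (\d) /;                  # question's pattern, digit captured
my $lookaround = qr/(?<!\S)\d(?!\S)/;         # whitespace or string edge on both sides
my $boundary   = qr/\b\d\b/;                  # word boundaries
my $punct      = qr/(?<!\S)\d(?![^\s.,?!])/;  # also allows . , ? ! after the digit

print join(',', all_matches($spaces,     '4 5 6')),            "\n";
print join(',', all_matches($lookaround, '4 5 6')),            "\n";
print join(',', all_matches($boundary,   '192.168.4.5')),      "\n";
print join(',', all_matches($lookaround, '192.168.4.5')),      "\n";
print join(',', all_matches($punct,      'Is it 7? Yes, 8.')), "\n";
```

On '4 5 6' the space-delimited pattern finds only the middle digit, because it needs a real space on each side and the spaces it consumes cannot be reused by a neighbouring match.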
Use word boundaries. Note that the range quantifier {1} is redundant (a single \d already matches exactly one digit), and so is the character class [], because it contains only one item.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
The regular expression reads, match a digit (\d) that is neither preceded nor followed by digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.
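A small sketch of this pattern on the two example phrases above plus part of the question's string, written in Perl. The lone_digits name is hypothetical.

```perl
use strict;
use warnings;

# Collect every digit that has no digit directly before or after it.
sub lone_digits {
    my ($text) = @_;
    my @found = $text =~ /(?<!\d)\d(?!\d)/g;
    return @found;
}

for my $text ('a1 cat', 'Call agent 7, pronto!', 'text 4 78 text 558') {
    print "$text => ", join(',', lone_digits($text)), "\n";
}
```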

Match a number in a string with letters and numbers

I need to write a Perl regex to match numbers in a word with both letters and numbers.
Example: test123. I want to write a regex that matches only the number part and capture it
I am trying this \S*(\d+)\S* and it captures only the 3 but not 123.
Regex atoms will match as much as they can.
Initially, the first \S* matched "test123", but the regex engine had to backtrack to allow \d+ to match. The result is:
 +------------------ Matches "test12"
 |      +----------- Matches "3"
 |      |      +---- Matches ""
 |      |      |
---   -----   ---
\S*   (\d+)   \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It'll try to match at position 0, then position 1, ... until it finds a digit, then it will match as many digits as it can.
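A short sketch that shows the backtracking result next to the simpler pattern. The lazy variant is an extra added here for comparison, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $word = 'test123';

my ($greedy) = $word =~ /\S*(\d+)\S*/;  # leading \S* gives back only one digit
my ($plain)  = $word =~ /(\d+)/;        # scans forward to the first run of digits
my ($lazy)   = $word =~ /\S*?(\d+)/;    # lazy \S*? takes as little as possible

print "greedy=$greedy plain=$plain lazy=$lazy\n";
```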
The * quantifiers in your regex are greedy, which is why they also "eat" the digits. As @Marc said, you don't need them.
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
"something122320" =~ /(\d+)/ will return 122320; this is probably what you're trying to do ;)
\S matches any non-whitespace characters, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Were it a case where a non-digit was required (say before, per your example), you could use the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
(The second one is more in tune with your \S* specification.)
Your expression satisfies any condition with one or more digits, and that may be what you want. It could be a string of digits surrounded by spaces (" 123 "), because the border between the last space and the first digit satisfies zero-or-more non-space, same thing is true about the border between the '3' and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
I think parentheses signify capture groups, which is exactly what you don't want. Remove them. You're looking for /\d+/ or /[0-9]+/