Matching the nth number in a string that contains a hex number - regex

I have a string that contains words, hex numbers, and digits. I want to match the number in the string, but when the hex number contains digits it gets mixed up with the number I want.
my $string = " aabb is = 35"; #where aabb is hex number
my $string1 = " abc0 is = 75" ; #where abc0 is hex number
my (@val1) = $string =~ /(\d+)/g;
print "$val1[0]\n";
@val1 = $string1 =~ /(\d+)/g;
print " $val1[1]\n";
The above script gets the number after = that I want, but I need to hard-code $val1[0] and $val1[1] to get it. Is there a way to match the number while ignoring the hex number, in case I don't know which index will hold the number I want? Then I could just print $val1 to get it.
8/30 update, thanks to Toto for pointing this out:
It can happen that the first hex number consists only of digits (1234), and the number I want is not necessarily the last word in the string. It is not necessarily the second number either, but it always comes after the =.

Since the hex may contain only digits (1234) the two numbers can't be distinguished by format.
The sample strings can be matched positionally (end of string) or by the = preceding the number:
my ($num) = $string =~ /([0-9]+)$/;
my ($num) = $string =~ /=\s*([0-9]+)\b/;
or make use of some other "landmark" in your strings, if different from shown samples.
Given the clarification in the question's edit the second example above is suitable.
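As a minimal sketch of the = landmark approach, the loop below runs the second pattern over the two strings from the question plus one made-up string (an all-digit hex value followed by trailing text), which is an assumption based on the question's update:

```perl
use strict;
use warnings;

# The first two strings come from the question; the third is a hypothetical
# case where the hex value has only digits and the number is not the last word.
my @samples = (
    ' aabb is = 35',
    ' abc0 is = 75',
    ' 1234 is = 90 units',
);

my @found;
for my $string (@samples) {
    # Capture the run of ASCII digits that follows "=" and optional whitespace.
    my ($num) = $string =~ /=\s*([0-9]+)\b/;
    push @found, $num;
    print "$num\n";
}
```

Because the match is in list context, the captured number lands directly in $num, so no array index has to be hard-coded.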
Original post, before comments (edited)
A number won't have letters (unless it involves exponents), so use a word boundary
my ($num) = $string =~ /\b([0-9]+)\b/;
for an (unsigned) integer.
To allow +/- and/or a floating point format
my ($num) = $string =~ /( [+-]? [0-9]+\.?[0-9]* )/x;
but note that this leaves out some formats used for numbers. The looks_like_number from core Scalar::Util is more reliable, and one can first match more broadly and then filter the list with it.
You can use \d instead of 0-9 but this matches many extra characters unless it is used with /a modifier(s), available since 5.14. But note that the /a has a broader effect than restricting to ASCII only numbers (\d). See perlre (search for /a).
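The "match broadly, then filter" idea could look roughly like this. The input string is made up for illustration, and the sketch assumes the candidate numbers are whitespace-delimited tokens:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical input mixing a hex-like word, an integer, and a float with exponent.
my $string = ' abc0 is = 75 or -1.5e3';

# Grab every whitespace-delimited token, then keep only those Perl treats as numbers.
my @numbers = grep { looks_like_number($_) } $string =~ /(\S+)/g;
print "@numbers\n";
```

Note that a hex value made only of digits (such as 1234) would still pass this filter, so it does not replace a positional landmark like =.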

You can use (\d+$) to match numbers only if they are at the end of string.

How about?
This matches the last standalone number in the string, which is the second number in these samples:
while(<DATA>) {
my ($dec) = $_ =~ /.+\b(\d+)\b/a;
say $dec;
}
__END__
hex abc123 dec 456 blah
hex 123123 dec 789456 blah
Output:
456
789456

Related

Subtract pattern matching from array element perl

I have written code that extracts a specific value (e.g. FP=0.021) from one element of an array if it matches a specific pattern. Here is the code:
if ($info =~ /FP=/) {
    my @array1 = split(';', $info);
    if ($array1[$#array1] =~ /=([^.]*)/) {
        # $-[1] is the offset where the first capture group starts
        my $name1 = $-[1];
        $FPvalue = substr($array1[$#array1], $name1);
        if ($FPvalue < 0.0001) {
            push(@FPvalues, $FPvalue);
        }
    }
}
Where $info is a string which contains information separated by a semicolon character (;).
I am lucky that the "FP=0.021" element is the last element of my array, but I would like to know a way to extract it without using the expression $array1[$#array1].
I would appreciate your help, thanks!
It is hard to tell without sample input data, but I think you want
push @FPvalues, $1 if $info =~ /FP=([\d.]+)/
It works by searching the string in $info for the sequence FP= followed by a number of dots and decimal digits. If that pattern is found, then the dots and digits part is put into $1 and pushed onto the array.
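A small sketch of that one-liner in context. The records are invented, since the question shows no sample input, and the 0.0001 threshold is taken from the question's code:

```perl
use strict;
use warnings;

# Hypothetical records; the real layout of $info is not shown in the question.
my @records = (
    'ID=7;DP=12;FP=0.021',
    'FP=0.00005;ID=8;DP=3',
    'ID=9;DP=4',
);

my @FPvalues;
for my $info (@records) {
    # FP= may sit anywhere in the record, so no split or array index is needed.
    next unless $info =~ /FP=([\d.]+)/;
    push @FPvalues, $1 if $1 < 0.0001;    # same threshold as the question's code
}
print "@FPvalues\n";
```

Only the second record passes the threshold here; the first has FP=0.021 and the third has no FP field at all.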
Here is how you can parse the decimal number from the string when it sits at the end of the string:
$str = "asdsa;adsasd;adsasd;FP=0.021";
if($str =~ /=(\d+\.?\d+)$/){
print $1;
}

Getting equal number of digits on both sides of a character in a string

I have a string
$test = 'xyz45sd2-32d34-sd23-456562.abc.com'
The objective is to obtain $1 = 23 and $2 = 45 i.e equal number of digits on both sides of the last -. Note that the number of digits is variable, and is not necessarily 2.
I have tried the following:
$test1 =~ s/.*(\d+)-(\d+).*//;
But
$1 contains 3
$2 contains 456562
You can try this regex
if($test1 =~ m/(\S+)-(\S+)-([a-z]*)(\d+)-(\d\d)(\d+).*/)
{
print $4,"|",$5;
}
I assume that you need only the first 2 digits from 456562
perl -e '"xyz45sd2-32d34-sd23-456562.abc.com" =~ /(\d{2})-(\d{2})\d*(?=\.)/; print "$1\n$2\n"'
This other entry confirms that regex does not count:
How to match word where count of characters same
Building upon GreatBigBore's idea, if there's an upper bound to the count, then you could try the or operator |. This only matches your requirement to find a match; depending on the matched count the match will be in different bins. Only one case correctly places them in $1 and $2.
(\d{3})-(\d{3})|(\d{2})-(\d{2})|(\d{1})-(\d{1})
However, if you concatenate the resulting captures as $1$3$5 and $2$4$6, you will effectively get the 2 strings you were looking for.
Another idea is to operate iteratively, you could repeat your search on the string by increasing the number until the match fails. (\d{1})-(\d{1}) , (\d{2})-(\d{2}) ...
A binary search over the count comes to mind, making it O(log N), with N being the upper limit for the capture length.
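The iterative idea might be sketched as follows, using a plain linear search rather than the binary search mentioned above. The trailing [^-]*$ is an addition that pins the match to the last hyphen:

```perl
use strict;
use warnings;

my $test = 'xyz45sd2-32d34-sd23-456562.abc.com';

# Grow $n while $n digits still fit on both sides of the last hyphen.
my ($lhs, $rhs);
my $n = 1;
while ($test =~ /(\d{$n})-(\d{$n})[^-]*$/) {
    ($lhs, $rhs) = ($1, $2);    # remember the widest match so far
    $n++;
}
print "$lhs\t$rhs\n" if defined $lhs;
```

For the sample string the loop succeeds for widths 1 and 2 and fails at 3, leaving 23 and 45.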
Theoretical answer
Short answer:
What you're looking for is not possible using regular expressions.
Long Answer:
Regular expressions (as their name suggests) are a compact representation of Regular languages (Type-3 grammars in the Chomsky Hierarchy).
What you're looking for is not possible using regular expressions as you're trying to write out an expression that maintains some kind of count (some contextual information other than beginning and end). This kind of behavior cannot be modelled as a DFA(actually any Finite Automaton). The informal proof of whether a language is regular is that there exists a DFA that accepts that language. As this kind of contextual information cannot be modeled in a DFA, thus by contradiction, you cannot write a regular expression for your problem.
Practical Solution
my ($lhs,$rhs) = $test =~ /^[^-]+-[^-]+-([^-]+)-([^-.]+)\S+/;
# Alternatively, and faster
my (undef,undef,$lhs,$rhs) = split /-/, $test;
# Rest is common, no matter how $lhs and $rhs is extracted.
my @left = reverse split //, $lhs;
my @right = split //, $rhs;
my $i;
for($i=0; exists($left[$i]) and exists($right[$i]) and $left[$i] =~ /\d/ and $right[$i] =~ /\d/ ; ++$i){}
--$i;
$lhs= join "", reverse @left[0..$i];
$rhs= join "", @right[0..$i];
print $lhs, "\t", $rhs, "\n";
Edit: It's possible to improve my solution by using regular expressions to extract the required numeric portions of $lhs and $rhs instead of split, reverse and for.
As @Samveen said, it's technically not possible to do this in pure regex.
Like @Samveen's solution, here's another version:
#get left and right
my (undef,undef,$left,$right) = split /-/, $test;
#get left numbers
$left =~ s/.*?(\d+)$/$1/;
##get right numbers
$right =~ s/^(\d+).*/$1/;
##get length of both
my $right_length = length $right;
my $left_length = length $left;
if ($right_length > $left_length){
#make right length as same as left length
$right =~ s/(\d{$left_length}).*/$1/;
} else {
#make left length as same as right length
$left =~ s/.*(\d{$right_length})/$1/;
}
print $left, "\t", $right, "\n";

Search for value from command output and just print that found value

I am calling my program from Perl and getting the output with:
$output = `$calling 2>>bla.txt`;
Now I need just a specific value that will be presented in the output which I can check with Regex.
The needed output is:
Distance from Segment XY to its Centroid is: 3.455564713591596
Where XY is any number, and I just match for the "to its Centroid is: " the following:
if( $output =~ m/\sto\sits\sCentroid\sis:\s(\d)*$/)
But how do I get only the value that is presented near to the end?
I just want it to be printed on the screen.
Any advice?
Instead of \d* ("zero or more digits"), you probably need to match \d+([.]\d+)? ("one or more digits, optionally followed by a decimal point and one or more additional digits"). That would give you:
if( $output =~ m/\sto\sits\sCentroid\sis:\s\d+([.]\d+)?$/)
(hat-tip to Jonathan Leffler for pointing that out).
That done — you want to capture the \d+([.]\d+)?, so, wrap it in parentheses to create a capture-group:
if( $output =~ m/\sto\sits\sCentroid\sis:\s(\d+([.]\d+)?)$/)
and then the special variable $1 will be whatever it captured:
if( $output =~ m/\sto\sits\sCentroid\sis:\s(\d+([.]\d+)?)$/)
{ print $1; }
See the "Extracting matches" section of the perlretut ("Perl regular expressions tutorial") manual-page.
By the way, \s matches a single white-space character. Usually you'd want either to match only an actual space — write e.g. to its rather than to\sits — or to match one or more white-space characters — e.g. to\s+its.
You print the number you captured in the regex with the parentheses:
print "$1\n" if ($output =~ m/\sto\sits\sCentroid\sis:\s([-+]?\d*\.?\d+)$/);
You also make sure that the regex can pick up a number with a decimal point, and I've allowed an optional sign, too. If you need to worry about optional exponents, add (?:[eE][-+]?\d+)? after the \d+ in my regex.
If you have other things to do with the value, then convert into a regular if statement:
if ($output =~ m/\sto\sits\sCentroid\sis:\s([-+]?\d*\.?\d+)$/)
{
print "$1\n";
process_centroid($1);
}
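Putting the pieces together, a sketch with the optional exponent added might look like this. The multi-line $output is invented for illustration; only the format of the "Centroid" line comes from the question:

```perl
use strict;
use warnings;

# Hypothetical command output; only the last line's format is from the question.
my $output = "Reading input\n"
           . "Distance from Segment 12 to its Centroid is: 3.455564713591596\n";

# /m lets $ match at the end of any line, not only at the end of the string.
my ($distance) = $output
    =~ /to\s+its\s+Centroid\s+is:\s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)$/m;

print "$distance\n" if defined $distance;
```

The /m modifier is an assumption that the wanted line may appear anywhere in the output; without it the pattern only matches when that line is the last one.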

how to extract a single digit in a number using regexp

set phoneNumber 1234567890
This number is a single run of digits. I want to divide it into 123 456 7890 by using regexp. Is that possible without using the split function?
The following snippet:
regexp {(\d{3})(\d{3})(\d{4})} "8144658695" -> areacode first second
puts "($areacode) $first-$second"
Prints (as seen on ideone.com):
(814) 465-8695
This uses capturing groups in the pattern and subMatchVar... for Tcl regexp
References
http://www.hume.com/html84/mann/regexp.html
regular-expressions.info/Brackets for Capturing
On the pattern
The regex pattern is:
(\d{3})(\d{3})(\d{4})
\_____/\_____/\_____/
   1      2      3
It has 3 capturing groups (…). The \d is a shorthand for the digit character class. The {3} in this context means "exactly 3 repetitions of".
References
regular-expressions.info/Repetition, Character Class
my($number) = "8144658695";
$number =~ m/(\d\d\d)(\d\d\d)(\d\d\d\d)/;
my $num1 = $1;
my $num2 = $2;
my $num3 = $3;
print $num1 . "\n";
print $num2 . "\n";
print $num3 . "\n";
This is written for Perl and works assuming the number is in the exact format you specified, hope this helps.
This site might help you with regex
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
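Since the Perl snippet above assumes the input is exactly ten digits, a slightly more defensive sketch could anchor the pattern and capture in list context. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $number = '8144658695';

# \A and \z reject input that is not exactly ten digits.
my @parts = $number =~ /\A(\d{3})(\d{3})(\d{4})\z/;

if (@parts) {
    print "@parts\n";    # the three groups, separated by spaces
}
else {
    print "unexpected format: $number\n";
}
```

In list context the match returns the captured groups directly, so $1, $2 and $3 do not have to be copied one by one.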

In Perl, how can I detect if a string has multiple occurrences of double digits?

I wanted to match 110110 but not 10110. That means at least twice repeating of two consecutive digits which are the same. Any regex for that?
Should match: 110110, 123445446, 12344544644
Should not match: 10110, 123445
/(\d)\1.*\1\1/
This matches a string with 2 instances of a double number, ie 11011 but not 10011
\d matches any digit
\1 matches the first match effectively doubling the first entry
This will also match 1111. If there must be other characters in between, change .* to .+
ooh, this looks neater
((\d)\2).*\1
If you want to find non-matching values, but there has to be 2 sets of doubles, then you would simply need to add the first part again as in
((\d)\2).*((\d)\4)
The bracketing would mean that $1 and $3 would contain the double digits and $2 and $4 contains the single digits (which are then doubled).
11233
$1=11
$2=1
$3=33
$4=3
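A quick way to check the group numbering for that example is to capture in list context and print each group with its number (a minimal sketch):

```perl
use strict;
use warnings;

# Groups 1 and 3 are the doubled pairs; groups 2 and 4 are the single digits.
my @groups = '11233' =~ /((\d)\2).*((\d)\4)/;

printf "\$%d=%s\n", $_ + 1, $groups[$_] for 0 .. $#groups;
```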
If I understand correctly, your regexp will be:
m{
(\d)\1 # First repeated pair
.* # Anything in between
(\d)\2 # Second repeated pair
}x
For example:
for my $x (qw(110110 123445446 12344544644 10110 123445)) {
my $m = $x =~ m{(\d)\1.*(\d)\2} ? "matches" : "does not match";
printf "%-11s : %s\n", $x, $m;
}
110110 : matches
123445446 : matches
12344544644 : matches
10110 : does not match
123445 : does not match
If you're talking about all digits, this will do it:
00.*00|11.*11|22.*22|33.*33|44.*44|55.*55|66.*66|77.*77|88.*88|99.*99
It's just 10 different patterns OR'ed together, each of which checks for at least two occurrences of the desired 2-digit pattern.
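Rather than typing the alternation by hand, it could be generated. The sketch below builds it and compares it against the backreference form on the question's test strings; the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Build 00.*00|11.*11|...|99.*99 programmatically.
my $alternation = join '|', map { my $pair = $_ x 2; "$pair.*$pair" } 0 .. 9;
my $long  = qr/$alternation/;
my $short = qr/(\d)\1.*\1\1/;    # backreference equivalent

my @results;
for my $x (qw(110110 123445446 12344544644 10110 123445)) {
    my $by_long  = $x =~ $long  ? 1 : 0;
    my $by_short = $x =~ $short ? 1 : 0;
    push @results, "$x:$by_long$by_short";
    printf "%-11s long=%d short=%d\n", $x, $by_long, $by_short;
}
```

Both forms require the same pair to appear twice, so they agree on all five strings.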
Using Perls more advanced REs, you can use the following for two consecutive digits twice:
(\d)\1.*\1\1
or, as one of your comments states, two consecutive digits followed somewhere by two more consecutive digits which may not be the same:
(\d)\1.*(\d)\2
depending on how your data is, here's a minimal regex way.
while(<DATA>){
chomp;
my @s = split /\s+/;
foreach my $i (@s){
if( $i =~ /123445/ && length($i) != 6){
print $_."\n";
}
}
}
__DATA__
This is a line
blah 123445446 blah
blah blah 12344544644 blah
.... 123445 ....
this is last line
There is no reason to do everything in one regex... You can use the rest of Perl as well:
#!/usr/bin/perl -l
use strict;
use warnings;
my @strings = qw( 11233 110110 10110 123445 123445446 12344544644 );
print if is_wanted($_) for @strings;
sub is_wanted {
my ($s) = @_;
my @matches = $s =~ /(?<group>(?<first>[0-9])\k<first>)/g;
return 1 < @matches / 2;
}
__END__
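If I've understood your question correctly, then this, according to regexbuddy (set to using perl syntax), will match 110110 but not 10110. The backreference is written as \g{1} here because Perl itself reads \10 as an octal escape when the pattern has fewer than ten groups:
(1{2})0\g{1}0
The following is more general and will match any string where a two-digit sequence is repeated later on in the string (note that \d{2} does not require the two digits to be equal).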
(\d{2})\d+\1\d*
The above will match the following examples:
110110
110011
112345611
2200022345
Finally, to find two sets of double digits in a string and you don't care where they are, try this:
\d*?(\d{2})\d+?\1\d*
This will match the examples above plus this one:
12345501355789
It's the two sets of double 5 in the above example that are matched.
[Update]
Having just seen your extra requirement of matching a string with two different double digits, try this:
\d*?(\d)\1\d*?(\d)\2\d*
This will match strings like the following:
12342202345567
12342202342267
Note that the 22 and 55 cause the first string to match and the pair of 22 cause the second string to match.