How to extract a single digit in a number using regexp

set phoneNumber 1234567890
This number is a single run of digits. I want to divide it into 123 456 7890 using regexp. Is that possible without using the split function?

The following snippet:
regexp {(\d{3})(\d{3})(\d{4})} "8144658695" -> areacode first second
puts "($areacode) $first-$second"
Prints (as seen on ideone.com):
(814) 465-8695
This uses capturing groups in the pattern and subMatchVar... for Tcl regexp
References
http://www.hume.com/html84/mann/regexp.html
regular-expressions.info/Brackets for Capturing
On the pattern
The regex pattern is:
(\d{3})(\d{3})(\d{4})
\_____/\_____/\_____/
1 2 3
It has 3 capturing groups (…). The \d is a shorthand for the digit character class. The {3} in this context means "exactly 3 repetitions of".
References
regular-expressions.info/Repetition, Character Class
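For comparison, the same three capturing groups can be sketched in Python. This is only an illustration of the pattern discussed above, not part of the Tcl answer; the helper name format_phone is hypothetical.

```python
import re

def format_phone(number: str) -> str:
    """Split a 10-digit string into 3-3-4 groups; raise if it does not fit."""
    # Same pattern as above: three capturing groups of 3, 3 and 4 digits.
    match = re.fullmatch(r"(\d{3})(\d{3})(\d{4})", number)
    if match is None:
        raise ValueError(f"expected exactly 10 digits, got {number!r}")
    area, first, second = match.groups()
    return f"({area}) {first}-{second}"

print(format_phone("8144658695"))  # (814) 465-8695
```

Using fullmatch rather than search is a design choice here: it rejects inputs that are longer than ten digits instead of silently formatting their first ten.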

my($number) = "8144658695";
$number =~ m/(\d\d\d)(\d\d\d)(\d\d\d\d)/;
my $num1 = $1;
my $num2 = $2;
my $num3 = $3;
print $num1 . "\n";
print $num2 . "\n";
print $num3 . "\n";
This is written for Perl and works assuming the number is in the exact format you specified. Hope this helps.
This site might help you with regex
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

Related

Matching nth occurrence number in a string that contain hex number

I have a string that contains words, hex numbers and digits. I want to match the number in the string, but when the hex value contains digits they get mixed up with my number.
my $string = " aabb is = 35"; #where aabb is hex number
my $string1 = " abc0 is = 75" ; #where abc0 is hex number
my (@val1) = $string =~ /(\d+)/g;
print "$val1[0]\n";
@val1 = $string1 =~ /(\d+)/g;
print " $val1[1]\n";
The above script gets the number after = that I want, but I need to hard-code $val1[0] or $val1[1] to get it. Is there any way to match the digits while ignoring the hex number, in case I am not sure which index will hold the number I want? Then I could just print $val1 to get the number I want.
8/30 update, thanks to Toto for pointing this out:
There will be cases where the first hex number contains only digits (e.g. 1234), and the number I want to match will not necessarily be the last word in the string. The number I want will not necessarily be the second number, but it will come after =.
Since the hex may contain only digits (1234) the two numbers can't be distinguished by format.
The strings shown can be matched positionally (end of string) or based on the = preceding the number
my ($num) = $string =~ /([0-9]+)$/;
my ($num) = $string =~ /=\s*([0-9]+)\b/;
or make use of some other "landmark" in your strings, if different from shown samples.
Given the clarification in the question's edit the second example above is suitable.
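As a rough cross-check of the "landmark" idea, here is the second pattern transcribed to Python. The function name number_after_equals and the second sample string are made up for illustration.

```python
import re

def number_after_equals(text):
    """Return the first run of ASCII digits that follows '=', or None."""
    # Anchoring on '=' means an all-digit hex value earlier in the string is ignored.
    match = re.search(r"=\s*([0-9]+)\b", text)
    return int(match.group(1)) if match else None

print(number_after_equals(" aabb is = 35"))        # 35
print(number_after_equals(" 1234 is = 75 units"))  # 75
```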
Original post, before comments (edited)
A number won't have letters (unless it involves exponents), so use a word boundary
my ($num) = $string =~ /\b([0-9]+)\b/;
for an (unsigned) integer.
To allow +/- and/or a floating point format
my ($num) = $string =~ /( [+-]? [0-9]+\.?[0-9]* )/x;
but note that this leaves out some formats used for numbers. The looks_like_number from core Scalar::Util is more reliable, and one can first match more broadly and then filter the list with it.
You can use \d instead of 0-9 but this matches many extra characters unless it is used with /a modifier(s), available since 5.14. But note that the /a has a broader effect than restricting to ASCII only numbers (\d). See perlre (search for /a).
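The same \d versus [0-9] distinction exists in Python 3, which makes for a quick demonstration. Note that this is Python, not Perl: re.ASCII plays a role similar to Perl's /a here, but the two are not identical.

```python
import re

# Arabic-Indic digits three and four: decimal digits, but not ASCII.
arabic_indic = "\u0663\u0664"

unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None
ascii_flag = re.fullmatch(r"\d+", arabic_indic, re.ASCII) is not None
explicit_class = re.fullmatch(r"[0-9]+", arabic_indic) is not None

print(unicode_digits, ascii_flag, explicit_class)  # True False False
```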
You can use (\d+$) to match numbers only if they are at the end of string.
How about this?
It matches the last standalone number in the string (the second number in these samples):
while(<DATA>) {
my ($dec) = $_ =~ /.+\b(\d+)\b/a;
say $dec;
}
__END__
hex abc123 dec 456 blah
hex 123123 dec 789456 blah
Output:
456
789456

perl regex to convert currency

I need some help with a text cleaning/normalization process.
I'm stuck at a place where I need to convert a currency format:
input: $100 million output: 100 million dollar
input: eur20 million output: 20 million euros
I'm using Perl regexes for the cleaning process. I'd appreciate it if someone could provide a regex that converts the input to the output.
This is my code so far:
s/([\$])([0-9\.])([million])/ $2 $3 dollars/g;
An example number is $4.2 million.
This is what I tried in order to convert the dollar symbol into the word "dollars" and move it to the end of the phrase, but it does not give the expected result: it gives me ".2 million" as output.
[...] in a regex introduces a character class, so [million] is the same as [nolim], and it matches one of those characters.
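Character classes behave the same way in most regex flavours, so the point can be checked quickly in Python (shown only as an illustration; the variable names are arbitrary):

```python
import re

# A character class is a set: order and duplicates do not matter,
# so [million] and [nolim] accept exactly the same single characters.
same_set = all(
    re.fullmatch(r"[million]", ch) and re.fullmatch(r"[nolim]", ch)
    for ch in "milon"
)

# The class consumes ONE character, so it cannot match the whole word.
class_matches_word = re.fullmatch(r"[million]", "million") is not None
literal_matches_word = re.fullmatch(r"million", "million") is not None

print(same_set, class_matches_word, literal_matches_word)  # True False True
```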
I'd create a translation table for the currencies in a hash. From the keys of the hash, you can build a regex that matches them, and use it in the replacement:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use feature qw{ say };
my %currency = ( '$' => 'dollar', # or dollars?
eur => 'euros',
'€' => 'euros',
);
my $regex = join '|', map quotemeta, keys %currency;
for my $input ('$100 million', 'eur20 million', '€13.2 thousand') {
( my $output = $input )
=~ s/($regex)([0-9.]+ (?:million|thousand))/$2 $currency{$1}/g;
say $output;
}
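A Python sketch of the same translation-table idea, for readers who want to compare. It assumes the plural "dollars" for '$' (the Perl comment above leaves that open), and the names CURRENCY and normalize are hypothetical.

```python
import re

# Assumption: '$' maps to the plural "dollars".
CURRENCY = {"$": "dollars", "eur": "euros", "€": "euros"}

# Build the alternation from the table keys, escaping regex metacharacters
# such as '$'. Longest keys first so a short key never shadows a longer one.
symbols = "|".join(re.escape(k) for k in sorted(CURRENCY, key=len, reverse=True))
pattern = re.compile(rf"({symbols})([0-9.]+ (?:million|thousand))")

def normalize(text):
    """Move the currency marker behind the amount and spell it out."""
    return pattern.sub(lambda m: f"{m.group(2)} {CURRENCY[m.group(1)]}", text)

for sample in ("$100 million", "eur20 million", "€13.2 thousand"):
    print(normalize(sample))
```

Keeping the currencies in a table means a new symbol only needs a new dictionary entry; the regex is rebuilt from the keys.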
Your regex does not give the result you claim it does.
s/([\$])([0-9.])([million])/ $2 $3 dollars/g;
With the help of the /x modifier we can add whitespace (even newlines and comments) to the pattern to improve readability. Your pattern can then be re-written as
s/([\$]) # match a literal $ and capture that as $1
([0-9.]) # match ONE digit or a dot and capture as $2
([million]) # match ONE character of 'm', 'i', 'l', 'o', 'n'
# and capture as $3
/ $2 $3 dollars/gx;
There is no way $100 million matches this pattern and results in .2 million. Possible inputs would be
$3i, $.o or $9m. They would give 3 i dollars, . o dollars, and 9 m dollars.
What you are looking for is a pattern like this:
s/\$ # a literal '$'
([\d.]+) # one or more digits or dots, like e.g. '99.5',
# captured as $1
\s+ # one or more whitespace
(million) # the literal text 'million', captured as $2
/$1 $2 dollars/gx;
(or, as a one-liner: s/\$([\d.]+)\s+(million)/$1 $2 dollars/g;)
Note that $2 in this case always is million and you could also rewrite it as s/\$([\d.]+)\s+million/$1 million dollars/g; (omitting the () around million).

Get Value from string between ":" and ","

Here is example of my string
"{"id":128,"order":128,"active":"1","name":"\"
Now I need to get "128", the id parameter. So it is the first value between ":" and ",".
I have tried preg_match with different regular expressions, but I'm just not good at regular expressions. Maybe someone knows how to do it?
$id = preg_match('/:(,*?)\,/s', $content, $matches);
Here is a sample code to get the number after the first : using a regex:
$re = "/(?<=\\:)[0-9]+/";
$str = "\"{\"id\":128,\"order\":128,\"active\":\"1\",\"name\":\"\"";
preg_match($re, $str, $matches);
print $matches[0];
Here is a sample program on TutorialsPoint.
Just a small detail about this regex (?<=\\:)[0-9]+: it uses a fixed-width look-behind that PHP supports, fortunately.
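The look-behind carries over to Python's re module unchanged, which gives a quick way to experiment with it. This is a sketch using the sample string from the question; first_id is just an illustrative name.

```python
import re

text = '"{"id":128,"order":128,"active":"1","name":"\\"'

# (?<=:) asserts that a colon sits immediately before the digits
# without making the colon part of the match.
match = re.search(r"(?<=:)[0-9]+", text)
first_id = match.group(0) if match else None
print(first_id)  # 128
```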
<?php
$txt='"{"id":128,"order":128,"active":"1","name":"\\"';
$re1='.*?'; # Non-greedy match on filler
$re2='(\\d+)'; # Integer Number 1
$re3='.*?'; # Non-greedy match on filler
$re4='(\\d+)'; # Integer Number 2
$re5='.*?'; # Non-greedy match on filler
$re6='(\\d+)'; # Integer Number 3
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6."/is",$txt, $matches))
{
$int1=$matches[1][0];
$int2=$matches[2][0];
$int3=$matches[3][0];
print "($int1) ($int2) ($int3) \n";
}
?>

Extract strings in quotation marks and brackets

My string test is :
My name is "Ralph" ("France" is my country, "123" my age, ... , "an other text", ...)
I want to get strings in quotation marks, but only these in brackets. In my example : strings France and 123.
I've tested this pattern :
#\(.*"(.*)".*\)#
but it only matches the last string 123 (I use preg_match_all(), so it should return every result, no?)
If I add the Ungreedy option it matches only the first string France. So I don't understand why it wasn't greedy without the U option, and is there a way to obtain my strings in quotation marks and in brackets?
Thanks,
Raphaël N.
The only way I could get this to work is by using:
$subject = 'My 00123450 "Ralph" ("France" is my country, "123" my age, ... , "an other text", ...)';
$pattern = '/\((?:[^"\)]*"(.*?)")?(?:[^"\)]*"(.*?)")?(?:[^"\)]*"(.*?)")?[^"]*?\)/';
preg_match_all($pattern, $subject, $matches);
for ($i = 1; $i < count($matches); $i++)
{
print($i.': '.$matches[$i][0].";\n");
}
Output:
1: France;
2: 123;
3: an other text;
That Regex would only work for up to 3 occurrences of "quoted strings" inside a set of brackets. You could however extend the regex string to grab up to N occurrences as follows:
The regex to find 1 to N quoted strings within each set of parenthesis is:
n=1 /\((?:[^"\)]*"(.*?)")?[^"]*?\)/
n=2 /\((?:[^"\)]*"(.*?)")?(?:[^"\)]*"(.*?)")?[^"]*?\)/
n=3 /\((?:[^"\)]*"(.*?)")?(?:[^"\)]*"(.*?)")?(?:[^"\)]*"(.*?)")?[^"]*?\)/
To find 1-N strings you repeat the section (?:[^"\)]*"(.*?)")? N times. For 1-100 strings within each set of parenthesis you would have to repeat that section 100 times - obviously the regex would start evaluating very slowly.
I realise that isn't by any means ideal, but it's my best effort at a 1 pass solution.
In 2 passes:
$subject = 'My name is "Ralph" ("France" is my country, "123" my age, ... , "an other text", ...)';
$pattern = '/\(.*?\)/';
preg_match_all($pattern, $subject, $matches);
$pattern2 = '/".*?"/';
preg_match_all($pattern2, $matches[0][0], $matches2);
print_r($matches2);
Produces correct output in 2 passes. Eagerly awaiting an answer that shows how to do it in 1 though. I've tried every variation I could think of, but can't get it to include overlapping matches.
Keep it simple and do it in 2 steps:
$s = 'My name is "Ralph" ("France" is my country, "123" my age) and "I" am. ';
$str = preg_replace('#^.*?\(([^)]*)\).*$#', '$1', $s);
if (preg_match_all('/"([^"]*)"/', $str, $arr))
print_r($arr[0]);
OUTPUT:
Array
(
[0] => "France"
[1] => "123"
)
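The two-step idea is not specific to PHP. A minimal Python sketch, assuming only the first bracketed group matters and that brackets are not nested (the function name quoted_in_brackets is made up):

```python
import re

def quoted_in_brackets(text):
    """Return the quoted strings inside the first (...) group, without quotes."""
    # Step 1: isolate the text between the first '(' and the next ')'.
    inner = re.search(r"\(([^)]*)\)", text)
    if inner is None:
        return []
    # Step 2: collect every "..." inside that text.
    return re.findall(r'"([^"]*)"', inner.group(1))

s = 'My name is "Ralph" ("France" is my country, "123" my age) and "I" am. '
print(quoted_in_brackets(s))  # ['France', '123']
```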
This should work for you:
\("([^"]+)".+?"([^"]+)"
Explained:
\(" - match a bracket and double quote
([^"]+)" - capture everything inside double quotes
.+?" - match anything up to the next double quote
([^"]+) - capture everything that isn't a double quote
" - match last double quote
As long as your sample data is exactly as given, the regex should work

In Perl, how can I detect if a string has multiple occurrences of double digits?

I want to match 110110 but not 10110. That means at least two occurrences of two consecutive identical digits. Is there a regex for that?
Should match: 110110, 123445446, 12344544644
Should not match: 10110, 123445
/(\d)\1.*\1\1/
This matches a string with 2 instances of the same doubled digit, i.e. 11011 but not 10011
\d matches any digit
\1 matches the first match effectively doubling the first entry
This will also match 1111. If there needs to be other characters between change .* to .+
ooh, this looks neater
((\d)\2).*\1
If you want to find non-matching values, but there has to be 2 sets of doubles, then you would simply need to add the first part again as in
((\d)\2).*((\d)\4)
The bracketing would mean that $1 and $3 would contain the double digits and $2 and $4 contains the single digits (which are then doubled).
11233
$1=11
$2=1
$3=33
$4=3
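The group numbering can be verified in Python, where nested groups are numbered the same way (by the position of their opening bracket). A small sketch:

```python
import re

# Groups 1 and 3 are the outer brackets (the doubled digits);
# groups 2 and 4 are the inner brackets (the single digits).
m = re.search(r"((\d)\2).*((\d)\4)", "11233")
groups = m.groups()
print(groups)  # ('11', '1', '33', '3')
```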
If I understand correctly, your regexp will be:
m{
(\d)\1 # First repeated pair
.* # Anything in between
(\d)\2 # Second repeated pair
}x
For example:
for my $x (qw(110110 123445446 12344544644 10110 123445)) {
my $m = $x =~ m{(\d)\1.*(\d)\2} ? "matches" : "does not match";
printf "%-11s : %s\n", $x, $m;
}
110110 : matches
123445446 : matches
12344544644 : matches
10110 : does not match
123445 : does not match
If you're talking about all digits, this will do it:
00.*00|11.*11|22.*22|33.*33|44.*44|55.*55|66.*66|77.*77|88.*88|99.*99
It's just 10 different patterns OR'ed together, each of which checks for at least two occurrences of the desired 2-digit pattern.
Using Perl's more advanced REs, you can use the following for two consecutive digits twice:
(\d)\1.*\1\1
or, as one of your comments states, two consecutive digits followed somewhere by two more consecutive digits which may not be the same:
(\d)\1.*(\d)\2
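To see how the two patterns differ, here is a short Python comparison over the question's samples plus 11233. The names SAME_PAIR and ANY_PAIRS are only labels for this sketch.

```python
import re

SAME_PAIR = re.compile(r"(\d)\1.*\1\1")    # the same doubled digit, twice
ANY_PAIRS = re.compile(r"(\d)\1.*(\d)\2")  # any two doubled digits

samples = ["110110", "123445446", "12344544644", "10110", "123445", "11233"]
results = {
    s: (bool(SAME_PAIR.search(s)), bool(ANY_PAIRS.search(s))) for s in samples
}
for s, (same, any_pair) in results.items():
    print(f"{s:<11} same={same} any={any_pair}")
```

Only 11233 separates the two: it has two doubled digits (11 and 33), but they are not the same digit.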
depending on how your data is, here's a minimal regex way.
while(<DATA>){
chomp;
my @s = split /\s+/;
foreach my $i (@s) {
if ( $i =~ /123445/ && length($i) != 6 ) {
print $_."\n";
}
}
}
__DATA__
This is a line
blah 123445446 blah
blah blah 12344544644 blah
.... 123445 ....
this is last line
There is no reason to do everything in one regex... You can use the rest of Perl as well:
#!/usr/bin/perl -l
use strict;
use warnings;
my @strings = qw( 11233 110110 10110 123445 123445446 12344544644 );
for (@strings) { print if is_wanted($_) }
sub is_wanted {
my ($s) = @_;
my @matches = $s =~ /(?<group>(?<first>[0-9])\k<first>)/g;
return 1 < @matches / 2;
}
__END__
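The "count the doubles outside the regex" approach translates directly to Python. A sketch, with has_two_doubles as a hypothetical helper name:

```python
import re

def has_two_doubles(s):
    """True if s contains at least two non-overlapping doubled digits."""
    # With a single capturing group, findall returns one entry per match.
    return len(re.findall(r"([0-9])\1", s)) >= 2

strings = ["11233", "110110", "10110", "123445", "123445446", "12344544644"]
wanted = [s for s in strings if has_two_doubles(s)]
print(wanted)  # ['11233', '110110', '123445446', '12344544644']
```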
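If I've understood your question correctly, then this, according to RegexBuddy (set to use Perl syntax), will match 110110 but not 10110. The backreference is written as \g{1} because Perl would read \10 as an octal escape when there are fewer than ten groups:
(1{2})0\g{1}0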
The following is more general and will match any string where a pair of digits is repeated later on in the string (note that \d{2} accepts any two digits, not only two equal ones).
(\d{2})\d+\1\d*
The above will match the following examples:
110110
110011
112345611
2200022345
Finally, to find two sets of double digits in a string and you don't care where they are, try this:
\d*?(\d{2})\d+?\1\d*
This will match the examples above plus this one:
12345501355789
It's the two sets of double 5 in the above example that are matched.
[Update]
Having just seen your extra requirement of matching a string with two different double digits, try this:
\d*?(\d)\1\d*?(\d)\2\d*
This will match strings like the following:
12342202345567
12342202342267
Note that the 22 and 55 cause the first string to match and the pair of 22 cause the second string to match.