regex match where order of substrings doesn't matter - regex

I have a problem using regexp. I have a code in the following format:
(01)123456789(17)987654321
Now I want to capture the digits after (01) in a named group, group01, and the digits after (17) in a named group, group17.
The problem is that the code could come in a different order, like this:
(17)987654321(01)123456789
The named groups should contain the same content in both cases.
Any ideas?
Thank you, Marco

In PCRE and PHP (Python's re module needs the (?P<name>...) spelling for the named groups):
(?:(?<=\(17\))(?<group17>\d+)|(?<=\(01\))(?<group01>\d+)|.)+
.NET supports the above syntax and this one:
(?:(?<=\(17\))(?'group17'\d+)|(?<=\(01\))(?'group01'\d+)|.)+

This worked for me:
\(01\)(?<group01>[0-9]{9})|\(17\)(?<group17>[0-9]{9})

Everyone seems to be hardcoding "01" and "17". Here's a more general solution:
my %group;
# $data holds the input string, e.g. "(01)123456789(17)987654321"
while ( $data =~ /\((\d+)\)(\d+)/g ) {
    my $group_number = $1;
    my $group_data   = $2;
    $group{$group_number} = $group_data;
}
As long as there are unmatched (numbers)numbers patterns left in your data, it grabs each one in succession. This Perl snippet stores each group's data in a hash keyed on the group number.
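The same loop can be sketched in Python. This is a minimal illustration, assuming the input consists only of (digits)digits pairs; the function name parse_groups is made up for the example.

```python
import re

def parse_groups(code):
    """Map each parenthesised identifier to the digits that follow it."""
    # findall returns one (identifier, digits) tuple per pair, in input order.
    pairs = re.findall(r"\((\d+)\)(\d+)", code)
    return {f"group{ident}": digits for ident, digits in pairs}

print(parse_groups("(01)123456789(17)987654321"))
print(parse_groups("(17)987654321(01)123456789"))
```

Both calls yield the same mapping, because the result is keyed on the identifier rather than on the position in the string.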

You didn't say which language; they all have their own quirks. Something like this should work if there are always 9 digits after the (). (In Ruby.)
No groups, but it's a little clearer like this, in my opinion; it may not work for you.
string = "(01)123456789(17)987654321"
group17 = string =~ /\(17\)\d{9}/
group01 = string =~ /\(01\)\d{9}/
string[group17+4,9]
string[group01+4,9]
EDIT:
With named capture groups in Ruby 1.9:
string = "(01)123456789(17)987654321"
if string =~ /\(17\)(?<g17>\d{9})/
  match = Regexp.last_match
  group17 = match[:g17]
end
if string =~ /\(01\)(?<g01>\d{9})/
  match = Regexp.last_match
  group01 = match[:g01]
end

Are you looking for something like this?
\((01|17)\)(\d+)\((01|17)\)(\d+)
Expected matches:
0 => In most cases the whole match
1 => 01 or 17
2 => first decimal string
3 => second 01 or 17
4 => second decimal string
Tell me if it helps.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
Glib quotes aside, regex seems like overkill. Python code:
string = "(17)987654321(01)123456789"
# Split on "(" and drop the empty piece before the first parenthesis.
substrings = [s for s in string.split("(") if len(s) > 0]
results = dict()
for substring in substrings:
    # "17)987654321" -> ["17", "987654321"]
    substring = substring.split(")")
    results["group" + substring[0]] = substring[1]
print(results)
# {'group17': '987654321', 'group01': '123456789'}

Related

regex, period allowed, not comma

Hi, I'm looking for a regex for the following.
Valid:
20000
20.000
If a comma is used, the match should not include the comma or anything after it.
Not valid:
20.000,12
Right now I'm using:
([0-9]+([.][0-9]+)*)+?
But this one also matches the last 2 digits after the comma.
You could use
^\d+(?:\.\d+)*
# start of line, 1+ digits, .1234 eventually
See a demo on regex101.com.
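To see how the anchored pattern behaves on the inputs from the question, here is a small Python sketch. The helper name leading_number is an assumption made for the example, not part of the original answer.

```python
import re

# Start of string, 1+ digits, then any number of ".digits" groups.
LEADING_NUMBER = re.compile(r"^\d+(?:\.\d+)*")

def leading_number(text):
    """Return the dotted number at the start of text, or None if there is none."""
    m = LEADING_NUMBER.match(text)
    return m.group(0) if m else None

for sample in ["20000", "20.000", "20.000,12"]:
    print(sample, "->", leading_number(sample))
```

For 20.000,12 this returns only 20.000, so the comma and what follows are left out of the match. If the whole string should be rejected instead, calling LEADING_NUMBER.fullmatch(text) would be the stricter choice.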
If you add a ^ to the beginning of the regex, only the part from the start of the string will match
^([0-9]+([.][0-9]+)*)+?
But I think
^\d+(\.\d+)*
is the better solution to match numbers
If you want to match floating point numbers, then use (^\d*\.?\d+)
There is nothing in the question that suggests the outcome needs to be a valid number. It therefore looks like the expression only needs to accept any digit or period, in which case the regex is quite straightforward and the following should work:
^([0-9\.]+)$
Here are my tests to demonstrate the outcome
20000 - pass
20.000 - pass
20.000,12 - fail
1.000.000 - pass
23000,000 - fail
For info, I used the php code below for my test:
$testdata = array('20000', '20.000', '20.000,12', '1.000.000', '23000,000');
$pattern = "/^([0-9\.]+)$/";
foreach ($testdata as $k => $v) {
    $result = preg_match($pattern, $v) ? 'pass' : 'fail';
    echo $v." - ".$result."<br />";
}

Matching unordered substrings

I'm trying to write a regular expression that will remove/replace a problem string from my target string. In this case, my problem string is:
top:
My target string is:
F12+ vAWGPHGM
The challenge is the problem string is not always whole/intact and can come as individual characters. For example:
F 1t2op+:vAWGPHGM
F t12o+p: vAWGPHGM
F1t2op+:vAWGPHGM
F 12top+: vAWGPHGM
I'm using pcre (php) regex.
Other considerations: the number above can be one or two digits, and the plus is not always present. I've been trying to figure this out on regex101, but without much luck.
You can use 2 captured groups to capture digits before and after t and use their back-references in replacement:
$repl = preg_replace('/\h*(\d*)t(\d*)o\+?p\+?:\h*/', '$1$2+ ', $str);
For all the 4 cases, replacement result will be:
F12+ vAWGPHGM
Updated RegEx Demo
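For comparison, the same replacement can be sketched in Python. Python's re module has no \h, so this version assumes that [ \t] is an acceptable stand-in for horizontal whitespace; the function name clean is hypothetical.

```python
import re

# Optional spaces, digits, "t", digits, "o", optional "+", "p", optional "+", ":", optional spaces.
PROBLEM = re.compile(r"[ \t]*(\d*)t(\d*)o\+?p\+?:[ \t]*")

def clean(text):
    """Rejoin the digits found around 't' and drop the scattered 'top:' characters."""
    return PROBLEM.sub(r"\1\2+ ", text)

for sample in ["F 1t2op+:vAWGPHGM", "F t12o+p: vAWGPHGM",
               "F1t2op+:vAWGPHGM", "F 12top+: vAWGPHGM"]:
    print(clean(sample))
```

Note that the replacement always writes a plus sign, which matches the target string in the question but would add one to inputs that never had it.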
If I understood your question correctly, you just want to get rid of the characters t, o, p, and :. If that is the case, you can use a character class like this:
[top:]
Working demo
PHP code:
$str = 'F 1t2op+:vHGM
F t12op: vHGM
1t2op+:vHGM
F 12top+: vHGM';
$result = preg_replace('/[top:]/', '', $str);
Keep in mind that this doesn't follow any order; it just removes those characters wherever they appear in your string.

Matching incremental digits in numbers

After googling many days about the issue, finally I am posting this question here and hoping to get it solved by experts here; I am looking for the regex pattern that can match incremental back references. Let me explain:
For number 9422512322, the pattern (\d)\1 will match 22 two times, and I want the pattern (something like (\d)\1+1) that matches 12 (second digit is equal to first digit + 1)
In short, the pattern should match all occurrences like 12, 23, 34, 45, 56, etc. There is no replacement; only the matches are required.
What about something like this?
/01|12|23|34|45|56|67|78|89/
It isn't sexy but it gets the job done.
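The alternation does not have to be typed by hand. Here is a Python sketch that builds it and, as an added assumption about what is wanted, wraps it in a lookahead so that overlapping pairs (12 and 23 inside 123) are both reported. The name incremental_pairs is made up for the example.

```python
import re

# Build "01|12|23|...|89" from the digits 0 through 8.
PAIRS = "|".join(f"{d}{d + 1}" for d in range(9))

# The lookahead consumes nothing, so the next search starts one character later
# and overlapping pairs are all found.
OVERLAPPING = re.compile(rf"(?=({PAIRS}))")

def incremental_pairs(number):
    """Return every two-digit run where the second digit is the first plus one."""
    return OVERLAPPING.findall(number)

print(incremental_pairs("9422512322"))  # ['12', '23']
```

Without the lookahead, a plain re.findall(PAIRS, number) would skip the 23 here, because the 2 is already consumed by the 12 match.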
You can use this regex:
(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9))+.
This will match:
Any 0s which are followed by 1s, or
Any 1s which are followed by 2s, or
Any 2s which are followed by 3s, ...
Multiple times (+), then match the character that follows (.).
Here is a regex demo, and the match is:
12345555567877785
You can run code within Perl regular expressions that can control the regex execution flow. But this is not likely to be implemented anywhere else to this degree. PCRE has some program variable interaction, but not like Perl.
(Note: to find overlapping matches, replace the second ( \d ) with (?=( \d )), then change the print statement to print "Overlap Found $1$3\n";)
If you use Perl, you can do all kinds of math-character relationships that can't be done with brute-force permutations.
Good luck!
Perl example:
use strict;
use warnings;
my $dig2;
while ( "9342251232288 6709090156" =~
/
(
( \d )
(?{ $dig2 = $^N + 1 })
( \d )
(?(?{
$dig2 != $^N
})
(?!)
)
)
/xg )
{
print "Found $1\n";
}
Output:
Found 34
Found 12
Found 67
Found 01
Found 56
Here is one way to do it in Perl, using positive lookahead assertions:
#!/usr/bin/env perl
use strict;
use warnings;
my $number = "9422512322";
my @matches = $number =~ /(0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9))/g;
# We have only matched the first digit of each pair of digits.
# Add "1" to the first digit to generate the complete pair.
foreach my $first (@matches) {
    my $pair = $first . ($first + 1);
    print "Found: $pair\n";
}
Output:
Found: 12
Found: 23

Good way to find / replace using regex

I have a value that is not being read correctly by our OCR program. The error is predictable, so I would like to use a find/replace in regex (because this is how we are already extracting the data).
We get the named group like this: (?<Foo>.*?)
I would like to replace 'N1123456' with 'NY123456'. We know that we expect NY when we are getting N1.
What can I try to get this done in the same regular expression?
Edit: (?<Foo>.*?)
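Make groups of non-digits and digits and add Y after the non-digit group. Note that this keeps the misread 1 (N1123456 becomes NY1123456); to drop it, match the 1 between the two groups, as in (\D+)1(\d+).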
(\D+)(\d+)
Here is a demo.
Enclose it inside \b or ^ and $ for better precision.
Sample code:
PHP:
$re = '/(\D+)(\d+)/';
$str = "N1123456";
$subst = '$1Y$2';
$result = preg_replace($re, $subst, $str, 1);
Python:
import re
p = re.compile(r'(\D+)(\d+)')
test_str = "N1123456"
# Python writes group references as \g<n> (or \n), not $n.
subst = r"\g<1>Y\g<2>"
result = re.sub(p, subst, test_str)
Java:
System.out.println("N1123456".replaceAll("(\\D+)(\\d+)","$1Y$2"));
If you expect N1 to be always followed by 6 digits, then you can do this:
Replace this: \bN1(\d{6})\b with this: NY$1.
This will replace any N1 followed by 6 digits with NY.
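In Python the same replacement might look like the sketch below. It assumes, as this answer does, that N1 is always followed by exactly 6 digits; the function name fix_ocr is hypothetical.

```python
import re

def fix_ocr(text):
    """Replace a misread 'N1' prefix with 'NY' when exactly six digits follow."""
    # \1 refers back to the six captured digits; Python does not use $1 here.
    return re.sub(r"\bN1(\d{6})\b", r"NY\1", text)

print(fix_ocr("N1123456"))  # NY123456
```

Values that do not fit the pattern, such as N1 followed by five or seven digits, are left unchanged.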
This is what I would do:
var str = Regex.Replace("N1123456", @"\bN1(\d+)", "NY$1");
The expression that finds the text is N1 followed by digits: \bN1(\d+).
The digits belong to group 1, which I want to preserve and attach to NY during the replacement: NY$1

Convert string with preg_replace in PHP

I have this string
$string = "some words and then #1.7 1.7 1_7 and 1-7";
and I would like that #1.7/1.7/1_7 and 1-7 to be replaced by S1E07.
Of course, "1.7" is just an example; it could be "3.15", for instance.
I managed to create the regular expression that would match the above 4 variants
/\#\d{1,2}\.\d{1,2}|\d{1,2}_\d{1,2}|\d{1,2}-\d{1,2}|\d{1,2}\.\d{1,2}/
but I cannot figure out how to use preg_replace (or something similar?) to actually replace the matches so they end up like S1E07
You need to use preg_replace_callback if you want to pad the number with a 0 when it is less than 10.
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$string = preg_replace_callback('/#?(\d+)[._-](\d+)/', function($matches) {
return 'S'.$matches[1].'E'.($matches[2] < 10 ? '0'.$matches[2] : $matches[2]);
}, $string);
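The callback approach translates directly to Python, where re.sub accepts a function as the replacement. This is a sketch under the same assumptions as the PHP above (the season is left unpadded, the episode is padded to two digits); the name to_episode_code is made up for the example.

```python
import re

def to_episode_code(text):
    """Rewrite #1.7, 1.7, 1_7 and 1-7 style tokens as S1E07."""
    def repl(match):
        season = int(match.group(1))
        episode = int(match.group(2))
        # :02d pads the episode with a leading zero when it is below 10.
        return f"S{season}E{episode:02d}"
    return re.sub(r"#?(\d+)[._-](\d+)", repl, text)

print(to_episode_code("some words and then #1.7 1.7 1_7 and 1-7"))
# some words and then S1E07 S1E07 S1E07 and S1E07
```

Converting the captures to int before formatting also avoids double padding when the input already reads 1.07.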
You could use this simple string replace:
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/', 'S${1}E${2}', $string);
But it would not yield zero-padded numbers for the episode number:
// some words and then S1E7 S1E7 S1E7 and S1E7
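You would have to use the evaluation modifier (note that /e was deprecated in PHP 5.5 and removed in PHP 7, where preg_replace_callback takes its place):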
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/e', '"S".str_pad($1, 2, "0", STR_PAD_LEFT)."E".str_pad($2, 2, "0", STR_PAD_LEFT)', $string);
...and use str_pad to add the zeroes.
// some words and then S01E07 S01E07 S01E07 and S01E07
If you don't want the season number to be padded you can just take out the first str_pad call.
I believe this will do what you want it to...
/\#?([0-9]+)[._-]([0-9]+)/
In other words...
\#? - can start with the #
([0-9]+) - capture at least one digit
[._-] - look for one ., _ or -
([0-9]+) - capture at least one digit
And then you can use this to replace...
S$1E$2
This outputs S, then the first captured group, then E, then the second captured group.
You need to put brackets around the parts you want to reuse ==> capture them. Then you can access those values in the replacement string with $1 (or ${1} if the groups exceed 9) for the first group, $2 for the second one...
The problem here is that you would end up with $1 - $8, so I would rewrite the expression into something like this:
/#?(\d{1,2})[._-](\d{1,2})/
and replace with
S${1}E${2}
I tested it on writecodeonline.com:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$result = preg_replace('/#?(\d{1,2})[._-](\d{1,2})/', 'S${1}E${2}', $string);