Regexp to capture overlapping matches - regex

I am looking for a regexp option or trick to capture all possible strings in a regexp when matches can overlap.
Example: /A.A/ in the string "ABACADA".
It finds ABA and ADA, but not ACA.
I would like: ABA, ACA, ADA.
I am working in PHP, but the answer may apply to other languages:
preg_match_all('/A.A/',"ABACADA",$matches);
var_dump($matches[0]);
// output : array (size=2)
// 0 => string 'ABA' (length=3)
// 1 => string 'ADA' (length=3)
Can you help me? Thanks

You can use a positive lookahead assertion to get all 3 matches:
(?=(A.A))
For your input it finds 3 matches in captured group #1:
ABA
ACA
ADA
PHP Code:
if (preg_match_all('/(?=(A.A))/', "ABACADA", $m))
print_r($m[1]); // printing index 1
Output:
Array
(
[0] => ABA
[1] => ACA
[2] => ADA
)
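The same trick carries over to other regex engines. As a minimal sketch in Python (not the language of the question; the variable names are only illustrative), re.findall returns the contents of the capture group for every zero-width lookahead match:

```python
import re

text = "ABACADA"

# The lookahead consumes no characters, so the engine retries at every
# position and group 1 can capture substrings that overlap.
overlapping = re.findall(r"(?=(A.A))", text)

# A plain pattern consumes each match, so the A shared by ABA and ACA
# is already used up and ACA is skipped.
non_overlapping = re.findall(r"A.A", text)

print(overlapping)      # ['ABA', 'ACA', 'ADA']
print(non_overlapping)  # ['ABA', 'ADA']
```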

Related

Preg_match / split barcode

I am struggling with reading a GS1-128 barcode, and trying to split it up into the segments it contains, so I can fill out a form automatically.
But I can't figure it out. Scanning my barcode gives me the following:
]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7
So I tried starting with preg_match and made the following:
/]d2[01]{2}\d{14}[10|17|21]{2}(\w+)/
Which gives me this result:
Array ( [0] => ]d2010704626096200210KT0BT2204 [1] => KT0BT2204 )
Now [1] is actually correct, but [0] isn't, so I have run into a wall.
In the end, this is the result I would like (without 01,10,17,21):
(01) 07046260962002
(10) KT0BT2204
(17) 260900
(21) RNM5F8CTMMBHZSY7
01 - Always 14 chars after
17 - Always 6 chars after
10 can be up to 20 chars, but always has end delimiter <GS> - But if barcode ends with 10 <GS> is not present
21 can be up to 20 chars, but always has end delimiter <GS> - But if barcode ends with 21 <GS> is not present
I tried following this question: GS1-128 and RegEx
But I couldn't figure it out.
Anyone that can help me?
This regex should do what you want (note I've split it into separate lines for clarity, you can use it like this with the x (extended) flag, or convert it back to one line):
^]d2(?:
01(?P<g01>.{14})|
10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
17(?P<g17>.{6})|
21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
)+$
It looks for
start-of-line ^ followed by a literal ]d2 then one or more of
01 followed by 14 characters (captured in group g01)
10 followed by up to 20 characters, terminated by either [GS] or end-of-line (captured in group g10)
17 followed by 6 characters (captured in group g17)
21 followed by up to 20 characters, terminated by either [GS] or end-of-line (captured in group g21)
finishing with end-of-line $
Note that we need to use tempered greedy tokens to avoid the situation where a 10 or 21 code might swallow a following code (as in the second example in the regex demo below).
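Because a repeated group keeps only its last capture, another option is to consume the barcode one segment at a time. The following is a minimal sketch in Python rather than PHP, assuming the separator arrives as the literal text [GS] as in the question; SEGMENT and parse_gs1 are hypothetical names, and the alternatives are the same four as in the pattern above:

```python
import re

# One alternative per application identifier (AI). "[GS]" is assumed to be
# the literal five-character text emitted by the scanner.
SEGMENT = re.compile(r"""
    01(?P<g01>.{14})
  | 10(?P<g10>(?:(?!\[GS\]).){1,20})(?:\[GS\]|$)
  | 17(?P<g17>.{6})
  | 21(?P<g21>(?:(?!\[GS\]).){1,20})(?:\[GS\]|$)
""", re.VERBOSE)

def parse_gs1(barcode):
    """Return a list of (AI, value) pairs in the order they appear."""
    if not barcode.startswith("]d2"):
        raise ValueError("missing ]d2 symbology prefix")
    pos = len("]d2")
    fields = []
    while pos < len(barcode):
        m = SEGMENT.match(barcode, pos)
        if m is None:
            raise ValueError(f"unrecognised segment at offset {pos}")
        # lastgroup is the name of the single group that matched, e.g. "g10".
        fields.append((m.lastgroup[1:], m.group(m.lastgroup)))
        pos = m.end()
    return fields

print(parse_gs1("]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7"))
# [('01', '07046260962002'), ('10', 'KT0BT2204'), ('17', '260900'), ('21', 'RNM5F8CTMMBHZSY7')]
```

Unlike the single anchored pattern, this loop does not backtrack across segments, so it is a sketch for well-formed input only.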
In PHP:
$barcode = ']d201070462608682672140097289158930[GS]10101656[GS]17261130';
preg_match_all('/^]d2(?:
01(?P<g01>.{14})|
10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
17(?P<g17>.{6})|
21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
)+$/x', $barcode, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => ]d201070462608682672140097289158930[GS]10101656[GS]17261130
)
[g01] => Array
(
[0] => 07046260868267
)
[1] => Array
(
[0] => 07046260868267
)
[g10] => Array
(
[0] => 101656
)
[2] => Array
(
[0] => 101656
)
[g17] => Array
(
[0] => 261130
)
[3] => Array
(
[0] => 261130
)
[g21] => Array
(
[0] => 40097289158930
)
[4] => Array
(
[0] => 40097289158930
)
)
You can try something like this:
]d2[01]{2}(\d{14})(?:10|17|21)(\w+)\[GS\](\w+)(?:10|17|21)(\w+)
See the demo: https://regex101.com/r/Bw238X/1

How to use regex in PostgreSQL to put one point every 2 char?

How can I replace a string by putting a dot every two characters using the regexp_replace function?
For example:
1 => 1
12 => 12
123 => 12.3
1234 => 12.34
12345 => 12.34.5
123456 => 12.34.56
... and so on.
I tried a few things but did not succeed.
Match (.{2})(?!$) globally and replace it with \1. (the captured pair followed by a dot).
The (?!$) part is a negative lookahead that prevents a match on the last two characters, so that 1234 becomes 12.34 and not "12.34." with a trailing dot.
test=> select regexp_replace('12345678', '(.{2})(?!$)', '\1.', 'g');
regexp_replace
----------------
12.34.56.78
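For comparison, here is a minimal sketch of the same substitution in Python instead of PostgreSQL. The function name dot_every_two is hypothetical, and the input is assumed to have no trailing newline (Python's $ would also match just before one):

```python
import re

def dot_every_two(s):
    # Append a dot to every pair of characters that is not
    # immediately followed by the end of the string.
    return re.sub(r"(.{2})(?!$)", r"\1.", s)

for value in ["1", "12", "123", "1234", "12345", "123456"]:
    print(value, "=>", dot_every_two(value))
```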

Regex get matches for ranges in IPv4 addresses with octet notation

I want to use a Regex expression to get all ranges in an IP Address provided.
Examples:
192.168.0-255.1 would return 0-255
192.168.0-255.1-10 would return 0-255 and 1-10
192.168.0-10,42,80-200.1-10,128-255 would return 0-10, 80-200, 1-10, 128-255.
BONUS: I'd also like to be able to separate these expressions into 4 different ones to determine which octet the IP range is in.
Example: 192.168-180.0.1 I'd like to get 168-180 here from an expression that looks for a match with only one period left of the substring and two periods somewhere in the right side of the substring.
Something like this?
<?php
$input = <<<INPUT
192.168.0-255.1
192.168.0-255.1-10
192.168.0-10,42,80-200.1-10,128-255
INPUT;
preg_match_all("/[0-9]+\-[0-9]+/m", $input, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => 0-255
[1] => 0-255
[2] => 1-10
[3] => 0-10
[4] => 80-200
[5] => 1-10
[6] => 128-255
)
)
[0-9]+ : one or more digits
\- : a literal "-" (the backslash is optional outside a character class)
[0-9]+ : one or more digits
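For the bonus part (knowing which octet each range belongs to), one possible approach is to split the address on the periods first and then search each octet separately. This is a minimal sketch in Python rather than PHP; ranges_by_octet is a hypothetical name, and the input is assumed to contain exactly the dotted octet notation shown in the question:

```python
import re

RANGE = re.compile(r"[0-9]+-[0-9]+")

def ranges_by_octet(address):
    # Index 0 is the first octet, index 3 the last one.
    return [RANGE.findall(octet) for octet in address.split(".")]

print(ranges_by_octet("192.168-180.0.1"))
# [[], ['168-180'], [], []]
print(ranges_by_octet("192.168.0-10,42,80-200.1-10,128-255"))
# [[], [], ['0-10', '80-200'], ['1-10', '128-255']]
```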

regex with an end alpha which is optional

I'm very new to regex and trying to figure this out.
I'm trying to validate a string that:
starts with any one of these letters: abcdjklmnpqrstwz
followed by either 1 or 2 letters
followed by 1 to 4 digits
Optional: ends with a single letter
Here is my regex:
/(^[abcdjklmnpqrstwz](:?[a-z]{1,2})) [0-9]{1,4} *([a-z]{1})/i
Here are some sample strings that should match:
bab 1234 a
bab 1234
bab 123
b 123 a
Please see the following regex:
/^[a-dj-np-twz](?:[a-z]{1,2})?\s[0-9]{1,4}(?:\s[a-z])?$/
You should use a ? instead of {1}:
/(^[abcdjklmnpqrstwz](:?[a-z]{1,2})) [0-9]{1,4} ([a-z]?)/i
The ? specifies that the preceding block is optional.
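To check a pattern against the sample strings, a small test harness helps. The sketch below uses Python rather than PHP and takes the anchored pattern /^[a-dj-np-twz](?:[a-z]{1,2})?\s[0-9]{1,4}(?:\s[a-z])?$/ from above, with the case-insensitive flag of the question added as an assumption; is_valid is a hypothetical helper name:

```python
import re

PATTERN = re.compile(
    r"^[a-dj-np-twz](?:[a-z]{1,2})?\s[0-9]{1,4}(?:\s[a-z])?$",
    re.IGNORECASE,  # assumption: mirrors the /i flag used in the question
)

def is_valid(s):
    return PATTERN.match(s) is not None

for sample in ["bab 1234 a", "bab 1234", "bab 123", "b 123 a"]:
    print(sample, is_valid(sample))  # each sample prints True

# Strings that should be rejected:
for sample in ["eab 123", "bab 12345", "bab 1234 ab"]:
    print(sample, is_valid(sample))  # each sample prints False
```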

RegEx to match comma separated numbers with optional decimal part

I've a regex that matches comma separated numbers with an optional two digit decimal part in a given multiline text.
/(?<=\s|^)\d{1,3}(,\d{3})*(\.\d{2})?(?=\s|$)/m
It matches strings like 1, 12, 12.34, 12,345.67 etc successfully. How can I modify it to match a number with only the decimal part like .23?
EDIT: Just to clarify - I would like to modify the regex so that it matches 12, 12.34 and .34
And I am looking for 'stand alone' valid numbers. i.e., number-strings whose boundaries are either white space or start/end of line/string.
This:
\d{1,3}(,\d{3})*(\.\d\d)?|\.\d\d
matches all of the following numbers:
1
12
.99
12.34
12,345.67
999,999,999,999,999.99
If you want to exclude numbers like 123a (street addresses for example), or 123.123 (numbers with more than 2 digits after the decimal point), try:
(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d\d)?|\.\d\d)(?=\s|$)
A little demo (I guessed you're using PHP):
$text = "666a 1 fd 12 dfsa .99 fds 12.34 dfs 12,345.67 er 666.666 er 999,999,999,999,999.99";
$number_regex = "/(?<=\s|^)(?:\d{1,3}(?:,\d{3})*(?:\.\d\d)?|\.\d\d)(?=\s|$)/";
if(preg_match_all($number_regex, $text, $matches)) {
print_r($matches);
}
which will output:
Array
(
[0] => Array
(
[0] => 1
[1] => 12
[2] => .99
[3] => 12.34
[4] => 12,345.67
[5] => 999,999,999,999,999.99
)
)
Note that it ignores the strings 666a and 666.666
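If you port this to Python, note that its re module rejects the variable-width lookbehind (?<=\s|^). The sketch below swaps in the negative forms (?<!\S) and (?!\S), which also mean "whitespace or string boundary on this side"; this substitution is an adaptation for Python, not part of the PHP code above:

```python
import re

text = ("666a 1 fd 12 dfsa .99 fds 12.34 dfs 12,345.67 "
        "er 666.666 er 999,999,999,999,999.99")

# (?<!\S): not preceded by a non-space character.
# (?!\S):  not followed by a non-space character.
number_regex = re.compile(
    r"(?<!\S)(?:\d{1,3}(?:,\d{3})*(?:\.\d\d)?|\.\d\d)(?!\S)"
)

matches = number_regex.findall(text)
print(matches)
# ['1', '12', '.99', '12.34', '12,345.67', '999,999,999,999,999.99']
```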
/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?|\.(\d{2}))(?=\s|$)/m
Or, taking into account countries where . is used as the thousands separator and , as the decimal separator:
/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?|\d{1,3}(\.\d{3})*(,\d{2})?|\.(\d{2})|,(\d{2}))(?=\s|$)/m
Insane Regex for Internationalisation
/((?<=\s)|(?<=^))(((\d{1,3})((,\d{3})|(\.\d{3}))*(((?<=(,\d{3}))(\.\d{2}))|((?<=(\.\d{3}))(,\d{2}))|((?<!((,\d{3})|(\.\d{3})))([\.,]\d{2}))))|([\.,]\d{2}))(?=\s|$)/m
Matches
14.23
14,23
114,114,114.23
114.114.114,23
Doesn't match
14.
114,114,114,23
114.114.144.23
,
.
<empty line>
This answer treats the question more comprehensively.
(#"^((([0-9]+)(\.([0-9]+))?)(\,(([0-9]+)(\.([0-9]+))?))*)$")
This works for comma-separated whole numbers or comma-separated decimal numbers.
Example:
Happy scenarios:
case 1) 9,10
case 2) 10.1,11,12,15,15.2
case 3) 9.8
case 4) 9
Sad scenarios:
case 1) 2..7
case 2) 2,,7
case 3) 2.
case 4) 7,
case 5) ,
case 6) .
case 7) .2
case 8) ,2
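The scenarios above can be checked mechanically. This is a minimal sketch in Python, using a variant of the pattern in which the decimal point is escaped as \. (an unescaped . would match any character, so a string like 2a7 would also pass); the name LIST_PATTERN is hypothetical:

```python
import re

# Comma-separated list of numbers, each with an optional fractional part.
LIST_PATTERN = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:,[0-9]+(?:\.[0-9]+)?)*")

happy = ["9,10", "10.1,11,12,15,15.2", "9.8", "9"]
sad = ["2..7", "2,,7", "2.", "7,", ",", ".", ".2", ",2"]

for s in happy:
    print(repr(s), LIST_PATTERN.fullmatch(s) is not None)  # True for each
for s in sad:
    print(repr(s), LIST_PATTERN.fullmatch(s) is not None)  # False for each
```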