I have a regex that looks like this:
/(((\+|00)32[ ]?(?:\(0\)[ ]?)?)|0){1}(4(60|[789]\d)\/?(\s?\d{2}\.?){2}(\s?\d{2})|(\d\/?\s?\d{3}|\d{2}\/?\s?\d{2})(\.?\s?\d{2}){2})/g
This matches +32 16/894477, but +32 16-894477 doesn't.
The string 20150211-0001731015-1 also matches, but it shouldn't.
I am trying to fix my regex here:
https://regex101.com/r/LmaIPA/1
(((\+|00)32[ ]?(?:\(0\)[ ]?)?)|0){1}(4(60|[789]\d)\/?(\s?\d{2}\.?){2}(\s?\d{2})|(\d\/?\s?\d{3}|\d{2}(\/?|\-)\s?\d{2})(\.?\s?\d{2}){2})
I think I fixed part of it by adding this, but let me know if there is something else that doesn't work properly :)
There are a lot of capture groups, and some of them can be omitted if you don't need them for post-processing.
The issue with +32 16-894477 is that you are not matching the hyphen. The longer string 20150211-0001731015-1 matches because no boundaries are set, so you get a partial match inside it.
Some notes:
You don't have to escape the / when using a different delimiter, such as the ~ in the PHP example
You can omit {1} from the pattern
\s can also match a newline; use \h if you only want to match a horizontal whitespace character
A single space [ ] does not have to be in a character class
You can extend the pattern by adding the hyphen and forward slash to a character class, [/-]?, then wrap the whole pattern in a non-capture group and assert a whitespace boundary to the right: (?:whole pattern here)(?!\S)
A version without capture groups, for matching only:
(?:(?:(?:\+|00)32\h?(?:\(0\)\h?)?|0)(?:4(?:60|[789]\d)/?(?:\h?\d{2}\.?){2}\h?\d{2}|(?:\d/?\h?\d{3}|\d{2}[/-]?\h?\d{2})(?:\.?\h?\d{2}){2}))(?!\S)
Regex demo | Php demo
PHP example
$re = '~(?:(?:(?:\+|00)32\h?(?:\(0\)\h?)?|0)(?:4(?:60|[789]\d)/?(?:\h?\d{2}\.?){2}\h?\d{2}|(?:\d/?\h?\d{3}|\d{2}[/-]?\h?\d{2})(?:\.?\h?\d{2}){2}))(?!\S)~';
$str = 'OK 01/07 - 31/07
OK 0487207339
OK +32487207339
OK 01.07.2016
OK +32 (0)16 89 44 77
OK 016894477
OK 003216894477
OK +3216894477
OK 016/89.44.77
OK +32 16894477
OK 0032 16894477
OK +32 16/894477
NOK +32 16-894477 (this should match)
OK 0479/878810
NOK 20150211-0001731015-1 (this shouldn\'t match)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => 0487207339
[1] => +32487207339
[2] => +32 (0)16 89 44 77
[3] => 016894477
[4] => 003216894477
[5] => +3216894477
[6] => 016/89.44.77
[7] => +32 16894477
[8] => 0032 16894477
[9] => +32 16/894477
[10] => +32 16-894477
[11] => 0479/878810
)
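For comparison, here is a rough Python sketch of the same match-only pattern; it is a port for illustration, not part of the original answer. One assumption matters: Python's re module does not support \h, so every \h is swapped for [ \t] (a space or a tab). The helper name find_phone_numbers is hypothetical.

```python
import re
from typing import List

# Port of the PHP pattern. Assumption: Python's re has no \h, so every \h
# is replaced with [ \t] (space or tab only).
BE_PHONE = re.compile(
    r"(?:(?:(?:\+|00)32[ \t]?(?:\(0\)[ \t]?)?|0)"             # +32 / 0032 with optional (0), or a leading 0
    r"(?:4(?:60|[789]\d)/?(?:[ \t]?\d{2}\.?){2}[ \t]?\d{2}"    # 4xx branch: 460 or 470-499
    r"|(?:\d/?[ \t]?\d{3}|\d{2}[/-]?[ \t]?\d{2})(?:\.?[ \t]?\d{2}){2})"  # other branch; 2-digit form accepts / or -
    r")(?!\S)"                                                 # only whitespace or end of text may follow
)


def find_phone_numbers(text: str) -> List[str]:
    """Return all whole matches (the pattern has no capturing groups)."""
    return BE_PHONE.findall(text)


for sample in ["+32 16-894477", "0479/878810", "20150211-0001731015-1"]:
    print(repr(sample), find_phone_numbers(sample))
```

The (?!\S) at the end is what rejects 20150211-0001731015-1: every candidate would have to end right before whitespace or the end of the text, and that string ends in -1, which cannot satisfy the final \d{2}.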
I am struggling with reading a GS1-128 barcode and splitting it into the segments it contains, so I can fill out a form automatically.
But I can't figure it out. Scanning my barcode gives me the following:
]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7
So I tried starting with preg_match and made the following:
/]d2[01]{2}\d{14}[10|17|21]{2}(\w+)/
Which gives me this result:
Array ( [0] => ]d2010704626096200210KT0BT2204 [1] => KT0BT2204 )
Now [1] is actually correct, but [0] isn't, so I have hit a wall.
In the end, this is the result I would like (without 01,10,17,21):
(01) 07046260962002
(10) KT0BT2204
(17) 260900
(21) RNM5F8CTMMBHZSY7
01 - always followed by exactly 14 chars
17 - always followed by exactly 6 chars
10 - up to 20 chars, always terminated by the <GS> delimiter, except when the barcode ends with 10, in which case <GS> is not present
21 - up to 20 chars, always terminated by the <GS> delimiter, except when the barcode ends with 21, in which case <GS> is not present
I tried following this question: GS1-128 and RegEx
But I couldn't figure it out.
Anyone that can help me?
This regex should do what you want (note that I've split it across several lines for clarity; you can use it as is with the x (extended) flag, or convert it back to a single line):
^]d2(?:
01(?P<g01>.{14})|
10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
17(?P<g17>.{6})|
21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
)+$
It looks for
start-of-line ^ followed by a literal ]d2 then one or more of
01 followed by 14 characters (captured in group g01)
10 followed by up to 20 characters, terminated by either [GS] or end-of-line (captured in group g10)
17 followed by 6 characters (captured in group g17)
21 followed by up to 20 characters, terminated by either [GS] or end-of-line (captured in group g21)
finishing with end-of-line $
Note that we need to use tempered greedy tokens to avoid the situation where a 10 or 21 code might swallow a following code (as in the second example in the regex demo below).
Demo on regex101
In PHP:
$barcode = ']d201070462608682672140097289158930[GS]10101656[GS]17261130';
preg_match_all('/^]d2(?:
01(?P<g01>.{14})|
10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
17(?P<g17>.{6})|
21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
)+$/x', $barcode, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => ]d201070462608682672140097289158930[GS]10101656[GS]17261130
)
[g01] => Array
(
[0] => 07046260868267
)
[1] => Array
(
[0] => 07046260868267
)
[g10] => Array
(
[0] => 101656
)
[2] => Array
(
[0] => 101656
)
[g17] => Array
(
[0] => 261130
)
[3] => Array
(
[0] => 261130
)
[g21] => Array
(
[0] => 40097289158930
)
[4] => Array
(
[0] => 40097289158930
)
)
Demo on 3v4l.org
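A minimal Python sketch of the same verbose pattern, assuming the scanner really emits the literal text [GS] as shown in the question (a real scanner may send the ASCII group separator character instead, which would need a different token in the pattern). re.VERBOSE plays the role of PHP's x flag; parse_gs1 is a hypothetical helper that strips the g prefix from the group names.

```python
import re
from typing import Dict, Optional

# Port of the verbose PHP pattern; re.VERBOSE corresponds to PHP's x flag.
# Assumption: the group separator appears as the literal text "[GS]".
GS1_PATTERN = re.compile(
    r"""
    ^\]d2(?:
        01(?P<g01>.{14}) |                               # AI 01: exactly 14 chars
        10(?P<g10>(?:(?!\[GS\]).){1,20})(?:\[GS\]|$) |   # AI 10: up to 20 chars, then [GS] or end
        17(?P<g17>.{6}) |                                # AI 17: exactly 6 chars
        21(?P<g21>(?:(?!\[GS\]).){1,20})(?:\[GS\]|$)     # AI 21: up to 20 chars, then [GS] or end
    )+$
    """,
    re.VERBOSE,
)


def parse_gs1(barcode: str) -> Optional[Dict[str, str]]:
    """Return {AI: value} for the AIs found, or None if the barcode does not match."""
    m = GS1_PATTERN.match(barcode)
    if m is None:
        return None
    # A repeated group keeps the value from its last repetition; AIs that
    # never occurred are None and are left out. "g01"[1:] == "01".
    return {name[1:]: value for name, value in m.groupdict().items() if value is not None}


print(parse_gs1("]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7"))
# {'01': '07046260962002', '10': 'KT0BT2204', '17': '260900', '21': 'RNM5F8CTMMBHZSY7'}
```

As in the PHP version, each named group only keeps its last value, which is fine as long as every AI appears at most once per barcode.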
]d2[01]{2}(\d{14})(?:10|17|21)(\w+)\[GS\](\w+)(?:10|17|21)(\w+)
You can try something like this.
See the demo:
https://regex101.com/r/Bw238X/1
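To see what each group of this pattern captures on the question's barcode, a quick Python check could look like this (Python's re accepts the pattern exactly as written). With this input, greedy backtracking leaves the third group as 17260900, which still contains the leading 17 identifier, so it would need some post-processing to get the 260900 the question asks for.

```python
import re

# The pattern from the answer above, copied unchanged.
PATTERN = re.compile(r"]d2[01]{2}(\d{14})(?:10|17|21)(\w+)\[GS\](\w+)(?:10|17|21)(\w+)")

barcode = "]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7"
m = PATTERN.search(barcode)
# Group 3 is greedy, so it backs off only as far as the last 10/17/21
# that is still followed by at least one word character.
groups = m.groups() if m else None
print(groups)
```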
For a few days now I have been fighting with this regular expression without any success.
What I want from my first expression:
brackets at most once, no matter where
optional text or numbers before and after the brackets
only numbers inside the brackets
Examples of what is allowed:
[32] text1
text1 [5]
text1 [103] text2
text1
[123]
[some value [33]] (maybe too complicated; not so important?)
My second expression is similar, but with only numbers (instead of text) before and after the brackets:
[32] 11
11 [5]
11 [103] 22
11
[123]
no match:
[12] xxx [5] (brackets are more than one time)
[aa] xxx (no number within brackets)
This is what I tried, but it doesn't work because I don't know how to allow the brackets only once:
^.*\{?[0-9]*\}.*$
In another answer I also found this, which looks good, but I need it for numbers:
^[^\{\}]*\{[^\{\}]*\}[^\{\}]*$
Later I want to take the number in the brackets and replace it with some other values, in case that's important.
Hope someone can help me. Thanks in advance!
This is what you want:
^([^\]\n]*\[\d+\])?[^[\n]*$
Live example
Update: For just numbers:
^[\d ]*(\[\d+\])?[\d ]*$
Explanation:
^ Start of line
[^...] Negated character set, e.g. [^\]] matches any character except ]
* Zero or more of the preceding character or class
\d 0-9
+ One or more of the preceding character or class
(...)? 0 or 1 of the group
$ End of line
Note: these regexes also match an empty line.
Thanks to #MMMahdy-PAPION! He improved the answer.
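A small Python sketch for checking both patterns against the examples from the question. The only change to the patterns is escaping the [ in the last character class of the first one ([^\[\n] instead of [^[\n]), which has the same meaning; is_allowed is a hypothetical helper name.

```python
import re

# Both patterns come from the answer above. The [ in the last class of
# TEXT_LINE is escaped ([^\[\n]); that does not change its meaning.
TEXT_LINE = re.compile(r"^([^\]\n]*\[\d+\])?[^\[\n]*$")
NUMBER_LINE = re.compile(r"^[\d ]*(\[\d+\])?[\d ]*$")


def is_allowed(line: str, numbers_only: bool = False) -> bool:
    """Hypothetical helper: True if the whole line fits the chosen pattern."""
    pattern = NUMBER_LINE if numbers_only else TEXT_LINE
    return pattern.match(line) is not None


for line in ["[32] text1", "text1 [103] text2", "[some value [33]]", "[12] xxx [5]", "[aa] xxx"]:
    print(f"{line!r}: {is_allowed(line)}")
```

The first pattern rejects a second bracket pair because the optional group cannot contain ] before its own bracket, and the trailing [^\[\n]* cannot consume another [.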
How can I replace a string by putting a dot every two characters using the regexp_replace function?
For example:
1 => 1
12 => 12
123 => 12.3
1234 => 12.34
12345 => 12.34.5
123456 => 12.34.56
... and so on.
I tried a few things, but I did not succeed.
Match (.{2})(?!$) globally and use '\1.' as the replacement, i.e. the captured pair followed by a dot.
The (?!$) part is a negative lookahead that prevents a match on the last two characters. It keeps 1234 from ending with a trailing dot (12.34.).
test=> select regexp_replace('12345678', '(.{2})(?!$)', '\1.', 'g');
regexp_replace
----------------
12.34.56.78
Demo
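The same replacement can be sketched in Python with re.sub, which replaces every match by default (the equivalent of the 'g' flag in regexp_replace). dot_every_two is a hypothetical name.

```python
import re


def dot_every_two(s: str) -> str:
    # (.{2}) captures a pair of characters; (?!$) rejects a pair that is
    # immediately followed by the end of the string, so no trailing dot
    # is produced. re.sub replaces all matches, like the 'g' flag.
    return re.sub(r"(.{2})(?!$)", r"\1.", s)


for s in ["1", "12", "123", "1234", "12345", "123456", "12345678"]:
    print(s, "=>", dot_every_two(s))
```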
I am looking for a regexp option or trick to capture all possible strings in a regexp when matches can overlap.
Example: /A.A/ on the string "ABACADA"
It finds ABA and ADA, but not ACA.
I would like: ABA, ACA, ADA
I am working in PHP, but the question applies to other languages too.
preg_match_all('/A.A/',"ABACADA",$matches);
var_dump($matches[0]);
// output : array (size=2)
// 0 => string 'ABA' (length=3)
// 1 => string 'ADA' (length=3)
Can you help me? Thanks
You can use a positive lookahead assertion to get all 3 matches:
(?=(A.A))
RegEx Demo
For your input it finds 3 matches in captured group #1:
ABA
ACA
ADA
PHP Code:
if (preg_match_all('/(?=(A.A))/', "ABACADA", $m))
print_r($m[1]); // printing index 1
Output:
Array
(
[0] => ABA
[1] => ACA
[2] => ADA
)
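The lookahead trick is not PHP-specific. A sketch in Python, assuming the inner pattern has no capturing groups of its own (otherwise re.findall would return tuples instead of strings); overlapping_matches is a hypothetical helper name.

```python
import re
from typing import List


def overlapping_matches(pattern: str, text: str) -> List[str]:
    # The lookahead is zero-width, so each match consumes nothing and the
    # search moves on one character; group 1 still records the text.
    # Assumption: `pattern` has no capturing groups of its own.
    return re.findall(f"(?=({pattern}))", text)


print(re.findall(r"A.A", "ABACADA"))           # ['ABA', 'ADA']
print(overlapping_matches(r"A.A", "ABACADA"))  # ['ABA', 'ACA', 'ADA']
```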
I have a document which looks something like:
sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ())
FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial)))
FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId)
sort=DEALDATE:decreasing
From this I would like the word before each colon (or before the {} brackets, if there are any), then a colon, then the word after the colon. Ideally these should be the only things left in the file, each on its own line.
Output would then look like:
SIZE:NumberDecreasing
EQUAL:LocationId
EQUAL:LocationId
EQUAL:LOD
NOTEQUAL:SCR
EMPTY:RPDCITYID
NOTEQUAL:Industrial
EQUAL:ISSCHEME
EQUAL:LocationId
DEALDATE:decreasing
The closest I have come so far is:
Find:
^.?+ {[0-9]}:([a-zA-Z]+)
Replace with:
...\1:\2...
with the intent to run it several times, and later replace ... with \n
I can then remove multiple newlines.
Context: this is for a log analysis I am performing. I have already removed the datestamps and reduced the query down to the sort and FieldText parameters.
I do not have the regular UNIX tools; I am working in a Windows environment.
The original log looks like:
03/11/2011 16:25:44 [9] ACTION=Query&summary=Context&print=none&printFields=DISPLAYNAME%2CRECORDTYPE%2CSTREET%2CTOWN%2CCOUNTY%2CPOSTCODE%2CLATITUDE%2CLONGITUDE&DatabaseMatch=Autocomplete&sort=RECORDTYPE%3Areversealphabetical%2BDRETITLE%3Aincreasing&maxresults=200&FieldText=%28WILD%7Bbournemou%2A%7D%3ADisplayName%20NOT%20MATCH%7BScheme%7D%3ARecordType%29 (10.55.81.151)
03/11/2011 16:25:45 [9] Returning 23 matches
03/11/2011 16:25:45 [9] Query complete
03/11/2011 16:25:46 [8] ACTION=GetQueryTagValues&documentCount=True&databaseMatch=Deal&minScore=70&weighfieldtext=false&FieldName=TotalSizeSizeInSquareMetres%2CAnnualRental%2CDealType%2CYield&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [12] ACTION=Query&databaseMatch=Deal&maxResults=50&minScore=70&sort=DEALDATE%3Adecreasing&weighfieldtext=false&totalResults=true&PrintFields=LocationId%2CLatitude%2CLongitude%2CDealId%2CFloorOrUnitNumber%2CAddressAlias%2A%2CEGAddressAliasID%2COriginalBuildingName%2CSubBuilding%2CBuildingName%2CBuildingNumber%2CDependentStreet%2CStreet%2CDependentLocality%2CLocality%2CTown%2CCounty%2CPostcode%2CSchemeName%2CBuildingId%2CFullAddress%2CDealType%2CDealDate%2CSalesPrice%2CYield%2CRent%2CTotalSizeSizeInSquareMetres%2CMappingPropertyUsetype&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [8] GetQueryTagValues complete
03/11/2011 16:25:47 [12] Returning 50 matches
03/11/2011 16:25:47 [12] Query complete
03/11/2011 16:25:51 [13] ACTION=Query&print=all&databaseMatch=locationidsearch&sort=RELEVANCE%2BPOSTCODE%3Aincreasing&maxResults=10&start=1&totalResults=true&minscore=70&weighfieldtext=false&FieldText=%28%20NOT%20LESS%7B50%7D%3AOFFICE%5FPERCENT%20AND%20EXISTS%7B%7D%3AOFFICE%5FPERCENT%20NOT%20EQUAL%7B1%7D%3AISSCHEME%29&Text=%28Brazennose%3AFullAddress%2BAND%2BHouse%3AFullAddress%29&synonym=True (10.55.81.151)
03/11/2011 16:25:51 [13] Returning 3 matches
03/11/2011 16:25:51 [13] Query complete
The purpose of the whole exercise is to find out which fields are being queried and sorted on (and how we are querying/sorting on them). To that end, it would also be useful if the output contained only distinct entries, although that is not essential.
The Perl program below is complete, and includes your sample data in the source. It produces exactly the output you describe, including reporting NOT EQUAL{1}:ISSCHEME as EQUAL:ISSCHEME because of the intermediate space.
use strict;
use warnings;
while (<DATA>) {
print "$1:$2\n" while /(\w+) (?: \{\d*\} )? : (\w+)/xg;
}
__DATA__
sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ())
FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial)))
FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId)
sort=DEALDATE:decreasing
OUTPUT
SIZE:NumberDecreasing
EQUAL:LocationId
EQUAL:LocationId
EQUAL:LOD
NOTEQUAL:SCR
EMPTY:RPDCITYID
NOTEQUAL:Industrial
EQUAL:ISSCHEME
EQUAL:LocationId
DEALDATE:decreasing
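For anyone without Perl on Windows, the same pattern can be sketched in Python. extract_fields is a hypothetical helper; its distinct flag addresses the optional de-duplication mentioned in the question and keeps the first occurrence of each entry.

```python
import re
from typing import Iterable, List

# Same idea as the Perl one-liner: a word, an optional {digits} block, a colon, a word.
FIELD = re.compile(r"(\w+)(?:\{\d*\})?:(\w+)")


def extract_fields(lines: Iterable[str], distinct: bool = False) -> List[str]:
    results = [f"{op}:{field}" for line in lines for op, field in FIELD.findall(line)]
    if distinct:
        # dict.fromkeys drops duplicates but keeps first-seen order
        results = list(dict.fromkeys(results))
    return results


SAMPLE = [
    "sort=SIZE:NumberDecreasing",
    "FieldText=(((EQUAL{226742}:LocationId)) AND ())",
    "FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial)))",
    "FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId)",
    "sort=DEALDATE:decreasing",
]

print("\n".join(extract_fields(SAMPLE, distinct=True)))
```

Run against the raw log instead, the query strings would first need URL-decoding (for example with urllib.parse.unquote), since {, } and : appear there as %7B, %7D and %3A.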