I've got a sample data I'm looking to find matches within. I'm testing in Notepad++.
QA 44
ABQ DAL 280
ABQ HOU 290
HOU PHX 210
DAL PHX 102
When I use the following regular expression, I get matches as expected on the last four lines
([A-Z]{3}\s){2}[0-9]{3}
But when I try to hone in on the 3-digit number at the end and move everything else to a prefix, no matches come back.
(?<=([A-Z]{3}\s){2})[0-9]{3}
What exactly am I doing wrong with the prefix? I want all those 3-digit numbers to match and qualify on the letter codes before it, but it is not working.
If you just want the last digits you need a none-capture group within look behind :
(?<=(?:[A-Z]{3}\s){2})[0-9]{3}
see demo
Is there a reason you can't use a look behind with an anchor?
(?<=\s)\d{3}$
Related
Need a regex that will identify SSN so that we can mask the numbers, but want to skip masking of work order numbers (starts with WO-).
Details are below:
Current regex used: \b(?!000|666|9)\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}\b
Sample Test Date
WO-011493479
011493479
778-62-8144
030 72 7381
757-85-7495
149-13-7317
401318448
003 06 8815
790714615
805 14 1893
From the above sample data, numbers after WO-011493479 should be skipped. Due to the above regex, a work order string like WO-011493479 is getting changed to WO-*********.
For checking the regex quickly, please use this link.
Note: The regex should work with all major browsers
You're one negative look behind from getting it yourself:
\b(?<!WO-)(?!000|666|9)\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}\b
i have a problem and dont get it.
My Regex:
My Test-String:
I have two issues and one general question :)
As you can see in my Test-String the very last (german) Phone Number (the big yellow one in the Test-String attachment) does not match my Regex-Pattern correctly. I dont get it, what is the Problem here? the "0049" fits Group 5, but should fit Group 2, why is that?
My second Problem is, how can i get rid of the spaces before and after every match? (The 7 yellow small circles in the Test-String Attachment)
For copy/paste purposes, here is the Regex and Test-String again:
Regex:
((\+\d{2}|00\d{2})?([ ])?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ])?(\d+)?([ ])?(\d+)?)
Test-String:
Vorwahl 089, die E.123 ebenfalls , also (089) 1234567. Die DIN 5008, also +49 89 1234567 respectivly 0049 89 1234567. Die E.123 empfiehlt, also +49 89 123456 0 respectivly 0049 89 123456 0 oder +49 89 123456 789. Also +49 89 123 456 789. Klammern 089/1234567 und 0151 19406041. Test +49 151 123 456 789 respectivly 0049 151 123 456 789
Last but not at least, my general question:
Is it a good approach to Group each logical part as i did in my example?
A last Information: I validate my Regex with https://regex101.com/ and use it in Python with the re Module.
The thing that makes it unpredictable are the numerous optional groups (..)?.
As first step i recommend replacing ([ ])?(\d+)? as a coupled expression ([ ]?\d+)?, which will avoid spaces at the end of the match - your point #2.
As a second step i recommend coupling the first optional space with the expression of the "national dialling": ((\+|00)\d{2}([ ])?)?. Now we are lucky, because it solves both the space at the beginning and the recognition of the whole number, due to less possible matching options.
The new expression now looks like this:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+)?([ ]?\d+)?)
I now recommend to simplify the last part, if you dont need the single group-values:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+){0,2})
For better performance I suggest you remove the parenteses/groups where possible or mark them as non-capturing, if you don't need to have the specific group-values.
In some programming languages you will not need to most outer parenteses, as that is always group 0.
Given the string below,
ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555
the goal is to match every word such that the suffix is a number that matches the number that appears after foo (which happens to be 38 in this case).
There is only one substring that begins with foo and ends with a number. The expected matches all exist after said substring.
Expected matches:
gee38
jay38
em38
ou38
q38
I've tried foo(\d+).*?(\w+\1)\b and foo(\d+).*(\w+\1)\b, but they fail to match all, because they either match the first one (gee38) or the last one (q38).
Is it possible to match all with just a single regex and, importantly, in just a single run?
The PCRE2 engine that I use behaves in the same way as https://regex101.com/r/uFEDOE/1. So, if the regex can match multiple substrings on regex101, then the engine that I use can too.
(?:foo|\G(?!^))(\d+).*?(?=(\w+))\w+(?=\1\b)
Demo
It could be some size or performance optimization.
#Niko Gambt, say if any optimization is important for you.
I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so go.
Now I need to extract characters from one of the group for example in group 2 I need the first 3 and 8 decimal ( keep mind they could be characters too ).
So I had try something like this :
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group but not the 38 ? So I’m lost at this point.
Any help would be great
Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letter is a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group2 starting with numbers. If that is the case then you have change the source string such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Regex: [0-9]{6-8}([0-9]{4})
Test Strings:
sfad 123456781234 afd sadfa fdads
sfd 12345671234 24312 fasdfa dsfafd
221 1234561234 safd safd23 34
Expected:
I need the end part 1234 captured in a group on each line.
Actual: No matches. :(
I would like to make [0-9]{6-8} to match the least possible characters so all these 3 strings would match. How do I make this lazy as it seems to be greedy now.
I need only regex solutions as this is part of a bigger solution. Here's a link to play with it: https://regex101.com/r/eF5pA9/1
[0-9]{6,8}([0-9]{4}) with g modifier matches all three.
Thanks #anubhava and #Anderson Pimentel.