How to write a regex pattern to match AAAA00000000-0000?

I need a regex to match a string in any of these formats:
"AAAA00000000"
"AAAA00000000-0000"
"0000000000"
I've got the first and third patterns right; this is what I came up with:
^(([a-zA-Z]{4}[0-9]{8})|([0-9]{10}))$
I can't get it to also match the second pattern.

^[a-zA-Z]{4}[0-9]{8}(-[0-9]{4})?$
That is, XXXXnnnnnnnn and an optional -nnnn part.
XXXXnnnnnnnn
XXXXnnnnnnnn-nnnn
You can leave out the outermost parentheses, as this group equals the entire match (capturing group 0).
EDIT
Update to match nnnnnnnnnn, too (the non-capturing group makes ^ and $ apply to both alternatives):
^(?:[0-9]{10}|[a-zA-Z]{4}[0-9]{8}(-[0-9]{4})?)$
Matches:
nnnnnnnnnn
XXXXnnnnnnnn
XXXXnnnnnnnn-nnnn
EDIT #2
In response to a comment, this is the shortest, most readable version I can come up with:
^(?:[0-9]{10}|[a-zA-Z]{4}[0-9]{8}(-[0-9]{4}|))$
Same characteristics as immediately above.
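A quick way to see why grouping the alternation matters is to run an ungrouped and a grouped version side by side. This Python sketch uses re.search, which looks for a match anywhere in the string, so the anchors really count there; the invalid sample strings are made up for illustration.

```python
import re

# Without a group, ^ binds only to the first alternative and $ only to the second.
UNGROUPED = re.compile(r"^[0-9]{10}|[a-zA-Z]{4}[0-9]{8}(-[0-9]{4})?$")
# With a non-capturing group, both anchors apply to every alternative.
GROUPED = re.compile(r"^(?:[0-9]{10}|[a-zA-Z]{4}[0-9]{8}(-[0-9]{4})?)$")

samples = ["AAAA00000000", "AAAA00000000-0000", "0000000000",
           "0000000000X", "xAAAA00000000"]

for s in samples:
    print(f"{s:20} ungrouped={bool(UNGROUPED.search(s))} grouped={bool(GROUPED.search(s))}")
```

The ungrouped pattern accepts the last two strings as well (trailing junk after ten digits, leading junk before the letters). In Python, re.fullmatch would also make the anchors unnecessary, since it requires the whole string to match.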

You are missing the "-" of the second format...

^\w{4}\d{8}(-\d{4})?$
Note that \w also matches digits and the underscore, so this is looser than [a-zA-Z]{4}.

^[a-zA-Z]{4}([0-9]{8}|[0-9]{8}\-[0-9]{4})$

Related

Regex to find combination of lines not containing a string

I'm trying to find the correct regex that, within this input:
#Tag-1234
Scenario:
Blabla
Scenario:
Blabla
#Tag-1234
Scenario:
Blabla
will select only the second Scenario (the one without a tag).
So far I have tried something like (?!#Tag-\d{4})\nScenario, but it doesn't do the trick.
Can anyone shed some light on this?
I'm doing my tests on regex101 -> https://regex101.com/r/msDHKf/1
Thanks
It seems I was missing the \n before the tag check, so the regex would be:
\n(?!#Tag-\d{4})\nScenario
So close!
In the pattern (?!#Tag-\d{4})\nScenario, the lookahead can be removed: it is directly followed by \nScenario, so the next character is always a newline rather than a # and the assertion is always true.
If the line before Scenario should be empty rather than a tag, you can simply match two newlines and then Scenario:
\n\nScenario\b
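If the real input has no blank lines between blocks (as in the sample as shown here), two consecutive newlines never occur. One alternative, sketched in Python under that assumption, swaps in a negative lookbehind: find Scenario: lines that are not directly preceded by a tag line. Python's re only allows this because #Tag-\d{4}\n always has the same width (10 characters); the helper name untagged_scenario_lines is hypothetical.

```python
import re

text = """#Tag-1234
Scenario:
Blabla
Scenario:
Blabla
#Tag-1234
Scenario:
Blabla"""

# Fixed-width negative lookbehind: the previous line must not be "#Tag-" + 4 digits.
# re.MULTILINE lets ^ match at the start of every line.
UNTAGGED = re.compile(r"(?<!#Tag-\d{4}\n)^Scenario:", re.MULTILINE)

def untagged_scenario_lines(s):
    """Return 1-based line numbers of Scenario: lines not directly preceded by a tag."""
    return [s.count("\n", 0, m.start()) + 1 for m in UNTAGGED.finditer(s)]

print(untagged_scenario_lines(text))  # [4]
```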

Regex - Skip characters to match

I'm having an issue with a regex.
I'm trying to match T0000001 (T0000002, T0000003 and so on).
However, some of the lines it searches contain what I can only describe as positioners: a question mark followed by two digits, such as ?21.
These positioners mark a new position for when the document is printed from the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is; they believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I also tested each one in the live system (rather than only in a regex tester, in case it behaves differently), but they still didn't work.
There is no replace feature, and as mentioned before, I'm not sure if it is Python. I appreciate your help.
Do two regex operations.
First, do a regex replace that replaces the positioners with an empty string:
(\?[0-9]{2})
Then do the regex match:
T[0-9]{7}
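The two steps could look like this in Python, assuming your tool lets you run a replace before the match (the question says there is no replace feature, so this only helps where one exists). The names POSITIONER, ID_PATTERN and find_ids are illustrative, not from the original.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")  # a "?" followed by two digits
ID_PATTERN = re.compile(r"T[0-9]{7}")   # a "T" followed by seven digits

def find_ids(line):
    # Step 1: remove every positioner; step 2: match the IDs in what is left.
    cleaned = POSITIONER.sub("", line)
    return ID_PATTERN.findall(cleaned)

for line in ["T123?214567", "T?211234567", "T0000001"]:
    print(line, "->", find_ids(line))
```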
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, capture the parts before and after the ?21 sequence in two groups. You'll then need to concatenate the two groups.
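A Python sketch of that idea: search for the pattern and join the two groups yourself. join_groups is a hypothetical helper; it returns None for lines without a positioner, because this pattern requires exactly one.

```python
import re

# Group 1: "T" plus the shortest run of characters before the positioner;
# then "?" and two digits (not captured); group 2: the rest of the line.
SPLIT = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_groups(line):
    m = SPLIT.search(line)
    if m is None:
        return None  # no positioner in this line
    return m.group(1) + m.group(2)

print(join_groups("T123?214567"))  # T1234567
print(join_groups("T?211234567"))  # T1234567
```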
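First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))(?!\S)
The target string is captured in group 1 (\1). The (?!\S) lookahead accepts a following whitespace character or the end of the text, so an ID on the last line is matched too.
Finally, replace # with ?21 (or whatever you like).
A Python script might look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Replace each ?21 positioner with a # placeholder
rexpre = re.compile(r'\?21')
# T followed by 7 digits, or by 8 characters mixing digits and the # placeholder;
# (?!\S) requires whitespace or the end of the text after the ID
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))(?!\S)')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Put the positioner back into each match
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))
The output is:
T123#4567
T#1234567
T1234567
T1234434#
T5435433

T123?214567
T?211234567
T1234567
T1234434?21
T5435433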
If a replace function is an option for you, this might be an approach to match both T0000001 and T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then, in the replacement, use group 1 followed by group 2: \1\2.
Using word boundaries \b (or the anchors ^ and $ for the start and end of the line), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
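In Python, the word-boundary pattern above could be used both ways: with re.sub and the \1\2 replacement, or, where no replace is available, by joining the groups from re.findall. The sample line and the name ID_WITH_POSITIONER are made up for this sketch.

```python
import re

# Group 1: "T" plus any digits before the positioner; the optional "?dd" is not captured;
# group 2: the digits after it.
ID_WITH_POSITIONER = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T0000001 T123?214567 T?211234567"

# With a replace function: rebuild each ID from group 1 followed by group 2.
print(ID_WITH_POSITIONER.sub(r"\1\2", text))

# Without one: join the two groups of each match yourself.
print([g1 + g2 for g1, g2 in ID_WITH_POSITIONER.findall(text)])
```

For T0000001 the greedy \d* in group 1 gives back one digit so that (\d+) can match, which is why IDs without a positioner come out unchanged.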
Is the below what you want?
Use RegExReplace with the multiline flag (m) and enable replacing all occurrences.
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

RegEx extract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have come up with something like this:
^[^#][^\(]+
Unfortunately, it doesn't give the result I want. Do you have any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, one character that is not a #: ^[^#]
Then everything until you encounter an opening parenthesis: [^\(]+ (you don't need to escape the parenthesis inside a character class)
So this would match *DataFromAdHoc, including the *, because * is not a #.
What you could do is capture the part [^\(]+ in a group, like ([^(]+).
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
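A minimal Python check of that answer, using the question's example string; re.match anchors at the start, and group(1) holds the captured name.

```python
import re

s = "*DataFromAdHoc(cbgv)"
# ^[^#] consumes the leading "*"; group 1 then takes everything up to the first "(".
m = re.match(r"^[^#]([^(]+)", s)
print(m.group(1))  # DataFromAdHoc
```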
Use ^\*(\w+)\(\w+\)$
It captures the word between the * and the part in parentheses.
The answer may depend on which language you're running your regex in; please include that in your question.
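The stricter ^\*(\w+)\(\w+\)$ only accepts strings shaped exactly like *word(word). This Python sketch (extract_name and the two extra sample strings are made up) shows it returning None when the asterisk is missing or the name contains a non-word character such as -, inputs that ^[^#]([^(]+) would still accept.

```python
import re

STRICT = re.compile(r"^\*(\w+)\(\w+\)$")

def extract_name(s):
    """Return the word between "*" and "(", or None if s is not exactly *word(word)."""
    m = STRICT.match(s)
    return m.group(1) if m else None

for s in ["*DataFromAdHoc(cbgv)", "DataFromAdHoc(cbgv)", "*Data-From(cbgv)"]:
    print(s, "->", extract_name(s))
```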

Match string that does not contain a substring with regex

OK, I know this question is asked often, but I haven't managed to get what I want.
I am looking for a regular expression in order to find a pattern that does not contain a particular substring.
I want to find a URL that does not contain the b parameter.
http://www.website.com/a=789&c=146 > MATCH
http://www.website.com/a=789&b=412&c=146 > NOT MATCH
Currently, I have the following Regex:
\bhttp:\/\/www\.website\.com\/((?!b=[0-9]+).)*\b
But the \b is wrong: the regex matches the beginning of the string and stops when it finds b=, instead of not matching at all.
See: http://regex101.com/r/fN3zU5/3
Can someone help me please?
Just use a lookahead to check that whatever follows the URL is a space or the end of the line:
\bhttp:\/\/www\.website\.com\/(?:(?!b=[0-9]+).)*?\b(?= |$)
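A Python sketch of the lookahead answer, run over both example URLs held in one string. re.MULTILINE is an assumption added here so that $ also matches before each newline; without it, only the last line's end would count.

```python
import re

urls = """http://www.website.com/a=789&c=146
http://www.website.com/a=789&b=412&c=146"""

# Tempered token: consume one character at a time as long as "b=<digits>" does not start there;
# the final lookahead requires a space or the end of a line after the URL.
URL_WITHOUT_B = re.compile(
    r"\bhttp:\/\/www\.website\.com\/(?:(?!b=[0-9]+).)*?\b(?= |$)", re.MULTILINE
)

print(URL_WITHOUT_B.findall(urls))  # ['http://www.website.com/a=789&c=146']
```

On the second line the tempered token cannot step past b=412, and no earlier position is followed by a space or line end, so that URL produces no match.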
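Use this:
^http:\/\/www\.website\.com\/(?!.*b=[0-9]+).*$
\b only matches at word boundaries.
^ matches the start of the string and $ matches the end.
And you don't even need to make it that complicated. If you don't want the URL with the b parameter, use this:
^http:\/\/www\.website\.com\/(?!b).*$
Note that (?!b) only checks the character right after the slash, so it rejects the URL only when b is the first parameter.
Demo here: http://regex101.com/r/fN3zU5/5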
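import re
pattern = re.compile(r"(?!.*?b=.*).*")
x = "http://www.website.com/a=789&c=146"
print(pattern.match(x) is not None)  # True; for the URL containing &b=412 it prints False
This looks ahead to check whether "b=" is present; because the lookahead is negative, the match fails when it is.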
Have you looked at this possibility?
http://regex101.com/r/fN3zU5/6
^http:\/\/www\.website\.com\/[ac\=\d&]*$
This only allows &, =, a, c and digits,
so it matches the complete URL and there can be no "b=" parameter.
If you have more parameters and don't want to list them all,
just don't allow a 'b' anywhere in your parameters:
^http:\/\/www\.website\.com\/[^b]*$
http://regex101.com/r/fN3zU5/7
^http:\/\/www\.website\.com\/(?!.*?b=.*?).*$ works too. Here "b=" is rejected at any position in the parameter string, while a plain "b" is still allowed, for example as the value of a parameter.
See
http://regex101.com/r/fN3zU5/8
This is what you want. ^http:\/\/www\.website\.com\/(([^b]=[0-9]+).)*$
It's a simple pattern and not flexible, but it works:
http:\/\/www\.website\.com\/+a=+\w+&+c=+\w+

regex optional part in prefix, but do not include it in matches if it is present

The problem is easier to show in code than to describe. I have the following regex:
(?<=First(Second)?)\w{5}
and the following sample data:
FirstSecondText1
FirstText2
I only want the matches Text1 and Text2, but I get three: Secon is matched too, and I don't want that.
I've played around with it but can't seem to get it to work.
You need an additional negative lookahead:
(?<=First(Second)?)(?!Second)\w{5}
If you want to avoid using Second twice, you could do it without lookaround and take the result of the first capturing group:
First(?:Second)?(\w{5})
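Which approach works depends on the engine: Python's built-in re module only allows fixed-width lookbehind, so (?<=First(Second)?) is rejected there, while engines such as .NET accept it. This Python sketch, with the sample data joined into one string, shows the rejection and then the capture-group version.

```python
import re

data = "FirstSecondText1\nFirstText2"

# Variable-width lookbehind (5 or 11 characters) is not supported by Python's re.
try:
    re.compile(r"(?<=First(Second)?)\w{5}")
except re.error as exc:
    print("lookbehind rejected:", exc)

# Consume the optional prefix instead and read the word from group 1.
print(re.findall(r"First(?:Second)?(\w{5})", data))  # ['Text1', 'Text2']
```

Because (?:Second)? is greedy, "Second" is consumed whenever it is present, so Secon never lands in the group.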
You can try this regex: (?<=First(Second)?)\w{5}$. All you have to do is add a $ at the end so that the regex does not match the text Secon. This works as long as you are sure what comes at the end of the input text (here, \w{5}$); if the sample lines are in one string, multiline mode is needed so that $ matches at the end of each line.