This question already has answers here:
RegEx for an invoice format
(5 answers)
Closed 2 years ago.
I'm looking for a regex for an invoice number in VBScript.
It can contain alphanumeric characters, but at least one numeric digit is required.
I'm using the regex below, but it also matches the all-alphabetic string INVOICE. It needs to have at least one digit.
\b(?=.*\d)[A-Z0-9\-]{5,12}\b
Strings that should match:
1233444
M62899M
M828828
783838PTE
A751987
Strings that should not match:
INVOICE
ubb62727
XYZ
123
If I use ([A-Z0-9]*[0-9]+[A-Z0-9]*), I can't specify the length.
Please suggest a proper regex. Please note that it's totally different from the suggested duplicate, since the requirements and format are different.
The blanket .* in your lookahead will happily scan past the trailing \b if it has to, so it can find a digit anywhere later in the input rather than in the current word. Constrain it so it can't.
\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b
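(I removed the backslash before the -. In VBScript and other Perl-style dialects, \- inside a character class is just an escaped dash, so nothing is lost; in dialects where a backslash is literal inside brackets, such as POSIX, and you really want to allow a literal backslash, add it back, to the character class in the lookahead also. A dash at the beginning or end of a character class is unambiguous and doesn't require a backslash escape; in many regex dialects this is also the only way to have a literal dash in a character class.)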
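A quick way to see the difference is to run both patterns side by side. This sketch uses Python's re module as a stand-in for VBScript's RegExp object (an assumption; the pattern only needs lookahead and \b, which both support), and the input "INVOICE 12345" is an illustrative example, not taken from the question.

```python
import re

# Asker's pattern: the lookahead's .* can scan past the end of the current word.
ORIGINAL = re.compile(r"\b(?=.*\d)[A-Z0-9\-]{5,12}\b")
# Suggested fix: before the digit, the lookahead may only cross letters or dashes.
FIXED = re.compile(r"\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b")

should_match = ["1233444", "M62899M", "M828828", "783838PTE", "A751987"]
should_not_match = ["INVOICE", "ubb62727", "XYZ", "123"]

for s in should_match + should_not_match:
    print(s, bool(FIXED.search(s)))

# The bug only shows up when a digit appears later in the same input.
print(ORIGINAL.findall("INVOICE 12345"))  # ['INVOICE', '12345']
print(FIXED.findall("INVOICE 12345"))     # ['12345']
```

With the original pattern, INVOICE is accepted because the digit its lookahead finds belongs to the next word.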
This question already has answers here:
Splitting a String by number of delimiters
(2 answers)
Closed 2 years ago.
I have a file containing information in the following format:
Fred,Frank , Marcel Godwin , Marion,Ryan
I need to match commas and any whitespace around them, but not any comma inside brackets.
My problem is that my current regex [\s,]+ also matches the whitespace between words; in this example, the space between Marcel and Godwin.
I thought about using something like \s,\s*, but it wouldn't match a comma with no whitespace before it, like the one between Fred and Frank.
Surely it's a simple fix, but I can't figure it out.
I think this will match the commas including the whitespace before and afterwards like you explained in your question.
\s*(?=\,)\,(?<=\,)\s*
This is a positive lookahead: (?=\,). It asserts that a comma comes next, so the \s* before it only matches whitespace that is directly followed by a comma.
This is a positive lookbehind: (?<=\,). It asserts that a comma was just matched, so the \s* after it only matches whitespace that directly follows a comma.
Try it out yourself. You can use this page to check the output in your browser.
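Here is a small check of the answer's pattern using Python's re.split (Python is an assumption of this sketch; any engine with lookbehind would do). Because each lookaround only re-asserts the comma matched right next to it, the plain pattern \s*,\s* splits the line identically, and the sketch checks that too. Neither pattern skips commas inside brackets, which the question also asks about.

```python
import re

line = "Fred,Frank , Marcel Godwin , Marion,Ryan"

# Pattern from the answer, lookarounds included.
ANSWER = r"\s*(?=\,)\,(?<=\,)\s*"
# Equivalent pattern without the redundant lookarounds.
PLAIN = r"\s*,\s*"

answer_parts = re.split(ANSWER, line)
plain_parts = re.split(PLAIN, line)
print(answer_parts)                 # ['Fred', 'Frank', 'Marcel Godwin', 'Marion', 'Ryan']
print(answer_parts == plain_parts)  # True
```

The space between Marcel and Godwin survives because it is never followed by a comma.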
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I'm currently using this website to create regular expressions for a programming language I want to build. At the moment, I'm just setting up an expression for identifiers.
In my language, identifiers work as in most languages:
They cannot begin with a digit or a special character other than an underscore.
After the first character, they can contain alphanumeric and underscore characters.
Given those rules, I came up with the following expression myself:
^\D\w+$
Obviously, it doesn't account for special characters; however, the following expression does (I didn't write this one myself):
^(?!\d)\w+$
Why does the second expression account for special characters? Shouldn't they produce the same results?
I will explain why the second regex works.
The second regex uses a negative lookahead. After matching the start of the string, the engine checks whether the next character is a digit, but it does not consume it! This matters: if the next character is not a digit, \w+ still has to match that same character, which it can't if the character is a symbol. If it is a digit, the negative lookahead fails and nothing is matched.
\D, on the other hand, consumes the first character if it is not a digit, and \w+ matches whatever comes after it. That means any symbol is accepted as the first character.
This ^(?!\d)\w+$ means a string consisting of word characters [a-zA-Z0-9_] that doesn't start with a digit.
This ^\D\w+$ means a non-digit character followed by at least one character from [a-zA-Z0-9_] set.
So #ab01 is matched by the first regex, while the second regex rejects it.
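The difference between the two patterns is easy to check on a few sample identifiers. This sketch uses Python's re module (an assumption; \D, \w and negative lookahead behave the same way in most engines), and the samples are made up for illustration. It also exposes a second difference: ^\D\w+$ needs at least two characters, so a one-letter identifier such as a is rejected by it.

```python
import re

FIRST = re.compile(r"^\D\w+$")       # any non-digit (symbols too), then 1+ word chars
SECOND = re.compile(r"^(?!\d)\w+$")  # word chars only, and the first one is not a digit

samples = ["abc", "_x1", "1abc", "#ab01", "a", "ab#"]
results = {s: (bool(FIRST.search(s)), bool(SECOND.search(s))) for s in samples}

for s, (first, second) in results.items():
    print(s, "first:", first, "second:", second)
```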
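(?!\d)\w+ means "match a run of word characters that does not start with a digit". Wrapped in ^ and $, it is the same as ^\w+$ except that a leading digit is rejected, which is still not the same as ^\D\w+$: the lookahead consumes nothing, so the first character must also satisfy \w. ^(?!\d).\w+$ (note the "." after the lookahead) would behave like ^\D\w+$, except that . does not match a newline by default while \D does.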
This question already has answers here:
Regex to allow alphanumeric and dot
(3 answers)
Closed 4 years ago.
I am trying to match a string when it contains either 0 dots or multiple dots. The regex I have can only match strings with dots, not strings with 0 dots.
(\w*)((\w*\.)+\w*)
These are the test strings I am using:
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
abc
The regex matches these:
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
But not this one:
abc
https://regexr.com/?38ed7
If you really must use a regex, here is one (but it is inefficient):
/^(?![^.]*\.[^.]*$).*$/
It says:
Match a string whose beginning is not followed by a whole string containing exactly one dot.
It does some backtracking when parsing the negative lookahead.
As mentioned in the comments to the question, I do think, unless you must have a regex, that a simple function might be better. But if you like the conciseness of a regex and performance is not a huge concern, you can go with the one I gave above. Regexes with "nots" in them are generally a tad messy, but once you understand lookarounds they do become doable. Cheers.
/\..*\.|^[^.]*$/
Or, in plain English:
Match EITHER a dot, then any number of characters, then another dot; OR the beginning of the string, then any number of non-dots, then the end of the string.
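Both answers can be checked against the question's test strings. This sketch uses Python's re.search with the surrounding slashes removed (an assumption about the target engine; neither pattern uses flags). The expectation encoded in the test is "match unless the string contains exactly one dot", which is how both answers read the requirement.

```python
import re

# Answer 1: reject a string that is non-dots, one dot, non-dots.
NOT_ONE_DOT = re.compile(r"^(?![^.]*\.[^.]*$).*$")
# Answer 2: two dots somewhere, or no dot at all.
TWO_OR_NONE = re.compile(r"\..*\.|^[^.]*$")

samples = [
    "dial.check.Catch.Url",
    "dial.check.Catch.Url.Dial.check.Catch.Url",
    "32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4",
    "234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....",
    "12fd.dafd",
    ".",
    "abc",
]

for s in samples:
    print(s, s.count("."), bool(NOT_ONE_DOT.search(s)), bool(TWO_OR_NONE.search(s)))
```

Only 12fd.dafd and . (exactly one dot each) are rejected by both patterns; abc is accepted.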
This question already has answers here:
Java RegEx that matches exactly 8 digits
(3 answers)
Closed 5 years ago.
Basically I'm looking for a regex that matches some simple phone numbers.
I want to match numbers in a longer string of text like 123 4567, 891-0111, or 21314151, something that is (hopefully) identified by (\d{3,4}[- ]\d{3,4}|\d{4,8}), but I don't want to match them if they're part of a longer number like 3919503570275.
If I require the next character to be a non-digit or the end of a line, then that next character is also included in the match, which I don't want.
Surround your regex with a lookahead and a lookbehind to reject \d on both sides:
(?<!\d)(\d{3,4}[- ]\d{3,4}|\d{4,8})(?!\d)
Note that this would accept a string that looks like a phone number preceded or followed by letters.
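A quick check of the lookaround version, using Python's re (an assumption; Python supports this fixed-width lookbehind). The sample texts are made up for illustration. For contrast, the sketch also tries the approach described in the question, requiring a non-digit or end of line after the number, which pulls that extra character into the match.

```python
import re

# Lookbehind/lookahead reject a digit on either side without consuming anything.
PHONE = re.compile(r"(?<!\d)(\d{3,4}[- ]\d{3,4}|\d{4,8})(?!\d)")
# The question's approach: the trailing non-digit becomes part of the match.
CONSUMING = re.compile(r"(?:\d{3,4}[- ]\d{3,4}|\d{4,8})(?:\D|$)")

text = "Call 123 4567 or 891-0111 or 21314151, not 3919503570275."
print(PHONE.findall(text))                     # ['123 4567', '891-0111', '21314151']
print(CONSUMING.findall("Call 891-0111 now"))  # ['891-0111 ']
# Caveat from the answer: letters next to the digits are not rejected.
print(PHONE.findall("ID x1234567y"))           # ['1234567']
```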
Depending on what programming language you use, I suggest either using a negative lookahead or using groups to extract the number.
See https://www.regular-expressions.info/lookaround.html for information about lookaround pattern.
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
What is the meaning of this regular expression?
['`?!\"-/]
Why does it match parentheses?
I'm using Java.
In your regex
['`?!\"-/]
The sequence "-/ is being interpreted as a range of characters, just as A-Z means every letter from A to Z. Looking at the ASCII table, the parentheses lie within this range (" is 0x22, ( and ) are 0x28 and 0x29, / is 0x2F), so your pattern includes them.
One trick is to place the dash at the end:
['`?!\"/-]
Placed last, the - is not interpreted as a range.
Because you didn't escape the dash -. Inside a character class [], the dash denotes a range of characters; in this case, from " to /. Parentheses fall between those in ASCII.
Inside a character class, the dash must be escaped as \- when you want it matched as a literal, unless it is the first or last character.
You need to escape the -; otherwise, parentheses are matched too. The "-/ part includes parentheses, just as [A-C] matches the ASCII characters from A to C. Use the following instead:
[\'`?!\"\-/]
It will match the following characters in a string:
'`?!"-/
You can check it on regex101.
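To see exactly which characters each class accepts, the sketch below tests every printable ASCII character against the original class and the two fixes from the answers. It uses Python's re rather than Java's java.util.regex (an assumption of the sketch; both treat a dash between two characters in a class as a range and a trailing or escaped dash as a literal). In a Java string literal, each backslash would additionally need doubling.

```python
import re
import string

BROKEN = re.compile(r"['`?!\"-/]")          # "-/ is a range: 0x22 through 0x2F
DASH_LAST = re.compile(r"['`?!\"/-]")       # trailing - is a literal dash
DASH_ESCAPED = re.compile(r"[\'`?!\"\-/]")  # escaped - is a literal dash


def accepted(pattern):
    """Return the printable ASCII characters that the one-character class matches."""
    return "".join(c for c in string.printable if pattern.fullmatch(c))


for name, pattern in [("broken", BROKEN), ("dash last", DASH_LAST), ("escaped", DASH_ESCAPED)]:
    print(name, accepted(pattern))
```

The broken class also accepts # $ % & ( ) * + , and ., which is where the parentheses come from.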