What is the meaning of this regular expression? ['`?!\"-/] [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
What is the meaning of this regular expression?
['`?!\"-/]
Why does it match parentheses?
I am using Java for development.

In your regex
['`?!\"-/]
The sequence "-/ is being interpreted as a range of characters, just as A-Z means every letter from A to Z. In the basic ASCII table, the parentheses ( and ) lie within this range (between " and /), so your pattern includes them.
One trick you can use here with dash is to place it at the end:
['`?!\"/-]
       ^^ the dash comes last, so this will not be interpreted as a range

Because you didn't escape the dash -. Inside a character class [], the dash denotes a range of characters, in this case from " to /, and the parentheses lie between those two in ASCII.
When you want the dash to be matched as a literal inside a character class, it needs to be escaped as \- unless it is the first or last character.

You need to escape the -; otherwise, parentheses are matched as well.
The sequence "-/ is treated as a range that includes the parentheses, just as [A-C] matches the ASCII characters from A to C.
Use the following instead:
[\'`?!\"\-/]
It will match any one of the following characters in a string:
'`?!"-/
You can check it on regex101.
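The three variants discussed in the answers above can be compared with a short script. This is only a sketch, written in Python rather than Java (an assumption made for brevity; for this pattern the character class rules are the same in both), and the Java string escape \" is dropped because a Python raw string does not need it. The variable names are made up for illustration.

```python
import re

# Dash between '"' and '/' forms a range covering ASCII 34..47.
as_range = re.compile(r"""['`?!"-/]""")
# Dash placed last in the class is a literal dash.
dash_last = re.compile(r"""['`?!"/-]""")
# An escaped dash is a literal dash wherever it appears.
dash_escaped = re.compile(r"""['`?!"\-/]""")

# Every character covered by the accidental range "-/ .
covered = "".join(chr(c) for c in range(ord('"'), ord("/") + 1))
print(covered)

# Compare how each pattern treats the parentheses and the dash itself.
for ch in "()-":
    print(ch,
          bool(as_range.search(ch)),
          bool(dash_last.search(ch)),
          bool(dash_escaped.search(ch)))
```

Only the first pattern matches ( and ), while all three match a literal dash.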

Related

Regex for alphanumeric with at least one digit [duplicate]

This question already has answers here:
RegEx for an invoice format
(5 answers)
Closed 2 years ago.
I'm looking for a regex for an invoice number in VBScript.
It can be alphanumeric, but at least one numeric digit is a must.
I'm using the regex below, but it also matches the all-alphabetic string INVOICE. The match needs to contain at least one digit.
\b(?=.*\d)[A-Z0-9\-]{5,12}\b
Expected Match String
1233444
M62899M
M828828
783838PTE
A751987
Expected Unmatch String
INVOICE
ubb62727
XYZ
123
If we use ([A-Z0-9]*[0-9]+[A-Z0-9]*), I can't specify the length.
Please suggest a proper regex. Please note it's totally different from the suggested duplicate, as the requirement and format are different.
The blanket .* in your lookahead will happily skip past the trailing \b if it has to. Make it more constrained, so it can't.
\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b
(I removed the backslash before the -; if you really want to allow a literal backslash, obviously add it back, to the character class in the lookahead also. A dash at beginning or end of a character class is unambiguous and doesn't require a backslash escape; this is also the only way to have a literal dash in a character class in many regex dialects.)
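The difference between the two lookaheads can be checked against the sample strings from the question. The sketch below uses Python's re module instead of VBScript (an assumption; both engines support \b, lookaheads and {m,n} for this pattern), and the line "INVOICE 12345" is a hypothetical input chosen to show how the unconstrained .* finds a digit in a later word.

```python
import re

original = re.compile(r"\b(?=.*\d)[A-Z0-9\-]{5,12}\b")
fixed = re.compile(r"\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b")

should_match = ["1233444", "M62899M", "M828828", "783838PTE", "A751987"]
should_not_match = ["INVOICE", "ubb62727", "XYZ", "123"]

# Collect the samples that the constrained pattern handles as expected.
matched = [s for s in should_match if fixed.search(s)]
rejected = [s for s in should_not_match if not fixed.search(s)]
print(matched)
print(rejected)

# Hypothetical line where the only digits come after the word INVOICE.
line = "INVOICE 12345"
print(original.search(line).group())  # lookahead skips ahead to the later digits
print(fixed.search(line).group())     # lookahead must find a digit in the same token
```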

What is the difference between r'.*' and '[.*]' regular expressions in python? [duplicate]

This question already has answers here:
regexp dot with square brackets or not
(2 answers)
Closed 4 years ago.
Suppose, a = 'I was born in 1997'
And I want to print everything present in a using regular expression (just for learning purpose.)
So when I use: re.match(r'.*', a)
Then it shows match as: <_sre.SRE_Match object; span=(0, 18), match='I was born in 1997'>
But when I use: re.match(r'[.*]', a)
I get no output, i.e. no match is found.
The regular expression .* matches zero or more characters (other than newlines), so of course it matches the entire string.
The regular expression [.*] matches a single character that is either . or * (they lose their significance as meta-characters between square brackets). And since re.match matches only at the beginning of the string, it doesn't match anything. (re.search matches anywhere in the string, but it still wouldn't match anything in your string).
re.match(r'[AEIOU]', a) would match the I at the beginning of your string.
Documentation for Python regular expressions: https://docs.python.org/3.7/library/re.html
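The points above can be reproduced in a few lines. This is a minimal sketch; the string '3.14 * 2' is a made-up input that actually contains a literal . and *.

```python
import re

a = 'I was born in 1997'

# .* matches the whole line.
whole = re.match(r'.*', a)
print(whole.group())

# [.*] needs a literal '.' or '*', and the string contains neither.
print(re.match(r'[.*]', a))   # None
print(re.search(r'[.*]', a))  # None

# A character class that does contain the first character matches.
vowel = re.match(r'[AEIOU]', a)
print(vowel.group())

# On a string with literal dots and stars, [.*] finds them one at a time.
found = re.findall(r'[.*]', '3.14 * 2')
print(found)
```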
. is a special character in regex that matches any character except a newline, but inside a [] character class it becomes a literal . character.
* is a quantifier that means zero or more times, but inside a character class it becomes a literal * character.
.* ---> matches zero or more of any character except a newline.
[.*] ---> matches a single . or *
For further reading
character class
Dot

Regex - Why don't these two expressions produce the same result? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I'm currently using this website to create some regular expressions for a programming language I want to build. At the moment I'm just setting up an expression for identifiers.
In my language, identifiers are expressed like most languages:
They cannot begin with a digit, or special character other than an underscore
After the first character they can contain alphanumeric and underscore characters
Given those rules I've come up with the following expression by myself:
^\D\w+$
Obviously, it doesn't account for special characters, however the following expression does (which I didn't make myself):
^(?!\d)\w+$
Why does the second expression account for special characters? Shouldn't they be producing the same results?
I will explain why the second regex works.
The second regex uses a negative lookahead. After matching the start of the string, the engine checks whether the next character is a digit, but it does not consume it. This is important: if the next character is not a digit, the engine then tries to match that same character with \w, which fails if the character is a symbol. If it is a digit, the negative lookahead fails and nothing is matched.
\D, on the other hand, will match the first character as long as it is not a digit, and \w+ will match whatever comes after that. That means a symbol is accepted as the first character.
^(?!\d)\w+$ means a string consisting of word characters [a-zA-Z0-9_] that doesn't start with a digit.
^\D\w+$ means a non-digit character followed by at least one character from the [a-zA-Z0-9_] set.
So #ab01 is matched by ^\D\w+$, while ^(?!\d)\w+$ rejects it.
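(?!\d)\w+ means "match a run of word characters that does not begin with a digit". Wrapped in ^ and $, it only accepts strings made up entirely of word characters, so the first character must be a letter or an underscore, which is obviously not the same as ^\D\w+$. ^(?!\d).\w+$ (note the single "." after the lookahead) would behave the same as ^\D\w+$, apart from a newline in first position, which \D matches and . does not by default.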
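A side-by-side run makes the difference between the two expressions concrete. This sketch uses Python's re module (an assumption, since the question does not name an engine), and the sample strings are made up for illustration.

```python
import re

non_digit_first = re.compile(r"^\D\w+$")      # the asker's own expression
lookahead_first = re.compile(r"^(?!\d)\w+$")  # the expression with the lookahead

samples = ["abc", "_x1", "#ab01", "1abc", "a"]

# Map each sample to (matches ^\D\w+$, matches ^(?!\d)\w+$).
results = {
    s: (bool(non_digit_first.match(s)), bool(lookahead_first.match(s)))
    for s in samples
}
for s, (first, second) in results.items():
    print(f"{s!r:10} \\D\\w+: {first!s:6} (?!\\d)\\w+: {second}")
```

Note that the single character a is accepted only by the lookahead version, because \D consumes one character and \w+ then still requires at least one more.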

How to write a regular expression for my string? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I want to write a regular expression for QR8.4_Z4J25 in a shell script. How can I do it?
Is this correct?
[QR][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]
It's obviously wrong because it'll only match Q8.4_Z4J25 or R8.4_Z4J25, but not QR8.4_Z4J25
A bracket matches any one character specified, so you'd like to write:
[Q][R][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]
You don't need to use brackets for a single character, though, so it can be simplified to
QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]
Be sure to escape the dot if it's outside of a bracket because it would otherwise match any single character.
In case you want to match QR9.1_8A9YK as well, you should change it to
QR[0-9]\.[0-9]_[A-Z0-9]\{5\}
If you're using Extended Regular Expressions, usually by supplying the -E option to the tool you're using, then you shouldn't escape the braces:
QR[0-9]\.[0-9]_[A-Z0-9]{5}
Square brackets in regular expressions denote a collection of characters.
[MX_5] will match one character that is M, X, _ or 5.
[0-9] will match one character that is between 0 and 9.
[a-z] will match one character that is between lowercase a and z.
Notice the pattern? The square brackets match a single character. In order to match multiple characters they need to be followed by a + or * or {} to denote how many of those characters it should match.
However, in your case, you just want to match the actual letters QR in that order, so simply don't use square brackets.
QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]
The same goes for characters like the underscore which are always in the same place. Note that the . was escaped with a \ because it has a special meaning in regex.
Going back to matching multiple characters with square brackets, if the order of the last 5 characters doesn't matter, you can further reduce your expression using a single square bracket and a {} to match all your trailing characters after the underscore.
QR[0-9]\.[0-9]_[A-Z0-9]{5}
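The patterns from both answers can be compared quickly outside the shell. The sketch below uses Python's re module as a stand-in (an assumption; its {5} quantifier is written without backslashes, like the Extended Regular Expression form), and QR8x4_Z4J25 is a made-up string used to show what an unescaped dot lets through.

```python
import re

char_classes = re.compile(r"[QR][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]")
literal_qr = re.compile(r"QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]")
any_tail = re.compile(r"QR[0-9]\.[0-9]_[A-Z0-9]{5}")
unescaped_dot = re.compile(r"QR[0-9].[0-9]_[A-Z0-9]{5}")

# [QR] consumes a single character, so the two-letter prefix is rejected.
print(bool(char_classes.fullmatch("QR8.4_Z4J25")))
print(bool(char_classes.fullmatch("Q8.4_Z4J25")))

# Literal QR matches the original string, but not a differently ordered tail.
print(bool(literal_qr.fullmatch("QR8.4_Z4J25")))
print(bool(literal_qr.fullmatch("QR9.1_8A9YK")))

# [A-Z0-9]{5} accepts any mix of five uppercase letters and digits.
print(bool(any_tail.fullmatch("QR9.1_8A9YK")))

# Without the backslash, the dot matches any character.
print(bool(unescaped_dot.fullmatch("QR8x4_Z4J25")))
print(bool(any_tail.fullmatch("QR8x4_Z4J25")))
```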

Regex that can only contain alphanumeric characters and underscores, but the first character must be alphabetical (single character failure) [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 7 years ago.
I have the following regex pattern:
^[A-Za-z][A-Za-z0-9_-]+$
It is used to match alphanumeric characters, underscores and dashes, with the first character being alphabetical.
This works as expected, but I also need it to match single characters. An input of a fails.
How can I modify the pattern to make a single alphabetical character pass?
The + means "one or more". Replace it with * for "zero or more".
^[A-Za-z][A-Za-z0-9_-]*$
This should do it for you.
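The effect of swapping + for * can be confirmed with a small check. This is a sketch in Python (an assumption, as the question does not say which engine is used); the test strings are made up.

```python
import re

one_or_more = re.compile(r"^[A-Za-z][A-Za-z0-9_-]+$")
zero_or_more = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")

# A single letter fails with + because a second character is required.
print(bool(one_or_more.match("a")))
print(bool(zero_or_more.match("a")))

# Longer names still pass, and a leading digit is still rejected.
print(bool(zero_or_more.match("a_b-1")))
print(bool(zero_or_more.match("1a")))
```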