What is the difference between r'.*' and '[.*]' regular expressions in python? [duplicate] - regex

Suppose a = 'I was born in 1997'
and I want to print everything in a using a regular expression (just for learning purposes).
When I use: re.match(r'.*', a)
it shows the match as: <_sre.SRE_Match object; span=(0, 18), match='I was born in 1997'>
But when I use: re.match(r'[.*]', a)
I get no output, i.e. no match is found.

The regular expression .* matches zero or more characters (any character except a newline), so of course it matches the entire string.
The regular expression [.*] matches a single character that is either . or * (they lose their significance as meta-characters between square brackets). And since re.match matches only at the beginning of the string, it doesn't match anything. (re.search matches anywhere in the string, but it still wouldn't match anything in your string).
re.match(r'[AEIOU]', a) would match the I at the beginning of your string.
Documentation for Python regular expressions: https://docs.python.org/3.7/library/re.html
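The behaviour described above can be checked with a short Python sketch. It reuses the string a from the question; the other variable names are only illustrative.

```python
import re

a = 'I was born in 1997'

# .* : zero or more of any character except a newline,
# so it spans the whole string.
whole = re.match(r'.*', a)
print(whole.group())

# [.*] : exactly one character that is a literal '.' or '*'.
# The string contains neither, so match() and search() both return None.
at_start = re.match(r'[.*]', a)
anywhere = re.search(r'[.*]', a)
print(at_start, anywhere)

# [AEIOU] : one uppercase vowel; the string starts with 'I'.
vowel = re.match(r'[AEIOU]', a)
print(vowel.group())
```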

. is a special character in regex that matches any character except a newline. Inside a [] character class, however, it becomes a literal .
* is a quantifier that means zero or more times. Inside a character class it becomes a literal *
.* ---> means match zero or more of any character except a newline.
[.*] ---> means match a single . or *
For further reading
character class
Dot
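As a small illustration of the two meanings, here is a sketch that uses a made-up test string containing a dot, a star and a newline (the string and variable names are assumptions, not from the question):

```python
import re

text = "a.b*c\nd"  # hypothetical sample with '.', '*' and a line break

# Inside a character class, '.' and '*' are ordinary characters.
literal_hits = re.findall(r'[.*]', text)
print(literal_hits)

# Outside a class, '.' is any character except a newline and '*' repeats it,
# so '.*' stops at the line break.
first_line = re.match(r'.*', text).group()
print(first_line)
```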


Regular Expression for anything in Between ${something} [duplicate]

I am a newbie to regular expressions. I have written a regular expression for ${serviceName}; basically, I want to capture the word between ${ and }. The expression I wrote for this works perfectly:
"\\$\\{(\\w+)\\}"
But now I want to capture any value between the braces, not only word characters, as in ${serviceName.1.Type}. Can you help me with a regular expression for ${serviceName.1.Type}?
I hope my question is clear.
Thanks in advance.
A good place to test regular expressions is https://regex101.com/
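\w+ matches one or more word characters (a word character is [a-zA-Z0-9_])
If you want to match anything you can replace it with: .*
.* matches zero or more of any character (except for line terminators)
You will want to add a "?" after it so that the match stops at the first "}"
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
Whether you need to escape the { } depends on the regex flavour: PCRE and Python accept an unescaped { here, but java.util.regex rejects it as an illegal repetition, so keeping the escapes is the safe choice
So what you want is:
"\\$\\{(.*?)\\}"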
\$\{([\w?\.?\d?\s?]+)\}
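This expression captures as a group everything that appears between the braces, as long as it consists only of the listed characters.
You can then refer to the group with the expression $1
Inside a character class ? is a literal character, not a quantifier, so the class as written also accepts a question mark, and \d is already covered by \w; [\w.\s] is the equivalent without those extras. If other inputs contain additional characters, you can add them to the class. As it stands it accepts dots \. , whitespace \s, word characters \w and digits \d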
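A rough way to compare the suggestions is to run them side by side. The sketch below uses Python rather than the asker's language (the doubled backslashes suggest Java, but the question does not say), and the sample text is invented for illustration.

```python
import re

# Hypothetical input with two placeholders on one line.
text = "url=${serviceName.1.Type} and ${other}"

# Lazy quantifier: each capture stops at the first closing brace.
lazy = re.findall(r'\$\{(.*?)\}', text)
print(lazy)

# Greedy quantifier: the capture runs to the LAST closing brace on the line.
greedy = re.findall(r'\$\{(.*)\}', text)
print(greedy)

# Restricted class: only word characters, dots and whitespace are allowed.
restricted = re.findall(r'\$\{([\w.\s]+)\}', text)
print(restricted)
```

The lazy and restricted forms agree on this input; they differ once a placeholder contains a character outside the class, such as a hyphen.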

Can someone please explain what exactly is happening in the 3rd line of this program? [duplicate]

The code is presented below:
import re
line = "dogs are better than humans"
matchObj = re.match(r'(.*) are (.*?) .*', line)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
(.*): matches and captures any character (except newlines) any number of times, possibly zero. . denotes "any character" and * signifies repetition. The parentheses denote a capture group (explained below).
are: the literal string " are " (including the surrounding spaces)
(.*?): same as (.*) except that it matches as few characters as possible (non-greedy, or lazy). Because the next thing in the pattern is a space, it stops at the first space it reaches. Without the ?, a greedy (.*) would instead run up to the last space in the line and capture "better than".
.*: any character any number of times; here it consumes the rest of the line.
Capture groups, or captures for short, are portions of the entire match. Wrapping part of your regex in parentheses allows you to easily retrieve that portion of the match.
(dogs) are (better) than humans
(.*)   are  (.*?)     .*
In your example, dogs and better would be captured. These are also referred to as "groups". In regular expressions, they are marked by a pair of parentheses.
Play around with the regex here. Hover on the match to see which portions of the expression are captured.
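To see the groups and the effect of the non-greedy ?, the following sketch runs the question's pattern next to a greedy variant. The greedy variant and the variable names are additions for comparison only.

```python
import re

line = "dogs are better than humans"

# Pattern from the question: the second group is lazy.
lazy = re.match(r'(.*) are (.*?) .*', line)
print(lazy.groups())

# Same pattern with a greedy second group, for comparison.
greedy = re.match(r'(.*) are (.*) .*', line)
print(greedy.groups())

# group() with no argument returns the entire match, not just the captures.
print(lazy.group())
```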

Regular Expressions particulars for VSCode Syntax Highlighting [duplicate]

I'm trying to write a syntax highlighter for VSCode, which uses the TextMate format. I've got an entry for one-line comments, copied from an example, and it works fine, but I'd like to extend/modify it.
"linecomment": {
    "name": "comment",
    "match": "(%)(?!(\\[=*\\[|\\]=*\\])).*$\n?",
    "captures": {
        "1": {
            "name": "comment"
        }
    }
},
The problem is, the regular expressions used here are not documented anywhere that I can find. I understand basic Grep and the theory behind regular expressions, but I have no idea what is going on in ?!(\\[=*\\[|\\]=*\\])).*$\n?. In particular, I don't know which characters are in the regex language, and which are being matched.
Can somebody explain to me:
Which regular expression format is used here, and where it is documented?
What the given regex means, and what its parts are?
I don't know the answer to (1), but the answer to (2) is as follows:
Firstly, if you've only used grep and not other flavours of regex, you should know that there are some syntax differences. In most flavours, for example, \+ is a literal + and + is the quantifier; in grep's default (basic) syntax, + is literal and \+ is the quantifier (in GNU grep, at least). And there are other characters where the meaning of \ is reversed in this way.
Secondly, the string literal isn't the same as the string itself, because of backslash-escaping. The string literal looks like this:
"(%)(?!(\\[=*\\[|\\]=*\\])).*$\n?"
while the string itself looks like this:
(%)(?!(\[=*\[|\]=*\])).*$
?
(with a newline character near the end).
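The unescaping step can be reproduced by decoding the JSON string literal. This is only a sketch in Python: it assumes the grammar file is plain JSON, and the variable names are illustrative.

```python
import json

# The "match" value exactly as it is written in the grammar file,
# including the surrounding double quotes (raw string, so nothing is
# unescaped by Python itself).
literal = r'"(%)(?!(\\[=*\\[|\\]=*\\])).*$\n?"'

# Decoding the JSON literal yields the string the regex engine receives.
pattern = json.loads(literal)
print(repr(pattern))

# Each doubled backslash has collapsed to one, and \n is a real newline.
print(pattern.count('\\'), pattern[-2] == '\n')
```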
Let's look at the following subexpression:
\[=*\[|\]=*\]
At first I thought this was a character class, delimited by \[ and \]. But (a) I don't know of any flavour of regex where backslash-escaped square brackets are character class delimiters and unescaped ones are literal square brackets, rather than vice versa; (b) why would someone write a character class with repeated characters?; (c) there's no obvious reason why the first \] would be a literal ] and the second one would end the character class. So it looks like \[ and \] are literal square brackets.
| means "or" in regexes. It is a low-precedence operator. So this subexpression means either \[=*\[ or \]=*\]. In other words, it matches strings such as [[, [=[, [======[, etc, as well as ]], ]=], etc.
(?!...) is a zero-width assertion. It is a negative lookahead: it matches at any point in the string where the positive lookahead (?=...) would not match. In general, if the regex A matches the string a and C matches the string c, then the regex A(?!B)C matches the string ac, unless B matches at the start of c (the lookahead is tested only at the position right after a, not further along). In other words, the match fails if the string is something like %]==].
.* matches any number of characters. (0 is a number). (I assume this doesn't match newlines.) $ is another zero-width assertion: it can only match at the end of the line. Actually, it's not needed in this case - the .* subexpression is greedy and will match all non-newline characters, so the end of the .* match is guaranteed to be the end of the line. That is, unless there's some edge case I'm not aware of involving carriage returns or some even more exotic line terminating character.
Finally, \n? will match the newline character itself, if it exists (? is a quantifier). If this is the last line of the string then there may not be a newline; in that case the regex match would fail without the ?.
Putting it all together: The regex will match from a % until the end of the line, including the newline character if it exists, unless the string it's trying to match starts with %[[ or %]==] or something similar.
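The summary can be tried out in Python. Python's re module is not the engine VS Code uses for TextMate grammars (its documentation names Oniguruma), so this is only an approximation, but the constructs in this pattern behave the same way in both. The sample lines are invented, and re.MULTILINE is an assumption made so that $ matches at the end of each line.

```python
import re

# Hypothetical Python rendering of the grammar's "match" string.
comment = re.compile(r'(%)(?!(\[=*\[|\]=*\])).*$\n?', re.MULTILINE)

samples = [
    "% plain comment\n",   # ordinary comment, newline included in the match
    "% last line",         # no trailing newline: \n? matches nothing
    "%[ single bracket",   # only one '[', so the lookahead does not trigger
    "%[[ long bracket",    # rejected by the lookahead
    "%[==[ long bracket",  # rejected by the lookahead
    "%]==] closing",       # rejected by the lookahead
]

# True where the pattern matches at the start of the sample.
results = {s: comment.match(s) is not None for s in samples}
for sample, matched in results.items():
    print(repr(sample), matched)
```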

What is the meaning of this regular expression? ['`?!\"-/] [duplicate]

What is the meaning of this regular expression?
['`?!\"-/]
Why does it match parentheses?
I am using Java for development.
In your regex
['`?!\"-/]
The quantity "-/ is being interpreted as a range of values, just as A-Z would mean taking every letter between A and Z. It turns out, by reading the basic ASCII table, that parentheses lie within this range, so your pattern is including them.
One trick you can use here is to place the dash at the end of the class:
['`?!\"/-]
In that position the dash is not interpreted as a range.
Because you didn't escape the dash -. The dash, inside a character class [] denotes a range of characters. In this case from " to /. And parentheses are between those, in ASCII.
The dash needs to be escaped \-, if it's not the first or last character, inside a character class, when you want it to be matched as a literal.
You need to escape the -, otherwise parentheses will match as well.
It seems "-/ is treated as a range that includes the parentheses, just as [A-C] matches the ASCII characters from A to C. Use the following instead:
[\'`?!\"\-/]
It will match the following characters in a string:
'`?!"-/
Check in the regex101
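The range explanation is easy to verify by listing the characters between " and / in ASCII. The sketch below uses Python instead of the asker's Java; the character ordering is the same, and the test string is made up.

```python
import re

# Every character from '"' (code 34) to '/' (code 47), inclusive.
in_range = ''.join(chr(c) for c in range(ord('"'), ord('/') + 1))
print(in_range)

sample = "f(x) - y!"  # hypothetical input

# Original class: "-/ acts as a range, so the parentheses match too.
with_range = re.findall(r"""['`?!\"-/]""", sample)
print(with_range)

# Dash moved to the end: it is now a literal dash, not a range operator.
dash_last = re.findall(r"""['`?!\"/-]""", sample)
print(dash_last)
```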

How to write a regular expression for my string? [duplicate]

I want to write a regular expression for QR8.4_Z4J25 in a shell script. How can I do it?
Is this correct?
[QR][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]
It's obviously wrong because it'll only match Q8.4_Z4J25 or R8.4_Z4J25, but not QR8.4_Z4J25
A bracket matches any one character specified, so you'd like to write:
[Q][R][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]
You don't need to use brackets for a single character, though, so it can be simplified to
QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]
Be sure to escape the dot if it's outside of a bracket because it would otherwise match any single character.
In case you want to match QR9.1_8A9YK as well, you should change it to
QR[0-9]\.[0-9]_[A-Z0-9]\{5\}
If you're using Extended Regular Expressions, usually by supplying the -E option to the tool you're using, then you shouldn't escape the braces:
QR[0-9]\.[0-9]_[A-Z0-9]{5}
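The difference between the original and the corrected pattern can be checked with a short sketch. It uses Python's re module rather than a shell tool, as an assumption for illustration; re.fullmatch requires the whole string to match, and Python writes interval braces unescaped, as in the -E form.

```python
import re

target = "QR8.4_Z4J25"

# Pattern from the question: [QR] consumes only ONE character, Q or R.
original = r'[QR][0-9][.][0-9][_][A-Z][0-9][A-Z][0-9][0-9]'
# Corrected pattern: the literal letters QR, with the dot escaped.
corrected = r'QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]'
# Same as corrected but with the dot left unescaped.
unescaped_dot = r'QR[0-9].[0-9]_[A-Z][0-9][A-Z][0-9][0-9]'

original_hits_target = re.fullmatch(original, target) is not None
original_hits_single = re.fullmatch(original, "Q8.4_Z4J25") is not None
corrected_hits_target = re.fullmatch(corrected, target) is not None
# An unescaped dot also accepts any other character in that position.
loose_dot_hits_x = re.fullmatch(unescaped_dot, "QR8x4_Z4J25") is not None

print(original_hits_target, original_hits_single,
      corrected_hits_target, loose_dot_hits_x)
```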
Square brackets in regular expressions denote a collection of characters.
[MX_5] will match one character that is M, X, _ or 5.
[0-9] will match one character that is between 0 and 9.
[a-z] will match one character that is between lowercase a and z.
Notice the pattern? The square brackets match a single character. In order to match multiple characters they need to be followed by a + or * or {} to denote how many of those characters it should match.
However, in your case, you just want to match the actual letters QR in that order, so simply don't use square brackets.
QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]
The same goes for characters like the underscore which are always in the same place. Note that the . was escaped with a \ because it has a special meaning in regex.
Going back to matching multiple characters with square brackets, if the order of the last 5 characters doesn't matter, you can further reduce your expression using a single square-bracket class and a {5} quantifier to match all five trailing characters after the underscore.
QR[0-9]\.[0-9]_[A-Z0-9]{5}
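As a rough check of the points about single characters and quantifiers, here is a Python sketch (Python is an assumption; the question is about shell tools, where the brace syntax depends on basic versus extended mode). The second test string is an invented example.

```python
import re

# A bracket expression on its own matches exactly one character.
one_char = re.fullmatch(r'[MX_5]', 'X') is not None
two_chars = re.fullmatch(r'[MX_5]', 'MX') is not None
# A quantifier after the brackets allows several characters from the set.
with_plus = re.fullmatch(r'[MX_5]+', 'MX_5') is not None
print(one_char, two_chars, with_plus)

strict = r'QR[0-9]\.[0-9]_[A-Z][0-9][A-Z][0-9][0-9]'
relaxed = r'QR[0-9]\.[0-9]_[A-Z0-9]{5}'

# The relaxed form accepts any mix of capitals and digits in the last five
# positions; the strict form fixes the letter/digit order.
for candidate in ("QR8.4_Z4J25", "QR9.1_8A9YK"):
    print(candidate,
          re.fullmatch(strict, candidate) is not None,
          re.fullmatch(relaxed, candidate) is not None)
```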