This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
I'm trying to build a regex that removes * and = (or any combination of them) from the end of a string. I first tried "[*=]$", but it removed only one character: for the string this is a dog =*, it removes * and keeps =. Then I tried the regex [*=]+$, and it did the job. But I can't understand how the regex engine processes that last regex, or in other words, how this regex becomes greedy.
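Note that + repeats the previous token one or more times, as many times as possible. Without a quantifier, [*=] matches exactly one character, which is why [*=]$ removed only the final *. With the quantifier, [*=]+$ matches a run of one or more * or = symbols at the end of the string.
What happens in the background: the engine tries the pattern at each position in the string, from left to right. At a position holding * or =, [*=]+ greedily consumes that character and every * or = that follows it. The end-of-line anchor $ must then match. If the run is not at the end of the string, $ fails; the engine backtracks by giving characters back one at a time, which cannot help here, and the attempt at that position is discarded. The first attempt that succeeds is the one starting at the first character of the trailing run, so you are left with a single match: the whole run of * and = at the end of the line.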
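As a rough illustration of the difference between the two patterns from the question, here is a minimal Python sketch. It assumes Python's re module (the question does not name a language), and the variable names are made up for the example.

```python
import re

text = "this is a dog =*"

# Without a quantifier the class matches exactly one character before the end.
one_char = re.sub(r"[*=]$", "", text)

# With + the class matches the whole trailing run of * and = characters.
whole_run = re.sub(r"[*=]+$", "", text)

print(repr(one_char))   # 'this is a dog ='
print(repr(whole_run))  # 'this is a dog '
```

Note that the space before =* is not in the character class, so it stays in the result.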
This question already has answers here:
Python regular expression pattern * is not working as expected
(2 answers)
Closed 3 years ago.
I am using Python's re module. I don't understand why the following two calls behave differently. I expected the one with * to give the same result.
re.search(r'([0-9]+)',':329392.899')
Output: <re.Match object; span=(1, 7), match='329392'>
re.search('([0-9]*)',':329392.899')
Output: <re.Match object; span=(0, 0), match=''>
re.search will first attempt to find a match starting at the beginning of the string, and only advance the starting position when a match cannot be found there. The [0-9]* pattern does match at the beginning of the string; it just matches zero characters (* matches zero or more).
* matches zero or more of the pattern. There are zero digits at the very beginning of the input string, before the :, and that empty position is what it matches.
+ matches one or more of the pattern, so it doesn't find a match until it gets to the 3, then it matches all the digits.
* means match zero or more times, so ([0-9]*) will also match (and capture) the empty string, which is why you get span=(0, 0), match=''.
+, on the other hand, means one or more, so it cannot match the empty string.
A regex demo tool that highlights each match and its value makes this easy to see. Also, you're missing the r prefix in the second snippet (harmless here, since the pattern contains no backslashes).
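The behaviour described in the answers above can be checked directly. This is a small sketch that reuses the input string from the question; the variable names are only illustrative.

```python
import re

text = ":329392.899"

plus_match = re.search(r"([0-9]+)", text)
star_match = re.search(r"([0-9]*)", text)

# + needs at least one digit, so the match starts at the first digit.
print(plus_match.span(), repr(plus_match.group(1)))  # (1, 7) '329392'

# * is satisfied by zero digits, so it succeeds at position 0 with an empty match.
print(star_match.span(), repr(star_match.group(1)))  # (0, 0) ''

# When the string starts with a digit, * consumes the leading run of digits.
star_on_digits = re.search(r"([0-9]*)", text[1:])
print(star_on_digits.span(), repr(star_on_digits.group(1)))  # (0, 6) '329392'
```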
This question already has answers here:
Regex match everything up to first period
(3 answers)
Closed 4 years ago.
I want to remove the domain information from hostnames.
E.g. I want "server1.mydomain.com" to be just "server1".
I thought I had it with:
^(\w*)
But then I realized I also have hostnames like "desktop-1.mydomain.com", and they all got changed to "desktop" and not "desktop-1" etc.
Any suggestions how to do this?
As already mentioned by Wiktor in the comments, the easiest regular expression is
^[^.]+
The explanation from regex101.com is:
^ asserts position at start of a line
[^.]+ matches a single character not present in the list below
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
. matches the character . literally (case sensitive)
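For comparison, here is a minimal Python sketch of the same pattern next to the original ^(\w*) attempt. The helper name short_hostname is hypothetical, and falling back to the unchanged input when nothing matches is an assumption made for the example.

```python
import re

def short_hostname(fqdn: str) -> str:
    """Return everything before the first dot (hypothetical helper)."""
    match = re.search(r"^[^.]+", fqdn)
    # Assumption: if the name is empty or starts with a dot, return it unchanged.
    return match.group(0) if match else fqdn

print(short_hostname("server1.mydomain.com"))    # server1
print(short_hostname("desktop-1.mydomain.com"))  # desktop-1

# The original attempt stops at the hyphen, because - is not a word character.
print(re.search(r"^(\w*)", "desktop-1.mydomain.com").group(1))  # desktop
```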
If you are using a programming language, another possible solution is to split the string on the dot character and take the first element of the resulting array. For example:
const array1 = 'server1.mydomain.com'.split(/\./);
console.log(array1[0]);
const array2 = 'desktop-1.mydomain.com'.split(/\./);
console.log(array2[0]);
Prints:
server1
desktop-1
This question already has answers here:
Regex to match exactly n occurrences of letters and m occurrences of digits
(3 answers)
Closed 4 years ago.
I am looking for a regex that matches the following:
2 times the character 'a' and 3 times the character 'b'.
Additionally, the characters do not have to be subsequent, meaning that not only 'aabbb' and 'bbaaa' should be allowed, but also 'ababb', 'abbab' and so forth.
By the sound of it this should be an easy task, but atm I just can't wrap my head around it. Redirection to a good read is appreciated.
You need to use positive lookaheads. This is the same as the password validation problem described here.
Edit:
A positive lookahead lets you check a pattern against the string without changing where the next part of the regex starts matching. This means you can test multiple patterns at the current position of the string, and for the regex to match, all of the positive lookaheads have to match.
In your case you are looking for 2 a's and 3 b's. The regex to match exactly 2 a's anywhere in the string is /^[^a]*a[^a]*a[^a]*$/, and for exactly 3 b's it is /^[^b]*b[^b]*b[^b]*b[^b]*$/. We now need to combine these so that both must match together: /^(?=[^a]*a[^a]*a[^a]*$)(?=[^b]*b[^b]*b[^b]*b[^b]*$).*$/. This starts at the beginning of the string with the ^ anchor, then looks for exactly 2 a's followed by the end of the string. Because that check is a positive lookahead, (?= ... ), the position in the string does not move, so we are still at the start of the string when the second lookahead checks for exactly 3 b's. That is also a lookahead, so we are still at the beginning of the string, but we now know that the string contains 2 a's and 3 b's, and .*$ matches the whole string. Note that characters other than a and b are also accepted; to reject them, replace .* with [ab]*.
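To see the combined pattern in action, here is a small Python sketch. The answer writes the regex in /.../ notation; the translation to Python's re module and the names COUNT_PATTERN and has_two_a_three_b are assumptions made for this example.

```python
import re

# Two lookaheads anchored at the start: exactly 2 a's, and exactly 3 b's.
COUNT_PATTERN = re.compile(
    r"^(?=[^a]*a[^a]*a[^a]*$)(?=[^b]*b[^b]*b[^b]*b[^b]*$).*$"
)

def has_two_a_three_b(s: str) -> bool:
    return COUNT_PATTERN.search(s) is not None

for candidate in ["aabbb", "ababb", "abbab", "bbbaa", "aabb", "aaabbb", "bbaaa"]:
    print(candidate, has_two_a_three_b(candidate))
```

The first four strings are accepted and the last three are rejected. In particular, 'bbaaa' from the question contains three a's and two b's, so it fails both counts.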
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I am following some instructions for data upload. I can't figure out what the following two points mean. Does anyone have any idea?
Regexp search/replace
search: 201([0-9])([0-9])([0-9])([0-9][0-9]) ([0-9])
replace:201\1\2\3\4 \5
Regexp search/replace
replace 20110401 with whatever year month day that is being fixed
^(.{462})
\120110401
Any decent regex tutorial will help.
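() wraps a group that can be referenced later with \#, where # is the group's number. For example, \2 references the text matched by the second pair of parentheses.
[0-9] means any single digit from 0 to 9 inclusive.
^ is the left anchor (i.e., start of the string, or of a line in multi-line mode), and .{462} means any character, 462 times. So the second replacement inserts 20110401 right after the first 462 characters.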
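Both search/replace pairs can be tried out in a short Python sketch. The instructions do not say which tool they target, so using Python's re module here is an assumption, and the sample lines are invented. One Python-specific detail: the replacement \120110401 has to be written as \g<1>20110401, because Python would otherwise read the digits after \1 as part of the escape.

```python
import re

# First pair: five capture groups, referenced as \1 to \5 in the replacement.
search_1 = r"201([0-9])([0-9])([0-9])([0-9][0-9]) ([0-9])"
sample = "20110401 5"  # invented sample text
print(re.search(search_1, sample).groups())  # ('1', '0', '4', '01', '5')

# As it appears here, the replacement rebuilds the matched text unchanged.
print(re.sub(search_1, r"201\1\2\3\4 \5", sample))  # 20110401 5

# Second pair: capture the first 462 characters and insert the date after them.
line = "x" * 462 + "REST"  # invented record, 462 filler characters plus a tail
fixed = re.sub(r"^(.{462})", r"\g<1>20110401", line)
print(fixed[462:])  # 20110401REST
```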
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
I am trying to understand what the regular expression ^(\d{1,2})$ stands for in Google Sheets. A quick look around the regex sites and tools left me confused. Can anybody please help?
^ Asserts position at start of the string
( Denotes the start of a capturing group
\d A numerical digit: 0, 1, 2, ... 9.
{1,2} one to two times.
) You guessed it - Closes the group.
$ Assert position at end of the string
^ - start of a line.
(\d{1,2}) - captures up to two digits (i.e., one or two digits).
$ - End of the line.
It means at least one and at most two digits (\d{1,2}), with no other characters at the beginning (^) or the end ($). The parentheses capture the string they enclose, i.e. whatever the digits are.
^ matches the start of the line
The parens can be ignored for now.
\d{1,2} means one or two digits
$ is the end of the line.
The parens, if you need them, can be used to retrieve the digit(s) that were found in the regex.
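The explanations above can be checked with a quick Python sketch. Google Sheets uses its own regex engine, so Python's re module is only an approximation here, and the sample inputs are invented.

```python
import re

ONE_OR_TWO_DIGITS = re.compile(r"^(\d{1,2})$")

for value in ["7", "42", "123", "", "4a"]:
    match = ONE_OR_TWO_DIGITS.search(value)
    # group(1) is the text captured by the parentheses, when there is a match.
    print(repr(value), match.group(1) if match else None)
```

Only "7" and "42" match; the other inputs are too long, empty, or contain a non-digit.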