Let's say I need to match a pattern if it appears 3 or 6 times in a row. The closest I can get is something like \d{3,6}, but that also accepts runs of 4 or 5, which is not what I need.
'123' should match
'123456' should match
'1234' should not match
^(\d{3}|\d{6})$
You need some sort of terminator; otherwise \d{3} will match the first three digits of 1234. That's why I put ^ and $ above. One alternative is to use lookarounds:
(?<!\d)(\d{3}|\d{6})(?!\d)
to make sure it's not preceded by or followed by a digit (in this case). More in Lookahead and Lookbehind Zero-Width Assertions.
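To try both variants, here is a minimal sketch assuming Python's re module. The names is_three_or_six and find_runs are made up for illustration. The anchored pattern validates a whole string, while the lookaround pattern pulls standalone runs out of longer text:

```python
import re

# Anchored form: the whole string must be 3 or 6 digits.
ANCHORED = re.compile(r"^(\d{3}|\d{6})$")
# Lookaround form: a run of 3 or 6 digits with no digit on either side.
GUARDED = re.compile(r"(?<!\d)(\d{3}|\d{6})(?!\d)")

def is_three_or_six(text):
    """Return True if text consists of exactly 3 or exactly 6 digits."""
    return ANCHORED.match(text) is not None

def find_runs(text):
    """Return every standalone run of 3 or 6 digits found in text."""
    return [m.group(1) for m in GUARDED.finditer(text)]

print(is_three_or_six("123"), is_three_or_six("1234"))
print(find_runs("id 123, code 123456, bad 1234"))
```

The run 1234 is skipped entirely because every candidate start is either preceded or followed by another digit.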
How about:
(\d\d\d){1,2}
although you'll also need guards at either end which depend on your RE engine, something like:
[^\d](\d\d\d){1,2}[^\d]
or:
^(\d\d\d){1,2}$
For this case we can get away with this crafty method:
Clean Implementation
/(\d{3}){1,2}/
/(?:\d{3}){1,2}/
How?!
This works because we're looking for multiples of three that are consecutive in this case.
Note: There's no reason to capture the group in this case, so I add ?: to turn it into a non-capturing group.
This is similar to paxdiablo's implementation, but slightly cleaner.
Matching Hex
I was doing something similar for matching basic hex colors, since they can be 3 or 6 characters long. This allowed me to keep my hex color checker's matching DRY, i.e.:
/^0x(?:[\da-f]{3}){1,2}$/i
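As a rough illustration, the same hex check could be written in Python like this. The function name is_hex_color is hypothetical, and the re.ASCII flag is my own addition so that \d only means 0-9:

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the /i flag.
HEX_COLOR = re.compile(r"^0x(?:[\da-f]{3}){1,2}$", re.IGNORECASE | re.ASCII)

def is_hex_color(text):
    """Return True for 0x followed by exactly 3 or 6 hex digits."""
    return HEX_COLOR.match(text) is not None

for sample in ["0xFFF", "0xa1b2c3", "0xFFFF", "0x12"]:
    print(sample, is_hex_color(sample))
```

Because the group is repeated one or two times and the pattern is anchored, only lengths 3 and 6 get through.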
First one matches 3, 6 but also 9, 12, 15, .... Second looks right. Here's one more twist:
\d{3}(?:\d{3})?
Related
I have the following string;
Start: 738392E, 6726376N
I extracted 738392 OK using (?<=.art\:\s)([0-9A-Z]*). This gave me a single group match, allowing me to extract it as a column value.
I want to extract 6726376 the same way, with only one group appearing, because I am parsing that into a column value.
I'm not sure why (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) is giving me the entire line after S.
Helping me get it right with an explanation will go a long way.
Because you used positive lookaheads. Those just make some assertions, but don't "move the head along".
(?=(art\:\s\s*)) makes sure you're before "art: ...". The next thing is another positive lookahead that you quantify with a star to make it optional. Finally you match anything, so you get the rest of the line in your capture group.
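The difference is easy to see in a small experiment. This is only a sketch, assuming Python's re module and simplified versions of the patterns, not the exact expression from the question:

```python
import re

line = "Start: 738392E, 6726376N"

# Lookahead: asserts that "art: " comes next but consumes nothing,
# so (.*) starts at the "a" and runs to the end of the line.
ahead = re.search(r"(?=art:\s)(.*)", line)

# Lookbehind: asserts that "art: " was just passed,
# so the capture begins at the first digit.
behind = re.search(r"(?<=art:\s)(\d+)", line)

print(ahead.group(1))
print(behind.group(1))
```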
I propose a simpler regex:
(?<=(art\:\s))(\d+)\D+(\d+)
Demo
First we use a positive lookbehind that makes sure we're after "art: ", then we match two numbers separated by non-digits.
There is no need for you to make it this complicated. Just use something like
Start: (\d+)E, (\d+)N
or
\b\d+(?=[EN]\b)
if you need to match each bit separately.
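Both suggestions can be tried quickly. The following is a minimal sketch assuming Python's re module; the variable names easting and northing are my own guess at what E and N stand for:

```python
import re

line = "Start: 738392E, 6726376N"

# One pattern, two capture groups.
m = re.search(r"Start: (\d+)E, (\d+)N", line)
easting, northing = m.groups()

# One pattern applied repeatedly; each match is a single number.
parts = re.findall(r"\b\d+(?=[EN]\b)", line)

print(easting, northing)
print(parts)
```

The second form is the one that yields a single value per match, which is what a column extraction usually wants.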
Your expression (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) has several problems besides the ones already mentioned: 1) your first and second lookahead match at different locations, 2) your second lookahead is quantified, which, in 25 years, I have never seen someone do, so kudos. ;), 3) your capturing group matches just about anything, including any line or the empty string.
You match the whole part after it because you use .* which will match until the end of the line.
Note that this part [0-9]* at the end of the pattern does not match because it is optional and the preceding .* already matches until the end of the string.
You could get the match without any lookarounds:
(art:\s)(\d+)[^,]+,\s(\d+)
Regex demo
If you want the matches only, you could make use of the PyPI regex module, which supports variable-length lookbehind:
(?<=\bStart:(?:\s+\d+[A-Z],)* )\d+(?=[A-Z])
Regex demo (For example only, using a different engine) | Python demo
Regex that matches following patterns
1. mrrtjjsf8907m5q29ui
2. 0?userid=y1arx6uxb1nidmz3tguv
3. bryj9itvwjmbyv3wg8ef
I am trying to pass these values to another variable using col=?([a-zA-Z0-9]{1,20})|([a-zA-Z0-9]{1,20}). It takes the right values for the first and third inputs, but for the second one it takes the value 0 when it should take y1arx6uxb1nidmz3tguv.
I think you need to use this regex instead:
[a-zA-Z](?=[A-Za-z]*\d)[a-zA-Z0-9]{1,19}
Demo
This makes sure that the run of characters you match is built only of letters and digits and includes a digit after its leading letters, so 0 and userid are not considered matches!
Alternative:
However, the above does not handle a valid sequence that starts with a digit instead of a letter. In that case you can use the following regex, which handles both situations:
(?:[a-zA-Z](?=[A-Za-z]*\d)|\d(?=\d*[A-Za-z]))[a-zA-Z0-9]{1,19}
Demo 2
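Here is a small sketch of the alternative pattern run against the three inputs from the question, assuming Python's re module. The helper name extract_token is made up for this example:

```python
import re

# Starts with a letter that is eventually followed by a digit,
# or with a digit that is eventually followed by a letter.
TOKEN = re.compile(
    r"(?:[a-zA-Z](?=[A-Za-z]*\d)|\d(?=\d*[A-Za-z]))[a-zA-Z0-9]{1,19}"
)

def extract_token(text):
    """Return the first mixed letter/digit token in text, or None."""
    m = TOKEN.search(text)
    return m.group(0) if m else None

for sample in [
    "mrrtjjsf8907m5q29ui",
    "0?userid=y1arx6uxb1nidmz3tguv",
    "bryj9itvwjmbyv3wg8ef",
]:
    print(extract_token(sample))
```

In the second input the leading 0 is rejected because no letter follows it directly, and userid is rejected because = comes before any digit.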
It appears you do not want to match anything before the equals sign =. You can use the end-of-line anchor $ to ensure that the match is the run of your characters at the end of the string, so anything before a non-matching character is left out.
([a-zA-Z0-9]{1,20})$
DEMO
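The anchored version can be checked the same way. This is only a sketch, assuming Python's re module and a hypothetical helper named tail_token:

```python
import re

# Up to 20 letters/digits that run all the way to the end of the string.
TAIL = re.compile(r"([a-zA-Z0-9]{1,20})$")

def tail_token(text):
    """Return the trailing alphanumeric run of text, or None."""
    m = TAIL.search(text)
    return m.group(1) if m else None

print(tail_token("0?userid=y1arx6uxb1nidmz3tguv"))
print(tail_token("mrrtjjsf8907m5q29ui"))
```

Note that this relies on the wanted value always being the last thing in the string.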
I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period anywhere in the string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for, is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid prices. The comma (,) is not mandatory in this regex, but it will be matched when present.
Look here: http://www.regextester.com/?fam=98001
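A quick way to check the pattern against the inputs from the question is sketched below, assuming Python's re module. The name is_valid_price is made up for illustration. Note that this pattern drops the optional leading minus sign from the original attempt:

```python
import re

PRICE = re.compile(r"^\d+(,\d{3})*(\.\d{1,2})?$")

def is_valid_price(text):
    """Return True if text looks like 12, 12.5, 12,500 or 12,500.00."""
    return PRICE.match(text) is not None

for sample in ["12", "12.5", "12.50", "12,500", "12,500.00",
               "12..50", "12.5.0", "12.0."]:
    print(sample, is_valid_price(sample))
```

The trailing $ is what rejects 12..50, 12.5.0 and 12.0. as a whole instead of settling for the leading 12.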
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad element [1] with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
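The conversion described above uses PHP's str_pad(); the same steps could look roughly like this in Python. This is a sketch that assumes the input has already been validated and has at most two decimal places, and price_to_cents is a hypothetical name:

```python
def price_to_cents(price):
    """Convert an already validated price string to an integer amount.

    Assumes at most two digits after the dot (not checked here).
    """
    # Remove thousands separators, then split at the dot.
    whole, _, fraction = price.replace(",", "").partition(".")
    # Right-pad the fractional part with zeros to two digits.
    return int(whole + fraction.ljust(2, "0"))

print(price_to_cents("1,234.56"))
print(price_to_cents("1,234"))
print(price_to_cents("12.5"))
```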
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$
I am implementing the following problem in Ruby.
Here's the pattern that I want:
1234, 1324, 1432, 1423, 2341 and so on
i.e. the digits in the four-digit number should be in the range [1-4] and should not repeat.
To make this easier to understand, take a two-digit pattern,
for which the solution should be:
12, 21
i.e. the digits should be either 1 or 2 and should not repeat.
To make sure that they do not repeat, I want to use $1 in the condition for my second digit, but it's not working.
Please help me out, and thanks in advance.
You can use this (see on rubular.com):
^(?=[1-4]{4}$)(?!.*(.).*\1).*$
The first assertion ensures that it's ^[1-4]{4}$, the second assertion is a negative lookahead that ensures that you can't match .*(.).*\1, i.e. a repeated character. The first assertion is "cheaper", so you want to do that first.
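The question is about Ruby, but the pattern uses nothing Ruby-specific, so here is a sketch of a sanity check assuming Python's re module. The name is_permutation is made up; the check compares what the regex accepts against the 24 permutations generated by itertools:

```python
import re
from itertools import permutations, product

PERMUTATION = re.compile(r"^(?=[1-4]{4}$)(?!.*(.).*\1).*$")

def is_permutation(text):
    """Return True if text is four distinct digits from 1 to 4."""
    return PERMUTATION.match(text) is not None

# Every 4-character string over the digits 1-4 (256 candidates).
candidates = ["".join(c) for c in product("1234", repeat=4)]
accepted = {c for c in candidates if is_permutation(c)}
expected = {"".join(p) for p in permutations("1234")}

print(len(accepted), accepted == expected)
```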
References
regular-expressions.info/Lookarounds and Backreferences
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
Just for a giggle, here's another option:
^(?:1()|2()|3()|4()){4}\1\2\3\4$
As each unique character is consumed, the capturing group following it captures an empty string. The backreferences also try to match empty strings, so if one of them doesn't succeed, it can only mean the associated group didn't participate in the match. And that will only happen if the string contains at least one duplicate.
This behavior of empty capturing groups and backreferences is not officially supported in any regex flavor, so caveat emptor. But it works in most of them, including Ruby.
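For what it's worth, the trick can be tried outside Ruby as well. The sketch below assumes Python's re module, where (as far as I know) a backreference to a group that never participated simply fails to match. The name all_four_once is hypothetical:

```python
import re
from itertools import permutations

# Each alternative leaves an empty capture behind as a "seen" marker;
# the trailing backreferences require all four markers to be set.
ALL_FOUR = re.compile(r"^(?:1()|2()|3()|4()){4}\1\2\3\4$")

def all_four_once(text):
    """Return True if text uses each of 1, 2, 3, 4 exactly once."""
    return ALL_FOUR.match(text) is not None

print(all_four_once("3142"))
print(all_four_once("1123"))
```

With 1123, the group after 4 never participates, so \4 cannot match and the whole pattern fails.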
I think this solution is a bit simpler
^(?:([1-4])(?!.*\1)){4}$
See it here on Rubular
^ # matches the start of the string
(?: # open a non capturing group
([1-4]) # The allowed characters; the character found is captured in group 1
(?!.*\1) # That character is matched only if it does not occur once more
){4} # Defines the amount of characters
$
(?!.*\1) is a lookahead assertion, to ensure the character is not repeated.
^ and $ are anchors to match the start and the end of the string.
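Since the question also mentions the two-digit case, here is a sketch of how this pattern could be generalized, assuming Python's re module. The helper distinct_digits_pattern is made up for illustration and only makes sense for n from 1 to 9:

```python
import re

def distinct_digits_pattern(n):
    """Build the pattern for n distinct digits drawn from 1..n (1 <= n <= 9)."""
    # For n = 4 this produces ^(?:([1-4])(?!.*\1)){4}$
    return re.compile(rf"^(?:([1-{n}])(?!.*\1)){{{n}}}$")

two = distinct_digits_pattern(2)
four = distinct_digits_pattern(4)

print([s for s in ["12", "21", "11", "22"] if two.match(s)])
print(bool(four.match("1432")), bool(four.match("1224")))
```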
While the previous answers solve the problem, they aren't as generic as they could be, and don't allow for repeated symbols in the initial set, for example {a,a,b,b,c,c}. After asking a similar question on Perl Monks, the following solution was given by Eily:
^(?:(?!\1)a()|(?!\2)a()|(?!\3)b()|(?!\4)b()|(?!\5)c()|(?!\6)c()){6}$
Similarly, this works for longer "symbols" in a string, and for variable length symbols too.