I am trying to solve the following problem in Ruby.
Here's the pattern that I want:
1234, 1324, 1432, 1423, 2341 and so on
i.e. each digit in the four-digit number should be in the range [1-4], and no digit should be repeated.
To keep it simple, take a two-digit pattern;
the solution should be:
12, 21
i.e. the digits should be either 1 or 2 and should not repeat.
To make sure that they do not repeat, I want to use $1 in the condition for my second digit, but it's not working.
Please help me out, and thanks in advance.
You can use this (see on rubular.com):
^(?=[1-4]{4}$)(?!.*(.).*\1).*$
The first assertion ensures that it's ^[1-4]{4}$, the second assertion is a negative lookahead that ensures that you can't match .*(.).*\1, i.e. a repeated character. The first assertion is "cheaper", so you want to do that first.
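As a quick sanity check, here is a minimal Python sketch of the same two-lookahead pattern. The question is about Ruby, but this particular syntax is also accepted by Python's re module. The names PERMUTATION_RE and is_permutation are made up for illustration.

```python
import re

# Hypothetical name; the pattern itself is the one given in the answer.
PERMUTATION_RE = re.compile(r"^(?=[1-4]{4}$)(?!.*(.).*\1).*$")

def is_permutation(text):
    """Return True if text is exactly four digits from 1-4, none repeated."""
    return PERMUTATION_RE.match(text) is not None

for candidate in ["1234", "1324", "2341", "1123", "1235", "123"]:
    print(candidate, is_permutation(candidate))
```

The first three candidates should be accepted. The last three should be rejected: one repeats a digit, one contains a digit outside 1-4, and one is too short.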
References
regular-expressions.info/Lookarounds and Backreferences
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
Just for a giggle, here's another option:
^(?:1()|2()|3()|4()){4}\1\2\3\4$
As each unique character is consumed, the capturing group following it captures an empty string. The backreferences also try to match empty strings, so if one of them doesn't succeed, it can only mean the associated group didn't participate in the match. And that will only happen if the string contains at least one duplicate.
This behavior of empty capturing groups and backreferences is not officially supported in any regex flavor, so caveat emptor. But it works in most of them, including Ruby.
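A small sketch of the empty-group trick, written in Python rather than Ruby. It relies on Python's re module failing a backreference to a group that did not participate in the match, which is the behaviour the trick needs. EMPTY_GROUP_RE and uses_each_digit_once are illustrative names only.

```python
import re

# Each alternative pairs a digit with an empty capturing group.
# After four digits, \1 \2 \3 \4 succeed only if all four groups took part.
EMPTY_GROUP_RE = re.compile(r"^(?:1()|2()|3()|4()){4}\1\2\3\4$")

def uses_each_digit_once(text):
    """Return True if text contains 1, 2, 3 and 4 exactly once each."""
    return EMPTY_GROUP_RE.match(text) is not None

for candidate in ["1234", "4321", "1123", "4411"]:
    print(candidate, uses_each_digit_once(candidate))
```

Since the answer itself warns that this behaviour is not officially guaranteed, treat the sketch as a demonstration rather than something to depend on.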
I think this solution is a bit simpler
^(?:([1-4])(?!.*\1)){4}$
See it here on Rubular
^           # matches the start of the string
(?:         # open a non-capturing group
([1-4])     # one allowed character, captured in group 1
(?!.*\1)    # succeeds only if that character does not occur again later
){4}        # repeat the group exactly four times
$           # matches the end of the string
(?!.*\1) is a lookahead assertion, to ensure the character is not repeated.
^ and $ are anchors to match the start and the end of the string.
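One way to convince yourself that this pattern accepts exactly the permutations of 1234 is to test it against every four-character string over those digits. The sketch below does that in Python (the pattern syntax is the same as in Ruby). UNIQUE_RE, expected and accepted are illustrative names.

```python
import re
from itertools import permutations, product

UNIQUE_RE = re.compile(r"^(?:([1-4])(?!.*\1)){4}$")

# All 24 orderings of the digits 1-4.
expected = {"".join(p) for p in permutations("1234")}

# Every 4-character string over 1-4 (256 of them) that the regex accepts.
accepted = {
    "".join(chars)
    for chars in product("1234", repeat=4)
    if UNIQUE_RE.match("".join(chars))
}

print(len(accepted), accepted == expected)
```

If the two sets are equal, the regex accepts every permutation and nothing else among the 256 candidates.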
While the previous answers solve the problem, they aren't as generic as they could be, and they don't allow for repeated symbols in the initial set, for example {a,a,b,b,c,c}. After asking a similar question on Perl Monks, the following solution was given by Eily:
^(?:(?!\1)a()|(?!\2)a()|(?!\3)b()|(?!\4)b()|(?!\5)c()|(?!\6)c()){6}$
Similarly, this works for longer "symbols" in a string, and for variable length symbols too.
I'm trying to improve my regex skills.
I can't manage this exercise:
https://alf.nu/RegexGolf
You have to match words without consecutive identical characters.
To make it clear, we should avoid patterns like abba, or baab, czzc.
The only way I see is to use capture groups:
([a-z])([a-z])\2\1
Then have a negative lookahead:
(?!([a-z])([a-z])\2\1)
But on the site it doesn't work since it doesn't match anything.
Any advice?
Thank you
Use a negative lookahead:
^(?:(.)(?!\1))*$
Explanation:
^ from the start of the input
(?:
(.) match AND capture a single character
(?!\1) then assert that what follows is a different character (not the same)
)* match zero or more such matching characters
$ end of the input
Demo
Another, possibly cleaner, way to do this would be to just have a global negative lookahead at the very start of the pattern:
^(?!.*(.)\1).*$
This asserts, at the very beginning, that no character is immediately followed by a copy of itself anywhere in the string.
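The two patterns in this answer should agree with each other. Here is a minimal Python sketch that runs both over a few sample words. The word list and the names STEPWISE_RE, GLOBAL_RE and results are made up for illustration.

```python
import re

# Character-by-character check: each character must not be followed by itself.
STEPWISE_RE = re.compile(r"^(?:(.)(?!\1))*$")
# Single up-front check: no doubled character anywhere in the string.
GLOBAL_RE = re.compile(r"^(?!.*(.)\1).*$")

words = ["abcabc", "letter", "aab", "a", ""]
results = {
    word: (bool(STEPWISE_RE.match(word)), bool(GLOBAL_RE.match(word)))
    for word in words
}

for word, verdicts in results.items():
    print(repr(word), verdicts)
```

Both patterns reject any word with a doubled letter. That is stricter than rejecting only mirrored patterns such as abba.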
^(?!cr|pal|tar)[a-z]{1,4}([a-z])\1[a-z]{0,5}$
This worked for me on the site you linked. I guess we had to match words containing a doubled letter, but there were some exceptions, for which I had to use a negative lookahead at the beginning. I used ([a-z])\1 to match the doubled letter, surrounded by a bounded number of other letters. Hope this helps!
Attached the screenshot for reference.
https://i.stack.imgur.com/va1Uq.png
Thanks to Tim Biegeleisen, here is the answer.
^(?!.*(.)(.)\2\1).*$
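For illustration, this is how the final pattern behaves on the examples from the question, sketched in Python. The sample words other than abba, baab and czzc are my own, and has_no_mirror is a hypothetical helper name.

```python
import re

# Reject any string containing the four-character shape X Y Y X.
NO_MIRROR_RE = re.compile(r"^(?!.*(.)(.)\2\1).*$")

def has_no_mirror(word):
    """Return True if word contains no 'abba'-style sequence."""
    return NO_MIRROR_RE.match(word) is not None

for word in ["abba", "baab", "czzc", "hello", "abab", "banana"]:
    print(word, has_no_mirror(word))
```

Note that a word like "hello" passes: a doubled letter alone is allowed, and only the mirrored four-character shape is rejected.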
I have the following string:
Start: 738392E, 6726376N
I extracted 738392 fine using (?<=.art\:\s)([0-9A-Z]*). This gave me a one-group match, allowing me to extract it as a column value.
I want to extract 6726376 the same way, with only one group appearing, because I am parsing that into a column value.
I'm not sure why (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) is giving me the entire line after S.
Helping me get it right, with an explanation, will go a long way.
Because you used positive lookaheads. Those just make some assertions, but don't "move the head along".
(?=(art\:\s\s*)) makes sure you're before "art: ...". The next thing is another positive lookahead that you quantify with a star to make it optional. Finally you match anything, so you get the rest of the line in your capture group.
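The zero-width nature of lookarounds can be seen in a small Python sketch. It uses simplified patterns, not the exact one from the question, and the variable names are illustrative.

```python
import re

line = "Start: 738392E, 6726376N"

# The lookahead only checks that "art: " comes next; it consumes nothing,
# so (.*) starts capturing at the position where the lookahead was tested.
lookahead = re.search(r"(?=art:\s)(.*)", line)

# A lookbehind asserts that "art: " lies just before the current position,
# so the capture starts after it.
lookbehind = re.search(r"(?<=art:\s)(\d+)", line)

print(lookahead.start(), repr(lookahead.group(1)))
print(lookbehind.start(), repr(lookbehind.group(1)))
```

With the lookahead, the capture begins at "art: ..." itself. With the lookbehind, it begins at the first digit.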
I propose a simpler regex:
(?<=(art\:\s))(\d+)\D+(\d+)
Demo
First we use a positive lookbehind that makes sure we're after "art: ", then we match two numbers separated by non-digits.
There is no need for you to make it this complicated. Just use something like
Start: (\d+)E, (\d+)N
or
\b\d+(?=[EN]\b)
if you need to match each bit separately.
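Both suggestions can be tried out with Python's re module, as in the sketch below. The variable names easting and northing are my guess at what the E and N suffixes stand for, not something stated in the question.

```python
import re

line = "Start: 738392E, 6726376N"

# Option 1: one match with two capture groups.
whole = re.search(r"Start: (\d+)E, (\d+)N", line)
easting, northing = whole.groups()  # hypothetical names for the two values

# Option 2: match each number separately; the lookahead keeps the
# trailing letter out of the match.
pieces = re.findall(r"\b\d+(?=[EN]\b)", line)

print(easting, northing)
print(pieces)
```

The second form returns each number as its own match with no capture groups, which may suit a tool that maps one match to one column value.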
Your expression (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) has several problems besides the ones already mentioned: 1) your first and second lookahead match at different locations, 2) your second lookahead is quantified, which, in 25 years, I have never seen someone do, so kudos. ;), 3) your capturing group matches about anything, including any line or the empty string.
You match the whole part after it because you use .* which will match until the end of the line.
Note that this part [0-9]* at the end of the pattern does not match because it is optional and the preceding .* already matches until the end of the string.
You could get the match without any lookarounds:
(art:\s)(\d+)[^,]+,\s(\d+)
Regex demo
If you want the matches only, you could use the PyPI regex module, which supports variable-length lookbehind:
(?<=\bStart:(?:\s+\d+[A-Z],)* )\d+(?=[A-Z])
Regex demo (For example only, using a different engine) | Python demo
I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period anywhere in the string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid prices. The comma (,) is not mandatory in this regex, but it will be matched if present.
Look here: http://www.regextester.com/?fam=98001
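To check the pattern against the examples from the question, here is a minimal sketch in Python. The question does not say which language is in use, so Python is an assumption. PRICE_RE and the two lists are illustrative names.

```python
import re

PRICE_RE = re.compile(r"^\d+(,\d{3})*(\.\d{1,2})?$")

accepted = ["12", "12.5", "12.50", "12,500", "12,500.00"]
rejected = ["12..50", "12.5.0", "12.0."]

for text in accepted + rejected:
    print(text, bool(PRICE_RE.match(text)))
```

Unlike the pattern in the question, this one does not accept a leading minus sign. It also requires exactly three digits after each comma.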
If you work with prices and want to store them in a database, I recommend saving them as an INT number of cents. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad the fractional part ([1]) with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
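The conversion described here can be sketched as follows. The answer refers to PHP's str_pad(); this sketch swaps in Python's str.ljust() for the padding step, and price_to_cents is a hypothetical helper name.

```python
import re

# Same pattern as in the answer; fullmatch() replaces the ^ and $ anchors.
# re.ASCII restricts \d to 0-9.
PRICE_RE = re.compile(r"\d+(,\d{3})*(\.\d{1,2})?", re.ASCII)

def price_to_cents(text):
    """Return the price as an integer number of cents, or None if invalid."""
    if PRICE_RE.fullmatch(text) is None:
        return None
    # Drop the thousands separators, then split at the decimal point.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fractional part to two digits: "5" -> "50", "" -> "00".
    return int(whole) * 100 + int(fraction.ljust(2, "0"))

for text in ["1,234.56", "1,234", "12.5", "12..50"]:
    print(text, price_to_cents(text))
```

Working in integer cents avoids floating-point rounding when prices are added or compared.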
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring at most one period, like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$
I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern. I'm matching the inverse so that I can throw an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside to this pattern is that it works for all cases except a hyphen in the middle of the String will still pass. For example:
"-2304923" is allowed correctly but "9234-342" is also allowed and shouldn't be.
Please let me know what I can do to specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!
This regex will work for you:
^-?\d+$
Explanation: ^ anchors the start of the string, then comes an optional (?) hyphen -, then the digit \d repeated one or more times (+), and the string must end there ($).
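A minimal sketch of this pattern applied to the strings from the question, written in Python. The question's own language is not stated, so this is an assumption. is_signed_integer is a hypothetical helper name, and re.ASCII is added so that \d means 0-9 only.

```python
import re

SIGNED_INT_RE = re.compile(r"^-?\d+$", re.ASCII)

def is_signed_integer(text):
    """Return True for digits with at most one leading hyphen."""
    return SIGNED_INT_RE.match(text) is not None

for text in ["-2304923", "2304923", "9234-342", "--5", "-"]:
    print(repr(text), is_signed_integer(text))
```

Matching the valid form directly and negating the result tends to be simpler than writing a pattern for every invalid form.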
You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
^^^^^^^^                     non-capturing group for start of line or space
        ^^^^^^^              capture number
               ^^^^^^^^^^^^  non-capturing group for end of line, space or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
-2304923" "9234-342" 1234 -1234
++++++++                         captured
           ^^^^^^^^              NOT captured
                     ++++        captured
                          +++++  captured
I don't understand how your pattern - [^-0-9] is matching those strings you are talking about. That pattern is just the opposite of what you want. You have simply negated the character class by using caret(^) at the beginning. So, this pattern would match anything except the hyphen and the digits.
Anyway, for your requirement, you first need to match an optional hyphen at the beginning. So keep it outside the character class and follow it with ?. Then, to match the digits that come after it, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-?[0-9]+ // or -?\d+
The above regex finds the pattern anywhere inside a larger string. If you want the entire string to match this pattern, add anchors at both ends of the regex:
^-?[0-9]+$
For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])
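Here is a sketch of how this inverse pattern could drive the "throw on invalid input" approach from the question. It is written in Python, with ValueError standing in for whatever exception the original code throws. check_number is a hypothetical name.

```python
import re

# Matches when the input is INVALID: either the first character is neither
# a hyphen nor a digit, or some later character is not a digit.
INVALID_RE = re.compile(r"(^[^-0-9]|^.+?[^0-9])")

def check_number(text):
    """Return text unchanged, or raise ValueError if it breaks the rules."""
    if INVALID_RE.search(text):
        raise ValueError(f"not a valid number: {text!r}")
    return text

print(check_number("-2304923"))
try:
    check_number("9234-342")
except ValueError as error:
    print(error)
```

One caveat: a lone "-" and the empty string slip through this check, because neither alternative can match them. They would need a separate test.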
I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
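So if your regex were /f(.*)\1/ and the text were "fab", it would match just the "f": the parentheses capture the empty string between the 'f' and the 'a', and \1 then re-matches that empty string.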
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
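The empty match can be reproduced with Python's re module, as sketched below. Python uses a backtracking engine, not the POSIX regexec() from the question, so this only illustrates the same idea. The variable names are illustrative.

```python
import re

text = "blabl"

# (.*) backtracks down to the empty string, which \1 then trivially repeats.
star = re.search(r"(.*)\1", text)
print(star.span(), repr(star.group(1)))

# With .+ the group needs at least one character, and "blabl" has no
# substring that is immediately repeated, so there is no match at all.
plus = re.search(r"(.+)\1", text)
print(plus)

# In "foo" the doubled "o" gives .+ something to repeat.
doubled = re.search(r"(.+)\1", "foo")
print(repr(doubled.group(0)))
```

The first search succeeds at offset 0 with a zero-length match, which corresponds to the successful but empty result described in the question.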
\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
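The difference between the two forms can be checked with a short Python sketch. The sample strings are the ones from this answer, and the variable names are illustrative.

```python
import re

immediate = re.compile(r"(.+)\1")    # the repeat must follow directly
later = re.compile(r"(.+).*\1")      # the repeat may come after other text

first = immediate.search("BLABLA")
second = later.search("BLAahemBLA")
third = immediate.search("BLAahemBLA")

print(first.group(0), first.group(1))
print(second.group(0), second.group(1))
print(third)
```

In both successful cases group 1 holds "BLA". The immediate form finds nothing in "BLAahemBLA" because no substring there is directly followed by a copy of itself.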