Regex, avoid matching consecutive characters - regex

I'm trying to improve my regex skills.
I can't manage this exercise.
https://alf.nu/RegexGolf
You have to match words without consecutive identical characters.
To be clear, we should avoid patterns like abba, baab, or czzc.
The only way I see is to use capture groups:
([a-z])([a-z])\2\1
Then have a negative lookahead:
(?!([a-z])([a-z])\2\1)
But on the site it doesn't work since it doesn't match anything.
Any advice?
Thank you

Use a negative lookahead:
^(?:(.)(?!\1))*$
Explanation:
^ from the start of the input
(?:
(.) match AND capture a single character
(?!\1) then assert that what follows is a different character (not the same)
)* match zero or more such matching characters
$ end of the input
Another, possibly cleaner, way to do this would be to just have a global negative lookahead at the very start of the pattern:
^(?!.*(.)\1).*$
This asserts, once at the very start, that no character is immediately followed by itself anywhere in the string.
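As a quick check, here is a minimal sketch comparing the two patterns. It assumes Python's re module (the question does not name an engine), and the sample words and the helper name has_no_adjacent_repeat are made up for illustration.

```python
import re

# Pattern 1: every character must be followed by a different character.
STEPWISE = re.compile(r'^(?:(.)(?!\1))*$')
# Pattern 2: a single lookahead rejects any doubled character anywhere in the string.
GLOBAL = re.compile(r'^(?!.*(.)\1).*$')


def has_no_adjacent_repeat(word: str, pattern: re.Pattern = GLOBAL) -> bool:
    """Return True when no character in `word` is immediately repeated."""
    return pattern.match(word) is not None


samples = ["abc", "abab", "aab", "abba", ""]
for word in samples:
    # Print the verdict of both patterns side by side.
    print(repr(word),
          has_no_adjacent_repeat(word, STEPWISE),
          has_no_adjacent_repeat(word, GLOBAL))
```

Both patterns should agree on every input; the second is usually easier to read because the whole condition sits in one lookahead.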

^(?!cr|pal|tar)[a-z]{1,4}([a-z])\1[a-z]{0,5}$
This worked for me on the site you linked. I assumed the task was to match words that contain consecutive identical letters, with a few exceptions that I excluded with a negative lookahead at the beginning. ([a-z])\1 matches the doubled letter, and [a-z]{1,4} and [a-z]{0,5} allow a bounded number of other letters before and after it. Hope this helps!
Attached the screenshot for reference.
https://i.stack.imgur.com/va1Uq.png

Thanks to Tim Biegeleisen, here is the answer.
^(?!.*(.)(.)\2\1).*$
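A minimal sketch of how this final pattern behaves, assuming Python's re module; the helper name is_abba_free and the word list are illustrative only.

```python
import re

# Reject any string containing an "abba"-shaped run: x, y, then y, x.
NO_ABBA = re.compile(r'^(?!.*(.)(.)\2\1).*$')


def is_abba_free(word: str) -> bool:
    """Return True when `word` contains no xyyx sequence."""
    return NO_ABBA.match(word) is not None


for word in ["abba", "baab", "czzc", "abab", "banana", "aaaa"]:
    print(word, is_abba_free(word))
```

Note that the two groups are allowed to capture the same character, so a run such as aaaa is rejected as well.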

Related

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match any line of input that has at least one word repeated two or more times.
Here is how far I got:
/(\b\w+\b).*\1
but it is wrong because the back-reference can match a single character inside another word, not a whole word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
So I tried (\b\w+\b)(\b\w+\b)*\1
but it does not work at all.
Can someone help?
Thanks.
This should work:
(\b\w+\b).*\b\1\b
Greedy matching ensures the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. So it's the same as
\b(\w+)\b.*\b\1\b
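A minimal sketch of this pattern in use, assuming Python's re module; first_repeated_word is a hypothetical helper name.

```python
import re
from typing import Optional

# A whole word, anything in between, then the same whole word again.
REPEATED_WORD = re.compile(r'\b(\w+)\b.*\b\1\b')


def first_repeated_word(line: str) -> Optional[str]:
    """Return the first word that occurs again later in `line`, or None."""
    match = REPEATED_WORD.search(line)
    return match.group(1) if match else None


print(first_repeated_word("i might be ill"))       # "i" inside "ill" is not a whole word
print(first_repeated_word("the cat saw the dog"))  # "the" appears twice as a word
```

The trailing \b\1\b is what stops the back-reference from matching inside a longer word such as ill.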
Positive lookahead is not a must here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) looks ahead to check whether the captured word occurs again later in the line. If it does, a match is found.
Note: this produces duplicate results, because a word that has already matched will match again at each later occurrence that is itself followed by another one.
You will have to de-duplicate the results in code.
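The de-duplication step can be sketched as follows, assuming Python's re module; repeated_words and the sample line are illustrative only.

```python
import re

# A word that is followed, somewhere later on the line, by the same word.
WORD_SEEN_AGAIN = re.compile(r'(\b[A-Za-z]+\b)(?=.*\b\1\b)')


def repeated_words(line: str) -> list:
    """Return each repeated word once, in order of first appearance."""
    hits = [m.group(1) for m in WORD_SEEN_AGAIN.finditer(line)]
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(hits))


line = "the cat and the dog and the bird"
raw_hits = [m.group(1) for m in WORD_SEEN_AGAIN.finditer(line)]
print(raw_hits)              # the raw matches contain "the" more than once
print(repeated_words(line))  # each repeated word listed once
```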

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks
You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
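A minimal sketch applying this pattern to the six titles from the question, assuming Python's re module; the variable names are illustrative.

```python
import re

# Reject titles containing "nice", "just-test" or "just-tests"; require "themeA".
WANTED = re.compile(r'^(?!.*?(?:just-tests?|nice)).*?themeA.*$')

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

kept = [title for title in titles if WANTED.match(title)]
print(kept)
```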
Option 1
You can use a single regex with lookaheads (see online demo):
^(?!.*nice)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string).
The (?!.*nice) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by nice. (Writing nice? here would make the e optional and would also reject titles containing nic, such as botanical-names-themeA.)
The (?!.*just-tests?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by just-test and an optional s.
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
This one doesn't use lookaheads but just skips the unwanted titles. See demo.
Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
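The two-regex approach might look like this in Python (an assumption, since the answer uses bare /.../ literals); is_wanted is a hypothetical helper name and the titles are the ones from the question.

```python
import re

HAS_THEME = re.compile(r'themeA')            # what the title must contain
UNWANTED = re.compile(r'nice|just-tests?')   # what the title must not contain


def is_wanted(title: str) -> bool:
    """Keep a title only if it has themeA and none of the unwanted words."""
    return bool(HAS_THEME.search(title)) and not UNWANTED.search(title)


titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

print([title for title in titles if is_wanted(title)])
```

Each condition can then be changed or tested on its own, which is the maintainability benefit the answer describes.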

Regex to match number in #define statement

I have a line like this:
#define PROG_HWNR "36084"
or this:
#define PROG_HWNR "#37595"
I'd like to extract the number (and increase it, but that's not the matter here)
I wrote a regex, but it's not working (at least in http://gskinner.com/RegExr/ )
(?<="#?)(.*?)(?=")
I also tried variations like
(?<=("#?))(.*?)(?=")
or
(?<=("|"#)))(.*?)(?=")
But no success. The problem is, that I want to match only the number, no matter if there is a # or not ...
Can you point me in the right direction? Thanks!!
Try this regex:
"#?(\d+)"$
It will match:
" a quote
#? optional hash
( (start capturing)
\d+ one or more digits
) (stop capturing)
" a quote
$ anchor to end
Here is a JSFiddle, and here is a RegExr
The problem is the variable length of the lookbehind. Only a few regex engines can deal with this. Because there are only two possible lookbehinds (including the # or not), you can expand that into two lookbehinds, with a (?!#) guard on the bare-quote branch so the match cannot start on the hash itself:
(?:(?<="#)|(?<=")(?!#)).*?(?=")
Note that you don't need to capture the .*? if you use lookarounds, as they are excluded from the match anyway. Also, a better way than using non-greedy .*? is to use a greedy expression that can never go past the ending delimiter:
(?:(?<="#)|(?<=")(?!#))[^"]*(?=")
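A minimal sketch of the lookbehind issue, assuming Python's re module (which, like many engines, only accepts fixed-width lookbehinds; the question itself used RegExr). The (?!#) guard on the bare-quote branch keeps the hash out of the match. Variable names are illustrative.

```python
import re

lines = ['#define PROG_HWNR "36084"', '#define PROG_HWNR "#37595"']

# The variable-length lookbehind from the question is rejected at compile time.
try:
    re.compile(r'(?<="#?)(.*?)(?=")')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Two fixed-width lookbehinds: one for the quote-plus-hash case, one for the bare quote.
SPLIT_LOOKBEHIND = re.compile(r'(?:(?<="#)|(?<=")(?!#))[^"]*(?=")')

numbers = [SPLIT_LOOKBEHIND.search(line).group(0) for line in lines]
print(variable_width_ok, numbers)
```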
Alternatively (if you can access captured submatches), you can use a capturing approach and get rid of the lookarounds:
"#?([^"]*)"
Try this:
^#define \w+ "#?(\d+)"$
That will match the whole line, with the first/single group being the number you are looking for.
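Since the question also mentions increasing the number, here is a minimal sketch of both steps, assuming Python's re module. The extra capture groups in REWRITE and the helper names extract and bump are additions for illustration, not part of the answer's pattern.

```python
import re
from typing import Optional

# The answer's pattern: group 1 is the number.
DEFINE_LINE = re.compile(r'^#define \w+ "#?(\d+)"$')
# Same pattern with prefix and suffix captured so the line can be rebuilt (assumed variant).
REWRITE = re.compile(r'^(#define \w+ "#?)(\d+)(")$')


def extract(line: str) -> Optional[int]:
    """Return the quoted number as an int, or None if the line does not match."""
    match = DEFINE_LINE.match(line)
    return int(match.group(1)) if match else None


def bump(line: str) -> str:
    """Return `line` with the number increased by one; other lines pass through.

    Leading zeros are not preserved, because the digits go through int().
    """
    return REWRITE.sub(
        lambda m: f"{m.group(1)}{int(m.group(2)) + 1}{m.group(3)}", line
    )


print(extract('#define PROG_HWNR "#37595"'))
print(bump('#define PROG_HWNR "36084"'))
```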
This is actually pretty basic regex functionality: match an optional character (?) and match a group of characters (the parentheses).
You can go even simpler:
\d+
will match a string of digits. Only the digits. And ignore the rest of the input string.
Use this tool for testing this stuff, I found it pretty handy: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Matching double symbol

Hey guys I've been working with this one for a little while. I can't seem to get it.
Here is what I have so far
(#[^{2,}+)([^(\s\W\d{2}]+)(\b)
http://rubular.com/r/zlx3j00Wjl
However, this does not accept periods in the match.
I basically need to match this.
#function.name(param)
I just need to match function.name. This does that.
http://rubular.com/r/hWMB72LsWT
I don't want to match this
##function.name(param)
hello##test.com
Didn't know if anyone has any ideas. Thanks for the help.
You can use a negative lookahead: #(?!#) matches a # not followed by another #.
Here is my go at it (here it is on Rubular):
(?<!#)#(\w+(?:\.\w+)*)\([^)]*\)
Explained:
(?<!#)# # an '#' not preceded by an '#'
(\w+(?:\.\w+)*) # any number of xxx.xxx.xxx, captured into a group
\([^)]*\) # brackets, containing anything that isn't a closing bracket
Since this is Ruby, you might not care about matching parentheses. In that case you can just remove the last section.
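A minimal sketch of this pattern against the inputs from the question. It is written in Python rather than Ruby (an assumption made only for this example; the lookbehind syntax is the same in both), and function_names is a hypothetical helper.

```python
import re

# A single '#' (not preceded by another '#'), a dotted name, then a bracketed argument list.
CALL = re.compile(r'(?<!#)#(\w+(?:\.\w+)*)\([^)]*\)')


def function_names(text: str) -> list:
    """Return every dotted name that follows a single '#' and precedes '(...)'."""
    return CALL.findall(text)


for sample in ["#function.name(param)", "##function.name(param)", "hello##test.com"]:
    print(sample, function_names(sample))
```

In the doubled case the first # fails because the next character is not a word character, and the second # fails the lookbehind, so neither produces a match.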
Try this:
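(?:^|\s)#(?!#)([^(]+)
You will have function.match and function.name in the first group, and it will not match ##function.name(param) or hello##test.com (a #+ here would accept the doubled ## at the start of a line). Rubular: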
http://rubular.com/r/b8gy1LcVGz
Try this
(?!.*##)^#([^()\s]+)\b
See it here on Rubular
I removed some brackets from your expression
I removed the Quantifier from the leading #
(?!.*##) is a negative lookahead assertion. It will fail if it finds anywhere in the string two # characters in a row.
I am not sure about your requirements. If there is always a set of brackets at the end, then you don't need the word boundary. If there can be similar strings without brackets that you don't want to match, then I would add another lookahead to require the opening bracket:
(?!.*##)^#([^()\s]+)(?=\()
See it here on Rubular

Regex to match all permutations of {1,2,3,4} without repetition

I am implementing the following problem in ruby.
Here's the pattern that I want :
1234, 1324, 1432, 1423, 2341 and so on
i.e. the digits in the four digit number should be between [1-4] and should also be non-repetitive.
To explain it more simply, take a two-digit pattern,
for which the solution should be:
12, 21
i.e. the digits should be either 1 or 2 and should be non-repetitive.
To make sure that they are non-repetitive I want to use $1 in the condition for my second digit, but it's not working.
Please help me out and thanks in advance.
You can use this (see on rubular.com):
^(?=[1-4]{4}$)(?!.*(.).*\1).*$
The first assertion ensures that it's ^[1-4]{4}$, the second assertion is a negative lookahead that ensures that you can't match .*(.).*\1, i.e. a repeated character. The first assertion is "cheaper", so you want to do that first.
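The pattern can be checked by brute force over every four-character string of the digits 1-4. This sketch assumes Python's re module instead of Ruby (the pattern itself is unchanged), and the variable names are illustrative.

```python
import re
from itertools import permutations, product

PERMUTATION = re.compile(r'^(?=[1-4]{4}$)(?!.*(.).*\1).*$')

# The 24 strings we expect to accept.
expected = {"".join(p) for p in permutations("1234")}
# All 256 four-character strings over the digits 1-4.
candidates = ["".join(c) for c in product("1234", repeat=4)]

accepted = {s for s in candidates if PERMUTATION.match(s)}
print(len(accepted), accepted == expected)
```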
References
regular-expressions.info/Lookarounds and Backreferences
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
Just for a giggle, here's another option:
^(?:1()|2()|3()|4()){4}\1\2\3\4$
As each unique character is consumed, the capturing group following it captures an empty string. The backreferences also try to match empty strings, so if one of them doesn't succeed, it can only mean the associated group didn't participate in the match. And that will only happen if the string contains at least one duplicate.
This behavior of empty capturing groups and backreferences is not officially supported in any regex flavor, so caveat emptor. But it works in most of them, including Ruby.
I think this solution is a bit simpler
^(?:([1-4])(?!.*\1)){4}$
See it here on Rubular
^ # matches the start of the string
(?: # open a non-capturing group
([1-4]) # one of the allowed characters, captured in group 1
(?!.*\1) # succeeds only if that character does not occur again later
){4} # repeat the group exactly four times
$ # matches the end of the string
(?!.*\1) is a lookahead assertion, to ensure the character is not repeated.
^ and $ are anchors to match the start and the end of the string.
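As a sanity check, this shorter pattern can be compared with a plain non-regex test. The sketch assumes Python's re module rather than Ruby, and is_permutation_plain is a hypothetical reference helper.

```python
import re
from itertools import product

COMPACT = re.compile(r'^(?:([1-4])(?!.*\1)){4}$')


def is_permutation_plain(s: str) -> bool:
    """Reference check without regex: exactly the digits 1-4, each used once."""
    return sorted(s) == ["1", "2", "3", "4"]


# Four-character strings over 1-5, plus one too-short and one too-long input.
candidates = ["".join(c) for c in product("12345", repeat=4)] + ["123", "12341"]
mismatches = [s for s in candidates
              if bool(COMPACT.match(s)) != is_permutation_plain(s)]
print(mismatches)  # an empty list means the regex and the plain check agree
```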
While the previous answers solve the problem, they aren't as generic as they could be, and they don't allow for repeated symbols in the initial set, for example {a,a,b,b,c,c}. After asking a similar question on Perl Monks, the following solution was given by Eily:
^(?:(?!\1)a()|(?!\2)a()|(?!\3)b()|(?!\4)b()|(?!\5)c()|(?!\6)c()){6}$
Similarly, this works for longer "symbols" in a string, and for variable length symbols too.