I want a Regex to match strings containing the same character twice (not necessarily consecutive) but not if that character appears three times or more.
For example, given these two inputs:
abcbde
abcbdb
The first, abcbde would match because it contains b twice. However, abcbdb contains b three times, so that would not match.
I have created this Regex, however it matches both:
(\w).*\1{1}
I've also tried to use the ? modifier, however that still matches abcbdb, which I don't want it to.
You need two checks: a first check to ensure no character exists 3 times in the input, and a second check to look for one that exists 2 times:
^(?!.*(\w).*\1.*\1).*?(\w).*\2
This is horribly inefficient compared to, say, using your programming language to construct an array of character frequencies, requiring only 1 pass through the entire input. But it works.
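For comparison, here is a minimal Python sketch of both approaches: the regex above, and the single-pass frequency count suggested as the cheaper alternative. The function names are made up for illustration. The counting version looks at every character rather than only \w characters, an assumption that agrees with the regex on purely alphanumeric input.

```python
import re
from collections import Counter

# The two-check regex from the answer above.
PATTERN = re.compile(r'^(?!.*(\w).*\1.*\1).*?(\w).*\2')


def has_pair_regex(text):
    """True if some character occurs twice and none occurs three times."""
    return PATTERN.search(text) is not None


def has_pair_counting(text):
    """Same rule, using one pass to build character frequencies."""
    counts = Counter(text)
    # The highest frequency must be exactly 2: at least one pair, no triple.
    return bool(counts) and max(counts.values()) == 2


for sample in ("abcbde", "abcbdb", "abc"):
    print(sample, has_pair_regex(sample), has_pair_counting(sample))
```

With the two inputs from the question, both functions accept abcbde and reject abcbdb.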
I'd like to find a regular expression that matches strings that do NOT contain all the specified elements, independently of their order. For example, given the following data:
one two three four
one three two
one two
one three
four
Passing the words two three to the regex should match the lines one two, one three and four.
I know how to implement an expression that matches lines that do not contain ANY of the words, matching only line four:
^((?!two|three).)*$
But for the case I'm describing, I'm lost.
Nice question. It looks like you are looking for some AND logic. I am sure someone can come up with something better, but I thought of two ways:
^(?=(?!.*\btwo\b)|(?!.*\bthree\b)).*$
Or:
^(?=.*\btwo\b)(?=.*\bthree\b)(*SKIP)(*F)|^.*$
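In both cases lookaheads do the work, so the position of the words in the string does not matter. The first pattern passes a line when at least one of the words is absent. The second uses two positive lookaheads to mimic the AND logic, detects lines that contain both words and rejects them with (*SKIP)(*F) (a PCRE feature), and lets ^.*$ match everything else. If just one of those words is present, the string will pass.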
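As a quick check, the first pattern can be tried against the sample data from the question. This is only a sketch in Python. The built-in re module does not support the (*SKIP)(*F) verbs, so the second pattern is left out here. The variable names are illustrative.

```python
import re

# First pattern from the answer: pass when "two" is absent OR "three" is absent.
NOT_BOTH = re.compile(r'^(?=(?!.*\btwo\b)|(?!.*\bthree\b)).*$')

lines = [
    "one two three four",
    "one three two",
    "one two",
    "one three",
    "four",
]

# Keep only the lines that do not contain both words.
matching = [line for line in lines if NOT_BOTH.match(line)]
print(matching)  # ['one two', 'one three', 'four']
```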
Use this pattern:
(?!.*two.*three|.*three.*two)^.*$
First of all, I am aware that this is a problem you wouldn't usually use regex for, I am just trying to find out whether this is even possible.
That being said, what I am trying to do is match ALL occurrences of any permutation of a string (for now, I don't care whether overlapping occurrences match or not); for example, if I have the string abc, I want to match all occurrences of abc, acb, bac, bca, cab and cba.
What I have until now is the following regex: (?:([abc])(?!.{0,1}\1)){3} (note: I know that I could use + instead of {0,1}, but that only works for strings with length 3). This kind of works, but if there are two permutations next to each other where a letter of the first one is too close to a letter of the second one (eg. abc cba → c c), the first permutation does not match. Is it possible to solve this using regex?
Direct Approach
[abc]{3} would match too many results since it would also match aab.
In order to not double match a you would need to remove a from the group that follows leaving you with a[bc]{2}.
a[bc]{2} would match too many results since it would also match 'abb'.
In order to not double match b you would need to remove b from the group that follows, leaving you with ab[c]{1}, or abc for short.
abc would not match all combinations so you would need another group.
(abc)|([abc]{3}) which would match too many combinations again.
This path leads you down the road of having all permutations listed explicitly in groups.
Can you create combinations so that you do not need to write out all combinations?
(abc)|(acb) could be written as a((bc)|(cb)).
(bc)|(cb) I can not shorten that any further.
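If listing every permutation is acceptable, the list does not have to be written by hand. The following Python sketch builds the explicit alternation with itertools.permutations. The helper name permutation_regex is hypothetical, and the approach only stays practical for short strings, since the number of alternatives grows factorially.

```python
import re
from itertools import permutations


def permutation_regex(chars):
    """Build a regex that lists every distinct permutation of chars."""
    # A set removes duplicates when chars contains repeated letters.
    alternatives = sorted({''.join(p) for p in permutations(chars)})
    return re.compile('|'.join(re.escape(alt) for alt in alternatives))


ABC = permutation_regex("abc")
print(ABC.pattern)                  # abc|acb|bac|bca|cab|cba
print(ABC.findall("abc cba aab"))   # ['abc', 'cba']
```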
Match too many and remove unwanted
Depending on the regex engine, you may be able to express AND as a lookahead so that you can remove matches: assert THIS and not THAT, then consume THIS.
(?=[abc]{3})(?=(?!a.a))[abc]{3} would not match aca.
This problem is now similar to the one above, where you need to remove all combinations that would violate your permutations. In this example that is any expression containing the same character multiple times.
'(.)\1+' uses a numbered back reference and, on its own, matches the same character repeated. However, it requires knowing how many groups exist in the expression and is very brittle: adding a group kills the expression, so ((.)\1+) no longer matches. Relative back references exist but require knowledge of your specific regex engine; \k<-1> may be what you are looking for. I will assume .NET since I happen to have a regex tester bookmarked for that.
The permutations that I want to exclude are: nn. n.n .nn nnn
So I create these patterns: ((?<1>.)\k<1>.) ((?<2>.).\k<2>) (.(?<3>.)\k<3>) ((?<4>.)\k<4>\k<4>)
Putting it all together gives me this expression. Note that I used relative back references as they are in .NET; your mileage may vary.
(?=[abc]{3})(?=(?!((?<1>.)\k<1>.)))(?=(?!((?<2>.).\k<2>)))(?=(?!(.(?<3>.)\k<3>)))(?=(?!((?<4>.)\k<4>\k<4>)))[abc]{3}
The answer is yes for a specific length.
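The same "match too many, then remove the unwanted" idea can be sketched in Python, which numbers groups differently from .NET. This is an adaptation, not the expression above: it asserts that the next three characters come from abc, rules out each pair of equal positions with a negative lookahead, and then consumes the three characters. The case nnn needs no separate check because the first two lookaheads already cover it.

```python
import re

# Length-3 permutations of "abc":
#   (?=[abc]{3})  the next three characters are all a, b or c
#   (?!(.)\1)     characters 1 and 2 differ
#   (?!.(.)\2)    characters 2 and 3 differ
#   (?!(.).\3)    characters 1 and 3 differ
PERMUTATION = re.compile(r'(?=[abc]{3})(?!(.)\1)(?!.(.)\2)(?!(.).\3)[abc]{3}')


def find_permutations(text):
    """Return all non-overlapping permutations of abc found in text."""
    return [m.group(0) for m in PERMUTATION.finditer(text)]


print(find_permutations("abc cba aab bca aca"))  # ['abc', 'cba', 'bca']
print(find_permutations("abccba"))               # ['abc', 'cba']
```

Adjacent permutations such as abc cba are both found, because each check only looks at the three characters about to be consumed.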
Here is some testing data.
I thought I had it with [0-9] but when I ran it that only took one number.
The string goes for example:
1 note
1,234 notes
68,000 notes
I want it to take the whole number and leave out the notes part, the spaces and also the comma, so I get just the full number.
The [0-9] would only take the first digit of the string, even when there wasn't a comma.
So how to only take the number please?
[0-9] means any one character between 0 and 9. What you are looking for is these characters repeated any number of times, but no other character should be there. The correct way to write this is [0-9]+.
M+, where M is some regex rule, is equivalent to MM*, where * means 0 or more occurrences. So M+ means at least one occurrence of whatever M matches.
EDIT: The question now also states that the entire number should be read, but the comma should be excluded from the output. AFAIK, this cannot be done with a single regex match, as the matched text is always a contiguous piece of the input and can't skip characters. A possible solution is to add , to the list of allowed characters and post-process the result to remove the commas.
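A minimal Python sketch of that two-step solution: match digits and commas together, then strip the commas from the matched text. The function name extract_number is made up for illustration, and requiring the match to start with a digit is an added assumption so that a stray leading comma is not picked up.

```python
import re

# A digit followed by any run of digits and commas, e.g. "68,000".
NUMBER = re.compile(r'[0-9][0-9,]*')


def extract_number(text):
    """Return the first number in text as an int, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    # The regex cannot drop the commas itself, so remove them afterwards.
    return int(match.group(0).replace(',', ''))


for sample in ("1 note", "1,234 notes", "68,000 notes"):
    print(extract_number(sample))
```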
I find this very ambiguous and vague and I would love to understand
I have these strings
abbb
bbb
aaaabaaabaaabaaabaaabaaab
babba
bbbaaaa
aaaaabbaba
And they are all valid because each contains a multiple of 3 b's. Then I use:
(a*ba*ba*ba*)* and this matches them all
(a*ba*ba*b)*a* this matches them all as well
a*(ba*ba*ba*)* same as above
Are these really all the same? Or are there edge cases that I am not seeing?
All of your regexes match the empty string, which doesn't have 3 b's.
This one,
(a*ba*ba*ba*)*
does not match aa. But the following match aa, and they are also equivalent:
(a*ba*ba*b)*a*
a*(ba*ba*ba*)*
If you want to force at least 3 b's, you have to take the b's out of the Kleene star:
(a|b)*b(a|b)*b(a|b)*b(a|b)*
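These claims are easy to check by brute force. The Python sketch below runs the three regexes from the question against every word over {a, b} up to a small length (the limit of 6 is arbitrary) and collects the words on which they disagree. It uses fullmatch, on the assumption that the whole word must be matched.

```python
import re
from itertools import product

R1 = re.compile(r'(a*ba*ba*ba*)*')
R2 = re.compile(r'(a*ba*ba*b)*a*')
R3 = re.compile(r'a*(ba*ba*ba*)*')


def accepts(regex, word):
    """True if the regex matches the whole word."""
    return regex.fullmatch(word) is not None


def all_words(max_len):
    """Yield every word over {a, b} with length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for letters in product('ab', repeat=length):
            yield ''.join(letters)


differ_1_2 = [w for w in all_words(6) if accepts(R1, w) != accepts(R2, w)]
differ_2_3 = [w for w in all_words(6) if accepts(R2, w) != accepts(R3, w)]

print(differ_1_2)  # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa']
print(differ_2_3)  # []
```

The only disagreement is on non-empty words made purely of a's, which the first regex rejects and the other two accept.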
* means zero or more. So even regexes like the ones below
(d*ef*gg*hi*)*
(s*o*m*e*t*h*i*n*g*)
etc.
will still match, because matching nothing at all (the empty string) is allowed. Take your first regex:
(a*ba*ba*ba*)*
(match zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's), with zero or more repetitions of that whole group. In other words, you are saying it is okay if the group is not found at all.
Similarly for your second case:
(a*ba*ba*b)*a*
(zero or more a's, then a b, then zero or more a's, then a b, then zero or more a's, then a b), repeated zero or more times, followed by zero or more a's.
So your regexes allow so many zero-occurrence cases that you cannot see a clear difference between them. Better to use + instead of *. A + quantifier makes the match succeed only if the character is present at least once.
You can play around with the regex on this site: http://regex101.com/r/rM5zQ1
For learning the basics, regexone will be really helpful for you.
Hope that helps !
You should use + after the group instead of *, or else an empty string would be accepted:
(a*ba*ba*ba*)+
Although this would only allow multiples of 3. If you want at least 3 and any number of extras, it would be:
a*ba*ba*b(a|b)*
This works for those requirements. But it isn't a good approach. In your example you are searching for "a" and "b", which are single character patterns, and it's already an unreasonably long expression for the simple rule "has 3 b's" in my opinion. But what if the patterns were more complex? You would need to repeat them at least 3 times, making it even more unwieldy.
And what if the rules change slightly? If you wanted to match a maximum instead of a minimum number of b's, it would become even more complex / repetitive, because your only choice would be to combine the patterns for each possible number (1, 2, 3):
(a*ba*|a*ba*ba*|a*ba*ba*ba*)
Or if you decide the word must be a certain length, it actually becomes impossible, short of listing every permutation (for a 7 letter word, ba{3}bab, a{2}babab, b{3}a{4} etc.).
So, I think a better way to solve this is to match the basic generic pattern, then examine the results of the match to check the counts. For example, just match a "word":
(a|b)+
Then on the matching text, match b:
b
and test the number of matches and/or length of text as needed. Each pattern is only repeated a maximum of twice, and your code can easily be adapted to different requirements.
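A minimal Python sketch of this match-then-count approach. The function name and the min_b / max_b parameters are made up for illustration, and [ab]+ is used as an equivalent spelling of (a|b)+.

```python
import re

WORD = re.compile(r'[ab]+')   # the generic "word" pattern
B = re.compile(r'b')          # the pattern whose occurrences are counted


def words_by_b_count(text, min_b=0, max_b=None):
    """Return the words in text whose number of b's lies in [min_b, max_b]."""
    selected = []
    for match in WORD.finditer(text):
        word = match.group(0)
        count = len(B.findall(word))
        if count >= min_b and (max_b is None or count <= max_b):
            selected.append(word)
    return selected


text = "abbb bb aaaabaaab babba"
print(words_by_b_count(text, min_b=3))   # ['abbb', 'babba']
print(words_by_b_count(text, max_b=2))   # ['bb', 'aaaabaaab']
```

Switching from a minimum to a maximum, or adding a length check on word, is a change to the Python condition rather than to the regex.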
I'm taking a computation course which also teaches about regular expressions. There is a difficult question that I cannot answer.
Find a regular expression for the language that accepts words that contain at most two pairs of consecutive 0's. The alphabet consists of 0 and 1.
First, I made an NFA of the language but cannot convert it to a GNFA (which would later be converted to a regex). How can I find this regular expression? With or without converting it to a GNFA?
(Since this is a homework problem, I'm assuming that you just want enough help to get started, and not a full worked solution?)
Your mileage may vary, but I don't really recommend trying to convert an NFA into a regular expression. The two are theoretically equivalent, and either can be converted into the other algorithmically, but in my opinion, it's not the most intuitive way to construct either one.
Instead, one approach is to start by enumerating various possibilities:
No pairs of consecutive zeroes at all; that is, every zero, except at the end of the string, must be followed by a one. So, the string consists of a mixed sequence of 1 and 01, optionally followed by 0:
(1|01)*(0|ε)
Exactly one pair of consecutive zeroes, at the end of the string. This is very similar to the previous:
(1|01)*00
Exactly one pair of consecutive zeroes, not at the end of the string — and, therefore, necessarily followed by a one. This is also very similar to the first one:
(1|01)*001(1|01)*(0|ε)
To continue that approach, you would then extend the above to support two pairs of consecutive zeroes; and lastly, you would merge all of these into a single regular expression.
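Before extending the cases, it can help to test the ones you already have by brute force. The Python sketch below checks that the three expressions above together accept exactly the words with at most one pair of consecutive zeroes, for every word up to length 10. It assumes that overlapping pairs are counted, so 000 contains two pairs, and it writes ε as an empty alternative. It deliberately stops short of the two-pair case.

```python
import re
from itertools import product

# The three cases from the answer, with ε written as an empty alternative.
NO_PAIR = r'(?:1|01)*(?:0|)'
ONE_PAIR_AT_END = r'(?:1|01)*00'
ONE_PAIR_INSIDE = r'(?:1|01)*001(?:1|01)*(?:0|)'

AT_MOST_ONE = re.compile(
    '|'.join(f'(?:{case})' for case in (NO_PAIR, ONE_PAIR_AT_END, ONE_PAIR_INSIDE))
)


def count_pairs(word):
    """Count overlapping occurrences of 00, so 000 counts as two pairs."""
    return sum(1 for i in range(len(word) - 1) if word[i:i + 2] == '00')


def mismatches(max_len):
    """Words where the regex and the direct count disagree."""
    bad = []
    for length in range(max_len + 1):
        for bits in product('01', repeat=length):
            word = ''.join(bits)
            accepted = AT_MOST_ONE.fullmatch(word) is not None
            if accepted != (count_pairs(word) <= 1):
                bad.append(word)
    return bad


print(mismatches(10))  # []
```

The same harness can be reused once the expression is extended: swap in the new regex and change the bound in the comparison.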
(0+1)*00(0+1)*00(0+1)* + (0+1)*000(0+1)*
contains at most two pair of consecutive 0's
(1|01)*(00|ε)(1|10)*(00|ε)(1|10)*