Regular Expression finding specific instances - regex

I'm having trouble finding a regular expression for the following problem:
All strings over the alphabet {a, b, c, d} with at least four instances of c and at least two instances of a.

Use a look-ahead:
^(?=(.*c){4,})(?=(.*a){2,})[a-z]+
I'm not sure what you mean by "alphabet" - I have assumed "any letter", but if it's literally a,b,c and d, change [a-z]+ to [a-d]+

A bit more efficient than Bohemian's solution, and also anchored to make sure we don't just match a substring of a longer string that might contain unwanted characters:
^(?=(?:[^c]*c){4})(?=(?:[^a]*a){2})[a-z]+$
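A quick way to compare the two lookahead patterns above is to run them on a few strings. This is only a sketch in Python: both patterns use [a-d]+ for the literal alphabet, and the names UNANCHORED, ANCHORED and accepts are made up for the illustration.

```python
import re

# Both patterns restricted to the literal alphabet {a, b, c, d}.
UNANCHORED = re.compile(r'^(?=(.*c){4,})(?=(.*a){2,})[a-d]+')
ANCHORED = re.compile(r'^(?=(?:[^c]*c){4})(?=(?:[^a]*a){2})[a-d]+$')


def accepts(pattern, text):
    """Return True if the pattern matches starting at the beginning of text."""
    return pattern.search(text) is not None


# The last string shows the difference: without the trailing $ the first
# pattern is satisfied by the prefix before the "!".
for text in ["ccccaa", "abcdcacc", "cccaa", "ccccaa!"]:
    print(repr(text), accepts(UNANCHORED, text), accepts(ANCHORED, text))
```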

As outlined in comments, this question seems to relate to the strict mathematical theory of regular expressions as derived from set theory. In that case, lookaheads are not permitted; you need to enumerate the permitted sequences. For simplicity and clarity, I am omitting the .* which should go before, between, and after the symbols in the following list.
ccccaa|
cccaca|
cccaac|
ccacca|
ccacac|
ccaacc|
caccca|
caccac|
cacacc|
caaccc|
acccca|
acccac|
accacc|
acaccc|
aacccc
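The list can also be generated rather than typed by hand, which makes it easy to confirm that there are exactly 15 distinct orderings (6 choose 2). The following is a sketch in Python; the helper names are hypothetical, and [abcd]* stands in for the .* filler since the alphabet is {a, b, c, d}.

```python
import re
from itertools import combinations


def required_orders(n_c=4, n_a=2):
    """List every distinct ordering of n_c c's and n_a a's."""
    length = n_c + n_a
    orders = []
    for a_positions in combinations(range(length), n_a):
        orders.append(
            "".join("a" if i in a_positions else "c" for i in range(length))
        )
    # Descending order puts ccccaa first and aacccc last, like the list above.
    return sorted(orders, reverse=True)


def build_pattern(orders, alphabet="abcd"):
    """Join the orderings with |, putting a filler before, between, and after the symbols."""
    filler = f"[{alphabet}]*"
    alternatives = [filler + filler.join(order) + filler for order in orders]
    return re.compile("|".join(alternatives))


ORDERS = required_orders()
PATTERN = build_pattern(ORDERS)
print(len(ORDERS))
print(PATTERN.fullmatch("abcdcacc") is not None)
print(PATTERN.fullmatch("cccaa") is not None)
```

The resulting expression uses only concatenation, alternation, and star, so it stays within the formal definition.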

Related

Regex to match hexadecimal and integer numbers [duplicate]

In a regular expression, I need to know how to match one thing or another, or both (in order). But at least one of the things needs to be there.
For example, the following regular expression
/^([0-9]+|\.[0-9]+)$/
will match
234
and
.56
but not
234.56
While the following regular expression
/^([0-9]+)?(\.[0-9]+)?$/
will match all three of the strings above, but it will also match the empty string, which we do not want.
I need something that will match all three of the strings above, but not the empty string. Is there an easy way to do that?
UPDATE:
Both Andrew's and Justin's below work for the simplified example I provided, but they don't (unless I'm mistaken) work for the actual use case that I was hoping to solve, so I should probably put that in now. Here's the actual regexp I'm using:
/^\s*-?0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)(?:\.[0-9]*)?(\s*|[A-Za-z_]*)*$/
This will match
45
45.988
45,689
34,569,098,233
567,900.90
-9
-34 banana fries
0.56 points
but it WON'T match
.56
and I need it to do this.
The fully general method, given regexes /^A$/ and /^B$/ is:
/^(A|B|AB)$/
i.e.
/^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$/
Note the others have used the structure of your example to make a simplification. Specifically, they (implicitly) factorised it, to pull out the common [0-9]* and [0-9]+ factors on the left and right.
The working for this is:
all the elements of the alternation end in [0-9]+, so pull that out: /^(|\.|[0-9]+\.)[0-9]+$/
Now we have the possibility of the empty string in the alternation, so rewrite it using ? (i.e. use the equivalence (|a|b) = (a|b)?): /^(\.|[0-9]+\.)?[0-9]+$/
Again, an alternation with a common suffix (\. this time): /^((|[0-9]+)\.)?[0-9]+$/
the pattern (|a+) is the same as a*, so, finally: /^([0-9]*\.)?[0-9]+$/
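If you'd rather check the working than trust it, here is a small Python sketch that brute-forces short strings and confirms every intermediate pattern accepts exactly the same inputs. The names STEPS, verdicts and all_steps_agree are invented for this example, and the slashes around the patterns are dropped because Python has no regex literals.

```python
import re
from itertools import product

STEPS = [
    r'^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$',  # A|B|AB
    r'^(|\.|[0-9]+\.)[0-9]+$',              # common [0-9]+ suffix pulled out
    r'^(\.|[0-9]+\.)?[0-9]+$',              # (|a|b) rewritten as (a|b)?
    r'^((|[0-9]+)\.)?[0-9]+$',              # common \. suffix pulled out
    r'^([0-9]*\.)?[0-9]+$',                 # (|a+) rewritten as a*
]


def verdicts(text):
    """Return one True/False per step: does that pattern accept text?"""
    return [re.search(pattern, text) is not None for pattern in STEPS]


def all_steps_agree(max_len=5, alphabet="07."):
    """Check every string up to max_len over a tiny alphabet."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            if len(set(verdicts("".join(chars)))) != 1:
                return False
    return True


print(all_steps_agree())
for sample in ["234", ".56", "234.56", "", "234."]:
    print(repr(sample), verdicts(sample))
```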
Nice answer by huon (and a bit of a brain-twister to follow through to the end). For anyone looking for a quick and simple answer to the title of this question, 'In a regular expression, match one thing or another, or both', it's worth mentioning that even (A|B|AB) can be simplified to:
A|A?B
Handy if B is a bit more complex.
Now, as c0d3rman's observed, this, in itself, will never match AB. It will only match A and B. (A|B|AB has the same issue.) What I left out was the all-important context of the original question, where the start and end of the string are also being matched. Here it is, written out fully:
^(A|A?B)$
Better still, just switch the order as c0d3rman recommended, and you can use it anywhere:
A?B|A
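To see the ordering issue concretely, here is a sketch in Python, taking A to be \d+ and B to be \.\d+ as in the question (the variable names are just for illustration). Python, like most backtracking engines, tries the alternatives left to right and keeps the first one that succeeds.

```python
import re

A = r'\d+'
B = r'\.\d+'

a_first = re.compile(f'{A}|(?:{A})?{B}')           # A|A?B
b_first = re.compile(f'(?:{A})?{B}|{A}')           # A?B|A
anchored = re.compile(f'^(?:{A}|(?:{A})?{B})$')    # ^(A|A?B)$

# Unanchored, A|A?B stops after A and then finds B as a separate match.
print(a_first.findall('234.56'))
# With the order switched, the longer alternative gets the first try.
print(b_first.findall('234.56'))
# The anchors force the engine to backtrack into the second alternative.
print(anchored.search('234.56') is not None)
```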
Yes, you can match all of these with such an expression:
/^[0-9]*\.?[0-9]+$/
Note, it also doesn't match the empty string (your last condition).
Sure. You want the optional quantifier, ?.
/^(?=.)([0-9]+)?(\.[0-9]+)?$/
The above is slightly awkward-looking, but I wanted to show you your exact pattern with some ?s thrown in. In this version, (?=.) makes sure it doesn't accept an empty string, since I've made both clauses optional. A simpler version would be this:
/^\d*\.?\d+$/
This satisfies your requirements, including preventing an empty string.
Note that there are many ways to express this. Some are long and some are very terse, but they become more complex depending on what you're trying to allow/disallow.
Edit:
If you want to match this inside a larger string, I recommend splitting on whitespace and testing the results with /^\d*\.?\d+$/. Otherwise, you'll risk either matching stuff like aaa.123.456.bbb or missing matches (trust me, you will. Without lookbehind support, which older JavaScript engines lack, it will be possible to break any pattern I can think of).
If you know for a fact that you won't get strings like the above, you can use word breaks instead of ^$ anchors, but it will get complicated because there's no word break between . and (a space).
/(\b\d+|\B\.)?\d*\b/g
That ought to do it. It will block stuff like aaa123.456bbb, but it will allow 123, 456, or 123.456. It will allow aaa.123.456.bbb, but as I've said, you'll need two steps if you want to comprehensively handle that.
Edit 2: Your use case
If you want to allow whitespace at the beginning, negative/positive marks, and words at the end, those are actually fairly strict rules. That's a good thing. You can just add them on to the simplest pattern above:
/^\s*[-+]?\d*\.?\d+[a-z_\s]*$/i
Allowing thousands groups complicates things greatly, and I suggest you take a look at the answer I linked to. Here's the resulting pattern:
/^\s*[-+]?(\d+|\d{1,3}(,\d{3})*)?(\.\d+)?\b(\s[a-z_\s]*)?$/i
The \b ensures that the numeric part ends with a digit, and any trailing words must be separated from it by at least one whitespace character.
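As a sanity check, the pattern can be run against the sample strings from the question. This is a sketch in Python: the /i flag becomes re.IGNORECASE, and NUMBER_WITH_WORDS, SHOULD_MATCH, SHOULD_FAIL and is_accepted are names made up for the example.

```python
import re

NUMBER_WITH_WORDS = re.compile(
    r'^\s*[-+]?(\d+|\d{1,3}(,\d{3})*)?(\.\d+)?\b(\s[a-z_\s]*)?$',
    re.IGNORECASE,
)

# Strings the question wants accepted, including the troublesome .56
SHOULD_MATCH = [
    "45", "45.988", "45,689", "34,569,098,233", "567,900.90",
    "-9", "-34 banana fries", "0.56 points", ".56",
]
# The empty string and a bare word contain no number at all.
SHOULD_FAIL = ["", "banana"]


def is_accepted(text):
    """Return True if the whole string fits the pattern."""
    return NUMBER_WITH_WORDS.search(text) is not None


for text in SHOULD_MATCH + SHOULD_FAIL:
    print(repr(text), is_accepted(text))
```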
Maybe this helps (to give you the general idea):
(?:((?(digits).^|[A-Za-z]+)|(?<digits>\d+))){1,2}
This pattern matches characters, digits, or digits following characters, but not characters following digits.
The pattern matches aa, aa11, and 11, but not 11aa, aa11aa, or the empty string.
Don't be puzzled by the ".^", which means "a character followed by line start"; it can never match, and is there to make that branch fail.
Be warned that this does not work with all flavors of regex: your flavor must support conditionals of the form (?(named group)true|false).

find a regular expression where a is never immediately followed by b (Theory of formal languages)

I need to find a simplified regular expression for the language of all strings
of a's, b's, and c's where a is never immediately followed by b.
I tried something and got as far as (a+c)*c(b+c)* + (b+c)*(a+c)*
Is this fine, and if so, can it be simplified?
Thanks in advance.
You are looking for a negative lookbehind:
(?<!a)b
This will find you all the b instances that are not immediately following a
Or a negative lookahead:
a(?!b)
This will find you all the a instances that are not immediately followed by b
Here is a regex101 example for the lookbehind:
https://regex101.com/r/RsqXbW/1
Here is a regex101 example for the lookahead:
https://regex101.com/r/qiDIZU/1
Your solution contains only strings from the desired language. However, it does not contain all of them. For example, acbac is not contained. Your basic idea is fine, but you need to be able to iterate the possible factors. In:
(b+c)*(a (a)*(c(b+c)*)*)*
the first part generates all strings without a.
After the first a there comes either nothing, another a, or c. Another a leaves us with the same three options. c basically starts the game again. This is what the part after the first a formalizes. The many * are needed to possibly generate the empty string in all of the different options.
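Both expressions can be checked by brute force over short strings. The sketch below is in Python and translates the formal notation into Python's syntax (the formal + becomes | or a character class); ANSWER, ATTEMPT and counterexamples are names invented here, and the length bound of 6 is arbitrary.

```python
import re
from itertools import product

# (b+c)*(a (a)*(c(b+c)*)*)* in Python syntax
ANSWER = re.compile(r'[bc]*(?:aa*(?:c[bc]*)*)*')
# (a+c)*c(b+c)* + (b+c)*(a+c)* in Python syntax
ATTEMPT = re.compile(r'[ac]*c[bc]*|[bc]*[ac]*')


def counterexamples(pattern, max_len=6):
    """Return the strings on which the pattern disagrees with 'contains no ab'."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            text = "".join(chars)
            in_language = "ab" not in text
            if (pattern.fullmatch(text) is not None) != in_language:
                bad.append(text)
    return bad


print(counterexamples(ANSWER))
print("acbac" in counterexamples(ATTEMPT))
```

An empty list for ANSWER only shows agreement up to the tested length; it is evidence, not a proof.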

Match a pattern after a pattern?

Can you match a pattern in text that occurs after a pattern for instance in:
ssasabafra
Match all the a's after the b? I've tried using a lookbehind like so:
(?<=b)[a]+
But it only matches the first a. Is there a way to match all occurrences after b?
If you are using an expression engine that allows repetition in lookbehind expressions, how about:
(?<=b.*?)a
This looks behind for a b followed by any number of characters, and matches a
For most regex engines however, I don't think this is possible. But, what you can do is split the string on b, match the second part with /a/, then join the two strings again with b.
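The split-and-match idea could look roughly like this in Python. It is a sketch, not a general solution: matches_after_first is a hypothetical helper, and it assumes you only care about the first b and want the positions of the matches in the original string.

```python
import re


def matches_after_first(text, marker="b", pattern=r"a"):
    """Find every match of pattern that occurs after the first marker."""
    head, sep, tail = text.partition(marker)
    if not sep:
        # No marker in the text, so nothing can come after it.
        return []
    offset = len(head) + len(sep)
    # Shift each position so it refers to the original, unsplit string.
    return [(offset + m.start(), m.group()) for m in re.finditer(pattern, tail)]


print(matches_after_first("ssasabafra"))
```

Since the positions refer to the original string, there is no need to join the pieces back together afterwards.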
How about this:
a(?=[^b]*$)
However, this doesn't ensure that there is a b before the a; it matches every a that is not followed by a b anywhere later in the string.
See demo on RegexPal
If you need to make sure that there is a b somewhere before the a, you should probably use the string-manipulation functions of your programming language.

Regular Expression to match sequences of one or more letters except for a specific value

Looking for some help with a Regular Expression to do the following:
Must be Alpha Char
Must be at least 1 Char
Must NOT be a specific value, e.g. != "Default"
Thanks for any help,
Dave
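Use a negative lookahead. Note the $ inside the lookahead: without it, the pattern would also reject every longer word that merely starts with Default (such as Defaults).
^(?!Default$)[a-zA-Z]+$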
Solve this in two steps:
compare against the regular expression ^[a-zA-Z]+$, which means "one or more of the letters from a-z or A-Z, and nothing else"
if it passes that test, look it up in a list of specific values you are guarding against.
There's no point in trying to cram these two tests into a single complex regular expression that you don't understand. A good rule of thumb with regular expressions is, if you have to ask someone how to do it, you should strive to use the least complex solution possible. If you don't understand the regular expression you won't be able to maintain the code over time.
In pseudocode:
if regexp_matches('^[a-zA-Z]+$', string) && string not in ['Default', 'Foobar', ...] {
    print "it's a keeper!"
}
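For reference, the same two-step check might be written like this in Python. It is a sketch: the BLOCKED set and the name is_keeper are placeholders, and fullmatch is used so that the letters-only test covers the whole string.

```python
import re

LETTERS_ONLY = re.compile(r'[a-zA-Z]+')
# Hypothetical list of values to reject; fill in your own.
BLOCKED = {"Default", "Foobar"}


def is_keeper(value):
    """Step 1: letters only, at least one. Step 2: not a blocked value."""
    return LETTERS_ONLY.fullmatch(value) is not None and value not in BLOCKED


for value in ["Dave", "Default", "Defaults", "", "abc1"]:
    print(repr(value), is_keeper(value))
```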

Regex: How to optionally match something at beginning or end, but not both?

I have a situation where the regular expression is something like this:
^b?A+b?$
So b may match at the start of the string 0 or 1 times, and A must match one or more times. Again b may match at the end of the string 0 or 1 times.
Now I want to modify this regular expression in such way that it may match b either at the start or at the end of the string, but not both.
How do I do this?
There's a nice "or" operator in regexps you can use.
^(b?A+|A+b?)$
Try this:
^(bA+|A+b?)$
This allows a b at the start and then at least one A, or some As at the start and optionally a b at the end. This covers all the possibilities and is slightly faster than the accepted answer in the case that it doesn't match as only one of the two options needs to be tested (assuming A cannot begin with b).
Just to be different from the other answers here, if the expression A is quite complex but b is simple, then you might want to do it using a negative lookahead to avoid repeating the entire expression for A in your regular expression:
^(b(?!.*b$))?A+b?$
The second might be more readable if your A is complex, but if performance is an issue I'd recommend the first method.
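Here is a quick comparison of the three patterns suggested so far, sketched in Python with a literal A and b standing in for the real sub-expressions (PATTERNS and accepted are names made up for the example).

```python
import re

PATTERNS = {
    "alternation": r'^(b?A+|A+b?)$',
    "shorter alternation": r'^(bA+|A+b?)$',
    "negative lookahead": r'^(b(?!.*b$))?A+b?$',
}


def accepted(pattern, text):
    """Return True if the pattern accepts the whole string."""
    return re.search(pattern, text) is not None


# A b at one end is fine; a b at both ends, or no A at all, is not.
for text in ["AA", "bAA", "AAb", "bAAb", "b", ""]:
    print(repr(text), {name: accepted(p, text) for name, p in PATTERNS.items()})
```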
^(b+A+b?|b?A+b+)$
why doesn't that work?