I'm doing this for homework. I need to write a regular expression for the language over {a, b} that contains every string not in the language (a*b)*.
For example, 'aaaaaaabaaaaaaaabaaaaaaabaaaabaaaaaaaaab' is in (a*b)*, so I'm looking for a regular expression for all the strings that are not.
Can you help me at least take the right first step to figure it out?
I know that a*b means as many a's as we want followed by one b, and that this whole group is then repeated as many times as we want.
It seems that your (a*b)* regular expression matches anything that ends in a b or is empty.
So a regular expression that matches anything that ends in a seems to be the solution, i.e. (a+b)*a in formal notation, or with a regex engine:
a$ or /a$/
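To check the "ends in a" claim rather than take it on faith, here is a minimal sketch that brute-forces every string over {a, b} up to a small length. It assumes Python's re module; the names ORIGINAL, ENDS_IN_A, all_strings and is_complement are made up for this illustration.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(a*b)*")     # the language from the question
ENDS_IN_A = re.compile(r"(a|b)*a")   # candidate complement: strings ending in a


def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def is_complement(max_len=8):
    """True if, for every short string, exactly one of the two patterns matches."""
    return all(
        (ORIGINAL.fullmatch(s) is None) == (ENDS_IN_A.fullmatch(s) is not None)
        for s in all_strings(max_len)
    )
```

A bounded check like this is not a proof, but it catches a wrong candidate quickly.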
[^ab]*
This probably should do the trick.
This says the string should not contain any a or b. Inside square brackets, a leading ^ negates the character class.
You could iterate over the string and check for characters other than 'a' and 'b'. It seems that chains of b's are allowed and that the empty string is part of the language, since the Kleene star allows zero repetitions.
I need to find a simplified regular expression for the language of all strings
of a's, b's, and c's where a is never immediately followed by b.
I tried something and got as far as (a+c)*c(b+c)* + (b+c)*(a+c)*
Is this correct, and if so, can it be simplified?
Thanks in advance.
You are looking for a negative lookbehind:
(?<!a)b
This will find you all the b instances that do not immediately follow an a
Or a negative lookahead:
a(?!b)
This will find you all the a instances that are not immediately followed by b
Here is a regex101 example for the lookbehind:
https://regex101.com/r/RsqXbW/1
Here is a regex101 example for the lookahead:
https://regex101.com/r/qiDIZU/1
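For anyone who wants to try the two lookarounds outside regex101, here is a small sketch using Python's re module. The sample text and the whole-string pattern NO_AB are my own additions for illustration, not part of the answer above.

```python
import re

text = "abcbac"  # made-up sample input

# Positions of every b that is not immediately preceded by a.
b_not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", text)]

# Positions of every a that is not immediately followed by b.
a_not_before_b = [m.start() for m in re.finditer(r"a(?!b)", text)]

# Assumption: to accept or reject a whole string over {a, b, c}, the same
# lookahead can be repeated for every character of the string.
NO_AB = re.compile(r"(?:a(?!b)|[bc])*")


def never_ab(s):
    """True if s consists of a, b, c and a is never immediately followed by b."""
    return NO_AB.fullmatch(s) is not None
```

Note that the lookarounds on their own only locate individual characters; deciding whether the whole string belongs to the language needs something like the repeated form.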
Your solution contains only strings from the desired language. However, it does not contain all of them. For example, acbac is not contained. Your basic idea is fine, but you need to be able to iterate the possible factors. In:
(b+c)*(a (a)*(c(b+c)*)*)*
the first part generates all strings without a.
After the first a there comes either nothing, another a, or a c. Another a leaves us with the same three options. A c basically starts the game again. This is what the part after the first a formalizes. The many * are needed so that the empty string can be generated in each of the different options.
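Both claims (the attempt misses acbac, the corrected expression does not miss anything) can be checked by brute force. This is a sketch that assumes Python's re module and rewrites the formal union + as |; the names ATTEMPT, ANSWER, strings and mismatches are hypothetical.

```python
import re
from itertools import product

# The attempt from the question and the expression from this answer,
# with the formal union "+" written as "|".
ATTEMPT = re.compile(r"(a|c)*c(b|c)*|(b|c)*(a|c)*")
ANSWER = re.compile(r"(b|c)*(aa*(c(b|c)*)*)*")


def strings(max_len, alphabet="abc"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def mismatches(pattern, max_len=6):
    """Strings on which the pattern disagrees with 'a is never followed by b'."""
    return [
        s for s in strings(max_len)
        if (pattern.fullmatch(s) is not None) != ("ab" not in s)
    ]
```

With these definitions, mismatches(ANSWER) should come back empty, while mismatches(ATTEMPT) lists the valid strings the attempt fails to generate.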
This has been gone over before, but I've not found anything that works consistently or that helps me learn where I've gone awry.
I have file names that start with 3 or more digits and ^\d{3}(.*) works just fine.
I also have strings that start with the word 'account' and ^ACCOUNT(.*) works just fine for these.
The problem I have is all the other strings that DO NOT meet the two previous criteria. I have been using ^[^\d{3}][^ACCOUNT](.*) but occasionally it fails to catch one.
Any insights would be greatly appreciated.
^[^\d{3}][^ACCOUNT](.*)
That's definitely not what you want. Square brackets create character classes: they match one character from the list of characters in brackets. If you put a ^ then the match is inverted and it matches one character that's not listed. The meaning of ^ inside brackets is completely different from its meaning outside.
In short, [] is not at all what you want. What you can do, if your regex implementation supports it, is use a negative lookahead assertion.
^(?!\d{3}|ACCOUNT)(.*)
This negative lookahead assertion doesn't consume anything itself. It merely checks that the part of the string matched next by (.*) does not start with either \d{3} or ACCOUNT.
De Morgan's law says: !(A v B) = !A ^ !B.
Unfortunately, regex itself does not support negating an expression. (You can always rewrite it by hand, but that is sometimes a huge task.)
Instead, look to your programming language, where you can negate values without problems:
Let the matching function be "match", and use match("^(?:\d{3}|ACCOUNT)(.*)") to determine whether the string meets either of the two conditions. Then simply negate the boolean return value of that function and you get every string that does NOT match.
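In Python, for instance, that could look roughly like the following. The question doesn't say which language is in use, so this is only a sketch; is_other is a hypothetical helper name.

```python
import re

# Matches names that start with three digits or with ACCOUNT.
WANTED = re.compile(r"^(?:\d{3}|ACCOUNT)")


def is_other(name):
    """True for every name that does NOT meet either condition."""
    return WANTED.match(name) is None
```

The regex stays simple and positive, and the negation happens in ordinary code.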
L = {words such that the substring 'bb' is not present in them}
Given that the alphabet is A = {a,b}, is this language regular? If so, is there a regular expression that represents it?
Yes, this language is regular. Since this looks like homework, here's a hint: if the string bb isn't present, then the string consists of lots of blocks of strings of the form a* or a*b. Try seeing how to assemble the solution from this starting point.
EDIT: If this isn't a homework problem, here's one possible solution:
(a*(ba+)*b?)?
The idea is to decompose the string into long sequences of a's with single b's interspersed between them. The first block of a's is at the front. Then, we repeatedly place down a b, at least one a, and then any number of additional a's. Finally, we may optionally have one b at the end. As an alternative, we could have the empty string, so the entire thing is guarded by a ?.
Hope this helps!
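If you want to convince yourself the expression is right, a bounded brute-force comparison is easy to write. This sketch assumes Python's re module; NO_BB and agrees are names invented here.

```python
import re
from itertools import product

NO_BB = re.compile(r"(a*(ba+)*b?)?")  # the expression from the answer


def agrees(max_len=10):
    """True if the regex and a plain substring test agree on all short strings."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (NO_BB.fullmatch(s) is not None) != ("bb" not in s):
                return False
    return True
```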
I'm having trouble finding a regular expression for the following problem
All strings over the alphabet {a, b, c, d} with at least four instances of c and at least two instances of a
Use a look-ahead:
^(?=(.*c){4,})(?=(.*a){2,})[a-z]+
I'm not sure what you mean by "alphabet" - I have assumed "any letter", but if it's literally a,b,c and d, change [a-z]+ to [a-d]+
A bit more efficient than Bohemian's solution, and also anchored to make sure we don't just match a substring of a longer string that might contain unwanted characters:
^(?=(?:[^c]*c){4})(?=(?:[^a]*a){2})[a-z]+$
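A quick sketch of how this pattern behaves, assuming Python's re module and the literal alphabet {a, b, c, d} (so [a-z]+ becomes [a-d]+). The helper names are made up; the plain-count function is only there as a reference to compare against.

```python
import re
from itertools import product

PATTERN = re.compile(r"^(?=(?:[^c]*c){4})(?=(?:[^a]*a){2})[a-d]+$")


def by_counting(s):
    """Reference check: at least four c's, at least two a's, only a-d."""
    return s.count("c") >= 4 and s.count("a") >= 2 and set(s) <= set("abcd")


def agrees(max_len=6):
    """Compare the regex with the reference check on all short strings."""
    for n in range(max_len + 1):
        for chars in product("abcd", repeat=n):
            s = "".join(chars)
            if (PATTERN.match(s) is not None) != by_counting(s):
                return False
    return True
```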
As outlined in comments, this question seems to relate to the strict mathematical theory of regular expressions as derived from set theory. In that case, lookaheads are not permitted; you need to enumerate the permitted sequences. For simplicity and clarity, I am omitting the .* which should go before, between, and after the symbols in the following list.
ccccaa|
cccaca|
cccaac|
ccacca|
ccacac|
ccaacc|
caccca|
caccac|
cacacc|
caaccc|
acccca|
acccac|
accacc|
acaccc|
aacccc
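Rather than typing the orderings by hand, they can be generated: choosing which 2 of the 6 positions hold an a gives C(6,2) = 15 distinct sequences. The sketch below assumes Python; orderings and pure_regex are hypothetical helper names, and [abcd]* is used as shorthand for the (a|b|c|d)* that stands in for the omitted .* over this alphabet.

```python
import re
from itertools import combinations


def orderings(n_c=4, n_a=2):
    """All distinct sequences made of n_c c's and n_a a's."""
    total = n_c + n_a
    result = []
    for a_positions in combinations(range(total), n_a):
        result.append("".join("a" if i in a_positions else "c" for i in range(total)))
    return result


def pure_regex():
    """Union of all orderings, with 'any symbols' before, between and after."""
    gap = "[abcd]*"  # shorthand for (a|b|c|d)*
    alternatives = [gap + gap.join(seq) + gap for seq in orderings()]
    return "|".join(alternatives)


COMPILED = re.compile(pure_regex())
```

COMPILED.fullmatch(s) then uses no lookaheads at all, only union, concatenation and star.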
Is it possible to convert a properly formed (in terms of brackets) expression such as
((a and b) or c) and d
into a Regex expression and use Java or another language's built-in engine with an input term such as ABCDE (case-insensitive...)?
So far I've tried something along the lines of (b)(^.?)(a|e)* for the search b and (a or e) but it isn't really working out. I'm looking for it to match the characters 'b' and any of 'a' or 'e' that appear in the input string.
About the process - I'm thinking of splitting the input string into an array (based on this Regex) and receiving as output the characters that match (or none if the AND/OR conditions are not met). I'm relatively new to Regex and haven't spent a lot of time on it, so I'm sorry if what I'm asking about is not possible or the answer is really obvious.
Thanks for any replies.
The language of strings with balanced parentheses is not a regular language, which means no (pure) regular expression will match it.
That is because some kind of memory construct, usually a stack, is needed to maintain open parentheses.
That said, many languages offer recursive evaluation in regexes, notably Perl. I don't know the fine details, but I'm not going to bother with them because you can probably write your own parser.
Just iterate over every character in the string and keep track of a counter of open parentheses and a stack of strings. When you get to an opening parenthesis, push a new empty string onto the stack, and append characters that aren't parentheses to the string on top of the stack. When you get to a closing parenthesis, pop and evaluate the expression you had built up, and append the result to the string that is now on top of the stack.
Then again, I'm not fully sure I understand what you're doing. I apologize, then, if this is no help.
I'm not entirely certain I understand what you're trying to do, but here's something that might help. Start with something like
((a and b) or c) and d
And pass it through these substitution statements:
s/or/|/g
s/and| //g
s/([^()|])/(?=.*$1)/g
That will give you
(((?=.*a)(?=.*b))|(?=.*c))(?=.*d)
which is a regex that will match what you want.
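The same three substitutions translate directly to Python's re.sub, which may be handy if you are not working in Perl or sed. This is a sketch that assumes single-letter terms, as in the example; to_lookahead_regex and satisfies are names I made up.

```python
import re


def to_lookahead_regex(expr):
    """Apply the three substitutions in order and return the resulting pattern."""
    expr = re.sub(r"or", "|", expr)                 # s/or/|/g
    expr = re.sub(r"and| ", "", expr)               # s/and| //g
    expr = re.sub(r"([^()|])", r"(?=.*\1)", expr)   # s/([^()|])/(?=.*$1)/g
    return expr


def satisfies(expr, text):
    """True if text contains the letters required by expr (case-insensitive)."""
    pattern = to_lookahead_regex(expr)
    # The pattern consists only of lookaheads, so it is tested at position 0.
    return re.match(pattern, text, re.IGNORECASE) is not None
```

Since every term becomes a lookahead, the order of the letters in the input does not matter; the parentheses in the generated pattern only group the alternatives.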
No. A regex isn't computationally powerful enough to make sure that the opening and closing parentheses match. You need something that can describe it using a formal grammar.