Unable to figure out the Regular Expression for the following - regex

I've been working on Compiler Design lately and found Regular Expression quite tricky.
So I am making a lexical analyzer for which I need lexical specification.
I'm unable to figure out the RE of identifiers (Rules defined below):
Maximum 4 characters
At least 1 letter
What I have already tried:
(letter|digit){4} // I read that we can limit occurrence like this. But in this case, 11aa will also be accepted.
I think I can rewrite the above statement like this as well.
(letter|digit)(letter|digit)(letter|digit)(letter|digit)
Please correct me if I'm wrong and thanks in advance!

The tricky thing about this task is to make sure we have at least one letter.
And that letter could be at any of four positions.
(letter)(letter|digit){0,3} | (letter|digit)(letter)(letter|digit){0,2} | (letter|digit){2}(letter)(letter|digit){0,1} | (letter|digit){3}(letter)
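As a quick sanity check, the alternation above can be transcribed into Python's re syntax and compared against a brute-force statement of the two rules. This is only a sketch: it assumes letter means [A-Za-z] and digit means [0-9], which the question does not spell out, and the names IDENT, is_identifier, meets_rules and agrees_on_small_alphabet are made up for illustration.

```python
import re
from itertools import product

LETTER = "[A-Za-z]"    # assumed definition of `letter`
ANY = "[A-Za-z0-9]"    # assumed definition of `letter|digit`

# One alternative per possible position of the first guaranteed letter.
IDENT = re.compile(
    f"{LETTER}{ANY}{{0,3}}"
    f"|{ANY}{LETTER}{ANY}{{0,2}}"
    f"|{ANY}{{2}}{LETTER}{ANY}{{0,1}}"
    f"|{ANY}{{3}}{LETTER}"
)

def is_identifier(s):
    # fullmatch requires the whole string to be consumed by the pattern.
    return IDENT.fullmatch(s) is not None

def meets_rules(s):
    # The rules as stated in the question: at most 4 characters, at least 1 letter.
    return 1 <= len(s) <= 4 and s.isalnum() and any(c.isalpha() for c in s)

def agrees_on_small_alphabet():
    # Compare regex and rules on every string of length 0..5 over "ab01".
    for n in range(6):
        for chars in product("ab01", repeat=n):
            s = "".join(chars)
            if is_identifier(s) != meets_rules(s):
                return False
    return True
```

Checking every short string over a tiny alphabet is a cheap way to see that the four alternatives really cover "a letter at any of the four positions" without accepting all-digit strings.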

Matching within matches by extending an existing Regex

I'm trying to see if it's possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to parse the whole regex (which could be quite a long operation) and then rewrite its internals to produce the output, as in:
^ca[tr]
Nor do I want to run the original regex and then a second one over the results. I'm taking the original regex as an argument in Python but want to 'prefilter' the matches by adding an additional pattern.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the initial query might be [a-z]{4}, but if I create (?<=^ca([a-z]{4})) it matches against 6-letter strings starting with ca, not 4-letter ones.
Thanks for any solutions and/or opinions on it.
EDIT: See the solution including @Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried, only you don't need a lookbehind, but rather a lookahead assertion, and a parenthesis was misplaced. The right thing is: Put the original pattern in parentheses, and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
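Since the question takes the original regex as an argument in Python, the lookahead approach can be sketched as a small helper. The names prefilter and all_matches are hypothetical, and the sketch wraps the original pattern in a non-capturing group (?:...) rather than plain parentheses so that the group numbering of the original pattern is not shifted.

```python
import re

def prefilter(original, prefix):
    # (?=prefix) asserts that the match starts with `prefix` without consuming it;
    # (?:...) keeps any top-level '|' of the original pattern contained.
    return re.compile(f"(?={re.escape(prefix)})(?:{original})")

def all_matches(pattern, text):
    # group(0) is the whole match, regardless of groups inside the original pattern.
    return [m.group(0) for m in pattern.finditer(text)]

print(all_matches(prefilter("cat|car|bat", "ca"), "cat car bat cab"))
print(all_matches(prefilter("[a-z]{4}", "ca"), "cats bats card"))
```

Because the lookahead consumes nothing, the [a-z]{4} case still yields 4-character matches rather than 6-character ones.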
Ok, thanks to @Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See @Nick's comment.
I've also raised an issue on the exrex GitHub for this.

Regex that matches even amount of character

Disclaimer (added after this was solved): this is my uni assignment, so the answer could be simple. Hints are shown but my own answer is hidden from here. Alternative answers can be found here, but I take no responsibility for any plagiarism of direct answers posted here.
Hi I'm having troubles with the following exercise
Find regex that strictly represents the language:
b^(m+1), such that m>=0, m mod 2 = 1
The language breaks down to words:
{bb,bbbb,bbbbbb,bbbbbbbb,...}
I have tried the following:
b(bbb)?(bb)*
But this also accepts
{bb,bbb,bbbb,bbbbb,...}
Is there a way to write it such that one part of the expression depends on the other? I.e. (bb)* cannot be chosen if (bbb)? has been chosen, then repeat the decision while allowing the reverse choice.
Any help would be appreciated. Thanks
Update:-
You can use
^(?:bb)+$
The initial heading of the question was "Regex that matches odd amount of character". For that case you can try this:
^b(?:(?:b{2})+)?$
My guess is that, this might be closer,
^(?:bb){1,}$
and your set might look like,
bb
bbbb
bbbbbb
I'm not sure, though. If your set was correct, the expression can likely be modified.
Also, b would probably not be in the set, since m=0 does not pass the second requirement (m mod 2 = 1).
If you wish to explore, simplify, or modify the expression, it is explained on the top right panel of regex101.com. If you'd like, you can also watch there how it would match against some sample inputs.
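To see that ^(?:bb)+$ matches exactly the language from the exercise, one can generate the words straight from the definition b^(m+1), m >= 0, m mod 2 = 1, and compare. A minimal sketch in Python; the names EVEN, ODD and language are made up for illustration.

```python
import re

EVEN = re.compile(r"^(?:bb)+$")          # the pattern for the corrected question
ODD = re.compile(r"^b(?:(?:b{2})+)?$")   # the pattern for the original "odd" heading

def language(max_m):
    # Words b^(m+1) for 0 <= m <= max_m with m mod 2 = 1, straight from the definition.
    return {"b" * (m + 1) for m in range(max_m + 1) if m % 2 == 1}

words = language(30)
for n in range(21):
    s = "b" * n
    # EVEN should accept s exactly when s belongs to the language.
    assert (EVEN.match(s) is not None) == (s in words)
    # ODD should accept exactly the odd lengths.
    assert (ODD.match(s) is not None) == (n % 2 == 1)
```

Note that m = 1 is the smallest admissible value, so the shortest word is bb; neither the empty string nor a single b belongs to the language.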

Regular Expression to find CVE Matches

I am pretty new to the concept of regex, so I am hoping an expert user can help me craft the right expression to find all the matches in a string. I have a string that contains a lot of support information for vulnerability data. In that string is a series of CVE references in the format: CVE-2015-4000. Can anyone provide me with a sample regex for finding all occurrences of that? Obviously, the numeric part changes throughout the string...
Generally you should always include your previous efforts in your question, what exactly you expect to match, etc. But since I am aware of the format and this is an easy one...
CVE-\d{4}-\d{4,7}
This matches first CVE- then a 4-digit number for the year identifier and then a 4 to 7 digit number to identify the vulnerability as per the new standard.
If you need an exact match without any syntax or logic violations, you can try this:
^(CVE-(1999|2\d{3})-(0\d{2}[1-9]|[1-9]\d{3,}))$
You can run this against the test data supplied by MITRE to test your code.
I will add my two cents to the accepted answer. In case we want to detect "CVE" case-insensitively, we can use the following regex:
r'(?i)\bcve\-\d{4}-\d{4,7}'
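Both the simple pattern and the case-insensitive variant can be tried with re.findall in Python. This is a sketch only: the sample text and the helper name find_cves are invented for illustration, and re.IGNORECASE is used in place of the inline (?i) flag, which has the same effect here.

```python
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,7}")
CVE_ANY_CASE_RE = re.compile(r"\bcve-\d{4}-\d{4,7}", re.IGNORECASE)

def find_cves(text, ignore_case=False):
    # findall returns every non-overlapping match as a list of strings.
    pattern = CVE_ANY_CASE_RE if ignore_case else CVE_RE
    return pattern.findall(text)

# Made-up sample text, not real support data.
sample = "Patched CVE-2015-4000 and CVE-2014-0160; vendor note mentions cve-2016-1000001."

print(find_cves(sample))
print(find_cves(sample, ignore_case=True))
```

The matches come back in the case they had in the input, so normalising with .upper() afterwards may be useful if the results are compared or deduplicated.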

Regular Expression with specific criteria

Hey everyone, I'm trying to type a regular expression that follows the following format:
someone#somewhere.com or some.one#some.where.com
No special characters or numbers are permitted under these criteria. I thought I had it down, but I'm a bit rusty with regular expressions, and when I tested mine it failed across the board. So far, my regular expression is:
^[a-zA-Z]+/.?[a-zA-Z]*#[a-zA-Z]+/.?[a-zA-Z]*/.com$
If anyone could help me, it would greatly be appreciated, thanks.
Your regex looks good. I think you need to change the / to \ in front of each . (a backslash escapes the dot).
Additionally, if you don't want someone.#somewhere..com to pass your regex, you should change it to
^[a-zA-Z]+(\.[a-zA-Z]+)?#[a-zA-Z]+(\.[a-zA-Z]+)?\.com$
(I'm not completely sure about the parentheses (), but I think that should work.)
It's a backslash that escapes dots. Also put parentheses around the . and what follows, otherwise an address like abc.#cde..com would be valid.
^[a-zA-Z]+(\.[a-zA-Z]+)?#[a-zA-Z]+(\.[a-zA-Z]+)?\.com$
It looks mostly OK. Change your / to \ though...
For the second case, I would ensure that if you have a . in the middle, it must be followed by more letters:
^[a-zA-Z]+(\.[a-zA-Z]+)?#[a-zA-Z]+(\.[a-zA-Z]+)?\.com$
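The corrected pattern that all three answers arrive at can be checked against the cases mentioned in the thread. A minimal sketch in Python; the separator is kept as # exactly as it is written in the post, and the names ADDRESS and is_valid are made up for illustration.

```python
import re

# Pattern copied from the answers above; '#' is the separator as written in the post.
ADDRESS = re.compile(r"^[a-zA-Z]+(\.[a-zA-Z]+)?#[a-zA-Z]+(\.[a-zA-Z]+)?\.com$")

def is_valid(s):
    return ADDRESS.match(s) is not None

for candidate in [
    "someone#somewhere.com",
    "some.one#some.where.com",
    "someone.#somewhere..com",   # dot not followed by letters
    "some1#where.com",           # digits are not permitted
]:
    print(candidate, is_valid(candidate))
```

Grouping the dot with the letters that follow it, as in (\.[a-zA-Z]+)?, is what rules out a trailing or doubled dot: the dot can only appear if at least one letter comes after it.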

Regular Expression matching anything after a word

I am looking to find anything that matches this pattern, the beginning word will be:
organism aogikgoi egopetkgeopt foprkgeroptk 13
So anything that starts with organism needs to be found using regex.
^organism will match anything starting with "organism".
^organism(.*) will also capture everything that follows into the first capture group (how you access it varies by language; in Perl it's $1).
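In Python, for instance, the captured remainder is available as group(1) on the match object. A small sketch, assuming the text may contain several lines (hence re.MULTILINE, so that ^ matches at the start of each line); the helper name rest_after_organism and the extra sample lines are invented for illustration.

```python
import re

# With re.MULTILINE, ^ anchors at the start of every line, not just the string.
ORGANISM = re.compile(r"^organism(.*)", re.MULTILINE)

def rest_after_organism(text):
    # group(1) holds whatever followed "organism" on that line.
    return [m.group(1) for m in ORGANISM.finditer(text)]

text = "organism aogikgoi egopetkgeopt foprkgeroptk 13\nother line\norganism xyz"
print(rest_after_organism(text))
```

The captured text keeps its leading space, because (.*) starts immediately after the word organism; call .strip() on the result if that is unwanted.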
I also just want to add, for other newbies like me in different circumstances, that you can do this in various ways depending on your text and what you are trying to do.
For example, where I want to delete everything after ?spam, I could use .?spm.+ or some other variant, as long as you are creative about it.