Single regex for filtering Roman numerals from text files

I am stuck on a problem where only one regular-expression pass is allowed (some old hard-coded logic). I need a regex for Roman numerals.
I have tried the standard one, i.e. ^(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$, but the problem is that it also allows empty ('') values.
Is there any way around this problem?

To require that at least one character must be present, you can use a lookahead (?=.) at the start of your regular expression:
^(?=.)(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$
Another solution is to separately test that your string is not the empty string.
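As a quick check of the lookahead approach, here is a minimal Python sketch (Python is used only for illustration; the helper name is_roman is hypothetical). The (?i) flag is passed as re.IGNORECASE because recent Python versions reject an inline flag that is not at the very start of the pattern.

```python
import re

# Pattern from the answer above, with (?i) expressed as a compile flag.
ROMAN = re.compile(
    r"^(?=.)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$",
    re.IGNORECASE,
)


def is_roman(text):
    """Return True if text is a non-empty Roman numeral (hypothetical helper)."""
    return ROMAN.match(text) is not None


for sample in ["", "XIV", "mcmxciv", "IIII", "ABC"]:
    print(repr(sample), is_roman(sample))
```

Without the (?=.) lookahead every group can match nothing, so the empty string would be accepted.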

I like this one:
\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b
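A rough Python sketch of how this word-boundary version could be used to pull numerals out of running text (find_romans is a hypothetical helper, not part of the answer). One thing I noticed while tracing it: for an all-Roman-letter word that is not a valid numeral, such as IIII, the pattern can still produce an empty match at the word start, so the sketch drops empty matches.

```python
import re

# Pattern from the answer above, unchanged (case-sensitive).
ROMAN_IN_TEXT = re.compile(
    r"\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b"
)


def find_romans(text):
    """Return the non-empty Roman numerals found in text (hypothetical helper)."""
    return [m.group(0) for m in ROMAN_IN_TEXT.finditer(text) if m.group(0)]


print(find_romans("Chapter XIV follows chapter IX; I mix nothing."))
print(find_romans("IIII"))
```

Note that the English pronoun "I" is indistinguishable from the numeral I, so it is reported as well.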

Related

Floating Point - Regular expression

I am struggling to understand this simple regular expression. I have the following attempt:
[0-9]*\.?[0-9]*
I understand this as zero or more digits, followed by zero or one period, and finally zero or more digits.
I do not want to match anything other than exactly the above. I do not want positive/negative support or any other special support types. However, for some reason, the above also matches what appear to be random characters. All of the following match, for whatever reason:
f32
32a
32-
=33
In an answer, I am looking for:
An explanation of why my regular expression does not work.
A working version with an explanation of why it does work.
Edit: Because the environment seems to be part of the trouble, I have added the "QT" tag; Qt is the environment I am working with.
Edit: Due to continued confusion, I am going to add a bit of code. I am starting to think I am either misusing Qt, or Qt has a problem:
void subclassedQDialog::setupTxtFilters()
{
QRegExp numbers("^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$");
txtToFilter->setValidator(new QRegExpValidator(numbers,this));
}
This is from within a subclassed QDialog. txtToFilter is a QLineEdit. I can provide more code if someone can suggest what may be relevant. While the expression above is not the original, it is one of the ones from comments below and also fails in the same way.
Your problem is that you haven't escaped the \ properly inside the C++ string literal; you need to write \\. Otherwise the C++ compiler will strip out the \ (at least gcc does this, with a warning) and the regex engine will treat the . as any character.
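To see what that does to the matching, here is a small Python sketch comparing the intended pattern with the one the regex engine would receive once the backslash has been dropped. The variable names are mine, and Python raw strings stand in for the C++ literals.

```python
import re

# The pattern as intended (what "\\." in the C++ source would produce).
intended = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$")
# The pattern after the backslash is stripped: "." now means "any character".
stripped = re.compile(r"^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$")

for s in ["f32", "=33", "3.14", "1e5"]:
    print(s, bool(intended.search(s)), bool(stripped.search(s)))
```

With the stripped pattern, "f32" and "=33" are accepted because the optional "any character" swallows the leading f or =.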
Put ^ at the start and $ at the end. This anchors your regex to the start and end of the string.
Your expression finds a match in the middle of the string. If you add anchors to the beginning and to the end of your expression, the strings from your list will no longer match. Your expression would still match empty strings, but that's the price you pay for being able to match strings like .99 and 99.
^[0-9]*\.?[0-9]*$
A better choice would be
^[0-9]*(\.[0-9]+)?$
because it would match the decimal point only if at least one digit is present after it.
One of them needs to be a + instead of *. Do you want to allow ".9" to be valid, or will you require the leading 0?
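A short Python sketch of the difference anchoring makes, using the question's sample inputs plus a few extras of my own (Python here is just a convenient test bed; the original environment is Qt).

```python
import re

unanchored = re.compile(r"[0-9]*\.?[0-9]*")      # original attempt
anchored = re.compile(r"^[0-9]*(\.[0-9]+)?$")    # suggested improvement

samples = ["f32", "32a", "32-", "=33", "3.14", ".5", "5.", ""]
for s in samples:
    # search() looks for a match anywhere, like most "find" style APIs.
    print(repr(s), bool(unanchored.search(s)), bool(anchored.search(s)))
```

The unanchored pattern "matches" every sample because it can always match an empty substring; the anchored one accepts only 3.14, .5 and the empty string, which is why at least one of the quantifiers would need to become + to reject empty input.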

Using a regular expression to find any character or no character?

I'm trying to create a regular expression to find all lines that contain a specific character for example "a". Using
^(.+)a
will only render the lines that don't start with the character "a", but contain them. Is there a way to express any characters or no characters?
I think the regex a should work in most line-by-line matchers.
See linepogl's answer for character-matching the whole line.
Use
^.*a.*$
The + means at least one, while the * means none or more.
The answer is /a/. It's fairly ridiculous to have the matching variable actually reproduce the contents of the entire source string.
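The three patterns from this thread can be compared side by side in a few lines of Python; the sample lines are made up for illustration.

```python
import re

lines = ["apple", "banana", "sky", "a", ""]

# ^(.+)a requires at least one character before some "a".
with_prefix = [line for line in lines if re.search(r"^(.+)a", line)]
# A bare "a" is enough to test whether the line contains the character.
plain = [line for line in lines if re.search(r"a", line)]
# ^.*a.*$ selects the same lines but the match spans the whole line.
whole_line = [m.group(0) for m in (re.search(r"^.*a.*$", line) for line in lines) if m]

print(with_prefix)
print(plain)
print(whole_line)
```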

Regular Expression to List accepted words

I need a regular expression to list accepted version numbers, i.e. say I wanted to accept "V1.00" and "V1.02". I've tried this "(V1.00)|(V1.01)" which almost works, but then if I input "V1.002" (which is likely, due to the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
Do this:
^(V1\.00|V1\.01)$
(The . needs to be escaped, ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text.)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.
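A small Python sketch contrasting the original pattern with the anchored, escaped one; the extra inputs V1x00 and V1.02 are mine, added to show the effect of the unescaped dot.

```python
import re

loose = re.compile(r"(V1.00)|(V1.01)")   # pattern from the question
strict = re.compile(r"^(V1\.0[01])$")    # compact anchored form from the answer

candidates = ["V1.00", "V1.01", "V1.002", "V1x00", "V1.02"]
for s in candidates:
    print(s, bool(loose.search(s)), bool(strict.search(s)))
```

The loose pattern accepts V1.002 (substring match) and V1x00 (the dot matches any character); the strict one accepts only V1.00 and V1.01.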

Regex AND'ing

I have two strings, and I want to match everything that doesn't equal them; the first string can be followed by any number of characters. I tried something like this, negating two ORs and negating that result.
?!(?!^.*[^Factory]$|?![^AppName])
Any ideas?
Try this regular expression:
(?!.*Factory$|.*AppName)^.*
This matches every string that does not end with Factory and does not contain AppName.
dfa's answer is by far the best option. But if you can't use it for some reason, try:
^(?!.*Factory|AppName)
It's very difficult to determine from your question and your regex what you're trying to do; they seem to imply opposite behaviors. The regex I wrote will not match if Factory appears anywhere in the string, or AppName appears at the beginning of it.
What about
if (!match("(Factory|AppName)")) {
// your code
}
Would it work if you looked for the existence of those two strings and then negated the regex?
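For comparison, here is a Python sketch of the first negative-lookahead answer next to the "match, then negate in code" idea. The sample names are invented, and the two approaches deliberately differ on strings that contain Factory somewhere other than the end.

```python
import re

# First answer: reject strings ending in "Factory" or containing "AppName".
lookahead = re.compile(r"(?!.*Factory$|.*AppName)^.*")

names = ["WidgetFactory", "FactoryWidget", "MyAppNameThing", "Widget"]

kept_by_lookahead = [n for n in names if lookahead.match(n)]
# Negating a plain search rejects any string containing either word.
kept_by_negation = [n for n in names if not re.search(r"Factory|AppName", n)]

print(kept_by_lookahead)
print(kept_by_negation)
```

Which of the two is right depends on whether "Factory" should be rejected only as a suffix, which the question leaves open.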

Using an asterisk in a RegExp to extract data that is enclosed by a certain pattern

I have a text that consists of information enclosed by a certain pattern.
The only thing I know is the pattern: "${template.start}" and "${template.end}"
To keep it simple I will substitute ${template.start} and ${template.end} with "a" in the example.
So one entry in the text would be:
aINFORMATIONHEREa
I do not know how many of these entries are concatenated in the text. So the following is correct too:
aFOOOOOOaaASDADaaASDSDADa
I want to write a regular expression to extract the information enclosed by the "a"s.
My first attempt was to do:
a(.*)a
which works as long as there is only one entry in the text. As soon as there is more than one entry it fails, because the .* matches everything. So using a(.*)a on aFOOOOOOaaASDADaaASDSDADa results in only one capturing group containing everything between the first and the last character of the text, which are both "a":
FOOOOOOaaASDADaaASDSDAD
What I want to get is something like
captureGroup(0): aFOOOOOOaaASDADaaASDSDADa
captureGroup(1): FOOOOOO
captureGroup(2): ASDAD
captureGroup(3): ASDSDAD
It would be great to be able to extract each entry from the text and, from each entry, the information that is enclosed between the "a"s. By the way, I am using the QRegExp class of Qt4.
Any hints? Thanks!
Markus
Multiple variations of this question have been seen before. Various related discussions:
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Using regular expressions how do I find a pattern surrounded by two other patterns without including the surrounding strings?
Use RegExp to match a parenthetical number then increment it
Regex for splitting a string using space when not surrounded by single or double quotes
What regex will match text excluding what lies within HTML tags?
and probably others...
Simply use non-greedy expressions, namely:
a(.*?)a
You need to match something like:
a[^a]*a
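Both suggestions can be tried on the question's sample text with a short Python sketch. A capturing group is added to the negated-class version so that only the enclosed text is returned, and the last part shows one way the lazy form might be applied to the real multi-character delimiters.

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

greedy = re.findall(r"a(.*)a", text)        # one match spanning everything
lazy = re.findall(r"a(.*?)a", text)         # stops at the nearest closing "a"
negated = re.findall(r"a([^a]*)a", text)    # content may not contain "a"

print(greedy)
print(lazy)
print(negated)

# With multi-character delimiters a negated class no longer applies directly;
# escaping the delimiters and using a lazy quantifier is one option.
start, end = "${template.start}", "${template.end}"
templated = start + "FOO" + end + start + "BAR" + end
entries = re.findall(re.escape(start) + r"(.*?)" + re.escape(end), templated)
print(entries)
```

As far as I recall, Qt4's QRegExp does not accept the *? syntax and offers setMinimal(true) instead, so treat the lazy form as something to verify against the Qt documentation.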
You have a couple of working answers already, but I'll add a little gratuitous advice:
Using regular expressions for parsing is a road fraught with danger
Edit: To be less cryptic: for all their power, flexibility and elegance, regular expressions are not sufficiently expressive to describe any but the simplest grammars. They are adequate for the problem asked here, but are not a suitable replacement for state machines or recursive descent parsers if the input language becomes more complicated.
So, choosing to use regular expressions for parsing input streams is a decision that should be made with care and with an eye towards the future.