Floating Point - Regular expression - regex

I am struggling to understand this simple regular expression. I have the following attempt:
[0-9]*\.?[0-9]*
I understand this as zero or more numeric digits, followed by zero or one period, and finally ending in zero or more numeric digits.
I do not want to match anything other than exactly the above. I do not want positive/negative support or any other special support types. However, for some reason, the above also matches what appear to be random characters. All of the following match:
f32
32a
32-
=33
In an answer, I am looking for:
An explanation of why my regular expression does not work.
A working version with an explanation of why it does work.
Edit: Due to what seems to be causing trouble, I have added the "QT" tag, that is the environment I am working with.
Edit: Due to continued confusion, I am going to add a bit of code. I am starting to think I am either misusing QT, or QT has a problem:
void subclassedQDialog::setupTxtFilters()
{
QRegExp numbers("^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$");
txtToFilter->setValidator(new QRegExpValidator(numbers,this));
}
This is from within a subclassed QDialog. txtToFilter is a QLineEdit. I can provide more code if someone can suggest what may be relevant. While the expression above is not the original, it is one of the ones from comments below and also fails in the same way.

Your problem is that you haven't escaped the \ properly; you need to write \\. Otherwise the C++ compiler will strip out the \ (at least gcc does this, with a warning) and the regex engine will treat the . as any character.
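To see the effect in isolation, here is a minimal sketch in Python rather than Qt, so raw strings stand in for the C++ string literal and the variable names are made up for illustration. It compares the pattern as the regex engine would receive it once the backslash has been dropped with the correctly escaped one.

```python
import re

# What the engine sees if the compiler drops the backslash:
# the "." is now a wildcard that matches any character.
stripped = re.compile(r"^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$")

# What was intended: "\." is a literal decimal point.
escaped = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$")

for text in ["f32", "=33", "3.2"]:
    # Print whether each variant accepts the sample.
    print(text, bool(stripped.match(text)), bool(escaped.match(text)))
```

In the C++ source, the equivalent fix is to write \\. inside the string literal so that a single backslash reaches the regex engine.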

Put ^ at the start and $ at the end. This anchors your regex to the start and end of the string.

Your expression finds a match in the middle of the string. If you add anchors to the beginning and to the end of your expression, the strings from your list will be rejected. Your expression would still match empty strings, but that's the price you pay for being able to match strings like .99 and 99.
^[0-9]*\.?[0-9]*$
A better choice would be
^[0-9]*(\.[0-9]+)?$
because it would match the decimal point only if at least one digit is present after it.
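The difference between the unanchored and the anchored pattern can be checked with a short Python sketch. The sample list and variable names are illustrative, not from the original question.

```python
import re

unanchored = re.compile(r"[0-9]*\.?[0-9]*")
anchored = re.compile(r"^[0-9]*(\.[0-9]+)?$")

samples = ["f32", "32a", "32-", "=33", "32", "3.14", ".99", "99."]

# search() only needs a match somewhere in the string; because every part
# of the unanchored pattern is optional, an empty match always exists.
found_somewhere = [s for s in samples if unanchored.search(s)]

# With ^ and $ the whole string has to fit the pattern.
accepted = [s for s in samples if anchored.match(s)]

print(found_somewhere)
print(accepted)
```

Note that the second pattern rejects "99." because the decimal point is only allowed when at least one digit follows it.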

One of them needs to be a + instead of *. Do you want to allow ".9" to be valid, or will you require the leading 0?

Related

Regex alphanumeric with hyphen, single quotes, and single spacing is timing out (crashing)

I have the following regular expression that I use but it crashes in my browsers (does nothing and then likely times out).
I am trying to accept alphanumeric, as well as dashes and single quotes. I'm also trying to restrict spacing to allow only single spaces (no more than one space consecutively)
<constant>
<constant-name>expressionFormat</constant-name>
<constant-value>^([a-zA-Z0-9'-]+\s?)*$</constant-value>
</constant>
A sample example string that crashes with this is:
"ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –"
I'm using Struts. Any tips on what I'm doing wrong? Thanks in advance!
I've found a solution.
My OLD expression:
^([a-zA-Z0-9'-]+\s?)*$
First off, I got rid of the \s since it includes other things like tabs, new lines, etc, which I do not want.
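The real cost comes from the nested quantifiers: because the \s? is optional, a long run of characters can be split between the inner + and the outer * in a huge number of ways, and when the match fails the engine tries every one of them before it is sure it is going to return a failure. In essence, the + and ? inside the * were making it check recursively, which is resource intensive for longer strings.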
The following expression works much better for my case:
^([a-zA-Z0-9' -])*$
I believe that the browser is just taking a really long time to process the regex search and may even be timing out.
Your sample string
ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –
will not be matched by your regular expression:
^([a-zA-Z0-9'-]+\s?)*$
Add the special characters ‘ ’ — – to the character class if you want to accept them:
^([a-zA-Z0-9'‘’—–-]+\s?)*$
This regex matches your sample string.
UPDATE:
Try this regex that uses atomic grouping to avoid catastrophic backtracking:
^(?>[a-zA-Z0-9'-]+\s?)*$
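Where atomic groups are not available, another option is to restructure the pattern so that each space is a mandatory separator between words; the input can then be split in only one way and there is nothing to backtrack over. This is a different technique from the atomic group above, sketched here in Python with a hypothetical helper named `is_valid`. Unlike the original pattern, this sketch rejects the empty string and a trailing space.

```python
import re

# One "word": letters, digits, single quote, hyphen.
WORD = r"[a-zA-Z0-9'-]+"

# A word, then zero or more groups of exactly one space plus another word.
# The space is not part of WORD, so the split between words is unambiguous.
SINGLE_SPACED = re.compile(rf"{WORD}(?: {WORD})*")

def is_valid(text: str) -> bool:
    """Return True if text is words separated by single spaces only."""
    return SINGLE_SPACED.fullmatch(text) is not None

print(is_valid("ABC 123 o'neil-smith"))
print(is_valid("AB  CD"))
```

The sample string from the question is rejected quickly by this version, since the curly quote and the en dash are not in the character class.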

Notepad++ masschange using regular expressions

I am having trouble performing a mass change in a huge logfile.
Apart from the file size, which causes problems for Notepad++, I cannot use more than 10 parameters in the replacement; up to 9 it works fine.
I need to change numerical values in a file where these values are located within quotation marks and with leading and ending comma: ."123,456,789,012.999",
I used this exp to find and replace the format to:
,123456789012.999, (so that there are no quotation marks and no comma within the num.value)
The exp used to find is:
([,])(["])([0-9]+)([,])([0-9]+)([,])([0-9]+)([,])([0-9]+)([\.])([0-9]+)(["])([,])
and the exp to replace is:
\1\3\5\7\9\10\11\13
The problem is parameters \11 \13 are not working (the chars eg .999 as in the example will not appear in the changed values).
So now the question is - is there any limit for parameters?
It seems to me that it does not work above 10. For shorter numeric values, where I only need up to 9 parameters, the search and replacement strings work fine. For the example above the search works but the replacement does not: the end of the changed value gets corrupted.
It also occurred to me that instead of using Notepad++ I could change the logfile directly on the Unix server; however, I had trouble building the correct Perl syntax. Could anyone help with that?
After having a little play myself, it looks like back-references \11-\99 are invalid in Notepad++ (which is not that surprising, since this is commonly omitted from regex languages). However, there are several things you can do to improve that regular expression in order to make this work.
Firstly, you should consider using fewer groups, or alternatively non-capturing groups. Did you really need to store 13 variables in that regex in order to do the replacement? Clearly not, since you're not even using half of them!
To put it simply, you could just remove some brackets from the regex:
[,]["]([0-9]+)[,]([0-9]+)[,]([0-9]+)[,]([0-9]+)[.]([0-9]+)["][,]
And replace with:
,\1\2\3\4.\5,
...But that's not all! Why are you using square brackets to say "match anything inside", if there's only one thing inside?? We can get rid of these, too:
,"([0-9]+),([0-9]+),([0-9]+),([0-9]+)\.([0-9]+)",
(Note I added a "\" before the ".", so that it matches a literal "." rather than "anything".)
Also, although this isn't a big deal, you can use "\d" instead of "[0-9]".
This makes your final, optimised regex:
,"(\d+),(\d+),(\d+),(\d+)\.(\d+)",
And replace with:
,\1\2\3\4.\5,
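For running the same replacement outside Notepad++, for example in a script on the server, the final expression can be tried in Python as a minimal sketch. The sample line and variable names are made up for illustration.

```python
import re

# Same find expression as above: five capture groups, no back-reference above \5.
pattern = re.compile(r',"(\d+),(\d+),(\d+),(\d+)\.(\d+)",')

line = 'a,"123,456,789,012.999",b'

# Same replacement as above: keep the surrounding commas, drop quotes and inner commas.
result = pattern.sub(r",\1\2\3\4.\5,", line)
print(result)  # a,123456789012.999,b
```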
I'm not sure whether regex groups have a limit, but you could use lookarounds to save 2 groups, and you could also merge some groups in your example. But first, let's get rid of some useless character classes:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
We could merge those groups:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
                                        ^^^^^^^^^^^^^^^^^^^^
We get:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(,)
Let's add lookarounds:
(?<=\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(?=,)
The replacement would be \2\4\6\8.
If you have a fixed length of digits at all times, it's fairly simple to do what you have done. Even though your expression is poorly written, it does the job. If this is the case, look at Tom Lord's answer.
I played around with it a little bit myself, and I would probably use two expressions - makes it much easier. If you have to do it in one, this would work, but be pretty unsafe:
(?:"|(\d+),)|(\.\d+)"(?=,) replace by \1\2
Live demo: http://regex101.com/r/zL3fY5
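If the number of comma-separated groups varies and the file can be processed outside Notepad++, one option is to capture the whole quoted number and strip the commas in a callback. The following is a sketch in Python under that assumption; the function name `strip_grouping` and the sample lines are hypothetical.

```python
import re

# A quoted number with any number of comma-separated digit groups and a
# fractional part. The lookarounds require a comma on each side without
# consuming it, so adjacent fields can both be matched.
QUOTED_NUMBER = re.compile(r'(?<=,)"(\d+(?:,\d+)*\.\d+)"(?=,)')

def strip_grouping(line: str) -> str:
    """Remove the quotes and the inner commas from each quoted number."""
    return QUOTED_NUMBER.sub(lambda m: m.group(1).replace(",", ""), line)

print(strip_grouping('a,"123,456,789,012.999",b'))
print(strip_grouping('x,"1,000.5","2,000.25",y'))
```

Fields that do not look like a quoted number, such as ,"abc", are left untouched.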

What is wrong with my simple regex that accepts empty strings and apartment numbers?

So I wanted to limit a textbox which contains an apartment number which is optional.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously if I'm here, the invalid ones are accepted by the regex. The goal of this regex is accept either 1 to 4 numbers with an optional letter, or a single letter or an empty string.
I just can't seem to figure out what's not working, I mean it is a simple enough regex we have here. I'm probably missing something as I'm not very good with regexes, but this syntax seems ok to my eyes. Hopefully someone here can point to my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. You can also fold the second option into the first one, which then covers the third variant (the empty string) as well:
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
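A quick check of both patterns against the expected results from the question, as a Python sketch (the helper name `is_apartment` is made up for illustration):

```python
import re

original = re.compile(r"([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)")
anchored = re.compile(r"^[0-9]{0,4}[A-Z]?$")

def is_apartment(text: str) -> bool:
    """Return True if the whole text fits the anchored pattern."""
    return anchored.match(text) is not None

valid = ["1234A", "Z", ""]
invalid = ["A1234", "fhfdsahds527523832dvhsfdg"]

for text in valid + invalid:
    # The unanchored original finds a match somewhere inside the invalid strings too.
    print(repr(text), bool(original.search(text)), is_apartment(text))
```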
Depending on the language, you can also use a negative look ahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = This looks for a digit, zero to four times, at the beginning of the string
[A-Za-z] = This looks for a single letter (either case)
(?!.*[0-9]) = This only allows the letter if there are no digits anywhere after it.
I haven't quite figured out how to validate against an empty string, but that might be easier done using tools from whatever language you are using. Something along this logic:
if String doesn't equal $null then check the regex
Something along those lines, just adjusted for however you would do it in your language.
I used RegEx Skinner to validate the answers.
Edit: Fixed error from comments

Assistance with a regular expression

I am not good with regular expressions, and I could use some help with a couple of expressions I am working on. I have a line of text, such as Text here then 999-99 and I'd like to isolate that number sequence at the end. It could be either 999-99 or 999-99-9. The following seems to work:
\d{3}-\d{2}(-\d{1})?
But I notice that it really just seems to be searching anywhere within the text, as I can add text after the number sequence and it still matches. This needs to be more strict, so that the line must end with this exact sequence, and nothing after it. I tried ending with $ instead of ?, but that never seems to create a match (it always returns false).
I could also use some help with character replacement. I am working on a program which deals with OCR scanning, and occasionally the string value that comes back contains undisplayable characters, represented by the ܀ symbol. Is there a regular expression which will replace the ܀ characters with a space?
Try this regular expression.
([\d-]+)$
This should work. Just end your regex with $, which represents the end of the line:
\d{3}-\d{2}(-\d{1})?$
Use the word-boundary metacharacter, \b:
\b\d{3}-\d{2}(-\d)?\b
You can also remove the {1} from the last \d since it's redundant.
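The two suggestions behave differently when text follows the number, which a small Python sketch makes visible. The sample lines and constant names are illustrative.

```python
import re

END_ANCHORED = re.compile(r"\d{3}-\d{2}(-\d)?$")
WORD_BOUNDED = re.compile(r"\b\d{3}-\d{2}(-\d)?\b")

lines = [
    "Text here then 999-99",
    "Text here then 999-99-9",
    "Text here then 999-99 and more",
]

for line in lines:
    end = END_ANCHORED.search(line)
    word = WORD_BOUNDED.search(line)
    # Show what each pattern extracts, or None when it finds nothing.
    print(line, "|", end.group() if end else None, "|", word.group() if word else None)
```

In this sketch only the $ version rejects the third line; \b marks a word boundary, not the end of the line, so it still finds the number when more text follows.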

Regular Expression to List accepted words

I need a regular expression to list accepted version numbers, i.e., say I wanted to accept "V1.00" and "V1.01". I've tried "(V1.00)|(V1.01)", which almost works, but if I input "V1.002" (which is likely, due to the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
Do this:
^(V1\.00|V1\.01)$
(The . needs to be escaped; ^ means the match must start at the beginning of the text and $ means it must end at the end of the text.)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.
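The behaviour of the original attempt and the anchored, escaped version can be compared with a short Python sketch (the helper name `is_accepted` is hypothetical):

```python
import re

original = re.compile(r"(V1.00)|(V1.01)")
ACCEPTED = re.compile(r"^(V1\.0[01])$")

def is_accepted(version: str) -> bool:
    """Return True if version is one of the listed version numbers."""
    return ACCEPTED.match(version) is not None

for version in ["V1.00", "V1.01", "V1.002", "V1x00"]:
    # The unanchored original finds "V1.00" inside "V1.002", and its
    # unescaped "." also lets "V1x00" through.
    print(version, bool(original.search(version)), is_accepted(version))
```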