TCL regexp for float fails at single digit - regex

I have developed the following regexp to capture float numbers.
([+-]?[0-9]+\.?[0-9]+([eE][-+]?[0-9]+)?)
It works fine for such things as 4.08955e-11 or 3.57. Now by stupid chance my parser came across 0 and failed. I guess I need to make all following the decimal point optional. But how do I do that?

Contrary to what one might think, matching every possible form of floating point number (including NaN etc) with a manageable regular expression that still discards e.g. impossibly large numbers or pseudo-octals is non-trivial.
There are some ideas about reducing the risk of false positives by using word boundaries, but note that those match boundaries between word characters (usually alphanumerics and underscore).
The scan command allows simple and reliable validation and extraction of floating point numbers:
scan $number %f
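For comparison, the same idea (let the runtime's number parser validate the input instead of a regex) can be sketched in Python. This is only an analogue, not the Tcl scan command: parse_float is a hypothetical helper name, and Python's float() has its own rules (for example, it accepts surrounding whitespace and strings such as nan or inf).

```python
def parse_float(text):
    """Return the float value of text, or None if float() rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


print(parse_float("0"))            # the single digit that broke the regex
print(parse_float("4.08955e-11"))
print(parse_float("abc"))          # rejected, so None
```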

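If you make everything after the decimal point optional (the point itself is already optional), the pattern would also match values like 2. (digits with a trailing dot).
Note that your regex does not match a single digit because it requires one or more digits twice ([0-9]+ both before and after the optional dot), so it needs at least two digits.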
If you only want to match float numbers or zero you could use an alternation and for example use word boundaries \b:
[-+]?\b(?:[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?|0)\b
Explanation
[-+]? Match optional + or -
\b Word boundary
(?: Non capturing group
[0-9]+\.[0-9]+ match one or more digits dot and one or more digits
(?:[eE][-+]?[0-9]+)? Optional exponent part
| Or
0 Match literally
) Close non capturing group
\b Word boundary
To match a float value that does not start with a dot and may also consist of one or more digits without a dot, you could use:
^[-+]?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?$
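A minimal sketch in Python (not Tcl) that compares the pattern from the question with the anchored pattern above. The names ORIGINAL, FIXED and is_float are made up for this illustration; fullmatch is used so both patterns are tested against the whole string.

```python
import re

# Pattern from the question: needs digits both before and after the optional dot.
ORIGINAL = re.compile(r"[+-]?[0-9]+\.?[0-9]+([eE][-+]?[0-9]+)?")

# Anchored pattern from the answer, written without ^ and $ because
# fullmatch() already requires the whole string to match.
FIXED = re.compile(r"[-+]?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_float(text):
    """True if the whole string is an integer or float literal."""
    return FIXED.fullmatch(text) is not None


for sample in ["0", "3.57", "4.08955e-11", "2.", ".5"]:
    print(sample, ORIGINAL.fullmatch(sample) is not None, is_float(sample))
```

With these patterns, 0 is rejected by the original and accepted by the fixed version, while 2. and .5 are rejected by both.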

Perhaps using alternatives:
{[-+]?(?:\y[0-9]+(?:\.[0-9]*)?|\.[0-9]+\y)(?:[eE][-+]?[0-9]+\y)?}
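In Tcl the word-boundary escape is \y (\b means backspace there), whereas Python's re module spells it \b. As a rough sketch, the same alternation translated to Python could look like this; FLOAT and find_floats are hypothetical names.

```python
import re

# Same structure as the Tcl pattern above, with \y replaced by Python's \b.
FLOAT = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)


def find_floats(text):
    """Return every substring of text that looks like a number."""
    return FLOAT.findall(text)


print(find_floats("x = 0, y = 3.57, z = 4.08955e-11, w = .5"))
```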


Regex to find a line with two capture groups that match the same regex but are still different

I am trying to analyse my source code (written in C) for timer variable comparisons/assignments whose timebases do not correspond. I have a range of timers with different timebases (2-250 milliseconds). Every timer variable contains its granularity in milliseconds in its name (e.g. timer10ms), as does every timer-photo and define (e.g. fooTimer10ms, DOO_TIMEOUT_100MS).
Here are some example lines:
fooTimer10ms = timer10ms;
baaTimer20ms = timer10ms;
if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)
if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)
I want to match those lines where the timebases do not correspond (in this case the second, third and fourth line). So far I have this regex:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))
that is capable of finding every line where there are two of those granularities. So instead of just line 2, 3 and 4 it matches all of them. The only idea I had to narrow it down is to add a negative lookbehind with a back-reference, like so:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))(?<!\1)
but this is not allowed because a negative lookbehind has to have a fixed length.
I found these two questions (one, two), but the first does not have the restriction that both capture groups are of the same kind, and the second is looking for equal instances of the capture group.
If what I want can be achieved more easily with something other than regex, I would be happy to know. My mind is just stuck on the belief that regex is capable of this and that I am simply not creative enough to use it properly.
One option is to match the timer part followed by the digits and use a negative lookahead with a backreference to assert that the same text does not occur again to the right.
For the example data, a more specific pattern restricted to the range 2-250 might be:
.*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b\S*[^\S\r\n]*[<>]?=[^\S\r\n]*\b(?!\S*\1)\S+
The pattern matches
.*? Match any char except a newline, as few as possible (non-greedy)
( Capture group 1
timer Match literally
(?:2[0-4]\d|250|1?\d\d|[2-9]) Match a number in the range 2-250
ms Match literally
)\b Close group and a word boundary
\S*[^\S\r\n]* Match optional non whitespace chars and optional spaces without newlines
[<>]?= Match an optional < or > and =
[^\S\r\n]*\b Match optional whitespace chars without a newline and a word boundary
(?!\S*\1) Negative lookahead, assert no occurrence of what is captured in group 1 in the value
\S+ Match 1+ non whitespace chars
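The core technique (capture one timebase, then use a negative lookahead with a backreference to require a different one later on the line) can be sketched in Python. The pattern below is a simplified variation written for this illustration, not the pattern explained above: it assumes a timebase token is 1-3 digits followed by ms in any case, and MISMATCH and has_mismatch are hypothetical names.

```python
import re

MISMATCH = re.compile(
    r"(?<!\d)(\d{1,3}ms)\b"      # group 1: a timebase token, e.g. 10ms
    r".*"                        # anything in between
    r"(?<!\d)(?!\1)\d{1,3}ms",   # a later token that does not repeat group 1
    re.IGNORECASE,               # so that 100MS and 100ms are treated alike
)


def has_mismatch(line):
    """True if the line contains two timebase tokens that differ."""
    return MISMATCH.search(line) is not None


lines = [
    "fooTimer10ms = timer10ms;",
    "baaTimer20ms = timer10ms;",
    "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
    "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)",
]
for line in lines:
    print(has_mismatch(line), line)
```

The (?<!\d) lookbehinds keep the engine from starting a token in the middle of a number, which would otherwise let 0ms (the tail of 10ms) count as a different timebase.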
Or perhaps a broader pattern matching 1-3 digits and optional whitespace chars which might also match a newline:
.*?(timer\d{1,3}ms\b)\S*\s*[<>]?=\s*\b(?!.*\1)\S+
Note that {1-3} should be written {1,3}, and that \d{1,3} could also match values outside the 2-250 range, such as 999.
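Since the question also asks whether something other than a single regex would be easier: one possible approach, sketched here in Python, is to extract every timebase on a line and compare them as a set. It assumes each timebase appears as 1-3 digits followed by ms (any case); TIMEBASE and timebases_differ are hypothetical names.

```python
import re

# Assumption: a timebase is written as 1-3 digits directly followed by "ms".
TIMEBASE = re.compile(r"(?<!\d)(\d{1,3})ms", re.IGNORECASE)


def timebases_differ(line):
    """True if the line mentions more than one distinct timebase."""
    bases = {int(value) for value in TIMEBASE.findall(line)}
    return len(bases) > 1


print(timebases_differ("fooTimer10ms = timer10ms;"))
print(timebases_differ("baaTimer20ms = timer10ms;"))
print(timebases_differ("if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)"))
```

Comparing integers rather than text also treats 100MS and 100ms as the same timebase without any backreference tricks.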

Regex for 9-digit phone number dot-separated

I would like to check if a phone number contains exactly 3 digits - dot - 3 digits - dot - 3 digits. (e.g. 123.456.789)
So far I have this, but it doesn't work:
^(\d{3}\){2}\d{4}$
Note that an escaped bracket \) loses its special meaning in regex and the pattern becomes invalid since the capturing group is not closed.
If you want to match a dot with a regex, you need to include it in your pattern, and if 3 digits must be at the end there is no point in declaring 4 digits with \d{4}.
^(\d{3}\.){2}\d{3}$
        ^       ^
or if we expand the first group:
^\d{3}\.\d{3}\.\d{3}$
So the whole fix consists of adding a dot after the second backslash and adjusting the final limiting quantifier.
Note that for mostly stylistic reasons (the efficiency gain is insignificant) I'd use a non-capturing group with the first regex variant:
^(?:\d{3}\.){2}\d{3}$
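A quick way to try the corrected pattern, sketched in Python; PHONE and is_phone are hypothetical names, and fullmatch takes the place of the ^ and $ anchors.

```python
import re

# Three digits and a dot, twice, then three more digits.
PHONE = re.compile(r"(?:\d{3}\.){2}\d{3}")


def is_phone(text):
    """True if text has the form ddd.ddd.ddd and nothing else."""
    return PHONE.fullmatch(text) is not None


print(is_phone("123.456.789"))
print(is_phone("123.456.7890"))
print(is_phone("123-456-789"))
```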

RegExp: How do I include 'avoid non-numeric characters' from a pattern search?

I want to filter out all .+[0-9]. (correct way?) patterns to avoid duplicate decimal points within a numeral: (e.g., .12345.); but allow non-numerals to include duplicate decimal points: (e.g. .12345*.) where * is any NON-NUMERAL.
How do I include a non-numeral negation value into the regexp pattern? Again,
.12345. <-- error: erroneous numeral.
.12345(. or .12345*. <-- Good.
I think you are looking for
^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$
Remember to escape the regex properly in Swift:
let rx = "^\\d*(?:\\.\\d+)?(?:(?<=\\d)[^.\\d\\n]+\\.)?$"
REGEX EXPLANATION:
^ - Start of string
\d* - Match zero or more digits
(?:\.\d+)? - Match decimal part, 0 or 1 time (due to ?)
(?:(?<=\d)[^.\d\n]+\.)? - Optionally (due to ? at the end) matches 1 or more symbols preceded with a digit (due to (?<=\d) lookbehind) other than a digit ([^\d]), a full stop ([^.]) or a linebreak ([^\n]) (this one is more for demo purposes) and then followed by a full stop (\.).
$ - End of string
I am using non-capturing groups (?:...) for better performance and usability.
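To check the behaviour on the strings from the question, here is a small sketch in Python rather than Swift (Python raw strings avoid the double escaping). VALID and is_valid are hypothetical names.

```python
import re

# Same pattern as above; fullmatch() replaces the ^ and $ anchors.
VALID = re.compile(r"\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?")


def is_valid(text):
    """True if text does not put a second dot directly after the digits."""
    return VALID.fullmatch(text) is not None


for sample in [".12345.", ".12345*.", ".12345(.", "12.5"]:
    print(sample, is_valid(sample))
```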
UPDATE:
If you prefer an opposite approach, that is, matching the invalid strings, you can use a much simpler regex:
\.[0-9]+\.
In Swift, let rx = "\\.[0-9]+\\.". It matches any substrings starting with a dot, then 1 or more digits from 0 to 9 range, and then again a dot.
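The opposite approach can be tried the same way. This is a Python sketch, not Swift, and INVALID and has_duplicate_point are hypothetical names.

```python
import re

# A dot, one or more ASCII digits, then another dot: the erroneous form.
INVALID = re.compile(r"\.[0-9]+\.")


def has_duplicate_point(text):
    """True if text contains a numeral enclosed by two dots."""
    return INVALID.search(text) is not None


print(has_duplicate_point(".12345."))
print(has_duplicate_point(".12345*."))
```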
The shorthand character class for a non-digit is \D. Conversely, if you're looking for digits only, \d would work.
Without further context of what you're trying to achieve it's hard to suggest how to build a regex for it, though based on your example, (I think) this should work: .+\d+\D+

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the measurement that immediately follows it (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ will match a sequence of strictly digits and . (it means 4.5.6 or .... will match, but those cases are not common, and this is just for grabbing data), and the parentheses signify that we will capture the matched text. The . here is inside character class [], so no need for escaping.
Followed by arbitrary spaces \s+ and maximum sequence (due to greedy quantifier) of non-space character \S+ (non-space really is non-space: it will match almost everything in Unicode, except for space, tab, new line, carriage return characters).
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. This is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It will match an integer such as 34 and a number with a decimal part such as 3.40000, and it also lets .5 and 34. pass. It will reject a number with more than one ., or one that consists only of a .. The | acts as OR and separates two different patterns: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
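As an illustration, the stricter pattern applied to the two strings from the question. The sketch is in Python rather than PHP, and AMOUNT_UNIT and parse_amount are hypothetical names.

```python
import re

# Group 1: the number (34, 3.40, .5 or 34.), group 2: the next non-space run.
AMOUNT_UNIT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)\s+(\S+)")


def parse_amount(text):
    """Return (amount, unit) as strings, or None if nothing matches."""
    match = AMOUNT_UNIT.search(text)
    return (match.group(1), match.group(2)) if match else None


print(parse_amount("12 ounces"))
print(parse_amount("1.5 ounces chopped"))
```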
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
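One way to act on that advice is to build the unit part of the pattern from an explicit list. The sketch below is in Python, and the list of units is invented for the example; a real list would come from your own data. UNITS, AMOUNT_WITH_UNIT and parse_known_unit are hypothetical names.

```python
import re

# Hypothetical whitelist; longer forms come first so "ounces" wins over "ounce".
UNITS = ["ounces", "ounce", "cups", "cup", "grams", "gram"]

AMOUNT_WITH_UNIT = re.compile(
    r"(\d+(?:\.\d*)?|\.\d+)\s+(" + "|".join(map(re.escape, UNITS)) + r")\b"
)


def parse_known_unit(text):
    """Return (amount as float, unit), or None if no known unit follows."""
    match = AMOUNT_WITH_UNIT.search(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(parse_known_unit("1.5 ounces chopped"))
print(parse_known_unit("3 apples"))
```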
Use this regular expression: \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/

Regular expression to find the number, 0 or decimal

I'm looking for a regular expression which will validate a number starting from 0 upwards that might include decimals.
Any idea?
A simple regex to validate a number:
^\d+(\.\d+)?$
This should work for a number with optional leading zeros, with an optional single dot and more digits.
^...$ - match from start to end of the string (will not validate ab12.4c)
\d+ - At least one digit.
(...)? - optional group of...
\.\d+ - literal dot and one or more digits.
Because decimal numbers may or may not have a decimal point in them, and may or may not have digits before that decimal point if they have some afterwards, and may or may not have digits following that decimal point if they have some before it, you must use this:
^(\d+(\.\d*)?|\d*\.\d+)$
which is usually better written:
^(?:\d+(?:\.\d*)?|\d*\.\d+)$
and much better written:
(?x)
^ # anchor to start of string
(?: # EITHER
\d+ (?: \. \d* )? # some digits, then optionally a decimal point followed by optional digits
| # OR ELSE
\d* \. \d+ # optional digits, then a decimal point and more digits
) # END ALTERNATIVES
$ # anchor to end of string
If your regex compiler doesn’t support \d, you might wish to swap in [0-9] for every \d used above. The same applies if your engine is Unicode-aware and you want to match only ASCII digits rather than anything with the Unicode Decimal_Number property (shortcut Nd), that is, anything with the Numeric_Type=Decimal property.
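As a sketch of both points in one place, here is the commented form in Python, where re.VERBOSE plays the role of (?x) and re.ASCII restricts \d to [0-9]. DECIMAL and is_decimal are hypothetical names, and fullmatch stands in for the ^ and $ anchors.

```python
import re

DECIMAL = re.compile(
    r"""
    (?:                      # EITHER
        \d+ (?: \. \d* )?    #   some digits, then optionally a point and optional digits
      |                      # OR ELSE
        \d* \. \d+           #   optional digits, then a point and more digits
    )                        # END ALTERNATIVES
    """,
    re.VERBOSE | re.ASCII,   # re.ASCII makes \d match only 0-9
)


def is_decimal(text):
    """True if the whole string is an unsigned decimal number."""
    return DECIMAL.fullmatch(text) is not None


for sample in ["0", "1.", ".5", "12.75", ".", "ab12.4c"]:
    print(repr(sample), is_decimal(sample))
```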
I always use RegExr to build my regular expressions. It is sort of drag-and-drop and has a live-preview of your regex and the result.
It'll look something like ^0[,.0-9]*
^[0-9]+(\.[0-9]+)?$
Note that with this expression 0.1 will be valid but .1 won't.
This should do what you want:
^[0-9]+([,.][0-9]+)?$
It will match one or more digits, optionally followed by a , or . and one or more further digits.
'/^([0-9\.]+)$/'
will match if the test string consists only of digits and dots; note that this also accepts strings such as 1.2.3
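The shorter patterns suggested in this thread accept different inputs, which is easiest to see by running them side by side. A small Python sketch follows; the dictionary labels, SAMPLES and accepted are made up for the illustration, and the ^...$ anchors are dropped because fullmatch already tests the whole string.

```python
import re

PATTERNS = {
    "digits, optional .digits": r"\d+(\.\d+)?",
    "digits, optional [,.]digits": r"[0-9]+([,.][0-9]+)?",
    "digits and dots only": r"[0-9.]+",
}
SAMPLES = ["0", "0.1", ".1", "1.", "1,5", "1.2.3"]


def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]


for label, pattern in PATTERNS.items():
    print(label, accepted(pattern))
```

Only the last pattern accepts 1.2.3, and only the middle one accepts a decimal comma.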