Definitions in flex lexical analyzer - How to define scientific notation - regex

I'm new to flex and need to define real literals, e.g. 3.1 or 3.0e-10, as acceptable numbers. Here is what I have so far:
digit [0-9]
int {digit}+
real_literal ({digit}+)("."{digit}*)
From my understanding, this works for decimals without accepting something like 12.52.23.
How would I define numbers that accept scientific notation, such as 3.0e-10 like mentioned above?
Would it be something like this?
real_literal ({digit}+)("."{digit}*)|({digit}+)("."{digit})[Ee]["-""+"]{digit}+

A possible solution, a bit shorter than yours, would be:
{digit}+"."{digit}*([eE][-+]?{digit}+)?
Break down
{digit}+"."{digit}* matches unsigned real numbers with a decimal point (it won't recognize integers).
(...)? Enclosing the next part in parentheses followed by a question mark makes what's inside optional.
[eE][-+]?{digit}+ matches the exponent. Note that the - does not need to be escaped inside the square brackets in this case because it's the first or last character of the list, as @rici pointed out.
I just want to point out that you're not recognizing negative numbers here, but I don't know if that's intentional.
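To sanity-check the pattern outside flex, here is a minimal Python sketch. It assumes {digit} expands to [0-9] and uses re.fullmatch as a stand-in for flex matching one whole token; the names REAL_LITERAL and is_real_literal are illustrative only and not part of flex.

```python
import re

# Python translation of {digit}+"."{digit}*([eE][-+]?{digit}+)?
# Assumption: {digit} is [0-9]; fullmatch stands in for one whole flex token.
REAL_LITERAL = re.compile(r"[0-9]+\.[0-9]*([eE][-+]?[0-9]+)?")

def is_real_literal(text):
    """Return True if the whole string is a real literal under the pattern."""
    return REAL_LITERAL.fullmatch(text) is not None

for sample in ["3.1", "3.0e-10", "3.", "12.52.23", "42", "3.0e"]:
    print(sample, is_real_literal(sample))
```

Plain integers such as 42 are rejected because the decimal point is mandatory, and 3.0e is rejected because the exponent needs at least one digit.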

Related

Optimal regular expressions representing non-negative integers without leading zeroes except for zero itself

The only regular expression that I can think of that satisfies the title is
^(0|[1-9][0-9]*)$
I'm sure this is a relatively simple regular expression, but I get the feeling there are more efficient ways of writing it, like maybe using metacharacters. Am I missing anything?
I don't see any obvious improvements. You can use \d in place of [0-9]:
^(0|[1-9]\d*)$
but according to regex101, it does not lead to any speed improvement (6 steps for '0', 8 steps for any other match).
If you just want to show off your regex chops, there's always this:
^(0|(?!0)\d+)$
It uses negative lookahead 'zero or (not a zero followed by digits)', but that's actually slower (9 steps for any non-zero match).
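A minimal Python sketch can confirm that the plain and lookahead variants accept the same strings. The sample inputs and the names PLAIN, LOOKAHEAD and is_canonical are assumptions made for illustration; step counts from regex101 are not reproduced here.

```python
import re

# The two equivalent patterns discussed above.
PLAIN = re.compile(r"^(0|[1-9][0-9]*)$")
LOOKAHEAD = re.compile(r"^(0|(?!0)\d+)$")

def is_canonical(text):
    """True for non-negative integers with no leading zeroes (zero itself allowed)."""
    return PLAIN.match(text) is not None

# Both patterns should give the same verdict on every sample.
for sample in ["0", "7", "10", "1200", "007", "00", "", "-1", "1a"]:
    assert is_canonical(sample) == (LOOKAHEAD.match(sample) is not None)
    print(repr(sample), is_canonical(sample))
```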

how to find integer with comma and zeros after that (regex)?

I am trying to create a regex to extract all integers, which can be 6 or -12 but also +6.000 or -5,0, and another one to extract real numbers that are not integers, for example 3.14 or -6,26 but not 5.0.
For finding integers I tried "^[+-]?([0-9]+)(\\[.,]0{1,})?$" but it doesn't work on -6.00. And I have no idea how to create the second regex (how to exclude integers with commas or dots and then zeros). Any help appreciated.
The problem with your integer regex appears to be the backslash(es). I don't know any regex engine in which you would need to escape the opening bracket of a character class, and you certainly don't want to match a literal backslash. Also, to a regex engine that understands it at all, the quantifier {1,} is an uglier, more complex way of saying +.
This should do your integer matching:
"^[+-]?[0-9]+([.,]0+)?$"
And this variation should do your non-integer matching:
"^[+-]?[0-9]+[.,]0*[1-9][0-9]*$"
In both cases I omitted parentheses not needed for expressing a correct pattern, but if you need to capture parts of the match then you will want to add some back in. You might also want to convert the grouping parentheses into non-capturing form if you are using a regex engine that supports it.
Also, the real number pattern requires at least one digit before the fraction separator character, per your examples. It would be easy to convert the pattern to also match strings of the form .1 or -.17. Similarly, the integer pattern requires at least one zero in the fraction part if there is a fraction separator, and that restriction could be removed, too.
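As a quick check, the two patterns can be run against the examples from the question. This is a minimal Python sketch; the classify helper and its return labels are assumptions introduced for illustration.

```python
import re

# The two patterns from the answer above.
INTEGER = re.compile(r"^[+-]?[0-9]+([.,]0+)?$")
NON_INTEGER = re.compile(r"^[+-]?[0-9]+[.,]0*[1-9][0-9]*$")

def classify(text):
    """Label a string as 'integer', 'non-integer', or 'no match'."""
    if INTEGER.match(text):
        return "integer"
    if NON_INTEGER.match(text):
        return "non-integer"
    return "no match"

for sample in ["6", "-12", "+6.000", "-5,0", "-6.00", "3.14", "-6,26", "5.0", ".5"]:
    print(sample, classify(sample))
```

A string such as .5 falls through to "no match" because both patterns require a digit before the separator, as noted above.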

perl inline replace with quotient

I have text lines of the form
F xxx.xxx
where F may be followed by an arbitrary amount of whitespace, followed by an arbitrary number of digits, optionally followed by a decimal and decimal digits. I want to find these numbers, divide them by a variable, and replace them.
My code mostly works, but I can't escape to actually calculate division on the numbers.
my $devisor = 60.0;
s/F\s*?(\d+(\.\d+)?)/"NEWF$1\/$divisor"/;
How can I search, calculate, and inline replace like this?
If you use the '/e' modifier, the replacement part of s/// will be treated as a Perl expression and evaluated. Also, you probably want another separator, so you can write division nicely without escaping. Your parens will only capture the decimal part, not the whole number. And finally, devisor is a misspelling.
Thus, you can write s#F\s*(\d+(\.\d+)?)#$1/$divisor#e (assuming you actually define $divisor).
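For comparison, the same evaluate-in-the-replacement idea can be sketched in Python, where re.sub accepts a function in place of Perl's /e modifier. The divisor value comes from the question; the function name divide_feed and the choice to keep the F prefix in the output are assumptions.

```python
import re

DIVISOR = 60.0  # value taken from the question

def divide_feed(line, divisor=DIVISOR):
    """Divide the number that follows 'F' by divisor, keeping the 'F' prefix."""
    return re.sub(
        r"(F\s*)(\d+(?:\.\d+)?)",  # group 1: F plus whitespace, group 2: whole number
        lambda m: m.group(1) + str(float(m.group(2)) / divisor),
        line,
    )

print(divide_feed("F 120.0"))
print(divide_feed("G1 X10 F600"))
```

The callable receives the match object, so the arithmetic happens in ordinary code rather than inside the replacement string.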

How to write this regular expression in Lua?

I'm new to the Lua regex equivalence features, I need to write the following regular expression, which should match numbers with decimals
\b[0-9]*.\b[0-9]*(?!])
Basically, it matches numbers in decimal format (eg: 1, 1.1, 0.1, 0.11), which do not end with ']', I've been trying to write a regex like this with Lua using string.gmatch, but I'm quite inexperienced with Lua matching expressions...
Thanks!
Lua does not have regular expressions, mainly because a full regular expression library would be bigger than Lua itself.
What Lua has instead are matching patterns, which are way less powerful (but still sufficient for many use cases):
There is no "word boundary" matcher,
no alternatives,
and also no lookahead or similar.
I think there is no Lua pattern which would match every possible occurrence of your string, and no other one, which means that you somehow must work around this.
The pattern proposed by Stuart, %d*%.?%d*, matches all decimal numbers (with or without a dot), but it also matches the empty string, which is not very useful. %d+%.?%d* matches all decimal numbers with at least one digit before the dot (or without a dot), %d*%.?%d+ matches all decimal numbers with at least one digit after the dot (or without a dot). %.%d+ matches decimal numbers without a digit before the dot.
A simple solution would be to search more than one of these patterns (for example, both %d+%.?%d* and %.%d+), and combine the results. Then look at the places where you found them and look if there is a ']' following them.
I experimented a bit with the frontier pattern.
The pattern %f[%.%d]%d*%.?%d*%f[^%.%d%]] matches all decimal numbers which are preceded by something that is neither digit nor dot (or by nothing), and followed by something that is neither ] nor digit nor dot (or by nothing). It also matches the single dot, though.
"%d*%.?%d+" will match all such numbers in decimal format (note that that's going to miss any signed numbers such as -1.1 or +3.14). You'll need to come up with another solution to avoid instances that end with ], such as removing them from the string before looking for the numbers:
local pattern = "%d*%.?%d+"
local clean = string.gsub(orig, pattern .. "%]", "")
return string.gmatch(clean, pattern)

How can I check if at least one of two subexpressions in a regular expression match?

I am trying to match floating-point decimal numbers with a regular expression. There may or may not be a number before the decimal, and the decimal may or may not be present, and if it is present it may or may not have digits after it. (For this application, a leading +/- or a trailing "E123" is not allowed). I have written this regex:
/^([\d]*)(\.([\d]*))?$/
Which correctly matches the following:
1
1.
1.23
.23
However, this also matches empty string or a string of just a decimal point, which I do not want.
Currently I am checking after running the regex that $1 or $3 has length greater than 0. If not, it is not valid. Is there a way I can do this directly in the regex?
I think this will do what you want. It either starts with a digit, in which case the decimal point and digits after it are optional, or it starts with a decimal point, in which case at least one digit is mandatory after it.
/^(\d+(\.\d*)?|\.\d+)$/
Create a regular expression for each case and OR them. Then you only need test if the expression matches.
/^((\d+(\.\d*)?)|(\d*\.\d+))$/
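One detail worth checking with any such alternation is anchor scope: | binds more loosely than ^ and $, so the alternatives need an enclosing group for both anchors to apply to both branches. A minimal Python sketch of the difference follows; the names GROUPED, UNGROUPED and is_decimal are illustrative assumptions.

```python
import re

# Both anchors apply to both branches because the alternation is grouped.
GROUPED = re.compile(r"^(\d+(\.\d*)?|\.\d+)$")
# Without the group, ^ belongs only to the first branch and $ only to the second.
UNGROUPED = re.compile(r"^\d+(\.\d*)?|\.\d+$")

def is_decimal(text):
    """True if the whole string is digits, digits with a dot, or a dot plus digits."""
    return GROUPED.search(text) is not None

for sample in ["1", "1.", "1.23", ".23", "", ".", "1x", "x.5"]:
    print(repr(sample), is_decimal(sample), UNGROUPED.search(sample) is not None)
```

The last two samples are rejected by the grouped pattern but accepted by the ungrouped one, since each of them satisfies only a half-anchored branch.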
A very late answer, but here is one taken from regular-expressions.info:
[-+]?[\d]*\.?[\d]+?
Update: the pattern [\d]*\.?[\d]+?|[\d]+\. also matches a number with a trailing decimal point, such as "1.".
http://regex101.com/r/lJ7fF4/7