Regex for numbers including decimal points - regex

I'm still trying to pick up regex so I can't seem to figure this one out. I want to be able to match any type of number including things like
0.2
.1243
1.
-0.34
+033.98274E-10
-.1e+004
I have created the following regex which matches all of these: [+-]?[0-9+\.]+([E][+-]?[0-9]+)?, however this also matches single decimal points such as if I had something like param.attribute, it would pick up on that decimal point. How can I get around this? I thought that in the part [0-9+\.] the + would require that the string contain at least one numeric value.

You may use alternation to make sure either 1. or .1 is matched. Avoid making all subpatterns optional if you do not want to end up with a single period matched:
[-+]?(?:[0-9]+\.?[0-9]*|[0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?
^--- Alernative 1 | Alternative 2-^
See regex demo
More "fun" facts about alternation in regular expressions:
You can use alternation to match a single regular expression out of several possible regular expressions.
If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish.
The alternation operator has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you need to use parentheses for grouping.
And here is my 5 cents: to keep the regex match as clean as possible and unless you need to access any of the alternatives after a match is found, use non-capturing groups (i.e. (?: ... )) with alternations.

Here's a proposition:
[+-]?([0-9]+\.?[0-9]*|0?\.[0-9]+)([Ee][+-]?[0-9]+)?
See the demo here

Related

Regex: split number into optional first group of up to three then last group of up to three

I have two 1-6 digit numbers separated by a slash. I want these split up into groups of at most 3 digits, taking from the right.
For example:
0/1 -> [,0,,1]
1234/3 -> [1,234,,3]
12345/1234 -> [12,345,1,234]
123456/789123 -> [123,456,789,123]
I need to use a regular expression to do this because I want to do this for a location in NGINX. It's possible to do this with application logic but that is not the question due to performance.
Similar question which solves part of this was here using a negative lookahead: Regular expression to match last number in a string
What regex can achieve this split?
UPDATE:
This regex comes close to what I want (https://regex101.com/r/bQtNdK/3):
(?<prefix1>\d{0,3}?)(?<threes1>\d{0,3})\/(?<prefix2>\d{0,3}?)(?=\d)(?<threes2>\d{0,3})
It fails matching if the second number behind the slash is more than 3 digits long.
UPDATE2:
Now this regex works for most combinations (https://regex101.com/r/bQtNdK/5):
(?<prefix1>\d{0,3}?)(?<threes1>\d{1,3})\/(?<prefix2>\d{0,3})(?<threes2>\d{3})
I don't understand why this starts to fail if I use the same regex for prefix2/threes2 like prefix1/threes1 (i.e. make prefix2 also lazy). Any ideas how to solve this? So close...
I don't know that it's possible without the ability for the regex engine to remember all intermediate matches of a match group that matched an arbitrary number of times (.NET can do this, not sure what others). PCRE will apparently only remember the 'last' match for each group, other wise you could use something like this : (?<prefix1>\d{0,2})(?:(?<threes1>\d{3})*)\/(?<prefix2>\d{0,2})(?<threes2>\d{3})*\s
This regex seems to be correct now (regex101):
(?<prefix1>\d{0,3}?)(?<suffix1>\d{1,3})\/(?<prefix2>\d{0,3}?)(?<suffix2>\d{1,3})\/

Using regex to match multiple comma separated words

I am trying to find the appropriate regex pattern that allows me to pick out whole words either starting with or ending with a comma, but leave out numbers. I've come up with ([\w]+,) which matches the first word followed by a comma, so in something like:
red,1,yellow,4
red, will match, but I am trying to find a solution that will match like like the following:
red, 1 ,yellow, 4
I haven't been able to find anything that can break strings up like this, but hopefully you'll be able to help!
This regex
,?[a-zA-Z][a-zA-Z0-9]*,?
Matches 'words' optionally enclose with commas. No spaces between commas and the 'word' are permitted and the word must start with an alphanumeric.
See here for a demo.
To ascertain that at least one comma is matched, use the alternation syntax:
(,[a-zA-Z][a-zA-Z0-9]*|[a-zA-Z][a-zA-Z0-9]*,)
Unfortunately no regex engine that i am aware of supports cascaded matching. However, since you usually operate with regexen in the context of programming environments, you could repeatedly match against a regex and take the matched substring for further matches. This can be achieved by chaining or iterated function calls using speical delimiter chars (which must be guaranteed not to occur in the test strings).
Example (Javascript):
"red, 1 ,yellow, 4, red1, 1yellow yellow"
.replace(/(,?[a-zA-Z][a-zA-Z0-9]*,?)/g, "<$1>")
.replace(/<[^,>]+>/g, "")
.replace(/>[^>]+(<|$)/g, "> $1")
.replace(/^[^<]+</g, "<")
In this example, the (simple) regex is tested for first. The call returns a sequence of preliminary matches delimted by angle brackets. Matches that do not contain the required substring (, in this case) are eliminated, as is all intervening material.
This technique might produce code that is easier to maintain than a complicated regex.
However, as a rule of thumb, if your regex gets too complicated to be easily maintained, a good guess is that it hasn't been the right tool in the first place (Many engines provide the x matching modifier that allows you to intersperse whitespace - namely line breaks and spaces - and comments at will).
The issue with your expression is that:
- \w resolves to this: [a-zA-Z0-9_]. This includes numeric data which you do not want.
- You have the comma at the end, this will match foo, but not ,foo.
To fix this, you can do something like so: (,\s*[a-z]+)|([a-z]+\s*,). An example is available here.

Possible to use a back reference in a number range?

I want to match a string where a number is equal or higher than a number in a capturing group.
Example:
1x1 = match
1x2 = match
2x1 = no match
In my mind the regex would look something like this (\d)x[\1-9] but this doesn't work. Is it possible to achieve this using regex?
As you've discovered, you cannot interpolate a value within a regex because:
Because character classes are determined when the regex is compiled... The only character class regex node type is "hard-coded list of characters" that was built when the regex was compiled (not after it ran part way and figured out what $1 might end up being).
[Source]
Since character classes do not permit backreferences, a backslash followed by a number is repurposed in a character class:
A backslash followed by two or three octal digits is considered an octal number.
[Source]
This obviously isn't what you intended by [\1-9]. But since there's no way to compile a character class until all characters are known, we'll have to find another way.
If we're looking to do this entirely within a regex we can't enumerate all possible combinations, because we'd have to check all the captures to figure out which one matched. For example:
"1x2" =~ m/(?:(0)x(\d)|(1)x([1-9])|(2)x([2-9])|(3)x([3-9])|(4)x([4-9])|(5)x([5-9])|(6)x([6-9])|(7)x([7-9])|(8)x([89])|(9)x(9))/
Will contain "1" in $3 and "2" in $4, but you'd have to search captures 1 to 20 to find if anything was matched each time.
The only way around doing post processing on regex results is to use a regex conditional: (?(A)X) Where A is a conditional and X is the resulting action.
Sadly conditionals are not supported by RE2, but we'll keep going just to demonstrate it can be done.
What you'd want to use for the X is (*F) (or (?!) in Ruby 2+) to force failure: http://www.rexegg.com/regex-tricks.html#fail
What you'd want to use for the A is ?{$1 > $2}, but only Perl will allow you to use code directly in a regex. Perl would allow you to use:
m/(\d)x(\d)(?(?{$1 > $2})(?!))/
[Live Example]
So the answer to your question is: "No, you cannot do this with RE2 which Google Analytics uses, but yes you can do this with a Perl regex."

Regex how to match two similar numbers in separate match groups?

I got the following string:
[13:49:38 INFO]: Overall : Mean tick time: 4.126 ms. Mean TPS:
20.000
the bold numbers should be matched, each into its own capture group.
My current expression is (\d+.\d{3}) which matches 4.126 how can I match my 20.000 now into a second capture group? Adding the same capture group again makes it find nothing. So what I basically need is, "search for first number, then ignore everything until you find next digit."
You could use something like so: (\d+\.\d{3}).+?(\d+\.\d{3})$ (example here) which essentially is your regex (plus a minor fix) twice, with the difference that it will also look for the same pattern again at the end of the string.
Another minor note, your regex contains, a potential issue in which you are matching the decimal point with the period character. In regular expression language, the period character means any character, thus your expression would also match 4s222. Adding an extra \ in front makes the regex engine treat is as an actual character, and not a special one.

How can I "inverse match" with regex?

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?
I'm using RegexBuddy, but still having trouble.
(?!Andrea).{6}
Assuming your regexp engine supports negative lookaheads...
...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}
Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.
For Python/Java,
^(.(?!(some text)))*$
http://www.lisnichenko.com/articles/javapython-inverse-regex.html
In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:
^(?:(?!Andrea).)*$
This is called a tempered greedy token. The downside is that it doesn't perform well.
The capabilities and syntax of the regex implementation matter.
You could use look-ahead. Using Python as an example,
import re
not_andrea = re.compile('(?!Andrea)\w{6}', re.IGNORECASE)
To break that down:
(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then
\w means a "word character" - alphanumeric characters. This is equivalent to the class [a-zA-Z0-9_]
\w{6} means exactly six word characters.
re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
Negative lookahead assertion
(?!Andrea)
This is not exactly an inverted match, but it's the best you can directly do with regex. Not all platforms support them though.
If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.
On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)
On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.
I just came up with this method which may be hardware intensive but it is working:
You can replace all characters which match the regex by an empty string.
This is a oneliner:
notMatched = re.sub(regex, "", string)
I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.
This will only return you the string result, not any match objects!
(?! is useful in practice. Although strictly speaking, looking ahead is not a regular expression as defined mathematically.
You can write an inverted regular expression manually.
Here is a program to calculate the result automatically.
Its result is machine generated, which is usually much more complex than hand writing one. But the result works.
If you have the possibility to do two regex matches for the inverse and join them together you can use two capturing groups to first capture everything before your regex
^((?!yourRegex).)*
and then capture everything behind your regex
(?<=yourRegex).*
This works for most regexes. One problem I discovered was when I had a quantifier like {2,4} at the end. Then you gotta get creative.
In Perl you can do:
process($line) if ($line =~ !/Andrea/);