Unsure of regex for numeric which can include unit chars - regex

I have the following regex, which only allows numerics:
^[0-9]+$
Trouble is, I also need to allow the user to enter a decimal part and a unit letter.
So these would all need to be valid:
123
123456789
1.25M
1.2K
1.5B
12345.53M
0.5M
If anyone can help I'd be most grateful

Does this work for all your cases, and exclude all the cases that should not match?
^\d*\.?\d+[GMKB]?$
Explanation:
^\d* - Start with zero or more digits
\.? - Allow a decimal point, if there is one
\d+ - Require at least one digit (which might be after a decimal point)
[GMKB]? - Optionally allow one of these 4 letters
$ - Don't allow any more characters after this sequence
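A quick way to check the pattern against the examples from the question is a small script. This sketch assumes Python's re module; fullmatch is used so the whole string has to match, and the invalid inputs are made up for illustration.

```python
import re

# Pattern from the answer above: optional leading digits, an optional decimal
# point, at least one digit, then an optional unit letter (G, M, K or B).
NUMBER_WITH_UNIT = re.compile(r"^\d*\.?\d+[GMKB]?$")


def is_valid(text: str) -> bool:
    # fullmatch requires the whole string to match, so a trailing newline is rejected too.
    return NUMBER_WITH_UNIT.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["123", "123456789", "1.25M", "1.2K", "1.5B", "12345.53M", "0.5M"]:
        print(f"{text!r}: {is_valid(text)}")  # all True
    for text in ["", "M", "1.", "1.2.3", "12KB", "1,5M"]:
        print(f"{text!r}: {is_valid(text)}")  # all False
```

Note that the pattern also accepts forms such as .5 and 7K, which may or may not be what you want.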

Since you didn't specify one, I assume you are using a Perl-compatible regex engine. You can use this:
/^([0-9]*\.)?[0-9]+(B|K|M|G)?$/
I also assume that numbers like 0.1 can be written as .1. Having the unit in the capturing group (B|K|M|G) makes it easy to extract it from the results afterwards.
You can test the regex here
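To illustrate pulling the unit out of the capturing group, here is a sketch assuming Python's re module, which treats this particular pattern (no PCRE-only features) the same way. The helper name split_unit is hypothetical.

```python
import re

# The answer's pattern with the /.../ delimiters dropped: group 1 is the optional
# "digits + dot" prefix, group 2 is the optional unit letter.
PATTERN = re.compile(r"^([0-9]*\.)?[0-9]+(B|K|M|G)?$")


def split_unit(text: str):
    """Hypothetical helper: return (numeric_part, unit), or None if text does not match."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    unit = match.group(2)  # None when no unit letter was given
    # Everything before the unit letter is the numeric part.
    numeric_part = text[: match.start(2)] if unit else text
    return numeric_part, unit


print(split_unit("1.25M"))  # ('1.25', 'M')
print(split_unit("123"))    # ('123', None)
```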

Related

Getting standalone numbers and not numeric-related codes

I’m new to regex, and I've spent a fair amount of time experimenting on regex testers, searching the web, etc. on the following issue. I’m using Python 3.7+.
Example Text String:
((AC00001234 + AC00005678) / 365) * (5 + 10)
Note - AC is always in uppercase and followed by exactly 8 digits.
Desired Outcome: A matched group with the following items. More specifically, any and all numbers without the AC prefix.
365
5
10
While I’ve tried more things than I can count, I’ve come closest with a negative lookbehind (below). The problem is that the result is pulling in 00001234 and 00005678 as well. I’ve tried explicit character classes [0-9], adjusting some groupings, etc.
Current Code:
(?<!AC\d{8})\d+
Current Outcome:
00001234
00005678
365
5
10
On Stack Overflow, I looked at the following:
Negative lookbehind in a regex with an optional prefix, Match pattern not preceded or followed by string, Standalone numbers Regex?, and Regex to identify standalone numbers.
For simplicity, I've broken down the parsing into three other steps (e.g., extracting the AC-prefix codes only, math operators, etc.), and this piece is the final one I need to solve.
The obvious way to do it is this: (?<!AC)\d+ - a bunch of digits that is not preceded by AC. However, that fails, because it matches 0001234, as it is preceded by 0, and not AC. The missing piece is that you have to assert also that it is not preceded by a digit:
(?<!AC)(?<!\d)\d+
Depending on the possible input strings, a word boundary assertion can also do a similar job:
(?<!AC)\b\d+
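Since the question uses Python 3.7+, here is a quick sketch running the naive lookbehind and the two fixes above with re.findall on the example string. Both lookbehinds are fixed-width, which Python's re requires.

```python
import re

TEXT = "((AC00001234 + AC00005678) / 365) * (5 + 10)"

# Naive version: only "not preceded by AC", so it starts one digit later.
naive = re.findall(r"(?<!AC)\d+", TEXT)

# Not preceded by "AC" and not preceded by another digit.
no_prefix_no_digit = re.findall(r"(?<!AC)(?<!\d)\d+", TEXT)

# Not preceded by "AC", and starting at a word boundary.
no_prefix_boundary = re.findall(r"(?<!AC)\b\d+", TEXT)

print(naive)               # ['0001234', '0005678', '365', '5', '10']
print(no_prefix_no_digit)  # ['365', '5', '10']
print(no_prefix_boundary)  # ['365', '5', '10']
```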
Your code ((?<!AC\d{8})\d+) fails because it means "a bunch of digits not preceded by ACXXXXXXXX" (where X is a digit). The digits 00001234 are preceded only by AC, not by AC and eight more digits, so they are a match. You could kind of fix it by asserting it after the match: \d+(?<!AC\d{8}), but that fails for a similar reason: it will disqualify 00001234, but it does not disqualify 0000123, because there is no AC and eight digits in front of its end, only seven! So you still need a boundary assertion:
\d+(?<!AC\d{8})\b
However, this is less clear than the first two solutions (and also requires you to know the length of the ACXXXXXXXX string).
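The three lookbehind placements discussed above can be compared directly in Python (a sketch; Python's re accepts AC\d{8} in a lookbehind because it has a fixed width of 10 characters).

```python
import re

TEXT = "((AC00001234 + AC00005678) / 365) * (5 + 10)"

# Original attempt: the lookbehind is checked before the digits start.
original = re.findall(r"(?<!AC\d{8})\d+", TEXT)

# Lookbehind moved after the digits, but no boundary: a shorter run slips through.
after_only = re.findall(r"\d+(?<!AC\d{8})", TEXT)

# Lookbehind after the digits plus a word boundary.
after_with_boundary = re.findall(r"\d+(?<!AC\d{8})\b", TEXT)

print(original)             # ['00001234', '00005678', '365', '5', '10']
print(after_only)           # ['0000123', '0000567', '365', '5', '10']
print(after_with_boundary)  # ['365', '5', '10']
```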

Regular expression for phone number with some exclusions

We have come across a validation requirement with the following rules:
The number should always start with "0", "+46" or "0046"
It should be between 8 and 20 characters long (including the + if present)
And the following numbers must be blocked:
(0900x, +46900x, 0046900x).
(0939x, +46939x, 0046939x).
(0944x, +46944x, 0046944x).
(099x, +4699x, 004699x).
Can you help me create a regular expression? I tried creating and testing it on https://regex101.com/ but it is pretty hard for me to get right.
I guess the simplest way would be
^(?:0|\+46|0046)(?:900|939|944|99)
It starts by checking for the country code or a leading 0, and then for the 4 blocked area-code combinations.
Check it out here at regex101.
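As a quick local check, here is a Python sketch of the pattern above. It only detects the blocked prefixes (re.match anchors at the start; length is not checked), the helper name is_blocked is hypothetical, and the sample numbers are made up.

```python
import re

# A national "0" or a "+46"/"0046" country code, followed by a blocked area code.
BLOCKED = re.compile(r"^(?:0|\+46|0046)(?:900|939|944|99)")


def is_blocked(number: str) -> bool:
    """Hypothetical helper: True if the number starts with a blocked prefix."""
    return BLOCKED.match(number) is not None


for number in ["0900123456", "+46939123456", "0046944123456", "0812345678"]:
    print(number, is_blocked(number))  # True, True, True, False
```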
Edit
If you want numbers not matching your rules, you can try the same with a negative look-ahead:
^(?!(?:0|\+46|0046)(?:900|939|944|99)).*
See this one here.
or without negative look-ahead:
^(?:[^0+]|0[^09]|\+[^4]|\+4[^6]|(?:\+46[^9])|(?:0046[^9])).*$
and this one here.
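Both "not matching" variants can be tried side by side in Python (a sketch with made-up sample numbers; the patterns are copied unchanged).

```python
import re

# Variant 1: negative look-ahead around the blocked prefixes.
ALLOWED_LOOKAHEAD = re.compile(r"^(?!(?:0|\+46|0046)(?:900|939|944|99)).*")

# Variant 2: enumerate the allowed starting characters instead.
ALLOWED_NO_LOOKAHEAD = re.compile(
    r"^(?:[^0+]|0[^09]|\+[^4]|\+4[^6]|(?:\+46[^9])|(?:0046[^9])).*$"
)

for number in ["0812345678", "0900123456", "+46939123456", "004699123456", "0912345678"]:
    print(
        number,
        ALLOWED_LOOKAHEAD.match(number) is not None,
        ALLOWED_NO_LOOKAHEAD.match(number) is not None,
    )
```

The last sample shows that the second variant is stricter: it rejects every number starting with 09, not only the blocked area codes.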
Edit 2
OK, here we go again ;)
This turned out to be a tough one. But here we go:
^(?=.{8,20}$)(?>\+46|0046|0(?!0))(?!900|939|944|99)\d*$
Add a positive look-ahead at the beginning to assert that the length is correct.
Use an atomic group to match the country code, without allowing a leading 0 to be followed by another 0 (so 00 only occurs as part of 0046).
Use a negative look-ahead to disallow the toll numbers.
Also changed so it only allows digits (and the optional + at the beginning).
See this one here.
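Here is a Python sketch of the Edit 2 approach. It assumes a trailing $ anchor (so nothing but digits can follow the prefix) and no \+? after the country code, and it writes the atomic group as a plain (?:...) group: once one alternative has matched, none of the others can match at the same position, so the result is the same, and it also runs on Python versions before 3.11 (which added atomic groups). The helper name is_valid_phone is hypothetical.

```python
import re

# Length 8-20, country code or single leading 0, no toll prefix, then digits only.
PHONE = re.compile(r"^(?=.{8,20}$)(?:\+46|0046|0(?!0))(?!900|939|944|99)\d*$")


def is_valid_phone(number: str) -> bool:
    """Hypothetical helper: fullmatch so a trailing newline cannot sneak through."""
    return PHONE.fullmatch(number) is not None


for number in ["0812345678", "+46701234567", "0900123456", "0123456"]:
    print(number, is_valid_phone(number))  # True, True, False, False
```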

Add two decimal digits to a number range regex

I've created a Regexp to validate a direction in degrees, between -359 and +359 (with optional sign). This is my regex:
const QString xWindDirectionPattern("[+-]{0,1}([0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])");
Now, I want to add up to two decimal digits, in order to write numbers from -359.99 to +359.99. I've tried something like appending \.[0-9]{1,2}|[0-9]{1,3} but it does not work.
I'd like the decimal part to be optional, so that:
23.3 valid
23.33 valid
23 valid
23.333 not valid
I've read some other questions, like this one, but I'm not able to modify the example to match a number range, like in my case.
How can I achieve this result?
Thanks in advance for your replies.
"I've created a Regexp to validate a direction in degrees, between -359 and +359"
No, you can't. You shouldn't. You are using the wrong tool. Regex cannot do the kind of validation that requires digging into the semantics of the characters.
Regex can only process and match text; it cannot identify what the text actually means. Basically, regexes are good for parsing regular languages and bad for almost everything else.
For e.g.:
A Regex can match 3 digits, but it would be extremely impractical to use it to match 3 digits that fall in the range [259, 634]. For that you would need to know the meaning of each individual digit in that number.
A Regex can match a date pattern like \d\d/\d\d/\d\d, but it cannot identify which part is the day and which part is the month.
Similarly, it can find you two numbers x and y, but it cannot identify, whether x < y or not.
Tasks like these require you to understand the meaning of the text. Regex can't do that.
Of course, you have come up with a regex, but as you can see it is highly inflexible. A small change in your requirements will screw up both the regex and you.
You had better use the corresponding language features, such as if-else constructs, to make sure the degrees you read are in that range, rather than regex.
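A sketch of that idea in Python: a few string checks for the format, then an ordinary comparison for the range. The helper name is_valid_direction is hypothetical, and because it accepts one to three integer digits it also allows leading zeros such as 007, unlike the regexes in the answers that follow.

```python
ASCII_DIGITS = set("0123456789")


def is_valid_direction(text: str) -> bool:
    """Hypothetical helper: accept [+-]ddd[.dd] with a value from -359.99 to +359.99."""
    # Optional leading sign.
    if text[:1] in ("+", "-"):
        text = text[1:]
    int_part, dot, frac_part = text.partition(".")
    # Integer part: one to three ASCII digits.
    if not (1 <= len(int_part) <= 3 and set(int_part) <= ASCII_DIGITS):
        return False
    # If a dot is present, it must be followed by one or two ASCII digits.
    if dot and not (1 <= len(frac_part) <= 2 and set(frac_part) <= ASCII_DIGITS):
        return False
    # The range check is a plain comparison instead of a regex alternation.
    return int(int_part) <= 359


for text in ["23.3", "23.33", "23", "23.333", "-359.99", "+360"]:
    print(text, is_valid_direction(text))  # True, True, True, False, True, False
```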
You can do this:
[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)
This will allow an optional decimal point followed by one or two digits. You'll probably also want to use start and end anchors (^ / $) to ensure that there are no characters other than this pattern in your string. Without them, 23.333 would be allowed, because 23.33 matches the above pattern:
^[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)$
You can test it out here.
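The effect of the anchors can be reproduced in Python's re (an assumption: the question uses Qt, but this pattern only uses basic syntax). The helper name matches_direction is hypothetical.

```python
import re

# Anchored version: optional sign, 0-359 as three alternatives, optional .d or .dd
DIRECTION = re.compile(
    r"^[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)$"
)

# Unanchored version: search() happily finds "23.33" inside "23.333".
UNANCHORED = re.compile(
    r"[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)"
)


def matches_direction(text: str) -> bool:
    """Hypothetical helper wrapping the anchored pattern."""
    return DIRECTION.fullmatch(text) is not None


print([matches_direction(t) for t in ["23.3", "23.33", "23", "23.333"]])  # [True, True, True, False]
print(UNANCHORED.search("23.333").group(1))  # 23.33
```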
Try [+-]?([1-9]\d?|[12]\d{2}|3[0-5]\d)(\.\d{1,2})?.
[+-]? Optional Sign
[1-9]\d? 1 to 99 (a 1- or 2-digit number without a leading zero, so 0 itself is not matched)
[12]\d{2} 100 to 299
3[0-5]\d 300 to 359
(\.\d{1,2})? Optional decimal point followed by 1 or two digits
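A Python sketch of this pattern (wrapped in fullmatch because the pattern itself has no ^ and $ anchors) shows which inputs it accepts. The last three samples show that 0, 0.5 and 360 are rejected.

```python
import re

DIRECTION_COMPACT = re.compile(r"[+-]?([1-9]\d?|[12]\d{2}|3[0-5]\d)(\.\d{1,2})?")

for text in ["45", "-359.99", "+120.5", "0", "0.5", "360"]:
    # fullmatch requires the whole string to match the pattern.
    print(text, DIRECTION_COMPACT.fullmatch(text) is not None)
# True, True, True, False, False, False
```

If 0 and values below 1 should be valid, [0-9]{1,2} (as in the previous answer) could replace [1-9]\d?.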

Regex for detecting numbers away from each other?

So basically, I want to detect whether, in these strings:
Hello 123 My 222 dear 112 troll 12 8889
192.1.1.254:10000
the numbers are in a format like this:
[0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 65536]
Does anyone know how I can build such a regex?
It is for detecting whether anyone posts an IP:Port in an unusual format to bypass the default ip:port filters.
Edit: As for the first comment: I do not know regex and what I have tried is:
if(regex_match("192.168 najlepszy serwer SAMP!!1 1 join1!! 8080","/^[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*)?$/"))
{
print("Cannot send message");
}
else
{
print("New message for everyone! :)");
}
and some other not working regexes.
If you don't want to complicate your life checking the exact ranges, the simple regex would be:
/^.*(\d)+.+(\d)+.+(\d)+.+(\d)+.+(\d)+.*$/
The first four (\d)+ parts can be replaced with a more complicated check for the 0-255 range:
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
and the last (\d)+ with the following check for the port range:
(6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d\d\d|[1-5]\d\d\d\d|[1-9]\d{0,3})
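Putting the pieces together, here is a Python sketch that substitutes the two sub-patterns into the simple regex and tries it on the question's examples plus one made-up non-matching string.

```python
import re

# Sub-patterns from the answer (capturing groups, as written there).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
PORT = r"(6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d\d\d|[1-5]\d\d\d\d|[1-9]\d{0,3})"

# The simple regex with its four (\d)+ parts replaced by OCTET and the last by PORT.
IP_PORT_LIKE = re.compile(
    "^.*" + OCTET + ".+" + OCTET + ".+" + OCTET + ".+" + OCTET + ".+" + PORT + ".*$"
)

for text in ["Hello 123 My 222 dear 112 troll 12 8889", "192.1.1.254:10000", "Hello world"]:
    print(text, IP_PORT_LIKE.match(text) is not None)  # True, True, False
```

Because the .+ separators may also consume digits, a group can match just part of a longer number (for example 56 out of 256), so the range checks are not strict.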
An exact, simple, and direct representation of your pattern as a regular expression is not possible in the general case. The reason is the number ranges. Something like "at this place, any integral number with a value from a to b" is just too complex. A regular expression is executed by a finite state machine, and these (theoretical) beasts are (basically) only able to look at strings character by character. Therefore you can match something like "ignore all characters until you find the first digit, then check whether the first digit is followed by at most two more digits".
As a workaround you may try to build a list of alternations of possible digit patterns that covers your desired range of values (in the extreme case list every single value like \b(?:1|2|3|4|...|154|155|...|255)\b). I have a pattern for the range 0-255, but I have none for the range of possible port numbers. So a first approximation may be (really, this is only an approximation and not thoroughly tested):
\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b[^0-9]*[0-9]{1,5}
In the above pattern, (?: .... ) means a non-capturing ("shy") group (not remembered for back references) and \b means a word boundary.
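Here is a Python sketch of that approximation, built from a reusable 0-255 piece so the long pattern stays readable. The last sample is made up; none of its numbers fit in 0-255 as a whole word.

```python
import re

# One 0-255 value surrounded by word boundaries, as in the answer.
BOUNDED_OCTET = r"\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b"

# Four such values separated by anything, then non-digits and a 1-5 digit port.
IP_PORT_APPROX = re.compile(
    BOUNDED_OCTET + ".*" + BOUNDED_OCTET + ".*" + BOUNDED_OCTET + ".*"
    + BOUNDED_OCTET + r"[^0-9]*[0-9]{1,5}"
)

for text in ["192.1.1.254:10000", "Hello 123 My 222 dear 112 troll 12 8889",
             "Hello 300 400 500 600 700"]:
    print(text, IP_PORT_APPROX.search(text) is not None)  # True, True, False
```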
I'd suggest you read up on regex syntax. For starters, . is special and matches any character. Also, something like [0-2][0-5][0-5] won't catch a number like 192, as 9 is not within 0-5.
According to your requirements, here's a regex that should roughly do what you want:
([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*(\d{1,5})?
Each of the ([0-2]?\d{1,2}) portions will match 1 or 2 digits, optionally preceded by a 0, 1, or 2. Each () will capture a group which you can then examine using a Regex engine. You will need to examine this group, as the Regex for each of those portions will match numbers above 255 (specifically 256-299).
The last group (\d{1,5})? is to catch the port number, again you will have to examine this as it will catch any 1 to 5 digit number (hence the {1,5}). The ? makes the group optional, remove it if you want it to have to match against a port number.
As far as doing Regex in C, I haven't had much experience but there should be a way to get all the grouped matches and inspect them. Unfortunately they will be strings so you will have to convert them to integers to examine them.
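The "check the shape with regex, then convert and compare" idea looks roughly like this. The sketch is in Python rather than C, and the helper name is_octet is hypothetical.

```python
import re

# Shape used for each octet in the answer's regex.
OCTET_SHAPE = re.compile(r"[0-2]?\d{1,2}")


def is_octet(text: str) -> bool:
    """Hypothetical helper: regex shape check first, then a numeric range check."""
    if OCTET_SHAPE.fullmatch(text) is None:
        return False
    # The shape alone also accepts 256-299, so compare the integer value.
    return int(text) <= 255


print(OCTET_SHAPE.fullmatch("299") is not None)  # True: the regex alone lets it through
print(is_octet("255"), is_octet("256"), is_octet("300"))  # True False False
```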
Are you sure you need regex for this? In my opinion, you do not need regex for this.
Just split the string into groups of digits separated by non-numeric characters. Then analyze them.
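A sketch of the split-then-analyze approach in Python, assuming that the four address parts and the port appear as five consecutive numbers. The helper name looks_like_ip_port is hypothetical.

```python
import re


def looks_like_ip_port(text: str) -> bool:
    """Hypothetical helper: look for four 0-255 values followed by a 0-65535 port."""
    # Split on runs of non-digits and keep only the non-empty digit groups.
    numbers = [int(part) for part in re.split(r"\D+", text) if part]
    # Check every window of five consecutive numbers.
    for i in range(len(numbers) - 4):
        *octets, port = numbers[i : i + 5]
        if all(0 <= n <= 255 for n in octets) and 0 <= port <= 65535:
            return True
    return False


print(looks_like_ip_port("Hello 123 My 222 dear 112 troll 12 8889"))  # True
print(looks_like_ip_port("192.1.1.254:10000"))                       # True
print(looks_like_ip_port("Hello 300 400 500 600 700"))               # False
```

The range checks here are ordinary integer comparisons, which is the main advantage over encoding 0-255 and 0-65535 in the pattern.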
What language?
As for actually checking for a valid range, take a look at this:
http://www.regular-expressions.info/numericranges.html
I would use this simple regex:
((\d|\D)+)*

What's wrong with this number extracting Regex?

I have a string like the following:
<br><b>224h / 15.45 verbuchte Stunden</b>
I want to extract the numbers and have created the following Regex:
([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]\.?[0-9]{0,2})
But for the preceding string this gives me the numbers 224 and 15 instead of 15.45.
What's wrong with this Regex?
Because you allow only one digit before the dot.
Try this. I used {1,2} as the quantifier before the dot; change it to suit your needs. Probably + would be a better choice, since it allows one or more digits.
([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]{1,2}\.?[0-9]{0,2})
A better regex could be this:
([0-9]+(?:\.[0-9]{1,2})?)h\s*\/\s*([0-9]+(?:\.[0-9]{1,2})?)
Here I made the whole fractional part optional, requiring at least one and at most two digits after the dot, and at least one digit before it.
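A Python sketch comparing the original pattern with the improved one on the question's string (Python's re accepts the escaped \/ as a literal slash).

```python
import re

TEXT = "<br><b>224h / 15.45 verbuchte Stunden</b>"

# Original pattern: only one digit is allowed before the optional dot.
ORIGINAL = r"([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]\.?[0-9]{0,2})"

# Improved pattern: one or more digits, then an optional fraction of 1-2 digits.
BETTER = r"([0-9]+(?:\.[0-9]{1,2})?)h\s*\/\s*([0-9]+(?:\.[0-9]{1,2})?)"

original_groups = re.search(ORIGINAL, TEXT).groups()
better_groups = re.search(BETTER, TEXT).groups()

print(original_groups)  # ('224', '15')
print(better_groups)    # ('224', '15.45')
```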
The answer is given by stema.
If your regex engine supports shorthand character classes, it could be a little more compact, like this:
(\d{1,2}\.?\d{0,2})h\s/\s(\d{1,2}\.?\d{0,2})
\d is a shorthand character class for [0-9]
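One caveat if the engine happens to be Python 3 (an assumption; the question does not name a language): in str patterns, \d matches any Unicode decimal digit, not only 0-9, unless the re.ASCII flag is set. This sketch shows both the compact pattern and that difference.

```python
import re

TEXT = "<br><b>224h / 15.45 verbuchte Stunden</b>"

# The compact form from the answer; "/" needs no escaping in Python's re.
COMPACT = r"(\d{1,2}\.?\d{0,2})h\s/\s(\d{1,2}\.?\d{0,2})"
compact_groups = re.search(COMPACT, TEXT).groups()
print(compact_groups)  # ('224', '15.45')

# ARABIC-INDIC DIGIT THREE is a Unicode decimal digit.
ARABIC_INDIC_THREE = "\u0663"
unicode_digit_matches = re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None
class_matches = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE) is not None
ascii_matches = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is not None
print(unicode_digit_matches, class_matches, ascii_matches)  # True False False
```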