Regex for detecting numbers away from each other? - regex

so basically I want to detect if in these strings:
Hello 123 My 222 dear 112 troll 12 8889
192.1.1.254:10000
the numbers are in a format like this:
[0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 255][ANYTHING][0 to 65535]
Does anyone know how I can build such a regex?
It is for detecting whether anyone posts an IP:port in an unusual format to bypass the default IP:port filters.
Edit: As for the first comment: I do not know regex and what I have tried is:
if(regex_match("192.168 najlepszy serwer SAMP!!1 1 join1!! 8080","/^[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*).(*)[0-2](*)?[0-5](*)?[0-5](*)?$/"))
{
print("Cannot send message");
}
else
{
print("New message for everyone! :)");
}
and some other not working regexes.

If you don't want to complicate your life checking the exact ranges, a simple regex would be:
/^.*?(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+).*$/
(Using \D+ between the groups keeps a single long run of digits from being counted as five separate numbers.)
The first four (\d+) parts can be replaced with a more complicated check for the 0-255 range:
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
and the last (\d+) with the following check for the port range:
(6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d\d\d|[1-5]\d\d\d\d|[1-9]\d{0,3})
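For what it's worth, here is a rough Python sketch of how the two range checks above could be glued together. The names OCTET, PORT, SEPARATOR and looks_like_ip_port are made up for the example, and it assumes that "ANYTHING" means one or more non-digit characters, which the question does not actually spell out.

```python
import re

# The two range checks quoted above, written as non-capturing groups.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
PORT = r"(?:6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d\d\d|[1-5]\d\d\d\d|[1-9]\d{0,3})"

# Assumption: "ANYTHING" is one or more non-digit characters.
SEPARATOR = r"\D+"

# Four octets and a port. The lookarounds keep the match from starting
# or ending in the middle of a longer run of digits.
HIDDEN_IP_PORT = re.compile(
    r"(?<!\d)" + SEPARATOR.join([OCTET] * 4 + [PORT]) + r"(?!\d)",
    re.ASCII,
)

def looks_like_ip_port(message: str) -> bool:
    """Return True if the message contains four octets followed by a port."""
    return HIDDEN_IP_PORT.search(message) is not None

print(looks_like_ip_port("Hello 123 My 222 dear 112 troll 12 8889"))  # True
print(looks_like_ip_port("192.1.1.254:10000"))                        # True
print(looks_like_ip_port("call 555 1234 now"))                        # False
```

Note that this still flags any message that happens to contain five small numbers in a row, so treat it as a filter heuristic rather than proof that an address was posted.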

An exact, simple, and direct representation of your pattern as a regular expression is not possible in the general case. The reason is the number ranges. Something like "at this place, any integer with a value from a to b" is just too complex. A regular expression is executed by a finite state machine, and these (theoretical) beasts are (basically) only able to look at strings character by character. Therefore you can only match something like "ignore all characters until you find the first digit, then check whether the first digit is followed by at most two more digits".
As a workaround you may try to build a list of alternations of possible digit patterns that covers your desired range of values (in the extreme case list every single value like \b(?:1|2|3|4|...|154|155|...|255)\b). I have a pattern for the range 0-255, but I have none for the range of possible port numbers. So a first approximation may be (really, this is only an approximation and not thoroughly tested):
\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b.*\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b[^0-9]*[0-9]{1,5}
In the above pattern (?: .... ) is a non-capturing ("shy") group (not remembered for back references) and \b means word boundary.

I'd suggest you read up on Regex syntax. For starters . is special and matches any character. Also doing something like [0-2][0-5][0-5] won't catch something like 192 as 9 is not within 0-5.
According to your requirements here's a Regex that should roughly do what you want
([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*([0-2]?\d{1,2}).*(\d{1,5})?
Each of the ([0-2]?\d{1,2}) portions will match 1 or 2 digits preceded optionally with a 0,1, or 2. Each () will capture a group which you can then examine using a Regex engine. You will need to examine this group as the Regex for each of those portions will match numbers above 255 (specifically 256-299).
The last group (\d{1,5})? is to catch the port number, again you will have to examine this as it will catch any 1 to 5 digit number (hence the {1,5}). The ? makes the group optional, remove it if you want it to have to match against a port number.
As far as doing Regex in C, I haven't had much experience but there should be a way to get all the grouped matches and inspect them. Unfortunately they will be strings so you will have to convert them to integers to examine them.

Are you sure you need regex for this? In my opinion, you do not need regex for this.
Just split the string into groups of digits separated by non-numeric characters, then analyze them.
What language?
As for actually checking for a valid range with a regex, take a look at this:
http://www.regular-expressions.info/numericranges.html
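Since the language was never stated, here is a minimal Python sketch of the split-and-analyze idea. The helper names digit_runs and contains_ip_port are invented for the example, and it assumes that any five consecutive numbers in the message count as a candidate IP:port.

```python
from itertools import groupby

def digit_runs(text: str) -> list:
    """Return every maximal run of ASCII digits in text as an integer."""
    runs = groupby(text, key=lambda ch: "0" <= ch <= "9")
    return [int("".join(chars)) for is_digit, chars in runs if is_digit]

def contains_ip_port(text: str) -> bool:
    """Check every window of five consecutive numbers: 4 octets + 1 port."""
    numbers = digit_runs(text)
    for start in range(len(numbers) - 4):
        *octets, port = numbers[start:start + 5]
        if all(octet <= 255 for octet in octets) and port <= 65535:
            return True
    return False

print(digit_runs("192.1.1.254:10000"))  # [192, 1, 1, 254, 10000]
print(contains_ip_port("Hello 123 My 222 dear 112 troll 12 8889"))  # True
print(contains_ip_port("300.1.1.1:80"))  # False
```

The range checks are plain integer comparisons here, so changing a limit means changing one number instead of rewriting an alternation.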

I would do this simple regex
((\d|\D)+)*

Related

Numbers between 99 and 9999999 regular expression

I am trying to generate a regular expression that will match any numbers within the range of 99 and 9999999. I have trouble understanding how generating number ranges generally works. I managed to find a range generator online that does the job for me, but I want to understand how it actually works.
My attempt to do this range is as follows:
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
This is supposed to match 99, any 3 digit number or any 4 digit number, but it does not work as expected. When tested it matches only numbers 99 and 3 digit numbers. Four digit numbers are not matched at all. If I only write the part for 4 digit numbers on its own as
[1-9][0-9][0-9][0-9]
It matches 4 digit numbers, but when I construct it as in the first example it does not work. Can someone give me some clarification how this actually works and how successfully to generate a regular expression for the range of 99 to 9999999.
Link to demo - Here
So you want to know how this works...
Regexes have no real understanding of the values of the numbers in your string; they only care how the numbers are represented as characters, which is why looking for numbers in a range seems more awkward than it should be. The only reason your regex engine can understand a range in a character class like [0-9] at all is because of the characters' positions in the character set (a character range like [&-~] is just as valid, and equally understandable to it).
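To see that a character-class range is only about character positions, here is a quick Python check (Python is just the language picked for illustration, and the variable name in_range is arbitrary):

```python
import re

# '&' is code point 38 and '~' is code point 126, so [&-~] covers
# every character whose code point lies between those two.
print(ord("&"), ord("~"))  # 38 126

in_range = re.compile(r"[&-~]")
print(bool(in_range.fullmatch("A")))  # True: 'A' is code point 65
print(bool(in_range.fullmatch("7")))  # True: '7' is code point 55
print(bool(in_range.fullmatch("%")))  # False: '%' is code point 37
```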
So, to match a range like 99-9999999, ya gotta spell out what that looks like: literal "99", or three digits without a leading zero, or four digits without a leading zero, and so on.
But this is what your demo did, right? And it didn't work. Of your test string "9293" your regex only matched "929". What happened here is the regex engine is eager to return a complete match - as soon as it found one it returned it, even though a better/longer match might have occurred later.
Here's how that match happened. (I'll skip some details like grouping, as they're not super relevant here.)
Step 1.
The engine compares the first token in the regex (the first 9 of the alternate 99) with the first character in the string (9).
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
9293 ✅
Success, they match.
Step 2.
The engine then advances both to the next token in the regex (the second 9 of 99) and the next character in the string (2) and compares them.
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
9293 ❌
Failure, no match. The engine would stop and return the failure here, but you're using alternation via |, so it knows there's an alternate expression to try.
Step 3.
The engine advances to the first token of the next alternate expression in the regex ([1-9]), and rewinds the position in the string back to the first character (9).
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
9293 ✅
Success, they match.
Step 4.
Continuing on: the next token ([0-9]) against the next character (2).
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
9293 ✅
Match.
Step 5.
And again: the last token of this alternate ([0-9]) against the third character (9).
(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])
9293 ✅
Success. The complete expression matches. There's no need to try the remaining alternate. The match here returned is:
929
As you've probably figured out, if your input string was instead "9923" then step 2 would've matched and the engine there would've stopped and returned "99".
As you've also probably figured out, if you rearrange your alternate expressions from longest to shortest
([1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|99)
the longest would be attempted first, which would match and return your expected "9293".
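If you want to watch the ordering effect yourself, here is a small Python sketch. Python's re module is a backtracking engine of the kind described above; the variable names shortest_first and longest_first are just labels for the example.

```python
import re

# Same three alternates, in the two orders discussed above.
shortest_first = re.compile(r"(99|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9])")
longest_first = re.compile(r"([1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|99)")

print(shortest_first.search("9293").group())  # 929
print(shortest_first.search("9923").group())  # 99
print(longest_first.search("9293").group())   # 9293
print(longest_first.search("9923").group())   # 9923
```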
Simplifying
It's still pretty wordy though, especially as you crank up the number of digits in your range. There are a couple things you can do to simplify it.
The character class [0-9] can be represented by the shorthand character class \d.
([1-9]\d\d\d|[1-9]\d\d|99)
And instead of repeating them use a quantifier in curly brackets like so:
([1-9]\d{3}|[1-9]\d{2}|99)
As it happens, quantifiers can also take the form {min,max} (no space after the comma), so you can combine the two similar alternates:
([1-9]\d{2,3}|99)
You might expect this to land you back returning "929" again, the engine being eager and all, but quantifiers are by default greedy so they'll try to pick up as much as they can. This lends itself well to your larger desired range:
([1-9]\d{2,6}|99)
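A quick way to confirm the greedy behaviour, sketched in Python (the name in_range is only a label for this example):

```python
import re

in_range = re.compile(r"([1-9]\d{2,6}|99)")

# The greedy {2,6} grabs as many digits as it can, up to seven in total.
print(in_range.search("9293").group())     # 9293
print(in_range.search("9999999").group())  # 9999999
print(in_range.search("99").group())       # 99
print(in_range.search("98"))               # None
```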
Finishing up
What you do with it from here depends on what you need the regex to do. As it stands the parentheses are superfluous, there's no point in creating a capturing group of the entire regex itself. However a decision comes when you've got an input string like:
You will likely be eaten by 1000 grue.
If you're trying to pluck out how many grue are about to eat you, you might use
[1-9]\d{2,6}|99
which will return 1000.
However that sorta runs back into the original problem with your demo. If it's "12345678 grue", which is out of range, this'll match "1234567" which might not be what you want. You can make sure the number you've matched isn't immediately followed by (or preceded by) another digit by using negative lookarounds.
(?<!\d)([1-9]\d{2,6}|99)(?!\d)
(?<!\d) means "from this position, the prior character is not a digit" while (?!\d) means "from this position, the next character is not a digit."
The parentheses around the alternates are back as they're necessary for grouping here, otherwise the lookbehind would only be part of and apply in the first alternate expression and the lookahead would only be part of and apply in the second alternate.
On the other hand if you're trying to make sure the entire string only consists of a number in your range you'll want to instead use the anchors ^ and $ (start of string and end of string, respectively):
^([1-9]\d{2,6}|99)$
And finally you can trade the capturing group out for a non-capturing group (?:...), so:
^(?:[1-9]\d{2,6}|99)$
or
(?<!\d)(?:[1-9]\d{2,6}|99)(?!\d)
You'll still grab the number as the match, it just won't be repeated in a group capture. (Lookarounds are already non-capturing, no need to worry about those.)
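Here is how the three variants above behave side by side, as a Python sketch. The names plain, guarded and whole are made up for the comparison.

```python
import re

plain = re.compile(r"[1-9]\d{2,6}|99")
guarded = re.compile(r"(?<!\d)(?:[1-9]\d{2,6}|99)(?!\d)")
whole = re.compile(r"^(?:[1-9]\d{2,6}|99)$")

sentence = "You will likely be eaten by 1000 grue."

print(plain.search(sentence).group())         # 1000
print(plain.search("12345678 grue").group())  # 1234567 (out-of-range input, partial match)
print(guarded.search(sentence).group())       # 1000
print(guarded.search("12345678 grue"))        # None
print(bool(whole.search("9999999")))          # True
print(bool(whole.search("98")))               # False
```

One Python-specific caveat: $ also matches just before a trailing newline, so re.fullmatch may be the safer choice when validating whole strings there.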
First of all, you need some string boundaries for your regex (anything except a digit; in my example I use ^ and $ -- beginning and end of line or string).
Try this one:
^([1-9][0-9]{2,6}|99)$

Regular expression starting at least with 2

I am trying to get/make a regular expression but I can't figure it out. I am searching for an expression so that a user who is filling in a form can't type 0 or 1. So it has to start with at least 2. What is the expression for it?
Thanks a lot.
Thanks. But this is not 100% waterproof. As a user you can't enter 0 or 1, but you can't enter 10 or 11 or 101 either, so everything with a 0 or a 1 at the beginning is rejected. Is there a solution?
Thanks again.
here, this should accept any numbers starting with 2 or more:
[2-9][0-9]*
or
^[2-9][0-9]*$
if you are matching whole lines.
I understand you mean that it begins with a digit from 2 to 9, but you should say whether anything else may follow.
For pure numbers:
[2-9][0-9]*
This forces the content to be numeric and to start with a digit > 1.
Use:
[2-9][0-9]+
if more than one digit is mandatory.
This works if your engine matches against the whole input; if it searches for a match anywhere in the input, use anchoring:
^[2-9][0-9]*$
If other characters can follow the initial digit, use an appropriate pattern, e.g.:
[2-9].*
matches anything after the first digit, while
[2-9][0-9a-zA-Z]*
matches an alphanumeric pattern, etc.
If you mean to accept any string that is an integer number bigger than 1:
([1][0-9]+|[2-9][0-9]*)
The first half ([1][0-9]+) will match a number starting with 1 followed by at least one more digit; the second half ([2-9][0-9]*) will match the numbers 2-9 or a number starting with a digit 2-9 followed by more digits.
Note that this does not accept potentially good integers written with a leading 0, like 0123. If you want to include that as well use:
(0*[1][0-9]+|0*[2-9][0-9]*)
Also note that a pattern like:
(matcher1|matcher2)
is not supported by all RE engines.
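To sanity-check the two alternation patterns above against a plain integer comparison, something like this Python sketch could be used. The names greater_than_one, with_leading_zeros and matches_whole are invented for the example, and Python's re does support alternation.

```python
import re

greater_than_one = re.compile(r"([1][0-9]+|[2-9][0-9]*)")
with_leading_zeros = re.compile(r"(0*[1][0-9]+|0*[2-9][0-9]*)")

def matches_whole(pattern, text: str) -> bool:
    """True only if the pattern covers the entire string."""
    return pattern.fullmatch(text) is not None

for text in ["0", "1", "2", "10", "101", "0123"]:
    print(
        text,
        matches_whole(greater_than_one, text),
        matches_whole(with_leading_zeros, text),
        int(text) > 1,  # what a numeric comparison says
    )
```

The only disagreement between the first pattern and the numeric check in this list is 0123, which is exactly the leading-zero case the second pattern is meant to cover.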
I reckon something like this would be useful for you:
(2+)(.)*
It means that only words starting with "2" match the expression.
If you want to try regular expressions easily, I like the website http://rubular.com/
It has a good interface to test expressions directly onto the web.
Greetings

Add two decimal digits to a number range regex

I've created a Regexp to validate a direction in degrees, between -359 and +359 (with optional sign). This is my regex:
const QString xWindDirectionPattern("[+-]{0,1}([0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])");
Now, I want to add two decimal digits, in order to write numbers from -359.99 to +359.99. I've tried something like appending \.[0-9]{1,2}|[0-9]{1,3} but it does not work.
I'd like to have optional decimal point so I can have
23.3 valid
23.33 valid
23 valid
23.333 not valid
I've read some other questions, like this one, but I'm not able to modify the example to match a number range, like in my case.
How can I achieve this result?
Thanks in advance for your replies.
How can I achieve this?
"I've created a Regexp to validate a direction in degrees, between -359 and +359"
No, you can't. You shouldn't. You are using the wrong tool. Regex cannot do the kind of validation that requires digging into the semantics of the characters.
Regex can only process and match text; it cannot identify what the text actually means. Basically, regexes are good for parsing regular languages, and bad for almost everything else.
For example:
A regex can match 3 digits, but it would be extremely impractical to use it to match 3 digits that fall in the range [259, 634]. For that you would need to know the meaning of each individual digit in that number.
A regex can match a pattern for a date like \d\d/\d\d/\d\d, but it cannot identify which part is the day and which part is the month.
Similarly, it can find you two numbers x and y, but it cannot identify whether x < y or not.
Tasks like the above require you to understand the meaning of the text. Regex can't do that.
Well, of course you have come up with a regex, but as you can see it is highly inflexible. A little change in your requirements will break the regex and cause you trouble.
You had better use the corresponding language features, constructs like if-else, to make sure you are reading degrees in that range, and not regex.
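As a sketch of what that could look like, here it is in Python (the question itself uses Qt/C++, so this only illustrates the idea). SHAPE and is_valid_direction are hypothetical names, and a small regex is still used for the shape of the text while the range itself is an ordinary comparison.

```python
import re

# Shape only: optional sign, 1-3 digits, optionally a point and 1-2 decimals.
SHAPE = re.compile(r"[+-]?[0-9]{1,3}(?:\.[0-9]{1,2})?")

def is_valid_direction(text: str) -> bool:
    """Accept directions strictly between -360 and +360, max 2 decimals."""
    if SHAPE.fullmatch(text) is None:
        return False
    return -360 < float(text) < 360

for text in ["23.3", "23.33", "23", "23.333", "-359.99", "360"]:
    print(text, is_valid_direction(text))
```

If the allowed range changes later, only the comparison has to be touched.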
You can do this:
[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)
This will allow a decimal point followed by one or two digits. You'll probably also want to use start and end anchors (^ / $) to ensure that there are no characters other than this pattern in your string. Without them, 23.333 would be allowed because 23.33 matches the above pattern:
^[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)$
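The difference the anchors make can be checked with a few lines of Python (used here only as a convenient test bench; the names PATTERN, unanchored and anchored are made up for the example):

```python
import re

PATTERN = r"[+-]{0,1}((?:[0-9]{1,2}|[12][0-9]{2}|3[0-5][0-9])(?:\.[0-9]{1,2})?)"
unanchored = re.compile(PATTERN)
anchored = re.compile("^" + PATTERN + "$")

# Without anchors the pattern is satisfied by a prefix of the input.
print(unanchored.search("23.333").group())  # 23.33
print(anchored.search("23.333"))            # None

for text in ["23.3", "23.33", "23", "-359.99", "+359.99"]:
    print(text, anchored.search(text) is not None)  # all True

print(anchored.search("360"))  # None
```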
Try [+-]?([1-9]\d?|[12]\d{2}|3[0-5]\d)(\.\d{1,2})?.
[+-]? Optional Sign
[1-9]\d? 1 to 99 (a 1- or 2-digit number with no leading zero; 0 itself is not matched)
[12]\d{2} 100 to 299
3[0-5]\d 300 to 359
(\.\d{1,2})? Optional decimal point followed by 1 or two digits

Combine Regexp?

After collecting user input for various conditions like
Starts with : /(^#)/
Ends with : /(#$)/
Contains : /#/
Doesn't contain
To make a single regex when the user enters multiple conditions,
I combine them with "|", so if 1 and 2 are given it becomes /(^#)|(#$)/
This method works so far, but
I'm not able to determine correctly what the regex for the 4th condition should be. And will combining regexes this way work?
Update: # (the user input) won't be the same for two conditions, and not all four conditions are always present, but they can be. In the future I might need more conditions like "is exactly" and "is exactly not", etc., so I'm more curious to know whether this approach will scale.
Also, there may be issues of user input cleanup so that the regex is escaped properly, but that is ignored right now.
Will the conditions be ORed or ANDed together?
Starts with: abc
Ends with: xyz
Contains: 123
Doesn't contain: 456
The OR version is fairly simple; as you said, it's mostly a matter of inserting pipes between individual conditions. The regex simply stops looking for a match as soon as one of the alternatives matches.
/^abc|xyz$|123|^(?:(?!456).)*$/
That fourth alternative may look bizarre, but that's how you express "doesn't contain" in a regex. By the way, the order of the alternatives doesn't matter; this is effectively the same regex:
/xyz$|^(?:(?!456).)*$|123|^abc/
The AND version is more complicated. After each individual regex matches, the match position has to be reset to zero so the next regex has access to the whole input. That means all of the conditions have to be expressed as lookaheads (technically, one of them doesn't have to be a lookahead, but I think it expresses the intent more clearly this way). A final .*$ consummates the match.
/^(?=^abc)(?=.*xyz$)(?=.*123)(?=^(?:(?!456).)*$).*$/
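Here is one way the OR and AND combinations could be generated from user input, sketched in Python. BUILDERS and combine are hypothetical names, the condition keys are invented for the example, and re.escape is used so that user text is treated literally. Each condition is written so it can be tested from position 0, which is why the result is meant to be used with .match().

```python
import re

# Each builder turns an escaped literal into a pattern tested from position 0.
BUILDERS = {
    "starts_with": lambda lit: lit,
    "ends_with": lambda lit: ".*" + lit + r"\Z",
    "contains": lambda lit: ".*" + lit,
    "not_contains": lambda lit: "(?:(?!" + lit + r").)*\Z",
}

def combine(conditions, mode="and"):
    """conditions is a list of (kind, user_text) pairs; mode is 'and' or 'or'."""
    parts = [BUILDERS[kind](re.escape(text)) for kind, text in conditions]
    if mode == "and":
        # Every condition becomes a lookahead, so each sees the whole input.
        body = "".join("(?=" + part + ")" for part in parts)
    else:
        body = "(?:" + "|".join(parts) + ")"
    return re.compile(body, re.DOTALL)

conditions = [
    ("starts_with", "abc"),
    ("ends_with", "xyz"),
    ("contains", "123"),
    ("not_contains", "456"),
]
all_of = combine(conditions, "and")
any_of = combine(conditions, "or")

print(bool(all_of.match("abc123xyz")))     # True
print(bool(all_of.match("abc123456xyz")))  # False
print(bool(any_of.match("456 xyz")))       # True
print(bool(any_of.match("456")))           # False
```

New condition types such as "is exactly" would be one more entry in the table, which is roughly what makes this approach scale.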
And then there's the possibility of combined AND and OR conditions--that's where the real fun starts. :D
Doesn't contain #: /(^[^#]*$)/
Combining works if the intended result of combination is that any of them matching results in the whole regexp matching.
If a string must not contain #, every character must be another character than #:
/^[^#]*$/
This will match any string of any length that does not contain #.
Another possible solution would be to invert the boolean result of /#/.
In my experience with regex you really need to focus on what EXACTLY you are trying to match, rather than what NOT to match.
for example
\d{2}
[1-9][0-9]
The first expression will match any 2 digits....and the second will match 1 digit from 1 to 9 and 1 digit - any digit. So if you type 07 the first expression will validate it, but the second one will not.
See this for advanced reference:
http://www.regular-expressions.info/refadv.html
EDITED:
^((?!my string).)*$ Is the regular expression for does not contain "my string".
1 + 2 + 4 conditions: starts|ends, but not in the middle
/^#[^#]*#?$|^#?[^#]*#$/
is almost the same as:
/^#?[^#]*#?$/
but this one also matches any string without #, for example 'my name is hal9000'
Combining the regex for the fourth option with any of the others doesn't work within one regex. 4 + 1 would mean either the string starts with # or doesn't contain # at all. You're going to need two separate comparisons to do that.

Validating an IP with regex

I need to validate an IP range that is in format 000000000 to 255255255 without any delimiters between the 3 groups of numbers.
Each of the three groups that the final IP consists of should be 000 (yes, 0 padded) to 255.
As this is my 1st stackoverflow entry, please be lenient if I did not follow etiquette correctly.
^([01]\d{2}|2[0-4]\d|25[0-5]){3}$
Which breaks down in the following parts:
000-199
200-249
250-255
If you decide you want 4 octets instead of 3, just change the last {3} to {4}. Also, you should be aware of IPv6 too.
I would personally not use regex for this. I think it's easier to ensure that the string consists of 9 digits, split up the string into 3 groups of 3-digit numbers, and then check that each number is between 0 and 255, inclusive.
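A minimal sketch of that approach in Python (the function name is_valid_packed_ip is made up, and the language is only an example since the question does not name one):

```python
def is_valid_packed_ip(text: str) -> bool:
    """Validate a 9-digit string made of three zero-padded groups 000-255."""
    # Exactly nine ASCII digits, nothing else.
    if len(text) != 9 or not (text.isascii() and text.isdigit()):
        return False
    # Split into three 3-digit groups and check each value.
    groups = [int(text[i:i + 3]) for i in range(0, 9, 3)]
    return all(0 <= group <= 255 for group in groups)

print(is_valid_packed_ip("000000000"))  # True
print(is_valid_packed_ip("255255255"))  # True
print(is_valid_packed_ip("256000000"))  # False
print(is_valid_packed_ip("12345678"))   # False
```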
If you really insist on regex, then you could use something like this:
"([0-1][0-9][0-9]|2[0-4][0-9]|25[0-5]){3}"
The expression comprises an alternation of three terms: the first matches 000-199, the second 200-249, the third 250-255. The {3} requires the match exactly three times.
This is a pretty common question. Here is a nice intro page on regexps, that has this case as an example. It includes the periods, but you can edit those out easily enough.
To match exclusively a valid IP address, use
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}
instead of
([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}
because many regex engines take the first alternative in the OR sequence that matches.
You can try your regex engine with: 10.48.0.200
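Here is what that experiment looks like in Python, whose re module takes the first alternative that works. This sketch assumes the octets are separated by a literal dot, written \. in the patterns; the names OCTET_LONG_FIRST, OCTET_SHORT_FIRST and dotted are invented for the example.

```python
import re

OCTET_LONG_FIRST = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
OCTET_SHORT_FIRST = r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])"

def dotted(octet: str):
    """Build 'octet(.octet){3}'; the literal dot is an assumption here."""
    return re.compile(octet + r"(\." + octet + r"){3}")

address = "10.48.0.200"
print(dotted(OCTET_LONG_FIRST).search(address).group())   # 10.48.0.200
print(dotted(OCTET_SHORT_FIRST).search(address).group())  # 10.48.0.20
```

With the short alternative first, the last octet is satisfied by "20" and the final 0 is silently dropped, unless the pattern is anchored at the end.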
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
I use this regex to search for all IPs in the code of my project.