I'm trying to create a regex filter to satisfy:
1) The 1st character should be a lower-case letter or a number
2) Each of the remaining characters should be an ASCII character between index 32 and 126
3) However, none of the characters should be upper case letters or _
My current regex is:
^[a-z0-9][ -~]*$
This solves 1) and 2) above - but I struggle to include 3) above in the right way. Any help is appreciated.
A simple way is to add a negative lookahead for what you don't want.
^[a-z0-9](?!.*[A-Z_])[ -~]*$
But it's also possible to just split up the ranges, based on the ASCII table:
^[a-z0-9][ -@\[-^`-~]*$
It's just a bit harder to understand at first glance.
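As a quick sanity check, here is a minimal Python sketch that compares the two approaches. It assumes the split ranges are space to @ (32-64), [ to ^ (91-94) and ` to ~ (96-126), i.e. printable ASCII minus A-Z and _. The names LOOKAHEAD, SPLIT and accepted are only illustrative.

```python
import re

# Variant 1: a negative lookahead rejects any string containing A-Z or _.
LOOKAHEAD = re.compile(r"^[a-z0-9](?!.*[A-Z_])[ -~]*$")
# Variant 2: printable ASCII split into three ranges that skip A-Z and _:
# space..@ (32-64), [..^ (91-94), `..~ (96-126).
SPLIT = re.compile(r"^[a-z0-9][ -@\[-^`-~]*$")


def accepted(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.match(text) is not None


samples = ["abc 123!", "9 lives", "a{b}~", "aBc", "a_b", "Abc", ""]
for s in samples:
    # Print the verdict of both variants side by side.
    print(repr(s), accepted(LOOKAHEAD, s), accepted(SPLIT, s))
```

Both variants should give the same verdict for every sample; the lookahead version just states the exclusion more explicitly.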
I work a lot with isolating sizes from strings, but I have run into some issues.
Current:
https://regex101.com/r/zbEtOU/1
Current regex
^([a-z]+\d*(?:\s*-\s*[a-z\d]+[/-][a-z\d]+)?|\d+)
Examples:
30/32
Fixed 8 (32-36)
XS/S
m/l
1-2Y
s/m
0-3M
32
Desired result:
I want to isolate the first value, but when I encounter parentheses I want to match the values inside them instead.
So actual desired outcome from the examples:
30/32 = 30
Fixed 8 (32-36) = 32
XS/S = XS
m/l = m
1-2Y = 1-2Y (I'm guessing there is no way to output "1Y" in this case? Otherwise it would overlap with 1-2M, causing confusion, as 1 != 1 in this case. When this happens I would prefer to get the original string.) Ideal case = 1Y
s/m = s
1-3M = 1-3M (I'm guessing there is no way to output "1M" in this case? Otherwise it would overlap with 1-2Y, causing confusion, as 1 != 1 in this case. When this happens I would prefer to get the original string.) Ideal case = 1M
32 = 32
I'm really out of my depth on solving this, as there are a lot of different conditions here!
All regexes are run case-insensitively, so there is no need to worry about capital letters.
Has anyone got a nice and easy way to solve my issue?
Everything needs to be captured in Group 1, otherwise my system can't isolate it.
This runs in Python 3.7.
You can use
(?:^|.*\()(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3})\b
See the regex demo.
Details:
(?:^|.*\() - start of string, or any zero or more chars other than line break chars, as many as possible, and then a ( char
(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3}) - Group 1:
    \d+(?:-\d+[A-Za-z]{1,3})? - one or more digits, followed by an optional occurrence of a -, one or more digits, and then one to three ASCII letters
    | - or
    [A-Za-z]{1,3} - one, two or three ASCII letters
\b - a word boundary.
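A minimal Python sketch of how this pattern could be applied to the examples from the question, reading the result from Group 1. The helper name first_size is hypothetical.

```python
import re

# The pattern from the answer; the size ends up in Group 1.
SIZE = re.compile(r"(?:^|.*\()(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3})\b")


def first_size(text):
    """Return the content of Group 1, or None if nothing matches."""
    m = SIZE.search(text)
    return m.group(1) if m else None


examples = ["30/32", "Fixed 8 (32-36)", "XS/S", "m/l", "1-2Y", "s/m", "0-3M", "32"]
for s in examples:
    print(s, "=", first_size(s))
```

Ranges such as 1-2Y are returned unchanged, which corresponds to the fallback the question asks for rather than the "ideal case".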
I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
Begins with a number (max 4 digits)
Can have letters after it (max 2 chars)
Can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that, as the letters and the second part of numbers are both optional, it allows numbers with up to 7 digits, which is not optimal:
1234 (first group) (no letters in the second group) 567 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
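To make the difference concrete, here is a small Python sketch comparing the expression from the question with the alternation version. The names ORIGINAL and FIXED are only illustrative.

```python
import re

# The attempt from the question: both trailing groups are independently
# optional, so digits may follow digits directly.
ORIGINAL = re.compile(r"^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$")
# The alternation version: trailing digits are only allowed after letters.
FIXED = re.compile(r"^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$")

for s in ["1234", "1234a", "1234a12", "1234567", "a12"]:
    # Show whether each pattern accepts the candidate street number.
    print(s, bool(ORIGINAL.match(s)), bool(FIXED.match(s)))
```

A plain run of digits longer than four, such as 1234567, is accepted by the first pattern but rejected by the second.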
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} - the first group matches a number with at most 4 digits
[a-zA-Z]{0,2} - the second group matches at most 2 letters
\d{0,3} - the third group matches a number with at most 3 digits
You have to keep the last two groups together, not allowing the last one to be present, if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
Thank you so much for your answers! I tried Sebastian's solution:
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the "?:" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: at the start of a group makes it non-capturing: the parentheses still group the pattern, but the substring they match is not stored as a separate group.
Here is the regex with tests for it.
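To illustrate what ?: changes, here is a minimal Python sketch that matches the same input with a capturing and a non-capturing variant. The capturing variant is written only for comparison and is not taken from the answers above.

```python
import re

# Capturing variant (for comparison only): each (...) stores its substring.
CAPTURING = re.compile(r"^([1-9]\d{0,3})([a-zA-Z]{1,2}\d{0,3})?$")
# Non-capturing variant: (?:...) groups without storing anything.
NON_CAPTURING = re.compile(r"^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$")

m1 = CAPTURING.match("1234a12")
m2 = NON_CAPTURING.match("1234a12")

print(m1.groups())  # the captured substrings
print(m2.groups())  # empty tuple: nothing was captured
print(m2.group(0))  # the whole match is still available
```

For pure validation only the yes/no result matters, so the non-capturing form is enough.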
I have a string of 8 separated hexadecimal numbers, such as:
3E%12%3%1F%3E%6%1%19
And I need to check whether the number 12 is located within the first 4 numbers.
I'm guessing this shouldn't be all that complex, but my searches turned up empty. Regular expressions are always a trouble for me, but I don't have access to anything else in this scenario. Any help would be appreciated.
^([^%]+%){0,3}12%
See it in action
The idea is:
^ - from the start
[^%]+% - match multiple non % characters, followed by a % character
{0,3} - between 0 and 3 of those
12% - 12% after that
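The idea above can be tried out with a short Python sketch. The function name has_12_in_first_four and the extra test strings are made up for illustration.

```python
import re

# Up to three "number%" fields, then the field 12 followed by %.
FIRST_FOUR = re.compile(r"^([^%]+%){0,3}12%")


def has_12_in_first_four(s):
    """Return True if one of the first four fields is exactly 12."""
    return FIRST_FOUR.search(s) is not None


tests = [
    "3E%12%3%1F%3E%6%1%19",   # 12 is the 2nd field
    "3E%1%3%12%3E%6%1%19",    # 12 is the 4th field
    "3E%1%3%1F%12%6%1%19",    # 12 is the 5th field
    "3E%112%3%1F%3E%6%1%19",  # 112 is not 12
]
for t in tests:
    print(t, has_12_in_first_four(t))
```

Because the pattern requires a % right after 12 and a complete field before it, values such as 112 or 123 are not mistaken for 12.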
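Here you go. Note that this relies on a variable-length lookbehind, which only some engines (for example .NET) support:
^([^%]*%){4}(?<=.*12.*)
This will also match when 12 is only part of a number, as in both of the following, if that is what is intended:
1%312%..
1%123%..
So check whether %123% should be matched or not.
If the number 12 should stand on its own, then use
^([^%]*%){4}(?<=.*\b12\b.*)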
I need a RegEx pattern that will return the first N words using a custom word boundary that is the normal RegEx white space (\s) plus punctuation like .,;:!?-*_
EDIT #1: Thanks for all your comments.
To be clear:
I'd like to set the characters that would be the word delimiters
Let's call this the "Delimiter Set", or strDelimiters
strDelimiters = ".,;:!?-*_"
nNumWordsToFind = 5
A word is defined as any contiguous text that does NOT contain any character in strDelimiters
The RegEx word boundary is any contiguous text that contains one or more of the characters in strDelimiters
I'd like to build the RegEx pattern to get/return the first nNumWordsToFind using the strDelimiters.
EDIT #2: Sat, Aug 8, 2015 at 12:49 AM US CT
@maraca definitely answered my question as originally stated.
But what I actually need is to return the number of words ≤ nNumWordsToFind.
So if the source text has only 3 words, but my RegEx asks for 4 words, I need it to return the 3 words. The answer provided by maraca fails if nNumWordsToFind > number of actual words in the source text.
For example:
one,two;three-four_five.six:seven eight nine! ten
It would see this as 10 words.
If I want the first 5 words, it would return:
one,two;three-four_five.
I have this pattern using the normal \s whitespace, which works, but NOT exactly what I need:
([\w]+\s+){<NumWordsOut>}
where <NumWordsOut> is the number of words to return.
I have also found this word boundary pattern, but I don't know how to use it:
a "real word boundary" that detects the edge between an ASCII letter
and a non-letter.
(?i)(?<=^|[^a-z])(?=[a-z])|(?<=[a-z])(?=$|[^a-z])
However, I would want my words to allow numbers as well.
In any case, I have not been able to work out how to use the above custom word boundary pattern to return the first N words of my text.
BTW, I will be using this in a Keyboard Maestro macro.
Can anyone help?
TIA.
All you have to do is adapt your pattern ([\w]+\s+){<NumWordsOut>} as follows, which also covers some special cases:
^[\s.,;:!?*_-]*([^\s.,;:!?*_-]+([\s.,;:!?*_-]+|$)){<NumWordsOut>}
The parts of the pattern, in order:
1. [\s.,;:!?*_-]* - match any number of delimiters before the first word
2. [^\s.,;:!?*_-]+ - match a word (= at least one non-delimiter)
3. [\s.,;:!?*_-]+ - the word has to be followed by at least one delimiter
4. |$ - or it can be at the end of the string (in case no delimiter follows at the end)
5. {<NumWordsOut>} - repeat 2. to 4. <NumWordsOut> times
Note that I moved the - to the end of the character class: it has to be at the start or end, otherwise it needs to be escaped as \-.
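For the follow-up requirement (return at most N words, even when the text has fewer), one possible adaptation is to replace the fixed repeat count with the range {1,N}. The Python sketch below is an assumption on my side, not part of the pattern above; the function name first_words and its parameters are hypothetical.

```python
import re


def first_words(text, n, delimiters=".,;:!?*_-"):
    """Return the leading part of text that holds at most n words."""
    # Delimiter class: whitespace plus the escaped custom delimiter set.
    cls = r"\s" + re.escape(delimiters)
    # Same structure as the pattern above, but with {1,n} instead of {n},
    # so a text with fewer than n words still matches (an assumption).
    pattern = rf"^[{cls}]*(?:[^{cls}]+(?:[{cls}]+|$)){{1,{n}}}"
    m = re.search(pattern, text)
    return m.group(0) if m else ""


text = "one,two;three-four_five.six:seven eight nine! ten"
print(first_words(text, 5))
print(first_words("one two three", 4))
```

The match includes the delimiters that follow the last word, so trim them afterwards if only the words themselves are wanted.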
Thanks to @maraca for providing the complete answer to my question.
I just wanted to post the Keyboard Maestro macro that I have built using @maraca's RegEx pattern for anyone interested in the complete solution.
See KM Forum Macro: Get a Max of N Words in String Using RegEx
I need to validate a string with 2 groups, separated by one space, using the following rules:
Each group needs to be at least 2 characters long and at most 15
Both groups together can't be more than 20 chars long (not counting the space)
Groups can only contain letters (that's simple, it's [a-zA-Z])
Following these rules, here are some examples
Firstname Lastname (Valid)
Somename T (Invalid, 2nd one is <2)
Somethingsomettt Here (Invalid, first one is > 15)
Somethingsome Somethingsome (Invalid, total > 20)
It'd be simple [a-zA-Z]{2,15} [a-zA-Z]{2,15} if it wasn't for that 2+2<=total<=20 condition.
Is it even possible to limit it this way? If it is - how?
UPDATE
Just for the sake of it, the resulting regex was supposed to be ^(?=[a-zA-Z ]{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$; @vks was the closest one to it. Nevertheless, thanks to @popovitsj and @Avinash Raj too.
^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
Try this. See demo.
http://regex101.com/r/nA6hN9/30
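A minimal Python sketch checking this lookahead approach against the examples from the question. The names NAME and is_valid are only illustrative.

```python
import re

# The lookahead limits the total length to 5..21 characters (two groups
# of at most 20 letters in total, plus the space); the rest of the
# pattern limits each group to 2..15 letters.
NAME = re.compile(r"^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$")


def is_valid(name):
    """Return True if name satisfies both the group and total limits."""
    return NAME.match(name) is not None


examples = [
    "Firstname Lastname",
    "Somename T",
    "Somethingsomettt Here",
    "Somethingsome Somethingsome",
]
for name in examples:
    print(name, is_valid(name))
```

The lookahead only inspects the length and consumes nothing, so the two group checks still run from the start of the string.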
This can be done with lookahead. Something like this:
^(?=.{1,20}$)[a-zA-Z]{2,14} [a-zA-Z]{2,14}$
You could try the regex below, which uses a negative lookahead:
(?!^.{22,})^[a-zA-Z]{2,15} [a-zA-Z]{2,15}$