newbie with regex, parsing one and only one character - regex

I have to validate strings like:
10y9m12od or 9m12od or 12d or 10y9m or 9m
those are correct.
These are not correct:
10d2m5y, 2m5y10d...
As you can see, order of elements is important but elements are not mandatory...
I have this regex, which I thought was fine, but it does not work:
([\d][yY]{1})?([\d][mM]{1})?([\d][o]{0,1}(d|D){1})$
Can anybody help me?

^(\d+[yY])?(\d{1,2}[mM])?(\d{1,2}o?[dD])?$
You need to allow more than one digit before each letter. Years can be any number of digits, months and days can be 1 or 2 digits.
There's no need to wrap \d and o in [].
You need a ^ anchor at the beginning.
There's no need for {1} to match a single repetition, that's the default for all patterns.
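A quick way to check the pattern is to run it over the examples from the question. The sketch below uses Python's re module. The helper name is_valid_duration is hypothetical, and the explicit emptiness check is an added assumption: every group in the pattern is optional, so the pattern on its own also accepts an empty string.

```python
import re

# Pattern from the answer above: optional years, months, days, in that order.
DURATION = re.compile(r'^(\d+[yY])?(\d{1,2}[mM])?(\d{1,2}o?[dD])?$')

def is_valid_duration(text):
    # Assumption: an empty string should be rejected, although the
    # pattern alone would accept it because all groups are optional.
    return bool(text) and DURATION.match(text) is not None

for sample in ['10y9m12od', '9m12od', '12d', '10y9m', '9m', '10d2m5y', '2m5y10d']:
    print(sample, is_valid_duration(sample))
```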

Related

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build two of the following pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
lst_phone_numbers = re.findall('[0-9+()-]{6,20}',your_text)
And then filtering out the ones that do not comply with statement 5, using whatever programming language you're most comfortable with.
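The two-step idea could look roughly like this in Python. This is only a sketch: the function name plausible_phone_numbers is hypothetical, the character class is the one from the findall call above (so it does not include the dot that the question allows), and "all the numbers are the same" is interpreted as "the candidate contains only one distinct digit".

```python
import re

def plausible_phone_numbers(text):
    # Step 1: collect every run of 6 to 20 allowed characters.
    candidates = re.findall(r'[0-9+()-]{6,20}', text)
    kept = []
    for candidate in candidates:
        digits = re.sub(r'\D', '', candidate)  # keep the digits only
        # Step 2: require at least two distinct digits, which also
        # rejects candidates that contain no digit at all.
        if len(set(digits)) > 1:
            kept.append(candidate)
    return kept

print(plausible_phone_numbers('call 111111 or +32(0)2-555-1234 now'))
```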
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
1. (?: creates a non-capturing group
2. ([\d()+-]) creates a group to match a digit, parenthesis, +, or -
3. (?!\1+$) will not return a match if it matches the value found from #2 one or more times until the end of the string
4. {6,20} requires 6-20 matches from the non-capturing group in #1
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
The (?!\2{5}) part is a negative lookahead: it rejects a character that is followed by five more copies of itself (for example 222222). I used 5 as an example; change it to whatever limit you want.

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
Begins with a number (max 4 digits)
Can have letters as well (max 2 characters)
Can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that as letters and second part of numbers are optional, it allows to enter numbers with up to 8 digits, which is not optimal.
1234 (first group)(no letters in the second group) 5678 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first group matches a number with at most 4 digits
[a-zA-Z]{0,2} The second group matches at most 2 letters
\d{0,3} The third group matches a number with at most 3 digits
You have to keep the last two groups together, not allowing the last one to be present, if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
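For a quick check, here is a small Python sketch of this approach with the leading-zero restriction applied. The helper name is_valid_street_number is hypothetical. Note that the letter class is written [a-zA-Z]; a class ending in A-z would also accept the ASCII punctuation that sits between Z and a, such as _ and ^.

```python
import re

# 1-4 digits without a leading zero, then optionally 1-2 letters
# which may in turn be followed by up to 3 digits.
STREET_NUMBER = re.compile(r'^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$')

def is_valid_street_number(text):
    return STREET_NUMBER.match(text) is not None

for sample in ['1234', '1234a', '1234a12', '12345678', '0123', '1234abc', '12_3']:
    print(sample, is_valid_street_number(sample))
```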
Thank you so much for your answers ! I tried Sebastian's solution :
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the ":" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: at the start of a group makes it non-capturing: the () still groups the pattern, but the matched substring is not stored as a capture.
Here is the regex with tests for it.

Regular expression for 7 digit numbers separated by commas

I need a regular expression to validate a concatenated string that consists of 7 digit numbers separated by commas.
Furthermore, I must ensure that:
The string is not empty.
The string doesn't begin or end with a comma.
The numbers do not start with 0.
Example: 1234567,2345678,3456789
My solution so far: ^\d+(,\d+)*?$
The problems I still need to resolve:
Validate that the numbers are exactly 7 digits.
Validate that the numbers do not start with 0.
Thank you.
Something like ^[1-9]\d{6}(,[1-9]\d{6})+$ should work. The [1-9] ensures the number doesn't begin with 0, and \d{6} ensures that there are 6 digits to follow.
Based on Gavin's answer, here is what worked for me: ^[1-9]\d{6}(,[1-9]\d{6})*$
The minor difference is the use of the * instead of + at the end of the regular expression. There are some cases where I must validate only one 7 digits number...
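To see the effect of * versus + on a single number, the final pattern can be tried in Python. This is a minimal sketch; the helper name is_valid_id_list is hypothetical.

```python
import re

# One 7-digit number without a leading zero, then zero or more
# ",number" repetitions of the same shape.
SEVEN_DIGIT_LIST = re.compile(r'^[1-9]\d{6}(,[1-9]\d{6})*$')

def is_valid_id_list(text):
    return SEVEN_DIGIT_LIST.match(text) is not None

for sample in ['1234567,2345678,3456789', '1234567', '', ',1234567', '1234567,', '0234567']:
    print(repr(sample), is_valid_id_list(sample))
```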
Thank you for the help everyone!

Regex to add leading zero in date record

Question - what is the shortest form of regex to add a leading zero into single digit in date record?
So I want to convert 8/8/2014 8:04:34 to 08/08/2014 8:04:34 - add a leading zero when only one digit is present.
The record can have two single digit entry, one single digit entry or no single digit entry. Some records can be in forms like 25/06/2014 19:50:18 or 9/06/2014 8:27:35 - in other words, some of them could be already normalized and regex needs to fix only single digit entry.
Not a regex user by any means. Your help is appreciated.
How about:
Ctrl+H
Find what: \b(\d)(?=/)
Replace with: 0$1
Replace all
This will change 8/8/2014 8:04:34 into 08/08/2014 8:04:34
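The same find and replace can be tried outside the editor. Below is a Python sketch of it; the function name pad_date_fields is hypothetical, and the editor's $1 replacement becomes \1 in Python's re.sub.

```python
import re

def pad_date_fields(record):
    # A single digit that starts at a word boundary and is directly
    # followed by "/" gets a leading zero; the time part is untouched
    # because its digits are followed by ":" instead.
    return re.sub(r'\b(\d)(?=/)', r'0\1', record)

for record in ['8/8/2014 8:04:34', '25/06/2014 19:50:18', '9/06/2014 8:27:35']:
    print(pad_date_fields(record))
```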
Use the following regex to find:
(\d)(\d)?/(\d)(\d)?/(.*)
Then use the following to replace:
(?{2}\1\2:0\1)/(?{4}\3\4:0\3)/\5
What we are using here are called conditionals in regex terms. Refer to this answer for an explanation.
Make sure you have unselected the checkbox which says ". matches newline".
First of all, let's do some test-driven development and write the test cases. We can ignore the time and concentrate on the date alone. Also, the year is not important. We have to find all the possible cases for the day and the month. For each of them, we can have:
A single digit
Two digits, the first of which is already a 0
Two digits, the first of which is not a 0
Two digits, the second of which is a 0 (probably not needed, but just in case).
The case where we have to do something is only the first one, and the last 3 could be joined into a single one, but I prefer to keep them separated. We need to test 16 combinations:
8/8/2014
8/08/2014
8/12/2014
8/10/2014
08/8/2014
08/08/2014
08/12/2014
08/10/2014
12/8/2014
12/08/2014
12/12/2014
12/10/2014
10/8/2014
10/08/2014
10/12/2014
10/10/2014
Of all of these, only 1, 2, 3, 4, 5, 9, 13 must be changed. I don't know how to do it with a single regex, but with 2 regexes it's easy:
First regex, for the day:
(?<!\d)(\d/\d{1,2}/\d+)
replace with:
0\1
It matches a date where the day has only one digit, followed by a month with either 1 or 2 digits, followed by a year with any number of digits, and it simply adds a 0 at the beginning.
Second regex, for the month:
(\d{2}/)(\d/\d+)
replace with:
\10\2
This one assumes that the first one has already been run, and thus the day has 2 digits. It finds dates where the month has a single digit, and adds a 0 before it. Please note that \10\2 means: the first group that matched, followed by a 0, followed by the second group. It doesn't mean: the tenth group, followed by the second. So the digits 1 and 0 are logically separated.
Run the first one, then the second one, and it gives the correct result:
08/08/2014
08/08/2014
08/12/2014
08/10/2014
08/08/2014
08/08/2014
08/12/2014
08/10/2014
12/08/2014
12/08/2014
12/12/2014
12/10/2014
10/08/2014
10/08/2014
10/12/2014
10/10/2014
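The two passes can be replayed over the 16 test cases in Python as a sanity check. This is a sketch with a hypothetical function name, pad_day_then_month. One difference from the replacement strings above: Python's re module reads \10 as a reference to group 10, so the second replacement is written \g<1>0\2 to mean group 1, a literal 0, then group 2.

```python
import re

def pad_day_then_month(date):
    # Pass 1: a one-digit day (not preceded by another digit) gets a 0.
    date = re.sub(r'(?<!\d)(\d/\d{1,2}/\d+)', r'0\1', date)
    # Pass 2: the day now has two digits; pad a one-digit month.
    date = re.sub(r'(\d{2}/)(\d/\d+)', r'\g<1>0\2', date)
    return date

cases = [f'{day}/{month}/2014'
         for day in ('8', '08', '12', '10')
         for month in ('8', '08', '12', '10')]

for case in cases:
    print(case, '->', pad_day_then_month(case))
```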
Thanks to this recent answer I can finally give you a (hopefully) correct answer ;)
Replace
\b(?:(\d\d)|(\d))/(?:(\d\d)|(\d))/(\d\d)
with
(?{1}\1:0$2)/(?{3}\3:0\4)/\5
It uses Notepad++ conditionals (which I didn't know of until I stumbled over the mentioned question) to handle the cases where only one or the other is a single digit.
The regex matches a word boundary \b followed by two digits, captured in group 1, or one digit, captured in group 2, followed by a /. Then the same logic is repeated for day, which is captured in group 3 (2 digit) or 4 (1 digit). Then finally it checks that a year follows (at least two digits).
The conditional replace is explained in the linked answer. Simply put, (?{1} tests whether group 1 took part in the match: if it did, the replacement uses the expression before the :, otherwise the one after it.
Hope this helps.
Regards
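Conditional replacement strings like these are an editor feature; Python's re module, for instance, does not support them. A replacement function gives a similar single-pass result there. The sketch below is an alternative technique and not the Notepad++ approach itself; the name normalize_date is hypothetical.

```python
import re

def normalize_date(record):
    # Capture one or two digits for each of the first two fields and
    # require at least two digits of year, as in the pattern above.
    def pad(match):
        first, second, year = match.groups()
        return f'{first.zfill(2)}/{second.zfill(2)}/{year}'
    return re.sub(r'\b(\d{1,2})/(\d{1,2})/(\d\d)', pad, record)

for record in ['8/8/2014 8:04:34', '25/06/2014 19:50:18', '9/06/2014 8:27:35']:
    print(normalize_date(record))
```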
If you had a date like (ISO format)
2017-9-5
This
replace(/(\D)(\d)(?!\d)/g, '$10$2')
will turn it into
2017-09-05
and will preserve two digits in dates like
2017-11-11 or 2017-9-05
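For comparison, a Python version of the same replacement might look like this. It is a sketch with a hypothetical function name, pad_iso_date; the JavaScript replacement '$10$2' is written \g<1>0\2 because Python would otherwise read \10 as group 10.

```python
import re

def pad_iso_date(date):
    # A non-digit, then a single digit that is not followed by another
    # digit: re-emit the non-digit, add a 0, then the digit.
    return re.sub(r'(\D)(\d)(?!\d)', r'\g<1>0\2', date)

for date in ['2017-9-5', '2017-11-11', '2017-9-05']:
    print(pad_iso_date(date))
```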
A general approach is to search for (in this case 5 digit numbers):
(\d)??(\d)??(\d)??(\d)??(\d)
Replace with
(?1\1:0)(?2\2:0)(?3\3:0)(?4\4:0)\5
You can use /^\d\/|(?<=\/)\d\/\d/g to select text, then add 0 before selected text, it should work for all your conditions.

*NIX REGEXP number series

Am playing around with regexp's but this is my headache. I have a dynamic number which needs a suffix. The suffix is always 0 to 9, 99 or 999.
Example:
I have the number 461200 and now I want to create an regexp that will match 461200 to 461209. What I've learned it should be ^46120[0-9]$? Is this correct or somewhere to the left of hell?
Ok, let us assume it is correct and I now want to match 461200 - 461299? This is where I get lost.
^4612[0-9]{2}?
It cannot be. I am yet to figure this out.
Any help appreciated.
For 1 digit at the end you need:
^4612[0-9]$
2 digits at the end:
^4612[0-9]{2}$
3 digits at the end:
^4612[0-9]{3}$
The number in braces {} is the number of times the preceding character or set has to be repeated.
Ok, let us assume it is correct and I now want to match 461200 - 461299?
You can either repeat the desired character class by saying [0-9][0-9] or use quantifiers [0-9]{2}.
It can be either:
^4612[0-9][0-9]$
or
^4612[0-9]{2}$
Both would work.
Maybe try this regex:
^4612\d{2}$