RegEx for BMHT in a sequence - regex

I am trying to build a regular expression.
Abbreviations are as follows:
B - Billion
M - Million
T - Thousand
H - Hundred
Now, if I say 3M2T, it means 3 million 2 thousand.
But I cannot say 3T2M, nor can I say 3M2222T.
BMTH should appear in that order and follow the standard rules for writing a number.
I got as far as this:
([0-9]+[B]){1}+([0-9]+[M])?+([0-9]+[T])?+([0-9]+[H])? But here B is compulsory.
Please help.

Try this:
^(?:\d+B)?(?:\d{1,3}M)?(?:\d{1,3}T)?(?:\dH)?$
You can test it here regexr.com?2thld
(?:) is a non-capturing group; without ?:, the matched part would be captured and stored in a numbered group
\d matches a digit (in many engines it is equivalent to [0-9])
? after a group or a character makes it optional
+ means one or more
{1,3} means at least one and at most three occurrences
[M] is unnecessary: a character class containing a single character is the same as that character, so M alone is enough
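A quick way to check the pattern is to run it against the examples from the question. The sketch below uses Python's re module (an assumption; the question names no language) and re.fullmatch, so the whole string must match. Because every group is optional, the pattern also accepts the empty string; NON_EMPTY is a hypothetical variant that rejects it with a (?=.) lookahead.

```python
import re

# Pattern from the answer above: each unit is optional, and the order B, M, T, H is fixed.
BMTH = re.compile(r"^(?:\d+B)?(?:\d{1,3}M)?(?:\d{1,3}T)?(?:\dH)?$")

# Hypothetical variant: the (?=.) lookahead requires at least one character,
# so the empty string is no longer accepted.
NON_EMPTY = re.compile(r"^(?=.)(?:\d+B)?(?:\d{1,3}M)?(?:\d{1,3}T)?(?:\dH)?$")

def check(pattern, s):
    """Return True if the entire string s matches the pattern."""
    return pattern.fullmatch(s) is not None

for s in ["3M2T", "1B500M20T5H", "3T2M", "3M2222T", ""]:
    print(repr(s), check(BMTH, s), check(NON_EMPTY, s))
```

Only 3M2T and 1B500M20T5H are accepted by both patterns; the empty string is accepted by BMTH alone.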

([0-9]{1,3}B)?([0-9]{1,3}M)?([0-9]{1,3}T)?([0-9]H)? This takes up to 3 digits for each of B/M/T and 1 digit for H (in that order), with each group optional. Add constraints to suit your needs…
Note that [0-9] is not necessarily equivalent to \d: depending on the engine and its settings (e.g. Unicode support), \d may also match non-ASCII digits.
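In Python 3, for example, \d in a str pattern matches any Unicode decimal digit unless the re.ASCII flag is set, so it is wider than [0-9]. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) to show the difference.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

print(bool(re.fullmatch(r"\d", arabic_three)))                  # True: Unicode-aware \d
print(bool(re.fullmatch(r"[0-9]", arabic_three)))               # False: ASCII range only
print(bool(re.fullmatch(r"\d", arabic_three, flags=re.ASCII)))  # False: \d restricted to [0-9]
```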


How to create a matching regex pattern for "greater than 10-000-000 and lower than 150-000-000"?

I'm trying to make 09-546-943 fail with the regex pattern below:
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
The passing criteria are:
greater than 10-000-000 (or 010-000-000) and
less than 150-000-000
The example I tried, "09-546-943", passes. It should fail.
Any idea how to create a regex that makes this example fail instead of pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern was partially generated with this online number range regex generator: I set the minimum to 10 and the maximum to 150, then merged the branches that match 1-8 and 9 (the tool does a poor job there), added 0? to the two-digit numbers to allow an optional leading 0, and appended -[0-9]{3}-[0-9]{3} for the 10-149 part and -000-000 for 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 followed by a number from 10 to 99, or 1 followed by a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
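A quick Python check of the pattern follows (re.fullmatch, so the whole string must match). It confirms that 09-546-943 now fails; note that 10-000-000 and 150-000-000 are both accepted, so the bounds are inclusive.

```python
import re

# Pattern from the answer above (hyphen-separated form only).
RANGE = re.compile(r"^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$")

samples = {
    "09-546-943": False,   # 9 million: below the range
    "10-000-000": True,
    "010-000-000": True,   # optional leading zero
    "099-999-999": True,
    "149-999-999": True,
    "150-000-000": True,
    "150-000-001": False,  # above the range
}

for value, expected in samples.items():
    result = RANGE.fullmatch(value) is not None
    print(value, result, "ok" if result == expected else "UNEXPECTED")
```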
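This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
It rejects 10-000-000 and 150-000-000. Note, however, that it only covers values starting with 10- and values from 100-000-000 to 149-999-999: 11-000-000 to 99-999-999 never match, and neither does a 10- value ending in 0 (e.g. 10-000-010).
The expression is explained in this demo, if you are interested.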
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately, depending on whether you need them to be included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
also accepts a space, or no separator at all, instead of -.
Test here.
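Neither pattern above is anchored. The Python sketch below uses re.fullmatch to require a whole-string match and shows how the two patterns differ; the last line shows that an unanchored re.search can find a match inside a longer string.

```python
import re

# Patterns from the two answers above; neither has ^ or $, so fullmatch is used
# to require the whole string to match.
HYPHEN_ONLY = re.compile(r"((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}")
FLEXIBLE_SEP = re.compile(r"((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}")

for value in ["09-546-943", "010-000-000", "149-999-999",
              "149 999 999", "149999999", "150-000-000"]:
    print(value,
          HYPHEN_ONLY.fullmatch(value) is not None,
          FLEXIBLE_SEP.fullmatch(value) is not None)

# Without anchors, search finds "34-567-890" inside a longer string.
print(HYPHEN_ONLY.search("1234-567-890").group())
```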

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (i.e. 111111)
I managed to build the following two pieces, but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
import re
lst_phone_numbers = re.findall(r'[0-9+()-]{6,20}', your_text)
And then filtering out the ones that do not comply with rule 5 (all digits the same), using whatever programming language you're most comfortable with.
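A minimal Python sketch of this two-step approach follows. Two assumptions: the character class adds the . that the question's rules allow (the answer's class omits it), and rule 5 is read as "the digits must not all be the same", so a candidate is kept only if it contains at least two distinct digits. find_phone_numbers is a hypothetical helper name.

```python
import re

def find_phone_numbers(text):
    """Collect 6-20 character candidates, then drop those whose digits are all the same."""
    # Step 1: runs of 6-20 allowed characters (digits and the symbols ( ) + . -).
    candidates = re.findall(r"[0-9()+.-]{6,20}", text)
    result = []
    for candidate in candidates:
        digits = re.sub(r"\D", "", candidate)
        # Step 2: require at least two distinct digits. This also drops
        # candidates with no digits at all, such as "((((((".
        if len(set(digits)) > 1:
            result.append(candidate)
    return result

print(find_phone_numbers("call 111111 or +1(555)123-4567 or 020.7946.0000"))
```

Here 111111 is found in step 1 and discarded in step 2, leaving the other two numbers.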
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
(?: creates a non-capturing group
([\d()+-]) creates a capturing group (group 1) that matches a digit, parenthesis, +, or -
(?!\1+$) is a negative lookahead: the match fails if the character just captured in group 1 repeats one or more times until the end of the string
{6,20} requires 6-20 repetitions of the non-capturing group
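A quick Python check of this pattern follows, using re.fullmatch because the pattern itself has no anchors. It reveals a side effect worth knowing about: the lookahead runs at every position, so under a full match the pattern rejects any input whose last two characters are identical (such as 555-1200), not only inputs consisting of one repeated digit.

```python
import re

# Pattern from the answer above, applied with fullmatch so the whole input is checked.
PHONE = re.compile(r"(?:([\d()+-])(?!\1+$)){6,20}")

for value in ["111111", "123456", "+1(555)123-4567", "555-1200"]:
    print(value, PHONE.fullmatch(value) is not None)
```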
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
So, the (?!\2{5}) part controls how many times in a row each character may repeat (as in 22222): a character that is immediately followed by five copies of itself (six identical characters in a row) is rejected. I used 5 as an example; you can change it as needed.
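The sketch below (Python, with re.fullmatch standing in for anchors) shows the run-length behaviour: six identical characters in a row are rejected while five are allowed. Note that this limits runs rather than checking rule 5 directly, so an input such as 1-1-1-1, whose digits are all the same, is still accepted.

```python
import re

# Pattern from the answer above: group 2 holds the current character, and
# (?!\2{5}) fails if that character is followed by five copies of itself.
PHONE = re.compile(r"((?:([0-9()+\-])(?!\2{5})){6,20})")

for value in ["111111", "111112", "555-1200", "0000001", "1-1-1-1"]:
    print(value, PHONE.fullmatch(value) is not None)
```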

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form asking for user information (first name, last name, address, etc.).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
Begins with a number (max 4 digits)
Can have letters as well (max 2 characters)
Can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But because the letters and the second group of digits are both optional, it accepts numbers of up to 7 digits, which is not what I want:
1234 (first group) (no letters in the second group) 567 (third group)
If anyone can tell me how to achieve the expected result, it would be greatly appreciated!
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
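A short Python check of this pattern against the question's examples follows; re.fullmatch is used so the whole input must match. It accepts 1234, 1234a and 1234a12 and rejects the 7-digit 1234567.

```python
import re

# Pattern from the answer above: 1-4 digits, then optionally 1-2 letters,
# optionally followed by 1-3 digits (digits only ever follow letters).
STREET_NO = re.compile(r"^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$")

for value in ["1234", "1234a", "1234a12", "12ab123", "1234567", "12abc", "a12"]:
    print(value, STREET_NO.fullmatch(value) is not None)
```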
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first part matches a number with at most 4 digits
[a-zA-Z]{0,2} The second part matches at most 2 letters
\d{0,3} The third part matches a number with at most 3 digits
You have to keep the last two groups together, so that the last one cannot be present if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or, slightly less optimized but showing the approach more clearly:
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
Thank you so much for your answers! I tried Sebastian's solution:
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the ":" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: prefix makes the (...) a non-capturing group: it groups the pattern without storing the matched substring.
Here is the regex with tests for it.
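The sketch below (Python) compares this final pattern with a variant containing an easy-to-miss range typo, [a-zA-z]: in ASCII, A-z (uppercase A to lowercase z) also spans [ \ ] ^ _ and the backtick, so the variant accepts input like 12_. The variant also lacks the leading-zero check.

```python
import re

# Final pattern from the answer above: no leading zero, then optional letters + digits.
STRICT = re.compile(r"^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$")

# Same idea but with the range typo A-z (uppercase A to lowercase z), which in
# ASCII also covers [ \ ] ^ _ and the backtick.
TYPO = re.compile(r"^\d{1,4}(?:[a-zA-z]{1,2}\d{0,3})?$")

for value in ["1234", "1234a12", "0123", "1234567", "12_"]:
    print(value, STRICT.fullmatch(value) is not None, TYPO.fullmatch(value) is not None)
```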

Need to capture single character, but ignore digit

I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
The problem I'm having is capturing the Y or N character while ignoring the single digit, and then capturing the duration.
This is the expression I have tried:
@"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you are mixing up the concepts of alternation and character classes: [Y|N] would match 3 different characters: Y, | or N. Either use an alternation group (Y|N) or leave out the pipe ([YN]).
Secondly, the double ? after the character class is redundant. Thirdly, at the end you only match consecutive spaces if a digit was found. If there is no digit, the last ? makes the whole subpattern optional, so no spaces are allowed either.
Lastly, \w does not match :.
Try this:
@"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also consider restricting the repeated . at the beginning to a more precise character class (e.g. \w{2}\., but I don't know what the possibilities are there).
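The points above are easy to check interactively. The sketch below uses Python's re module rather than the question's C#; the behaviour shown (a literal | inside a character class, \w not matching a colon) is the same in .NET.

```python
import re

# A pipe inside a character class is a literal character, not alternation.
print(bool(re.fullmatch(r"[Y|N]", "|")))     # True: "|" is one of the class members
print(bool(re.fullmatch(r"[YN]", "|")))      # False
print(bool(re.fullmatch(r"(Y|N)", "N")))     # True: alternation inside a group
# \w covers letters, digits and underscore, but not the colon in "1:48".
print(bool(re.fullmatch(r"\w{4}", "1:48")))  # False
print(bool(re.fullmatch(r"\S{4}", "1:48")))  # True
```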
@"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\., which is more specific: it requires a literal . as the third character.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based on your data than to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
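The sketch below runs the updated pattern over the sample data with Python's re module (the question's code is C#). One assumption: because the pattern requires whitespace plus a middle field, a line with an empty middle field must contain two spaces (E0.733  1:48); with a single space between 733 and 1:48, the line does not match.

```python
import re

# Pattern from the updated answer above.
FLIGHT = re.compile(r"^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$")

# Assumption: an empty middle field leaves two spaces ("E0.733  1:48").
lines = [
    "E0.777 7 3:09",
    "E0.319 N 1:43",
    "E0.735 8 1:45",
    "E0.735 N 1:48",
    "E0.M80 9 3:21",
    "E0.733  1:48",
]

records = []
for line in lines:
    m = FLIGHT.match(line)
    if m:
        equipment, on_time, duration = m.groups()  # on_time is None unless Y/N
        records.append((equipment, on_time, duration))

for equipment, on_time, duration in records:
    print(f"Equipment: {equipment}  On Time: {on_time}  Duration: {duration}")
```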
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)

Regular Expression to match pattern once or more with no partial matches

Better explained with examples:
(1) HHH
(2) HHHH
(3) HHHBBHHH
(4) HHHBH
(5) BB
(6) HHBH
I need to come up with a regexp that matches only 3 H's or a multiple of 3 H's (so 6, 9, 12, ... H's are OK as well); 5 H's, for example, are not OK. If possible, I don't want to use Perl regexps.
So for the input above the regexp would match (1), (3) and (6) only.
I'm just starting with regular expressions here so I don't exactly know how I'm supposed to approach this.
Edit:
Just to clear something up: an H can only be in one group of 3 H's. A group of 3 H's might be HHH or HHBH.
That's why example (2) above is not a match: the last H is not in a group of 3 H's, and you can't take the last 3 H's as a group because the middle 2 H's are already in the first group.
You can use the following regular expression:
^([^H]*H[^H]*H[^H]*H[^H]*)+$
It matches any string that contains a total of 3 H's or any multiple of 3. Any other characters may appear in between.
Explanation:
^ begin of string
( start of group
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]* any string of characters (or none) which is not 'H'
)+ repeat the group one or more times
$ end of string
By repeating the subpattern [^H]*H three times, we make sure that exactly 3 H's are included per group; [^H]* allows any separating characters.
Note: use egrep, or run grep with the -E option (extended regular expressions).
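The pattern can be checked against the six examples from the question. This sketch uses Python's re module, whose syntax for these particular constructs ([^H], *, + and grouping) is the same as in egrep/grep -E.

```python
import re

# Pattern from the answer above: one or more blocks containing exactly three H's.
MULTIPLE_OF_3_H = re.compile(r"^([^H]*H[^H]*H[^H]*H[^H]*)+$")

examples = ["HHH", "HHHH", "HHHBBHHH", "HHHBH", "BB", "HHBH"]
for number, s in enumerate(examples, start=1):
    print(f"({number}) {s}: {MULTIPLE_OF_3_H.fullmatch(s) is not None}")
```

Examples (1), (3) and (6) match, as the question requires.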
Use this to match a multiple of 3 H's:
(H{3})+
Here is a complete regex for your examples:
^(H{3})+B*(H{3})*$
Edit: It looks like you need to count non-consecutive H's. In that case:
^(([^H]*H){3})+[^H]*$
That should match any string with a multiple of 3 H's.
Given the requirement that H's can be arbitrarily interleaved with non-H's, but that the total number of H's must be a non-zero multiple of 3 (so XXX, containing no H's, is not a match), the full regular expression is far from trivial. This is not a beginner's regular expression.
I'm going to assume that the dialect of regular expression treats {} and () as metacharacters for counting and grouping, and includes + for one-or-more. If you're using a regular expression system that has a different requirement (\{\}, for example) then adjust accordingly.
You need the regex to match the whole string, so there are no stray H's allowed. So, it must start with ^ and end with $. You need to allow an arbitrary number of non-H's at front and back. The H's may be separated by an arbitrary number of non-H's. That leads to:
^([^H]*H[^H]*H[^H]*H)+[^H]*$
Ouch; that is hard to read! It says the line must consist of 1 or more (+) groups of an arbitrary number of non-H's followed by an H, an arbitrary number of non-H's, another H, an arbitrary number of non-H's and a third H; all of which can be followed by an arbitrary number of non-H's.
Using the {} for counting:
^(([^H]*H){3})+[^H]*$
That's still hard to read. Note that my description said "arbitrary number of non-H's at front and back", but I only use the [^H]* at the back; that's because the repeating pattern allows an arbitrary number of non-H's at the front anyway so there's no need to repeat that fragment.
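To gain some confidence that the two patterns above really implement "a non-zero multiple of 3 H's", one can compare them with a direct count on every short string. The sketch below (Python) does this exhaustively for strings over an assumed two-letter alphabet {H, B} of length 0 to 9; it is a sanity check, not a proof.

```python
import itertools
import re

# Both patterns from the answer above.
PATTERNS = [
    re.compile(r"^([^H]*H[^H]*H[^H]*H)+[^H]*$"),
    re.compile(r"^(([^H]*H){3})+[^H]*$"),
]

def expected(s):
    """Reference rule: the number of H's is a non-zero multiple of 3."""
    count = s.count("H")
    return count > 0 and count % 3 == 0

# Compare each pattern with the reference rule on every string over {H, B}
# of length 0 to 9.
mismatches = []
for length in range(10):
    for chars in itertools.product("HB", repeat=length):
        s = "".join(chars)
        for pattern in PATTERNS:
            if (pattern.fullmatch(s) is not None) != expected(s):
                mismatches.append((pattern.pattern, s))

print("mismatches:", mismatches)  # an empty list means the patterns agree with the rule
```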