Can this Regex be improved?


I have a regex to match a user-entered id which has the basic format [a-zA-Z]{2}[\d]{8}, but the kicker is that a space can be placed between any of the letters or digits in the id, so my regex looks like this:
[A-Za-z]+[\s]*[A-Za-z]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*
Which is obviously an abomination and should be killed with fire. Can this be improved upon?
All of the following are valid inputs:
a b 1 2 2 3 4 5 5 6
ab12345678
ab 12345678

Your regex does not comply with your specification. Can there be 2 or more letters before the digits? Exactly 8 digits, or 8 digits or more?
Try:
([a-zA-Z]\s*){2}(\d\s*){8}
If there can only be one space between each character:
([a-zA-Z]\s?){2}(\d\s?){8}
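A quick way to check the first suggestion against the sample inputs from the question is sketched below in Python (the question does not name a language, so the choice of Python and the helper name is_valid_id are assumptions). fullmatch is used so the whole input must fit the pattern.

```python
import re

# The suggested pattern: two letters and eight digits, each optionally
# followed by any amount of whitespace.
ID_PATTERN = re.compile(r"([a-zA-Z]\s*){2}(\d\s*){8}")

def is_valid_id(text):
    """Return True when the whole input is 2 letters + 8 digits, spaces allowed."""
    return ID_PATTERN.fullmatch(text) is not None

for sample in ["a b 1 2 2 3 4 5 5 6", "ab12345678", "ab 12345678"]:
    print(sample, is_valid_id(sample))
```

Inputs with a third letter or only seven digits are rejected, because the {2} and {8} counts apply to single characters.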

Don't ever use \d and \s unless you know EXACTLY where you are going...
\d will match 09E6 ০ BENGALI DIGIT ZERO (the ০ is your digit :-) ). For example, read http://msdn.microsoft.com/en-us/library/w1c0s6bb.aspx
\s will match more types of strange spaces (and the tab character) than you can count, and I'm not kidding: http://msdn.microsoft.com/en-us/library/t809ektx.aspx
Paradoxically, by using [a-zA-Z] you are limiting your users quite a lot... No àèéìòù, nor the Turkish ı and İ (the first one is a lowercase i without the dot, the second one is an uppercase I with the dot): http://en.wikipedia.org/wiki/Dotted_and_dotless_I
Perhaps you could use (\p{L}\p{M}*) (with the brackets) instead of [A-Za-z] (all the letters plus their combining marks). Any * or + quantifier has to go AFTER the closing bracket, because the bracketed expression stands for a single letter PLUS its combining marks.
Oh... and you can use one of the other suggestions as a basis for the regex :-)
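The linked pages describe .NET, but the same surprise can be reproduced elsewhere. As a small sketch, assuming Python 3 and its standard re module (not the engine the answer refers to), \d and \s are Unicode-aware on str patterns unless the re.ASCII flag is passed:

```python
import re

BENGALI_ZERO = "\u09e6"   # ০ BENGALI DIGIT ZERO
NO_BREAK_SPACE = "\u00a0"

# Unicode-aware by default for str patterns
unicode_digit = re.fullmatch(r"\d", BENGALI_ZERO) is not None
unicode_space = re.fullmatch(r"\s", NO_BREAK_SPACE) is not None

# Restricted to ASCII with re.ASCII, or by spelling out the class
ascii_digit = re.fullmatch(r"\d", BENGALI_ZERO, re.ASCII) is not None
explicit_digit = re.fullmatch(r"[0-9]", BENGALI_ZERO) is not None

print(unicode_digit, unicode_space, ascii_digit, explicit_digit)
```

Writing [0-9] and a literal space is the portable way to say "ASCII digits and plain spaces only".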

[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*
can be replaced with...
\s*(?:\d+\s*){8}
(Also, you can just write \s, rather than [\s], and \d rather than [\d] - the brackets are redundant if you're only specifying a single backslash character class.)
Edit: Since there seems to be some confusion about what part of the original regex is being replaced, here's the entire expression after replacement:
[A-Za-z]+\s*[A-Za-z]+\s*(?:\d+\s*){8}

(?:[A-Za-z]+\s*){2}(?:\d+\s*){8}
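The + quantifiers carried over from the original regex make this form more permissive than the stated 2-letters-plus-8-digits format. The sketch below (Python chosen as an assumption, names LOOSE and STRICT are illustrative) compares it with a single-character variant:

```python
import re

# Shortened form of the original regex: each group may hold several characters
LOOSE = re.compile(r"(?:[A-Za-z]+\s*){2}(?:\d+\s*){8}")
# One character per repetition, so the counts are exact
STRICT = re.compile(r"(?:[A-Za-z]\s*){2}(?:\d\s*){8}")

for sample in ["ab12345678", "abc123456789012", "ab1234567"]:
    print(sample,
          LOOSE.fullmatch(sample) is not None,
          STRICT.fullmatch(sample) is not None)
```

Both accept ab12345678 and reject an id with only seven digits, but only the loose form accepts three letters followed by twelve digits.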

Related

Regex fixed number of characters but any quantity of spaces

I'm validating some input fields. Here's the regex for a simple example:
^\[0-9]\{6,6}\$
In the example, it requires 6 numbers to be input. However, I want to relax the validation a little and allow spaces where necessary, and remove them later - an example might be a bank sort code.
In the UK, a sort code could be written as 123456, or perhaps 12 34 56.
I know I can amend the expression to include a space within the brackets and relax the numbers in the curly brackets, but what I'd like to do is continue to limit the digits so that 6 must always be input, and allow none or more spaces - of course the spaces could be anywhere.
I'm not sure how to approach this - any ideas, help appreciated.

Try this:
^(\d\s*){6}$
It allows 0 or more whitespace characters after every digit.
If you want to limit whitespace to be inside the digits (without leading or trailing spaces):
^(\d\s*){5}\d$
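The difference between the two patterns shows up with leading and trailing spaces. A minimal Python sketch (the language and the names are assumptions; fullmatch stands in for the ^ and $ anchors):

```python
import re

# Whitespace allowed after every digit, including the last one
AFTER_EACH_DIGIT = re.compile(r"(\d\s*){6}")
# Whitespace allowed only between digits
BETWEEN_DIGITS = re.compile(r"(\d\s*){5}\d")

for sample in ["123456", "12 34 56", "12 34 56 ", " 123456"]:
    print(repr(sample),
          AFTER_EACH_DIGIT.fullmatch(sample) is not None,
          BETWEEN_DIGITS.fullmatch(sample) is not None)
```

Neither pattern accepts leading whitespace, since both start with a digit.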

If you allow spaces at any position alongside 6 digits, then you need
^(\s*[0-9]){6}\s*$
The \s* matches any whitespace, 0 or more repetitions.
Note that a limiting quantifier {6,6} (minimum 6 and maximum 6 repetitions) is equal to {6}.
Also, note that you need to double escape the \s as \\s if you pass the regex pattern as a regular string literal.
And if you plan to only allow regular spaces, not all whitespace, just use
^([ ]*[0-9]){6}[ ]*$
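Since the question wants to validate first and remove the spaces later, the two steps could be combined roughly as follows. This is a sketch in Python, which is an assumption, and clean_sort_code is a hypothetical helper name; the raw string literal avoids the double escaping mentioned above.

```python
import re

# Plain spaces anywhere, exactly six ASCII digits
SORT_CODE = re.compile(r"(?:[ ]*[0-9]){6}[ ]*")

def clean_sort_code(raw):
    """Return the six digits without spaces, or None if the input is invalid."""
    if SORT_CODE.fullmatch(raw) is None:
        return None
    return raw.replace(" ", "")

print(clean_sort_code("12 34 56"))
print(clean_sort_code("1234567"))
```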

I think you want to look at a lookahead expression
This site explains them in more detail
For your example: ^(?=(\s*[0-9]\s*){6}$)[\d\s]*$
The lookahead checks for any amount of space, followed by a digit, followed by any amount of space, 6 times, up to the end of the string.
Other answers I've seen so far only allow a total of 6 characters; this expression will allow any number of spaces but only 6 digits, no more, no less.
Note: ^(\s*[0-9]\s*){6}$ will also work, without the lookahead expression.

RegEx - 1 to 10 Alphanumeric Spaces Okay

New to Regular Expressions. Thanks in advance!
Need to validate field is 1-10 mixed-case alphanumeric and spaces are allowed. First character must be alphanumeric, not space.
Good Examples:
"Larry King"
"L King1"
"1larryking"
"L"
Bad Example:
" LarryKing"
This is what I have and it does work as long as the data is exactly 10 characters. The problem is that it does not allow less than 10 characters.
[0-9a-zA-Z][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ]
I've read and tried many different things but am just not getting it.
Thank you,
Justin

I don't know what environment and engine you are using, so I assume PCRE (typical for PHP).
This small regex does exactly what you want: ^(?i)(?!\s)[a-z\d ]{1,10}$
What's going on?!
the ^ marks the start of the string (delete it if the expression need not match the whole string)
the (?i) tells the engine to be case insensitive, so there's no need to write every letter in both lower and upper case later in the expression
the (?!\s) ensures the following char won't be a white space (\s) (it's a so-called negative lookahead)
the [a-z\d ]{1,10} matches any letter (a-z), any digit (\d) and spaces ( ) in a row with min 1 and max 10 occurrences ({1,10})
the $ at the end marks the end of the string (delete it if the expression need not match the whole string)
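The answer assumes PCRE. For comparison, a rough Python equivalent is sketched here (Python is an assumption, as is the helper name is_valid_name). Recent Python versions reject an inline (?i) that is not at the very start of the pattern, so the flag is passed separately; re.ASCII keeps \d and the case folding limited to ASCII.

```python
import re

# (?!\s) forbids a leading space; {1,10} bounds the total length
NAME = re.compile(r"(?!\s)[a-z\d ]{1,10}", re.IGNORECASE | re.ASCII)

def is_valid_name(text):
    """True for 1-10 alphanumerics/spaces whose first character is not a space."""
    return NAME.fullmatch(text) is not None

for sample in ["Larry King", "L King1", "1larryking", "L", " LarryKing"]:
    print(repr(sample), is_valid_name(sample))
```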

Try this: [0-9a-zA-Z][0-9a-zA-Z ]{0,9}
The {x,y} syntax means between x and y times inclusive. {x,} means at least x times.

You want something like this.
[a-zA-Z0-9][a-zA-Z0-9 ]{0,9}
The first part ensures that the first character is alphanumeric. The second part matches an alphanumeric character or a space, and the {0,9} allows anywhere from 0 to 9 occurrences of that second part. Together this gives you your 1-10 characters.
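Without ^ and $ this pattern can match a substring of a longer or badly formed input, so how it is applied matters. A short Python sketch of the difference (Python and the variable names are assumptions):

```python
import re

PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9 ]{0,9}")

# search() finds the pattern anywhere, so a leading space slips through
leading_space = PATTERN.search(" LarryKing")
# ...and an over-long value is silently truncated to its first 10 characters
too_long = PATTERN.search("LarryKing12345")

# fullmatch() requires the entire input to fit the pattern
strict_leading_space = PATTERN.fullmatch(" LarryKing")
strict_too_long = PATTERN.fullmatch("LarryKing12345")

print(leading_space.group(0), too_long.group(0))
print(strict_leading_space, strict_too_long)
```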

Try this: ^[a-zA-Z0-9][a-z0-9A-Z ]*
An alphanumeric (and therefore not a space) for the first character, and then zero or more alphanumeric characters or spaces. It won't cap at 10 characters, but it will work for any set of 1-10 characters.

The below is probably most semantically correct:
(?=^[0-9a-zA-Z])(?=.*[0-9a-zA-Z]$)^[0-9a-zA-Z ]{1,10}$
It asserts that the first and last characters are alphanumeric and that the entire string is 1 to 10 characters in length (including spaces).

I assume that the space is not allowed at the end too.
^[a-zA-Z0-9](?:[a-zA-Z0-9 ]{0,8}[a-zA-Z0-9])?$
or with posix character classes:
^[[:alnum:]](?:[[:alnum:] ]{0,8}[[:alnum:]])?$

I think the simplest way is to go with \w[\s\w]{0,9}
Note that \w stands for [A-Za-z0-9_], so replace it with [A-Za-z0-9] if you don't want _
Note that \s stands for any whitespace character, so replace it with a literal space if you don't want the others

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e. (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece and verify that I am understanding it, I would really appreciate it. Also, if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.

\D matches any non-digit character.
* means that the previous character is repeated 0 or more times.
*? means that the previous character is repeated 0 or more times, but as few times as possible (a lazy match). It is perhaps a bit difficult at the start, but in your regex the next character is \d, meaning \D*? will match the smallest number of characters needed to reach the next \d character.
( ... ) is a capture group, and is also used to group things. For instance {10} means that the previous character or group is repeated 10 times exactly.
Now, \D*?(\d\D*?){10} will match exactly 10 digits, whether or not they are preceded by non-digit characters, with non-digit characters in between the digits if they are present.
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 to 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it makes it match 3 similar character class, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.
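To illustrate the points about escaping and anchoring, here is a small Python sketch (the language, the sample strings and the names are assumptions). The character-class and backslash spellings behave identically, and requiring the whole string to match rejects surrounding text.

```python
import re

# Two equivalent ways to match literal parentheses
BRACKETED = re.compile(r"[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}")
ESCAPED = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")

sentence = "call (555) 123-4567 today"
number = "(555) 123-4567"

# Unanchored search finds the number inside other text
print(BRACKETED.search(sentence).group(0))
print(ESCAPED.search(sentence).group(0))

# fullmatch plays the role of ^...$ and accepts only the bare number
print(ESCAPED.fullmatch(sentence) is not None)
print(ESCAPED.fullmatch(number) is not None)
```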

\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit followed by any number of non-digit characters, 10 times. This won't actually verify that the number is formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890
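This can be confirmed with a few lines of Python (an assumption, since the question names no language; LOOSE is an illustrative name). Because the pattern has no anchors, it is also satisfied by input containing more than ten digits.

```python
import re

LOOSE = re.compile(r"\D*?(\d\D*?){10}")

bad_text = "this is some bad text 1 and some more 2 and more 34567890"

print(LOOSE.search(bad_text) is not None)          # ten digits buried in prose
print(LOOSE.search("(555) 123-4567") is not None)  # a real phone number
print(LOOSE.search("12345678901") is not None)     # eleven digits still match
print(LOOSE.search("12345") is not None)           # too few digits
```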

\D matches a character that is not a digit
* repeats the previous item 0 or more times
? after the * makes the repetition lazy, so it stops at the first opportunity
\d matches a digit
so your group matches a digit followed by any non-digits, 10 times

How can I recognize a valid barcode using regex?

I have a barcode of the format 123456########. That is, the first 6 digits are always the same followed by 8 digits.
How would I check that a variable matches that format?

You haven't specified a language, but regexp syntax is relatively uniform across implementations, so something like the following should work: 123456\d{8}
\d indicates numeric characters and is typically equivalent to the set [0-9].
{8} indicates repetition of the preceding character set precisely eight times.
Depending on how the input is coming in, you may want to anchor the regexp thusly:
^123456\d{8}$
Where ^ matches the beginning of the line or string and $ matches the end. Alternatively, you may wish to use word boundaries, to ensure that your bar-code strings are properly separated:
\b123456\d{8}\b
Where \b matches the empty string but only at the edges of a word (normally defined as a sequence consisting exclusively of alphanumeric characters plus the underscore, but this can be locale-dependent).
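The two variants serve different jobs: the anchored form validates a single value, while the word-boundary form picks barcodes out of longer text. A Python sketch under that reading (Python, the sample strings and the names are assumptions; [0-9] is used in place of \d to stay with ASCII digits):

```python
import re

ANCHORED = re.compile(r"^123456[0-9]{8}$")
BOUNDED = re.compile(r"\b123456[0-9]{8}\b")

# Validating one variable
print(ANCHORED.search("12345612345678") is not None)
print(ANCHORED.search("1234561234567") is not None)

# Extracting barcodes from surrounding text
text = "scan 12345612345678, then 12345687654321."
print(BOUNDED.findall(text))
```

A barcode glued to other word characters, such as a 15-digit run, is not matched by the bounded form because no word boundary falls after the 14th digit.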

123456\d{8}
123456 # Literals
\d # Match a digit
{8} # 8 times
You can change the {8} to any number of digits depending on how many are after your static ones.
Regexr will let you try out the regex.

123456\d{8}
should do it. This breaks down to:
123456 - the fixed bit; obviously substitute this with whatever your fixed bit is, and remember to escape any regex special characters in here, although with just numbers you should be fine
\d - a digit
{8} - the number of times the previous element must be repeated, 8 in this case.
The braces can take two numbers if you have a minimum and a maximum, so you could write {6,8} if the previous element had to be repeated between 6 and 8 times.

The way you describe it, it's just
^123456[0-9]{8}$
...where you'd replace 123456 with your 6 known digits. I'm using [0-9] instead of \d because I don't know what flavor of regex you're using, and \d allows non-Arabic numerals in some flavors (if that concerns you).

Regex - simple phone number

I know there are a ton of regex examples on how to match certain phone number types. For my example I just want to allow numbers and a few special characters. I am again having trouble achieving this.
Phone numbers that should be allowed could take these forms
5555555555
555-555-5555
(555)5555555
(555)-555-5555
(555)555-5555 and so on
I just want something that will allow [0-9] and also special characters '(' , ')', and '-'
so far my expression looks like this
/^[0-9]*^[()-]*$/
I know this is wrong, but logically I believe this means allow numbers 0-9 and also allow the characters (, ), and -.

^(\(\d{3}\)|\d{3})-?\d{3}-?\d{4}$
\(\d{3}\)|\d{3} three digits with or without () - The simpler regex would be \(?\d{3}\)? but that would allow (555-5555555 and 555)5555555 etc.
An optional - followed by three digits
An optional - followed by four digits
Note that this would still allow 555555-5555 and 555-5555555 - I don't know if these are covered in your and so on part
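The accepted and rejected forms listed above can be checked directly. A minimal Python sketch (the language and the name PHONE are assumptions; the pattern is the one from this answer):

```python
import re

PHONE = re.compile(r"^(\(\d{3}\)|\d{3})-?\d{3}-?\d{4}$")

samples = [
    "5555555555", "555-555-5555", "(555)5555555",
    "(555)-555-5555", "(555)555-5555",   # forms from the question
    "(555-5555555", "555)5555555",       # unbalanced parentheses
    "555555-5555",                       # still allowed, as noted
]
for sample in samples:
    print(sample, PHONE.match(sample) is not None)
```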

This matches what you want: digits, (, ) and -
/^[0-9()-]+$/

^[0-9-+\s]+$
06754654
+54654654
+546 546 5654 43534 +
+09945 345 3453 45

Why do you have a stray ^ in there? I think you meant [()-]. As written, the regex requires two beginning-of-string positions, which will never match real input.
Also, \d is a nice shortcut for [0-9]. In many flavors they are exactly the same, though in some \d also matches non-ASCII digits.
Also, this will only match a bunch of numbers, then a bunch of ( or ) or -. Something like: 1294819024()()()()()-----()- would match. I think you want the whole thing to be able to repeat, something like: ^(\d*[()-]*)*$. Now, you can match repeating sequences of this.
Now, it is important to notice that nested * are typically inefficient. Since we just want to match any digit or the punctuation you listed, we can write: [\d()-]*

For digits you can use \d. For more than one digit, you can use \d{n}, where n is the number of digits you want to match. Some special characters must be escaped, for example \( matches (. For example: \(\d{3}\)\-\d{3}\-\d{4} matches (555)-555-5555.

The second caret (afaik) is going to break anything you do since it means "start of string".
What you appear to be asking for therefore is:
start of string, followed by...
any number of numeric characters, followed by...
start of string, followed by...
any number of '(',')', or '-' characters, followed by...
end of string
This won't work even if that second caret did nothing, because you're not accounting for anything after the first run of '(',')', or '-' characters; as written it will only validate an empty string or a string made up solely of those characters.
You want /^[0-9()-]+$/ for a very crude pattern which will "work".
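The effect of the stray caret is easy to see side by side with the crude fix. A small Python sketch (Python and the names BROKEN and CRUDE are assumptions; the patterns come from the question and this answer):

```python
import re

BROKEN = re.compile(r"^[0-9]*^[()-]*$")   # pattern from the question
CRUDE = re.compile(r"^[0-9()-]+$")        # single character class

for sample in ["555-555-5555", "(555)555-5555", "()-", "", "555 555"]:
    print(repr(sample),
          BROKEN.search(sample) is not None,
          CRUDE.search(sample) is not None)
```

The broken pattern can only succeed when the digit part is empty, so it accepts punctuation-only and empty input and rejects every real phone number.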

If you are doing US only number the best solution is to strip out all the non-digit characters and then just test to see if the length == 10.
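A sketch of that approach in Python (the language and the helper name is_us_phone are assumptions): everything that is not an ASCII digit is removed, then the remaining length is compared with 10.

```python
import re

def is_us_phone(raw):
    """Strip every non-digit character and require exactly 10 digits."""
    digits = re.sub(r"[^0-9]", "", raw)
    return len(digits) == 10

for sample in ["(555) 555-5555", "555-555-5555", "5555555555", "555-5555"]:
    print(sample, is_us_phone(sample))
```

Note that this deliberately ignores formatting, so a number with a leading country code (11 digits) is rejected unless it is handled separately.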
