Regex for minimum number of characters - regex

I created this regular expression to validate names:
^[a-zA-Z0-9\s\-\,]+.\*?$
Is there a way to add a minimum number of characters?
I know we can use {x,}, but I cannot make it work.

{x,} should be used instead of + here...
^[a-zA-Z0-9\s,-]{5,}
But this would mean "at least 5 characters at the beginning must come from the character class, and then anything can follow".
If you write it like this (almost your original - just with {5,} instead of +):
^[a-zA-Z0-9\s\-\,]{5,}.\*?$
This means "at least 5 characters in the beginning match those from the character class, and any one character, and then optionally an asterisk, and that should be the end of it".
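To see what that description means in practice, here is a minimal sketch using Python's re module (an assumption, since the question does not name a regex engine). The sample strings are made up for illustration. Note that because of the trailing . the pattern effectively needs six characters, not five.

```python
import re

# Pattern from the answer above: {5,} in place of +.
PATTERN = re.compile(r"^[a-zA-Z0-9\s\-\,]{5,}.\*?$")

def matches(text):
    """Return True if the string satisfies PATTERN."""
    return PATTERN.search(text) is not None

# Hypothetical sample inputs, chosen to exercise each part of the pattern.
for sample in ["abcde", "abcdef", "abcde!*", "abcde!!"]:
    print(repr(sample), matches(sample))
```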

Use a lookahead at the beginning of the regex to make sure the total number of characters is at least your minimum. For example, if your minimum is 8 characters:
^(?=.{8,})[a-zA-Z0-9\s\-,]+.\*?$
Also, you don't need to escape the comma.
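As a rough illustration of the lookahead approach, the sketch below builds the same pattern for a configurable minimum. It assumes Python's re module, and the helper names build_name_pattern and is_valid_name are hypothetical, not part of the original answer.

```python
import re

def build_name_pattern(minimum):
    """Compile the name pattern with a lookahead that enforces a minimum total length."""
    # (?=.{N,}) looks ahead from the start without consuming any characters.
    return re.compile(r"^(?=.{%d,})[a-zA-Z0-9\s\-,]+.\*?$" % minimum)

NAME_PATTERN = build_name_pattern(8)

def is_valid_name(name):
    """Return True if name satisfies NAME_PATTERN."""
    return NAME_PATTERN.search(name) is not None
```

With a minimum of 8, is_valid_name("John Smith") succeeds while is_valid_name("Bob") fails, because the lookahead rejects the short string before the rest of the pattern is tried.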

Related

RegEx: Non-repeating patterns?

I'm wrestling with how to write a specific regex, and thought I'd come here for a little guidance.
What I'm looking for is an expression that does the following:
Character length of 7 or more
Any single character is one of four patterns (uppercase letters, lowercase letters, numbers and a specific set of special characters; let's say #$%#).
(Now, here's where I'm having problems):
Another single character would also match with one of the patterns described above EXCEPT for the pattern that was already matched. So, if the first pattern matched is an uppercase letter, the second character match should be a lowercase letter, number or special character from the pattern.
To give you an example, the string AAAAAA# would match, as would the string AAAAAAa. However, the string AAAAAAA would not match, nor would the string AAAAAA& (as the ampersand is not part of the special character pattern).
Any ideas? Thanks!
If you only need two different kinds of characters, you can use the possessive quantifier feature (available in Objective C):
^(?:[a-z]++|[A-Z]++|[0-9]++|[#$%#]++)[a-zA-Z0-9#$%#]+$
or more concise with an atomic group:
^(?>[a-z]+|[A-Z]+|[0-9]+|[#$%#]+)[a-zA-Z0-9#$%#]+$
Since each branch of the alternation is a character class with a possessive quantifier, you can be sure that the first character matched by [a-zA-Z0-9#$%#]+ is from a different class.
As for the string length, check it separately first with the appropriate function: if the string is too short, you avoid the cost of a regex check altogether.
First you need to do a negative lookahead to make sure the entire string doesn't consist of characters from a single group:
(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)
Then check that it does contain at least 7 characters from the list of legal characters (and nothing else):
^[a-zA-Z0-9#$%#]{7,}$
Combining them (thanks to Shlomo for pointing that out):
^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)[a-zA-Z0-9#$%#]{7,}$
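The combined pattern can be checked against the examples from the question. This is a minimal sketch assuming Python's re module; the function name has_mixed_classes is hypothetical, and the duplicate # from the question's placeholder set is dropped, which does not change what the class matches.

```python
import re

# Negative lookahead rejects strings drawn from a single class;
# the main class then requires 7 or more legal characters.
MIXED = re.compile(r"^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%]*)$)[a-zA-Z0-9#$%]{7,}$")

def has_mixed_classes(text):
    """Return True if text has 7+ legal characters from at least two classes."""
    return MIXED.search(text) is not None

# The four example strings from the question.
for sample in ["AAAAAA#", "AAAAAAa", "AAAAAAA", "AAAAAA&"]:
    print(sample, has_mixed_classes(sample))
```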

I need regex to only take numbers in string

I thought I had it with [0-9] but when I ran it that only took one number.
The string goes for example:
1 note
1,234 notes
68,000 notes
I want it to take the whole number and leave out the "notes" part, the spaces and the comma, so that I get just the full number.
The [0-9] would only take the first digit of the string, even when there wasn't a comma.
So how to only take the number please?
[0-9] means any one character between 0 and 9. What you are looking for is these characters repeated any number of times, but no other character should be there. The correct way to write this is [0-9]+.
M+, where M is some regex rule, is equivalent to M M*, where * means 0 or more occurrences. So M+ means at least one occurrence of whatever M matches.
EDIT: The question now also states that the entire number should be read, but the comma should be excluded from the output. AFAIK, this cannot be done with a single regex match, because the matched text is always a contiguous piece of the input and so cannot skip the comma. A possible solution is to add , to the list of allowed characters and strip the commas from the result afterwards.
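The two-step approach (match digits and commas, then strip the commas) might look like the following. This is only a sketch in Python, which the question does not mention; extract_number is a hypothetical helper name.

```python
import re
from typing import Optional

# A digit followed by any run of digits and commas, e.g. "68,000".
NUMBER = re.compile(r"[0-9][0-9,]*")

def extract_number(text: str) -> Optional[int]:
    """Return the first number in text with commas removed, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    # The regex cannot drop the commas itself, so remove them afterwards.
    return int(match.group().replace(",", ""))
```

For the strings in the question, extract_number("1,234 notes") gives 1234 and extract_number("68,000 notes") gives 68000.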

How to include special chars in this regex

First of all I am a total noob to regular expressions, so this may be optimized further, and if so, please tell me what to do. Anyway, after reading several articles about regex, I wrote a little regex for my password matching needs:
(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(^[A-Z]+[a-z0-9]).{8,20}
What I am trying to do is: it must start with an uppercase letter, must contain a lowercase letter, must contain at least one number, must contain at least one special character, and must be between 8 and 20 characters in length.
The above somehow works, but it doesn't force special chars (. seems to match any character, but I don't know how to use it with the positive lookahead), and the min length seems to be 10 instead of 8. What am I doing wrong?
PS: I am using http://gskinner.com/RegExr/ to test this.
Let's strip away the assertions and just look at your base pattern alone:
(^[A-Z]+[a-z0-9]).{8,20}
This will match one or more uppercase Latin letters, followed by a single lowercase Latin letter or decimal digit, followed by 8 to 20 of any character. So yes, at minimum this will require 10 characters, but there's no maximum number of characters it will match (e.g. it will allow 100 uppercase letters at the start of the string). Furthermore, since there's no end anchor ($), this pattern would allow any trailing characters after the matched substring.
I'd recommend a pattern like this:
^(?=.*[a-z])(?=.*[0-9])(?=.*[!##$])[A-Z]+[A-Za-z0-9!##$]{7,19}$
Where !##$ is a placeholder for whatever special characters you want to allow. Don't forget to escape special characters if necessary (\, ], ^ at the beginning of the character class, and - in the middle).
Using POSIX character classes, it might look like this:
^(?=.*[[:lower:]])(?=.*[[:digit:]])(?=.*[[:punct:]])[[:upper:]]+[[:alnum:][:punct:]]{7,19}$
Or using Unicode character classes, it might look like this:
^(?=.*[\p{Ll}])(?=.*\d)(?=.*[\p{P}\p{S}])[\p{Lu}]+[\p{L}\d\p{P}\p{S}]{7,19}$
Note: each of these considers a different set of 'special characters', so they aren't identical to the first pattern.
The following should work:
^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$
I removed the (?=.*[A-Z]) because the requirement that you must start with an uppercase character already covers that. I added (?=.*[^a-zA-Z0-9]) for the special characters, this will only match if there is at least one character that is not a letter or a digit. I also tweaked the length checking a little bit, the first step here was to remove the + after the [A-Z] so that we know exactly one character has been matched so far, and then changing the .{8,20} to .{7,19} (we can only match between 7 and 19 more characters if we already matched 1).
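To check the length arithmetic described above, here is a small sketch that runs this pattern in Python (an assumption; the question was tested in RegExr). The function name is_valid_password and the sample passwords are made up for illustration.

```python
import re

# One leading uppercase letter plus 7 to 19 more characters gives 8 to 20 in total.
PASSWORD = re.compile(r"^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$")

def is_valid_password(candidate):
    """Return True if candidate satisfies every rule encoded in PASSWORD."""
    return PASSWORD.search(candidate) is not None

# Hypothetical samples: valid, no special char, lowercase start, too short.
for sample in ["Passw0rd!", "Password1", "passw0rd!", "Pw0!"]:
    print(sample, is_valid_password(sample))
```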
Well, here is how I would write it, if I had such requirements - excepting situations where it's absolutely not possible or practical, I prefer to break up complex regular expressions. Note that this is English-specific, so a Unicode or POSIX character class (where supported) may make more sense:
/^[A-Z]/ && /[a-z]/ && /[0-9]/ && /[whatever special]/ && ofCorrectLength(x)
That is, I would avoid trying to incorporate all the rules at once.
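A sketch of that split-up style in Python might look like this. The special-character set [!#$] is only a placeholder assumption, and check_password is a hypothetical name; the 8 to 20 length limits come from the question.

```python
import re

def check_password(candidate):
    """Apply each password rule as its own simple check."""
    return (
        re.match(r"[A-Z]", candidate) is not None        # starts with an uppercase letter
        and re.search(r"[a-z]", candidate) is not None   # contains a lowercase letter
        and re.search(r"[0-9]", candidate) is not None   # contains a digit
        and re.search(r"[!#$]", candidate) is not None   # contains a special char (placeholder set)
        and 8 <= len(candidate) <= 20                    # length rule from the question
    )
```

Each rule can then be changed, or reported to the user, independently of the others.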

Literal Characters in Regex Character Classes

While looking through some regex stuff, I found that you could put literal characters inside of a character class. I know that when using character classes you can use ranges as a shortcut instead of specifying every letter/number in a range; for example, [1-47-9] matches every digit except 0, 5 and 6.
If you have a regex including literal characters in a character class, does it treat this the same way and match the range of those characters? For example, would [\000-\005] positively match \000, \001, \002, \003, \004, \005?
Yes, it does work this way. You can specify a range between any arbitrary characters and as long as the code point of the left side is less than the code point of the right side the range will match any character between them (inclusive).
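Both ranges from the question can be tried out directly. The sketch below assumes Python's re module, which accepts three-digit octal escapes such as \000 inside a character class; other engines may spell these escapes differently.

```python
import re

CONTROL = re.compile(r"[\000-\005]")  # range between two escaped code points
DIGITS = re.compile(r"[1-47-9]")      # two ranges: 1-4 and 7-9

# Collect every code point from 0 to 7 that the escaped range accepts.
matched_codes = [code for code in range(8) if CONTROL.fullmatch(chr(code))]
# Collect every decimal digit that the two-range class accepts.
matched_digits = [d for d in "0123456789" if DIGITS.fullmatch(d)]

print(matched_codes)   # [0, 1, 2, 3, 4, 5]
print(matched_digits)  # ['1', '2', '3', '4', '7', '8', '9']
```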

Regex for a string up to 20 chars long with a comma

I need to define a regex for a string with the following requirements:
Maximum 20 characters
Must be in the form Name,Surname
No numbers and special characters allowed (again, it's a name&surname)
I already tried something like ^[^1-9\?\*\.\?\$\^\_]{1,20}[,][^1-9\?\*\.\?\$\^\_\-]{1,20}$ but as you can see, it also matches a 40 chars long string.
How can I check for the whole string's maximum length and at the same time impose 1 comma inside of it and obviously not at the borders?
Thank you
Try the regex:
^(?=[^,]+,[^,]+$)[a-zA-Z,]{1,20}$
Explanation:
^ : Start anchor
(?=[^,]+,[^,]+$) : Positive lookahead to ensure the string has exactly one comma, surrounded by at least one non-comma character on each side
[a-zA-Z,]{1,20} : Ensure the entire string is at most 20 characters long and contains only letters and commas
$ : End anchor
You can do this using a negative lookahead assertion:
^(?!.{21})[A-Za-z]+,[A-Za-z]+$
The regex now contains two parts: the actual definition, and an assertion at the start saying that, from that point on, there are not 21 characters.
So for the definition as stated above, the regex becomes
^(?!.{21})[^1-9\?*\.\?\$\^_\,]+,[^1-9\?*\.\?\$\^_\,]+$
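The two lookahead-based patterns suggested so far (the positive lookahead for the comma and the negative lookahead for the length) can be compared side by side. This is a sketch assuming Python's re module; the sample names are invented for illustration.

```python
import re

# Positive lookahead checks the comma placement; the class checks content and length.
COMMA_LOOKAHEAD = re.compile(r"^(?=[^,]+,[^,]+$)[a-zA-Z,]{1,20}$")
# Negative lookahead caps the length; the body checks the Name,Surname shape.
LENGTH_LOOKAHEAD = re.compile(r"^(?!.{21})[A-Za-z]+,[A-Za-z]+$")

def accepted_by_both(text):
    """Return True only if both patterns accept text."""
    return bool(COMMA_LOOKAHEAD.search(text)) and bool(LENGTH_LOOKAHEAD.search(text))

def rejected_by_both(text):
    """Return True only if both patterns reject text."""
    return not COMMA_LOOKAHEAD.search(text) and not LENGTH_LOOKAHEAD.search(text)
```

On letters-only input such as "John,Smith" the two patterns agree; they differ from the question's original character class only in which characters count as legal.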
The obvious answer would be: Don't ask for name and surname in the same input field.
If you still want to do it: There's no easy way that I know of, but here is a possibility. To see the principle, read X below as standing for your [^1-9\?\*\.\?\$\^\_\,] (I added the \, since it's kind of important :-)).
^(?:X{1},X{1,18}|X{2},X{1,17}|...|X{18},X{1})$
Quite ugly, but should work.
On a different note: You don't capture nearly all special characters with your exclusive range. But it's probably still better than an inclusive range.
As I said, I think that, stated the way you have it, this is not matchable by a regular expression; it's a pushdown language.
However, you could always split on ',' and match each substring, then total.
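A minimal sketch of the split-and-check idea, written in Python (an assumption; the question does not name a language). The helper is_name_surname is hypothetical, and it treats only ASCII letters as valid name characters.

```python
import re

LETTERS = re.compile(r"[A-Za-z]+")

def is_name_surname(text, max_length=20):
    """Check the Name,Surname form by splitting on the comma and testing each part."""
    parts = text.split(",")
    if len(parts) != 2:
        return False  # zero commas, or more than one
    if not all(LETTERS.fullmatch(part) for part in parts):
        return False  # an empty part, or a part with non-letters
    return len(text) <= max_length  # total length, comma included
```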
Have you tried your example, but removing the
{1,20}
in the middle, which leaves this to try:
^[[^1-9\?\*\.\?\$\^\_],[^1-9\?\*\.\?\$\^\_\-]]{1,20}$
Use:
[[a-zA-Z],[a-zA-Z]]{1,20}