Regular Expression for letters, numbers, spaces and underscores

How can I create a regular expression that matches only letters, numbers and underscores, with one space between words?
Good Examples:
Vamshi1
vamshi_pendota
vamshi pendota
Bad Examples:
vam shi1
vam_shi pendota

You should use a regex tester site such as http://regex101.com/
You can enter your examples and use the quick reference to help you construct the correct regular expression.

Use this simple regex:
^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$
Option 2 for capitalization
If only the first letter of each word can be a capital letter, use
^[A-Z]?[a-z0-9]+(?:[ _][A-Z]?[a-z0-9]+)?$
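Both patterns can be checked quickly against the examples from the question. The sketch below uses Python's re module (the asker's engine may differ); ANY_CASE, CAPITALIZED and accepts are illustrative names. Note that vam shi1 is accepted as well: structurally it is the same as vamshi pendota, which is the point another answer makes.

```python
import re

# One alphanumeric word, optionally followed by a single space or
# underscore and a second alphanumeric word.
ANY_CASE = re.compile(r"^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$")

# Option 2: only the first letter of each word may be uppercase.
CAPITALIZED = re.compile(r"^[A-Z]?[a-z0-9]+(?:[ _][A-Z]?[a-z0-9]+)?$")

def accepts(pattern, text):
    """Return True if the whole string matches the anchored pattern."""
    return pattern.match(text) is not None

for sample in ["Vamshi1", "vamshi_pendota", "vamshi pendota",
               "vam shi1", "vam_shi pendota"]:
    print(f"{sample!r:20} {accepts(ANY_CASE, sample)}")
```

With ANY_CASE, the loop prints True for the first four strings and False for vam_shi pendota, because the optional group allows only one separator.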
What it means
^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$
Assert position at the beginning of the string ^
Match a single character present in the list below [a-zA-Z0-9]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
A character in the range between “a” and “z” (case sensitive) a-z
A character in the range between “A” and “Z” (case sensitive) A-Z
A character in the range between “0” and “9” 0-9
Match the regular expression below (?:[ _][a-zA-Z0-9]+)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character from the list “ _” [ _]
Match a single character present in the list below [a-zA-Z0-9]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
A character in the range between “a” and “z” (case sensitive) a-z
A character in the range between “A” and “Z” (case sensitive) A-Z
A character in the range between “0” and “9” 0-9
Assert position at the end of the string, or before the line break at the end of the string, if any (line feed) $

Unless you provide any further information, I suspect that what you are after cannot be achieved through a regular expression.
Regular expressions are used to match patterns of strings. In your case, the Good and Bad cases you want to match look the same from a pattern perspective.
Assuming that in your language Vamshi is a valid name but Vam shi is not (despite both consisting of alphanumeric characters and at most one space), I suspect you need a dictionary-based check and not simply a regular expression.
EDIT: After seeing your change, something like this should work for you: ^[a-z0-9_]+(\s[a-z0-9_]+)*$. The regular expression expects the string to start with one or more lowercase letters, digits and/or underscores, optionally followed by any number of further such words, each preceded by a single whitespace character.
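A minimal Python sketch of the edited pattern (is_valid and PATTERN are illustrative names). Because the character class lists only lowercase letters, Vamshi1 from the question is rejected unless the match is made case-insensitive; also note that \s matches any whitespace character, not only a space.

```python
import re

# Words of a-z, 0-9 and _, separated by a single whitespace character.
PATTERN = r"^[a-z0-9_]+(\s[a-z0-9_]+)*$"

def is_valid(text, ignore_case=False):
    """Return True if text matches PATTERN, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.match(PATTERN, text, flags) is not None

for sample in ["vamshi pendota", "vam_shi pendota", "vamshi  pendota", "Vamshi1"]:
    print(sample, is_valid(sample), is_valid(sample, ignore_case=True))
```

Two consecutive spaces are rejected because each repetition of the group must start with exactly one whitespace character followed by at least one word character.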

Related

RegExp: Ignore all links starting with a specific set of characters

I'm using the RegExp below to find all links in a string. How can I add a condition that ignores all links starting with one of these characters: ".", "_", " " or "-"? (e.g. .sub.example.com, -example.com)
AS3:
var str = "hello world .sub.example.com foo bar -example.com lorem http://example.com/test";
var filter:RegExp = /((https?:\/\/|www\.)?[äöüa-z0-9]+[äöüa-z0-9\-\:\/]{1,}+\.[\*\!\'\(\)\;\:\#\&\=\$\,\?\#\%\[\]\~\-\+\_äöüa-z0-9\/\.]{2,}+)/gi;
var links = str.match(filter);
if (links !== null) {
    trace("Links: " + links);
}
You can use the following regex:
\b((https?:\/\/|www\.)?(?<![._ -])[äöüa-z0-9]+[äöüa-z0-9\-\:\/]{1,}+\.[\*\!\'\(\)\;\:\#\&\=\$\,\?\#\%\[\]\~\-\+\_äöüa-z0-9\/\.]{2,}+)\b
Changes:
Added word boundaries \b
Added a negative lookbehind for [._ -], i.e. (?<![._ -])
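The lookbehind idea can be tried outside AS3 too. The sketch below uses Python's re module and a deliberately simplified, hypothetical link pattern (LINK is an illustrative name; it is much looser than the asker's pattern). Here the lookbehind sits at the very start of the match and rejects a preceding word character, "." or "-" ("_" counts as a word character). Unlike [._ -], it does not reject a preceding space, so links that simply follow a space are still found.

```python
import re

# Hypothetical, simplified link pattern: optional http(s):// scheme,
# a dot-separated host and an optional path. The negative lookbehind
# (?<![\w.-]) fails if the first character directly follows a word
# character, "." or "-".
LINK = re.compile(
    r"(?<![\w.-])(?:https?://)?[a-z0-9]+(?:\.[a-z0-9]+)+(?:/\S*)?",
    re.IGNORECASE,
)

text = "hello world .sub.example.com foo bar -example.com lorem http://example.com/test"
print(LINK.findall(text))
```

On the sample string from the question, only http://example.com/test is found; .sub.example.com and -example.com are skipped because every possible starting position is preceded by ".", "-" or a word character.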
This is the regex I use to find URLs in full text:
/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i
Regex explanation:
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]
Assert position at a word boundary «\b»
Match the regex below and capture its match into backreference number 1 «(https?|ftp|file)»
Match this alternative «https?»
Match the character string “http” literally «http»
Match the character “s” literally «s?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Or match this alternative «ftp»
Match the character string “ftp” literally «ftp»
Or match this alternative «file»
Match the character string “file” literally «file»
Match the character string “://” literally «://»
Match a single character present in the list below «[-A-Z0-9+&@#/%?=~_|$!:,.;]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
The literal character “-” «-»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%?=~_|$!:,.;” «+&@#/%?=~_|$!:,.;»
Match a single character present in the list below «[A-Z0-9+&@#/%=~_|$]»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%=~_|$” «+&@#/%=~_|$»
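A short Python sketch of the same URL pattern. It assumes the classes contain both @ and #, so user@host style URLs and #fragments are allowed; find_urls is an illustrative helper. Because the scheme group is capturing, findall() would return only the scheme, so the helper collects group(0) from finditer() instead. The final character class leaves out , . ; ! ? and :, which keeps trailing punctuation out of the match.

```python
import re

# URL pattern from the answer, case-insensitive; group 1 is the scheme.
URL = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)

def find_urls(text):
    """Return every full URL match (findall would return only group 1)."""
    return [m.group(0) for m in URL.finditer(text)]

print(find_urls("Visit http://example.com/test, or ftp://files.example.org."))
```

The greedy first class initially swallows the trailing comma or period, then gives characters back until the last one fits the stricter final class, so the results are http://example.com/test and ftp://files.example.org.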

What does this regular expression mean /^[a-z]{1}[a-z0-9_]{3,13}$/

Can someone explain the following regular expression:
/^[a-z]{1}[a-z0-9_]{3,13}$/
and also give some sample strings that satisfy this regular expression?
The ^ anchor asserts that we are at the beginning of the string
[a-z]{1} matches one lower-case letter. The {1} is unneeded.
[a-z0-9_]{3,13} matches 3 to 13 characters from a-z, 0-9 or _. In case-insensitive mode, in many engines it could be replaced by \w{3,13}.
The $ anchor asserts that we are at the end of the string
Sample Matches
abcd
a_000
a_blue_tree
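The sample matches, plus a few non-matches, can be verified with a short Python sketch (USERNAME is an illustrative name). Since the first token matches exactly one character and the second 3 to 13, accepted strings are 4 to 14 characters long.

```python
import re

# ^ one lowercase letter, then 3 to 13 characters from a-z, 0-9 or _, then $.
USERNAME = re.compile(r"^[a-z]{1}[a-z0-9_]{3,13}$")

samples = ["abcd", "a_000", "a_blue_tree", "a000",
           "abc",            # too short: only 2 characters after the first
           "1abcd",          # must start with a letter
           "Abcd",           # uppercase is not allowed
           "a" + "b" * 13,   # 14 characters in total: the maximum
           "a" + "b" * 14]   # 15 characters: too long
for s in samples:
    print(f"{s!r:20} {bool(USERNAME.match(s))}")
```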
General Answers to "What Does this Regex Mean?"
You can use a tool such as regex101 to experiment with a regex; the right-hand pane explains it token by token.
There are several explanation tools based on the same original Perl library; one of the answers here is based on such a tool.
The ultimate answer can be found in Mastering Regular Expressions, 3rd Ed. and several excellent online tutorials, including the regex FAQ on this site.
Explanation: /^[a-z]{1}[a-z0-9_]{3,13}$/
^ - Asserts the start of a string
[a-z]{1} Matches exactly one character from a-z.
[a-z0-9_]{3,13} Matches 3 to 13 characters, each from a-z, 0-9 or _.
$ - Asserts the end of the string
NODE              EXPLANATION
^                 the beginning of the string
[a-z]{1}          any character of: 'a' to 'z' (1 times)
[a-z0-9_]{3,13}   any character of: 'a' to 'z', '0' to '9', '_' (between 3 and 13 times (matching the most amount possible))
$                 before an optional \n, and the end of the string
It means:
Start (^) with one ({1}) lowercase letter ([a-z]), then continue with at least 3 ({3,) and at most 13 (13}) characters from the set of lowercase letters, digits and underscore ([a-z0-9_]). After that, the end of the string is expected ($).
For example, a000 satisfies the pattern.
It matches a string starting with a-z followed by 3 to 13 characters from the character set a-z, 0-9 or _.
There are a number of online tools that will explain the meaning of a regular expression as well as test it.
Assert position at the beginning of the string «^»
Match a single character in the range between “a” and “z” «[a-z]{1}»
Exactly 1 times «{1}»
Match a single character present in the list below «[a-z0-9_]{3,13}»
Between 3 and 13 times, as many times as possible, giving back as needed (greedy) «{3,13}»
A character in the range between “a” and “z” «a-z»
A character in the range between “0” and “9” «0-9»
The character “_” «_»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Generated using RegexBuddy

R regular expression repetition ignores upper bound

I am trying to write a regular expression that filters strings like
blah_blah_suffix
where suffix is any string from 2 to 5 characters long. So I want to accept strings
blah_blah_aa
blah_blah_abcd
but discard
blah_blah_a
blah_aaa
blah_blah_aaaaaaa
I use grepl in the following way:
samples[grepl("blah_blah_.{2,5}", samples)]
but it ignores the upper bound of the repetition (5): it discards blah_blah_a and blah_aaa, but accepts blah_blah_aaaaaaa.
I know there is a way to filter strings without regular expressions, but I want to understand how to use grepl correctly.
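You need to anchor the expression to the start and end of the line:
^blah_blah_.{2,5}$
The ^ matches the beginning of the line and $ matches the end of the line. Without them, grepl reports a match whenever the pattern occurs anywhere in the string, and blah_blah_aaaaaaa contains blah_blah_aa, so the upper bound never comes into play.
If ^ and $ match at every line break (multi-line mode) and you want to anchor the expression to the beginning and end of the whole string, use \A and \Z instead.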
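The same behaviour can be reproduced in Python as an illustration (this is not R code): re.search, like grepl, succeeds if the pattern occurs anywhere in the string, so only the anchored version enforces the upper bound.

```python
import re

samples = ["blah_blah_aa", "blah_blah_abcd", "blah_blah_a",
           "blah_aaa", "blah_blah_aaaaaaa"]

# Unanchored: "blah_blah_aa" is found inside "blah_blah_aaaaaaa".
unanchored = [s for s in samples if re.search(r"blah_blah_.{2,5}", s)]

# Anchored: the whole string must be "blah_blah_" plus 2 to 5 characters.
anchored = [s for s in samples if re.search(r"^blah_blah_.{2,5}$", s)]

print(unanchored)
print(anchored)
```

The first list still contains blah_blah_aaaaaaa; the second holds only blah_blah_aa and blah_blah_abcd.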
/^[\w]+_[\w]+_[\w]{2,5}$/
Options: dot matches newline; case insensitive; ^ and $ match at line breaks
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]{2,5}»
Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
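A quick Python check of this alternative pattern (ALT and accepted are illustrative names). Because \w also matches "_", the first two [\w]+ runs may themselves contain underscores, so a string with an extra leading segment such as my_blah_blah_ab is accepted too.

```python
import re

# Alternative answer's pattern: word characters, "_", word characters,
# "_", then 2 to 5 word characters up to the end.
ALT = re.compile(r"^[\w]+_[\w]+_[\w]{2,5}$")

samples = ["blah_blah_aa", "blah_blah_abcd", "blah_blah_a",
           "blah_aaa", "blah_blah_aaaaaaa", "my_blah_blah_ab"]
accepted = [s for s in samples if ALT.search(s)]
print(accepted)
```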

Email Regular Expression Explanation

I'm trying to understand a regular expression that is currently used to validate email addresses entered on a website. The email address is used to populate a target system, whose validation rules can be expressed in plain English.
I would like to show, with examples, where the website's validation imposes rules that the target system does not require. To this end, I have obtained the regular expression from the developer and need some help translating it into plain English:
^[_A-Za-z0-9_%+-]+(\\.[_A-Za-z0-9_%+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,4})$
So far, I have gained some understanding from a previous post.
... which would seem to confirm the following:
^ = The matched string must begin here, and only begin here
[ ] = match any character inside the brackets, but only match one.
I'm not sure of the relevance of "only match one". Can anyone advise?
+ = match the previous expression at least once, an unlimited number of times.
Presumably "the previous expression" refers to the characters inside the preceding square brackets, which can therefore be repeated an unlimited number of times?
() = make everything inside the parentheses a group (and make them referenceable).
I'm not sure what this might mean.
\\. = match a literal full stop (.)
Then we have a repeat of the square bracket content, though I'm unsure of its relevance, since the initial square-bracket character class can already be repeated an unlimited number of times?
@ = match a literal @ symbol
The final parenthesized group seems to match the top-level domain, which must be at least 2 but no more than 4 characters long.
I think my main issue is understanding the relevance of the round brackets, as I can't see what they add beyond what the square brackets add.
Any help would be much appreciated.
^[_A-Za-z0-9_%+-]+(\\.[_A-Za-z0-9_%+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,4})$
Assert position at the beginning of the string «^»
Match a single character present in the list below «[_A-Za-z0-9_%+-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
The character “_” «_»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “a” and “z” «a-z»
A character in the range between “0” and “9” «0-9»
One of the characters “_%” «_%»
The character “+” «+»
The character “-” «-»
Match the regular expression below and capture its match into backreference number 1 «(\\.[_A-Za-z0-9_%+-]+)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
Match the character “.” literally «\\.» (the doubled backslash is string-literal escaping in the developer's code; the regex itself is \.)
Match a single character present in the list below «[_A-Za-z0-9_%+-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
The character “_” «_»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “a” and “z” «a-z»
A character in the range between “0” and “9” «0-9»
One of the characters “_%” «_%»
The character “+” «+»
The character “-” «-»
Match the character “@” literally «@»
Match a single character present in the list below «[A-Za-z0-9]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “a” and “z” «a-z»
A character in the range between “0” and “9” «0-9»
Match the regular expression below and capture its match into backreference number 2 «(\\.[A-Za-z0-9]+)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
Match the character “.” literally «\\.» (the doubled backslash is string-literal escaping in the developer's code; the regex itself is \.)
Match a single character present in the list below «[A-Za-z0-9]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “a” and “z” «a-z»
A character in the range between “0” and “9” «0-9»
Match the regular expression below and capture its match into backreference number 3 «(\\.[A-Za-z]{2,4})»
Match the character “.” literally «\\.» (the doubled backslash is string-literal escaping in the developer's code; the regex itself is \.)
Match a single character present in the list below «[A-Za-z]{2,4}»
Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “a” and “z” «a-z»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
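To see what the round brackets add, here is a Python sketch of the pattern. It assumes the pattern was copied from a string literal in the developer's code (so each \\. is the regex \.) and that @ separates the local part from the domain; EMAIL is an illustrative name. The parentheses make "a dot followed by at least one allowed character" a single unit that * repeats, so a dot can never be leading, trailing or doubled. The {2,4} on the last label also rejects a real six-letter top-level domain such as .museum.

```python
import re

EMAIL = re.compile(
    r"^[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"  # local part: dot-separated runs
    r"@[A-Za-z0-9]+(\.[A-Za-z0-9]+)*"            # domain labels
    r"(\.[A-Za-z]{2,4})$"                        # last label: 2 to 4 letters
)

for address in ["john.smith@example.com", "john..smith@example.com",
                ".john@example.com", "john@mail.example.co.uk",
                "john@example.museum"]:
    print(f"{address:28} {bool(EMAIL.match(address))}")

# A repeated capturing group keeps only its last iteration.
m = EMAIL.match("a.b.c@example.com")
print(m.group(1), m.group(3))   # prints: .c .com
```

Only john.smith@example.com and john@mail.example.co.uk are accepted. The last line illustrates the RegexBuddy note above: group 1 matched both .b and .c, but only .c is kept.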

Using ?=. in regular expression

I saw the pattern
^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_@#%\*\-]{8,24}$
in a password-checking mechanism. I have taken a few courses on regular expressions, but I never saw the combination ?=. explained.
I want to know how it works. In the example it checks for at least one uppercase letter, one lowercase letter and one digit. I guess it's something like an "if".
(?=regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?= and ). To quote a lookaround tutorial:
Lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.
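The . is not a special part of the lookahead syntax: in (?=.*[A-Z]) the lookahead's content is .*[A-Z], where . matches any single character that is not a line terminator and .* lets any number of such characters precede the uppercase letter.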
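A Python sketch of the password pattern from the question, with the character class assumed to allow @ and # (PASSWORD is an illustrative name). Each lookahead starts at position 0, scans ahead independently and consumes nothing, so the final character class still has to match the whole string from the beginning.

```python
import re

PASSWORD = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_@#%\*\-]{8,24}$")

# A lookahead is zero-width: the overall match is the empty string at 0.
m = re.match(r"(?=.*[0-9])", "abc1")
print(repr(m.group(0)), m.end())   # prints: '' 0

for pw in ["Passw0rd", "password1", "PASSWORD1", "Password", "Pa1", "Pass_w0rd#"]:
    print(f"{pw:12} {bool(PASSWORD.match(pw))}")
```

Only Passw0rd and Pass_w0rd# pass: the others lack an uppercase letter, a lowercase letter or a digit, or are shorter than 8 characters.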
I am new to regex, but this is what I understand about the above regex:
1- ?= is a positive lookahead, i.e. it looks ahead from the current position and checks whether the text that follows matches the pattern inside it, such as .*[A-Z].
2- .* allows zero or more characters before the expression you are searching for, i.e. it lets the lookahead scan up to the end of the input string to find a match.
In short, * is a quantifier meaning zero or more. For instance, if you replaced * with ? in the [A-Z] part, giving (?=.?[A-Z]), the expression would only return true if the first or second character is uppercase. If you replaced it with +, giving (?=.+[A-Z]), it would only return true if some character other than the first is uppercase.
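The effect of swapping the quantifier can be checked directly in Python; the two helper names below are illustrative.

```python
import re

def first_or_second_upper(s):
    # (?=.?[A-Z]): at most one character may precede the uppercase letter.
    return re.match(r"^(?=.?[A-Z])", s) is not None

def upper_after_first(s):
    # (?=.+[A-Z]): at least one character must precede the uppercase letter.
    return re.match(r"^(?=.+[A-Z])", s) is not None

for s in ["Abc", "aBc", "abC"]:
    print(s, first_or_second_upper(s), upper_after_first(s))
```

Abc passes only the first check, abC only the second, and aBc passes both.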
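Another option uses negated character classes, ^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$, which breaks down as follows:
^ asserts position at start of the string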
Positive Lookahead (?=\D*\d)
Assert that the Regex below matches
\D matches any character that's not a digit (equivalent to [^0-9])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
Positive Lookahead (?=[^a-z]*[a-z])
Assert that the Regex below matches
Match a single character not present in the list below [^a-z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Match a single character present in the list below [a-z]
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Positive Lookahead (?=[^A-Z]*[A-Z])
Assert that the Regex below matches
Match a single character not present in the list below [^A-Z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [A-Z]
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
. matches any character (except for line terminators)
{8,30} matches the previous token between 8 and 30 times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
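Reassembling the tokens explained above gives ^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$; the sketch below checks it in Python (STRONG is an illustrative name). A negated class such as \D* cannot run past the first digit, so each lookahead stops at the first character it needs instead of running .* to the end and backtracking. Unlike the question's pattern, . allows any character, including a space.

```python
import re

# One digit, one lowercase and one uppercase letter somewhere, then
# 8 to 30 characters of any kind except line terminators.
STRONG = re.compile(r"^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$")

for pw in ["Passw0rd", "password1", "Pass word1", "Pa55"]:
    print(f"{pw:12} {bool(STRONG.match(pw))}")
```

Passw0rd and Pass word1 are accepted; password1 has no uppercase letter and Pa55 is too short.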