What does this RegExp mean please?
[\w-\.]
I know the \w stands for word characters and could alternatively be written as:
[A-Za-z0-9_]
I know the \. means that the point will be treated as an ordinary character.
The only thing I don't really know is the hyphen character. Is this used as a Range Operator here or just the hyphen character in e.g. "fine-tune"?
Hyphen here is just the hyphen character.
Hyphen is treated as a range operator only when it is between two other characters.
Hyphen is normal character here, so it works as [a-zA-z0-9_-\.] (number, letters, and these three characters: -_.).
Related
I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it matches with space, underscore or hyphen at the start/end, but it should only allow in between.
Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
There is a bug in your regex: You have the hyphen in the middle of your characters, which makes it a character range. ie [9-_] means "every char between 9 and _ inclusive.
If you want a literal dash in a character class, put it first or last or escape it.
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.
Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$
I tried using following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.
More details
As per your requirement of including space, hyphen, underscore and alphanumeric characters you can use \w shorthand character set for [a-zA-Z0-9_]. Escape the hyphen using \- as it usually used for character range inside character set.
To negate the space and hyphen at the beginning and end I have used [^\s\-].
So complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.
You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
RegEx Demo
This will only allow alphanumerics at start and end.
[^.]+\.(txt|html)
I am learning regex, and am trying to parse this.
[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?
The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".
I get \., it means a period.
(txt|html) means match with a txt file or html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something the DOS equivalent where I can just do this: *.txt or *.(txt|html) where * means everything that ends in the file extension .txt or .html?
Is [^.] the equivalent of * in DOS?
The dot (.) has no special meaning when it's inside a character class, and doesn't require to be escaped.
[^.] means "any character that is not a literal . character". [^.]+ matches one or more occurrences of any character that is not a dot.
From regular-expressions.info:
In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
. is not special inside [] character class. [^.]+ means one or more occurrences (+) of any character which is not a dot.
If you do *.txt it would not be valid regex as * would not get a character to repeat (zero or more times).
I'm writing a regular expression in Java for capturing some word without spaces.
The word can contain only letter, number, hyphens and dot.
The character set [\w+\-\\.] work well.
Now I want to edit the set for allowing a single space after the dot.
How I have to edit my regular expression?
You can add an alternation that matches this additional requirement
([\w\-.]|(?<=\.) )+
See it here on Regexr
(?<=\.) is a lookbehind assertion. It ensures that space is only matched, if it is preceded by a dot.
Other hints:
\w contains the underscore and matches per default only ASCII letters/digits. If you care about Unicode, use either the modifier UNICODE_CHARACTER_CLASS to enable Unicode for \w or use the Unicode properties \p{L} and \p{Nd} to match Unicode letters and digits.
You don't need to escape the dot in a character class.
You have \w+ in your character class, are you aware, that you just add the "+" character to the accepted characters?
In case of a dot followed by a space, I suppose this pattern should be neither the first, nor the last in the matched string? You may want to enclose it in word boundaries \b:
([0-9A-Za-z-]|\b\.( \b)?)+
I deliberately did not use \w, to exclude underscores.
For allowing ONLY a single space after the dot you can use this regex:
^(?!.*?\. {2})[\w.-]+$
You don't need to escape dot OR hyphen inside character class
(?!.*?\. {2}) is a negative lookahead that disallows 2 or more spaces after a dot
The Regular expression
/[\D\S]/
should match characters Which is not a digit or not whitespace
But When I test this expression in regexpal
It starts matching any character that's digit, whitespace
What i am doing wrong ?
\D = all characters except digits,
\S = all characters except whitespaces
[\D\S] = union (set theory) of the above character groups = all characters.
Why? Because \D contains \s and \S contains \d.
If you want to match characters which are not dights nor whitespaces you can use [^\d\s].
Your regex is invalidating itself as it goes. Putting the regex inside of [] means it has to match one of the items inside of it. These two items override each other, which end up matching everything. In theory, anything that is non digit, would match every other char. available, and any non whitespace matches any digit and any other char. as well.
You can try using [^\d\s] which says to negate the match of any digit or any space. Instead of having everything caught in the original regex, this negates the matching of both the \d and \s. You can see testing done with it here.
I need a Regex that matches all instances of any character that is not a-z (space and things like apostrophes need to be selected). Sorry for the noob factor.
//novice
With a somewhat sophisticated regex engine (grep will do just fine) this will be quite general:
/[^[:lower:]]+/
(Note the ^!)
The difference between [:lower:] and [a-z] is that the former should be I18N friendly and match e.g. ü, â etc.
For case insensitive matching use [:alpha:], to also include digits use [:alnum:]. [:alnum:] differs from \W in that it doesn't include _ (underscore).
Note that character classes written in this style may be combined as usual (like a-z etc.), e.g. [^[:lower:][:digit:]]+ would match a non-empty string of characters not including any lowercase letters or digits.
Here is regex that will literally match any char that is not a-z. The /g flag indicates a global match which will cover all instances of the match.
/[^a-z]+/g
If you need uppercase letters too, you can either pass the /i flag which indicates case insensitivity:
/[^a-z]+/gi
or include the uppercase chars in character class:
/[^a-zA-Z]+/g
The character class [^a-zA-Z] will match any character that isn't (upper or lowercase) a-z.
I'm sure you can figure out the rest.
\W will match any non-alphanumeric (a-z, 0-9, and underscore) character.
The following regular expression matches any letter other than [a-z]:
/[^a-z]+/
OK.
/[^a-z]+/ will match anything other than lowercase letters.
/[^A-Za-z]+/ will match anything non-alpha.
/\W+/ on most systems will match non-'word' characters. Word characters include A-Z, a-z, 0-9, and '_' (underscore). Note that that is an uppercase W.
If you ever need to create another regex try reading this. Teaching to fish and all that. :)