Regex to match spaces and apostrophes - regex

I need a Regex that matches all instances of any character that is not a-z (space and things like apostrophes need to be selected). Sorry for the noob factor.
//novice

With a regex engine that supports POSIX character classes (grep will do just fine), this will be quite general:
/[^[:lower:]]+/
(Note the leading ^, which negates the class!)
The difference between [:lower:] and [a-z] is that the former should be I18N friendly and match e.g. ü, â etc.
To leave letters of either case unmatched, use [:alpha:]; to also leave digits unmatched, use [:alnum:]. [:alnum:] differs from \w in that it doesn't include _ (underscore).
Note that character classes written in this style may be combined as usual (like a-z etc.), e.g. [^[:lower:][:digit:]]+ would match a non-empty string of characters not including any lowercase letters or digits.
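Python's built-in re module has no POSIX bracket classes like [:lower:] or [:alpha:], so the following sketch swaps in a common workaround (my assumption, not part of the answer): for str patterns, [\W\d_] matches non-word characters, digits and the underscore, which is roughly [^[:alpha:]]. It is only an approximation, since numeric characters such as ½ count as word characters in Python and so slip through as "letters". The made-up sample shows the I18N difference from an ASCII range:

```python
import re

TEXT = "Grüße, 42 Welt!"

# ASCII-only class: ü and ß count as "not a letter"
ascii_non_letters = re.findall(r"[^a-zA-Z]+", TEXT)

# Rough stand-in for [^[:alpha:]]+ with str patterns in Python's re:
# non-word characters, digits and underscore, i.e. approximately "not a letter"
unicode_non_letters = re.findall(r"[\W\d_]+", TEXT)

print(ascii_non_letters)    # ['üß', ', 42 ', '!']
print(unicode_non_letters)  # [', 42 ', '!']
```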

Here is a regex that will literally match any char that is not a-z. The /g flag indicates a global match, which covers all instances of the match.
/[^a-z]+/g
If you need uppercase letters too, you can either pass the /i flag which indicates case insensitivity:
/[^a-z]+/gi
or include the uppercase chars in character class:
/[^a-zA-Z]+/g
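For comparison, a minimal Python sketch (Python's re has no /g flag: re.findall and re.sub already handle every match, and re.IGNORECASE plays the role of /i). The sample string is made up:

```python
import re

SAMPLE = "Don't Stop"

# /[^a-z]+/g : re.findall already returns every match, so no "g" flag is needed
case_sensitive = re.findall(r"[^a-z]+", SAMPLE)

# /[^a-z]+/gi : re.IGNORECASE makes a-z cover A-Z as well
case_insensitive = re.findall(r"[^a-z]+", SAMPLE, re.IGNORECASE)

# /[^a-zA-Z]+/g used for replacement: drop every run of non-letters
letters_only = re.sub(r"[^a-zA-Z]+", "", SAMPLE)

print(case_sensitive)    # ['D', "'", ' S']
print(case_insensitive)  # ["'", ' ']
print(letters_only)      # DontStop
```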

The character class [^a-zA-Z] will match any character that isn't (upper or lowercase) a-z.
I'm sure you can figure out the rest.

\W will match any non-word character, i.e. anything other than a-z, A-Z, 0-9 and underscore.

The following regular expression matches one or more characters other than a-z:
/[^a-z]+/

OK.
/[^a-z]+/ will match anything other than lowercase letters.
/[^A-Za-z]+/ will match anything non-alpha.
/\W+/ on most systems will match non-'word' characters. Word characters include A-Z, a-z, 0-9, and '_' (underscore). Note that that is an uppercase W.
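A quick Python check of the three patterns on a made-up sample. One assumption to keep in mind: in Python 3, \w and \W are Unicode-aware for str patterns (é counts as a word character) unless re.ASCII is passed, which is one reason results differ between systems:

```python
import re

SAMPLE = "It's a_b-C 9"

# Run each pattern from the answer above over the same sample
results = {p: re.findall(p, SAMPLE) for p in (r"[^a-z]+", r"[^A-Za-z]+", r"\W+")}

for pattern, found in results.items():
    print(pattern, found)
# [^a-z]+ ['I', "'", ' ', '_', '-C 9']
# [^A-Za-z]+ ["'", ' ', '_', '-', ' 9']
# \W+ ["'", ' ', '-', ' ']
```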

If you ever need to create another regex try reading this. Teaching to fish and all that. :)

Related

Unicode regex that do not match any letter nor any punctuation sign

I am trying to create a unicode regex that matches every character except for a letter (of any language) and the punctuation signs .;:?!.
So for example the string
abcd 123 kjd ¤%/(" .?:!
should only match the parts 123 and ¤%/(" in that string.
I know that \P{L}+ matches everything except a letter and \P{P}+ matches everything except a punctuation sign. How do I combine these two regexes into one? I have tried simply putting them together, \P{L}+\P{P}+, but this does not give the required match. I have also tried writing [^.;:?!]\P{L}+, but this does not work either.
How do I combine one or more unicode regex or is there a better regex that achieves my requirement?
Using \P{L}+\P{P}+ will match 1+ times the opposite of any letter followed by 1+ times the opposite of any punctuation mark.
The pattern [^.;:?!]\P{L}+ matches 1 time any character other than the listed followed by 1+ times the opposite of any letter.
What you could do is add \p{L} (which matches any kind of letter) to the negated character class. As advised by Wiktor Stribiżew, you can also add \p{Z} so that spaces and other Unicode separators are not matched either:
[^\p{Z}\p{L}.;:?!]
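Python's built-in re module does not support \p{...} property escapes (the third-party regex module does). As a swapped-in technique, here is a sketch that checks each character's Unicode general category with unicodedata to reproduce what [^\p{Z}\p{L}.;:?!]+ would find; the function names are made up for illustration:

```python
import unicodedata
from itertools import groupby

# The punctuation signs from the question that must NOT be matched
EXCLUDED = set(".;:?!")

def is_wanted(ch):
    # Emulates membership in the class [^\p{Z}\p{L}.;:?!]
    category = unicodedata.category(ch)  # e.g. 'Ll', 'Zs', 'Nd', 'Po'
    if category.startswith("L"):  # \p{L}: any kind of letter
        return False
    if category.startswith("Z"):  # \p{Z}: any kind of separator, e.g. a space
        return False
    return ch not in EXCLUDED

def wanted_runs(text):
    # Group consecutive characters by is_wanted and keep the wanted runs,
    # like re.findall would with a + after the class
    return ["".join(run) for wanted, run in groupby(text, key=is_wanted) if wanted]

print(wanted_runs('abcd 123 kjd ¤%/(" .?:!'))  # ['123', '¤%/("']
```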

Regular expression to match alphanumeric, hyphen, underscore and space string

I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it also accepts a space, underscore or hyphen at the start or end, and those should only be allowed in between.
Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
There is a risky spot in your regex: a hyphen in the middle of the characters can create a character range, i.e. [9-_] means "every char between 9 and _ inclusive". Most engines happen to read the hyphen in 0-9-_ as a literal, because 9 already closes the range 0-9, but that is easy to break.
If you want a literal dash in a character class, put it first or last, or escape it.
Also, prefer \w ("word character", which is all letters and digits plus the underscore) to [a-zA-Z0-9_]: it's easier to type and read.
See it working in this fiddle: http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
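A small Python check of the range pitfall (Python's re is assumed here; other engines follow the same rule for an unambiguous range like [9-_]):

```python
import re

# [9-_] is a range from '9' (U+0039) to '_' (U+005F),
# so it also covers ':', '@', 'A'-'Z', '[', '^' and more
range_has_A = bool(re.fullmatch(r"[9-_]", "A"))     # True
range_has_dash = bool(re.fullmatch(r"[9-_]", "-"))  # False: '-' is U+002D, outside the range

# A hyphen placed last, or escaped as \-, is a literal dash
dash_last = bool(re.fullmatch(r"[a-z_-]", "-"))      # True
dash_escaped = bool(re.fullmatch(r"[a-z\-_]", "-"))  # True

print(range_has_A, range_has_dash, dash_last, dash_escaped)  # True False True True
```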
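This will fix the issue (note that it also allows a trailing digit and needs at least two characters, so a single letter such as a is rejected):
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$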
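I tried using the following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumerics, underscore, space and dash between words. Because \w includes the underscore, a leading or trailing underscore is still accepted.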
More details
As per your requirement of including space, hyphen, underscore and alphanumeric characters, you can use the \w shorthand character class for [a-zA-Z0-9_]. Escape the hyphen as \-, since inside a character class it is usually read as a range operator.
To keep a space or hyphen out of the first and last positions, I have used [^\s\-].
So the complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.
You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
This will only allow alphanumerics at start and end.
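To compare the suggestions above, here is a Python sketch that runs each pattern against the question's examples plus a few edge cases I made up. Under Python's re, abc123 fails ^[a-zA-Z]([\w -]*[a-zA-Z])?$ because it ends in a digit, which the "last character must be a letter" rule forbids even though the question lists it; /^\w+([\s-_]\w+)*$/ accepts a leading or trailing underscore; and ^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$ rejects the single letter a:

```python
import re

# Patterns from the answers, checked with re.fullmatch (which anchors both ends,
# so ^ and $ are dropped). The class [\s-_] is written [\s_-] because Python's
# re rejects a range that starts at \s.
PATTERNS = {
    "letter...letter": r"[a-zA-Z]([\w -]*[a-zA-Z])?",
    "letters...alnum": r"[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]",
    "words joined": r"\w+([\s_-]\w+)*",
    "alnum...alnum": r"[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*",
}
INPUTS = ["abc", "abc def", "abc123", "ab_cd", "ab-cd", "a", "-abc", "abc ", "_abc", "abc_"]

# For each pattern, keep the inputs it accepts as a whole string
accepted = {
    name: [s for s in INPUTS if re.fullmatch(pattern, s)]
    for name, pattern in PATTERNS.items()
}
for name, matches in accepted.items():
    print(name, matches)
```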

vim regex matching with Uppercase letters but NOT underscore

In vim regex syntax, I am trying to match all words that start with an uppercase letter and not with an underscore.
\([A-Z][a-z_][A-Za-z_]\+\)
This is what I have until now.
I want something like this:
\([A-Z^\_][a-z_][A-Za-z_]\+\)
Where [A-Z^\_] denotes that it should match all uppercase chars, but not underscore.
Any help would be greatly appreciated. Thanks in advance.
Edit: My question was worded poorly. I want the first set to match an uppercase char that does not have an underscore in front of it. Sorry.
[A-Z] already does not include underscores; I guess you want to match whole words, so you don't want your regular expression to match inside a word. Vim has built-in \< and \> for keyword boundaries (like \b in other regular expression dialects; see @npinti's answer). As lower/uppercase letters and the underscore are usually keyword characters, wrapping your pattern with those should already be close enough:
\<\([A-Z][a-z_][A-Za-z_]\+\)\>
To strictly assert that there is no underscore before your match (but allow any other keyword or non-keyword character there), you'd need a negative lookbehind; \@<! means "is not preceded by":
_\@<!\([A-Z][a-z_][A-Za-z_]\+\)
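Vim patterns can't be run outside Vim, but as a rough illustration here are Python translations of the two patterns, assuming Python's \b is close enough to Vim's \< and \> (Vim's keyword characters come from the 'iskeyword' option, so they can differ) and writing _\@<! as Python's (?<!_). The made-up input xFoo shows where the two approaches diverge:

```python
import re

# Rough Python counterparts of the two Vim patterns (Python's \b stands in for \< and \>)
WORD_BOUNDED = r"\b[A-Z][a-z_][A-Za-z_]+\b"
NO_UNDERSCORE_BEFORE = r"(?<!_)[A-Z][a-z_][A-Za-z_]+"

found = {
    text: (re.findall(WORD_BOUNDED, text), re.findall(NO_UNDERSCORE_BEFORE, text))
    for text in ("Foo _Bar QuxQux", "xFoo")
}
print(found)
# {'Foo _Bar QuxQux': (['Foo', 'QuxQux'], ['Foo', 'QuxQux']), 'xFoo': ([], ['Foo'])}
```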
Where [A-Z^\_] denotes that it should match with all uppercase chars, but not underscore.
[A-Z] already matches all uppercase chars and never the underscore. However, in your first attempt you require the second letter to be lowercase or an underscore ([a-z_]). If I stick to your definition:
all words with starting uppercase, and not starting underscore
Then [A-Z][A-Za-z_]\+ should work (in Vim's default magic mode, the + quantifier is written \+).

Regular expression with a set with a character followed by a character

I'm writing a regular expression in Java for capturing some word without spaces.
The word can contain only letters, numbers, hyphens and dots.
The character set [\w+\-\\.] works well.
Now I want to edit the set to allow a single space after the dot.
How should I edit my regular expression?
You can add an alternation that matches this additional requirement
([\w\-.]|(?<=\.) )+
See it here on Regexr
(?<=\.) is a lookbehind assertion. It ensures that a space is only matched if it is preceded by a dot.
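A Python sketch of the same alternation (written with a non-capturing group; the test strings are made up). Python's \w is Unicode-aware by default, the opposite of Java, so re.ASCII is passed here to mimic Java's default behaviour:

```python
import re

# Python version of ([\w\-.]|(?<=\.) )+ ; re.ASCII mimics Java's ASCII-only \w default
DOT_SPACE = re.compile(r"(?:[\w\-.]|(?<=\.) )+", re.ASCII)

checks = {s: DOT_SPACE.fullmatch(s) is not None
          for s in ("foo. bar", "foo bar", "foo.  bar", "a-b.c", "café")}
print(checks)
# {'foo. bar': True, 'foo bar': False, 'foo.  bar': False, 'a-b.c': True, 'café': False}
```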
Other hints:
\w contains the underscore and by default matches only ASCII letters/digits. If you care about Unicode, either use the UNICODE_CHARACTER_CLASS flag to make \w Unicode-aware, or use the Unicode properties \p{L} and \p{Nd} to match Unicode letters and digits.
You don't need to escape the dot in a character class.
You have \w+ in your character class; are you aware that this just adds the "+" character to the accepted characters?
In case of a dot followed by a space, I suppose this pattern should be neither the first, nor the last in the matched string? You may want to enclose it in word boundaries \b:
([0-9A-Za-z-]|\b\.( \b)?)+
I deliberately did not use \w, to exclude underscores.
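The same kind of Python check for this pattern (sample strings are made up). Note that a trailing dot is still accepted; only a space after it at the very end is rejected:

```python
import re

# [0-9A-Za-z-] instead of \w keeps the underscore out; the \b before the dot stops
# it from starting the string, and the \b after the space stops a space from ending it
BOUNDED_DOT = re.compile(r"(?:[0-9A-Za-z-]|\b\.( \b)?)+")

checks = {s: BOUNDED_DOT.fullmatch(s) is not None
          for s in ("foo. bar", "foo.", "foo. ", ".foo", "a_b")}
print(checks)
# {'foo. bar': True, 'foo.': True, 'foo. ': False, '.foo': False, 'a_b': False}
```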
For allowing ONLY a single space after the dot you can use this regex:
^(?!.*?\. {2})[\w. -]+$
You don't need to escape a dot inside a character class, nor a hyphen placed at its start or end.
(?!.*?\. {2}) is a negative lookahead that disallows 2 or more spaces after a dot.
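Since [\w.-] contains no space, a string with a space can never match it, and the lookahead only matters once a literal space is in the class, as in [\w. -]. A small Python comparison of both classes (helper name and inputs are made up); note that [\w. -] also accepts a space that does not follow a dot:

```python
import re

def accepts(char_class, s):
    # Negative lookahead from the answer: no dot followed by two spaces anywhere
    return re.fullmatch(r"(?!.*?\. {2})" + char_class + "+", s) is not None

for s in ("foo. bar", "foo.  bar", "foo bar"):
    print(repr(s), accepts(r"[\w.-]", s), accepts(r"[\w. -]", s))
# 'foo. bar' False True
# 'foo.  bar' False False
# 'foo bar' False True
```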

RegEx Lowercase Letters and Hyphen

Can someone help me write a regex that matches only all lower case letters plus hyphens.
Example: this-page-name
Mike Clark's pattern [a-z\-]+ would match -start-dash-double-dash---and-end-dash-
Maybe ^[a-z]+(-[a-z]+)*$ is a little more precise.
This will catch 1 or more characters that are either lowercase a-z or the hyphen
[a-z\-]+
The trick is to escape the hyphen with a backslash.
For completeness, you can add an appropriate boundary such as \b on each end to signify a full word match, or ^ and $ to make it match a full line.
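A Python comparison of the two patterns, using re.fullmatch in place of ^ and $ (inputs taken from the answers plus one made up):

```python
import re

LOOSE = r"[a-z\-]+"           # any mix of lowercase letters and hyphens
STRICT = r"[a-z]+(-[a-z]+)*"  # lowercase words joined by single hyphens

results = {
    s: (re.fullmatch(LOOSE, s) is not None, re.fullmatch(STRICT, s) is not None)
    for s in ("this-page-name", "-start-dash-double-dash---and-end-dash-", "this--page")
}
for s, (loose, strict) in results.items():
    print(repr(s), loose, strict)
```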