Regex to match spaces and apostrophes

I need a Regex that matches all instances of any character that is not a-z (space and things like apostrophes need to be selected). Sorry for the noob factor.
//novice

With a somewhat sophisticated regex engine (grep will do just fine) this will be quite general:
/[^[:lower:]]+/
(Note the ^!)
The difference between [:lower:] and [a-z] is that the former should be I18N friendly and match e.g. ü, â etc.
For case insensitive matching use [:alpha:], to also include digits use [:alnum:]. [:alnum:] differs from \w in that it doesn't include _ (underscore).
Note that character classes written in this style may be combined as usual (like a-z etc.), e.g. [^[:lower:][:digit:]]+ would match a non-empty string of characters not including any lowercase letters or digits.
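To see the I18N difference in practice, here is a small Python sketch. Python's built-in re module does not support POSIX bracket classes such as [:lower:], so str.islower() is used as a rough stand-in (an assumption for illustration, not an exact equivalent); the sample string and the helper name non_lowercase_runs are made up.

```python
import re
from itertools import groupby

def non_lowercase_runs(text):
    """Return runs of characters for which str.islower() is false."""
    return ["".join(chars)
            for is_lower, chars in groupby(text, key=str.islower)
            if not is_lower]

# "gr\u00fc\u00dfe" is "gruesse" written with u-umlaut and sharp s.
sample = "gr\u00fc\u00dfe, world"

# ASCII-only class: the two non-ASCII lowercase letters count as "not a-z".
ascii_runs = re.findall(r"[^a-z]+", sample)

# Unicode-aware stand-in for [^[:lower:]]+: only the comma and space remain.
unicode_runs = non_lowercase_runs(sample)

print(ascii_runs)
print(unicode_runs)
```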

Here is a regex that will literally match any char that is not a-z. The /g flag indicates a global match, which covers all instances of the match.
/[^a-z]+/g
If you need uppercase letters too, you can either pass the /i flag which indicates case insensitivity:
/[^a-z]+/gi
or include the uppercase chars in character class:
/[^a-zA-Z]+/g
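A minimal Python sketch of the same three variants, assuming re.findall as the counterpart of the /g flag and re.IGNORECASE as the counterpart of /i; the sample text is made up.

```python
import re

text = "Don't Stop"

# Case-sensitive: uppercase letters are "not a-z", so they are matched too.
lower_only = re.findall(r"[^a-z]+", text)

# re.IGNORECASE plays the role of the /i flag.
ignore_case = re.findall(r"[^a-z]+", text, flags=re.IGNORECASE)

# Listing both ranges explicitly gives the same result as the flag here.
both_ranges = re.findall(r"[^a-zA-Z]+", text)

# Replacing every match, e.g. to strip the unwanted characters.
stripped = re.sub(r"[^a-zA-Z]+", "", text)

print(lower_only, ignore_case, both_ranges, stripped)
```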

The character class [^a-zA-Z] will match any character that isn't (upper or lowercase) a-z.
I'm sure you can figure out the rest.

\W will match any non-word character, i.e. anything other than letters, digits, and underscore.

The following regular expression matches any character other than a-z:
/[^a-z]+/

OK.
/[^a-z]+/ will match anything other than lowercase letters.
/[^A-Za-z]+/ will match anything non-alpha.
/\W+/ on most systems will match non-'word' characters. Word characters include A-Z, a-z, 0-9, and '_' (underscore). Note that that is an uppercase W.
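The difference between [^A-Za-z] and \W shows up with digits and underscores. A short Python sketch with made-up sample strings; note that in Python 3 \w and \W are Unicode-aware for str patterns unless re.ASCII is passed, which is one reason for the "on most systems" caveat.

```python
import re

text = "it's a_b 42"

# Anything that is not an ASCII letter: apostrophe, spaces, underscore, digits.
non_alpha = re.findall(r"[^A-Za-z]+", text)

# Non-word characters only: underscore and digits are word characters.
non_word = re.findall(r"\W+", text)

# "na\u00efve caf\u00e9" contains two accented letters.
accented = "na\u00efve caf\u00e9"
non_word_unicode = re.findall(r"\W+", accented)               # accents count as letters
non_word_ascii = re.findall(r"\W+", accented, flags=re.ASCII)  # accents count as non-word

print(non_alpha, non_word, non_word_unicode, non_word_ascii)
```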

If you ever need to create another regex try reading this. Teaching to fish and all that. :)

Related

Unicode regex that do not match any letter nor any punctuation sign

I am trying to create a unicode regex that matches every character except for a letter (of any language) and the punctuation signs .;:?!.
So for example the string
abcd 123 kjd ¤%/(" .?:!
should only match the bold parts below
abcd 123 kjd ¤%/(" .?:!
I know that \P{L}+ matches everything except a letter and \P{P}+ matches everything except a punctuation sign. How do I combine these two regexes into one? I have tried simply putting them together as \P{L}+\P{P}+ but this does not give the required match. I have also tried writing [^.;:?!]\P{L}+ but this does not work either.
How do I combine two or more Unicode regexes, or is there a better regex that meets my requirement?

Using \P{L}+\P{P}+ will match one or more non-letter characters followed by one or more non-punctuation characters.
The pattern [^.;:?!]\P{L}+ matches a single character other than the listed ones, followed by one or more non-letter characters.
What you could do is add \p{L} (which matches any kind of letter) to the negated character class. As advised by Wiktor Stribiżew, you can also add \p{Z} (any kind of whitespace or separator) so that whitespace is excluded as well.
[^\p{Z}\p{L}.;:?!]
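Python's built-in re module has no \p{...} support, so the following sketch emulates the negated class with unicodedata instead of a regex (the third-party regex package does accept \p{L}, but that is not used here). The helper names is_wanted and find_runs are hypothetical, and runs of consecutive matching characters are grouped as if the class were followed by +.

```python
import unicodedata
from itertools import groupby

EXCLUDED_PUNCTUATION = set(".;:?!")

def is_wanted(ch):
    """True if ch is not a letter (L*), not a separator (Z*), and not in .;:?!"""
    category = unicodedata.category(ch)
    return category[0] not in "LZ" and ch not in EXCLUDED_PUNCTUATION

def find_runs(text):
    """Return maximal runs of wanted characters."""
    return ["".join(chars)
            for wanted, chars in groupby(text, key=is_wanted)
            if wanted]

# \u00a4 is the currency sign from the example string in the question.
matches = find_runs('abcd 123 kjd \u00a4%/(" .?:!')
print(matches)
```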

Regular expression to match alphanumeric, hyphen, underscore and space string

I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it also matches strings with a space, underscore or hyphen at the start or end; those should only be allowed in between.

Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
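Also watch the hyphen in the middle of your character class: a hyphen between two characters forms a range, e.g. [9-_] means "every char between 9 and _ inclusive". Many engines read a hyphen that directly follows a complete range such as 0-9 as a literal, but that is easy to misread and not worth relying on.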
If you want a literal dash in a character class, put it first or last or escape it.
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.
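A quick way to check the pattern is to run it against a few inputs. This is a minimal Python sketch; the helper is_valid and the sample lists are made up, and Python's \w is Unicode-aware by default, so non-ASCII letters would also be accepted in the middle.

```python
import re

# The pattern from the answer: a letter, optionally followed by any mix of
# word characters, spaces and hyphens that ends in another letter.
PATTERN = re.compile(r"^[a-zA-Z]([\w -]*[a-zA-Z])?$")

def is_valid(s):
    return PATTERN.match(s) is not None

accepted = [s for s in ["abc", "abc def", "ab_cd", "ab-cd", "a"] if is_valid(s)]
rejected = [s for s in [" abc", "abc-", "_abc", "abc123", ""] if not is_valid(s)]

# A hyphen between two characters is a range: [9-_] covers code points
# 0x39 through 0x5F, which includes '=' and '@' but not '-' itself.
range_hits = [c for c in "=@-" if re.fullmatch(r"[9-_]", c)]

print(accepted, rejected, range_hits)
```

Note that abc123 is rejected, in line with the stated rule that the last character must be a letter.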

Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$

I tried using following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.

As per your requirement of including space, hyphen, underscore and alphanumeric characters, you can use the \w shorthand character set for [a-zA-Z0-9_]. Escape the hyphen using \- as it is usually used for character ranges inside a character set.
To exclude the space and hyphen at the beginning and end I have used [^\s\-].
So the complete regex becomes [^\s\-][\w \-]+[^\s\-]

You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
This will only allow alphanumerics at start and end.

vim regex matching with Uppercase letters but NOT underscore

In vim regex syntax, I am trying to match all words starting with an uppercase letter, and not starting with an underscore
\\([A-Z][a-z_][A-Za-z_]\\+\\)
This is what I have until now.
I want something like this:
\\([A-Z^\_][a-z_][A-Za-z_]\\+\\)
Where [A-Z^\_] denotes that it should match with all uppercase chars, but not underscore.
Any help would be greatly appreciated. Thanks in advance.
Edit: My question was worded poorly. I want the first set to match an uppercase char which does not have an underscore in front of it. Sorry.

[A-Z] already does not include underscores; I guess you want to match whole words, so you don't want your regular expression to match inside a word. Vim has built-in \< and \> (like \b in other regular expression dialects, see @npinti's answer) for keyword boundaries; as lower/uppercase and underscore characters are usually keyword characters, wrapping your pattern with those should already be close enough:
\<\([A-Z][a-z_][A-Za-z_]\+\)\>
To strictly assert no underscore before your match (but allow any other keyword or non-keyword characters there), you'd need a negative lookbehind: \@<! means is not preceded by:
_\@<!\([A-Z][a-z_][A-Za-z_]\+\)
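For readers more used to Perl-style syntax, the two variants can be sketched in Python, where \b is only a rough counterpart of Vim's keyword boundaries and (?<!_) is the negative lookbehind. The sample text is made up.

```python
import re

text = "Foo _Bar xBaz"

# Rough counterpart of the word-boundary version: the match must be a whole word.
whole_words = re.findall(r"\b[A-Z][a-z_][A-Za-z_]+\b", text)

# Rough counterpart of the lookbehind version: only an underscore directly
# before the match is forbidden, so a match inside "xBaz" is still allowed.
no_underscore_before = re.findall(r"(?<!_)[A-Z][a-z_][A-Za-z_]+", text)

print(whole_words, no_underscore_before)
```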

Where [A-Z^\_] denotes that it should match with all uppercase chars, but not underscore.
[A-Z] already matches all uppercase chars and excludes the underscore. However, in your first attempt you require the second character to be a lowercase letter or an underscore ([a-z_]). If I stick to your definition:
all words with starting uppercase, and not starting underscore
Then [A-Z][A-Za-z_]\+ should work.

Regular expression with a set with a character followed by a character

I'm writing a regular expression in Java for capturing some words without spaces.
The word can contain only letters, numbers, hyphens and dots.
The character set [\w+\-\\.] works well.
Now I want to edit the set to allow a single space after the dot.
How do I have to edit my regular expression?

You can add an alternation that matches this additional requirement
([\w\-.]|(?<=\.) )+
(?<=\.) is a lookbehind assertion. It ensures that space is only matched, if it is preceded by a dot.
Other hints:
\w contains the underscore and matches per default only ASCII letters/digits. If you care about Unicode, use either the modifier UNICODE_CHARACTER_CLASS to enable Unicode for \w or use the Unicode properties \p{L} and \p{Nd} to match Unicode letters and digits.
You don't need to escape the dot in a character class.
You have \w+ in your character class; are you aware that this just adds the "+" character to the accepted characters?
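The alternation and the hint about + can be checked with a short sketch. It is written in Python rather than Java (the lookbehind syntax is the same in both), uses a non-capturing group, and the helper is_word and the sample inputs are made up.

```python
import re

# Same alternation as in the answer: a word/hyphen/dot character,
# or a space that is directly preceded by a dot.
PATTERN = re.compile(r"(?:[\w\-.]|(?<=\.) )+")

def is_word(s):
    return PATTERN.fullmatch(s) is not None

samples = ["foo.bar", "foo. bar", "foo bar", "foo.  bar", " foo"]
results = {s: is_word(s) for s in samples}

# Inside a character class, + is a literal plus sign, not a quantifier.
plus_is_literal = re.fullmatch(r"[\w+]", "+") is not None

print(results, plus_is_literal)
```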

In case of a dot followed by a space, I suppose this pattern should be neither the first, nor the last in the matched string? You may want to enclose it in word boundaries \b:
([0-9A-Za-z-]|\b\.( \b)?)+
I deliberately did not use \w, to exclude underscores.

For allowing ONLY a single space after the dot you can use this regex:
^(?!.*?\. {2})[\w.-]+$
You don't need to escape the dot inside a character class, nor the hyphen when it is placed first or last
(?!.*?\. {2}) is a negative lookahead that disallows 2 or more spaces after a dot

RegEx Lowercase Letters and Hyphen

Can someone help me write a regex that matches only all lower case letters plus hyphens.
Example: this-page-name

Mike Clark's pattern [a-z\-]+ would match -start-dash-double-dash---and-end-dash-
Maybe ^[a-z]+(-[a-z]+)*$ is a little bit more precise.

This will catch 1 or more characters that are either lowercase a-z or the hyphen
[a-z\-]+
The trick is to escape the hyphen with a backslash.
For completeness, you can add an appropriate boundary such as \b on each end to signify a full word match, or ^ and $ to make it match a full line.
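To compare the two patterns from these answers side by side, here is a small Python sketch. re.fullmatch stands in for the ^ and $ anchors, and the sample strings are made up.

```python
import re

LOOSE = re.compile(r"[a-z\-]+")           # any mix of lowercase letters and hyphens
STRICT = re.compile(r"[a-z]+(-[a-z]+)*")  # hyphens only between groups of letters

samples = ["this-page-name", "-start-dash", "double--dash", "end-dash-", "name"]

loose_ok = [s for s in samples if LOOSE.fullmatch(s)]
strict_ok = [s for s in samples if STRICT.fullmatch(s)]

print(loose_ok)
print(strict_ok)
```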
