vim regex matching with Uppercase letters but NOT underscore - regex

In Vim regex syntax, I am trying to match all words that start with an uppercase letter and do not start with an underscore.
\([A-Z][a-z_][A-Za-z_]\+\)
This is what I have so far.
I want something like this:
\([A-Z^\_][a-z_][A-Za-z_]\+\)
Where [A-Z^\_] denotes that it should match all uppercase chars, but not underscore.
Any help would be greatly appreciated. Thanks in advance.
Edit: My question was worded poorly. I want the first set to match an uppercase char that does not have an underscore in front of it. Sorry.

[A-Z] already does not include underscores; I guess you want to match whole words, so you don't want your regular expression to match inside a word. Vim has built-in \< and \> (like \b in other regular expression dialects, see @npinti's answer) for keyword boundaries; as lower/uppercase and underscore characters are usually keyword characters, wrapping your pattern with those should already be close enough:
\<\([A-Z][a-z_][A-Za-z_]\+\)\>
To strictly assert no underscore before your match (but allow any other keyword or non-keyword characters there), you'd need a negative lookbehind: \@<! means "is not preceded by":
_\@<!\([A-Z][a-z_][A-Za-z_]\+\)
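To see how the two variants differ outside Vim, here is a minimal sketch in Python, where the same ideas are spelled \b (word boundary) and (?<!_) (negative lookbehind). The sample text is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample text, chosen to exercise both patterns.
text = "Foo_bar _Baz xQux Ab"

# Word-boundary version: the match must cover a whole word.
bounded = re.findall(r"\b[A-Z][a-z_][A-Za-z_]+\b", text)

# Lookbehind version: only forbids an underscore right before the match,
# so it may still start in the middle of a word (the "Qux" in "xQux").
not_after_underscore = re.findall(r"(?<!_)[A-Z][a-z_][A-Za-z_]+", text)

print(bounded)               # ['Foo_bar']
print(not_after_underscore)  # ['Foo_bar', 'Qux']
```

"Ab" is skipped by both patterns because they require at least three characters.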

Where [A-Z^\_] denotes that it should match with all uppercase chars, but not underscore.
[A-Z] already matches with all uppercase chars excluding underscore. However in your first solution, you request that the second letter be lowercase or underscore ([a-z_]). If I stick to your definition:
all words with starting uppercase, and not starting underscore
Then [A-Z][A-Za-z_]+ should work.

Related

Unicode regex that do not match any letter nor any punctuation sign

I am trying to create a unicode regex that matches every character except for a letter (of any language) and the punctuation signs .;:?!.
So for example the string
abcd 123 kjd ¤%/(" .?:!
should only match the bold parts below
abcd 123 kjd ¤%/(" .?:!
I know that \P{L}+ matches everything except a letter and \P{P}+ matches everything except a punctuation sign. How do I combine these two regexes into one? I have tried simply putting them together as \P{L}+\P{P}+ but this does not give the required match. I have also tried writing [^.;:?!]\P{L}+ but this does not work either.
How do I combine one or more unicode regex or is there a better regex that achieves my requirement?
Using \P{L}+\P{P}+ will match 1+ times the opposite of any letter followed by 1+ times the opposite of any punctuation mark.
The pattern [^.;:?!]\P{L}+ matches 1 time any character other than the listed followed by 1+ times the opposite of any letter.
What you could do is add \p{L} (which will match any kind of letter) to the negated character class. As advised by Wiktor Stribiżew, you can add \p{Z} to match any kind of whitespace.
[^\p{Z}\p{L}.;:?!]
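Python's built-in re module has no \p{...} classes, so the following is only a sketch that emulates [^\p{Z}\p{L}.;:?!]+ with the unicodedata module instead of a regex. The helper names is_wanted and wanted_runs are made up for this illustration.

```python
import unicodedata
from itertools import groupby

EXCLUDED_PUNCT = set(".;:?!")

def is_wanted(ch):
    """True for characters the class [^\\p{Z}\\p{L}.;:?!] would match."""
    category = unicodedata.category(ch)  # e.g. 'Lu', 'Zs', 'Nd', 'Sc'
    is_letter = category.startswith("L")
    is_separator = category.startswith("Z")
    return not (is_letter or is_separator or ch in EXCLUDED_PUNCT)

def wanted_runs(text):
    """Return the maximal runs of wanted characters, like the class with a + quantifier."""
    return ["".join(group) for keep, group in groupby(text, key=is_wanted) if keep]

# \u00a4 is the currency sign from the question's sample string.
runs = wanted_runs('abcd 123 kjd \u00a4%/(" .?:!')
print(runs)
```

For the question's sample string this yields the digit run and the symbol run, and nothing else.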

Capitalized Words with Regular Expression

I'm new to RegEx and I'm looking for a way to match sentences where the first letter is capitalized and the rest is in lowercase.
I've tried a couple of things (IF statements included), but just can't seem to get it.
This is my last version:
(([A-Z])([a-z]+\s|[a-z]+))+
I thought it worked at first, but is now accepting capitalized letters in the middle of the word.
The Output Would Be Like This (Each Word Capitalized).
Thanks!!
The expression accepts capital letters in the middle of the word because now the spaces between words are optional, and words can run into each other.
You can take a more structured approach: a sentence must have at least one word. That's
[A-Z][a-z]*
After that initial word you can get any number of more words, each preceded by whitespace. So in total:
[A-Z][a-z]*(\s[A-Z][a-z]*)*
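A small sketch of that structured pattern in Python, assuming the whole string has to match (hence fullmatch). The function name is_capitalized_sentence is invented for the example.

```python
import re

# One capitalized word, then any number of whitespace-separated capitalized words.
SENTENCE = re.compile(r"[A-Z][a-z]*(\s[A-Z][a-z]*)*")

def is_capitalized_sentence(s):
    # fullmatch anchors the pattern at both ends of the string.
    return SENTENCE.fullmatch(s) is not None

print(is_capitalized_sentence("The Output Would Be Like This"))  # True
print(is_capitalized_sentence("The OutPut"))                     # False
print(is_capitalized_sentence("TheOutput"))                      # False
```

Because the whitespace is now mandatory between words, a capital letter can no longer start a "new word" in the middle of another one.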
To match whole strings that start with an uppercase letter and then have no uppercase letters use
^[A-Z][^A-Z]*$
See the regex demo. ^ matches the start of string, [A-Z] matches the uppercase letters, [^A-Z]* matches 0 or more chars other than uppercase letters and $ matches the end of string.
To match capitalized words, you may use
\b[A-Z][a-zA-Z]*\b
where \b stands for word boundaries. See the regex demo.
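Both patterns can be tried quickly in Python, whose re module supports ^, $ and \b as described. This is only a sketch with made-up sample strings.

```python
import re

# Whole string: one uppercase letter, then no further uppercase letters.
WHOLE_STRING = re.compile(r"^[A-Z][^A-Z]*$")

# Individual capitalized words anywhere in a text.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-zA-Z]*\b")

first_only = WHOLE_STRING.match("Hello world") is not None   # True
second_cap = WHOLE_STRING.match("Hello World") is not None   # False

words = CAPITALIZED_WORD.findall("Alice met bob in New York")
print(first_only, second_cap, words)
```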
In various regex flavors, there are other ways to match word boundaries:
bash, r (TRE, base R): \<[A-Z][a-zA-Z]*\>
postgresql, tcl: \m[A-Z][a-zA-Z]*\M or \y[A-Z][a-zA-Z]*\y
bash, mysql (MySQL versions before 8): [[:<:]][A-Z][a-zA-Z]*[[:>:]]
Also, you may consider using [[:upper:]] or \p{Lu} instead of [A-Z] and [[:alpha:]] or \p{L} instead of [a-zA-Z] to match any Unicode uppercase letters or any letters correspondingly.

Regular expression to match alphanumeric, hyphen, underscore and space string

I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it matches with space, underscore or hyphen at the start/end, but it should only allow in between.
Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
There is a potential bug in your regex: you have the hyphen in the middle of your character class, where it can be read as a range operator, i.e. [9-_] means "every char between 9 and _ inclusive".
If you want a literal dash in a character class, put it first or last or escape it.
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.
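Here is a minimal Python sketch of the pattern above, plus a check of what a hyphen range like [9-_] actually covers. The re.ASCII flag is my addition so that \w means exactly [a-zA-Z0-9_]; the sample strings beyond those in the question are made up.

```python
import re

# re.ASCII keeps \w equal to [a-zA-Z0-9_], as the answer describes it.
LETTER_EDGES = re.compile(r"^[a-zA-Z]([\w -]*[a-zA-Z])?$", re.ASCII)

def is_valid(s):
    return LETTER_EDGES.match(s) is not None

accepted = [s for s in ["a", "abc", "abc def", "ab_cd", "ab-cd"] if is_valid(s)]
rejected = [s for s in ["-abc", "abc_", " abc", "abc123", ""] if not is_valid(s)]

# A hyphen between two characters is a range: '9' (0x39) to '_' (0x5F) includes 'A'.
range_hits_letter = re.fullmatch(r"[9-_]", "A") is not None
# With the hyphen last it is literal, so only '9', '_' and '-' can match.
literal_hits_letter = re.fullmatch(r"[9_-]", "A") is not None

print(accepted, rejected, range_hits_letter, literal_hits_letter)
```

Note that abc123 is rejected here because it ends in a digit, which follows the "first and last characters must be letters" rule even though the question also lists it as an example.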
Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$
I tried using the following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.
More details
As per your requirement of including space, hyphen, underscore and alphanumeric characters, you can use the \w shorthand character class for [a-zA-Z0-9_]. Escape the hyphen using \- as it is usually used for character ranges inside a character class.
To negate the space and hyphen at the beginning and end I have used [^\s\-].
So complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.
You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
This will only allow alphanumerics at start and end.
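A quick Python check of this alphanumeric-edges variant, again with re.ASCII added on my side so \w stays [a-zA-Z0-9_]. The helper name alnum_edges_ok is hypothetical.

```python
import re

ALNUM_EDGES = re.compile(r"^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$", re.ASCII)

def alnum_edges_ok(s):
    return ALNUM_EDGES.match(s) is not None

# Every example from the question passes, including the one ending in digits.
question_examples = ["abc", "abc def", "abc123", "ab_cd", "ab-cd"]
all_pass = all(alnum_edges_ok(s) for s in question_examples)

# Separators at either edge are refused.
edge_cases = {s: alnum_edges_ok(s) for s in ["_abc", "abc-", " abc", "abc "]}
print(all_pass, edge_cases)
```

Unlike the letters-only pattern, this one also accepts a digit as the first character.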

RegEx Lowercase Letters and Hyphen

Can someone help me write a regex that matches only all lower case letters plus hyphens.
Example: this-page-name
Mike Clark's pattern [a-z\-]+ would match -start-dash-double-dash---and-end-dash-
Maybe ^[a-z]+(-[a-z]+)*$ is a little bit more precise.
This will catch 1 or more characters that are either lowercase a-z or the hyphen
[a-z\-]+
The trick is to escape the hyphen with a backslash.
For completeness, you can add an appropriate boundary such as \b on each end to signify a full word match, or ^ and $ to make it match a full line.
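The difference between the loose and the stricter pattern is easy to check. A short sketch in Python, using the two sample strings from this thread:

```python
import re

LOOSE = re.compile(r"[a-z\-]+")             # any mix of lowercase letters and hyphens
STRICT = re.compile(r"^[a-z]+(-[a-z]+)*$")  # words separated by single hyphens

slug = "this-page-name"
messy = "-start-dash-double-dash---and-end-dash-"

loose_results = (LOOSE.fullmatch(slug) is not None, LOOSE.fullmatch(messy) is not None)
strict_results = (STRICT.match(slug) is not None, STRICT.match(messy) is not None)

print(loose_results)   # (True, True)
print(strict_results)  # (True, False)
```

fullmatch plays the role of the ^ and $ anchors for the loose pattern here.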

Regex to match spaces and apostrophes

I need a Regex that matches all instances of any character that is not a-z (space and things like apostrophes need to be selected). Sorry for the noob factor.
//novice
With a somewhat sophisticated regex engine (grep will do just fine) this will be quite general:
/[^[:lower:]]+/
(Note the ^!)
The difference between [:lower:] and [a-z] is that the former should be I18N friendly and match e.g. ü, â etc.
For case insensitive matching use [:alpha:], to also include digits use [:alnum:]. [:alnum:] differs from \w in that it doesn't include _ (underscore).
Note that character classes written in this style may be combined as usual (like a-z etc.), e.g. [^[:lower:][:digit:]]+ would match a non-empty string of characters not including any lowercase letters or digits.
Here is a regex that will literally match any char that is not a-z. The /g flag indicates a global match which will cover all instances of the match.
/[^a-z]+/g
If you need uppercase letters too, you can either pass the /i flag which indicates case insensitivity:
/[^a-z]+/gi
or include the uppercase chars in character class:
/[^a-zA-Z]+/g
The character class [^a-zA-Z] will match any character that isn't (upper or lowercase) a-z.
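The patterns above use JavaScript-style /.../ literals. As a rough equivalent in Python, re.findall returns all matches (like /g) and re.IGNORECASE stands in for /i. The sample string is made up.

```python
import re

text = "It's a Test"

# All runs of characters that are not lowercase a-z (like /[^a-z]+/g).
not_lower = re.findall(r"[^a-z]+", text)

# Case-insensitive variant (like /[^a-z]+/gi).
not_alpha_flag = re.findall(r"[^a-z]+", text, re.IGNORECASE)

# Explicit class with both cases (like /[^a-zA-Z]+/g).
not_alpha_class = re.findall(r"[^a-zA-Z]+", text)

print(not_lower)        # ['I', "'", ' ', ' T']
print(not_alpha_flag)   # ["'", ' ', ' ']
print(not_alpha_class)  # ["'", ' ', ' ']
```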
I'm sure you can figure out the rest.
\W will match any non-word character, i.e. anything other than letters, digits, and underscore.
The following regular expression matches any character other than [a-z]:
/[^a-z]+/
OK.
/[^a-z]+/ will match anything other than lowercase letters.
/[^A-Za-z]+/ will match anything non-alpha.
/\W+/ on most systems will match non-'word' characters. Word characters include A-Z, a-z, 0-9, and '_' (underscore). Note that that is an uppercase W.
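To compare the three patterns side by side, here is a small Python sketch. The sample string is invented so that it contains an underscore, digits and punctuation.

```python
import re

text = "snake_case, CamelCase 42!"

not_lowercase = re.findall(r"[^a-z]+", text)  # uppercase letters count as matches
not_alpha = re.findall(r"[^A-Za-z]+", text)   # digits and underscore count as matches
not_word = re.findall(r"\W+", text)           # digits and underscore are word characters

print(not_lowercase)  # ['_', ', C', 'C', ' 42!']
print(not_alpha)      # ['_', ', ', ' 42!']
print(not_word)       # [', ', ' ', '!']
```

The underscore and the digits show the difference: they are matched by the negated letter classes but not by \W.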
If you ever need to create another regex try reading this. Teaching to fish and all that. :)