RegEx Lowercase Letters and Hyphen - regex

Can someone help me write a regex that matches only lowercase letters and hyphens?
Example: this-page-name

Mike Clark's pattern [a-z\-]+ would match -start-dash-double-dash---and-end-dash-
Maybe ^[a-z]+(-[a-z]+)*$ is a little bit more precise.

This will catch 1 or more characters that are either lowercase a-z or the hyphen
[a-z\-]+
The trick is to escape the hyphen with a backslash.
For completeness, you can add an appropriate boundary such as \b on each end to signify a full word match, or ^ and $ to make it match a full line.
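To compare the two patterns from the answers above, here is a minimal Python sketch. The constant and function names are made up for illustration, and fullmatch is used so that both patterns have to cover the whole string.

```python
import re

# Loose pattern: any mix of lowercase letters and hyphens.
LOOSE = re.compile(r"[a-z\-]+")
# Stricter pattern: lowercase words joined by single hyphens.
STRICT = re.compile(r"^[a-z]+(-[a-z]+)*$")


def is_loose_slug(text):
    # fullmatch requires the pattern to span the entire string.
    return LOOSE.fullmatch(text) is not None


def is_strict_slug(text):
    return STRICT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["this-page-name", "-start-dash-double-dash---and-end-dash-"]:
        print(candidate, is_loose_slug(candidate), is_strict_slug(candidate))
```

Both accept this-page-name, but only the loose pattern accepts leading, trailing, or repeated hyphens.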

Related

Regex match pattern, space and character

^([a-zA-Z0-9_-]+)$ matches:
BAP-78810
BAP-148080
But does not match:
B8241066 C
Q2111999 A
Q2111999 B
How can I modify regex pattern to match any space and/or special character?
For the example data, you can write the pattern as:
^[a-zA-Z0-9_-]+(?: [A-Z])?$
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ chars listed in the character class
(?: [A-Z])? Optionally match a space and a char A-Z
$ End of string
Or a more exact match:
^[A-Z]+-?\d+(?: [A-Z])?$
^ Start of string
[A-Z]+-? Match 1+ chars A-Z and optional -
\d+(?: [A-Z])? Match 1+ digits and optional space and char A-Z
$ End of string
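As a quick check, the two patterns can be run against the example data from the question. This is only a sketch in Python; the names GENERAL, EXACT, SAMPLES and matching are illustrative.

```python
import re

# General pattern: the original character class plus an optional " X" suffix.
GENERAL = re.compile(r"^[a-zA-Z0-9_-]+(?: [A-Z])?$")
# More exact pattern: uppercase letters, optional hyphen, digits, optional " X".
EXACT = re.compile(r"^[A-Z]+-?\d+(?: [A-Z])?$")

SAMPLES = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"]


def matching(pattern, values):
    # Keep only the values the pattern accepts.
    return [value for value in values if pattern.search(value)]


if __name__ == "__main__":
    print(matching(GENERAL, SAMPLES))
    print(matching(EXACT, SAMPLES))
```

Both patterns accept all five sample values; the exact one additionally rejects lowercase input such as bap-78810.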
Whenever you want to match something that can be either a space or a special character, you can use the dot symbol (.). Your regex pattern would then be modified to:
^([a-zA-Z0-9_-])+.$
This will match a space, or any other character, at the end. If you want to match the example provided, where exactly one alphanumeric character follows the space, you could include \w such that:
^([a-zA-Z0-9_-])+.\w$
Note that \w is equivalent to [A-Za-z0-9_]
Further, be careful when you use . as it makes your pattern less specific and therefore more prone to false positives.
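A small Python sketch of how the two dot-based patterns behave, including the kind of false positive mentioned above. The names DOT_END and DOT_WORD and the extra test strings are assumptions for illustration.

```python
import re

# Character class, then any single character at the end.
DOT_END = re.compile(r"^([a-zA-Z0-9_-])+.$")
# Character class, then any single character, then one word character.
DOT_WORD = re.compile(r"^([a-zA-Z0-9_-])+.\w$")


def accepts(pattern, text):
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ["B8241066 ", "B8241066 C", "B8241066#C", "BAP-78810"]:
        print(repr(text), accepts(DOT_END, text), accepts(DOT_WORD, text))
```

The first pattern allows only one trailing character of any kind, so it does not accept B8241066 C. The second accepts it, but also accepts B8241066#C because the dot is not limited to a space.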
I suggest using this approach
^[A-Z][A-Z\d -]{6,}$
The first character must be an uppercase letter, followed by at least 6 uppercase letters, digits, spaces or -.
I removed the group because there was only one group and it was the entire regex.
You can also use \w - which includes A-Z,a-z and 0-9, as well as _ (underscore). To make it case-insensitive, without explicitly adding a-z or using \w, you can use a flag - often an i.
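For illustration, here is the suggested pattern in Python, with and without the case-insensitive flag. The constant names and the lowercase test string are assumptions, not part of the original data.

```python
import re

# One uppercase letter, then at least 6 uppercase letters, digits, spaces or hyphens.
STRICT_UPPER = re.compile(r"^[A-Z][A-Z\d -]{6,}$")
# Same pattern, but re.IGNORECASE lets a-z match as well.
ANY_CASE = re.compile(r"^[A-Z][A-Z\d -]{6,}$", re.IGNORECASE)


def accepts(pattern, text):
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ["BAP-78810", "Q2111999 A", "bap-78810", "B12345"]:
        print(text, accepts(STRICT_UPPER, text), accepts(ANY_CASE, text))
```

B12345 is rejected by both because only five characters follow the first letter.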

Unicode regex that do not match any letter nor any punctuation sign

I am trying to create a unicode regex that matches every character except for a letter (of any language) and the punctuation signs .;:?!.
So for example the string
abcd 123 kjd ¤%/(" .?:!
should only match the parts that are neither letters nor one of the listed punctuation signs (highlighted in bold in the original post):
abcd 123 kjd ¤%/(" .?:!
I know that \P{L}+ matches everything except a letter and \P{P}+ matches everything except a punctuation sign. How do I combine these two regex strings into one? I have tried simply putting them together as \P{L}+\P{P}+ but this does not give the required match. I have also tried writing [^.;:?!]\P{L}+ but this does not work either.
How do I combine one or more unicode regex or is there a better regex that achieves my requirement?
Using \P{L}+\P{P}+ will match 1+ times the opposite of any letter followed by 1+ times the opposite of any punctuation mark.
The pattern [^.;:?!]\P{L}+ matches 1 time any character other than the listed followed by 1+ times the opposite of any letter.
What you could do is add \p{L} (which will match any kind of letter) to the negated character class. As advised by Wiktor Stribiżew, you can also add \p{Z} so that any kind of whitespace is excluded as well.
[^\p{Z}\p{L}.;:?!]
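Python's built-in re module does not support \p{...} properties, so the sketch below approximates the negated class with the standard unicodedata module instead of a regex. This is a swapped-in technique for illustration, and the function names are made up.

```python
import unicodedata
from itertools import groupby

EXCLUDED_PUNCTUATION = set(".;:?!")


def is_wanted(ch):
    # General categories starting with "L" are letters, "Z" are separators,
    # which mirrors \p{L} and \p{Z} in the negated class.
    category = unicodedata.category(ch)
    if category.startswith("L") or category.startswith("Z"):
        return False
    return ch not in EXCLUDED_PUNCTUATION


def wanted_runs(text):
    # Group consecutive wanted characters, like applying + to the class.
    return ["".join(group) for keep, group in groupby(text, key=is_wanted) if keep]


if __name__ == "__main__":
    print(wanted_runs('abcd 123 kjd ¤%/(" .?:!'))
```

For the sample string this yields the digit run and the symbol run, and skips letters, spaces and the listed punctuation.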

How to match anything but exclusively small letters?

I am trying to come up with a regex that will allow small letters alongside with other characters but not if there are only small letters.
e.g.
Example # would match
example # would not match
So a simple ^[A-Za-z0-9 ]+$ will not do the trick.
Here is an example of what I want to achieve, the last folder contains a city which is always in small letters, therefore a pattern I want to exclude:
https://regex101.com/r/gP1evZ/2
How can that be achieved in regex for python?
You could use an alternation here:
^(?:[^a-z]+|(?=[^a-z]).+)$
This regex says to match:
^(?: from the start of the string
[^a-z]+ all non lowercase letters
| OR
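(?=[^a-z]) assert that the next character (here the first character of the string) is not a lowercase letter; to allow it anywhere in the string, (?=.*[^a-z]) would be needed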
.+ then match one or more of any type of character
)$ end of the string
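A short Python sketch of how the alternation behaves on a few inputs. The test strings other than Example and example are assumptions added for illustration.

```python
import re

NOT_ALL_LOWER = re.compile(r"^(?:[^a-z]+|(?=[^a-z]).+)$")


def accepts(text):
    return NOT_ALL_LOWER.search(text) is not None


if __name__ == "__main__":
    for text in ["Example", "example", "123 #", "eXample"]:
        print(text, accepts(text))
```

Note that the lookahead sits directly after ^, so it only inspects the first character: eXample is rejected even though it contains an uppercase letter.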
If you want to allow spaces, but the string must not consist only of lowercase chars and spaces and must not be empty:
^(?![a-z ]+$)[A-Za-z0-9 ]*[A-Za-z0-9][A-Za-z0-9 ]*$
Or without the lookahead, match at least an uppercase char or digit
^[A-Za-z0-9 ]*[A-Z0-9][A-Za-z0-9 ]*$
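Both variants can be compared side by side in Python. This is a sketch; the constant names and sample strings are illustrative.

```python
import re

# Variant with a negative lookahead that rejects lowercase-and-space-only strings.
WITH_LOOKAHEAD = re.compile(r"^(?![a-z ]+$)[A-Za-z0-9 ]*[A-Za-z0-9][A-Za-z0-9 ]*$")
# Variant that simply requires at least one uppercase char or digit.
WITHOUT_LOOKAHEAD = re.compile(r"^[A-Za-z0-9 ]*[A-Z0-9][A-Za-z0-9 ]*$")


def accepts(pattern, text):
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ["Example", "example", "new york", "New York 2", "eXample", ""]:
        print(repr(text), accepts(WITH_LOOKAHEAD, text), accepts(WITHOUT_LOOKAHEAD, text))
```

The two patterns agree on all of these inputs; unlike an assertion at the start of the string, they also accept eXample.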
Edit
For the updated data, you could use a negative lookahead (?!.*/[a-z]+/) to assert what is on the right is not only lowercase chars between forward slashes.
^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/).*(prp_[a-z0-9_]+_\d+|cat_48_5030.*)\.html$
Or a bit broader match:
^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/)\S+\.html$
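The broader pattern can be tried in Python as follows. The paths below are hypothetical stand-ins, since the real data lives behind the regex101 link in the question.

```python
import re

BROAD = re.compile(r"^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/)\S+\.html$")

# Hypothetical paths, invented for this sketch.
KEEP = "/hunde/Bayern/prp_abc_123.html"
DROP = "/hunde/Bayern/muenchen/prp_abc_123.html"
OTHER = "/katzen/Bayern/prp_abc_123.html"


def accepts(path):
    return BROAD.search(path) is not None


if __name__ == "__main__":
    for path in [KEEP, DROP, OTHER]:
        print(path, accepts(path))
```

The second path is rejected because /muenchen/ is a lowercase-only folder between two slashes; the third fails on the required prefix.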
Try
^(?![a-z\s]*$)
This should match strings that do not consist only of lowercase characters and whitespace. Remove \s if necessary.
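Because this pattern is only a zero-width assertion, it works as a yes/no test rather than capturing text. A minimal Python sketch, with an illustrative function name:

```python
import re

NOT_ONLY_LOWER = re.compile(r"^(?![a-z\s]*$)")


def has_other_characters(text):
    # The match itself is empty; only success or failure matters.
    return NOT_ONLY_LOWER.match(text) is not None


if __name__ == "__main__":
    for text in ["Example", "example", "two words", "abc1", ""]:
        print(repr(text), has_other_characters(text))
```

Since the lookahead uses *, the empty string is rejected as well.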

vim regex matching with Uppercase letters but NOT underscore

In Vim regex syntax, I am trying to match all words that start with an uppercase letter and not with an underscore
\\([A-Z][a-z_][A-Za-z_]\\+\\)
This is what I have until now.
I want something like this:
\\([A-Z^\_][a-z_][A-Za-z_]\\+\\)
Where [A-Z^\\_] denotes that it should match with all uppercase chars, but not underscore.
Any help would be greatly appreciated. Thanks in advance.
Edit: My question was worded poorly. I want the first set to match an uppercase char that does not have an underscore in front of it. Sorry.
[A-Z] already does not include underscores; I guess you want to match whole words, so you don't want your regular expression to match inside a word. Vim has built-in \< and \> (like \b in other regular expression dialects, see @npinti's answer) for keyword boundaries; as lower/uppercase and underscore characters are usually keyword characters, wrapping your pattern with those should already be close enough:
\<\([A-Z][a-z_][A-Za-z_]\+\)\>
To strictly assert no underscore before your match (but allow any other keyword or non-keyword characters there), you'd need a negative lookbehind: \@<! means "is not preceded by":
_\@<!\([A-Z][a-z_][A-Za-z_]\+\)
Where [A-Z^\_] denotes that it should match with all uppercase chars, but not underscore.
[A-Z] already matches with all uppercase chars excluding underscore. However in your first solution, you request that the second letter be lowercase or underscore ([a-z_]). If I stick to your definition:
all words with starting uppercase, and not starting underscore
Then [A-Z][A-Za-z_]+ should work.
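For readers more used to Perl-style syntax, the two Vim approaches can be approximated in Python. This is a rough translation, not Vim code: Python's \b is only similar to Vim's \< and \>, which depend on the iskeyword option, and the sample text is made up.

```python
import re

# Rough counterpart of wrapping the pattern in \< and \> in Vim.
WHOLE_WORD = re.compile(r"\b[A-Z][a-z_][A-Za-z_]+\b")
# Rough counterpart of Vim's negative lookbehind for an underscore.
NO_UNDERSCORE_BEFORE = re.compile(r"(?<!_)[A-Z][a-z_][A-Za-z_]+")

TEXT = "_Hidden Visible myVar"

if __name__ == "__main__":
    print(WHOLE_WORD.findall(TEXT))
    print(NO_UNDERSCORE_BEFORE.findall(TEXT))
```

The word-boundary version only finds Visible, while the lookbehind version also finds Var inside myVar, because the only thing it forbids before the match is an underscore.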

Regex to match spaces and apostrophes

I need a Regex that matches all instances of any character that is not a-z (space and things like apostrophes need to be selected). Sorry for the noob factor.
//novice
With a somewhat sophisticated regex engine (grep will do just fine) this will be quite general:
/[^[:lower:]]+/
(Note the ^!)
The difference between [:lower:] and [a-z] is that the former should be I18N friendly and match e.g. ü, â etc.
For case insensitive matching use [:alpha:], to also include digits use [:alnum:]. [:alnum:] differs from \w in that it doesn't include _ (underscore).
Note that character classes written in this style may be combined as usual (like a-z etc.), e.g. [^[:lower:][:digit:]]+ would match a non-empty string of characters not including any lowercase letters or digits.
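Python's re module has no POSIX bracket classes, so the sketch below uses str.islower() as a stand-in for [:lower:] to show how it differs from a plain a-z range. The function names and the sample string are assumptions for illustration.

```python
import re
from itertools import groupby


def non_lowercase_runs(text):
    # Stand-in for [^[:lower:]]+ : runs of characters that are not lowercase letters.
    return ["".join(group)
            for keep, group in groupby(text, key=lambda ch: not ch.islower())
            if keep]


def non_ascii_lowercase_runs(text):
    # Plain ASCII range: anything outside a-z counts, including ü.
    return re.findall(r"[^a-z]+", text)


if __name__ == "__main__":
    print(non_lowercase_runs("für Elise"))
    print(non_ascii_lowercase_runs("für Elise"))
```

With the a-z range the ü is reported as a non-lowercase character; with the locale-aware notion of lowercase it is not.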
Here is regex that will literally match any char that is not a-z. The /g flag indicates a global match which will cover all instances of the match.
/[^a-z]+/g
If you need to exclude uppercase letters too, you can either pass the /i flag, which indicates case insensitivity:
/[^a-z]+/gi
or include the uppercase chars in character class:
/[^a-zA-Z]+/g
The character class [^a-zA-Z] will match any character that isn't (upper or lowercase) a-z.
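In Python, the /g behaviour corresponds roughly to re.findall and the /i flag to re.IGNORECASE. A small sketch with a made-up sample string:

```python
import re

TEXT = "Don't Stop"

# Anything that is not a lowercase a-z character.
not_lower = re.findall(r"[^a-z]+", TEXT)
# Case-insensitive: uppercase letters are excluded too.
not_letter_flag = re.findall(r"[^a-z]+", TEXT, re.IGNORECASE)
# Same result with an explicit character class.
not_letter_class = re.findall(r"[^a-zA-Z]+", TEXT)

if __name__ == "__main__":
    print(not_lower)
    print(not_letter_flag)
    print(not_letter_class)
```

Without the flag the uppercase D and S are part of the matches; with the flag or the explicit class only the apostrophe and the space remain.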
I'm sure you can figure out the rest.
\W will match any character that is not a word character (word characters being A-Z, a-z, 0-9, and underscore).
The following regular expression matches any character other than [a-z]:
/[^a-z]+/
OK.
/[^a-z]+/ will match anything other than lowercase letters.
/[^A-Za-z]+/ will match anything non-alpha.
/\W+/ on most systems will match non-'word' characters. Word characters include A-Z, a-z, 0-9, and '_' (underscore). Note that that is an uppercase W.
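What counts as a word character depends on the engine. In Python, for instance, \w is Unicode-aware for str patterns by default, and re.ASCII restricts it to the set listed above. A minimal sketch with an illustrative helper name:

```python
import re


def non_word_runs(text, ascii_only=True):
    # With re.ASCII, word characters are limited to A-Z, a-z, 0-9 and underscore.
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\W+", text, flags)


if __name__ == "__main__":
    print(non_word_runs("it's a_b 42!"))
    print(non_word_runs("über"))
    print(non_word_runs("über", ascii_only=False))
```

The underscore in a_b is never reported, and ü only counts as a non-word character in ASCII mode.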
If you ever need to create another regex try reading this. Teaching to fish and all that. :)