I am trying to come up with a regex that allows lowercase letters alongside other characters, but does not match when the string contains only lowercase letters.
e.g.
Example # would match
example # would not match
So a simple ^[A-Za-z0-9 ]+$ will not do the trick.
Here is an example of what I want to achieve, the last folder contains a city which is always in small letters, therefore a pattern I want to exclude:
https://regex101.com/r/gP1evZ/2
How can that be achieved in regex for python?
You could use an alternation here:
^(?:[^a-z]+|(?=[^a-z]).+)$
This regex says to match:
^(?: from the start of the string
[^a-z]+ all non lowercase letters
| OR
(?=[^a-z]) assert that the next character (here, the first one) is not a lowercase letter; use (?=.*[^a-z]) to require one anywhere in the string
.+ then match one or more of any type of character
)$ end of the string
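A quick Python check of this pattern, as a sketch only: the sample strings are made up for illustration. It suggests that (?=[^a-z]) only inspects the character right after ^, so a string that starts with a lowercase letter is rejected even if other characters follow. The second pattern, with (?=.*[^a-z]), is an assumed variant that lets the non-lowercase character appear anywhere.

```python
import re

# Pattern from the answer above.
first_char = re.compile(r"^(?:[^a-z]+|(?=[^a-z]).+)$")
# Assumed variant: the lookahead scans the whole string instead.
anywhere = re.compile(r"^(?=.*[^a-z]).+$")

samples = ["Example", "example", "eXample", "example 1"]

first_char_hits = [s for s in samples if first_char.search(s)]
anywhere_hits = [s for s in samples if anywhere.search(s)]

print(first_char_hits)  # ['Example']
print(anywhere_hits)    # ['Example', 'eXample', 'example 1']
```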
If you want to allow spaces, but the string must not consist only of lowercase chars and spaces and must not be empty:
^(?![a-z ]+$)[A-Za-z0-9 ]*[A-Za-z0-9][A-Za-z0-9 ]*$
Or without the lookahead, match at least an uppercase char or digit
^[A-Za-z0-9 ]*[A-Z0-9][A-Za-z0-9 ]*$
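As a rough check, both patterns can be run side by side in Python. The sample strings are assumptions, not data from the question. Since only letters, digits and spaces are allowed, "not only lowercase and spaces" and "at least one uppercase char or digit" should accept the same strings.

```python
import re

with_lookahead = re.compile(r"^(?![a-z ]+$)[A-Za-z0-9 ]*[A-Za-z0-9][A-Za-z0-9 ]*$")
without_lookahead = re.compile(r"^[A-Za-z0-9 ]*[A-Z0-9][A-Za-z0-9 ]*$")

# Made-up inputs, including the empty string and a lone space.
samples = ["Example", "example", "example 1", "my example", "", " "]

accepted_a = [s for s in samples if with_lookahead.search(s)]
accepted_b = [s for s in samples if without_lookahead.search(s)]

print(accepted_a)  # ['Example', 'example 1']
print(accepted_b)  # ['Example', 'example 1']
```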
Edit
For the updated data, you could use a negative lookahead (?!.*/[a-z]+/) to assert that nothing to the right consists only of lowercase chars between two forward slashes.
^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/).*(prp_[a-z0-9_]+_\d+|cat_48_5030.*)\.html$
Or a bit broader match:
^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/)\S+\.html$
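A small Python sketch of the broader pattern. The paths below are hypothetical and not taken from the linked demo. Note that the lookahead starts after the first folder, so it only rejects a lowercase-only folder that has a slash on both sides in the remaining path.

```python
import re

pattern = re.compile(r"^/(hunde|kleinanzeigen)/(?!.*/[a-z]+/)\S+\.html$")

# Hypothetical paths for illustration only.
paths = [
    "/hunde/labrador/Berlin-Mitte/prp_a1_5.html",   # no all-lowercase folder after the lookahead
    "/hunde/labrador/berlin/prp_a1_5.html",         # "/berlin/" is all lowercase
    "/katzen/labrador/Berlin-Mitte/prp_a1_5.html",  # wrong first folder
]

kept = [p for p in paths if pattern.search(p)]
print(kept)  # ['/hunde/labrador/Berlin-Mitte/prp_a1_5.html']
```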
Try
^(?![a-z\s]*$)
This should match strings that do not consist only of lowercase characters and whitespace. Remove \s if necessary.
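In Python this could be used roughly as follows. The helper name is_allowed is hypothetical. The pattern is zero-width, so the match object carries an empty string and only its presence matters. Note that the empty string is rejected as well, because [a-z\s]*$ also matches nothing.

```python
import re

not_only_lower = re.compile(r"^(?![a-z\s]*$)")

def is_allowed(text):
    """Hypothetical helper: True unless text is only lowercase letters/whitespace."""
    return not_only_lower.match(text) is not None

for s in ["Example #", "example #", "example", "some words", ""]:
    print(repr(s), is_allowed(s))
```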
I want to write a regex that recognizes some patterns and rejects others.
_*[a-zA-Z][a-zA-Z0-9_][^-]*.*(?<!_)
Samples of patterns that I want to recognize:
a100__version_2
_a100__version2
And samples of patterns that I don't want to recognize:
100__version_2
a100__version2_
_100__version_2
a100--version-2
The regex works for all of them except this one:
a100--version-2
So I don't want to match the dashes.
I tried _*[a-zA-Z][a-zA-Z0-9_][^-]*.*(?<!_)
so the problem is at [^-]
You could write the pattern like this, but [^-]* can also match newlines and spaces.
To not match newlines and spaces, and matching at least 2 characters:
^_*[a-zA-Z][a-zA-Z0-9_][^-\s]*$(?<!_)
Or match only word characters, requiring at least a single character and then repeating \w zero or more times:
^_*[a-zA-Z]\w*$(?<!_)
^ Start of string
_* Match optional underscores
[a-zA-Z] Match a single char a-zA-Z
\w* Match optional word chars (Or [a-zA-Z0-9_]*)
$ End of string
(?<!_) Assert not _ to the left at the end of the string
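Both patterns from this answer can be checked against the samples from the question with a short Python sketch (variable names are my own):

```python
import re

no_dash_or_space = re.compile(r"^_*[a-zA-Z][a-zA-Z0-9_][^-\s]*$(?<!_)")
word_chars_only = re.compile(r"^_*[a-zA-Z]\w*$(?<!_)")

should_match = ["a100__version_2", "_a100__version2"]
should_not_match = [
    "100__version_2",
    "a100__version2_",
    "_100__version_2",
    "a100--version-2",
]

for pattern in (no_dash_or_space, word_chars_only):
    hits = [s for s in should_match + should_not_match if pattern.search(s)]
    print(hits)  # ['a100__version_2', '_a100__version2'] for both patterns
```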
^([a-zA-Z0-9_-]+)$ matches:
BAP-78810
BAP-148080
But does not match:
B8241066 C
Q2111999 A
Q2111999 B
How can I modify regex pattern to match any space and/or special character?
For the example data, you can write the pattern as:
^[a-zA-Z0-9_-]+(?: [A-Z])?$
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ chars listed in the character class
(?: [A-Z])? Optionally match a space and a char A-Z
$ End of string
Or a more exact match:
^[A-Z]+-?\d+(?: [A-Z])?$
^ Start of string
[A-Z]+-? Match 1+ chars A-Z and optional -
\d+(?: [A-Z])? Match 1+ digits and optional space and char A-Z
$ End of string
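A short Python sketch comparing the two patterns on the example data. The two extra strings at the end are made up to show where the patterns differ.

```python
import re

loose = re.compile(r"^[a-zA-Z0-9_-]+(?: [A-Z])?$")
exact = re.compile(r"^[A-Z]+-?\d+(?: [A-Z])?$")

samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"]
extra = ["B8241066 CC", "bap_78810"]  # assumed inputs, not from the question

loose_hits = [s for s in samples + extra if loose.search(s)]
exact_hits = [s for s in samples + extra if exact.search(s)]

print(loose_hits)  # all five samples plus 'bap_78810'
print(exact_hits)  # only the five samples
```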
Whenever you want to match something that can be either a space or a special character, you can use the dot symbol (.), which matches any single character except a newline. Your regex pattern would then be modified to:
^([a-zA-Z0-9_-])+.$
Here the dot matches a single space or any other single character at the end of the string, so on its own this pattern still does not match B8241066 C, where a letter follows the space. To match the examples provided, where exactly one alphanumeric character follows the space, you could add \w:
^([a-zA-Z0-9_-])+.\w$
Note that \w is equivalent to [A-Za-z0-9_]
Further, be careful when you use . as it makes your pattern less specific and therefore more likely to produce false positives.
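To illustrate the false-positive risk, here is a rough Python check of both dot-based patterns. The inputs "B8241066#C" and "B8241066 " are made up. As far as I can tell, the first pattern accepts only one extra character at the very end, while the \w variant also accepts any separator in place of the space.

```python
import re

dot_only = re.compile(r"^([a-zA-Z0-9_-])+.$")
dot_then_word = re.compile(r"^([a-zA-Z0-9_-])+.\w$")

samples = ["BAP-78810", "B8241066 C", "B8241066#C", "B8241066 "]

dot_only_hits = [s for s in samples if dot_only.search(s)]
dot_then_word_hits = [s for s in samples if dot_then_word.search(s)]

print(dot_only_hits)       # ['BAP-78810', 'B8241066 ']
print(dot_then_word_hits)  # ['BAP-78810', 'B8241066 C', 'B8241066#C']
```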
I suggest using this approach
^[A-Z][A-Z\d -]{6,}$
The first character must be an uppercase letter, followed by at least 6 uppercase letters, digits, spaces or -.
I removed the group because there was only one group and it was the entire regex.
You can also use \w, which includes A-Z, a-z and 0-9, as well as _ (underscore). To make the match case-insensitive without explicitly adding a-z or using \w, you can use a flag, often i.
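In Python the case-insensitive flag is re.IGNORECASE (or re.I). A minimal sketch, with a few made-up inputs next to the ones from the question:

```python
import re

strict = re.compile(r"^[A-Z][A-Z\d -]{6,}$")
relaxed = re.compile(r"^[A-Z][A-Z\d -]{6,}$", re.IGNORECASE)

samples = ["BAP-78810", "B8241066 C", "bap-78810", "8241066 C", "BAP-78"]

strict_hits = [s for s in samples if strict.search(s)]
relaxed_hits = [s for s in samples if relaxed.search(s)]

print(strict_hits)   # ['BAP-78810', 'B8241066 C']
print(relaxed_hits)  # ['BAP-78810', 'B8241066 C', 'bap-78810']
```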
I need a regular expression to ensure that entries in a form 1) are all lower case AND 2) do not contain the string ".net"
I can do either of those separately:
^((?!.net).)*$ gives me strings that do not contain .net.
[a-z] only matches lower-cased inputs. But I have not been able to combine these.
I've tried:
^((?!.net).)(?=[a-z])*$
(^((?!.net).)*$)([a-z])
And a few others.
Can anyone spot my error? Thanks!
As you are using a dot in your pattern that would match any char except a newline, you can use a negated character class to exclude matching uppercase chars or a newline.
As suggested by @Wiktor Stribiżew, to rule out a string that contains .net you can use a negative lookahead (?!.*\.net), where \.net (note the escaped dot) is preceded by .* to match any character 0 or more times.
^(?!.*\.net)[^\nA-Z]+$
^ Start of string
(?!.*\.net) negative lookahead to make sure the string does not contain .net
[^\nA-Z]+ Match 1+ times any character except a newline or a char A-Z
$ End of string
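A Python sketch of the combined pattern next to the first attempt from the question. The sample inputs are assumptions. It also shows why the dot needs escaping: the unescaped .net rejects "internet", because "rnet" counts as any character followed by net.

```python
import re

combined = re.compile(r"^(?!.*\.net)[^\nA-Z]+$")
unescaped = re.compile(r"^((?!.net).)*$")  # from the question, dot not escaped

samples = ["example.com", "example.net", "Example.com", "internet"]

combined_hits = [s for s in samples if combined.search(s)]
unescaped_hits = [s for s in samples if unescaped.search(s)]

print(combined_hits)   # ['example.com', 'internet']
print(unescaped_hits)  # ['example.com', 'Example.com']
```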
I need to match only those words which don't contain special characters like # and :.
For example:
git#github.com shouldn't match
list should return a valid match
show should also return a valid match
I tried it using a negative lookahead \w+(?![#:])
But it matches gi out of git#github.com, which it shouldn't match either.
You may add \w to the lookahead:
\w+(?![\w#:])
The equivalent is using a word boundary:
\w+\b(?![#:])
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w#:])
Or
(?<!\S)\w+(?![\w#:])
The ^ will match the word at the start of the string and (?<!\S) will match only if the word is preceded with whitespace or start of string.
Why not (?<!\S)\w+(?!\S), the whitespace boundaries? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed with punctuation, and the (?!\S) negative lookahead would make the \w+ match only when it is followed with whitespace or at the end of the string.
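The differences between these variants can be seen with re.findall in Python. This is only a sketch: the input strings are assembled from the examples in the question plus a made-up punctuated sentence.

```python
import re

text = "git#github.com list show"

original = re.findall(r"\w+(?![#:])", text)
with_word_in_lookahead = re.findall(r"\w+(?![\w#:])", text)
left_bounded = re.findall(r"(?<!\S)\w+(?![\w#:])", text)

print(original)                # ['gi', 'github', 'com', 'list', 'show']
print(with_word_in_lookahead)  # ['github', 'com', 'list', 'show']
print(left_bounded)            # ['list', 'show']

# With punctuation, whitespace boundaries on both sides find nothing.
punctuated = "list, show."
both_sides = re.findall(r"(?<!\S)\w+(?!\S)", punctuated)
left_only = re.findall(r"(?<!\S)\w+(?![\w#:])", punctuated)

print(both_sides)  # []
print(left_only)   # ['list', 'show']
```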
You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:
(?<!\S)\w+(?!\S)
Demo: https://regex101.com/r/cjhUUM/2
I'm new to RegEx and I'm looking for a way to match sentences where the first letter is capitalized and the rest is in lowercase.
I've tried a couple of things (IF statements included), but just can't seem to get it.
This is my last version:
(([A-Z])([a-z]+\s|[a-z]+))+
I thought it worked at first, but is now accepting capitalized letters in the middle of the word.
The Output Would Be Like This (Each Word Capitalized).
Thanks!!
The expression accepts capital letters in the middle of the word because the spaces between words are optional, so words can run into each other.
You can take a more structured approach: a sentence must have at least one word. That's
[A-Z][a-z]*
After that initial word you can get any number of more words, each preceded by whitespace. So in total:
[A-Z][a-z]*(\s[A-Z][a-z]*)*
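A Python sketch comparing the pattern from the question with the structured one, using re.fullmatch so the whole string has to fit. The sample "The OutPut Would Be" is made up to show the capital letter inside a word.

```python
import re

original = re.compile(r"(([A-Z])([a-z]+\s|[a-z]+))+")
structured = re.compile(r"[A-Z][a-z]*(\s[A-Z][a-z]*)*")

samples = [
    "The Output Would Be Like This",
    "The OutPut Would Be",  # capital in the middle of a word
    "the output",
]

original_hits = [s for s in samples if original.fullmatch(s)]
structured_hits = [s for s in samples if structured.fullmatch(s)]

print(original_hits)    # first two samples: "Out" and "Put" count as two words
print(structured_hits)  # ['The Output Would Be Like This']
```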
To match whole strings that start with an uppercase letter and then have no uppercase letters use
^[A-Z][^A-Z]*$
^ matches the start of string, [A-Z] matches an uppercase letter, [^A-Z]* matches 0 or more chars other than uppercase letters and $ matches the end of string.
To match capitalized words, you may use
\b[A-Z][a-zA-Z]*\b
where \b stands for word boundaries. See the regex demo.
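In Python, the two patterns could be tried like this (a sketch with made-up sample text):

```python
import re

whole_string = re.compile(r"^[A-Z][^A-Z]*$")
capitalized_word = re.compile(r"\b[A-Z][a-zA-Z]*\b")

print(bool(whole_string.search("Hello world.")))  # True
print(bool(whole_string.search("Hello World")))   # False, second uppercase letter

words = capitalized_word.findall("The quick Brown fox")
print(words)  # ['The', 'Brown']
```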
In various regex flavors, there are other ways to match word boundaries:
bash,r (TRE, base R): \<[A-Z][a-zA-Z]*\>
postgresql, tcl: \m[A-Z][a-zA-Z]*\M or \y[A-Z][a-zA-Z]*\y
bash, mysql (MySQL versions before 8): [[:<:]][A-Z][a-zA-Z]*[[:>:]]
Also, you may consider using [[:upper:]] or \p{Lu} instead of [A-Z] and [[:alpha:]] or \p{L} instead of [a-zA-Z] to match any Unicode uppercase letters or any letters correspondingly.