Regex allow only Uppercase Extended ASCII

Regex allow only Uppercase Extended ASCII - regex

I need a regex to allow only Uppercase Extended ASCII characters of a maxLength I set before that it's the maximum length of the word.
Regex for uppercase letters: \P{Ll}*
Regex for extended ASCII letters: [\x00-\xFF]*
Using ^[\p{Ll}] it's not enough because I need characters to be extended ASCII(to not allow emoji or other special characters outrange ASCII extended).
How can I combine that 2 requirements ? And length of maxLength.
Thank you!!

Generally, you can use
^(?:(?=\p{Lu})\p{Latin}){1,10}$
See the regex demo. Details:
^ - start of string
(?: - start of a non-capturing group:
(?=\p{Lu})\p{Latin} - a char from Latin Unicode category class that is an uppercase letter
){1,10} - end of the group, repeat one to ten occurrences
$ - end of string.
Since you are using the regex in a DevExpress masked input component you need to enumerate all these letters in a character class. Based on Regex Latin characters filter and non latin character filer, you need
Latin-1 Supplement U+0080 - U+00FF
Latin Extended-A U+0100 - U+017F
Latin Extended-B U+0180 - U+024F
All chars that are uppercase letters in these three ranges are the ones you want to allow:
var res = []
for (var i=128; i<=591; i++) { // Get chars from \u0080 to \u024F
if (/^\p{Lu}$/u.test(String.fromCharCode(i))) { // If it is an uppercase letter
res.push(String.fromCharCode(i)); // Add it to the results
}
}
console.log(res.join(""))
The code will look like
settings.MaskExpression = "[\\u00C0-\\u00D6\\u00D8-\\u00DE\\u0100\\u0102\\u0104\\u0106\\u0108\\u010A\\u010C\\u010E\\u0110\\u0112\\u0114\\u0116\\u0118\\u011A\\u011C\\u011E\\u0120\\u0122\\u0124\\u0126\\u0128\\u012A\\u012C\\u012E\\u0130\\u0132\\u0134\\u0136\\u0139\\u013B\\u013D\\u013F\\u0141\\u0143\\u0145\\u0147\\u014A\\u014C\\u014E\\u0150\\u0152\\u0154\\u0156\\u0158\\u015A\\u015C\\u015E\\u0160\\u0162\\u0164\\u0166\\u0168\\u016A\\u016C\\u016E\\u0170\\u0172\\u0174\\u0176\\u0178\\u0179\\u017B\\u017D\\u0181\\u0182\\u0184\\u0186\\u0187\\u0189-\\u018B\\u018E-\\u0191\\u0193\\u0194\\u0196-\\u0198\\u019C\\u019D\\u019F\\u01A0\\u01A2\\u01A4\\u01A6\\u01A7\\u01A9\\u01AC\\u01AE\\u01AF\\u01B1-\\u01B3\\u01B5\\u01B7\\u01B8\\u01BC\\u01C4\\u01C7\\u01CA\\u01CD\\u01CF\\u01D1\\u01D3\\u01D5\\u01D7\\u01D9\\u01DB\\u01DE\\u01E0\\u01E2\\u01E4\\u01E6\\u01E8\\u01EA\\u01EC\\u01EE\\u01F1\\u01F4\\u01F6-\\u01F8\\u01FA\\u01FC\\u01FE\\u0200\\u0202\\u0204\\u0206\\u0208\\u020A\\u020C\\u020E\\u0210\\u0212\\u0214\\u0216\\u0218\\u021A\\u021C\\u021E\\u0220\\u0222\\u0224\\u0226\\u0228\\u022A\\u022C\\u022E\\u0230\\u0232\\u023A\\u023B\\u023D\\u023E\\u0241\\u0243-\\u0246\\u0248\\u024A\\u024C\\u024E]{1,10}";
The \u... part matches any letters from the ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİĲĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŸŹŻŽƁƂƄƆƇƉƊƋƎƏƐƑƓƔƖƗƘƜƝƟƠƢƤƦƧƩƬƮƯƱƲƳƵƷƸƼǄǇǊǍǏǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮǱǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎȐȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾɁɃɄɅɆɈɊɌɎ set.
The {1,10} limiting quantifier matches one to ten occurrences. You may adjust it further.

Slight modification of #Wiktor's comment that I think is easier to read:
^[^\P{Lu}\P{Latin}]{0,10}$
should match a string of a max of 10 uppercase Latin (inc. extended) characters. Using a negation class to find 10 characters that are not not uppercase nor not Latin. It does match such beautiful and definitely not cursed strings as ĦꜴꝎꞂꜨⱠƎƢƔ.

Related

Regular Expressions - A word with only one capitalized letter and which doesn't contain numbers

I am new to RegExp. I have a sentence and I would like to pull out a word which satisfies the following -
It must contain only one capitalized letter
It must consist of only characters/letters without numbers
For instance -
"appLe", "warDrobe", "hUsh"
The words that do not fit - "sf_dsfsdF", "331ffsF", "Leopard1997", "mister_Ram" et cetera.
How would you resolve this problem?

The following regex should work:
will find words that have only one capital letter
will only find words with letters (no numbers or special characters)
will match the entire word
\b(?=[A-Z])[A-Z][a-z]*\b|\b(?=[a-z])[a-z]+[A-Z][a-z]*\b
Matches:
appLe
hUsh
Harry
suSan
I
Rejects
HarrY - has TWO capital letters
warDrobeD - has TWO capital letters
sf_dsfsdF - has SPECIAL characters
331ffsF - has NUMBERS
Leopd1997 - has NUMBERS
mistram - does not have a CAPITAL LETTER
See it in action here
Note:
If the capital letter is OPTIONAL- then you will need to add a ? after each [A-Z] like this:
\b(?=[A-Z])[A-Z]?[a-z]*\b|\b(?=[a-z])[a-z]+[A-Z]?[a-z]*\b

You can do this by using character sets ([a-z] & [A-Z]) with appropriate quantifiers (use ? for one or zero capitals), wrapped in () to capture, surrounded by word breaks \b.
If the capital is optional and can appear anywhere use:
/\b([a-z]*[A-Z]?[a-z]*)\b/ //will still match empty string check for length
If you always want one capital appearing anywhere use:
/\b([a-z]*[A-Z][a-z]*)\b/ // does not match empty string
If you always want one capital that must not be the first or last character use:
/\b([a-z]+[A-Z][a-z]+)\b/ // does not match empty string
Here is a working snippet demonstrating the second regex from above in JavaScript:
const exp = /\b([a-z]*[A-Z][a-z]*)\b/
const strings = ["appLe", "warDrobe", "hUsh", "sf_dsfsdF", "331ffsF", "Leopard1997", "mister_Ram", ""];
for (const str of strings) {
console.log(str, exp.test(str))
}
Regex101 is great for dev & testing!

RegExp:
/\b[a-z\-]*[A-Z][a-z\-]*\b/g
Demo:
RegEx101
Explanation
Segment
Description
\b[a-z\-]*
Find a point where a non-word is adjacent to a word ([A-Za-z0-9\-] or \w), then match zero or more lowercase letters and hyphens (note, the hyphen needs to be escaped (\-))
[A-Z]
Find a single uppercase letter
[a-z\-]*\b
Match zero or more lowercase letters and hyphens, then find a point where a non-word is adjacent to a word

Regex How can I find names only written in upper-case letters(mandatory), and the names may also contain numbers, literal spaces, and hyphens

If I want to find "KFC", "EU 8RF", and IK-OTP simultaneously, what should the code look like?
My code is :
db.business.find({name:/^[A-Z\s?\d?\-?]*$/}, {name:1}).sort({"name":1})
but it will return the name that is whole number, such as 1973, 1999. How should I improve my code? TIA

Use a lookahead to require at least one letter.
^(?=.*[A-Z])[A-Z\d\s-]+$
DEMO

You can use
^(?=[\d -]*[A-Z])[A-Z\d]+(?:[ -][A-Z\d]+)*$
See the regex demo.
Details:
^ - start of string
(?=[\d -]*[A-Z]) - a positive lookahead that requires an uppercase ASCII letter after any zero or more digits, spaces or hyphens immediately to the right of the current location
[A-Z\d]+ - one or more uppercase ASCII letters or digits
(?:[ -][A-Z\d]+)* - zero or more repetitions of a space or - and then one or more uppercase ASCII letters or digits
$ - end of string.

Negating a complex regex containing three parts

I need a regex which is matched when the string doesn't have both lowercase and uppercase letters.
If the string has only lowercase letters -> should be matched
If the string has only uppercase letters -> should be matched
If the string has only digits or special characters -> should be matched
For example
abc, ABC, 123, abc123, ABC123&^ - should match
AbC, A12b, AB^%12c - should not match
Basically I need an inverse/negation of the following regex:
^(?=.*[a-z])(?=.*[A-Z]).+$

Does not sound like any lookarounds would be needed.
Either match only characters that are not a-z or only characters, that are not A-Z.
^(?:[^a-z]+|[^A-Z]+)$
See this demo at regex101 (used + for one or more)

You may use
^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$
Or
^(?=(?:[^a-z]+|[^A-Z]+)$).*$
See the regex demo #1 and regex demo #2
A lookaround solution like this can be used in more complex scenarios, when you need to apply more restrictions on the pattern. Else, consider a non-lookaround solution.
Details
^ - start of string
(?!.*[A-Z].*[a-z]) - no uppercase followed with a lowercase letter
(?!.*[a-z].*[A-Z]) - no lowercase letter followed with an uppercase one
(?=(?:[^a-z]+|[^A-Z]+)$) - a positive lookahead that requires 1 or more characters other than lowercase ASCII letters ([^a-z]+) to the end of the string, or 1 or more characters other than uppercase ASCII letters ([^A-Z]+) to the end of the string
.+ - 1+ chars other than line break chars
$ - end of string.

You can use this regex
^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$
You can test more cases here.
I've only added the characcter ?&%^ as possible character, but you could add which ever you like.

I would go with:
^(?:[^a-z]+?|[^A-Z]+?)$
It translates to "If the entire string is composed of non-lowercase letters or non-uppercase letters then match the string."
Lazy quantifiers +? are used so that the end-string $ anchor is obeyed when the multiline flag is enabled. If you're only validating a single-line string the you can simply use + without the question mark.
If you have a whitelist of specific allowed special chars then change [^A-Z] into [A-Z0-9()_+=-] and list the allowed special chars.
https://regex101.com/r/Wg6tLn/1

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words is capitalized I want the first two words. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?

You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
See the regex demo
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
\s+ - 1+ whitespace
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.

This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.

If u need a Full name only (a two words with the first capitalize letters), this is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!

Regex. Find all words with non latin characters

How can I find all words with at least one non latin letter (arabic, chinese...) in them using regex.h library?
cityدبي

How about:
(?=\pL)(?![a-zA-Z])
This will match a letter in any alphabet that is not a latin letter:
not ok - cityدبي
ok - city
not ok - دبي

Try this :
[a-zA-Z]*[^A-Za-z \d]+[a-zA-Z]*
Means : One or more non latin letter preceded or followed by one or more latin letter i.e. a word containing atleast 1 non latin character.
See demo with some random text:
http://regexr.com?326s3
You may need to adjust this regex to your needs,and include things like digits,special characters,word boundaries as per your input.

just use [^a-zA-Z]
if not match, it should contain an international character...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex allow only Uppercase Extended ASCII - regex

Related

Regular Expressions - A word with only one capitalized letter and which doesn't contain numbers

Regex How can I find names only written in upper-case letters(mandatory), and the names may also contain numbers, literal spaces, and hyphens

Negating a complex regex containing three parts

REGEX to find the first one or two capitalized words in a string

Regex. Find all words with non latin characters

Categories

Resources