Regex alphanumeric on paragraph - regex

I am trying to create a regex that matches only alphanumeric strings of length 11 in a paragraph, as in the example below. The problem is that it also selects strings that contain letters only.
My regex and input data can be seen here.
Sample text:
RCLO DD 12-10-15 IAD RO N2905198759 PTD 12-08-15 SWC
CRO N2905198759 FCD 12-07-15 WOT 12-0
MCN 999LDCMCWCG PROJECT 309097-2 VER 04 OCO TSR BSRNCA70M00
WORK DESCRIPTION AND NOTES: CCO TSR BSRNCA70M00
MANUALLY
DIVERSER CIRCUITS SEE RPON, 9152 IRMK AAI DWGVILAZW02 IRMK ALCON IDR INFORMATION U
PDATED ON THE DESIGN AT HFESILWL AND EGVGILEG
The pattern is
\b([A-Z0-9]{11})\b
In the above example it should not select "DESCRIPTION" and "INFORMATION"

You may use
\b(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{11}\b
See the regex demo
Details
\b - word boundary
(?=[A-Z]*[0-9]) - after 0+ uppercase ASCII letters, there must be 1 ASCII digit
(?=[0-9]*[A-Z]) - after 0+ ASCII digits, there must be 1 uppercase ASCII letter
[A-Z0-9]{11} - 11 uppercase ASCII letters or digits
\b - a trailing word boundary.
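As a quick check, the pattern can be run against a few lines of the sample text. This is only a minimal JavaScript sketch; the variable names (text, mixed11, found) are illustrative and not part of the original question.

```javascript
// Three lines copied from the sample text in the question.
const text = [
  "RCLO DD 12-10-15 IAD RO N2905198759 PTD 12-08-15 SWC",
  "MCN 999LDCMCWCG PROJECT 309097-2 VER 04 OCO TSR BSRNCA70M00",
  "WORK DESCRIPTION AND NOTES: CCO TSR BSRNCA70M00",
].join("\n");

// The two lookaheads require at least one digit and at least one letter,
// so 11-letter words such as DESCRIPTION are skipped.
const mixed11 = /\b(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{11}\b/g;

const found = text.match(mixed11);
console.log(found);
// [ 'N2905198759', '999LDCMCWCG', 'BSRNCA70M00', 'BSRNCA70M00' ]
```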

Related

Regex allow only Uppercase Extended ASCII

I need a regex that allows only uppercase extended ASCII characters, up to a maximum word length (maxLength) that I set beforehand.
Regex for uppercase letters: \P{Ll}*
Regex for extended ASCII letters: [\x00-\xFF]*
Using ^[\p{Ll}] is not enough because the characters must also be extended ASCII (emoji and other special characters outside the extended ASCII range must not be allowed).
How can I combine these two requirements, together with the maxLength limit?
Thank you!!
Generally, you can use
^(?:(?=\p{Lu})\p{Latin}){1,10}$
See the regex demo. Details:
^ - start of string
(?: - start of a non-capturing group:
(?=\p{Lu})\p{Latin} - a char from the Latin Unicode script that is an uppercase letter
){1,10} - end of the group, repeat one to ten occurrences
$ - end of string.
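For reference, here is how the same idea might look in plain JavaScript, outside of any masked input component. This is a sketch under one assumption: JavaScript does not accept the short \p{Latin} form, so the script is written as \p{Script=Latin} and the u flag is required. The sample strings are made up for illustration.

```javascript
// (?=\p{Lu}) checks "uppercase letter", \p{Script=Latin} checks "Latin script".
const upperLatin = /^(?:(?=\p{Lu})\p{Script=Latin}){1,10}$/u;

const samples = [
  "\u00C0\u00C9\u00CE", // uppercase Latin-1 letters (À É Î)
  "ABC",                // plain ASCII uppercase is Latin too
  "\u00C0\u00E9",       // second char is lowercase (é)
  "\u0414\u0410",       // Cyrillic uppercase letters
  "A\u{1F600}",         // contains an emoji
  "",                   // empty: fails because of {1,10}
];

const results = samples.map((s) => upperLatin.test(s));
console.log(results);
// [ true, true, false, false, false, false ]
```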
Since you are using the regex in a DevExpress masked input component you need to enumerate all these letters in a character class. Based on Regex Latin characters filter and non latin character filer, you need
Latin-1 Supplement U+0080 - U+00FF
Latin Extended-A U+0100 - U+017F
Latin Extended-B U+0180 - U+024F
All chars that are uppercase letters in these three ranges are the ones you want to allow:
var res = [];
for (var i = 128; i <= 591; i++) { // Get chars from \u0080 to \u024F
  if (/^\p{Lu}$/u.test(String.fromCharCode(i))) { // If it is an uppercase letter
    res.push(String.fromCharCode(i)); // Add it to the results
  }
}
console.log(res.join(""));
The code will look like
settings.MaskExpression = "[\\u00C0-\\u00D6\\u00D8-\\u00DE\\u0100\\u0102\\u0104\\u0106\\u0108\\u010A\\u010C\\u010E\\u0110\\u0112\\u0114\\u0116\\u0118\\u011A\\u011C\\u011E\\u0120\\u0122\\u0124\\u0126\\u0128\\u012A\\u012C\\u012E\\u0130\\u0132\\u0134\\u0136\\u0139\\u013B\\u013D\\u013F\\u0141\\u0143\\u0145\\u0147\\u014A\\u014C\\u014E\\u0150\\u0152\\u0154\\u0156\\u0158\\u015A\\u015C\\u015E\\u0160\\u0162\\u0164\\u0166\\u0168\\u016A\\u016C\\u016E\\u0170\\u0172\\u0174\\u0176\\u0178\\u0179\\u017B\\u017D\\u0181\\u0182\\u0184\\u0186\\u0187\\u0189-\\u018B\\u018E-\\u0191\\u0193\\u0194\\u0196-\\u0198\\u019C\\u019D\\u019F\\u01A0\\u01A2\\u01A4\\u01A6\\u01A7\\u01A9\\u01AC\\u01AE\\u01AF\\u01B1-\\u01B3\\u01B5\\u01B7\\u01B8\\u01BC\\u01C4\\u01C7\\u01CA\\u01CD\\u01CF\\u01D1\\u01D3\\u01D5\\u01D7\\u01D9\\u01DB\\u01DE\\u01E0\\u01E2\\u01E4\\u01E6\\u01E8\\u01EA\\u01EC\\u01EE\\u01F1\\u01F4\\u01F6-\\u01F8\\u01FA\\u01FC\\u01FE\\u0200\\u0202\\u0204\\u0206\\u0208\\u020A\\u020C\\u020E\\u0210\\u0212\\u0214\\u0216\\u0218\\u021A\\u021C\\u021E\\u0220\\u0222\\u0224\\u0226\\u0228\\u022A\\u022C\\u022E\\u0230\\u0232\\u023A\\u023B\\u023D\\u023E\\u0241\\u0243-\\u0246\\u0248\\u024A\\u024C\\u024E]{1,10}";
The \u... part matches any letter from the ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİĲĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŸŹŻŽƁƂƄƆƇƉƊƋƎƏƐƑƓƔƖƗƘƜƝƟƠƢƤƦƧƩƬƮƯƱƲƳƵƷƸƼǄǇǊǍǏǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮǱǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎȐȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾɁɃɄɅɆɈɊɌɎ set.
The {1,10} limiting quantifier matches one to ten occurrences. You may adjust it further.
Slight modification of @Wiktor's comment that I think is easier to read:
^[^\P{Lu}\P{Latin}]{0,10}$
should match a string of at most 10 uppercase Latin (incl. extended) characters. It uses a negated character class to match characters that are neither non-uppercase nor non-Latin. It does match such beautiful and definitely not cursed strings as ĦꜴꝎꞂꜨⱠƎƢƔ.
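A small JavaScript sketch of the negated-class variant, again assuming the \p{Script=Latin} spelling and the u flag that JavaScript needs; the test strings are illustrative only.

```javascript
// [^\P{Lu}\P{Script=Latin}] = "not (non-uppercase or non-Latin)",
// i.e. an uppercase letter of the Latin script.
const upTo10UpperLatin = /^[^\P{Lu}\P{Script=Latin}]{0,10}$/u;

const checks = {
  extended: upTo10UpperLatin.test("\u0126\u018E\u01A2\u0194"), // Ħ Ǝ Ƣ Ɣ
  empty: upTo10UpperLatin.test(""),                            // {0,10} allows empty
  tooLong: upTo10UpperLatin.test("ABCDEFGHIJK"),               // 11 chars
  mixedCase: upTo10UpperLatin.test("Ab"),
};
console.log(checks);
// { extended: true, empty: true, tooLong: false, mixedCase: false }
```

Note that {0,10} also accepts the empty string, unlike the {1,10} version.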

Negating a complex regex containing three parts

I need a regex which is matched when the string doesn't have both lowercase and uppercase letters.
If the string has only lowercase letters -> should be matched
If the string has only uppercase letters -> should be matched
If the string has only digits or special characters -> should be matched
For example
abc, ABC, 123, abc123, ABC123&^ - should match
AbC, A12b, AB^%12c - should not match
Basically I need an inverse/negation of the following regex:
^(?=.*[a-z])(?=.*[A-Z]).+$
Does not sound like any lookarounds would be needed.
Either match only characters that are not a-z, or only characters that are not A-Z.
^(?:[^a-z]+|[^A-Z]+)$
See this demo at regex101 (used + for one or more)
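To see it against the examples from the question, here is a minimal JavaScript sketch (the array and variable names are illustrative):

```javascript
// Either the whole string has no lowercase letter, or it has no uppercase letter.
const noMixedCase = /^(?:[^a-z]+|[^A-Z]+)$/;

const shouldMatch = ["abc", "ABC", "123", "abc123", "ABC123&^"];
const shouldFail = ["AbC", "A12b", "AB^%12c"];

const matched = shouldMatch.map((s) => noMixedCase.test(s));
const rejected = shouldFail.map((s) => noMixedCase.test(s));

console.log(matched);  // [ true, true, true, true, true ]
console.log(rejected); // [ false, false, false ]
```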
You may use
^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$
Or
^(?=(?:[^a-z]+|[^A-Z]+)$).*$
See the regex demo #1 and regex demo #2
A lookaround solution like this can be used in more complex scenarios, when you need to apply more restrictions on the pattern. Else, consider a non-lookaround solution.
Details
^ - start of string
(?!.*[A-Z].*[a-z]) - no uppercase followed with a lowercase letter
(?!.*[a-z].*[A-Z]) - no lowercase letter followed with an uppercase one
(?=(?:[^a-z]+|[^A-Z]+)$) - a positive lookahead that requires 1 or more characters other than lowercase ASCII letters ([^a-z]+) to the end of the string, or 1 or more characters other than uppercase ASCII letters ([^A-Z]+) to the end of the string
\S+ - 1+ non-whitespace chars (first pattern) / .* - 0+ chars other than line break chars (second pattern)
$ - end of string.
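The two lookaround patterns agree on the examples from the question but differ on whitespace, because the first one consumes with \S+. A minimal JavaScript sketch to illustrate (names are illustrative, and "abc def" is an extra test string not taken from the question):

```javascript
const viaNegative = /^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$/;
const viaPositive = /^(?=(?:[^a-z]+|[^A-Z]+)$).*$/;

const cases = ["abc", "ABC", "123", "abc123", "ABC123&^", "AbC", "A12b", "AB^%12c"];

// One boolean per case and pattern.
const negResults = cases.map((s) => viaNegative.test(s));
const posResults = cases.map((s) => viaPositive.test(s));
console.log(negResults); // [ true, true, true, true, true, false, false, false ]
console.log(posResults); // [ true, true, true, true, true, false, false, false ]

// A string with a space: \S+ rejects it, .* accepts it.
const spaceNeg = viaNegative.test("abc def"); // false
const spacePos = viaPositive.test("abc def"); // true
console.log(spaceNeg, spacePos);
```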
You can use this regex
^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$
You can test more cases here.
I've only added the characters ?&%^ as possible special characters, but you could add whichever you like.
I would go with:
^(?:[^a-z]+?|[^A-Z]+?)$
It translates to "If the entire string is composed of non-lowercase letters or non-uppercase letters then match the string."
Lazy quantifiers +? are used so that the end-of-string $ anchor is obeyed when the multiline flag is enabled. If you're only validating a single-line string then you can simply use + without the question mark.
If you have a whitelist of specific allowed special chars then change [^a-z] into [A-Z0-9()_+=-] and [^A-Z] into [a-z0-9()_+=-], listing the allowed special chars in both classes.
https://regex101.com/r/Wg6tLn/1
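The effect of the lazy quantifier with the multiline flag can be seen in a small JavaScript sketch. The two-line input is made up for illustration; with greedy +, the negated class also swallows the line break, so both lines come back as one match.

```javascript
const input = "abc\ndef";

// Same pattern, once with greedy + and once with lazy +?, both with g and m flags.
const greedy = /^(?:[^a-z]+|[^A-Z]+)$/gm;
const lazy = /^(?:[^a-z]+?|[^A-Z]+?)$/gm;

const greedyMatches = input.match(greedy);
const lazyMatches = input.match(lazy);

console.log(greedyMatches); // [ 'abc\ndef' ]   (one match spanning both lines)
console.log(lazyMatches);   // [ 'abc', 'def' ] (one match per line)
```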

Regex for alphanumeric word and should not be like RUN123456

I want to apply regex on a string to get alphanumeric value and the value should not start with the RUN substring followed with any digit, e.g. RUN123456.
Below is the regex I am using to get alphanumeric value
regex='[A-Z]{2,}[_0-9a-zA-Z]*'
Sample Input:
CY0PNI94980 Production AutoSys Job has failed. Call 249-3344. EC=54. RUN130990.
The matches can include CY0PNI94980 and EC, but not RUN130990.
Kindly help me on this.
You may match the strings matching your pattern excluding all those starting with RUN and a digit:
\b(?!RUN[0-9])[A-Z]{2,}[_0-9a-zA-Z]*
See the regex demo
If you do not mind also matching Unicode letters or digits, you may replace [_0-9a-zA-Z] with \w and use
\b(?!RUN[0-9])[A-Z]{2,}\w*
Details
\b - a word boundary
(?!RUN[0-9]) - a negative lookahead that fails the match if there is RUN and any ASCII digit immediately to the right of the current location
[A-Z]{2,} - 2 or more uppercase ASCII letters
[_0-9a-zA-Z]* / \w* - 0 or more word chars (letters/digits/_).
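Applied to the sample input from the question, a minimal JavaScript sketch (the question's regex='...' syntax suggests another language, so treat this purely as an illustration of the pattern):

```javascript
const message =
  "CY0PNI94980 Production AutoSys Job has failed. Call 249-3344. EC=54. RUN130990.";

// (?!RUN[0-9]) rejects a token that starts with RUN followed by a digit.
const tokenPattern = /\b(?!RUN[0-9])[A-Z]{2,}[_0-9a-zA-Z]*/g;

const tokens = message.match(tokenPattern);
console.log(tokens);
// [ 'CY0PNI94980', 'EC' ]
```

Words such as Production or AutoSys are skipped because they do not start with two uppercase letters, and the \b keeps the match from starting in the middle of RUN130990.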

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words are capitalized I want the first two words. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for Madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?
You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
See the regex demo
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
\s+ - 1+ whitespace
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
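Both variants can be tried in JavaScript on the examples from the question. This is a minimal sketch; the helper firstMatch and the extra string with an accented name are assumptions added for illustration.

```javascript
const asciiPattern = /^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?/;
// The Unicode-aware variant needs the u flag in JavaScript.
const unicodePattern = /^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?/u;

// Hypothetical helper: return the matched prefix, or "" when nothing matches.
function firstMatch(pattern, s) {
  const m = s.match(pattern);
  return m ? m[0] : "";
}

const titles = [
  "Madonna has a new album",
  "Paul Young has no new album",
  "Emmerson Lake-palmer is not here",
];

const names = titles.map((t) => firstMatch(asciiPattern, t));
console.log(names);
// [ 'Madonna', 'Paul Young', 'Emmerson Lake-palmer' ]

// Accented capitals are only handled by the Unicode-aware pattern.
const accented = "\u00C9mile Zola wrote novels"; // "Émile Zola wrote novels"
const asciiResult = firstMatch(asciiPattern, accented);     // ""
const unicodeResult = firstMatch(unicodePattern, accented); // "Émile Zola"
console.log(asciiResult, unicodeResult);
```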
This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.
If you need a full name only (exactly two words, each starting with a capital letter), here is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!

Match all type of numbers

I need regular expression which extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created re: \d+[(.|,|\s)\d+]+, but it does not work properly.
I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
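A minimal JavaScript sketch of this pattern on the example from the question. Following the assumption stated above, the numbers are joined with two spaces here; that spacing is an assumption of this sketch, and with single spaces between numbers the results would differ.

```javascript
// Build the input with TWO spaces between separate numbers (assumed spacing).
const numbersText = [
  "numbers: 3.14",
  "2 544",
  "345,345.55",
  "506 test",
  "120 100 100",
].join("  ");

// 1-3 digits, then groups of exactly 3 digits after a comma or a single space,
// then an optional fractional part.
const numberPattern = /\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b/g;

const numbers = numbersText.match(numberPattern);
console.log(numbers);
// [ '3.14', '2 544', '345,345.55', '506', '120 100 100' ]
```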
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.