Regex: uppercase words that don't start with a hyphen - regex

I need to match all uppercase words that don't start with a hyphen.
There are multiple uppercase words in each line.
examples:
,BOAT -> match
BANANA, -> match
WATER -> match
-ER -> no match because of hyphen
Thanks in advance :)

I need to match all uppercase words that don't start with a hyphen.
You may use this regex:
(?<!\S)[^-A-Z\s]*[A-Z]+
RegEx Demo
RegEx Explained:
(?<!\S): Make sure we don't have a non-space before current position
[^-A-Z\s]*: Match 0 or more of any characters that are not hyphen and not uppercase letters and not whitespaces
[A-Z]+: Match 1+ uppercase letters
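A minimal JavaScript sketch of this pattern, assuming an engine with lookbehind support (ES2018 or later). The helper name findUppercaseWords and the sample line are made up for illustration. Note that each match keeps any leading non-hyphen punctuation, such as the comma in ,BOAT.

```javascript
// Hypothetical helper: returns every match of the pattern above in a line.
function findUppercaseWords(line) {
  // (?<!\S)    : previous char is whitespace, or we are at the start
  // [^-A-Z\s]* : optional leading chars that are not "-", A-Z, or whitespace
  // [A-Z]+     : one or more uppercase ASCII letters
  const re = /(?<!\S)[^-A-Z\s]*[A-Z]+/g;
  return line.match(re) ?? [];
}

console.log(findUppercaseWords(",BOAT BANANA, WATER -ER"));
// [ ',BOAT', 'BANANA', 'WATER' ]
```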

You can use
\b(?<!-)[A-Z]+\b
\b(?<!-)\p{Lu}+\b
See the regex demo
Details:
\b - word boundary
(?<!-) - a negative lookbehind that fails the match if there is a - immediately to the left of the current position
[A-Z]+ / \p{Lu}+ - one or more uppercase letters (\p{Lu} matches any uppercase Unicode letters)
\b - word boundary.
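A short JavaScript sketch of the ASCII variant, assuming lookbehind support (ES2018 or later); the helper name uppercaseWordsNoHyphen is hypothetical. Unlike a pattern that consumes leading punctuation, this one returns only the letters. In JavaScript the \p{Lu} variant would additionally need the u flag.

```javascript
// Hypothetical helper: uppercase ASCII words not directly preceded by "-".
function uppercaseWordsNoHyphen(line) {
  // \b      : word boundary before the first letter
  // (?<!-)  : the char to the left must not be a hyphen
  // [A-Z]+  : one or more uppercase letters
  // \b      : word boundary after the last letter (rejects "Water")
  const re = /\b(?<!-)[A-Z]+\b/g;
  return line.match(re) ?? [];
}

console.log(uppercaseWordsNoHyphen(",BOAT BANANA, WATER -ER"));
// [ 'BOAT', 'BANANA', 'WATER' ]
```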

Related

pcre2 - Regex match to distinct letters only

I am trying to create a regex that matches the following criteria below
first letter is uppercase
remaining 5 letters following the first letter are lowercase
ends in ".com"
no letter repeats only before the ".com"
no digits
there are only 5 lowercase letters before the .com, with only the first letter being uppercase
The above criteria should match to strings such as:
Amipsa.com
Ipsamo.com
I created this regex below, but the regex seem to capture repeating letters - examples here: https://regex101.com/r/wwJBmc/1
^([A-Z])(?![a-z]*\1)(?:([a-z])\1(?!\2)(?:([a-z])(?![a-z]*\3)){3}|(?:([a-z])(?![a-z]*\4)){5})\.com$
Would appreciate any insight.
You may use this regex in PCRE with a negative lookahead:
^(?!(?i)[a-z]*([a-z])[a-z]*\1)[A-Z][a-z]{5}\.com$
Updated RegEx Demo
RegEx Details:
^: Start
(?!: Start negative lookahead
(?i): Enable ignore case modifier
[a-z]*: Match 0 or more letters
([a-z]): Match a letter and capture in group #1
[a-z]*: Match 0 or more letters
\1: Match same letter as in capture group #1
): End negative lookahead
[A-Z][a-z]{5}: Match an uppercase letter followed by 5 lowercase letters
\.com: Match .com
$: End
You might use
^(?i)[a-z]*?([a-z])[a-z]*?\1(?-i)(*SKIP)(*F)|[A-Z][a-z]{5}\.com$
Regex demo
Explanation
^ Start of string
(?i) Case insensitive match
[a-z]*?([a-z])[a-z]*?\1 Match 2 of the same chars a-z A-Z
(?-i) Turn off case-insensitive matching
(*SKIP)(*F) Skip the match
[A-Z][a-z]{5} Match A-Z and 5 chars a-z
\.com Match .com
$ End of string
Another idea, if you are using JavaScript and cannot use (?i), is to use 2 patterns: one with the case-insensitive flag /i to check for a repeated character, and one for the full match.
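// Full shape: one uppercase letter, five lowercase letters, then ".com".
const rFullMatch = /^[A-Z][a-z]{5}\.com$/;
// Case-insensitive check for a letter that repeats before the dot.
const rRepeatedChar = /^[a-z]*([a-z])[a-z]*\1/i;
[
  "Regxas.com",
  "Ipsamo.com",
  "Plpaso.com",
  "Amipsa.com",
  "Ipsama.com",
  "Ipsima.com",
  "IPszma.com",
  "Ipsamo&.com",
  "abcdef.com"
].forEach(s => {
  // Report only strings with the right shape and no repeated letter.
  if (!rRepeatedChar.test(s) && rFullMatch.test(s)) {
    console.log(`Match: ${s}`);
  }
});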

Regex to match the letter string group between 2 numbers

Is it possible to match only the letter from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
You can try with this:
var input = "0258 6044 0001 FPS21098343"; // renamed to avoid shadowing the built-in String
var reg = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
var match = reg.exec(input);
console.log(match);    // the full match array
console.log(match[1]); // "FPS"
You can match up to the first one or more letters in the following way:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A that always matches the start of string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
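A minimal JavaScript sketch of the first pattern; the helper name firstLetters is hypothetical. It captures the first run of letters from the start of the string, so it returns FPS for the shortened sample but RO for the full string from the question.

```javascript
// Hypothetical helper: first run of ASCII letters in a string, or null.
function firstLetters(text) {
  // ^[^a-zA-Z]* skips everything that is not a letter from the start;
  // ([A-Za-z]+) then captures the first run of letters into group 1.
  const m = /^[^a-zA-Z]*([A-Za-z]+)/.exec(text);
  return m ? m[1] : null;
}

console.log(firstLetters("0258 6044 0001 FPS21098343")); // FPS
console.log(firstLetters("RO41 RNCB 0089 0957 6044 0001 FPS21098343")); // RO
```

Without the m flag, ^ in JavaScript only matches at the start of the string, so no \A replacement is needed there.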
To grep all the groups of letters that go in a row in any place in the string you can simply use:
([a-zA-Z]+)
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, and match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
Regex demo
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
Regex demo
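Both variants can be tried in JavaScript with a sketch like the following. The helper name lettersAfterDigitGroups and its requireTrailingDigits parameter are made up for illustration.

```javascript
// Hypothetical helper: the uppercase letters captured in group 1, or null.
function lettersAfterDigitGroups(text, requireTrailingDigits = false) {
  const re = requireTrailingDigits
    ? /\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$/ // letters must be followed by digits to the end
    : /\b[0-9]{4}\s+\S+\s+([A-Z]+)/;
  const m = re.exec(text);
  return m ? m[1] : null;
}

const sample = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";
console.log(lettersAfterDigitGroups(sample));       // FPS
console.log(lettersAfterDigitGroups(sample, true)); // FPS
```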

Regex command to match combinations but not only uppercase letters

Is there a regex command to match all combinations of uppercase letters, lowercase letters, underscores, brackets and numbers, but not words made up only of uppercase letters or only of numbers?
I thought I had it with this one:
(/\b(?![A-Z]+\b)(?![0-9]+\b)[a-zA-Z0-9_{}]+\b/)
That was until I encountered: ABC{hello}_HI_HelLo
This is not a match, and I would like my regex to match this string.
There seems to be something wrong with the negative lookahead: it reads "ABC", assumes it is an uppercase-only word and does not match the string, so only the part after the "{" is matched.
When you add an underscore after "ABC" you get a matching string: ABC_{hello}_HI_HelLo
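There is a word boundary between C and { in ABC{hello}, so [A-Z]+\b matches ABC and the lookahead (?![A-Z]+\b) rejects that position. In ABC_{hello}, the _ is a word character, so there is no boundary after ABC (only between _ and {) and the lookahead passes.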
You can assert a whitespace boundary to the left (?<!\S) and the right (?!\S) instead.
The pattern matches:
(?<!\S) Assert a whitespace boundary to the left
(?![A-Z]+(?!\S)) Assert not only uppercase chars followed by a whitespace boundary at the right
(?![0-9]+(?!\S)) Assert not only digits followed by a whitespace boundary at the right
[a-zA-Z0-9_{}]+ Match 1 or more occurrences of any of the listed
Regex demo
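Put together, the parts above can be sketched in JavaScript as follows. The full pattern is assembled from the listed pieces, and closing it with (?!\S) as the right-hand boundary is an assumption; the helper name findMixedTokens is hypothetical. Lookbehind requires ES2018 or later.

```javascript
// Assembled from the pieces listed above; the trailing (?!\S) is the
// right-hand whitespace boundary (assumed to close the pattern).
const mixedToken =
  /(?<!\S)(?![A-Z]+(?!\S))(?![0-9]+(?!\S))[a-zA-Z0-9_{}]+(?!\S)/g;

// Hypothetical helper: all whitespace-delimited tokens made of the listed
// characters that are neither uppercase-only nor digit-only.
function findMixedTokens(text) {
  return text.match(mixedToken) ?? [];
}

console.log(findMixedTokens("HELLO world_1 42 ABC{hello}_HI_HelLo"));
// [ 'world_1', 'ABC{hello}_HI_HelLo' ]
```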

Match everything until upcase word

I want to capture a word placed before another one which is fully capitalized
Mister Foo BAR is here # => "Foo"
Miss Bar-Barz FOO loves cats # => "Bar-Barz"
I've been trying the following regex: (Mister|Miss)\s([[:alpha:]\s\-]+)(?=\s[A-Z]+), but sometimes it includes the rest of the sentence. For example, it'll return Bar-Barz FOO loves cats instead of Bar-Barz.
How can I say, using RegExp, "match every word until the uppercase word"?
To clarify the usage of negative lookahead, can we say it "captures until the specified sub-pattern matches, but does not include it in the match data"?
As a non-native English speaker, apologies if my question isn't perfectly formulated. Thanks in advance.
Match 1+ word chars optionally repeated by a - and 1+ word chars to not match only hyphens or a hyphen at the end.
Assert a space followed by 1+ uppercase chars and a word boundary at the right.
\w+(?:-\w+)*(?=\s[A-Z]+\b)
Explanation
\w+ Match 1+ word char
(?:-\w+)* Optionally repeat matching - and 1+ word chars
(?=\s[A-Z]+\b) Positive lookahead, assert what is directly at the right is a whitespace char and 1+ uppercase chars A-Z followed by a word boundary
Regex demo
If there can not be any newlines between the words, you can use [^\S\r\n] instead of \s
\w+(?:-\w+)*(?=[^\S\r\n]+[A-Z]+\b)
Regex demo
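The question does not name a language, so here is a minimal JavaScript sketch of the first pattern applied to the two sample sentences. The helper name wordBeforeUppercase is hypothetical.

```javascript
// Hypothetical helper: the word (hyphens allowed inside) directly before an
// all-uppercase word, or null when there is none.
function wordBeforeUppercase(sentence) {
  // The lookahead checks for whitespace plus an uppercase-only word
  // without adding it to the match.
  const m = sentence.match(/\w+(?:-\w+)*(?=\s[A-Z]+\b)/);
  return m ? m[0] : null;
}

console.log(wordBeforeUppercase("Mister Foo BAR is here"));       // Foo
console.log(wordBeforeUppercase("Miss Bar-Barz FOO loves cats")); // Bar-Barz
```

The trailing \b in the lookahead is what keeps "Mister" from matching in front of "Foo": after the F there is no word boundary, so the lookahead fails there.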
I want to capture a word placed before another one which is full capitalized
You may use this regex with a lookahead:
\b\S+(?=[ \t]+[A-Z]+\b)
RegEx Demo
RegEx Description:
\b: Word boundary
\S+: Match 1+ non-whitespace characters
(?=[ \t]+[A-Z]+\b): Positive lookahead that asserts we have 1+ space and then a word containing only capital letters
You don't say what language you're working in, but the following works for me. The idea is to stop when the parser hits a sequence of uppercase letters/hyphens.
JS example:
let ptn = /(Mister|Miss)\s[\w\-]+(?=\s[A-Z\-]+)/;
"Mister Foo BAR is here".match(ptn); //["Mister Foo", "Mister"]
"Miss Bar-Barz FOO loves cats".match(ptn); //["Miss Bar-Barz", "Miss"]

How to write a regex in title case

I'm working with an SAP application called Information Steward and creating a rule where names have to be in title case (i.e. each word is capitalized).
I've formulated the following rule:
BEGIN
IF(match_regex($name, '(^(\b[A-Z]\w*\s*)+$)', null)) RETURN TRUE;
ELSE RETURN FALSE;
END
Although it is successful it appears to accept inputs which should be identified as 'FALSE'. Please see the attached screenshot.
'TesT Name' and 'TEST NAME' should be FALSE but are instead passing under this regex.
Any help/guidance with the regex would be very useful.
The (^(\b[A-Z]\w*\s*)+$) regex presents a pattern that matches a string that fully matches:
^ - start of string
(\b[A-Z]\w*\s*)+ - 1 or more occurrences (due to (...)+) of
\b - a word boundary
[A-Z] - an uppercase ASCII letter
\w* - 0 or more letters/digits/underscores
\s* - 0+ whitespaces
$ - end of string.
As you see, it allows trailing whitespace, and \w matches what [A-Za-z0-9_] matches, i.e. it matches both lower- and uppercase letters.
You want to only match lowercase letters after initial uppercase ones, also allowing - and _ chars. You may use
^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$
See the regex demo.
Details
^ - start of string anchor
[A-Z][a-z0-9_-]* - an uppercase letter followed with 0+ lowercase letters, digits, _ or - chars
(\s+[A-Z][a-z0-9_-]*)* - zero or more occurrences of:
\s+ - 1 or more whitespaces
[A-Z][a-z0-9_-]* - an uppercase letter followed with 0+ lowercase letters, digits, _ or - chars
$ - end of string.
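As a quick check outside Information Steward, the pattern can be tried in JavaScript. This sketch assumes match_regex uses the same regex semantics for these constructs; the helper name isTitleCase is made up.

```javascript
// Hypothetical helper: true when every word starts with one uppercase ASCII
// letter followed only by lowercase letters, digits, "_" or "-".
function isTitleCase(name) {
  return /^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$/.test(name);
}

for (const name of ["Test Name", "TesT Name", "TEST NAME", "Test Name "]) {
  console.log(JSON.stringify(name), isTitleCase(name));
}
// "Test Name" true
// "TesT Name" false
// "TEST NAME" false
// "Test Name " false
```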
I would write your regex as:
^[A-Z]\w*(?:\s+[A-Z]\w*)*$
This says to match a single word starting with a capital letter, then followed by one or more spaces and another word starting with a capital, this quantity zero or more times.
I phrase a matching word as starting with [A-Z] followed by \w*, meaning zero or more word characters. This allows for things like A to match.
Demo
Edit:
Based on the comments above, if you want some other character class to represent what follows the initial uppercase letter, then do that instead:
^[A-Z][something]*(?:\s+[A-Z][something]*)*$
where [something] is your character class.
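The template can be filled in programmatically. Below is a small JavaScript sketch with a hypothetical helper, titleCasePattern, that takes the character class as a string. Note that with \w as the class, mixed-case words such as TesT still pass.

```javascript
// Hypothetical helper: builds the template above from a character class that
// describes what may follow the initial uppercase letter, e.g. "[a-z]".
function titleCasePattern(tailClass) {
  return new RegExp(`^[A-Z]${tailClass}*(?:\\s+[A-Z]${tailClass}*)*$`);
}

const loose = titleCasePattern("\\w");    // same as ^[A-Z]\w*(?:\s+[A-Z]\w*)*$
const strict = titleCasePattern("[a-z]"); // lowercase letters only after the capital

console.log(loose.test("TesT Name"), strict.test("TesT Name")); // true false
console.log(loose.test("Test Name"), strict.test("Test Name")); // true true
```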