Require all characters of a range [duplicate] - regex

I need a regex that matches any word containing all of the letters m, a, h and d, in any order,
so Mohamed, Hamada and Mahmoud match, but hammer doesn't.
I tried the following (I'm new to regex!):
Regex reg=new Regex("[mahd]");
But obviously that is not the correct pattern.

When you want to match some substrings in any order, you either use alternation where all possible variations are enumerated, or use anchored lookaheads.
In this case, I'd suggest using positive lookaheads that will ensure both free order of the letters in a word and their obligatory presence in the word matched.
Use
(?i)\b(?=\w*m)(?=\w*a)(?=\w*h)(?=\w*d)\w+
See the regex demo (NOTE: You may replace \w with \p{L} to only match letters).
Details:
(?i) - case insensitive mode on
\b - a leading word boundary
(?=\w*m) - after 0+ word chars (i.e. letters, digits or underscores), there must be m
(?=\w*a) - after 0+ word chars, there must be a
(?=\w*h) - after 0+ word chars, there must be h
(?=\w*d) - after 0+ word chars, there must be d
\w+ - 1 or more letters, digits or underscores (you may replace with \p{L} to only match letters).
C# demo:
var str = "Mohamed, Hamada and Mahmoud match, but not hammer";
var letters = "mahd";
var pat = string.Format(@"\b{0}\w+\b", string.Join("", letters.Select(s => string.Format(@"(?=\w*{0})", s))));
var result = Regex.Matches(str, pat, RegexOptions.IgnoreCase)
.Cast<Match>()
.Select(match => match.Value)
.ToList();
Console.WriteLine(String.Join("\n", result)); // Demo line
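For comparison, the same lookahead construction can be sketched in JavaScript. This is only an illustration and not part of the original answer: the helper name buildAllLettersPattern is hypothetical, and it assumes the required characters are plain letters that need no regex escaping.

```javascript
// Hypothetical helper: builds one lookahead per required letter,
// e.g. "mahd" -> \b(?=\w*m)(?=\w*a)(?=\w*h)(?=\w*d)\w+
// Assumes every character in `letters` is a plain letter (no escaping needed).
function buildAllLettersPattern(letters) {
  const lookaheads = [...letters].map(ch => `(?=\\w*${ch})`).join("");
  // "g" collects every matching word, "i" ignores case
  return new RegExp(`\\b${lookaheads}\\w+`, "gi");
}

const text = "Mohamed, Hamada and Mahmoud match, but not hammer";
const words = text.match(buildAllLettersPattern("mahd")) ?? [];
console.log(words.join("\n"));
```

Each lookahead starts at the same word boundary, so the order in which the letters appear in the word does not matter.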

Related

Regex match pattern, space and character

^([a-zA-Z0-9_-]+)$ matches:
BAP-78810
BAP-148080
But does not match:
B8241066 C
Q2111999 A
Q2111999 B
How can I modify regex pattern to match any space and/or special character?
For the example data, you can write the pattern as:
^[a-zA-Z0-9_-]+(?: [A-Z])?$
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ chars listed in the character class
(?: [A-Z])? Optionally match a space and a char A-Z
$ End of string
Or a more exact match:
^[A-Z]+-?\d+(?: [A-Z])?$
^ Start of string
[A-Z]+-? Match 1+ chars A-Z and optional -
\d+(?: [A-Z])? Match 1+ digits and optional space and char A-Z
$ End of string
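As a quick check, the two patterns above can be run against the sample data from the question. This is a minimal JavaScript sketch; the variable names are illustrative and the input "ABC-DEF" is invented to show where the two patterns differ.

```javascript
// The two patterns from the answer above
const loose = /^[a-zA-Z0-9_-]+(?: [A-Z])?$/;
const exact = /^[A-Z]+-?\d+(?: [A-Z])?$/;

// Sample data from the question
const samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"];
const allLoose = samples.every(s => loose.test(s));
const allExact = samples.every(s => exact.test(s));
console.log(allLoose, allExact);

// Invented input: allowed characters only, but no digits after the letters
const looseOnly = "ABC-DEF";
console.log(loose.test(looseOnly), exact.test(looseOnly));
```

The second pattern is stricter because it fixes the order: letters, an optional hyphen, then digits.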
Whenever you want to match something that can be either a space or a special character, you can use the dot symbol (.). Your regex pattern would then be modified to:
^([a-zA-Z0-9_-])+.$
The dot matches a space or any other single character. Note that this pattern allows exactly one such character, at the very end of the string, so on its own it does not match an input such as B8241066 C. If you want to match the example provided, where exactly one alphanumeric character follows the space, you could add \w:
^([a-zA-Z0-9_-])+.\w$
Note that \w is equivalent to [A-Za-z0-9_].
Further, be careful when you use ., as it makes your pattern less specific and therefore more prone to false positives.
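The false-positive risk can be illustrated with a small JavaScript sketch. It compares the dot-based pattern with the explicit-space pattern from the earlier answer in this thread; the input "B8241066#C" is invented for the comparison.

```javascript
const dotted = /^([a-zA-Z0-9_-])+.\w$/;            // dot accepts any separator
const explicit = /^[a-zA-Z0-9_-]+(?: [A-Z])?$/;    // only a space is accepted

const wanted = "B8241066 C";
const unwanted = "B8241066#C"; // invented input with a '#' instead of a space

console.log(dotted.test(wanted), dotted.test(unwanted));
console.log(explicit.test(wanted), explicit.test(unwanted));
```

Both patterns accept the wanted input, but only the dot-based one also accepts the invented one.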
I suggest using this approach
^[A-Z][A-Z\d -]{6,}$
The first character must be an uppercase letter, followed by at least 6 uppercase letters, digits, spaces or -.
I removed the group because there was only one group and it was the entire regex.
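A short JavaScript check of this pattern against the question's data might look as follows. The two rejected inputs are invented to show the length and case requirements.

```javascript
const pattern = /^[A-Z][A-Z\d -]{6,}$/;

// Sample data from the question
const samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"];
const allMatch = samples.every(s => pattern.test(s));
console.log(allMatch);

// Invented inputs: too short, and lowercase
const tooShort = pattern.test("Q21 A");
const lowercase = pattern.test("bap-78810");
console.log(tooShort, lowercase);
```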
You can also use \w, which includes A-Z, a-z and 0-9, as well as _ (underscore). To make the pattern case-insensitive without explicitly adding a-z or using \w, you can use a flag, usually i.

pcre2 - Regex match to distinct letters only

I am trying to create a regex that matches the following criteria below
first letter is uppercase
remaining 5 letters following the first letter are lowercase
ends in ".com"
no letter repeats before the ".com"
no digits
there are only 5 lowercase letters before the .com, with only the first letter being uppercase
The above criteria should match to strings such as:
Amipsa.com
Ipsamo.com
I created the regex below, but it seems to capture repeating letters - examples here: https://regex101.com/r/wwJBmc/1
^([A-Z])(?![a-z]*\1)(?:([a-z])\1(?!\2)(?:([a-z])(?![a-z]*\3)){3}|(?:([a-z])(?![a-z]*\4)){5})\.com$
Would appreciate any insight.
You may use this regex in PCRE with a negative lookahead:
^(?!(?i)[a-z]*([a-z])[a-z]*\1)[A-Z][a-z]{5}\.com$
RegEx Details:
^: Start
(?!: Start negative lookahead
(?i): Enable ignore case modifier
[a-z]*: Match 0 or more letters
([a-z]): Match a letter and capture in group #1
[a-z]*: Match 0 or more letters
\1: Match same letter as in capture group #1
): End negative lookahead
[A-Z][a-z]{5}: Match an uppercase letter followed by 5 lowercase letters
\.com: Match .com
$: End
You might use
^(?i)[a-z]*?([a-z])[a-z]*?\1(?-i)(*SKIP)(*F)|[A-Z][a-z]{5}\.com$
Explanation
^ Start of string
(?i) Case insensitive match
[a-z]*?([a-z])[a-z]*?\1 Match 2 of the same chars a-z A-Z
(?-i) Turn off case-insensitive matching
(*SKIP)(*F) Skip the match
[A-Z][a-z]{5} Match A-Z and 5 chars a-z
\.com Match .com
$ End of string
Another idea, if you are using JavaScript and cannot use (?i), is to use two patterns: one with the case-insensitive flag /i to check that no character is repeated, and one for the full match.
const rFullMatch = /^[A-Z][a-z]{5}\.com$/;
const rRepeatedChar = /^[a-z]*([a-z])[a-z]*\1/i;
[
  "Regxas.com",
  "Ipsamo.com",
  "Plpaso.com",
  "Amipsa.com",
  "Ipsama.com",
  "Ipsima.com",
  "IPszma.com",
  "Ipsamo&.com",
  "abcdef.com"
].forEach(s => {
  if (!rRepeatedChar.test(s) && rFullMatch.test(s)) {
    console.log(`Match: ${s}`);
  }
});

Regex to match the letter string group between 2 numbers

Is it possible to match only the letter from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
You can try with this:
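var input = "0258 6044 0001 FPS21098343";
// Skip the leading 4-digit groups, capture the letters, then require trailing digits
var re = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
var match = re.exec(input);
console.log(match);    // the full match array
console.log(match[1]); // "FPS"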
You can match everything up to the first run of letters, and capture that run, in any of the following ways:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A that always matches the start of string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
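A minimal JavaScript sketch of the first pattern and its Unicode variant follows. Note that in JavaScript the \p{L} and \P{L} classes require the u flag; the variable names and the third input are made up for illustration.

```javascript
const firstLetters = /^[^a-zA-Z]*([A-Za-z]+)/;
// Unicode variant: \P{L} is the negation of \p{L}; both need the "u" flag in JavaScript
const firstUnicodeLetters = /^\P{L}*(\p{L}+)/u;

const shortInput = "0258 6044 0001 FPS21098343";
const fullInput = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

const fromShort = shortInput.match(firstLetters)[1];
const fromFull = fullInput.match(firstLetters)[1];
// Invented input: "\u00DC" is the letter U with diaeresis
const fromUnicode = "123 \u00DCber42".match(firstUnicodeLetters)[1];

console.log(fromShort, fromFull, fromUnicode);
```

Because these patterns capture the first run of letters in the string, they return FPS for the shortened input but RO for the full string from the question.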
To grep all the groups of letters that go in a row in any place in the string you can simply use:
([a-zA-Z]+)
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, and match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
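Both capture-group patterns can be tried on the full string from the question with a short JavaScript sketch (variable names are illustrative):

```javascript
const input = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

const lettersOnly = /\b[0-9]{4}\s+\S+\s+([A-Z]+)/;
const lettersThenDigits = /\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$/;

// Group 1 holds the letters that follow the last two 4-digit blocks
const first = input.match(lettersOnly)[1];
const second = input.match(lettersThenDigits)[1];
console.log(first, second);
```

The earlier 4-digit blocks are tried first but fail, because the token two positions after them starts with a digit rather than a letter.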

How to write a regex in title case

I'm working with an SAP application called Information Steward and creating a rule where names have to be in title case (i.e. each word is capitalized).
I've formulated the following rule:
BEGIN
IF(match_regex($name, '(^(\b[A-Z]\w*\s*)+$)', null)) RETURN TRUE;
ELSE RETURN FALSE;
END
Although it is successful it appears to accept inputs which should be identified as 'FALSE'. Please see the attached screenshot.
'TesT Name' and 'TEST NAME' should be FALSE but are instead passing under this regex.
Any help/guidance with the regex would be very useful.
The (^(\b[A-Z]\w*\s*)+$) regex presents a pattern that matches a string that fully matches:
^ - start of string
(\b[A-Z]\w*\s*)+ - 1 or more occurrences (due to (...)+) of
\b - a word boundary
[A-Z] - an uppercase ASCII letter
\w* - 0 or more letters/digits/underscores
\s* - 0+ whitespaces
$ - end of string.
As you see, it allows trailing whitespace, and \w matches what [A-Za-z0-9_] matches, i.e. it matches both lower- and uppercase letters.
You want to only match lowercase letters after initial uppercase ones, also allowing - and _ chars. You may use
^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$
Details
^ - start of string anchor
[A-Z][a-z0-9_-]* - an uppercase letter followed with 0+ lowercase letters, digits, _ or - chars
(\s+[A-Z][a-z0-9_-]*)* - zero or more occurrences of:
\s+ - 1 or more whitespaces
[A-Z][a-z0-9_-]* - an uppercase letter followed with 0+ lowercase letters, digits, _ or - chars
$ - end of string.
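The difference between the two patterns can be checked with a small JavaScript sketch. It only illustrates the regex behaviour and assumes that match_regex in Information Steward treats these constructs the same way; the input "test name" is added for illustration.

```javascript
const original = /^(\b[A-Z]\w*\s*)+$/;
const strict = /^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$/;

const names = ["Test Name", "TesT Name", "TEST NAME", "test name"];
const originalResults = names.map(n => original.test(n));
const strictResults = names.map(n => strict.test(n));

names.forEach((n, i) => {
  console.log(`${n}: original=${originalResults[i]}, strict=${strictResults[i]}`);
});
```

The original pattern accepts the first three names, while the stricter one accepts only "Test Name".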
I would write your regex as:
^[A-Z]\w*(?:\s+[A-Z]\w*)*$
This says to match a single word starting with a capital letter, followed by zero or more repetitions of one or more whitespace characters and another word starting with a capital letter.
I phrase a matching word as starting with [A-Z] followed by \w*, meaning zero or more word characters. This allows for things like A to match.
Edit:
Based on the comments above, if you want some other character class to represent what follows the initial uppercase letter, then do that instead:
^[A-Z][something]*(?:\s+[A-Z][something]*)*$
where [something] is your character class.
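One way to experiment with different character classes is to build the pattern from a parameter. The following JavaScript sketch is an illustration only: the helper name titleCasePattern and its tailClass parameter are hypothetical.

```javascript
// Hypothetical helper: `tailClass` is the class allowed after each initial capital,
// written as regex source, e.g. "[a-z]" or "\\w".
function titleCasePattern(tailClass) {
  return new RegExp(`^[A-Z]${tailClass}*(?:\\s+[A-Z]${tailClass}*)*$`);
}

const lowercaseTail = titleCasePattern("[a-z]");
const wordCharTail = titleCasePattern("\\w");

console.log(lowercaseTail.test("Test Name"), lowercaseTail.test("TesT Name"));
console.log(wordCharTail.test("TesT Name"), wordCharTail.test("A"));
```

With [a-z] as the class, mixed-case words such as TesT are rejected; with \w they are accepted, as in the pattern above.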

Regex: Match all the words that contains some word

I want to match all the words that contain the word "oana". I put "OANA" in uppercase letters in some words, at the beginning, in the middle, and at the end of words.
blah OANAmama blah aOANAtata aOANAt msmsmsOANAasfasfa mOANAmsmf OANAtata OANA3 oanTy
Anyway, I made a regex, but it is not very good, because it doesn't select all the words that contain "oana":
\b\w+(oana)\w+\b
Can anyone give me another solution?
You need to use a case insensitive flag and replace + with *:
/\b\w*oana\w*\b/i
See the regex demo (a global modifier may or may not be used, depending on the regex engine). The case insensitive modifier may be passed as an inline option in some regex engines - (?i)\b\w*oana\w*\b.
Here,
\b - a word boundary
\w* - 0+ word chars
oana - the required char string inside a word
\w* - 0+ word chars
\b - a word boundary
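The effect of replacing + with * can be seen in a short JavaScript sketch run on the sample text from the question. The i flag is applied to both patterns here so that only the quantifiers differ; the variable names are illustrative.

```javascript
const text = "blah OANAmama blah aOANAtata aOANAt msmsmsOANAasfasfa mOANAmsmf OANAtata OANA3 oanTy";

const original = /\b\w+(oana)\w+\b/gi; // needs at least one char on both sides of "oana"
const fixed = /\b\w*oana\w*\b/gi;      // "oana" may start or end the word

const originalMatches = text.match(original) ?? [];
const fixedMatches = text.match(fixed) ?? [];

console.log(originalMatches);
console.log(fixedMatches);
```

The original pattern misses OANAmama, OANAtata and OANA3, because nothing precedes "oana" in those words.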