pcre2 - Regex match to distinct letters only - regex

I am trying to create a regex that matches strings meeting the following criteria:
the first letter is uppercase
the remaining 5 letters following the first letter are lowercase
it ends in ".com"
no letter repeats in the part before ".com"
no digits
there are exactly 5 lowercase letters before the .com, and only the first letter is uppercase
The above criteria should match to strings such as:
Amipsa.com
Ipsamo.com
I created the regex below, but it still seems to match strings with repeating letters - examples here: https://regex101.com/r/wwJBmc/1
^([A-Z])(?![a-z]*\1)(?:([a-z])\1(?!\2)(?:([a-z])(?![a-z]*\3)){3}|(?:([a-z])(?![a-z]*\4)){5})\.com$
Would appreciate any insight.

You may use this regex in PCRE with a negative lookahead:
^(?!(?i)[a-z]*([a-z])[a-z]*\1)[A-Z][a-z]{5}\.com$
RegEx Details:
^: Start
(?!: Start negative lookahead
(?i): Enable ignore case modifier
[a-z]*: Match 0 or more letters
([a-z]): Match a letter and capture in group #1
[a-z]*: Match 0 or more letters
\1: Match same letter as in capture group #1
): End negative lookahead
[A-Z][a-z]{5}: Match an uppercase letter followed by 5 lowercase letters
\.com: Match .com
$: End
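As a cross-check of what the lookahead enforces, the same conditions can be restated without backreferences. This is only a sketch in JavaScript: the helper name isDistinctName is made up for illustration, and it assumes that "no letter repeats" is meant case-insensitively, as the (?i) modifier in the pattern implies.

```javascript
// Hypothetical helper: restates the answer's conditions in plain code.
function isDistinctName(s) {
  // Shape check: one uppercase letter, five lowercase letters, then ".com".
  if (!/^[A-Z][a-z]{5}\.com$/.test(s)) return false;
  // Distinctness check on the six letters, ignoring case
  // (assumption: this mirrors the (?i) inside the lookahead).
  const letters = s.slice(0, 6).toLowerCase();
  return new Set(letters).size === letters.length;
}

for (const s of ["Ipsamo.com", "Ipsima.com", "IPszma.com", "abcdef.com"]) {
  console.log(s, isDistinctName(s));
}
```

Under this case-insensitive reading, a string such as Amipsa.com is rejected, because A and a count as the same letter.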

You might use
^(?i)[a-z]*?([a-z])[a-z]*?\1(?-i)(*SKIP)(*F)|[A-Z][a-z]{5}\.com$
Explanation
^ Start of string
(?i) Case insensitive match
[a-z]*?([a-z])[a-z]*?\1 Match the same letter twice, ignoring case
(?-i) Turn off case-insensitive matching
(*SKIP)(*F) Skip what was matched so far and force a failure
[A-Z][a-z]{5} Match A-Z and 5 chars a-z
\.com Match .com
$ End of string
Another idea, if you are using JavaScript and cannot use (?i), is to use two patterns: one with the case-insensitive flag /i to check for a repeated character, and one for the full match.
// Shape check: one uppercase letter, five lowercase letters, then ".com".
const rFullMatch = /^[A-Z][a-z]{5}\.com$/;
// Repeat check: some letter occurs twice in the leading letters, ignoring case.
const rRepeatedChar = /^[a-z]*([a-z])[a-z]*\1/i;
[
  "Regxas.com",
  "Ipsamo.com",
  "Plpaso.com",
  "Amipsa.com",
  "Ipsama.com",
  "Ipsima.com",
  "IPszma.com",
  "Ipsamo&.com",
  "abcdef.com"
].forEach(s => {
  if (!rRepeatedChar.test(s) && rFullMatch.test(s)) {
    console.log(`Match: ${s}`);
  }
});

Related

Regex for first eight letters and last number

Please help me compose a working regular expression.
Conditions:
There can be a maximum of 9 characters (from 1 to 9).
The first eight characters can only be uppercase letters.
The last character can only be a digit.
Examples:
Should not match:
S3
FT5
FGTU7
ERTYUOP9
ERTGHYUKM
Should match:
E
ERT
RTYUKL
VBNDEFRW3
I tried using the following:
^[A-Z]{1,8}\d{0,1}$
but in this case, the FT5 example matches, although it shouldn't.
You may use an alternation based regex:
^(?:[A-Z]{1,8}|[A-Z]{8}\d)$
RegEx Details:
^: Start
(?:: Start non-capture group
[A-Z]{1,8}: Match 1 to 8 uppercase letters
|: OR
[A-Z]{8}\d: Match 8 uppercase letters followed by a digit
): End non-capture group
$: End
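To see the alternation at work, here is a small JavaScript sketch that runs the pattern over the examples from the question. The variable names are illustrative only.

```javascript
const alternationRe = /^(?:[A-Z]{1,8}|[A-Z]{8}\d)$/;

// Examples taken from the question.
const altShouldFail = ["S3", "FT5", "FGTU7", "ERTYUOP9", "ERTGHYUKM"];
const altShouldMatch = ["E", "ERT", "RTYUKL", "VBNDEFRW3"];

for (const s of [...altShouldFail, ...altShouldMatch]) {
  console.log(s.padEnd(10), alternationRe.test(s));
}
```

ERTYUOP9 fails because it has only seven letters before the digit, and ERTGHYUKM fails because its ninth character is a letter rather than a digit.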
You might also rule out 1 to 7 uppercase chars followed by a digit using a negative lookahead:
^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$
^ Start of string
(?![A-Z]{1,7}\d) Negative lookahead to assert not 1-7 uppercase chars and a digit
[A-Z]{1,8} Match 1-8 times an uppercase char
\d? Match an optional digit
$ End of string
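The lookahead variant should accept exactly the same strings as an alternation such as ^(?:[A-Z]{1,8}|[A-Z]{8}\d)$. A quick JavaScript sketch can compare the two on a handful of inputs; the sample list below is the question's examples plus a few made-up edge cases.

```javascript
const lookaheadRe = /^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$/;
const referenceRe = /^(?:[A-Z]{1,8}|[A-Z]{8}\d)$/;

const comparisonSamples = [
  "S3", "FT5", "FGTU7", "ERTYUOP9", "ERTGHYUKM",   // from the question: no match
  "E", "ERT", "RTYUKL", "VBNDEFRW3",               // from the question: match
  "", "ABCDEFGH", "ABCDEFGH12",                    // extra edge cases (assumed)
];

// Collect every sample on which the two patterns give different answers.
const disagreements = comparisonSamples.filter(
  s => lookaheadRe.test(s) !== referenceRe.test(s)
);
console.log(disagreements.length === 0 ? "patterns agree" : disagreements);
```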
With a regex engine that supports possessive quantifiers, you can write:
^[A-Z]{1,7}+(?:[A-Z]\d?)?$
The letter in the optional group can only succeed when the quantifier in [A-Z]{1,7}+ reaches the maximum and when a letter remains. The letter in the group can only be the 8th character.
For the .NET regex engine (which doesn't support possessive quantifiers) you can write this pattern using an atomic group:
^(?>[A-Z]{1,7})(?:[A-Z]\d?)?$
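JavaScript regexes offer neither possessive quantifiers nor atomic groups, but the same "no giving back" behaviour can be approximated with a lookahead that captures the letters, followed by a backreference. This is a swapped-in technique, not something the answer itself proposes, so treat it as a sketch.

```javascript
// Assumption: emulating the atomic group with (?=(...))\1.
// The lookahead grabs as many letters as it can (up to 7); once it has
// succeeded, the engine does not re-enter it to try a shorter run.
const emulatedAtomicRe = /^(?=([A-Z]{1,7}))\1(?:[A-Z]\d?)?$/;

// Expected results follow the examples in the question.
const atomicSamples = {
  E: true, ERT: true, RTYUKL: true, VBNDEFRW3: true,
  S3: false, FT5: false, FGTU7: false, ERTYUOP9: false, ERTGHYUKM: false,
};

for (const [s, expected] of Object.entries(atomicSamples)) {
  console.log(s.padEnd(10), emulatedAtomicRe.test(s), "expected", expected);
}
```

Because the captured run cannot shrink, a digit can only be reached through the optional group, which requires an eighth letter first.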

Regex to match the letter string group between 2 numbers

Is it possible to match only the letters from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
You can try with this:
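const input = "0258 6044 0001 FPS21098343";
// One or more 4-digit groups, then letters (captured), then digits up to the end.
const re = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
const match = re.exec(input);
console.log(match);    // the full match array
console.log(match[1]); // "FPS"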
You can match up to the first one or more letters in the following way:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A that always matches the start of string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
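As an illustration of the Unicode variant, here is a JavaScript sketch using \p{L} and \P{L}, which need the u flag in JavaScript. The Cyrillic input is a made-up example, and the shortened digit-first string is borrowed from the other answer.

```javascript
// \P{L}* skips everything that is not a letter, (\p{L}+) captures the first letter run.
const firstLettersRe = /^\P{L}*(\p{L}+)/u;

const firstRun = s => {
  const m = firstLettersRe.exec(s);
  return m ? m[1] : null;
};

console.log(firstRun("0258 6044 0001 FPS21098343")); // FPS
console.log(firstRun("0001 ФПС2109"));               // ФПС
// On the full string from the question the first letter run is at the very start:
console.log(firstRun("RO41 RNCB 0089 0957 6044 0001 FPS21098343")); // RO
```

The last call shows the limit of "first run of letters": it only yields FPS when no letters come earlier in the input.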
To find every run of consecutive letters anywhere in the string, you can simply use:
([a-zA-Z]+)
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A wordboundary to prevent a partial match, and match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
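A short JavaScript sketch of both capture-group patterns on the string from the question. The second input is invented to show where the \d+$ variant is stricter.

```javascript
const sampleLine = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

const looseRe = /\b[0-9]{4}\s+\S+\s+([A-Z]+)/;
const strictRe = /\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$/;

console.log(sampleLine.match(looseRe)[1]);  // FPS
console.log(sampleLine.match(strictRe)[1]); // FPS

// Made-up input: the letters are not followed by digits up to the end.
const otherLine = "0089 0957 FPS rest";
console.log(otherLine.match(looseRe)[1]); // FPS
console.log(otherLine.match(strictRe));   // null
```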

How do I write a regex for words having alphanumeric characters but not made of only numbers?

For a line input "Abcd abcd1a 5ever qw3-fne superb5 1234 0"
I am trying to match words having letters and numbers, like "Abcd","abcd1a","5ever", "superb5","qw3","fne". But it should not match words having only numbers, like "1234", "0".
Words are separated by all the characters other than above alphanumerics.
I tried this regex (?![0-9])([A-Za-z0-9]+) which fails to match the word "5ever" but works properly for everything else.
How do I write this regex so that it also matches the word "5ever" in full?
Option 1 - Negative lookahead
\b(?!\d+\b)[^\W_]+
\b(?!\d+\b)[A-Za-z\d]+
\b(?!\d+\b)[a-z\d]+ # With case-insensitive flag enabled
\b Assert position as a word boundary
(?!\d+\b) Negative lookahead ensuring the whole word isn't made up of only digits
[^\W_]+ or [A-Za-z\d]+ Matches only letters or digits one or more times
Option 2 - Without lookahead
Another alternative (shown first with the case-insensitive i flag enabled):
\b\d*[a-z][a-z\d]* # With case-insensitive flag enabled
\b\d*[A-Za-z][A-Za-z\d]*
\b Assert position as a word boundary
\d* Match any digit any number of times
[a-z] Match any letter (with i flag enabled this also matches A-Z)
[a-z\d]* Match any letter or digit any number of times
Matches the following from the string Abcd abcd1a 5ever qw3-fne superb5 1234 0:
Abcd
abcd1a
5ever
qw3
fne
superb5
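The two options can be compared directly in JavaScript. This sketch uses the explicit A-Za-z forms of both patterns with the global flag on the input line from the question; the variable names are illustrative.

```javascript
const wordLine = "Abcd abcd1a 5ever qw3-fne superb5 1234 0";

// Option 1: reject words made only of digits via a negative lookahead.
const withLookahead = wordLine.match(/\b(?!\d+\b)[A-Za-z\d]+/g);
// Option 2: require at least one letter after any leading digits.
const withoutLookahead = wordLine.match(/\b\d*[A-Za-z][A-Za-z\d]*/g);

console.log(withLookahead);
console.log(withoutLookahead);
```

Both calls return the same six words, and 1234 and 0 are absent from both results.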
I came up with the following regex:
/\d*[a-z_]+\w*/ig
\d* starts with possible digit(s)
[a-z_]+ contains one or more letters or underscores
\w* possibly followed by any characters after that letter
ig case insensitive and global flags

regex - match pattern of alternating characters

I want to match patterns of alternating lowercase characters.
ababababa -> match
I tried this
([a-z][a-z])+[a-z]
but this would be a match too
ababxyaba
You can use this regex with two backreferences to match alternating lowercase letters:
^([a-z])(?!\1)([a-z])(?:\1\2)*\1?$
RegEx Breakup:
^: Start
([a-z]): Match first letter in capturing group #1
(?!\1): Lookahead to make sure we don't match same letter again
([a-z]): Match second letter in capturing group #2
(?:\1\2)*: Match zero or more pairs of first and second letter
\1?: Match optional first letter before end
$: End
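A small JavaScript sketch of the breakup above, run on the two strings from the question plus a few made-up edge cases:

```javascript
const alternatingRe = /^([a-z])(?!\1)([a-z])(?:\1\2)*\1?$/;

const alternatingSamples = ["ababababa", "abab", "ab", "ababxyaba", "aaaa", "a"];
for (const s of alternatingSamples) {
  console.log(s.padEnd(10), alternatingRe.test(s));
}
```

Only the first three strings match. A single letter such as a is rejected, because the pattern requires a second, different letter.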

match only letters after comma without numbers

I'm using regex to match certain text after selecting it with XPath.
For example: Huntsville, Alabama 11111
I want only Alabama, which always comes after the comma.
I use [^,]*$ to get the text after the comma,
but I can't find a way to exclude the numbers and return only the letters.
As another example, when I want the numbers after the comma I use [^[0-9],]*$,
but when I tried to tweak it for anything else, it returned only numbers or nothing.
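(?<=,\s*)[a-zA-Z]+ You can try this. Note that the lookbehind is written with parentheses, not square brackets, and that \s* inside it needs an engine with variable-length lookbehind (.NET and modern JavaScript have it; PCRE does not accept an unbounded \s* there).
Explanation:
(?<= ... ) => lookbehind: the text must precede the match but is not included in it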
,\s* => match comma followed by 0 or more spaces
[a-zA-Z]+ => match letters only (one or more)
HTH
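In an engine with variable-length lookbehind, the parenthesised form (?<=,\s*)[a-zA-Z]+ can be tried as follows. This is a JavaScript sketch (lookbehind is part of ES2018), and the second input is made up.

```javascript
const afterCommaRe = /(?<=,\s*)[a-zA-Z]+/;

const firstAfterComma = s => {
  const m = s.match(afterCommaRe);
  return m ? m[0] : null;
};

console.log(firstAfterComma("Huntsville, Alabama 11111")); // Alabama
console.log(firstAfterComma("No comma here 11111"));       // null
```

This returns the first word that follows any comma, so an input with several commas would need a different anchor, such as "after the last comma".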
To match a letter word after the last comma, you may use
[a-zA-Z]+(?=[^,]*$)
Details
[a-zA-Z]+ - 1 or more ASCII letters
(?=[^,]*$) - followed with 0+ chars other than , up to the end of the string.
To match 1 or more words in the same context, use
[a-zA-Z]+(?:\s+[a-zA-Z]+)*(?=[^,]*$)
         ^^^^^^^^^^^^^^^^^
The (?:\s+[a-zA-Z]+)* part matches zero or more consecutive occurrences of 1+ whitespace chars followed by 1+ ASCII letters.
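Both lookahead patterns can be compared in a short JavaScript sketch. The two-word input is a made-up example.

```javascript
const lastWordRe = /[a-zA-Z]+(?=[^,]*$)/;
const lastWordsRe = /[a-zA-Z]+(?:\s+[a-zA-Z]+)*(?=[^,]*$)/;

console.log("Huntsville, Alabama 11111".match(lastWordRe)[0]);   // Alabama
console.log("Santa Fe, New Mexico 11111".match(lastWordRe)[0]);  // New
console.log("Santa Fe, New Mexico 11111".match(lastWordsRe)[0]); // New Mexico
```

The lookahead (?=[^,]*$) fails for any word that still has a comma somewhere after it, which is why Huntsville and Santa Fe are never returned.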