How to build regex to match values if they exists - regex

I have a requirement to match the complete string if some part of value exists or not
For example :- Here are the list of strings that should be matched
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
So, length of string before first . can only be >=2.
Below are some values which should not be accepted
.key.value
z.key.value
Could someone please help ?
Thanks in advance

^[^.]{2,}\..+$
Matches
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
Does not match
.key.value
z.key.value
See yourself: Regexr.com

You could use the following regex : /[a-z]{2,}\.[a-z]+\.[a-z]+/g
[a-z]{2,} matches 2 or more repetitions of characters in the range between a and z.
\. matches the dot character.
[a-z]+ matches 1 or more repetitions of characters between a and z.
let regex = /[a-z]{2,}\.[a-z]+\.[a-z]+/g;
console.log(regex.test("fr.key.value"));
console.log(regex.test("z.key.value"));
Regex101.

You don't need to use regular expressions. You can split the string on the dots and check the length of the first part.
String[] strings = {"en.key.value",
"fr.key.value",
"es.key.value",
"pt.key.value",
"key.value",
".key.value",
"z.key.value"};
for (String string : strings) {
String[] parts = string.split("\\.");
System.out.printf("[%b] %s%n", (parts[0].length() >= 2), string);
}
Above code produces following output.
[true] en.key.value
[true] fr.key.value
[true] es.key.value
[true] pt.key.value
[true] key.value
[false] .key.value
[false] z.key.value
However, if you insist on using regular expressions, consider the following.
String[] strings = {"en.key.value",
"fr.key.value",
"es.key.value",
"pt.key.value",
"key.value",
".key.value",
"z.key.value"};
Pattern pattern = Pattern.compile("^[a-z]{2,}\\.");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
System.out.printf("[%b] %s%n", matcher.find(), string);
}
Explanation of regular expression ^[a-z]{2,}\\.
^ start of string
[a-z] any lower-case letter of the English alphabet
{2,} two or more occurrences of the preceding
\\. literal dot
In other words, the above pattern matches strings that start with two or more lower-case characters followed by a single dot.

Related

Regex: Last Occurrence of a Repeating Character

So, I am looking for a Regex that is able to match with every maximal non-empty substring of consonants followed by a maximal non-empty substring of vowels in a String
e.g. In the following strings, you can see all expected matches:
"zcdbadaerfe" = {"zcdba", "dae", "rfe"}
"foubsyudba" = {"fou", "bsyu", "dba"}
I am very close! This is the regex I have managed to come up with so far:
([^aeiou].*?[aeiou])+
It returns the expected matches except for it only returns the first of any repeating lengths of vowels, for example:
String: "cccaaabbee"
Expected Matches: {"cccaaa", "bbee"}
Actual Matches: {"ccca", "bbe"}
I want to figure out how I can include the last found vowel character that comes before (a) a constant or (b) the end of the string.
Thanks! :-)
Your pattern is slightly off. I suggest using this version:
[b-df-hj-np-tv-z]+[aeiou]+
This pattern says to match:
[b-df-hj-np-tv-z]+ a lowercase non vowel, one or more times
[aeiou]+ followed by a lowercase vowel, one or more times
Here is a working demo.
const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
Segment
Description
[^aeiou]+
one or more of anything BUT vowels
[aeiou]+
one or more vowels
(?=[^aeiou])
will be a match if it is followed by anything BUT a vowel
|
OR
.*[aeiou](?=\b)
zero or more of any character followed by a vowel and it needs to be followed by a non-word
function lastVowel(str) {
const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
return [...str.matchAll(rgx)].flat();
}
const str1 = "cccaaabbee";
const str2 = "zcdbadaerfe";
const str3 = "foubsyudba";
console.log(lastVowel(str1));
console.log(lastVowel(str2));
console.log(lastVowel(str3));

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;
fn main() {
let re = Regex::new("[a-zA-Z]{5}").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;
fn main() {
let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
Playground.
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because it matches any consecutive 5 alpha chars, in your case it matches both 'shouldn't' and 'return'.
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.

regular expressions, delimiting plus sign

Private Const SEPARATOR_REG_EXP1 As String = "SCD\+4\+[A-Z]\+"
Public Function TestReg() As Boolean
Dim s1 As String = "SCD+4+ADJUSTMENT+"
Dim match As Match = Regex.Match(s1, SEPARATOR_REG_EXP1)
If match.Success Then
Return True
Else : Return False
End If
End Function
Not sure why this does not match - haven't really used regular expressions much.
The regex pattern should be :
"SCD\+4\+[A-Z]+\+"
You have to add a + sign after [A-Z], because you want to match one or multiple of these [A-Z] characters.
This does not match, because [A-Z]matches only a single character of the given character class. You can use the + quantifier to match multiple chars. The resulting RegEx would be
SCD\+4\+[A-Z]+\+

Ignore String containing special words (Months)

I am trying to find alphanumeric strings by using the following regular expression:
^(?=.*\d)(?=.*[a-zA-Z]).{3,90}$
Alphanumeric string: an alphanumeric string is any string that contains at least a number and a letter plus any other special characters it can be # - _ [] () {} ç _ \ ù %
I want to add an extra constraint to ignore all alphanumerical strings containing the following month formats :
JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
One solution is to actually match an alphanumerical string. Then check if this string contains one of these names by using the following function:
vector<string> findString(string s)
{
vector<string> vec;
boost::regex rgx("JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
");
boost::smatch match;
boost::sregex_iterator begin {s.begin(), s.end(), rgx},
end {};
for (boost::sregex_iterator& i = begin; i != end; ++i)
{
boost::smatch m = *i;
vec.push_back(m.str());
}
return vec;
}
Question: How can I add this constraint directly into the regular expression instead of using this function.
One solution is to use negative lookahead as mentioned in How to ignore words in string using Regular Expressions.
I used it as follows:
String : 2-hello-001
Regular expression : ^(?=.*\d)(?=.*[a-zA-Z]^(?!Jan|Feb|Mar)).{3,90}$
Result: no match
Test website: http://regexlib.com/
The edit provided by #Robin and #RyanCarlson : ^[][\w#_(){}ç\\ù%-]{3,90}$ works perfectly in detecting alphanumeric strings with special characters. It's just the negative lookahead part that isn't working.
You can use negative look ahead, the same way you're using positive lookahead:
(?=.*\d)(?=.*[a-zA-Z])
(?!.*(?:JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)).{3,90}$
Also you regex is pretty unclear. If you want alphanumerical strings with a length between 3 and 90, you can just do:
/^(?!.*(?:JANVIER|F[Eé]VRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AO[Uù]T|SEPTEMBRE|OCTOBRE|NOVEMBRE|D[Eé]CEMBRE|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
[][\w#_(){}ç\\ù%-]{3,90}$/i
the i flag means it will match upper and lower case (so you can reduce your forbidden list), \w is a shortcut for [0-9a-zA-Z_] (careful if you copy-paste, there's a linebreak here for readability between (?! ) and [ ]). Just add in the final [...] whatever special characters you wanna match.

Url rewriting a Regex help

What Regex do I need for match this url:
Match:
1234
1234/
1234/article-name
Don't match:
1234absd
1234absd/article-name
1234/article.aspx
1234/any.dot.in.the.url
You can try:
^\d+(?:\/[\w-]*)?$
This matches a non-empty sequence of digits at the beginning of the string, followed by an optional suffix of a / and a (possibly empty) sequence of word characters (letters, digits, underscore) and a -.
This matches (see on rubular):
1234
1234/
1234/article-name
42/section_13
But not:
1234absd
1234absd/article-name
1234/article.aspx
1234/any.dot.in.the.url
007/james/bond
No parenthesis regex
You shouldn't need to do this, but if you can't use parenthesis at all, you can always expand to alternation:
^\d+$|^\d+\/$|^\d+\/[\w-]*$
^\d+(/?)|(/[a-zA-Z-]+)$
That may work. or not. Hope it helps
Hope this ll help u ........
string data = "1234/article-name";
Regex Constant = new Regex("(?<NUMBERS>([0-9]+))?(//)?(?<DATA>([a-zA-Z-]*))?");
MatchCollection mc;
mc = Constant.Matches(data,0);
if (mc.Count>0)
{
for (int l_nIndex = 0; l_nIndex < mc.Count; l_nIndex++)
{
string l_strNum = mc[l_nIndex].Groups["NUMBERS"].Value;
string l_strData = mc[l_nIndex].Groups["DATA"].Value;
}
}