regular expressions, delimiting plus sign - regex

Private Const SEPARATOR_REG_EXP1 As String = "SCD\+4\+[A-Z]\+"
Public Function TestReg() As Boolean
Dim s1 As String = "SCD+4+ADJUSTMENT+"
Dim match As Match = Regex.Match(s1, SEPARATOR_REG_EXP1)
If match.Success Then
Return True
Else : Return False
End If
End Function
Not sure why this does not match - haven't really used regular expressions much.

The regex pattern should be :
"SCD\+4\+[A-Z]+\+"
You have to add a + sign after [A-Z], because you want to match one or multiple of these [A-Z] characters.

This does not match because [A-Z] matches only a single character of the given character class. You can use the + quantifier to match one or more characters. The resulting regex would be
SCD\+4\+[A-Z]+\+
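The same behaviour can be reproduced outside VB.NET. Below is a minimal Python sketch using the standard re module; the names BROKEN, FIXED and matches are made up for this illustration and are not part of the original code.

```python
import re

# Hypothetical names for illustration only.
BROKEN = r"SCD\+4\+[A-Z]\+"   # [A-Z] matches exactly one letter
FIXED = r"SCD\+4\+[A-Z]+\+"   # [A-Z]+ matches one or more letters


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    sample = "SCD+4+ADJUSTMENT+"
    print(matches(BROKEN, sample))       # False
    print(matches(FIXED, sample))        # True
    print(matches(BROKEN, "SCD+4+A+"))   # True: a single letter is all [A-Z] allows
```

The unescaped + after [A-Z] is the quantifier, while each \+ is a literal plus sign.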

Related

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;
fn main() {
let re = Regex::new("[a-zA-Z]{5}").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;
fn main() {
let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because, without anchors, it matches any run of 5 consecutive alphabetic characters anywhere in the text. In your case it finds such runs inside 'shouldn' (shoul) and 'return' (retur).
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.
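The difference between "matches somewhere" and "matches the whole string" is not specific to Rust. Here is a small Python sketch of the same idea, where re.search plays the role of the unanchored is_match and re.fullmatch behaves like the anchored pattern. The function names are invented for this example.

```python
import re

FIVE_LETTERS = r"[a-zA-Z]{5}"


def contains_five_letters(text: str) -> bool:
    """Unanchored: True if any run of five letters occurs in the text."""
    return re.search(FIVE_LETTERS, text) is not None


def is_exactly_five_letters(text: str) -> bool:
    """Anchored: True only if the whole text is five letters."""
    return re.fullmatch(FIVE_LETTERS, text) is not None


if __name__ == "__main__":
    text = "this-shouldn't-return-true#"
    print(re.search(FIVE_LETTERS, text).group())  # shoul
    print(contains_five_letters(text))            # True
    print(is_exactly_five_letters(text))          # False
    print(is_exactly_five_letters("hello"))       # True
```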

How to build regex to match values if they exist

I have a requirement to match the complete string if some part of value exists or not
For example :- Here are the list of strings that should be matched
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
So, the part of the string before the first . must be at least 2 characters long.
Below are some values which should not be accepted
.key.value
z.key.value
Could someone please help ?
Thanks in advance
^[^.]{2,}\..+$
Matches
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
Does not match
.key.value
z.key.value
See for yourself: Regexr.com
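If you would rather check this locally, the pattern can be exercised with a few lines of Python. This is only a sketch; is_valid_key is a hypothetical helper name.

```python
import re

# Two or more non-dot characters, a literal dot, then at least one more character.
PATTERN = re.compile(r"^[^.]{2,}\..+$")


def is_valid_key(key: str) -> bool:
    """Return True if the key has a prefix of length >= 2 before the first dot."""
    return PATTERN.match(key) is not None


if __name__ == "__main__":
    samples = ["en.key.value", "fr.key.value", "es.key.value", "pt.key.value",
               "key.value", ".key.value", "z.key.value"]
    for sample in samples:
        print(is_valid_key(sample), sample)
```

Note that [^.] accepts any character except a dot, so digits or upper-case prefixes such as EN.key.value pass as well.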
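You could use the following regex: /^[a-z]{2,}(\.[a-z]+)+$/
^ and $ anchor the match to the whole string.
[a-z]{2,} matches 2 or more repetitions of characters in the range between a and z.
\. matches the dot character.
[a-z]+ matches 1 or more repetitions of characters between a and z.
(\.[a-z]+)+ repeats the dot-plus-letters group one or more times, so both key.value and en.key.value match.
// No g flag: with g, test() keeps lastIndex between calls, which can skew results on a reused regex.
const regex = /^[a-z]{2,}(\.[a-z]+)+$/;
console.log(regex.test("fr.key.value")); // true
console.log(regex.test("key.value")); // true
console.log(regex.test("z.key.value")); // false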
You don't need to use regular expressions. You can split the string on the dots and check the length of the first part.
String[] strings = {"en.key.value",
"fr.key.value",
"es.key.value",
"pt.key.value",
"key.value",
".key.value",
"z.key.value"};
for (String string : strings) {
String[] parts = string.split("\\.");
System.out.printf("[%b] %s%n", (parts[0].length() >= 2), string);
}
The above code produces the following output.
[true] en.key.value
[true] fr.key.value
[true] es.key.value
[true] pt.key.value
[true] key.value
[false] .key.value
[false] z.key.value
However, if you insist on using regular expressions, consider the following.
String[] strings = {"en.key.value",
"fr.key.value",
"es.key.value",
"pt.key.value",
"key.value",
".key.value",
"z.key.value"};
Pattern pattern = Pattern.compile("^[a-z]{2,}\\.");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
System.out.printf("[%b] %s%n", matcher.find(), string);
}
Explanation of regular expression ^[a-z]{2,}\\.
^ start of string
[a-z] any lower-case letter of the English alphabet
{2,} two or more occurrences of the preceding
\\. literal dot
In other words, the above pattern matches strings that start with two or more lower-case characters followed by a single dot.
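The split-based check and the prefix regex agree on the sample strings, but they are not equivalent in general. The Python sketch below (with invented function names) mirrors both approaches and shows one input where they differ: a string without any dot.

```python
import re

PREFIX = re.compile(r"^[a-z]{2,}\.")


def valid_by_split(key: str) -> bool:
    """Mirror of the split approach: only the length of the first part is checked."""
    return len(key.split(".")[0]) >= 2


def valid_by_regex(key: str) -> bool:
    """Mirror of the regex approach: two or more lower-case letters, then a dot."""
    return PREFIX.match(key) is not None


if __name__ == "__main__":
    for key in ["en.key.value", "key.value", ".key.value", "z.key.value", "keyvalue"]:
        print(valid_by_split(key), valid_by_regex(key), key)
```

For keyvalue the split check returns True while the regex returns False, because the split version never verifies that a dot is present or that the prefix consists of lower-case letters.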

How to make regexp for multiple condition?

I have regexp code like below (I'm using the VerbalExpression Dart plugin). My purpose is to check that a string starts with "36", followed by "01", "02", or "03". After that can be anything, as long as the whole string is 16 characters long.
var regex = VerbalExpression()
..startOfLine()
..then("36")
..then("01")
..or("02")
..anythingBut(" ")
..endOfLine();
String nik1 = "3601999999999999";
String nik2 = "3602999999999999";
String nik3 = "3603999999999999";
print('result : ${regex.hasMatch(nik1)}');
print('result : ${regex.hasMatch(nik2)}');
print('result : ${regex.hasMatch(nik3)}');
My code only returns true for nik1 and nik2, but I also want true for nik3. I noticed that I can't put or() after or() for multiple checks; it just gives me false for all of them. How do I achieve that?
I'm not familiar with VerbalExpression, but a RegExp that does this is straightforward enough.
const pattern = r'^36(01|02|03)\S{12}$';
void main() {
final regex = RegExp(pattern);
print(regex.hasMatch('3601999999999999')); // true
print(regex.hasMatch('3602999999999999')); // true
print(regex.hasMatch('3603999999999999')); // true
print(regex.hasMatch('360199999999999')); // false
print(regex.hasMatch('3600999999999999')); // false
print(regex.hasMatch('36019999999999999')); // false
}
Pattern explanation:
The r prefix means Dart will interpret it as a raw string ("$" and "\" are not treated as special).
The ^ and $ represent the beginning and end of the string, so it will only match the whole string and cannot find matches from just part of the string.
(01|02|03) "01" or "02" or "03". | means OR. Wrapping it in parentheses lets it know where to stop the OR.
\S matches any non-whitespace character.
{12} means the previous thing must be repeated 12 times, so \S{12} means any 12 non-whitespace characters.
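To make the pieces of the pattern concrete, here is the same check as a Python sketch. The second pattern, which replaces the alternation with the character class 0[1-3], is my own variation and not part of the answer above. It accepts the same strings here only because the three allowed codes happen to be consecutive.

```python
import re

# Same pattern as above; fullmatch supplies the ^...$ anchoring.
NIK = re.compile(r"36(01|02|03)\S{12}")
# Assumed alternative: "0" followed by one of 1, 2, 3.
NIK_COMPACT = re.compile(r"360[1-3]\S{12}")


def is_valid_nik(text: str) -> bool:
    """True for '36' + ('01'|'02'|'03') + 12 non-whitespace characters."""
    return NIK.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["3601999999999999", "3603999999999999",
                      "360199999999999", "3600999999999999"]:
        print(is_valid_nik(candidate), candidate)
```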

VBA regexp to check special symbols

I have tried to use what I've learnt in this post,
and now I want to compose a RegExp which checks whether a string contains only digits and commas. For example, "1,2,55,2" should be ok, whereas "a,2,55,2" or "1.2,55,2" should fail the test. My code:
Private Function testRegExp(str, pattern) As Boolean
Dim regEx As New RegExp
If pattern <> "" Then
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If regEx.Test(str) Then
testRegExp = True
Else
testRegExp = False
End If
Else
testRegExp = True
End If
End Function
Public Sub foo()
MsgBox testRegExp("2.d", "[0-9]+")
End Sub
MsgBox yields true instead of false. What's the problem ?
Your regex matches a partial string: it finds a digit in all of the 1,2,55,2, a,2,55,2 and 1.2,55,2 input strings.
Use anchors ^ and $ to enforce a full string match and add a comma to the character class as you say you want to match strings that only contain digits and commas:
MsgBox testRegExp("2.d", "^[0-9,]*$")
I also suggest using * quantifier to match 0 or more occurrences, rather than + (1 or more occurrences), but it is something you need to decide for yourself (whether you want to allow an empty string match or not).
Here is the regex demo. Note it is for PCRE regex flavor, but this regex will perform similarly in VBA.
Yes, as @Chaz suggests, if you do not need to match the string/line itself, the alternative is to match an inverse character class:
MsgBox testRegExp("2.d", "[^0-9,]")
This way, the negated character class [^0-9,] will match any character but a comma / digit, invalidating the string. If the result is True, it will mean the string contains some characters other than digits and a comma.
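The two approaches (whole-string match versus searching for a forbidden character) give opposite answers to the same question. A rough Python sketch of both, with hypothetical function names, looks like this:

```python
import re


def only_digits_and_commas(text: str) -> bool:
    """Whole-string check, equivalent to the anchored ^[0-9,]*$ idea."""
    return re.fullmatch(r"[0-9,]*", text) is not None


def has_illegal_char(text: str) -> bool:
    """Inverse check: True if any character other than a digit or comma occurs."""
    return re.search(r"[^0-9,]", text) is not None


if __name__ == "__main__":
    for sample in ["1,2,55,2", "a,2,55,2", "1.2,55,2", "2.d", ""]:
        print(repr(sample), only_digits_and_commas(sample), has_illegal_char(sample))
```

With the * quantifier the empty string counts as valid; swap it for + if an empty input should be rejected.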
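You can use the limited built-in pattern matching for that:
Function isOk(str) As Boolean
Dim i As Long
' Leave with the default result (False) as soon as a character is not a digit or comma
For i = 1 To Len(str)
If Mid$(str, i, 1) Like "[!0-9,]" Then Exit Function
Next
' Every character passed; accept only non-empty strings
isOk = Len(str) > 0
End Function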

How to validate a string to have only certain letters by perl and regex

I am looking for a perl regex which will validate a string containing only the letters ACGT. For example "AACGGGTTA" should be valid while "AAYYGGTTA" should be invalid, since the second string has "YY" which is not one of A,C,G,T letters. I have the following code, but it validates both the above strings
if($userinput =~/[A|C|G|T]/i)
{
$validEntry = 1;
print "Valid\n";
}
Thanks
Use a character class, and make sure you check the whole string by using the start of string token, \A, and end of string token, \z.
You should also use * or + to indicate how many characters you want to match -- * means "zero or more" and + means "one or more."
Thus, the regex below is saying "between the start and the end of the (case insensitive) string, there should be one or more of the following characters only: a, c, g, t"
if($userinput =~ /\A[acgt]+\z/i)
{
$validEntry = 1;
print "Valid\n";
}
Using the character-counting tr operator:
if( $userinput !~ tr/ACGT//c )
{
$validEntry = 1;
print "Valid\n";
}
tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in the characterset. Using !~ instead of =~ negates the result, so it will be true if there are no characters not in characterset or false if there are characters not in characterset.
Your character class [A|C|G|T] contains |. | does not stand for alternation in a character class, it only stands for itself. Therefore, the character class would include the | character, which is not what you want.
Your pattern is not anchored. The pattern /[ACGT]+/ would match any string that contains one or more of any of those characters. Instead, you need to anchor your pattern, so that only strings that contain just those characters from beginning to end are matched.
$ can match before a trailing newline. To avoid that, use \z to anchor at the end. \A anchors at the beginning (although it doesn't make a difference whether you use that or ^ in this case, using \A provides a nice symmetry).
So, your check should be written:
if ($userinput =~ /\A [ACGT]+ \z/ix)
{
$validEntry = 1;
print "Valid\n";
}
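The three points above (the literal | inside a character class, the missing anchors, and $ tolerating a trailing newline) can be observed in other regex engines too. Here is a Python sketch, assuming Python's re module, where \Z is the counterpart of Perl's \z; is_dna is a name made up for this example.

```python
import re


def is_dna(text: str) -> bool:
    """True if the whole text consists of one or more of A, C, G, T (any case)."""
    return re.fullmatch(r"[ACGT]+", text, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Inside a character class, | is just a literal pipe character.
    print(re.search(r"[A|C|G|T]", "|") is not None)          # True

    # Unanchored: one valid letter anywhere is enough.
    print(re.search(r"[ACGT]", "AAYYGGTTA") is not None)     # True

    # $ also matches just before a trailing newline; \Z does not.
    print(re.search(r"^[ACGT]+$", "ACGT\n") is not None)     # True
    print(re.search(r"\A[ACGT]+\Z", "ACGT\n") is not None)   # False

    print(is_dna("AACGGGTTA"))  # True
    print(is_dna("AAYYGGTTA"))  # False
```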