VBA regexp to check special symbols - regex

I have tried to use what I've learnt in this post, and now I want to compose a RegExp which checks whether a string contains only digits and commas. For example, "1,2,55,2" should be OK, whereas "a,2,55,2" or "1.2,55,2" should fail the test. My code:
Private Function testRegExp(str, pattern) As Boolean
    Dim regEx As New RegExp
    If pattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .pattern = pattern
        End With
        If regEx.Test(str) Then
            testRegExp = True
        Else
            testRegExp = False
        End If
    Else
        testRegExp = True
    End If
End Function
Public Sub foo()
    MsgBox testRegExp("2.d", "[0-9]+")
End Sub
MsgBox yields True instead of False. What's the problem?

Your regex matches a partial string: it finds a run of digits in each of the 1,2,55,2, a,2,55,2, and 1.2,55,2 input strings (and in 2.d as well), so Test returns True for all of them.
Use anchors ^ and $ to enforce a full string match and add a comma to the character class as you say you want to match strings that only contain digits and commas:
MsgBox testRegExp("2.d", "^[0-9,]*$")
                          ^    ^  ^
I also suggest using * quantifier to match 0 or more occurrences, rather than + (1 or more occurrences), but it is something you need to decide for yourself (whether you want to allow an empty string match or not).
Here is the regex demo. Note it is for PCRE regex flavor, but this regex will perform similarly in VBA.
Yes, as @Chaz suggests, if you do not need to match the string/line itself, the alternative is to match a negated character class:
MsgBox testRegExp("2.d", "[^0-9,]")
This way, the negated character class [^0-9,] will match any character but a comma / digit, invalidating the string. If the result is True, it will mean the string contains some characters other than digits and a comma.
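The difference between the three patterns can also be checked outside VBA. Below is a minimal sketch in Python, whose re module treats these particular patterns the same way; the helper name regex_test is hypothetical and only mirrors what RegExp.Test does.

```python
import re

def regex_test(text, pattern):
    """Mimic RegExp.Test: True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# Unanchored: one digit anywhere is enough, so "2.d" passes.
print(regex_test("2.d", r"[0-9]+"))            # True
# Anchored: every character must be a digit or a comma.
print(regex_test("2.d", r"^[0-9,]*$"))         # False
print(regex_test("1,2,55,2", r"^[0-9,]*$"))    # True
# Negated class: True means an invalid character was found.
print(regex_test("1.2,55,2", r"[^0-9,]"))      # True
print(regex_test("1,2,55,2", r"[^0-9,]"))      # False
```

Note that the VBA function in the question sets MultiLine = True, so there ^ and $ also match at line breaks; the sketch uses Python's default, where the anchors apply to the whole string.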

You can use the limited built in pattern matching for that:
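Function isOk(str) As Boolean
    Dim i As Long
    For i = 1 To Len(str)
        ' Any character that is not a digit or a comma fails the check (isOk stays False)
        If Mid$(str, i, 1) Like "[!0-9,]" Then Exit Function
    Next
    ' Assign the result to the function name; an empty string is rejected
    isOk = Len(str) > 0
End Function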
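For comparison, the same character-by-character idea can be sketched without any regex in Python. The names is_ok and ALLOWED are illustrative only, and the sketch assumes, like the function above, that an empty string should be rejected.

```python
ALLOWED = set("0123456789,")

def is_ok(text):
    """Return True if text is non-empty and holds only digits and commas."""
    return len(text) > 0 and all(ch in ALLOWED for ch in text)

print(is_ok("1,2,55,2"))  # True
print(is_ok("a,2,55,2"))  # False
print(is_ok(""))          # False
```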

Related

How to write a regular expression that includes numbers [0-9] and a defined word?

I'm using VBA and struggling to make a regex.replace function to clean my string cells
Example: "Foo World 4563"
What I want: "World"
by replacing the numbers and the word "Foo"
Another example: "Hello World 435 Foo", I want "Hello World"
This is what my code looks like so far:
Public Function Replacement(sInput) As String
    Dim regex As New RegExp
    With regex
        .Global = True
        .IgnoreCase = True
    End With
    regex.Pattern = "[0-9,()/-]+\bfoo\b"
    Replacement = regex.Replace(sInput, "")
End Function
You can use
Function Replacement(sInput) As String
    Dim regex As New RegExp
    With regex
        .Global = True
        .IgnoreCase = True
    End With
    regex.Pattern = "\s*(?:\bfoo\b|\d+)"
    Replacement = Trim(regex.Replace(sInput, ""))
End Function
See the regex demo.
Details:
\s* - zero or more whitespaces
(?:\bfoo\b|\d+) - either a whole word foo or one or more digits.
Note the use of Trim(): it is necessary to remove leading/trailing spaces that may remain after the replacement.
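As a quick check of this pattern on the two sample strings from the question, here is a minimal Python sketch. The function name replacement mirrors the VBA one, and str.strip() stands in for Trim(); both are choices made for this illustration.

```python
import re

# Same pattern as above; IGNORECASE plays the role of .IgnoreCase = True
PATTERN = re.compile(r"\s*(?:\bfoo\b|\d+)", re.IGNORECASE)

def replacement(text):
    """Remove the word 'foo' and digit runs with any whitespace before them."""
    return PATTERN.sub("", text).strip()

print(replacement("Foo World 4563"))       # World
print(replacement("Hello World 435 Foo"))  # Hello World
```

Because \d+ is not restricted by word boundaries, digits inside a larger token are removed as well, which may or may not be what you want.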
My two cents, capturing preceding whitespace chars when present trying to prevent possible false positives:
(^|\s+)(?:foo|\d+)(?=\s+|$)
See an online demo.
(^|\s+) - 1st capture group matching either the start of the string or one or more whitespace chars, so a match can only begin at the start of the string or right after whitespace;
(?:foo|\d+) - Non-capture group with the alternation between digits or 'foo';
(?=\s+|$) - Positive lookahead to assert position is followed by whitespace or end-line anchor.
Sub Test()
    Dim arr As Variant: arr = Array("Foo World 4563", "Hello World 435 Foo", "There is a 99% chance of false positives which is foo-bar!")
    For Each el In arr
        Debug.Print Replacement(el)
    Next
End Sub
Public Function Replacement(sInput) As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .IgnoreCase = True
        .Pattern = "(^|\s+)(?:foo|\d+)(?=\s+|$)"
        Replacement = Application.Trim(.Replace(sInput, "$1"))
    End With
End Function
Print:
World
Hello World
There is a 99% chance of false positives which is foo-bar!
Here Application.Trim() does take care of multiple whitespace chars left inside your string.
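The behavior on the three test strings can be reproduced with a short Python sketch. It assumes that " ".join(text.split()) is a close enough stand-in for Application.Trim() (it collapses every kind of whitespace, not only spaces), and the function name replacement simply mirrors the VBA one.

```python
import re

PATTERN = re.compile(r"(^|\s+)(?:foo|\d+)(?=\s+|$)", re.IGNORECASE)

def replacement(text):
    """Drop standalone 'foo' words and numbers, keeping the captured whitespace."""
    kept = PATTERN.sub(r"\1", text)
    # Rough equivalent of Application.Trim: trim ends and collapse inner runs
    return " ".join(kept.split())

for sample in (
    "Foo World 4563",
    "Hello World 435 Foo",
    "There is a 99% chance of false positives which is foo-bar!",
):
    print(replacement(sample))
```

The third string comes back unchanged because neither 99 (followed by %) nor foo (followed by -) is followed by whitespace or the end of the string.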

Extract an 8 digits number from a string with additional conditions

I need to extract a number from a string with several conditions.
It has to start with 1-9, not with 0, and it will have 8 digits. Like 23242526 or 65478932
There will be either an empty space or a text variable before it. Like MMX: 23242526 or bgr65478932
It could have commas in rare cases: 23,242,526
It ends with an empty space or a text variable.
Here are several examples:
From RE: Markitwire: 120432889: Mx: 24,693,059 I need to get 24693059
From Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152 need to get 23497152
From FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537 need to get 24663537
From RE: BBVA-MAD MMX_24644644 + MMX_24644645 need to get 24644644, 24644645
Right now I'm using the RegexExtract function (found on this website), which extracts any number with 8 digits starting with 2. However, it would also extract a number from, let's say, the expression TGF00023242526, which is incorrect. Moreover, I don't know how to add additional conditions to the code.
=RegexExtract(A11, "(2\d{7})\b", ", ")
Thank you in advance.
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional seperator As String = "") As String
    Dim i As Long, j As Long
    Dim result As String
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True
    RE.IgnoreCase = True
    Set allMatches = RE.Execute(text)
    For i = 0 To allMatches.Count - 1
        For j = 0 To allMatches.Item(i).SubMatches.Count - 1
            result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
        Next
    Next
    If Len(result) <> 0 Then
        result = Right(result, Len(result) - Len(seperator))
    End If
    RegexExtract = result
End Function
You may create a custom boundary using a non-capturing group before the pattern you have:
(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^
The (?:[\D0]|^) part matches either a non-digit (\D) or 0 or (|) start of string (^).
As an alternative to also match 8 digits in values like 23,242,526 and start with a digit 1-9 you might use
\b[1-9](?:,?\d){7}\b
\b Word boundary
[1-9] Match the first digit 1-9
(?:,?\d){7} Repeat 7 times matching an optional comma and a digit
\b Word boundary
Afterwards, you could replace the commas with an empty string.
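To see how the \b-based alternative behaves on the sample inputs, here is a minimal Python sketch. The function name extract_numbers is hypothetical, and re.ASCII is used so that \d and \b follow the ASCII-only rules of the VBScript engine.

```python
import re

PATTERN = re.compile(r"\b[1-9](?:,?\d){7}\b", re.ASCII)

def extract_numbers(text):
    """Return every 8-digit match with the optional commas stripped."""
    return [m.replace(",", "") for m in PATTERN.findall(text)]

print(extract_numbers("RE: Markitwire: 120432889: Mx: 24,693,059"))
# ['24693059']
print(extract_numbers("FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537"))
# ['24663537']
print(extract_numbers("TGF00023242526"))
# []
print(extract_numbers("Ref-Nr. MMX_23497152"))
# []
```

The last call shows a limitation worth keeping in mind: _ and letters are word characters, so \b does not match between them and a digit, and inputs such as MMX_23497152 or bgr65478932 are not picked up by this pattern.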

How to make regexp for multiple condition?

I have regexp code like below (I'm using the VerbalExpression Dart plugin). My purpose is to check that a string starts with "36", followed by "01", "02", or "03". After that, it can be anything as long as the whole string is 16 characters long.
var regex = VerbalExpression()
  ..startOfLine()
  ..then("36")
  ..then("01")
  ..or("02")
  ..anythingBut(" ")
  ..endOfLine();
String nik1 = "3601999999999999";
String nik2 = "3602999999999999";
String nik3 = "3603999999999999";
print('result : ${regex.hasMatch(nik1)}');
print('result : ${regex.hasMatch(nik2)}');
print('result : ${regex.hasMatch(nik3)}');
My code is only true for nik1 and nik2; however, I want it to be true for nik3 as well. I noticed that I can't put or() after or() for multiple checks, as it just gives me false for all results. How do I achieve that?
I'm not familiar with VerbalExpression, but a RegExp that does this is straightforward enough.
const pattern = r'^36(01|02|03)\S{12}$';
void main() {
  final regex = RegExp(pattern);
  print(regex.hasMatch('3601999999999999')); // true
  print(regex.hasMatch('3602999999999999')); // true
  print(regex.hasMatch('3603999999999999')); // true
  print(regex.hasMatch('360199999999999')); // false
  print(regex.hasMatch('3600999999999999')); // false
  print(regex.hasMatch('36019999999999999')); // false
}
Pattern explanation:
The r prefix means Dart will interpret it as a raw string ("$" and "\" are not treated as special).
The ^ and $ represent the beginning and end of the string, so it will only match the whole string and cannot find matches from just part of the string.
(01|02|03) "01" or "02" or "03". | means OR. Wrapping it in parentheses lets it know where to stop the OR.
\S matches any non-whitespace character.
{12} means the previous thing must be repeated 12 times, so \S{12} means any 12 non-whitespace characters.

Extracting Parenthetical Data Using Regex

I have a small sub that extracts parenthetical data (including parentheses) from a string and stores it in cells adjacent to the string:
Sub parens()
    Dim s As String, i As Long
    Dim c As Collection
    Set c = New Collection
    s = ActiveCell.Value
    ary = Split(s, ")")
    For i = LBound(ary) To UBound(ary) - 1
        bry = Split(ary(i), "(")
        c.Add "(" & bry(1) & ")"
    Next i
    For i = 1 To c.Count
        ActiveCell.Offset(0, i).NumberFormat = "#"
        ActiveCell.Offset(0, i).Value = c.Item(i)
    Next i
End Sub
I am now trying to replace this with some Regex code. I am NOT a regex expert. I want to create a pattern that looks for an open parenthesis followed by zero or more characters of any type followed by a close parenthesis.
I came up with:
\((.+?)\)
My current new code is:
Sub qwerty2()
    Dim inpt As String, outpt As String
    Dim MColl As MatchCollection, temp2 As String
    Dim regex As RegExp, L As Long
    inpt = ActiveCell.Value
    MsgBox inpt
    Set regex = New RegExp
    regex.Pattern = "\((.+?)\)"
    Set MColl = regex.Execute(inpt)
    MsgBox MColl.Count
    temp2 = MColl(0).Value
    MsgBox temp2
End Sub
The code has at least two problems:
It will only get the first match in the string (MColl.Count is always 1).
It will not recognize zero characters between the parentheses (I think the .+? requires at least one character).
Does anyone have any suggestions?
By default, the RegExp Global property is False, so Execute stops after the first match. You need to set it to True.
As for the regex, to match zero or more chars as few as possible, you need *?, not +?. Note that both are lazy (match as few as necessary to find a valid match), but + requires at least one char, while * allows matching zero chars (an empty string).
Thus, use
Set regex = New RegExp
regex.Global = True
regex.Pattern = "\((.*?)\)"
As for the regex, you can also use
regex.Pattern = "\(([^()]*)\)"
where [^()] is a negated character class matching any char but ( and ), zero or more times (due to * quantifier), matching as many such chars as possible (* is a greedy quantifier).
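The effect of switching from +? to *? (or to the negated class) can be seen on a string with an empty pair of parentheses. This is a small Python sketch; the helper find_all and the sample text are made up for illustration, and re.finditer returns all matches, which corresponds to Global = True.

```python
import re

def find_all(pattern, text):
    """Return the full text of every match, like MColl(i).Value with Global = True."""
    return [m.group(0) for m in re.finditer(pattern, text)]

TEXT = "f(x) and g() and h(a, b)"

# +? needs at least one char, so "()" is skipped over and the match runs on
print(find_all(r"\((.+?)\)", TEXT))      # ['(x)', '() and h(a, b)']
# *? allows zero chars between the parentheses
print(find_all(r"\((.*?)\)", TEXT))      # ['(x)', '()', '(a, b)']
# The negated class gives the same result here and cannot cross a parenthesis
print(find_all(r"\(([^()]*)\)", TEXT))   # ['(x)', '()', '(a, b)']
```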

regular expressions, delimiting plus sign

Private Const SEPARATOR_REG_EXP1 As String = "SCD\+4\+[A-Z]\+"
Public Function TestReg() As Boolean
    Dim s1 As String = "SCD+4+ADJUSTMENT+"
    Dim match As Match = Regex.Match(s1, SEPARATOR_REG_EXP1)
    If match.Success Then
        Return True
    Else
        Return False
    End If
End Function
Not sure why this does not match - haven't really used regular expressions much.
The regex pattern should be:
"SCD\+4\+[A-Z]+\+"
You have to add a + sign after [A-Z], because you want to match one or multiple of these [A-Z] characters.
This does not match because [A-Z] matches only a single character of the given character class, while ADJUSTMENT is ten characters long. You can use the + quantifier to match multiple chars. The resulting regex would be
SCD\+4\+[A-Z]+\+
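A quick way to confirm the difference is to run both patterns against the sample string. The sketch below uses Python rather than VB.NET, and the helper name has_match is hypothetical; the escaping of + and the meaning of [A-Z]+ are the same in both regex flavors.

```python
import re

def has_match(pattern, text):
    """True if the pattern is found anywhere in text (like Match.Success)."""
    return re.search(pattern, text) is not None

S1 = "SCD+4+ADJUSTMENT+"

# [A-Z] consumes exactly one letter, then a literal + is required
print(has_match(r"SCD\+4\+[A-Z]\+", S1))          # False
# [A-Z]+ consumes the whole word before the literal +
print(has_match(r"SCD\+4\+[A-Z]+\+", S1))         # True
# The original pattern only works for a single-letter segment
print(has_match(r"SCD\+4\+[A-Z]\+", "SCD+4+A+"))  # True
```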