VBA Regex issue

Does VBA have a good mechanism for checking whether the content of a given Excel cell matches a specific regex?
In my case I want to know whether a cell has one of these formats:
m
m2
m1234
That is, there is exactly one fixed letter at the beginning, followed by an unspecified number of digits (possibly none).
How do I put this into an If-Else construct?
If Doc.Cells(1,1).Value ..... ???
greets, poeschlorn

You can get at the VBScript RegExp objects via Tools -> References and adding "Microsoft VBScript Regular Expressions 5.5".
Alternatively, a quick way to do it, if you don't need to reject a trailing letter as in m1234X1, is:
if Doc.Cells(1,1).Value like "[a-zA-Z]#*" then ...
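(This doesn't require a reference to anything. Note that # demands one digit after the letter, so a bare m does not match, and the trailing * accepts any characters after that digit.)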
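If a bare letter such as m should pass and trailing letters such as in m1234X1 should be rejected, Like can still do the job with two checks. This is only a minimal sketch: the helper name IsLetterThenDigits is made up for illustration, and it assumes the default Option Compare Binary.

```vba
' Hypothetical helper (not from the answers above): True when s is
' one ASCII letter followed by zero or more digits and nothing else.
Function IsLetterThenDigits(ByVal s As String) As Boolean
    ' An empty string has no leading letter.
    If Len(s) = 0 Then Exit Function
    ' The first character must be a letter.
    If Not (Left$(s, 1) Like "[A-Za-z]") Then Exit Function
    ' The rest must not contain any character that is not a digit.
    IsLetterThenDigits = Not (Mid$(s, 2) Like "*[!0-9]*")
End Function
```

It could then be used as If IsLetterThenDigits(CStr(Doc.Cells(1, 1).Value)) Then ... Else ... End If.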

I don't know VBA, but the regex [a-zA-Z][0-9]* might be able to match what you want.

Here is my RegexContains function. Pass it the cell and the pattern, and it returns TRUE if the cell contains a match and FALSE otherwise.
Function RegexContains(ByVal find_in As String, _
                       ByVal find_what As String, _
                       Optional IgnoreCase As Boolean = False) As Boolean
    ' Late binding: no reference to the regex library is required.
    ' (The function no longer toggles Application.ScreenUpdating; doing so
    ' would switch it back on for a caller that had turned it off.)
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = find_what
    RE.IgnoreCase = IgnoreCase
    RE.Global = True
    RegexContains = RE.Test(find_in)
End Function
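Now, I'm not sure exactly what you want to find in your example, but if you want to know whether the cell starts with a single letter followed by one or more digits, then you would use (assuming the cell is A1): =RegexContains(A1, "^[a-zA-Z]\d+")
The ^ marks the start of the string.
The [a-zA-Z] matches a single alphabetic character. (\w would also match digits and the underscore, so it is too permissive here.)
The \d+ matches one or more numeric characters (0-9). Use \d* instead if a bare letter such as m should also match, and append $ if nothing may follow the digits.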
I hope this helps.
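To put this into the If-Else construct the question asks for, the pattern can be anchored at both ends so the whole cell has to match. The following is a sketch under assumptions: the names CellMatchesPattern and CheckFirstCell are invented here, ActiveSheet stands in for the asker's Doc object, and the cell is assumed not to hold an error value (CStr would fail on one).

```vba
' Hypothetical helper: True when text matches regexPattern.
Function CellMatchesPattern(ByVal text As String, _
                            ByVal regexPattern As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = regexPattern
    CellMatchesPattern = re.Test(text)
End Function

Sub CheckFirstCell()
    ' ^ and $ force the whole value to be one letter plus optional digits.
    If CellMatchesPattern(CStr(ActiveSheet.Cells(1, 1).Value), "^[A-Za-z]\d*$") Then
        Debug.Print "match"
    Else
        Debug.Print "no match"
    End If
End Sub
```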

Related

Regular expression to match page number groups

I need a regular expression to match page numbers as found in common programs.
These usually take the form 1-5,3,5,1-9 for example.
I have a regular expression (\d+-\d+)?,(\d+-\d+?)* which I need help to refine.
As can be seen here regex101 I am matching commas and missing numbers entirely.
What I need is to match 1-5 as group 1, 3 as group 2, 5 as group 3 and 1-9 as group 4 without matching any commas.
Any help is appreciated. I will be using this in VBA.
This worked for me - am I missing something?
Sub Pages()
    Dim re As Object, allMatches, m, rv, sep, c As Range, i As Long
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "(\d+(-\d+)?)"
    re.ignorecase = True
    re.MultiLine = True
    re.Global = True
    For Each c In Range("B5:B20").Cells 'for example
        c.Offset(0, 1).Resize(1, 10).ClearContents 'clear output cells
        i = 0
        If re.test(c.Value) Then
            Set allMatches = re.Execute(c.Value)
            For Each m In allMatches
                i = i + 1
                c.Offset(0, i).Value = m
            Next m
        End If
    Next c
End Sub
If I recall correctly, capturing a dynamic number of groups will not work. You can pre-specify the format / number of groups to be matched, or you can catch the repeated groups as one and split them afterwards.
If you know the format, just do
(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)
which of course is not very neat.
If you want the flexible structure, match the first group and all the rest as a second and then split the latter by the delimiter ',' in whichever language.
(\d+(?:-\d+)?)((?:(?:,)(\d+(?:-\d+)?))*)
You need to make the -\d+ part optional, since you don't always have ranges. And the comma between each range should be part of the second group with the * quantifier, so you can match a single range with no comma after it.
\d+(-\d+)?(,\d+(-\d+)?)*
This will match the string that contains all the ranges. To get an array of individual ranges without the commas, do a second match in this string:
\d+(-\d+)?
Use the VBA function for getting an array of all matches of a regexp (sorry, I don't know VBA, so can't provide the specific syntax).
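In VBA, that second step can be done with the Execute method of the VBScript RegExp object when Global is True. Below is a minimal sketch; the function name SplitPageRanges is an invention for this example, and it uses a non-capturing group so each match is simply the whole range.

```vba
' Hypothetical helper: returns every page number or page range found
' in pageSpec as a zero-based array of strings (empty array if none).
Function SplitPageRanges(ByVal pageSpec As String) As Variant
    Dim re As Object, matches As Object
    Dim result() As String, i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                  ' find all matches, not just the first
    re.Pattern = "\d+(?:-\d+)?"       ' a number, optionally followed by -number

    Set matches = re.Execute(pageSpec)
    If matches.Count = 0 Then
        SplitPageRanges = Array()
        Exit Function
    End If

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches(i).Value
    Next i
    SplitPageRanges = result
End Function
```

For example, Join(SplitPageRanges("1-5,3,5,1-9"), "|") gives 1-5|3|5|1-9, with no commas in any element.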

Extract text using word VBA regex then save to a variable as string

I am trying to create code in Word VBA that will automatically save (as PDF) and name a document based on its content, which is in text and not fields. Luckily the formatting is standardized and I already know how to save it. I tested my regex elsewhere to make sure it pulls what I am looking for. The trouble is I need to extract the matched statement, convert it to a string, and save it to a variable (so I have something to pass on to the code where it names the document).
The part of the document I need to match is below, from the start of "Program" through the end of the line and looks like:
Program: Program Name (abr)
and the regex I worked out for this is "Program:[^\n]"
The code I have so far is below, but I don't know how to execute the regex in the active document, convert the output to a string and save to an object:
Sub RegExProgram()
Dim regEx
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = False
regEx.pattern = "Program\:[^\n]"
(missing code here)
End Sub
Any ideas are welcome, and I am sorry if this is simple and I am just overlooking something obvious. This is my first VBA project, and most of the resources I can find suggest replacing using regex, not saving extracted text as string. Thank you!
Try this:
You can find documentation for the RegExp class here.
Dim regEx as Object
Dim matchCollection As Object
Dim extractedString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.IgnoreCase = True
.Global = False ' Only look for 1 match; False is actually the default.
.Pattern = "Program: ([^\r]+)" ' Word separates lines with CR (\r)
End With
' Pass the text of your document as the text to search through to regEx.Execute().
' For a quick test of this statement, pass "Program: Program Name (abr)"
set matchCollection = regEx.Execute(ActiveDocument.Content.Text)
' Extract the first submatch's (capture group's) value -
' e.g., "Program Name (abr)" - and assign it to variable extractedString.
extractedString = matchCollection(0).SubMatches(0)
I've modified your regex based on the assumption that you want to capture everything after Program: through the end of the line; your original regex would only have captured Program:<space>.
Enclosing [^\r]+ (all chars. through the end of the line) in (...) defines a so-called subexpression (a.k.a. capture group), which allows selective extraction of only the substring of interest from what the overall pattern captures.
The .Execute() method, to which you pass the string to search in, always returns a collection of matches (Match objects).
Since the .Global property is set to False in your code, the output collection has (at most) 1 entry (at index 0) in this case.
If the regular expression has subexpressions (1 in our case), then each entry of the match collection has a nonempty .SubMatches collection, with one entry for each subexpression, but note that the .SubMatches entries are strings, not Match objects.
Match objects have properties .FirstIndex, .Length, and .Value (the captured string). Since .Value is the default property, it is sufficient to access the object itself: instead of the more verbose matchCollection(0).Value to access the full captured string, you can use the shortcut matchCollection(0). (Again, by contrast, .SubMatches entries are plain strings.)
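One thing the snippet above does not handle is a document without a "Program:" line: indexing matchCollection(0) on an empty collection raises a run-time error. A possible way to wrap the extraction, sketched with an invented function name (ExtractProgramName) and a slightly adjusted pattern that also stops at \n, is:

```vba
' Hypothetical wrapper: returns the text after "Program:" on the same
' line, or an empty string when the source text has no such line.
Function ExtractProgramName(ByVal sourceText As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Global = False
    ' Skip spaces/tabs after the colon, then capture up to the line end.
    re.Pattern = "Program:[ \t]*([^\r\n]+)"

    Set matches = re.Execute(sourceText)
    If matches.Count > 0 Then
        ExtractProgramName = Trim$(matches(0).SubMatches(0))
    End If
End Function
```

In Word it would be called as extractedString = ExtractProgramName(ActiveDocument.Content.Text).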
If you're just looking for a string that starts with "Program:" and want to go to the end of the line from there, you don't need a regular expression:
Public Sub ReadDocument()
    Dim aLine As Paragraph
    Dim aLineText As String
    Dim start As Long
    Dim my_str As String ' holds the text from "Program:" onward
    For Each aLine In ActiveDocument.Paragraphs
        aLineText = aLine.Range.Text
        start = InStr(aLineText, "Program:")
        If start > 0 Then
            my_str = Mid(aLineText, start)
        End If
    Next aLine
End Sub
This reads through the document line by line, and stores your match in the variable "my_str" when it encounters a line that has the match.
Lazier version:
a = Split(ActiveDocument.Range.Text, "Program:")
If UBound(a) > 0 Then
extractedString = Trim(Split(a(1), vbCr)(0))
End If
If I remember correctly, paragraphs in Word end with vbCr ( \r not \n )

Check if string is uppercase and numbers

I want to check if a string contains only uppercase letters and numbers. I have attempted to solve this using RegExp and what I have so far is:
Function CheckForInvalidChars()
dim b
Set re = New RegExp
re.Pattern = "[A-Z_0-9]"
b = re.Test("eS")
msgbox b
End Function
However, the variable "b" returns true, I guess because it finds a match in the "S". I want that particular string to return false, since not all letters are uppercase. How would I go about achieving this?
I have also tried to do this with functions such as IsNumeric, but I can't find an IsUpperCase.
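Generally speaking, if you want to match a whole string with a regex, you will usually end up using ^ and $ to anchor the start and end of the string.
Also, [A-Z_0-9] on its own matches just a single character, so the test succeeds as soon as any one character in the string qualifies.
Assuming you don't allow whitespace, ^[A-Z_0-9]*$ would be the regex you're looking for. (Use + instead of * if an empty string should be rejected.)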
If UCase(s) <> s then there is at least one lower case letter in the string s.
I'd recommend to just UCase the string if you want to enforce uppercase letters. Then you can simplify the check to this:
Function CheckForInvalidChars(s)
Set re = New RegExp
re.Pattern = "^\w+$"
CheckForInvalidChars = re.Test(s)
End Function
teststring = InputBox("Input something")
teststring = UCase(teststring)
WScript.Echo "" & CheckForInvalidChars(teststring)
The escape sequence \w matches word characters, i.e. uppercase letters, lowercase letters (ruled out due to the prior UCase), digits, and underscores. The + rules out empty strings by requiring at least one word character.
@Andris is right, correct the regular expression as follows:
Function CheckForInvalidChars()
dim b
Set re = New RegExp
re.Pattern = "^[A-Z_0-9]*$"
b = re.Test("eS")
msgbox b
End Function
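The pattern above keeps the underscore from the question's character class and accepts an empty string. If only uppercase letters and digits should pass, and at least one character is required, a stricter variant could look like this sketch (the function name IsUpperAlnum is made up here; late binding is used so no reference is needed in VBA):

```vba
' Hypothetical helper: True when s is non-empty and consists solely of
' the characters A-Z and 0-9.
Function IsUpperAlnum(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = False           ' the default, stated for clarity
        .Pattern = "^[A-Z0-9]+$"      ' anchored: every character must qualify
        IsUpperAlnum = .Test(s)
    End With
End Function
```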

How to test for specific characters with regex in VBA

I need to test for a string variable to ensure it matches a specific format:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
...where X can be any alphanumeric character (a-z, 0-9).
I've tried the following, but it doesn't seem to work (test values constantly fail)
If val Like "^([A-Za-z0-9_]{8})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{12})" Then
MsgBox "OK"
Else
MsgBox "FAIL"
End If
fnCheckSubscriptionID "fdda752d-32de-474e-959e-4b5bf7574436"
Any pointers? I don't mind if this can be achieved in vba or with a formula.
You are already using the ^ beginning-of-string anchor, which is terrific. You also need the $ end-of-string anchor, otherwise in the last group of digits, the regex engine is able to match the first 12 digits of a longer group of digits (e.g. 15 digits).
I rewrote your regex in a more compact way:
^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$
Note these few tweaks:
[-]{1} can just be expressed with -
I removed the underscores as you say you only want letters and digits. If you do want underscores, instead of [A-Z0-9]{8} (for instance), you can just write \w{8} as \w matches letters, digits and underscores.
Removed the lowercase letters. If you do want to allow lowercase letters, we'll turn on case-insensitive mode in the code (see line 3 of the sample code below).
No need for (capturing groups), so removed the parentheses
We have three groups of four letters and a dash, so wrote (?:[A-Z0-9]{4}-) with a {3}
Sample code
Dim myRegExp, FoundMatch
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
FoundMatch = myRegExp.Test(SubjectString)
You can do this either with a regular expression, or with just native VBA. I am assuming from your code that the underscore character is also valid in the string.
To do this with native VBA, you need to build up the Like pattern by repetition, since the Like operator has no quantifiers such as {8}. Also, using Option Compare Text makes the Like comparison case insensitive.
Option Explicit
Option Compare Text
Function TestFormat(S As String) As Boolean
'Sections
Dim S1 As String, S2_4 As String, S5 As String
Dim sLike As String
With WorksheetFunction
S1 = .Rept("[A-Z0-9_]", 8)
S2_4 = .Rept("[A-Z0-9_]", 4)
S5 = .Rept("[A-Z0-9_]", 12)
sLike = S1 & .Rept("-" & S2_4, 3) & "-" & S5
End With
TestFormat = S Like sLike
End Function
With regular expressions, the pattern is simpler to build, but the execution time may be longer, and that may make a difference if you are processing very large amounts of data.
Function TestFormatRegex(S As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = "^\w{8}(?:-\w{4}){3}-\w{12}$"
TestFormatRegex = .test(S)
End With
End Function
Sub Test()
MsgBox fnCheckSubscriptionID("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
End Sub
Function fnCheckSubscriptionID(strCont)
' Tools - References - add "Microsoft VBScript Regular Expressions 5.5"
With New RegExp
.Pattern = "^\w{8}-\w{4}-\w{4}-\w{4}-\w{12}$"
.Global = True
.MultiLine = True
fnCheckSubscriptionID = .Test(strCont)
End With
End Function
In case of any problems with early binding you can use late binding With CreateObject("VBScript.RegExp") instead of With New RegExp.
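As a sketch of that late-bound form, here combined with the letters-and-digits pattern from the first answer: the function name IsSubscriptionId is hypothetical, and MultiLine is deliberately left at its default of False so that ^ and $ anchor the whole string rather than a single line inside it.

```vba
' Hypothetical late-bound variant: no library reference required.
Function IsSubscriptionId(ByVal candidate As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True            ' accept a-z as well as A-Z
        ' 8 characters, three groups of 4 followed by a dash, then 12.
        .Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
        IsSubscriptionId = .Test(candidate)
    End With
End Function
```

With this, IsSubscriptionId("fdda752d-32de-474e-959e-4b5bf7574436") returns True, while a value with extra trailing characters returns False.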

Check for Numeric Value in Word Macro using Regular Expressions

Is it possible to directly use regular expressions in Word macros?
I want to check if a string contains only numbers and the following code does not work.
isNumber = RegEx.IsMatch("234", "[0-9]+$")
I simply need it to return true or false.
You can create an instance of the VBScript.Regexp object within your code, but an easier solution would be to use the existing VBA function IsNumeric. Both methods are included below:
Sub testNumeric()
'Use IsNumeric
MsgBox (IsNumeric("12345")) 'Returns true
MsgBox (IsNumeric("123a")) 'Returns false
'Alternatively, using Regexp Object
Dim regexp
Set regexp = CreateObject("VBScript.Regexp")
regexp.Pattern = "[0-9]+$"
MsgBox (regexp.test("12345")) 'Returns true
MsgBox (regexp.test("123a")) 'Returns false
End Sub
Note that the regex pattern [0-9]+$ does not match only pure numbers; it matches any string that ends with digits (e.g. testing "a123" returns true). To make the regex check strictly numeric, the pattern could be ^[0-9]+$. Thanks to @mike for pointing this out.
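It is also worth knowing that IsNumeric is broader than "digits only": it accepts anything VBA can evaluate as a number, for example "1e5". If the requirement really is a string made of digits and nothing else, the anchored pattern can be wrapped in a small function. This is a sketch with an invented name (IsDigitsOnly):

```vba
' Hypothetical helper: True when s is non-empty and every character
' is one of 0-9.
Function IsDigitsOnly(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = "^[0-9]+$"
        IsDigitsOnly = .Test(s)
    End With
End Function
```

Called as isNumber = IsDigitsOnly("234"), it returns True or False as the question asks.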