VBA to VB.NET - Regex - System.Text.RegularExpressions - with no global modifier

I am trying to migrate a library of regular-expression utilities from VBA to VB.NET because, in my general impression, VB.NET offers more support for writing clean, reusable code (including Regex support).
The library uses a factory pattern to reuse compiled regexes for performance (I am not sure to what extent RegexOptions.Compiled helps with that). It works together with a second library that holds records of patterns and returns an object which, besides the pattern, also exposes the modifiers as properties.
However, the Regex object in System.Text.RegularExpressions does not have a clean way to specify flags / modifiers...
' VBA
Dim oRegExp As New RegExp
With oRegExp
.Pattern = Pattern
.IgnoreCase = IgnoreCase
.Multiline = Multiline
.Global = MatchGlobal
End With
Versus
' VB.NET
Dim opts As RegexOptions = RegexOptions.None
If IgnoreCase Then opts = opts Or RegexOptions.IgnoreCase
If Multiline Then opts = opts Or RegexOptions.Multiline
Dim oRegExp As Regex
oRegExp = New Regex(Pattern, opts)
'Where can I specify MatchGlobal???
As I do not see this as an improvement to this part of the code, I will rely on inline modifiers instead (embedded directly in the pattern itself) and drop the pattern-library object that exposes the modifiers as properties (not included in the examples).
That way...
' This -> "\bpre([^\r\n]+)\b"
' in .NET, can be this -> "\bpre(?<word>\w*)\b"
' as .NET supports named groups
Dim Pattern As String = "(?i)\bpre(?<word>\w*)\b" ' case insensitive
The only problem is that, as shown in the VB.NET example above, the Regex object in System.Text.RegularExpressions does not seem to let you change the global match modifier (and inline modifiers, logically, do not include a global match flag).
Any idea how to deal with this?

There is no support for a global regex option as this behavior is implemented via two different methods.
To only get the first (one) match use Regex.Match:
Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.
To match all occurrences, use Regex.Matches:
Searches an input string for all occurrences of a regular expression and returns all the matches.
You need to implement the logic yourself: if all matches are expected, call Regex.Matches; if only one is expected, call Regex.Match.
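As a minimal sketch of that dispatch, the helper below takes a Boolean that plays the role of VBA's Global property and picks the method accordingly. The module name, the function name FindMatches, and the sample text are hypothetical and only serve the illustration; the pattern is the one from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GlobalFlagSketch
    ' Hypothetical helper: emulates VBA's RegExp.Global by choosing
    ' between Regex.Match (first hit only) and Regex.Matches (all hits).
    Function FindMatches(input As String, pattern As String,
                         opts As RegexOptions, matchGlobal As Boolean) As List(Of String)
        Dim re As New Regex(pattern, opts)
        Dim results As New List(Of String)

        If matchGlobal Then
            ' "Global" behaviour: collect every occurrence.
            For Each m As Match In re.Matches(input)
                results.Add(m.Value)
            Next
        Else
            ' Non-global behaviour: keep only the first occurrence, if any.
            Dim first As Match = re.Match(input)
            If first.Success Then results.Add(first.Value)
        End If

        Return results
    End Function

    Sub Main()
        ' Hypothetical sample input
        Dim sample As String = "Prefix, preview and PREPARE"
        Dim pattern As String = "\bpre(?<word>\w*)\b"

        Console.WriteLine(FindMatches(sample, pattern, RegexOptions.IgnoreCase, False).Count) ' first match only
        Console.WriteLine(FindMatches(sample, pattern, RegexOptions.IgnoreCase, True).Count)  ' all matches
    End Sub
End Module
```

With this shape, the pattern record can keep a MatchGlobal property next to the inline modifiers; the flag just never reaches the Regex constructor.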

Related

RegEx specific numeric pattern in Excel VBS

I do not have much RegEx experience and need advice to create a specific Pattern in Excel VBA.
The Pattern I want to match on to validate a Userform field is: nnnnnn.nnn.nn where n is a 0-9 digit.
My code looks like this, but RegEx.Test always returns False.
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "/d/d/d/d/d/d\./d/d/d\./d/d"
End With
If RegEx.Test(txtProjectNumber.Value) = False Then
txtProjectNumber.SetFocus
bolAllDataOK = False
End If
Try this. You need to match the whole contents of the textbox (I assume) so use anchors (^ and $).
Your slashes were the wrong way round. Also you can use quantifiers to simplify the pattern.
Private Sub CommandButton1_Click()
Dim RegEx As Object, bolAllDataOK As Boolean
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "^\d{6}\.\d{3}\.\d{2}$"
End With
If Not RegEx.Test(txtProjectNumber.Value) Then
txtProjectNumber.SetFocus
bolAllDataOK = False
End If
End Sub
VBA has its own built-in alternative, the Like operator. So besides the fact that you used forward slashes instead of backslashes (as @SJR rightly mentioned), you should have a look at something like:
If txtProjectNumber.Value Like "######.###.##" Then
Where # stands for any single digit (0–9). Though not as versatile as using regular expressions, it seems to do the trick for you. That way you won't have to use any external reference nor extra object.

How to write a regular expressions to populate a list with given file types & exclude certain folders

I've done a lot of searching on this but still can't quite put it all together.
I'm trying to create an Excel VBA program that populates a spreadsheet based on regular expressions entered by the user, so that I can process the files with other VBA programs.
So for example, if I want to populate a folder with all Autodesk Inventor file types, I would use:
.*\.(iam|ipt|ipn|idw)
and from what I have read, if I want a regex to skip a file in a folder OR containing a string, I would use something like:
(?iOldVersions)
but like I mentioned, I am having trouble putting this together so that it is a single regex call, and also handling multiple strings that I want it to skip (i.e., the folders OldVersions and Legacy).
I think I would like to keep it as a regex, although I'm guessing I could also use WScript.Shell (or whatever that object is); it would be nice to just get familiar with regular expressions for now.
The code I am using is the same from this post, but instead I added a parameter to pass the pattern to the top level code by pulling it from a cell in excel.
List files of certain pattern using Excel VBA
Again, any help would be greatly appreciated!
Thanks again, all!
Edit: Latest attempt....
Private Sub FindPatternMatchedFiles()
objFile = "C:\OldVersions\TestFile.iam"
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
'objRegExp.Pattern = "(.*\.(iam|ipt|ipn|idw))(?!(\z))."
objRegExp.Pattern = "(^((?!OldVersions).)*$)(.*\.(iam|ipt|ipn|idw))"
objRegExp.IgnoreCase = True
res = objRegExp.test(objFile)
MsgBox (res)
'Garbage Collection
Set objRegExp = Nothing
End Sub
To exclude matching strings having \OldVersions\ or \Legacy\, just add anchors and a negative lookahead at the start:
^(?!.*\\(?:OldVersions|Legacy)\\).*\.(?:iam|ipt|ipn|idw)$
See the regex demo
Details:
^ - start of string
(?!.*\\(?:OldVersions|Legacy)\\) - a negative lookahead failing the match if there is \ + either OldVersions or Legacy + \ after 0+ chars other than \r and \n (.*).
.* - 0+ chars other than \r and \n, as many as possible, up to the last...
\. - literal .
(?:iam|ipt|ipn|idw) - one of the alternatives in the non-capturing group
$ - end of string.
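To see the pattern behave on a few inputs, here is a minimal sketch in VB.NET rather than the VBScript.RegExp object used in the question; the pattern only uses a negative lookahead and non-capturing groups, which both engines support. The module name, the function name IsWantedFile, and the sample paths are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PathFilterSketch
    ' The pattern from the answer above, unchanged.
    ' VB string literals do not treat the backslash as an escape character.
    Const FilePattern As String = "^(?!.*\\(?:OldVersions|Legacy)\\).*\.(?:iam|ipt|ipn|idw)$"

    ' Hypothetical helper: True for wanted file types outside excluded folders.
    Function IsWantedFile(path As String) As Boolean
        Return Regex.IsMatch(path, FilePattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Hypothetical sample paths
        Dim samples As String() = {
            "C:\Projects\TestFile.iam",
            "C:\OldVersions\TestFile.iam",
            "C:\Projects\Legacy\Part.ipt",
            "C:\Projects\Notes.txt"
        }
        For Each p As String In samples
            Console.WriteLine(p & " -> " & IsWantedFile(p).ToString())
        Next
    End Sub
End Module
```

Only the first sample passes: the second and third sit in an excluded folder, and the fourth has the wrong extension.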

Regex match a specific string

I am trying to extract the string <Num> from within Barcode(_<Num>_).PDF using Regex. I am looking at Regular Expression Language - Quick Reference but it is not easy. Thanks for any help.
Dim pattern As String = "^Barcode(_+_)\.pdf"
If Regex.IsMatch("Barcode(_abc123_).pdf", pattern) Then
Debug.Print("match")
End If
If you are trying not only to match but also to read the value of <Num> into a variable, then you will need to call the Regex.Match method instead of the Boolean IsMatch method. The Match method returns a Match object that lets you get to the groups and captures from your pattern.
Your pattern would need to be something like "Barcode\(_(.*)_\)\.pdf". Note the inner parentheses, which create a capture group that holds the string between the underscores. See the MSDN docs for examples of almost exactly what you are doing.
I don't know the regex flavor in VB, but I can offer a website for checking your regex: Regex Tester. In this case, if <Num> is numeric, you can use "Barcode\(_(\d+)_\)\.pdf" (with the literal parentheses and the dot escaped).
Just for the record, this is what I ended up using:
'set up regex
'I'm using + instead of * in the pattern to ensure that if no value is
'present the match will fail
Dim pattern As String = "Barcode\(_(.+)_\)\.pdf"
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
'get match
Dim mat As Match
mat = r.Match("Barcode(_abc123_).pdf")
'output the matched string
If mat.Success Then
Dim g As Group = mat.Groups(1)
Dim cc As CaptureCollection = g.Captures
Dim c As Capture = cc(0)
Debug.Print(c.ToString)
End If
.NET Framework Regular Expressions

How to get the content of parentheses but not the parentheses themselves

I have this kind of text: 1323-DI-004 (2013-07-16).pdf, and I want to extract the date placed in parentheses. I tried the regex (\(.*)\), which gives (2013-07-16). I want the same result but without the parentheses.
This is for a VBA code.
Is it possible and how to do it?
Edit: you're using VBA, so
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Set myRegExp = New RegExp
myRegExp.Pattern = "\((.*)\)"
Set myMatches = myRegExp.Execute(subjectString)
'The first match is at index 0; SubMatches(0) is the first capture group, i.e. the text inside the parentheses
If myMatches.Count > 0 Then MsgBox myMatches(0).SubMatches(0)
Assuming this is a fully PCRE-compliant matching platform (PHP, Perl, etc., not JavaScript), use lookarounds to achieve this, matching the () on either side without including them in the match:
(?<=\()(.*)(?=\))
See it in action: http://regex101.com/r/oI3gD6
If you're using JavaScript, this won't work; however, you can use \((.*)\) and retrieve the first capture group, which will be what's inside the ().
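Both approaches can be compared side by side in a minimal VB.NET sketch (the .NET engine supports lookbehind). VBScript.RegExp, which the VBA edit above uses, has no lookbehind, so only the capture-group form carries over to VBA. The module and function names are hypothetical; the sample string is the one from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ParenContentSketch
    ' Capture-group form: match the parentheses, return only group 1.
    Function InsideParens(input As String) As String
        Dim m As Match = Regex.Match(input, "\((.*)\)")
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    ' Lookaround form: the parentheses are asserted but never consumed,
    ' so the whole match is already the inner text.
    Function InsideParensLookaround(input As String) As String
        Dim m As Match = Regex.Match(input, "(?<=\().*(?=\))")
        Return If(m.Success, m.Value, Nothing)
    End Function

    Sub Main()
        Dim sample As String = "1323-DI-004 (2013-07-16).pdf"
        Console.WriteLine(InsideParens(sample))
        Console.WriteLine(InsideParensLookaround(sample))
    End Sub
End Module
```

Both functions return Nothing when the input contains no parenthesized text.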

Search the VBA source code with RegEx

I need to find all the occurrences of a particular RegEx in my source code (e.g. col*r). I realized you can programmatically search through your code for patterns (RegEx) with the VBComponents.CodeModule.Find() method, as explained here and here. But that does not meet my needs, as it only tells you whether such an expression is found or not. I also need the actual text found in the module (e.g. colour and color).
Is there any way to achieve this programmatically within VBA?
Dim re, match
Set re = CreateObject("vbscript.regexp")
re.Pattern = "your regex"
re.Global = True
For Each match In re.Execute("your input")
MsgBox match.Value
Next
For more information, check this link: http://msdn.microsoft.com/en-us/library/ms974570.aspx