Search the VBA source code with RegEx

I need to find all occurrences of a particular RegEx (e.g. col*r) in my source code. I know you can search your code programmatically for a pattern with the VBComponents.CodeModule.Find() method, as explained here and here. But that does not meet my needs, because it only tells you whether the expression was found. I also need the actual text that matched in the module (e.g. colour and color).
Is there any way to achieve this programmatically within VBA?

Dim re, match
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "your regex"
re.Global = True   ' return every match, not just the first
For Each match In re.Execute("your input")
    MsgBox match.Value   ' the text that actually matched
Next
For more information, see http://msdn.microsoft.com/en-us/library/ms974570.aspx
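To make the idea concrete outside the VBA editor, here is a minimal sketch in Python (not VBA) of the same "report every matched string" loop, applied line by line to a block of source text. The function name `find_matches`, the sample source and the pattern `colou?r` are my own assumptions for illustration; in VBA the text would come from the code module rather than from a string literal.

```python
import re
from typing import List, Tuple


def find_matches(source: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (line_number, matched_text) for every match of pattern in source."""
    regex = re.compile(pattern)
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # finditer yields every match on the line, like Global = True
        for m in regex.finditer(line):
            hits.append((line_no, m.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical two-line "module" used only for this example
    sample = "Dim colour As Long\nConst color = 3\n"
    for line_no, text in find_matches(sample, r"colou?r"):
        print(line_no, text)
```

Note that `colou?r` is used here instead of `col*r`, because as a regular expression `col*r` means "co, any number of l, then r" and would not match either spelling.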

Related

RegEx specific numeric pattern in Excel VBS

I do not have much RegEx experience and need advice to create a specific Pattern in Excel VBA.
The Pattern I want to match on to validate a Userform field is: nnnnnn.nnn.nn where n is a 0-9 digit.
My code looks like this, but RegEx.Test always returns False.
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "/d/d/d/d/d/d\./d/d/d\./d/d"
End With
If RegEx.Test(txtProjectNumber.Value) = False Then
txtProjectNumber.SetFocus
bolAllDataOK = False
End If
Try this. You need to match the whole contents of the textbox (I assume) so use anchors (^ and $).
Your slashes were the wrong way round. Also you can use quantifiers to simplify the pattern.
Private Sub CommandButton1_Click()
Dim RegEx As Object, bolAllDataOK As Boolean
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "^\d{6}\.\d{3}\.\d{2}$"
End With
If Not RegEx.Test(txtProjectNumber.Value) Then
txtProjectNumber.SetFocus
bolAllDataOK = False
End If
End Sub
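For comparison, a small sketch of the same whole-string validation in Python (not VBA). The function name `is_valid_project_number` is hypothetical. `fullmatch` plays the role of the `^` and `$` anchors, and `[0-9]` is used instead of `\d` because Python's `\d` also accepts non-ASCII digits.

```python
import re

# Six digits, a dot, three digits, a dot, two digits
PROJECT_NUMBER = re.compile(r"[0-9]{6}\.[0-9]{3}\.[0-9]{2}")


def is_valid_project_number(text: str) -> bool:
    """True only if the whole string has the form nnnnnn.nnn.nn."""
    return PROJECT_NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_valid_project_number("123456.789.01"))
    print(is_valid_project_number("123456/789/01"))
```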
VBA has its own built-in alternative, the Like operator. So besides the fact that you used forward slashes instead of backslashes (as @SJR rightly mentioned), you should have a look at something like:
If txtProjectNumber.Value Like "######.###.##" Then
Here # stands for any single digit (0–9). Though not as versatile as regular expressions, it seems to do the trick for you. That way you won't need any external reference or extra object.
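To show what the `#` placeholder amounts to, here is a rough Python sketch (not VBA) that turns such a mask into an equivalent regular expression. It is deliberately limited: it handles only `#` and literal characters, not the other Like wildcards, and the function name `vba_like_digits` is made up for this example.

```python
import re


def vba_like_digits(text: str, mask: str) -> bool:
    """Emulate a Like mask made only of '#' (one digit) and literal characters."""
    parts = ["[0-9]" if ch == "#" else re.escape(ch) for ch in mask]
    # Like compares the whole string, so fullmatch is the closest equivalent
    return re.fullmatch("".join(parts), text) is not None


if __name__ == "__main__":
    print(vba_like_digits("123456.789.01", "######.###.##"))
```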

VBA to VB.NET - Regex - System.Text.RegularExpressions - with no global modifier

I am trying to migrate a library of regular-expression utilities from VBA to VB.NET, as my general impression is that VB.NET offers more support for clean, reusable code (including Regex support).
The library uses a factory pattern to reuse compiled regexes for performance (I am not sure to what extent the RegexOptions.Compiled option helps). It is used together with a library that holds records of patterns and returns an object which, besides the pattern, also includes the modifiers as properties.
However, the Regex object in System.Text.RegularExpressions does not have a clean way to specify flags / modifiers...
' VBA
Dim oRegExp As New RegExp
With oRegExp
.Pattern = Pattern
.IgnoreCase = IgnoreCase
.Multiline = Multiline
.Global = MatchGlobal
End With
Versus
' VB.NET
Dim opts As RegexOptions = New RegexOptions
If IgnoreCase Then opts = opts Or RegexOptions.IgnoreCase
If Multiline Then opts = opts Or RegexOptions.Multiline
Dim oRegExp As RegEx
oRegExp = New RegEx(Pattern, opts)
'Where can I specify MatchGlobal???
As I do not see this as an improvement to this part of the code, I will rely on inline modifiers instead (embedded directly in the pattern itself), and get rid of the pattern-library object that includes the modifiers as properties (not included in the examples).
That way...
' This -> "\bpre([^\r\n]+)\b"
' in .NET, can be this -> "\bpre(?<word>\w*)\b"
' as .NET supports named groups
Dim Pattern as String = "(?i)\bpre(?<word>\w*)\b" ' case insensitive
The only problem is that, as shown in the VB.NET example above, the Regex object in the System.Text.RegularExpressions namespace does not seem to let you change the global match modifier (and inline modifiers, logically, do not include a global match flag).
Any idea on how to deal with it?
There is no support for a global regex option as this behavior is implemented via two different methods.
To only get the first (one) match use Regex.Match:
Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.
To match all occurrences, use Regex.Matches:
Searches an input string for all occurrences of a regular expression and returns all the matches.
You need to implement the logic yourself: if all matches are expected, call Regex.Matches; if only one is expected, call Regex.Match.
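The dispatch described above can be sketched as a small wrapper. This is an illustration in Python, not VB.NET: `re.search` stands in for Regex.Match and `re.finditer` for Regex.Matches, and the function name `run_regex` and its parameters are hypothetical.

```python
import re
from typing import List


def run_regex(pattern: str, text: str, match_global: bool = False,
              ignore_case: bool = False, multiline: bool = False) -> List[str]:
    """Return the first match or all matches, depending on match_global."""
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE
    regex = re.compile(pattern, flags)
    if match_global:
        # "Global" behaviour: collect every match
        return [m.group(0) for m in regex.finditer(text)]
    # Non-global behaviour: stop after the first match
    first = regex.search(text)
    return [first.group(0)] if first else []


if __name__ == "__main__":
    print(run_regex(r"\bpre\w*", "prefix Preview prepare", match_global=True, ignore_case=True))
```

With this shape, the record in the pattern library can keep its MatchGlobal property; it simply selects the method instead of setting an option on the regex object.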

Excluding Portion Of RegEx From Results

I have a very large text file that has multiple instances of "CLM*[NUMBER I WANT]*". I have been able to use regex to mostly obtain this thanks to another user on this site, but the results I'm getting are displaying the CLM* portion, when I really just want the number. You can see the relevant code below.
Dim strClaimData As String = ""
Dim strClaimNumber As String = ClaimLoadedGetCLM(strClaimData)
Public Function ClaimLoadedGetCLM(ByVal ediString As String) As String
Dim regex As New Regex("CLM\*(\d*?\*??\d*)")
Dim ClaimMatches As MatchCollection = regex.Matches(strClaimData)
For Each strClaimData As Match In ClaimMatches
lstClaimLoaded837Data.Items.Add(strClaimData.Value)
Next
End Function
I've tried a few things I've found online, such as appending a \K or \2, but I just get compile errors if I do that.
https://regex101.com/r/jH9eJ7/1
That shows what I want as "Match 1, Group 1", but I can't figure out how to get to it. I thought appending /1 would work, but that only returned CLM* with no number.
Any help would be greatly appreciated.
What you want to do is wrap the CLM\* part in a positive lookbehind assertion:
(?<=CLM\*)
This asserts that (\d*\.?\d*) is preceded by CLM*, but does not include CLM* in the match.
https://regex101.com/r/jH9eJ7/3
You can tell it to use the captured group.
Replace:
strClaimData.Value
with:
strClaimData.Groups(1).Value
in your for loop.
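Both suggestions, the lookbehind and the captured group, give the same result. Here is a short Python sketch (not VB.NET) that puts them side by side. The sample EDI string is invented for the example, and the number part is simplified to `\d+` as an assumption about the data.

```python
import re
from typing import List


def claims_via_lookbehind(edi: str) -> List[str]:
    """The lookbehind keeps 'CLM*' out of the match itself."""
    return re.findall(r"(?<=CLM\*)\d+", edi)


def claims_via_group(edi: str) -> List[str]:
    """Match 'CLM*' as well, but read only capture group 1."""
    return [m.group(1) for m in re.finditer(r"CLM\*(\d+)", edi)]


if __name__ == "__main__":
    sample = "ISA*00~CLM*12345*100~CLM*67890*250~"  # made-up data
    print(claims_via_lookbehind(sample))
    print(claims_via_group(sample))
```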

How to get the content of parentheses but not the parentheses themselves

I have text like 1323-DI-004 (2013-07-16).pdf and I want to extract the date inside the parentheses. I tried the regex (\(.*)\), which gives (2013-07-16). I want the same result but without the parentheses.
This is for a VBA code.
Is it possible and how to do it?
Edit: you're using VBA, so
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Set myRegExp = New RegExp
myRegExp.Pattern = "\((.*)\)"
Set myMatches = myRegExp.Execute(subjectString)
' Matches are zero-based; SubMatches(0) is the first capture group, i.e. the text inside the parentheses
If myMatches.Count > 0 Then MsgBox myMatches(0).SubMatches(0)
Assuming a fully PCRE-compliant engine (PHP, Perl, etc., not JavaScript), you can use lookarounds to match the ( and ) on either side without including them in the match:
(?<=\()(.*)(?=\))
See it in action: http://regex101.com/r/oI3gD6
If you're using JavaScript, this won't work; however, you can use \((.*)\) and retrieve the first capture group, which will be what's inside the ().
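The capture-group route works in practically every engine. Here is a minimal Python sketch (not VBA) of it; the function name `text_in_parentheses` is hypothetical, and `[^)]*` is used instead of `.*` so the match stops at the first closing parenthesis.

```python
import re
from typing import Optional


def text_in_parentheses(name: str) -> Optional[str]:
    """Return the text inside the first (...) pair, without the parentheses."""
    m = re.search(r"\(([^)]*)\)", name)
    # Group 0 would include the parentheses; group 1 is only the content
    return m.group(1) if m else None


if __name__ == "__main__":
    print(text_in_parentheses("1323-DI-004 (2013-07-16).pdf"))
```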

RegEx : replace all Url-s that are not anchored

I'm trying to replace URLs contained in an HTML code block that users post into an old web app with proper anchors (<A>) for those URLs.
The problem is that URLs can already be 'anchored', that is, contained in <A> elements. Those URLs should not be replaced.
Example:
http://noreplace.com <- do not replace
<u>http://noreplace.com</u> <- do not replace
...http://replace.com <- replace
What would the regex to match only 'not anchored Urls' look like?
I use the following function to replace with RegEx:
Function ReplaceRegExp(strString, strPattern, strReplace)
Dim RE: Set RE = New RegExp
With RE
.Pattern = strPattern
.IgnoreCase = True
.Global = True
ReplaceRegExp = .Replace(strString, strReplace)
End With
End Function
The following non greedy regex is used to format UBB URLs. Can this regex be adapted to match only the ones I need?
' the double doublequote in the brackets is because
' double doublequoting is ASP escaping for doublequotes
strString = ReplaceRegExp(strString, "\[URL=[""]?(http|ftp|https)(:\/\/[\w\-_]+)((\.[\w\-_]+)+)([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?[""]?\](.*?)\[/URL\]", "$6")
If this really cannot be done with RegEx, what would be the solution in ASP Classic, with some code or pseudocode please? However, I would much rather keep the code simple with one additional regex line than add more functions to this old code.
Thanks for your effort!
Seems like regular expressions are too complex to use for this kind of job so I went to my rusty VBScript skills and coded a function that first removes anchors and then replaces the URLs.
Here it is if somebody may need it:
Function Linkify(Text)
Dim regEx, Match, Matches, patternURLs, patternAnchors, lCount, anchorCount, replacements, key
patternURLs = "((http|ftp|https)(:\/\/[\w\-_]+)((\.[\w\-_]+)+)([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)"
patternAnchors = "<a[^>]*?>.*?</a>"
Set replacements=Server.CreateObject("Scripting.Dictionary")
' Create the regular expression.
Set regEx = New RegExp
regEx.Pattern = patternAnchors
regEx.IgnoreCase = True
regEx.Global = True
' Do the search for anchors.
Set Matches = regEx.Execute(Text)
lCount = 0
' Iterate through the existing anchors and replace with a placeholder
For Each Match in Matches
key = "<#" & lCount & "#>"
replacements.Add key, Match.Value
Text = Replace(Text,Cstr(Match.Value),key)
lCount = lCount+1
Next
anchorCount = lCount
' we now search for URls
regEx.Pattern = patternURLs
' create anchors from URLs
Text = regEx.Replace(Text, "<a href=""$1"">$1</a>")
' put back the originally existing anchors
For lCount = 0 To anchorCount-1
key = "<#" & lCount & "#>"
Text = Replace(Text,key, replacements.Item(key))
Next
Linkify = Text
End Function
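For readers who want to try the same three steps (hide existing anchors behind placeholders, link the remaining URLs, restore the anchors) outside ASP, here is a rough sketch in Python, not VBScript. The name `linkify`, the NUL-delimited placeholder format and the simplified URL pattern are my own assumptions and not the patterns from the function above.

```python
import re

ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Simplified URL pattern: scheme, then anything up to whitespace, <, >, " or NUL
URL = re.compile(r'(?:https?|ftp)://[^\s<>"\x00]+', re.IGNORECASE)
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def linkify(html: str) -> str:
    """Wrap bare URLs in <a> tags while leaving existing anchors untouched."""
    saved = []

    def stash(m):
        # Step 1: swap each existing anchor for a numbered placeholder
        saved.append(m.group(0))
        return "\x00{}\x00".format(len(saved) - 1)

    protected = ANCHOR.sub(stash, html)
    # Step 2: every URL still visible is not inside an anchor, so link it
    linked = URL.sub(lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), protected)
    # Step 3: put the original anchors back
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], linked)


if __name__ == "__main__":
    print(linkify('<a href="http://a.com">http://a.com</a> see http://b.com now'))
```

The sketch does not trim trailing punctuation, so a URL followed directly by a full stop would take the full stop with it.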
The answer you're looking for lies in negative and positive lookaheads and lookbehinds.
This article gives a pretty good overview: http://www.regular-expressions.info/lookaround.html
Here's the Regular Expression I've formulated for your case:
(?<!"|>)(ht|f)tps?://.*?(?=\s|$)
Here's some sample data I matched against:
#Matches
http://www.website.com
https://www.website.com
This is a link http://www.website.com that is not linked
This is a long link http://www.website.com/index.htm?foo=bar
ftp://www.website.com
#No Matches
<u>http://www.website.com</u>
http://website.com
http://website.com
<u>http://www.website.com</u>
ftp://www.website.com
Here's a breakdown of what the regular expression is doing:
(?<!"|>)
A negative look behind, making sure what matches next isn't preceded by a " or >
(ht|f)tps?://.*?
This looks for http, https, or ftp and anything following it. It'll also match ftps! If you want to avoid this, you could use (https?|ftp)://.*? instead
(?=\s|$)
This is a positive lookahead, which requires that a whitespace character or the end of the line follows, without including it in the match.
EXTRA CREDIT
(ht)?(?(1)tps?|ftp)://
This will match http, https and ftp, but not ftps. It may be a bit overkill when you can use (https?|ftp):// but it's an awesome example of if/else in regex.
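The pattern can be tried in any engine that supports lookbehind. Here is a minimal check in Python (not VBScript), using the `(https?|ftp)` variant so that ftps is excluded. The function name `find_unanchored_urls` is hypothetical, the character class `[">]` is equivalent to the `"|>` alternation, and the anchored sample string is my own.

```python
import re
from typing import List

# Not preceded by " or >, then the scheme, then lazily up to whitespace or end
UNANCHORED_URL = re.compile(r'(?<![">])(?:https?|ftp)://.*?(?=\s|$)')


def find_unanchored_urls(text: str) -> List[str]:
    """Return URLs that are neither an attribute value nor the text of a tag."""
    return UNANCHORED_URL.findall(text)


if __name__ == "__main__":
    print(find_unanchored_urls("This is a link http://www.website.com that is not linked"))
    print(find_unanchored_urls('<a href="http://www.website.com">http://www.website.com</a>'))
```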
Some design issues you're going to have to work around:
1. Embedded URLs could be absolute or relative and may not include the protocol.
2. Your HTML may not have quotes around attribute values.
3. The character right after a URL may also be a valid URL character.
4. There are lots of valid URL characters these days.
If you can assume (1) absolute URLs with protocols and (2) quoted HTML attributes and (3) people will have whitespace after a URL and (4) you're sticking with supporting only basic URL characters, you can just look for URLs not preceded by a double-quote.
Here's an overly-simple example to start with (untested):
(?<!")((http|https|ftp)://[^\s<>]+)(?=\s|$) replaced with $1
The [^\s<>] part above is ridiculously greedy, so all of the fun will be in tweaking that to build a character set that fits the URLs your users are typing in. Your example shows a much more involved character class with \w plus a hodge-podge of other allowed characters, so you could start there if you want.