RegEx specific numeric pattern in Excel VBA

I do not have much RegEx experience and need advice to create a specific Pattern in Excel VBA.
The Pattern I want to match on to validate a Userform field is: nnnnnn.nnn.nn where n is a 0-9 digit.
My code looks like this, but RegEx.Test always returns False.
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "/d/d/d/d/d/d\./d/d/d\./d/d"
End With
If RegEx.Test(txtProjectNumber.Value) = False Then
txtProjectNumber.SetFocus
bolAllDataOK = False
End If

Try this. You need to match the whole contents of the textbox (I assume) so use anchors (^ and $).
Your slashes were the wrong way round. Also you can use quantifiers to simplify the pattern.
Private Sub CommandButton1_Click()
    Dim RegEx As Object, bolAllDataOK As Boolean
    bolAllDataOK = True ' assume the data is valid until a check fails
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        ' ^ and $ anchor the match to the whole string; {n} repeats \d n times
        .Pattern = "^\d{6}\.\d{3}\.\d{2}$"
    End With
    If Not RegEx.Test(txtProjectNumber.Value) Then
        txtProjectNumber.SetFocus
        bolAllDataOK = False
    End If
End Sub
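To see what the anchors buy you, here is a small sketch that exercises the same pattern. It uses Python's re module instead of VBScript.RegExp, purely so the snippet runs on its own; the \d, {n}, ^ and $ syntax used here is the same in both engines. The helper name is_project_number is made up for illustration.

```python
import re

# Same pattern as in the answer above; re.ASCII restricts \d to 0-9.
ANCHORED = re.compile(r"^\d{6}\.\d{3}\.\d{2}$", re.ASCII)
# Without ^ and $ the pattern may match anywhere inside the input.
UNANCHORED = re.compile(r"\d{6}\.\d{3}\.\d{2}", re.ASCII)


def is_project_number(value):
    """Return True when the whole value has the form nnnnnn.nnn.nn."""
    return ANCHORED.search(value) is not None


for sample in ["123456.789.01", "1234567.789.012", "123456/789/01"]:
    print(sample, is_project_number(sample), UNANCHORED.search(sample) is not None)
```

The second sample has too many digits at both ends, yet the unanchored pattern still finds a valid-looking substring inside it, which is why a validation check wants the anchors.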

VBA has its own built-in alternative, the Like operator. So besides the fact that you used forward slashes instead of backslashes (as @SJR rightly mentioned), you could have a look at something like:
If txtProjectNumber.Value Like "######.###.##" Then
Here # stands for any single digit (0–9). Though not as versatile as regular expressions, it seems to do the trick for you. That way you need neither an external reference nor an extra object.

Related

How to write a regular expressions to populate a list with given file types & exclude certain folders

I've done a lot of searching on this but still can't quite put it all together.
I'm trying to create an Excel VBA program that populates a spreadsheet based on regular expressions the user inputs, so that I can process the files with other VBA programs.
So for example, if I want to populate the list with all Autodesk Inventor file types, I would use:
.*\.(iam|ipt|ipn|idw)
and from what I have read, if I want a regex to skip a file in a folder OR containing a string, I would use something like:
(?iOldVersions)
but like I mentioned, I am having trouble putting this together so that it is a single regex call, and also handling multiple strings that I want it to skip (i.e., the folders OldVersions and Legacy).
I think I would like to keep it as regex, although I'm guessing I could also use WScript.Shell (or whatever that object is), but it would be nice to just get familiar with regular expressions for now.
The code I am using is the same from this post, but instead I added a parameter to pass the pattern to the top level code by pulling it from a cell in excel.
List files of certain pattern using Excel VBA
Again, any help would be greatly appreciated!
Thanks again, all!
Edit: Latest attempt....
Private Sub FindPatternMatchedFiles()
objFile = "C:\OldVersions\TestFile.iam"
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
'objRegExp.Pattern = "(.*\.(iam|ipt|ipn|idw))(?!(\z))."
objRegExp.Pattern = "(^((?!OldVersions).)*$)(.*\.(iam|ipt|ipn|idw))"
objRegExp.IgnoreCase = True
res = objRegExp.test(objFile)
MsgBox (res)
'Garbage Collection
Set objRegExp = Nothing
End Sub
To exclude matching strings having \OldVersions\ or \Legacy\, just add anchors and a negative lookahead at the start:
^(?!.*\\(?:OldVersions|Legacy)\\).*\.(?:iam|ipt|ipn|idw)$
Details:
^ - start of string
(?!.*\\(?:OldVersions|Legacy)\\) - a negative lookahead failing the match if there is \ + either OldVersions or Legacy + \ after 0+ chars other than \r and \n (.*).
.* - 0+ chars other than \r and \n, as many as possible, up to the last...
\. - literal .
(?:iam|ipt|ipn|idw) - one of the alternatives in the non-capturing group
$ - end of string.
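A quick way to convince yourself the lookahead does what the breakdown says is to run the pattern over a few paths. The sketch below uses Python's re module rather than VBScript.RegExp (lookaheads and non-capturing groups are written the same way in both). The sample paths and the helper name is_wanted are invented for the example.

```python
import re

# Pattern from the answer above; IGNORECASE mirrors objRegExp.IgnoreCase = True.
FILE_RE = re.compile(
    r"^(?!.*\\(?:OldVersions|Legacy)\\).*\.(?:iam|ipt|ipn|idw)$",
    re.IGNORECASE,
)


def is_wanted(path):
    """True for Inventor files that are not inside an excluded folder."""
    return FILE_RE.search(path) is not None


paths = [
    r"C:\OldVersions\TestFile.iam",    # inside an excluded folder
    r"C:\Projects\Legacy\Part.ipt",    # inside an excluded folder
    r"C:\Projects\Current\Part.IPT",   # kept, extension matched case-insensitively
    r"C:\Projects\Current\notes.txt",  # wrong extension
]
for p in paths:
    print(p, is_wanted(p))
```

Because the lookahead requires a backslash on both sides of the folder name, a folder such as OldVersionsBackup is not excluded; only exact folder names are.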

How to get the content of parentheses but not the parentheses themselves

I have this kind of text: 1323-DI-004 (2013-07-16).pdf, and I want to extract the date inside the parentheses. I tried the regex (\(.*)\). It gives (2013-07-16). I want the same result but without the parentheses.
This is for a VBA code.
Is it possible and how to do it?
Edit: you're using VBA, so
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Set myRegExp = New RegExp
myRegExp.Pattern = "\((.*)\)"
Set myMatches = myRegExp.Execute(subjectString)
' Matches are 0-based; SubMatches(0) is the first capture group, i.e. the text inside the parentheses
If myMatches.Count > 0 Then MsgBox myMatches(0).SubMatches(0)
Assuming this is a fully PCRE-compliant matching platform (PHP, Perl, etc.; not JavaScript), use lookarounds to achieve this, matching the ( and ) on either side without including them in the match:
(?<=\()(.*)(?=\))
See it in action: http://regex101.com/r/oI3gD6
If you're using JavaScript, this won't work; however, you can use \((.*)\) and retrieve the first capture group, which will be what's inside the ().
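For comparison, here are both approaches side by side on the file name from the question. This is a sketch in Python's re module, which happens to support both capture groups and lookbehind. As far as I know the VBScript RegExp object has no lookbehind, so the capture-group variant is the one that carries over to VBA.

```python
import re

text = "1323-DI-004 (2013-07-16).pdf"

# Option 1: match the parentheses, but read only capture group 1.
by_group = re.search(r"\((.*)\)", text).group(1)

# Option 2: lookarounds assert the parentheses without consuming them,
# so the whole match (group 0) is already the inner text.
by_lookaround = re.search(r"(?<=\()(.*)(?=\))", text).group(0)

print(by_group, by_lookaround)
```

Both variables end up holding 2013-07-16; the difference is only whether the parentheses are part of the overall match.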

RegEx pattern to extract URLs

I have to extract everything between these characters:
<a href="/url?q=(text to extract whatever it is)&amp
I tried this pattern, but it's not working for me:
/(?<=url\?q=).*?(?=&amp)/
I'm programming in VB.NET. This is the code, but I think the problem is that the pattern is wrong:
Dim matches As MatchCollection
matches = regex.Matches(TextBox1.Text)
For Each Match As Match In matches
listbox1.items.add(Match.Value)
Next
Could you help me please?
Your regex seems to be correct except for the slashes (/) at the beginning and end of the expression. Remove them:
Dim regex = New Regex("(?<=url\?q=).*?(?=&amp)")
and it should work.
Some utilities and languages use / (forward slash) to delimit the search expression; others may use single quotes. With System.Text.RegularExpressions.Regex you don't need delimiters.
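The undelimited pattern can be checked outside VB.NET as well. Below is a minimal sketch in Python's re module, which also supports the fixed-width lookbehind used here. The sample HTML is invented for the example; only the pattern comes from the question.

```python
import re

# The lookbehind starts the match right after "url?q=";
# the lazy .*? stops at the first "&amp" that follows.
QUERY_RE = re.compile(r"(?<=url\?q=).*?(?=&amp)")

html = (
    '<a href="/url?q=http://example.com/page&amp;sa=U">one</a> '
    '<a href="/url?q=https://example.org/&amp;sa=U">two</a>'
)
targets = QUERY_RE.findall(html)
print(targets)
```

Since neither lookaround consumes text, each match is exactly the part between url?q= and &amp, with no trimming needed afterwards.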
This regex code below will extract all urls from your text (or any other):
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
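If you want to try that URL pattern before wiring it into your program, here is a small sketch using Python's re module (the pattern uses no engine-specific syntax, so it should behave the same way in .NET). The helper name find_urls and the sample sentence are made up.

```python
import re

# The URL pattern from the answer above, split over two lines for readability.
URL_RE = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?"
)


def find_urls(text):
    """Return every full URL match (group 0) found in text."""
    return [m.group(0) for m in URL_RE.finditer(text)]


sample = "See http://example.com/a?b=1 and ftp://files.example.org."
print(find_urls(sample))
```

Note that the last character class deliberately leaves out the dot and the comma, so punctuation that ends a sentence is not swallowed into the URL.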

Need to extract text from within first curly brackets

I have strings that look like this
{/CSDC} CHOC SHELL DIP COLOR {17}
I need to extract the value in the first curly brackets. In the above example it would be
/CSDC
So far I have this code, which is not working
Dim matchCode = Regex.Matches(txtItems.Text, "/\{(.+?)\}/")
Dim itemCode As String
If matchCode.Count > 0 Then
itemCode = matchCode(0).Value
End If
I think the main issue here is that you are confusing your regular expression syntax between different languages.
In languages like JavaScript, Perl, Ruby and others, you create a regular expression object by using the /regex/ notation.
In .NET, when you instantiate a Regex object, you pass it a string of the regular expression, which is delimited by quotes, not slashes. So it is of the form "regex".
So try removing the leading and trailing / from your string and see how you go.
This may not be the whole problem, but it is at least part of it.
Are you getting the whole string instead of just the first value? Quantifiers are greedy by default, so .NET tries to grab the longest matching string.
Try this:
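' The capture group ([^}]*) holds the text between the first pair of braces
Dim matchCode = Regex.Matches(txtItems.Text, "\{([^}]*)\}")
Dim itemCode As String = Nothing
If matchCode.Count > 0 Then
    ' Groups(0) is the whole match, braces included; Groups(1) is the captured text
    itemCode = matchCode(0).Groups(1).Value
End If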
Edited: I've tried this in Linqpad and it worked.
It appears you are using a capture group, so try matchCode(0).Groups(1).Value (Groups(0) is the entire match, braces included).
Also, remove the leading / and the trailing / from the pattern.
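The difference between the whole match and the capture group is easy to see in a tiny experiment. The sketch below is in Python's re module, where group(0) and group(1) play the same roles as Groups(0) and Groups(1) in .NET. The variable names are just for illustration.

```python
import re

line = "{/CSDC} CHOC SHELL DIP COLOR {17}"

# [^}]* cannot run past the first closing brace, so the first match
# is always the first pair of braces, greedy or not.
m = re.search(r"\{([^}]*)\}", line)

whole_match = m.group(0)  # includes the braces
item_code = m.group(1)    # only the text between them

print(whole_match, item_code)
```

So if the braces show up in your result, you are most likely reading group 0 rather than group 1.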

RegEx : replace all Url-s that are not anchored

I'm trying to replace URLs contained in an HTML code block that users post into an old web app with proper anchors (<A>) for those URLs.
The problem is that URLs can already be 'anchored', that is, contained in <A> elements. Those URLs should not be replaced.
Example:
<a href="http://noreplace.com">http://noreplace.com</a> <- do not replace
<a href="http://noreplace.com"><u>http://noreplace.com</u></a> <- do not replace
...http://replace.com <- replace
What would the regex to match only 'not anchored Urls' look like?
I use the following function to replace with RegEx:
Function ReplaceRegExp(strString, strPattern, strReplace)
Dim RE: Set RE = New RegExp
With RE
.Pattern = strPattern
.IgnoreCase = True
.Global = True
ReplaceRegExp = .Replace(strString, strReplace)
End With
End Function
The following non greedy regex is used to format UBB URLs. Can this regex be adapted to match only the ones I need?
' the double doublequote in the brackets is because
' double doublequoting is ASP escaping for doublequotes
strString = ReplaceRegExp(strString, "\[URL=[""]?(http|ftp|https)(:\/\/[\w\-_]+)((\.[\w\-_]+)+)([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?[""]?\](.*?)\[/URL\]", "$6")
If this really cannot be done with RegEx, what would be the solution in ASP Classic, with some code or pseudocode please? However, I would much rather keep the code simple with one additional regex line than add more functions to this old code.
Thanks for your effort!
Seems like regular expressions are too complex to use for this kind of job so I went to my rusty VBScript skills and coded a function that first removes anchors and then replaces the URLs.
Here it is if somebody may need it:
Function Linkify(Text)
Dim regEx, Match, Matches, patternURLs, patternAnchors, lCount, anchorCount, replacements, key
patternURLs = "((http|ftp|https)(:\/\/[\w\-_]+)((\.[\w\-_]+)+)([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)"
patternAnchors = "<a[^>]*?>.*?</a>"
Set replacements=Server.CreateObject("Scripting.Dictionary")
' Create the regular expression.
Set regEx = New RegExp
regEx.Pattern = patternAnchors
regEx.IgnoreCase = True
regEx.Global = True
' Do the search for anchors.
Set Matches = regEx.Execute(Text)
lCount = 0
' Iterate through the existing anchors and replace with a placeholder
For Each Match in Matches
key = "<#" & lCount & "#>"
replacements.Add key, Match.Value
Text = Replace(Text,Cstr(Match.Value),key)
lCount = lCount+1
Next
anchorCount = lCount
' we now search for URLs
regEx.Pattern = patternURLs
' create anchors from URLs
Text = regEx.Replace(Text, "<a href=""$1"">$1</a>")
' put back the originally existing anchors
For lCount = 0 To anchorCount-1
key = "<#" & lCount & "#>"
Text = Replace(Text,key, replacements.Item(key))
Next
Linkify = Text
End Function
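For anyone who wants to play with the placeholder idea outside ASP, here is a rough sketch of the same three steps (stash existing anchors, linkify what is left, restore the anchors) in Python. It is not a port of the function above: the URL pattern is deliberately simplified, and the names linkify, stash, ANCHOR_RE and URL_RE are my own.

```python
import re

# Existing anchors; DOTALL lets one anchor span several lines.
ANCHOR_RE = re.compile(r"<a[^>]*?>.*?</a>", re.IGNORECASE | re.DOTALL)
# Deliberately simplified URL pattern (an assumption, not the pattern used above).
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"]+", re.IGNORECASE)


def linkify(text):
    saved = []

    def stash(match):
        # Swap each existing anchor for a numbered placeholder.
        saved.append(match.group(0))
        return "<#%d#>" % (len(saved) - 1)

    text = ANCHOR_RE.sub(stash, text)
    # Only unanchored URLs are left, so wrap each one in an <a> element.
    text = URL_RE.sub(
        lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), text
    )
    # Put the original anchors back in place of their placeholders.
    for index, anchor in enumerate(saved):
        text = text.replace("<#%d#>" % index, anchor)
    return text


print(linkify('<a href="http://noreplace.com">old</a> ...http://replace.com'))
```

The point of the detour through placeholders is that the URL pattern never sees the href of an existing anchor, so it needs no lookbehind at all.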
The answer you're looking for lies in negative and positive lookaheads and lookbehinds.
This article gives a pretty good overview: http://www.regular-expressions.info/lookaround.html
Here's the Regular Expression I've formulated for your case:
(?<!"|>)(ht|f)tps?://.*?(?=\s|$)
Here's some sample data I matched against:
#Matches
http://www.website.com
https://www.website.com
This is a link http://www.website.com that is not linked
This is a long link http://www.website.com/index.htm?foo=bar
ftp://www.website.com
#No Matches
<u>http://www.website.com</u>
http://website.com
http://website.com
<u>http://www.website.com</u>
ftp://www.website.com
Here's a breakdown of what the regular expression is doing:
(?<!"|>)
A negative look behind, making sure what matches next isn't preceded by a " or >
(ht|f)tps?://.*?
This looks for http, https, or ftp and anything following it. It'll also match ftps! If you want to avoid this, you could use (https?|ftp)://.*? instead
(?=\s|$)
A positive lookahead, which requires the match to be followed by whitespace or the end of the line.
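Here is a quick sketch that runs the pattern against a few samples. It uses Python's re module, which accepts this lookbehind because both alternatives are one character wide. One caveat: as far as I know, the VBScript RegExp object used in classic ASP has no lookbehind support, so this pattern needs an engine such as .NET, PCRE or Python. The helper name bare_urls and the samples are invented for the example.

```python
import re

# Pattern from the answer above: skip URLs preceded by a double quote or ">".
BARE_URL_RE = re.compile(r'(?<!"|>)(ht|f)tps?://.*?(?=\s|$)')


def bare_urls(text):
    """Return the full text of every URL that is not preceded by a quote or >."""
    return [m.group(0) for m in BARE_URL_RE.finditer(text)]


samples = [
    "This is a link http://www.website.com that is not linked",
    '<a href="http://website.com">http://website.com</a>',
    "<u>http://www.website.com</u>",
    "ftp://www.website.com",
]
for s in samples:
    print(bare_urls(s))
```

The anchored sample yields nothing because its first URL follows a double quote and its second follows a >, which is exactly what the lookbehind rules out.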
EXTRA CREDIT
(ht)?(?(1)tps?|ftp)://
This will match http/https/ftp but not ftps. It may be a bit overkill when you can use (https?|ftp)://, but it's an awesome example of if/else in regex.
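To check the if/else behaviour, here is a small sketch comparing the two scheme patterns. It uses Python's re module, which supports the (?(1)yes|no) conditional; not every engine does, so treat this as a demonstration of the construct rather than something guaranteed to work in your environment.

```python
import re

# Simple version: also accepts "ftps://".
SIMPLE = re.compile(r"(ht|f)tps?://")
# Conditional version: if group 1 ("ht") matched, expect "tp" or "tps",
# otherwise expect exactly "ftp".
CONDITIONAL = re.compile(r"(ht)?(?(1)tps?|ftp)://")

for scheme in ["http://", "https://", "ftp://", "ftps://"]:
    print(scheme, bool(SIMPLE.match(scheme)), bool(CONDITIONAL.match(scheme)))
```

Only the ftps:// row differs between the two patterns: the simple one accepts it and the conditional one rejects it.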
Some design issues you're going to have to work around:
Embedded URLs could be absolute or relative and may not include the protocol.
Your HTML may not have quotes around attribute values.
The character right after a URL may also be a valid URL character.
There are lots of valid URL characters these days.
If you can assume (1) absolute URLs with protocols and (2) quoted HTML attributes and (3) people will have whitespace after a URL and (4) you're sticking with supporting only basic URL characters, you can just look for URLs not preceded by a double-quote.
Here's an overly-simple example to start with (untested):
(?<!")((http|https|ftp)://[^\s<>]+)(?=\s|$) replaced with <a href="$1">$1</a>
The [^\s<>] part above is ridiculously greedy, so all of the fun will be in tweaking that to build a character set that fits the URLs your users are typing in. Your example shows a much more involved character class with \w plus a hodge-podge of other allowed characters, so you could start there if you want.