VBScript regular expression: replace text between two strings

I have this XML:
<doc>
<ContactPrimaryEmail></ContactPrimaryEmail>
<ContactAlternateEmail></ContactAlternateEmail>
<ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile>
<ContactAlternateMobile></ContactAlternateMobile>
</doc>
I want to apply a regular expression in VBScript to replace the content "+00xxxxxx" of the element ContactPrimaryMobile, i.e. simply change the number:
<ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile>
I am new to VBScript and my skills in creating the objects and applying the pattern are limited, so could you please help me convert this regex for use in VBScript:
(?<=\<ContactPrimaryMobile\>)(.*)(?=\<\/ContactPrimaryMobile)
UPDATE
I get this:
Object doesn't support this property or method: 'Submatches'
when executing:
Dim oRE, oMatches
Set oRE = New RegExp
oRE.Pattern = "<ContactPrimaryMobile>(.*?)</ContactPrimaryMobile>"
oRE.Global = True
Set oMatches = oRE.Execute("<doc><ContactPrimaryEmail></ContactPrimaryEmail><ContactAlternateEmail></ContactAlternateEmail><ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile><ContactAlternateMobile></ContactAlternateMobile></doc>")
Wscript.Echo oMatches.Submatches(0)

First of all, VBScript regex does not support lookbehinds, so you need to capture the part between the two strings.
Next, .Execute returns a MatchCollection, which has no .SubMatches property. You need to access each Match object in that collection and read its .SubMatches(0):
Dim oRE, oMatches, objMatch
Set oRE = New RegExp
oRE.Pattern = "<ContactPrimaryMobile>(.*?)</ContactPrimaryMobile>"
and then
Set oMatches = oRE.Execute(s)
For Each objMatch In oMatches
Wscript.Echo objMatch.Submatches(0)
Next
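Putting those pieces together, here is a minimal sketch of a reusable function, assuming only the first ContactPrimaryMobile element matters. GetPrimaryMobile is a hypothetical name. It indexes the MatchCollection directly instead of looping, and returns an empty string when the element is missing, so it never calls .SubMatches on the collection itself or asks for an item that does not exist.

```vbscript
' GetPrimaryMobile is a hypothetical helper name used for illustration.
Function GetPrimaryMobile(xmlText)
    Dim re, matches
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "<ContactPrimaryMobile>(.*?)</ContactPrimaryMobile>"
    ' Execute always returns a MatchCollection (possibly empty), so Set is required.
    Set matches = re.Execute(xmlText)
    If matches.Count > 0 Then
        ' Item(0) is the first Match; SubMatches(0) is its first capture group.
        GetPrimaryMobile = matches.Item(0).SubMatches(0)
    Else
        GetPrimaryMobile = ""   ' element not found
    End If
End Function
```

For the question's string, WScript.Echo GetPrimaryMobile(s) should print +00xxxxxx.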
To replace, capture the opening and closing tags in groups and put them back with $1 and $2 in the .Replace method:
oRE.Pattern = "(<ContactPrimaryMobile>).*?(</ContactPrimaryMobile>)"
' and then
s = oRE.Replace(s,"$1SOME_NEW_VALUE$2")
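The replace step can also be wrapped in a self-contained function. This is a sketch: ReplacePrimaryMobile is a hypothetical name, and it assumes the new value contains no $ characters, because $ has a special meaning in the replacement string passed to .Replace. Global is left False, so only the first ContactPrimaryMobile element is changed.

```vbscript
' ReplacePrimaryMobile is a hypothetical helper name used for illustration.
' Assumes newNumber contains no "$" (it would be read as a group reference).
Function ReplacePrimaryMobile(xmlText, newNumber)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    ' Group 1 = opening tag, group 2 = closing tag; the old value in between is dropped.
    re.Pattern = "(<ContactPrimaryMobile>).*?(</ContactPrimaryMobile>)"
    re.Global = False   ' replace the first occurrence only
    ReplacePrimaryMobile = re.Replace(xmlText, "$1" & newNumber & "$2")
End Function
```

Usage: s = ReplacePrimaryMobile(s, "+441234567"), where the number is just an example value.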

I know you explicitly asked for a regex, and you have your answer, but an alternative way to reach the same goal is to use an XML parser instead.
Option Explicit
Dim xmldoc
Set xmldoc = CreateObject("MSXML2.DOMDocument")
xmldoc.async = False 'load synchronously so the document is ready on the next line
xmldoc.load "doc.xml"
Dim primaryMobileNode
Set primaryMobileNode = xmldoc.selectSingleNode("/doc/ContactPrimaryMobile")
If Not primaryMobileNode Is Nothing Then primaryMobileNode.text = "new value"
xmldoc.save "changed-doc.xml"
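If the XML is already in a string rather than a file, the same DOM approach should work with loadXML, and the .xml property returns the serialized result. A minimal sketch, where SetPrimaryMobileDom is a hypothetical name; one benefit over the regex is that the parser escapes characters such as & in the new value for you.

```vbscript
' SetPrimaryMobileDom is a hypothetical helper name used for illustration.
Function SetPrimaryMobileDom(xmlText, newNumber)
    Dim xmlDoc, node
    Set xmlDoc = CreateObject("MSXML2.DOMDocument")
    xmlDoc.async = False
    If Not xmlDoc.loadXML(xmlText) Then
        SetPrimaryMobileDom = xmlText   ' not well-formed XML: return input unchanged
        Exit Function
    End If
    Set node = xmlDoc.selectSingleNode("/doc/ContactPrimaryMobile")
    If Not node Is Nothing Then node.text = newNumber
    SetPrimaryMobileDom = xmlDoc.xml     ' serialized (modified) document
End Function
```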

Related

Extract ONLY THE FIRST match from Word doc using RegEx launched from Excel VBA

There is a document like this one. I process 20 documents like this every day, and they all look the same (I mean the structure is very consistent).
The goal of this macro is to extract ONLY THE FIRST match of the RegEx pattern from .ActiveDocument.Content. There are many more matches in the whole document, but I need only the first one. The document being processed will be opened manually before the macro runs.
I'm just a VBA beginner, so if it is possible to write this without arrays, collections or dictionaries, I'd much appreciate it. There is just one item to extract, so it's best to load it into the repNmbr string variable and from there just do ws.Range("G30").Value = repNmbr. The simpler the better.
I used this resource, Excel Regex Tutorial (Regular Expressions), which is very helpful, but I still don't know how to load the FIRST MATCH alone into my repNmbr string variable. I'd like to do this without any loop, because I just want to load a single string into this repNmbr variable.
Currently I have code like this:
Sub ExtractRepertor03()
'Application.ScreenUpdating = False
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String
'Assigning object variables
Set WordApp = GetObject(, "Word.Application") '"ActiveX can't create object" error occurs when
Set ExcelApp = GetObject(, "Excel.Application") 'there is no Word document open;
Set regEx = CreateObject("VBScript.RegExp")
Set WordDoc = WordApp.ActiveDocument
Set rng = WordApp.ActiveDocument.Content
'Create the regular expression object
regEx.Global = False 'because I need only the first match instead of all occurences;
regEx.IgnoreCase = True
regEx.Pattern = "([0-9]{1,5})([ ]{0,4})([/])([0-9]{4})"
'regEx.Pattern = "([0-9]{1,5})([\s]{0,4})(/[0-9]{4})"
repNmbr = regEx.Execute(rng.text) 'here is something wrong but I don't know what;
'I'm trying to assign the first RegEx match to repNmbr variable;
Debug.Print repNmbr
repNmbr = Replace(repNmbr, " ", "")
' Set matches = regEx.Execute(rng.text)
' Debug.Print regEx.Test(rng)
' 'Debug.Print regEx.Value
' For Each match In matches 'I just want this macro run without the loop
' Debug.Print match.Value 'Result: 9042 /2019
' repNmbr = match.Value
' Next match
ExcelApp.Application.Visible = True
ws.Range("G30").Value = repNmbr
End Sub
And I get an error.
Can someone explain to me why Set matches = regEx.Execute(rng.text) works fine but
repNmbr = regEx.Execute(rng.text) returns the error "Wrong number of arguments or invalid property assignment"?
After regEx.Global = False is set, the RegEx finds only a single value, so why does VBA refuse to assign this string to the repNmbr string variable?
As I said in your other question, you don't need the RegEx library for this. Stick to Word's wildcards! Try:
Sub Demo()
Application.ScreenUpdating = False
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
With WordApp.ActiveDocument.Range
With .Find
.Text = "<[0-9 ]{1,7}/[0-9]{4}>"
.MatchWildcards = True
.Wrap = wdFindStop
.Forward = True
.Execute
End With
If .Find.Found = True Then ActiveSheet.Range("G30").Value = Replace(.Text, " ", "")
End With
Application.ScreenUpdating = True
End Sub
Note: I haven't bothered with any of:
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String
as it's all superfluous - even your own code never assigns anything to ws.
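As for why the assignment fails: .Execute always returns a MatchCollection object, even with Global = False (the collection then holds at most one Match). Assigning it to a String without Set most likely makes VBA fall back to the collection's default member, Item, which needs an index argument, hence "Wrong number of arguments or invalid property assignment". If you do want to stay with RegExp, here is a minimal sketch that gets the first match into a string without a loop. FirstMatch is a hypothetical helper name; it uses late binding so it runs as VBScript or in VBA without a library reference.

```vbscript
' FirstMatch is a hypothetical helper; returns "" when nothing matches.
Function FirstMatch(text, pattern)
    Dim re, matches
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False           ' the collection will hold at most one Match
    re.IgnoreCase = True
    re.Pattern = pattern
    Set matches = re.Execute(text)   ' always a collection object, so Set is required
    If matches.Count > 0 Then
        FirstMatch = matches.Item(0).Value   ' text of the first Match, no loop needed
    Else
        FirstMatch = ""
    End If
End Function
```

In the macro this would become repNmbr = Replace(FirstMatch(rng.Text, "[0-9]{1,5} {0,4}/[0-9]{4}"), " ", ""), with the capturing groups dropped because only the overall match is used.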

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way to do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure of the name of the one I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using, as it made it easier to get the individual values returned. I set a reference to Microsoft VBScript Regular Expressions 5.5 so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
If colMatches.Count < 2 Then Exit Sub 'both bracketed sections are needed below
With colMatches
strProcedure = .Item(0).SubMatches.Item(0)
strModule = .Item(1).SubMatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

Cannot find why my VBA simple regex command isn't working

When using this line in Excel VBA:
Cells(a, 20).Value = regexProjet.Execute(Cells(a, 1).Value)(0)
I get a warning saying that either the parameter or the command is invalid.
I have used this line in many places in my code and it worked fine; the cells all use the standard format (they're strings...).
Could anyone give me some hints on what to look for?
FYI this is how I declared the regex:
Dim regexProjet As Object
Set regexProjet = CreateObject("VBScript.RegExp")
regexProjet.IgnoreCase = True
regexProjet.Pattern = "^ ([a-z]+)(-)([0-9]+)" 'keep only the project key
You will get that error if your regex does not match your data: .Execute then returns an empty MatchCollection, so asking for item (0) fails. To avoid it with the technique you are using, first test whether your regex matches your string.
For example:
If regexProjet.Test(Cells(a, 1).Value) Then
Cells(a, 20).Value = regexProjet.Execute(Cells(a, 1).Value)(0)
Else
' ... your error routine
End If
Also, note that your pattern starts with ^ followed by a space, so it only matches values that begin with a space. And if you are just trying to match the overall pattern, there is no need for the capturing groups (they add execution time, making the regex less efficient).
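The test-first approach can be packaged as a small function, sketched here with the capturing groups removed. ProjectKey is a hypothetical name; the pattern is otherwise the question's, including the leading ^ and space, and the function returns an empty string instead of raising an error when there is no match.

```vbscript
' ProjectKey is a hypothetical helper name used for illustration.
Function ProjectKey(cellText)
    Dim re, matches
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "^ [a-z]+-[0-9]+"   ' question's pattern without capture groups
    If re.Test(cellText) Then
        Set matches = re.Execute(cellText)
        ProjectKey = matches.Item(0).Value   ' safe: Test guaranteed a match
    Else
        ProjectKey = ""                       ' no match: no error raised
    End If
End Function
```

With this pattern, ProjectKey(" ABC-123 x") returns " ABC-123", while ProjectKey("ABC-123") returns an empty string. In the sheet loop it would be used as Cells(a, 20).Value = ProjectKey(Cells(a, 1).Value).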
Here is sample code I got to run in Excel 2013; it's slightly different from what you have.
I got the code from this Microsoft support article: http://support.microsoft.com/kb/818802
Sub testreg()
' Requires a reference to Microsoft VBScript Regular Expressions 5.5 (Tools > References)
Dim regexProjet As New RegExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim RetStr As String
RetStr = ""
regexProjet.IgnoreCase = True
regexProjet.Pattern = "^ ([a-z]+)(-)([0-9]+)"
Set colMatches = regexProjet.Execute(Cells(1, 1).Value)
For Each objMatch In colMatches ' Iterate Matches collection.
RetStr = RetStr & " " & objMatch.Value
Next
Cells(1, 20).Value = RetStr
End Sub

Regular Expression with a NOT match

I am trying to create a regular expression to validate file names for further processing.
I want to match a file with extension .txt, .csv, or .log
but I want to exclude a file if its name ENDS with _out.csv or mylog.log.
Here is my start:
Function CanProcessFile(strFileNameIn, strLogName, strOutName)
Dim objRegEx, colMatches, strPattern
Set objRegEx = CreateObject("VBScript.RegExp")
strPattern = "((.txt$)|(.csv$)|(.Log$))"
strPattern = "(?!(" & strLogName & "$))" & strPattern
'strPattern = "(?!(" & strOutName & "$))" & strPattern
objRegEx.Pattern = strPattern
objRegEx.IgnoreCase = True
Set colMatches = objRegEx.Execute(strFileNameIn)
If colMatches.Count > 0 Then
CanProcessFile = True
Else
CanProcessFile = False
End If
End Function
but every time I try to add a ^ or ?! with (_out.csv$) it fails.
I think that creating two filters (as you suggest in a comment above) may well be the best way to go, but if you prefer to do it in a single regex, then you should be able to write:
objRegEx.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
which ensures that the file name, read from its start, doesn't match .*(_out\.csv|mylog\.log)$ (i.e., that it doesn't end with _out.csv or mylog.log).
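Wrapped into a function shaped like the question's CanProcessFile, the single-regex version could look like this sketch. IsProcessableFile is a hypothetical name, and the excluded suffixes are hard-coded rather than built from strLogName and strOutName; if they come from variables, any dots in them would need escaping as \. first. VBScript's RegExp supports lookahead such as (?!...), which is what the pattern relies on.

```vbscript
' IsProcessableFile is a hypothetical helper name used for illustration.
Function IsProcessableFile(fileName)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True   ' accept .TXT, .Csv, .LOG, ...
    ' The negative lookahead at ^ rejects names ending in _out.csv or mylog.log;
    ' the rest requires one of the allowed extensions at the very end.
    re.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
    IsProcessableFile = re.Test(fileName)
End Function
```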
VBScript's RegExp does not support lookbehind, but in a regex engine that does, you could try:
(?<!_out|mylog)\.(?:txt|csv|log)$
If the engine accepts negative lookbehind but not variable-length lookbehind, use:
(?<!_out)(?<!mylog)\.(?:txt|csv|log)$

Using Classic ASP for regular expressions

We have some Classic ASP sites that I'm working on a little, and I was wondering how I can write a regular expression check and extract the matched expression.
The expression I have is in the script's name,
so let's say this:
Response.Write Request.ServerVariables("SCRIPT_NAME")
Prints out:
review_blabla.asp
review_foo.asp
review_bar.asp
How can I get the blabla, foo and bar from there?
Thanks.
Whilst Yots' answer is almost certainly correct, you can achieve the result you are looking for with a lot less code and somewhat more clearly:
'A handy function I keep lying around for RegEx matches
Function RegExResults(strTarget, strPattern)
Dim regEx
Set regEx = New RegExp
regEx.Pattern = strPattern
regEx.Global = True
Set RegExResults = regEx.Execute(strTarget)
Set regEx = Nothing
End Function
'Pass the original string and pattern into the function and get a collection object back
Set arrResults = RegExResults(Request.ServerVariables("SCRIPT_NAME"), "review_(.*?)\.asp")
'In your pattern the answer is the first group, so all you need is
For Each result In arrResults
Response.Write(result.SubMatches(0))
Next
Set arrResults = Nothing
Additionally, I have yet to find a better RegEx playground than RegExr; it's brilliant for trying out your regex patterns before diving into code.
You have to use the SubMatches collection of the Match object to get your data out of the review_(.*?)\.asp pattern:
Function getScriptNamePart(scriptname)
dim RegEx : Set RegEx = New RegExp
dim result : result = ""
With RegEx
.Pattern = "review_(.*?)\.asp"
.IgnoreCase = True
.Global = True
End With
Dim Match, Submatch
dim Matches : Set Matches = RegEx.Execute(scriptname)
dim SubMatches
For Each Match in Matches
For Each Submatch in Match.SubMatches
result = Submatch
Exit For
Next
Exit For
Next
Set Matches = Nothing
Set SubMatches = Nothing
Set Match = Nothing
Set RegEx = Nothing
getScriptNamePart = result
End Function
You can use
review_(.*?)\.asp
You can try it out on RegExr.
You will then find your result in capture group 1.
You can use the RegExp object to do so.
Your code would look like this:
Set RegularExpressionObject = New RegExp
RegularExpressionObject.Pattern = "review_(.*)\.asp"
Set matches = RegularExpressionObject.Execute("review_blabla.asp")
Sorry, I can't test the code above right now.
Check out the usage on MSDN: http://msdn.microsoft.com/en-us/library/ms974570.aspx