I am fiddling with regular expressions to shorten a string splitting routine I have been using.
I have a string for my cart that is submitted to an asp script as follows:
addnothing|-1, addRST115400112*2xl|0, addnothing|-1, addnothing|-1, addRST115400115*xs|0, addnothing|-1
I want to be able to extract the two entries that represent two stock items:
addRST115400112*2xl|0
addRST115400115*xs|0
I have managed to get this bit of code to work but I am unsure about the pattern I am using:
add[^n](.*)\*(.*)\|[0-9],
This returns this:
addRST115400112*2xl|0, addnothing|-1, addnothing|-1, addRST115400115*xs|0,
but I only want it to return:
addRST115400112*2xl|0
addRST115400115*xs|0
Can anybody point me in the right direction please?
You were matching greedily: .* consumes as much as it can, so in your case it runs on to the last \|[0-9], in the string, i.e. the final |0,
You should match lazily by using .*? instead of .*
So your regex should be
add(?!nothing)(.*?)\*(.*?)\|\d
\d is equivalent to [0-9]
(?!nothing) is a negative lookahead: it only checks that "nothing" does not follow, without matching or consuming anything. It's better than [^n] because it is more reliable and expressive, whereas [^n] consumes one character and would also reject any stock code that happens to start with n.
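To see the lazy quantifiers and the lookahead working together, here is a minimal sketch that runs the pattern against the string from the question. It assumes Windows Script Host (cscript); in an ASP page you would use Response.Write instead of WScript.Echo. The "item" and "size" labels are only my guess at what the two captured parts mean.

```vbscript
Option Explicit

' Sample input copied from the question.
Dim sInp : sInp = "addnothing|-1, addRST115400112*2xl|0, addnothing|-1, addnothing|-1, addRST115400115*xs|0, addnothing|-1"

Dim re : Set re = New RegExp
re.Global = True   ' return every match, not just the first one
re.Pattern = "add(?!nothing)(.*?)\*(.*?)\|\d"

Dim m
For Each m In re.Execute(sInp)
    ' SubMatches(0) is the text before the *, SubMatches(1) the text after it.
    ' "item" and "size" are illustrative labels, not names from the question.
    WScript.Echo m.Value & "  (item=" & m.SubMatches(0) & ", size=" & m.SubMatches(1) & ")"
Next
```

With the sample string this should list only the two addRST... entries. Note that the pattern no longer ends in a comma, so the last entry in a list would also match if it were a stock item.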
Trying to keep the .Pattern simple (this is VBScript!) and to make tinkering with it easier (what really singles out stock items is by no means clear):
Dim sInp : sInp = "addnothing|-1, addRST115400112*2xl|0, addnothing|-1, addnothing|-1, addRST115400115*xs|0, addnothing|-1"
Dim reCut : Set reCut = New RegExp
reCut.Global = True
reCut.Pattern = "addR[^|]+\|\d"
Dim oMTS : Set oMTS = reCut.Execute(sInp)
If 2 = oMTS.Count Then
WScript.Echo "Success:", Join(Array(oMTS(0).Value, oMTS(1).Value))
Else
WScript.Echo "Bingo:", reCut.Pattern
End If
output:
Success: addRST115400112*2xl|0 addRST115400115*xs|0
I'm trying with no luck to extract a recurring word inside a string using RegEx in Excel VBA.
Following an example:
Sub RegExTest()
Dim re As Object
Dim el As Object
Const strText As String = "Fld,Fld,Fld,Fld,Fld,aFld1,bFld,cFld,Fld"
Debug.Print strText
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = False
.IgnoreCase = False
.pattern = "(^Fld\,|\,Fld\,|\,Fld$)"
If .Test(strText) Then
Set re = .Execute(strText)
End If
End With
For Each el In re
Debug.Print el
Next
End Sub
Result:
Fld,Fld,Fld,Fld,Fld,aFld1,bFld,cFld,Fld
Fld,
,Fld,
,Fld,
,Fld
The result that I get (4 elements) is not what I expect (6 elements).
I'm sure it is about a wrong pattern definition.
Can someone please help with the correct pattern?
Thanks in advance
The problem here is that your matches are overlapping. A regex match consumes the characters it matches, so once the first alternative has matched Fld\, its trailing comma is gone, and the next Fld has no leading comma left to satisfy \,Fld\,
If you double up the commas in your input string, you can see that you get the expected number of matches.
The solution here is to use a lookahead for the trailing delimiter, so that it is checked but not consumed. If you absolutely need the trailing commas for some reason, just append them to the relevant matches.
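The exact pattern was not given, so the following is only a sketch of one way to apply the lookahead idea. The pattern (^|,)Fld(?=,|$) is my own assumption: it consumes the leading delimiter but only peeks at the trailing one, so the same comma can serve as the end of one match and the start of the next. It is written as plain VBScript; in Excel VBA, replace WScript.Echo with Debug.Print.

```vbscript
Option Explicit

' Sample input copied from the question.
Const strText = "Fld,Fld,Fld,Fld,Fld,aFld1,bFld,cFld,Fld"

Dim re : Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = False
re.IgnoreCase = False
' (^|,)   : start of the string or a leading comma (consumed)
' Fld     : the literal word
' (?=,|$) : must be followed by a comma or the end of the string (not consumed)
re.Pattern = "(^|,)Fld(?=,|$)"

Dim matches : Set matches = re.Execute(strText)
WScript.Echo "Matches found: " & matches.Count

Dim el
For Each el In matches
    WScript.Echo el.Value
Next
```

On the sample string this should report six matches. Each match after the first includes its leading comma, while aFld1, bFld and cFld are skipped because Fld is not directly preceded by a delimiter.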
Afternoon,
I'm having trouble with some data imports from PowerPoint into Access.
Initially when I import the data the notes section comes in as the below for each row:
<div class="ExternalClass63DBAC931E7D4E4680E207BF938770AA"><p>xxxxxxxxxxx.</p> <p>xxxxxxxxxxxx</p></div>
The xxxxxxx is where the data I want to pull out is.
I have tried Regex in the form of replacing everything between the <> as seen below
Public Function AddPipesBeforeDates(ByVal strText As String) As String
Dim regex As Object
Dim matches As Object
Dim m As Object
Set regex = CreateObject("VBScript.RegExp")
regex.Global = True
regex.pattern = "<.*>"
Set matches = regex.Execute(strText)
For Each m In matches
strText = Replace(strText, m, "")
Next
AddPipesBeforeDates = strText
Set matches = Nothing
Set regex = Nothing
End Function
The problem is that it wipes out everything.
I just found out about Regex and I'm not familiar with it.
Is there a way to delete the unwanted data?
Note the xxxxxx data can be any value spaces or special characters
Any thoughts or ideas on how to do this would be appreciated. I may be going at this the wrong way.
Thanks
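Note that . matches any character except a newline (thus including < and >), and that .* is greedy, so <.*> matches from the first < all the way to the last > in the string.
To remove each substring between < and > separately, you may use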
regex.pattern = "<[^<]+>"
This way, you will avoid "overfiring" and matching more than you need.
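As a minimal sketch, the per-match Replace loop from the question can be swapped for a single call to the RegExp object's own Replace method. StripTags is a hypothetical helper name, and the pattern used here is the closely related <[^>]+> (stop at the first closing >), which is my substitution rather than the pattern given above. Neither pattern is a real HTML parser, so text that itself contains a literal < or > can still be mangled.

```vbscript
Option Explicit

' Hypothetical helper: removes anything that looks like <...> from strText.
Function StripTags(strText)
    Dim re : Set re = CreateObject("VBScript.RegExp")
    re.Global = True         ' replace every tag, not just the first one
    re.Pattern = "<[^>]+>"   ' "<", one or more characters that are not ">", then ">"
    StripTags = re.Replace(strText, "")
End Function

' Sample row copied from the question.
Dim sample
sample = "<div class=""ExternalClass63DBAC931E7D4E4680E207BF938770AA""><p>xxxxxxxxxxx.</p> <p>xxxxxxxxxxxx</p></div>"
WScript.Echo StripTags(sample)
```

For the sample row this leaves only the text between the tags. In Access VBA the same function body works once you add the As String type declarations and use Debug.Print instead of WScript.Echo.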
I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way that I can do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure the name of the one that I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using, as it made it easier to get the individual values returned. I set a reference to Microsoft VBScript Regular Expressions 5.5 so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub
I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use Regular Expressions to replace one-to-many spaces with a single space (or another character). At the moment I am using a local string, however in the main sub this data will come from an external .txt file. The number of spaces between elements in this .txt can vary depending on the row.
I am using the below code, and replacing the spaces with a dash. I have tried different variations and different logic on the below code, but always get "Run-time error '91': Object variable or With block variable not set" on line "c = re.Replace(s, replacement)"
After using breakpoints, I have found out that my RegularExpression (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
Extra information: Using Excel 2010. I have successfully linked all my references ("Microsoft VBScript Regular Expressions 5.5"). I was successfully able to replace the spaces using the vanilla "Replace" function, however as the number of spaces between elements varies I am unable to use that to solve my issue.
Edit: My .txt file is not fixed either; there are a number of rows that are different lengths, so I am unable to use the MID function in Excel to dissect the string either.
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Try setting the pattern inside your RegExp object. Right now, pattern is just a string that is never assigned to re. Try adding re.Pattern = pattern after you initialize your pattern string.
You initialized the pattern but didn't actually hook it into the regex. More importantly, re is declared but never instantiated, so it is still Nothing when you call Replace, and that is what raises run-time error 91.
So also set re to a New RegExp.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
Set re = New RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
re.Pattern = pattern
re.Global = True ' replace every run of whitespace, not only the first one
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
We have some Classic ASP sites that I'm working on a little, and I was wondering how I can write a regular expression check and extract the matched expression.
The expression I have is in the script's name.
So let's say this:
Response.Write Request.ServerVariables("SCRIPT_NAME")
Prints out:
review_blabla.asp
review_foo.asp
review_bar.asp
How can I get the blabla, foo and bar from there?
Thanks.
Whilst Yots' answer is almost certainly correct, you can achieve the result you are looking for with a lot less code and somewhat more clearly:
'A handy function i keep lying around for RegEx matches'
Function RegExResults(strTarget, strPattern)
Set regEx = New RegExp
regEx.Pattern = strPattern
regEx.Global = true
Set RegExResults = regEx.Execute(strTarget)
Set regEx = Nothing
End Function
'Pass the original string and pattern into the function and get a collection object back'
Set arrResults = RegExResults(Request.ServerVariables("SCRIPT_NAME"), "review_(.*?)\.asp")
'In your pattern the answer is the first group, so all you need is'
For each result in arrResults
Response.Write(result.Submatches(0))
Next
Set arrResults = Nothing
Additionally, I have yet to find a better RegEx playground than Regexr, it's brilliant for trying out your regex patterns before diving into code.
You have to use the Submatches Collection from the Match Object to get your data out of the review_(.*?)\.asp Pattern
Function getScriptNamePart(scriptname)
dim RegEx : Set RegEx = New RegExp
dim result : result = ""
With RegEx
.Pattern = "review_(.*?)\.asp"
.IgnoreCase = True
.Global = True
End With
Dim Match, Submatch
dim Matches : Set Matches = RegEx.Execute(scriptname)
dim SubMatches
For Each Match in Matches
For Each Submatch in Match.SubMatches
result = Submatch
Exit For
Next
Exit For
Next
Set Matches = Nothing
Set SubMatches = Nothing
Set Match = Nothing
Set RegEx = Nothing
getScriptNamePart = result
End Function
You can do
review_(.*?)\.asp
See it here on Regexr
You will then find your result in capture group 1.
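You can use the RegExp object to do so.
Your code would look something like this:
Set RegularExpressionObject = New RegExp
RegularExpressionObject.Pattern = "review_(.*)\.asp"
' Execute returns a MatchCollection object, so the assignment needs Set
Set matches = RegularExpressionObject.Execute("review_blabla.asp")
' The captured part of the first match is in its SubMatches collection
If matches.Count > 0 Then Response.Write matches(0).SubMatches(0)
Sorry, I can't test the code above right now.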
Check out usage at MSDN http://msdn.microsoft.com/en-us/library/ms974570.aspx