Extract a string from a text file using a VBScript regex

Okay, so I have this file, sample.txt:
("checkAssdMobileNo1".equals(ACTION)
("checkAssdMobileNo2".equals(ACTION)
("checkAssdMobileNo3".equals(ACTION)
("checkAssdMobileNo4".equals(ACTION)
("checkAssdMobileNo5".equals(ACTION)
("checkAssdMobileNo6".equals(ACTION)
How can I output only these:
checkAssdMobileNo1
checkAssdMobileNo2
checkAssdMobileNo3
checkAssdMobileNo4
checkAssdMobileNo5
checkAssdMobileNo6
I tried the following code, but it doesn't output anything and I can't figure out what I did wrong:
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set file = objFSO.OpenTextFile("sample.txt", ForReading)
Dim re
Set re = new regexp
re.Pattern = """\w+?""[.]equals(ACTION)"
re.IgnoreCase = True
re.Global = True
Dim line
Do Until file.AtEndOfStream
line = file.ReadLine
For Each m In re.Execute(line)
Wscript.Echo m.Submatches(0)
Next
Loop

Your regular expression is close, but it's missing two things:
You need to escape the parentheses surrounding ACTION.
You need unescaped parentheses around \w+? to capture the text between the quotes.
Something like this should work:
re.Pattern = """(\w+?)""[.]equals\(ACTION\)"
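To check the fixed pattern outside VBScript, here is a minimal sketch using Python's standard `re` module as a stand-in for VBScript.RegExp (`extract_actions` is a hypothetical helper). In the VBScript string literal each doubled quote `""` is one literal quote, so the regex the engine actually sees is `"(\w+?)"[.]equals\(ACTION\)`; Python's `m.group(1)` plays the role of `m.Submatches(0)`.

```python
import re

# The regex described by the VBScript literal """(\w+?)""[.]equals\(ACTION\)":
# literal quotes, a capturing group for the name, and escaped parentheses.
FIXED = re.compile(r'"(\w+?)"[.]equals\(ACTION\)', re.IGNORECASE)

# The original attempt: (ACTION) is a group, so it looks for "equalsACTION".
BROKEN = re.compile(r'"\w+?"[.]equals(ACTION)', re.IGNORECASE)


def extract_actions(lines):
    """Hypothetical helper: collect group 1 of every match, line by line."""
    names = []
    for line in lines:
        for m in FIXED.finditer(line):
            names.append(m.group(1))  # same role as m.Submatches(0)
    return names


sample = ['("checkAssdMobileNo%d".equals(ACTION)' % i for i in range(1, 7)]
print("\n".join(extract_actions(sample)))
print(BROKEN.search(sample[0]))  # None: the broken pattern never matches
```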

The regex you need is:
\("(\w+)"
Demo on regex101
It uses a capturing group: the word between the quotes comes back as the first submatch.
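The shorter pattern only requires an opening parenthesis followed by a quoted word, so it is looser than the pattern in the first answer: it also picks up quoted words that are not followed by `.equals(ACTION)`. A small sketch of the difference, using Python's `re` as a stand-in for VBScript (the second sample line is invented for illustration):

```python
import re

# Stricter pattern from the first answer: the name must be followed by .equals(ACTION)
STRICT = re.compile(r'"(\w+?)"[.]equals\(ACTION\)')
# Shorter pattern from this answer: "(" followed by a quoted word
LOOSE = re.compile(r'\("(\w+)"')

lines = [
    '("checkAssdMobileNo1".equals(ACTION)',
    'log("debugMessage")',  # hypothetical line that is not an ACTION check
]

for line in lines:
    # findall returns the captured group 1 of each match
    print(line, "strict:", STRICT.findall(line), "loose:", LOOSE.findall(line))
```

For exactly the lines shown in the question both patterns return the same names; the choice only matters if the file can contain other quoted words.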

Related

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way to do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure of the name of the one I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
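The pattern itself is not specific to the VBScript library. A minimal sketch with Python's `re` (standing in for VBScript.RegExp) that loops over the matches and prints group 1, the counterpart of `m.Submatches(0)`:

```python
import re

SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"

# \[ and \] match literal brackets; ([^\]]+) captures one or more characters
# that are not a closing bracket, so each match stops at the first "]".
BRACKETED = re.compile(r"\[([^\]]+)\]")

for m in BRACKETED.finditer(SOME_TEXT):
    print(m.group(1))
```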
Here is what I ended up using, as it made it easier to get the individual values returned. I set a reference to Microsoft VBScript Regular Expressions 5.5 so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
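Set colMatches = regex.Execute(strInput)
'Guard: the lines below expect at least two bracketed sections
If colMatches.Count < 2 Then
Debug.Print "Expected two bracketed sections, found " & colMatches.Count
Exit Sub
End If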
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

VBScript regular expression: find all text between two known strings

I would like to select all text between two known strings. Take, for example, the following text:
*starthere
*General Settings
* some text1
* some text2
*endhere
I would like to select all text between "*starthere" and "*endhere" using VBScript, so that the final output looks like the following:
*General Settings
* some text1
* some text2
I think this would be simpler using a regex, since there are multiple instances of this pattern in the file I read.
I tried something like the following:
/(.*starthere\s+)(.*)(\s+*endhere.*)/
/(*starthere)(?:[^])*?(*endhere)/
But they don't seem to work, and the match includes the start and end strings as well. Lookahead and lookbehind don't seem to work either, and I'm not sure whether VBScript supports them.
This is the code I am using:
'Create a regular expression object
Dim objRegExp
Set objRegExp = New RegExp
'Set our pattern
objRegExp.Pattern = "/\*starthere\s+([\s\S]*?)\s+\*endhere\b/"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Dim objMatches
Set objMatches = objRegExp.Execute(strSearchString)
If objMatches.Count > 0 Then
out = out & objMatches(0) &vbCrLf
WScript.Echo "found"
End If
Loop
WScript.Echo out
objFile.Close
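You can use:
\*starthere\s+([\s\S]*?)\s+\*endhere\b
and grab captured group #1 (objMatches(0).SubMatches(0) in VBScript).
([\s\S]*?) will match any text between these two tags, including newlines. In VBScript, leave out the /.../ delimiters when you assign the Pattern, and because each block spans several lines, run the regex against the whole file (for example, the string returned by objFile.ReadAll) rather than one line at a time.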
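A minimal sketch of this approach with Python's `re` (a stand-in for VBScript.RegExp), assuming the marker lines look exactly like the sample, `*starthere` and `*endhere`, with the asterisks escaped as `\*`. Because each block spans several lines, it searches the whole text at once; `blocks_between_markers` is a hypothetical helper name.

```python
import re

# \*starthere and \*endhere match the literal marker lines; the lazy
# ([\s\S]*?) captures everything in between, newlines included, and stops
# at the first *endhere so each block is matched separately.
BLOCK = re.compile(r"\*starthere\s+([\s\S]*?)\s+\*endhere\b", re.IGNORECASE)


def blocks_between_markers(text):
    """Return captured group 1 for every starthere/endhere block."""
    return [m.group(1) for m in BLOCK.finditer(text)]


text = (
    "*starthere\n*General Settings\n* some text1\n* some text2\n*endhere\n"
    "unrelated line\n"
    "*starthere\n*Other Settings\n*endhere\n"
)
for block in blocks_between_markers(text):
    print(block)
    print("---")
```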

VBA regex to match two sets of four digits in a string

I'm trying to make an Excel regex pattern to find a certain string. This is what I'm trying:
I'm trying to make it match 0 and 0000 to 9999
StringToMatch = "a75z6878"
Dim objRegExp As New RegExp
Set objRegExp = CreateObject("vbscript.regexp")
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[a-z]([0-9][0-9][0-9][0-9])[a-z]([0-9][0-9][0-9][0-9])"
objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
If objRegExp.Test(StringToMatch) Then MsgBox "Found!"
I have tried different patterns but none work.
What am I doing wrong?
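What is wrong in objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})" is the quantifier.
The quantifier must be specified as {m,n}, not {m-n}. (Your first pattern, by contrast, requires exactly four digits per group, so the 75 in a75z6878 can't match it.)
Change the regex to
[a-z][0-9]{1,4}[a-z][0-9]{1,4}
For an example, see http://regex101.com/r/wA2qM3/1
Or use a shorter version:
[a-z]\d{1,4}[a-z]\d{1,4}
Keep the parentheses, as in [a-z]([0-9]{1,4})[a-z]([0-9]{1,4}), if you want the two numbers back as submatches.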
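A quick check of the difference using Python's `re` as a stand-in for VBScript.RegExp, keeping the question's capture groups so the two numbers come back separately. Python reads a malformed `{1-4}` as the literal characters `{1-4}` rather than as a quantifier; whatever VBScript does with it, it is not "one to four".

```python
import re

STRING_TO_MATCH = "a75z6878"

# {1-4} is not a valid quantifier; Python's re treats it as the literal text "{1-4}".
broken = re.compile(r"[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})")
# {1,4} means "between one and four" of the preceding item.
fixed = re.compile(r"[a-z]([0-9]{1,4})[a-z]([0-9]{1,4})")

print(broken.search(STRING_TO_MATCH))  # None
m = fixed.search(STRING_TO_MATCH)
print(m.group(0), m.groups())  # a75z6878 ('75', '6878')
```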

Regular Expression with a NOT match

I am trying to create a regular expression to validate file names for further processing.
I want to match a file with extension .txt, .csv, or .log
but I want to exclude files whose names END with _out.csv or mylog.log.
Here is my start:
Function CanProcessFile(strFileNameIn, strLogName, strOutName)
Dim objRegEx, colMatches, strPattern
Set objRegEx = CreateObject("VBScript.RegExp")
strPattern = "((.txt$)|(.csv$)|(.Log$))"
strPattern = "(?!(" & strLogName & "$))" & strPattern
'strPattern = "(?!(" & strOutName & "$))" & strPattern
objRegEx.Pattern = strPattern
objRegEx.IgnoreCase = True
Set colMatches = objRegEx.Execute(strFileNameIn)
If colMatches.Count > 0 Then
CanProcessFile = True
Else
CanProcessFile = False
End If
End Function
But every time I try to add a ^ or ?! with (_out.csv$), it fails.
I think that creating two filters (as you suggest in a comment above) may well be the best way to go, but if you prefer to do it in a single regex, then you should be able to write:
objRegEx.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
which ensures that the file name doesn't start with .*(_out\.csv|mylog\.log)$ (i.e., that it doesn't end with _out.csv or mylog.log).
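A minimal sketch of the single-regex approach using Python's `re` as a stand-in for VBScript.RegExp (both support lookahead). The hypothetical `can_process_file` builds the excluded names from parameters, roughly like the question's strLogName/strOutName, and uses Python's `re.escape` so the dots in them are matched literally.

```python
import re


def can_process_file(file_name, log_name="mylog.log", out_suffix="_out.csv"):
    """Hypothetical helper: True for .txt/.csv/.log names not ending in the excluded names."""
    excluded = re.escape(out_suffix) + "|" + re.escape(log_name)
    # ^(?!.*(...)$) is a negative lookahead tested at the start of the name:
    # it fails the match if the name ends with one of the excluded strings.
    pattern = r"^(?!.*(" + excluded + r")$).*\.(txt|csv|log)$"
    return re.search(pattern, file_name, re.IGNORECASE) is not None


for name in ["data.txt", "report.CSV", "report_out.csv", "mylog.log", "app.log", "image.png"]:
    print(name, can_process_file(name))
```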
VBScript's RegExp object does not support lookbehind, so this will not work there, but in an engine that does support negative lookbehind, have a try with:
(?<!_out|mylog)\.(?:txt|csv|log)$
If the engine accepts negative lookbehind but not variable-length lookbehind, use:
(?<!_out)(?<!mylog)\.(?:txt|csv|log)$
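Python's `re` is an example of the engine described in the last sentence: it supports lookbehind only when the lookbehind has a fixed width, so it rejects `(?<!_out|mylog)` (alternatives of 4 and 5 characters) but accepts the split form. A minimal sketch:

```python
import re

# Two fixed-width negative lookbehinds: the extension must not be preceded
# by "_out" (4 characters) or by "mylog" (5 characters).
ALLOWED = re.compile(r"(?<!_out)(?<!mylog)\.(?:txt|csv|log)$", re.IGNORECASE)

for name in ["data.csv", "report_out.csv", "mylog.log", "notes.txt"]:
    print(name, ALLOWED.search(name) is not None)
```

Unlike the lookahead version, this also rejects names such as report_out.txt or mylog.csv, because each lookbehind applies to all three extensions.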

VBA Regular Expression to Match Date

I'm new to Regular Expressions and am having difficulty getting patterns that I find online to work in VBScript/VBA. This one is supposed to return a date found in a string, but it fails to find any dates. What does VBScript/VBA do differently from other RegEx engines that makes this fail to return a match?
Edit1
I removed the ^ and the $ from my pattern. The problem persists.
Private Sub TestDate()
MsgBox RegExDate("cancel on 12/21/2010 ")
End Sub
Private Function RegExDate(s As String) As String
Dim re, match
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))"
re.Global = True
For Each match In re.Execute(s)
MsgBox match.value
RegExDate = match.value
Exit For
Next
Set re = Nothing
End Function
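It looks as if your RegEx will only find a match if the whole string you pass to it is a date.
Try removing ^ and $.
Also, your pattern expects dates in dd/mm/yyyy order, so 12/21/2010 (month 21) can never match it. Here's your example reworked using a RegEx that finds dates in mm/dd/yyyy format, with -, /, . or a space as the separator: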
Private Sub TestDate()
MsgBox RegExDate("cancel on 12/21/2010 ")
End Sub
Private Function RegExDate(s As String) As String
Dim re, match
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}"
re.Global = True
For Each match In re.Execute(s)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Set re = Nothing
End Function
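A sketch that compares the question's pattern with the reworked one on the same inputs, using Python's `re` as a stand-in for VBScript.RegExp (`first_date` is a hypothetical helper that, like RegExDate, returns an empty string when nothing matches). It shows that the original pattern reads dates day first: it finds 21/12/2010 but not 12/21/2010.

```python
import re

# The question's pattern (day first), copied verbatim.
DAY_FIRST = re.compile(r"(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))")
# The reworked pattern (month first; separator -, space, / or .).
MONTH_FIRST = re.compile(r"(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}")


def first_date(pattern, s):
    """Return the first match of pattern in s, or an empty string."""
    m = pattern.search(s)
    return m.group(0) if m else ""


print(repr(first_date(DAY_FIRST, "cancel on 12/21/2010 ")))    # ''
print(repr(first_date(MONTH_FIRST, "cancel on 12/21/2010 ")))  # '12/21/2010'
print(repr(first_date(DAY_FIRST, "cancel on 21/12/2010 ")))    # '21/12/2010'
```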
Why not use RegEx to get the portion of the string that appears to be the date and use the IsDate Function to validate it?
Function FormatOutput(s)
Dim re, match
Set re = CreateObject("vbscript.regexp")
re.Pattern = "[\d]+[\/-][\d]+[\/-][\d]+"
re.Global = True
For Each match In re.Execute(s)
if IsDate(match.value) then
FormatOutput = CDate(match.value)
Exit For
end if
Next
Set re = Nothing
End Function
The RegEx could be cleaned up a bit, but it works for your current example.
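A Python sketch of the same find-then-validate idea. Python has no IsDate, so this swaps in `datetime.strptime` with an explicit, assumed list of month-first formats (`FORMATS` and `first_valid_date` are hypothetical names); candidates that don't parse in any listed format are skipped.

```python
import re
from datetime import datetime

# Loose candidate pattern, as in the answer: digits, / or -, digits, / or -, digits.
CANDIDATE = re.compile(r"\d+[/-]\d+[/-]\d+")
# Assumed formats to accept, standing in for whatever IsDate would accept.
FORMATS = ("%m/%d/%Y", "%m-%d-%Y")


def first_valid_date(s):
    """Return the first candidate that parses as a real date, or None."""
    for m in CANDIDATE.finditer(s):
        for fmt in FORMATS:
            try:
                return datetime.strptime(m.group(0), fmt).date()
            except ValueError:
                continue  # not a valid date in this format; try the next one
    return None


print(first_valid_date("cancel on 12/21/2010 "))               # 2010-12-21
print(first_valid_date("ref 99/99/2010, moved to 01-02-2011"))  # 2011-01-02
```

The regex only has to find date-shaped text; deciding whether it is a real date is left to the date parser, which is the point of the answer's approach.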