VBA Regular expression with string split - regex

Can anyone please help me out with a VBA macro?
I'm using the code below. The task is to read a text file (opened in Notepad) and extract every string that looks like "Z012345", then paste the matches into Excel row by row, so that cell A1 contains Z067859, A2 contains Z002674, and so on.
Here is a sample of what the file looks like:
Contents:
RAF0A123 Full data len= 134
ABATWER01 Recent change by VT0123123 on 11/12/17-11:50
INCLUDE(STELLER Z067859 Z002674 Z004671 Z003450 Z005433 Z023123 Z034564 Z034554 Z043212 Z010456 Z014567
Z027716 Z028778 Z029439 Z029876 Z035766 Z036460 Z038544 Z046456 Z047680 Z052907 Z053145 Z074674 Z094887
VBA code:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(Input)
For Each Match In matches
Range("A1:A4000").Value = Match.Value
Next Match
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
End Sub
Expected output:
Excel output column A should contain the below:
Z067859
Z002674
Z004671
Z003450
Z005433
Z023123
Z034564
Z034554
Z043212
Z010456
Z014567
Z027716
Z028778
Z029439
Z029876
Z035766
Z036460
Z038544
Z046456
Z047680
Z052907
Z053145
Z074674
Z094887
Could anyone help me out to write a macro to perform the task?

Rather than reading one line at a time, I would read the entire file into a string, run the regex over it, and write each match to its own row. Sample code:
Dim myFile As String, regex As Object, str As String, ctr As Long
myFile = "C:\Users\sample.txt"
With CreateObject("Scripting.FileSystemObject")
str = .OpenTextFile(myFile, 1).ReadAll
End With
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(str)
ctr = 1
For Each Match In matches
Sheet1.Range("A" & ctr).Value2 = Match
ctr = ctr + 1
Next Match

I actually think your code is 85% there. I see a few things wrong:
1) You need to read the file before you try to output to Excel. In your code, the file is read only after the Excel output has already run.
2) You are putting the same value in every single cell from A1 to A4000, overwriting them each time. I believe you want to loop down and put each value in its own cell.
3) You're passing Input to regex.Execute, which is not a variable holding your file contents; the text you read ends up in the variable text.
A couple of changes, and this might do it:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(text)
Dim row As Long
row = 1
For Each Match In matches
Cells(row, 1).Value2 = Match
row = row + 1
Next Match
End Sub
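The same order of operations (read everything first, run the global match once, then write one match per row) can be checked outside Excel. The following is only a Python sketch of that logic using Python's re module rather than VBScript.RegExp; the sample string is a shortened copy of the file contents from the question, and extract_codes is a hypothetical helper name.

```python
import re

def extract_codes(text):
    """Return every Z0-prefixed code in order of appearance."""
    return re.findall(r"Z0[0-9]+", text)

# Shortened copy of the sample file from the question.
sample = (
    "RAF0A123 Full data len= 134\n"
    "ABATWER01 Recent change by VT0123123 on 11/12/17-11:50\n"
    "INCLUDE(STELLER Z067859 Z002674 Z004671\n"
    "Z027716 Z028778\n"
)

codes = extract_codes(sample)

# Pair each code with the 1-based row it would occupy in column A.
rows = list(enumerate(codes, start=1))
for row, code in rows:
    print(f"A{row} = {code}")
```

Note that the header tokens such as RAF0A123 and VT0123123 are skipped because they do not contain the literal prefix Z0.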

Please try the code below and let me know if it meets your requirement.
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim str As String
Dim LineArray() As String
Dim DataArray() As String
Dim TempArray() As String
Dim rw As Long, col As Long
Dim FileContent As String
Set regex = CreateObject("vbscript.regexp")
Dim allMatches As Object
Delimiter = " "
myFile = "Path\sample.txt"
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Open myFile For Input As #1
Do Until EOF(1)
Line Input #1, textline
' Keep the line breaks so that Split(text, vbCrLf) below can separate the lines
text = text & textline & vbCrLf
Loop
LineArray() = Split(text, vbCrLf)
i = 1
For x = LBound(LineArray) To UBound(LineArray)
If Len(Trim(LineArray(x))) <> 0 Then
TempArray = Split(LineArray(x), Delimiter)
For y = LBound(TempArray) To UBound(TempArray)
Set allMatches = regex.Execute(TempArray(y))
' Skip tokens without a match; Item(0) fails on an empty match collection
If allMatches.Count > 0 Then
Range("A" & i).Value = allMatches.Item(0).Value
i = i + 1
End If
Next y
End If
rw = rw + 1
Next x
Close #1
End Sub
Thanks

Related

Is there a way to find and replace in a .txt or .xml file from VBA without using REGEX?

I have an excel file that runs a model from an external .exe file, and collects its outputs to store on various sheets for data analysis.
The .exe file keeps its input files in an accessible text-based format. I realised that I can create a batch run by editing the input files directly from VBA and running in a loop; however, this requires find and replace in a .txt or .xml file from VBA. I would need to use a wildcard, as part of the string I want to find will change every time and is unpredictable at the first iteration. E.g.:
I want to find the following text within the text based file:
"0.3843 Flow_Rate_Mass kg/s"
and replace the number before rerunning my script to solve the .exe with this new input file for a new flow rate. However, the number 0.3843 could be anything in the first instance (in the format "x.xxxx Flow_Rate_Mass kg/s"), so I need to use a wildcard in order to find and replace.
After the first iteration I will have set this value at the previous iteration, and so I can use find and replace from a known string without needing a wildcard.
So the following should work, but is not flexible enough at the first iteration:
Dim sBuf As String
Dim sTemp As String
Dim iFileNum As Integer
Dim sFileName As String
sFileName = "C:\filelocation"
iFileNum = FreeFile
Open sFileName For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTemp = sTemp & sBuf & vbCrLf
Loop
Close iFileNum
sTemp = Replace(sTemp, "0.3843 Flow_Rate_Mass kg/s", "0.5000 Flow_Rate_Mass kg/s")
iFileNum = FreeFile
Open sFileName For Output As iFileNum
Print #iFileNum, sTemp
Close iFileNum
Is there any workaround without REGEX?
An example regex replace might be
Option Explicit
Public Sub test()
Dim i As Long, tests()
tests = Array("some text here 0.3843 Flow_Rate_Mass kg/s other text", "some text here 0.5000 Flow_Rate_Mass kg/s other text")
For i = LBound(tests) To UBound(tests)
Debug.Print ReplaceMatch(tests(i))
Next
End Sub
Public Function ReplaceMatch(ByVal inputString As String) As String
Dim re As Object
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = True
.pattern = "\b[\.0-9]+( Flow_Rate_Mass kg\/s)"
If .test(inputString) Then
ReplaceMatch = .Replace(inputString, "$1")
Else
ReplaceMatch = inputString
End If
End With
End Function
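Regex explanation: \b asserts a word boundary, [\.0-9]+ matches one or more digits or dots (the number), and ( Flow_Rate_Mass kg\/s) captures the fixed label as group 1. Replacing the match with $1 therefore keeps the label and drops the number in front of it.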
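To write a new flow rate instead of just removing the old one, the replacement can put the new number in front of the captured label. The following is a Python sketch of that idea using Python's re module, not VBScript.RegExp; set_flow_rate and FLOW_PATTERN are hypothetical names. In the VBA function above, the analogous change would presumably be a replacement string such as "0.5000$1", which I have not tested here.

```python
import re

# A number made of digits and dots, followed by the fixed label.
# The label is captured so it can be written back unchanged.
FLOW_PATTERN = re.compile(r"\b[0-9.]+( Flow_Rate_Mass kg/s)")

def set_flow_rate(text, new_value):
    """Replace whatever number precedes the label with new_value."""
    return FLOW_PATTERN.sub(lambda m: new_value + m.group(1), text)

before = "some text here 0.3843 Flow_Rate_Mass kg/s other text"
after = set_flow_rate(before, "0.5000")
print(after)
```

Using a function as the replacement avoids any ambiguity between the digits of the new value and the group reference.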
An Instr example assuming fixed length of that flow rate value. Note that String functions do have max lengths they can work with.
Option Explicit
Public Sub test()
Dim i As Long, tests()
tests = Array("some text here 0.3843 Flow_Rate_Mass kg/s other text", "some text here 0.5000 Flow_Rate_Mass kg/s other text")
For i = LBound(tests) To UBound(tests)
Debug.Print ReplaceMatch(tests(i))
Next
End Sub
Public Function ReplaceMatch(ByVal inputString As String) As String
Dim pos As Long, fixedLength As Long, length As Long, leftPosition As Long, rightPosition As Long
fixedLength = 6
pos = InStr(inputString, " Flow_Rate_Mass kg/s")
If pos > 0 Then
length = Len(inputString)
leftPosition = pos - fixedLength - 1
rightPosition = length - pos
ReplaceMatch = Left$(inputString, leftPosition) & Right$(inputString, rightPosition)
Else
ReplaceMatch = inputString
End If
End Function
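If the number is not guaranteed to be six characters long, a regex-free variant can search for the label and then walk backwards to the previous whitespace. This is a Python sketch of that approach, not a translation of the VBA above; replace_number_before is a hypothetical helper, and it assumes the number is separated from the preceding text by whitespace.

```python
def replace_number_before(text, label, new_value):
    """Swap the token directly before label for new_value, without regex."""
    pos = text.find(label)
    if pos == -1:
        return text  # label not present, leave the text alone
    # Walk left from the label until whitespace or the start of the text.
    start = pos
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    return text[:start] + new_value + text[pos:]

LABEL = " Flow_Rate_Mass kg/s"  # leading space is part of the label
line = "some text here 0.3843 Flow_Rate_Mass kg/s other text"
updated = replace_number_before(line, LABEL, "0.5000")
print(updated)
```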
Note that there are faster ways to do this using the command line or PowerShell, as well as existing tools.

RegEx for matching a special pattern in VB.net

I have code that extracts the text from different SGM files using each file's entity reference (&Ch1;). The code works great for this, but the requirement has now expanded: it needs to handle entity references for sectioned files, which look like &Ch1-1; and can grow to &Ch1-1-1;
I need to expand the code to accept these new entities so those files content can be added to the master file.
I believe the issue is the regular expression used so I changed it to
Dim rx = New Regex("&Ch(?<EntityNumber>\d+?[-\d+]?)?")
This doesn't raise an error, but it also doesn't bring the file contents into the master document. I'm used to regular expressions, but I've never used named capturing groups and found the explanations on the web a bit confusing.
Sub runProgram()
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Response As String
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
masterFileName = Prefix & $"_Master_Document.sgm"
newMasterFileName = Prefix & $"_New_Master_Document.sgm"
newMasterFilePath = IO.Path.Combine(searchDir, newMasterFileName)
Dim existingMasterFilePath = IO.Path.Combine(searchDir, masterFileName)
'Read all text of the Master Document
'and create a StringBuilder from it.
'All replacements will be done on the
'StringBuilder as it is more efficient
'than using Strings directly
Dim strMasterDoc = File.ReadAllText(existingMasterFilePath)
Dim newMasterFileBuilder As New StringBuilder(strMasterDoc)
'Create a regex with a named capture group.
'The name is 'EntityNumber' and captures just the
'entity digits for use in building the file name
Dim rx = New Regex("&Ch(?<EntityNumber>\d+(-?\d*)*)?")
Dim rxMatches = rx.Matches(strMasterDoc)
For Each match As Match In rxMatches
Dim entity = match.ToString
'Build the file name using the captured digits from the entity in the master file
Dim entityFileName = Prefix & $"_Ch{match.Groups("EntityNumber")}.sgm.bak"
Dim entityFilePath = Path.Combine(searchDir, entityFileName)
'Check if the entity file exists and use its contents
'to replace the entity in the copy of the master file
'contained in the StringBuilder
If File.Exists(entityFilePath) Then
Dim entityFileContents As String = File.ReadAllText(entityFilePath)
newMasterFileBuilder.Replace(entity, entityFileContents)
End If
Next
'write the processed contents of the master file to a different file
File.WriteAllText(newMasterFilePath, newMasterFileBuilder.ToString)
Dim largeFilePath As String = newMasterFilePath
Dim lines1 = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
Dim entities = From line In lines1
Where reg.IsMatch(line)
Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = -1
For Each s In entities
idx = lines1.IndexOf(s, idx + 1)
dictionary.Add(idx, s.Trim)
Next
Dim deletedItems = 0
For Each itm In dictionary
lines1.RemoveAt(itm.Key - deletedItems)
deletedItems += 1
Next
Dim uniqueDict = dictionary.GroupBy(Function(itm) itm.Value).
Select(Function(group) group.First()).
ToDictionary(Function(itm) itm.Key, Function(itm) itm.Value)
For Each s In uniqueDict.Values
lines1.Insert(1, s)
Next
Dim builtMaster As String = Prefix & "_FinalDeliverable.sgm"
Dim newBuiltMasterFilePath = IO.Path.Combine(searchDir, builtMaster)
Dim builtMasterDoc As String = newBuiltMasterFilePath
Using sw As New System.IO.StreamWriter(builtMasterDoc)
For Each line As String In lines1
sw.WriteLine(line)
Next
sw.Flush()
sw.Close()
End Using
'Delete the master document and new master document
If System.IO.File.Exists(existingMasterFilePath) = True Then
System.IO.File.Delete(existingMasterFilePath)
End If
If System.IO.File.Exists(newMasterFilePath) = True Then
System.IO.File.Delete(newMasterFilePath)
End If
For Each filename As String In IO.Directory.GetFiles(searchDir, "*.bak")
IO.File.Delete(filename)
Next
Response = MsgBox("File 'FinalDeliverable.sgm' has been created.", vbOKOnly, "SGM Status")
If Response = vbOK Then ' User chose Yes.
Close()
Else ' User chose No.
' Perform some action.
End If
End Sub
The result I'm expecting is that, for files with names like Ch1-1.sgm, the content between <!--#start#--> and <!--#stop#--> will be added to the master file.
This does work for file entities that are &Ch1; it grabs Ch1.sgm content correctly.
Thank you for the help,
Maxine
Sample Code:
Master_Document.sgm
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
]>
&Ch1;
<body numcols="2">
&Ch2-1;
&Ch2-2;
&Ch2-3;
&Ch2-4;
&Ch2-5;
&Ch2-6;
&Ch2-7;
&Ch2-8;
&Ch2-9;
&Ch3;
</body></doc>
Sample SGM file
<?Pub /_gtinsert>
<body numcols="2">
<!--#start#-->
<chapter id="Chapter_4__Procedures">
<title>Procedures</title>
<section>
<title>Introduction</title>
<!--#stop#-->
<para0 verdate="7 Never 2012" verstatu
<title>Description</title>
<para>This chapterfor the following:
It turns out the problem is that &Ch(?<EntityNumber>\d+?[-\d+]?)? matched &Ch and then one or more, but as few as possible, digits (with \d+?) and then an optional single -, digit or + symbol. That is, after &Ch, only 1 digit was matched (as there is always a digit in your cases) and then a - was matched if it followed, and then matching stopped.
Use
Dim rx = New Regex("&Ch(?<EntityNumber>\d+(?:-\d+)*);")
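The behaviour of this pattern can be checked quickly against the entity styles from the question. The following is a Python sketch rather than VB.NET, so the named group is spelled (?P<EntityNumber>...) instead of (?<EntityNumber>...); the master string is a made-up fragment modelled on the sample master document.

```python
import re

# Python spells named groups (?P<name>...); .NET uses (?<name>...).
ENTITY = re.compile(r"&Ch(?P<EntityNumber>\d+(?:-\d+)*);")

# Made-up fragment modelled on the sample master document.
master = '&Ch1;\n<body numcols="2">\n&Ch2-1;\n&Ch2-9;\n&Ch3-1-1;\n</body>'

numbers = [m.group("EntityNumber") for m in ENTITY.finditer(master)]
print(numbers)  # ['1', '2-1', '2-9', '3-1-1']
```

Because the trailing semicolon is part of the match, the whole entity (for example &Ch2-1;) is what gets replaced, while the group holds only the part needed for the file name.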

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated, it looks like regexp is the only way.
I do not have a lot of experience with that, and your help is much appreciated.
Basically, from what I read on the web, I constructed the following expression to try to match the dates in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and I am trying this code, but I do not get any match. A good link to a regexp tutorial would also be much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValue for robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
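The two-step idea (let the regex find candidates, then let a date parser reject impossible ones such as 30Feb2002) can be sketched in Python as well. This is not the VBA above: it uses datetime.strptime in place of DateValue, find_valid_dates is a hypothetical name, and it assumes an English locale so that %b recognises Jan, Feb, and so on.

```python
import re
from datetime import datetime

# One or two day digits, an English month abbreviation, a four-digit year.
DATE_PATTERN = re.compile(
    r"\d{1,2}(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\d{4}"
)

def find_valid_dates(text):
    """Return the candidates that are also real calendar dates."""
    valid = []
    for candidate in DATE_PATTERN.findall(text):
        try:
            datetime.strptime(candidate, "%d%b%Y")
        except ValueError:
            continue  # looks like a date but is not one, e.g. 30Feb2002
        valid.append(candidate)
    return valid

text = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
dates = find_valid_dates(text)
print(dates)
```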
Thanks for all your help!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that the string contains valid dates, so the regular expression can stay simple.
Thanks
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check that they are all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
' Loop over the month names, not over the words of the input
For Index2 = LBound(List) To UBound(List)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function

Analyse format of alpha-numeric string

I'm trying to write a function that takes in a string, parses it, and returns another string that summarizes the number of consecutive alpha or numeric characters in the original string.
For example, the string 999aa45bbx would return 3N2A2N3A,
i.e.
3 numbers,
followed by 2 alpha,
by 2 numbers,
by 3 alpha.
I'm using the function to analyze formats of insurance policy ID numbers. So far, I've found solutions online that extract either alpha or numeric characters, but nothing that describes the format or order in which these characters exist in the original string.
Can anyone help?
A regexp like this will do the job
Press Alt+F11 to go to the VBE
Insert Module
Copy and paste the code below
Press Alt+F11 to go back to Excel
Then you can use the function (which also detects invalid strings) within Excel, i.e. in B1
=AlphaNumeric(A1)
Function AlphaNumeric(strIn As String) As String
Dim objRegex As Object
Dim objRegMC As Object
Dim objRegM As Object
Dim strOut As String
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.ignorecase = True
' Reject anything that is not a letter or digit (\w would also let an underscore through)
.Pattern = "[^a-z0-9]"
If .test(strIn) Then
AlphaNumeric = "One or more characters is invalid"
Else
.Pattern = "(\d+|[a-z]+)"
Set objRegMC = .Execute(strIn)
For Each objRegM In objRegMC
strOut = strOut & (objRegM.Length & IIf(IsNumeric(objRegM), "N", "A"))
Next
AlphaNumeric = strOut
End If
End With
End Function
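The core of this approach is that each regex match is one run of digits or one run of letters, so its length and kind give one piece of the summary. The following is a Python sketch of the same idea, not the VBA function itself; summarize_format is a hypothetical name, and it raises an error for invalid input instead of returning a message.

```python
import re

def summarize_format(value):
    """Describe runs of digits (N) and letters (A), e.g. 3N2A2N3A."""
    if not re.fullmatch(r"[A-Za-z0-9]*", value):
        raise ValueError("only ASCII letters and digits are supported")
    parts = []
    # Each match is one maximal run of digits or one maximal run of letters.
    for run in re.finditer(r"[0-9]+|[A-Za-z]+", value):
        kind = "N" if run.group()[0].isdigit() else "A"
        parts.append(f"{len(run.group())}{kind}")
    return "".join(parts)

summary = summarize_format("999aa45bbx")
print(summary)  # 3N2A2N3A
```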
Old school, looping through all characters in the string:
Function IdentifyCharacterSequences(s As String) As String
Dim i As Long
Dim charCounter As Long
Dim currentCharType As String
Dim sOut As String
sOut = ""
charCounter = 1
currentCharType = CharType(Mid(s, 1, 1))
For i = 2 To Len(s)
If CharType(Mid(s, i, 1)) = currentCharType Then
charCounter = charCounter + 1
Else
sOut = sOut & charCounter & currentCharType
currentCharType = CharType(Mid(s, i, 1))
charCounter = 1
End If
Next i
' Write out the final run, which the loop itself never emits
If Len(s) > 0 Then sOut = sOut & charCounter & currentCharType
IdentifyCharacterSequences = sOut
End Function
This uses the following helper function. Note that non-alphanumeric characters are identified using the letter "X". You can easily modify this to suit your purposes.
Function CharType(s As String) As String
' [A-z] would also match the punctuation between Z and a, so list both ranges
If s Like "[A-Za-z]" Then
CharType = "A"
ElseIf s Like "[0-9]" Then
CharType = "N"
Else
CharType = "X"
'Or raise an error if non-alphanumerical chars are unacceptable.
End If
End Function

VBS- String split multiple on multiple lines into array

I'm trying to create a function that will take a string which could be over multiple lines, e.g.:
"declare notThese
declare orThis
hello = $notThis#butthis$
butNot= $ButNotThis$
andDefNot = getDate()"
And search through it, pulling out {string1}'s from all parts like
${whatever}#{string1}$
and then pushing them into an array.
How would I achieve this? Would it be through regex or is it simpler than that?
Also would it make a difference if the string renders on multiple lines like above?
You can do it through regex. Multi-line or not does not play a role in this case.
Function ExtractStrings(input)
Dim re, matches, match, i, output
Set re = new RegExp
re.Pattern = "\$[^#]+#([^$]+)\$"
re.Global = True
Set matches = re.Execute(input)
ReDim output(matches.Count - 1)
i = 0
For Each match in matches
output(i) = match.SubMatches(0)
i = i + 1
Next
ExtractStrings = output
End Function
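To see what the pattern picks up from the multi-line example in the question, here is a Python sketch that applies the same expression with Python's re module instead of VBScript's RegExp; extract_strings is a hypothetical name mirroring the function above.

```python
import re

def extract_strings(text):
    """Return the part after # from every $...#...$ token."""
    # With a single capture group, findall returns the group contents.
    return re.findall(r"\$[^#]+#([^$]+)\$", text)

source = (
    "declare notThese\n"
    "declare orThis\n"
    "hello = $notThis#butthis$\n"
    "butNot= $ButNotThis$\n"
    "andDefNot = getDate()"
)

found = extract_strings(source)
print(found)  # ['butthis']
```

The token $ButNotThis$ is ignored because it contains no # separator, and the line breaks make no difference since neither character class excludes them.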
You can do it via the Split function:
Dim sLinesOfText As String
sLinesOfText = "Insert multiple lines of text here"
Dim aLines() As String
Dim iLine As Integer
iLine = 0
aLines = Split(sLinesOfText, vbCrLf, , vbTextCompare)
' Use <= so that the last line is printed as well
Do While iLine <= UBound(aLines)
Debug.Print aLines(iLine)
iLine = iLine + 1
Loop