RegEx for matching a special pattern in VB.net - regex

I have code that extracts the text from different SGM files using each file's entity reference (&Ch1;). The code works great for this, but it now also needs to handle entity references for sectioned files, which look like &Ch1-1; and can grow to &Ch1-1-1;.
I need to expand the code to accept these new entities so that the content of those files can be added to the master file.
I believe the issue is the regular expression used, so I changed it to
Dim rx = New Regex("&Ch(?<EntityNumber>\d+?[-\d+]?)?")
This doesn't raise an error, but it also doesn't bring the file contents into the master document. I'm used to regular expressions, but I've never used named capturing groups and found the explanations on the web a bit confusing.
Sub runProgram()
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Response As String
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
masterFileName = Prefix & $"_Master_Document.sgm"
newMasterFileName = Prefix & $"_New_Master_Document.sgm"
newMasterFilePath = IO.Path.Combine(searchDir, newMasterFileName)
Dim existingMasterFilePath = IO.Path.Combine(searchDir, masterFileName)
'Read all text of the Master Document
'and create a StringBuilder from it.
'All replacements will be done on the
'StringBuilder as it is more efficient
'than using Strings directly
Dim strMasterDoc = File.ReadAllText(existingMasterFilePath)
Dim newMasterFileBuilder As New StringBuilder(strMasterDoc)
'Create a regex with a named capture group.
'The name is 'EntityNumber' and captures just the
'entity digits for use in building the file name
Dim rx = New Regex("&Ch(?<EntityNumber>\d+(-?\d*)*)?")
Dim rxMatches = rx.Matches(strMasterDoc)
For Each match As Match In rxMatches
Dim entity = match.ToString
'Build the file name using the captured digits from the entity in the master file
Dim entityFileName = Prefix & $"_Ch{match.Groups("EntityNumber")}.sgm.bak"
Dim entityFilePath = Path.Combine(searchDir, entityFileName)
'Check if the entity file exists and use its contents
'to replace the entity in the copy of the master file
'contained in the StringBuilder
If File.Exists(entityFilePath) Then
Dim entityFileContents As String = File.ReadAllText(entityFilePath)
newMasterFileBuilder.Replace(entity, entityFileContents)
End If
Next
'write the processed contents of the master file to a different file
File.WriteAllText(newMasterFilePath, newMasterFileBuilder.ToString)
Dim largeFilePath As String = newMasterFilePath
Dim lines1 = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
Dim entities = From line In lines1
Where reg.IsMatch(line)
Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = -1
For Each s In entities
idx = lines1.IndexOf(s, idx + 1)
dictionary.Add(idx, s.Trim)
Next
Dim deletedItems = 0
For Each itm In dictionary
lines1.RemoveAt(itm.Key - deletedItems)
deletedItems += 1
Next
Dim uniqueDict = dictionary.GroupBy(Function(itm) itm.Value).
Select(Function(group) group.First()).
ToDictionary(Function(itm) itm.Key, Function(itm) itm.Value)
For Each s In uniqueDict.Values
lines1.Insert(1, s)
Next
Dim builtMaster As String = Prefix & "_FinalDeliverable.sgm"
Dim newBuiltMasterFilePath = IO.Path.Combine(searchDir, builtMaster)
Dim builtMasterDoc As String = newBuiltMasterFilePath
Using sw As New System.IO.StreamWriter(builtMasterDoc)
For Each line As String In lines1
sw.WriteLine(line)
Next
sw.Flush()
sw.Close()
End Using
'Delete the master document and new master document
If System.IO.File.Exists(existingMasterFilePath) = True Then
System.IO.File.Delete(existingMasterFilePath)
End If
If System.IO.File.Exists(newMasterFilePath) = True Then
System.IO.File.Delete(newMasterFilePath)
End If
For Each filename As String In IO.Directory.GetFiles(searchDir, "*.bak")
IO.File.Delete(filename)
Next
Response = MsgBox("File 'FinalDeliverable.sgm' has been created.", vbOKOnly, "SGM Status")
If Response = vbOK Then ' User chose Yes.
Close()
Else ' User chose No.
' Perform some action.
End If
End Sub
The result I'm expecting is that, for files with names like Ch1-1.sgm, the content between the <!--#start#--> and <!--#stop#--> markers is added to the master file.
This does work for entities of the form &Ch1; and it grabs the Ch1.sgm content correctly.
Thank you for the help,
Maxine
Sample Code:
Master_Document.sgm
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
]>
&Ch1;
<body numcols="2">
&Ch2-1;
&Ch2-2;
&Ch2-3;
&Ch2-4;
&Ch2-5;
&Ch2-6;
&Ch2-7;
&Ch2-8;
&Ch2-9;
&Ch3;
</body></doc>
Sample SGM file
<?Pub /_gtinsert>
<body numcols="2">
<!--#start#-->
<chapter id="Chapter_4__Procedures">
<title>Procedures</title>
<section>
<title>Introduction</title>
<!--#stop#-->
<para0 verdate="7 Never 2012" verstatu
<title>Description</title>
<para>This chapterfor the following:

The problem is that &Ch(?<EntityNumber>\d+?[-\d+]?)? matches &Ch, then one or more digits but as few as possible (\d+?), and then at most one character from the class [-\d+], that is, a single -, digit or + symbol. So after &Ch only one digit was matched (there is always a digit in your cases), then a - was matched if it followed, and then matching stopped. For &Ch2-1; the group therefore captured only 2-, which does not correspond to any file name.
Use
Dim rx = New Regex("&Ch(?<EntityNumber>\d+(?:-\d+)*);")
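To see the difference between the two patterns on the sample entities, here is a minimal sketch in Python rather than VB.NET, so that it can be run standalone. It assumes only entity strings like those in the sample master document. Python spells named groups (?P<name>...) where .NET uses (?<name>...), and the variable names are illustrative.

```python
import re

# Original attempt: lazy digits, then at most ONE character from the class [-\d+]
broken = re.compile(r"&Ch(?P<EntityNumber>\d+?[-\d+]?)?")
# Suggested fix: digits, then any number of "-digits" groups, then the closing ";"
fixed = re.compile(r"&Ch(?P<EntityNumber>\d+(?:-\d+)*);")

# Entities in the style of the sample master document
master = "&Ch1;\n&Ch2-1;\n&Ch1-1-1;\n"

broken_numbers = [m.group("EntityNumber") for m in broken.finditer(master)]
fixed_numbers = [m.group("EntityNumber") for m in fixed.finditer(master)]

print(broken_numbers)  # ['1', '2-', '1-']
print(fixed_numbers)   # ['1', '2-1', '1-1-1']
```

Because the fixed pattern also consumes the trailing ;, the whole entity (for example &Ch2-1;) is what match.ToString returns in the VB.NET loop, so the Replace call substitutes it completely.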

Related

How to get specific word in an Email thanks to regex and insert that word in a file name with Outlook VBA?

I would like to get a specific word from inside an email (it should be in the email's signature) and store that word in a file name using Outlook VBA.
I already have something that works to add the date to the file name, but I would also like to add the name (from the signature) of the person who sent the email.
This is the part that already works:
Public Sub Process_SAU(Item As Outlook.MailItem)
Dim object_attachment As Outlook.Attachment
Dim saveFolder As String
' Folder location when I want to save my file
saveFolder = "C:\Users\gdeange1\Desktop\suggestion updates\UpdateBusinessInformation\Processed_By_Bulks"
For Each object_attachment In Item.Attachments
' Criteria to save .doc files only
If InStr(object_attachment.DisplayName, ".json") Then
object_attachment.SaveAsFile saveFolder & "\" & Format(Now(), "dd-mm-yyyy") & "_" & object_attachment.DisplayName
End If
Next
End Sub
And I found something that could do the trick:
Function ExtractText(Str As String) ' As String
Dim regEx As New RegExp
Dim NumMatches As MatchCollection
Dim M As Match
'this pattern looks for 4 digits in the subject
regEx.Pattern = "((.*))[A-Z]{0}(Danièle Loubert|Véronique Mantha|Julie-Emmanuelle Carle|Nicolas Bertrand|Martine Jean)"
' use this if you need to use different patterns.
' regEx.Pattern = regPattern
Set NumMatches = regEx.Execute(Str)
If NumMatches.Count = 0 Then
ExtractText = ""
Else
Set M = NumMatches(0)
ExtractText = M.SubMatches(0)
End If
code = ExtractText
End Function
I tried something like this :
Public Sub Process_SAU(Item As Outlook.MailItem)
Function ExtractText(Str As String) ' As String
Dim regEx As New RegExp
Dim NumMatches As MatchCollection
Dim M As Match
'this pattern looks for 4 digits in the subject
regEx.Pattern = "((.*))[A-Z]{0}(Danièle Loubert|Véronique Mantha|Julie-Emmanuelle Carle|Nicolas Bertrand|Martine Jean)"
' use this if you need to use different patterns.
' regEx.Pattern = regPattern
Set NumMatches = regEx.Execute(Str)
If NumMatches.Count = 0 Then
ExtractText = ""
Else
Set M = NumMatches(0)
ExtractText = M.SubMatches(0)
End If
code = ExtractText
Public Sub Process_SAU(Item As Outlook.MailItem)
Dim object_attachment As Outlook.Attachment
Dim saveFolder As String
' Folder location when I want to save my file
saveFolder = "C:\Users\gdeange1\Desktop\suggestion updates\UpdateBusinessInformation\Processed_By_Bulks"
For Each object_attachment In Item.Attachments
' Criteria to save .doc files only
If InStr(object_attachment.DisplayName, ".json") Then
object_attachment.SaveAsFile saveFolder & "\" & Format(Now(), "dd-mm-yyyy") & "_" & code & "_" & object_attachment.DisplayName
End If
Next
End Function
End Sub
But I got an error: Compile Error -> Expected End Sub
How can I combine these two pieces of code so that my file name looks something like this:
object_attachment.SaveAsFile saveFolder & "\" & Format(Now(), "dd-mm-yyyy") & "_" & code & "_" & object_attachment.DisplayName
That would give, for example: 23-02-2021_Martine Jean_update.json (I don't know if the space has to be removed).
Thank you in advance for your help, I really appreciate it!
G
UPDATED (not sure that this works)
Public Sub Process_SAU(Item As Outlook.MailItem)
Dim object_attachment As Outlook.Attachment
Dim saveFolder As String
Dim Code As Code
' Folder location when I want to save my file
saveFolder = "C:\Users\gdeange1\Desktop\suggestion updates\UpdateBusinessInformation\Processed_By_Bulks"
For Each object_attachment In Item.Attachments
' Criteria to save .doc files only
If InStr(object_attachment.DisplayName, ".json") Then
object_attachment.SaveAsFile saveFolder & "\" & Format(Now(), "dd-mm-yyyy") & "_" & Code & "_" & object_attachment.DisplayName
End If
Next
End Sub
Function ExtractText(Str As String) ' As String
Dim regEx As New RegExp
Dim NumMatches As MatchCollection
Dim M As Match
regEx.Pattern = "((.*))[A-Z]{0}(Danièle Loubert|Véronique Mantha|Julie-Emmanuelle Carle|Nicolas Bertrand|Martine Jean)"
Set NumMatches = regEx.Execute(Str)
If NumMatches.Count = 0 Then
ExtractText = ""
Else
Set M = NumMatches(0)
ExtractText = M.SubMatches(0)
End If
Code = ExtractText
End Function
Functions and subs should be separated into their own procedures. Here is an example in which a main sub calls a function. Note that the sub passes the input to the function, and the function returns the output to the sub.
Notice that each procedure stands alone: you do not nest the code for the function inside your main macro.
Sub Master_Macro()
    Dim Output As Double
    'Call Function with Input
    'Work with output in current sub
    Output = Add_10(30)
    MsgBox Output
End Sub
Public Function Add_10(Target As Double) As Double
    'Takes an input (Target) and returns value + 10
    Add_10 = Target + 10
End Function
With the code you shared, start with the first macro and simply call the function from it, passing in the right parameter (for example, the text of the email). From there, the idea you suggested would work.
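The same split between a helper function and the routine that calls it can be sketched in Python (chosen here only so the example runs standalone; it is not Outlook VBA). The function names, the sample body text and the fixed date are assumptions for illustration. The list of names comes from the question's pattern.

```python
import re
from datetime import date

# Names taken from the question's pattern
NAMES = ["Danièle Loubert", "Véronique Mantha", "Julie-Emmanuelle Carle",
         "Nicolas Bertrand", "Martine Jean"]
NAME_RE = re.compile("|".join(re.escape(n) for n in NAMES))

def extract_name(body: str) -> str:
    """Helper: return the first known name found in the text, or "" if none."""
    match = NAME_RE.search(body)
    return match.group(0) if match else ""

def build_file_name(body: str, display_name: str, today: date) -> str:
    """Caller: asks the helper for the name, then assembles the file name."""
    code = extract_name(body)
    return f"{today.strftime('%d-%m-%Y')}_{code}_{display_name}"

# Hypothetical email body and attachment name
body = "Please find the update attached.\nRegards,\nMartine Jean"
file_name = build_file_name(body, "update.json", date(2021, 2, 23))
print(file_name)  # 23-02-2021_Martine Jean_update.json
```

One thing to check when porting this back to VBA: in the question's pattern as written, the names are in the third capture group, so SubMatches(0) would return the text before the name rather than the name itself.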

Select block of text and merge into new document

Hi, I've looked online and just can't find the right answer. I have files which have <!--#start#--> and <!--#stop#--> throughout them.
I only want the contents between those two strings. The code I have still prints out all the data, including the start/stop lines.
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
'values from GUI form
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
' Evaluate all the lines in the file.
' Set insideBlock to false
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
' Evaluate if the next line is <!Stop>
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
Here, in another part of my code, I'm grabbing the entity name from the main document and replacing it with the text between start and stop:
Dim strMasterDoc = File.ReadAllText(existingMasterFilePath)
Dim newMasterFileBuilder As New StringBuilder(strMasterDoc)
'Create a regex with a named capture group.
Dim rx = New Regex("&" & Prefix & "_Ch(?<EntityNumber>\d+(?:-\d+)*)[;]")
Dim reg1 As String
reg1 = rx.ToString
Debug.Write("Chapter Entity: " & reg1)
Dim rxMatches = rx.Matches(strMasterDoc)
For Each match As Match In rxMatches
Dim entity = match.ToString
'Build the file name using the captured digits from the entity in the master file
Dim entityFileName = Prefix & $"_Ch{match.Groups("EntityNumber")}.sgm"
Dim entityFilePath = Path.Combine(searchDir, entityFileName)
'Check if the entity file exists and use its contents
'to replace the entity in the copy of the master file
'contained in the StringBuilder
If File.Exists(entityFilePath) Then
Dim entityFileContents As String = File.ReadAllText(entityFilePath)
newMasterFileBuilder.Replace(entity, entityFileContents)
End If
Next
'write the processed contents of the master file to a different file
File.WriteAllText(newMasterFilePath, newMasterFileBuilder.ToString)
As mentioned in my comment, I think the problem might be that the lines in lines() contain more than just the marker, for example leading or trailing whitespace (File.ReadAllLines already strips the line terminators), so the equality test never succeeds. Have you tried using line.Contains(startMark) instead of testing for equality?
Also, is there a specific reason you are reading and storing all the lines first, before looping over them to check them? I think it would be more efficient to read, check and write in one go:
Dim insideBlock As Boolean = False
Using SR As New StreamReader(YourFilePath)
    Using sw As New StreamWriter(OtherFilePath)
        Do Until SR.EndOfStream
            Dim line As String = SR.ReadLine()
            If line.Contains(startMark) Then
                ' start writing at the line below
                insideBlock = True
            ElseIf line.Contains(stopMark) Then
                ' Stop writing
                insideBlock = False
            ElseIf insideBlock Then
                ' Write the current line in the block
                sw.WriteLine(line)
            End If
        Loop
    End Using
End Using
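The marker logic itself is small enough to test in isolation. Below is a minimal sketch in Python (used only so it runs standalone), assuming the two marker strings from the question. The function name and the sample lines are made up for illustration.

```python
START_MARK = "<!--#start#-->"
STOP_MARK = "<!--#stop#-->"

def extract_block(lines):
    """Return only the lines between the start and stop markers."""
    inside_block = False
    kept = []
    for line in lines:
        if START_MARK in line:
            inside_block = True      # start keeping lines after this one
        elif STOP_MARK in line:
            inside_block = False     # stop keeping lines
        elif inside_block:
            kept.append(line)
    return kept

sample = [
    '<body numcols="2">',
    "  <!--#start#-->  ",            # surrounding whitespace defeats an equality test
    "<title>Procedures</title>",
    "<!--#stop#-->",
    "<title>Description</title>",
]
block = extract_block(sample)
print(block)  # ['<title>Procedures</title>']
```

With an equality test, the padded start marker in the sample would never match and nothing would be kept, which is why a containment test (or trimming the line first) is the safer choice.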

How to find a multiline match with RegEx in a text file using VBA?

I have a .txt report that I need to extract certain elements (using RegEx groups) that occur multiple times throughout the file. In order to capture the specific elements I need, I have to match the text in a textline against a pattern then look ahead to see if the next line matches a different pattern. If both are true, then return values of those groups into an array.
I am working in Excel 2013 using VBA and the RegExp library. I was able to write a multiline pattern in RegEx along with all the relevant fields I need in named groups. I am able to open a text file and load the data to a string. What I can't do is both at the same time. I've only been able to identify one line of text against one pattern.
Edit: Here is some sample text that has been redacted from the original source
DAILY CONTROL REPORT WORK OF: 08/07/19
ACQUIRING PAGE: 1
CUSTOMER : ACME CORP
CUSTOMER ID : 0000000001
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
ISSUING PAGE: 2
CUSTOMER : ACME CORP
CUSTOMER ID : 0000000001
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
NET SETTLEMENT SUMMARY PAGE: 3
CUSTOMER : ACME CORP
CUSTOMER ID : 0000000001
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
ACQUIRING PAGE: 4
CUSTOMER : ACME INC
CUSTOMER ID : 0000000002
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
ISSUING PAGE: 5
CUSTOMER : ACME INC
CUSTOMER ID : 0000000002
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
NET SETTLEMENT SUMMARY PAGE: 6
CUSTOMER : ACME INC
CUSTOMER ID : 0000000002
Other miscellaneous data
Dim rgx As Object
Set rgx = CreateObject("VBScript.RegExp")
Dim MyFolder As String
Dim MyFile As String
Dim TextLine As String
Dim strPattern As String
Dim strReport As String
Dim MyArray() As Variant
Dim i As Integer
MyFolder = "C:\Reports\Samples\"
MyFile = "MySampleFile.txt"
strPattern = "WORK OF: (?<WORKOF>\d\d\/\d\d\/\d\d)\s+\n\s+NET SETTLEMENT SUMMARY.+\n.{22}(?<NAME>.+\b)\s+\n.{22}(?<NUM>\d+)\s+\n"
With rgx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
Open (MyFolder & MyFile) For Input As #1
Do Until EOF(1)
Line Input #1, TextLine
strReport = strReport & TextLine
Loop
Close #1
i = 0
If rgx.Test(strReport) Then
ReDim Preserve MyArray(0 to i, 0 to 2)
MyArray(i, 0) = rgx.Replace(strReport, WORKOF)
MyArray(i, 1) = rgx.Replace(strReport, NAME)
MyArray(i, 2) = rgx.Replace(strReport, NUM)
i = i +1
End IF
There are a few things I know are wrong with this code:
1. The strReport string does not include CrLF at end of each line
2. The rgx.Test statement should loop through strReport looking for a multiline match but the way it is written, it would always fail the test.
3. I am not sure how the array would be populated but it would end up with several hundred records, each with 3 values {WORKOF, NAME, NUM}
4. WORKOF is in the header of every page but the string "NET SETTLEMENT SUMMARY" only occurs about every 3rd or 4th page. That is where the information I want to capture resides.
I don't have your text files but I created a test input file with the following content:
WORK OF: 12/34/56 NET SETTLEMENT SUMMARY name1 789
WORK OF: 01/23/45 NET SETTLEMENT SUMMARY name1 6789
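I could match the pattern with the following code. Note that the VBScript RegExp engine does not support named groups such as (?<WORKOF>...), so the pattern uses plain numbered groups, which are read back through SubMatches: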
Option Explicit
' Include: Tools > References > Microsoft VBScript Regular Expressions 5.5
Public Sub FindPatternInTextFile_Test()
Dim varResult As Variant: varResult = FindPatternInTextFile("C:\Reports\Samples\MySampleFile.txt", "WORK OF: (\d\d\/\d\d\/\d\d)\s+NET SETTLEMENT SUMMARY\s+(.+\b)\s+(\d+)")
End Sub
Public Function FindPatternInTextFile(strPath As String, strPattern As String) As Variant
Dim strContent As String: strContent = ReadWholeTextFile(strPath)
Dim rgxPattern As RegExp: Set rgxPattern = CreateRegex(strPattern)
Dim mtcFound As MatchCollection: Set mtcFound = rgxPattern.Execute(strContent)
If 0 < mtcFound.Count Then
Dim strResult() As String: ReDim strResult(0 To mtcFound.Count - 1, 0 To 2)
Dim i As Long: For i = 0 To mtcFound.Count - 1
strResult(i, 0) = mtcFound(i).SubMatches(0)
strResult(i, 1) = mtcFound(i).SubMatches(1)
strResult(i, 2) = mtcFound(i).SubMatches(2)
Next i
FindPatternInTextFile = strResult
Else
FindPatternInTextFile = vbNullString
End If
End Function
Public Function ReadWholeTextFile(strPath As String) As String
Dim strResult As String: strResult = vbNullString
Open strPath For Input As #1
Do Until EOF(1)
Dim strLine As String: strLine = vbNullString
Line Input #1, strLine
strResult = strResult & strLine & vbCrLf
Loop
Close #1
ReadWholeTextFile = strResult
End Function
Public Function CreateRegex(strPattern As String) As RegExp
Dim rgxResult As RegExp: Set rgxResult = New RegExp
With rgxResult
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
Set CreateRegex = rgxResult
End Function
You will have to customize the regex pattern based on your actual data.
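As a rough illustration of how the pattern might be adapted to the multi-line layout in the question, here is a sketch in Python (not VBA, so that it runs standalone). It assumes the real report has the same line structure as the redacted sample. The exact spacing in the original file is unknown, so the pattern uses \s* generously, and the variable names are illustrative.

```python
import re

# Shortened copy of the redacted sample from the question
report = """\
DAILY CONTROL REPORT WORK OF: 08/07/19
ACQUIRING PAGE: 1
CUSTOMER : ACME CORP
CUSTOMER ID : 0000000001
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
NET SETTLEMENT SUMMARY PAGE: 3
CUSTOMER : ACME CORP
CUSTOMER ID : 0000000001
Other miscellaneous data
DAILY CONTROL REPORT WORK OF: 08/07/19
NET SETTLEMENT SUMMARY PAGE: 6
CUSTOMER : ACME INC
CUSTOMER ID : 0000000002
Other miscellaneous data
"""

pattern = re.compile(
    r"WORK OF: (\d\d/\d\d/\d\d)\s*\n"   # date at the end of the header line
    r"\s*NET SETTLEMENT SUMMARY.*\n"     # the next line must be a summary page
    r"\s*CUSTOMER\s*:\s*(.+?)\s*\n"      # customer name
    r"\s*CUSTOMER ID\s*:\s*(\d+)"        # customer number
)

# One (work_of, name, num) tuple per NET SETTLEMENT SUMMARY page
records = pattern.findall(report)
print(records)
# [('08/07/19', 'ACME CORP', '0000000001'), ('08/07/19', 'ACME INC', '0000000002')]
```

The ACQUIRING page is skipped because the line after its WORK OF header does not start with NET SETTLEMENT SUMMARY. The same numbered-group pattern should carry over to VBScript RegExp, provided the file is read with its line breaks preserved.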

Finding a word within a PDF then returning 11 characters after that word?

I have code that searches each page of a PDF document for the word Data_ID.
It appears on every other page of the PDF document, and the value that follows it changes like this:
data_id 400M549822
data_id 400M549233
etc.
Right now my console prints every occurrence of the string data_id, but I also want it to return the characters after it.
This is what I have so far:
Imports Bytescout.PDFExtractor
Imports System.IO
Imports System.Text.RegularExpressions
Module Module1
Class PageType
Property Identifier As String
End Class
Sub Main()
Dim direcory = "C:\Users\XBorja.RESURGENCE\Desktop\one main\"
Dim pageTypes As New List(Of PageType)
Dim ids = "data_id"
Dim resultstring As String
resultstring = Regex.Match(ids, "(?<=^.{1}).*(?=.{5}$)").Value
Dim currentPageTypeName = "unknown"
For Each inputfile As String In Directory.GetFiles(direcory)
For i = 0 To ids.Length - 1
pageTypes.Add(New PageType With {.Identifier = ids(i)})
Next
Dim extractor As New TextExtractor()
extractor.LoadDocumentFromFile(inputfile)
Dim pageCount = extractor.GetPageCount()
For i = 0 To pageCount - 1
' ' Find the type of the current page
' ' If it is not present on the page, then the last one found will be used.
For Each pt In pageTypes
Console.WriteLine(resultstring)
Next
Next
Next
End Sub
End Module
The resultstring is what I was trying to build with regex, but it only counts positions within data_id itself and not the ones after it.
So how would I do this so that it returns the 10 characters (excluding the space) that follow the word data_id?
To return the 11 characters after data_id, including the leading space:
'Dim ids = "data_id 400M549822"
Dim ids = "data_id 400M549233"
Dim resultstring = Regex.Match(ids, "(?<=data_id)(\s\w{10})$").Value
Console.WriteLine(resultstring)
Output:
400M549233
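Some notes:
– (?<=data_id) = positive lookbehind: the match must be preceded by data_id, but data_id is not part of the result
– \s = one whitespace character
– \w{10} = 10 word characters, i.e. A-Z, a-z, 0-9 and _
– $ = the match must end at the end of the string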
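Since the question asks for the 10 characters without the space, one option is to move \s into the lookbehind. The sketch below uses Python so it runs standalone. The lookbehind syntax used here is the same in .NET, and page_text is a made-up stand-in for whatever text the PDF library returns for a page.

```python
import re

line = "data_id 400M549233"

# Lookbehind ends right after "data_id", so the match starts with the space (11 chars)
with_space = re.search(r"(?<=data_id)\s\w{10}$", line).group(0)

# Moving \s into the lookbehind leaves only the 10 characters of the ID
without_space = re.search(r"(?<=data_id\s)\w{10}", line).group(0)

# For a whole page of text, a capture group collects every ID
page_text = "header\ndata_id 400M549822\nfooter data_id 400M549233\n"
all_ids = re.findall(r"data_id\s+(\w{10})", page_text)

print(repr(with_space))  # ' 400M549233'
print(without_space)     # 400M549233
print(all_ids)           # ['400M549822', '400M549233']
```

Dropping the $ anchor in the second and third patterns matters when searching page text, because the ID is rarely the last thing on the page.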

VBA Regular expression with string split

Can anyone please help me out with a VBA macro?
I'm using the code below. The task is to read a Notepad (text) file, extract every string that looks like "Z012345", and paste them into Excel row by row, so that cell A1 will contain Z067859, A2 will contain Z002674, and so on.
Here is a sample of what the contents of the file look like:
Contents:
RAF0A123 Full data len= 134
ABATWER01 Recent change by VT0123123 on 11/12/17-11:50
INCLUDE(STELLER Z067859 Z002674 Z004671 Z003450 Z005433 Z023123 Z034564 Z034554 Z043212 Z010456 Z014567
Z027716 Z028778 Z029439 Z029876 Z035766 Z036460 Z038544 Z046456 Z047680 Z052907 Z053145 Z074674 Z094887
VBA code:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(Input)
For Each Match In matches
Range("A1:A4000").Value = Match.Value
Next Match
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
End Sub
Expected output:
Excel output column A should contain the below:
Z067859
Z002674
Z004671
Z003450
Z005433
Z023123
Z034564
Z034554
Z043212
Z010456
Z014567
Z027716
Z028778
Z029439
Z029876
Z035766
Z036460
Z038544
Z046456
Z047680
Z052907
Z053145
Z074674
Z094887
Could anyone help me out to write a macro to perform the task?
Rather than reading one line at a time, I would read the entire file into a string, then find the matching strings and paste them. Sample code:
Dim myFile As String, regex As Object, str As String, ctr As Long
myFile = "C:\Users\sample.txt"
With CreateObject("Scripting.FileSystemObject")
str = .OpenTextFile(myFile, 1).ReadAll
End With
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(str)
ctr = 1
For Each Match In matches
Sheet1.Range("A" & ctr).Value2 = Match
ctr = ctr + 1
Next Match
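The read-everything-then-match approach can be checked quickly outside Excel. Here is a minimal sketch in Python (used only so it runs standalone), with a shortened copy of the sample contents. The rows dictionary is a made-up stand-in for writing to cells A1, A2 and so on.

```python
import re

# Shortened copy of the sample file contents from the question
text = (
    "RAF0A123 Full data len= 134\n"
    "ABATWER01 Recent change by VT0123123 on 11/12/17-11:50\n"
    "INCLUDE(STELLER Z067859 Z002674 Z004671\n"
    "Z027716 Z028778\n"
)

# findall returns every non-overlapping match, in order of appearance
ids = re.findall(r"Z0[0-9]+", text)

# One value per row, starting at row 1 (stand-in for cells A1, A2, ...)
rows = {f"A{row}": value for row, value in enumerate(ids, start=1)}

print(ids)                     # ['Z067859', 'Z002674', 'Z004671', 'Z027716', 'Z028778']
print(rows["A1"], rows["A2"])  # Z067859 Z002674
```

Strings such as RAF0A123 and VT0123123 are not picked up because they contain no Z0 followed by digits.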
I actually think your code is 85% there. I see a few things wrong.
1) You need to read the file before you try to output to Excel. In your code you read the file after the Excel activity.
2) You are putting the same value in every single cell from A1 to A4000, overwriting them each time. I believe you want to loop down and put each value in its own cell.
3) You're passing a variable that doesn't even exist (Input) to your regex.
A couple of changes, and this might do it:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(text)
Dim row As Long
row = 1
For Each Match In matches
Cells(row, 1).Value2 = Match
row = row + 1
Next Match
End Sub
Please try the code below and let me know if it meets your requirement:
Sub Demo()
    Dim myFile As String, text As String, textline As String
    Dim LineArray() As String
    Dim TempArray() As String
    Dim Delimiter As String
    Dim regex As Object, allMatches As Object
    Dim i As Long, x As Long, y As Long
    Set regex = CreateObject("vbscript.regexp")
    Delimiter = " "
    myFile = "Path\sample.txt"
    With regex
        .Pattern = "Z0[0-9]+"
        .Global = True
    End With
    Open myFile For Input As #1
    Do Until EOF(1)
        Line Input #1, textline
        ' Keep the line breaks so that Split(text, vbCrLf) can separate the lines again
        text = text & textline & vbCrLf
    Loop
    Close #1
    LineArray = Split(text, vbCrLf)
    i = 1
    For x = LBound(LineArray) To UBound(LineArray)
        If Len(Trim(LineArray(x))) <> 0 Then
            TempArray = Split(LineArray(x), Delimiter)
            For y = LBound(TempArray) To UBound(TempArray)
                Set allMatches = regex.Execute(TempArray(y))
                ' Only write tokens that actually contain an ID
                If allMatches.Count > 0 Then
                    Range("A" & i).Value = allMatches.Item(0).Value
                    i = i + 1
                End If
            Next y
        End If
    Next x
End Sub
Thanks