I have a text file containing lines, similar to these
000001 , Line 1 of text , customer 1 name
000002 , Line 2 of text , customer 2 name
000003 , Line 3 of text , customer 3 name
= = =
= = =
= = =
000087 , Line 87 of text, customer 87 name
= = =
= = =
001327 , Line 1327 of text, customer 1327 name
= = =
= = =
= = =
I can write a program that reads each line of the above file to convert it to the following format:
000001 , 1st Line , 1st Customer name
000002 , 2nd Line , 2nd Customer name
000003 , 3rd Line , 3rd Customer name
= = =
= = =
= = =
000087 , 87th Line, 87th Customer name
= = =
= = =
001327 , 1327th Line, 1327th Customer name
= = =
= = =
= = =
My question: is there a straightforward way to achieve the same output using a regular expression?
I tried the following:
Dim pattern As String = "(\d{6}) , (Line \d+ of text) , (customer \d name)"
Dim replacement As String = " $1 , $2 Line , $3 Customer name "
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(my_input_file, replacement)
but the result is far from the desired output.
Please help
Your regex captures too much. The groups should capture only digits:
Dim pattern As String = "(\d{6}) , Line (\d+) of text , customer (\d+) name"
Also, as you want to replace the numbers with ordinal numbers, you should rather use String.Format to do the formatting (line by line):
Dim m As Match = rgx.Match(my_input_file_line)
Dim outputLine As String = String.Format("{0} , {1} Line , {2} Customer name", _
m.Groups(1).Value, GetOrdinal(m.Groups(2).Value), GetOrdinal(m.Groups(3).Value))
where GetOrdinal is a method that converts a numeric string to its ordinal form (for example, "1" to "1st").
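GetOrdinal is left undefined above, so here is a minimal sketch of the same line-by-line idea in Python. The names ordinal and convert are my own, and the pattern tolerates the missing space before the second comma that appears in some of the sample lines. Treat it as an illustration of the approach rather than a drop-in replacement for the VB.NET code.

```python
import re

# Assumed line layout, taken from the sample data in the question.
LINE = re.compile(r"(\d{6}) , Line (\d+) of text\s*, customer (\d+) name")


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix (hypothetical GetOrdinal equivalent)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def convert(line: str) -> str:
    """Rewrite one input line; lines that do not match are returned unchanged."""
    m = LINE.search(line)
    if m is None:
        return line
    number, line_no, customer_no = m.groups()
    return f"{number} , {ordinal(int(line_no))} Line , {ordinal(int(customer_no))} Customer name"
```

Checking n % 100 first keeps 11, 12 and 13 (and 111, 112, ...) on "th", which a plain last-digit lookup would get wrong.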
Your matching groups are too big. What you want to match are the numbers.
Replace (\d{6}) , Line (\d+) of text , customer (\d+) name
by $1 , $2th Line , $3th Customer name
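Then replace 1th by 1st
Then replace 2th by 2nd
Then replace 3th by 3rd
Take care that these follow-up replacements do not touch 11th, 12th and 13th, which are already correct.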
I do not know if it was your intention to match a real customer name itself and put it back in a different order. Was it?
Then you could use (with global and multiline flags)
^(\d{6}) , Line (\d+) of text , ([^ ]+) (\d+) ([^ ]+)$
and replace with $1 , $2th Line , $4th $3 $5
Tip: I always use http://www.gskinner.com/RegExr/ to test my patterns and experiment with them!
Is there a reason for using regex? Maybe I have misunderstood the requirement, but it seems to be a fixed format where only the first part matters, so you could use this simple query:
// requires: using System.IO; using System.Linq;
IEnumerable<string> lines = File.ReadLines(@"folder\input_text.txt");
IEnumerable<string> result = lines
.Where(l => l.Trim().Length > 0)
.Select(l => int.Parse(l.Split(',').First().Trim()))
.Select(num => string.Format("{0} , {1} Line , {1} Customer name"
, num.ToString("D6")
// simple suffix choice; numbers such as 11 or 21 would need extra handling
, num + (num == 1 ? "st" : num == 2 ? "nd" : num == 3 ? "rd" : "th")));
You can use File.WriteAllLines to write the result to the output file:
File.WriteAllLines(@"folder\desired_output.txt", result);
I am trying to have the user enter a class name and number to pull up the information I have on that class in a file. I have figured out how to match the information using .toRegex, but I can't figure out how to use the user's input to find only the match they need rather than every match in the file. I am very new to regex.
val pattern = """\d+\s+([A-Z]+).\s+(\d+)\s.+\s+\w.+""".toRegex()
val fileName = "src/main/kotlin/Enrollment.txt"
var lines = File(fileName).readLines()// reads every line on the file
do{
print("please enter class name")
var className = readLine()!!
print("please enter class number ")
var classNum = readLine()!!
for(i in 0..(lines.size-1) ){
var matchResult = pattern.find(lines[i])
if(matchResult != null) {
var (className,classNum) = matchResult.groupValues
println("className: $className, class number: $classNum ")
}
}
}while (readLine()!! != "EXIT")
example line from file
Name Num
0669 HELP 134 AN CV THING ETC 4.0 4.0 Smith P 001 0173 MTWTh 9:30A 10:30A 23 15 8 4.0
See MatchResult#groupValues reference:
This list has size of groupCount + 1 where groupCount is the count
of groups in the regular expression. Groups are indexed from 1 to
groupCount and group with the index 0 corresponds to the entire
match.
If the group in the regular expression is optional and there were no
match captured by that group, corresponding item in groupValues
is an empty string.
You need
var (_, className,classNum) = matchResult.groupValues
See Kotlin demo:
val lines = "0669 HELP 134 AN CV THING ETC 4.0 4.0 Smith P 001 0173 MTWTh 9:30A 10:30A 23 15 8 4.0 "
val pattern = """^\d+\s+([A-Z]+)\s+(\d+)""".toRegex()
var matchResult = pattern.find(lines)
if(matchResult != null) {
var (_, className,classNum) = matchResult.groupValues
println("className: $className, class number: $classNum ")
}
// => className: HELP, class number: 134
I simplified the regex a bit since find() does not require a full string match to
^\d+\s+([A-Z]+)\s+(\d+)
See the regex demo. Details:
^ - start of string
\d+ - one or more digits
\s+ - one or more whitespaces
([A-Z]+) - Group 1: one or more uppercase ASCII letters
\s+ - one or more whitespaces
(\d+) - Group 2: one or more digits
You need to build the pattern from the values you read from the user with readLine().
Then loop over the lines and check whether the pattern occurs in each one with pattern.containsMatchIn().
val className = readLine()!!.toUpperCase()
print("please enter class number ")
val classNum = readLine()!!
val pattern = """\s+\d+\s+$className.\s+$classNum""".toRegex()
for (line in lines) {
// containsMatchIn is enough here; a separate find() call is redundant
if (pattern.containsMatchIn(line)) {
println(line)
}
}
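One thing to watch when building a pattern from user input is that the input may contain regex metacharacters. The following is a small Python sketch of the same filtering idea with the input escaped first. The function name find_class_lines and the assumed line layout (a leading number, then the class name, then the class number) are mine, based on the sample line in the question.

```python
import re


def find_class_lines(lines, class_name, class_num):
    """Return only the lines whose class name and number match the user's input."""
    pattern = re.compile(
        r"^\s*\d+\s+"                    # leading record number, e.g. 0669
        + re.escape(class_name.upper())  # user input is treated as literal text
        + r"\s+"
        + re.escape(class_num)
        + r"\b"                          # so that 13 does not match 134
    )
    return [line for line in lines if pattern.search(line)]
```

For example, find_class_lines(lines, "help", "134") keeps the HELP 134 line and drops HELP 135 or MATH 134.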
I'm trying to create a pattern for finding placeholders within a string to be able to replace them with variables later. I'm stuck on a problem to find all these placeholders within a string according to my requirement.
I already found this post, but it only helped a little:
Regex match ; but not \;
Placeholders will look like this
{&var} --> Variable stored in a dictionary --> dict("var")
{$prop} --> Property of a class cls.prop read by CallByName and PropGet
{#const} --> Some constant values by name from a function
Generally I have this pattern and it works well
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.pattern = "\{([#\$&])([\w\.]+)\}"
For example I have this string:
"Value of foo is '{&var}' and bar is '{$prop}'"
I get 2 matches as expected
(&)(var)
($)(prop)
I also want to add a formatting part to this expression, like in .NET.
String.Format("This is a date: {0:dd.MM.yyyy}", DateTime.Now);
// This is a date: 05.07.2019
String.Format("This is a date, too: {0:dd.(MM).yyyy}", DateTime.Now);
// This is a date, too: 05.(07).2019
I extended the RegEx to get that optional formatting string
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.pattern = "\{([#\$&])([\w\.]+):{0,1}([^\}]*)\}"
RegEx.Execute("Value of foo is '{&var:DD.MM.YYYY}' and bar is '{$prop}'")
I get 2 matches as expected
(&)(var)(DD.MM.YYYY)
($)(prop)()
At this point I noticed I have to take care of escaped "{" and "}", because I may want some braces within the formatted result.
This does not work properly, because my pattern stops after "...{MM"
RegEx.Execute("Value of foo is '{&var:DD.{MM}.YYYY}' and bar is '{$prop}'")
It would be okay to add escape signs to the text before checking the regex:
RegEx.Execute("Value of foo is '{&var:DD.\{MM\}.YYYY}' and bar is '{$prop}'")
But how can I correctly add the negative lookbehind?
And second: how can this also work for variables that should not be resolved, even though they have the correct syntax, because the outer bracket is escaped?
RegEx.Execute("This should not match '\{&var:DD.\{MM\}.YYYY\}' but this one '{&var:DD.\{MM\}.YYYY}'")
I hope my question is not confusing and someone can help me
Update 05.07.19 at 12:50
After the great help of @wiktor-stribiżew the solution is complete.
As requested, I provide some example code:
Sub testRegEx()
Debug.Print FillVariablesInText(Nothing, "Date\\\\{$var01:DD.\{MM\}.YYYY}\\\\ Var:\{$nomatch\}{$var02} Double: {#const}{$var01} rest of string")
End Sub
Function FillVariablesInText(ByRef dict As Dictionary, ByVal txt As String) As String
Const c_varPattern As String = "(?:(?:^|[^\\\n])(?:\\{2})*)\{([#&\$])([\w.]+)(?:\:([^}\\]*(?:\\.[^\}\\]*)*))?(?=\})"
Dim part As String
Dim snippets As New Collection
Dim allMatches, m
Dim i As Long, j As Long, x As Long, n As Long
' Create a RegEx object and execute pattern
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.pattern = c_varPattern
RegEx.MultiLine = True
RegEx.Global = True
Set allMatches = RegEx.Execute(txt)
' Start at position 1 of txt
j = 1
n = 0
For Each m In allMatches
n = n + 1
Debug.Print "(" & n & "):" & m.value
Debug.Print " [0] = " & m.SubMatches(0) ' Type [&$#]
Debug.Print " [1] = " & m.SubMatches(1) ' Name
Debug.Print " [2] = " & m.SubMatches(2) ' Format
part = "{" & m.SubMatches(0)
' Get offset of the placeholder within the match (skips the pre-match prefix)
x = 1 ' Index to position is at least +1
Do While Mid(m.value, x, 2) <> part
x = x + 1
Loop
' Position in txt
i = m.FirstIndex + x
' Anything to add to result?
If i <> j Then
snippets.Add Mid(txt, j, i - j)
End If
' Next start position (not index!), + 1 for the positive lookahead "}"
j = m.FirstIndex + m.Length + 2
' Here comes a function to get an actual value
' e.g.: snippets.Add dict(m.SubMatches(1))
' or : snippets.Add Format(dict(m.SubMatches(1)), m.SubMatches(2))
snippets.Add "<<" & m.SubMatches(0) & m.SubMatches(1) & ">>"
Next m
' Any text at the end? (<= so a single trailing character is kept)
If j <= Len(txt) Then
snippets.Add Mid(txt, j)
End If
' Join snippets
For i = 1 To snippets.Count
FillVariablesInText = FillVariablesInText & snippets(i)
Next
End Function
The function testRegEx gives me this result and debug print:
(1):e\\\\{$var01:DD.\{MM\}.YYYY
[0] = $
[1] = var01
[2] = DD.\{MM\}.YYYY
(2):}{$var02
[0] = $
[1] = var02
[2] =
(3): {#const
[0] = #
[1] = const
[2] =
(4):}{$var01
[0] = $
[1] = var01
[2] =
Date\\\\<<$var01>>\\\\ Var:\{$nomatch\}<<$var02>> Double: <<#const>><<$var01>> rest of string
You may use
((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?}
To make sure the consecutive matches are found, too, turn the last } into a lookahead, and when extracting matches just append it to the result, or if you need the indices increment the match length by 1:
((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?(?=})
^^^^^
See the regex demo and regex demo #2.
Details
((?:^|[^\\])(?:\\{2})*) - Group 1 (makes sure the { that comes next is not escaped): start of string or any char but \ followed with 0 or more double backslashes
\{ - a { char
([#$&]) - Group 2: any of the three chars
([\w.]+) - Group 3: 1 or more word or dot chars
(?::([^}\\]*(?:\\.[^}\\]*)*))? - an optional sequence of : and then Group 4:
[^}\\]* - 0 or more chars other than } and \
(?:\\.[^}\\]*)* - zero or more repetitions of a \-escaped char and then 0 or more chars other than } and \
} - a } char
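To see how the lookahead variant behaves, here is a minimal sketch in Python, whose re module accepts this pattern unchanged apart from an escaped closing brace in the lookahead. The helper name find_placeholders is my own, and the function only lists what was found; it does not perform the replacement.

```python
import re

# Same pattern as the lookahead version above.
PLACEHOLDER = re.compile(
    r"((?:^|[^\\])(?:\\{2})*)"             # Group 1: context proving the { is not escaped
    r"\{([#$&])([\w.]+)"                   # Group 2: type char, Group 3: name
    r"(?::([^}\\]*(?:\\.[^}\\]*)*))?"      # Group 4: optional format, may hold \{ and \}
    r"(?=\})"                              # closing brace is checked but not consumed
)


def find_placeholders(text):
    """Return (type, name, format) for every unescaped placeholder in text."""
    return [
        (m.group(2), m.group(3), m.group(4) or "")
        for m in PLACEHOLDER.finditer(text)
    ]
```

Because the closing brace is not consumed, it is still available as the "any char but a backslash" that Group 1 needs for a placeholder that follows immediately, which is why {$a}{#b} yields two matches.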
Welcome to the site! If you need to match only balanced escapes, you will need something more powerful. If not, I haven't tested this, but you could try replacing [^\}]* with (?:[^\{\}]|\\\{|\\\})*. That is, match non-brace characters and escaped brace sequences as separate alternatives. You may need to change this depending on how you want to handle backslashes in your formatting string.
OK, to start: I'm a little rusty on VBA, as it has been 3+ years since I've needed to use it.
In short, I'm struggling to extract text from a string. I'm using a regular expression to extract my department name and date from this string.
The department will always fall between : and -.
I can't share the document due to security. But, I can explain the format and hopefully we can work from that.
Col A----Col B----Col C---Col D
Date(e)--Dept(e)--String--Duration
Where (e) means it was extracted from the string.
My code for the extraction so far is below. Currently it loops through all available rows and extracts the department, but it always takes the : and - with it! I can't seem to find a way to cut these out.
Any assistance?
I can probably work out the date bit eventually.
The final output from this code is ": Inbound Contacts -"
Where I need, "Inbound Contacts".
Sub stringSearch()
Dim ws As Worksheet
Dim lastRow As Long, x As Long
Dim matches As Variant, match As Variant
Dim Reg_Exp As Object
Set Reg_Exp = CreateObject("vbscript.regexp")
Reg_Exp.Pattern = "\:\s(\w.+)\s\-"
Set ws = Sheet2
lastRow = ws.Range("C" & Rows.Count).End(xlUp).Row
For x = 1 To lastRow
Set matches = Reg_Exp.Execute(CStr(ws.Range("C" & x).Value))
If matches.Count > 0 Then
For Each match In matches
ws.Range("B" & x).Value = match.Value
Next match
End If
Next x
End Sub
This is how to achieve what you want without regex. In general it should be a bit faster and much easier to understand:
Sub TestMe()
Dim inputString As String
inputString = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
Debug.Print Trim(Split(Split(inputString, ":")(1), "-")(0))
End Sub
split the inputString by : and take the second part;
split the taken part by - and take the first part;
trim the surrounding spaces.
You are not accessing Group 1 value.
Instead of ws.Range("B" & x).Value = match.Value use
ws.Range("B" & x).Value = match.Submatches(0)
You may also enhance the regex a bit to
Reg_Exp.Pattern = ":\s*(\w.*?)\s*-"
This way, you will "trim" the Group 1 value. See the regex demo.
Details
: - a : char
\s* - 0+ whitespace chars
(\w.*?) - Group 1 (.Submatches(0)): a word char followed with any 0+ chars (other than line break chars) as few as possible (NOTE that \w does not match non-ASCII letters, probably you want to match any char that is not whitespace and not a -, then use [^\s-] instead of \w)
\s* - 0+ whitespace chars
- - a hyphen.
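The difference between the whole match and Group 1 is easy to check outside VBA. Here is a small Python sketch using the same pattern; the function name extract_department is my own, and group(0) plays the role of match.Value while group(1) plays the role of match.SubMatches(0).

```python
import re

DEPT = re.compile(r":\s*(\w.*?)\s*-")


def extract_department(text):
    """Return (whole match, Group 1), or None when the pattern is not found."""
    m = DEPT.search(text)
    if m is None:
        return None
    return m.group(0), m.group(1)
```

For "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018" the whole match is ": Inbound Contacts -" and Group 1 is "Inbound Contacts".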
Regex:
You can use this Regex: ([\s\S]+?):\s*([\s\S]+?)\s*-\s*([A-Za-z]+)\s*,\s*([0-9]{2}\/[0-9]{2}\/[0-9]{4})\b
And the demo
Code:
And this code:
Sub stringSearch()
Dim ws As Worksheet
Dim lastRow As Long, x As Long
Dim matches As Variant, match As Variant
Dim i As Long
Dim Reg_Exp As Object
Set Reg_Exp = CreateObject("vbscript.regexp")
Reg_Exp.Pattern = "([\s\S]+?):\s*([\s\S]+?)\s*-\s*([A-Za-z]+)\s*,\s*([0-9]{2}\/[0-9]{2}\/[0-9]{4})\b"
Set ws = Sheet2
lastRow = ws.Range("C" & Rows.Count).End(xlUp).Row
For x = 1 To lastRow
Set matches = Reg_Exp.Execute(CStr(ws.Range("C" & x).Value))
If matches.Count > 0 Then
For Each match In matches
For i = 0 To match.SubMatches.Count - 1
Debug.Print match.SubMatches(i)
Next i
Next match
End If
Next x
End Sub
Result
This is the result on the immediate window:
+-------------------+
| Planning Unit |
| Inbound Contracts |
| Tuesday |
| 27/03/2018 |
| Planning Unit |
| Payments & Orders |
| Tuesday |
| 27/03/2018 |
| Planning Unit |
| Scheduling |
| Tuesday |
| 27/03/2018 |
+-------------------+
I'd use Left/Right/Mid and InStr/InStrRev instead of RegEx in this case.
For extracting the department:
Dim mainStr As String
Dim deptStr As String
mainStr = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
deptStr = Mid(mainStr, InStr(mainStr, ":") + 2)
deptStr = Left(deptStr, InStr(deptStr, "-") - 2)
For extracting the date:
Dim mainStr As String
Dim dateStr As String
mainStr = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
dateStr = Right(mainStr, Len(mainStr) - InStrRev(mainStr, " "))
To be honest, this kind of situation is common enough that you might want to write some sort of "extractText" function to get the text between delimiters. Here's the one I use.
Function extractText(str As String, leftDelim As String, rightDelim As String, _
Optional reverseSearch As Boolean = False) As String
'Extracts text between two delimiters in a string
'By default, searches for first instance of each delimiter in string from left to right
'To search from right to left, set reverseSearch = True
'If left delimiter = "", function returns text up to right delimiter
'If right delimiter = "", function returns text after left delimiter
'If left or right delimiter not found in string, function returns empty string
Dim leftPos As Long
Dim rightPos As Long
Dim leftLen As Long
If reverseSearch Then
leftPos = InStrRev(str, leftDelim)
rightPos = InStrRev(str, rightDelim)
Else
leftPos = InStr(str, leftDelim)
rightPos = InStr(str, rightDelim)
End If
leftPos = IIf(leftDelim = "", -1, leftPos)
rightPos = IIf(rightDelim = "", -1, rightPos)
leftLen = Len(leftDelim)
If leftPos > 0 Then
If rightPos = -1 Then
extractText = Mid(str, leftPos + leftLen)
ElseIf rightPos > leftPos Then
extractText = Mid(str, leftPos + leftLen, rightPos - leftPos - leftLen)
End If
ElseIf leftPos = -1 Then
If rightPos > 0 Then
extractText = Left(str, rightPos - 1)
End If
End If
End Function
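For comparison, the same delimiter logic can be sketched in Python. This is my own transliteration of the function above rather than tested production code, and the name extract_text with its parameters simply mirrors the VBA version.

```python
def extract_text(s, left_delim, right_delim, reverse_search=False):
    """Return the text between two delimiters, or "" when a delimiter is missing."""
    find = s.rfind if reverse_search else s.find

    if left_delim == "":
        # No left delimiter: return everything up to the right delimiter.
        if right_delim == "":
            return ""
        right = find(right_delim)
        return s[:right] if right >= 0 else ""

    left = find(left_delim)
    if left < 0:
        return ""
    start = left + len(left_delim)

    if right_delim == "":
        # No right delimiter: return everything after the left delimiter.
        return s[start:]

    right = find(right_delim)
    return s[start:right] if right > left else ""
```

With the sample string, extract_text(s, ": ", " -") gives the department, and extract_text(s, " ", "", reverse_search=True) gives the date.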
The following code in file abc needs to be captured with a regex.
With TeWindow("tewindow").Tescreen("something").TeField("some")
.set "value"
.setToProperty "V"
.exist(0)
End With
This code should be replaced in abc with
'With TeWindow("tewindow").Tescreen("something").TeField("some")
myset("something_some"), "value"
mysetToProperty("something_some"), ""
myExist("something_some"), (0)
'End With
Here is my attempt so far. I have not managed to write the result back to the file.
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set testfile = objFSO.OpenTextFile("D:\test\testout4.txt", 1, True)
line = testfile.ReadAll
testfile.Close
sString = line
pat = "with[\s]{1,}tewindow\((.*?)\).tescreen\((.*?)\).tefield\((.*?)\)"
pat1 = "^\.[a-zA-Z]{1,}"
Call DeclareRegEx(objRE,pat)
If objRE.test(sString) Then
Set Matches = objRE.Execute(sString)
Set match = Matches(0)
intcount = match.SubMatches.Count
If intcount > 0 Then
For I = 1 To intcount-1
'If i = intcount-1 Then
objRef = objRef & match.SubMatches(I)
Next
Else
objRef = objRef & match.SubMatches(I) & "_"
End If
End If
call DeclareRegEx(objRE1, pat1)
If objRE1.Test(sString) Then
Set Matches1 = objRE1.Execute(sString)
For Each Match1 in Matches1
RetStr1 = Match1.Value
strplc = Right(RetStr1, Len(RetStr1) - 1)
actual = objRE1.Replace(RetStr1, "my" & strplc & "(" & objRef & ")")
MsgBox actual
Next
End If
Function DeclareRegEx(obj, pattern)
Set obj = New RegExp
obj.Global = True
obj.Multiline = True
obj.Pattern = pattern
obj.IgnoreCase = True
End Function
Suggestions for another approach or regex are welcome.
Since capturing the whole block with one verbose regex did not seem generic enough for this code, I tried the following instead:
1. Read the file content into an array.
2. Find the line numbers of With and End With.
3. Loop over the method calls from the line after With up to the line before End With.
It worked for me!
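The line-by-line approach described above can be sketched roughly as follows in Python. The function name rewrite_with_blocks and the three patterns are my own assumptions about the input layout, and the generated names keep the original method casing (so .exist becomes myexist), which differs slightly from the hand-written target in the question.

```python
import re

# Assumed shapes of the three kinds of lines inside a With block.
WITH_RE = re.compile(
    r'^\s*With\s+TeWindow\((.*?)\)\.TeScreen\("(.*?)"\)\.TeField\("(.*?)"\)\s*$',
    re.IGNORECASE,
)
END_RE = re.compile(r"^\s*End\s+With\s*$", re.IGNORECASE)
CALL_RE = re.compile(r"^\s*\.([A-Za-z]+)\s*(.*)$")


def rewrite_with_blocks(lines):
    """Comment out With/End With and turn .method calls into mymethod("screen_field"), args."""
    out = []
    ref = None  # "screen_field" while inside a With block, otherwise None
    for line in lines:
        m = WITH_RE.match(line)
        if m:
            ref = f"{m.group(2)}_{m.group(3)}"
            out.append("'" + line.strip())
        elif ref is not None and END_RE.match(line):
            out.append("'" + line.strip())
            ref = None
        elif ref is not None and CALL_RE.match(line):
            call = CALL_RE.match(line)
            out.append(f'my{call.group(1)}("{ref}"), {call.group(2)}'.rstrip())
        else:
            out.append(line)
    return out
```

Writing the result back is then a matter of joining the returned list with newlines and saving it to the file.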
I have a system which generates 3 text (.txt) files on a daily basis, with 1000's of entries within each.
Once the text files are generated we run a vbscript (below) that modifies the files by entering data at specific column positions.
I now need this vbscript to do an additional task which is to separate a column in one of the text files.
So for example the TR201501554s.txt file looks like this:
6876786786 GFS8978976 I
6786786767 DDF78676 I
4343245443 SBSSK67676 I
8393372263 SBSSK56565 I
6545434347 DDF7878333 I
6757650000 SBSSK453 I
With the additional task of separating the column, the data will look like this, with the column separated at a specific position.
6876786786 GFS 8978976 I
6786786767 DDF 78676 I
4343245443 SBSSK 67676 I
8393372263 SBSSK 56565 I
6545434347 DDF 7878333 I
6757650000 SBSSK 453 I
I was thinking maybe I could add another "case" to accomplish this with maybe using a "regex" pattern, since the pattern would be only 3 companies to find
(DDF, GFS and SBSSK).
But after looking at many examples, I am not really sure where to start.
Could someone let me know how to accomplish this additional task in our vbscript (below)?
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Dim objFSO, pFolder, cFile, objWFSO, objFileInput, objFileOutput,strLine
Dim strInputPath, strOutputPath , sName, sExtension
Dim strSourceFileComplete, strTargetFileComplete, objSourceFile, objTargetFile
Dim iPos, rChar
Dim fileMatch
'folder paths
strInputPath = "C:\Scripts\Test"
strOutputPath = "C:\Scripts\Test"
'Create the filesystem object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get a reference to the processing folder
Set pFolder = objFSO.GetFolder(strInputPath)
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
ProcessAFile cFile
Next
Sub ProcessAFile(objFile)
fileMatch = false
Select Case Left(objFile.Name,2)
Case "MV"
iPos = 257
rChar = "YES"
fileMatch = true
Case "CA"
iPos = 45
rChar = "OCCUPIED"
fileMatch = true
Case "TR"
iPos = 162
rChar = "EUR"
fileMatch = true
End Select
If fileMatch = true Then
Set objWFSO = CreateObject("Scripting.FileSystemObject")
Set objFileInput = objWFSO.OpenTextFile(objFile.Path, ForReading)
strSourceFileComplete = objFile.Path
sExtension = objWFSO.GetExtensionName(objFile.Name)
sName = Replace(objFile.Name, "." & sExtension, "")
strTargetFileComplete = strOutputPath & "\" & sName & "_mod." & sExtension
Set objFileOutput = objFSO.OpenTextFile(strTargetFileComplete, ForWriting, True)
Do While Not objFileInput.AtEndOfStream
strLine = objFileInput.ReadLine
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
objFileInput.Close
objFileOutput.Close
Set objFileInput = Nothing
Set objFileOutput = Nothing
Set objSourceFile = objWFSO.GetFile(strSourceFileComplete)
objSourceFile.Delete
Set objSourceFile = Nothing
Set objTargetFile = objWFSO.GetFile(strTargetFileComplete)
objTargetFile.Move strSourceFileComplete
Set objTargetFile = Nothing
Set objWFSO = Nothing
End If
End Sub
You could add a regular expression replacement to your input processing loop. Since you want to re-format the columns I'd do it with a replacement function. Define both the regular expression and the function in the global scope:
...
Set pFolder = objFSO.GetFolder(strInputPath)
Dim re
Set re = New RegExp
re.Pattern = " ([A-Z]+)(\d+)( +)"
Function ReFormatCol(m, g1, g2, g3, p, s)
ReFormatCol = Left("  " & Left(g1 & "    ", 7) & g2 & g3, Len(m)+2)
End Function
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
...
and modify the input processing loop like this:
...
Do While Not objFileInput.AtEndOfStream
strLine = re.Replace(objFileInput.ReadLine, GetRef("ReFormatCol"))
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
...
Note that you may need to change your iPos values, since splitting and re-formatting the columns increases the length of the lines by 2 characters.
The callback function ReFormatCol has the following (required) parameters:
m: the match of the regular expression (used to determine the length of the match)
g1, g2, g3: the three groups from the expression
p: the starting position of the match in the source string (but not used here)
s: the source string (but not used here)
The function constructs the replacement for the match from the 3 groups like this:
Left(g1 & "    ", 7) appends 4 spaces to the first group (e.g. GFS) and trims it to 7 characters. This is based on the assumption that the first group will always be 3-5 characters long. → "GFS    "
"  " & ... & g2 & g3 prepends 2 spaces to the result of the above operation and appends the other 2 groups (8978976 and the trailing spaces). → "  GFS    8978976" followed by the original trailing spaces
Left(..., Len(m)+2) then trims the result string to the length of the original match plus 2 characters (to account for the additional 2 spaces inserted to separate the new second column from the former second, now third, column).→ GFS 8978976
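The same pad-and-trim arithmetic can be tried out with a replacement callback in Python. This is only a sketch: the names COL, reformat_col and split_company_column are mine, and the sample lines in the comments assume two spaces before the company code and three after the number, since the real column widths are not visible in the question.

```python
import re

# Assumed layout: two spaces, company letters, digits, then trailing padding.
COL = re.compile(r"  ([A-Z]+)(\d+)( +)")


def reformat_col(m):
    """Pad the company code to 7 characters, then trim to the original width + 2."""
    prefix, digits, pad = m.groups()
    widened = "  " + prefix.ljust(7)[:7] + digits + pad
    return widened[: len(m.group(0)) + 2]


def split_company_column(line):
    """Apply the replacement to the first matching column of one line."""
    return COL.sub(reformat_col, line, count=1)
```

Under those assumptions "6876786786  GFS8978976   I" becomes "6876786786  GFS    8978976 I", which is 2 characters longer than the input, matching the note about adjusting iPos.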
First, replace using the regex pattern (\d+)\s+([A-Z]+)(\d+)\s+(\w+) with the replacement $1 $2 $3 $4,
then split the result on runs of spaces (the pattern " +"). That gives you the separate columns.