VBS- String split multiple on multiple lines into array - regex

I'm trying to create a function that will take a string which could be over multiple lines, e.g.:
"declare notThese
declare orThis
hello = $notThis#butthis$
butNot= $ButNotThis$
andDefNot = getDate()"
And search through it, pulling out {string1}'s from all parts like
${whatever}#{string1}$
and then pushing them into an array.
How would I archive this? Would it be through regex or is it simpler than that?
Also would it make a difference if the string renders on multiple lines like above?

You can do it through regex. Multi-line or not does not play a role in this case.
Function ExtractStrings(input)
Dim re, matches, match, i, output
Set re = new RegExp
re.Pattern = "\$[^#]+#([^$]+)\$"
re.Global = True
Set matches = re.Execute(input)
ReDim output(matches.Count - 1)
i = 0
For Each match in matches
output(i) = match.SubMatches(0)
i = i + 1
Next
ExtractStrings = output
End Function

You can do it via the Split function:
Dim sLinesOfText As String
sLinesOfText = "Insert multiple lines of text here"
Dim aLines() As String
Dim iLine As Integer
iLine = 0
aLines = Split(sLinesOfText, vbCrLf, , vbTextCompare)
Do While iLine < UBound(aLines)
Debug.Print aLines(iLine)
iLine = iLine + 1
Loop

Related

Removing leading whitespace using VBA

I am trying to remove leading whitespace from a word " 00000000000000231647300000000002KK".
Below is my VBA code
Option Explicit
Sub myfunction()
Dim getarray, getarray1 As Variant
Dim Text As String
Dim RegularText
getarray = Sheets("Sheet1").Range("A1:A4").Value
getarray1 = getarray
Set RegularText = New regexp
RegularText.Global = True
RegularText.MultiLine = True
RegularText.Pattern = "(^\\s+)"
Text = CStr(getarray(1, 1))
getarray1(1, 1) = RegularText.Replace(getarray(1, 1), "")
Sheets("Sheet1").Range("B1:B4").Value = getarray1
End Sub
However above code fails to remove the leading whitespace from my word.
Below is the excel workbook with result and above code
https://easyupload.io/jv6n2p
If you could help to understand why my code is failing to remove leading whitespace, it will be very helpful.
Thanks for your time
There are a few things wrong with the original code.
RegularText.Pattern = "(^\\s+)"
Explanations from regex101.com.
(^\\s+) pattern:
Basically, the first backslash is escaping the second backslash. This tells the RegEx to treat the second \ as a normal character. (^\\s+) is grouping leading \s characters together not whitespace.
(^\s+) pattern:
RegularText.MultiLine = True
The MultiLine property indicates every line in a value should be searched not row in an array. This doesn't seem to be the intended result. So set it to false.
`RegularText.MultiLine = False`
Range("A1:A4").Value is 1 row by 4 columns and Range("B1:B4") is 1 column by 4 rows. In my examples I will use Range("A2:D2") for simplicity.
Sub RegExRemoveTrailingSpace()
Dim Data As Variant
Data = Sheets("Sheet1").Range("A1:A4").Value
Dim RegularText As New RegExp
RegularText.Global = False
RegularText.Pattern = "(^\s+)"
[b4] = RegularText.Replace([A1], "")
Dim r As Long, c As Long
For r = 1 To UBound(Data)
For c = 1 To UBound(Data, 2)
Data(r, c) = RegularText.Replace(Data(r, c), "")
Next
Next
Sheets("Sheet1").Range("A2:D2").Value = Data
End Sub
We could just use LTrim() to remove the leading spaces from the string.
Sub LTrimTrailingSpace()
Dim Data As Variant
Data = Sheets("Sheet1").Range("A1:A4").Value
Dim r As Long, c As Long
For r = 1 To UBound(Data)
For c = 1 To UBound(Data, 2)
Data(r, c) = LTrim(Data(r, c))
Next
Next
Sheets("Sheet1").Range("A2:D2").Value = Data
End Sub

VBA Regular expression with string split

Can anyone please help me out with Vba macro.
I'm using the below mentioned code. The task is to read a notepad file which contains contents and extract a certain string which looks like "Z012345" and paste them in excel row wise such cell A1 will Z067859 and A2 would be Z002674 etc.,
A sample of how the contents in the notepad file looks like
Contents:
RAF0A123 Full data len= 134
ABATWER01 Recent change by VT0123123 on 11/12/17-11:50
INCLUDE(STELLER Z067859 Z002674 Z004671 Z003450 Z005433 Z023123 Z034564 Z034554 Z043212 Z010456 Z014567
Z027716 Z028778 Z029439 Z029876 Z035766 Z036460 Z038544 Z046456 Z047680 Z052907 Z053145 Z074674 Z094887
VBA code:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
With regex
.Pattern = "Z0[0-9]+"
.Global = Trueq
End With
Set matches = regex.Execute(Input)
For Each Match In matches
Range("A1:A4000").Value = Match.Value
Next Match
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
End Sub
Expected output:
Excel output column A should contain the below:
Z067859
Z002674
Z004671
Z003450
Z005433
Z023123
Z034564
Z034554
Z043212
Z010456
Z014567
Z027716
Z028778
Z029439
Z029876
Z035766
Z036460
Z038544
Z046456
Z047680
Z052907
Z053145
Z074674
Z094887
Could anyone help me out to write a macro to perform the task?
Rather than reading one line at a time, I would rather read the entire file into a string and then find the string and paste it. Sample code
Dim myFile As String, regex As Object, str As String, ctr As Long
myFile = "C:\Users\sample.txt"
With CreateObject("Scripting.FileSystemObject")
str = .OpenTextFile(myFile, 1).ReadAll
End With
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(str)
ctr = 1
For Each Match In matches
Sheet1.Range("A" & ctr).Value2 = Match
ctr = ctr + 1
Next Match
I actually think your code is 85% there. I see a couple of things wrong.
1) You need to read the file before you try to output to Excel. In your code it seems you read the file after any activity in Excel
2) You are putting the same value in every single cell from A1 to A1000, overwriting them each time. I believe you want to loop down and put each value in a cell.
3) You're passing a variable that doesn't even exist to your regex
A couple of changes, and this might do it:
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
myFile = "C:\Users\sample.txt"
Open myFile For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
Close #1
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Set matches = regex.Execute(text)
Dim row As Long
row = 1
For Each Match In matches
Cells(row, 1).Value2 = Match
row = row + 1
Next Match
End Sub
Please try the below and let me know it meets your requirement
Sub Demo()
Dim myFile As String, text As String, textline As String
Dim str As String
Dim LineArray() As String
Dim DataArray() As String
Dim TempArray() As String
Dim rw As Long, col As Long
Dim FileContent As String
Set regex = CreateObject("vbscript.regexp")
Dim allMatches As Object
Delimiter = " "
myFile = "Path\sample.txt"
With regex
.Pattern = "Z0[0-9]+"
.Global = True
End With
Open myFile For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline
Loop
LineArray() = Split(text, vbCrLf)
i = 1
For x = LBound(LineArray) To UBound(LineArray)
If Len(Trim(LineArray(x))) <> 0 Then
TempArray = Split(LineArray(x), Delimiter)
col = UBound(TempArray)
ReDim Preserve DataArray(col, rw)
For y = LBound(TempArray) To UBound(TempArray)
Set allMatches = regex.Execute(TempArray(y))
Range("A" & i).Value = allMatches.Item(0)
i = i + 1
Next y
End If
rw = rw + 1
Next x
Close #1
End Sub
Thanks

Vba: Regular expression to count the number of words in a string delimited by special characters

Need some help writing a regular expression to count the number of words in a string (Please note the data is a html string, which needs to be placed into a spreadsheet) when separated either by any special characters like . , - , +, /, Tab etc. Count should exclude special characters.
**Original String** **End Result**
Ex : One -> 1
One. -> 1
One Two -> 2
One.Two -> 2
One Two. -> 2
One.Two. -> 2
One.Tw.o -> 3
Updated
I think you asked a valuable question and this downvoting is not fair!
Function WCount(ByVal strWrd As String) As Long
'Variable declaration
Dim Delimiters() As Variant
Dim Delimiter As Variant
'Initialization
Delimiters = Array("+", "-", ".", "/", Chr(13), Chr(9)) 'Define your delimiter characters here.
'Core
For Each Delimiter In Delimiters
strWrd = Replace(strWrd, Delimiter, " ")
Next Delimiter
strWrd = Trim(strWrd)
Do While InStr(1, strWrd, " ") > 0
strWrd = Replace(strWrd, " ", " ")
Loop
WCount = UBound(Split(strWrd, " ")) + 1
End Function
________________
You can use this function as a UDF in excel formulas or can use in another VBA codes.
Using in formula
=WCOUNT("One.Two.Three.") or =WCOUNT($A$1") assuming your string is in A1 cell.
Using in VBA
(With assume passing your string with Str argument.)
Sub test()
Debug.Print WCount(Str)
End Sub
Regards.
Update
I have test your text as shown below.
copy your text in a Cell of Excel as shown.
The code updated for Line break and Tab characters and count your string words correctly now.
Try this code, all necessary comments are in code:
Sub SpecialSplit()
Dim i As Long
Dim str As String
Dim arr() As String
Dim delimeters() As String
'here you define all special delimeters you want to use
delimetres = Array(".", "+", "-", "/")
For i = 1 To 9
str = Cells(i, 1).Value
'this will protect us from situation where last character is delimeter and we have additional empty string
str = Left(str, Len(str) - 1)
'here we replace all special delimeters with space to simplify
For Each delimeter In delimetres
str = Replace(str, delimeter, " ")
Next
arr = Split(str)
Cells(i, 2).Value = UBound(arr) - LBound(arr) + 1
Next
End Sub
With your posted data following RegExp is working correctly. Put this in General Module in Visual Basic Editor.
Public Function CountWords(strInput As String) As Long
Dim objMatches
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = "\w+"
Set objMatches = .Execute(strInput)
CountWords = objMatches.Count
End With
End Function
You have to use it like a normal formula. e.g. assuming data is in cell A1 function would be:
=CountWords(A1)
For your information, it can be also achieved through formula if number of characters are specific like so:
=LEN(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," ")))-LEN(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," "))," ",""))+1

Excel UDF for capturing numbers within characters

I have a variable text field sitting in cell A1 which contains the following:
Text;#Number;#Text;#Number
This format can keep repeating, but the pattern is always Text;#Number.
The numbers can vary from 1 digit to n digits (limit 7)
Example:
Original Value
MyName;#123;#YourName;#3456;#HisName;#78
Required value:
123, 3456, 78
The field is too variable for excel formulas from my understanding.
I tried using regexp but I am a beginner when it comes to coding. if you can break down the code with some explanation text, it would be much appreciated.
I have tried some of the suggestions below and they work perfectly. One more question.
Now that I can split the numbers from the text, is there any way to utilize the code below and add another layer, where we split the numbers into x cells.
For example: once we run the function, if we get 1234, 567 in the same cell, the function would put 1234 in cell B2, and 567 in cell C2. This would keep updating all cells in the same row until the string has exhausted all of the numbers that are retrieved from the function.
Thanks
This is the John Coleman's suggested method:
Public Function GetTheNumbers(st As String) As String
ary = Split(st, ";#")
GetTheNumbers = ""
For Each a In ary
If IsNumeric(a) Then
If GetTheNumbers = "" Then
GetTheNumbers = a
Else
GetTheNumbers = GetTheNumbers & ", " & a
End If
End If
Next a
End Function
If the pattern is fixed, and the location of the numbers never changes, you can assume the numbers will be located in the even places in the string. This means that in the array result of a split on the source string, you can use the odd indexes of the resulting array. For example in this string "Text;#Number;#Text;#Number" array indexes 1, 3 would be the numbers ("Text(0);#Number(1);#Text(2);#Number(3)"). I think this method is easier and safer to use if the pattern is indeed fixed, as it avoids the need to verify data types.
Public Function GetNums(src As String) As String
Dim arr
Dim i As Integer
Dim result As String
arr = Split(src, ";#") ' Split the string to an array.
result = ""
For i = 1 To UBound(arr) Step 2 ' Loop through the array, starting with the second item, and skipping one item (using Step 2).
result = result & arr(i) & ", "
Next
If Len(result) > 2 Then
GetNums = Left(result, Len(result) - 2) ' Remove the extra ", " at the end of the the result string.
Else
GetNums = ""
End If
End Function
The numbers can vary from 1 digit to n digits (limit 7)
None of the other responses seems to take the provided parameters into consideration so I kludged together a true regex solution.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numsOnly(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
numsOnly = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,7}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numsOnly = Join(nums, delim)
End If
End With
End Function
      
Regexp option that uses Replace
Sub Test()
Debug.Print StrOut("MyName;#123;#YourName;#3456;#HisName;#78")
End Sub
function
Option Explicit
Function StrOut(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(^|.+?)(\d{1,7})"
.Global = True
If .Test(strIn) Then
StrOut = .Replace(strIn, "$2, ")
StrOut = Left$(StrOut, Len(StrOut) - 2)
Else
StrOut = "Nothing"
End If
End With
End Function

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function