Using VBA to parse and split a string with wildcards? - regex

I've got a sheet that contains item numbers of alphanumeric characters, and a bunch of other information in the row. Sometimes, similar items are combined into one row, and the difference on the item number will be shown with (X/Y) to choose which character to use at that point in the item number (not just X or Y, can be any alphanumeric character). In other words, these entries will look like this:
AB(X/Y)CD123
What I need is a way to separate that into the two item numbers ABXCD123 and ABYCD123. After that I'll have to create a row below the current one and copy the current row into it, with the changed item number, but that part is easy. I've tried using InStr to get the (X/Y) flagged, but I don't know how to pull out the X and Y characters to make new strings with them. I also don't know if a wildcard will work with InStr, and I'm not too familiar with RegEx.
Any ideas?

Here is s little introduction to regex¹ in a UDF².
Function partNums(str As String, _
Optional num As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
partNums = vbNullString
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = False
.Pattern = "\([A-Z]{1}/[A-Z]{1}\)"
If .Test(str) Then
tmp = .Execute(str)(0)
Select Case num
Case 2
tmp = Mid(tmp, 4, 1)
Case Else
tmp = Mid(tmp, 2, 1)
End Select
partNums = .Replace(str, tmp)
End If
End With
End Function
In B2:B3 as,
=partNums(A2)
=partNums(A3,2)
            
Here is a largely duplicated UDF that handles from 1 to 3 characters.
Function partNums(str As String, _
Optional num As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
partNums = vbNullString
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = False
.Pattern = "\([A-Z]{1,3}/[A-Z]{1,3}\)"
If .Test(str) Then
tmp = .Execute(str)(0)
tmp = Split(Replace(Replace(tmp, Chr(40), vbNullString), Chr(41), vbNullString), Chr(47))(num - 1)
partNums = .Replace(str, tmp)
End If
End With
End Function
            
¹ regex questions can usually be answered by the solutions in How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.
² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).

Related

VBA - Modify sheet naming from source file

I received help in the past for an issue regarding grabbing a source file name and naming a newly created worksheet the date from said source file name, i.e. "010117Siemens Hot - Cold Report.xls" and outputting "010117".
However the code only works for file names with this exact format, for example, file named "Siemens Hot - Cold Report 010117.xls", an error occurs because the newly created sheet does not find the date in the source file.
CODE
Application.ScreenUpdating = False
Dim n As Double
Dim wksNew As Excel.Worksheet
Dim src As Workbook
Set src = Workbooks.Open(filePath, False, False)
Dim srcRng As Range
With src.Worksheets("Sheet1")
Set srcRng = .Range(.Range("A1"), .Range("A1").End(xlDown).End(xlToRight))
End With
With ThisWorkbook
Set wksNew = .Worksheets.Add(After:=.Worksheets(.Sheets.Count))
n = .Sheets.Count
.Worksheets(n).Range("A1").Resize(srcRng.Rows.Count, srcRng.Columns.Count).Value = srcRng.Value
End With
' ======= get the digits part from src.Name using a RegEx object =====
' RegEx variables
Dim Reg As Object
Dim RegMatches As Variant
Set Reg = CreateObject("VBScript.RegExp")
With Reg
.Global = True
.IgnoreCase = True
.Pattern = "\d{0,9}" ' Match any set of 0 to 9 digits
End With
Set RegMatches = Reg.Execute(src.Name)
On Error GoTo CloseIt
If RegMatches.Count >= 1 Then ' make sure there is at least 1 match
ThisWorkbook.Worksheets(n).Name = RegMatches(0) ' rename "Sheet2" to the numeric part of the filename
End If
src.Close False
Set src = Nothing
So, my question is, how can I get my code to recognize the string of digits no matter its position in the file name?
Code
^\d{0,9}\B|\b\d{0,9}(?=\.)
Usage
I decided to make a function that can be called inside a cell as such: =GetMyNum(x) where x is a pointer to a cell (i.e. A1).
To get the code below to work:
Open Microsoft Visual Basic for Applications (ALT + F11)
Insert a new module (right click in the Project Pane and select Insert -> Module).
Click Tools -> References and find Microsoft VBScript Regular Expressions 5.5, enable it and click OK
Now copy/paste the following code into the new module:
Option Explicit
Function GetMyNum(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
Dim match As Object
strPattern = "^\d{0,9}\B|\b\d{0,9}(?=\.)"
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set match = regEx.Execute(strInput)
GetMyNum = match.Item(0)
Else
GetMyNum = ""
End If
End If
End Function
Results
Input
A1: Siemens Hot - Cold Report 010117.xls
A2: 010117Siemens Hot - Cold Report.xls
B1: =GetMyNum(A1)
B2: =GetMyNum(A1)
Output
010117 # Contents of B1
010117 # Contents of B2
Explanation
I will explain each regex option separately. You can reorder the options in terms of importance in such a way that the most important option is first and least important is last.
^\d{0,9}\B Match the following
^ Assert position at the start of the line
\d{0,9} Match any digit 0-9 times
\B Ensure position does not match where a word boundary matches (this is used but may be dropped depending on usage - I added it because it seems the number you're trying to get is immediately followed by a word character and not followed by a space - if that's not always the case just remove this token)
\b\d{0,9}(?=\.) Match the following
\b Assert position as a word boundary
\d{0,9} Match any digit 0-9 times
(?=\.) Positive lookahead ensuring a literal dot . follows
Just my alternative solution to RegEx :)
This finds the first occurence of 6 consecutive digits, omitting blanks and periods... although there are probably some more issues with using IsNumeric as I believe a lowercase e is considered acceptable by it...
Sub FindTheNumber()
For i = 1 To Len(Range("A1").Value)
If IsNumeric(Mid(Range("A1").Value, i, 6)) = True And InStr(Mid(Range("A1").Value, i, 6), " ") = 0 And InStr(Mid(Range("A1").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A1").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
For i = 1 To Len(Range("A2").Value)
If IsNumeric(Mid(Range("A2").Value, i, 6)) = True And InStr(Mid(Range("A2").Value, i, 6), " ") = 0 And InStr(Mid(Range("A2").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A2").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
End Sub
Examples:
Immediate window:

Excel VBA Regex Function That Returns Multiple Matches Into A Single Cell

I need help getting my Excel function to work. The goal is to run an in-cell function that extracts all pattern matches of a regex function from the input of another cell into one cell, not an array of cells.
I have tried this using an array which returns two matches in the function dialogue box preview but only outputs the first match in the cell. I have also tried using a collection but had no luck with that.
Here is my current code and a sample of text that would be used as the function's string input:
Function RegexMatches(strInput As String) As Variant
Dim rMatch As Match
Dim arrayMatches
Dim i As Long
arrayMatches = Array()
With New RegExp
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "(Code)[\s][\d]{2,4}"
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.Value
i = i + 1
Next
End With
RegexMatches = arrayMatches
End Function
Sample strInput from an Excel cell:
Code 123 some random text
goes here and continues to the next line
Code 4567 followed by more text
including new lines not just wrapped text
The desired output from this function would be both (2) matched values from the regex function into a single cell (e.g. "Code 123 Code 4567").
Any help is greatly appreciated!
Looks like you missed off the end of your function (as per Mat's Mug's comment)? Try this (which is as per Wiktor's comment).
Edit: amended in light of Mat's Mug's suggestion.
Function RegexMatches(strInput As String) As String
Dim rMatch As Object
Dim s As String
Dim arrayMatches()
Dim i As Long
With New RegExp
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "(Code)[\s][\d]{2,4}"
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.Value
i = i + 1
's = s & " " & rMatch
Next
End If
End With
RegexMatches = Join(arrayMatches, " ")
End Function

Regex extract data before certain text

I have large text documents that has some data I want to be extracted.
As you can see in a screenshot , I want to extract A040 to excel column next to the filename.
Before the A040 there is always three empty spaces and than text Sheet (also in screenshot)
Every file has different number and there is always letter A with three digits and text Sheet. --> example file uploaded:
I has something already in VB with Excel but it is not working.
Dim cell As Range
Dim rng As Range
Dim output As String
Set rng = ws.Range("A1", ws.Range("A1").SpecialCells(xlLastCell).Address)
For Each cell In rng
On Error Resume Next
output = ExtA(cell.Value)
If Len(output) > 0 Then
Range("B" & j) = output
Exit For
End If
Next
j = j + 1
ws.Cells.ClearContents
'Call DelConns
strFileName = Dir 'next file
Loop
End Sub
Function ExtA(ByVal text As String) As String
'REGEX Match VBA in excel
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(?<=Sheet)[^Sheet]*\ Sheet"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
If allMatches.Count <> 0 Then
result = allMatches.Item(0).submatches.Item(0)
End If
ExtA = result
End Function
This seems to work on your sample.
Option Explicit
Function AthreeDigits(str As String)
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
AthreeDigits = vbNullString
With rgx
.Global = False
.MultiLine = True
.Pattern = "\A[0-9]{3}[\s]{3}Sheet"
If .Test(str) Then
Set cmat = .Execute(str)
AthreeDigits = Left(cmat.Item(0), 4)
End If
End With
End Function
Did you mean to say that there are 4 spaces after the A040 and before the "Sheet"? If so, try this pattern:
.pattern = "(A\d\d\d)\s{3}Sheet"
EDIT: I thought you said 4 spaces, but you said 3. My pattern now reflects that.
EDIT 2: (I need more coffee!) Change the \b to \s.
See Example here
"\s+[Aa]\d*\s+Sheet"
Or
\s+[Aa]\d*\s+(Sheet)
Or
[Aa]\d*\s+(Sheet)
Demo
https://regex101.com/r/Qo8iUf/3
\s+ Matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible
Aa Matches a single character in the list Aa (case sensitive)
\d* Matches a digit (equal to [0-9])
* Quantifier — Matches between zero and unlimited times, as many times as possible

parse a string with VBA on EXCEL

I'm trying to analyse a text file imported in an excel tab.
The first column is a text code used to represent the timeline of operations and I would tike to convert this text code into a number of seconds.
Text codes can be formated like this :
5,9h180s -> 21420s
10min -> 600s
3,4h5min30s -> 12570s
...
My idea is to extract the values with a regex a use them later in a function, but I don't know VBA very good.
Is there a function in EXCEL VBA to perform regular expressions on a string and extract data from it ?
Do you have an example of a such function ?
After a few attempts at the various combinations (and some of my own imagination) I decided that truncating the time value string at the first letter of the unit would allow me to use that h / m / s as the last character in the regex .Pattern property. This pre-regex prepping gave me the best results.
In a standard module code sheet as,
Function howManySeconds(strTM As String) As Long
Dim s As Long, tmp As String
Dim rgx As Object, cmat As Object
Dim x As Long, vPTTRNs As Variant
Set rgx = CreateObject("VBScript.RegExp")
vPTTRNs = Array("[0-9,\.,\s]{1,9}×$", _
"h", 3600, "m", 60, "s", 1)
With rgx
.Global = True
.IgnoreCase = True
For x = LBound(vPTTRNs) + 1 To UBound(vPTTRNs) Step 2
If CBool(InStr(1, LCase(strTM), vPTTRNs(x), vbTextCompare)) Then
tmp = Replace(Replace(Replace(LCase(strTM), _
"seconds", "s"), "secs", "s"), _
Chr(44), Chr(46))
tmp = Replace(Replace(Left(tmp, InStrRev(strTM, vPTTRNs(x), -1, vbTextCompare)), _
Chr(44), Chr(46)), Chr(32), vbNullString)
.Pattern = Replace(vPTTRNs(LBound(vPTTRNs)), Chr(215), vPTTRNs(x))
Set cmat = .Execute(tmp)
If CBool(cmat.Count) Then
s = s + CLng(CDbl(Replace(cmat.Item(0), vPTTRNs(x), vbNullString)) * vPTTRNs(x + 1))
End If
End If
Next x
End With
howManySeconds = s
Set rgx = Nothing
End Function
Use like any native worksheet function. In C2 as,
=howManySeconds(A2)
        
You should note that (by the right-alignment) those values are true numbers which can be totalled or otherwise mathematically manipulated. A custom number format mask of 0\s_) has been applied to grant them a displayed s as a unit.
See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for an excellent local reference on using Regular Expressions in VBA.

Excel VBA: search a string to find the first non-text character

Cells contain a mixture of characters within a string, such as:
Abcdef_8765
QWERTY3_JJHH
Xyz9mnop
I need to find the first non A-Za-z character so that I can strip out the subsequent remainder of the string.
So the results would be:
Abcdef
QWERTY
Xyz
I know how to do this if I know exactly what character I'm looking for, but I'm not intuitively grasping how to find ANY character other than A-Za-z.
Btw, this is intended to be used within a vba solution.
====================
EDIT:
I've had success with the following...
a = "abc123"
b = Len(a)
For x = 1 To b
c = (Mid(a, x, 1) Like "[a-zA-Z]")
If c = False Then
d = Left(a, x - 1)
Exit Sub
End If
Next x
Have I stumbled upon a suitable solution, or is this destined to break?
I ask only because I look at Doug Glancy's solution and it seems much more substantial.
(btw, I have not yet tested Doug's solution)
Here is a simple way which doesn't use RegEx. I am deliberately not using RegEx as the other two answer are based on RegEx. RegEx is definitely faster but this is almost equally fast. The difference in speed is almost negligible.
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
For i = 1 To Len(Rng.Value)
Select Case Asc(Mid(Rng.Value, i, 1))
Case 65 To 90, 97 To 122
Case Else: pos = i: Exit For
End Select
Next i
GetWord = Left(Rng.Value, pos - 1)
End Function
Usage:
=GetWord(A1)
EDIT:
Followup from comments. Fine tuned the code (Courtesy #brettdj) .
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
Dim sString As String
sString = UCase$(Rng.Value)
For i = 1 To Len(sString)
Select Case Asc(Mid$(sString, i, 1))
Case 65 To 90
Case Else: pos = i: Exit For
End Select
Next i
GetWord = Left(Rng.Value, pos - 1)
End Function
More Followup.
Here is something which I had never tried before. I did an actual test of my code vs RegXp and I was surprised to see my code was faster than RegXp which I had not anticipated.
I tested it on 10k cells and each cell had a string of 2256 of length
The string that I put in Cell A1:A10000 is
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5Rout
Next I ran this test
The regexp below looks to remove from the first non A-Z character.
Function StrChange(strIn As String) As String
Dim objRegEx As Object
Set objRegEx = CreateObject("vbscript.regexp")
With objRegEx
.ignorecase = True
.Pattern = "^([a-z]+)([^a-z].*)"
.Global = True
StrChange = .Replace(strIn, "$1")
End With
End Function
You can use a simple regular expression to specify a numeral followed by anything and use this function to replace anything that matches that pattern:
Function Regex_Replace(strOriginal As String, strPattern As String, strReplacement, varIgnoreCase As Boolean) As String
Dim objRegExp As Object
Set objRegExp = CreateObject("vbscript.regexp")
With objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
End With
Regex_Replace = objRegExp.Replace(strOriginal, strReplacement)
Set objRegExp = Nothing
End Function
You'd call it like this:
Sub DeleteAfterNums()
Dim cell As Excel.Range
'Change "Selection" to your range
For Each cell In Selection
'"\d.+" is a numeral and whatever follows it
cell.Value = Regex_Replace(cell.Value, "\d.+", "", True)
Next cell
End Sub
Here is a lightweight and fast method that avoids regex/reference additions, thus helping with overhead and transportability should that be an advantage.
Public Function GetText(xValue As String) As Variant
For GetText = 1 To Len(xValue)
If UCase(Mid(xValue, GetText, 1)) Like "[!A-Z]" Then GetText = Left(xValue, GetText - 1): Exit Function
Next
GetText = xValue
End Function
This is then called by using GetText("Submission String") from vba or prepended with a "=" from within a cell formula.