Splitting a string and capitalizing letters based on cases - regex

I have some column names with starting coding convention that I would like to transform, see example:
Original Target
------------- --------------
partID Part ID
completedBy Completed By
I have a function in VBA that splits the original string by capital letters:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
I wrap this function within PROPER, for example, PROPER(SplitCaps(A3)) produces the desired result for the third row but leaves the "D" in ID uncapitalized.
Original Actual
------------- --------------
partID Part Id
completedBy Completed By
Can anyone think of a solution to add cases to this function?

split the word and loop the results and test whether it is all caps before using Proper. then join them back:
Sub kjl()
Dim str As String
str = "partID"
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
str = Join(strArr, " ")
Debug.Print str
End Sub
If you want a formula to do what you are asking then:
=TEXTJOIN(" ",TRUE,IF(EXACT(UPPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)),PROPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)))))
Entered as an array formula by confirming with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Or use the code above as a Function:
Function propSplitCaps(str As String)
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
propSplitCaps = Join(strArr, " ")
End Function
and call it =propSplitCaps(A1)

Instead of using the Proper function, just capitalize the first letter of each word after you have split the string on the transition.
Option Explicit
Function Cap(s As String) As String
Dim RE As RegExp, MC As MatchCollection, M As Match
Const sPatSplit = "([a-z])([A-Z])"
Const sPatFirstLtr As String = "\b(\w)"
Const sSplit As String = "$1 $2"
Set RE = New RegExp
With RE
.Global = True
.Pattern = sPatSplit
.IgnoreCase = False
If .Test(s) = True Then
s = .Replace(s, sSplit)
.Pattern = sPatFirstLtr
Set MC = .Execute(s)
For Each M In MC
s = WorksheetFunction.Replace(s, M.FirstIndex + 1, 1, UCase(M))
Next M
End If
End With
Cap = s
End Function

Related

VBA Find a string that has range of value in it with Regular Expression and replace with each value in that range

First of all, sorry for the long title. I just don't know how to put it succinctly. I am trying to do this in VBA as normal Excel will not cut it.
Basically, I have a column. Each cells may contain data in the format of something like
flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;
What I need is to find the string that has "-" in it, and attempt to replace it with anything in between. so the above code will become
Flat 10, Flat 11; Flat 12, Flat 14;Flat 18, Flat 19;Unit 7, Unit 8, Unit 9;Flat A, Flat B, Flat C; ABC;DEF;
With the help of this article on RegExpression, I have managed to work out how to expand the bits of data with number, which I will post the code below. However, I don't know a good way to expand the data with the letter. i.e from Flat A-C to Flat A, Flat B, Flat C
My code is below, please feel free to give any pointers if you think it can be more efficient. I am very much an amateur at this. Thank you in advance.
Sub CallRegEx()
Dim r As Match
Dim mcolResults As MatchCollection
Dim strInput As String, strPattern As String
Dim test As String, StrOutput As String, prefix As String
Dim startno As Long, endno As Long
Dim myrange As Range
strPattern = "(Flat|Unit) [0-9]+-+[0-9]+"
With Worksheets("Sheet1")
lrow = .Cells(Rows.Count, 9).End(xlUp).Row
For Each x In .Range("A2:A" & lrow)
strInput = Range("A" & x.Row).Value
Set mcolResults = RegEx(strInput, strPattern, True, , True)
If Not mcolResults Is Nothing Then
StrOutput = strInput
For Each r In mcolResults
startno = Mid(r, (InStr(r, "-") - 2), 2)
endno = Mid(r, (InStr(r, "-") + 1))
prefix = Mid(r, 1, 4)
test = ""
For i = startno To endno - 1
test = test & prefix & " " & i & ","
Next i
test = test & prefix & " " & endno
'this is because I don't want the comma at the end of the last value
StrOutput = Replace(StrOutput, r, test)
Debug.Print r ' remove in production
Next r
End If
.Range("D" & x.Row).Value = StrOutput
Next x
End With
End Sub
This function below is to support the Sub above
Function RegEx(strInput As String, strPattern As String, _
Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
Optional IgnoreCase As Boolean) As MatchCollection
Dim mcolResults As MatchCollection
Dim objRegEx As New RegExp
If strPattern <> vbNullString Then
With objRegEx
.Global = GlobalSearch
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Pattern = strPattern
End With
If objRegEx.test(strInput) Then
Set mcolResults = objRegEx.Execute(strInput)
Set RegEx = mcolResults
End If
End If
End Function
Letters have character codes that are ordinal (A < B < C ...) & these can be accessed via asc()/chr$() - here is one way to do it:
inputStr = "flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"
Dim re As RegExp: Set re = New RegExp
re.Pattern = "(flat|unit)\s+((\d+)-(\d+)|([A-Z])-([A-Z]))"
re.Global = True
re.IgnoreCase = True
Dim m As MatchCollection
Dim start As Variant, fin As Variant
Dim tokens() As String
Dim i As Long, j As Long
Dim isDigit As Boolean
tokens = Split(inputStr, ";")
For i = 0 To UBound(tokens) '// loop over tokens
Set m = re.Execute(tokens(i))
If (m.Count) Then
With m.Item(0)
start = .SubMatches(2) '// first match number/letter
isDigit = Not IsEmpty(start) '// is letter or number?
If (isDigit) Then '// number
fin = .SubMatches(3)
Else '// letter captured as char code
start = Asc(.SubMatches(4))
fin = Asc(.SubMatches(5))
End If
tokens(i) = ""
'// loop over items
For j = start To fin
tokens(i) = tokens(i) & .SubMatches(0) & " " & IIf(isDigit, j, Chr$(j)) & ";"
Next
End With
ElseIf i <> UBound(tokens) Then tokens(i) = tokens(i) & ";"
End If
Next
Debug.Print Join(tokens, "")
flat 10;flat 11;flat 12;flat 13;flat 14;Flat 18;Flat 19;unit 7;unit 8;unit 9;flat A;flat B;flat C;flat D;ABC;DEF;flat 6;flat T

Regular Expression only returns 1 match

My VBA function should take a string referencing a range of units (i.e. "WWW1-5") and then return another string.
I want to take the argument, and put it in a comma separated string,
So "WWW1-5" should become "WWW1, WWW2, WWW3, WWW4, WWW5".
It's not always going to be a single digit. For example, I might need to separate "XXX11-18" or something similar.
I have never used regular expressions, but keep trying different things to make this work and it seems to only be finding 1 match instead of 3.
Any ideas? Here is my code:
Private Function split_group(ByVal group As String) As String
Dim re As Object
Dim matches As Object
Dim result As String
Dim prefix As String
Dim startVar As Integer
Dim endVar As Integer
Dim i As Integer
Set re = CreateObject("vbscript.regexp")
re.Pattern = "([A-Z]+)(\d+)[-](\d+)"
re.IgnoreCase = False
Set matches = re.Execute(group)
Debug.Print matches.Count
If matches.Count <> 0 Then
prefix = matches.Item(0)
startVar = CInt(matches.Item(1)) 'error occurs here
endVar = CInt(matches.Item(2))
result = ""
For i = startVar To endVar - 1
result = result & prefix & i & ","
Next i
split_group = result & prefix & endVar
Else
MsgBox "There is an error with splitting a group."
split_group = "ERROR"
End If
End Function
I tried setting global = true but I realized that wasn't the problem. The error actually occurs on the line with the comment but I assume it's because there was only 1 match.
I tried googling it but everyone's situation seemed to be a little different than mine and since this is my first time using RE I don't think I understand the patterns enough to see if maybe that was the problem.
Thanks!
Try the modified Function below:
Private Function split_metergroup(ByVal group As String) As String
Dim re As Object
Dim matches As Variant
Dim result As String
Dim prefix As String
Dim startVar As Integer
Dim endVar As Integer
Dim i As Integer
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.IgnoreCase = True
.Pattern = "[0-9]{1,20}" '<-- Modified the Pattern
End With
Set matches = re.Execute(group)
If matches.Count > 0 Then
startVar = CInt(matches.Item(0)) ' <-- modified
endVar = CInt(matches.Item(1)) ' <-- modified
prefix = Left(group, InStr(group, startVar) - 1) ' <-- modified
result = ""
For i = startVar To endVar - 1
result = result & prefix & i & ","
Next i
split_metergroup = result & prefix & endVar
Else
MsgBox "There is an error with splitting a meter group."
split_metergroup = "ERROR"
End If
End Function
The Sub I've tested it with:
Option Explicit
Sub TestRegEx()
Dim Res As String
Res = split_metergroup("DEV11-18")
Debug.Print Res
End Sub
Result I got in the immediate window:
DEV11,DEV12,DEV13,DEV14,DEV15,DEV16,DEV17,DEV18
Another RegExp option, this one uses SubMatches:
Test
Sub TestRegEx()
Dim StrTst As String
MsgBox WallIndside("WAL7-21")
End Sub
Code
Function WallIndside(StrIn As String) As String
Dim objRegex As Object
Dim objRegMC As Object
Dim lngCnt As Long
Set objRegex = CreateObject("VBScript.RegExp")
With objRegex
.Global = True
.IgnoreCase = True
.Pattern = "([a-z]+)(\d+)-(\d+)"
If .test(StrIn) Then
Set objRegMC = .Execute(StrIn)
For lngCnt = objRegMC(0).submatches(1) To objRegMC(0).submatches(2)
WallIndside = WallIndside & (objRegMC(0).submatches(0) & lngCnt & ", ")
Next
WallIndside = Left$(WallIndside, Len(WallIndside) - 2)
Else
WallIndside = "no match"
End If
End With
End Function
#Shai Rado 's answer worked. But I figured out on my own WHY my original code was not working, and was able to lightly modify it.
The pattern was finding only 1 match because it was finding 1 FULL MATCH. The full match was the entire string. The submatches were really what I was trying to get.
And this is what I modified to make the original code work (asking for each submatch of the 1 full match):

Excel regex match and return multiple references in formula

I've created a function that will return the Nth reference which includes a sheetname (if it's there), however it's not working for all instances. The regex string I'm using is
'[\w ]+[']!([$]{0,1})([A-Z]{1,2})([$]{0,1})(\d{1,5})
I'm finding though it won't find the first reference in either of the below examples:
='Biscuits Raw Data'!G783/'Biscuits Raw Data'!E783
=IF('Biscuits Raw Data'!G705="","",'Biscuits Raw Data'!G723/'Biscuits Raw Data'!G7005*100)
Below is my Function code:
Function GrabNthreference(Rng As range, NthRef As Integer) As String
Dim patrn As String
Dim RegX
Dim Matchs
Dim RegEx
Dim FinalMatch
Dim Subm
Dim i As Integer
Dim StrRef As String
patrn = "'[\w ]+[']!([$]{0,1})([A-Z]{1,2})([$]{0,1})(\d{1,5})"
StrRef = Rng.Formula
Set RegEx = CreateObject("vbscript.regexp") ' Create regular expression.
RegEx.Global = True
RegEx.Pattern = patrn ' Set pattern.
RegEx.IgnoreCase = True ' Make case insensitive.
Set RegX = RegEx.Execute(StrRef)
If RegX.Count < NthRef Then
GrabNthreference = StrRef
Exit Function
End If
i= -1
For Each Matchs In RegX ' Iterate Matches collection.
Set Subm = RegX(i).submatches
i = i + 1
If i = NthRef -1 Then
GrabNthreference = RegX(i)
Exit Function
End If
'Debug.Print RegX(i)
Next
End Function
Here's my final code
Function GrabNthreference(R As range, NthRef As Integer) As String 'based on http://stackoverflow.com/questions/13835466/find-all-used-references-in-excel-formula
Dim result As Object
Dim testExpression As String
Dim objRegEx As Object
Dim i As Integer
i = 0
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.Global = True
objRegEx.Pattern = """.*?""" ' remove expressions
testExpression = CStr(R.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)" 'grab the address think this is an old attempt so remming out
objRegEx.Pattern = "(['].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
Set result = objRegEx.Execute(testExpression)
If result.Count > 0 Then
For Each Match In result
Debug.Print Match.Value
If i = NthRef - 1 Then
GrabNthreference = result(i)
Exit Function
End If
i = i + 1
Next Match
Else
GrabNthreference = "No precedencies found"
End If
End If
End Function
This code did lead me onto thinking about using the simple activecell.precedences method but I think the problem is that it won't report offsheet and won't indicate if the formula is relative or absolute.
Any comments welcome but I think I've answered my own question :)

Getting only digits in a string using regex in vba

I have a string as such:
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
And my output shld be "65.000000,0.0593000000" or at least give two separated values.
I am using regex to find the values in the string.
My code:
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
temp = NumericOnly(tempString)
Public Function NumericOnly(s As String) As String
Dim s2 As String
Dim replace_hyphen As String
replace_hyphen = " "
Static re As VBScript_RegExp_55.RegExp
If re Is Nothing Then Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "[^\d+]" 'includes space, if you want to exclude space "[^0-9]"("-?\\d+");
s2 = re.Replace(s, vbNullString)
re.Pattern = "[^\d+]"
NumericOnly = re.Replace(s2, replace_hyphen)
End Function
My output is like this:
"650000000000593000000"
How to go about doing this? Need some help.
Just did a minor change in your regex. Instead of just using [^\d+], now [^\d.:+] is being used to indicate that we would like one or more of digits, dots or colons. Then, colon is replaced with a comma to get the desired result.
Sub Test()
Dim tempString As String
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
temp = NumericOnly(tempString)
MsgBox temp
End Sub
Public Function NumericOnly(s As String) As String
Dim s2 As String
Dim replace_hyphen As String
replace_hyphen = " "
Static re As VBScript_RegExp_55.RegExp
If re Is Nothing Then Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "[^\d.:+]"
s2 = re.Replace(s, vbNullString)
re.Pattern = "[^\d.:+]"
NumericOnly = re.Replace(s2, replace_hyphen)
NumericOnly = Replace(NumericOnly, ":", ",")
End Function

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function