Removing leading whitespace using VBA - regex

I am trying to remove leading whitespace from a word " 00000000000000231647300000000002KK".
Below is my VBA code
Option Explicit
Sub myfunction()
Dim getarray, getarray1 As Variant
Dim Text As String
Dim RegularText
getarray = Sheets("Sheet1").Range("A1:A4").Value
getarray1 = getarray
Set RegularText = New regexp
RegularText.Global = True
RegularText.MultiLine = True
RegularText.Pattern = "(^\\s+)"
Text = CStr(getarray(1, 1))
getarray1(1, 1) = RegularText.Replace(getarray(1, 1), "")
Sheets("Sheet1").Range("B1:B4").Value = getarray1
End Sub
However above code fails to remove the leading whitespace from my word.
Below is the excel workbook with result and above code
https://easyupload.io/jv6n2p
If you could help to understand why my code is failing to remove leading whitespace, it will be very helpful.
Thanks for your time

There are a few things wrong with the original code.
RegularText.Pattern = "(^\\s+)"
Explanations from regex101.com.
(^\\s+) pattern:
Basically, the first backslash is escaping the second backslash. This tells the RegEx to treat the second \ as a normal character. (^\\s+) is grouping leading \s characters together not whitespace.
(^\s+) pattern:
RegularText.MultiLine = True
The MultiLine property indicates every line in a value should be searched not row in an array. This doesn't seem to be the intended result. So set it to false.
`RegularText.MultiLine = False`
Range("A1:A4").Value is 1 row by 4 columns and Range("B1:B4") is 1 column by 4 rows. In my examples I will use Range("A2:D2") for simplicity.
Sub RegExRemoveTrailingSpace()
Dim Data As Variant
Data = Sheets("Sheet1").Range("A1:A4").Value
Dim RegularText As New RegExp
RegularText.Global = False
RegularText.Pattern = "(^\s+)"
[b4] = RegularText.Replace([A1], "")
Dim r As Long, c As Long
For r = 1 To UBound(Data)
For c = 1 To UBound(Data, 2)
Data(r, c) = RegularText.Replace(Data(r, c), "")
Next
Next
Sheets("Sheet1").Range("A2:D2").Value = Data
End Sub
We could just use LTrim() to remove the leading spaces from the string.
Sub LTrimTrailingSpace()
Dim Data As Variant
Data = Sheets("Sheet1").Range("A1:A4").Value
Dim r As Long, c As Long
For r = 1 To UBound(Data)
For c = 1 To UBound(Data, 2)
Data(r, c) = LTrim(Data(r, c))
Next
Next
Sheets("Sheet1").Range("A2:D2").Value = Data
End Sub

Related

Extract first floating point number from right in excel string

I have an excel column full of strings, from which I am trying to extract one number.
Here is an example of a particular row (all rows follow this format):
5) something here 93 4. something else- here too(24+Mths) Y Y 249 5) 24+ Months 1) lots more rubbish text Y N some more rubbish text 24/04/2012 25/04/1999 0.263 10 L rubbish text 3521.37233 4130 rubbish text1041023.
I just need to extract the first decimal number from the right, in this case 3521.37233.
UPDATE: I tried using Text to Columns with space as a delimiter, but there are varying number of spaces between characters. Is there a way to delimit by any number of spaces?
This is a question that can be done swiftly by Regex. Unfortunately, Excel does not support Regex using Excel formula.
You can use the following UDF (add this to your workbook).
Usage:
if you want the last decimal number(i.e. 1st from the right): =StrRegex([cell reference],"[0-9]{1,}\.[0-9]{1,}",-1)
if you want all decimal numbers: =StrRegex([cell reference],"[0-9]{1,}\.[0-9]{1,}",0)
Function StrRegex(findIn As String, pattern As String, Optional matchID As Long = 1, Optional separator As String = ",", Optional ignoreCase As Boolean = False) As String ' matchID - 1-based, matchID=0 => return all
Application.Volatile (True)
Dim result As String
Dim allMatches As Object
Dim re As Object
Set re = CreateObject("vbscript.regexp")
Dim mc As Long
Dim i As Long
Dim j As Long
re.pattern = pattern
re.Global = True
re.ignoreCase = ignoreCase
Set allMatches = re.Execute(findIn)
mc = allMatches.count
If mc > 0 Then
If matchID > mc Then
result = CVErr(xlErrNA)
Else
If matchID > 0 Then
result = allMatches.Item(matchID - 1).Value
ElseIf matchID < 0 Then
result = allMatches.Item(mc + matchID).Value
Else
result = ""
For i = 0 To allMatches.count - 1
result = result & separator & allMatches.Item(i).Value
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & separator & allMatches.Item(i).submatches.Item(j)
Next
Next
If Len(result) <> 0 Then
result = Right(result, Len(result) - Len(separator))
End If
End If
End If
Else
result = ""
End If
StrRegex = result
End Function
For any interested in a native Excel function solution, if you have the FILTERXML function, you can use:
=FILTERXML("<t><s>" & SUBSTITUTE(A1," ","</s><s>")& "</s></t>","//s[number(.) = number(.) and contains(.,'.')][last()]")
The xPath looks for all nodes that are numeric, and also contain a dot, and then returns the last node that meets those specifications.
Note: If your Windows regional settings are using the dot as a thousands separator, this will not work as written. You would have to replace the . with your system decimal separator.

Remove unicode characters from string on excel sheet

I need some directions on how to use regex to remove special characters such as fractions,exponents,degree symbol and any other non normal letters in a string. I know the code below find the string base on those criteria but does it include all unicode characters?
Code for your attention:
Dim strPattern As String: strPattern = "[^\u0000-\u007F]"
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True
regEx.Pattern = strPattern
For Each cell In ActiveSheet.Range("C:C") ' Define your own range here
If strPattern <> "" Then ' If the cell is not empty
If regEx.Test(cell.Value) Then ' Check if there is a match
cell.Interior.ColorIndex = 6 ' If yes, change the background color
End If
End If
Next
This does not use regular expressions.
There are many potentially "bad" characters. Rather than trying to remove them,
just keep the "good" ones.
Select some cell and run this short macro:
Sub UniKiller()
Dim s As String, temp As String, i As Long
Dim C As String
s = ActiveCell.Value
If s = "" Then Exit Sub
temp = ""
For i = 1 To Len(s)
C = Mid(s, i, 1)
If AscW(C) > 31 And AscW(C) < 127 Then
temp = temp & C
End If
Next i
ActiveCell.Value = temp
End Sub
If you need to "clean" more than one cell, put the logic in a loop.

Extracting Parenthetical Data Using Regex

I have a small sub that extracts parenthetical data (including parentheses) from a string and stores it in cells adjacent to the string:
Sub parens()
Dim s As String, i As Long
Dim c As Collection
Set c = New Collection
s = ActiveCell.Value
ary = Split(s, ")")
For i = LBound(ary) To UBound(ary) - 1
bry = Split(ary(i), "(")
c.Add "(" & bry(1) & ")"
Next i
For i = 1 To c.Count
ActiveCell.Offset(0, i).NumberFormat = "#"
ActiveCell.Offset(0, i).Value = c.Item(i)
Next i
End Sub
For example:
I am now trying to replace this with some Regex code. I am NOT a regex expert. I want to create a pattern that looks for an open parenthesis followed by zero or more characters of any type followed by a close parenthesis.
I came up with:
\((.+?)\)
My current new code is:
Sub qwerty2()
Dim inpt As String, outpt As String
Dim MColl As MatchCollection, temp2 As String
Dim regex As RegExp, L As Long
inpt = ActiveCell.Value
MsgBox inpt
Set regex = New RegExp
regex.Pattern = "\((.+?)\)"
Set MColl = regex.Execute(inpt)
MsgBox MColl.Count
temp2 = MColl(0).Value
MsgBox temp2
End Sub
The code has at least two problems:
It will only get the first match in the string.(Mcoll.Count is always 1)
It will not recognize zero characters between the parentheses. (I think the .+? requires at least one character)
Does anyone have any suggestions ??
By default, RegExp Global property is False. You need to set it to True.
As for the regex, to match zero or more chars as few as possible, you need *?, not +?. Note that both are lazy (match as few as necessary to find a valid match), but + requires at least one char, while * allows matching zero chars (an empty string).
Thus, use
Set regex = New RegExp
regex.Global = True
regex.Pattern = "\((.*?)\)"
As for the regex, you can also use
regex.Pattern = "\(([^()]*)\)"
where [^()] is a negated character class matching any char but ( and ), zero or more times (due to * quantifier), matching as many such chars as possible (* is a greedy quantifier).

Slight adaptation of a User Defined Function

I would like to extract a combination of text and numbers from a larger string located within a column within excel.
The constants I have to work with is that each Text string will
•either start with a A, C or S, and
•will always be 7 Characters long
•the position of he string I would like to extract varies
The code I have been using which has been working efficiently is;
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(r.Text, " ")
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
However today I have learnt that sometimes my data may include scenarios like this;
What I would like is to adapt my code so If the 8th character is "Underscore" and the 1st character of the 7 characters is either S, A or C please extract up until the "Underscore"
Secondly I would like to exclude commons words like "Support" & "Collect" from being extracted.
Finally the 7th letter should be a number
Any ideas around this would be much appreciated.
Thanks
try this
ary = Split(Replace(r.Text, "_", " "))
or
ary = Split(Replace(r.Text, "_", " ")," ")
result will be same for both variants
test
update
Do you know how I could leave the result blank if the 7th character returned a letter?
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(Replace(r.Text, "_", " "))
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" And IsNumeric(Mid(a, 7, 1)) Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
test
Add Microsoft VBScript Regular Expressions 5.5 to project references. Use the following code to test matching and extracting with Xtractor:
Public Function Xtractor(ByVal p_val As String) As String
Xtractor = ""
Dim ary As String, v_re As New VBScript_RegExp_55.RegExp, Matches
v_re.Pattern = "^([SAC][^_]{1,6})_?"
Set Matches = v_re.Execute(p_val)
If Matches.Count > 0 Then Xtractor = Matches(0).SubMatches(0) Else Xtractor = ""
End Function
Sub test_Xtractor(p_cur As Range, p_val As String, p_expected As String)
Dim v_cur As Range, v_res As Range
p_cur.Value = p_val
Set v_cur = p_cur.Offset(columnOffset:=1)
v_cur.FormulaR1C1 = "='" & ThisWorkbook.Name & "'!Xtractor(RC[-1])"
Set v_res = v_cur.Offset(columnOffset:=1)
v_res.FormulaR1C1 = "=RC[-1]=""" & p_expected & """"
Debug.Print p_val; "->"; v_cur.Value; ":"; v_res.Value
End Sub
Sub test()
test_Xtractor ActiveCell, "A612002_MDC_308", "A612002"
test_Xtractor ActiveCell.Offset(1), "B612002_MDC_308", ""
test_Xtractor ActiveCell.Offset(2), "SUTP038_MDC_3", "SUTP038"
test_Xtractor ActiveCell.Offset(3), "KUTP038_MDC_3", ""
End Sub
Choose the workbook and cell for writing test fixture, then run test from the VBA Editor.
Output in the Immediate window (Ctrl+G):
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
UPD
Isit possible to ammend this code so if the 7th character is a letter to return blank?
Replace line with assign to v_re by the following:
v_re.Pattern = "^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?"
v_re.IgnoreCase = True
And add to the test suite:
test_Xtractor ActiveCell.Offset(4), "SUTP03A_MDC_3", ""
Output:
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
SUTP03A_MDC_3->:True
I inserted negative lookahead subrule (?![^_]{5}[A-Z]_?) to reject SUTP03A_MDC_3. But pay attention: the rejecting rule is applied exactly to the 7th character. Now v_re.IgnoreCase set to True, but if only capitalized characters are allowed, set it to False. See also Regular Expression Syntax on MSDN.

Separating strings from numbers with Excel VBA

I need to
a) separate strings from numbers for a selection of cells
and
b) place the separated strings and numbers into different columns.
For example , Excel sheet is as follows:
A1 B1
100CASH etc.etc.
The result should be:
A1 B1 C1
100 CASH etc.etc.
Utilization of regular expressions will be useful, as there may be different cell formats,such as 100-CASH, 100/CASH, 100%CASH. Once the procedure is set up it won't be hard to use regular expressions for different variations.
I came across a UDF for extracting numbers from a cell. This can easily be modified to extract string or other types of data from cells simply changing the regular expression.
But what I need is not just a UDF but a sub procedure to split cells using regular expressions and place the separated data into separate columns.
I've also found a similar question in SU, however it isn't VBA.
See if this will work for you:
UPDATED 11/30:
Sub test()
Dim RegEx As Object
Dim strTest As String
Dim ThisCell As Range
Dim Matches As Object
Dim strNumber As String
Dim strText As String
Dim i As Integer
Dim CurrCol As Integer
Set RegEx = CreateObject("VBScript.RegExp")
' may need to be tweaked
RegEx.Pattern = "-?\d+"
' Get the current column
CurrCol = ActiveCell.Column
Dim lngLastRow As Long
lngLastRow = Cells(1, CurrCol).End(xlDown).Row
' add a new column & shift column 2 to the right
Columns(CurrCol + 1).Insert Shift:=xlToRight
For i = 1 To lngLastRow ' change to number of rows to search
Set ThisCell = ActiveSheet.Cells(i, CurrCol)
strTest = ThisCell.Value
If RegEx.test(strTest) Then
Set Matches = RegEx.Execute(strTest)
strNumber = CStr(Matches(0))
strText = Mid(strTest, Len(strNumber) + 1)
' replace original cell with number only portion
ThisCell.Value = strNumber
' replace cell to the right with string portion
ThisCell.Offset(0, 1).Value = strText
End If
Next
Set RegEx = Nothing
End Sub
How about:
Sub UpdateCells()
Dim rng As Range
Dim c As Range
Dim l As Long
Dim s As String, a As String, b As String
''Working with sheet1 and column C
With Sheet1
l = .Range("C" & .Rows.Count).End(xlUp).Row
Set rng = .Range("C1:C" & l)
End With
''Working with selected range from above
For Each c In rng.Cells
If c <> vbNullString Then
s = FirstNonNumeric(c.Value)
''Split the string into numeric and non-numeric, based
''on the position of first non-numeric, obtained above.
a = Mid(c.Value, 1, InStr(c.Value, s) - 1)
b = Mid(c.Value, InStr(c.Value, s))
''Put the two values on the sheet in positions one and two
''columns further along than the test column. The offset
''can be any suitable value.
c.Offset(0, 1) = a
c.Offset(0, 2) = b
End If
Next
End Sub
Function FirstNonNumeric(txt As String) As String
With CreateObject("VBScript.RegExp")
.Pattern = "[^0-9]"
FirstNonNumeric = .Execute(txt)(0)
End With
End Function