Remove unicode characters from string on excel sheet - regex

I need some directions on how to use regex to remove special characters such as fractions,exponents,degree symbol and any other non normal letters in a string. I know the code below find the string base on those criteria but does it include all unicode characters?
Code for your attention:
Dim strPattern As String: strPattern = "[^\u0000-\u007F]"
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True
regEx.Pattern = strPattern
For Each cell In ActiveSheet.Range("C:C") ' Define your own range here
If strPattern <> "" Then ' If the cell is not empty
If regEx.Test(cell.Value) Then ' Check if there is a match
cell.Interior.ColorIndex = 6 ' If yes, change the background color
End If
End If
Next

This does not use regular expressions.
There are many potentially "bad" characters. Rather than trying to remove them,
just keep the "good" ones.
Select some cell and run this short macro:
Sub UniKiller()
Dim s As String, temp As String, i As Long
Dim C As String
s = ActiveCell.Value
If s = "" Then Exit Sub
temp = ""
For i = 1 To Len(s)
C = Mid(s, i, 1)
If AscW(C) > 31 And AscW(C) < 127 Then
temp = temp & C
End If
Next i
ActiveCell.Value = temp
End Sub
If you need to "clean" more than one cell, put the logic in a loop.

Related

Break String into individual elements and test for type of Character - NUM - LETTER - SPECIAL - Excel VBA

I need to figure out how I can test each character in the string to see if it is a number/letter/special character.
My question is, how can I break a string and test each individual character to see if the character is a number/letter/special character
Eg:
var = 1S#
Result1 = Num
Result2 = Alpha
Result3 = Special
If you mean
escaping user input that is to be treated as a literal string within a
regular expression—that would otherwise be mistaken for a special
character.
Then you can replace it with given regular expression:
/[.*+?^${}()|[\]\\]/g
So I got it to work by combining a few different posts on SO. This code breaks the string in an array and then checks each one for num/alpha/special and has a special case for *.
Split string into array of characters?
Regex Expression to check if there are any special characters in string like(!,#<#,$,%<^< etc)
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
-
Sub test()
'''Special Character Section'''
Dim special_charArr() As String
Dim special_char As String
special_char = "!,#,#,$,%,^,&,*,+,/,\,;,:"
special_charArr() = Split(special_char, ",")
'''Special Character Section'''
'''Alpha Section'''
Dim regexp As Object
Set regexp = CreateObject("vbscript.regexp")
Dim strPattern As String
strPattern = "([a-z])"
With regexp
.ignoreCase = True
.Pattern = strPattern
End With
'''Alpha Section'''
Dim buff() As String
my_string = "t3s!*"
ReDim buff(Len(my_string) - 1)
For i = 1 To Len(my_string)
buff(i - 1) = Mid$(my_string, i, 1)
char = buff(i - 1)
If IsNumeric(char) = True Then
MsgBox char & " = Number"
End If
For Each Key In special_charArr
special = InStr(char, Key)
If special = 1 Then
If Key <> "*" Then
MsgBox char & " = Special NOT *"
Else
MsgBox char & " = *"
End If
End If
Next
If regexp.test(char) Then
MsgBox char & " = Alpha"
End If
Next
End Sub

Excel VBA RegEx that extracts numbers from price values in range (has commas, $ and -)

I have a field data extracted from a database which represents a range of values, but it's coming in Excel as a String format $86,000 - $162,000.
I need to extract the minimum value and the maximum value from each cell, so I need to extract the numeric portion of it, and ignore the $, - and the ,.
I've attached an image of the data I have, and the values I want to extract from it.
This is the closest pattern I got with RegEx, but I'ts not what I'm looking for.
Pattern = (\d+)(?:\.(\d{1,2}))?
Can anyone assist ?
Just wondering why Regex?
Function GetParts(priceRange As String) As Double()
Dim arr() As String
Dim parts() As Double
If InStr(1, priceRange, "-") > 0 Then
arr = Split(priceRange, "-")
ReDim parts(0 To UBound(arr))
Dim i As Long
For i = 0 To UBound(arr)
parts(i) = CDbl(Replace$(Replace$(Trim$(arr(i)), "$", ""), ",", ""))
Next i
End If
GetParts = parts
End Function
Sub test()
MsgBox GetParts("$14,000 - $1,234,567")(0) 'Minimum
End Sub
EDIT
Yet you could do this with regex to match the data string into the parts:
Function GetPartsRegEx(priceRange As String) As Variant
Dim arr() As Double
Dim pricePattern As String
pricePattern = "(\$?\d+[\,\.\d]*)"
'START EDIT
Static re As RegExp
If re Is Nothing Then
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = pricePattern & "\s*[\-]\s*" & pricePattern 'look for the pattern first
End If
Static nums As RegExp
If nums Is Nothing Then
Set nums = New RegExp
'to remove all non digits, except decimal point in case you have pennies
nums.Pattern = "[^0-9.]"
nums.Global = True
End If
'END EDIT
If re.test(priceRange) Then
ReDim arr(0 To 1) ' fill return array
arr(0) = CDbl(nums.Replace(re.Replace(priceRange, "$1"), ""))
arr(1) = CDbl(nums.Replace(re.Replace(priceRange, "$2"), ""))
Else
'do some error handling here
Exit Function
End If 'maybe throw error if no +ve test or
GetPartsRegEx = arr
End Function
Sub test()
MsgBox GetPartsRegEx("$1,005.45 - $1,234,567.88")(1)
End Sub
Here is quick Example Demo https://regex101.com/r/RTNlVF/1
Pattern "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
Option Explicit
Private Sub Example()
Dim RegExp As New RegExp
Dim Pattern As String
Dim CelValue As String
Dim rng As Range
Dim Cel As Range
Set rng = ActiveWorkbook.Sheets("Sheet1" _
).Range("A2", Range("A9999" _
).End(xlUp))
For Each Cel In rng
DoEvents
Pattern = "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
If Pattern <> "" Then
With RegExp
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = Pattern
End With
If RegExp.Test(Cel.Value) Then
' Debug.Print Cel.Value
Debug.Print RegExp.Replace(CStr(Cel), "$1")
Debug.Print RegExp.Replace(CStr(Cel), "$2")
End If
End If
Next
End Sub
Without a loop (but still no regex):
Sub Split()
With Columns("B:B")
.Replace What:="$", Replacement:=""
Application.CutCopyMode = False
.TextToColumns Destination:=Range("B1"), DataType:=xlDelimited, Other:=True, OtherChar:="-", FieldInfo:=Array(Array(1, 1), Array(2, 1))
End With
Columns("B:C").Insert Shift:=xlToRight
Columns("D:E").NumberFormat = "0"
Range("D1").FormulaR1C1 = "Min Value"
Range("E1").FormulaR1C1 = "Max Value"
With Range("D1:E1").Interior
.Pattern = xlSolid
.PatternColorIndex = xlAutomatic
.Color = 12611584
End With
With Range("D1:E1").Font
.ThemeColor = xlThemeColorDark1
.TintAndShade = 0
End With
End Sub
I made this function:
Hope it helps.
Code:
Function ExtractNumber(ByVal TextInput As String, _
Optional ByVal Position As Byte = 0, _
Optional ByVal Delimiter As String = "-") As Variant
' You can use this function in a subprocess that
' writes the values in the cells you want, or
' you can use it directly in the ouput cells
' Variables
Dim RemoveItems(2) As String
Dim Aux As Variant
' The variable RemoveItems is an array
' containing the characters you want to remove
RemoveItems(0) = "."
RemoveItems(1) = ","
RemoveItems(2) = " "
' STEP 1 - The variable Aux will store the text
' given as input
Aux = TextInput
' STEP 2 - Characters stored in the variable
' RemoveItems will be removed from Aux
For i = 0 To UBound(RemoveItems)
Aux = Replace(Aux, RemoveItems(i), "")
Next i
' STEP 3 - Once Aux is "clean", it will be
' transformed into an array containing the
' values separated by the delimiter
' As you can see at the function's header,
' Delimiter default value is "-". You can change
' it depending on the situation
Aux = Split(Aux, Delimiter)
' STEP 4 - The result of this function will be
' a numeric value. So, if the value of the
' selected position in Aux is not numeric it will
' remove the first character assuming it is a
' currency symbol.
' If something fails in the process the function
' will return "ERROR", so you can know you may
' verify the inputs or adjust this code for
' your needs.
On Error GoTo ErrHndl
If Not IsNumeric(Aux(Position)) Then
ExtractNumber = CLng(Mid(Aux(Position), 2))
Else
ExtractNumber = CLng(Aux(Position))
End If
Exit Function
ErrHndl:
ExtractNumber = "ERROR"
End Function
You can even do this with just worksheet formulas. Under certain circumstances, Excel will ignore the $ and ,. The double unary converts the returned string to a numeric value.
First Value: =--LEFT(A1,FIND("-",A1)-1)
Second Value: =--MID(A1,FIND("-",A1)+1,99)

VBA - Modify sheet naming from source file

I received help in the past for an issue regarding grabbing a source file name and naming a newly created worksheet the date from said source file name, i.e. "010117Siemens Hot - Cold Report.xls" and outputting "010117".
However the code only works for file names with this exact format, for example, file named "Siemens Hot - Cold Report 010117.xls", an error occurs because the newly created sheet does not find the date in the source file.
CODE
Application.ScreenUpdating = False
Dim n As Double
Dim wksNew As Excel.Worksheet
Dim src As Workbook
Set src = Workbooks.Open(filePath, False, False)
Dim srcRng As Range
With src.Worksheets("Sheet1")
Set srcRng = .Range(.Range("A1"), .Range("A1").End(xlDown).End(xlToRight))
End With
With ThisWorkbook
Set wksNew = .Worksheets.Add(After:=.Worksheets(.Sheets.Count))
n = .Sheets.Count
.Worksheets(n).Range("A1").Resize(srcRng.Rows.Count, srcRng.Columns.Count).Value = srcRng.Value
End With
' ======= get the digits part from src.Name using a RegEx object =====
' RegEx variables
Dim Reg As Object
Dim RegMatches As Variant
Set Reg = CreateObject("VBScript.RegExp")
With Reg
.Global = True
.IgnoreCase = True
.Pattern = "\d{0,9}" ' Match any set of 0 to 9 digits
End With
Set RegMatches = Reg.Execute(src.Name)
On Error GoTo CloseIt
If RegMatches.Count >= 1 Then ' make sure there is at least 1 match
ThisWorkbook.Worksheets(n).Name = RegMatches(0) ' rename "Sheet2" to the numeric part of the filename
End If
src.Close False
Set src = Nothing
So, my question is, how can I get my code to recognize the string of digits no matter its position in the file name?
Code
^\d{0,9}\B|\b\d{0,9}(?=\.)
Usage
I decided to make a function that can be called inside a cell as such: =GetMyNum(x) where x is a pointer to a cell (i.e. A1).
To get the code below to work:
Open Microsoft Visual Basic for Applications (ALT + F11)
Insert a new module (right click in the Project Pane and select Insert -> Module).
Click Tools -> References and find Microsoft VBScript Regular Expressions 5.5, enable it and click OK
Now copy/paste the following code into the new module:
Option Explicit
Function GetMyNum(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
Dim match As Object
strPattern = "^\d{0,9}\B|\b\d{0,9}(?=\.)"
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set match = regEx.Execute(strInput)
GetMyNum = match.Item(0)
Else
GetMyNum = ""
End If
End If
End Function
Results
Input
A1: Siemens Hot - Cold Report 010117.xls
A2: 010117Siemens Hot - Cold Report.xls
B1: =GetMyNum(A1)
B2: =GetMyNum(A1)
Output
010117 # Contents of B1
010117 # Contents of B2
Explanation
I will explain each regex option separately. You can reorder the options in terms of importance in such a way that the most important option is first and least important is last.
^\d{0,9}\B Match the following
^ Assert position at the start of the line
\d{0,9} Match any digit 0-9 times
\B Ensure position does not match where a word boundary matches (this is used but may be dropped depending on usage - I added it because it seems the number you're trying to get is immediately followed by a word character and not followed by a space - if that's not always the case just remove this token)
\b\d{0,9}(?=\.) Match the following
\b Assert position as a word boundary
\d{0,9} Match any digit 0-9 times
(?=\.) Positive lookahead ensuring a literal dot . follows
Just my alternative solution to RegEx :)
This finds the first occurence of 6 consecutive digits, omitting blanks and periods... although there are probably some more issues with using IsNumeric as I believe a lowercase e is considered acceptable by it...
Sub FindTheNumber()
For i = 1 To Len(Range("A1").Value)
If IsNumeric(Mid(Range("A1").Value, i, 6)) = True And InStr(Mid(Range("A1").Value, i, 6), " ") = 0 And InStr(Mid(Range("A1").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A1").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
For i = 1 To Len(Range("A2").Value)
If IsNumeric(Mid(Range("A2").Value, i, 6)) = True And InStr(Mid(Range("A2").Value, i, 6), " ") = 0 And InStr(Mid(Range("A2").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A2").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
End Sub
Examples:
Immediate window:

How to extract ad sizes from a string with excel regex

I am trying to extract ad sizes from string. The ad sizes are all set standard sizes. So while I'd prefer to have a regex that looks for a pattern, IE 3 numbers followed by 2 or 3 numbers, hard coding it will also work, since we know what the sizes will be. Here's an example of some of the ad sizes:
300x250
728x90
320x50
I was able to find some VBScript that I modified that almost works, but because my strings that I'm searching are inconsistent, it's pulling too much in some cases. For example:
You see how it's not matching correctly in every instance.
The VB code I found is actually matching everything EXCEPT that ad sizes. I don't know enough about VBScript to reverse it to just look for ad sizes and pull them. So instead it looks for all other text and removes it.
The code is below. Is there a way to fix the Regex so that it just returns the ad sizes?
Function getAdSize(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([^300x250|728x90])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
getAdSize = regEx.Replace(strInput, strReplace)
Else
getAdSize = "Not matched"
End If
End If
End Function
NOTE, THE DATA IS NOT ALWAYS PRECEDED BY AN UNDERSCORE, SOMETIMES IT IS A DASH OR A SPACE BEFORE AND AFTER.
EDIT: Since it's not actually underscore delimited we can't use Split. We can however iterate over the string and extract the "#x#" manually. I have updated the code to reflect this and verified that it works successfully.
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
If IsNumeric(Temp) Then
Ad = Ad & Temp
Else
If Temp = "x" Then
Ad = Ad & Temp
Else
If Ad Like "*#x#*" Then
ExtractAdSize = Ad
Exit Function
Else
Ad = vbNullString
End If
End If
End If
Next i
End If
End Function
Alternate version of the same function using Select Case boolean logic instead of nested If statements:
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
Select Case Abs(IsNumeric(Temp)) + Abs((Temp = "x")) * 2 + Abs((Ad Like "*#x#*")) * 4
Case 0: Ad = vbNullString 'Temp is not a number, not an "x", and Ad is not valid
Case 1, 2, 5: Ad = Ad & Temp 'Temp is a number or an "x"
Case 4, 6: ExtractAdSize = Ad 'Temp is not a number, Ad is valid
Exit Function
End Select
Next i
End If
End Function
I have managed to make about 95% of the required answer - the RegEx below will remove the DDDxDD size and would return the rest.
Option Explicit
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
E.g. you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
From here you can continue trying to find a way to reverse the display.
Edit:
To go from the 95% to the 100%, I have asked a question here and it turns out that the conditional block should be changed to the following:
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
This formula could work if it's always 3 characters, then x, and it's always between underscores - adjust accordingly.
=iferror(mid(A1,search("_???x*_",A1)+1,search("_",A1,search("_???x*_",A1)+1)-(search("_???x*_",A1)+1)),"No match")

send regexp matches to an array of strings

I'm trying to get the code below to send the results of the regexp search to an array of strings. How can I do that?
When I change name to an array of strings i.e. Dim name() as String VBA throws a type-mismatch exception. Any idea what I can do to fix that?
Many thanks.
Do While Not EOF(1)
Line Input #1, sText
If sText <> "" Then
Dim Regex As Object, myMatches As Object
' instantiates regexp object
Set Regex = CreateObject("VBScript.RegExp")
With Regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern = "^Personal\sname\s*[:]\s*"
End With
' get name, seperated from Personal Name
If Regex.test(sText) Then
Set myMatches = Regex.Execute(sText)
Dim temp As String
temp = Regex.Replace(sText, vbNullString)
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
Dim temp2 As String
temp2 = myMatches.Item(0)
name = Trim(Left(temp2, Len(temp2) - 3))
End If
End If
Loop
You should not use "name" as a variable name as it conflicts with an excel property. Try sName or sNames instead, where s is for string.
With a array you need to give it a size before you can assign a value to each element.
Dim sNames(4) As String '// Or Dim sNames(1 To 4) As String
sName(1) = "John"
...
sName(4) = "Sam"
or if you don't know the total number of elements (names) to begin with then:
Dim sNames() As String
Dim iTotalNames As Integer
iTotalNames = '// Some code here to determine how many names you will have
ReDim sNames(iTotalNames) '// You can also use ReDim Preserve if you have existing elements
sName(1) = "John"
...
sName(4) = "Sam"
So I suspect you will need something like:
Dim sNames() As String
Dim iTotalNames As Integer
'// Your code ....
iTotalNames = iTotalNames + 1
ReDim Preserve sNames(iTotalNames)
sNames(iTotalNames) = Trim(Left(temp2, Len(temp2) - 3))
'// Rest of your code ...
Also in VBA all dimensioning of variables should be at the top of the module.
change
'call this "A"
Dim temp2 As String
temp2 = myMatches.Item(0)
to
'stick this at the top
redim temp2(0 to 0)
'replace "A" with this
new_top = ubound(temp2)+1
redim preserve temp2 (0 to new_top)
temp2(new_top) = myMatches.Item(0)