vbscript function clean string only allow certain characters - regex

Until now I've been manually adding characters to replace that break my code. I'd like to be a bit more proactive, so I found this function that is supposed to replace everything EXCEPT valid characters. My first hurdle is that it doesn't seem to work. The code below is my full test file and the MsgBox comes up blank.
My second question is about performance. This function handles very, very large strings. Will this method be considerably slower? Anyone recommend anything else?
Function CleanUp (input)
    Dim objRegExp, outputStr
    Set objRegExp = New Regexp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "((?![a-zA-Z0-9]).)+"
    outputStr = objRegExp.Replace(input, "-")
    objRegExp.Pattern = "\-+"
    outputStr = objRegExp.Replace(outputStr, "-")
    CleanUp = outputStr
End Function
MsgBox (CleanUp("Test"))
Edit: I'm stupid and just saw the variable mixup I did which was causing it to return nothing. It is working now. Will still accept input for performance question or better suggestions.

You can simplify it even further.
objRegExp.Pattern = "[^\w]+"
Note that \w also matches the underscore, so underscores are kept.

I don't know what the expected result is for your example, but you could try this pattern instead:
objRegExp.Pattern = "[^a-zA-Z0-9]"
Hope it works
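On the performance question: a negated character class followed by + already matches a whole run of disallowed characters, so the second pass that collapses repeated hyphens looks redundant. Below is a minimal sketch of a single-pass version. The name CleanUpSinglePass is made up for illustration, and whether it is measurably faster on very large strings is an assumption that should be timed on real data. One behavioural difference: the negated class also replaces line breaks, which the "." in the lookahead pattern does not match.

```vbscript
' Hypothetical single-pass variant of CleanUp (name chosen for illustration).
Function CleanUpSinglePass(text)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' One or more characters that are NOT letters or digits
    re.Pattern = "[^a-zA-Z0-9]+"
    ' Each run of disallowed characters becomes a single hyphen
    CleanUpSinglePass = re.Replace(text, "-")
End Function
```

For example, CleanUpSinglePass("Hello, World!! 2024") returns "Hello-World-2024".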

Related

Changing formulas on the fly with VBA RegEx

I'm trying to change formulas in Excel; I need to change the row number in the formulas.
I'm trying to use a regex replace to do this. I use a loop to iterate through the rows of the sheet and need to change the formula for the row being processed at the time. Here is an example of the code:
For i = 2 To rows_aux
    DoEvents
    Formula_string= "=IFS(N19='Z001';'xxxxxx';N19='Z007';'xxxxxx';0=0;'xxxxxxx')"
    Formula_string_new = regEx.Replace(Formula_string, "$1" & i)
    wb.Cells(i, 33) = ""
    wb.Cells(i, 33).Formula = Formula_string_new
    .
    .
    .
Next i
I need to replace row references, but not anything inside single or double quotes. Example:
If i = 2, I want the new string to be this:
"=IFS(N2='Z001';'xxxxxx';N2='Z007';'xxxxxx';0=0;'xxxxxxx')"
I'm trying to use this regex:
([a-zA-Z]+)(\d+)
But it's changing everything in quotes too. Like this:
If i = 2:
"=IFS(N2='Z2';'xxxxxx';N2='Z2';'xxxxxx';0=0;'xxxxxxx')"
If anyone can help me, I will be very grateful!
Thanks in advance.
As others have written, there are probably better ways to write this code. But for a regex that will capture just the Column letter in capturing group #1, try:
\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?
Note that it will NOT include the $ absolute addressing token, but could be altered if that were necessary.
Note that you can avoid the loop completely with:
Formula_string = "=IFS(N19=""Z001"",""xxxxxx"",N$19=""Z007"",""xxxxxx"",0=0,""xxxxxxx"")"
Formula_string_new = regEx.Replace(Formula_string, "$1" & firstRow)
With Range(wb.Cells(firstRow, 33), wb.Cells(lastRow, 33))
    .Clear
    .Formula = Formula_string_new
End With
When we write a formula to a range like this, the references will automatically adjust the way you were doing in your loop.
Depending on unstated factors, you may want to use the FormulaLocal property instead of the Formula property.
Edit:
To make this a little more robust, in case there happens to be, within the quote marks, a string that exactly mimics a valid address, you can try checking to be certain that a quote (single or double) neither precedes nor follows the target.
Pattern: ([^"'])\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(?!['"])
Replace: "$1$2" & i
However, this is not "bulletproof" as various combinations of included data might match. If it is a problem, let me know and I'll come up with something more robust.
If you can identify some unique features, such as, in your example, a preceding bracket ( or semicolon ; and a trailing equals sign =, then this might work:
Sub test()
    Dim s As String, sNew As String, i As Long
    Dim Regex As Object
    Set Regex = CreateObject("vbscript.regexp")
    With Regex
        .Global = True
        .MultiLine = False
        .IgnoreCase = True
        .Pattern = "([(;][a-zA-Z]{1,3})(\d+)="
    End With
    i = 1
    s = "=IFS(NANA19='Z001';'xxxxxx';NA19='Z007';'xxxxxx';0=0;'xxxxxxx')"
    sNew = Regex.Replace(s, "$1" & i & "=")
    Debug.Print s & vbCr & sNew
End Sub
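Another way to leave quoted text alone is to let the pattern match quoted strings as well as references, and then rewrite only the references. The sketch below does that by walking the matches and rebuilding the string. The function name SetRowOutsideQuotes and its parameters are hypothetical, and the reference part of the pattern is deliberately simple (one to three letters followed by digits), so it does not handle $ absolute references and would also rewrite a function name such as LOG10.

```vbscript
' Hypothetical helper: set the row number of cell references that are
' not inside single- or double-quoted text.
Function SetRowOutsideQuotes(formula, newRow)
    Dim re, matches, m, result, pos
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Alternatives: 'quoted text' | "quoted text" | column letters + row digits
    re.Pattern = "'[^']*'|""[^""]*""|\b([A-Za-z]{1,3})\d+\b"
    Set matches = re.Execute(formula)
    result = ""
    pos = 1
    For Each m In matches
        ' Copy the text between the previous match and this one unchanged
        result = result & Mid(formula, pos, m.FirstIndex + 1 - pos)
        If Left(m.Value, 1) = "'" Or Left(m.Value, 1) = """" Then
            ' Quoted text: keep as is
            result = result & m.Value
        Else
            ' Cell reference: keep the column letters, swap the row number
            result = result & m.SubMatches(0) & newRow
        End If
        pos = m.FirstIndex + m.Length + 1
    Next
    ' Copy whatever follows the last match
    SetRowOutsideQuotes = result & Mid(formula, pos)
End Function
```

With the formula from the question and newRow = 2, this returns "=IFS(N2='Z001';'xxxxxx';N2='Z007';'xxxxxx';0=0;'xxxxxxx')".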

How to create a regex VBA macro for GIIN format validation

I'm trying to create a macro that will verify data in one column and then let me know if they are correctly formatted in the next column. I am very new to VBA so I apologize if my code is messy.
The format I am trying to verify is ABC123.AB123.AB.123 -- The first two sections can contain letters/numbers, the third section only letters, and the last section only numbers.
Any guidance would be greatly appreciated!
Function ValidGIIN(myGIIN As String) As String
    Dim regExp As Object
    Set regExp = CreateObject("VBScript.Regexp")
    If Len(myGIIN) Then
        With regExp
            .Global = True
            .IgnoreCase = True
            .Pattern = "[a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][.][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_][.][a-zA-z_][a-zA-z_][.][0-9][0-9][0-9]"
        End With
        If regExp.Test(myGIIN) = True Then
            ValidGIIN = "Valid"
        Else
            ValidGIIN = "Invalid"
        End If
    End If
    Set regExp = Nothing
End Function
Try the following pattern
[a-zA-Z0-9]{6}\.[a-zA-Z0-9]{5}\.[A-Za-z]{2}\.\d{3}
You could call your function in a loop over the cells in a column and use Offset(0, 1) to write the result to the next column to the right.
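One thing to watch with Test: without anchors, the pattern also succeeds when a valid GIIN is merely contained in a longer string. A minimal sketch with ^ and $ added follows; the function name IsValidGIIN is hypothetical, and the format rules are taken from the question rather than from any official specification.

```vbscript
' Hypothetical validator: True only if the WHOLE string has the shape
' 6 alphanumerics . 5 alphanumerics . 2 letters . 3 digits
Function IsValidGIIN(text)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    ' ^ and $ force the match to cover the entire input
    re.Pattern = "^[A-Za-z0-9]{6}\.[A-Za-z0-9]{5}\.[A-Za-z]{2}\.\d{3}$"
    IsValidGIIN = re.Test(text)
End Function
```

IsValidGIIN("ABC123.AB123.AB.123") is True, while IsValidGIIN("ABC123.AB123.AB.1234") is False because of the extra trailing digit.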

Cleaning bad data in excel, splitting words by capital letters

I'm using excel 2011 on Mac OSX. I have a data set with about 3000 entries. In the fields that contain names, many of the names are not separated. First and last names are separated by a space, but separate names are bunched together.
Here's what I have, (one cell):
Grant MorrisonSholly FischBen OliverCarlos Alberto Fernandez UrbanoBen OliverCarlos Alberto Fernandez UrbanoBen OliverBen Oliver
Here's what I want to accomplish, (one cell, comma separated with one space after comma):
Grant Morrison, Sholly Fisch, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Ben Oliver
I have found a few VBA scripts that will split words by capital letters, but the ones I've tried will add spaces where I don't need them like this one...
Function splitbycaps(inputstr As String) As String
    Dim i As Long
    Dim temp As String
    If inputstr = vbNullString Then
        splitbycaps = temp
        Exit Function
    Else
        temp = inputstr
        For i = 1 To Len(temp)
            If Mid(temp, i, 1) = UCase(Mid(temp, i, 1)) Then
                If i <> 1 Then
                    temp = Left(temp, i - 1) + " " + Right(temp, Len(temp) - i + 1)
                    i = i + 1
                End If
            End If
        Next i
        splitbycaps = temp
    End If
End Function
There was another one that I found here that used RegEx, (forgive me, I'm just learning all of this so I may sound a little dumb) but when I tried that one, it wouldn't work at all, and my research pointed me to a way to add references to the library that would add the necessary tools so I could use it. Unfortunately, I cannot, for the life of me, find how to add a reference to the library on my mac version of excel... I may be doing something wrong, but this is the answer that I could not get to work...
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "([a-z])([A-Z])"
        SplitCaps = .Replace(strIn, "$1 $2")
    End With
End Function
I am basically brand new at adding custom functions via VBA through excel, and there may even be a better way to do this, but it seems like every answer that I come to just doesn't quite get the data right. Thanks for any answers!
My function from Split Uppercase words in Excel needs updating for your additional string matching.
You would use this function in cell B1 for text in A1 as follows:
=SplitCaps(A1)
One assumption your cleansing makes is that people have only two names, so
Ben OliverCarlos Alberto
is broken to
Ben Oliver
Carlos Alberto
Is that actually what should happen? (The code needs a minor tweak if so.)
code
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "([a-z])([A-Z])"
        SplitCaps = Replace(.Replace(strIn, "$1, $2"), "<br>", ", ")
    End With
End Function
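Since the question is about Excel on a Mac, where the Windows-only VBScript.RegExp component is not expected to be available, here is a sketch of the same lowercase-then-uppercase rule using only built-in string functions. The name SplitCapsNoRegex is made up for illustration. Like the regex, it only looks at unaccented ASCII letters and does not break "Carlos Alberto Fernandez Urbano" into two-word names.

```vbscript
' Hypothetical regex-free variant: insert ", " wherever a lowercase
' ASCII letter is immediately followed by an uppercase ASCII letter.
Function SplitCapsNoRegex(strIn)
    Dim i, prevCode, curCode, result
    result = ""
    For i = 1 To Len(strIn)
        curCode = Asc(Mid(strIn, i, 1))
        If i > 1 Then
            prevCode = Asc(Mid(strIn, i - 1, 1))
            ' 97-122 is a-z, 65-90 is A-Z
            If prevCode >= 97 And prevCode <= 122 And curCode >= 65 And curCode <= 90 Then
                result = result & ", "
            End If
        End If
        result = result & Mid(strIn, i, 1)
    Next
    SplitCapsNoRegex = result
End Function
```

For example, SplitCapsNoRegex("Grant MorrisonSholly FischBen Oliver") returns "Grant Morrison, Sholly Fisch, Ben Oliver".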

Changing substring in a String

I've got a variable "Variable" in VBScript that will receive different values, based on names that come from XML files I don't trust. I can't let "Variable" contain forbidden characters (<, >, :, ", /, \, |, ?, * ) or accented characters such as (Á, á, É, é, Â, â, Ê, ê, ñ, ã).
So, my question is: how can I write a script that checks for and replaces all of these characters in my variable? I'm using the Replace function documented in the MSDN Library, but the way I'm using it won't let me replace more than one character.
Example:
(Assuming a Node.Text value of "Example A/S")
For Each Node In xmlDoc.SelectNodes("//NameUsedToRenameFile")
Variable = Node.Text
Next
Result = Replace(Variable, "<", "-")
Result = Replace(Variable, "/", "-")
WScript.Echo Result
The Echo above returns "Example A-S", but if I change the order of my Replace calls, like:
Result = Replace(Variable, "/", "-")
Result = Replace(Variable, "<", "-")
I get "Example A/S". How should I write it so that it handles every possible character? Thanks!
As discussed, it might be easier to do things the other way around and create a list of allowed characters, as VBScript is not so good at handling Unicode-like characters. While the characters you have listed may be fine, you may run into issues with certain character sets. Here's an example routine that could help your cause:
Consider this command:
wscript.echo ValidateStr("This393~~_+'852Is0909A========Test|!:~#$%####")
Using the sample routine below, it should produce the following results:
This393852Is0909ATest
The sample routine:
Function ValidateStr (vsVar)
    Dim vsAllowed, vscan, vsValid, vsaCount
    vsAllowed = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    ValidateStr = ""
    If VarType(vsVar) = vbString Then
        If Len(vsVar) > 0 Then
            For vscan = 1 To Len(vsVar)
                vsValid = False
                vsaCount = 1
                ' Stop scanning the allowed list as soon as a match is found
                Do While vsValid = False And vsaCount <= Len(vsAllowed)
                    If UCase(Mid(vsVar, vscan, 1)) = Mid(vsAllowed, vsaCount, 1) Then vsValid = True
                    vsaCount = vsaCount + 1
                Loop
                If vsValid Then ValidateStr = ValidateStr & Mid(vsVar, vscan, 1)
            Next
        End If
    End If
End Function
I hope this helps you with your quest. Enjoy!
EDIT: If you wish to continue with your original path, you will need to fix your Replace calls. They are not working because each line starts again from Variable, discarding the previous result. You need to pass in Variable the first time, then use Result every subsequent time.
You had:
Result = Replace(Variable, "/", "-")
Result = Replace(Variable, "<", "-")
You need to change this to:
Result = Replace(Variable, "/", "-")
Result = Replace(Result, "<", "-")
Result = Replace(Result, ...etc..)
Result = Replace(Result, ...etc..)
Edit: You could try Ansgar's regex, as the code is far simpler, but I am not sure it will work if, for example, you had Simplified Chinese characters in your string.
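The chained Replace calls can also be written as a loop over the forbidden characters listed in the question, feeding each result into the next call. This is only a sketch: the function name ReplaceForbidden and the choice of "-" as the placeholder are assumptions, and accented characters are not covered.

```vbscript
' Hypothetical helper: replace each forbidden file-name character with "-".
Function ReplaceForbidden(text)
    Dim forbidden, i, result
    ' Characters listed in the question; """" is a single double-quote
    forbidden = Array("<", ">", ":", """", "/", "\", "|", "?", "*")
    result = text
    For i = 0 To UBound(forbidden)
        ' Always continue from the previous result, never from the original
        result = Replace(result, forbidden(i), "-")
    Next
    ReplaceForbidden = result
End Function
```

For example, ReplaceForbidden("Example A/S") returns "Example A-S" regardless of the order of the characters in the array.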
I agree with Damien that replacing everything but known-good characters is the better approach. I would, however, use a regular expression for this, because it greatly simplifies the code. I would also recommend not removing "bad" characters, but replacing them with a known-good placeholder (an underscore, for instance), because removing characters might yield undesired results.
Function SanitizeString(str)
    Set re = New RegExp
    re.Pattern = "[^a-zA-Z0-9]"
    re.Global = True
    SanitizeString = re.Replace(str, "_")
End Function

Microsoft office Access `LIKE` VS `RegEx`

I have been having trouble with the Access keyword LIKE and its use. I want to use the following RegEx (Regular Expression) in a query as a sort of "verification rule", where the LIKE operator filters my results:
"^[0]{1}[0-9]{8,9}$"
How can this be accomplished?
I know you were not asking about VBA, but maybe you will give it a chance.
Open a VBA project, insert a new module, then pick Tools -> References and add a reference to Microsoft VBScript Regular Expressions 5.5. Then paste the code below into the newly inserted module.
Function my_regexp(ByRef sIn As String, ByVal mypattern As String) As String
    Dim r As New RegExp
    Dim colMatches As MatchCollection
    With r
        .Pattern = mypattern
        .IgnoreCase = True
        .Global = False
        .MultiLine = False
        Set colMatches = .Execute(sIn)
    End With
    If colMatches.Count > 0 Then
        my_regexp = colMatches(0).Value
    Else
        my_regexp = ""
    End If
End Function
Now you can use the function above in your SQL queries. Your question would then be solved by invoking
SELECT my_regexp(some_variable, "^[0]{1}[0-9]{8,9}$") FROM some_table
It will return an empty string if nothing is matched.
Hope you liked it.
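If the goal is to filter rows rather than return the matched text, a variant that returns True or False may fit a WHERE clause more naturally. The sketch below uses late binding, so it does not need the library reference. The name RegexMatches is hypothetical, and treating Null as "no match" is an assumption about how empty fields should behave.

```vbscript
' Hypothetical Boolean helper: True if text matches the pattern.
Function RegexMatches(text, pattern)
    Dim re
    RegexMatches = False
    ' Assumption: a Null field value counts as not matching
    If IsNull(text) Then Exit Function
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    RegexMatches = re.Test(CStr(text))
End Function
```

In a query this could then be used along the lines of WHERE RegexMatches(some_variable, "^0[0-9]{8,9}$"), which accepts a zero followed by eight or nine digits.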
I don't think Access allows regex matches (except in VBA, but that's not what you're asking). The LIKE operator doesn't even support alternation.
Therefore you need to split it up into two expressions.
... WHERE (Blah LIKE "0#########") OR (Blah LIKE "0########")
(# means "a single digit" in Access).