Find and Replace text that is within double quotes with VBA - regex

I’m looking for a way to find and replace in VBA that only looks at text within double quotes and handles all occurrences.
I am writing a SQL parser that will convert Access Jet SQL statements into T-SQL for SQL Server. One of the hang-ups I have is converting double quotes into single quotes when single quotes are part of literal output.
I had been using SQL = Replace(SQL, """", "'") until I came across some legitimate single quotes embedded in strings, which will get messed up by that command.
For example, if the SQL statement in Access is SELECT "Kat's code is righteous"
The Replace() function will end up converting this to SELECT 'Kat's code is righteous' which results in an extra single quote that T-SQL won't like.
I'm looking for a function that will return SELECT 'Kat''s code is righteous' so it will work in T-SQL.
I started by looking for a RegEx solution and then decided it may be too complicated so I started writing a function that looped through each character in the string. A challenge is that eventually I was going to use the VBA Replace() function and it doesn't report how many times it made a replace, so after the replace I wasn't sure how much to move the loop index to search for the next match. Now I'm leaning back on RegEx but am not sure in VBA how to have it replace text within each matched result and minimize the chance of it corrupting the string. I've tried a RegEx pattern of "([^"]*)" but not sure how to make it only find matches that contain a single quote. Example: https://regexr.com/483n9
I've loaded a sample SQL select statement into a variable for testing:
Public Sub Test_ReplaceInQuotes()
Dim sTest As String
sTest = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
Debug.Print "Access:", sTest
Debug.Print "Converted:", ReplaceInQuotes(sTest, "'", "''")
'Debug.Print "Converted:", ReplaceInQuotes(sTest, "code", "source code") ' <- Make sure a longer replacement string doesn't break it.
'Debug.Print "Converted:", ReplaceInQuotes(sTest, "right", "hid") ' <- Make sure it doesn't mess up the right() function.
' In another part of my parser I will replace ALL double quotes with single quotes, and & with +.
Debug.Print "Final TSQL:", replace(ReplaceInQuotes(sTest, "'", "''"), """", "'")
End Sub
This is the output I expect it to generate:
Access: SELECT "Kat's code is righteous.", left("abc",1), right('source code',4), "Aaron's code has been righteous too.", "Kat's code is righteous.", "Right answer is '" & Table.RightAnswer & "'"
Converted: SELECT "Kat''s code is righteous.", left("abc",1), right('source code',4), "Aaron''s code has been righteous too.", "Kat''s code is righteous.", "Right answer is ''" & Table.RightAnswer & "''"
Final TSQL: SELECT 'Kat''s code is righteous.', left('abc',1), right('source code',4), 'Aaron''s code has been righteous too.', 'Kat''s code is righteous.', 'Right answer is ''' & Table.RightAnswer & ''''
A nuance of Jet SQL is that it allows literal strings to be wrapped in single or double quotes, such as In ('ab',"cd", 'efg'). T-SQL only accepts strings in single quotes.

Please try this approach.
Public Sub Test_ReplaceInQuotes()
Dim sTest As String
Dim Sp() As String
Dim p As Integer, q As Integer
Dim i As Integer
sTest = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
' Debug.Print "Access:", sTest
Sp = Split(sTest, ",")
For i = 0 To UBound(Sp)
p = InStr(Sp(i), "('")
If p Then
If Right(Trim(Sp(i)), 1) = "'" Then
Sp(i) = Left(Sp(i), p) & Chr(34) & Mid(Sp(i), p + 2)
For q = Len(Sp(i)) To 1 Step -1
If Mid(Sp(i), q, 1) = "'" Then
Sp(i) = Left(Sp(i), q - 1) & Chr(34) & Mid(Sp(i), q + 1)
Exit For
End If
Next q
End If
End If
Next i
Debug.Print Replace(Replace(Join(Sp, ","), "'", "''"), Chr(34), "'")
End Sub

Here is the solution based on RegEx:
Option Explicit
Sub Test()
Dim s As String
Dim r As String
Dim i As Long
Dim m As Object
s = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
r = ""
i = 1
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "('(?:''|[^'])*')|(""[^""]*"")"
For Each m In .Execute(s)
With m
If .SubMatches(1) <> "" Then
r = r & Mid(s, i, .FirstIndex + 1 - i)
r = r & Replace(Replace(Mid(s, .FirstIndex + 1, .Length), "'", "''"), """", "'")
Else
r = r & Mid(s, i, .FirstIndex + 1 + .Length - i)
End If
i = .FirstIndex + 1 + .Length
End With
Next
End With
If i <= Len(s) Then r = r & Mid(s, i, Len(s) - i + 1)
Debug.Print s
Debug.Print r
End Sub
The output is as follows:
SELECT "Kat's code is righteous.", left("abc",1), right('source code',4), "Aaron's code has been righteous too.", "Kat's code is righteous.", "Right answer is '" & Table.RightAnswer & "'"
SELECT 'Kat''s code is righteous.', left('abc',1), right('source code',4), 'Aaron''s code has been righteous too.', 'Kat''s code is righteous.', 'Right answer is ''' & Table.RightAnswer & ''''

Related

Regex to replace word except in comments

How can I modify my regex so that it will ignore the comments in the pattern in a language that doesn't support lookbehind?
My regex pattern is:
\b{Word}\b(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
\b{Word}\b : Whole word, {word} is replaced iteratively for the vocab list
(?=([^""\](\.|""([^""\]\.)[^""\]""))[^""]$) : Don't replace anything inside of quotes
My goal is to lint variables and words so that they always have the same case. However I do not want to lint any words in a comment. (The IDE sucks and there is no other option)
Comments in this language are prefixed by an apostrophe. Sample code follows
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.value = 1234 ' Set value
value = 123
Basically I want the linter to take the above code and say for the word "value" update it to:
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.Value = 1234 ' Set value
Value = 123
So that all code based "Value" are updated but not anything in double quotes or in a comment or part of another word such as valueadded wouldn't be touched.
I've tried several solutions but haven't been able to get it to work.
['.*] : Not preceeding an apostrophy
(?<!\s*') : BackSearch not with any spaces with apoostrophy
(?<!\s*') : Second example seemed incorrect but this won't work as the language doesn't support backsearches
Anybody have any ideas how I can alter my pattern so that I don't edit commented variables
VBA
Sub TestSO()
Dim Code As String
Dim Expected As String
Dim Actual As String
Dim Words As Variant
Code = "item = object.value ' Put item in value" & vbNewLine & _
"some.item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.value = 1234 ' Set value" & vbNewLine & _
"value = 123" & vbNewLine
Expected = "Item = object.Value ' Put item in value" & vbNewLine & _
"some.Item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.Value = 1234 ' Set value" & vbNewLine & _
"Value = 123" & vbNewLine
Words = Array("Item", "Value")
Actual = SOLint(Words, Code)
Debug.Print Actual = Expected
Debug.Print "CODE: " & vbNewLine & Code
Debug.Print "Actual: " & vbNewLine & Actual
Debug.Print "Expected: " & vbNewLine & Expected
End Sub
Public Function SOLint(ByVal Words As Variant, ByVal FileContents As String) As String
Const NotInQuotes As String = "(?=([^""\\]*(\\.|""([^""\\]*\\.)*[^""\\]*""))*[^""]*$)"
Dim RegExp As Object
Dim Regex As String
Dim Index As Variant
Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
For Each Index In Words
Regex = "[('*)]\b" & Index & "\b" & NotInQuotes
RegExp.Pattern = Regex
FileContents = RegExp.Replace(FileContents, Index)
Next Index
SOLint = FileContents
End Function
As discussed in the comments above:
((?:\".*\")|(?:'.*))|\b(v)(alue)\b
3 Parts to this regex used with alternation.
A non-capturing group for text within double quotes, as we dont need that.
A non-capturing group for text starting with single quote
Finally the string "value" is split into two parts (v) and (value) because while replacing we can use \U($2) to convert v to V and rest as is so \E$3 where \U - converts to upper case and \E - turns off the case.
\b \b - word boundaries are used to avoid any stand-alone text which is not part of setting a value.
https://regex101.com/r/mD9JeR/8

Vba: Regular expression to count the number of words in a string delimited by special characters

Need some help writing a regular expression to count the number of words in a string (Please note the data is a html string, which needs to be placed into a spreadsheet) when separated either by any special characters like . , - , +, /, Tab etc. Count should exclude special characters.
**Original String** **End Result**
Ex : One -> 1
One. -> 1
One Two -> 2
One.Two -> 2
One Two. -> 2
One.Two. -> 2
One.Tw.o -> 3
Updated
I think you asked a valuable question and this downvoting is not fair!
Function WCount(ByVal strWrd As String) As Long
'Variable declaration
Dim Delimiters() As Variant
Dim Delimiter As Variant
'Initialization
Delimiters = Array("+", "-", ".", "/", Chr(13), Chr(9)) 'Define your delimiter characters here.
'Core
For Each Delimiter In Delimiters
strWrd = Replace(strWrd, Delimiter, " ")
Next Delimiter
strWrd = Trim(strWrd)
Do While InStr(1, strWrd, " ") > 0
strWrd = Replace(strWrd, " ", " ")
Loop
WCount = UBound(Split(strWrd, " ")) + 1
End Function
________________
You can use this function as a UDF in excel formulas or can use in another VBA codes.
Using in formula
=WCOUNT("One.Two.Three.") or =WCOUNT($A$1") assuming your string is in A1 cell.
Using in VBA
(With assume passing your string with Str argument.)
Sub test()
Debug.Print WCount(Str)
End Sub
Regards.
Update
I have test your text as shown below.
copy your text in a Cell of Excel as shown.
The code updated for Line break and Tab characters and count your string words correctly now.
Try this code, all necessary comments are in code:
Sub SpecialSplit()
Dim i As Long
Dim str As String
Dim arr() As String
Dim delimeters() As String
'here you define all special delimeters you want to use
delimetres = Array(".", "+", "-", "/")
For i = 1 To 9
str = Cells(i, 1).Value
'this will protect us from situation where last character is delimeter and we have additional empty string
str = Left(str, Len(str) - 1)
'here we replace all special delimeters with space to simplify
For Each delimeter In delimetres
str = Replace(str, delimeter, " ")
Next
arr = Split(str)
Cells(i, 2).Value = UBound(arr) - LBound(arr) + 1
Next
End Sub
With your posted data following RegExp is working correctly. Put this in General Module in Visual Basic Editor.
Public Function CountWords(strInput As String) As Long
Dim objMatches
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = "\w+"
Set objMatches = .Execute(strInput)
CountWords = objMatches.Count
End With
End Function
You have to use it like a normal formula. e.g. assuming data is in cell A1 function would be:
=CountWords(A1)
For your information, it can be also achieved through formula if number of characters are specific like so:
=LEN(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," ")))-LEN(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," "))," ",""))+1

Marking word as "find and replace" in Microsoft Word with RegEx

I try to highlight a word found by RegEx, and if the right to replace it with its corresponding substitute.
The code works correctly only if NOT substituted.
Probably should every time rearrange???
Sub Replace()
Dim regExp As Object
Set regExp = CreateObject("vbscript.regexp")
Dim arr As Variant
Dim arrzam As Variant
Dim i As Long
Dim choice As Integer
Dim Document As Word.Range
Set Document = ActiveDocument.Content
On Error Resume Next
'EGN
'IBAN
arr = VBA.Array("((EGN(:{0,1})){0,1})[0-9]{10}", _
"[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}")
arrzam = VBA.Array("[****]", _
"[IBAN]")
With regExp
For i = 0 To UBound(arr)
.Pattern = arr(i)
.Global = True
For Each Match In regExp.Execute(Document)
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).Duplicate.Select
choice = MsgBox("Replace " & Chr(34) & Match.Value & Chr(34) & " with " & Chr(34) & arrzam(i) & Chr(34) & "?", _
vbYesNoCancel + vbDefaultButton1, "Replace")
If choice = vbYes Then
Document = .Replace(Document, arrzam(i))
ElseIf choice = vbCancel Then
Next
End If
Next
Next
End With
End Sub
Actually, there are several things wrong with this.
First, the each Match in Each Match is static, determined at the moment of the first loop. You're changing the document in the meantime, so each successive Match looks at an old position.
Second, you're replacing all the occurrences at one time, so there is no need to loop through them. It seems a one line, one time Replace could do the same thing.

Excel 2010 find and replace: Replace with found wildcard

I am trying to truncate some data in an excel sheet by removing the second word (if applicable) from each cell. That is, if a cell has two words, I want it to remove the second one. An example would be finding foo bar and replacing it with foo. From research, I found that the following works in Excel 2011 (for Mac), but not other versions:
Find: (*) (*)
Replace with: \1
How can I accomplish this in my version of Excel? Also, is there an alternative method by which I can obtain the same result?
If you want a formula..
=IF(FIND(" ",A2),TRIM(LEFT(A2,FIND(" ",A2))),A2)
or a VBA subroutine..
Sub GetFirstWord()
Application.ScreenUpdating = False
Dim n As Integer
Dim strWord As String
n = 2
Do While Cells(n, 1).Value <> ""
strWord = Cells(n, 1).Value
If InStr(1, strWord, " ") Then
Cells(n, 1).Value = Replace(Left(strWord, InStr(1, strWord, " ")), strWord, "")
End If
n = n + 1
Loop
Application.ScreenUpdating = True
End Sub
Both solutions as written above assume the words are located in Column A.

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly but trying to get the information behind just returns "" which is strange as when i execute it within Notepad++ it works perfectly fine
So it looks for the letters MIP and one of the 3 letter codes (any of them)
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
Check if MIP is in the string
If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
Since the VBScript regular expression engine does support look-aheads, you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If Not InStr(StringToCheck, "MIP") Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
Case "MIP", "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP":
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function