I’m looking for a way to find and replace in VBA that only looks at text within double quotes and handles all occurrences.
I am writing a SQL parser that will convert Access Jet SQL statements into T-SQL for SQL Server. One of the hang-ups I have is converting double quotes into single quotes when single quotes are part of literal output.
I had been using SQL = Replace(SQL, """", "'") until I came across some legitimate single quotes embedded in strings, which will get messed up by that command.
For example, if the SQL statement in Access is SELECT "Kat's code is righteous"
The Replace() function will end up converting this to SELECT 'Kat's code is righteous' which results in an extra single quote that T-SQL won't like.
I'm looking for a function that will return SELECT 'Kat''s code is righteous' so it will work in T-SQL.
I started by looking for a RegEx solution, decided it might be too complicated, and began writing a function that loops through each character in the string. The challenge there is that I was eventually going to call the VBA Replace() function, which doesn't report how many replacements it made, so after the call I wasn't sure how far to advance the loop index before searching for the next match. Now I'm leaning back toward RegEx, but I'm not sure how, in VBA, to replace text within each matched result while minimizing the chance of corrupting the string. I've tried the RegEx pattern "([^"]*)" but I'm not sure how to make it match only strings that contain a single quote. Example: https://regexr.com/483n9
I've loaded a sample SQL select statement into a variable for testing:
Public Sub Test_ReplaceInQuotes()
Dim sTest As String
sTest = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
Debug.Print "Access:", sTest
Debug.Print "Converted:", ReplaceInQuotes(sTest, "'", "''")
'Debug.Print "Converted:", ReplaceInQuotes(sTest, "code", "source code") ' <- Make sure a longer replacement string doesn't break it.
'Debug.Print "Converted:", ReplaceInQuotes(sTest, "right", "hid") ' <- Make sure it doesn't mess up the right() function.
' In another part of my parser I will replace ALL double quotes with single quotes, and & with +.
Debug.Print "Final TSQL:", replace(ReplaceInQuotes(sTest, "'", "''"), """", "'")
End Sub
This is the output I expect it to generate:
Access: SELECT "Kat's code is righteous.", left("abc",1), right('source code',4), "Aaron's code has been righteous too.", "Kat's code is righteous.", "Right answer is '" & Table.RightAnswer & "'"
Converted: SELECT "Kat''s code is righteous.", left("abc",1), right('source code',4), "Aaron''s code has been righteous too.", "Kat''s code is righteous.", "Right answer is ''" & Table.RightAnswer & "''"
Final TSQL: SELECT 'Kat''s code is righteous.', left('abc',1), right('source code',4), 'Aaron''s code has been righteous too.', 'Kat''s code is righteous.', 'Right answer is ''' & Table.RightAnswer & ''''
A nuance of Jet SQL is that it allows literal strings to be wrapped in single or double quotes, such as In ('ab',"cd", 'efg'). T-SQL only accepts strings in single quotes.
Please try this approach.
Public Sub Test_ReplaceInQuotes()
Dim sTest As String
Dim Sp() As String
Dim p As Integer, q As Integer
Dim i As Integer
sTest = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
' Debug.Print "Access:", sTest
Sp = Split(sTest, ",")
For i = 0 To UBound(Sp)
p = InStr(Sp(i), "('")
If p Then
If Right(Trim(Sp(i)), 1) = "'" Then
Sp(i) = Left(Sp(i), p) & Chr(34) & Mid(Sp(i), p + 2)
For q = Len(Sp(i)) To 1 Step -1
If Mid(Sp(i), q, 1) = "'" Then
Sp(i) = Left(Sp(i), q - 1) & Chr(34) & Mid(Sp(i), q + 1)
Exit For
End If
Next q
End If
End If
Next i
Debug.Print Replace(Replace(Join(Sp, ","), "'", "''"), Chr(34), "'")
End Sub
Here is the solution based on RegEx:
Option Explicit
Sub Test()
Dim s As String
Dim r As String
Dim i As Long
Dim m As Object
s = "SELECT ""Kat's code is righteous."", left(""abc"",1), right('source code',4), ""Aaron's code has been righteous too."", ""Kat's code is righteous."", ""Right answer is '"" & Table.RightAnswer & ""'"""
r = ""
i = 1
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "('(?:''|[^'])*')|(""[^""]*"")"
For Each m In .Execute(s)
With m
If .SubMatches(1) <> "" Then
r = r & Mid(s, i, .FirstIndex + 1 - i)
r = r & Replace(Replace(Mid(s, .FirstIndex + 1, .Length), "'", "''"), """", "'")
Else
r = r & Mid(s, i, .FirstIndex + 1 + .Length - i)
End If
i = .FirstIndex + 1 + .Length
End With
Next
End With
If i <= Len(s) Then r = r & Mid(s, i, Len(s) - i + 1)
Debug.Print s
Debug.Print r
End Sub
The output is as follows:
SELECT "Kat's code is righteous.", left("abc",1), right('source code',4), "Aaron's code has been righteous too.", "Kat's code is righteous.", "Right answer is '" & Table.RightAnswer & "'"
SELECT 'Kat''s code is righteous.', left('abc',1), right('source code',4), 'Aaron''s code has been righteous too.', 'Kat''s code is righteous.', 'Right answer is ''' & Table.RightAnswer & ''''
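For comparison, the same match-then-rewrite idea can be sketched in Python, where re.sub accepts a callback and the index bookkeeping disappears. This is only an illustration of the technique, not part of the VBA answer; the function name jet_to_tsql_quotes is made up for the example, and it assumes (as the pattern above does) that double-quoted Jet literals contain no embedded double quotes.

```python
import re

# Group 1: a single-quoted literal ('' is an escaped quote inside it).
# Group 2: a double-quoted literal.
LITERAL = re.compile(r"""('(?:''|[^'])*')|("[^"]*")""")


def jet_to_tsql_quotes(sql: str) -> str:
    """Convert double-quoted literals to single-quoted ones (hypothetical helper)."""

    def convert(m: re.Match) -> str:
        if m.group(2) is None:
            # Already single-quoted: return it unchanged.
            return m.group(1)
        inner = m.group(2)[1:-1]  # strip the surrounding double quotes
        return "'" + inner.replace("'", "''") + "'"

    return LITERAL.sub(convert, sql)


print(jet_to_tsql_quotes('SELECT "Kat\'s code is righteous.", right(\'source code\',4)'))
```

Because every literal is consumed as a whole match, a single quote inside a double-quoted string is never mistaken for the start of a single-quoted one.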
I need to extract numeric info from text.
Ready
State: CTYG Work Request #: 2880087 General
Job Address
Contact
Work Request Search
My code :
$Text = WinGetText("[ACTIVE]")
Sleep(4000)
$Value = StringSplit($Text, @CRLF)
MsgBox(0, "Hello", $Value, 10) ;---1st message box
Sleep(4000)
For $i = 1 To $Value[0]
If StringRegExp($Value[$i], "[0-9][^:alpha:]") Then
MsgBox(0, "Hello1", $Value[$i], 5) ;---2nd message box
Sleep(200)
$newWR = $Value[$i]
MsgBox(0, "Hello2", $newWR, 10)
ConsoleWrite($newWR) ;---3rd message box
EndIf
Next
1st MsgBox() shows nothing. The 2nd and 3rd show State: CTYG Work Request #: 2880087 General. But I don't need the entire line, I just want 2880087.
What about this? This will delete everything but numbers.
$str = "State: CTYG Work Request #: 2880087 General"
ConsoleWrite(StringRegExpReplace($str, '\D', '') & @CRLF)
… i just want 2880087 …
Example using regular expression State: .+ #: (\d+) :
#include <StringConstants.au3>; StringRegExp()
#include <Array.au3>
Global Const $g_sText = 'Ready' & @CRLF & @CRLF _
& 'State: CTYG Work Request #: 2880087 General' & @CRLF & @CRLF _
& 'Job Address' & @CRLF & @CRLF _
& 'Contact' & @CRLF & @CRLF _
& 'Work Request Search'
Global Const $g_sRegEx = 'State: .+ #: (\d+)'
Global Const $g_aResult = StringRegExp($g_sText, $g_sRegEx, $STR_REGEXPARRAYMATCH)
ConsoleWrite($g_sText & @CRLF)
_ArrayDisplay($g_aResult)
Stores 2880087 to $g_aResult[0].
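The difference between the two answers (strip every non-digit versus capture one specific number) can be sketched in Python as well. This is an illustrative sketch only; the function name work_request_number and the sample text are assumptions based on the window text shown in the question.

```python
import re

# Sample window text, modelled on the lines quoted in the question.
text = "Ready\r\n\r\nState: CTYG Work Request #: 2880087 General\r\n\r\nJob Address"


def work_request_number(window_text):
    """Return the digits that follow 'State: ... #: ', or None if absent."""
    m = re.search(r"State: .+ #: (\d+)", window_text)
    return m.group(1) if m else None


print(work_request_number(text))

# Stripping all non-digits is shorter, but it merges every number on the
# line into one string, so it is only safe when the line has one number.
print(re.sub(r"\D", "", "State: CTYG Work Request #: 2880087 General"))
```

The capture-group version is the safer choice when the surrounding text might contain other digits.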
I need to replace some text inside a file with the python re module.
Here is the input value :
<li><span class="PCap CharOverride-4">Contrôles</span> <span class="PCap CharOverride-4">Testes</span></li>
and the expected output is this:
<li><span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span>
<span class="PCap CharOverride-4">T<span style="font-size:83%">ESTES</span></span></li>
but instead, I get this result:
<li><span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span> <span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span></li>
Is there something that I missed ?
Here is what I've done so far :
for line in file_data.readlines():
    #print(line)
    reg = re.compile(r'(?P<b1>(<'+balise_name+' class="(([a-zA-Z0-9_\-]*?) |)'+class_value+')(| ([a-zA-Z0-9_\-]*?))">)(?P<maj>([A-ZÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ]))(?P<min>([a-zàáâãäåæçèéëìíîïðòóôõöøùúûüýÿµœš]*?))(?P<b2>(<\/'+balise_name+'>))')
    #print(reg)
    search = reg.findall(line)
    print(search)
    if (search != None):
        for matchObj in search:
            print(matchObj)
            #print(matchObj[8])
            print(line)
            balise1 = matchObj[0] #search.group('b1')
            print(balise1)
            balise2 = matchObj[10] #matchObj.group('b2')
            print(balise2)
            maj = matchObj[6] #matchObj.group('maj')
            print(maj)
            min = matchObj[8] #matchObj.group('min')
            print(min)
            sub_str = balise1+""+maj+"<span style=\"font-size:83%\">"+min.upper()+"</span>"+balise2
            line = re.sub(reg, sub_str, line)
    # open the file to append the line
    filename = file_name.split(".")
    #file_result = open(filename[0]+"-OK."+filename[1], "a")
    #file_result.writelines(line)
    #file_data.writelines(line)
    #file_result.close()
    print(line)
NB: I don't know how to use Python's BeautifulSoup module, which is why I'm doing this manually.
Pardon my poor English.
Thanks for your answer !!
I had completely forgotten about this question, but here is the solution I came up with after fixing the code I wrote a long time ago:
for line in file_data.readlines():
    reg = re.compile(r'(?P<b1>(\<' + balise_name + ' class=\"(([a-zA-Z0-9_\-]*?) |)' + class_value +
                     ')(| ([a-zA-Z0-9_\-]*?))\"\>)(?P<maj>([A-ZÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ]))(?P<min>([a-zàáâãäåæçèéëìíîïðòóôõöøùúûüýÿµœš]*?))(?P<b2>(\<\/' + balise_name + '\>))')
    print(line)
    # Handle one match at a time until no unprocessed tag is left in the line
    while reg.search(line):
        search = reg.search(line)
        if search:
            print(search)
        while search:
            balise1 = search[1]  # search.group('b1'); search[0] is the whole match
            print('b1 : ' + str(balise1))
            balise2 = search[11]  # search.group('b2')
            print('b2 : ' + str(balise2))
            maj = search[7]  # search.group('maj')
            print('maj : ' + str(maj))
            min = search[9]  # search.group('min')
            print('min : ' + str(min))
            sub_str = balise1 + "" + maj + "<span style=\"font-size:83%\">" + min.upper() + \
                "</span>" + balise2
            print(sub_str)
            # re.escape() so the matched text is treated literally, not as a pattern
            line = re.sub(re.escape(search[0]), sub_str, line)
            print(line)
            search = None
Here is what I changed in the code:
Escaped some unescaped characters inside the pattern
Processed the matches one by one
Fixed the group numbers used to build the replacement for the sub function
I hope this helps anyone who runs into the same problem.
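An alternative to looping and re-searching is to hand re.sub a function, so that each match builds its own replacement. The sketch below is a simplified illustration, not the code from the answer: the shortened pattern and the names small_caps and wrap are assumptions made for the example. Note also that str.upper() in Python 3 turns ô into Ô.

```python
import re


def small_caps(line: str, tag: str = "span", css_class: str = "PCap") -> str:
    """Wrap everything after the first capital letter in a smaller-font span."""
    pattern = re.compile(
        r'(?P<open><' + re.escape(tag) + r' class="[^"]*\b'
        + re.escape(css_class) + r'\b[^"]*">)'
        r'(?P<maj>[A-ZÀ-ÖØ-Ý])'   # leading capital letter
        r'(?P<min>[^<]*)'         # rest of the text, up to the closing tag
        r'(?P<close></' + re.escape(tag) + r'>)'
    )

    def wrap(m: re.Match) -> str:
        # Called once per match, so every tag gets its own text back.
        return (m.group("open") + m.group("maj")
                + '<span style="font-size:83%">' + m.group("min").upper()
                + '</span>' + m.group("close"))

    return pattern.sub(wrap, line)


print(small_caps('<li><span class="PCap CharOverride-4">Contrôles</span> '
                 '<span class="PCap CharOverride-4">Testes</span></li>'))
```

Using named groups inside the callback also avoids having to count nested group numbers.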
This is my string:
<HOLDERS><ACCOUNTHOLDER Title="" Initials="" FirstName="" Surname="" Name="AN'A"N&D & TEST'S"I&X" CifKey="ANA"D.TSX000" CustomerType="2" PrimaryPan="00027272898"/></HOLDERS>
How do I replace the double quotes (") inside the Name and CifKey values with
&quot;
while still keeping the double quotes everywhere else in the string?
The output should be
<HOLDERS><ACCOUNTHOLDER Title="" Initials="" FirstName="" Surname="" Name="AN'A&quot;N&D & TEST'S&quot;I&X" CifKey="ANA&quot;D.TSX000" CustomerType="2" PrimaryPan="00027272898"/></HOLDERS>
I wrote out every step for clarity, but you can wrap this in a function and reduce it to a few lines.
It probably isn't the cleanest way, but get it working first and improve it afterwards. Assume txt is your string.
index1 = InStr(txt, " Name=") 'space before Name is important so it is not confused with FirstName, Surname and so on...
index2 = InStr(txt, " CifKey=") 'space before CifKey is important...
index3 = InStr(txt, " CustomerType=") 'space before CustomerType is important...
'Extract both values (without their surrounding quotes) before txt is modified
SubStrName = Mid(txt, index1 + 7, index2 - index1 - 8)
SubStrCifKey = Mid(txt, index2 + 9, index3 - index2 - 10)
'Escape the embedded double quotes as &quot;
NewChunk1 = Replace(SubStrName, """", "&quot;")
NewChunk2 = Replace(SubStrCifKey, """", "&quot;")
'Rebuild the string around the two escaped values
txt = Left(txt, index1 + 6) & NewChunk1 & Mid(txt, index2 - 1, 10) & NewChunk2 & Mid(txt, index3 - 1)
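The same "anchor on the neighbouring attribute names" idea can be sketched with a regular expression. The Python below is an illustration only: it assumes the replacement wanted is the XML entity &quot;, that the attributes always appear in the order Name, CifKey, CustomerType, and the function name escape_attr_quotes is made up for the example.

```python
import re

# The leading space in ' Name="' keeps FirstName and Surname from matching.
ATTRS = re.compile(r'( Name=")(.*?)(" CifKey=")(.*?)(" CustomerType=")')


def escape_attr_quotes(txt: str) -> str:
    """Escape double quotes inside the Name and CifKey values only."""

    def fix(m: re.Match) -> str:
        name = m.group(2).replace('"', '&quot;')
        cif_key = m.group(4).replace('"', '&quot;')
        return m.group(1) + name + m.group(3) + cif_key + m.group(5)

    return ATTRS.sub(fix, txt, count=1)


sample = ('<ACCOUNTHOLDER Surname="" Name="AN\'A"N&D" CifKey="ANA"D.TSX000" '
          'CustomerType="2"/>')
print(escape_attr_quotes(sample))
```

If the pattern does not match, the input is returned unchanged, which makes the helper safe to run over strings that are already clean.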
I have a cell array 3x1 like this:
name1 = text1
name2 = text2
name3 = text3
and I want to parse it into separate 1x2 cells, for example name1 , text1. Later I want to treat text1 as a string and compare it with other strings. How can I do this? I have been trying regexp with tokens, but I cannot write a proper expression for it. I would be grateful for any help!
This code
input = {'name1 = text1';
'name2 = text2';
'name3 = text3'};
result = cell(size(input, 1), 2);
for row = 1 : size(input, 1)
tokens = regexp(input{row}, '(.*)=(.*)', 'tokens');
if ~isempty(tokens)
result(row, :) = tokens{1};
end
end
produces the outcome
result =
'name1 ' ' text1'
'name2 ' ' text2'
'name3 ' ' text3'
Note that the whitespace around the equal sign is preserved. You can modify this behaviour by adjusting the regular expression, e.g. also try '([^\s]+) *= *([^\s]+)' giving
result =
'name1' 'text1'
'name2' 'text2'
'name3' 'text3'
Edit: Based on the comments by user1578163.
Matlab also supports lazy (non-greedy) quantifiers. For example, the regexp '(.*?) *= *(.*)' (note the question mark after the asterisk) also works when the text contains spaces. It will transform
input = {'my name1 = any text1';
'your name2 = more text2';
'her name3 = another text3'};
into
result =
'my name1' 'any text1'
'your name2' 'more text2'
'her name3' 'another text3'
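The effect of the lazy quantifier is not specific to Matlab. As a small cross-check, here is the same split sketched in Python; the function name split_pair is made up for the example, and unlike the Matlab pattern this version also trims trailing whitespace from the value.

```python
import re

# Lazy (.*?) stops at the first '=', and \s* absorbs the padding around it.
PAIR = re.compile(r"\s*(.*?)\s*=\s*(.*?)\s*$")


def split_pair(entry):
    """Split 'name = text' into (name, text); return None if there is no '='."""
    m = PAIR.match(entry)
    return (m.group(1), m.group(2)) if m else None


for entry in ["my name1 = any text1", "your name2 = more text2"]:
    print(split_pair(entry))
```

With a greedy first group, as in '(.*)=(.*)', an entry such as 'a = b = c' would split at the last equal sign instead of the first.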