In MySQL, what are the differences between QUOTE() and mysql_real_escape_string()? From the MySQL documentation, I know the following:
QUOTE()
Written into SQL query
Escapes backslash, single quote, NUL, CTRL+Z
Returns a single-quoted string
Behavior relies on the MySQL server's character set
mysql_real_escape_string()
Written in C/C++ before a query is executed, allowing the escaped string to be read/modified before submission
Very inconvenient to use when compared to QUOTE()
Escapes backslash, single quote, double quote, NUL, CTRL+Z, \n, and \r
Apparently adds more quotes to make characters easily readable in log files
Behavior relies on the MySQL server's character set
Ignoring logs, is it useful to escape \n and \r characters? With these two functions, is there a difference in client/server function efficiency? mysql_real_escape_string() sounds useful if it's desirable for a developer to process the escaped string before it's entered into a query. However, does QUOTE() not provide the most secure and reliable method of escaping strings?
I wonder if I should use QUOTE() for all queries in all languages and forget escaping strings with language-specific functions.
QUOTE()
mysql_real_escape_string()
String Literals
It seems that QUOTE() is meant to be used within SQL statements that construct other SQL statements. If you are outside of SQL, you should use mysql_real_escape_string().
[...] In a C program, you can use the mysql_real_escape_string() C API function to escape characters. [...] Within SQL statements that construct other SQL statements, you can use the QUOTE() function.
As explained at the bottom of String Literals (MySQL Manual).
QUOTE() is already in the query, so it's just as easy to break out of as if you had put nothing there. mysql_real_escape_string is essential to make an arbitrary string safe to shove in a query.
The problem of the function name being unwieldy can be easily solved by using some kind of alias. I'm no C/C++ user, but doesn't it have macros you can use to essentially write whatever you want and it gets replaced with the long function name?
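The point about QUOTE() already being inside the query can be sketched without a database. The following is a minimal illustration in Python (not the C API): `build_with_quote` and `escape_sketch` are hypothetical helper names, and `escape_sketch` only mimics the character list given in the question. It ignores the connection character set, which the real mysql_real_escape_string() takes into account, so it is not a substitute for it.

```python
# Hypothetical helper: pastes raw input into the statement text.
# QUOTE() only runs later, on the server, after the text has been parsed.
def build_with_quote(user_input):
    return "SELECT QUOTE('" + user_input + "')"

# Characters listed in the question for mysql_real_escape_string().
# Illustration only: the real function is character-set aware.
_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\x1a": "\\Z",
}

def escape_sketch(value):
    return "".join(_ESCAPES.get(ch, ch) for ch in value)

malicious = "x') OR ('1'='1"

# The quote in the input closes the literal before QUOTE() can help:
print(build_with_quote(malicious))
# SELECT QUOTE('x') OR ('1'='1')

# Escaping on the client first keeps the input inside the literal:
print(build_with_quote(escape_sketch(malicious)))
# SELECT QUOTE('x\') OR (\'1\'=\'1')
```

In the first statement the server sees two expressions joined by OR; in the second it sees one string literal, which is the behaviour the answer describes as essential.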
Consider a slightly different toy example from my previous question:
. local string my first name is Pearly,, and my surname is Spencer
. tokenize "`string'", parse(",,")
. display "`1'"
my first name is Pearly
. display "`2'"
,
. display "`3'"
,
. display "`4'"
and my surname is Spencer
I have two questions:
Does tokenize work as expected in this case? I thought local macro 2 should be ,, instead of , while local macro 3 would contain the rest of the string (and local macro 4 would be empty).
Is there a way to force tokenize to respect the double comma as a parsing character?
tokenize -- and gettoken too -- won't, from what I can see, accept repeated characters such as ,, as a composite parsing character. ,, is not illegal as a specification of parsing characters, but is just understood as meaning that , and , are acceptable parsing characters. The repetition in practice is ignored, just as adding "My name is Pearly" after "My name is Pearly" doesn't add information in a conversation.
To back up: know that without other instructions (such as might be given by a syntax command) Stata will parse a string according to spaces, except that double quotes (or compound double quotes) bind harder than spaces separate.
tokenize -- and gettoken too -- will accept multiple parse characters pchars and the help for tokenize gives an example with space and + sign. (It's much more common, in my experience, to want to use space and comma , when the syntax for a command is not quite what syntax parses completely.)
A difference between space and the other parsing characters is that spaces are discarded but other parsing characters are not discarded. The rationale here is that those characters often have meaning you might want to take forward. Thus in setting up syntax for a command option, you might want to allow something like myoption( varname [, suboptions])
and so whether a comma is present and other stuff follows is important for later code.
With composite characters, so that you are looking for say ,, as separators, I think you'd need to loop around using substr() or an equivalent. In practice an easier work-around might be first to replace your composite characters with some neutral single character and then apply tokenize. That relies on knowing that the neutral character does not otherwise occur in the string. Thus I often use # as a character placeholder because I know that it will not occur as part of variable or scalar names and it's not part of function names or an operator.
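The placeholder work-around can be sketched outside Stata. The Python function below, `tokenize_sketch`, is a hypothetical imitation of the behaviour described above (each parse character is an individual separator, non-space separators are kept as tokens, spaces are discarded); it is an assumption-laden model, not Stata's implementation.

```python
import re
from typing import List

def tokenize_sketch(text: str, pchars: str) -> List[str]:
    """Rough model of the parsing described above (not Stata's code)."""
    chars = set(pchars)  # ",," collapses to {","}: repetition adds nothing
    pattern = "([" + "".join(re.escape(c) for c in chars) + "])"
    pieces = re.split(pattern, text)  # the capture group keeps separators
    # Drop empty pieces; drop spaces only when space is a parse character.
    return [p for p in pieces if p and not (p == " " and " " in chars)]

text = "my first name is Pearly,, and my surname is Spencer"

# Parsing on ",," behaves like parsing on ",": two comma tokens appear.
print(tokenize_sketch(text, ",,"))
# ['my first name is Pearly', ',', ',', ' and my surname is Spencer']

# Work-around: swap the composite separator for a neutral character first.
print(tokenize_sketch(text.replace(",,", "#"), "#"))
# ['my first name is Pearly', '#', ' and my surname is Spencer']
```

The second call only works because # does not otherwise occur in the string, which is the caveat noted above.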
For what it's worth, I note that in first writing split I allowed composite characters as separators. As I recall, a trigger to that was a question on Statalist which was about data for legal cases with multiple variations on VS (versus) to indicate which party was which. This example survives into the help for the official command.
On what is a "serious" bug, much depends on judgment. I think a programmer would just discover on trying it out that composite characters don't work as desired with tokenize in cases like yours.
I have a requirement to read the string with both single quotes and without quotes from a macro retrieve_context.
While calling the macro, users can call it with either single quotes or without quotes, like below:
%retrieve_context('american%s choice', work.phone_conv, '01OCT2015'd, '12OCT2015'd)
%retrieve_context(american%s choice, work.phone_conv, '01OCT2015'd, '12OCT2015'd)
How to read the first parameter in the macro without a single quote?
I tried %conv_quote = unquote(%str(&conv_quote)) but it did not work.
You're running into one of those differences between macros and data step language.
In macros, there is a concept of "quoting", hence the %unquote macro function. This doesn't refer to traditional " or ' characters, though; macro quoting is a separate thing, with not really any quote characters [there are some sort-of-characters that are used in some contexts in this regard, but they're more like placeholders]. They come from functions like %str, %nrstr, and %quote, which tokenize certain things in a macro variable so that they don't get parsed before they're intended to be.
In most contexts, though, the macro language doesn't really pay attention to ' and " characters, except to identify a quoted string in certain parsing contexts where it's necessary to do so to make things work logically. Hence, %unquote doesn't do anything about quotation marks; they are simply treated as regular characters.
You need to, instead, call a data step function to remove them (or some other things, but all of them are more complicated, like using various combinations of %substr and %index). This is done using %sysfunc, like so:
%let newvar = %sysfunc(dequote(&oldvar));
Dequote() is the data step function which performs largely the same function as %unquote, but for normal quotation characters (", '). Depending on your ultimate usage, you may need to do more than this; Tom covers several of these possibilities.
If the users are supplying your macro with a value that may or may not include outer quotes then you can use the DEQUOTE() function to remove the quotes and then add them back where you need them. So if your macro is defined as having these parameters:
%macro retrieve_context(name,indata,start,stop);
Then if you want to use the value of NAME in a data step you could use:
name = dequote(symget('name'));
If you wanted to use the value to generate a WHERE clause then you could use the %SYSFUNC() macro function to call the DEQUOTE() function. So something like this:
where name = %sysfunc(quote(%qsysfunc(dequote(%superq(name)))))
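The "remove the quotes, then add them back" idea can be sketched in Python for readers who want to see the logic in isolation. `dequote_sketch` and `quote_sketch` are hypothetical names and simplified stand-ins: the real DEQUOTE() and QUOTE() functions have additional rules (for example around text after a closing quote) that are not modelled here.

```python
def dequote_sketch(value: str) -> str:
    """Strip one matching pair of outer quotes, if present (simplified)."""
    value = value.strip()
    if len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]:
        return value[1:-1]
    return value

def quote_sketch(value: str) -> str:
    """Wrap in double quotes, doubling any embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'

# Both call styles from the question normalise to the same literal.
for raw in ["'american%s choice'", "american%s choice"]:
    print(quote_sketch(dequote_sketch(raw)))
# "american%s choice"
# "american%s choice"
```

Normalising first means the rest of the macro never has to know which style the caller used.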
If your users are literally passing in strings with % in place of single quotes then the first thing you should probably do is to replace the percents with single quotes. But make sure to keep the result macro quoted or else you might end up with unbalanced quotes.
%let name=%qsysfunc(translate(&name,"'","%"));
When I prepare a statement with bindValue the placeholder mark is not replaced if it is surrounded with single quotes. This is problematic since in SQL strings are surrounded by single quotes to avoid keyword conflicts.
See my attachments with screenshots of the content of the database once inserted with and once without single quotes.
I already reported a bug, but meanwhile I am no longer sure whether this is just an encoding problem. Is it correct to use single quotes, i.e. should this work, or is this really a bug?
With quotes
Without quotes
It is not a bug. Just don't use the single quotes. The bindValue mechanism does not just replace your :path with a string in your statement. No risk of name conflicts. See it as some kind of different namespace. :-)
http://en.wikipedia.org/wiki/Prepared_statement: Prepared statements are normally executed through a non-SQL binary protocol, for efficiency and protection from SQL injection, but with some DBMSs such as MySQL are also available using a SQL syntax for debugging purposes.
http://en.wikipedia.org/wiki/SQL_injection#Parameterized_statements: With most development platforms, parameterized statements that work with parameters can be used (sometimes called placeholders or bind variables) instead of embedding user input in the statement. A placeholder can only store a value of the given type and not an arbitrary SQL fragment. Hence the SQL injection would simply be treated as a strange (and probably invalid) parameter value.
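The same behaviour can be reproduced with any driver that supports named placeholders. The sketch below uses Python's standard sqlite3 module rather than Qt's bindValue, on the assumption that the principle carries over; the table name `files` and the function `demo_bind` are made up for illustration.

```python
import sqlite3

def demo_bind():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT)")

    # Placeholder inside single quotes: it is an ordinary string literal,
    # so the five characters :path are stored as-is and nothing is bound.
    conn.execute("INSERT INTO files (path) VALUES (':path')")

    # Bare placeholder: the value is bound separately from the SQL text,
    # so quotes inside the value need no escaping.
    conn.execute(
        "INSERT INTO files (path) VALUES (:path)",
        {"path": "it's a 'quoted' name"},
    )

    rows = [r[0] for r in conn.execute("SELECT path FROM files ORDER BY rowid")]
    conn.close()
    return rows

print(demo_bind())
# [':path', "it's a 'quoted' name"]
```

The first row shows what the "with quotes" case stores; the second shows that the bound value arrives intact even though it contains single quotes.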
When I use the regexp-builder, I need to escape things in a different way from the way I do it when using replace-regexp. Now, this thread explains that these two commands use a different syntax, but why is that?
Also, I went through this blog post: Re-builder: The Interactive Regexp Builder, and I added
(require 're-builder)
(setq reb-re-syntax 'string)
to my .emacs file following the advice on the site. However, I still need to type " around my regexp to make it work. I thought changing the syntax language would take care of this but it doesn't.
With this, my actual questions are:
Is it still the case that Emacs does not support PCRE? Are there any workarounds to this?
Once I have the right regexp in regex-builder, is there any way to directly send the regexp to replace-regexp and enter the replacement string?
There's a package in the MELPA repository called pcre2el that adds PCRE support to many parts of Emacs, including regexp-builder and replace-regexp.
Regarding question #2: No (at least not by default), but there's another way to do that without using re-builder.
Start by doing a regexp isearch for your pattern. Because it's an isearch, you'll see the matches interactively, a bit like re-builder (albeit without coloured groupings).
Still in isearch, once you're happy with the pattern, type C-M-% to call isearch-query-replace-regexp which will prompt you for the replacement.
You can of course simply copy your re-builder string from its buffer and yank it as a replacement string (but that's undoubtedly not news).
I was curious about the need for quotes in re-builder with string syntax. It seems that it's just a formality of the system, and reb-read-regexp returns everything between the first and last " when using that syntax. Maybe it's intended to ensure that leading or trailing whitespace can't confuse matters -- re-builder does use leading whitespace for improved visibility, and trailing whitespace would be harder to spot. Or maybe it just made some of the code more convenient/consistent.
No, Emacs doesn't support PCRE, and as far as I know there is no work-around for that.
I don't think so.
To answer your first question, why does re-builder use a different syntax than replace-regexp:
By default, re-builder uses the syntax that is appropriate for writing elisp programs. In the context of a written program, regexps are entered within strings. Inside a string, backslashes have a special meaning which conflicts with using the backslash as part of a regexp. Consequently, within a string, you need to double a backslash to use it to signify part of the regexp syntax.
replace-regexp, on the other hand, is designed to be used interactively by the user, and it explicitly expects the input to be a regexp. As a convenience, it interprets backslashes as regexp syntax, not as string escapes. Which is why you can use single backslashes in this context.
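The relationship between the two forms is mechanical, which a short sketch can make concrete. The Python function below, `to_read_syntax`, is a hypothetical helper (it is not part of Emacs or re-builder) that converts a regexp as you would type it at the replace-regexp prompt into the form you would write inside an elisp string.

```python
def to_read_syntax(regexp: str) -> str:
    """Turn an interactively typed regexp into elisp string-literal form."""
    escaped = regexp.replace("\\", "\\\\")  # double every backslash
    escaped = escaped.replace('"', '\\"')   # escape embedded double quotes
    return '"' + escaped + '"'

interactive = r"\(foo\|bar\)"  # as typed at the replace-regexp prompt
print(interactive)
# \(foo\|bar\)
print(to_read_syntax(interactive))
# "\\(foo\\|bar\\)"
```

Both lines denote the same regexp; only the layer of string escaping differs.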
This is a question mostly concerning WinAPI RegSetValueEx. If you look at its description in MSDN here you'd find:
lpData [in] The data to be stored.
For string-based types, such as REG_SZ, the string must be null-terminated. With the REG_MULTI_SZ data type, the string must be terminated with two null characters. A backslash must be preceded by another backslash as an escape character. For example, specify "C:\\mydir\\myfile" to store the string "C:\mydir\myfile".
The question I have, do I really need to escape slashes? Because I've never done that before and it worked perfectly fine.
This is indeed a documentation error. You do not need to escape backslashes here. The exact string that you send to this API is what will be stored in the registry. No processing of backslashes will be performed.
Now, it's true that in C and C++ you need to escape certain characters in string literals, but that's not pertinent to a Win32 API documentation. That's an issue for source code to object code translation for specific languages and quite beyond the remit of this documentation.
Yes, because \ has a meaning in C++, whereas \\ means an ordinary backslash.
When \ appears in a string, C++ compiler will look at the next character and convert the combination into something (for example \n will be converted into a "newline" character). \\ will be converted into a regular backslash. This is called "escaping" (historically, on old terminals, the ESC+key combination was used for many keys that were not on the keyboard).
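The distinction between the source-code literal and the data that reaches the API can be checked directly. The sketch below uses Python, whose string-literal escaping works like C++ for this case, and does not call the registry at all; the UTF-16 encoding step assumes the wide-character variant of the API (RegSetValueExW) and is shown only to illustrate what the buffer would contain.

```python
# In source code the backslashes are doubled...
path = "C:\\mydir\\myfile"

# ...but the string in memory holds single backslashes.
print(path)
# C:\mydir\myfile
print(path.count("\\"))
# 2
print(len(path))
# 15

# Assumed REG_SZ buffer for the wide API: UTF-16LE plus a null terminator.
data = (path + "\0").encode("utf-16-le")
print(len(data))
# 32
```

The doubling is consumed by the compiler (here, the Python parser), so the API only ever sees the single backslashes, which is consistent with the answer that the registry stores exactly what it is given.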