I have a requirement that the macro retrieve_context read a string passed either with single quotes or without quotes.
While calling the macro, users can call it with either single quotes or without quotes, like below:
%retrieve_context('american%s choice', work.phone_conv, '01OCT2015'd, '12OCT2015'd)
%retrieve_context(american%s choice, work.phone_conv, '01OCT2015'd, '12OCT2015'd)
How can I read the first parameter in the macro without the single quotes?
I tried %conv_quote = unquote(%str(&conv_quote)) but it did not work.
You're running into one of those differences between macros and data step language.
In macros, there is a concept of "quoting", hence the %unquote macro function. This doesn't refer to traditional " or ' characters, though; macro quoting is a separate thing, with not really any quote characters [there are some sort-of-characters that are used in some contexts in this regard, but they're more like placeholders]. They come from functions like %str, %nrstr, and %quote, which tokenize certain things in a macro variable so that they don't get parsed before they're intended to be.
In most contexts, though, the macro language doesn't really pay attention to ' and " characters, except to identify a quoted string in certain parsing contexts where it's necessary to do so to make things work logically. Hence, %unquote doesn't do anything about quotation marks; they are simply treated as regular characters.
You need to, instead, call a data step function to remove them (or some other things, but all of them are more complicated, like using various combinations of %substr and %index). This is done using %sysfunc, like so:
%let newvar = %sysfunc(dequote(&oldvar));
Dequote() is the data step function which performs largely the same function as %unquote, but for normal quotation characters (", '). Depending on your ultimate usage, you may need to do more than this; Tom covers several of these possibilities.
If the users are supplying your macro with a value that may or may not include outer quotes then you can use the DEQUOTE() function to remove the quotes and then add them back where you need them. So if your macro is defined as having these parameters:
%macro retrieve_context(name,indata,start,stop);
Then if you want to use the value of NAME in a data step you could use:
name = dequote(symget('name'));
If you wanted to use the value to generate a WHERE clause then you could use the %SYSFUNC() macro function to call the DEQUOTE() function. So something like this:
where name = %sysfunc(quote(%qsysfunc(dequote(%superq(name)))))
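As a minimal sketch of how these pieces fit together, the macro below only echoes the normalized value to the log. The local macro variables clean and quoted and the %PUT line are illustrative assumptions, not part of the original macro, and the test strings use a plain name without an embedded quote.

```sas
%macro retrieve_context(name, indata, start, stop);
  %local clean quoted;
  /* Remove outer quotes if the caller supplied them; keep the result macro quoted */
  %let clean  = %qsysfunc(dequote(%superq(name)));
  /* Add double quotes back for use in a WHERE clause or assignment */
  %let quoted = %sysfunc(quote(&clean));
  %put NOTE: clean=&clean quoted=&quoted;
%mend retrieve_context;

/* Both call styles should report the same clean and quoted values */
%retrieve_context('american choice', work.phone_conv, '01OCT2015'd, '12OCT2015'd)
%retrieve_context(american choice, work.phone_conv, '01OCT2015'd, '12OCT2015'd)
```

Keeping the dequoted value macro quoted until the final QUOTE() call is a design choice that avoids unbalanced quotes if the name itself contains an apostrophe.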
If your users are literally passing in strings with % in place of single quotes then the first thing you should probably do is to replace the percents with single quotes. But make sure to keep the result macro quoted or else you might end up with unbalanced quotes.
%let name=%qsysfunc(translate(&name,"'","%"));
Very incidentally, I wrote a findc() function and I submitted the program.
data test;
x=findc(,'abcde');
run;
I looked at the result and nothing was abnormal. As I glanced over the code, I noticed that the findc() call was missing its first character argument. I was amazed that such code would work.
I checked the help documentation:
The FINDC function allows character arguments to be null. Null arguments are treated as character strings that have a length of zero. Numeric arguments cannot be null.
What is this feature designed for? Fault tolerance or something more? Thanks for any hint.
PS: I find that findw() has the same behavior, but find() does not.
I suspect that allowing the argument to be not present at all is just an artifact of allowing the strings passed to it to be of zero length.
Normally in SAS strings are fixed length. So there was no such thing as an empty string, just one that was filled with spaces. If you use the TRIM() function on a string that only has spaces the result is a string with one space.
But when they introduced the TRIMN() and other functions like FINDC() and FINDW() they started allowing arguments to functions to be empty strings (if you want to store the result into a variable it will still be fixed length). But they did not modify the behavior of the existing functions like INDEX() or FIND().
For the FINDC() function you might want this functionality when using the TRIMN() function or the strip modifier.
Example use case might be to locate the first space in a string while ignoring the spaces used to pad the fixed length variable.
space = findc(trimn(string),' ');
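A small sketch of the difference, assuming a hypothetical 10-character variable that holds abc followed by padding blanks (the variable names are made up for illustration):

```sas
data _null_;
  length string $10;
  string = 'abc';                               /* stored as abc plus 7 padding blanks */
  with_padding    = findc(string, ' ');         /* first padding blank: 4 */
  without_padding = findc(trimn(string), ' ');  /* no embedded blank: 0 */
  put with_padding= without_padding=;
run;
```

If string were entirely blank, TRIMN() would hand FINDC() a zero-length string, which is the case the documentation quoted above allows.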
Consider a slightly different toy example from my previous question:
. local string my first name is Pearly,, and my surname is Spencer
. tokenize "`string'", parse(",,")
. display "`1'"
my first name is Pearly
. display "`2'"
,
. display "`3'"
,
. display "`4'"
and my surname is Spencer
I have two questions:
Does tokenize work as expected in this case? I thought local macro 2 should be ,, instead of , while local macro 3 would contain the rest of the string (and local macro 4 would be empty).
Is there a way to force tokenize to respect the double comma as a parsing character?
tokenize -- and gettoken too -- won't, from what I can see, accept repeated characters such as ,, as a composite parsing character. ,, is not illegal as a specification of parsing characters, but is just understood as meaning that , and , are acceptable parsing characters. The repetition in practice is ignored, just as adding "My name is Pearly" after "My name is Pearly" doesn't add information in a conversation.
To back up: know that without other instructions (such as might be given by a syntax command) Stata will parse a string according to spaces, except that double quotes (or compound double quotes) bind harder than spaces separate.
tokenize -- and gettoken too -- will accept multiple parse characters pchars and the help for tokenize gives an example with space and + sign. (It's much more common, in my experience, to want to use space and comma , when the syntax for a command is not quite what syntax parses completely.)
A difference between space and the other parsing characters is that spaces are discarded but other parsing characters are not discarded. The rationale here is that those characters often have meaning you might want to take forward. Thus in setting up syntax for a command option, you might want to allow something like myoption( varname [, suboptions])
and so whether a comma is present and other stuff follows is important for later code.
With composite characters, so that you are looking for say ,, as separators I think you'd need to loop around using substr() or an equivalent. In practice an easier work-around might be first to replace your composite characters with some neutral single character and then apply tokenize. That could need to rely on knowing that that neutral character should not occur otherwise. Thus I often use # as a character placeholder because I know that it will not occur as part of variable or scalar names and it's not part of function names or an operator.
For what it's worth, I note that in first writing split I allowed composite characters as separators. As I recall, a trigger to that was a question on Statalist which was about data for legal cases with multiple variations on VS (versus) to indicate which party was which. This example survives into the help for the official command.
On what is a "serious" bug, much depends on judgment. I think a programmer would just discover on trying it out that composite characters don't work as desired with tokenize in cases like yours.
I'd like to know which characters are safe for any use in SAS macros.
So what I mean by special characters here is any character (or group of characters) that can have a specific role in SAS in any context. I'm not that interested in keywords (made of a-z and 0-9 characters).
For example = ^= ; % , # are special (not sure if # is actually used in SAS, but it's used for doc so still count as a parameter that is not 'safe for all uses').
But what about $ ! ~ § { } ° etc ?
This should include characters that are special in PROC SQL as well.
I'd like to use some of these characters and give them a special meaning in my code, but I'd rather not conflict with any existing use (I'm especially interested in ~).
A bit of general reference:
Reserved macro words
Macro word rules
SAS operators and mnemonics
Rules for SAS names
I think the vast majority of the characters on a standard English keyboard are used somewhere or other in the SAS language.
To address your examples:
$ Used in format names, put/input statements, regular expression definitions...
! 'or' operator in some environments
~ 'not' operator
§ Not used as far as I know
{} Can be used for data step array references & definitions
° Not used as far as I know
None of the above do anything special in a macro context, as Tom has already made clear in his answer.
Maybe SAS Operators in Expressions can help you for ~, looking at the tables Comparison Operators and Logical Operators.
The main triggers in macro code are & and % which are used to trigger macro variable references and macro statements, functions or macro calls.
The ; (semi-colon) is used in macro code (as in SAS code) to indicate the end of a statement.
For passing values into macro parameters you mainly need to worry about , (comma). But you will also want to avoid unbalanced (). You should avoid using = when passing parameter values by position.
You can protect them by adding quotes or extra () around the values. But those characters become part of the value passed. You can use macro quoting to protect them.
%mymac(parm1='1,200',parm2=(1,200),parm3=%str(1,200),parm4="a(b")
Equal signs can be included without quoting as long as your call is using named parameters.
%mymac(parm1=a=b)
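To see what actually arrives inside the macro, a throwaway definition of %mymac can echo each parameter. This definition is hypothetical; the answer above only shows the calls.

```sas
%macro mymac(parm1=, parm2=, parm3=, parm4=);
  /* %superq returns each value as passed, without resolving it again */
  %put parm1=%superq(parm1);
  %put parm2=%superq(parm2);
  %put parm3=%superq(parm3);
  %put parm4=%superq(parm4);
%mend mymac;

%mymac(parm1='1,200', parm2=(1,200), parm3=%str(1,200), parm4="a(b")
```

The log should show the quotes and parentheses as part of parm1, parm2 and parm4, while parm3 holds just 1,200 because %str masks the comma without adding characters to the value.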
In addition to the previous answers:
% is also used to include files in your program. %include.
Using special characters may cause your code to get stuck in a loop due to unbalanced quotes. SAS Note.
If you run into this just submit the magic string below:
*';*";*/;run;
As we know, special characters should be masked during macro compilation. But what if I wanna assign a dynamic substring to a macro variable? Like this:
%let mvSubstr = %substr(&mvString, 1, 1);
mvString can contain any symbols including unmatched single or double quotation marks.
So, in this example the program works correctly:
%let mvString = Test;
%let mvSubstr = %substr(&mvString, 1, 1);
And in the following case the program doesn't work and SAS reports ERROR: Literal contains unmatched quote.:
%let mvString = %str(%'Test%');
%let mvSubstr = %substr(&mvString, 1, 1);
How can I defeat this problem (make the program work independently of the mvString value)?
Use the %QSUBSTR() function if you expect that it is possible the value of the substring will contain unmatched quotes or other characters that require macro quoting. There is also the %QSCAN() function to use when the result of using %SCAN() might need quoting. And there is the %QSYSFUNC() function for when calling other SAS functions from within macro code.
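Applied to the example from the question, a minimal sketch might look like this (only %substr is swapped for %qsubstr; the %PUT line is added for illustration):

```sas
%let mvString = %str(%'Test%');
/* %qsubstr masks its result, so the lone quote is not read as the start of a literal */
%let mvSubstr = %qsubstr(&mvString, 1, 1);
%put &=mvSubstr;
```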
This is why macro quoting exists. You have a lot of different options, depending on exactly what you're doing.
%quote, %nrquote, %bquote, and %nrbquote - all do roughly the same thing: mask quote characters and some other special characters. See for example the documentation for %bquote/nrbquote. They tell SAS not to pay attention to ' and similar, so it does not worry about matching things. I've never seen a reason to use %quote over %bquote - the B stands for 'better' - so I would use that. They work during execution, not compilation. %nrbquote masks the macro characters & and %, meaning it will prevent a macro inside the macro variable from resolving.
%str and %nrstr mask during compilation. Otherwise they are similar to %bquote and %nrbquote. If it's important that it not have the quote during compilation, use these.
%superq masks a macro variable only (not open text) and prevents all resolution from occurring. It's often the best way to assign the value of one macro variable to another variable. It importantly does not take the & - you pass the name of the macro variable, with no ampersands or whatnot (unless the name of the macro variable is stored in another macro variable).
In your case, you would need to use %bquote to quote the results of the substring assignment, so:
%let mvString = %str(%'Test%');
%put &=mvString;
%let mvSubstr = %bquote(%substr(&mvString, 1, 1));
%put &=mvString &=mvSubstr;
What about the scenario where MVSTRING contains unmasked characters that need special treatment? This requires quoting the argument of %SUBSTR.
data _null_;
call symputx('mvString',"'Test",'G');
run;
%put %nrbquote(&=mvString);
%let mvSubstr = %bquote(%substr(%superq(mvString), 1, 1));
%put %nrbquote(&=mvString) %nrbquote(&=mvSubstr);
When I prepare a statement with bindValue, the placeholder is not replaced if it is surrounded by single quotes. This is a problem because in SQL, strings are surrounded by single quotes to avoid keyword conflicts.
See my attachments with screenshots of the database content, inserted once with and once without single quotes.
I have already reported a bug, but now I am no longer sure whether this is just an encoding problem. Is it correct to use single quotes here, i.e. should this work, and is this really a bug?
[Screenshot: with quotes]
[Screenshot: without quotes]
It is not a bug. Just don't use the single quotes. The bindValue mechanism does not just replace your :path with a string in your statement. No risk of name conflicts. See it as some kind of different namespace. :-)
http://en.wikipedia.org/wiki/Prepared_statement: Prepared statements are normally executed through a non-SQL binary protocol, for efficiency and protection from SQL injection, but with some DBMSs such as MySQL are also available using a SQL syntax for debugging purposes.
http://en.wikipedia.org/wiki/SQL_injection#Parameterized_statements: With most development platforms, parameterized statements that work with parameters can be used (sometimes called placeholders or bind variables) instead of embedding user input in the statement. A placeholder can only store a value of the given type and not an arbitrary SQL fragment. Hence the SQL injection would simply be treated as a strange (and probably invalid) parameter value.