What does compress function mean in SAS using ||?
for example:
compress("test:" || 'price_data' || ":")
is it just for connecting strings using ||?
Thanks.
The double pipe || operator appends strings together.
The compress() function removes all blanks from the argument.
In this case, the compress function does nothing because the three appended string literals clearly contain no blanks. There is furthermore no reason to use appending since, again, it's all literals.
Your statement is equivalent to this "test:price_data:"
Now, if, as it seems to be the case, price_data is supposed to be a variable, then you would need to remove the single quotes surrounding it and that statement would then make total sense
compress("test:"||price_data||":")
This produces a string that is the value of price_data prepended with the string test and appended with a colon and in which all blanks (i.e. any blanks that would have been introduced by price_data) were removed by the compress function.
Related
Very incidentally, I wrote a findc() function and I submitted the program.
data test;
x=findc(,'abcde');
run;
I looked at the result and nothing is unnormal. As I glanced over the code, I noticed the findc() function missed the first character argument. I was immediately amazed that such code would work.
I checked the help documentation:
The FINDC function allows character arguments to be null. Null arguments are treated as character strings that have a length of zero. Numeric arguments cannot be null.
What is this feature designed for? Fault tolerance or something more? Thanks for any hint.
PS: I find findw() has the same behavior but find() not.
I suspect that allowing the argument to be not present at all is just an artifact of allowing the strings passed to it to be of zero length.
Normally in SAS strings are fixed length. So there was no such thing as an empty string, just one that was filled with spaces. If you use the TRIM() function on a string that only has spaces the result is a string with one space.
But when they introduced the TRIMN() and other functions like FINDC() and FINDW() they started allowing arguments to functions to be empty strings (if you want to store the result into a variable it will still be fixed length). But they did not modify the behavior of the existing functions like INDEX() or FIND().
For the FINDC() function you might want this functionality when using the TRIMN() function or the strip modifier.
Example use case might be to locate the first space in a string while ignoring the spaces used to pad the fixed length variable.
space = findc(trimn(string),' ');
Consider a slightly different toy example from my previous question:
. local string my first name is Pearly,, and my surname is Spencer
. tokenize "`string'", parse(",,")
. display "`1'"
my first name is Pearly
. display "`2'"
,
. display "`3'"
,
. display "`4'"
and my surname is Spencer
I have two questions:
Does tokenize work as expected in this case? I thought local macro
2 should be ,, instead of , while local macro 3 contain the rest of the string (and local macro 4 be empty).
Is there a way to force tokenize to respect the double comma as a parsing
character?
tokenize -- and gettoken too -- won't, from what I can see, accept repeated characters such as ,, as a composite parsing character. ,, is not illegal as a specification of parsing characters, but is just understood as meaning that , and , are acceptable parsing characters. The repetition in practice is ignored, just as adding "My name is Pearly" after "My name is Pearly" doesn't add information in a conversation.
To back up: know that without other instructions (such as might be given by a syntax command) Stata will parse a string according to spaces, except that double quotes (or compound double quotes) bind harder than spaces separate.
tokenize -- and gettoken too -- will accept multiple parse characters pchars and the help for tokenize gives an example with space and + sign. (It's much more common, in my experience, to want to use space and comma , when the syntax for a command is not quite what syntax parses completely.)
A difference between space and the other parsing characters is that spaces are discarded but other parsing characters are not discarded. The rationale here is that those characters often have meaning you might want to take forward. Thus in setting up syntax for a command option, you might want to allow something like myoption( varname [, suboptions])
and so whether a comma is present and other stuff follows is important for later code.
With composite characters, so that you are looking for say ,, as separators I think you'd need to loop around using substr() or an equivalent. In practice an easier work-around might be first to replace your composite characters with some neutral single character and then apply tokenize. That could need to rely on knowing that that neutral character should not occur otherwise. Thus I often use # as a character placeholder because I know that it will not occur as part of variable or scalar names and it's not part of function names or an operator.
For what it's worth, I note that in first writing split I allowed composite characters as separators. As I recall, a trigger to that was a question on Statalist which was about data for legal cases with multiple variations on VS (versus) to indicate which party was which. This example survives into the help for the official command.
On what is a "serious" bug, much depends on judgment. I think a programmer would just discover on trying it out that composite characters don't work as desired with tokenize in cases like yours.
I have a string I've obfuscated in my code, by XORing each character by some random value.
However, the resulting multi-line raw string literal won't compile correctly.
In the following image, you can see how MSVS2015 is not parsing the string correctly, even when using proper delimeters on either end (notice the black text throughout, not being parsed as part of the string).
Trying to compile the code results in errors about not being able to find the closing brace of the literal (even though it's in the proper place, at the very end of the string after the closing delimeter, etc). Manually erasing the black bits results in a proper compilation (albeit with a string that can no longer be properly unscrambled, of course).
I'm assuming this is happening because various resulting characters of the XOR function cannot be properly saved inside the .h file. Is there a solution to this problem? I've tried switching the file format to Unicode but that didn't work.
Your use of raw strings is too simplistic. It takes the sequence ..|.. as delimiter, you probably don't have the sequence )..|.. at the end of your string.
Use the full specification of delimited raw strings as described in cppreference in variant (6). This is also described in the C++ standard section ยง2.14.5 String literals. The template is as follows:
R"d-char-sequence(your raw text)d-char-sequence"
The key is to use the "d-char-sequence". This sequence can contain the following:
any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \,
and the control characters representing horizontal tab,
vertical tab, form feed, and newline.
How that d-char-sequence works is described as follows:
A string literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.
This ensures that the raw string can legally contain any character that is supported by the source character set (unicode here). The raw string may contain quotes parentheses backslashes and even newlines.
It's not as complicated as it sounds. Just add a prefix and a suffix to the raw string. It could look like this:
std::string(R"my-delimiter(... long text ...)my-delimiter");
Of course substitute ... long text ... with the raw string literal. Just make sure that the sequence )my-delimiter" does not appear in the raw string text.
I have a requirement to read the string with both single quotes and without quotes from a macro retrieve_context.
While calling the macro, users can call it with either single quotes or without quotes, like below:
%retrieve_context('american%s choice', work.phone_conv, '01OCT2015'd, '12OCT2015'd)
%retrieve_context(american%s choice, work.phone_conv, '01OCT2015'd, '12OCT2015'd)
How to read the first parameter in the macro without a single quote?
I tried %conv_quote = unquote(%str(&conv_quote)) but it did not work.
You're running into one of those differences between macros and data step language.
In macros, there is a concept of "quoting", hence the %unquote macro function. This doesn't refer to traditional " or ' characters, though; macro quoting is a separate thing, with not really any quote characters [there are some sort-of-characters that are used in some contexts in this regard, but they're more like placeholders]. They come from functions like %str, %nrstr, and %quote, which tokenize certain things in a macro variable so that they don't get parsed before they're intended to be.
In most contexts, though, the macro language doesn't really pay attention to ' and " characters, except to identify a quoted string in certain parsing contexts where it's necessary to do so to make things work logically. Hence, %unquote doesn't do anything about quotation marks; they are simply treated as regular characters.
You need to, instead, call a data step function to remove them (or some other things, but all of them are more complicated, like using various combinations of %substr and %index). This is done using %sysfunc, like so:
%let newvar = %sysfunc(dequote(oldvar));
Dequote() is the data step function which performs largely the same function as %unquote, but for normal quotation characters (", '). Depending on your ultimate usage, you may need to do more than this; Tom covers several of these possibilities.
If the users are supplying your macro with a value that may or may not include outer quotes then you can use the DEQUOTE() function to remove the quotes and then add them back where you need them. So if your macro is defined as having these parameters:
%macro retrieve_context(name,indata,start,stop);
Then if you want to use the value of NAME in a data step you could use:
name = dequote(symget('name'));
If you wanted to use the value to generate a WHERE clause then you could use the %SYSFUNC() macro function to call the DEQUOTE() function. So something like this:
where name = %sysfunc(quote(%qsysfunc(dequote(%superq(name)))))
If your users are literally passing in strings with % in place of single quotes then the first thing you should probably do is to replace the percents with single quotes. But make sure to keep the result macro quoted or else you might end up with unbalanced quotes.
%let name=%qsysfunc(translate(&name,"'","%"));
According to the manual, string concatenation isn't implemented in gdb. I need it however, so is there a way to achieve this, perhaps using array functions?
I don't have a copy of gdb around to try this on, but perhaps this line from later in the Ada section of the document will help you?
Rather than use catenation and
symbolic character names to introduce
special characters into strings, one
may instead use a special bracket
notation, which is also used to print
strings. A sequence of characters of
the form ["XX"]' within a string or
character literal denotes the (single)
character whose numeric encoding is XX
in hexadecimal. The sequence of
characters["""]' also denotes a
single quotation mark in strings. For
example, "One line.["0a"]Next
line.["0a"]"
contains an ASCII newline character
(Ada.Characters.Latin_1.LF) after each
period.
For Objective-C:
[#"asd" stringByAppendingString:#"zxc"]
[#"ID: " stringByAppendingString:(NSString*) [aTaskDict valueForKey:#"ID"]]