SAS: Where statement not working with string value - sas

I'm trying to use PROC FREQ on a subset of my data called dataname. I would like it to include all rows where varname doesn't equal "A.Never Used". I have the following code:
proc freq data=dataname(where=(varname NE 'A.Never Used'));
run;
I thought there might be a problem with trailing or leading blanks so I also tried:
proc freq data=dataname(where=(strip(varname) NE 'A.Never Used'));
run;
My guess is for some reason my string values are not "A.Never Used" but whenever I print the data this is the value I see.

This is a common issue in dealing with string data (and a good reason not to!). You should consider the source of your data - did it come from web forms? Then it probably contains nonbreaking spaces ('A0'x) instead of regular spaces ('20'x). Did it come from a unicode environment (say, Japanese characters are legal)? Then you may have transcoding issues.
A few options that work for a large majority of these problems:
Compress out everything but alphabet characters. where=(compress(varname,,'ka') ne 'ANeverUsed') for example. 'ka' means 'keep only' and 'alphabet characters'.
UPCASE or LOWCASE to ensure you're not running into case issues.
Use put varname HEX.; in a data step to look at the underlying characters. Each two hex characters is one alphabet character. 20 is space (which strip would remove). Sort by varname before doing this so that you can easily see the rows that you think should have this value next to each other - what is the difference? Probably some special character, or multibyte characters, or who knows what, but it should be apparent here.

Related

Is there any function in SAS where we can read the exact value from the variable

Suppose i have a column called ABC and that variable has the data like
:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
so i want to generate the desire variable in the next column as 1122. like this "1122" i have a long list where i need to cross the value from the column called ABC, if found the exact match then need to generate. However, i don't want to generate the match like 112233 because it does not match the value what i am looking for.
For an example you can see all three line what i have given for reference. I am taking only the match records which is "1122" from all the above 3 lines.
I really have no clue to overcome on the problem. I have tried my hands with wildcards but did not get much success. Any help would be much apricated
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. That will let you search a string for matching words with a option to specify which characters are to be considered as the separators between the words. The result is the location of where the word starts within longer string. When the word is not found the result is a zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;

How to Handle Strings in SAS

Why is it that sometimes we need to wrap the string value in single quotes, sometimes double quotes, sometimes no quotes? This is extremely frustrating when I have to go from one proc to another, especially if it involves changing a file name or url dynamically. What is the logic behind this hideous monstrosity?
%let Name01 = John Smith;
%let Name02 = 'John Smith';
%let Name03 = "John Smith";
All three work.
%let Folder = /97network/read/Regions/Northeast/;
%let FileName = SalesTarget.xlsx;
proc import
datafile = "&Folder.&FileName."
dbms = xlsx
out = SymList replace;
sheet="Sheet1";
run;
Here, &Folder.&FileName. must be in double quotes.
filename OutFile "/06specialty/ATam/AMZN.csv";
proc http url = &urlAddress. method = "get" out = OutFile;
run;
Finally, if I want to download stock prices from Yahoo Finance, url = may take the address in single quotes, or &urlAddress. in no quotes, but you cannot use double quotes. OutFile can be in single or double quotes, but not no quotes. Then in the out = clause, you have OutFile, not &OutFile.
SAS strings are very simple. They are enclosed in either single or double quote characters.
'Hello there'
"Good-bye"
If the enclosing character appears in the string it needs to be doubled up.
'I don''t know'
To your first example it is probably your operating system that is allowing filenames to include optional quotes. On Windows and Linux the qutoes can even be required in some situations when the path includes spaces or other characters that the command shell would normally interpret as delimiters in the command line.
Adding macro logic into the program is probably a large part of your confusion. First figure out what code works for the commands you are using and then you can try to generate that code using the macro processor.
Once you introduce macro logic you need to pay attention to whether your strings are using single or double quotes. There is big difference between how macro logic interacts with single and double quote characters. Strings that are bounded by single quote characters are ignored by the macro processors. So the macro trigger characters & and % are treated as normal characters. But strings that are bounded by double quote characters will be processed.
Your second example adds the complexity of working with URL syntax. URL strings use the & character for its own purpose so you need to take care to understand how SAS is going to see the code you type and whether or not the macro processor will attempt to interpret it to insure the desired string needed for the URL will be created.
SAS has 50 years of history and a lot of the code is legacy. SAS is backwards compatible. You can still run code 30 years old with no issues. There are lots of oddities, such as quotes, that are there...and will always be there. SAS is kind of a conglomeration of ~300 languages (every proc is unique plus multiple meta-languages).
Since SAS will never change, best to just ignore the oddities.
One other thing. SAS runs on lots of O/Ss so every nuance there has to be accommodated in a mostly neutral way.

How to keep single quote when importing Excel data to SAS

I am importing an Excel spreadsheet into SAS using Proc Import:
Proc Import out=OUTPUT
Datafile = "(filename)"
DBMS=XLSX Replace;
Range = "Sheet1$A:Z";
run;
My numeric data columns contain a mixture of values held in Excel as numerics and '0 values held as text - i.e. with a leading apostrophe / single quote. When SAS imports these it treats them all the same (i.e. it returns Character strings of the values with the leading apostrophe stripped out).
This results in differences from the spreadsheet when calculations are applied (e.g. averaging) as Excel treats the '0 values as missing but SAS treats them as 0.
Is it possible to import the values as strings including the leading single quote / apostrophe, so that I can replace the '0 with missing values but keep the 0 records as 0? I would like to avoid having to manually manipulate the data in Excel as this data is drawn from an external source (don't ask...)
I doubt it. I think Excel doesn’t really consider the leading apostrophe as part of the value. It’s just a crazy way to indicate that a value is a text string (rather than numeric). When SAS imports the data, it recognizes that the quote is not part of the value. So if you’ve got an Excel column with ‘0 in some cells and 0 in others, it’s going to come in as character, and I don’t think you can tell the difference between them.
Unfortunately, the xlsx engine doesn’t support the s DBSASTYPE option. Other engines that import Excel have the DBSASTYPE option. That should allow you to tell SAS to import a column as a numeric variable, even if it sees character values. If it’s the case that you want all text values in the cell converted to missing, that might do the trick. But it’s possible it would still treat ‘0 the same as 0. I’m away from SAS, so can’t test.
Option:
The ~ (tilde) format modifier enables you to read and retain single quotation marks.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003209907.htm
Is it possible to convert the .xlsx to .txt keeping the single quotes? Because it is not possible to infile xlsx in a data step.
filename df disk 'C:\data_temp\ex.txt';
data test;
infile df firstobs=2;
input ID $2. x ~$3. ;
run;
proc print data=test;
run;

SAS while reading varbinary data from Amazon RDS is appending spaces at the end of the data. Can we avoid it?

SAS while reading varbinary data from Amazon RDS is appending spaces at the end of the data.
proc sql;
select emailaddr from tablename1;
quit;
The column emailaddr is varbinary(20)
For example:
I inserted "XX#WWW.com ", but while reading from db, it is appending spaces equal to the length of the column.
Since the column length is 20 it is returning "XX#WWW.com " ( note the spaces appended. I cannot use the trim() function since this also removes spaces that might genuinely be part of the original inserted data.
How can i stop sas from appending these spaces?
For my program i need to get the exact data as present in database without any extra spaces attached.
That's how SAS works; SAS has only CHAR equivalent datatype (in base SAS, anyway, DS2 is different), no VARCHAR concept. Whatever the length of the column is (20 here) it will have 20 total characters with spaces at the end to pad to 20.
Most of the time, it doesn't matter; when SAS inserts into another RDBMS for example it will typically treat trailing spaces as nonexistent (so they won't be inserted). You can use TRIM and similar to deal with the spaces if you're using regular expressions or concatenation to work with these values; CATS and similar functions perform concatenation-with-trimming.
If trailing spaces are part of your data, you are mostly out of luck in SAS. SAS considers trailing spaces irrelevant (equivalent to null characters). You can append a non-space character in SQL, or translate the spaces to NBSPs ('A0'x) or something else, while still in SQL, or use quotes or something around your actual values - but whatever you do will be complicated.

Strip characters using compress

I'd like to strip non ASCII characters from a variable. I've not had success with more elegant methods, so I'm using compress and nominating the characters I'd like to keep (because I don't know the ones I'd like to remove). It works except I'd like to keep both characters " and ' but I can't pass both of these characters into the compress function correctly.
data _null_;
_text='#AB'!!byte(13)!!'C"D';
_text_select=compress(_text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ /-1234567890(),.'&?;=%:+><`[]*#","k");
put _text;
put _text_select;
run;
First off, if your concern is 'control' characters, the 'c' option is a good one.
compress(textstr,,'c');
That removes things in the early part of ASCII like line feeds, tabs, etc. (Probably, the first 16 characters from '00'x to '0F'x, and possibly '07'x, though I've never seen an exact definition.)
If you want to keep basically 'printable characters', the 'w' option is helpful.
compress(textstr,,'kw');
Your method can be made to work, if it's the only way you can figure to do exactly what you want, by escaping the quote with another quote.
compress(_text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ /-1234567890(),.'&?;=%:+><`[]*#""","k");
You could also use "p" to keep all punctuation marks. In fact, you could certainly simplify this at least.
data _null_;
_text='#AB'!!byte(13)!!'C"D';
_text_select=compress(_text," /-()&=%+><` []*#","knp");
put _text;
put _text_select;
run;
I'm not entirely sure of what is officially a 'punctuation mark', likely the - is also one, and possibly ().
Edit: Here's a good way to test what's kept (in the official ASCII set, ie, up to '7F'x):
data test;
length _text $255;
do _t = 1 to 255;
_text =byte(_t)||_text;
end;
_text_select=compress(_text," /-(),.'&""?;=%:+><`[]*#","kn");
put _text=;
put _text_select=;
run;
P seems to keep a lot of stuff that's a bit weirder, some of which are clearly not punctuation, so obviously SAS did something wrong there. I'm tempted to write a trouble ticket, honestly, as it definitely isn't doing what it clearly should be.