SAS while reading varbinary data from Amazon RDS is appending spaces at the end of the data. Can we avoid it?

SAS while reading varbinary data from Amazon RDS is appending spaces at the end of the data.
proc sql;
select emailaddr from tablename1;
quit;
The column emailaddr is varbinary(20)
For example:
I inserted "XX#WWW.com ", but when SAS reads the value from the database it pads it with spaces up to the length of the column.
Since the column length is 20, it returns "XX#WWW.com          " (note the appended spaces). I cannot use the trim() function, since it would also remove trailing spaces that are genuinely part of the inserted data.
How can I stop SAS from appending these spaces?
My program needs the exact data as stored in the database, without any extra spaces attached.

That's how SAS works: Base SAS has only a fixed-length CHAR type and no VARCHAR concept (DS2 is different). Whatever the length of the column is (20 here), the value will hold 20 characters in total, padded with trailing spaces.
Most of the time it doesn't matter; when SAS inserts into another RDBMS, for example, it will typically treat trailing spaces as nonexistent (so they won't be inserted). You can use TRIM and similar functions to deal with the spaces if you're using regular expressions or concatenation to work with these values; CATS and similar functions perform concatenation with trimming.
If trailing spaces are part of your data, you are mostly out of luck in SAS. SAS considers trailing spaces irrelevant (equivalent to null characters). You can append a non-space character in SQL, translate the spaces to NBSPs ('A0'x) or something else while still in SQL, or wrap your actual values in quotes or similar, but whatever you do will be complicated.
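The padding behaviour, and the sentinel idea, can be sketched in a DATA step. This is only an illustration: the variable names are made up, and the `flagged` value stands in for what a query might return if a `|` character were appended on the database side (that SQL is not shown and depends on the database engine).

```sas
data _null_;
  length emailaddr $20 flagged $21;

  /* Base SAS pads the 11-character value out to the declared length */
  emailaddr   = 'XX#WWW.com ';
  stored_len  = lengthc(emailaddr);   /* declared length: 20 */
  visible_len = length(emailaddr);    /* last non-blank position: 10 */

  /* hypothetical: the same value fetched with a '|' sentinel appended */
  flagged  = 'XX#WWW.com |';
  true_len = length(flagged) - 1;     /* 11, the genuine trailing space survives */

  /* CAT does not trim, so the recovered value keeps its one real space */
  bracketed = cat('[', substr(flagged, 1, true_len), ']');

  put stored_len= visible_len= true_len=;
  put bracketed=;
run;
```

Note that assigning the recovered text back to a plain character variable pads it again, so the true length has to travel in a separate numeric variable such as `true_len`.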

Related

SAS EG Transpose step is adding a bunch of spaces to my data. Can I prevent this?

I inherited a run that has a Transpose step in SAS EG. The data has a column with values reading NOV2020,SEP2019 etc., and a name column with Firstname Lastname.
When transposed, these columns merge into a single Character column. The name values remain the same, but now the dates read like ‘_________NOV2020’ (no underscores, just spaces) instead of just ‘NOV2020’.
Is there something in the Transpose step in SAS EG that can be modified to prevent this?
You are transposing character and numeric variables. Did you notice a message in the log about numeric to character conversion?
Most numeric formats default to right justification. The length of the new character variable is the maximum of the character variable lengths and the formatted widths of the numeric variables.
You can "correct" this with the LEFT function.
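A minimal sketch of that fix, assuming a small made-up input table (`have`, `name`, and `period` are illustrative names, not the asker's actual data) and the default output column name COL1:

```sas
/* illustrative input: one character and one formatted numeric variable */
data have;
  length name $20;
  name   = 'Jane Doe';
  period = '01NOV2020'd;
  format period monyy7.;
run;

/* mixing character and numeric variables in VAR forces the numeric
   values to be converted to character in the output */
proc transpose data=have out=long;
  var name period;
run;

/* left-align the converted values to drop the leading spaces */
data long_fixed;
  set long;
  col1 = left(col1);
run;
```

In SAS EG the same LEFT call could go into a computed column or a short program node placed after the Transpose task.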

How to Handle Strings in SAS

Why is it that sometimes we need to wrap the string value in single quotes, sometimes double quotes, sometimes no quotes? This is extremely frustrating when I have to go from one proc to another, especially if it involves changing a file name or url dynamically. What is the logic behind this hideous monstrosity?
%let Name01 = John Smith;
%let Name02 = 'John Smith';
%let Name03 = "John Smith";
All three work.
%let Folder = /97network/read/Regions/Northeast/;
%let FileName = SalesTarget.xlsx;
proc import
datafile = "&Folder.&FileName."
dbms = xlsx
out = SymList replace;
sheet="Sheet1";
run;
Here, &Folder.&FileName. must be in double quotes.
filename OutFile "/06specialty/ATam/AMZN.csv";
proc http url = &urlAddress. method = "get" out = OutFile;
run;
Finally, if I want to download stock prices from Yahoo Finance, url = accepts the address in single quotes, or &urlAddress. with no quotes, but not in double quotes. The path in the filename statement can be in single or double quotes, but it cannot be unquoted. Then, in the out = clause, you write OutFile, not &OutFile.
SAS strings are very simple. They are enclosed in either single or double quote characters.
'Hello there'
"Good-bye"
If the enclosing character appears in the string it needs to be doubled up.
'I don''t know'
To your first example, it is probably your operating system that is allowing filenames to include optional quotes. On Windows and Linux the quotes can even be required in some situations, when the path includes spaces or other characters that the command shell would normally interpret as delimiters in the command line.
Adding macro logic into the program is probably a large part of your confusion. First figure out what code works for the commands you are using and then you can try to generate that code using the macro processor.
Once you introduce macro logic, you need to pay attention to whether your strings use single or double quotes. There is a big difference in how macro logic interacts with the two. Strings bounded by single quote characters are ignored by the macro processor, so the macro trigger characters & and % are treated as normal characters. Strings bounded by double quote characters are processed.
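A short sketch of that difference, reusing the FileName macro variable from the question (the `Quoted` macro variable and the DATA step variable names are just for illustration):

```sas
%let FileName = SalesTarget.xlsx;
%let Quoted   = 'John Smith';

/* quotes typed in a LET statement are stored as part of the value */
%put Quoted resolves to: &Quoted.;

data _null_;
  single = 'File: &FileName.';   /* macro trigger is left alone */
  double = "File: &FileName.";   /* macro variable is resolved */
  put single=;
  put double=;
run;
```

The log should show the single-quoted string unchanged and the double-quoted one with SalesTarget.xlsx substituted, which is why the datafile = value in the PROC IMPORT example needs double quotes.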
Your second example adds the complexity of working with URL syntax. URL strings use the & character for their own purposes, so you need to understand how SAS is going to see the code you type, and whether or not the macro processor will attempt to interpret it, to ensure that the string the URL needs is actually created.
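Two common ways to keep an ampersand in a URL away from the macro processor are sketched below. The address is a made-up placeholder, not a real Yahoo Finance endpoint, and the %NRSTR variant is one option among several macro quoting functions.

```sas
data _null_;
  /* single quotes: the ampersand before "interval" is plain text */
  url1 = 'https://example.com/download?symbol=AMZN&interval=1d';
  put url1=;
run;

/* NRSTR masks the ampersand when the macro variable is created,
   so the value can later be used inside double quotes */
%let urlAddress = %nrstr(https://example.com/download?symbol=AMZN&interval=1d);

data _null_;
  url2 = "&urlAddress.";
  put url2=;
run;
```

Without either safeguard, a double-quoted URL would make SAS try to resolve &interval as a macro variable, typically producing an "apparent symbolic reference" warning or, worse, a silent substitution if a macro variable of that name exists.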
SAS has 50 years of history and a lot of the code is legacy. SAS is backwards compatible. You can still run code 30 years old with no issues. There are lots of oddities, such as quotes, that are there...and will always be there. SAS is kind of a conglomeration of ~300 languages (every proc is unique plus multiple meta-languages).
Since SAS will never change, best to just ignore the oddities.
One other thing. SAS runs on lots of O/Ss so every nuance there has to be accommodated in a mostly neutral way.

How to keep single quote when importing Excel data to SAS

I am importing an Excel spreadsheet into SAS using Proc Import:
Proc Import out=OUTPUT
Datafile = "(filename)"
DBMS=XLSX Replace;
Range = "Sheet1$A:Z";
run;
My numeric data columns contain a mixture of values held in Excel as numerics and '0 values held as text - i.e. with a leading apostrophe / single quote. When SAS imports these it treats them all the same (i.e. it returns Character strings of the values with the leading apostrophe stripped out).
This results in differences from the spreadsheet when calculations are applied (e.g. averaging) as Excel treats the '0 values as missing but SAS treats them as 0.
Is it possible to import the values as strings including the leading single quote / apostrophe, so that I can replace the '0 with missing values but keep the 0 records as 0? I would like to avoid having to manually manipulate the data in Excel as this data is drawn from an external source (don't ask...)
I doubt it. I think Excel doesn't really consider the leading apostrophe as part of the value. It's just a crazy way to indicate that a value is a text string (rather than numeric). When SAS imports the data, it recognizes that the quote is not part of the value. So if you've got an Excel column with '0 in some cells and 0 in others, it's going to come in as character, and I don't think you can tell the difference between them.
Unfortunately, the xlsx engine doesn't support the DBSASTYPE option. Other engines that import Excel have the DBSASTYPE option. That should allow you to tell SAS to import a column as a numeric variable, even if it sees character values. If you want all text values in the column converted to missing, that might do the trick. But it's possible it would still treat '0 the same as 0. I'm away from SAS, so I can't test.
Option:
The ~ (tilde) format modifier enables you to read and retain single quotation marks.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003209907.htm
Is it possible to convert the .xlsx to .txt while keeping the single quotes? I ask because a data step cannot read an .xlsx file with infile.
filename df disk 'C:\data_temp\ex.txt';
data test;
infile df firstobs=2;
input ID $2. x ~$3. ;
run;
proc print data=test;
run;

How to prevent illegal characters error in DB2 SQL query?

I'm working with a huge DB2 table (hundreds of millions of rows), trying to select only the rows that are matched by this regular expression:
\b\d([- \/\\]?\d){12,15}(\D|$)
(That is, a word boundary, followed by 13 to 16 digits separated by nothing or a single dash, space, slash, or backslash, followed by either a non-digit or the end of the line.)
After much Googling, I've managed to create the following SQL:
SELECT idx, comment FROM tblComment
WHERE xmlcast(xmlquery('fn:matches($c,"\b\d([- \/\\]?\d){12,15}(\D|$)")' PASSING comment AS "c") AS INTEGER)=1
Which works perfectly, as far as I can tell... unless it finds a row with an illegal character:
An illegal XML character "#x3" was found in an SQL/XML expression or function argument that begins with string [...]
The data contains many illegal XML characters, and changing the data is not an option (I have limited read-only access, and there are far too many rows that would need to be fixed). Is there a way to strip out or ignore illegal characters, without first modifying the database? Or, is there a different way for me to write my query that has the same effect?
You will have to identify what are all the illegal XML characters that occur in your data. Once you know them, you can use the TRANSLATE() function to eliminate them during the pattern matching.
Say you determine that the control characters 0x01 through 0x08, 0x0B, 0x0C, 0x0E, 0x0F and 0x7F may be present in the COMMENT column. TRANSLATE() can replace each of them with a space, and your query might then look like:
SELECT idx, comment FROM tblComment
WHERE xmlcast(xmlquery(
'fn:matches($c,"\b\d([- \/\\]?\d){12,15}(\D|$)")'
PASSING TRANSLATE(comment, ' ', x'01020304050607080B0C0E0F7F') AS "c")
AS INTEGER)=1
All legal XML characters are listed in the manual. 0x09, 0x0A and 0x0D are legal, so you don't need to TRANSLATE() them, for example.

SAS: Where statement not working with string value

I'm trying to use PROC FREQ on a subset of my data called dataname. I would like it to include all rows where varname doesn't equal "A.Never Used". I have the following code:
proc freq data=dataname(where=(varname NE 'A.Never Used'));
run;
I thought there might be a problem with trailing or leading blanks so I also tried:
proc freq data=dataname(where=(strip(varname) NE 'A.Never Used'));
run;
My guess is for some reason my string values are not "A.Never Used" but whenever I print the data this is the value I see.
This is a common issue in dealing with string data (and a good reason not to!). You should consider the source of your data. Did it come from web forms? Then it probably contains nonbreaking spaces ('A0'x) instead of regular spaces ('20'x). Did it come from a Unicode environment (say, one where Japanese characters are legal)? Then you may have transcoding issues.
A few options that work for a large majority of these problems:
Compress out everything but alphabetic characters, for example where=(compress(varname,,'ka') ne 'ANeverUsed'). In 'ka', the k means 'keep only' and the a means 'alphabetic characters'.
Use UPCASE or LOWCASE to ensure you're not running into case issues.
Use put varname $HEXw.; in a data step to look at the underlying characters, with w set to twice the variable's length so the whole value is shown. Every two hex digits are one byte, which is one character in a single-byte encoding. 20 is a space (which strip would remove). Sort by varname before doing this so that the rows you think should share this value sit next to each other, then look for the difference. It is probably some special character, or multibyte characters, or who knows what, but it should be apparent here.
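The hex inspection and the COMPRESS comparison can be tried on a small made-up table. This sketch assumes a 12-byte variable and builds one row with a regular space and one with a nonbreaking space; `check`, `exact_match`, and `loose_match` are illustrative names.

```sas
data check;
  length varname $12;
  varname = 'A.Never Used';                  /* regular space, '20'x */
  output;
  varname = cat('A.Never', 'A0'x, 'Used');   /* nonbreaking space, 'A0'x */
  output;
run;

data _null_;
  set check;
  /* plain comparison fails for the row with the nonbreaking space */
  exact_match = (varname eq 'A.Never Used');
  /* keeping only alphabetic characters makes both rows compare equal */
  loose_match = (compress(varname, , 'ka') eq 'ANeverUsed');
  /* 24 hex digits = 12 bytes; look for 20 versus A0 in the 8th byte */
  put varname $hex24. +1 exact_match= loose_match=;
run;
```

The two rows print identically in PROC PRINT output, but the hex dump differs in a single byte, which is the kind of difference that makes a WHERE clause silently fail.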