I am new to SAS. I am trying to print only the rows where the string length in the column words is less than 20. I tried the following, but it doesn't work. What is the right syntax?
FILENAME REFFILE '<path_to_the_file>';
...
PROC PRINT DATA=WORK.IMPORT;
WHERE length(words) < 20;
RUN;
This is the error I get:
ERROR: Invalid characters were present in the data.
ERROR: An error occurred while processing text data.
I don't think there is any problem with the data itself, as the following works fine.
PROC PRINT DATA=WORK.IMPORT;
WHERE words = "some string";
RUN;
As it turned out, the problem was not the code itself but that I had not specified the encoding. So instead of
FILENAME REFFILE '<path_to_the_file>';
I used the following, which worked.
FILENAME REFFILE '<path_to_the_file>' encoding="latin1";
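To illustrate why a missing encoding can surface as an "invalid characters" error, here is a minimal Python sketch (not SAS). It assumes the file is Latin-1 encoded while the reader expects UTF-8; the sample word and the helper try_decode are hypothetical and only serve the illustration.

```python
# Hypothetical sample: a word containing a non-ASCII character.
text = "café"

# Bytes as they would be stored in a Latin-1 encoded file.
latin1_bytes = text.encode("latin1")

def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short marker if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<invalid characters>"

# Reading with the wrong encoding fails; the right one recovers the text.
as_utf8 = try_decode(latin1_bytes, "utf-8")
as_latin1 = try_decode(latin1_bytes, "latin1")
print(as_utf8)
print(as_latin1, len(as_latin1))
```

A comparison against an ASCII-only string can succeed on the rows it matches, whereas a function that has to walk through every value will hit the undecodable bytes, which is consistent with the behaviour described above.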
This is a follow-up to my recent question on calculating an md5 hash in SAS and Python. I'm using SAS v9.2, which has an md5 hash function that takes in a string and returns a hash. What I'd really like, though, is a way to compute the hash for the file as a whole. Given that I have a hash for each record, is there any way to do this and have the file hash match the value obtained with, say, Python code? Taking the sashelp.shoes dataset as an example, I exported it to a CSV file and manually removed the double quotes, dollar signs and commas from the currency fields. I then computed the hash for the file as a whole using this Python code:
import hashlib

filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename, "rb") as f:
    # Read the file in 1 MiB blocks and update the hash, dropping CR/LF bytes
    for byte_block in iter(lambda: f.read(1024*1024), b''):
        md5_hash.update(byte_block.replace(b'\r', b'').replace(b'\n', b''))
print(md5_hash.hexdigest())
And got this hash back as output:
f7f205b5b844bf57f5f51685969e0df0
If anyone can replicate this final hash value in SAS for that dataset that would be great.
PS I'm on SAS V9.2
You have two options:
Implement the MD5 algorithm in SAS. I'm aware of existing implementations for SHA and CRC but I'm not sure about MD5.
Call an external utility from SAS to calculate the md5 hash for the file. There is an example here.
My earlier note on limitations applies only when working with DS1. There is no way around the length restriction in DS1. You could try this and you will get an error:
data test;
length x $30000;
x = repeat('-', 30000);
run;
data _null_;
set test;
format m $hex32.;
m = md5(catx(',', x, x));
put m=;
run;
But Robert Pendridge is correct to point out that DS2 can solve this issue.
%let reclen = 201; /* Length of each record */
%let records = 2000; /* Number of records */
%let totlen = %eval(&reclen * &records);
proc ds2;
data _null_;
retain m;
dcl char(&totlen) m;
method run();
dcl char(200) c;
set shoes;
c = catx(',',&varstr2);
m = strip(m)|| strip(c);
end;
method term();
dcl char(32) hh;
hh = put(md5(m), $hex32.);
put hh=;
end;
enddata;
run;
quit;
This is essentially doing what the Python code is doing. The update merely concatenates the strings and applies the hash. You may have to tighten this up a little to remove any extraneous spaces etc., but it should work.
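The claim that repeated update calls amount to hashing the concatenated strings can be checked with a small Python sketch. The two records below are hypothetical stand-ins for the comma-joined rows, not actual values from sashelp.shoes.

```python
import hashlib

# Hypothetical records standing in for the comma-joined rows of the dataset.
records = ["Africa,Boot,Addis Ababa,12", "Asia,Sandal,Bangkok,1"]

# Incremental hashing: feed one record at a time, as the Python loop does.
incremental = hashlib.md5()
for rec in records:
    incremental.update(rec.encode("ascii"))

# One-shot hashing of the concatenated records, as the DS2 step does.
one_shot = hashlib.md5("".join(records).encode("ascii"))

print(incremental.hexdigest() == one_shot.hexdigest())  # True
```

This is why the DS2 approach only matches the Python result if the concatenated string is byte-for-byte identical to the file contents with the line terminators removed, including any leading or trailing spaces.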
Unfortunately, you cannot do this in DS1. The reason is that the maximum length SAS allows for a character variable is 32,767 bytes. You could spread the values across multiple variables, but when you try to concatenate them (even directly in the call to the md5 function), the result will be truncated. Your best bet is to write the output to an external text file (as shown below, based on your previous example) and generate the md5sum on that file. This is just one small extra step. You could use the X command to do that from within SAS itself (provided your configuration allows it).
filename ff "contents.txt" TERMSTR=CR;
data _null_;
set shoes end = lastrec;
newvar2 = catx(',',&varstr2);
file ff;
put newvar2;
run;
First, I created this table:
data rmlib.tableXML;
input XMLCol1 $ 1-10 XMLCol2 $ 11-20 XMLCol3 $ 21-30 XMLCol4 $ 31-40 XMLCol5 $ 41-50 XMLCol6 $ 51-60;
datalines;
| AAAAA A||AABAAAAA|| BAAAAA|| AAAAAA||AAAAAAA ||AAAA |
;
run;
I want to clean, concatenate and export. I have written the following code:
data rmlib.tableXML_LARGO;
file CleanXML lrecl=90000;
set rmlib.tableXML;
array XMLCol{6} ;
array bits{6};
array sqlvars{6};
do i = 1 to 6;
*bits{i}=%largo(XMLCol{i})-2;
%let bits =input(%largo(XMLCol{i})-2,comma16.5);
sqlvars{i} = substr(XMLCol{i},2,&bits.);
put sqlvars{i} &char10.. @;
end;
run;
The macro largo counts how many characters I have:
%macro largo(num);
length(put(&num.,32500.))
%mend;
What I need is that, instead of char10, the number (10) is the length of each string, so that I have something like
put sqlvars{i} &char&bits.. @;
I don't know if it is possible, but I can't get it to work.
I would like to see something like
AAAAA AAABAAAAA BAAAAA AAAAAAAAAAAAA AAAA
It is important to me to keep the spaces (this is only an example of an extract of an XML file). In addition, I will replace (for example) "B" with "XPM", so the size will change after cleaning the text; that is why I need the width in the char format to be flexible.
Thank you for your time
Julen
I'm still not quite sure what you want to achieve, but if you want to combine the text from multiple variables into one variable, then you could do something along these lines:
proc sql;
select name into :names separated by '||'
from dictionary.columns
where 1=1
and upcase(libname)='YOURLIBNAME'
and upcase(memname)='YOURTABLENAME';
quit;
data work.testing;
length resultvar $ 32000;
set YOURLIBNAME.YOURTABLENAME;
resultvar = &names;
resultvar2 = compress(resultvar,'|');
run;
Wasn't able to test this, but it should work if you replace YOURLIBNAME and YOURTABLENAME with your library and table names. COMPRESS with '|' as its second argument removes only the vertical bars, so the spaces in the text should be preserved.
The format $VARYING. <length-variable> is a good candidate for solving this output problem.
The following presumes a number of variables whose values are bounded by vertical bars, and writes to a file the concatenation of the values without the bounding bars.
data have;
file "c:\temp\want.txt" lrecl=9000;
length xmlcol1-xmlcol6 $100;
array values xmlcol1-xmlcol6 ;
xmlcol1 = '| A |';
xmlcol2 = '|A BB|';
xmlcol3 = '|A BB|';
xmlcol4 = '|A BBXC|';
xmlcol5 = '|DD |';
xmlcol6 = '| ZZZ |';
do index = 1 to dim(values);
value = substr(values[index], 2); * ignore presumed opening vertical bar;
value_length = length(value)-1; * length with still presumed closing vertical bar excluded;
put value $varying. value_length @; * send to file the value excluding the presumed closing vertical bar;
end;
run;
You have some coding errors that make it difficult to understand what you want to do.
Your %largo() macro doesn't make any sense. There is no numeric format 32500. The only reason it runs in your code is that you are applying the format to a character variable instead of a number, so SAS automatically substitutes $32500. instead.
The %LET statement that you have hidden in the middle of your data step will execute BEFORE the data step runs. So it would be less confusing to move it before the data step.
So replacing the call to %largo() your macro variable BITS will contain this text.
%let bits =input(length(put(XMLCol{i},32500.))-2,comma16.5);
Which you then use inside a line of code. So that line will end up being this SAS code.
sqlvars{i} = substr(XMLCol{i},2,input(length(put(XMLCol{i},$32500.))-2,comma16.5));
Which seems to me to be a really roundabout way to do this:
sqlvars{i} = substr(XMLCol{i},2,length(XMLCol{i})-2);
Since SAS stores character variables as fixed length, it will pad the value stored. So what you need to do is to remember the length so that you can use it later when you write out the value. So perhaps you should just create another array of numeric variables where you can store the lengths.
sqllen{i} = length(XMLCol{i})-2;
sqlvars{i} = substr(XMLCol{i},2,sqllen{i});
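The idea of remembering each value's length can be sketched in Python (not SAS). The sketch mimics fixed-length storage by padding hypothetical values to an assumed width of 10, then uses the remembered length to write out only the text between the bars, keeping the inner spaces.

```python
# Hypothetical bar-delimited values, padded to a fixed width of 10
# to mimic how a fixed-length character variable stores them.
WIDTH = 10
raw = ["| A |", "|A BB|", "|DD  |"]
stored = [v.ljust(WIDTH) for v in raw]

pieces = []
for value in stored:
    trimmed = value.rstrip()                 # drop the storage padding only
    inner_len = len(trimmed) - 2             # length without the two bars
    pieces.append(trimmed[1:1 + inner_len])  # keep inner spaces intact

result = "".join(pieces)
print(repr(result))
```

Here rstrip plays the role that LENGTH plays in SAS, which also ignores trailing blanks, and inner_len corresponds to the value stored in sqllen{i}.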
So I have a dataset whose elements are strings of emails in quotes. A single data element might look like this:
"john@cool.com" "jacob@cool.com" "jingleheimer@cool.com" "smith@cool.com"
I have the following macro command and data step:
%macro Emailer(RCP=);
/* body of the e-mail*/
data _null_;
file tmp;
put "Hello, World! <BR>";
run;
/*to-from*/
Filename tmp Email
Subject="Hello World Test"
To= (&RCP)
CT= "text/html";
%mend Emailer;
data _null_;
set EmailLists;
call execute('%Emailer(RCP='||ListOfEmails||')');
run;
But I keep getting "ERROR: Macro parameter contains syntax error."
Is it because my data elements have spaces or quotation marks or both?
Thanks in advance.
One way to test it is to pass the parameters directly, rather than with a data step. First I'll rearrange the order of the statements, as commenters pointed out.
%macro Emailer(RCP=);
filename myEmail Email;
data _null_;
file myEmail Subject = "Hello World Test"
To = (&RCP)
CT = "text/html";
put "Hello, World! <BR>";
run;
filename myEmail clear;
%mend Emailer;
Then try each of these forms to see which one works (I can't make my 64-bit SAS work with my 32-bit Outlook, so I can't test any of this):
%Emailer(RCP="john@cool.com" "jacob@cool.com" "jingleheimer@cool.com")
%Emailer(RCP="john@cool.com jacob@cool.com jingleheimer@cool.com")
%Emailer(RCP=john@cool.com jacob@cool.com jingleheimer@cool.com)
%Emailer(RCP=john@cool.com ; jacob@cool.com ; jingleheimer@cool.com)
After you figure out which form works, the rest should be easy.
I want to insert multiple lines into one cell, but DDE doesn't work when I put '0A'x directly.
filename xlSheet1 dde "Excel|c:\test.xlsx.Report!R1.C1:R1.C3" notab;
data _null_;
file xlSheet1;
a = "test";
b = cat("&sysdate","-", "&systime");
c = translate("Hello World", '0A'x, " ");
put a '09'x b '09'x c ;
run;
Only the first part is written to the cell.
Any good advice?
Hmm, thought there would be an easier way, but this is the simplest thing I could get working:
filename xlSheet1 dde "Excel|sheet1!R1C1:R1C1" notab;
data _null_;
file xlSheet1;
a = cat('="line1 " & Char(10) & "line2"');
put a;
run;
Basically, convert your value into a formula. Use the formula to append the text together and let Excel create the line break with Char(10).
For this to work, the cell also needs to be formatted with the 'wrap text' option. In fact, if you go to any cell in Excel and use Alt+Enter to create a line break manually, you will notice it automatically turns on 'wrap text' for you, so I don't think this part is optional.
I have a question on how to use the value from a SAS database in another command. In my case, I have a database with two variables (cell and res). "Cell" contains a reference to a cell in an Excel sheet where the value of "res" should be copied.
So I would like to use the value stored in "cell" in my command linking to the Excel sheet. This code does not work (concatenating with || does not work.)
DATA _null_;
SET test;
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!" || cell;
FILE ExcelTmp NOTAB LRECL=7000;
PUT res;
RUN;
Error message:
ERROR 23-2: Invalid option name ||.
1491! DDE "EXCEL|[&myInputTemplate.]&mySheet.!" || cell;
ERROR: Error in the FILENAME statement.
ERROR 23-2: Invalid option name cell.
1492 FILE ExcelTmp NOTAB LRECL=7000;
ERROR 23-2: Invalid option name NOTAB.
If I write
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!R1C1:R1C1";
then the value is written to cell A1 in Excel.
Is there some similar approach that works without invoking a macro?
Thanks for your help!
Christoph
The usual way to use values from a dataset as part of a command or statement is the CALL EXECUTE routine:
DATA _null_;
SET test;
call execute("DATA _NULL_;");
call execute(cats("FILENAME ExcelTmp DDE ""EXCEL|[&myInputTemplate.]&mySheet.!",cell,""";"));
call execute("FILE ExcelTmp NOTAB LRECL=7000;");
call execute("PUT '"||res||"';");
call execute("RUN;");
run;
This code generates DATA steps that are stacked up in a buffer and executed after the step above has finished. So basically you will generate as many DATA _NULL_ steps as you have records in your test dataset.
Assuming you're trying to update multiple cells, and cell is in the form RnCn, something like this may work...
You also need to determine the cell range beforehand, e.g. R2C2:R100C5.
%LET RANGE = R2C2:R100C5 ;
DATA _null_;
SET test;
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!&RANGE" ;
FILE ExcelTmp NOTAB LRECL=7000;
put "[select(""" cell """)]" ;
PUT res;
RUN;