SAS - MD5 of an entire table

I'm calculating an MD5 hash row by row for an entire table, one hash per row.
The user selects the table with a prompt (&sel_lib for the selected library and &sel_tab for the table).
First, I get my columns from sashelp.vcolumn and concatenate them all into a variable called lista:
data contents;
do until (last.memname);
set sashelp.vcolumn;
where upcase(libname)="&sel_lib"
and upcase(memname)="&sel_tab";
by libname memname varnum;
length lista $&max_len ; /* sum(len col) + number of col */
lista=catx(',',lista,name);
end;
keep libname memname varnum lista;
run;
Second, I write that value to another table, because I need it for other operations unrelated to this issue:
proc sql;
update md5_table
set nom_colunas=(
select lista
from contents)
where libname="&sel_lib"
and memname="&sel_tab";
quit;
Third, I load the concatenated column list into the macro variable scolunas:
proc sql;
select nom_colunas
into :scolunas
from md5_table
where libname="&sel_lib"
and memname="&sel_tab";
quit;
Then I use it to compute the hash in my data step:
data want;
livraria="&sel_lib";
tabela="&sel_tab";
length check $32.;
format check $hex32.;
set &sel_lib..&sel_tab;
check = md5(cats(&scolunas));
hash = put(check,$hex32.);
keep livraria tabela hash;
put hash;
put _all_;
run;
My problem is that I need to compare this output with the output for the same table on another server (platform migration), so I need a reference to match the rows of both.
I would prefer to do that by adding the row number from the source table (&sel_lib..&sel_tab) to my data set. Is there any way to do that?
A more complex option would be to add the concatenated PK values.
Thanks in advance.

STRIP removes leading and trailing blanks from a single expression (for example, one variable), so it accepts only one argument.
You want your comma-separated variable list concatenated as the input to MD5.
CATS() concatenates its arguments and strips leading and trailing blanks from each one.
...
check = md5(cats(&scolunas));
...
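The question also asks for a row number to line up the outputs from both servers. A minimal sketch, assuming a single SET statement with no WHERE clause and no explicit OUTPUT, so that the automatic variable _N_ equals the source row number. SASHELP.CLASS and the hard-coded column list are stand-ins (assumptions) for &sel_lib, &sel_tab and the generated &scolunas:

```sas
/* Stand-in values for the prompt and the generated column list (assumptions) */
%let sel_lib = SASHELP;
%let sel_tab = CLASS;
%let scolunas = Name,Sex,Age,Height,Weight;

data want;
  length livraria $8 tabela $32 row_num 8 hash $32;
  set &sel_lib..&sel_tab;
  livraria = "&sel_lib";
  tabela = "&sel_tab";
  /* One SET, no WHERE, no explicit OUTPUT: _N_ counts source rows */
  row_num = _n_;
  /* A separator keeps field boundaries distinct before hashing */
  hash = put(md5(catx('|', &scolunas)), $hex32.);
  keep livraria tabela row_num hash;
run;
```

The row number only lines up if both servers store the rows in the same order. If they might not, a key built by concatenating the primary-key columns with CATX is the safer value to compare on.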
If you think that, in some edge cases, different data values could produce the same concatenated string, use CATX(<sep>,....) to separate the fields explicitly when they are concatenated.
Example:
A    B    C    CATS()   CATX('|',A,B,C)
---  ---  ---  -------  ---------------
PP   QQ   RR   PPQQRR   PP|QQ|RR
P    PQQ  RR   PPQQRR   P|PQQ|RR
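A small sketch that reproduces the two rows from the table above and shows the collision carrying through to MD5. Both rows get the same hash from CATS() but different hashes from CATX('|', ...). The data set and variable names are made up for illustration:

```sas
data collide;
  length a b c $3 cats_key catx_key $12 md5_cats md5_catx $32;
  input a $ b $ c $;
  cats_key = cats(a, b, c);          /* PPQQRR for both rows */
  catx_key = catx('|', a, b, c);     /* PP|QQ|RR vs P|PQQ|RR */
  md5_cats = put(md5(cats(a, b, c)), $hex32.);
  md5_catx = put(md5(catx('|', a, b, c)), $hex32.);
  datalines;
PP QQ RR
P PQQ RR
;
run;
```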

Related

How to remove first 4 and last 3 letters, digits or punctuation using PROC SQL in SAS Enterprise Guide?

I am trying to remove 009, and ,N from 009,A,N to obtain just the letter A in my dataset. How can I do this using PROC SQL in SAS, or even in a DATA step? I want to keep the variable name the same and just remove the above-mentioned digits, letters and punctuation from the data.
Is your original value stored in one variable? If so, you can use the SCAN function in a DATA step or in PROC SQL to extract the second element from a comma-delimited string:
data want (drop=str rename=(new_str=str));
set orig;
length new_str $ 1;
new_str = strip(scan(str,2,','));
run;
proc sql;
create table work.want
as select *, strip(scan(str,2,',')) as new_str length=1
from orig;
quit;
In PROC SQL, if you want to replace the original column with the updated one, replace the * in the SELECT clause with the names of all columns other than the original variable you are modifying, and name the new expression str instead of new_str.
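A minimal sketch of that, assuming a hypothetical ORIG table whose only other column is ID. ID is listed explicitly, and the extracted value takes the original name STR:

```sas
/* Hypothetical sample data: ID plus the comma-delimited STR */
data orig;
  input id str :$20.;
  datalines;
1 009,A,N
2 010,B,Y
;
run;

proc sql;
  create table work.want as
  select id,
         strip(scan(str, 2, ',')) as str length=1
  from orig;
quit;
```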
What you code really depends on what values appear in the other rows of the data set.
Based on the one sample value you provided, the following code could be what you want.
data have;
input myvar $char80.;
datalines;
009,A,N
009-A-N
Okay
;
run;
proc sql;
update work.have
set myvar = scan(myvar,2,',')
where count(myvar,',') > 1
;
quit;
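The question also mentions a DATA step, so here is the same conditional replacement written as one. This is a sketch that rebuilds the sample HAVE data so the block runs on its own. Rows with fewer than two commas are left unchanged:

```sas
data have;
  input myvar $char80.;
  datalines;
009,A,N
009-A-N
Okay
;
run;

data have;
  set have;
  /* Only values containing at least two commas are changed */
  if count(myvar, ',') > 1 then myvar = scan(myvar, 2, ',');
run;
```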

appending text to all columns at once in sas

I have tables with columns col1 to col10.
I would like to add a prefix to them, e.g. italy_col1 to italy_col10.
How can I achieve this without a macro?
Since I am joining multiple tables, I want to prefix the text "Italy" to every column in table 1 and "USA" to every column in table 2. I tried the example below, but it doesn't suit my requirement:
https://support.sas.com/kb/48/674.html
The CATS function concatenates all the values in the columns of the tables. Any suggestions?
One way is to generate macro variables and then use those in your code.
First get lists of variables to rename from TABLE1 and TABLE2.
proc sql noprint;
select catx('=',name,cats('Italy_',name)) into :rename1 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE1" and upcase(name) ne 'ID'
;
select catx('=',name,cats('USA_',name)) into :rename2 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE2" and upcase(name) ne 'ID'
;
quit;
Then use the list of rename pairs in your code that merges the datasets.
data want;
merge table1(rename=(&rename1)) table2(rename=(&rename2));
by id;
run;
Note that this only works when the list of rename pairs is short enough to fit into a single macro variable. If the list is longer, use another method, such as a DATA step, to generate the same code.
Also watch out for variable names that become too long. SAS limits variable names to 32 bytes, so adding 4 (USA_) or 6 (Italy_) extra characters might produce names that are too long. You could truncate them to 32 characters, but then you risk creating duplicate names.
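For lists that are too long for one macro variable, here is a minimal sketch of the DATA step approach. It assumes TABLE1 is renamed in place before the merge, so no RENAME= option is needed for it. The fileref RENCODE and the small sample TABLE1 are hypothetical. TABLE2 would get the same treatment with USA_. The sketch does not check the 32-byte limit:

```sas
/* Hypothetical sample table */
data table1;
  input id col1 col2;
  datalines;
1 10 20
;
run;

/* Write a RENAME statement to a temporary file instead of a macro variable */
filename rencode temp;

data _null_;
  set sashelp.vcolumn end=eof;
  where libname = 'WORK' and memname = 'TABLE1' and upcase(name) ne 'ID';
  file rencode;
  length pair $80;
  if _n_ = 1 then put 'rename';
  pair = catx('=', name, cats('Italy_', name));
  put pair;
  if eof then put ';';
run;

/* Apply the generated RENAME statement to TABLE1 in place */
proc datasets library=work nolist;
  modify table1;
  %include rencode;
quit;
```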

Insert into function with SAS/SQL

I want to insert values into a new table, but I keep getting the same error: VALUES clause 1 attempts to insert more columns than specified after the INSERT table name. This happens if I don't put quotes around my date. If I do put quotes around it, I am told that the data types do not correspond for the second value.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', 2015-02-01)
values ('201502', 2015-03-01)
values ('201503', 2015-04-01)
values ('201504', 2015-05-01);
quit;
The second value has to remain a date because it is used with > and < later on. I think the problem may be that 2015-02-01 just isn't a valid date format, since I couldn't find it on the SAS website, but I would rather not change my whole table.
Date literals (constants) are quoted strings with the letter d immediately after the closing quote. The string must be in a form that the DATE informat can read.
'01FEB2015'd
"01-feb-2015"d
'1feb15'd
If you really want to insert a series of dates, just use a DATA step with a DO loop. Also make sure to attach one of the many date formats to your date values so that they print as human-readable text.
data date_table ;
length cvmo $6 next_beg_dt 8;
format next_beg_dt yymmdd10.;
do _n_=1 to 4;
cvmo=put(intnx('month','01JAN2015'd,_n_-1,'b'),yymmn6.);
next_beg_dt=intnx('month','01JAN2015'd,_n_,'b');
output;
end;
run;
@Tom suggested in the comments how to use date literals and gave a very good answer on doing it efficiently, which is less error-prone than typing values. I am just putting the same idea into the INSERT statement.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', "01FEB2015"D)
;
quit;
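For completeness, here is a sketch of the question's table with all four rows inserted as date literals. It also includes a query showing that the numeric date column works with > in a WHERE clause. LATER_DATES is a hypothetical table name:

```sas
proc sql;
  create table date_table
    (cvmo char(6), next_beg_dt date);

  /* Date literals: a DATE-informat string in quotes followed by d */
  insert into date_table
    values ('201501', '01FEB2015'd)
    values ('201502', '01MAR2015'd)
    values ('201503', '01APR2015'd)
    values ('201504', '01MAY2015'd);

  /* The column holds numeric SAS dates, so > and < compare them correctly */
  create table later_dates as
    select cvmo, next_beg_dt format=yymmdd10.
    from date_table
    where next_beg_dt > '15MAR2015'd;
quit;
```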

How to change the column headers of a sas dataset into an observation?

I have written SAS code that generates many SAS datasets. Now I want to append all of them to a single Excel file. First, I want to turn the column headers of each dataset into its first observation, and then leave a blank observation between the datasets. How can I do this?
One way to do this would be to use dictionary.columns:
proc sql;
create table Attribute as
select * from dictionary.columns;
quit;
Read through the table and check which attributes you are interested in. In your case, you are probably interested in the column NAME, which contains the names of all the columns.
Restrict the table by adding a WHERE clause to the PROC SQL query based on where the columns come from (which library, what type of member, which member name), e.g. where upcase(libname)="WORK".
data attribute;
array column [ 5 ] $ 32 ; /* n = 5 columns, each up to 32 characters: adjust to your data */
do i=1 to dim(column);
set attribute ( keep = name) ;
column [ i ] = name ;
end;
run;
I would then proceed with a DATA step such as the one above. You could use a macro variable to store the column names (SELECT ... INTO :), but either way you still need to hardcode the array size n, or use some other method that stores the values in one observation. Also remember to define the length and type of the array accordingly. You can name the variables in the resulting dataset Attribute by adding var1-varn after the length in the ARRAY statement.
For simplicity, I use a SET statement inside the DO loop to read the observations one at a time and store the value of the NAME column (the column name as reported by dictionary.columns) in the array.
Note that a non-temporary array creates actual variables in the output dataset.
If you want to add the blank row:
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
Because the two inputs start at different observations (the second one uses FIRSTOBS=2) and column names do not repeat within one dataset, NAME and _NAME differ. As a result, each valid observation written by the first OUTPUT statement is followed by an empty row written by CALL MISSING(OF _ALL_); OUTPUT;.
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
Ensure that all your columns are character (CALL LABEL writes text, so numeric columns would need to be converted first), then use the following in your DATA step:
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;
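The snippet above shows only the loop. Here is a minimal sketch of a complete step, assuming it should write one header row before the data. SASHELP.CLASS stands in for one of your datasets. Sex is lengthened to $8 because it is $1 there and would otherwise truncate the header text:

```sas
data with_header;
  length Name $8 Sex $8;                      /* room for the header text */
  if 0 then set sashelp.class(keep=name sex); /* defines the variables, reads nothing */
  array names{*} _character_;
  if _n_ = 1 then do;
    do i = 1 to dim(names);
      /* The value becomes the variable's label, or its name if it has no label */
      call label(names{i}, names{i});
    end;
    output;                                   /* header row */
  end;
  set sashelp.class(keep=name sex);
  output;                                     /* data rows */
  drop i;
run;
```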

SAS sum variables using name after a proc transpose

I have a table of postings by category (a number) that I transposed. The result has one column per category, named _<number>, for example _16, _881, _853, etc. (they aren't in order).
I need to sum all of them in PROC SQL, but I don't want to create the variable in a DATA step, and I don't want to write out all of the column names either. I tried this, but it doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going from the first number to the last, and also from the column in the first position to the one in the last position. Either way, I get a number that is not correct.
Any ideas?
Thanks!
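You can't use variable lists in SQL, so _:, var1-var6 and var1--var8 don't work. In your query, _815-_16 is parsed as the arithmetic expression _815 minus _16, so sum() adds up that difference over all rows, which is why the number is wrong.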
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *This does not include _N_: automatic variables are not part of variable lists;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this, but from my experience, I believe SAS assumes that any sum() call in SQL is the SQL aggregate function, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the number of arguments passed in. In the example below, the inner sum() function has three arguments, so SAS treats it as the SAS SUM function (the SQL aggregate function accepts only a single argument). The result of the SAS function is then passed as the single argument to the SQL aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
              sum_height_
Obs    Sex    and_weight
  1     F         1356.3
  2     M         1728.6
Also note a trick I've used in the code: passing 0 to the SAS function. This is an easy way to add an extra argument without changing the intended result, and it keeps the call at more than one argument even if the column list contains only one column. Depending on your data, you may want to swap the 0 for a missing value (i.e., .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the wildcard above to match your scenario, i.e., match fields that begin with an underscore instead of fields that end with the letter T. Your final SQL statement will then look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.
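Matching fields that begin with an underscore requires one extra detail. In a LIKE pattern, _ is itself a wildcard for any single character, so like '_%' would match every name. Here is a sketch using the ESCAPE clause. It uses a hypothetical transposed table and adds TYPE = 'num' to leave out a character column such as the _NAME_ column that PROC TRANSPOSE creates by default:

```sas
/* Hypothetical transposed table: ID plus columns named _<category> */
data transposed;
  input id _16 _881 _853;
  datalines;
1 10 20 30
2 1 2 3
;
run;

proc sql noprint;
  select name into :varlist separated by ','
    from dictionary.columns
    where libname = 'WORK' and memname = 'TRANSPOSED'
      and type = 'num'
      /* Escape the underscore so it matches a literal _ */
      and name like '\_%' escape '\';
quit;

proc sql;
  create table total as
    select sum(sum(&varlist, 0)) as nnl
    from transposed;
quit;
```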