SAS Transpose Variable to Observations - sas

I have a set of data which has multiple columns but only one observation.
I need to transpose the data to have multiple observations with 2 columns of data. The very first column in my data is the Status. I want this to be the 2nd column of data, and all remaining columns turned into observations labeled in a column called 'Category'.
proc transpose data=RNAD_STG out=RNAD;by Status; run;
I want it to look like this.
I've transposed from Observation to Variable before but the reverse has me stuck. What can I do to achieve my desired output?

The log should state: NOTE: No variables to transpose.
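Without a VAR statement, PROC TRANSPOSE only transposes numeric variables, so this note means there are no numeric variables left to transpose. Adding a VAR statement solves this issue, either by listing all the variables, by using a shortcut list such as CH7--PPE2, or by using a keyword list such as _character_ for all character variables.
proc transpose data=RNAD_STG out=RNAD (rename=(col1=status _name_=category));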
by Status;
var CH7--PPE2;
*var _character_;
Run;
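As a minimal sketch of the same idea on made-up data (the dataset names, the value 'Open', the numbers and the output column name Value are all assumptions for illustration, not the asker's real data), a single wide observation can be turned into one row per variable like this:

```sas
/* Hypothetical one-row input; names and values are invented for the example */
data work.demo_wide;
  length Status $8;
  Status = 'Open';
  CH7 = 12;
  CH11 = 7;
  PPE2 = 3;
run;

/* One output row per variable listed in VAR; BY carries Status onto every row */
proc transpose data=work.demo_wide
               out=work.demo_long(rename=(_name_=Category col1=Value));
  by Status;
  var CH7--PPE2;   /* positional list: every variable from CH7 through PPE2 */
run;

proc print data=work.demo_long;
run;
```

With this input the result should have three rows (CH7, CH11, PPE2) and three columns: Status, Category and Value. The transposed values are given a name other than Status here because the BY variable already occupies that name in the output.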

Related

SAS target encoding

Hi,
Can someone explain to me what the following code sequence does, step by step?
I have to describe in detail what happens at each step.
%macro frequency_encoding(dataset, var);
proc sql noprint;
create table freq as
select distinct(&var) as values, count(&var) as number
from &dataset
group by Values ;
create table new as select *, round(freq.number/count(&var),00.01) As freq_encode
from &dataset left join freq on &var=freq.values;
quit;
data new(drop=values number &var);
set new;
rename freq_encode=&var;
run;
data new;
set new;
keep &var;
run;
data dane(drop = &var);
set dane;
run;
data dane;
set dane;
set new;
run;
%mend;
The SQL is first finding the frequency of each value of the variable. Then it divides those counts by the total number of non-missing values and rounds that percentage to two decimal places (or integers when you think of the ratio as a percentage).
This could be done in one step with:
proc sql noprint;
create table new as
select *,round(number/count(&var),0.01) as freq_encode
from (select *,&var as values,count(&var) as number
from &dataset
group by &var
)
;
quit;
It is not clear what the DANE dataset is supposed to be. If &DATASET does not equal DANE then those last four data steps make no sense. If it does then it is a convoluted way to replace the original variable with the percentage.
The first one is basically trying to rename the calculated percentage as the original variable and eliminate the original variable and the other two intermediate variables used in calculating the percentage.
The second one is dropping all of the variables except the new percentage.
The third one is dropping the original variable from "dane".
The last one is adding the new variable back to "dane".
Assuming DANE should be replaced with &DATASET then those four data steps could be reduced to one:
data &dataset;
set &dataset(drop=&var);
set new(keep=freq_encode rename=(freq_encode=&var));
run;
It is probably best not to overwrite your original dataset in that way. So perhaps you should add an OUT parameter to your macro to name the new dataset you want to create.
You could have avoided all of those data steps by just adding the DROP= and RENAME= dataset options to the dataset generated by the SQL query.
So perhaps you want something like this:
%macro frequency_encoding(dataset, var,out);
proc sql noprint;
create table &out(drop=&var number rename=(freq_encode=&var)) as
select *,round(number/count(&var),0.01) as freq_encode
from (select *,count(&var) as number
from &dataset
group by &var
)
;
quit;
%mend ;
%frequency_encoding(sashelp.class,sex,work.class);
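To sanity-check the arithmetic described above, here is a small sketch that prints the count and the rounded share for each value of SEX in sashelp.class directly. It is only an illustration of the calculation, not a replacement for the macro; the column names number and freq_encode are reused from the code above.

```sas
/* Share of each SEX value among the non-missing values, rounded to 2 decimals */
proc sql;
  select sex,
         count(sex) as number,
         round(count(sex) / (select count(sex) from sashelp.class), 0.01)
           as freq_encode
  from sashelp.class
  group by sex;
quit;
```

sashelp.class has 9 girls and 10 boys, so this should report 0.47 for F (9/19) and 0.53 for M (10/19), which are the values the macro call above writes into SEX in work.class.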

load and combine all SAS dataset

I have multiple SAS datasets in a single location (folder), each with two columns, and the datasets are named Diagnosis_<diagnosis_name>.
Here I want to load all dataset and combine all together like below,
Sample data set
File Location: C:\Users\xyz\Desktop\diagnosis\Diagnosis_<diagnosis_name>.sas7bdat
1. Dataset Name : Diagnosis_Diabetes.sas7bdat
2. Dataset Name : Diagnosis_Obesity.sas7bdat
The output I expect looks like this
Could you please help me on this.
You can just combine the datasets using a SET statement. If you want all of the datasets with names that start with a constant prefix, you can use the : wildcard to make a name list.
First create a libref to reference the directory:
libname diag 'C:\Users\xyz\Desktop\diagnosis\';
Then combine the datasets. If the original datasets are sorted by person_id then you can add a BY statement and the result will also be sorted.
data tall;
set diag.diagnosis_: ;
by person_id;
run;
If you want to generate that wide dataset you could use PROC TRANSPOSE, but in that case you will need some extra variable to actually transpose.
data tall;
set diag.diagnosis_: ;
by person_id;
present=1;
run;
proc transpose data=tall out=want(drop=_name_);
by person_id;
id diagnosis;
var present;
run;

How can I make the first row of a SAS dataset the variable names?

I have an already imported dataset where the first row contains the variable names. I know that typically when importing a dataset you use getnames = yes. However, if the data is already imported how can I make the first row the variable names using a data step?
Data looks like:
A B C
1 Name 1 Name 2 Name 3
2 2 4 66
3 3 5 6
Since reading the names as data probably made all of your variables character you can try just transposing the data twice to fix it. That will work well for small datasets.
So the first transpose will place the current name into the _NAME_ variable and convert each row into a column. The second proc transpose can drop the original name and use the first row (new COL1 variable) as the names.
proc transpose data=have out=wide ;
var _all_;
run;
proc transpose data=wide(drop=_name_ rename=(col1=_name_)) out=want(drop=_name_ _label_);
var col:;
id _name_;
run;
The problem with the already imported data is that all the numeric data was likely placed in character variables, because the 'first row' of data seen by the import process contained some character data and drove the inference for automatic column construction.
Regardless, you will need to construct renaming pairs old-name=new-name for each variable that has to be renamed. The new-name being in row 1 makes it possible to transpose that row to arrange those name parts as data. SQL with :into and separated by can populate a macro variable for use in a proc datasets step that performs the column renaming without rewriting the entire data set. Finally, a DATA step with modify can remove a row in place, again without rewriting the entire data set.
filename sandbox temp;
data _null_;
file sandbox;
put 'A,B,C';
put 'Name 1, Name 2, Name 3';
put '2,4,66';
put '3,5,6';
run;
proc import datafile=sandbox dbms=csv replace out=work.oops;
run;
proc transpose data=oops(obs=1) out=renames;
var _all_;
run;
proc sql noprint;
select cats(_name_,"=",compress(col1,,"KN"))
into :renames separated by ' '
from renames;
%put NOTE: &=renames;
proc datasets nolist lib=work;
modify oops;
rename &renames;
run;
data oops;
modify oops;
remove;
stop;
run;
%let syslast=oops;

SAS sum variables using name after a proc transpose

I have a table with postings by category (a number) that I transposed. I got a table with each column name as _number for example _16, _881, _853 etc. (they aren't in order).
I need to do the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write all of the column names either. I tried this but it doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going from the first number to the last, and also from the number in the first position to the one in the last position. Either way it gives me a number that is not correct.
Any ideas?
Thanks!
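You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work. In PROC SQL, _815-_16 is parsed as a subtraction, so sum(_815-_16) adds up the differences between those two columns, which is why the number looks wrong.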
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
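proc transpose data=class out=transposed(drop=_name_); *drop the character variable _NAME_ so that the _: list below only matches the numeric columns;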
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *the automatic variables _N_ and _ERROR_ are not part of variable lists, so _: does not pick them up;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
              sum_height_
Obs    Sex    and_weight
 1      F       1356.3
 2      M       1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a null value (ie. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.
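For the underscore-prefixed columns in the question, the macro-variable approach might look like the sketch below. It rebuilds the WORK.TRANSPOSED example table from the first answer so that it runs on its own; the table name, the output column name total_height and the filter on TYPE are assumptions to adapt to your own data.

```sas
/* Recreate a transposed table with names such as _69 (as in the first answer) */
proc sort data=sashelp.class out=work.class;
  by sex;
run;

proc transpose data=work.class out=work.transposed(drop=_name_);
  by sex;
  id height;
  var height;
run;

/* Collect every numeric column whose name starts with an underscore */
proc sql noprint;
  select name into :varlist separated by ','
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'TRANSPOSED'
    and type = 'num'
    and substr(name, 1, 1) = '_';
quit;

/* Inner sum() is the row-wise SAS function, outer sum() is the SQL aggregate */
proc sql;
  select sum(sum(&varlist, 0)) as total_height
  from work.transposed;
quit;
```

Since each height lands in exactly one of the transposed columns, the result should equal the total of HEIGHT in sashelp.class, which makes it easy to check against a plain select sum(height) on the original table.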

SAS: concatenate different datasets while keeping the individual data table names

I'm trying to concatenate multiple datasets in SAS, and I'm looking for a way to store information about individual dataset names in the final stacked dataset.
For eg. initial data sets are "my_data_1", "abc" and "xyz", each with columns 'var_1' and 'var_2'.
I want to end up with "final" dataset with columns 'var_1', 'var_2' and 'var_3'. where 'var_3' contains values "my_data_1", "abc" or "xyz" depending on from which dataset a particular row came.
(I have a cludgy solution for doing this i.e. adding table name as an extra variable in all individual datasets. But I have around 100 tables to be stacked and I'm looking for an efficient way to do this.)
If you have SAS 9.2 or newer you have the INDSNAME option
http://support.sas.com/kb/34/513.html
So:
data final;
length datasetname $41; *long enough for any LIBREF.MEMBER name, 8 + 1 + 32 characters;
set my_data_1 abc xyz indsname=dsname;
datasetname=dsname;
run;
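Note that INDSNAME= returns the two-level name in upper case (for example WORK.MY_DATA_1). If you only want the member name, as in the question, a sketch along these lines should work. The three small input datasets are made up here so that the example runs on its own, and var_3 follows the naming used in the question.

```sas
/* Hypothetical inputs with the two columns described in the question */
data work.my_data_1; var_1 = 1; var_2 = 2; run;
data work.abc;       var_1 = 3; var_2 = 4; run;
data work.xyz;       var_1 = 5; var_2 = 6; run;

data work.final;
  length var_3 $32;                        /* a member name is at most 32 characters */
  set work.my_data_1 work.abc work.xyz indsname=source;
  var_3 = lowcase(scan(source, 2, '.'));   /* keep only the part after the libref */
run;
```

The INDSNAME= variable (source here) is temporary and is not written to the output, which is why its value is copied into var_3.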
Use the IN= dataset option when you set each dataset:
data final;
set my_data_1(in=a) abc(in=b) xyz(in=c);
if a then var_3='my_data_1';
else if b then var_3='abc';
else if c then var_3='xyz';
run;