I'm trying to concatenate multiple datasets in SAS, and I'm looking for a way to store information about individual dataset names in the final stacked dataset.
For example, the initial data sets are "my_data_1", "abc" and "xyz", each with columns 'var_1' and 'var_2'.
I want to end up with a "final" dataset with columns 'var_1', 'var_2' and 'var_3', where 'var_3' contains "my_data_1", "abc" or "xyz" depending on which dataset a particular row came from.
(I have a kludgy solution: adding the table name as an extra variable in each individual dataset. But I have around 100 tables to stack and I'm looking for an efficient way to do this.)
If you have SAS 9.2 or newer, you can use the INDSNAME= option of the SET statement:
http://support.sas.com/kb/34/513.html
So:
data final;
length datasetname $41; * long enough for the longest name including the library and dot;
set my_data_1 abc xyz indsname=dsname;
datasetname=dsname; * dsname is temporary, so copy it to a variable that is kept;
run;
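The INDSNAME= variable holds the two-level name in upper case (for example WORK.MY_DATA_1), while the question asks for values like "my_data_1". A minimal sketch of stripping the libref and lower-casing the name, assuming the three datasets from the question exist in one library; dsname is just an illustrative variable name:

```sas
data final;
  length var_3 $32;                         /* member names are at most 32 characters */
  set my_data_1 abc xyz indsname=dsname;
  /* dsname looks like WORK.MY_DATA_1: keep the part after the dot */
  var_3 = lowcase(scan(dsname, 2, '.'));
run;
```

Because the SET statement also accepts name lists, the same step should scale to the roughly 100 tables without any per-table code.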
Use the IN= data set option on each data set in the SET statement:
data final;
set my_data_1(in=a) abc(in=b) xyz(in=c);
if a then var_3='my_data_1';
else if b then var_3='abc';
else if c then var_3='xyz';
run;
I have a set of data which has multiple columns but only one observation.
I need to transpose the data to have multiple observations with 2 columns of data. The very first column in my data is Status. I want this to be the 2nd column of data, with all remaining columns' observations labeled in a column called 'Category'.
proc transpose data=RNAD_STG out=RNAD; by Status; run;
I want it to look like this.
I've transposed from Observation to Variable before but the reverse has me stuck. What can I do to achieve my desired output?
The log should state: NOTE: No variables to transpose.
Adding a VAR statement solves this issue. You can list all the variables explicitly, use a shortcut list, or use a special list such as _character_ for all character variables.
proc transpose data=RNAD_STG out=RNAD (rename=(col1=status _name_=category));
by Status;
var CH7--PPE2;
*var _character_;
Run;
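As a self-contained illustration of the VAR statement fix, here is a minimal sketch. The one-row input and its values are made up, and the output column name Value is hypothetical; only the names Status, CH7, PPE2 and Category come from the thread.

```sas
/* Hypothetical one-observation input: Status plus two character columns */
data rnad_stg;
  input Status $ CH7 $ PPE2 $;
  datalines;
Open Y N
;
run;

/* One output row per transposed variable; _NAME_ holds the original column name */
proc transpose data=rnad_stg out=rnad(rename=(_name_=Category col1=Value));
  by Status;
  var CH7--PPE2;   /* name-range list: every variable from CH7 through PPE2 */
run;
```

The result should have one row each for CH7 and PPE2, with Status repeated on both rows.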
I have multiple SAS datasets in a single folder, each with two columns, and the datasets are named Diagnosis_<diagnosis_name>.
I want to load all of the datasets and combine them as shown below.
Sample data set
File Location: C:\Users\xyz\Desktop\diagnosis\Diagnosis_<diagnosis_name>.sas7bdat
1. Dataset Name : Diagnosis_Diabetes.sas7bdat
2. Dataset Name : Diagnosis_Obesity.sas7bdat
The output I expect looks like this
Could you please help me with this?
You can just combine the datasets using a SET statement. If you want all of the datasets whose names start with a constant prefix, you can use the : wildcard to make a name list.
First create a libref to reference the directory:
libname diag 'C:\Users\xyz\Desktop\diagnosis\';
Then combine the datasets. If the original datasets are all sorted by the person ID variable, you can add a BY statement and the result will also be sorted.
data tall;
set diag.diagnosis_: ;
by person_id;
run;
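If the individual datasets do not already carry the diagnosis name, it can be derived from the dataset name with the INDSNAME= option. This is a sketch under assumptions: the libref diag from above, member names of the form Diagnosis_<diagnosis_name>, and two hypothetical names introduced here, src and source_diag.

```sas
data tall;
  length source_diag $32;
  set diag.diagnosis_: indsname=src;
  /* src looks like DIAG.DIAGNOSIS_DIABETES: take the member name,
     then skip the 10-character prefix "DIAGNOSIS_" */
  source_diag = substr(scan(src, 2, '.'), 11);
run;
```

The INDSNAME= value is upper case, so source_diag would contain values such as DIABETES and OBESITY.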
If you want to generate that wide dataset you could use PROC TRANSPOSE, but in that case you will need some extra variable to actually transpose.
data tall;
set diag.diagnosis_: ;
by person_id;
present=1;
run;
proc transpose data=tall out=want(drop=_name_);
by person_id;
id diagnosis;
var present;
run;
I have written SAS code that generates many SAS datasets. Now I want to append all of them to a single Excel file. So first I want to turn the column headers of each SAS dataset into its first observation, then leave space between the datasets (by adding a blank observation). How can we do it?
One way to do this would be to use dictionary.columns
proc sql;
create table Attribute as
select * from dictionary.columns;
quit;
Read through the table and check which attributes you are interested in. In your case that is probably the column NAME, which holds the names of all columns.
Restrict the table by adding a WHERE clause to the proc sql query that identifies the columns you need (which library, what type of file, which file name), e.g. where upcase(libname) = "WORK"
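A minimal sketch of such a filtered query. The member name MYDATA is a hypothetical placeholder; libname, memname, name, type, length and varnum are columns of dictionary.columns.

```sas
proc sql;
  create table attribute as
  select memname, name, type, length, varnum
  from dictionary.columns
  where libname = 'WORK'        /* library and member names are stored in upper case */
    and memname = 'MYDATA'      /* hypothetical dataset name */
  order by varnum;              /* keep the original column order */
quit;
```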
* n and length are placeholders: replace them with the number of columns and the name length;
data attribute;
array column [ n ] $ length ;
do i=1 to n;
set attribute ( keep = name) ;
column [ i ] = name ;
end;
run;
Then I would proceed with a data step. You could store the column names in macro variables with select ... into :, but either way you still need to hardcode the array size n, or use some other method that stores the values in one observation. Remember to define the length and type of the array accordingly. You can name the variables in the resulting dataset Attribute by adding var1-varn after the length in the array statement.
For simplicity I use a set statement to read the observations one by one and store the value of the column NAME (the column name reported by dictionary.columns) in the array.
Note that a non-temporary array creates variables in the output dataset.
If you want to add the blank row, add:
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
Because the two copies of the dataset start at different observations and column names are not duplicated within one dataset, the row after each valid observation (written by the first output statement) is empty, due to call missing(of _all_); output;
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
Ensure that all your columns are character (or numeric, substitute numeric), then in your data step use:
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;
I have a SAS dataset with columns shiyas1, shiyas2 and shiyas3, plus some other columns. I want to concatenate all the columns whose header contains shiyas.
We can't use cats(shiyas1,shiyas2,shiyas3) because similar datasets have columns up to shiyas10. As I am writing generic SAS code, we cannot use cats(shiyas1,shiyas2 .... shiyas10) either.
So how can we do this?
When I tried to use cats(shiyas1,shiyas2 .... shiyas10), even though my dataset only has columns up to shiyas3, it created columns shiyas4 to shiyas10 filled with missing values (.).
So one solution is to concatenate only the shiyas columns the dataset actually has, or to delete the unnecessary shiyas columns afterwards.
Please help me.
Use a variable list.
data have;
input (shiyas1-shiyas3) (:$1.);
cards;
1 2 3
;
data want;
set have;
length cat_shiyas $ 100; /* large enough to hold the content */
cat_shiyas=cats(of shiyas:);
run;
Use the OF keyword (which lets a function take a variable list, similar to passing an array) with the : prefix wildcard. This concatenates all columns whose names begin with 'shiyas':
cats(of shiyas:)
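The same prefix list works with the other concatenation functions. As a small sketch, CATX inserts a delimiter between the values; the dataset and the result variable joined are illustrative only.

```sas
data have;
  input (shiyas1-shiyas3) (:$1.);
  cards;
1 2 3
;

data want;
  set have;
  length joined $ 100;              /* large enough for all values plus delimiters */
  joined = catx('-', of shiyas:);   /* gives 1-2-3 for the row above */
run;
```

Note that the prefix list picks up every variable already defined whose name starts with shiyas, so the result variable should not itself start with that prefix.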
I have a significant problem with proc compare. I have two datasets, each with two columns. One column lists table names and the other lists the names of the variables that belong to the tables in the first column. I want to compare the values of the second column based on the values of the first. I somewhat made it work, but the datasets have different sizes because one of them has additional values, which means that a new variable was added in the middle of a dataset (a new variable was added to a table). Unfortunately, proc compare matches the two datasets row by row and checks the values against each other, so in my case it looks like this:
ds 1 | ds 2
cost | box_nr
other | cost_total
As you can see, a new value box_nr was added to the second dataset that appears above the value that I want it to compare variable cost to (cost_total). So I would like to know if it's possible to compare values (check for differences in character sequence) that have at least minimal similarity - for example 3 letters (cos) or if it's possible to just put values like box_nr at the end suggesting that they don't appear in a certain dataset.
My code:
PROC Compare base=USERSPDS.MIzew compare=USERSPDS.MIwew
out=USERSPDS.result outbase outcomp outdif noprint;
id 'TABLE HD'n;
where ;
run;
proc print data=USERSPDS.result noobs;
by 'TABLE HD'n;
id 'TABLE HD'n;
title 'COMPARISON:';
run;
Untested, but this should get you some of the way.
proc sql;
create table compare as
select
coalesce(a.cola, b.cola) as cola,
a.colb as acolb,
b.colb as bcolb
from dataa as a
full outer join datab as b
on
a.cola = b.cola and
compged(a.colb, b.colb) <= 100;
quit;
Have a look at the compged documentation for further information.
Sounds like you could make a new variable in both datasets, VAR3chars=substr(var,1,3) and then add that variable to your ID statement. I think that should work unless there are duplicate values.
So if one dataset had var="cost" and the other had var="cost_total", they would match on the id so they would be compared and found to be different.
If one dataset had var="box_nr" and the other did not have any values starting with "box", they would not match on the id so compare would find that a record exists for that id in one dataset but not the other.
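A minimal sketch of that idea on made-up data. The dataset names ds1 and ds2, the variable names table_hd, varname and key3, and the sample rows are all hypothetical stand-ins for the real tables.

```sas
data ds1;
  length table_hd $8 varname $32 key3 $3;
  input table_hd $ varname $;
  key3 = substr(varname, 1, 3);   /* first three characters act as the match key */
  datalines;
T1 cost
T1 other
;

data ds2;
  length table_hd $8 varname $32 key3 $3;
  input table_hd $ varname $;
  key3 = substr(varname, 1, 3);
  datalines;
T1 box_nr
T1 cost_total
;

/* PROC COMPARE with an ID statement expects both inputs sorted by the ID variables */
proc sort data=ds1; by table_hd key3; run;
proc sort data=ds2; by table_hd key3; run;

proc compare base=ds1 compare=ds2 out=result outbase outcomp outnoequal noprint;
  id table_hd key3;
run;
```

Here cost and cost_total share the key "cos" and are compared with each other, while other and box_nr have no partner in the opposite dataset. As the answer notes, this breaks down if two variables of the same table share their first three letters.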