Combining SAS data sets with different numbers of columns

I am having a problem combining two tables with different numbers of columns.
Say my first table is table1:
table1
t1_col_1 t1_col_2 t1_col_3 ... t1_col_13
and my second table is table2:
table2
t2_col_1 t2_col2 t2_col3 t2_col4
Now if I run:
data table3;
set table1 table2;
run;
What will the output table3 be?
The SAS documentation says this command performs a concatenation:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001107839.htm
Since the numbers of columns differ, I expect the concatenation to cause a problem.
So how exactly does this command work, and what will its output be in this case?

Appending (concatenating) two or more data sets is basically just stacking the data sets together, with values in variables of the same name being stacked together. Variables that are unique to one data set form their own variables in the new combined data set. In your case the data sets have different numbers of variables. This article explains how concatenation works between data sets with different variables: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001312944.htm
For example, suppose we have:
data work.table1;
input col1 $ col2 col3 col4;
datalines;
George 10 10 10
Lucy 10 10 10
;
run;
data work.table2;
input col1 $ col2;
datalines;
Shane 3
Peter 3
;
run;
data work.table3;
set table1 table2;
run;
OUTPUT:
col1    col2  col3  col4
George  10    10    10
Lucy    10    10    10
Shane   3     .     .     <== These entries are
Peter   3     .     .         missing.
col1 and col2 are present in both sets, so the values inside them are stacked, with the rows of table1 first and the rows of table2 after them in their original order. col3 and col4 are only present in table1, so they are missing for the rows that come from table2.
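The question's tables share no column names at all, so here is a minimal sketch of that case. The data set names t1, t2, t3 and the variables a, b, c, d are made up for illustration. Every variable is unique to one data set, so the combined data set should end up with all four columns, and each row is missing in the columns it did not bring along.

```sas
/* Hypothetical data sets with no variable names in common */
data work.t1;
  input a b;
  datalines;
1 2
3 4
;
run;

data work.t2;
  input c $ d;
  datalines;
x 5
y 6
;
run;

/* Concatenate: rows of t1 first, then rows of t2 */
data work.t3;
  set work.t1 work.t2;
run;

/* t3 has variables a, b, c, d; a and b are missing on the t2 rows,
   c and d are missing on the t1 rows */
proc print data=work.t3;
run;
```

One thing to watch for: if a variable with the same name is numeric in one data set and character in the other, the SET statement stops with an error instead of stacking the values.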

Related

How do I avoid spaces/tabs in columns names when I use proc transpose?

The best way to illustrate my problem is by giving an example:
Data tst; input ColA $ ColB; datalines;
Cat1 1
Cat2 2
Cat3 3
; run;
proc transpose data = tst out= tst_out (drop = _name_); id ColA;
run;
When running this code my column names come out with leading spaces/tabs.
Basically I want the column names to be "Cat1", "Cat2", "Cat3" and not " Cat1", " Cat2", " Cat3".
(If that is not possible then I have an alternative question: How do I remove the spaces AFTER proc transpose? In my real data set I have a lot of columns so I prefer a method where I don't have to type for every column)
Just change the setting of the VALIDVARNAME option to V7 instead of ANY. It won't remove the leading spaces/tabs, but it will change them to underscores so that the results are valid names.
Example:
data tst;
input ColA $& ColB;
datalines;
Cat 1  1
Cat 2  2
Cat 3  3
;
options validvarname=v7;
proc transpose data=tst out=tst2; id cola ; var colb; run;
proc print;
run;
Result:
Obs _NAME_ Cat_1 Cat_2 Cat_3
1 ColB 1 2 3
PS When using in-line data in your SAS program, make sure to start the lines of data in the first column. That will prevent the accidental inclusion of spaces (or tabs when using the SAS Studio interface) in the lines of data. Placing the DATALINES (also known as CARDS) statement starting in the first column will also prevent the editor from automatically indenting when you start adding lines of data.
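On the alternative question (getting rid of the whitespace without typing every column name), one possible approach is to clean the ID variable before PROC TRANSPOSE rather than renaming the columns afterwards. This is only a sketch against the tst data set from the question; the intermediate data set name tst_clean is made up.

```sas
/* Clean the ID values before transposing (tst_clean is a hypothetical name) */
data tst_clean;
  set tst;
  /* compress() removes tab characters ('09'x); strip() then removes
     leading and trailing blanks */
  ColA = strip(compress(ColA, '09'x));
run;

proc transpose data=tst_clean out=tst_out (drop=_name_);
  id ColA;
  var ColB;
run;
```

Because the cleanup happens on the single ID variable, it does not matter how many columns the transposed data set ends up with.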

How to transpose my data on sas by observation on data step

I have a SAS data set with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks for your help, I'm in my second week of using SAS <3
Edit: thanks for reminding me that I was missing a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: the colon means every variable whose name begins with date; compare with SQL 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
quit;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed by using an additional data view, or an output procedure that lets you specify the desired order.
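Since the title asks about doing this in a data step, here is a sketch of that route as an alternative to PROC TRANSPOSE. It assumes the begin data set from the first answer, where the three date columns were read as character values, and writes one output row per date. The output name want_ds is made up.

```sas
/* Data step alternative: write three rows for every input row */
data want_ds;
  set begin;
  length Date $ 8 Datetype $ 8;
  Date = birthday; Datetype = 'birthday'; output;
  Date = Date1;    Datetype = '1';        output;
  Date = Date2;    Datetype = '2';        output;
  keep id Date Datetype;
run;
```

This is fine for three columns; with many date columns, an array loop or PROC TRANSPOSE with the date: prefix scales better.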

Merging but keeping all observations?

I have three data sets of inpatient, outpatient, and professional claims. I want to find the number of unique people who have a claim related to tobacco use (1=yes tobacco, 0=no tobacco) in ANY of these three data sets.
Therefore, the data sets pretty much are all:
data inpatient;
input Patient_ID Tobacco;
datalines;
1 0
2 1
3 1
4 1
5 0
;
run;
I am trying to merge the inpatient, outpatient, and professional so that I am left with those patient ids that have a tobacco claim in any of the three data sets using:
data tobaccoall;
merge inpatient outpatient professional;
by Patient_ID;
run;
However, it is overwriting some of the 1's with 0's in the new data set. How do I better merge the data sets to find if the patient has a claim in ANY of the datasets?
When you merge data sets in SAS that share variable names, the values from the data set listed furthest to the right in the merge statement overwrite the values from the data sets to its left. In order to keep each value, you'd want to rename the variables before merging. You can do this in the merge statement by adding a rename= option after each data set.
If you want a single variable that represents whether a tobacco claim exists in any of the three variables, you could create a new variable using the max function to combine the three different values.
data tobaccoall;
merge inpatient (rename=(tobacco=tobacco_in))
outpatient (rename=(tobacco=tobacco_out))
professional (rename=(tobacco=tobacco_pro));
by Patient_ID;
tobacco_any = max(tobacco_in,tobacco_out,tobacco_pro,0);
run;
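A different way to get the same flag, sketched here only as an alternative to the merge, is to stack the three data sets and take the maximum per patient. It assumes outpatient and professional have the same Patient_ID and Tobacco variables as the inpatient sample; the names stacked and tobacco_by_patient are made up.

```sas
/* Stack all claims, one row per patient per source data set */
data stacked;
  set inpatient outpatient professional;
run;

/* The maximum of a 0/1 flag is 1 if any row for the patient is 1 */
proc sql;
  create table tobacco_by_patient as
  select Patient_ID, max(Tobacco) as tobacco_any
  from stacked
  group by Patient_ID;
quit;
```

This does not need the inputs sorted, and a patient who appears in only one of the three data sets is still kept.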
If your data were 1=has .=doesn't have (missing), then you could use the UPDATE statement, which mostly works like Merge except it wouldn't overwrite nonmissing data with missing.
For example:
data inpatient;
input Patient_ID Tobacco;
datalines;
1 .
2 1
3 1
4 1
5 .
;
run;
data outpatient;
input Patient_ID Tobacco;
datalines;
1 1
2 1
3 .
4 .
5 .
;
run;
data want;
update inpatient outpatient;
by patient_id;
run;

outputting multiple data sets into excel workbook

Another question. I have multiple data sets that generate output. How can I output these into one Excel worksheet and apply my own formatting? For example, I have data set 1, data set 2, data set 3.
Each data set has two columns, for example
Col 1 Col 2
1 2
3 4
5 6
I want all of the data sets to be in one worksheet, separated by a blank column, so in Excel it should look like
Col 1 Col 2 Blank Col Col 1 Col 2 Blank Col
Someone told me I need to look at DDE for this. Is this true?
Regards,
You can definitely do it using DDE. What DDE does is simulate a user's clicks on Excel's menus, buttons, cells, etc. Here's an example of how you can do that with a macro loop for 3 datasets named have1, have2 and have3. If you need a more general solution (unknown number of datasets, varying numbers of variables, arbitrary dataset names, etc.), the code will have to be updated, but its 'DDE part' will be essentially the same.
One more assumption: your Excel workbook should be open during code execution. That can also be automated, since Excel can be started and the file opened using DDE itself.
You can find a very nice introduction to DDE here, where all these tricks are discussed in detail.
data have1;
input Col1 Col2;
datalines;
1 2
3 4
5 6
;
run;
data have2;
input Col1 Col2;
datalines;
1 2
3 4
5 6
7 8
;
run;
data have3;
input Col1 Col2;
datalines;
1 2
3 4
7 8
5 6
9 10
;
run;
%macro xlsout;
/*iterating through your datasets*/
%do i=1 %to 3;
/*determine number of records in the current dataset*/
proc sql noprint;
select count(*) into :noobs
from have&i;
quit;
/*assign a range on the workbook spreadsheet matching to data in the current dataset*/
filename range dde "excel|[myworkbook.xls]sas!r1c%eval((&i-1)*3+1):r%left(&noobs)c%eval((&i-1)*3+2)" notab;
/*put data into selected range*/
data _null_;
set have&i;
file range;
put Col1 '09'x Col2;
run;
%end;
%mend xlsout;
%xlsout
You cannot do exactly this with SAS (DDE is probably possible). I would suggest looking at SaviCells Pro.
http://www.sascommunity.org/wiki/SaviCells
http://www.savian.net/utilities.html
You could likely accomplish what you're asking through ODS TAGSETS.EXCELXP or the new ODS EXCEL (9.4 TS1M1). You would need to arrange the datasets ahead of time (ie, merge them together or transpose or whatnot to get one dataset with the right columns), however, or else use PROC REPORT or some other procedure to get them in the right format.
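As a rough sketch of the ODS EXCEL route, assuming SAS 9.4 TS1M1 or later and the have1 to have3 data sets from the DDE answer: put the data sets side by side first, then print the combined data set to one sheet. The names side_by_side, blank1, blank2, the renamed columns, and the output file name are all made up for illustration.

```sas
/* Pair observations by position (no BY statement) and rename the
   columns so the three data sets do not overwrite each other */
data side_by_side;
  merge have1 (rename=(Col1=a_Col1 Col2=a_Col2))
        have2 (rename=(Col1=b_Col1 Col2=b_Col2))
        have3 (rename=(Col1=c_Col1 Col2=c_Col2));
  length blank1 blank2 $ 1;
  blank1 = ' ';   /* empty spacer columns */
  blank2 = ' ';
run;

/* Hypothetical output file; it is written to the current directory */
ods excel file="myworkbook.xlsx" options(sheet_name="sas");

proc print data=side_by_side noobs;
  /* the VAR statement controls the left-to-right column order */
  var a_Col1 a_Col2 blank1 b_Col1 b_Col2 blank2 c_Col1 c_Col2;
run;

ods excel close;
```

The spacer columns will still show their variable names as headers; PROC REPORT or labels would be needed to tidy that up.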

SAS Merge with column filters

I have the two datasets below and need the third one as output.
ONE         TWO
----------- ------------------
ID          ID  TAG  VALUE
1           1   Y    1000
2           2   N    2000
3
OUTPUT
------------
ID  TAG  VALUE
1   Y    1000
2   .    .
3   .    .
The merge should happen only if TAG = 'Y' in the TWO dataset.
I also need all the rows from the ONE dataset.
Can this be done using SAS MERGE?
data output;
merge one (in=a)
two (in=b where=(tag = 'Y'));
by id;
if a;
run;
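To try the merge above, the two inputs from the question can be created like this. This is just a sketch of the sample data as posted, with TAG read as a character variable.

```sas
/* Sample inputs from the question; both are already in ID order,
   which the BY statement in the merge requires */
data one;
  input id;
  datalines;
1
2
3
;
run;

data two;
  input id tag $ value;
  datalines;
1 Y 1000
2 N 2000
;
run;
```

The where= option drops the ID 2 row of TWO before the merge, and "if a;" keeps every ID from ONE, so IDs 2 and 3 should come out with a blank TAG and a missing VALUE.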