Merging all columns in a SAS dataset that have "shiyas" in the header - sas

I have a SAS dataset with columns shiyas1, shiyas2, shiyas3 in it. That dataset has some other columns as well. I want to combine all the columns whose header contains shiyas.
We can't use cats(shiyas1,shiyas2,shiyas3) because similar datasets have columns up to shiyas10. As I am writing general-purpose SAS code, we cannot hard-code cats(shiyas1,shiyas2 .... shiyas10).
So how can we do this?
When I tried to use cats(shiyas1,shiyas2 .... shiyas10), even though my dataset only has columns up to shiyas3, it created columns shiyas4 to shiyas10 filled with . (missing).
So one solution is to combine only the shiyas columns the dataset actually has, or to delete the unnecessary shiyas columns afterwards.
Please help.

Use a variable list.
data have;
input (shiyas1-shiyas3) (:$1.);
cards;
1 2 3
;
data want;
set have;
length cat_shiyas $ 100 /*large enough to hold the content*/
;
cat_shiyas=cats(of shiyas:);
run;

Use the OF keyword (which lets you pass a variable list to a function and so read across a row, similar to arrays) with the : prefix wildcard. This will concatenate all columns whose names begin with 'shiyas':
cats(of shiyas:)
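As a minimal sketch of the same idea with a separator, the step below uses CATX instead of CATS. The extra column other and the result name all_shiyas are made up for illustration. A prefix list such as shiyas: only picks up variables that already exist at that point in the step, so the result variable is given a name that does not start with shiyas.

```sas
/* Hypothetical sample data: three shiyas columns plus one unrelated column */
data have;
  input (shiyas1-shiyas3) (:$1.) other $;
  cards;
1 2 3 x
;
run;

data want;
  set have;
  /* Result name deliberately does not start with "shiyas" */
  length all_shiyas $100;
  /* CATX strips blanks and joins the values with the given delimiter */
  all_shiyas = catx('-', of shiyas:);
run;
```

The same step works unchanged on a dataset that has shiyas1 to shiyas10, because the list is resolved from the variables actually present.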

Related

SAS Transpose Variable to Observations

I have a set of data which has multiple columns but only one observation.
I need to transpose the data to have multiple observations with 2 columns of data. The very first column in my data is the Status. I want this to be the 2nd column of data, and all remaining columns' observations labeled in a column called 'Category'.
Proc transpose data=RNAD_STG out=RNAD;by Status; Run;
I want it to look like this.
I've transposed from Observation to Variable before but the reverse has me stuck. What can I do to achieve my desired output?
The log should state: NOTE: No variables to transpose.
Adding in a VAR statement solves this issue, either listing all variables, or a shortcut list or a wildcard list for all character variables.
Proc transpose data=RNAD_STG out=RNAD (rename=(col1=status _name_=category));
by Status;
var CH7--PPE2;
*var _character_;
Run;

Row-wise operation for subset of columns

I have the following data:
data df;
input id $ d1 d2 d3;
datalines;
a . 2 3
b . . .
c 1 . 3
d . . .
;
run;
I want to apply some transformation/operation across a subset of columns. In this case, that means dropping all rows where columns prefixed with d are all missing/null.
Here's one way I accomplished this, taking heavy influence from this SO post.
First, sum all numeric columns, row-wise.
data df_total;
set df;
total = sum(of _numeric_);
run;
Next, drop all rows where total is missing/null.
data df_final;
set df_total;
where total is not missing;
run;
Which gives me the output I wanted:
a . 2 3
c 1 . 3
My issue, however, is that this approach assumes that there's only one "primary-key" column (id, in this case) and everything else is numeric and should be considered as a part of this sum(of _numeric_) is not missing logic.
In reality, I have a diverse array of other columns in the original dataset, df, and it's not feasible to simply drop all of them, writing all of that out. I know the columns for which I want to run this "test" all are prefixed with d (and more specifically, match the pattern d<mm><dd>).
How can I extend this approach to a particular subset of columns?
Use a different shortcut reference. Since you know the variables all start with D, use the D: prefix list:
total = sum( of D:);
if n(of D:) = 0 then delete;
The first line adds up the variables that start with D, which must all be numeric. If there are other variables starting with D that you want to exclude, that's problematic.
Since they are numeric, you can also use the N() function instead, as in the second line; it counts the non-missing values among its arguments. In general though, SAS excludes observations with missing values automatically in most procs such as REG/GLM (not in a data step, obviously).
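Put together as a complete step, the N() approach might look like the sketch below. It rebuilds the df dataset from the question so it can run on its own, and it assumes that every variable starting with d is numeric and belongs to the test.

```sas
/* Sample data copied from the question */
data df;
  input id $ d1 d2 d3;
  datalines;
a . 2 3
b . . .
c 1 . 3
d . . .
;
run;

data df_final;
  set df;
  /* n() counts the non-missing numeric arguments;       */
  /* "of d:" expands to every variable starting with d   */
  if n(of d:) = 0 then delete;
run;
```

With this data, the rows with id b and d are deleted because all of d1, d2 and d3 are missing, and other columns in the dataset play no part in the test.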
If that doesn't work for some reason, you can query the list of variables from the sashelp.vcolumn view.
proc sql noprint;
select name into :var_list separated by ", " from sashelp.vcolumn
where libname='WORK' and memname='DF' and upcase(name) like 'D%';
quit;
data df_final;
set df;
if n(&var_list.)=0 then delete;
run;

extracting a list of observations from a single sas cell

I have a sas dataset that has a list of variables embedded within a single character variable, delimited by pipes. It looks something like this:
Obs. List_of_forms
1,"|FormA(04-15-2003)||FormB(04-15-2004)|",
2,"|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|"
I would like to extract each of the items delimited by pipes as individual variables, so the data would look something like this:
Obs., form1, form2, form3
1,"FormA(04-15-2003)","FormB(04-15-2004)",.,
2,"FormA(04-15-2002)","FormA(04-15-2003)","FormB(04-15-2003)"
But I'm at a loss for how to do this. I've thought about coding a do-loop to iterate through each pipe, but this seems needlessly complex. Any advice for a more elegant solution?
Use the SCAN() function. First we can set up your example data.
data have ;
obs+1;
input list_of_forms $60. ;
cards;
|FormA(04-15-2003)||FormB(04-15-2004)|
|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|
;;;;
Now we can convert it to multiple columns.
data want;
set have ;
array form (3) $60 ;
do i=1 to dim(form);
form(i) = scan(list_of_forms,i,'|');
end;
drop i;
run;
To make it more dynamic you could find the maximum number of values over the whole dataset and replace the hard coded upper bound of 3 on the new variables.
proc sql noprint ;
select max(countw(list_of_forms,'|'))
into :nforms
from have
;
quit;
...
array form (&nforms) $60 ;
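For reference, the two pieces could be combined roughly as follows. This is a sketch that assumes the have dataset built earlier in this answer. The TRIMMED keyword, which strips the leading blanks from the macro variable, is an addition here and needs SAS 9.3 or later.

```sas
/* Find the largest number of pipe-delimited items in any row */
proc sql noprint;
  select max(countw(list_of_forms, '|'))
    into :nforms trimmed
    from have;
quit;

data want;
  set have;
  /* One new variable per item: form1, form2, ... */
  array form (&nforms) $60;
  do i = 1 to dim(form);
    /* SCAN treats consecutive pipes as a single delimiter by default */
    form(i) = scan(list_of_forms, i, '|');
  end;
  drop i;
run;
```

Rows with fewer items than the maximum simply leave the trailing form variables blank.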

SAS: concatenate different datasets while keeping the individual data table names

I'm trying to concatenate multiple datasets in SAS, and I'm looking for a way to store information about individual dataset names in the final stacked dataset.
For example, the initial data sets are "my_data_1", "abc" and "xyz", each with columns 'var_1' and 'var_2'.
I want to end up with a "final" dataset with columns 'var_1', 'var_2' and 'var_3', where 'var_3' contains the value "my_data_1", "abc" or "xyz" depending on which dataset a particular row came from.
(I have a kludgy solution for doing this, i.e. adding the table name as an extra variable in all the individual datasets. But I have around 100 tables to stack and I'm looking for an efficient way to do this.)
If you have SAS 9.2 or newer, you can use the INDSNAME= option of the SET statement:
http://support.sas.com/kb/34/513.html
So:
data final;
format dsname datasetname $20.; *something equal to or longer than the longest dataset name including the library and dot;
set my_data_1 abc xyz indsname=dsname;
datasetname=dsname;
run;
Use the IN= data set option when you set each data set:
data final;
set my_data_1(in=a) abc(in=b) xyz(in=c);
if a then var_3='my_data_1';
if b then var_3='abc';
if c then var_3='xyz';
run;
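One detail worth knowing about INDSNAME= is that the variable holds the full two-level name in upper case, for example WORK.MY_DATA_1. If only the table name is wanted in var_3, something like the sketch below could be used. The three tiny input datasets are made up so that the step can run on its own.

```sas
/* Hypothetical one-row input tables */
data my_data_1; var_1 = 1; var_2 = 2; run;
data abc;       var_1 = 3; var_2 = 4; run;
data xyz;       var_1 = 5; var_2 = 6; run;

data final;
  length var_3 $32;                    /* long enough for any member name */
  set my_data_1 abc xyz indsname=dsname;
  /* dsname looks like WORK.ABC; keep the part after the dot, in lower case */
  var_3 = lowcase(scan(dsname, 2, '.'));
run;
```

Unlike the IN= approach, this needs no extra IF statement per table, which matters when around 100 tables are stacked.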

How to read a single date column in SAS?

What's wrong with the below SAS code? The single date column cannot be read correctly.
DATA test;
INPUT mydate MMDDYY8.;
FORMAT mydate YYMMDD10.;
DATALINES;
01-22-98
03-03-97
;
PROC PRINT DATA = test;
RUN;
Edit: Thanks for the answer. A follow-up question: when I try to read a CSV file where the datetime values are quoted, they always fail to read correctly. How can I read CSV data with quoted datetime values correctly? The DSD option doesn't help much in my case.
Try left-aligning the datalines.
SAS is a free-format language: a statement can start in any column, one statement can span multiple lines, and multiple statements can be on one line.
The data lines that follow the DATALINES statement are different, because their column positions matter. The formatted input MMDDYY8. reads a fixed eight columns starting at the current pointer position, which is column 1 at the start of each line. If the values are indented, those eight columns contain leading blanks and only part of the date (or nothing but blanks), so the date is read incorrectly or as missing.
Hence the mistake in your code is that the data lines are indented instead of starting in column 1.
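If the data lines cannot be left-aligned, another option is modified list input. This is a sketch rather than part of the original answer: the colon modifier makes SAS skip leading blanks and read up to the next blank, using the informat only to interpret the value.

```sas
data test;
  /* ":" switches to list input, so leading blanks are skipped */
  input mydate :mmddyy8.;
  format mydate yymmdd10.;
  datalines;
   01-22-98
     03-03-97
;
run;

proc print data=test;
run;
```

The century assigned to two-digit years such as 98 depends on the YEARCUTOFF= system option.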