Converting multiple treatment records to one row - sas

So I have this dataset. I need to concatenate all the con med treatments (CMTRT) horizontally so that each result for each day has only one row. My plan is to use PROC TRANSPOSE with the prefix CMTRT so that I get cmtrt_1, cmtrt_2, etc., and then use CATX in a following data step to combine them into one variable. Is that right, or is there a better way to do this?

Assuming you need the last value for RELCM, the following should give you the expected output.
proc sort data=have out=stage1; by subjid paramcd avalc avisit; run;
data stage2;
set stage1;
by subjid paramcd avalc avisit;
if first.avisit then
group_number+1;
run;
data want;
set stage2;
by subjid paramcd avalc avisit group_number;
retain _cmtrt;
*-- set large enough to accommodate the maximum number of records --*;
length _cmtrt $2000;
if first.group_number then
_cmtrt='';
_cmtrt=catx(', ', _cmtrt, cmtrt);
if last.group_number then
output;
rename _cmtrt=cmtrt;
drop cmtrt group_number;
run;
Another method is the DoW loop, as pointed out in @Richard's comment.
proc sort data=have out=stage1; by subjid paramcd avalc avisit; run;
data want(rename=(_cmtrt=cmtrt));
do _n_ = 1 by 1 until (last.avisit);
set stage1;
by subjid paramcd avalc avisit;
length _cmtrt $2000;
_cmtrt = catx(', ', _cmtrt, cmtrt);
end;
drop cmtrt;
run;
Assuming you don't need RELCM in your output dataset.
proc sort data=have out=stage1;
by subjid paramcd avalc avisit;
run;
proc transpose data=stage1 out=stage2;
by subjid paramcd avalc avisit;
var cmtrt;
run;
data want;
set stage2;
length cmtrt $2000;
cmtrt=catx(', ' , of col:);
drop _name_ col:;
run;

Related

Overlay the average trend on group by trends using Proc sgplot

I want to create a line graph that includes the overall trend of a disease rate and the specific trends for males and females. I use the following code to create the group-by trends. How can I add the average trend to this line graph? Thanks for your help.
proc sgplot data=have ;
vline year/response=disease_rate group=sex stat=mean datalabel=disease_rate ;
yaxis values=(0,1) label="Percentage";
run;
Here's an example of summarizing the data first and then displaying it on the graph. There is more than one way to do this, though; this is just one.
data have;
set sashelp.heart(in=a);
year=round(2021-ageAtStart, 10);
disease_rate= status="Dead";
run;
proc means data=have mean noprint;
class sex year;
types sex sex*year;
var disease_rate;
output out=summary_stats mean=average_value;
run;
proc sort data=summary_stats;
by sex year;
run;
data graph_data;
/* _TYPE_=2 is the SEX-only mean, _TYPE_=3 is the SEX*YEAR mean */
merge summary_stats(where=(_type_=2) rename=average_value=mean_sex)
summary_stats(where=(_type_=3) rename=average_value=mean_sex_year);
by sex;
format mean_sex: percent12.1;
run;
proc sgplot data=graph_data ;
*where year > 1990;
vline year/response=mean_sex_year group=sex stat=mean datalabel=mean_sex_year ;
vline year/response=mean_sex group=sex stat=mean datalabel=mean_sex ;
run;
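A shorter alternative, not taken from either answer and offered only as a sketch: because VLINE computes the statistic itself, a second VLINE on the same data without GROUP= should plot the overall mean per year next to the per-sex means, with no pre-summarizing. The HAVE data is built from SASHELP.HEART the same way as in the answer above; the LEGENDLABEL= text and the LINEATTRS= styling are arbitrary choices.

```sas
/* Sample data, built the same way as in the answer above */
data have;
  set sashelp.heart;
  year = round(2021 - ageAtStart, 10);
  disease_rate = (status = "Dead");
run;

proc sgplot data=have;
  /* One line per sex: mean of DISEASE_RATE within each YEAR */
  vline year / response=disease_rate group=sex stat=mean;
  /* No GROUP= here, so this line is the overall mean across both sexes */
  vline year / response=disease_rate stat=mean
               legendlabel="Overall" lineattrs=(pattern=dash thickness=2);
  yaxis values=(0 to 1 by 0.2) label="Percentage";
run;
```

The dashed line is the combined trend; the solid lines are the same per-sex trends as in the question's original plot.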
Use series instead of vline so that you can overlay a regression on top of it to get an average trend line. For example:
proc sql;
create table have as
select date
, region
, sum(sale) as sale
from sashelp.pricedata
group by region, date
order by region, date
;
quit;
proc sgplot data=have;
series x=date y=sale / group=region;
reg x=date y=sale / group=region;
xaxis fitpolicy=rotatethin;
run;

All values for only most recent occurrence

I am trying to extract all the Time occurrences for only the most recent visit. Can someone help me with the code, please?
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var time;
run;
data want;
set temp1;
by name date;
if last.name;
length value $2000;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
As with a very similar question from another user yesterday, there are quite a few possible solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though DATE and TIME are not in the GROUP BY clause, SAS lets you put them in the SELECT list: it computes MAX(DATE) for each NAME, remerges that summary statistic back onto the original rows of HAVE (the log notes that the query requires remerging summary statistics back with the original data), and the HAVING clause then keeps every row whose DATE equals its group's maximum, so multiple rows are returned as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
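A minimal sketch of that table-then-merge idea, assuming the data set is named HAVE with the NAME, DATE and TIME columns from the question (the table name MAX_DATES and the flag IS_MAX are illustrative). TIME is read as character here only to keep leading zeros such as 0805; that is an assumption about the real data.

```sas
/* Sample data from the question; TIME read as text to keep leading zeros */
data have;
  input name $ date time $;
  datalines;
Bob 2017090 1305
Bob 2017090 1015
Bob 2017081 0810
Bob 2017072 0602
Tom 2017090 1300
Tom 2017090 1010
Tom 2017090 0805
Tom 2017072 0607
Joe 2017085 1309
Joe 2017081 0815
;
run;

/* One row per NAME holding its most recent DATE */
proc sql;
  create table max_dates as
  select name, max(date) as date
  from have
  group by name;
quit;

/* MERGE needs both inputs sorted by the BY variables */
proc sort data=have;
  by name date;
run;
proc sort data=max_dates;
  by name date;
run;

/* Keep only the HAVE rows whose NAME/DATE pair appears in MAX_DATES */
data recent;
  merge have max_dates(in=is_max);
  by name date;
  if is_max;
run;
```

Like the SQL, this still returns one row per time value (two for Bob, three for Tom, one for Joe); collapsing them into one row is a separate step.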
The DoW loop also appeals here. It is essentially the SAS data step equivalent of the SQL above: first iterate over each NAME group to find the maximum date, then iterate over the same group again and output the rows with that date.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course, here you can also collapse the rows more easily:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;

Can't create a subset based on date

I have the following dataset and code:
DATA survey;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
PROC PRINT; RUN;
data work;
set survey;
where '11JAN2007'<= order_date <= '13JAN2007';
proc print data=work;
run;
When I run this code, it does not give the desired output. It only prints a table in which order_date is empty for all three rows.
Any thoughts on what goes wrong here?
This would work:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
data work;
set survey;
where '11JAN2007'd<= order_date <= '13JAN2007'd;
run;
proc print data=work;
format order_date date9. ;
run;
See SAS help for topics date, informat,...
If you want to query based on date, you need to tell SAS that your string is a date. You do this by putting a 'd' after the date string, e.g.
'11JAN2007'd
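A small sketch of what the 'd' changes, with illustrative variable names: a date literal is simply a number, the count of days since 01JAN1960 (which is how SAS stores dates), so it can be compared with a numeric ORDER_DATE that was read through an informat such as DATE9. Without the suffix, '11JAN2007' is only a character string.

```sas
data _null_;
  /* A date literal is stored as the number of days since 01JAN1960 */
  lit = '11JAN2007'd;
  /* Reading the same text with the DATE9. informat gives the same number */
  from_text = input('11JAN2007', date9.);
  /* Both values are 17177; the DATE9. format only changes the display */
  put lit= from_text= lit= date9.;
run;
```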

How to rename variables without using their original names?

I have a data set that I am uploading to SAS. There are always 4 variables in the exact same order. The problem is that sometimes the variables have slightly different names.
For example, the first variable is user. The next day I get the same dataset and it might be userid, so I cannot use rename(user=my_user).
Is there any way I could refer to the variables by their order, something like this:
rename(var_order_1=my_user) ;
rename(var_order_3=my_inc) ;
rename _ALL_=x1-x4 ;
There are a few ways to do this. One is to determine the variable names from PROC CONTENTS or dictionary.columns and generate rename statements.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
%macro rename(var=,newvar=);
rename &var.=&newvar.;
%mend rename;
data my_vars; *the list of your new variable names, and their variable number;
length varname $10;
input varnum varname $;
datalines;
1 FirstVar
2 SecondVar
3 ThirdVar
4 FourthVar
;;;;
run;
proc sql; *Create a list of macro calls to the rename macro from joining dictionary.columns with your data. ;
* Dictionary.columns is like proc contents.;
select cats('%rename(var=',name,',newvar=',varname,')')
into :renamelist separated by ' '
from dictionary.columns C, my_vars M
where C.memname='HAVE' and C.libname='WORK'
and C.varnum=M.varnum;
quit;
proc datasets;
modify have;
&renamelist; *use the calls;
quit;
Another is to put/input the data using the input stream and the _INFILE_ automatic variable (that references the current line in the input stream). Here's an example. You would of course keep only the new variables if you wanted.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
data want;
set have;
infile datalines truncover; *or it will go to next line and EOF prematurely;
input @1 @@; *Reinitialize to the start of the line or it will eventually EOF early;
_infile_=catx(' ',of _all_); *put to input stream as space delimited - if your data has spaces you need something else;
input y1-y4 @@; *input as space delimited;
put _all_; *just checking our work, for debugging;
datalines; *dummy datalines (could use a dummy filename as well);
;;;;
run;
Here is another approach using the dictionary tables.
data have;
format var1-var4 $1.;
call missing (of _all_);
run;
proc sql noprint;
select name into: namelist separated by ' ' /* create macro var */
from dictionary.columns
where libname='WORK' and memname='HAVE' /* uppercase */
order by varnum; /* should be ordered by this anyway */
quit;
%macro create_rename(invar=);
%do x=1 %to %sysfunc(countw(&namelist,%str( )));
/* OLDVAR = NEWVARx */
%scan(&namelist,&x) = NEWVAR&x
%end;
%mend;
data want ;
set have (rename=(%create_rename(invar=&namelist)));
put _all_;
run;
gives:
NEWVAR1= NEWVAR2= NEWVAR3= NEWVAR4=
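Along the same lines as the dictionary-table answers above, a more compact sketch that renames only selected positions, as in the question (position 1 to MY_USER, position 3 to MY_INC). It assumes the data set is WORK.HAVE; the incoming names USERID, X2, INC and X4 are hypothetical and stand in for whatever arrives that day.

```sas
/* Sample data: the incoming names may change from day to day */
data have;
  input userid x2 inc x4;
  datalines;
1 2 3 4
5 6 7 8
;
run;

/* Build "oldname=newname" pairs from the column positions (VARNUM) */
proc sql noprint;
  select catx('=', name,
              case varnum when 1 then 'my_user'
                          when 3 then 'my_inc' end)
    into :renames separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='HAVE' and varnum in (1, 3);
quit;

/* Apply the renames in place; PROC DATASETS does not rewrite the data */
proc datasets lib=work nolist;
  modify have;
  rename &renames;
run;
quit;
```

Because the lookup is by VARNUM, the same code works whether the first column arrives as user or userid.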

SAS: PROC IMPORT: CSV WITH DATES AS VAR NAMES

I'm importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I'm using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
Then transposing the data to long format, as follows:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from default to date?
Importing and transposing are handy procedures, but if you understand your data well, a little data step program can deal with this in one step:
data want(keep=sedol v_date v_value);
infile have dsd dlm=',' truncover;
informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
format v_date yymmdd10.;
array d(50) d1-d50;
array v(50) v1-v50;
/* Retain the date values and the count of dates */
retain d1-d50 idx;
/* Read header */
if _n_ = 1 then do;
input sedol d1-d50;
/* loop to find how many date columns there are */
do idx=1 to 50 while(d(idx) ne .);
end;
idx = idx - 1; /* must subtract one here */
delete;
end;
/* Read data lines */
input sedol v1-v50;
do i=1 to idx;
v_date = d(i);
v_value = v(i);
output;
end;
run;
As long as your input file is exactly as you describe (a header record with a leading ID variable of no more than 8 characters, followed by some number of date values representing columns), this will process up to 50 measurements. It should be easy enough to modify if your needs change.
In this case I would suggest importing the data and the headers separately.
First, we import data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row, which holds the variable names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: add a letter as the first character (it doesn't matter which one; I chose 'D') and replace all slashes '/' with underscores '_':
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
Put these new 'cleaned' column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
Then generate a DATA step for the renaming using the CALL EXECUTE routine:
data _null_;
dsid=open("want","i");
num=attrn(dsid,"nvars");
call execute("data want;");
call execute("set want;");
call execute("rename");
do i=1 to num;
call execute(varname(dsid,i)||"="||scan("&names",i," "));
end;
call execute(";run;");
rc=close(dsid);
run;
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
And finally, 'unmask' those dates (delete the leading D and replace _ with /) and convert them to real dates with INPUT(). The RETAIN statement is added just to put the new variable DATE in second place, right after SEDOL.
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;