How do I censor data points after 10 years? Some people experience a lot of events after 10 years, which messes up the data. So I only want to focus on the events during the first 10 years.
This is my fine and gray model:
proc phreg data=WORK.data plots(overlay=row )=cif ;
class Ratio / param=glm;
model TimeOutcome*Outcome(0)=Ratio / eventcode=1 rl;
strata Ratio;
run;
Here is my marginal means model:
proc phreg data=work.data plots(overlay=row)=mcf covs(aggregate);
class Ratio / param=glm;
model Stop * Recur(0) = Ratio/entry = Start ;
strata Ratio;
id = id;
hazardratio 'Hazard Ratio Statement 1' Ratio;
run;
Thanks for your help.
Typically you would pre-process your data to account for this.
Check if the time variable is greater than 5, and if so, set it to 5 and censored. IF/THEN statements work well.
data data2;
set data;
if timeOutcome > 5 then do;
timeOutcome=5; *sets time to 5;
Outcome=0; *sets to censored;
end;
run;
Then use data2 in your further data analysis.
Related
Ive got 50 columns of data, with 4 different measurements in each, as well as designation tags (groups C, D, and E). Ive averaged the 4 measurements... So every data point now has an average. Now, I am supposed to take the average of all the data points averages of each specific group.
So I want all the data in group C to be averaged, and so on for D and E.... and I dont know how to do that.
avg1=(MEAS1+MEAS2+MEAS3+MEAS4)/4;
avg_score=round(avg1, .1);
run;
proc print;
run;
This is what I have so far.
There are several procedures, and SQL that can average values over a group.
I'll guess you meant to say 50 rows of data.
Example:
Proc MEANS
data have;
call streaminit(314159);
do _n_ = 1 to 50;
group = substr('CDE', rand('integer',3),1);
array v meas1-meas4;
do _i_ = 1 to dim(v);
num + 2;
v(_i_) = num;
end;
output;
end;
drop num;
run;
data rowwise_means;
set have;
avg_meas = mean (of meas:);
run;
* group wise means of row means;
proc means noprint data=rowwise_means nway;
class group;
var avg_meas;
output out=want mean=meas_grandmean;
run;
rowwise_means
want (grandmean, or mean of means)
I am comparing the evolution of plasma concentrations over time for different treatments of patients.
We applied each treatment to different subjects and for each treatment we want a graph with the evolution for each subject in black, as well as for the the mean in red.
It should look like this
but it does look like this
My data has variable
trtan and trta for treatment number and name
subjid for the patient receiving that treatment
ATPT for timepoint
AVAL for Individual Concentrations
MEAN for average Concentrations
I am using SGPLOT to produce this line plot. y axis has concentrations while x axis has time points, I am sorting data by treatment, subject and timepoint before passing to Proc SGPLOT.
Lines for indivizual subjects are fine, Issue is with mean line plot, Since dataset is sorted by subject i am getting multiple mean plots by subject as well.
My requirement is to have multiple indivizual plots and an overlaying mean plot. Can anyone advise how can i solve this.
I am using below code. How can I repair it?
proc sort data = pc2;
by trtan trta subjid atptn atpt;
run;
proc sgplot data = pc2 dattrmap = anno pad = (bottom = 20%) NOAUTOLEGEND ;
by trtan trta;
series x = atptn y = aval/ group = trta
lineattrs = (color = black thickness = 1 pattern = solid );
series x = atptn y = mean/ group = trta attrid = trtcolor
lineattrs = (thickness = 2 pattern = solid );
xaxis label= "Actual Time (h)"
labelattrs = (size = 10)
values = (0 12 24 36 48 72 96 120 168)
valueattrs = (size = 10)
grid;
yaxis label= "Plasma Concentration (ng/mL)"
labelattrs = (size = 10)
valueattrs = (size = 10)
grid;
run;
This is not a problem with the mean only.
Leave out the mean, ass min=-20 to your yaxis specification, and you will see the same problem.
Alternatively run this code
data pc2;
do subj = 1 to 3;
do time = 1 to 25;
value = 2*sin(time/3) + rand('Normal');
output;
end;
end;
run;
proc sgplot data=pc2;
series x=time y=value;
run;
and you will get
The solution is to have one plot for each subject, so first sort the data by time and transpose it to have one variable subj_1 etc. for each subject.
proc sort data=pc2 out=SORTED;
by time subj;
run;
proc transpose data=TEST out=TRANS prefix=subj_;
by time;
id subj;
run;
I leave it as an exercise for you to add the mean to this dataset.
Then run sgplot with a series statement per subject. To build these statements, we interrogate the meta data in dataset WORK.TRANS
proc sql;
select distinct 'series x=time y='|| name ||'/lineattrs = (color=black)'
into :series_statements separated by ';'
from sasHelp.vColumn
where libname eq 'WORK' and memName eq 'TRANS'
and (name like 'subj%' or name = mean;
quit;
proc sgplot data=TRANS;
&series_statements;
run;
The result, without the mean, looks like this for my example:
Of course, you will have to do some graphical fine tuning.
We can achive it simply by taking the mean by ATPT and then instead of merging the mean record to the PK data by ATPT, you need to append the records and then you can run your code and it will give you the result you are expecting, please let me know if it does not work, it seems to have worked for me.
Grateful for feedback, I'm still a notice programmer. I'm trying to code the below in SAS.
I have two data sets a) and b), containing the following variables:
a) Bene_ID, county_id_1, county_id_2, county_id_3 etc (it's 12 months)
b) county_ID, rural (yes/no)
What I would normally do is create an array in a data step:
Array country (12) county_ID_1- county_ID_12
and use by group processing on bene_ID, to output a long (normalized) data set like this:
bene_id, month 1, county_id
bene_id, month 2, county_id
bene_id, month 3, county_id
etc.
BUT, how do I access the other data set b) within a data step? to pull in the rural variable? This is what I want:
bene_id, month 1, county_id, if rural = "yes"
bene_id, month 2, county_id, if rural = "yes"
bene_id, month 3, county_id, if rural = "yes"
I tried looking for other similar questions on this bulletin board but I wasn't even sure of the correct terms to search for. The reason I don't want to do a full merge is: how to filter on an array value? e.g. when rural = "no"?
Thanks everyone,
Lori
This is an example where using a FORMAT would help. You can use your second dataset to create a format
data formats;
retain fmtname 'rural';
set b;
rename county_id=start rural=label;
run;
proc format cntlin=formats ;
run;
and then use the format when processing the first dataset.
data want ;
set A;
array county_id_ [12];
do month=1 to dim(county_id_);
county=county_id_[month];
rural = put(county,rural3.);
output;
end;
drop county_id_: ;
run;
You are transforming the data structure from wide (array form) to tall (categorical form). This is generally known as a pivot or transpose. The transformation turns the information stored in each array element name (columns) into data that becomes accessible at the row-level.
You can merge the transpose with the counties to select rural ones.
* 80% of counties are rural;
data counties;
do countyId = 1 to 50;
if ranuni(123) < 0.80 then rural='Yes'; else rural='No';
output;
end;
run;
* for 10 people track with county they are in each month;
data have;
do personId = 1 to 10;
array countyId (12);
countyId(1) = ceil(50*ranuni(123));
do _n_ = 2 to dim(countyId);
if ranuni(123) < 0.15 then
countyId(_n_) = ceil(50*ranuni(123)); * simulate 15% chance of moving;
else
countyId(_n_) = countyId(_n_-1) ;
end;
output;
end;
run;
proc transpose data=have out=have_transpose(rename=(col1=countyId)) ;
by personId;
var countyId:;
run;
proc sort data=have_transpose;
by countyId personId;
run;
data want_rural;
merge have_transpose(in=tracking) counties;
by countyId;
if tracking and rural='Yes';
month = input(substr(_name_, length('countyId')+1), 8.);
drop _name_;
run;
If your wide data also has an additional a set of 12 columns, for say an array of amounts disbursed in each month, the best approach is to do 'DATA Step' transpose like #Tom showed, with an additional assignment inside the loop
amount = amount_[month];
I have a process flow in SAS Enterprise Guide which is comprised mainly of Data views rather than tables, for the sake of storage in the work library.
The problem is that I need to calculate percentiles (using proc univariate) from one of the data views and left join this to the final table (shown in the screenshot of my process flow).
Is there any way that I can specify the outfile in the univariate procedure as being a data view, so that the procedure doesn't calculate everything prior to it in the flow? When the percentiles are left joined to the final table, the flow is calculated again so I'm effectively doubling my processing time.
Please find the code for the univariate procedure below
proc univariate data=WORK.QUERY_FOR_SGFIX noprint;
var CSA_Price;
by product_id;
output out= work.CSA_Percentiles_Prod
pctlpre= P
pctlpts= 40 to 60 by 10;
run;
In SAS, my understanding is that procs such as proc univariate cannot generally produce views as output. The only workaround I can think of would be for you to replicate the proc logic within a data step and produce a view from the data step. You could do this e.g. by transposing your variables into temporary arrays and using the pctl function.
Here's a simple example:
data example /view = example;
array _height[19]; /*Number of rows in sashelp.class dataset*/
/*Populate array*/
do _n_ = 1 by 1 until(eof);
set sashelp.class end = eof;
_height[_n_] = height;
end;
/*Calculate quantiles*/
array quantiles[3] q40 q50 q60;
array points[3] (40 50 60);
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/*Keep only the quantiles we calculated*/
keep q40--q60;
run;
With a bit more work, you could also make this approach return percentiles for individual by groups rather than for the whole dataset at once. You would need to write a double-DOW loop to do this, e.g.:
data example;
array _height[19];
array quantiles[3] q40 q50 q60;
array points[3] _temporary_ (40 50 60);
/*Clear heights array between by groups*/
call missing(of _height[*]);
/*Populate heights array*/
do _n_ = 1 by 1 until(last.sex);
set class end = eof;
by sex;
_height[_n_] = height;
end;
/*Calculate quantiles*/
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/* Output all rows from input dataset, with by-group quantiles attached*/
do _n_ = 1 to _n_;
set class;
output;
end;
keep name sex q40--q60;
run;
I want to perform the standard likelihood ratio test in logsitic regression using SAS. I will have a full logistic model, containing all variables, named A and a nested logistic model B, which is derived by dropping out one variable from A.
If I want to test whether that drop out variable is significant or not, I shall perform a likelihood ratio test of model A and B. Is there an easy way to perform this test (essentially a chi-square test) in SAS using a PROC? Thank you very much for the help.
If you want to perform likelihood ratio tests that are full model v.s. one variable dropped model, you can use the GENMOD procedure with the type3 option.
Script:
data d1;
do z = 0 to 2;
do y = 0 to 1;
do x = 0 to 1;
input n ##;
output;
end; end; end;
cards;
100 200 300 400
50 100 150 200
50 100 150 200
;
proc genmod data = d1;
class y z;
freq n;
model x = y z / error = bin link = logit type3;
run;
Output:
LR Statistics For Type 3 Analysis
Chi-
Source DF Square Pr > ChiSq
y 1 16.09 <.0001
z 2 0.00 1.0000
I'm not sure about a PROC statement that can specifically perform LRT but you can compute the test for nested models.
Script
proc logistic data = full_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_full;
run;
data _null_;
set GlobalTests_full;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_full", ChiSq);
call symput("DF_full", DF);
end;
run;
proc logistic data = reduced_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_reduced;
run;
data _null_;
set GlobalTests_reduced;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_reduced", ChiSq);
call symput("DF_reduced", DF);
end;
run;
data LRT_result;
LR = &ChiSq_full - &ChiSq_reduced;
DF = &DF_full - &DF_reduced;
p = 1 - probchi(ChiSq,DF);
run;
I'm no expert on logistic regression, but I think what you are trying to accomplish can be done with PROC LOGISTIC, using the "SELECTION=SCORE" option on the MODEL statement. There are other SELECTION options available, such as STEPWISE, but I think SCORE matches closest to what you are looking for. I would suggest reading up on it though, because there are some associated options (BEST=, START= STOP=) that you might benefit from too.