I want a SAS dataset in which some variables are shown with one decimal place, so my code is the following:
data a;
set a;
dif=put(t0d,4.1);
drop t0d;
run;
However, in some cases the dif variable does not come out in this format. For example, I have:
dif
-1.0
-9
15.0
2
3.0
5.0
15.0
How can I fix this? I want:
dif
-1.0
-9.0
15.0
2.0
3.0
5.0
15.0
Thank you!!
I suspect you have left something out of your explanation. The code you showed works fine for the values you showed.
data test;
input t0d;
dif = put(t0d,4.1);
cards;
-1.0
-9
15.0
2
3.0
5.0
15.0
;
proc print;
run;
Results (plain old text output)
Obs t0d dif
1 -1 -1.0
2 -9 -9.0
3 15 15.0
4 2 2.0
5 3 3.0
6 5 5.0
7 15 15.0
As you can see the new variable DIF is character of length 4 with the strings right aligned.
If instead you want DIF to be a numeric variable, just assign the value and attach a format to DIF, so that by default its values are displayed as 4-character strings with one decimal place.
dif = t0d ;
format dif 4.1;
PS The ODS output system does not display the leading spaces.
The PUT function DOES work well in SAS; you just have to use it correctly.
As for what you want to do, it depends on what kind of variable your t0d is. If it is character, then you have to use the INPUT function:
DATA have;
input x $;
datalines;
8722
-93.2
-0.1122
15.116
5
1.5
;
run;
data want;
set have;
dif=input(x, 8.);
drop x;
format dif 8.1;
run;
But if the original variable is numeric, you just attach a proper format to it. You can do that where you create the original table, or just use a FORMAT statement afterwards.
DATA have;
input x ;
datalines;
8722
-93.2
-0.1122
15.116
5
1.5
;
run;
data want;
set have;
format x 8.1;
run;
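If you are not sure which of the two cases applies, a quick check is to ask SAS for the variable's type. This is only a minimal sketch: it assumes the dataset is called a and still contains t0d (that is, it is run before the step that drops t0d). VTYPE returns N for numeric and C for character.

```sas
/* Sketch: write the type of t0d to the log.          */
/* Assumes dataset A still contains the variable t0d. */
data _null_;
  set a(obs=1);          /* one row is enough to inspect the type */
  type = vtype(t0d);     /* 'N' = numeric, 'C' = character        */
  put type=;
run;
```

PROC CONTENTS on the same dataset gives the same information for all variables at once.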
EDIT!!!! GO TO BOTTOM FOR BETTER REPRODUCIBLE CODE!
I have a data set with a quantitative variable that's missing 65 values that I need to impute. I used the ODS output and proc glm to simultaneously fit a model for this variable and predict values:
ODS output
predictedvalues=pred_val;
proc glm data=Six_min_miss;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
However, I am missing 21 predicted values because 21 of my observations are missing either of the two independent predictors.
If SAS can't make a prediction because of this missingness, it leaves an underscore (not a period) to show that it didn't make a prediction.
For some reason, if it can't make a prediction, SAS also puts an underscore for the 'observed' value--even if an observed value is present (the value in the highlighted cell under 'observed' should be 181.0512):
The following code merges the ODS output data set with the observed and predicted values, and the original data. The second data step attempts to create a new 'imputed' version of the variable that will use the original observation if it's not missing, but uses the predicted value if it is missing:
data PT_INFO_6MIN_IMP_temp;
merge PT_INFO pred_val;
drop dependent observation biased residual;
run;
data PT_INFO_6MIN_IMP_temp2;
set PT_INFO_6MIN_IMP_temp;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
However, as you can see, SAS is putting an underscore in the imputed column, when there was an original value that should have been used:
In other words, because the original variable's value is not missing (it's 181.0512), SAS should have taken that value and copied it to the imputed value column. Instead, it put an underscore.
I've also tried if SIX_MIN_WALK_z =. then observed=predicted
Please let me know what I'm doing wrong and/or how to fix. I hope this all makes sense.
Thanks
EDIT!!!!! EDIT!!!!! EDIT!!!!!
See below for a truncated data set so that one can reproduce what's in the pictures. I took only the first 30 rows of my data set. There are three missing observations for the dependent variable that I'm trying to impute (obs 8, 11, 26). There are one of each of the independent variables missing, such that it can't make a prediction (obs 8 & 24). You'll notice that the "_IMP" version of the dependent variable mirrors the original. When it gets to missing obs #8, it doesn't impute a value because it wasn't able to predict a value. When it gets to #11 and #26, it WAS able to predict a value, so it added the predicted value to "_IMP." HOWEVER, for obs #24, it was NOT able to predict a value, but I didn't need it to, because we already have an observed value in the original variable (181.0512). I expected SAS to put this value in the "_IMP" column, but instead, it put an underscore.
data test;
input Study_ID $ nyha_4_enroll kccq12sf_both_base SIX_MIN_WALK_z;
cards;
01-001 3 87.5 399.288
01-002 4 83.333333333 411.48
01-003 2 87.5 365.76
01-005 4 14.583333333 0
01-006 3 52.083333333 362.1024
01-008 3 52.083333333 160.3248
01-009 2 56.25 426.72
01-010 4 75 .
01-011 3 79.166666667 156.3624
01-012 3 27.083333333 0
01-013 4 45.833333333 0
01-014 4 54.166666667 .
01-015 2 68.75 317.2968
01-017 3 29.166666667 196.2912
01-019 4 100 141.732
01-020 4 33.333333333 0
01-021 2 83.333333333 222.504
01-022 4 20.833333333 389.8392
01-025 4 0 0
01-029 4 43.75 0
01-030 3 83.333333333 236.22
01-031 2 35.416666667 302.0568
01-032 4 64.583333333 0
01-033 4 33.333333333 0
01-034 . 100 181.0512
01-035 4 12.5 0
01-036 4 66.666666667 .
01-041 4 75 0
01-042 4 43.75 0
01-043 4 72.916666667 0
;
run;
data test2;
set test;
drop Study_ID;
run;
ODS output
predictedvalues=pred_val;
proc glm data=test2;
class nyha_4_enroll;
model SIX_MIN_WALK_z= nyha_4_enroll kccq12sf_both_base /p solution;
run;
ODS output close;
data combine;
merge test2 pred_val;
drop dependent observation biased residual;
run;
data combine_imp;
set combine;
if missing (SIX_MIN_WALK_z) then observed=predicted;
rename observed=SIX_MIN_WALK_z_IMPUTED;
run;
The special missing values (._) mark the observations excluded from the model because of missing values of the independent variables.
Try a simple example:
data class;
set sashelp.class(obs=10) ;
keep name sex age height;
if _n_=3 then age=.;
if _n_=4 then height=.;
run;
ods output predictedvalues=pred_val;
proc glm data=class;
class sex;
model height = sex age /p solution;
run; quit;
proc print data=pred_val; run;
Since the value of the independent variable AGE was missing for observation #3, the values of observed, predicted and residual for that observation are set to ._ in the predicted values dataset.
Obs Dependent Observation Biased Observed Predicted Residual
1 Height 1 0 69.00000000 64.77538462 4.22461538
2 Height 2 0 56.50000000 58.76153846 -2.26153846
3 Height 3 1 _ _ _
4 Height 4 1 . 61.27692308 .
5 Height 5 0 63.50000000 64.77538462 -1.27538462
6 Height 6 0 57.30000000 59.74461538 -2.44461538
7 Height 7 0 59.80000000 56.24615385 3.55384615
8 Height 8 0 62.50000000 63.79230769 -1.29230769
9 Height 9 0 62.50000000 62.26000000 0.24000000
10 Height 10 0 59.00000000 59.74461538 -0.74461538
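The difference between the ordinary missing value (.) and the special missing value (._) also matters when you test for missingness. The sketch below uses a made-up variable x; it shows that MISSING() treats ._ as missing, while a comparison with . does not match it.

```sas
/* Sketch: ._ is a missing value, but it is not equal to . */
data _null_;
  x = ._;
  is_missing = missing(x);   /* 1: MISSING() is true for . , ._ and .A-.Z */
  equals_dot = (x = .);      /* 0: ._ is a different value from .         */
  put is_missing= equals_dot=;
run;
```

So a test such as `if x = . then ...` silently skips rows holding ._, whereas `if missing(x) then ...` catches them.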
If you really just want to replace the values of OBSERVED or PREDICTED in the output with the values of the original variable, that is pretty easy to do: just re-combine with the source dataset. You can also use the ID statement of PROC GLM to have it include any variables you want in the output, for example:
id name sex age height;
Now you can use a data step to make any adjustments. For example, to make a new height variable that is either the original or the predicted value, you could use:
data want ;
set pred_val ;
NEW_HEIGHT = coalesce(height,predicted);
run;
proc print data=want width=min;
var name height age predicted new_height ;
run;
Results:
NEW_
Obs Name Height Age Predicted HEIGHT
1 Alfred 69.0 14 64.77538462 69.0000
2 Alice 56.5 13 58.76153846 56.5000
3 Barbara 65.3 . _ 65.3000
4 Carol . 14 61.27692308 61.2769
5 Henry 63.5 14 64.77538462 63.5000
6 James 57.3 12 59.74461538 57.3000
7 Jane 59.8 12 56.24615385 59.8000
8 Janet 62.5 15 63.79230769 62.5000
9 Jeffrey 62.5 13 62.26000000 62.5000
10 John 59.0 12 59.74461538 59.0000
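Applied to the datasets from the question, the same idea might look like the sketch below. It assumes test2 and pred_val exist as created in the question's code, that both have one row per observation in the same order (the merge has no BY statement, as in the question), and that the prediction column in pred_val is named predicted.

```sas
/* Sketch: take the original value when present, else the prediction. */
/* Assumes TEST2 and PRED_VAL line up row by row.                     */
data combine_imp;
  merge test2 pred_val;
  SIX_MIN_WALK_z_IMPUTED = coalesce(SIX_MIN_WALK_z, predicted);
  drop dependent observation biased observed residual;
run;
```

Because the new variable is built from the original SIX_MIN_WALK_z rather than from OBSERVED, a row that GLM excluded (and marked with ._) still keeps its original value.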
Here is an example dataset:
data data;
input group $ date value;
datalines;
A 2001 1.5
A 2002 2.6
A 2003 2.8
A 2004 2.9
A 2005 .
B 2001 0.1
B 2002 0.6
B 2003 0.7
B 2004 1.4
B 2005 .
C 2001 4.7
C 2002 4.6
C 2003 4.8
C 2004 5.0
C 2005 .
;
run;
I want to replace the missing values of the variable "value" for each group using linear interpolation.
I tried using proc expand :
proc expand data=data method = join out=want;
by group;
id date;
convert value;
run;
But it's not replacing any value in the output dataset.
Any idea what I'm doing wrong please?
Here are three ways to do it. Your missing data is at the end of the series. You are effectively doing a forecast with a few points. proc expand isn't good for that, but for the purposes of filling in missing values, these are some of the options available.
1. PROC EXPAND
You were close! Your missing data is at the end of the series, which means it has no values to join between. You need to use the extrapolate option in this case. If you have missing values between two data points then you do not need to use extrapolate.
proc expand data=data method = join
out=want
extrapolate;
by group;
id date;
convert value;
run;
2. PROC ESM
You can do interpolation with exponential smoothing models. I like this method since it can account for things like seasonality, trend, etc.
/* Convert Date to SAS date */
data to_sas_date;
set data;
year = mdy(1,1,date);
format year year4.;
run;
proc esm data=to_sas_date
out=want
lead=0;
by group;
id year interval=year;
forecast value / replacemissing;
run;
3. PROC TIMESERIES
This will fill in values using mean/median/first/last/etc. for a timeframe. First convert the year to a SAS date as shown above.
proc timeseries data=to_sas_date
out=want;
by group;
id year interval=year;
var value / setmissing=average;
run;
I don't know much about the expand procedure, but you can add extrapolate to the proc expand statement.
proc expand data=data method = join out=want extrapolate;
by group;
id date;
convert value;
run;
Results in:
Obs group date value
1 A 2001 1.5
2 A 2002 2.6
3 A 2003 2.8
4 A 2004 2.9
5 A 2005 3.0
6 B 2001 0.1
7 B 2002 0.6
8 B 2003 0.7
9 B 2004 1.4
10 B 2005 2.1
11 C 2001 4.7
12 C 2002 4.6
13 C 2003 4.8
14 C 2004 5.0
15 C 2005 5.2
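The extrapolated values above are consistent with simply continuing the straight line through the last two known points of each group (for example, 2.9 + (2.9 - 2.8) for group A). As a rough cross-check only, and not a description of how PROC EXPAND works internally, the same numbers can be reproduced in a data step. The sketch assumes the data are sorted by group, the years are equally spaced, only the last observation of each group is missing, and each group has at least two known values before it. The output name check is made up.

```sas
/* Sketch: extend the last line segment by one step for each group. */
data check;
  set data;
  by group;
  prev1 = lag1(value);   /* value one row back  */
  prev2 = lag2(value);   /* value two rows back */
  if last.group and missing(value) then
    value = prev1 + (prev1 - prev2);
  format value 4.1;
  drop prev1 prev2;
run;
```

The LAG calls are deliberately outside the IF, since LAG only updates its queue on the rows where it is executed.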
Please take note of the statement here:
"By default, PROC EXPAND avoids extrapolating values beyond the first or last input value for a series and only interpolates values within the range of the nonmissing input values. Note that the extrapolated values are often not very accurate and for the SPLINE method the EXTRAPOLATE option results may be very unreasonable. The EXTRAPOLATE option is rarely used."
Do you know how to make the n in the function LAGn(variable) refer to another value computed in the program (max in my case), via a macro variable?
data example1;
input value;
datalines;
1.0
3.0
1.0
1.0
4.0
1.0
1.0
2.0
4.0
2.0
;
proc means data=example1 max;
output out=example11 max=max;
run;
data example1;
%let n = max;
lagval=lag&n.(value);
run;
proc print data=example1;
run;
Thank you in advance!
Wiola
Is this what you're trying to do?
data example1;
input value;
datalines;
1.0
3.0
1.0
1.0
4.0
1.0
1.0
2.0
4.0
2.0
;
proc sql;
select max(value) format = 1. into :n
from example1;
quit;
data example1;
set example1;
lagval=lag&n(value);
run;
The format = 1. bit makes sure that the macro variable generated by proc sql doesn't contain any leading or trailing spaces that would mess up the subsequent data step code.
It is easy to use a macro variable to generate the N part of LAGn() function call.
%let n=4 ;
data want;
set have ;
newvar = lag&n(oldvar);
run;
Remember that macro code is evaluated by the macro pre-processor and then the generated code is executed by SAS. So placing %LET statements in the middle of a data step is just going to confuse the human programmer.
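If you would rather keep the PROC MEANS step from the question, one way to get its result into a macro variable is CALL SYMPUTX in a separate data step, so that the macro variable exists before the step that uses it is compiled. This is a sketch under the question's names (example1, example11, max); the output name example1_lag is made up.

```sas
/* Sketch: compute the maximum, store it in macro variable N, then use it. */
proc means data=example1 noprint;
  var value;
  output out=example11 max=max;
run;

data _null_;
  set example11;
  call symputx('n', max);   /* SYMPUTX strips leading/trailing blanks */
run;

data example1_lag;
  set example1;
  lagval = lag&n(value);    /* text substitution gives LAGn, n = stored maximum */
run;
```

This only makes sense when the maximum is a positive whole number, as it is in the sample data.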
I came across the code below, which shows different ways to handle dates in SAS.
However, I do not understand what the last "format" is doing here.
The code is below.
DATA SASWEEK.Datestest;
INPUT d1 MMDDYY8. +1 d2 DATE9.;
* informat does;
* Can be replaced by an informat statement;
d1f = d1;
d2f= d2;
d3f = d2;
FORMAT d1f DATE9. d2f WORDDATE. d3f MMDDYY8.; *formats;
datalines;
01111960 12JAN1960
01011961 01MAR2013
;
PROC PRINT;
RUN;
PROC PRINT;
FORMAT d1f 9.0 d1 WEEKDATE.;
RUN;
Any suggestion and explanation would be highly appreciated!!
I'm not sure if you're referring to the format in the proc print as FORMAT d1f 9.0 d1 WEEKDATE.;, or the last format in the data step FORMAT d1f DATE9. d2f WORDDATE. d3f MMDDYY8.;.
Either way, the format only affects how the data is displayed; it does not change the stored value, which in your case is just a number, because that is how SAS stores dates.
SAS dates count days from 1st January 1960, which is day 0; 2nd January 1960 is day 1, and so on. This is why values such as 10 and 11 can be seen in the first observation of the proc print output, and the second observation follows the same counting sequence.
To reiterate, the format is just affecting the display, not the value.
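A small stand-alone sketch of the same point, using a made-up variable d and a date literal instead of the question's dataset: the stored value stays 10 throughout, and only the format changes what is printed in the log.

```sas
/* Sketch: one stored value, three ways of displaying it. */
data _null_;
  d = '11JAN1960'd;      /* 10 days after 01JAN1960 */
  put d=;                /* d=10                    */
  put d= date9.;         /* d=11JAN1960             */
  put d= mmddyy10.;      /* d=01/11/1960            */
run;
```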
Edit
In response to question in comment:
So based on your explanation, the last "d1f 9.0" was to make d1f from
01/11/60 -> "01111960"?
The short answer to your question is "no"; an explanation is below.
The first observation for d1f is based on the input data 01111960 which is 11th January 1960, and so is day 10 as SAS counts days (as explained in my initial response).
Variable d1f contains the value 10 for the first observation.
d1f has the format DATE9. applied to it in the initial data step so the first proc print shows 11JAN1960 as the first value of d1f.
The second proc print has the format 9.0 applied to d1f. This instructs SAS to display the value of d1f (which is 10) within 9 columns and with 0 decimal places, which is why you see 10 displayed as the first value of d1f in the second proc print.
The following data step might also help demonstrate what is happening if you run it after your code and check the log for the results:
data _null_;
set SASWEEK.Datestest;
put d1f;
put d1f 9.0;
put d1f 8.0;
put d1f 7.0;
put d1f 6.0;
put d1f 5.0;
put d1f 4.0;
put d1f 3.0;
run;
I have data structured as follows in Table 1:
ID Variable1 Variable2
1 2 5
2 10 2
3 14 3
4 4 3
I need to add the following data to the above table for each row in Table 2:
Coef Value
Variable1C 4.2
Variable2C 5.6
The final result should be:
ID Variable1 Variable2 Variable1C Variable2C
1 2 5 4.2 5.6
2 10 2 4.2 5.6
3 14 3 4.2 5.6
4 4 3 4.2 5.6
How might I do this? So far, I've only been able to bring in one of the values by transposing Table 2 and then adding it, but this is not what I want.
A simple data step should do that.
data want ;
set have ;
Variable1C=4.2 ;
Variable2C=5.6;
run;
If you have the data in a table then transpose it and combine them.
proc transpose data=table2 out=wide ;
id coef ;
var value ;
run;
data want ;
set table1;
if _n_=1 then set wide ;
run;
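For completeness, here is a self-contained sketch of the transpose approach with the sample data from the question typed in. The dataset names table1, table2 and wide follow the answer; the :$32. informat and the drop of _NAME_ are additions made here. The `if _n_=1 then set wide;` step reads the single row of WIDE once, and because variables read with SET are retained, its values carry over to every row of table1.

```sas
/* Sample data from the question */
data table1;
  input ID Variable1 Variable2;
  datalines;
1 2 5
2 10 2
3 14 3
4 4 3
;

data table2;
  input Coef :$32. Value;   /* read names longer than 8 characters */
  datalines;
Variable1C 4.2
Variable2C 5.6
;

/* One row with columns Variable1C and Variable2C */
proc transpose data=table2 out=wide(drop=_name_);
  id coef;
  var value;
run;

/* Attach that single row to every row of table1 */
data want;
  set table1;
  if _n_ = 1 then set wide;
run;
```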