How can I avoid double rounding when using the round function and put function in SAS together? Take the following code for example:
data _null_;
sd = 11.863499608;
sdc = strip(put(round(sd,0.0001),10.3));
put sdc=;
run;
The actual result should be 11.863 but the put function rounds up from the already rounded value of 11.8635 to give a final result of 11.864.
Could someone please tell me how to avoid the second round up by the put function? Please note that the first round function is extremely important and can't be avoided.
There are situations where the Fw.d format does not round as expected. It is best to round to the desired number of decimals before applying the format.
data _null_;
x=0-1e-5;
put x=best.;
y = put(x,5.2);
put y= 'Negative 0 need to round';
z = put(round(x,.01),5.2);
put z=;
run;
x=-0.00001
y=-0.00 Negative 0 need to round
z=0.00
Try using:
sdc = strip(put(round(sd,0.001),10.3));
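To make the difference concrete, here is a minimal sketch that runs both forms side by side on the value from the question. The variable names `twice` and `once` are made up for illustration. The question reports 11.864 for the first form; the second rounds only once, directly to the three decimals being displayed.

```sas
data _null_;
  sd = 11.863499608;
  /* Rounding to 4 decimals, then formatting with 3, rounds a second time. */
  twice = strip(put(round(sd, 0.0001), 10.3));
  /* Rounding straight to the 3 displayed decimals rounds only once. */
  once = strip(put(round(sd, 0.001), 10.3));
  put twice= once=;
run;
```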
I imported a CSV file into SAS and it has a time variable. But the time variable is stored as a plain number: 515 means 05:15, 1110 means 11:10, and 2030 means 20:30.
I need to convert it into a proper time format and then extract the hour from it. I have tried:
new_time=put(time,hhmm.);
The output I got looks like 0:09, 0:10.
Please help me out.
Try this:
data have;
input mytimevar;
cards;
515
1110
2030
;
run;
data want;
set have;
time = input(put(mytimevar, 4.) || '00', hhmmss.);
format time tod5.;
run;
How this works:
Convert mytimevar to text using put
Append 00 for seconds
Input as time variable using hhmmss. informat
Display with leading zeros for hours using tod5. format
A time value in SAS is a number whose value is a count of seconds. Displaying such a value typically involves associating a time format such as TIMEw.d, HHMMw.d, or MMSSw.d; there are many more.
From SAS Help (my italics)
SAS time value is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.
NOTE: The time formats will handle time values (number of seconds) outside the 0 to 24 hours range.
For your case of a time value (call it the csvtime) encoded as 100*hours + minutes the SAS time value can be computed using the dhms function specifying zero for the d and s arguments, and the h and m arguments parsed from the csv time using integer division and modulus arithmetic.
sastime = dhms(0, floor(csvtime/100), mod(csvtime,100), 0);
format sastime time8.; * values are displayed as hh:mm:ss (a width of 8 leaves room for two-digit hours);
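Put together as a complete step, this might look like the sketch below. It assumes the imported variable is named `csvtime` (a placeholder, as in the explanation above) and adds the `hour` function to pull out the hour the question asks for; `hr` is a made-up variable name.

```sas
data want;
  input csvtime;
  /* Hours are the digits above the last two; minutes are the last two digits. */
  sastime = dhms(0, floor(csvtime/100), mod(csvtime, 100), 0);
  /* HOUR returns the hour part of a SAS time value. */
  hr = hour(sastime);
  format sastime time8.;
  put csvtime= sastime= hr=;
  datalines;
515
1110
2030
;
run;
```

For the three sample values, `hr` comes out as 5, 11, and 20.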
I'm trying to apply the same logical on all the variables and create a new variable based on the logical:
DATA want;
SET have;
IF "range" = 25 THEN "new range" = 1
ELSE "new range" = 0;
RUN;
If it's easier I can also just change the variables themselves as opposed to creating new variables from the logical statement.
As an example, I want any value within the variables of 25 to be 1, and everything else to be 0:
HAVE:
var_100 var_101 var_102
30 25 20
45 100 25
25 25 10
WANT:
var_100 var_101 var_102
0 1 0
0 0 1
1 1 0
So I have about 100 variables, all with the same prefix and increasing suffixes. Instead of writing 100 logical statements, I am trying to write one that will apply to every variable in that range of var_1 to var_100.
Lots of ways you can do this, mostly based on what exactly you're doing to what.
Arrays are the simplest:
data want;
set have;
array vars[25] var1-var25;
array newvars[25] n_var1 - n_var25;
do _i = 1 to dim(vars);
if vars[_i] = 25 then newvars[_i] = 1;
else newvars[_i] = 0;
end;
run;
Of course you need some reasonable way to specify those variable lists (var1-var25 and n_var1 - n_var25); if they're not just sequential, you'll either have to write them all out, or use the macro language to do that.
Another way is to write a macro to do what you want.
%macro recode(invar=, outvar=, inval=, outval=, otherval=);
if &invar. = &inval. then &outvar. = &outval.;
else &outvar. = &otherval.;
%mend recode;
data want;
set have;
%recode(invar=var1, outvar=n_Var1, inval=25, outval=1, otherval=0);
/* .. 25 of these .. */
run;
You can then generate these macro calls with code; search on "sas data driven programming" either here or on a search engine for examples.
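As one hedged illustration of what generating the calls with code can look like, the sketch below builds a `%recode` call for every column of WORK.HAVE whose name starts with VAR, using PROC SQL and `dictionary.columns`. The sample data, the `n_` output prefix, and the macro variable name `recode_calls` are assumptions made for this example; other approaches (such as CALL EXECUTE) would work as well.

```sas
/* Sample data, same shape as the HAVE table in the question. */
data have;
  input var_100 var_101 var_102;
  datalines;
30 25 20
45 100 25
25 25 10
;
run;

/* Same recode macro as in the answer above. */
%macro recode(invar=, outvar=, inval=, outval=, otherval=);
  if &invar. = &inval. then &outvar. = &outval.;
  else &outvar. = &otherval.;
%mend recode;

/* Build one macro call per matching column. Single quotes keep the
   macro trigger from resolving while the text is being assembled. */
proc sql noprint;
  select cats('%recode(invar=', name, ', outvar=n_', name,
              ', inval=25, outval=1, otherval=0)')
    into :recode_calls separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'HAVE'
      and upcase(name) like 'VAR%';
quit;

/* The generated calls expand into IF/ELSE statements here. */
data want;
  set have;
  &recode_calls.
run;
```

If the recode values differ per variable, they could come from a control table joined in the same query instead of being hard-coded in the string.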
The latter is better if that 25 -> 1 changes by the variable. The former is better if it doesn't and the variables are easily "listable" (like var1-var25). If they're not listable, but the 25->1 is fixed, either one works about the same in my opinion.
And of course, instead of using newvars, you can just recode in place with vars[_i] = 1 or whatever, if that's easier.
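A minimal sketch of the in-place variant follows, assuming every variable to recode is numeric and shares the `var_` prefix as in the question. The `var_:` prefix list picks up all such variables without naming each one, and the comparison `(vars[_i] = 25)` evaluates to 1 when true and 0 otherwise.

```sas
/* Sample data, same shape as the HAVE table in the question. */
data have;
  input var_100 var_101 var_102;
  datalines;
30 25 20
45 100 25
25 25 10
;
run;

data want;
  set have;
  /* [*] sizes the array from however many variables match the prefix. */
  array vars[*] var_:;
  do _i = 1 to dim(vars);
    /* Overwrite each value with a 1/0 flag. */
    vars[_i] = (vars[_i] = 25);
  end;
  drop _i;
run;
```

On the sample rows this gives 0 1 0, 0 0 1, and 1 1 0, matching the WANT table in the question.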
As an aside, there are also simpler ways of coding variables to 1/0 flags if that's what you're doing using procs. I think PROC SCORE is one common way, but probably worth a separate question if you want to go this route.
Attempting to calculate age, I did a bit of googling and discovered that yrdif was updated in 9.3 to include a handy-dandy 'AGE' option.
However, in using it, I noticed that when calculating date spans ranging from Jan 1st to Dec 31st, we get some unexpected results. Examples:
age = yrdif('01Jan1932'd,'31Dec2012'd,'Age');
put age;
The above yields 81 years, when it should be one day less than 81 years (80.9972222). But more surprising is the result when we increment the dates by one day:
age = yrdif('02Jan1932'd,'01Jan2013'd,'AGE');
put age;
Now we get the expected value (80.997222).
Bug? Something else going on here that I'm not aware of? Desired next step was to simply do floor(yrdif(dob,dod,'AGE')) to get age, but it seems like it will not be quite so easy.
In 9.3 TS1M2 and 9.4 TS1M2 I get the expected result:
data _null_;
age = yrdif('01Jan1932'd,'31Dec2012'd,'Age');
put age;
run;
80.997260274
Perhaps it was a fixed bug. Searching TS notes for that doesn't come up with anything.
In 9.3+ you can also use INTCK to correctly calculate age if you want the years as an integer.
age2= intck('YEAR','01Jan1932'd,'31Dec2012'd,'c');
The 'c' at the end asks SAS to consider the interval continuous, so it correctly handles intrayear differences.
data _null_;
age = yrdif('01Jan1932'd,'02Jan2013'd,'Age');
age2= intck('YEAR','01Jan1932'd,'31Dec2012'd,'c');
age3= intck('YEAR','01Mar1932'd,'01Jan2013'd,'c');
age4= intck('YEAR','01Mar1932'd,'01Apr2013'd,'c');
put age= age2= age3= age4=;
run;
So here, age3 is correctly 80 while age4 is correctly 81; in the past this would've been incorrect (both would be 81).
I am using the INTNX function to calculate month intervals. I'm finding that the results are frequently one day off from what I would expect... For example, look at this code:
data test;
olddate='20140531';
oldsasdate=input(olddate,yymmdd8.);
newsasdate=intnx('month',oldsasdate,-17);
newdate=put(newsasdate,yymmdd8.);
run;
In this code, I try to find the date 17 months before 05/31/2014. I would expect the function to return 11/30/2012, but it instead returns 12/1/2012. Any idea what's going on here? Is there a way to fix this?
The default for intnx is to align the result to the start of the interval, here the first day of the month. It basically tracks interval boundaries: each time the date crosses from the last day of one month to the 1st of the next, it counts one interval crossed.
So,
data _null_;
x = intnx('month','31MAY2014'd, -1);
put x= date9.;
run;
This returns 01APR2014, not 30APR2014.
You can change it to 'same' alignment with the optional 4th parameter (SAS 9.2+ I believe).
data _null_;
x = intnx('month','31MAY2014'd, -1,'s');
put x= date9.;
run;
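For the dates in the question, the alignment options can be compared side by side. This is only a sketch; the variable names are made up for illustration.

```sas
data _null_;
  start = '31MAY2014'd;
  /* 'b' is the default: first day of the target month. */
  at_begin = intnx('month', start, -17, 'b');
  /* 'e': last day of the target month. */
  at_end = intnx('month', start, -17, 'e');
  /* 's': same day of the month, capped at the end of the month. */
  at_same = intnx('month', start, -17, 's');
  put at_begin= date9. at_end= date9. at_same= date9.;
run;
```

Seventeen months before May 2014 is December 2012, so this prints 01DEC2012 for the default alignment and 31DEC2012 for both the 'e' and 's' alignments.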
I have the following dataset:
AGE HSQ PCT
65 1 0.7
65 2 0.2
65 3 0.1
66 1 0.5
66 2 0.25
66 3 0.25
[...]
What I need is to get the following output:
AGE P1 P2 P3
65 0.7 0.2 0.1
66 0.5 0.25 0.25
[...]
I have been told to use LAG and FIRST.AGE or LAST.AGE to do that, and it seems like a good strategy to me. However, I am not able to get the final result. The (wrong) code I am using is:
DATA OUTPUT;
SET SAMPLE;
BY AGE HSQ;
IF LAST.AGE THEN DO;
P1=LAG2(PCT);
P2=LAG1(PCT);
P3=PCT;
END;
RUN;
But it picks up the previous ages' percentages, which is not what I need. Where is the error? Thanks!
Were you told to use them because this is an assignment, or because someone thinks this is the easiest way to do it?
The easiest way to do this is PROC TRANSPOSE:
data have;
input AGE HSQ PCT;
datalines;
65 1 0.7
65 2 0.2
65 3 0.1
66 1 0.5
66 2 0.25
66 3 0.25
;;;;
run;
proc transpose data=have out=want prefix=P;
by age;
var pct;
id hsq;
run;
LAG does not work the way you think it works - it does not give you the value of the previous row; it instead creates a queue and takes the current value of (argument) and gives you the previous value on the queue. So you can't use it in an IF statement like that.
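The queue behaviour is easy to see in a small sketch. The variable names `cond_lag` and `every_lag` are made up for illustration. Each LAG call has its own queue, and a call that sits behind an IF only updates its queue on the rows where the condition is true.

```sas
data _null_;
  input x;
  /* This LAG call only executes on even rows, so its queue only sees even-row values. */
  if mod(_n_, 2) = 0 then cond_lag = lag(x);
  /* This LAG call executes on every row, so it returns the previous row's value. */
  every_lag = lag(x);
  put _n_= x= cond_lag= every_lag=;
  datalines;
1
2
3
4
;
run;
```

On row 4, `every_lag` is 3 (the previous row), while `cond_lag` is 2, the value from the last time that particular LAG call ran.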
If you for some reason had to do this in a datastep, then you would want to do it like this:
data want;
array p[3];
do _n_ = 1 by 1 until (last.age);
set have;
by age;
p[hsq]=pct;
end;
keep p1-p3 age;
run;
Really no reason to use lag, or any concept of lag; just as you come across values that belong in a place, you assign them to that place, and when you hit last.age then output.
Anybody want to join me in putting in a SASware request to remove the LAG function?
Just for fun, the direct answer to the original question (to show how this could be done):
DATA want;
SET have;
BY AGE HSQ;
p1=lag2(pct);
p2=lag1(pct);
p3=pct;
if last.age then output;
run;
This does a lot of extra work (by a lot I mean a few nanoseconds of CPU time, of course) because it calculates the lags six times and only outputs two of the results. It is also a bit 'risky' because it doesn't check that HSQ has the expected value; i.e., if an age were missing an entry and had only 2 rows, you'd get the previous age's HSQ=3 value for P1, which is probably not desired.
The ultimate point is that with LAG, if you do intend to use it as a stand-in for "previous row's record", you need to keep it outside of conditional blocks. Calculate the lag for every row, and use the result conditionally (in this case, output is used conditionally).