How to combine two columns in new rows in SAS?

I have a dataset as follows:
variable level value
-----------------------
Age_group 1 0.1
Age_group 2 0.3
Age_group 3 0.2
Age_group 4 0.5
Sex 1 0.9
Sex 0 0.6
I would like to reformat it to get,
variable value
------------------------
Age_group
1 0.1
2 0.3
3 0.2
4 0.5
Sex
1 0.9
0 0.6
Is there any way to perform this?

The 'reformat' you describe is more appropriately an output report than a data set.
Example:
data have;
length variable $32;
input variable $ level value;
datalines;
Age_group 1 0.1
Age_group 2 0.3
Age_group 3 0.2
Age_group 4 0.5
Sex 1 0.9
Sex 0 0.6
;
ods html file='report.html' style=plateau;
proc report data=have;
define variable / order order=data noprint;
define level / 'variable';
compute before variable;
line variable $32.;
endcomp;
run;
ods html close;

If you really want the output as a dataset and not a report (which you should perhaps consider), and you do not want to change the sorting of the input dataset, the following should work:
data have;
length variable $32;
input variable $ level value;
datalines;
Age_group 1 0.1
Age_group 2 0.3
Age_group 3 0.2
Age_group 4 0.5
Sex 1 0.9
Sex 0 0.6
;
data newdata;
set have(rename=(value=old_val));
if lag(variable) ne variable then output;
variable = left(put(level,best.));
value = old_val;
output;
keep variable value;
run;

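Sorry, I don't have SAS installed and I haven't programmed in 3 years, but this could be helpful.
BTW, it's not good to have spaces in your key field (variable).
proc sort data=yourdataset;
by variable;
run;
data newdata;
set yourdataset;
by variable;
length newvariable $32;
/* at the start of each group, emit a header row holding the group name */
if first.variable then do;
newvariable = variable;
newvalue = .;
output;
end;
/* then emit the detail row: the level as text, with its value */
newvariable = left(put(level, best.));
newvalue = value;
output;
run;
You will then need to drop variable, level and value, and rename newvariable and newvalue.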
documentation
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214163.htm

Related

Trying to Format a Field in SAS, ignoring the decimals

I am trying to format a particular field as follows:
If the value is 60.00, it has to be displayed as 60.
If the value is 14.32, it has to be displayed as 1432.
If it is 0.00, the output should be 0.
That is, no decimal point should be displayed.
Below are the dataset and the option I have tried.
data input_dataset;
input srno $ bill_amt $10.;
datalines;
1 60.00
2 0.00
3 14.32
;
run;
data test;
set input_dataset;
format mc062 $10.;
mc062 = put((bill_amt *100 ),10.);
run;
Expected results are:
mc062:
60
0
1432
How about something like:
data input_dataset;
input srno $ bill_amt ;
datalines;
1 60.00
2 0.00
3 14.32
;
run;
data output_dataset;
set input_dataset;
if (bill_amt NE int(bill_amt))then bill_amt = bill_amt * 100;
run;
A custom PICTURE format will scale a data value prior to rendering the digits according to the picture template.
This example has a template for values up to 12 digits after scaling.
proc format;
picture nodecimals
0 = '9'
other='000000000000' /* 12 digit selectors */
(mult=100) /* value scaling multiplier */
;
run;
data have;
input srno $ bill_amt ; * get some NUMERIC billing amounts;
format bill_amt 10.2; * format to be used when showing value;
* copy amount into bill_raw (no format) just for learning about
* the effect of a format (or not) in output produced for viewing;
bill_raw = bill_amt;
* copy amount into the mc062 variable, which will have the custom PICTURE
* format applied when output (for viewing) is produced;
mc062 = bill_amt;
format mc062 nodecimals.;
datalines;
1 60.00
2 0.00
3 14.32
4 123.456
;
run;
* produce some output for viewing;
proc print data=have;
run;

Evaluate string fraction

Suppose I have the following dataset:
data df;
input frac $;
datalines;
1/3
1/4
5/12
1
0
7/12
;
run;
And I want to get to this:
frac
0.33
0.25
0.42
1.00
0.00
0.58
I know I could get this output by doing this:
proc sql;
select
case
when frac = '1/12' then 0.083
when frac = '1/6' then 0.167
...
end as frac_as_num
from df
;
quit;
But I'd rather not hard-code everything. I know I could do something like this in Python:
frac = ['1/12', '1/6', ...]
[eval(f) for f in frac]
Something like the following, using scan and input, should work.
proc sql;
create table want as
select frac,
case when index(frac,'/') then
input(scan(frac, 1),best32.)/input(scan(frac, 2),best32.)
else input(frac,best32.) end as frac_as_num format= 5.2
from df;
quit;
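The same scan/input idea can also be written as a DATA step. This is only a sketch: the output data set name want_ds is an assumption, and it assumes every value in frac is either a plain number or a single numerator/denominator pair.

```sas
data want_ds;                      /* want_ds is a hypothetical name */
  set df;
  if index(frac, '/') then
    /* split on the slash and divide numerator by denominator */
    frac_as_num = input(scan(frac, 1, '/'), best32.)
                / input(scan(frac, 2, '/'), best32.);
  else
    /* no slash: the text is already a plain number */
    frac_as_num = input(frac, best32.);
  format frac_as_num 5.2;
run;
```

The format only affects how the value is displayed; the stored value keeps full precision.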
This is how I would do it, so that the results stay numeric and are never converted to and from character, which is what happens if you use %SYSEVALF via RESOLVE in a single step.
filename FT76F001 temp;
data _null_;
file FT76F001;
input frac $;
put +3 'x=' frac ';' frac=$quote. '; output;';
datalines;
1/3
1/4
5/12
1
0
7/12
;
run;
data frac;
length x 8 frac $16;
%inc FT76F001;
run;
proc print;
run;
This is using SYSEVALF:
data _null_;
input frac $;
x = input(resolve(cats('%sysevalf(',frac,')')),f32.);
put 'NOTE: ' (_all_)(=);
datalines;
1/3
1/4
5/12
1
0
7/12
;
run;
Log output:
NOTE: frac=1/3 x=0.3333333333
NOTE: frac=1/4 x=0.25
NOTE: frac=5/12 x=0.4166666667
NOTE: frac=1 x=1
NOTE: frac=0 x=0
NOTE: frac=7/12 x=0.5833333333
I'd say the simplest way to do it would be to put your fractional value into a macro variable, call the sysevalf function on it to evaluate the value, and finally convert it back into a normal variable. This has the added benefit of being able to work with any math expression, not just fractions.
Something like:
data test;
set df;
call symput('myMacroVariable',frac); /* Put the value of frac into a macro variable */
dec = resolve('%sysevalf(&myMacroVariable)'); /* Evaluate the value of the macro variable */
run;
Edit: Don't listen to me, data_null_'s answer does the same thing but in one line.
data df;
input frac $;
_frac=put(scan(frac,1)/coalesce(scan(frac,2),1),4.2);
datalines;
1/3
1/4
5/12
1
0
7/12
;
run;

Is there a way in SAS to print the value of a variable in label using proc sql?

I have a situation where I would like to put the value of a variable in the label in SAS.
Example: Median for Total_Days is 2. I would like to put this value in Days_Median_Split label. The median keeps on changing with varying data, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Sample Dataset
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into: cnt
from have;quit;
* step 4 calculate the median position (for an even count this takes the lower of the two middle rows);
%let median = %sysevalf(&cnt/2 + .5, floor);
*step 5 get median observation;
proc sql noprint;
select Total_days into: medianValue
from have
where monotonic()=&median;quit;
*step 6 create label;
data have;
set have;
label Days_Median_split = "Days_Median_split: Number of Days with Median %trim(&medianValue)";
run;
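As an alternative to sorting and counting rows by hand, the median could be taken from PROC MEANS and the label attached with PROC DATASETS, which changes the label without rewriting the data. This is a sketch, not the original answer's method; the work data set _med and the variable med are names made up for illustration.

```sas
/* compute the median of Total_Days into a one-row data set */
proc means data=have noprint;
  var Total_Days;
  output out=_med median=med;      /* _med and med are hypothetical names */
run;

/* copy the median into a macro variable; SYMPUTX strips blanks */
data _null_;
  set _med;
  call symputx('medianValue', med);
run;

/* attach the label; double quotes let the macro variable resolve */
proc datasets library=work nolist;
  modify have;
  label Days_Median_Split = "Days_Median_Split: Number of Days with Median &medianValue";
run;
quit;
```

Because PROC MEANS averages the two middle values for an even number of rows, this also handles the even-count case.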

Proc means - Calculating the share / weight

I am using a proc means to calculate the share of the payments made by business line, the data looks like this:
data Test;
input ID Business_Line Payment2017;
Datalines;
1 1 1000
2 1 2000
3 1 3000
4 1 4000
5 2 500
6 2 1500
7 2 3000
;
run;
I'm looking to calculate an additional column which, within each group (business_line), gives the percentage share (weight) of the payment, like this:
data Test;
input ID Business_Line Payment2017 share;
Datalines;
1 1 1000 0.1
2 1 2000 0.2
3 1 3000 0.3
4 1 4000 0.4
5 2 500 0.1
6 2 1500 0.3
7 2 3000 0.6
;
run;
the code I have used so far:
proc means data = test noprint;
class ID;
by business_line;
var Payment2017;
output out=test2
sum = share;
weight = payment2017/share;
run;
I have also tried
proc means data = test noprint;
class ID;
by business_line;
var Payment2017 /weight = payment2017;
output out=test3 ;
run;
appreciate the help.
Proc FREQ will compute percentages. You can divide the PERCENT column of the output by 100 to get the fraction, or work with percents downstream.
In this example id crosses payment2017 in order to ensure all original rows are part of the output. If the id was not present, and there were any replicate payment amounts, FREQ would aggregate the payment amounts.
proc freq data=have noprint;
by business_line;
table id*payment2017 / out=want all;
weight payment2017 ;
run;
It is convenient to do with proc sql:
proc sql;
select *, payment2017/sum(payment2017) as share from test group by business_line;
quit;
data step:
data have;
do until (last.business_line);
set test;
by business_line notsorted;
total+payment2017;
end;
do until (last.business_line);
set test;
by business_line notsorted;
share=payment2017/total;
output;
end;
call missing(total);
drop total;
run;
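Since the question started from PROC MEANS, here is a sketch of how that route could work: compute one total per business line, then merge the totals back onto the detail rows. The data set names totals and want are assumptions, and the merge assumes test is already sorted by business_line, as it is in the sample data.

```sas
/* one row per business line with the group total */
proc means data=test noprint nway;
  class business_line;
  var payment2017;
  output out=totals(drop=_type_ _freq_) sum=total;   /* totals is a hypothetical name */
run;

/* attach each group's total to its detail rows and compute the share */
data want;
  merge test totals;
  by business_line;
  share = payment2017 / total;
  drop total;
run;
```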

Output variables within a by group only if

My first time posting. I'm pretty new to SAS programming (actually all programming). This seems like a simple problem but can't figure it out. I have some crosstab output and I'm trying to get it into shape for easy output to tables. I want to retain the first observation in a by group if there are only 3 observations in that group. If there are more than 3 observations, I want to retain all observations but the last. So, for example, here's what I have:
Group1 Group2 Percent
var1 1 0.25
var1 1 0.75
var1 1 1
var1 2 0.4
var1 2 0.6
var1 2 1
var1 3 0.7
var1 3 0.3
var1 3 0.6
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 1 1
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 1
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
var2 3 0.1
and here's what I want in a new dataset
Group1 Group2 Percent
var1 1 0.25
var1 2 0.4
var1 3 0.7
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
Hopefully, that's clear but please let me know if more information is needed.
I've broken it out in a few steps to help you see the logic and have used both data steps and SQL. Basically you want to count how many are in each group and keep all counts (the count within the group and the total count) around so you can use them to make your final logic.
data test;
length GROUP1 $5 GROUP2 PERCENT 8;
input GROUP1 $ GROUP2 PERCENT;
datalines;
var1 1 0.25
var1 1 0.75
var1 1 1
var1 2 0.4
var1 2 0.6
var1 2 1
var1 3 0.7
var1 3 0.3
var1 3 0.6
var2 1 0.1
var2 1 0.2
var2 1 0.4
var2 1 0.3
var2 1 1
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 0.2
var2 2 1
var2 3 0.7
var2 3 0.1
var2 3 0.05
var2 3 0.05
var2 3 0.1
;
run;
** count the number of obs per group **;
data test_ct; set test;
by GROUP1 GROUP2;
COUNT + 1;
if first.GROUP2 then COUNT = 1;
run;
** count the total number of obs per group and output on each row **;
proc sql noprint;
create table test_ct_all as
select *, count(*) as COUNT_TOTAL
from test_ct group by GROUP1,GROUP2
order by GROUP1, GROUP2, COUNT;
quit;
** logic to keep records **;
data keep_flags; set test_ct_all;
if COUNT=1 and COUNT_TOTAL=3 then KEEP=1;
*the last record will have COUNT and COUNT_TOTAL equal;
if COUNT_TOTAL > 3 and (COUNT_TOTAL ne COUNT) then KEEP=1;
run;
** output only the keep records **;
data keepers; set keep_flags;
if KEEP=1;
run;
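The same logic can be condensed into a single DATA step that reads each group twice: once to count its rows, once to output the rows to keep. This is a sketch under the same assumption as above, that test is sorted by GROUP1 and GROUP2; the names keepers2, n and i are made up for illustration.

```sas
data keepers2;                     /* keepers2 is a hypothetical name */
  /* first pass: count the rows in the current GROUP1/GROUP2 group */
  do n = 1 by 1 until (last.GROUP2);
    set test;
    by GROUP1 GROUP2;
  end;
  /* second pass: re-read the same rows and output the ones to keep */
  do i = 1 to n;
    set test;
    /* exactly 3 rows: keep the first; more than 3: keep all but the last */
    if (n = 3 and i = 1) or (n > 3 and i < n) then output;
  end;
  drop n i;
run;
```

As in the multi-step version, groups with fewer than 3 rows produce no output.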