Update counts according to a label - sas

Suppose you have the following:
data have;
input ID :$20. Label :$20. Hours :$20. Days :$20.;
cards;
0001 w 3144 3
0001 w 23 54
0001 p 12 1
0002 m 456 34
0002 w 2 1
0002 s 231 45
0002 w 98 23
0003 w 12 6
0003 w 98 76
;
Is there a way, for each ID, to sum the Hours to get a total and then split that total across the days, but only for rows where the label is w? If the label is not w, Hours should be missing.
Desired output:
data have;
input ID :$20. Label :$20. Hours :$20. Days :$20.;
cards;
0001 w 167.3158 3
0001 w 3011.684 54
0001 p . 1
0002 m . 34
0002 w 32.79167 1
0002 s . 45
0002 w 754.2084 23
0003 w 8.048778 6
0003 w 101.9512 76
;
In other words: for 0001 in the desired output I added up all the hours, 3144+23+12 = 3179, then added up the days where the label is "w", 54+3 = 57. I divided 3179 by 57 and multiplied the result by 3 and by 54 (the "w" rows), but not by 1 (the "p" row).
Thank you in advance

Same idea as @Stu Sztukowski's answer, but using a DOW loop:
data want;
  /* First pass: accumulate the totals for the current ID */
  do until(last.id);
    set have;
    by id notsorted;
    sum_of_hours=sum(sum_of_hours,input(hours,best.));
    sum_of_days_w=sum(sum_of_days_w,(label='w')*input(days,best.));
  end;
  /* Second pass: re-read the same ID and write out the redistributed hours */
  do until(last.id);
    set have;
    by id notsorted;
    if label='w' then hours=cats(sum_of_hours*(input(days,best.)/sum_of_days_w));
    else hours='';
    output;
  end;
run;

The calculation you need to do looks like this in code form:
if label = 'w' then hours = sum_of_hours/sum_of_days_w * days;
else hours = .;
All we need to do is get, for each id, the sum of all hours and the sum of days where label = 'w', then merge them back with our original table by id. The table to do this calculation would look like this:
id    label  hours  days  sum_of_hours  sum_of_days_w
0001  w      3144   3     3179          57
0001  w      23     54    3179          57
You can accomplish this all in a single SQL step (this assumes hours and days are numeric).
proc sql;
    create table want as
        select t1.id
             , t1.label
             , CASE(t1.label)
                   when('w') then t2.sum_hours/t2.sum_days_w * t1.days
                   else .
               END as hours
             , t1.days
        from have as t1
        /* Get the sum of all hours, and the sum of days where label = 'w' */
        LEFT JOIN
            (select id
                  , sum( (label = 'w')*days ) as sum_days_w
                  , sum(hours) as sum_hours
             from have
             group by id
            ) as t2
        ON t1.id = t2.id
    ;
quit;
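The data step in the question reads Hours and Days with $20., so they are character, while the query does arithmetic on them. A minimal sketch of converting them to numeric first, assuming the values are plain numbers as in the sample; have_num, hours_c and days_c are hypothetical names that are not part of the original answer:

```sas
/* Assumption: hours and days in HAVE are character, as in the posted data step */
data have_num;
  set have(rename=(hours=hours_c days=days_c));
  hours = input(hours_c, best32.);  /* character -> numeric */
  days  = input(days_c, best32.);
  drop hours_c days_c;
run;
```

The query can then read from have_num instead of have.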

Related

Count how many days pass between two periods

Is there a way to count how many days pass from the end of one period to the start of the next one? Let's say:
ID Start End
0001 22JAN2022 23JAN2022
0001 26JAN2022 30JAN2022
0001 03FEB2022 08MAR2022
0001 09MAR2022 15MAR2022
0001 17MAR2022 30MAR2022
desired output:
ID Start End days
0001 22JAN2022 23JAN2022 3
0001 26JAN2022 30JAN2022 4
0001 03FEB2022 08MAR2022 1
0001 09MAR2022 15MAR2022 2
0001 17MAR2022 30MAR2022 .......
I believe I demonstrated this in another thread, but here you go:
data have;
input ID $ (Start End)(:date9.);
format Start End date9.;
datalines;
0001 22JAN2022 23JAN2022
0001 26JAN2022 30JAN2022
0001 03FEB2022 08MAR2022
0001 09MAR2022 15MAR2022
0001 17MAR2022 30MAR2022
;
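data want;
  set have;
  by ID;
  /* Look ahead: read the next row's Start as s; the obs=1 copy adds one extra
     read so the step does not stop before the last row of have is output */
  set have(firstobs = 2 rename = (start = s) keep = start)
      have(obs = 1 drop = _all_);
  if last.ID then s = .;
  days = s - end;
run;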
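The second SET statement above is a look-ahead read. An alternative sketch of the same idea uses a MERGE without a BY statement, which pairs row n with row n+1. Here want_merge, next_id and next_start are hypothetical names, and the sketch assumes have is already grouped by ID and ordered by Start:

```sas
data want_merge;
  /* No BY statement: observations are paired one-to-one,
     so each row sees the ID and Start of the following row */
  merge have
        have(firstobs=2 keep=ID Start rename=(ID=next_id Start=next_start));
  if ID = next_id then days = next_start - End;  /* gap to the next period */
  else days = .;                                 /* last row of an ID */
  drop next_id next_start;
run;
```

Comparing ID with next_id takes the place of the last.ID check, so no BY statement is needed.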

Add flags after comparing columns of dates

Suppose you have the following data set.
ID Hired Start_date End_date Flag_Start Flag_End
0001 1-1900 01JAN2018 21DEC2018 1 2
0001 1-1900 01JAN2019 01DEC2020 2 2
0002 10-2020 26MAR2020 03MAY2020 1 2
0003 03-2021 18DEC2020 31DEC2020 1 2
..... ....... ......... ......... ........... ...........
I would like to get the output below. Sorry for asking, but I'm a newbie and this seems to be a very difficult task in SAS. I'm familiar with R.
Desired output:
ID Hired Start_date End_date Flag_Start Flag_End
0001 1-1900 01JAN2018 21DEC2018 1 2
0001 1-1900 01JAN2019 01DEC2020 2 3
0002 03-2020 26MAR2020 03MAY2020 1 0
0003 03-2021 18DEC2020 31DEC2020 1 3
..... ....... ......... ......... ........... ...........
So, for each ID, after sorting, look at the row with the last End_date: if Hired is 1-1900, set Flag_End to 3; otherwise, if Hired is < End_date, set it to 0; otherwise, if Hired is > End_date (and not 1-1900), set it to 3.
Thank you in advance
I think this is what you want.
The Hired date for ID 0002 does not match between your two posted data sets. I chose the second one (03-2020).
data have;
input ID $ Hired :anydtdte. (Start_date End_date)(:date9.) Flag_Start Flag_End;
format Hired Start_date End_date date9.;
datalines;
0001 1-1900 01JAN2018 21DEC2018 1 2
0001 1-1900 01JAN2019 01DEC2020 2 2
0002 03-2020 26MAR2020 03MAY2020 1 2
0003 03-2021 18DEC2020 31DEC2020 1 2
;
data want;
set have;
by ID;
if last.ID then do;
if Hired = '01jan1900'd then flag_end = 3;
else if Hired < End_date then flag_end = 0;
else if Hired >= End_date then flag_end = 3;
end;
run;

SAS problem: sum up rows and divide till it reach a specific value

I have the following problem: I would like to compute a running sum of a column and divide it, on every row, by the sum of the whole column, until a specific value is reached. In pseudocode it would look like this:
data;
set auto;
sum_of_whole_column = sum(price);
subtotal[i] = 0;
i =1;
do until (subtotal[i] = 70000)
subtotal[i] = (subtotal[i] + subtotal[i+1])/sum_of_whole_column
i = i+1
end;
run;
I get an error that I haven't defined an array, so can I use something else instead of subtotal[i]? And how can I put a column into an array? I tried the following, but it doesn't work (auto is the data set and price is the column I want to put into an array):
data invent_array;
set auto;
array price_array {1} price;
run;
EDIT: maybe the dataset I used is helpful :)
DATA auto ;
LENGTH make $ 20 ;
INPUT make $ 1-17 price mpg rep78 ;
CARDS;
AMC Concord 4099 22 3
AMC Pacer 4749 17 3
Audi 5000 9690 17 5
Audi Fox 6295 23 3
BMW 320i 9735 25 4
Buick Century 4816 20 3
Buick Electra 7827 15 4
Buick LeSabre 5788 18 3
Cad. Eldorado 14500 14 2
Olds Starfire 4195 24 1
Olds Toronado 10371 16 3
Plym. Volare 4060 18 2
Pont. Catalina 5798 18 4
Pont. Firebird 4934 18 1
Pont. Grand Prix 5222 19 3
Pont. Le Mans 4723 19 3
;
RUN;
Perhaps I am missing your point, but your subtotal will never be equal to 70 000 if you divide by the sum of its column. The maximum value will be 1. Your running sum, however, can be greater than or equal to 70 000.
data stage1;
retain _sum 0;
set auto;
_sum = sum(_sum, price);
if _sum < 70000 then output;
run;
proc sql;
create table want as
select t1.*, t1._sum/sum(price) as subtotal
from stage1 as t1;
quit;
subtotal
0.0607268256
0.1310834235
0.2746411058
0.3679017467
0.5121261056
0.5834753107
0.6994325842
0.7851820027
1
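Note that the subtotal above is relative to the rows kept in stage1, which is why it ends at 1. If the denominator should instead be the sum of the whole price column, as the question words it, one possible sketch is to store the grand total in a macro variable first. The names total_price, want_share, running_sum and share_of_total are assumptions for illustration:

```sas
proc sql noprint;
  /* grand total of the whole column, stored in a macro variable */
  select sum(price) into :total_price trimmed
  from auto;
quit;

data want_share;
  set auto;
  running_sum + price;                 /* sum statement: retained running total */
  if running_sum >= 70000 then stop;   /* stop before writing the row that reaches 70000 */
  share_of_total = running_sum / &total_price;
run;
```

With the sample data this keeps the same nine rows, but the shares no longer end at 1.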

combine and merge rows in SAS

I have a SAS Table like:
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
PROC PRINT; RUN;
proc sort data=have;
by county;
run;
I want to compare rows and, if sex and age are equal, build the sum over Zaehler. For example:
1 F 35 17 7 2 1 w
and
33 F 35 6 7 2 1 w
sex=F and age=35 are equal, so I want to merge them like:
id sex age inc r1 r2 Zaehler work
1 F 35 17 7 2 2 w
I thought I could do it with proc sql, but I can't use sum in proc sql. Can someone help me out?
PROC SUMMARY is the normal way to compute statistics.
proc summary data=test nway ;
class sex age ;
var Zaehler;
output out=want sum= ;
run;
Why would you want to include variables other than SEX, AGE and Zaehler in the output?
Your requirement is not difficult to understand or to satisfy; however, I am not sure what your underlying reason for doing this is. Explaining more about your purpose may lead to better answers that address the root of your project. Although I have a feeling that PROC MEANS may give you better metrics, here is a one-step PROC SQL solution that gets you the summary while retaining "the value of first row":
proc sql;
create table want as
select id, sex , age, inc, r1, r2, sum(Zaehler) as Zaehler, work
from test
group by sex, age
having id = min(id) /* keep only the row with the smallest id within each sex, age group */
;
quit;
You can use proc sql to sum over sex and age:
proc sql;
create table sum as
select
sex
,age
,sum(Zaehler) as Zaehler_sum
from test
group by
sex
,age;
quit;
You can then join it back to the main table if you want to include all the variables:
proc sql;
create table test_With_Sum as
select
t.*
,s.Zaehler_sum
from test t
inner join sum s on t.sex = s.sex
and t.age = s.age
order by
t.sex
,t.age
;
quit;
You can write it all as one proc sql query if you wish. The order by is not needed; it is only added to make the summarised results easier to read.
Not a good solution. But it should give you some ideas.
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
run;
data t2;
set test;
nobs = _n_;
run;
proc sort data=t2;by descending sex descending age descending nobs;run;
data t3;
set t2;
by descending sex descending age;
if first.age then count = 0;
count + 1;
zaehler = count;
if last.age then output;
run;
proc sort data=t3 out=want(drop=nobs count);by nobs sex age;run;
Thanks for your help. Here is my final code.
proc sql;
create table sum as
select distinct
sex
,age
,sum(Zaehler) as Zaehler
from test
WHERE work = 'w'
group by
sex
,age
;
quit;
PROC PRINT; RUN;
I just modified the code a little bit: I filtered on work = 'w' and merged the rows with the same values.
It was just an example; the real data is much bigger and has more columns and rows.

Why informat is not working in SAS

I tried various date formats, but the output does not show any date. What could be the issue?
data c;
input age gender income color$ doj$;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
You are mixing things up a bit.
The date formats are to be applied on numeric data, not on text data.
So you should not read in doj as $ (text), but as a date (so a date informat).
Try DDMMYY10. for doj on your input statement:
data c;
input age gender income color$ doj :ddmmyy10.;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
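To see the difference between the two, here is a small sketch that converts one of the sample strings with an informat and then prints the stored number with and without a format:

```sas
data _null_;
  doj = input('14/07/1988', ddmmyy10.);  /* informat: text -> numeric SAS date */
  put doj=;                              /* raw value: days since 01JAN1960 */
  put doj= date9.;                       /* format: numeric date -> readable text */
run;
```

The informat decides how the text is read, while the format only controls how the stored number is displayed.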