Insert rows based on group ID and difference sum between two datasets - sas

Hello Stack community,
I have a problem where I would appreciate your time and help.
Let's say dataset A has group ID 'A' whose values sum to 11, and dataset B has the same group ID 'A' whose values sum to 20. The difference is 9. I want to expand that difference into 9 rows for group ID 'A' and append them to dataset A. The tables below show what I mean.
data A  
Group Sum
A 1
A 3
A 4
A 1
A 2
Total 11
data b 
Group Sum
A 5
A 2
A 3
A 5
A 5
Total 20
expand the difference of 9 into rows 
Group Count
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
data want 
Group Sum
A 1
A 3
A 4
A 1
A 2
Total 11
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
Friends, I really appreciate and thank you for your time and help on this.
I didn't program this yet. I am not sure how to solve this.

This is a strange table format, but given the information you've posted, here is a solution that works for any number of groups. Note that it is not good practice to keep a total row within your data; you can always calculate the total with SQL or a PROC.
First, calculate the differences for all groups:
proc sql;
    create table difs as
        select a.group, b.total - a.total as dif
        from (select group, sum(sum) as total
              from a
              where group NE 'Total'
              group by group
             ) as a
        LEFT JOIN
             (select group, sum(sum) as total
              from b
              where group NE 'Total'
              group by group
             ) as b
        ON a.group = b.group
        order by group
    ;
quit;
This creates the following table:
group dif
A 9
Next, for every group we output one row with sum = 1 for each unit of the difference:
data counts;
    set difs;
    by group;
    do i = 1 to dif;
        sum = 1;
        output;
    end;
    drop i dif;
run;
This creates the following table:
group sum
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
Now we simply append this to the original table to get the desired output:
data want;
set a counts;
run;
Which produces the table we need:
group sum
A 1
A 3
A 4
A 1
A 2
Total 11
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
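For anyone who wants to sanity-check the logic outside SAS, here is a minimal Python sketch of the same three steps. It assumes each table is a list of (group, value) pairs and that the embedded Total row is skipped when summing; the function name expand_difference is made up for this illustration and is not part of the answer above.

```python
from collections import defaultdict

def expand_difference(a_rows, b_rows):
    """Return one (group, 1) row per unit of sum(B) - sum(A), per group."""
    totals_a = defaultdict(int)
    totals_b = defaultdict(int)
    for group, value in a_rows:
        if group != "Total":          # skip the embedded total row
            totals_a[group] += value
    for group, value in b_rows:
        if group != "Total":
            totals_b[group] += value

    extra = []
    for group in sorted(totals_a):    # groups present in A drive the result
        dif = totals_b.get(group, 0) - totals_a[group]
        # Emit nothing when the difference is zero or negative.
        extra.extend((group, 1) for _ in range(max(dif, 0)))
    return extra

a = [("A", 1), ("A", 3), ("A", 4), ("A", 1), ("A", 2), ("Total", 11)]
b = [("A", 5), ("A", 2), ("A", 3), ("A", 5), ("A", 5), ("Total", 20)]

# Append the expanded rows to the original table, as the final data step does.
want = a + expand_difference(a, b)
for row in want:
    print(*row)
```

With the sample data this appends nine ("A", 1) rows to the six original rows.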

Related

SAS Create conditional duplicate rows - repeating rows for an instance and assigning weight to the duplicate rows

I need to transform my data to feed it into a model, and I am doing this with SAS. Below is the original format of the data and two options that the model will accept. Option 2 is ideal. Is there a way to do this in SAS? I keep trying to come up with data steps but end up going in circles.
ORIGINAL DATA FORMAT
ID Total Risk
recordA 3 3
recordB 5 2
OPTION #1:
ID Target
recordA 1
recordA 1
recordA 1
recordB 1
recordB 1
recordB 0
recordB 0
recordB 0
OPTION #2:
ID Target Weight
recordA 1 3
recordB 1 2
recordB 0 3
I tried subtracting columns and creating a flag (Target = 1 if Risk > 0, else 0), but I run into issues creating the repeated records.
You can do this in a data step with two output statements. When target = 1, weight always equals risk, and we always output that row. If total does not equal risk, we output a second row with target = 0, where weight is total - risk.
data want;
    set have;
    target = 1;
    weight = risk;
    output;
    if (total NE risk) then do;
        target = 0;
        weight = total - risk;
        output;
    end;
    keep id target weight;
run;
Output:
id target weight
recordA 1 3
recordB 1 2
recordB 0 3
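As a quick cross-check of the two-output idea outside SAS, here is a small Python sketch. It assumes the input is a list of (id, total, risk) tuples; split_by_target is a hypothetical helper name used only for this illustration.

```python
def split_by_target(rows):
    """rows: iterable of (id, total, risk); returns (id, target, weight) tuples."""
    out = []
    for rec_id, total, risk in rows:
        out.append((rec_id, 1, risk))               # always emit the target = 1 row
        if total != risk:
            out.append((rec_id, 0, total - risk))   # remainder goes to target = 0
    return out

have = [("recordA", 3, 3), ("recordB", 5, 2)]
for row in split_by_target(have):
    print(*row)
```

Like the data step, this emits one or two rows per input record rather than repeating a row once per unit, which is why Option 2 stays compact.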

apply proc means to quickly calculate multiple observations?

I have a dataset with a varying number of observations per ID, and the participants are in different treatment groups (Group). Can I use proc means to quickly calculate the number of participants and the number of clinic visits per group? Ideally, I would use the proc means sum function to capture those with 0 and 1 based on group status and get the totals, but I am stuck on how to proceed.
ID Visit Group
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
Specifically, I am interested in 1) the total number of participants in each group. In this case we have 3 participants (IDs 1, 3, and 5) in the control group (0) and another 3 participants (IDs 2, 4, and 6) in the treatment group (1).
2) the total number of visits per group. In this case, the total visits in the control group (0) will be 5 (2+1+2=5) and the total visits in the treatment group (1) will be 9 (3+2+4=9).
Can proc means quickly calculate such values? Thanks.
Yes, you can use proc means to get counts.
data have;
input ID$ Visit Group;
cards;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
;
run;
proc means data=have n;
class group id;
var visit;
types group id group*id;
run;
If you also want the sum of visit, add the sum keyword after n in the PROC MEANS statement, that is, proc means data=have n sum;.
It looks like GROUP is assigned at the ID level and not the ID/VISIT level. In that case, if you want to count the number of IDs in each group, you first need to get down to one observation per ID.
proc sort data=have nodupkey out=unique_ids ;
by id;
run;
Now you can count how many IDs are in each group. The usual way is PROC FREQ.
proc freq data=unique_ids;
tables group;
run;
But you can count with PROC MEANS/SUMMARY also.
proc summary data=unique_ids nway;
class group;
output out=counts N=N_ids ;
run;
proc print data=counts;
var group n_ids;
run;
MEANS doesn't do a distinct count easily, so SQL may be the simpler option here.
proc sql;
create table want as
select group, count(*) as num_visits, count(distinct ID) as num_participants
from have
group by group
order by 1;
quit;
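To make explicit what the two aggregates in that query compute, here is a small Python sketch on the question's sample data. It assumes rows are (id, visit, group) tuples; summarize is a hypothetical helper name, and this is only an illustration of count(*) versus count(distinct ID), not a replacement for the SAS code.

```python
from collections import defaultdict

def summarize(rows):
    """rows: (id, visit, group). Returns {group: (num_visits, num_participants)}."""
    visits = defaultdict(int)
    ids = defaultdict(set)
    for pid, _visit, group in rows:
        visits[group] += 1      # count(*): one per visit row
        ids[group].add(pid)     # count(distinct ID): a set keeps each ID once
    return {g: (visits[g], len(ids[g])) for g in sorted(visits)}

have = [
    (1, 1, 0), (1, 2, 0),
    (2, 1, 1), (2, 2, 1), (2, 3, 1),
    (3, 1, 0),
    (4, 1, 1), (4, 2, 1),
    (5, 1, 0), (5, 2, 0),
    (6, 1, 1), (6, 2, 1), (6, 3, 1), (6, 4, 1),
]
for group, (num_visits, num_participants) in summarize(have).items():
    print(group, num_visits, num_participants)
```

On this data the control group comes out with 5 visits and 3 participants, and the treatment group with 9 visits and 3 participants, matching the figures in the question.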

Get the difference between values at the row level in sas

The data is the month-on-month price of each configuration. I want to get the trend of AMOUNT, that is, how the price behaves over a period of 12 months, both for each configuration and overall.
Proc sql doesn't support the "dif" function, and a regular "do" loop in a data step doesn't really help here.
Can anyone help me with this?
The code below groups the data and computes the mean price for each configuration in each month.
proc sql;
create table c.price1 as
select
configuration,
month,
mean(retail_price) as amount format = dollar7.2
from c.price
where
configuration is not missing
and month is not missing
and retail_price is not missing
group by configuration, month;
quit;
DATA :
Configuration Month Amount
1 1 $370.00
1 2 $365.00
1 3 $318.00
1 4 $355.00
1 5 $350.00
1 6 $317.40
1 7 $340.00
1 8 $335.00
1 9 $297.00
1 10 $325.00
1 11 $320.00
1 12 $286.65
2 1 $320.00
2 2 $315.00
2 3 $287.86
2 4 $305.00
2 5 $300.00
2 6 $263.76
.......and so on
Use the DIF function in conjunction with BY group processing.
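/* "have" stands for the summarized table (c.price1 in the question), sorted by configuration and month */
data want;
    set have;
    by configuration;
    new_var = dif(amount);   /* call DIF on every row so its queue stays in step */
    if first.configuration then new_var = .;   /* no previous month within this configuration */
run;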
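The same idea, a row-to-row difference that resets at the start of each BY group, can be sketched in Python for comparison. This assumes the rows are (configuration, month, amount) tuples already sorted by configuration and month; diff_by_group is a hypothetical name, and the rounding to two decimals is my own choice to keep dollar amounts tidy.

```python
def diff_by_group(rows):
    """rows: (configuration, month, amount), sorted by configuration then month.

    Returns the rows with a fourth field: the change from the previous month,
    or None for the first month of each configuration.
    """
    out = []
    prev_config = prev_amount = None
    for config, month, amount in rows:
        if config != prev_config:
            change = None                           # first row of a new configuration
        else:
            change = round(amount - prev_amount, 2)
        out.append((config, month, amount, change))
        prev_config, prev_amount = config, amount
    return out

prices = [
    (1, 1, 370.00), (1, 2, 365.00), (1, 3, 318.00),
    (2, 1, 320.00), (2, 2, 315.00),
]
for row in diff_by_group(prices):
    print(*row)
```

The first row of each configuration gets no difference, so the last price of configuration 1 never leaks into configuration 2.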

Sas value counting

I want to count the number of times each value appears in a particular column in SAS. For example, in the following dataset the value 1 appears three times, value 2 appears twice, value 3 appears once, value 4 appears three times and value 5 appears five times.
Game_ball
1
1
1
2
2
3
4
4
4
5
5
5
5
5
I want the dataset to be represented like the following:
Game_ball Count
1 3
2 2
3 1
4 3
5 5
. .
. .
. .
Thanks in advance
As per @Dwal, proc freq is the easiest solution.
Using your sample data,
proc freq data=sample;
table game_ball/out=output;
run;
Or do it in a single data step pass after sorting:
proc sort data = sample; by game_ball; run;
data output;
    set sample;
    by game_ball;
    if first.game_ball then count = 0;
    count + 1;   /* sum statement: count is retained automatically */
    if last.game_ball then output;
run;
Or in SQL
proc sql;
create table output as
select game_ball, count(*) as count
from sample
group by game_ball;
quit;
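For comparison, the same frequency count can be sketched in Python with the standard library's Counter, using the values listed in the question. This is only an illustration of what all three SAS variants compute.

```python
from collections import Counter

game_ball = [1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5]

# Counter tallies each distinct value; sorting the items orders them by value.
counts = sorted(Counter(game_ball).items())

print("Game_ball Count")
for value, count in counts:
    print(value, count)
```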

mysql connect all fields in two columns

I have a view with two columns: a person's ID (a number) and the sector that they belong to (given as numbers 1-5).
I want to create a view to show whether people belong to the same sector. I think this would have three columns: ID1, ID2, and SameSector. The first column would list IDs, and for each ID in column 1 the second column would list ALL of the IDs. The third column would be an if statement, 1 if the sector was the same for both IDs, 0 if it wasn't. This is made slightly more complicated because a person can belong to more than one sector.
For example:
I have:
ID Sector
1 1
2 1
2 5
3 1
I want:
ID1 ID2 SameSector
1 1 1
1 2 1
1 2 0
1 3 1
2 1 1
2 1 0
etc.
I'm guessing this involves some sort of self join and an if statement, but I can't figure out how to list all of the IDs in the ID1 column and match each of them to all of the IDs in ID2. Any ideas?
This should be what you want:
SELECT a.ID AS ID1, b.ID AS ID2, IF(a.Sector=b.Sector,1,0) AS SameSector
FROM theTable AS a, theTable AS b
http://sqlfiddle.com/#!2/f2cbc/4
I initially had a much more complicated query, but then realized you wanted a complete cross-join, including the same ID comparing to itself.
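To see what the cross join produces, here is a small Python sketch that pairs every row of the sample table with every row, including itself. It assumes the view is a list of (ID, Sector) tuples; same_sector_pairs is a hypothetical name used only for this illustration.

```python
from itertools import product

def same_sector_pairs(rows):
    """rows: (id, sector). Pair every row with every row and flag equal sectors."""
    return [
        (id1, id2, int(s1 == s2))
        for (id1, s1), (id2, s2) in product(rows, repeat=2)
    ]

the_table = [(1, 1), (2, 1), (2, 5), (3, 1)]
pairs = same_sector_pairs(the_table)

print("ID1 ID2 SameSector")
for row in pairs:
    print(*row)
```

Four input rows give sixteen output rows, and a person with two sectors appears once per sector on each side of the pairing.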