SAS: Create conditional duplicate rows - repeating rows for an instance and assigning weight to the duplicate rows

I need to transform my data for input into a model, and I am doing this in SAS. Below are the original format of the data and two formats the model will accept. Option 2 is ideal. Is there a way to do this in SAS? I keep trying to come up with data steps but end up going in circles.
ORIGINAL DATA FORMAT
ID Total Risk
recordA 3 3
recordB 5 2
OPTION #1:
ID Target
recordA 1
recordA 1
recordA 1
recordB 1
recordB 1
recordB 0
recordB 0
recordB 0
OPTION #2:
ID Target Weight
recordA 1 3
recordB 1 2
recordB 0 3
I tried subtracting the columns and creating a flag (Target = 1 if Risk > 0, else 0), but I run into issues creating the repeated records.

You can do this with a data step that has two output statements. When target = 1, weight is always equal to risk, and we always want an output row for that case. If total does not equal risk, we need a second output row with target = 0, where weight is total - risk.
data want;
    set have;
    /* First row: the at-risk portion */
    target = 1;
    weight = risk;
    output;
    /* Second row: the remainder, only when there is one */
    if (total NE risk) then do;
        target = 0;
        weight = total - risk;
        output;
    end;
    keep id target weight;
run;
Output:
id target weight
recordA 1 3
recordB 1 2
recordB 0 3
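For completeness, Option 1 (one row per unit of total, without a weight column) could be produced with an iterative DO loop. This is only a minimal sketch built on the sample data from the question; the dataset name option1 and the loop counter i are illustrative names, not part of the original answer.

```sas
/* Sample data from the question */
data have;
    input id $ total risk;
    datalines;
recordA 3 3
recordB 5 2
;
run;

/* Option 1: write `total` rows per id; the first `risk` rows get target = 1 */
data option1;
    set have;
    do i = 1 to total;
        target = (i <= risk);   /* Boolean expression evaluates to 1 or 0 */
        output;
    end;
    keep id target;
run;
```

With the sample data this writes three rows with target = 1 for recordA, and two rows with target = 1 followed by three rows with target = 0 for recordB. The weighted form (Option 2) carries the same information in fewer rows, which is presumably why it is the preferred input.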

Related

apply proc means to quickly calculate multiple observations?

I have a dataset with a varying number of observations per ID, and the participants have different treatment statuses (Group). Can I use proc means to quickly calculate the number of participants and the number of clinic visits per group status? Ideally, I would use the proc means sum function to capture those with 0 and 1 based on group status and get the totals. However, I am stuck on how to proceed.
ID Visit Group
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
Specifically, I am interested in 1) the total number of participants in each group status. In this case we have 3 participants (IDs 1, 3, and 5) in the control group (0) and another 3 participants (IDs 2, 4, and 6) in the treatment group (1).
2) the total number of visits per group status. In this case, the total visits in the control group (0) will be 5 (2+1+2=5) and the total visits in the treatment group (1) will be 9 (3+2+4=9).
Can the proc means procedure quickly calculate such values? Thanks.
Yes, you can use proc means to get counts.
data have;
input ID$ Visit Group;
cards;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
;
run;
proc means data=have n;
class group id;
var visit;
types group id group*id;
run;
If you also want the sum of visit, add the sum keyword after n in the proc means statement, before the semicolon: proc means data=have n sum;
It looks like GROUP is assigned at the ID level and not the ID/VISIT level. In that case if you want to count the number of ID's in each group you need to first get down to one observation per ID.
proc sort data=have nodupkey out=unique_ids ;
by id;
run;
Now you can count how many ID's are in each group. The normal way is to use PROC FREQ.
proc freq data=unique_ids;
tables group;
run;
But you can count with PROC MEANS/SUMMARY also.
proc summary data=unique_ids nway;
class group;
output out=counts N=N_ids ;
run;
proc print data=counts;
var group n_ids;
run;
MEANS doesn't do a distinct count easily, so SQL may be a simpler, easier-to-understand option here.
proc sql;
create table want as
select group, count(*) as num_visits, count(distinct ID) as num_participants
from have
group by group
order by 1;
quit;
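If you would rather stay within PROC SUMMARY, the same two numbers can be obtained in two passes: first collapse to one row per participant, then summarise per group. The sketch below assumes that each row of have is one visit (as in the SQL above) and uses the full sample data from the question; the names per_id, n_visits, num_participants and num_visits are illustrative.

```sas
/* Sample data from the question: one row per visit */
data have;
    input ID Visit Group;
    datalines;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
;
run;

/* Pass 1: one row per participant; _FREQ_ holds that participant's row (visit) count */
proc summary data=have nway;
    class group id;
    output out=per_id(drop=_type_ rename=(_freq_=n_visits));
run;

/* Pass 2: per group, N counts participants and SUM totals their visits */
proc summary data=per_id nway;
    class group;
    var n_visits;
    output out=want(drop=_type_ _freq_) n=num_participants sum=num_visits;
run;
```

For this data, want should contain 3 participants and 5 visits for group 0, and 3 participants and 9 visits for group 1, matching the figures in the question.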

Insert rows based on group ID and difference sum between two datasets

Hello Stack community,
I have a problem where I would appreciate your time and help.
Let's say I have data A in which the values for group ID 'A' sum to 11, and another data B in which the values for the same group ID 'A' sum to 20. The difference is 9. I want to expand this difference into 9 rows for group ID 'A' and append them to data A. The tables below are for your reference.
data A  
Group Sum
A 1
A 3
A 4
A 1
A 2
Total 11
data b 
Group Sum
A 5
A 2
A 3
A 5
A 5
Total 20
expand the difference of 9 into rows 
Group Count
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
data want 
Group Sum
A 1
A 3
A 4
A 1
A 2
Total 11
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
Friends, I really appreciate and thank you for your time and help on this.
I didn't program this yet. I am not sure how to solve this.
This is a strange table format, but given the information you've posted, here is a solution that will work for any number of groups. Note that it is not good practice to keep a total row within your data; you can always calculate the total with SQL or PROCs.
First, calculate the differences for all groups:
proc sql;
create table difs as
select a.group, b.total - a.total as dif
from (select group, sum(sum) as total
from a
where group NE 'Total'
group by group
) as a
LEFT JOIN
(select group, sum(sum) as total
from b
where group NE 'Total'
group by group
) as b
ON a.group = b.group
order by group
;
quit;
This creates the following table:
group dif
A 9
Next, we output one row with sum = 1 for each unit of the difference in each group:
data counts;
set difs;
by group;
do i = 1 to dif;
sum = 1;
output;
end;
drop i dif;
run;
This creates the following table:
group sum
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
Now we simply append this to the original table to get the desired output:
data want;
set a counts;
run;
Which produces the table we need:
group sum
A 1
A 3
A 4
A 1
A 2
Total 11
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
A 1
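One case the steps above do not address explicitly is a group whose difference is negative, or missing because the LEFT JOIN found no match in b. A hedged variant of the expansion step guards the loop so that only positive differences produce rows. The difs rows for groups B and C below are hypothetical and exist only to exercise the guard.

```sas
/* Hypothetical differences: B is negative, C is missing (no match in b) */
data difs;
    input group $ dif;
    datalines;
A 9
B -2
C .
;
run;

data counts;
    set difs;
    /* Only expand groups with a positive difference;
       a missing dif fails the comparison and is skipped */
    if dif > 0 then do i = 1 to dif;
        sum = 1;
        output;
    end;
    drop i dif;
run;
```

Here counts should contain nine rows for group A and none for B or C. Whether a negative difference ought instead to raise an error depends on your data and is left open.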

SAS Add 1 to flag if previous row value doesn't match with current flag following certain condition

I have a table in which column 1 is YearMonth and column 2 is Flag.
YearMonth Flag
200101 1
200102 1
200103 0
200104 1
200105 1
200106 0
200107 1
200108 0
Note: First entry of flag column will always be 1.
I want to add 1 to flag if current flag doesn't match with previous row value along with one major condition (explained after output for a better explanation).
The output should be:
YearMonth Flag Stage
200101 1 1
200102 1 1
200103 0 2
200104 1 3
200105 1 3
200106 0 4
200107 1 3
200108 0 4
Please note there are only 4 stages. Hence if a flag is repeated after stage 4, then it should not increment and should give output as either stage=3 if flag=1 or stage=4 if flag=0.
I am trying something like this:
data one;
set Query;
Stage=lag(flag);
if first.flag then Stage=1;
if first.flag then Stage=1;
if flag ne Stage+1 then Stage=Stage+1;
run;
An explanation of why this code isn't working would be really helpful. Thank you!
Also, I am aware that I am not doing something once it reaches stage 4.
This is essentially counting groups of observations. So use BY group processing. Add the NOTSORTED keyword in the BY statement so SAS doesn't complain that the values aren't sorted. Increment the counter when starting a new group.
data want;
set have;
by flag notsorted;
stage + first.flag;
run;
To add your second criterion, you could just add this line after the sum statement:
stage = min(stage,4 - flag);
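As for why the original attempt fails: FIRST.flag is only created when a BY statement names flag, so without one it is just an uninitialized variable, and lag(flag) returns the previous row's flag rather than the previous stage. Putting the two pieces of this answer together on the sample data from the question gives the following sketch.

```sas
data have;
    input YearMonth Flag;
    datalines;
200101 1
200102 1
200103 0
200104 1
200105 1
200106 0
200107 1
200108 0
;
run;

data want;
    set have;
    by flag notsorted;              /* treat each run of equal flag values as a group */
    stage + first.flag;             /* sum statement: stage is retained, +1 at each new run */
    stage = min(stage, 4 - flag);   /* cap at 3 when flag = 1 and at 4 when flag = 0 */
run;
```

For the sample data, stage comes out as 1, 1, 2, 3, 3, 4, 3, 4, which matches the requested output. Because the capped value is the one that is retained, the counter never drifts past 5 before being capped again.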
The stage value needs to be retained between iterations of the implicit loop. An implicit retain will accompany the use of the sum statement whose syntax is <variable>+<expression>;
Example:
data have;
input YearMonth Flag;
datalines;
200101 1
200102 1
200103 0
200104 1
200105 1
200106 0
200107 1
200108 0
;
data want(drop=maxstage_reached);
set have;
if flag ne lag(flag) then stage+1; * increment per rules;
if stage = 4 then maxstage_reached=1; * tracking flag (retained);
if maxstage_reached then stage = 4 - flag; * special rule;
retain maxstage_reached;
run;

SAS: PROC FREQ with multiple ID variables

I have data tracking a certain eye phenomenon. Some patients have it in both eyes, and some patients have it in a single eye. This is what some of the data looks like:
EyeID PatientID STATUS Gender
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
As you can see from the data above, there are 9 patients in total, and all of them have the phenomenon in one eye.
I need to count the number of patients with this eye phenomenon.
To get the number of total patients in the dataset, I used:
PROC FREQ data=new nlevels;
tables PatientID;
run;
To count the number of patients with this eye phenomena, I used:
PROC SORT data=new out=new1 nodupkey;
by Patientid Status;
run;
proc freq data=new1 nlevels;
tables Status;
run;
However, it gave the correct number of patients with the phenomena (9), but not the correct number without (0).
I now need to calculate the gender distribution of this phenomena. I used:
proc freq data=new1;
tables gender*Status/chisq;
run;
However, the cross table has the correct number of patients who have the phenomenon (9), but not the correct number without (0). Does anyone have any thoughts on how to do this chi-square so that a patient who has the phenomenon in at least 1 eye counts as positive for it?
Thanks!
PROC FREQ is doing what you told it to: counting the status=0 cases.
In general here you are using sort of blunt tools to accomplish what you're trying to accomplish, when you probably should use a more precise tool. PROC SORT NODUPKEY is sort of overkill for example, and it doesn't really do what you want anyway.
To set up a dataset of has/doesn't have, for example, let's do a few things. First I add one more row - someone who actually doesn't have - so we see that working.
data have;
input eyeID patientID status gender $;
datalines;
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
15 10 0 M
;;;;
run;
Now we use the data step. We want a patient-level dataset at the end, where we have eye-level now. So we create a new patient-level status.
data patient_level;
set have;
by patientID;
retain patient_status;
if first.patientID then patient_status =0;
patient_status = (patient_Status or status);
if last.patientID then output;
keep patientID patient_Status gender;
run;
Now, we can run your second proc freq. Also note you have a nice dataset of patients.
title "Patients with/without condition in any eye";
proc freq data=patient_level;
tables patient_status;
run;
title;
You also may be able to do your chi-square analysis, though I'm not a statistician and won't dip my toe into whether this is an appropriate analysis. It's likely better than your first, anyway - as it correctly identifies has/doesn't have status in at least one eye. You may need a different indicator, if you need to know number of eyes.
title "Crosstab of gender by patient having/not having condition";
proc freq data=patient_level;
tables gender*patient_Status/chisq;
run;
title;
If your actual data has every single patient having the condition, of course, it's unlikely a chi-square analysis is appropriate.
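The same patient-level collapse can also be written in PROC SQL by taking the maximum status per patient, since max(status) is 1 as soon as any eye has status 1. This is a sketch of an alternative, not the approach used above; it assumes gender is constant within a patient, and the table name patient_level_sql is illustrative.

```sas
/* Same sample data as above, including the added patient 10 */
data have;
    input eyeID patientID status gender $;
    datalines;
1 1 1 M
2 1 0 M
3 2 1 M
4 3 0 M
5 3 1 M
6 4 1 M
7 4 0 M
8 5 1 F
9 6 1 F
10 6 0 F
11 7 1 F
12 8 1 F
13 8 0 F
14 9 1 F
15 10 0 M
;
run;

proc sql;
    create table patient_level_sql as
    select patientID,
           gender,
           max(status) as patient_status   /* 1 if any eye is positive */
    from have
    group by patientID, gender
    order by patientID;
quit;
```

Unlike the data step with BY-group processing, this version does not require the input to be sorted by patientID first.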

Get the difference between values at the row level in sas

The data is the month-on-month price of each configuration. I want to get the trend of AMOUNT, that is, how the price behaves over a period of 12 months, both for each configuration and overall.
Proc sql doesn't support the "dif" syntax, and a regular "do" loop in a data step doesn't seem to help here.
Can anyone help me with this?
This code is to basically group the data and get a mean price for each configuration in that month.
proc sql;
create table c.price1 as
select
configuration,
month,
mean(retail_price) as amount format = dollar7.2
from c.price
where
configuration is not missing
and month is not missing
and retail_price is not missing
group by configuration, month;
quit;
DATA :
Configuration Month Amount
1 1 $370.00
1 2 $365.00
1 3 $318.00
1 4 $355.00
1 5 $350.00
1 6 $317.40
1 7 $340.00
1 8 $335.00
1 9 $297.00
1 10 $325.00
1 11 $320.00
1 12 $286.65
2 1 $320.00
2 2 $315.00
2 3 $287.86
2 4 $305.00
2 5 $300.00
2 6 $263.76
.......and so on
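Use the DIF function in conjunction with BY-group processing. The BY variable must match the name in your data (configuration), and the data must be sorted by configuration and month, so add an ORDER BY to the query or run PROC SORT first.
data want;
    set have;                        /* e.g. c.price1, sorted by configuration and month */
    by configuration;
    new_var = dif(amount);           /* current amount minus the previous row's amount */
    if first.configuration then new_var = .;   /* no previous month within this configuration */
run;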
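A self-contained run on the first few rows of the question's data might look like the sketch below. The variable name change is illustrative. Note that DIF is executed on every row and only reset afterwards: calling DIF or LAG inside an IF branch would skip updates to its internal queue and give unexpected differences.

```sas
/* First three months of configurations 1 and 2 from the question */
data have;
    input configuration month amount;
    format amount dollar7.2;
    datalines;
1 1 370.00
1 2 365.00
1 3 318.00
2 1 320.00
2 2 315.00
2 3 287.86
;
run;

/* BY-group processing requires sorted input */
proc sort data=have;
    by configuration month;
run;

data want;
    set have;
    by configuration;
    change = dif(amount);                      /* executed unconditionally on every row */
    if first.configuration then change = .;    /* do not difference across configurations */
run;
```

For configuration 1 this gives a missing value for month 1, then -5 and -47 for months 2 and 3. An overall trend, rather than a per-configuration one, would need the amounts to be averaged by month first.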