Plotting log odds against mid-point of category - stata

I have a binary outcome variable (disease) and a continuous independent variable (age). There's also a cluster variable clustvar. Logistic regression assumes that the log odds is linear with respect to the continuous variable. To visualize this, I can categorize age as (for example, 0 to <5, 5 to <15, 15 to <30, 30 to <50 and 50+) and then plot the log odds against the category number using:
logistic disease i.agecat, vce(cluster clustvar)
margins agecat, predict(xb)
marginsplot
However, since the categories are not equal width, it would be better to plot the log odds against the mid-point of the categories. Is there any way that I can manually define that the values plotted on the x-axis by marginsplot should be 2.5, 10, 22.5, 40 and (slightly arbitrarily) 60, and have the points spaced appropriately?

If anyone is interested, I achieved the required graph as follows:
Recategorised age variable slightly differently using (integer) labels that represent the mid-point of the category:
gen agecat = .
replace agecat = 3 if age<6
replace agecat = 11 if age>=6 & age<16
replace agecat = 23 if age>=16 & age<30
replace agecat = 40 if age>=30 & age<50
replace agecat = 60 if age>=50 & age<.
For labelling purposes, created a label:
label define agecat 3 "Less than 5y" 11 "10 to 15y" 23 "15 to <30y" 40 "30 to <50y" 60 "Over 50 years"
label values agecat agecat
Ran logistic regression as above:
logistic disease i.agecat, vce(cluster clustvar)
Ran margins and plotted the result using marginsplot:
margins agecat, predict(xb)
marginsplot
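As a rough illustration of where the x-axis values come from, the sketch below derives the mid-points from the category edges given in the question. It is written in Python rather than Stata purely for illustration; the names edges, open_ended_midpoint and age_to_midpoint are hypothetical. Stata factor variables only accept non-negative integers, which is presumably why the steps above use integer labels such as 3 and 23 rather than 2.5 and 22.5.

```python
# Category edges taken from the question: 0 to <5, 5 to <15, 15 to <30, 30 to <50.
# The open-ended 50+ group has no natural mid-point; 60 is the question's own
# (admittedly arbitrary) choice.
edges = [0, 5, 15, 30, 50]
open_ended_midpoint = 60

# Mid-point of each closed-open interval [lo, hi).
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
midpoints.append(open_ended_midpoint)


def age_to_midpoint(age):
    """Return the mid-point of the category that contains `age`."""
    for upper, mid in zip(edges[1:], midpoints):
        if age < upper:
            return mid
    return open_ended_midpoint


print(midpoints)
print([age_to_midpoint(a) for a in (3, 12, 29, 30, 75)])
```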

Moving variable to next column in SAS

I have the following table in SAS:
SAS Table: Price
ID  Description  Price  Discount
20  Hot blue     warm   12.0
21  Durable A    15.0   0
22  Flexible     13.5   0
23  Bendable     and A  12.3
I'm planning to move 'warm' and 'and A' from the Price column to the Description column, and '12.0' and '12.3' to Price. What should I do?
You cannot change the type of an existing variable, but you can change the name.
Use the INPUT() function to convert strings to numbers. You can use the ?? modifier to suppress errors generated by strings that do not represent numbers.
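data want;
  * rename the character column so that a numeric price can be created;
  set price(rename=(price=char_price));
  * ?? suppresses the messages for strings that are not numbers;
  price = input(char_price,??32.);
  * text that is not a number belongs to the description;
  if missing(price) then description=catx(' ',description,char_price);
run;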
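The logic of that data step can be sketched outside SAS as well. The Python below is only an illustration of the same idea: try to read the text as a number, and if that fails, treat it as part of the description. The helper to_number and the sample rows are hypothetical, loosely based on the table in the question.

```python
def to_number(text):
    """Return `text` as a float, or None when it is not a number.

    Plays the role of INPUT(char_price, ??32.), which yields a missing
    value without raising an error.
    """
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical rows modelled on the question's table.
rows = [
    {"id": 20, "description": "Hot blue", "char_price": "warm"},
    {"id": 21, "description": "Durable A", "char_price": "15.0"},
    {"id": 23, "description": "Bendable", "char_price": "and A"},
]

for row in rows:
    row["price"] = to_number(row["char_price"])
    if row["price"] is None:
        # Same effect as catx(' ', description, char_price).
        row["description"] = " ".join([row["description"], row["char_price"]])

for row in rows:
    print(row["id"], row["description"], row["price"])
```

As in the SAS step, this only cleans up the Price column; copying the value that landed in Discount back into Price would be a separate step.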

Calculating BMI using SAS logic

I have a dataset that looks like the following for many patients
Patient VisitDate Value Unit Type
A Jan12019 1 m Height
A Jan12019 50 kg Weight
A Jan52019 2 m Height
A Jan52019 55 kg Weight
I am trying to add BMI to get the following dataset for those patients:
Patient VisitDate Value Unit Type
A Jan12019 1 m Height
A Jan12019 50 kg Weight
A Jan52019 2 m Height
A Jan52019 55 kg Weight
A Jan12019 50/1^2 kg/m2 BMI
A Jan52019 55/2^2 kg/m2 BMI
I am not too concerned about the actual code, but I am trying to understand the logic behind programming this in SAS. Below is what I have so far in pseudocode:
Create BMI data set. For each patient on each visit date, value = weight/height^2. Type = BMI. Unit = kg/m2. Keep Patient, VisitDate info.
As I understand the logic, you want to calculate BMI from the two rows (Weight and Height) that a patient has on the same date.
One way is to use proc sql and join the main table with itself on patient and visit date.
Alternatively, you can split the main dataset into two based on "Type" or "Unit" and then merge them. It is up to you which logic to implement. My approach looks like this:
proc sql;
create table BMI
as
select a.Patient,
a.VisitDate,
b.value/(a.value*a.value) as value,
"kg/m2" as Unit,
"BMI" as Type
from have a
inner join have b
on a.patient=b.Patient
and a.visitdate=b.visitdate
and a.Type="Height"
and b.Type="Weight";
quit;
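Since the question is about the logic rather than the SAS syntax, here is the same self-join idea as a minimal Python sketch: index the Height and Weight rows by (patient, visit date), then pair them up. The variable names are hypothetical and the rows are the sample data from the question.

```python
# (Patient, VisitDate, Value, Unit, Type) rows from the question.
have = [
    ("A", "Jan12019", 1, "m", "Height"),
    ("A", "Jan12019", 50, "kg", "Weight"),
    ("A", "Jan52019", 2, "m", "Height"),
    ("A", "Jan52019", 55, "kg", "Weight"),
]

# Index each measurement type by the join key (patient, visit date).
heights = {(p, d): v for p, d, v, _unit, t in have if t == "Height"}
weights = {(p, d): v for p, d, v, _unit, t in have if t == "Weight"}

# Inner join: keep only the keys that have both a height and a weight.
bmi = [
    (p, d, weights[(p, d)] / heights[(p, d)] ** 2, "kg/m2", "BMI")
    for (p, d) in heights
    if (p, d) in weights
]

# Append the BMI rows to the original rows to get the layout asked for.
want = have + bmi
for row in want:
    print(row)
```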

Order that variables appear in plot output of proc freq

I have created a frequency plot using the plots option in proc freq. However, I am not able to get the categories in the order I want. I have categories such as '5 to 10 weeks', 'Greater than 25 weeks', '10 to 15 weeks' and '15 to 20 weeks'. I want them to appear in the logical order of increasing weeks, but I'm not sure how to do that. I tried using the order= option, but that didn't fix the issue.
A possible solution would be to code the order I want as values of 1-6, order them using the order= option and then have a label for 1-6. But I'm not sure if that's possible.
I want the bins to show up as 'less than 5 weeks' '5 to 10 weeks' '10 to 15 weeks' '15 to 20 weeks' '20 to 25 weeks' 'greater than 25 weeks'
When the Proc FREQ plot displays the tabled variable's values in alphabetical order and the order= option is not specified, you have the following scenario:
variable is character
display order is the default (INTERNAL)
Note: Other frequency plotting techniques, such as SGPLOT VBAR, recognize a midpoint axis specification that can control the explicit order in which the character values appear. Proc FREQ does not have a plot option for mxaxis.
You are correct in presuming that an inverse map (or remap, or unmap) from label to a desired ordered value is essential. There are two main ways to remap:
custom format to map label to a character value (via PUT)
custom informat to map label to a numeric value (via INPUT)
Once you have remapped the labels to a value, you need a second custom format to map the values back to the original labels.
Example:
* format to map unmapped labels back to original labels;
proc format;
value category
1 = 'Less than 5 weeks'
2 = '5 to 10 weeks'
3 = '10 to 15 weeks'
4 = '15 to 20 weeks'
5 = '20 to 25 weeks'
6 = 'Greater than 25 weeks'
;
* informat to unmap labels to numeric with desired freq plot order;
invalue category_to_num
'Less than 5 weeks' = 1
'5 to 10 weeks' = 2
'10 to 15 weeks' = 3
'15 to 20 weeks' = 4
'20 to 25 weeks' = 5
'Greater than 25 weeks' = 6
;
* generate sample data;
data have;
do itemid = 1 to 500;
cat_num = rantbl(123,0.05,0.35,0.25,0.15,0.07); * for demonstration purposes;
cat_char = put(cat_num, category.); * your actual category values;
output;
end;
run;
* demonstration: numeric category (unformatted) goes ascending internal order;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
run;
* demonstration: numeric category (formatted) in desired order with desired category text;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
format cat_num category.;
run;
* your original plot showing character values being ordered alphabetically
* (as is expected from default order=internal);
proc freq data=have;
table cat_char / plots=freqplot(scale=percent) ;
run;
* unmap the category texts to numeric values that are ordered as desired;
data have_remap;
set have;
cat_numX = input(cat_char, category_to_num.);
run;
* table the numeric values computed during unmap, using format to display
* the desired category texts;
proc freq data=have_remap;
table cat_numX / plots=freqplot(scale=percent) ; * <-- cat_numX ;
format cat_numX category.; * <-- format ;
run;
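The reason the character values come out in the "wrong" order is that they are compared character by character, so '10 to 15 weeks' sorts before '5 to 10 weeks'. The Python sketch below only illustrates that effect and the remapping idea; the dictionary category_to_num is a hypothetical stand-in for the informat defined above.

```python
# Label -> desired position, mirroring the category_to_num informat.
category_to_num = {
    "Less than 5 weeks": 1,
    "5 to 10 weeks": 2,
    "10 to 15 weeks": 3,
    "15 to 20 weeks": 4,
    "20 to 25 weeks": 5,
    "Greater than 25 weeks": 6,
}

observed = [
    "5 to 10 weeks",
    "Greater than 25 weeks",
    "10 to 15 weeks",
    "15 to 20 weeks",
    "Less than 5 weeks",
    "20 to 25 weeks",
]

# Plain string comparison: what you get when the labels themselves are sorted.
alphabetical = sorted(observed)

# Sort on the remapped number, then show the label: the order wanted in the plot.
logical = sorted(observed, key=category_to_num.__getitem__)

print(alphabetical)
print(logical)
```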

xline option when date is formatted %th?

I'm doing a connected twoway plot with x-axis as dates formatted as %th with values 2011h1 to 2017h2. I want to put a vertical line at 2016h2 but nothing I've tried has worked.
xline(2016h2)
xline("2016h2")
xline(date==2016h2)
xline(date=="2016h2")
I'm thinking it might be because I formatted dates with
gen date = yh(year, half)
format date %th
I think this is a MWE:
age1820 date
10.42 2011h1
10.33 2011h2
11.66 2012h1
11.01 2012h2
14.29 2013h1
10.95 2013h2
12.42 2014h1
7.04 2014h2
7.07 2015h1
6.95 2015h2
4 2016h1
8.07 2016h2
5.98 2017h1
3.19 2017h2
graph twoway connected age1820 date, xline(2016h2)
Your example will not really work as written without some additional work. I think in future posts you may want to shoot for a fully working example to maximize the chance that you get a good answer quickly. This is why I made up some fake data below.
Try something like this:
clear
set obs 20
gen date = _n + 100
format date %th
gen age = _n*2
display %th 116
display %th 117
tw connected age date, xline(116 `=th(2018h2)') tline(2019h1)
The crux of the matter is that Stata stores dates as integers and only displays them as dates through the display format attached by the format command (this is not a value label). For example, 0 corresponds to 1960h1. In other words, you need to do one of the following:
tell xline() the number that corresponds to the date you want;
use th() to figure out what that number is and force the evaluation inside xline();
use tline(), which is smart enough to understand dates.
I think the third is the best option.
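For the first option, the number can also be worked out by hand: a half-yearly date counts half-years since 1960h1. The Python function below is a hypothetical re-implementation of that arithmetic for illustration only (in Stata you would use yh() or th() directly).

```python
def yh(year, half):
    """Number of half-years since 1960h1, i.e. the integer behind a %th date."""
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    return (year - 1960) * 2 + (half - 1)


print(yh(1960, 1))  # 0, the origin
print(yh(2016, 2))  # 113, the value to pass to xline() for 2016h2
print(yh(2018, 1))  # 116, matching "display %th 116" above
```

With the data from the question, that means xline(113) would draw the line at 2016h2.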

modify a dataset to extend time range

Sorry for the confusing title.
Background
data looks like this
Area Date Ind LB UB
A 1mar 14 1 20
A 2mar 3 1 20
B 1mar 11 7 22
B 2mar 0 7 22
Area has several distinct values. For each area, LB and UB are fixed across dates, while Ind varies. Date always runs from the start of the month to a certain day of the month.
Target
My target is to run a control chart for each area to see if Ind exceeds the range (LB,UB).
But if I just plot the raw data for each area, the x-axis by default does not end at the last day of the month (in the previous example, the plot will run from 1-Mar to 2-Mar instead of 31-Mar). I know that specifying the max option in xaxis extends the plot to 31-Mar. But this only extends the x-axis: LB and UB are still displayed from 1-Mar to 2-Mar, leaving the right side of the graph empty.
Thus I use modify to add in some date records.
What I have done
data have;
modify have;
do i = 0 to intck('day',today(),intnx('month',today(),0,'E'));
Date = today()+i;
call missing(Ind);
output;
end;
stop;
run;
proc sgplot data=have missing;
series ... Ind ...;
series ... LB ...;
series ... UB ...;
run;
Question
But this only works for one area. I would need to modify each area first and then plot them one by one. How can I get the data below relatively efficiently?
Area Date Ind LB UB
A 1mar 14 1 20
A 2mar 3 1 20
A 3mar . 1 20
....
A 31mar . 1 20
B 1mar 11 7 22
B 2mar 0 7 22
B 3mar . 7 22
....
B 31mar . 7 22
Or are there other options in proc sgplot to solve this?
You can use proc timeseries with the by-group area to get it into the form that you need. The end= option will let you specify an ending date for your data. It looks like you're using the current month, so we'll take your intnx function and plop it into a set of macro functions that resolve to a date literal (most ETS procs require a date literal for some reason).
We'll use two var statements: one for Ind, where we fill in unobserved values with ., and another for LB and UB, where we set unobserved values to the previous valid value.
Note that we are assuming Date is already stored as a SAS date value. Make sure it is before running the code below.
proc timeseries data=have
out=want;
by area;
id Date interval=day notsorted
accumulate=none
end="%sysfunc(intnx(month, %sysfunc(today() ), 0, E), date9.)"d;
var Ind / setmissing=missing;
var LB UB / setmissing=previous;
run;
Your final dataset will look exactly as you'd like.
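To make the fill logic explicit, here is a minimal Python sketch of the same transformation. It is an illustration under assumptions, not what proc timeseries does internally: the year 2019 is made up (the question only shows "1mar"), the function name extend_to_month_end is hypothetical, and it extends each area to the end of the month of its last observation rather than the current month.

```python
import calendar
from datetime import date, timedelta

# Sample rows from the question; the year 2019 is an assumption.
have = [
    {"area": "A", "date": date(2019, 3, 1), "ind": 14, "lb": 1, "ub": 20},
    {"area": "A", "date": date(2019, 3, 2), "ind": 3, "lb": 1, "ub": 20},
    {"area": "B", "date": date(2019, 3, 1), "ind": 11, "lb": 7, "ub": 22},
    {"area": "B", "date": date(2019, 3, 2), "ind": 0, "lb": 7, "ub": 22},
]


def extend_to_month_end(rows):
    """Add one row per missing day up to month end, for every area."""
    # Find the last observed row of each area (the "by area" part).
    last_by_area = {}
    for row in rows:
        prev = last_by_area.get(row["area"])
        if prev is None or row["date"] > prev["date"]:
            last_by_area[row["area"]] = row

    extended = list(rows)
    for area, last in last_by_area.items():
        days_in_month = calendar.monthrange(last["date"].year, last["date"].month)[1]
        month_end = last["date"].replace(day=days_in_month)
        day = last["date"] + timedelta(days=1)
        while day <= month_end:
            # ind stays missing; lb and ub carry the previous valid value.
            extended.append(
                {"area": area, "date": day, "ind": None,
                 "lb": last["lb"], "ub": last["ub"]}
            )
            day += timedelta(days=1)
    return sorted(extended, key=lambda r: (r["area"], r["date"]))


want = extend_to_month_end(have)
for row in want[:4]:
    print(row)
```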