I have a dataset that looks like the following, for many patients:
Patient VisitDate Value Unit Type
A Jan12019 1 m Height
A Jan12019 50 kg Weight
A Jan52019 2 m Height
A Jan52019 55 kg Weight
I am trying to add BMI to get the following dataset for those patients:
Patient VisitDate Value Unit Type
A Jan12019 1 m Height
A Jan12019 50 kg Weight
A Jan52019 2 m Height
A Jan52019 55 kg Weight
A Jan12019 50/1^2 kg/m2 BMI
A Jan52019 55/2^2 kg/m2 BMI
I am not too concerned about the actual code, but I am trying to understand the logic behind programming this in SAS. Below is what I have so far in pseudocode:
Create BMI data set. For each patient on each visit date, value = weight/height^2. Type = BMI. Unit = kg/m2. Keep Patient, VisitDate info.
As I understand the logic, you want to calculate BMI from two rows for the same patient, Weight and Height, recorded on the same date.
One way is to use PROC SQL and join the main table with itself on matching patient and visit date.
Alternatively, you can split the main dataset in two based on "Type" or "Unit" and then merge the pieces. It is up to you which logic you implement. My approach looks like this:
proc sql;
create table BMI
as
select a.Patient,
a.VisitDate,
b.value/(a.value*a.value) as value,
"kg/m2" as Unit,
"BMI" as Type
from have a
inner join have b
on a.patient=b.Patient
and a.visitdate=b.visitdate
and a.Type="Height"
and b.Type="Weight";
quit;
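The second option mentioned above (split by Type, then merge) could look roughly like the sketch below. It assumes the input table is named have, as in the query above, and that each patient has at most one Height row and one Weight row per visit. The names heights, weights, bmi, want, height_m and weight_kg are placeholders, and the $8 lengths are only wide enough for the values shown here.

```sas
/* Split the long table: one table of heights, one of weights. */
proc sort data=have(where=(Type="Height"))
          out=heights(keep=Patient VisitDate Value rename=(Value=height_m));
  by Patient VisitDate;
run;

proc sort data=have(where=(Type="Weight"))
          out=weights(keep=Patient VisitDate Value rename=(Value=weight_kg));
  by Patient VisitDate;
run;

/* Merge by patient and visit; keep only visits with both measurements. */
data bmi(keep=Patient VisitDate Value Unit Type);
  length Unit $8 Type $8;   /* assumed lengths, adjust to your data */
  merge heights(in=has_height) weights(in=has_weight);
  by Patient VisitDate;
  if has_height and has_weight;
  if height_m > 0 then Value = weight_kg / height_m**2;
  Unit = "kg/m2";
  Type = "BMI";
run;

/* Append the BMI rows to the original rows and restore visit order. */
data want;
  length Unit $8 Type $8;   /* declared first so "kg/m2" is not truncated */
  set have bmi;
run;

proc sort data=want;
  by Patient VisitDate;
run;
```

The self-join and the merge give the same BMI rows under these assumptions; the merge version makes the intermediate height and weight tables visible, which can help when checking visits that are missing one of the two measurements.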
I am trying to create a table in SAS Visual Analytics that shows each category's share of the total. There are two categories, X and Y. My table looks something like this:
Category A Category B Total
40% 60% 100%
I was trying to follow the link below, but my version of SAS VA does not have the "Aggregated measure (tabular)" option, so I do not know how to proceed. How can I create such a table without that option?
https://communities.sas.com/t5/SAS-Communities-Library/SAS-Visual-Analytics-Report-Example-Percent-of-Total-For-All-For/ta-p/636030
To do this in VA 7.5, we'll use a Crosstab object, a transposed form of your data, and use the "Percent of row total" calculation option within the crosstab. Let's use the below data for our example:
data have;
input id x y;
datalines;
1 40 60
2 30 70
3 90 10
;
run;
Step 1: Transpose to long and create by-groups
Transpose your data so that it is in a long format, then load it and register it to LASR.
proc transpose data = have
out = want(rename=(COL1 = value))
name = category
;
by id;
var x y;
run;
Output:
id category value
1 x 40
1 y 60
2 x 30
2 y 70
3 x 90
3 y 10
Step 2: Create a crosstab
Change id to a category, then create a crosstab that looks like this:
Columns: category
Rows: id
Measures: value
Go to Options --> Scroll to the bottom --> expand "Totals and Subtotals," and Enable "Totals" for rows and set the Placement to "After."
Step 3: Create a row-level Percent Calculation
Right-click the header value within the table and select "Create and add calculation...".
Select "Percent of row total - Sum" under the "Type" drop-down menu.
Remove Value as a role from the crosstab graph, format Percent to have 0 decimal places, and you'll have a table with row-wise percentages.
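As a cross-check outside VA, the same row-wise percentages can be computed directly from the transposed table. This is only a sketch: it assumes the data set want from Step 1, and pct_of_row is a made-up column name.

```sas
proc sql;
  /* SUM(value) is taken within each id and remerged onto every row of that id. */
  select id,
         category,
         value,
         value / sum(value) as pct_of_row format=percent8.0
  from want
  group by id
  order by id, category;
quit;
```

For id 1 in the example data this should show 40% for x and 60% for y, which is what the crosstab is expected to display.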
I am looking to join two tables together
Table 1 - The baseball dataset
DATA baseball;
SET sashelp.baseball
(KEEP = crhits);
RUN;
Table 2 - A table containing the percentiles of CRhits
PROC STDIZE
DATA = baseball
OUT=_NULL_
PCTLMTD=ORD_STAT
PCTLDEF=5
OUTSTAT=STDLONGPCTLS
(WHERE = (SUBSTR(_TYPE_,1,1) = "P"))
pctlpts = 1 TO 99 BY 1;
RUN;
I would like to join these tables together to create a table that contains the values for crhits and then a column identifying which percentile that value belongs to like below
crhits percentile percentile_value
54 p3 54
66 p5 66
825 p63 825
1134 p76 1133
The last column indicates the percentile value given by stdlongpctls
I currently use this code to calculate the percentiles, and a loop to count the number of "events" per percentile, per factor.
I have tried a cross join, but I am having trouble visualising how to join these two tables without an explicit key:
PROC SQL;
CREATE TABLE cross_join_table AS
SELECT
a.crhits
, b._TYPE_
, CASE WHEN
a.crhits < b.crhits THEN b._TYPE_ END AS percentile
FROM
baseball a
CROSS JOIN
stdlongpctls b;
QUIT;
If there is an easier or more efficient way to find the number of observations and the number of events per percentile group, I would appreciate it. (In my actual dataset I am modelling a default flag, so the number of events is the sum of 1's per percentile group.)
Use PROC RANK instead to group the values into percentiles. With GROUPS=100, the groups are numbered 0 to 99.
proc rank data=sashelp.baseball out=baseball_ranks groups=100;
var crhits;
ranks rank_crhits;
run;
You can then summarize it using PROC MEANS.
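A minimal sketch of that summary step, assuming the baseball_ranks table and rank_crhits variable from the PROC RANK step. The output names rank_summary, n_obs, min_crhits and max_crhits are placeholders.

```sas
proc means data=baseball_ranks noprint nway;
  class rank_crhits;                     /* one output row per percentile group */
  var crhits;
  output out=rank_summary(drop=_type_ _freq_)
         n=n_obs                         /* observations in the group */
         min=min_crhits                  /* lowest crhits in the group */
         max=max_crhits;                 /* highest crhits in the group */
run;
```

If the data also carry a 0/1 event flag (for example a hypothetical default_flag variable), listing it in a second OUTPUT specification with sum= would give the number of events per group in the same pass.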
I am looking for some validation. This may be trivial for most, but I am by no means an expert at statistics. I am trying to select the patients in the top 1% by score within each drug and location. The data look something like this (on a much larger scale):
Patient drug place score
John a TX 12
Steven a TX 10
Jim B TX 9
Sara B TX 4
Tony B TX 2
Megan a OK 20
Tom a OK 10
Phil B OK 9
Karen B OK 2
The code snippet I have written to select those top 1% patients is as follows:
proc sql;
create table example as
select *,
score/avg(score) as test_measure
from prior_table
group by drug, place
having test_measure>.99;
quit;
Does this achieve what I am trying to do, or am I going about it all wrong? Sorry if this is really trivial to most.
Thanks
There are multiple ways to calculate or estimate a percentile. A simple way is to use PROC SUMMARY:
proc summary data=have;
var score;
output out=pct p99=p99;
run;
This will create a data set named pct with a variable p99 containing the 99th percentile of score across the whole table.
Then filter your table for values >= p99:
proc sql noprint;
create table want as
select a.*
from have as a
where a.score >= (select p99 from pct);
quit;
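The question asks for the top 1% within each drug and location, while the steps above use a single percentile for the whole table. A by-group variant might look like the sketch below. It assumes the same input table have with variables drug, place and score; pct_by_group and want_by_group are made-up names.

```sas
/* One 99th percentile per drug/place combination. */
proc summary data=have nway;
  class drug place;
  var score;
  output out=pct_by_group(drop=_type_ _freq_) p99=p99;
run;

/* Keep the patients at or above the percentile of their own group. */
proc sql;
  create table want_by_group as
  select a.*
  from have as a
       inner join pct_by_group as b
       on a.drug = b.drug and a.place = b.place
  where a.score >= b.p99;
quit;
```

With very small groups, such as those in the sample data, the 99th percentile is effectively the group maximum, so this only becomes meaningful at the larger scale the question mentions.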
I have the following dataset:
Date Occupation Total_Employed
1/1/2005 Teacher 45
1/1/2005 Economist 76
1/1/2005 Artist 14
2/1/2005 Doctor 26
2/1/2005 Economist 14
2/1/2005 Mathematician 10
and so on until November 2014
What I am trying to do is to calculate a column of percentage of employed by occupation such that my data will look like this:
Date Occupation Total_Employed Percent_Emp_by_Occupation
1/1/2005 Teacher 45 33.33
1/1/2005 Economist 76 56.29
1/1/2005 Artist 14 10.37
2/1/2005 Doctor 26 52.00
2/1/2005 Economist 14 28.00
2/1/2005 Mathematician 10 20.00
where percent_emp_by_occupation is calculated by dividing each occupation's total_employed by the sum of total_employed over all occupations for that date (month and year), times 100.
Example for Teacher: (45/135)*100 = 33.33, where 135 is the sum 45+76+14.
I know I can get a table via PROC TABULATE, but I was wondering if there is any way of getting it through another procedure, especially since I want this as a separate dataset.
What is the best way to go about doing this? Thanks in advance.
Extract month and year from the date and create a key:
data ds;
set ds;
month=month(date);
year=year(date);
key=catx("_",month,year);
run;
Roll up the total at month level:
Proc sql;
create table month_total as
select key,sum(total_employed) as monthly_total
from ds
group by key;
quit;
Update the original data with the monthly total:
Proc sql;
create table ds as
select a.*,b.monthly_total
from ds as a left join month_total as b
on a.key=b.key;
quit;
This would lead to the following data set:
Date Occupation Total_Employed monthly_total
1/1/2005 Teacher 45 135
1/1/2005 Economist 76 135
1/1/2005 Artist 14 135
Finally calculate the percentage as:
data ds;
set ds;
percentage=100*total_employed/monthly_total; /* times 100 to match the 33.33 style in the question */
run;
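The steps above can also be collapsed into one query, because PROC SQL remerges a group-level summary back onto the detail rows. This is a sketch under the assumption that each month appears as a single date value in ds, as in the sample data, so grouping by date is equivalent to grouping by month and year. The table name ds_pct is a placeholder.

```sas
proc sql;
  create table ds_pct as
  select date,
         occupation,
         total_employed,
         /* SUM() is computed per date and remerged onto each row of that date */
         100 * total_employed / sum(total_employed)
           as percent_emp_by_occupation format=8.2
  from ds
  group by date;
quit;
```

SAS writes a note to the log saying that the query requires remerging summary statistics with the original data; that is expected here. If dates within a month can differ, group by the month/year key built earlier instead.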
Here you go:
proc sql;
create table occ2 as
select
a.*,
total_employed/employed_by_date as percentage_employed_by_date format=percent7.1
from
occ a
join
(select
date,
sum(total_employed) as employed_by_date
from occ
group by date) b
on
a.date = b.date
;
quit;
Produces a table like so:
One last thought: you can create all of the totals you desire for this calculation in one pass of the data. I looked at a prior question you asked about this data and assumed that you used proc means to summarize your initial data by date and occupation. You can calculate the totals by date as well in the same procedure. I don't have your data, so I'll illustrate the concept with sashelp.class data set that comes with every SAS installation.
In this example, I want to get the total number of students by sex and age, but I also want to get the total students by sex because I will calculate the percentage of students by sex later. Here's how to summarize the data and get counts for 2 different levels of summary.
proc summary data=sashelp.class;
class sex age;
types sex sex*age;
var height;
output out=summary (drop=_freq_) n=count;
run;
The types statement identifies the levels of summary of my class variables. In this case, I want counts of just sex, as well as the counts of sex by age. Here's what the output looks like.
The _TYPE_ variable identifies the level of summary. The total count of sex is _TYPE_=2 while the count of sex by age is _TYPE_=3.
Then a simple SQL query to calculate the percentages within sex.
proc sql;
create table summary2 as
select
a.sex,
a.age,
a.count,
a.count/b.count as percent_of_sex format=percent7.1
from
summary (where=(_type_=3)) a /* sex * age */
join
summary (where=(_type_=2)) b /* sex */
on
a.sex = b.sex
;
quit;
The answer is to look back at the questions you have asked in the last few days about this same data and study those answers. Your answer is there.
While you are reviewing those answers, take time to thank them and give someone a check for helping you out.
I have a binary outcome variable (disease) and a continuous independent variable (age). There's also a cluster variable clustvar. Logistic regression assumes that the log odds is linear with respect to the continuous variable. To visualize this, I can categorize age as (for example, 0 to <5, 5 to <15, 15 to <30, 30 to <50 and 50+) and then plot the log odds against the category number using:
logistic disease i.agecat, vce(cluster clustvar)
margins agecat, predict(xb)
marginsplot
However, since the categories are not equal width, it would be better to plot the log odds against the mid-point of the categories. Is there any way that I can manually define that the values plotted on the x-axis by marginsplot should be 2.5, 10, 22.5, 40 and (slightly arbitrarily) 60, and have the points spaced appropriately?
If anyone is interested, I achieved the required graph as follows:
Recategorised age variable slightly differently using (integer) labels that represent the mid-point of the category:
gen agecat = .
replace agecat = 3 if age<6
replace agecat = 11 if age>=6 & age<16
replace agecat = 23 if age>=16 & age<30
replace agecat = 40 if age>=30 & age<50
replace agecat = 60 if age>=50 & age<.
For labelling purposes, created a label:
label define agecat 3 "Less than 5y" 11 "5 to <15y" 23 "15 to <30y" 40 "30 to <50y" 60 "Over 50 years"
label values agecat agecat
Ran logistic regression as above:
logistic disease i.agecat, vce(cluster clustvar)
Used margins and plot using marginsplot:
margins agecat, predict(xb)
marginsplot