Count unique patients and overall observations using PROC SQL - sas

I am working in SAS, using PROC SQL to count both the number of unique patients and the total number of observations for a set of indicators. Each record has a patient identifier, the facility where the patient is, and a group of binary indicators (0/1), one for each bed section (the particular place in the hospital where the patient is). Within a record, only one bed section can have a value of 1. A patient can have multiple observations, in the same bed section or in different ones, i.e. a patient can be hospitalized more than once. The idea is to roll this data set up by facility and count the total number of admissions for each bed section as well as the total number of distinct people for each bed section. The people count will always be <= the observation count. Counting people was only just added to my to-do list; up to this point I was only summing observations for each bed section using the code below:
proc sql;
create table fac_bedsect as
select facility,
sum(bedsect_alc) as bedsect_alc,
sum(bedsect_blind) as bedsect_blind,
sum(bedsect_gen) as bedsect_gen
from bedsect_type
group by facility;
quit;
Is there a way I can incorporate into this code the # of unique people for each bed section? Thanks.

With no knowledge of the source table(s) it is impossible to answer precisely, but the syntax for counting distinct values is as seen below. You will need to use the correct column name where I have used "patient_id":
SELECT
facility
, COUNT(DISTINCT patient_id) AS patient_count
, SUM(bedsect_alc) AS bedsect_alc
, SUM(bedsect_blind) AS bedsect_blind
, SUM(bedsect_gen) AS bedsect_gen
FROM bedsect_type
GROUP BY
facility
;
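Note that the query above counts distinct patients per facility across all bed sections. If the goal is a distinct-patient count for each bed section, one common SQL pattern is to put a CASE expression inside COUNT(DISTINCT ...), so that only rows flagged for that bed section contribute a patient ID. Below is a minimal sketch of that pattern, run against SQLite from Python rather than in SAS. The sample rows, the patient_id column name, and the *_patients output names are all assumptions, and the SELECT has not been verified here in PROC SQL.

```python
import sqlite3

# Hypothetical sample rows:
# (facility, patient_id, bedsect_alc, bedsect_blind, bedsect_gen)
rows = [
    ("A", 1, 1, 0, 0),
    ("A", 1, 1, 0, 0),  # same patient admitted twice to the same bed section
    ("A", 2, 1, 0, 0),
    ("A", 2, 0, 0, 1),  # same patient, different bed section
    ("B", 3, 0, 1, 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE bedsect_type (facility TEXT, patient_id INTEGER, "
    "bedsect_alc INTEGER, bedsect_blind INTEGER, bedsect_gen INTEGER)"
)
con.executemany("INSERT INTO bedsect_type VALUES (?, ?, ?, ?, ?)", rows)

# SUM counts admissions; COUNT(DISTINCT CASE ...) counts people.
# The CASE yields NULL when the flag is 0, and COUNT ignores NULLs.
query = """
SELECT facility,
       SUM(bedsect_alc) AS bedsect_alc,
       COUNT(DISTINCT CASE WHEN bedsect_alc = 1 THEN patient_id END) AS alc_patients,
       SUM(bedsect_blind) AS bedsect_blind,
       COUNT(DISTINCT CASE WHEN bedsect_blind = 1 THEN patient_id END) AS blind_patients,
       SUM(bedsect_gen) AS bedsect_gen,
       COUNT(DISTINCT CASE WHEN bedsect_gen = 1 THEN patient_id END) AS gen_patients
FROM bedsect_type
GROUP BY facility
ORDER BY facility
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

In this sample, facility A has 3 admissions to bedsect_alc but only 2 distinct patients, which is the "people count <= observation count" relationship described in the question.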

Related

Combining surveys with distinct analytical weights in Stata

I have a dataset which combines 14 household surveys from 14 countries. The surveys were conducted in different years, and each has a household weight variable that is specific to that country's context (the data structure is the same across all 14 countries).
I merged them and tried to cross-tabulate country against gender_area (four values: male_rural, female_rural, male_urban, female_urban) with weights (tab country gender [aw=hhweight], m). But this cross-tabulation produces odd values for some of the countries.
For example, if I add an if condition at the end of the command (tab country gender [aw=hhweight] if abc==1, m), the row totals for some countries (KHM, NPL) are greater than their row totals without the condition, even though a condition can only give a smaller subsample. If I drop the weight (tab country gender, m), the problem does not occur. It does not occur either if I tabulate a single country with the weight.
So I wonder whether there is any way to compare all countries using weights. I am not very familiar with the survey data commands in Stata (svyset, strata, etc.).
I tried consulting the book Applied Survey Data Analysis, but it does not seem to cover a method for handling such a combination.

identify groups with few observations in paneldata models (stata)

How can I identify groups with few observations in panel-data models?
I estimated several random-effects models using xtlogit. On average I have 26 observations per group, but some groups have only 1 observation. I want to identify those groups and exclude them from the models. Any suggestions on how?
My panel data is set using: xtset countrycode year
Let's suppose your magic number for a big enough panel is 7 and that you fit a first model.
bysort countrycode : egen n_used = total(e(sample))
then gives you, for each country, a count of how many observations were actually used in that fit, after which your criterion for a later model is if n_used >= 7
You could just go
bysort countrycode : gen n_available = _N
regardless of a model fit.
There are two differences:
That last statement counts every observation in the panel, including observations with missing values on variables used in the model fit, which the model itself would drop.
If you also used if and/or in to restrict the model fit to particular subsets of observations, then e(sample) knows about that, but the last statement does not.
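To make the difference between the two counts concrete, here is a small language-neutral sketch in Python rather than Stata. The data are hypothetical, "in the estimation sample" is approximated simply as "the outcome is not missing", and min_obs stands in for the magic number (7 in the text, 3 here so that the toy data stay small).

```python
from collections import Counter

# Hypothetical observations: (countrycode, year, y). None marks a missing
# value that would drop the row from a model's estimation sample.
obs = [
    ("AUS", 2000, 1), ("AUS", 2001, 0), ("AUS", 2002, None),
    ("BEL", 2000, 1),
    ("CAN", 2000, 0), ("CAN", 2001, 1), ("CAN", 2002, 1),
]

# Analogue of: bysort countrycode : gen n_available = _N
n_available = Counter(code for code, _, _ in obs)

# Analogue of: bysort countrycode : egen n_used = total(e(sample))
# (assumption: a row is "used" when y is not missing)
n_used = Counter(code for code, _, y in obs if y is not None)

min_obs = 3  # hypothetical threshold

keep_by_available = sorted(c for c in n_available if n_available[c] >= min_obs)
keep_by_used = sorted(c for c in n_available if n_used[c] >= min_obs)

print(keep_by_available)
print(keep_by_used)
```

AUS has three rows but only two usable ones, so it passes the threshold under the raw count and fails it under the count of observations actually used.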

Displaying variable sets that define each row

To say that a dataset is (person, year) level means that each row of that dataset has different (person, year) like this:
person year wage
Mike 2000 10
Mike 2010 30
Jack 1990 20
How can I make Stata display exactly those (person, year) variable sets that uniquely define each row?
I want to make a log file to record
person year
only, but not display any individual information (displaying individuals' information in a log file is against the rules set by the data provider).
How could I do this?
What I thought about is using bysort in some way
bysort person year: gen num=_n
and if every num is 1, then it means (person, year) defines each row.
But if a dataset is extremely large, then checking whether every num is 1 is too tedious. Is there any smarter way?
The command isid checks whether the variables you supply do jointly specify observations uniquely. Here is an example you can try:
. webuse grunfeld, clear
. isid company
variable company does not uniquely identify the observations
r(459);
. isid company year
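Note the principle: no news is good news. When the variables do uniquely identify the observations, as company and year do here, isid prints nothing.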
Another way to check for problems is through duplicates. For example, try duplicates list person year. In your case, you don't want that in the log. But what you can do first is anonymise your persons through
egen id = group(person)
and then check for duplicates on id year.
See also this FAQ.
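For readers who want to see the logic of such a check spelled out, here is a minimal Python sketch of the idea. It is not how isid is implemented; the function names are hypothetical and the rows are the toy data from the question. It reports only a yes/no answer and a count, never the individual values, which matches the constraint on what may appear in the log.

```python
from collections import Counter

def uniquely_identifies(rows, key_fields):
    """Return True if key_fields jointly take a different value on every row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return all(n == 1 for n in counts.values())

def surplus_rows(rows, key_fields):
    """Number of rows beyond the first for each repeated key (0 if unique)."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sum(n - 1 for n in counts.values())

# Toy data from the question
rows = [
    {"person": "Mike", "year": 2000, "wage": 10},
    {"person": "Mike", "year": 2010, "wage": 30},
    {"person": "Jack", "year": 1990, "wage": 20},
]

print(uniquely_identifies(rows, ["person"]))          # person alone repeats
print(uniquely_identifies(rows, ["person", "year"]))  # the pair does not
print(surplus_rows(rows, ["person"]))
```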

Setting up a table in SAS via proc tabulate

I have some data about students and their dropout percentage. I have information about which education they started, the city (some educations are offered in several cities), and the year they started. I also have information about whether there was a quotient the students had to meet to be able to start their education.
The quotient variable can contain both numeric values and character values (see the table).
I want to make a table in SAS that shows the quotient and the dropout % like in the picture below:
So for each education and each city I have the years as rows, and in the cells I have the quota for that year and the dropout % for that year.
I cannot get this to work in SAS. I have tried:
proc tabulate data= sammensat missing;
var dropout;
class education year city quota ;
Table education* city,year *dropout all/ rts=180;
run;
This gives me part of the output I want. But I want another row showing the quota for each combination of education and city for each year.
Two problems: including the quota, and dealing with the char values.
Including quota is easy, if it's numeric.
proc tabulate data= sammensat missing;
class education year city;
var dropout quota;
Table education*city*(quota dropout),year all/ rts=180;
run;
You will probably need to request a statistic explicitly for each of them, most likely *mean for both, though I'm not sure exactly what your data look like.
To deal with the character problem, store quota as a numeric variable and create a format for it. Either assign special numeric codes to the character values (for example, one code meaning that this city-year-education combination has no quota by definition), or use values that already indicate no quota (missing, 0, etc.), and have the format display them as text.
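proc format;
/* special negative codes stand in for the character values of quota */
value quotaf
-1='NO QUOTA'
-9='PASSED AUDITION'
0-high=[3.1] /* ordinary quotas are displayed with the 3.1 format */
;
run;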
Then use that format for quota, either by assigning it in the dataset or with an f= option on quota in the proc tabulate step.

Counting different people associated with one phone number in SAS

I have a few million records containing names and phone numbers. I need to count how many people are associated with each unique phone number. A phone number can appear with the same name several times as well as with different names, so for each phone number I need to count the number of distinct users. This then needs to be mapped to a list of stores. I tried selecting distinct phones/distinct phones but that only gives me a ratio for the distribution. For example, if there are 10 people using three phones, my ratio tells me that 3 phones are distributed among 10 people, but it doesn't tell me the actual number of people within that distribution associated with each phone. Can anyone please help me with the SAS code to get the correct count, so that I know exactly how many people are associated with the same phone number? Thanks in advance.
-r
If you just want the number of rows that share the same phone number (duplicates included), use:
proc sql;
create table phone_number_counts as
select phonenumber, count(1) as count_users
from dset
group by phonenumber;
quit;
If you want the number of distinct names within each phone number, i.e. if
555-123-4567 John H
555-123-4567 John H
555-123-4567 Mary Y
should result in 2, not in 3 (the first code would yield 3), then use count(distinct name) instead of count(1).
If you want something else, some example data would be helpful, i.e. an example of the initial data and an example of a correct final dataset.
I believe you're looking for count(distinct name):
proc sql;
create table phone_number_counts as
select phonenumber,
count(*) as count_rows,
count(distinct name) as unique_names
from dset
group by phonenumber;
quit;
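The difference between count(*) and count(distinct name) can be checked on the three-row example from the first answer. The sketch below runs the same SELECT against SQLite from Python rather than in SAS; the table and column names follow the answers, and the fourth row (a second phone number) is a made-up addition so that the grouping is visible.

```python
import sqlite3

# The first three rows come from the example above; the last row is hypothetical.
rows = [
    ("555-123-4567", "John H"),
    ("555-123-4567", "John H"),
    ("555-123-4567", "Mary Y"),
    ("555-987-6543", "Ann B"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dset (phonenumber TEXT, name TEXT)")
con.executemany("INSERT INTO dset VALUES (?, ?)", rows)

# count(*) counts every row per phone number;
# count(distinct name) counts each name once per phone number.
query = """
SELECT phonenumber,
       COUNT(*) AS count_rows,
       COUNT(DISTINCT name) AS unique_names
FROM dset
GROUP BY phonenumber
ORDER BY phonenumber
"""
counts = con.execute(query).fetchall()
for row in counts:
    print(row)
```

For 555-123-4567 this gives 3 rows but 2 unique names, matching the "2, not 3" expectation stated above.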