Matched case-control study with household clusters - SAS

I am trying to analyze a 3:1 matched case-control study (matched on age and date of medical encounter), but realized that there may be household clusters represented among the cases and controls. Cases were not matched to any controls who were household members, and controls in the same matched group were not siblings, either. My questions are:
1. Do we need to account for household clusters, given that cases and controls from the same household may be more similar to each other even though they were not matched to the same group?
2. If the answer to #1 is yes (which I think it is), how would one program this in SAS? Logically, I would use conditional logistic regression to analyze matched case-control data and PROC GLIMMIX to account for household clustering, but I don't know how to account for both. Is there a way in SAS to account for household clustering using the GLIMMIX procedure?
Any insights would be much appreciated!
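For concreteness, the kind of model I have in mind would look something like the sketch below, with one random intercept for the matched set and another for the household. All variable names (case_status, exposure, match_id, hh_id) are placeholders, and I am not sure this is the right way to combine the matching and the household clustering.

proc glimmix data=ccdata method=laplace;
   class match_id hh_id;
   /* binary case/control indicator modeled with a logit link */
   model case_status(event='1') = exposure / dist=binary link=logit solution;
   /* random intercept per matched set (age, date of medical encounter) */
   random intercept / subject=match_id;
   /* second random intercept per household */
   random intercept / subject=hh_id;
run;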

Related

Any way to sync one parameter to another (same field names in two different tables)

I am building a dashboard to compare baseline demographics to those of a population of interest.
I am visualizing demographic composition with a 100% bar chart, with bars over time and the Legend driven by a field parameter containing the 4 field options. I then set up a similar visual adjacent to the first: a single-bar chart across the top showing the overall baseline group composition, as a comparison to the population of interest over time.
Data on the population of interest is at the individual person level:
Name
Race
Ethnicity
Gender
Degree
xxxx
Asian
non-Hisp
Female
Doctorate
xxxx
White
Hispanic
Female
Masters
xxxx
White
non-Hisp
Male
Doctorate
Baseline data was obtained from a self-serve table builder, so it is not at the record level but at the "smallest subgroup" level: a hierarchical table ultimately broken into unique Race-Ethnicity-Gender-Degree combinations with their respective counts (e.g., one record could represent 5000 Asian non-Hispanic women with a Doctorate):
Race    Ethnicity   Gender   Degree      Count
Asian   non-Hisp    Female   Doctorate   5000
White   Hispanic    Female   Masters     3000
White   non-Hisp    Male     Doctorate   200
I grouped the first table across those four fields to align the data for comparison. My initial plan was to combine the tables so that the same field parameter could be used for both visuals, with each visual simply being a filtered subset of the combined table (a simple "Source" field accomplishes this).
The wrinkle is that I can't utilize the grouped target data in the bar chart over time, as I am contextualizing it with other record-level dimension tables in the data model (including the date variable used to stratify the bars over time). This means that the target data's bar chart and the baseline bar chart must rely on two different parameters, one for each table's fields.
Since the fields are identically named, I was hoping for a way to force one field parameter to take the current value of another, so that the two stay in sync and only one parameter needs to be manipulated.
Is this possible? Or can this use case be clarified and implemented differently? The only workable solutions I can come up with are clunky and undesirable (e.g., requiring users to manage two identical slicers). Other avenues I explored were not fruitful (creating a measure equal to the current value of the parameter; manipulating the parameter table to reference the second table, either in an additional field or embedded as two-element lists in the <Param> Field column). I was also hopeful about Sync Slicers, but that just imposes the same parameter onto multiple slicers and does not address this use case.

Do you still need to include "site" as a random effect when modeling a matched data set?

I am working on a multicenter propensity-matched cohort study. The primary outcome is binary while the secondary outcome is continuous. First I performed multiple imputation to address the missing data. I initially planned exact matching on the sites in addition to matching on the other variables of interest, but got very poor matches. I then used variables that described the characteristics of the sites, which I compared against the site variable using the c-statistic; the values were similar. With these new variables and the other variables of interest I got a much better match. I then performed within-imputation conditional logistic regression for the binary outcome and pooled the results. For the secondary outcome I used negative binomial regression, including the match ID in the CLASS statement and in a REPEATED statement. Do I need to include 'site' as a random effect in the model? I don't know if this is possible in conditional logistic regression. What would be the best way to model this data after matching? For this study I used SAS for the analysis.
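For context, the within-imputation conditional logistic regression step looks roughly like the sketch below (variable names such as match_id, treatment, covar1 are placeholders, not my actual variables). As far as I can tell, PROC LOGISTIC with a STRATA statement has no RANDOM statement, which is why I am unsure how site could be brought into that model.

proc logistic data=matched;
   /* conditioning on the matched sets formed by the propensity match;
      this is what makes the model a conditional logistic regression */
   strata match_id;
   model outcome(event='1') = treatment covar1 covar2;
run;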

Dealing with repetitive questions or text in a Proc Transpose

Hello, I have this data, and I need to perform the following edit checks:
"If answer to “Did the subject consume the entire high fat/high calorie
breakfast?” is YES, then “If no, did the subject consume at least 50%
of the high fat/high calorie breakfast?” fasting question must not be
present"
"If answer to “Did the subject consume the entire high fat/high calorie
breakfast?” is NO, then “If no, did the subject consume at least 50%
of the high fat/high calorie breakfast?” fasting question must be
present"
"If answer to “Did the subject consume the entire high fat/high calorie
breakfast?” is NO, then answer to “If no, did the subject consume at
least 50% of the high fat/high calorie breakfast?” must be present"
I have written the following code:
data fq;
   set dm.fq;
   /* remove the hyphens from CLIENTID and convert to a numeric patient number */
   ptno = strip(compress(clientid, '-')) + 0;
run;

proc sort data=fq;
   by ptno period day hour;
run;

proc transpose data=fq out=tempfq (DROP=_NAME_ _LABEL_);
   by ptno period day hour;
   var fq_yn;
   id fq_qst;
run;

data final;
   set tempfq;
run;
However, I get the following error in the log
ERROR: The ID value "DID_THE_SUBJECT_FAST_AT_LEAST_4" occurs twice in the same BY group.
And the headers come out capitalised and truncated
How do I deal with the error in the log and how can I expand the column width to produce the whole question when transposing?
You will need to do two things:
1. Determine what you want for repeated questions, i.e., for the case where an ID value occurs twice in the same BY group. Do you want:
two columns, or
one column from the first occurrence, or
one column from the second occurrence, or
one column with YES if both YES & NO occur, or
one column with NO if both YES & NO occur, or
one column with MULTI if both YES & NO occur, or
one column with YES, NO if they occur in that order per period, day, hour, or
one column with NO, YES if they occur in that order per period, day, hour, or
one column with NO, YES regardless of order, or
one column with YES, NO regardless of order, or
one column with a frequency suffix such as YES(2) or NO(2)?
The same ideas apply when a BY group has replicates of 3 or more. Each of these will require some preprocessing of the data before TRANSPOSE.
2. Use IDLABEL to carry the whole original question into the column header of the output (see the sketch below).
Pivoting survey data may be useful for modeling and forecasting purposes; however, if this is for reporting, you might be better off using Proc TABULATE or Proc REPORT.
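For example, an IDLABEL statement added to your PROC TRANSPOSE keeps the short generated column names but attaches the full question text as each column's label (a sketch using the variables from your code):

proc transpose data=fq out=tempfq (DROP=_NAME_ _LABEL_);
   by ptno period day hour;
   var fq_yn;
   id fq_qst;
   /* the full question text becomes the column label, even though the
      column name itself is still limited to 32 characters */
   idlabel fq_qst;
run;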
Possible 'Fix'
When an ID value occurs more than once in a BY group, more than one VAR value is going into a single pivot destination; hence the ERROR.
For the case of a question that is repeated with the same answer within a ptno-period-day-hour group, you can SORT by one more key, adding FQ_QST, and specify the NODUPKEY option. After such sorting no duplicates of FQ_QST remain, so only a single FQ_YN value is pivoted into each column via ID.
proc sort data=fq NODUPKEY;
by ptno period day hour FQ_QST;
run;
If you have repeated questions within a group and the questions have different answers, which answer remains after NODUPKEY depends on how SORT is run. From the help:
When the SORT procedure’s input is a Base SAS engine data set and the sorting is done by SAS, then the order of observations within an output BY group is predictable. The order of the observations within the group is the same as the order in which they were written to the data set when it was created. Because the Base SAS engine maintains observations in the order that they were written to the data set, they are read by PROC SORT in the same order. While processing, PROC SORT maintains the order of the observations because it uses a stable sorting algorithm. The stable sorting algorithm is used because the EQUALS option is set by default. Therefore, the observation that is selected by PROC SORT to be written to the output data set for a given BY group is the first observation in the data set having the BY variable values that define the group.
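Once the transpose succeeds, the edit checks themselves might be coded along these lines. This is only a sketch: CONSUMED_ENTIRE and CONSUMED_50PCT are placeholders for whatever column names your transposed FQ_QST values actually produce.

data checks;
   set final;
   length issue $ 80;
   /* check 1: entire breakfast consumed, so the 50% question must not be present */
   if consumed_entire = 'YES' and not missing(consumed_50pct) then
      issue = '50% question present although entire-breakfast answer is YES';
   /* checks 2 and 3: breakfast not fully consumed, so the 50% question and its answer must be present */
   else if consumed_entire = 'NO' and missing(consumed_50pct) then
      issue = '50% question or its answer missing although entire-breakfast answer is NO';
   /* keep only the records that violate a check */
   if not missing(issue) then output;
run;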

Geocoding for full addresses

I have more than two million records in my dataset. The addresses are in full, not parsed out into fields such as address number, street, city and state. There isn't a standardized pattern in the way these addresses are formed and since there are two million records, I can't investigate the whole dataset fully.
Neither can I change anything in the address field as I pulled the dataset from my company's database.
I want to turn the addresses into longitudes and latitudes, but the procedure from SAS requires the addresses to be parsed into smaller fields, and as I mentioned above, that is not practical for me to do.
http://support.sas.com/documentation/cdl/en/graphref/63022/HTML/default/viewer.htm#overview-geocode.htm
My company is in the financial sector, so due to security measures I can't install third-party software or applications; I need to do this in SAS Enterprise. If you have any suggestions, that would be greatly appreciated.
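For reference, even the simplest form of the linked procedure needs at least one parsed component. With METHOD=ZIP it looks (by default, as I read the documentation) for a variable named ZIP and matches it against the SASHELP.ZIPCODE lookup shipped with SAS, returning ZIP-centroid coordinates rather than rooftop locations, and I would still have to extract the ZIP code from each address string first. WORK.ADDR is a placeholder data set name in the sketch below.

proc geocode
      method=zip           /* centroid-level match on the ZIP code only   */
      data=work.addr       /* placeholder: input with a ZIP variable      */
      out=work.addr_geo;   /* output gains X (longitude) and Y (latitude) */
run;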

Keeping Survey Data Anonymous

We have been conducting a survey within our business and I will shortly be preparing the results to share with a number of internal customers.
The number of survey respondents is around 700, so I want to allow the people looking at the reports to be able to filter the data to identify trends according to the demographic and organisational groups identified within the report (sample data below).
What I am looking to do is present the information in a way that prevents the users from using a combination of the slicers to identify the responses of specific individuals.
I wanted to obscure the results for cases where the selected group contains fewer than 5 individuals, so I created the following measure:
[Anonymised Rowcount] := IF ( COUNTROWS ( 'Table1' ) < 5, 0, [RowCount] )
However, the results it gives aren't quite what I need:
I want all the values of [Anonymised Rowcount] to be zero when the combination of slicers gives me fewer than 5 active rows in the table.
Instead, I'm getting a zero in any column where the count for that column falls below 5.
NOT the desired result!
I've tried various combinations of ALL() and ALLSELECTED(), but I've not yet managed to come up with a combination that will give me a count of the rows in the table that ignores the columns but respects the slicer selections. My attempts so far have either given me every row in the table, ignoring the columns and slicers, or only the selected rows within each column heading.
EDIT:
Main question resolved
I've found a solution using:
[Anonymised Rowcount] := IF ( COUNTROWS ( ALL ( 'Table 1'[Question 1] ) ) < 5, 0, [RowCount] )
Bonus question: does anyone have a more elegant way of obscuring the results of small groups than just setting all the responses to zero?