Claims data analysis: select participants based on given criteria in SAS

I have a little knowledge of SAS, but I have no idea about the structure of claims data. I want to select participants for a cohort study, using three data sets, based on the following criteria:
- Received a dementia diagnosis (ICD-9-CM: 290.0x - 290.4x, 294.1x, 331.0x, 331.1x, 331.2x, 331.82, and 331.9x in any position) during 2001-2005. The date when the patient first received a dementia diagnosis is defined as the index date.
- Age ≥ 65 on the index date
- ≥ 12 months of enrollment prior to the index date
Files:
File name 1: Inpatient_claims2001.csv - Inpatient_claims2005.csv
Data fields: patID, admission_date, discharge_date (e.g. 9/27/2002), Primary_diagnosis, diagnosis_code1 – diagnosis_code6
File name 2: outpatient_claims2001.csv - outpatient_claims2005.csv
Data fields: patID, service_date (e.g. 3/27/2002), diagnosis_code1, diagnosis_code2
File name 3: enrollment_file2001.csv - enrollment_file2005.csv
Data fields: patID, date_of_birth (e.g. 10/18/1945), enroll_mon1 – enroll_mon12 (values: 0 or 1, with 0 = not enrolled and 1 = enrolled for the month)
If anyone has experience with claims data, please help me with how to approach this and code it in SAS to get the requested results.
Thanks,
Ankur.
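The selection logic described above can be sketched outside SAS to make the steps concrete. This is a minimal Python illustration, not SAS code and not a definitive implementation. It assumes diagnosis codes may be stored with or without the decimal point, that inpatient and outpatient claims have already been stacked into (patID, date, codes) tuples using admission_date or service_date, and that enrollment flags are keyed by patient and year. All function names and the sample data are hypothetical.

```python
from datetime import date

# Prefixes for 290.0x-290.4x, 294.1x, 331.0x, 331.1x, 331.2x, 331.82, 331.9x
# (assumption: codes may or may not contain the decimal point).
DEMENTIA_PREFIXES = ("2900", "2901", "2902", "2903", "2904",
                     "2941", "3310", "3311", "3312", "33182", "3319")

def is_dementia(code):
    """True if an ICD-9-CM code falls in the dementia list."""
    return code.replace(".", "").strip().startswith(DEMENTIA_PREFIXES)

def index_dates(claims):
    """Earliest 2001-2005 claim date with a dementia code in any position."""
    first = {}
    for pat_id, claim_date, codes in claims:
        if 2001 <= claim_date.year <= 2005 and any(is_dementia(c) for c in codes):
            if pat_id not in first or claim_date < first[pat_id]:
                first[pat_id] = claim_date
    return first

def age_on(dob, when):
    """Age in completed years on a given date."""
    return when.year - dob.year - ((when.month, when.day) < (dob.month, dob.day))

def enrolled_prior_12(enroll, pat_id, index):
    """Check enroll_mon flags for the 12 calendar months before the index month."""
    y, m = index.year, index.month
    for _ in range(12):
        m -= 1
        if m == 0:
            y, m = y - 1, 12
        flags = enroll.get((pat_id, y))
        if flags is None or flags[m - 1] != 1:
            return False
    return True

def select_cohort(claims, dob, enroll):
    """Apply all three criteria; returns {patID: index_date}."""
    return {
        pat_id: idx
        for pat_id, idx in index_dates(claims).items()
        if pat_id in dob
        and age_on(dob[pat_id], idx) >= 65
        and enrolled_prior_12(enroll, pat_id, idx)
    }

# Hypothetical sample data
claims = [
    ("A", date(2002, 9, 27), ["4019", "2900"]),
    ("A", date(2003, 1, 5), ["3310"]),
    ("B", date(2002, 3, 27), ["331.82"]),   # too young on the index date
    ("C", date(2003, 6, 1), ["2941"]),      # enrollment gap in July 2002
    ("D", date(2004, 2, 2), ["25000"]),     # no dementia code
]
dob = {"A": date(1930, 1, 1), "B": date(1945, 10, 18),
       "C": date(1930, 5, 5), "D": date(1920, 1, 1)}
enroll = {
    ("A", 2001): [1] * 12, ("A", 2002): [1] * 12,
    ("B", 2001): [1] * 12, ("B", 2002): [1] * 12,
    ("C", 2002): [1] * 6 + [0] + [1] * 5, ("C", 2003): [1] * 12,
}
cohort = select_cohort(claims, dob, enroll)
```

Note that with enrollment files covering only 2001-2005, a patient whose index date falls in 2001 cannot show 12 prior months of enrollment under this check, so how to treat those patients is a study design decision.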

Related

Combining surveys with distinct analytical weights in Stata

I have a dataset that combines 14 household surveys from 14 countries. Each survey was conducted in a different year, and each has a household weight variable that is specific to that country's context (the data structure is the same across all 14 countries).
I merged them and tried to cross-tabulate country against gender_area (four values: male_rural, female_rural, male_urban, female_urban) with weights (tab country gender [aw=hhweight], m), but the cross-tabulation produces odd values for some countries.
For example, if I add an if condition at the end of the command (tab country gender [aw=hhweight] if abc==1, m), the row totals for some countries (KHM, NPL) become greater than their row totals without the condition, even though the condition can only select a smaller subsample. Without the weight (tab country gender, m) the problem does not occur, nor does it occur if I tabulate a single country with the weight.
So I wonder whether there is a way to compare all countries using weights. I am not very familiar with the survey data commands in Stata (svyset, strata, etc.).
I consulted the book Applied Survey Data Analysis, but it does not seem to cover how to handle such a combination.
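One possible explanation, offered as an assumption rather than a confirmed diagnosis: if analytic weights are rescaled so that they sum to the number of observations in the sample being tabulated, then adding an if condition changes the scaling factor, and a country's weighted row total can grow even though fewer rows are used. The Python sketch below only illustrates that arithmetic with made-up countries and weights; the function name is hypothetical and it does not call Stata.

```python
def aweight_row_totals(rows):
    """rows: list of (country, weight).

    Rescales the weights so they sum to len(rows), then sums them by country.
    """
    n = len(rows)
    scale = n / sum(w for _, w in rows)
    totals = {}
    for country, w in rows:
        totals[country] = totals.get(country, 0.0) + w * scale
    return totals

# Made-up data: country B's weights are 100 times larger than country A's.
full = [("A", 1.0)] * 10 + [("B", 100.0)] * 10
# A condition that keeps all of A but only one observation from B.
subset = [("A", 1.0)] * 10 + [("B", 100.0)] * 1

full_totals = aweight_row_totals(full)
subset_totals = aweight_row_totals(subset)
```

In this example country A's total rises from roughly 0.2 in the full sample to 1.0 in the subsample, because the rescaling factor depends on which observations remain. Weights on very different scales across countries make the effect larger.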

AWS Athena date sql query

Below is the data from the CSV file in an S3 bucket that I used to build the Athena database:
John Wright cricket 25
Steve Adams football 30
I am able to run the query and get the data.
Now I am trying to derive the date of birth from the age column. Is it possible to compute it as the current date minus age (the column) and print only the date of birth?
I tried the query below, but I am not sure whether it is the correct approach:
select (current_date - interval age day) from table_name;
Please help me with this.
You can use the date_add function, like this:
SELECT date_add('year', -age, current_date) FROM table_name
That is, it subtracts age years from the current date.
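The same arithmetic can be sketched in Python to show what the result looks like. This is only an illustration of subtracting whole years from a date; the function name is hypothetical, the reference date is fixed so the result is reproducible, and the fallback for 29 February is a choice made here, not a statement about how Athena handles that case.

```python
from datetime import date

def approx_birth_date(age, today):
    """Subtract `age` whole years from `today`.

    29 February falls back to 28 February when the target year is not a leap year.
    """
    try:
        return today.replace(year=today.year - age)
    except ValueError:
        return today.replace(year=today.year - age, day=28)

# Rows from the question: first name, last name, sport, age
rows = [("John", "Wright", "cricket", 25), ("Steve", "Adams", "football", 30)]
today = date(2021, 8, 1)  # fixed reference date, chosen for illustration
births = [approx_birth_date(age, today) for *_, age in rows]
```

An age in whole years only pins the birth date down to a one-year window, so any date derived this way is an approximation.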

Power BI translating a sql query to filters

I was wondering whether this is possible in Power BI. I am extremely new to it and am trying to work out how a SQL query translates into a Power BI report.
SELECT
    expiresDate,
    Name,
    Addr,
    ValidFrom,
    ValidTo,
    ChildName,
    ChildValidFrom,
    ChildValidTo,
    RecValidFrom,
    RecValidTo
FROM Table
WHERE expiresDate BETWEEN <date1> AND <date2>
  AND <Date3> BETWEEN ValidFrom AND ValidTo
  AND <Date3> BETWEEN ChildValidFrom AND ChildValidTo
  AND <Date3> BETWEEN RecValidFrom AND RecValidTo
A brief explanation: the report looks up to 3 months ahead. In August, the report covers September (date1 = 01/09/2021) through October (date2 = 31/10/2021). However, the data can change daily, so the result also depends on Date3, which could be any day in August.
I have created a calendar table with additional columns that calculate the start and end dates from a particular date. I just can't work out how to relate this to the dataset, which is the query without the WHERE clause. I would then want the filters to determine the result. Ultimately, as I have it at present, a single date would drive the start and end dates as described above. Alternatively, the report could display by range, using the latest iteration of each record.
For example, first part of the table:
expiresDate  AccNo  Name    Addr    ValidFrom   ValidTo     ChildName
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Cheese
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Rhubarb
2021-10-01   1      Bob     1 Here  2021-08-17  2020-08-23  Rhubarb
Second half of the table:
ChildValidFrom  ChildValidTo  RecValidFrom  RecValidTo
2019-01-01      2021-08-10    2019-19-01    2020-12-31
2021-08-11      2021-08-23    2021-01-01    2021-08-15
2021-08-11      2021-08-23    2021-08-16    2020-08-23
The table is a view that has squashed the data down to unique records and the points at which changes occurred. The resulting dataset is considerably smaller: the record count drops from 10m to 54k.
The requirement is that all From - To date ranges contain the date specified, which is either a calendar date entered as a filter or today.
The report would bring out all records whose expiresDate is more than 1 calendar month and less than 3 calendar months after that date. I am just using August dates for the example, so this would be 01/09/2021 - 31/10/2021.
In my example there are 3 results for AccNo 1, but only 1 should be displayed:
If I use the date 2021-08-01, the first row should be displayed.
If I use the date 2021-08-12, the second row should be displayed.
If I use the date 2021-08-23, the third row should be displayed.
This is because the date used should fall within the date range of all 3 criteria:
ValidFrom - ValidTo
ChildValidFrom - ChildValidTo
RecValidFrom - RecValidTo
Any help would be greatly appreciated. This is extremely frustrating, but if it is possible, it would make a nice visual for users to check through their data by entering a date.
Many thanks
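The filtering rule described in this question can be written down as a small predicate to make the intended logic explicit before translating it into Power BI filters or measures. This is a sketch in Python, not DAX. The function names are hypothetical, and the sample rows use made-up, internally consistent dates rather than the values from the table in the question.

```python
from datetime import date, timedelta

def first_of_month(d, months_ahead):
    """First day of the month `months_ahead` months after d's month."""
    idx = d.year * 12 + (d.month - 1) + months_ahead
    return date(idx // 12, idx % 12 + 1, 1)

def report_window(as_of):
    """Start of next month through the end of the month after that."""
    start = first_of_month(as_of, 1)
    end = first_of_month(as_of, 3) - timedelta(days=1)
    return start, end

def matches(row, as_of):
    """True if expiresDate is in the window and as_of is inside all 3 ranges."""
    start, end = report_window(as_of)
    return (
        start <= row["expiresDate"] <= end
        and row["ValidFrom"] <= as_of <= row["ValidTo"]
        and row["ChildValidFrom"] <= as_of <= row["ChildValidTo"]
        and row["RecValidFrom"] <= as_of <= row["RecValidTo"]
    )

def make_row(child, start, end):
    """Hypothetical row whose three ranges all share the same start and end."""
    return {
        "expiresDate": date(2021, 10, 1), "AccNo": 1, "ChildName": child,
        "ValidFrom": start, "ValidTo": end,
        "ChildValidFrom": start, "ChildValidTo": end,
        "RecValidFrom": start, "RecValidTo": end,
    }

rows = [
    make_row("Cheese", date(2019, 1, 1), date(2021, 8, 10)),
    make_row("Rhubarb", date(2021, 8, 11), date(2021, 8, 16)),
    make_row("Rhubarb", date(2021, 8, 17), date(2021, 8, 23)),
]
selected = [r for r in rows if matches(r, date(2021, 8, 12))]
```

With the as-of date 2021-08-12, the window is 2021-09-01 to 2021-10-31 and only the second row passes all four conditions.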

Check if value is in another table and add columns in Power BI

I have 2 tables, table1 contains some survey data and table2 is a full list of students involved. I want to check if Name in table2 is also found in table1. If yes, add Age and Level information in table2, otherwise, fill these columns with no data.
table1:
id Name Age Level
32 Anne 13 Secondary school
35 Jimmy 5 Primary school
38 Becky 10 Primary school
40 Anne 13 Secondary school
table2:
id Name
1 Anne
2 Jimmy
3 Becky
4 Jack
Expected output:
id Name Age Level
1 Anne 13 Secondary school
2 Jimmy 5 Primary school
3 Becky 10 Primary school
4 Jack no data no data
Update:
I created a relationship between table1 and table2 using the common column id (which can be repeated in table1).
Then I used:
Column = RELATED(table1[AGE])
but it raised an error:
The column 'table1[AGE]' either doesn't exist or doesn't have a relationship to any table available in the current context.
There are various ways to achieve the desired output, but the simplest I found is to use the RELATED DAX function. If this answers your question, please mark it as the answer.
Create a relationship between table1 and table2 using the 'Name' column.
Create a calculated column in table2 as:
Column = RELATED(table1[AGE])
Repeat the same step for the Level column also.
Column 2 = RELATED(table1[LEVEL])
This will give you a table with ID, Name, Age, and Level for the common names between the two tables.
Now to fill those empty rows as no data, simply create another calculated column with following DAX:
Column 3 = IF(ISBLANK(table2[Column]), "no data", table2[Column])
Column 4 = IF(ISBLANK(table2[Column 2]), "no data", table2[Column 2])
This will give you the desired output.
EDIT:- You can also use the following formula to do the same thing in a single column
Column =
VAR X = RELATED(table1[AGE])
VAR RES = IF(ISBLANK(X), "no data", X)
RETURN
RES
and
Column 2 =
VAR X = RELATED(table1[LEVEL])
VAR RES = IF(ISBLANK(X), "no data", X)
RETURN
RES
This will also give you the same output.
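To make the intended result concrete, here is the same lookup written as a short Python sketch rather than DAX. It uses the sample tables from the question. Keeping the first row per Name when a name repeats in table1 (Anne appears twice) is an assumption made for this illustration, as are the variable names.

```python
NO_DATA = "no data"

table1 = [
    {"id": 32, "Name": "Anne", "Age": 13, "Level": "Secondary school"},
    {"id": 35, "Name": "Jimmy", "Age": 5, "Level": "Primary school"},
    {"id": 38, "Name": "Becky", "Age": 10, "Level": "Primary school"},
    {"id": 40, "Name": "Anne", "Age": 13, "Level": "Secondary school"},
]
table2 = [
    {"id": 1, "Name": "Anne"},
    {"id": 2, "Name": "Jimmy"},
    {"id": 3, "Name": "Becky"},
    {"id": 4, "Name": "Jack"},
]

# Build a Name -> row lookup, keeping the first row seen for each name.
lookup = {}
for row in table1:
    lookup.setdefault(row["Name"], row)

# Left join table2 against the lookup, filling misses with "no data".
result = []
for row in table2:
    match = lookup.get(row["Name"])
    result.append({
        "id": row["id"],
        "Name": row["Name"],
        "Age": match["Age"] if match else NO_DATA,
        "Level": match["Level"] if match else NO_DATA,
    })
```

The duplicate Anne rows are worth noting: a lookup from table2 into table1 needs a single matching row per name, so repeated names have to be resolved one way or another before the lookup can return one value.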

Informatica Powercenter One to Many SQ Query and Mapping Issues

I have multiple views in which PERSON_VIEW has a one-to-many relationship with PHONE_VIEW. With the following query, I got TOAD to correctly output the result as 1 row for each person record.
I am having a problem getting it to work in Informatica PowerCenter. I copied and pasted the query into the SQ SQL Query section.
Since the query takes the PHONE_NUMBER and checks whether the PHONE_TYPE is HOME, BUSINESS, or PERSONAL, it outputs 3 phone number columns called HOME, BUSINESS, and PERSONAL.
I created 3 new columns in the SQ ports called HOME, BUSINESS, and PERSONAL to match the query output columns. When I validate the query, it constantly says it must match 28 ports from the SQ. When I add just 1 column and map it to the Expression transformation and then to the target, it still gives this error. I counted the ports and there are 29. If I remove the phone columns, it works and the count is 28. When I add just one phone column, it gives the error.
I think I am missing a step.
Any help is appreciated.
PERSON VIEW
1 John M. Doe
PHONE VIEW
1 111-111-1111 HOME
1 222-222-2222 BUSINESS
1 333-333-3333 PERSONAL
TOAD Result
1 John M. Doe 111-111-1111 222-222-2222 333-333-3333
Here is the query (this works in TOAD):
SELECT PERSON.PERSON_ID,
       PERSON.FIRST_NAME,
       PERSON.MIDDLE_NAME,
       PERSON.LAST_NAME,
       PHONE.HOME,
       PHONE.BUSINESS,
       PHONE.PERSONAL,
       PHONE_TYPE
FROM PERSON_VIEW PERSON
LEFT JOIN (SELECT *
           FROM (SELECT PERSON_ID, PHONE_TYPE, PHONE_NUMBER
                 FROM PHONE_VIEW)
           PIVOT
           (
               MAX(PHONE_NUMBER)
               FOR PHONE_TYPE IN ('HOME' AS HOME, 'PERSONAL' AS PERSONAL, 'BUSINESS' AS BUSINESS)
           )
          ) PHONE ON PHONE.PERSON_ID = PERSON.PERSON_ID
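For readers unfamiliar with PIVOT, the one-to-many flattening that the query performs can be sketched in Python. This only illustrates the reshaping and the shape of the output row (four person columns plus three phone columns); it does not address the port-count validation in PowerCenter. The function name and the second person are hypothetical, and the phone types are assumed to match the PIVOT list.

```python
PHONE_TYPES = ("HOME", "BUSINESS", "PERSONAL")

def pivot_phones(phone_rows):
    """Turn (person_id, number, type) rows into one dict of columns per person."""
    pivoted = {}
    for person_id, number, phone_type in phone_rows:
        slot = pivoted.setdefault(person_id, dict.fromkeys(PHONE_TYPES))
        # Mirror MAX(PHONE_NUMBER): keep the largest value if a type repeats.
        if phone_type in slot and (slot[phone_type] is None or number > slot[phone_type]):
            slot[phone_type] = number
    return pivoted

persons = [
    (1, "John", "M.", "Doe"),
    (2, "Jane", "", "Roe"),  # hypothetical person with no phone rows
]
phones = [
    (1, "111-111-1111", "HOME"),
    (1, "222-222-2222", "BUSINESS"),
    (1, "333-333-3333", "PERSONAL"),
]

pivoted = pivot_phones(phones)
empty = dict.fromkeys(PHONE_TYPES)
# Left join: every person appears once, with None where no phone exists.
rows = [
    person + tuple(pivoted.get(person[0], empty)[t] for t in PHONE_TYPES)
    for person in persons
]
```

Each output row has a fixed set of columns in a fixed order, which is the property that matters when the result has to line up with a predefined list of ports.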