We have a table that contains customers' accounts. Some of these customers have multiple accounts.
Customer table:
(Format: customer_ID Name)
101 Smith
102 Williams
103 Martin
104 Jack
Account table
(Format: Account_ID, customer_ID,Account_Type)
Account_ID customer_ID Account_Type
201 101 A1
202 101 B1
203 101 C1
301 102 B1
302 102 C1
401 103 A1
402 103 C1
501 104 B1
If a customer has multiple accounts, we select Account_Type based on this priority order: A1, then C1, then B1.
The result should be:
customer_ID Account_Type
101 A1
102 C1
103 A1
104 B1
I wrote the following query:
Select c.customer_ID, case
when Account_Type in ('A1', 'B1', 'C1') then 'A1'
when Account_Type in ('B1', 'C1') then 'C1'
else Account_Type
End
From customer c
join account a
on a.customer_ID = c.customer_ID
How can I add the condition that one customer has multiple accounts to this query?
Thanks
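A simple solution would be to map each account type to a rank and sort by it, so that each customer's preferred account comes first:
SELECT * FROM
(
SELECT *,
CASE AccountType
WHEN 'A1' THEN 1
WHEN 'B1' THEN 3
WHEN 'C1' THEN 2
END AS Rank1
FROM testAccount
) A
ORDER BY CustomerID, Rank1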
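The ranked query above sorts the rows but still returns every account. A minimal sketch of keeping only the top-ranked account per customer follows. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption because the original post does not name the DBMS. Table and column names follow the question; the names `ranked`, `best` and `rank1` are illustrative.

```python
import sqlite3

# In-memory SQLite database holding the Account table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (Account_ID INTEGER, customer_ID INTEGER, Account_Type TEXT);
INSERT INTO account VALUES
  (201, 101, 'A1'), (202, 101, 'B1'), (203, 101, 'C1'),
  (301, 102, 'B1'), (302, 102, 'C1'),
  (401, 103, 'A1'), (402, 103, 'C1'),
  (501, 104, 'B1');
""")

# Rank each account type (A1 first, then C1, then B1), find the best
# (lowest) rank per customer, and keep only the rows with that rank.
QUERY = """
WITH ranked AS (
  SELECT customer_ID, Account_Type,
         CASE Account_Type WHEN 'A1' THEN 1 WHEN 'C1' THEN 2 WHEN 'B1' THEN 3 END AS rank1
  FROM account
)
SELECT r.customer_ID, r.Account_Type
FROM ranked r
JOIN (SELECT customer_ID, MIN(rank1) AS best
      FROM ranked
      GROUP BY customer_ID) b
  ON b.customer_ID = r.customer_ID AND b.best = r.rank1
ORDER BY r.customer_ID
"""

rows = conn.execute(QUERY).fetchall()
for customer_id, account_type in rows:
    print(customer_id, account_type)
```

On the sample data this returns one row per customer, matching the expected result in the question. On a DBMS with window functions, ROW_NUMBER() partitioned by customer would be an alternative to the MIN-and-join step.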
I have a view named info. Its structure and sample data are as follows:
id name contacts
1 ali 1234
1 ali 122
2 john 133
2 john 144
2 john 122
3 mike 111
4 khan 444
5 jan 122
5 jan 155
I am using this view in an Oracle APEX report and want to search the data by id. For example, when I search for id = 1, that id has two values in the contacts column. One of them, 122, also appears in other records, so the result should also include all records of every id that has 122 in its contacts column.
The expected result which I want is:
id name contacts
1 ali 1234
1 ali 122
2 john 133
2 john 144
2 john 122
5 jan 122
5 jan 155
We can phrase your requirement as wanting to return any record with id = 1 or any record whose contacts overlap with the contacts of id = 1.
SELECT id, name, contacts
FROM yourTable
WHERE id = 1 OR
id IN (
SELECT id
FROM yourTable
WHERE contacts IN (SELECT contacts FROM yourTable WHERE id = 1)
)
ORDER BY id;
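As a quick check of the query above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Oracle here, which is an assumption, and the table name `yourTable` is the placeholder used in the answer.

```python
import sqlite3

# Sample rows from the question's view, loaded into a plain table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, name TEXT, contacts TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "ali", "1234"), (1, "ali", "122"),
        (2, "john", "133"), (2, "john", "144"), (2, "john", "122"),
        (3, "mike", "111"),
        (4, "khan", "444"),
        (5, "jan", "122"), (5, "jan", "155"),
    ],
)

# Same logic as the answer: rows for id = 1, plus every row of any id
# that shares at least one contact with id = 1.
QUERY = """
SELECT id, name, contacts
FROM yourTable
WHERE id = 1 OR
      id IN (
          SELECT id
          FROM yourTable
          WHERE contacts IN (SELECT contacts FROM yourTable WHERE id = 1)
      )
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```

The rows for mike and khan are excluded because neither shares a contact with id 1.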
I need some help with Stata data transformation.
I have a survey, where the user can answer with "no response" which has been coded to integer 98. The variables can be of different data types. I need to get the number of "no response"/98 by a user into a separate variable.
I attached the dataset sample:
UserN Q1 Q2 Q3 Q4 Q5 Q6 NewCreatedColumn
User1 11 "male" "12:55pm" 98 "Answer1" "other" 1
User2 98 "female" "1:00am" 98 "AnswerX" "Batman" 2
User3 16 "male" "1:00am" 34 "other" "superman" 0
User4 98 "female" "1:00am" 98 "other" "Dog" 2
User5 66 "male" "1:00am" 98 "Life" "Cat" 1
This would have been fairly easy in Python, where each user's row in the dataframe can be treated as a list and scanned for the integer 98.
Is there an equivalent in Stata?
Thanks for the data example, improved below to become reproducible code. See also help dataex within Stata (or search dataex in an ancient Stata).
clear
input str5 UserN Q1 str7 (Q2 Q3) Q4 str8 (Q5 Q6) NewCreatedColumn
User1 11 "male" "12:55pm" 98 "Answer1" "other" 1
User2 98 "female" "1:00am" 98 "AnswerX" "Batman" 2
User3 16 "male" "1:00am" 34 "other" "superman" 0
User4 98 "female" "1:00am" 98 "other" "Dog" 2
User5 66 "male" "1:00am" 98 "Life" "Cat" 1
end
ds Q* , has(type numeric)
egen wanted = anycount(`r(varlist)'), values(98)
To count occurrences of a string such as "foo" in the string variables instead, a loop will do it:
ds Q*, has(type string)
gen WANTED = 0
quietly foreach v in `r(varlist)' {
replace WANTED = WANTED + (`v' == "foo")
}
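For comparison with the Python approach mentioned in the question, the same row-wise count can be sketched in plain Python. This is only an illustration of the logic and not a replacement for the Stata code; the names `rows`, `NO_RESPONSE` and `wanted` are illustrative.

```python
# Sample data from the question: user name followed by Q1..Q6.
rows = [
    ("User1", 11, "male", "12:55pm", 98, "Answer1", "other"),
    ("User2", 98, "female", "1:00am", 98, "AnswerX", "Batman"),
    ("User3", 16, "male", "1:00am", 34, "other", "superman"),
    ("User4", 98, "female", "1:00am", 98, "other", "Dog"),
    ("User5", 66, "male", "1:00am", 98, "Life", "Cat"),
]

NO_RESPONSE = 98

# Count how many answers per user equal 98. String answers never
# compare equal to the integer 98, so they are skipped automatically.
wanted = {
    user: sum(1 for answer in answers if answer == NO_RESPONSE)
    for user, *answers in rows
}

for user, count in wanted.items():
    print(user, count)
```

The counts agree with the NewCreatedColumn values given in the question.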
I am facing an issue with a Power BI matrix visualization. I have a school table with the columns Student_ID, Location and AttendanceDate.
I need to find, per location per month, the number of students who attended classes on at least one day.
I created a custom measure named Attendance, shown below, to flag students who attended classes on at least one day:
Attendance = IF(DISTINCTCOUNT(school[Attendance_Date])>=1,1,0)
In my visualization, I get a flag of 1 for every student who meets the condition of attending classes on at least one day. However, I need the sum of these flags, that is, the number of students who attended at least once, per location per month. The final visualization should not contain the student ID. It should show only the locations, the months, and the sum of the flags.
Expected Output:-
Location January February March
Chennai 1 1 1
Delhi 2 2 2
Goa 0 2 0
I tried to implement the fixed LOD concept, as we do in Tableau, to handle this scenario in Power BI, but had no luck.
I created a calculated measure, CalculateAttendance, as below, but it is not working:
CalculateAttendance = CALCULATE((school[Attendance]),ALLEXCEPT(school[Student_ID],school[Location],school[Attendance]))
Could you please suggest how I can modify my calculations to resolve this issue?
Regards
Sameer
My current matrix visualization in PowerBI
Input data source [text/excel[any]] for powerBi
Attendance Student_ID location
01.01.2017 100 Delhi
02.01.2017 100 Delhi
03.01.2017 100 Delhi
04.01.2017 100 Delhi
05.01.2017 100 Delhi
06.01.2017 100 Delhi
01.01.2017 101 Delhi
02.01.2017 101 Delhi
03.01.2017 101 Delhi
04.01.2017 101 Delhi
05.01.2017 101 Delhi
06.01.2017 101 Delhi
08.01.2017 101 Delhi
09.01.2017 102 Chennai
01.01.2017 102 Chennai
02.01.2017 102 Chennai
03.01.2017 102 Chennai
04.01.2017 102 Chennai
05.01.2017 102 Chennai
06.01.2017 102 Chennai
08.01.2017 102 Chennai
11.01.2017 102 Chennai
01.02.2017 101 Delhi
02.02.2017 101 Delhi
03.02.2017 101 Delhi
04.02.2017 101 Delhi
05.02.2017 101 Delhi
06.02.2017 101 Delhi
01.02.2017 100 Delhi
02.02.2017 100 Delhi
03.02.2017 100 Delhi
04.02.2017 100 Delhi
05.02.2017 100 Delhi
06.02.2017 100 Delhi
01.02.2017 102 Chennai
02.02.2017 102 Chennai
03.02.2017 102 Chennai
04.02.2017 102 Chennai
05.02.2017 102 Chennai
06.02.2017 102 Chennai
01.02.2017 103 Goa
02.02.2017 103 Goa
03.02.2017 103 Goa
04.02.2017 103 Goa
05.02.2017 103 Goa
06.02.2017 103 Goa
01.02.2017 104 Goa
02.02.2017 104 Goa
03.02.2017 104 Goa
04.02.2017 104 Goa
01.03.2017 100 Delhi
02.03.2017 100 Delhi
03.03.2017 100 Delhi
04.03.2017 100 Delhi
05.03.2017 100 Delhi
06.03.2017 100 Delhi
01.03.2017 101 Delhi
02.03.2017 101 Delhi
03.03.2017 101 Delhi
04.03.2017 101 Delhi
05.03.2017 101 Delhi
06.03.2017 101 Delhi
08.03.2017 101 Delhi
09.03.2017 102 Chennai
01.03.2017 102 Chennai
02.03.2017 102 Chennai
03.03.2017 102 Chennai
04.03.2017 102 Chennai
05.03.2017 102 Chennai
It looks like you just need the number of distinct students per month and location.
The following measure produces the expected matrix:
# Students = DISTINCTCOUNT( School[Student_ID] )
To verify it at the student level, here is the same matrix with student detail.
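To cross-check the distinct-count idea outside Power BI, here is a minimal Python sketch that counts distinct students per location and month. The `blocks` list is a compact transcription of the sample data (student, location, month, attended days); that encoding and the helper name `n_students` are illustrative and not part of the original post.

```python
from collections import defaultdict

# (Student_ID, location, "mm.yyyy", attended days), transcribed from the sample data.
blocks = [
    (100, "Delhi", "01.2017", [1, 2, 3, 4, 5, 6]),
    (101, "Delhi", "01.2017", [1, 2, 3, 4, 5, 6, 8]),
    (102, "Chennai", "01.2017", [9, 1, 2, 3, 4, 5, 6, 8, 11]),
    (101, "Delhi", "02.2017", [1, 2, 3, 4, 5, 6]),
    (100, "Delhi", "02.2017", [1, 2, 3, 4, 5, 6]),
    (102, "Chennai", "02.2017", [1, 2, 3, 4, 5, 6]),
    (103, "Goa", "02.2017", [1, 2, 3, 4, 5, 6]),
    (104, "Goa", "02.2017", [1, 2, 3, 4]),
    (100, "Delhi", "03.2017", [1, 2, 3, 4, 5, 6]),
    (101, "Delhi", "03.2017", [1, 2, 3, 4, 5, 6, 8]),
    (102, "Chennai", "03.2017", [9, 1, 2, 3, 4, 5]),
]

# Expand to one row per attendance record: (dd.mm.yyyy, Student_ID, location).
rows = [
    (f"{day:02d}.{month}", student, location)
    for student, location, month, days in blocks
    for day in days
]

# Collect the set of students seen for each (location, month) pair.
students = defaultdict(set)
for date, student, location in rows:
    month = date[3:5]  # "dd.mm.yyyy" -> "mm"
    students[(location, month)].add(student)

def n_students(location, month):
    """Distinct students who attended at least once in that month."""
    return len(students.get((location, month), set()))

for location in ("Chennai", "Delhi", "Goa"):
    print(location, *(n_students(location, m) for m in ("01", "02", "03")))
```

A set per (location, month) pair plays the role of DISTINCTCOUNT evaluated in the matrix cell's filter context. Pairs with no attendance, such as Goa in January, come out as 0.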
I have a large dataset of a few million patient encounters that include a diagnosis, timestamp, patientID, and demographic information.
We have found that a particular type of disease is frequently comorbid with a common condition.
I would like to count the number of this type of disease that each patient has, and then create a histogram showing how many people have 1,2,3,4, etc. additional diseases.
This is the format of the data.
PatientID Diagnosis Date Gender Age
1 282.1 1/2/10 F 25
1 282.1 1/2/10 F 87
1 232.1 1/2/10 F 87
1 250.02 1/2/10 F 41
1 125.1 1/2/10 F 46
1 90.1 1/2/10 F 58
2 140 12/15/13 M 57
2 282.1 12/15/13 M 41
2 232.1 12/15/13 M 66
3 601.1 11/19/13 F 58
3 231.1 11/19/13 F 76
3 123.1 11/19/13 F 29
4 601.1 12/30/14 F 81
4 130.1 12/30/14 F 86
5 230.1 1/22/14 M 60
5 282.1 1/22/14 M 46
5 250.02 1/22/14 M 53
Generally, I was thinking of a DO loop, but I'm not sure where to start because there are duplicates in the dataset, like with patient 1 (282.1 is listed twice). I'm not sure how to account for that. Any thoughts?
Target diagnoses to count would be 282.1, 232.1, 250.02. In this example, patient 1 would have a count of 3, patient 2 would have 2, etc.
Edit:
This is what I have used, but the output shows each PatientID on multiple lines.
PROC SQL;
create table want as
select age, gender, patientID,
count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID;
quit;
This is what the output table looks like. Why is this patientID showing up so many times?
Obs AGE GENDER PATIENTID count
1 55 Male 107828695 1
2 54 Male 107828695 1
3 54 Male 107828695 1
4 54 Male 107828695 1
5 54 Male 107828695 1
If you include variables that are neither grouping variables nor summary statistics, then SAS will happily re-merge your summary statistics back with all of the source records. That is why you are getting multiple records. AGE can usually vary if your dataset covers many years, and GENDER can also vary if your data is messy. So for a quick analysis you might try something like this.
proc sql;
create table want as
select patientID
, min(age) as age_at_onset
, min(gender) as gender
, count(distinct diagnosis_description) as count
from dz_prev
where diagnosis in (282.1, 232.1)
group by patientID
;
quit;
I think you can get what you want with an SQL statement
PROC SQL NOPRINT;
create table want as
select PatientID,
count(distinct Diagnosis) as count
from have
where Diagnosis in (282.1, 232.1, 250.02)
group by PatientID;
quit;
This filters to only the diagnoses you are interested in, counts the distinct diagnoses seen for each PatientID, and saves the results to a new table. Because the count is distinct, a duplicated row such as patient 1's second 282.1 is counted only once.
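The same count, plus the histogram asked for in the question, can be sketched in plain Python on the sample rows. This is an illustration of the logic only. Diagnosis codes are kept as strings here to avoid floating-point comparisons, which is a choice made for this sketch and not something the original data dictates.

```python
from collections import Counter, defaultdict

# Target diagnoses named in the question.
TARGETS = {"282.1", "232.1", "250.02"}

# (PatientID, Diagnosis) pairs from the sample data, duplicates included.
encounters = [
    (1, "282.1"), (1, "282.1"), (1, "232.1"), (1, "250.02"), (1, "125.1"), (1, "90.1"),
    (2, "140"), (2, "282.1"), (2, "232.1"),
    (3, "601.1"), (3, "231.1"), (3, "123.1"),
    (4, "601.1"), (4, "130.1"),
    (5, "230.1"), (5, "282.1"), (5, "250.02"),
]

# A set per patient removes duplicates such as patient 1's repeated 282.1.
per_patient = defaultdict(set)
for patient_id, diagnosis in encounters:
    if diagnosis in TARGETS:
        per_patient[patient_id].add(diagnosis)

# Distinct target diagnoses per patient (patients with none are absent,
# mirroring the WHERE filter in the SQL).
counts = {patient_id: len(dx) for patient_id, dx in per_patient.items()}

# Histogram: how many patients have 1, 2, 3, ... target diagnoses.
histogram = Counter(counts.values())
for n_diagnoses in sorted(histogram):
    print(n_diagnoses, "#" * histogram[n_diagnoses])
```

Patient 1 gets a count of 3 and patient 2 a count of 2, as stated in the question. Patients 3 and 4 have no target diagnoses and drop out, as they would under the WHERE clause.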
I have 2 datasets like below.
data ab;
input m;
cards;
102
103
104
run;
data ac;
input m;
cards;
102
102
103
103
104
104
104
run;
When I ran the MERGE step below,
data a;
merge ab ac;
by m;
run;
I got the output 102 102 103 103 104 104 104,
but when I used an UPDATE statement,
data b;
update ab ac;
by m;
run;
I got the output 102 103 104.
Can you please explain what happened in the UPDATE statement?
Thanks in Advance,
Nikhila
UPDATE applies the transactions one by one. The master table is required to have unique BY values, which is true here. The transaction table has multiple records per BY value but no new BY values, so no records are added.
If the transaction table had a BY value not in the master table, it would be added.
With an UPDATE and BY, the following may help:
BY value is in both the transaction dataset and master -> the record in master is updated with values from the transaction. If there are multiple records in the transaction table for a BY group, they are each applied in order. There will be only one output record for that BY value, holding the values from the last matching transaction.
BY value is in transaction, not in master -> the record is added to the master table.
BY value is not in transaction, is in master -> the record in master remains unchanged.
This would be easier to see if you add a second variable to your test datasets that is unique.
data ab;
input m @@;
cards;
101 102 103 104
;;;;
run;
data ac;
input m @@;
cards;
102 102 103 103 104 104 104
;;;;
run;
data b;
update ab ac(in=in1);
by m;
if first.m then tCount=0;
tCount + in1;
run;
proc print;
run;
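The three rules described above can be modeled in a few lines of Python. This is a simplified sketch of the stated rules, not of SAS internals: the function name `update` is hypothetical, and it ignores details such as how SAS treats missing values in the transaction dataset. The `tCount` field simply records how many transactions were applied to each BY value.

```python
def update(master, transactions, key="m"):
    """Apply transactions to a master table with unique key values.

    Returns one row per key, sorted by key, with a tCount field giving
    the number of transactions applied to that key.
    """
    # Master must have unique BY values: one row per key.
    result = {row[key]: dict(row) for row in master}
    applied = {k: 0 for k in result}

    for transaction in transactions:
        k = transaction[key]
        if k not in result:
            # BY value in transaction, not in master: add a new row.
            result[k] = {}
            applied[k] = 0
        # BY value in both: overwrite in order, so the last match wins.
        result[k].update(transaction)
        applied[k] += 1

    # BY values only in master pass through unchanged with tCount = 0.
    return [dict(result[k], tCount=applied[k]) for k in sorted(result)]


# Test data from the answer: 101 appears only in the master table.
ab = [{"m": m} for m in (101, 102, 103, 104)]
ac = [{"m": m} for m in (102, 102, 103, 103, 104, 104, 104)]

b = update(ab, ac)
for row in b:
    print(row["m"], row["tCount"])
```

Each BY value appears once in the result no matter how many transactions matched it, which is why the question's UPDATE step returned 102 103 104 while MERGE returned all seven rows.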