I have data structured like this:
ID  test_date  test_result
1   27Mar1992  N
1   08Dec1999  P
1   29Jan2005  N
2   13Jan2015  N
2   09Mar2017  P
2   05Jun2018  P
3   15Oct1996  N
3   05Sep1997  N
3   28Jun1998  N
I need to keep all records for an ID if that ID had a test_result=P dated within 2017-2018. For this example, only the records for ID 2 would be kept.
Thank you!
Here is a solution using Python and pandas.
Your data:
import pandas as pd
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3}, 'test_date': {0: '27Mar1992', 1: '08Dec1999', 2: '29Jan2005', 3: '13Jan2015', 4: '09Mar2017', 5: '05Jun2018', 6: '15Oct1996', 7: '05Sep1997', 8: '28Jun1998'}, 'test_result': {0: 'N', 1: 'P', 2: 'N', 3: 'N', 4: 'P', 5: 'P', 6: 'N', 7: 'N', 8: 'N'}} )
These lines implement your logic: flag the rows with a P result in 2017-2018, then keep every record for the IDs that have such a row.
# Rows where the 4-digit year is 2017 or 2018 and the result is P
hit = df['test_date'].str.extract(r'(\d{4})')[0].astype(int).between(2017, 2018) & df['test_result'].eq('P')
# All records for any ID that has at least one flagged row
df.loc[df['ID'].isin(df.loc[hit, 'ID'])]
ID test_date test_result
3 2 13Jan2015 N
4 2 09Mar2017 P
5 2 05Jun2018 P
A SQL query with a having clause will produce the result set.
data have;
input ID test_date date9. test_result $;
format test_date date9.;
datalines;
1 27Mar1992 N
1 08Dec1999 P
1 29Jan2005 N
2 13Jan2015 N
2 09Mar2017 P
2 05Jun2018 P
3 15Oct1996 N
3 05Sep1997 N
3 28Jun1998 N
;
proc sql;
create table want as
select * from have
group by id
/* the comparison yields 1 or 0, so the sum counts qualifying rows per ID */
having sum (year(test_date) in (2017, 2018) and test_result='P') > 0
;
quit;
Log (remerging is a special SAS PROC SQL feature):
56 ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.WANT created, with 3 rows and 3 columns.
The resulting table WANT contains the three records for ID 2.
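If the remerge note is unclear, here is a rough Python sketch of what the HAVING clause amounts to: one pass computes a per-ID summary, and a second pass applies that summary back to every detail row. It uses only the standard library; the helper name `is_hit` is my own, and the year is taken from the last four characters of the date string, which assumes the `ddMONyyyy` layout shown in the question.

```python
# Sample rows copied from the question: (ID, test_date, test_result).
rows = [
    (1, "27Mar1992", "N"), (1, "08Dec1999", "P"), (1, "29Jan2005", "N"),
    (2, "13Jan2015", "N"), (2, "09Mar2017", "P"), (2, "05Jun2018", "P"),
    (3, "15Oct1996", "N"), (3, "05Sep1997", "N"), (3, "28Jun1998", "N"),
]


def is_hit(test_date, test_result):
    """True for a positive result dated in 2017 or 2018 (hypothetical helper)."""
    year = int(test_date[-4:])  # assumes the ddMONyyyy layout
    return test_result == "P" and 2017 <= year <= 2018


# Pass 1: the group-level summary, i.e. which IDs have at least one hit.
ids_with_hit = {row_id for row_id, test_date, result in rows if is_hit(test_date, result)}

# Pass 2: the "remerge", applying the summary back to every detail row.
want = [row for row in rows if row[0] in ids_with_hit]

for row in want:
    print(row)
```

ID 1 also has a P result, but it is dated 1999, so none of its records are kept.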
I have a table like below:
id | type | code
----------------
0  | 5    | 2
1  | 6    | 6
2  | 8    | 16
3  | 4    | 11
4  | 5    | 4
5  | 2    | 2
6  | 4    | 1
7  | 10   | 6
8  | 9    | 2
The output I need is the types grouped by code, like this:
{ '2': [5,2,9], '6': [6, 10], '16': [8], '11':[4], ...}
I wrote this query, but it does not produce that result:
type_codes = [{ppd['code']:ppd['type_id']} for ppd in \
MyClass.objects \
.values('code', 'type_id')
.order_by()
]
Any help will be appreciated.
You can try the following, iterating over the query result (using values_list):
data = MyClass.objects.values_list('code', 'type_id')
res = {}
for code, type_id in data:
    # Create the list the first time a code is seen, then append to it.
    res.setdefault(code, []).append(type_id)
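The same grouping can be sketched without a database, which makes it easy to check against the table in the question. This is only an illustration: the `pairs` list stands in for the result of `values_list('code', 'type_id')`, and `collections.defaultdict` is one alternative to testing for the key by hand.

```python
from collections import defaultdict

# Stand-in for MyClass.objects.values_list('code', 'type_id'):
# (code, type) pairs copied from the table in the question.
pairs = [(2, 5), (6, 6), (16, 8), (11, 4), (4, 5), (2, 2), (1, 4), (6, 10), (2, 9)]

type_codes = defaultdict(list)
for code, type_id in pairs:
    # A missing key is created as an empty list automatically.
    type_codes[code].append(type_id)

print(dict(type_codes))
```

The keys here are integers; if you need string keys as in your example, use `str(code)` when indexing.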
I'm trying to set any5 = 'Yes' if there is a number 5 in any of the columns Q1 to Q5. However, my code below only reflects the last column.
data survey;
infile datalines firstobs=2;
input ID 3. Q1-Q5;
array score{5} _temporary_ (5,5,5,5,5);
array Ques{5} Q1-Q5;
do i =1 to 5;
if Ques{i} = score{i} then any5='Yes';
else any5='No';
end;
drop i;
datalines;
ID Q1 Q2 Q3 Q4 Q5
535 1 3 5 4 2
12 5 5 4 4 3
723 2 1 2 1 1
7 3 5 1 4 2
;
run;
Keep it simple :-)
data survey;
infile datalines;
input ID 3. Q1-Q5;
array Ques{*} Q1 - Q5;
any5 = ifc(5 in Ques, 'Yes', 'No');
datalines;
535 1 3 5 4 2
12 5 5 4 4 3
723 2 1 2 1 1
7 3 5 1 4 2
;
Use the COUNTC function to count how many times the character 5 appears in the concatenated Q1-Q5 values, then use the IFC function to return a character value depending on whether the expression is true, false, or missing.
data survey;
infile datalines firstobs=2;
input ID 3. Q1-Q5;
any5 = ifc(countc(cats(of Q:),'5')>0,'Yes','No');
datalines;
ID Q1 Q2 Q3 Q4 Q5
535 1 3 5 4 2
12 5 5 4 4 3
723 2 1 2 1 1
7 3 5 1 4 2
;
run;
Result:
535 1 3 5 4 2 Yes
12 5 5 4 4 3 Yes
723 2 1 2 1 1 No
7 3 5 1 4 2 Yes
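One caveat worth sketching: searching the concatenated text looks for the character 5, not the value 5. That is fine for single-digit answers like the sample data, but it would also match a value such as 15. The Python below only illustrates that string-versus-value difference; it is not a translation of the SAS functions, and both function names are made up for the example.

```python
def any5_by_concat(values):
    """Join the values as text and look for the character '5'."""
    return "5" in "".join(str(v) for v in values)


def any5_by_value(values):
    """Compare each value with the number 5."""
    return 5 in values


single_digit = [1, 3, 5, 4, 2]   # a row from the question
multi_digit = [15, 1, 2, 1, 1]   # hypothetical row, not in the question

print(any5_by_concat(single_digit), any5_by_value(single_digit))
print(any5_by_concat(multi_digit), any5_by_value(multi_digit))
```

If the answers can never exceed one digit, the two approaches agree.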
Use the WHICHN function to determine the index of the target value in a list of values.
In your case, assign the result of testing whether any index matched:
any5 = whichn (5, of ques(*)) > 0;
From the documentation:
WHICHN Function
Searches for a numeric value that is equal to the first argument, and
returns the index of the first matching value.
Syntax
WHICHN(argument, value-1 <, value-2, ...>)
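To make the return value concrete, here is a small Python stand-in that follows the description quoted above: the 1-based index of the first match. Returning 0 when nothing matches is my reading of why the `> 0` test works, so treat that part as an assumption.

```python
def whichn(target, *values):
    """Return the 1-based position of the first value equal to target.

    Returns 0 when no value matches (assumed here, so that `> 0` means "found").
    """
    for position, value in enumerate(values, start=1):
        if value == target:
            return position
    return 0


# Rows from the question's data.
print(whichn(5, 1, 3, 5, 4, 2))
print(whichn(5, 2, 1, 2, 1, 1))
```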
It is a simple mistake in your logic. You are setting ANY5 to Yes or No on every pass through the loop. Because the loop keeps going after a match is found, each pass overwrites the result of the previous one, so only the result of the last test survives.
Here is one way: set the answer to No before the loop and remove the ELSE clause.
any5='No ';
do i =1 to 5;
if Ques{i} = 5 then any5='Yes';
end;
Or stop when you have your answer.
do i =1 to 5 until(any5='Yes');
if Ques{i} = score{i} then any5='Yes';
else any5='No';
end;
Or skip the looping altogether.
if whichn(5, of Q1-Q5) then any5='Yes';
else any5='No';
Or, even easier, create any5 as numeric instead of character. SAS returns 1 for TRUE and 0 for FALSE as the result of a boolean expression.
any5 = ( 0 < whichn(5, of Q1-Q5) );
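The overwrite problem is not specific to SAS. A short Python sketch of the same two loops, with function names I made up for the comparison, shows why the first version only reports the last column:

```python
def any5_overwriting(answers):
    """Mirrors the original loop: the flag is reassigned on every pass."""
    any5 = None
    for answer in answers:
        any5 = "Yes" if answer == 5 else "No"
    return any5


def any5_fixed(answers):
    """Default to 'No' before the loop and only ever switch to 'Yes'."""
    any5 = "No"
    for answer in answers:
        if answer == 5:
            any5 = "Yes"
            break  # stop once the answer is known
    return any5


row = [1, 3, 5, 4, 2]  # ID 535 from the question
print(any5_overwriting(row), any5_fixed(row))
```

For this row the 5 is in the third column, so the overwriting version ends on the result for the fifth column.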
I have this data set:
CustID Rating
1 A
1 A
1 B
2 A
2 B
2 C
2 D
3 X
3 X
3 Z
4 Y
4 Y
5 M
6 N
7 O
8 U
8 T
8 U
and I expect this output:
CustID Rating ID
1 A 1
1 A 1
1 B 1
2 A 1
2 B 2
2 C 3
2 D 4
3 X 1
3 X 1
3 Z 2
4 Y 1
4 Y 1
5 M 1
6 N 1
7 O 1
8 U 1
8 T 2
8 U 1
In the solution below, I select the distinct ratings into a macro variable and use it to initialize an array. Each record's rating is then looked up in that array, and the index of the match is assigned as the id.
You can avoid the macro in this case by replacing the %sysfunc call with 3 (the number of distinct ratings), if you know it beforehand. The %sysfunc call computes that number for you when you don't.
data have;
input CustomerID Rating $;
cards;
1 A
1 A
1 B
2 A
2 A
3 A
3 A
3 B
3 C
;
run;
proc sql noprint;
select distinct quote(strip(rating)) into :list separated by ' '
from have
order by 1;
%put &list.;
quit;
If you know the number beforehand:
data want;
set have;
array num(3) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
Otherwise:
%macro Y;
data want;
set have;
array num(%sysfunc(countw(&list., %str( )))) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
%mend;
%Y;
The output:
Obs CustomerID Rating id
1 1 A 1
2 1 A 1
3 1 B 2
4 2 A 1
5 2 A 1
6 3 A 1
7 3 A 1
8 3 B 2
9 3 C 3
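The idea of that output, each rating numbered by its position in the sorted list of distinct ratings, can be sketched in a few lines of Python. This is only an illustration of the lookup, not of the macro mechanics; the variable names are mine.

```python
# Ratings from the `have` data set above, in row order.
ratings = ["A", "A", "B", "A", "A", "A", "A", "B", "C"]

# Distinct ratings in sorted order, numbered from 1 (like the array index).
lookup = {rating: index for index, rating in enumerate(sorted(set(ratings)), start=1)}

ids = [lookup[rating] for rating in ratings]
print(lookup)
print(ids)
```

Note that this numbering is global across all customers; it does not restart for each customer.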
Assuming the data is sorted by customerid and rating (as in the original, unedited question), is the following what you want?
data want;
set have;
by customerid rating;
if first.customerid then
id = 0;
if first.rating then
id + 1;
run;
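For readers less familiar with BY-group processing, here is a rough Python equivalent of the first.customerid / first.rating logic. It assumes, like the data step, that the rows are already sorted by customer and then rating; the function name is hypothetical.

```python
def number_within_customer(rows):
    """rows: (customer_id, rating) pairs sorted by customer, then rating."""
    numbered = []
    prev_customer = prev_rating = None
    ident = 0
    for customer, rating in rows:
        new_customer = customer != prev_customer
        # A new customer also starts a new rating group.
        new_rating = new_customer or rating != prev_rating
        if new_customer:
            ident = 0      # restart the numbering for each customer
        if new_rating:
            ident += 1     # next id for each new rating within the customer
        numbered.append((customer, rating, ident))
        prev_customer, prev_rating = customer, rating
    return numbered


have = [(1, "A"), (1, "A"), (1, "B"), (2, "A"), (2, "A"),
        (3, "A"), (3, "A"), (3, "B"), (3, "C")]
for row in number_within_customer(have):
    print(row)
```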
I want to extract specific sets of rows from a large SAS dataset into a new dataset, based on the value of one variable. The dataset has 6 variables. Here is an example:
Variable names: Var1 Var2 Var3 Var4 Var5 Var6
Row 1 A 1 2 3 4 5
Row 2 B 1 2 3 4 5
Row 3 A 1 2 3 4 5
Row 4 B 1 2 3 4 5
Row 5 Sample 1 2 3 4 5
Row 6 A 1 2 3 4 5
Row 7 B 1 2 3 4 5
Row 8 A 1 2 3 4 5
Row 9 B 1 2 3 4 5
Row 10 A 1 2 3 4 5
Row 11 B 1 2 3 4 5
Row 12 A 1 2 3 4 5
Row 13 B 1 2 3 4 5
From this dataset, I want to select the 8 rows that follow each row in which Var1 = "Sample". There are multiple such sets of 8 rows, and I want to extract all of them into a new dataset. Can someone please guide me on how to accomplish this in SAS?
Thank you
Would the output statement work for you?
data have;
infile datalines dsd dlm=",";
input Variable_names : $char10.
Var1 : $char10.
Var2 : 8.
Var3 : 8.
Var4 : 8.
Var5 : 8.
Var6 : 8.;
datalines;
Row 1 , A , 1, 2, 3, 4, 5
Row 2 , B , 1, 2, 3, 4, 5
Row 3 , A , 1, 2, 3, 4, 5
Row 4 , B , 1, 2, 3, 4, 5
Row 5 , Sample, 1, 2, 3, 4, 5
Row 6 , A , 1, 2, 3, 4, 5
Row 7 , B , 1, 2, 3, 4, 5
Row 8 , A , 1, 2, 3, 4, 5
Row 9 , B , 1, 2, 3, 4, 5
Row 10, A , 1, 2, 3, 4, 5
Row 11, B , 1, 2, 3, 4, 5
Row 12, A , 1, 2, 3, 4, 5
Row 13, B , 1, 2, 3, 4, 5
;
run;
data want_without
want_with;
set have;
if strip(Var1) = "Sample" then output want_with;
else output want_without;
run;
One way to do this is to set a counter to 8 whenever the previous record has var1="Sample", decrement the counter for every other record, and output only the records where the counter is >= 1.
data want ;
set have ;
if lag(var1) = "Sample" then counter = 8 ;
else counter+(-1) ; *counter is implicitly retained ;
if counter>=1 then output ;
* drop counter ;
run ;
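Here is the same countdown written as a Python sketch, in case it helps to trace the counter by hand. The function and parameter names are mine, and it returns 1-based row numbers so they line up with the "Row n" labels in the question.

```python
def rows_after_marker(values, marker="Sample", n=8):
    """Return the 1-based row numbers of the n rows following each marker row."""
    kept = []
    counter = 0       # starts at 0, like a SAS sum-statement variable
    previous = None   # plays the role of lag(var1)
    for row_number, value in enumerate(values, start=1):
        if previous == marker:
            counter = n       # the row right after a marker starts the countdown
        else:
            counter -= 1
        if counter >= 1:
            kept.append(row_number)
        previous = value
    return kept


# Var1 values for Row 1 to Row 13 from the question.
var1 = ["A", "B", "A", "B", "Sample", "A", "B", "A", "B", "A", "B", "A", "B"]
print(rows_after_marker(var1))
```

Before the first marker the counter just goes negative, so nothing is kept. A new marker inside a run of 8 restarts the countdown from the row after it.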
You can set a counter and output as desired, using RETAIN together with an IF (and OUTPUT) statement. You may need to tweak the IF condition, but I think you get the idea.
data want;
set have;
retain counter 10;
if strip(Var1) = "Sample" then counter=1;
else counter+1;
if 2<=counter<=9 then OUTPUT;
*if 2<=counter<=9; *this is the same as above, but less code;
run;
I have the following dataset
data input;
input Row$ A B;
datalines;
1 1 2
2 1 2
3 1 1
4 1 1
5 2 3
6 2 3
7 2 3
8 2 2
9 2 2
10 2 1
;
run;
My goal is only to keep records of the first group of data for the variable A. For example I only want records where A=1 and B=2 (lines 1 and 2) and for the next group where A=2 and B=3 and so on...
I tried the following code
data input (rename= (count=rank_b));
set input;
count + 1;
by A descending B;
if first.B then count = 1;
run;
which just gives the number of observations in A (1 to 4) and B (1 to 6). What I would like is
A B rank_b rank_b_desired
1 2 1 1
1 2 2 1
1 1 1 2
1 1 2 2
2 3 1 1
2 3 2 1
2 2 1 2
2 2 2 2
2 1 1 3
So that I can then eliminate all obs where rank_b_desired does not equal 1.
Set a flag to 1 when you encounter a new value of A, then set it to 0 if B changes. retain will preserve the value of the flag when a new line is read from the input.
data want;
set input;
by A descending B;
retain flag;
if first.B then flag = 0;
if first.A then flag = 1;
run;
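The order of the two IF statements matters: first.B is also true at the start of a new A, so the flag is first cleared and then set again. A Python sketch of the same logic, with a made-up function name and assuming input sorted by A and then descending B:

```python
def flag_first_b_group(rows):
    """rows: (A, B) pairs sorted by A, then descending B."""
    flagged = []
    prev_a = prev_b = None
    flag = 0
    for a, b in rows:
        new_a = a != prev_a
        new_b = new_a or b != prev_b   # a new A always starts a new B group
        if new_b:
            flag = 0                   # any change in B clears the flag...
        if new_a:
            flag = 1                   # ...unless it is the first B group of a new A
        flagged.append((a, b, flag))
        prev_a, prev_b = a, b
    return flagged


data = [(1, 2), (1, 2), (1, 1), (1, 1), (2, 3), (2, 3), (2, 3), (2, 2), (2, 2), (2, 1)]
for row in flag_first_b_group(data):
    print(row)
```

Keeping only the rows with flag equal to 1 gives the first B group within each A.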
The desired result can also be achieved via proc sql, with the added benefit that it does not depend on the data being presorted.
proc sql;
create table want as
select *
from input
group by A
having B = max(B)
order by Row;
quit;
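To see why this does not need sorted input, here is a plain Python sketch of the same idea: compute the maximum B for each A, then keep the rows whose B equals that maximum. The tuples are the (Row, A, B) values from the question, with Row as an integer for brevity, and the variable names are mine.

```python
# (Row, A, B) from the question's data set.
rows = [
    (1, 1, 2), (2, 1, 2), (3, 1, 1), (4, 1, 1),
    (5, 2, 3), (6, 2, 3), (7, 2, 3), (8, 2, 2), (9, 2, 2), (10, 2, 1),
]

# Pass 1: the largest B seen for each value of A (the GROUP BY summary).
max_b = {}
for _, a, b in rows:
    max_b[a] = max(b, max_b.get(a, b))

# Pass 2: keep the rows whose B equals the maximum of their group.
want = [row for row in rows if row[2] == max_b[row[1]]]

for row in want:
    print(row)
```

This matches "the first group" only because the first group in your data is the one with the largest B.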
Or to match user234821's output:
proc sql;
create table want as
select
*,
ifn(B = max(B), 1, 0) as flag
from input
group by A
order by Row;
quit;