Compare Value of Current Observation with First Observation

Compare Value of Current Observation with First Observation - sas

I have a set of multiple choice responses from a survey with 45 questions, and I've placed the correct responses as my first observation in the dataset.
In my DATA step I would like to set values to 0 or 1depending on whether the variable in each observation matches the same variable in the first observation, I want to replace the response letter (A-D) with the 0 or 1 in the dataset, how do I go about doing that comparison?
I'm not doing any grouping, so I believe I can access the first row using First.x, but I'm not sure how to compare that across each variable(answer1-answer45).
| Id | answer1 | answer2 | ...through answer 45
|:-------------|---------:|
| KEY | A | B |
| 2 | A | C |
| 3 | C | D |
| 4 | A | B |
| 5 | D | C |
| 6 | B | B |
Should become:
| Id | answer1 | answer2 | ...through answer 45
|:-------------|---------:|
| KEY | A | B |
| 2 | 1 | 0 |
| 3 | 0 | 0 |
| 4 | 1 | 1 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
Current code for reading in the data:
DATA TEST(drop=name fill answer0);
INFILE SCORES DSD firstobs=2;
length id $4;
length answer1-answer150 $1;
INPUT name $ fill id $ (answer0-answer150) ($);
RUN;
Thanks in advance!

Here's how I might do it. Create a data set to PROC COMPARE the KEY to the observed. Then you have X for not matching key and missing for matched. You can then use PROC TRANSREG to score the 'X.' to 01. PROC TRANSREG also creates macro variables which contain the names of the new variables and the number.
From log NOTE: _TRGINDN=2 _TRGIND=answer1D answer2D
data questions;
input id:$3. (answer1-answer2)(:$1.);
cards;
KEY A B
2 A C
3 C D
4 A B
5 D C
6 B B
;;;;
run;
data key;
if _n_ eq 1 then set questions(obs=1);
set questions(keep=id firstobs=2);
run;
proc compare base=key compare=questions(firstobs=2) out=comp outdiff noprint;
id id;
run;
options validvarname=v7;
proc transreg design data=comp(drop=_type_ type=data);
id id;
model class(answer:) / noint;
output out=scored(drop=intercept _:);
run;
%put NOTE: &=_TRGINDN &=_TRGIND;

I don't have my SAS license here at home, so I can't actually test this code. I'll give it me best shot, though ...
First, I'd keep my correct answers in a separate table, and then merge it with the answers from the respondents. That also makes the solution scalable, should you have more multiple choice solutions and answers in the same table, since you'd be joining on the assignment ID as well.
Now, import all your correct answers to a table answers_correct with column names answer_correct1-answer_correct45.
Then, merge the two tables and determine the outcome for each question.
DATA outcome;
MERGE answers answers_correct;
* We will not be using any BY.;
* If you later add more questionnaires, merge BY the questionnaire ID;
ARRAY answer(*) answer1-answer45;
ARRAY answer_correct(*) answer_correct1-answer_correct45;
LENGTH result1-result45 $1;
ARRAY result(*) result1-result45;
DROP i;
FOR i = 1 TO DIM(answer);
IF answer(i) = answer_correct(i) THEN result(i) = '1';
ELSE result(i) = '0';
END;
RUN;

Related

SAS IFN function gets stuck

Before we get to my question please note that I purposefully did not include example data in this post, as my problem occurs on my full dataset and subsets of it. I have two dataset with client data in the following format.
Have_1
+------------+------------+------+
| dt | dt_next | id |
+------------+------------+------+
| 30.09.2010 | 31.10.2010 | 0001 |
+------------+------------+------+
| 31.10.2010 | 30.11.2010 | 0001 |
+------------+------------+------+
| 30.11.2010 | 31.12.2010 | 0001 |
+------------+------------+------+
| 31.12.2010 | 31.01.2011 | 0001 |
+------------+------------+------+
Have_2
+------+-------+------------+------------+
| id | event | start_date | end_date |
+------+-------+------------+------------+
| 0001 | 1 | 31.10.2010 | 30.11.2010 |
+------+-------+------------+------------+
| 0001 | 2 | 31.10.2010 | 31.12.2010 |
+------+-------+------------+------------+
I am trying to use the IFN function to put 1-0 flags in my dataset by using the following logic:
Proc SQL;
Create table want as
Select a.*
,ifn(a.id in (select id from have_2 where a.dt <= end_date and start_date <= a.dt_next), 1, 0) as flg_1
,ifn(a.id in (select id from have_2 where a.dt <= end_date and start_date <= a.dt), 1, 0) as flg_2
From have_1 as a;
Quit;
The code works fine if I take only one client, however, if I take the full dataset (or even a small subset of it such as only 10 clients) then the code gets stuck in the sense that the process begins without error but simply never finishes. I tried setting indexes to both my input datasets, without success.
Are there any peculiarities to the IFN function, which can make it behave that way?

So why not just join and take the max of all events if any event's dates fall into those periods? That should eliminate the need to do two subqueries for every observation in HAVE1.
proc sql;
create table want2 as
select a.id
, a.dt
, a.dt_next
, max(a.dt <= b.end_date and b.start_date <= a.dt_next) as flg1
, max(a.dt <= b.end_date and b.start_date <= a.dt) as flg2
from have1 a
left join have2 b
on a.id = b.id
group by 1,2,3
;
quit;
Note the issue is with the subqueries, not the IFN() function call. Also there is no need for IFN() function here. SAS evaluates boolean expressions to 1 or 0. So the expression a=b returns the same result as IFN(a=b,1,0) returns.

Grouping child items and displaying parent sum

I have the following table
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
I would like to group the table by group, insert the grouped sum into value, and then ungroup:
+-------+--------+
| item | value |
+-------+--------+
| 1 | 30 |
| a | 10 |
| b | 20 |
| 2 | 70 |
| b | 30 |
| c | 40 |
+-------+--------+
The purpose of the result is to interpret the first column as items a and b belonging to group 1 with sum 30 and items b and c belonging to group 2 with sum 70.

Such a data transformation can be indicative of a reporting requirement more than a useful data structure for downstream processing. Proc REPORT can create output in the form desired.
data have;
infile datalines;
input group $ item $ value ##; datalines;
1 a 10 1 b 20 2 b 30 2 c 40
;
proc report data=have;
column group item value;
define group / order order=data noprint;
break before group / summarize;
compute item;
if missing(item) then item=group;
endcomp;
run;

I assume that both group and item are character variables
data have;
infile datalines firstobs=4 dlm='|';
input group $ item $ value;
datalines;
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
;
data want (keep=group value);
do _N_=1 by 1 until (last.group);
set have;
by group;
v + value;
end;
value = v;output;v=0;
do _N_=1 to _N_;
set have;
group = item;
output;
end;
run;

How can I add observations to the existing dataset based on dates?

I have a dataset like this:
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
run;
I would like to add observations with dates from the maximum date (hence from 30JUN2019) until 31DEC2019 (by months) with the value of index being the last available value: 14. How can I achieve this in SAS? I want the code to be flexible, thus for every such dataset, take the maximum of date and add monthly observations from that maximum until DEC2019 with the value of index being equal to the last available value (here in the example the value in JUN2019).

An explicit DO loop over the SET provides the foundation for a concise solution with no extraneous worker variables. Automatic variable last is automatically dropped.
data have;
input date :date9. index;
format date date9.;
datalines;
31MAR2019 10
30APR2019 12
31MAY2019 15
30JUN2019 14
;
data want;
do until (last);
set have end=last;
output;
end;
do last = month(date) to 11; %* repurpose automatic variable last as a loop index;
date = intnx ('month',date,1,'e');
output;
end;
run;
Always helpful to refresh understanding. From SET Options documentation
END=variable
creates and names a temporary variable that contains an end-of-file indicator. The variable, which is initialized to zero, is set to 1 when SET reads the last observation of the last data set listed. This variable is not added to any new data set.

You can do it using end in set statement and retain statement.
data want(drop=i tIndex tDate);
set have end=eof;
retain tIndex tDate;
if eof then do;
tIndex=Index;
tDate=Date;
end;
output;
if eof then do;
do i=1 to 12-month(tDate);
index=tIndex;
date = intnx('month',tDate,i,'e');
output;
end;
end;
run;
INPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
+-----------+-------+
OUTPUT:
+-----------+-------+
| date | index |
+-----------+-------+
| 31MAR2019 | 10 |
| 30APR2019 | 12 |
| 31MAY2019 | 15 |
| 30JUN2019 | 14 |
| 31JUL2019 | 14 |
| 31AUG2019 | 14 |
| 30SEP2019 | 14 |
| 31OCT2019 | 14 |
| 30NOV2019 | 14 |
| 31DEC2019 | 14 |
+-----------+-------+

adding rows given a certain condition

I have a database with 3 columns. ID, Date and amount. It is ordered by ID and Date. All I want to do is to add a row after the latest occurrence of every ID with the same ID, Date = Date + 1 Month and Amount = 0.
As an Illustration I want to go from this:
id | Date |amount |
A | 01JAN| 1 |
A | 01FEB| 1 |
B | 01FEB| 0 |
B | 01MAR| 1 |
to this:
id | Date |amount |
A | 01JAN| 1 |
A | 01FEB| 1 |
A | 01MAR| 0 | <- ADD THIS ROW
B | 01FEB| 0 |
B | 01MAR| 1 |
B | 01APR| 0 |<- ADD THIS ROW
I know I should use intxn but beyond that I don't really know what to do. I appreciate any input.

Assuming that the DATE variable has actual date values in it you just need to output twice on the last observation in each group.
data want;
set have;
by id;
output;
if last.id then do;
date=intnx('month',date,1,'b');
amount=0;
output;
end;
run;

How to split a combination of numeric and characters into multiple columns

I want to split some variable "15to16" into two columns where for that row I want the values 15 and 16 in each of the column entries. Hence, I want to get from this
+-------------+
| change |
+-------------+
| 15to16 |
| 9to8 |
| 6to5 |
| 10to16 |
+-------------+
this
+-------------+-----------+-----------+
| change | from | to |
+-------------+-----------+-----------+
| 15to16 | 15 | 16 |
| 9to8 | 9 | 8 |
| 6to5 | 6 | 5 |
| 10to16 | 10 | 16 |
+-------------+-----------+-----------+
Could someone help me out? Thanks in advance!

data have;
input change $;
cards;
15to16
9to8
6to5
10to16
;
run;
data want;
set have;
from = input(scan(change,1,'to'), 8.);
to = input(scan(change,2,'to'), 8.);
run;
N.B. in this case the scan function is using both t and o as separate delimiters, rather than looking for the word to. This approach still works because scan by default treats multiple consecutive delimiters as a single delimiter.

Regular expressions with the metacharacter () define groups whose contents can be retrieved from capture buffers with PRXPOSN. The capture buffers retrieved in this case would be one or more consecutive decimals (\d+) and converted to a numeric value with INPUT
data have;
input change $20.; datalines;
15to16
9to8
6to5
10to16
run;
data want;
set have;
rx = prxparse('/^\s*(\d+)\s*to\s*(\d+)\s*$/');
if prxmatch (rx, change) then do;
from = input(prxposn(rx,1,change), 12.);
to = input(prxposn(rx,2,change), 12.);
end;
drop rx;
run;

You can get the answer you want by declaring delimiter when you create the dataset. However you did not provide enough information regarding your other variables and how you import them
Data want;
INFILE datalines DELIMITER='to';
INPUT from to;
datalines;
15to16
9to8
6to5
10to16
;
Run;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Compare Value of Current Observation with First Observation - sas

Related

SAS IFN function gets stuck

Grouping child items and displaying parent sum

How can I add observations to the existing dataset based on dates?

adding rows given a certain condition

How to split a combination of numeric and characters into multiple columns

Categories

Resources