How to convert SAS codes to SAS macros? [closed] - sas

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
Could you convert this SAS code into a short macro, where SC takes the values 1 to 6?
if SC=1 and u_rep=1 and ((co1*(Daily**2))+(co_dem_1*daily)+(int_1))>99.7 then
  exp_val=1;
else if SC=1 and u_rep=1 and ((co1*(Daily**2))+(co_dem_1*daily)+(int_1))<99.7 then
  exp_val=co_dem_1*((co1*(Daily**2))+(co_dem_1*daily)+(int_1))/100;
if SC=2 and u_rep=1 and ((co2*(Daily**2))+(co_dem_2*daily)+(int_2))>99.7 then
  exp_val=1;
else if SC=2 and u_rep=1 and ((co2*(Daily**2))+(co_dem_2*daily)+(int_2))<99.7 then
  exp_val=co_dem_2*((co2*(Daily**2))+(co_dem_2*daily)+(int_2))/100;
if SC=3 and u_rep=1 and ((co3*(Daily**2))+(co_dem_3*daily)+(int_3))>99.7 then
  exp_val=1;
else if SC=3 and u_rep=1 and ((co3*(Daily**2))+(co_dem_3*daily)+(int_3))<99.7 then
  exp_val=co_dem_3*((co3*(Daily**2))+(co_dem_3*daily)+(int_3))/100;
I need a macro version of this code.

You might want to use data step arrays instead of macro coding:
data want;
  set have;
  /* one array per coefficient family, indexed by SC */
  array co(6) co1-co6;
  array co_dem(6) co_dem_1-co_dem_6;
  /* named int_ rather than int to avoid clashing with the INT function */
  array int_(6) int_1-int_6;
  do index = 1 to 6;
    if SC = index and u_rep = 1 then do;
      if ((co(index)*(Daily**2))+(co_dem(index)*daily)+(int_(index))) > 99.7 then
        exp_val = 1;
      else
        exp_val = co_dem(index)*((co(index)*(Daily**2))+(co_dem(index)*daily)+(int_(index)))/100;
    end;
  end;
run;
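If a macro is still wanted, as the question asks, the repeated blocks can be generated with a %DO loop. This is only a sketch: the macro name exp_val_rules, its parameter n and the helper variable _q are made up for illustration, and it assumes that have contains co1-co6, co_dem_1-co_dem_6 and int_1-int_6.

```sas
/* Hypothetical macro: emits one IF block per SC value from 1 to &n. */
%macro exp_val_rules(n=6);
  %local i;
  %do i = 1 %to &n;
    if SC = &i and u_rep = 1 then do;
      /* _q is a hypothetical helper holding the quadratic term */
      _q = co&i*(Daily**2) + co_dem_&i*daily + int_&i;
      if _q > 99.7 then exp_val = 1;
      else if _q < 99.7 then exp_val = co_dem_&i*_q/100;
    end;
  %end;
%mend exp_val_rules;

data want;
  set have;
  %exp_val_rules(n=6)
  drop _q;
run;
```

The macro only writes data step statements, so it has to be called inside a data step. The array version above stays shorter and is usually easier to debug.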

Related

SAS multiple datasets logical order

I'm sorry to ask this question, but my English is poor and I don't know what to type into Google to get results.
I want to do:
data test;
set mytable1 to mytable999;
run;
How can I tell SAS to set all the tables from 1 to 999 without writing them all out (because that takes a long time)? Something like mytable1-999.
Thank you very much. I know it's a basic function, but I don't remember what it is called in English.
Just use the colon (:) wildcard in SAS. In the following example,
data myTable1;
  do i = 1 to 3;
    j = 2*i;
    output;
  end;
run;
data myTable2;
  do i = 1 to 3;
    j = -i;
    output;
  end;
run;
data myAll;
  set myTable:;
run;
myTable: is equivalent to the list of all tables whose names start with myTable.
The result is
i j
== ==
1 2
2 4
3 6
1 -1
2 -2
3 -3
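Since the question asks for tables 1 to 999 specifically, a numbered range list may be a closer fit than the prefix wildcard. The sketch below assumes a SAS release that supports data set lists in the SET statement (9.2 onward, as far as I know) and that every table in the range exists; it reuses myTable1 and myTable2 from the example above, and myAllRange is a made-up output name.

```sas
/* Numbered range list: reads myTable1, then myTable2, in numeric order. */
data myAllRange;
  set myTable1-myTable2;
run;

/* For the tables in the question this would become:
   set mytable1-mytable999;                          */
```

Unlike myTable:, the range does not pick up other tables that merely share the prefix.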

How do I select the first 5 observations with regard to duplicates? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a large dataset containing over 80,000,000 rows, sorted by "name" and "income" (with duplicates for both name and income). For the first name I would like to have the 5 lowest incomes. For the second name I would again like the 5 lowest incomes, but incomes already drawn for the first name are disqualified from selection. And so on until the last name (if there are any incomes left by then).
You first want to rank income within names. So:
proc rank data=yourdata out=temp ties=low;
by name;
var income;
ranks incomerank;
run;
Then you want to filter the 5 lowest incomes by name, so:
proc sql;
create table want as
select distinct *
from temp
where incomerank < 6;
quit;
You will need to sort and track incomes:
Use an array to sort and track the five lowest incomes of a name.
Use a hash to record each income that has been output, so that it is ineligible for output under later names.
Example:
An insertion sort of the eligible low-valued incomes is used; it will be fast because it only ever holds 5 items.
data have;
call streaminit(1234);
do name = 1 to 1e6;
do seq = 1 to rand('integer', 20);
income = rand('integer', 20000, 1000000);
output;
end;
end;
run;
data
want (label='Lowest 5 incomes (first occurring over all names) of each name')
want_barren(keep=name label='Names whose all incomes were previously output for earlier names')
;
array X(5) _temporary_;
if _n_ = 1 then do;
if 0 then set have;
declare hash incomes();
incomes.defineKey('income');
incomes.defineDone();
end;
_maxmin5 = 1e15;
x(1) = 1e15;
x(2) = 1e15;
x(3) = 1e15;
x(4) = 1e15;
x(5) = 1e15;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if incomes.check() = 0 then continue;
* insert sort - lowest five not observed previously;
if income > _maxmin5 then continue;
do _i_ = 1 to 5;
if income < x(_i_) then do;
do _j_ = 5 to _i_+1 by -1;
x(_j_) = x(_j_-1);
end;
x(_i_) = income;
_maxmin5 = x(5);
incomes.add();
leave;
end;
end;
end;
_outflag = 0;
do _n_ = 1 to _n_;
set have;
if income in x then do;
_outflag = 1;
OUTPUT want;
end;
end;
if not _outflag then
OUTPUT want_barren;
drop _:;
run;
data have;
do n = 1 to 8e5;
do _N_ = 1 to 100;
income = ceil(rand('uniform') * 1e4);
address = cats('Address_', _N_);
output;
end;
end;
run;
data want(drop=c);
if _N_ = 1 then do;
dcl hash h(dataset : 'have(obs=0)', ordered : 'a', multidata : 'y');
h.definekey('income');
h.definedata(all : 'y');
h.definedone();
dcl hiter i('h');
dcl hash inc();
inc.definekey('income');
inc.definedone();
end;
do until (last.n);
set have;
by n;
h.add();
end;
do c = 0 by 0 while (i.next() = 0);
if inc.add() = 0 then do;
c + 1;
output;
end;
if c = 5 then leave;
end;
_N_ = i.first();
_N_ = i.prev();
h.clear();
run;
Here is my interpretation of your problem and a solution.
Suppose a simplified version of your data looks like this and you want the 2 lowest incomes for each name. For simplicity, I use a numeric variable n as the name, but a character variable will work as well.
data have;
input n income;
datalines;
1 100
1 200
1 300
2 400
2 100
2 500
3 600
3 200
3 500
;
From this data, my guess is that your logic goes like this:
Start with n = 1.
Output the 2 observations with the lowest income (100 and 200)
Go to the next name (n=2).
Output the 2 observations with the lowest income that have not already been output (400 and 500). 100 has already been output in the n=1 group.
...And so on...
This gives the desired result below:
data want;
input n income;
datalines;
1 100
1 200
2 400
2 500
3 600
;
Try out the solution below and verify that you get the result as posted above.
data want(drop=c);
if _N_ = 1 then do;
dcl hash h(ordered : 'a', multidata : 'y');
h.definekey('income');
h.definedone();
dcl hiter i('h');
dcl hash inc();
inc.definekey('income');
inc.definedone();
end;
do until (last.n);
set have;
by n;
h.add();
end;
do c = 0 by 0 while (i.next() = 0);
if inc.add() = 0 then do;
c + 1;
output;
end;
if c = 2 then leave;
end;
_N_ = i.first();
_N_ = i.prev();
h.clear();
run;
Finally, let us create representative example data with 80 million observations. I change the if c = 2 then leave; statement to if c = 5 then leave; to go back to your actual problem.
The code below runs in about 45 seconds on my system and processes the data in a single pass. Let me know if it works for you :-)
data have;
do n = 1 to 8e5;
do _N_ = 1 to 100;
income = ceil(rand('uniform') * 1e4);
output;
end;
end;
run;
data want(drop=c);
if _N_ = 1 then do;
dcl hash h(ordered : 'a', multidata : 'y');
h.definekey('income');
h.definedone();
dcl hiter i('h');
dcl hash inc();
inc.definekey('income');
inc.definedone();
end;
do until (last.n);
set have;
by n;
h.add();
end;
do c = 0 by 0 while (i.next() = 0);
if inc.add() = 0 then do;
c + 1;
output;
end;
if c = 5 then leave;
end;
_N_ = i.first();
_N_ = i.prev();
h.clear();
run;

DO loops and data input in SAS

I have the university edition of SAS.
I have data from treatment groups A, B, and C. I am trying to use DO loops to process the groups separately for comparison. I can do it in one nested DO loop when the data lengths are the same. But these groups have different numbers of observations and I am running into trouble. Here is my code:
data AirPoll1 (keep = Group Ozone);
  label Group = "Treatment Group";
  label Ozone = 'Ozone level (in ppb)';
  do i=1 to 1;
    input Group $ @@
    do j=1 to 15;
      input Ozone @@;
      output;
    end;
  end;
  do i=1 to 1;
    input Group $ @@;
    do j=1 to 10;
      input Ozone @@;
      output;
    end;
  end;
  do i=1 to 1;
    input Group $ @@;
    do j=1 to 11;
      input Ozone @@;
      output;
    end;
  end;
datalines;
A 4 6 3 4 7 8 2 3 4 1 8 9 5 6 3
B 5 3 6 2 1 2 4 3 2 4
C 8 9 7 8 6 7 6 7 9 8 9
;
run;
proc univariate data = AirPoll1;
Var Ozone;
by Group;
histogram Ozone;
run;
The error I am getting is:
ERROR 161-185: No matching DO/SELECT statement.
Is there a quick way to fix this?
Quick fix indeed: you have left off the semicolon at the end of the first input statement, so the do statement that follows is swallowed into it and its end; has no matching do.
Happy programming!
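Beyond adding the missing semicolon, the hard-coded group sizes can be avoided altogether. The sketch below is one possible alternative, not the asker's method: it assumes each data line holds one group letter followed by its values and that no Ozone value is itself missing. It uses the MISSOVER option so that reading past the end of a line returns a missing value instead of moving on to the next line.

```sas
data AirPoll1 (keep = Group Ozone);
  infile datalines missover;
  label Group = "Treatment Group"
        Ozone = 'Ozone level (in ppb)';
  /* read the group and the first value, holding the line with @ */
  input Group $ Ozone @;
  /* keep reading values until the end of the line is reached */
  do while (not missing(Ozone));
    output;
    input Ozone @;
  end;
datalines;
A 4 6 3 4 7 8 2 3 4 1 8 9 5 6 3
B 5 3 6 2 1 2 4 3 2 4
C 8 9 7 8 6 7 6 7 9 8 9
;
run;
```

Each data step iteration handles one line, so groups of any length are read without a separate loop per group.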

Calculating number of correct of multiple choice questions

I have data on the answers students gave to a set of questions. The format is as follows:
Student Q1 Q2 Q3 Q4
A 1 3 2 3
B 2 3 2 2
C 1 2 1 2
D 3 3 1 2
For this example, let's say 1 is the correct answer for question 1, and 2 is the correct answer for questions 2, 3 and 4.
How would I generate a statistics table that tells me how many questions each student answered correctly? In the example above, it would say something like:
Student Answered Correct:
A 2/4
You can create an array of the correct answers, then just loop through the student answers to compare them.
I've created the final variable as character to display in the format you've shown. Obviously this means you won't have access to the underlying value, so you may want to keep the number of correct answers in the data for other analysis purposes.
data have;
input Student $ Q1 Q2 Q3 Q4;
datalines;
A 1 3 2 3
B 2 3 2 2
C 1 2 1 2
D 3 3 1 2
;
run;
data want;
  set have;
  array correct{4} (1 2 2 2); /* create array of correct answers (1 for Q1, 2 for Q2-Q4) */
  array answer{4} q1-q4; /* create array of student answers */
  _count=0; /* reset count to 0 */
  do i = 1 to dim(correct);
    if answer{i} = correct{i} then _count+1; /* compare student answer to correct answer and increment count by 1 if they match */
  end;
  length answered_correct $8; /* set length for variable */
  answered_correct = catx('/',_count,dim(correct)); /* display result in required format */
  drop q: correct: i _count; /* drop unwanted variables */
run;
First you have to create variable num_questions and set it to the number of questions. Then you need to write as many if-then-else statements as questions to create binary variables (flags) to check if each answer is correct (e.g. Correct_Q1). Use sum(of Correct:) to get the total of correct answers for each student. Correct: references all variable names starting with 'Correct'.
data want;
  set have;
  num_questions = 4;
  if Q1 = 1 then Correct_Q1 = 1; else Correct_Q1 = 0;
  if Q2 = 2 then Correct_Q2 = 1; else Correct_Q2 = 0;
  if Q3 = 2 then Correct_Q3 = 1; else Correct_Q3 = 0;
  if Q4 = 2 then Correct_Q4 = 1; else Correct_Q4 = 0;
  format Answered_Correct $3. Answered_Correct_pct percent.;
  /* the sum is numeric, so it needs a numeric format (8.) rather than $8. */
  Answered_Correct = compress(put(sum(of Correct:), 8.)||'/'||put(num_questions, 8.));
  Answered_Correct_pct = sum(of Correct:) / num_questions;
  label Student = 'Student' Answered_Correct = 'Answered correct' Answered_Correct_pct = 'Answered correct (%)';
  keep Student Answered_Correct Answered_Correct_pct;
run;
proc print data=want noobs label;
run;
If you have only four questions, the fastest solution would probably be to just use conditional statements, for example: if Q1 = 1 then answer + 1;
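A compact variant of the conditional approach is to add up the comparisons directly, since a SAS comparison evaluates to 1 when true and 0 when false. This is only a sketch: it assumes the have data set from the earlier answer, and the names want_simple and n_correct are made up for illustration.

```sas
data want_simple;
  set have;
  length answered_correct $8;
  /* each comparison contributes 1 if the answer is correct, else 0 */
  n_correct = (Q1 = 1) + (Q2 = 2) + (Q3 = 2) + (Q4 = 2);
  answered_correct = catx('/', n_correct, 4);
  keep Student n_correct answered_correct;
run;
```

For student A (answers 1 3 2 3) this gives n_correct = 2 and answered_correct = 2/4, matching the example in the question. The numeric count stays available for further analysis.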
For a more general solution using a lookup/answer table:
Transpose the data, merge the answer table, summarize on student.
data broad_data;
infile datalines missover;
input Student $ Q1 Q2 Q3 Q4;
datalines;
A 1 3 2 3
B 2 3 2 2
C 1 2 1 2
D 3 3 1 2
;
data answers;
infile datalines missover;
input question $ correct_answer ;
datalines;
Q1 1
Q2 2
Q3 2
Q4 2
;
data long_data;
set broad_data;
length question $10 answer 8;
array long[*] Q1--Q4;
do i = 1 to dim(long);
question = vname(long[i]);
answer = long[i];
output;
end;
keep Student question answer;
run;
proc sort data = long_data; by question student; run;
data long_data_answers;
merge long_data
answers
;
by question;
run;
proc sort data = long_data_answers; by student; run;
data result;
do i = 1 by 1 until (last.student);
set long_data_answers;
by student;
count = sum(count, answer eq correct_answer);
end;
result = count/i;
keep student result;
format result fract8.;
run;
If you like SQL or want to compress your code, you can combine the last two data steps and sorts into one statement.
proc sql;
create table result as
select student, sum(answer eq correct_answer)/count(*) as result format fract8.
from long_data a
inner join answers b
on a.question eq b.question
group by student
;
quit;

Data Preparation with given columns in WPS [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a dataset with two columns, gender and birth_date. I need the output in a form with two rows, male and female (gender), and the months of the year as columns. How do I do that in SAS?
Suppose:
Gender Birthdate
Male 01/10/1989
Female 02/12/1990
and so on. I have around 100K rows.
So I assume you want the number of people born in each month.
The first data set creates some test data.
data test;
format Gender $8. Birthdate date9.;
do i=1 to 5000;
gender="MALE";
Birthdate = today() + ceil(ranuni(123)*365);
output;
end;
do i=1 to 5000;
gender="FEMALE";
Birthdate = today() + ceil(ranuni(123)*365);
output;
end;
drop i;
run;
proc sort data=test;
by gender;
run;
data output(keep=gender month:);
set test;
by gender;
array Month_[12];
if first.gender then do;
do i=1 to 12;
month_[i] = 0;
end;
end;
month_[month(birthdate)] + 1;
if last.gender then
output;
run;
This creates one row per gender, with the monthly counts in Month_1 to Month_12.
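If a printed cross-tabulation is enough and no output data set is needed, PROC FREQ can produce the gender-by-month counts directly. This is a sketch that assumes the test data set created above; the names test_month and birth_month are made up for illustration.

```sas
/* derive the month number (1-12) from the birth date */
data test_month;
  set test;
  birth_month = month(Birthdate);
run;

/* rows = gender, columns = month, cells = counts only */
proc freq data=test_month;
  tables Gender*birth_month / norow nocol nopercent;
run;
```

The NOROW, NOCOL and NOPERCENT options suppress the percentages so that each cell shows only the frequency.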
If I understood what you are asking, and if the result set below looks like the output you wanted, here is the code I used, naming the initial data set 'YYY':
data new(drop=year1 MonthName birthdate) ;
set yyy;
format new_col $14. MonthName year1 $12.;
year1 = compress(year(birthdate));
MonthName=put(birthdate,monname.);
new_col = trim(MonthName)||trim(year1);
run;
proc sort data=new;
by gender new_col;
run;
proc sql;
create table new2 as
select gender, New_col, count(new_col) as New_col2
from new
group by gender, New_col;
quit;
proc transpose data=new2
out=result
name=New_col;
by gender;
id New_col;
run;
Data final;
set result(drop=New_col);
run;
Result set: