data primes;
length status $12.;
do i=1 to 4;
status='Prime';
do j=2 to i-1;
if mod(i, j) = 0 then do;
status='Composite';
leave; *exit loop;
end;
end;
output;
end;
run;
proc print data = primes;
run;
Above is the program that I ran, and below is the output. I am unable to understand the value of j when the numbers are prime. I specified that j should go up to i-1, but in the output j = i for the primes. Can someone please help me understand this?
Obs status i j
1 Prime 1 2
2 Prime 2 2
3 Prime 3 3
4 Composite 4 2
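If i=1, then the inner loop is j=2 to 0, which means the loop never starts: the j loop never executes, and j keeps its start value of 2. The same happens for i=2 (j=2 to 1). When a DO loop runs to completion, the index is incremented one step past the stop value before the loop exits, so for i=3 the loop ends with j=3. LEAVE exits immediately without that final increment, which is why the composite row shows j=2. You can add an explicit OUTPUT or PUT statement to see this.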
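A minimal sketch of that suggestion: the original step with two PUT statements added, so the log shows when the inner loop body actually runs and what J is once the loop has finished. The log text in the PUT statements is made up for the illustration.

```sas
data primes;
  length status $12;
  do i = 1 to 4;
    status = 'Prime';
    do j = 2 to i - 1;
      /* only written when the body of the j loop actually executes */
      put 'inside j loop: ' i= j=;
      if mod(i, j) = 0 then do;
        status = 'Composite';
        leave;   /* exit the j loop without incrementing j */
      end;
    end;
    /* j here is the start value (loop never ran), stop+1 (loop completed) or the value at LEAVE */
    put 'after j loop:  ' i= j= status=;
    output;
  end;
run;
```

For i=1 and i=2 only the "after j loop" line appears, because the inner loop body is skipped entirely.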
I'd like to count the length of each non-zero sequence in data like the following:
ID Value
1 0
1 0
1 2.5
1 3
1 0
1 4
1 2
1 5
1 0
So here the length of the first non-zero sequence is 2 and the length of the second non-zero sequence is 3. The new data will look like this:
ID Value Length
1 0 0
1 0 0
1 2.5 2
1 3 2
1 0 0
1 4 3
1 2 3
1 5 3
1 0 0
How can I write SAS code to accomplish this for a large data set like this? Thanks!
Here is one possible solution. It assumes there are no missing values in the Value variable and that your ID variable does not have any significance for this problem.
*creates new length variable that starts at 1 and increments by 1 from start to end of every non-zero sequence;
data step_one (drop=prev_val);
set orig_data;
retain prev_val length 0;
indx = _n_;
if value ne 0 and prev_val ne 0 then length = length + 1;
else if value ne 0 then length = 1;
else if value = 0 then length = 0;
prev_val = value;
run;
*sorts dataset in reverse order;
proc sort data=step_one;
by descending indx;
run;
*creates modified length variable that carries maximum length value for each sequence down to all observations included in that sequence;
data step_two (drop=length prev_length rename=(length_new=length));
set step_one;
retain length_new prev_length 0;
if length = 0 then length_new = 0;
else if length ne 0 and prev_length = 0 then
length_new = length;
prev_length = length;
run;
*re-sorts dataset back to its original order and outputs final dataset with just the needed variables;
proc sort data=step_two out=final_result (keep=ID value length);
by indx;
run;
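Because the question mentions large data, here is a minimal single-pass sketch that avoids both sorts. It swaps in a different technique from the answer above, a "double DOW" loop with BY ... NOTSORTED: the first loop counts the rows of a run, the second re-reads the same rows and writes them out with the count. Assumptions: the sample data from the question is recreated as ORIG_DATA, VALUE has no missing values, and a run is also broken whenever ID changes; FLAGGED, NONZERO, N, K and SEQ_LEN are hypothetical helper names.

```sas
/* Sample data from the question */
data orig_data;
  input id value;
  datalines;
1 0
1 0
1 2.5
1 3
1 0
1 4
1 2
1 5
1 0
;
run;

/* Flag each row: 1 = part of a non-zero run, 0 = zero */
data flagged;
  set orig_data;
  nonzero = (value ne 0);
run;

data want (drop=nonzero n k rename=(seq_len=length));
  /* First loop: read one run (a group of consecutive rows with the same ID and NONZERO) */
  n = 0;
  do until (last.nonzero);
    set flagged;
    by id nonzero notsorted;
    n = n + 1;
  end;
  /* Second loop: a second SET statement has its own pointer, so it re-reads the same rows */
  do k = 1 to n;
    set flagged;
    seq_len = n * nonzero;   /* run length for non-zero rows, 0 for zero rows */
    output;
  end;
run;
```

FLAGGED is written as an ordinary data set here and is read twice, once by each SET statement, so the input order is preserved without the descending sort and re-sort.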
Given two simple datasets A and B as follows:
DATA A; INPUT X @@;
CARDS;
1 2 3 4
;
RUN;
DATA B; INPUT Y @@;
CARDS;
1 2
;
RUN;
I am trying to create two datasets named C and D, one using repeated SET and OUTPUT statements and the other using a DO loop.
DATA C;
SET B;
K=1; DO; SET A; OUTPUT; END;
K=K+1; DO; SET A; OUTPUT; END;
K=K+1;
RUN;
DATA D;
SET B;
DO K = 1 TO 2;
SET A; OUTPUT;
END;
RUN;
I thought that C and D would be the same, since the DO loop is supposed to repeat the statements written out explicitly in the DATA step for C, but it turns out that they are different.
Dataset C:
Obs Y K X
1 1 1 1
2 1 2 1
3 2 1 2
4 2 2 2
Dataset D:
Obs Y K X
1 1 1 1
2 1 2 2
3 2 1 3
4 2 2 4
Could someone please explain this?
The two SET A statements in the first data step are independent: each one keeps its own pointer into A. So on each iteration of the data step they both read the same observation, and it is as if you had run this step instead:
data c;
set b;
set a;
do k=1 to 2; output; end;
run;
The single SET A statement in the second data step executes twice on every iteration of the data step, so it reads two observations from A per iteration.
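The difference can be traced in the log with PUT statements. This sketch recreates A and B, then rebuilds both steps so you can see which observation each SET statement reads on every pass; the names TRACE_C, TRACE_D, X_FROM_FIRST and X_FROM_SECOND are made up for the illustration.

```sas
data a;
  input x @@;
  datalines;
1 2 3 4
;
run;

data b;
  input y @@;
  datalines;
1 2
;
run;

/* Like data set C: two separate SET A statements, each with its own pointer into A */
data trace_c;
  set b;
  set a(rename=(x=x_from_first));
  set a(rename=(x=x_from_second));
  put _n_= y= x_from_first= x_from_second=;
run;

/* Like data set D: one SET A statement that executes twice per pass of the step */
data trace_d;
  set b;
  do k = 1 to 2;
    set a;
    put _n_= y= k= x=;
    output;
  end;
run;
```

In TRACE_C both renamed copies of X show the same value on each pass, while in TRACE_D X advances on every execution of SET A.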
If you really want a cross join, you need the POINT= option so that you can re-read one of the data sets:
data want ;
set b ;
do p=1 to nobs ;
set a point=p nobs=nobs ;
output;
end;
run;
Your Table B has two obs so your code will only do two iterations:
Every time you read a new observation, K is reset to 1. Solution: use the RETAIN statement.
When the current record is obs 1 and you execute OUTPUT, you keep writing the first row of each table; that is why the first and second rows of table A are each output twice.
Debugging:
Iteration 1 current view:
Obs Table X
1 A 1
Obs Table Y k
1 B 1 1
Output:
K=1; DO; SET A; OUTPUT; END;
Obs Y K X
1 1 1 1
K=K+1; DO; SET A; OUTPUT; END;
Obs Y K X
2 1 2 1
Iteration 2 current view:
Obs Table X
2 A 2
Obs Table Y k
2 B 2 1
Output:
K=1; DO; SET A; OUTPUT; END;
Obs Y K X
3 2 1 2
K=K+1; DO; SET A; OUTPUT; END;
Obs Y K X
4 2 2 2
I am trying to construct Table 2 with the SAS code below, but what I get is Table 1. I could not figure out what I missed. Any help is very much appreciated. Thank you.
%let counter = 4;
data new;set set1;
total = 0;
a = 1;
do i = 1 to &counter;
call symputX('a',a);
total = total + Tem_&a.;
a = symget('a')+1;
call symputX('a',a);
end;
run;
Table 1
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 3600
5 200 50 100 200 0 0
9 50 40 0 0 0 0
10 500 70 100 250 0 0
Table 2
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 910
5 200 50 100 200 0 350
9 50 40 0 0 0 40
10 500 70 100 250 0 420
You cannot use SYMPUT and SYMGET that way, unfortunately. While you can use them to store and retrieve macro variable values, you cannot change the code that was sent to the compiler once the step is executing.
Basically, SAS has to work out the machine code for what it is supposed to do on every iteration of the data step loop before it looks at any data (this is called compiling). So the problem is that you can't write tem_&a. and expect to change what &a. is during execution, because that would change what the machine code needs to do, and SAS could not prepare for that.
So in what you wrote, &a. is resolved when the program is compiled, and whatever value &a had before your data step is what tem_&a. turns into. Presumably the first time you ran this it errored (&a. did not resolve, followed by an error about & being illegal in variable names); then CALL SYMPUTX did its job, &a got a 4 in it at the end of the loop, and from then on your tem_&a. resolved to tem_4.
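A small sketch of that compile-versus-execution timing: &a inside the double quotes is replaced by the macro processor before the step runs, while SYMGET reads the macro variable at execution time, after CALL SYMPUTX has changed it. The data set DEMO and its variable names are hypothetical.

```sas
%let a = 4;   /* value of &a before the step is compiled */

data demo;
  a = 1;
  call symputx('a', a);        /* changes macro variable A during execution ...        */
  resolved = "&a";             /* ... but "&a" was already replaced by 4 at compile time */
  from_symget = symget('a');   /* SYMGET looks the value up at execution time: 1         */
  put resolved= from_symget=;
run;

%put After the step, a=&a;
```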
The solution? Don't use macros for this. Instead, use arrays.
data new;
set set1;
total = 0;
array tem[&counter.] tem_1-tem_&counter.;
a = 1;
do i = 1 to &counter; *or do i = 1 to dim(tem);
total = total + Tem[i];
end;
run;
Or, of course, just directly sum them.
data new;
set set1;
total = sum(of tem_1-tem_4);
run;
If you REALLY like macro variables, you could of course do this in a macro do loop, though this is not recommended for this purpose as it's really better to stick with data step techniques. But this should work, anyway, if you run this inside a macro (this won't be valid in open code).
data new;
set set1;
total = 0;
%do i = 1 %to &counter;
total = total + Tem_&i.;
%end;
run;
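Since the %DO loop has to live inside a macro definition, a complete sketch could look like this. It recreates SET1 from Table 1; the macro name SUM_TEM is hypothetical. The %DO loop writes out one TOTAL = TOTAL + TEM_n statement per value of n before the data step is compiled, which is why it works where CALL SYMPUTX did not.

```sas
%let counter = 4;

/* Sample data from Table 1 */
data set1;
  input id amt tem_1-tem_4;
  datalines;
4 500 1 4 5 900
5 200 50 100 200 0
9 50 40 0 0 0
10 500 70 100 250 0
;
run;

%macro sum_tem;
  data new;
    set set1;
    total = 0;
    %do i = 1 %to &counter;
      total = total + tem_&i.;   /* generates total = total + tem_1; ... tem_4; */
    %end;
  run;
%mend sum_tem;

%sum_tem
```

Running it should reproduce the TOTAL column of Table 2.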
Say that I have the following database:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
7 1 120
7 2 110
7 3 100
7 4 90
I need the data to have consecutive minute values, like this:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
3 1 100
3 2 90
3 3 80
3 4 70
4 1 100
4 2 90
4 3 80
4 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
6 1 110
6 2 100
6 3 90
6 4 80
6 5 70
7 1 120
7 2 110
7 3 100
7 4 90
How can I do this in SAS? I just need to replicate the previous minute. The number of observations per minute varies: it can be 4, 5 or more.
It is not hard to imagine code that would do this; the problem is that it quickly starts to look messy.
If your dataset is not too large, you could consider the following approach:
/* We find all gaps. The output dataset is a mapping: for each minute (current_minute), the minute whose data (reference_minute) we use to create it */
data MINUTE_MAPPING (keep=current_minute reference_minute);
set YOUR_DATA;
by min;
retain last_minute;
if first.min then do;
/* Find gaps (impossible on the first record), map them to the last minute of data we have */
if _N_ NE 1 and last_minute+1 < min then do;
do current_minute=last_minute+1 to min-1;
reference_minute=last_minute;
output;
end;
end;
/* For the available data, we map the minute to itself, including the very first minute */
reference_minute=min;
current_minute=min;
output;
*update;
last_minute=min;
end;
run;
/* Now we apply our mapping to the data */
*you must use proc sql because it is a many-to-many join, data step merge would give a different outcome;
proc sql;
create table RESULT as
select MM.current_minute as min, YD.rank, YD.qty
from MINUTE_MAPPING as MM
join YOUR_DATA as YD
on (MM.reference_minute=YD.min)
order by 1, 2
;
quit;
The more performant approach would involve trickery with arrays.
But I find the mapping approach a bit more appealing (disclaimer: at first thought), because it is quicker for someone else to grasp afterwards (disclaimer again: IMHO).
For good measure, the array approach:
data RESULT (keep=min rank qty);
set YOUR_DATA;
by min;
retain last_minute; *assume that first record really is first minute;
array last_data{5} _TEMPORARY_;
if _N_ NE 1 and first.min and last_minute+1 < min then do; *gap found;
*store data of current record;
curr_min=min;
curr_rank=rank;
curr_qty=qty;
do current_minute=last_minute+1 to curr_min-1;
*produce records from array with last available data;
do iter=1 to 5;
min = current_minute;
rank = iter;
qty = last_data{iter};
if qty NE . then output; *to prevent output of 5th element where there are only 4;
end;
end;
*put back values of actual current record before proceeding;
min=curr_min;
rank=curr_rank;
qty=curr_qty;
end;
*update on every new minute, not only when a gap is found;
if first.min then last_minute=min;
*insert data for use on later missing minutes;
last_data{rank}=qty;
if last.min and rank<5 then last_data{5}=.;
output; *output actual current data point;
run;
Hope it helps.
Note: I currently have no access to a SAS client, so this code is untested and might contain a couple of typos.
Unless you have an absurd number of observations, I think transposing would make this easy.
I don't have access to SAS at the moment, so bear with me (I can test it tomorrow if you can't get it working).
proc transpose data=data(rename=(min=minute)) out=data_wide(drop=_NAME_) prefix=obs_;
by minute;
id rank;
var qty;
run;
*sort backwards so you can use lag() to fill in the next minute;
proc sort data=data_wide;
by descending minute;
run;
data data_wide; set data_wide;
nextminute = lag(minute);
run;
proc sort data=data_wide;
by minute;
run;
*output until you get to the next minute;
data data_wide; set data_wide;
*the last minute has no next minute, so output it exactly once;
if nextminute = . then output;
else do while (minute lt nextminute);
output;
minute+1;
end;
run;
*then you probably want to reverse the transpose;
proc transpose data=data_wide(drop=nextminute)
out=data_narrow(rename=(col1=qty));
by minute;
var obs_:;
run;
*clean up the observation number, drop the empty 5th rank of 4-rank minutes and restore the name min;
data data_narrow(drop=_NAME_ rename=(minute=min)); set data_narrow;
rank = input(substr(_NAME_,5), 8.);
if qty ne .;
run;
Again, I can't test this now, but it should work.
Someone else may have a clever solution that avoids the reverse-sort/lag/forward-sort. I feel like I have dealt with this before, but the obvious solution for me right now is to have the data sorted backwards in whatever prior sort you do (you can run the transpose on descending-sorted data with no problem), which saves you an extra sort.
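One common way to avoid the reverse sort is a look-ahead MERGE of the data with itself, offset by one row with FIRSTOBS=2 and no BY statement, so each row picks up the next row's minute and the last row gets a missing value. This is a sketch: the small DATA_WIDE table below is hypothetical sample data shaped like the first PROC TRANSPOSE output (MINUTE plus OBS_1-OBS_5), and FILLED is a made-up output name.

```sas
/* Hypothetical wide data, one row per available minute */
data data_wide;
  input minute obs_1-obs_5;
  datalines;
2 100 90 80 70 .
5 110 100 90 80 70
7 120 110 100 90 .
;
run;

/* Look-ahead: one-to-one merge with the same data shifted up by one row */
data data_wide;
  merge data_wide
        data_wide(firstobs=2 keep=minute rename=(minute=nextminute));
run;

/* Repeat each row until the next available minute */
data filled;
  set data_wide;
  if nextminute = . then output;   /* last minute: nothing to fill after it */
  else do while (minute lt nextminute);
    output;
    minute + 1;
  end;
  drop nextminute;
run;
```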
How can we iterate over a SAS data set?
For example, I am using the FIRST. flag of a variable,
and I want to find where a particular condition occurs and set a value when it is satisfied.
The SAS data step has a built-in loop over observations. You don't have to do anything, unless you want to for some reason. For instance, the following generates a random number for each observation:
data one;
set sashelp.class;
rannum = ranuni(0);
run;
If you want to loop over variables, then there are arrays. For example, the following initializes variables, var1 to var10, with random numbers:
data one;
array vars[1:10] var1-var10;
do i = 1 to 10;
vars[i] = ranuni(0);
end;
run;
The FIRST. and LAST. flags are generated automatically when you read a (sorted) data set with SET and a BY statement. An example:
proc sort data=sashelp.class out=class;
by age;
run;
data one;
set class;
by age;
first = first.age;
last = last.age;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs Name Age first last
1 Joyce 11 1 0
2 Thomas 11 0 1
3 James 12 1 0
4 Jane 12 0 0
5 John 12 0 0
6 Louise 12 0 0
7 Robert 12 0 1
8 Alice 13 1 0
...
18 William 15 0 1
19 Philip 16 1 1
*/
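For the original question (acting on a condition within each BY group), here is a minimal sketch that combines FIRST./LAST. with a sum statement. The condition HEIGHT > 60 and the names FLAGGED and N_TALL are hypothetical; substitute your own condition and the value you want to set.

```sas
proc sort data=sashelp.class out=class;
  by age;
run;

data flagged;
  set class;
  by age;
  if first.age then n_tall = 0;     /* reset the counter at the start of each AGE group */
  if height > 60 then n_tall + 1;   /* sum statement: N_TALL is retained across rows    */
  if last.age then output;          /* write one summary row per AGE group              */
  keep age n_tall;
run;

proc print data=flagged;
run;
```

To keep every row instead of one per group, drop the IF LAST.AGE line; N_TALL then shows the running count within the group.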