Players from 1 to 50 are placed in a row in order. The coach said: "Odds number athletes out!" The remaining athletes re-queue and re-number. The coach ordered again: "Odds number athletes out!" In this way, there is only one person left at last. What number of athletes is he? What if the coach's keep ordering "Even number athletes out!" Who is left at the end?
I know it requires me to use loop in SAS to answer the question. But can only write code below:
data a;
do i=1 to 50;
output;
end;
run;
proc sql;
select i
from a
where mod(i,2**5)=0;
quit;
But it won't work for keeping the last odd number athelete. Could you guys figure out a way to simulate this process by using loop? Thanks so much
#Doris welcome :-)
Try this. The Final_Player data set contains the number of the final player in the simulation.
Simply change the mod(N, 2) = 0 to = 1 for the even problem. Feel free to ask.
data _null_;
dcl hash h(ordered : 'y');
h.definekey('p');
h.definedone();
dcl hiter ih('h');
dcl hash i(ordered : 'Y');
i.definekey('id');
i.definedone();
dcl hiter ii('i');
do p = 1 to 50;
h.add();
end;
id = .;
do while (h.num_items > 1);
do _N_ = 1 by 1 while (ih.next() = 0);
if mod(_N_, 2) = 1 then do;
i.add(key : p, data : p);
end;
end;
do while (ii.next() = 0);
rc = h.remove(key : id);
end;
i.clear();
end;
h.output(dataset : 'Final_Player');
run;
Just use algebra.
want = 2 ** floor( log2(n) );
So if you are starting with an arbitrary dataset you can find the one observation you need directly.
data want;
point = 2**floor(log2(nobs));
set a point=point nobs=nobs;
output;
stop;
put i= ;
run;
Here is example using array showing how it works.
373 data test;
374 array x [15];
375 do index=1 to dim(x); x[index]=index; end;
376 do iteration=1 by 1 while(n(of x[*])>1);
377 do index= 2**(iteration-1) to dim(x) by 2**iteration ;
378 x[index]=.;
379 end;
380 put iteration= (x[*]) (3.);
381 end;
382 do index=1 to dim(x) until(x[index] ne .);
383 end;
384 put index= x[index]= ;
385
386 run;
iteration=1 . 2 . 4 . 6 . 8 . 10 . 12 . 14 .
iteration=2 . . . 4 . . . 8 . . . 12 . . .
iteration=3 . . . . . . . 8 . . . . . . .
index=8 x8=8
Related
I am trying to run the following code, that calculates the slope of each variables' timeseries.
I need to drop the variables created with an array, because I use the same logic for other functions.
Nevertheless the output data keeps the variables ys_new&i._: and I get the warning: The variable 'ys_new3_:'n in the DROP, KEEP, or RENAME list has never been referenced.
I think the iterator is evaluated to 3 in the %do %while block.
If someone can help me, I will really apreciated it.
DATA HAVE;
INPUT ID N_TRX_M0-N_TRX_M12 TRANSACTION_AMT_M0-TRANSACTION_AMT_M12;
DATALINES;
1 3 6 3 3 7 8 6 10 5 5 8 7 7 379866 856839 307909 239980 767545 511806 603781 948936 566114 402214 844657 2197164 817390
2 51 56 55 73 48 57 54 53 55 52 49 72 53 6439314 7367157 4614827 9465017 3776064 3661525 7870605 3971889 4919128 10024385 4660264 7748467 7339863
3 5 . . . . . . . . . . . . 232165 . . . . . . . . . . . .
;
RUN;
%Macro slope(variables)/parmbuff;
%let i = 1;
/* Get the first Parameter */
%let parm_&i = %scan(&syspbuff,&i,%str( %(, %)));
%do %while (%str(&&parm_&i.) ne %str());
array ys&i(12) &&parm_&i.._M12 - &&parm_&i.._M1;
array ys_new&i._[12];
/* Corre los valores missing*/
k = 1;
do j = 1 to 12;
if not(missing(ys&i(j))) then do;
ys_new&i._[k] = ys&i[j];
k + 1;
end;
end;
nonmissing = n(of ys_new&i._{*});
xbar = (nonmissing + 1)/2;
if nonmissing ge 2 then do;
ybar = mean(of ys&i(*));
cov = 0;
varx = 0;
do m=1 to nonmissing;
cov=sum(cov, (m-xbar)*(ys_new&i._(m)-ybar));
varx=sum(varx, (m-xbar)**2);
end;
slope_&&parm_&i. = cov/varx;
end;
%let i = %eval(&i+1);
/* Get next parm */
%let parm_&i = %scan(&syspbuff ,&i, %str( %(, %)));
%end;
drop ys_new&i._: k j m nonmissing ybar xbar cov varx;
%mend;
%let var_slope =
N_TRX,
TRANSACTION_AMT
;
DATA FEATURES;
SET HAVE;
%slope(&var_slope)
RUN;
The simplest solution is to generate the DROP statement before the macro has a chance to change the value of the macro variable I .
array ys&i(12) &&parm_&i.._M12 - &&parm_&i.._M1;
array ys_new&i._[12];
drop ys_new&i._: k j m nonmissing ybar xbar cov varx;
You could use a _TEMPORARY_ array instead, but then you need to remember to clear the values on each iteration of the data step.
array ys_new&i._[12] _temporary_;
call missing(of ys_new&i._[*]);
Then you can leave the DROP statement at the end if you want.
drop k j m nonmissing ybar xbar cov varx;
You are correct. &i is 3 after it exits the do loop leading to the drop statement giving a warning that ys_new3_: does not exist. Instead, consider using a temporary array to avoid the drop statement altogether:
array ys_new&i._[12] _TEMPORARY_;
I have a dataset that looks like:
Hour Flag
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
I want to have an output dataset like:
Total_Hours Count
2 2
3 1
4 1
As you can see, I want to count the number of hours included in each period with consecutive "1s". A missing value ends the consecutive sequence.
How should I go about doing this? Thanks!
You'll need to do this in two steps. First step is making sure the data is sorted properly and determining the number of hours in a consecutive period:
PROC SORT DATA = <your dataset>;
BY hour;
RUN;
DATA work.consecutive_hours;
SET <your dataset> END = lastrec;
RETAIN
total_hours 0
;
IF flag = 1 THEN total_hours = total_hours + 1;
ELSE
DO;
IF total_hours > 0 THEN output;
total_hours = 0;
END;
/* Need to output last record */
IF lastrec AND total_hours > 0 THEN output;
KEEP
total_hours
;
RUN;
Now a simple SQL statement:
PROC SQL;
CREATE TABLE work.hour_summary AS
SELECT
total_hours
,COUNT(*) AS count
FROM
work.consecutive_hours
GROUP BY
total_hours
;
QUIT;
You will have to do two things:
compute the run lengths
compute the frequency of the run lengths
For the case of using the implict loop
Each run length occurnece can be computed and maintained in a retained tracking variable, testing for a missing value or end of data for output and a non missing value for run length reset or increment.
Proc FREQ
An alternative is to use an explicit loop and a hash for frequency counts.
Example:
data have; input
Hour Flag; datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
data _null_;
declare hash counts(ordered:'a');
counts.defineKey('length');
counts.defineData('length', 'count');
counts.defineDone();
do until (end);
set have end=end;
if not missing(flag) then
length + 1;
if missing(flag) or end then do;
if length > 0 then do;
if counts.find() eq 0
then count+1;
else count=1;
counts.replace();
length = 0;
end;
end;
end;
counts.output(dataset:'want');
run;
An alternative
data _null_;
if _N_ = 1 then do;
dcl hash h(ordered : "a");
h.definekey("Total_Hours");
h.definedata("Total_Hours", "Count");
h.definedone();
end;
do Total_Hours = 1 by 1 until (last.Flag);
set have end=lr;
by Flag notsorted;
end;
Count = 1;
if Flag then do;
if h.find() = 0 then Count+1;
h.replace();
end;
if lr then h.output(dataset : "want");
run;
Several weeks ago, #Richard taught me how to use DOW-loop and direct addressing array. Today, I give it to you.
data want(keep=Total_Hours Count);
array bin[99]_temporary_;
do until(eof1);
set have end=eof1;
if Flag then count + 1;
if ^Flag or eof1 then do;
bin[count] + 1;
count = .;
end;
end;
do i = 1 to dim(bin);
Total_Hours = i;
Count = bin[i];
if Count then output;
end;
run;
And Thanks Richard again, he also suggested me this article.
I have a dataset of money earned as a % every week in 2017 to 2018. Some don't have data at the start of 2017 as they didn't start earning until later on. The weeks are numbered as 201701, 201702 - 201752 and 201801 - 201852.
What I'd like to do is have 104 new variables called WEEK0 - WEEK103, where WEEK0 will have the first non empty column value of the money earned columns. Here is an example of the data:
MON_EARN_201701 MON_EARN_201702 MON_EARN_201703 MON_EARN_201704
30 21 50 65
. . 30 100
. 102 95 85
Then I want my data to have the following columns (example)
WEEK0 WEEK1 WEEK2 WEEK3
30 21 50 65
30 100 . .
102 95 85 .
These are just small examples of a very large dataset.
I was thinking I'd need to try and do some sort of do loops so what I've tried so far is:
DATA want;
SET have;
ARRAY mon_earn{104} mon_earn_201701 - mon_earn_201752 mon_earn_201801 -mon_earn_201852;
ARRAY WEEK {104} WEEK0 - WEEK103;
DO i = 1 to 104;
IF mon_earn{i} NE . THEN;
WEEK{i} = mon_earn{i};
END;
END;
RUN;
This doesn't work as it doesn't fill the WEEK0 when the first value is empty.
If anymore information is needed please comment and I will add it in.
Sounds like you just need to find the starting point for copying.
First look thru the list of earnings by calendar month until you find the first non missing value. Then copy the values starting from there into you new array of earnings by relative month.
data want;
set have;
array mon_earn mon_earn_201701 -- mon_earn_201852;
array week (104);
do i = 1 to dim(mon_earn) until(found);
if mon_earn{i} ne . then found=1;
end;
do j=1 to dim(week) while (i+j<dim(mon_earn));
week(j) = mon_earn(i+j-1);
end;
run;
NOTE: I simplified the ARRAY definitions. For the input array I assumed that the variables are defined in order so that you could use positional array list. For the WEEK array SAS and I both like to start counting from one, not zero.
You could do this if it was a long format. There's a chance you don't need it while in a long format.
proc sort data=have;
by ID week;
run;
data want;
set have;
by id; *for each group/id counter;
retain counter;
if first.id then counter=0;
if counter=0 and not missing(value) then do;
counter=1; new_week=0; end;
if counter = 1 then new_week+1;
run;
If you really need it wide:
Find first not missing value and store index in i
Loop from i to end of week dimension
Assign week to mon_earned from i to end of week.
data want;
set have;
array mon_earned(*) .... ;
array week(*) ... ;
found=0; i=0;
do while(found=0);
if not missing(mon_earned(i)) then found=1;
i+1;
end;
z=0;
do j=i to dim(week);
week(z) = mon_earned(j);
z+1;
end;
run;
You need a second index variable, call it j, to target the proper week assignment. j is only incremented when a months earning is not missing.
This example code will 'squeeze out` all missing earnings; even those missing earnings that occurring after some earning has occurred. For example
earning: . . . 10 . 120 . 25 … will squeeze to
week: 10 120 25 …
data have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
do _n_ = 1 to 1000;
call missing (of earn(*));
do _i_ = 1 + 25 * ranuni(123) to dim(earn);
if ranuni(123) < 0.95 then
earn(_i_) = round(10 + 125 * ranuni(123));
end;
output;
end;
run;
data want;
set have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
array week(0:103);
j = -1;
do i = 1 to dim(earn);
if not missing(earn(i)) then do;
j+1;
week(j) = earn(i);
end;
end;
drop i j;
run;
If you want to maintain interior missing earnings the logic would be
if not missing(earn(i)) or j >=0 then do;
j+1;
week(j) = earn(i);
end;
I have observations with column ID, a, b, c, and d. I want to count the number of unique values in columns a, b, c, and d. So:
I want:
I can't figure out how to count distinct within each row, I can do it among multiple rows but within the row by the columns, I don't know.
Any help would be appreciated. Thank you
********************************************UPDATE*******************************************************
Thank you to everyone that has replied!!
I used a different method (that is less efficient) that I felt I understood more. I am still going to look into the ways listed below however to learn the correct method. Here is what I did in case anyone was wondering:
I created four tables where in each table I created a variable named for example ‘abcd’ and placed a variable under that name.
So it was something like this:
PROC SQL;
CREATE TABLE table1_a AS
SELECT
*
a as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table2_b AS
SELECT
*
b as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table3_c AS
SELECT
*
c as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table4_d AS
SELECT
*
d as abcd
FROM table_I_have_with_all_columns
;
QUIT;
Then I stacked them (this means I have duplicate rows but that ok because I just want all of the variables in 1 column and I can do distinct count.
data ALL_STACK;
set
table1_a
table1_b
table1_c
table1_d
;
run;
Then I counted all unique values in ‘abcd’ grouped by ID
PROC SQL ;
CREATE TABLE count_unique AS
SELECT
My_id,
COUNT(DISTINCT abcd) as Count_customers
FROM ALL_STACK
GROUP BY my_id
;
RUN;
Obviously, it’s not efficient to replicate a table 4 times just to put a variables under the same name and then stack them. But my tables were somewhat small enough that I could do it and then immediately delete them after the stack. If you have a very large dataset this method would most certainly be troublesome. I used this method over the others because I was trying to use Procs more than loops, etc.
A linear search for duplicates in an array is O(n2) and perfectly fine for small n. The n for a b c d is four.
The search evaluates every pair in the array and has a flow very similar to a bubble sort.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
run;
The linear search for duplicates will occur on every row, and the count_distinct will be initialized automatically in each row to a missing (.) value. The sum function is used to increment the count when a non-missing value is not found in any prior array indices.
* linear search O(N**2);
data want;
set have;
array x a b c d;
do i = 1 to dim(x) while (missing(x(i)));
end;
if i <= dim(x) then count_distinct = 1;
do j = i+1 to dim(x);
if missing(x(j)) then continue;
do k = i to j-1 ;
if x(k) = x(j) then leave;
end;
if k = j then count_distinct = sum(count_distinct,1);
end;
drop i j k;
run;
Try to transpose dataset, each ID becomes one column, frequency each ID column by option nlevels, which count frequency of value, then merge back with original dataset.
Proc transpose data=have prefix=ID out=temp;
id ID;
run;
Proc freq data=temp nlevels;
table ID:;
ods output nlevels=count(keep=TableVar NNonMisslevels);
run;
data count;
set count;
ID=compress(TableVar,,'kd');
drop TableVar;
run;
data want;
merge have count;
by id;
run;
one more way using sortn and using conditions.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
77 . 3 . 4
88 . 9 5 .
99 . . 2 2
76 . . . 2
58 1 1 . .
50 2 . 2 .
66 2 . 7 .
89 1 1 1 .
75 1 2 3 .
76 . 5 6 7
88 . 1 1 1
43 1 . . 1
31 1 . . 2
;
data want;
set have;
_a=a; _b=b; _c=c; _d=d;
array hello(*) _a _b _c _d;
call sortn(of hello(*));
if a=. and b = . and c= . and d =. then count=0;
else count=1;
do i = 1 to dim(hello)-1;
if hello(i) = . then count+ 0;
else if hello(i)-hello(i+1) = . then count+0;
else if hello(i)-hello(i+1) = 0 then count+ 0;
else if hello(i)-hello(i+1) ne 0 then count+ 1;
end;
drop i _:;
run;
You could just put the unique values into a temporary array. Let's convert your photograph into data.
data have;
input id a b c d;
datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
;
So make an array of the input variables and another temporary array to hold the unique values. Then loop over the input variables and save the unique values. Finally count how many unique values there are.
data want ;
set have ;
array unique (4) _temporary_;
array values a b c d ;
call missing(of unique(*));
do _n_=1 to dim(values);
if not missing(values(_n_)) then
if not whichn(values(_n_),of unique(*)) then
unique(_n_)=values(_n_)
;
end;
count=n(of unique(*));
run;
Output:
Obs id a b c d count
1 11 2 3 4 4 3
2 22 1 8 1 1 2
3 33 6 . 1 2 3
4 44 . 1 1 . 1
I have the following dataset
data have;
input SUBJID VISIT$ PARAMN ABLF$ AVAL;
cards;
1 screen 1 . 151
1 random 1 YES .
1 visit1 1 . .
1 screen 2 . 65.5
1 random 2 YES 65
1 visit1 2 . .
1 screen 3 . .
1 random 3 YES 400
1 visit1 3 . 420
;
run;
I want to create another variable called BASE that captures the value of AVAL (when there is an actual value in place) when ABLF=YES and and then drag it down until a new PARAMN is encountered.
Basically I want the output to look like this
SUBJID VISIT$ PARAMN ABLF$ AVAL BASE;
1 screen 1 . 151 .
1 random 1 YES . .
1 visit1 1 . . .
1 screen 2 . 65.5 65
1 random 2 YES 65 65
1 visit1 2 . . 65
1 screen 3 . . 400
1 random 3 YES 400 400
1 visit1 3 . 420 400
I used the the following code
data want;
set have;
by SUBJID PARAMN;
if first.PARAMN and ABLF=' ' then BASE=.;
if ABLF='YES' then BASE=AVAL;
retain BASE;
run;
however when I run this I don't the data to look exactly as I want above
RETAIN does not look like the right tool for this. RETAIN can only move data forward in the file. It cannot move it backwards.
Looks like there is just one observation with the "BASE" value. So just merge it back onto the data.
data want;
merge have
have(keep=subjid paramn aval ablf rename=(aval=BASE ablf=xx)
where=(xx='YES'))
;
by SUBJID PARAMN;
drop xx;
run;
Pro SQL:
proc sql;
select a.*,b.aval as BASE from have a left join have(drop=visit where=(ablf='YES')) b
on a.subjid=b.subjid and a.paramn=b.paramn;
quit;
Double do loop:
data want;
do until(last.visit);
set have;
retain temp;
by subjid paramn notsorted;
if ablf='YES' then temp=aval;
end;
do until(last.visit);
set have;
by subjid paramn notsorted;
base=temp;
end;
drop temp;
run;