Randomize n unique numbers - SAS

No idea why, but I'm really struggling with this one.
I'm trying to get n unique numbers.
In this example, I want 15 numbers:
%let maximum_draws = 15;
Whatever I try (and I've been on this for a couple of hours), I get duplicates.
Could someone please explain why?
data test;
array game(&maximum_draws);
game(1) = int(ranuni(0)*15+1);
do i = 2 to &maximum_draws;
rand = int(ranuni(0)*15+1);
do j = 1 to i-1;
if rand eq game(j) then do while (rand eq game(j));
rand = int(ranuni(0)*15+1);
end;
end;
game(i) = rand;
end;
run;

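Your inner loop only redraws until rand differs from game(j); the new value is never re-checked against game(1) through game(j-1), so it can still duplicate an earlier element. You can do a more efficient test to check whether the number has already been picked, using the not in operator: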
data test;
array game(&maximum_draws);
do i = 1 to &maximum_draws;
do while (game(i) = .);
rand = int(ranuni(0)*15+1);
if rand not in game then game(i) = rand;
end;
end;
run;

Another option, if you're sure the population is relatively small (i.e., not billions of values), is to explicitly create the values and then pick from them.
%let maximum_draws=15;
%let draws=10;
data population;
do game = 1 to &maximum_Draws.;
output;
end;
run;
proc surveyselect data=population out=games n=&draws;
run;
SAS does the work for you this way.
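When every value in the range is drawn exactly once, as in the question (15 numbers out of 1 to 15), the task is really a shuffle. A minimal sketch of that idea, assuming the CALL RANPERM routine is available in your SAS release; the seed value is arbitrary and chosen only for this illustration:

```sas
%let maximum_draws = 15;

data test;
  array game(&maximum_draws);
  /* Fill the array with the candidate values 1..N */
  do i = 1 to dim(game);
    game(i) = i;
  end;
  seed = 12345;  /* arbitrary seed (assumption for this sketch) */
  /* Permute the array in place, so each value appears exactly once */
  call ranperm(seed, of game(*));
  drop i seed;
run;
```

Because the values are permuted rather than redrawn, no duplicate check is needed.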

Related

Mixing macro-DO-loops with data step DO-loops

Some context:
I have a string of digits (not ordered, but with known range 1 - 78) and I want to extract the digits to create specific variables with it, so I have
"64,2,3" => var_64 = 1; var_02 = 1; var_03 = 1; (the rest, like var_01, are all set to missing)
I basically came up with two solutions: one uses a macro DO loop and the other a data step DO loop. The non-macro solution was to first initialize all variables var_01 - var_78 (via a macro), then put them into an array, and then gradually set the values of this array while looping through the string, word by word.
I then realized that it would be way easier to use the loop iterator as a macro variable and I came up with this MWE:
%macro fast(w,l);
do p = 1 to &l.;
%do j = 1 %to 9;
if &j. = scan(&w.,p,",") then var_0&j. = 1 ;
%end;
%do j = 10 %to 78;
if &j. = scan(&w.,p,",") then var_&j. = 1 ;
%end;
end;
%mend;
data want;
string = "2,4,64,54,1,4,7";
l = countw(string,",");
%fast(string,l);
run;
It works (no errors, no warnings, expected result) but I am unsure about mixing macro-DO-loops and non-macro-DO-loops. Could this lead to any inconsistencies or should I just stay with the non-macro solution?
Your current code is comparing numbers like 1 to strings like "1".
&j. = scan(&w.,p,",")
It will work as long as the strings can be converted into numbers, but it is not a good practice. It would be better to explicitly convert the strings into numbers.
input(scan(&w.,p,","),32.)
You can do what you want with an array. Use the number generated from the next item in the list as the index into the array.
data want;
string = "2,4,64,54,1,4,7";
array var_ var_01-var_78 ;
do index=1 to countw(string,",");
var_[input(scan(string,index,","),32.)]=1;
end;
drop index;
run;
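On the question of mixing the two kinds of loops: the macro %do loop is resolved before the data step is compiled, so it only generates text (here, a series of IF statements) that then sits inside the ordinary DO loop. A reduced sketch with three variables instead of 78; the names fast3 and want3 are made up for this illustration, and options mprint writes the generated statements to the log:

```sas
options mprint;  /* show the statements the macro generates */

%macro fast3(w, l);
  do p = 1 to &l.;
    /* The %do loop emits one IF statement per value of j */
    %do j = 1 %to 3;
      if &j. = input(scan(&w., p, ","), 32.) then var_0&j. = 1;
    %end;
  end;
%mend fast3;

data want3;
  string = "2,3,2";
  l = countw(string, ",");
  %fast3(string, l)
  drop p l;
run;
```

The log should show three IF statements inside a single DO loop, which is exactly what you would get by typing them by hand, so the mix itself does not introduce inconsistencies.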

Generate multiple lags through loops in SAS?

I'm trying to generate 20 lags for a variable.
To generate the first lag, I use the following statement:
data temp.data2;
set temp.data1;
by gvkey fyear;
lag1 = ifn(gvkey=lag(gvkey) and fyear=lag(fyear)+1,lag(mv),.);
lag2 = ifn(gvkey=lag(gvkey) and fyear=lag(fyear)+1,lag(lag1),.);
etc.
run;
I don't want to repeat this 20 times. Is there a way to do this through a loop?
Thanks a lot!
You would have to maintain your own array of mv values and assign the lag values from that. The array would be bubbled (each value shifted down one slot) for each row processed and reset at the start of an fyear group.
Example:
data have;
do gvkey = 1 to 5;
do fyear = 1 to 5;
do day = 1 to ifn(fyear=3, 10, 30);
mv = 366-day;
output;
end;
end;
end;
run;
data want;
set have;
by gvkey fyear;
array mvs(20) _temporary_;
array lags(20) lag1-lag20;
if first.fyear then call missing(of mvs(*));
* assign lags;
do _n_ = 1 to dim(lags);
lags(_n_) = mvs(_n_);
end;
* bubble mvs;
do _n_ = dim(lags) to 2 by -1;
mvs(_n_) = mvs(_n_-1);
end;
mvs(1) = mv;
run;

Automate check for number of distinct values SAS

Looking to automate some checks and print some warnings to a log file. I think I've gotten the general idea but I'm having problems generalising the checks.
For example, I have two datasets my_data1 and my_data2. I wish to print a warning if nobs_my_data2 < nobs_my_data1. Additionally, I wish to print a warning if the number of distinct values of the variable n in my_data2 is less than 11.
Some dummy data and an attempt of the first check:
%LET N = 1000;
DATA my_data1(keep = i u x n);
a = -1;
b = 1;
max = 10;
do i = 1 to &N - 100;
u = rand("Uniform"); /* decimal values in (0,1) */
x = a + (b-a) * u; /* decimal values in (a,b) */
n = floor((1 + max) * u); /* integer values in 0..max */
OUTPUT;
END;
RUN;
DATA my_data2(keep = i u x n);
a = -1;
b = 1;
max = 10;
do i = 1 to &N;
u = rand("Uniform"); /* decimal values in (0,1) */
x = a + (b-a) * u; /* decimal values in (a,b) */
n = floor((1 + max) * u); /* integer values in 0..max */
OUTPUT;
END;
RUN;
DATA _NULL_;
FILE "\\filepath\log.txt" MOD;
SET my_data1 NOBS = NOBS1 my_data2 NOBS = NOBS2 END = END;
IF END = 1 THEN DO;
PUT "HERE'S A HEADER LINE";
END;
IF NOBS1 > NOBS2 AND END = 1 THEN DO;
PUT "WARNING!";
END;
IF END = 1 THEN DO;
PUT "HERE'S A FOOTER LINE";
END;
RUN;
How can I set up the check for the number of distinct values of n in my_data2?
A proc sql way to do it -
%macro nobsprint(tab1,tab2);
options nonotes; *suppresses all notes;
proc sql;
select count(*) into:nobs&tab1. from &tab1.;
select count(*) into:nobs&tab2. from &tab2.;
select count(distinct n) into:distn&tab2. from &tab2.;
quit;
%if &&nobs&tab2. < &&nobs&tab1. %then %put |WARNING! &tab2. has less recs than &tab1.|;
%if &&distn&tab2. < 11 %then %put |WARNING! distinct VAR n count in &tab2. less than 11|;
options notes; *overrides the previous option;
%mend nobsprint;
%nobsprint(my_data1,my_data2);
This would break if you have to qualify the data set names with a libref (for example work.my_data1), because the period would end up inside the generated macro variable name. You can also use proc printto to redirect the log to a file.
To write only the %put messages to that file, wrap the macro call like this:
filename mylog temp;
proc printto log=mylog; run;
options nomprint nomlogic;
%nobsprint(my_data1,my_data2);
proc printto; run;
This won't write any extraneous text to the SAS log other than your custom warnings.
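The libref limitation mentioned in this answer can be sidestepped by storing the counts in fixed macro variable names instead of names built from the table names. A minimal sketch under that assumption; the macro name nobscheck, the minlevels= parameter, and the two small test tables are hypothetical, and the trimmed keyword assumes SAS 9.3 or later:

```sas
/* Two small stand-in tables, made up for this sketch */
data work.large; do n = 1 to 20; output; end; run;
data work.small; do n = 1 to 5;  output; end; run;

%macro nobscheck(tab1, tab2, minlevels=11);
  %local n1 n2 d2;
  proc sql noprint;
    select count(*)          into :n1 trimmed from &tab1.;
    select count(*)          into :n2 trimmed from &tab2.;
    select count(distinct n) into :d2 trimmed from &tab2.;
  quit;
  %if &n2. < &n1. %then
    %put WARNING: &tab2. has fewer records than &tab1. (&n2. vs &n1.);
  %if &d2. < &minlevels. %then
    %put WARNING: only &d2. distinct values of n in &tab2.;
%mend nobscheck;

%nobscheck(work.large, work.small)
```

Starting the message with WARNING: makes SAS flag the line as a warning in the log, which may or may not be what you want when the log is parsed by other tools.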
@samkart provided perhaps the most direct, easily understood way to compare the obs counts. Another consideration is performance: if your data sets have millions of obs, you can get the counts without reading the entire data set.
One method is to use nobs= option in the set statement like you did in your code, but you unnecessarily read the data sets. The following will get the counts and compare them without reading all of the observations.
data _null_;
if nobs1 ne nobs2 then putlog 'WARNING: Obs counts do not match.';
stop;
set sashelp.cars nobs=nobs1;
set sashelp.class nobs=nobs2;
run;
WARNING: Obs counts do not match.
Another option is to get the counts from sashelp.vtable or dictionary.tables. Note that you can only query dictionary.tables with proc sql.
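A minimal sketch of the dictionary.tables route, reusing the same two sashelp tables as the example above. The macro variable names nobs1 and nobs2 are chosen for this illustration, and the trimmed keyword assumes SAS 9.3 or later:

```sas
proc sql noprint;
  /* libname and memname are stored in upper case in the dictionary */
  select nobs into :nobs1 trimmed
    from dictionary.tables
    where libname = 'SASHELP' and memname = 'CARS';
  select nobs into :nobs2 trimmed
    from dictionary.tables
    where libname = 'SASHELP' and memname = 'CLASS';
quit;

data _null_;
  if &nobs1. ne &nobs2. then putlog 'WARNING: Obs counts do not match.';
run;
```

Only the table metadata is queried here, so the cost should not depend on how many observations the tables hold.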

SAS: generate abstractly long and large dataset

Trying to do some performance testing
I can't figure out a macro
%generate(n_rows,n_cols);
that would generate a table with n_rows and n_cols, filled with random numbers/strings
I tried using this link:
http://bi-notes.com/2012/08/benchmark-io-performance/
But I quickly encounter a memory issue
Thanks!
Try this. I split the column count into two input parameters, so you now specify a number of numeric columns and a number of character columns. You can also name the output dataset.
%macro generate(n_rows,n_num_cols,n_char_cols,outdata=test,seed=0);
data &outdata;
array nums[&n_num_cols];
array chars[&n_char_cols] $;
temp = "abcdefghijklmnopqrstuvwxyz";
do i=1 to &n_rows;
do j=1 to &n_num_cols;
nums[j] = ranuni(&seed);
end;
do j=1 to &n_char_cols;
chars[j] = substr(temp,ceil(ranuni(&seed)*18),8);
end;
output;
end;
drop i j temp;
run;
%mend;
%generate(10,10,10,outdata=test);

Add a dummy variable to a SAS data set which doesn't depend on existing data

I am really having trouble finding the answer to the following simple question. Suppose I have an existing data set with one column of 100 observations, and I want to add a variable which has the value 0 in rows 1-50, and 1 in rows 51-100. How can I do it? I tried:
data new_data;
set existing_data;
do i = 1 to 100;
if i <= 50 then new_variable = 0;
if i >= 51 then new_variable = 1;
end;
run;
but it doesn't work.
Yep, there is a way: use the SAS automatic variable _n_ as the row number. Like this...
data new;
set existing;
if _n_<=50 then new_var=0;
if _n_>50 then new_var=1;
run;
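For what it's worth, the loop in the question fails because it runs from 1 to 100 on every observation, so new_variable always ends up as 1. A compact variant of the _n_ approach, relying on the fact that a comparison in SAS evaluates to 1 (true) or 0 (false); the first step only builds a stand-in for the existing 100-row data set, and its variable x is made up for this sketch:

```sas
/* Stand-in for the existing data set (assumption for this sketch) */
data existing_data;
  do x = 1 to 100;
    output;
  end;
run;

data new_data;
  set existing_data;
  /* 0 for rows 1-50, 1 for rows 51-100 */
  new_variable = (_n_ > 50);
run;
```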
Well, I figured out one way to do it.
data dummy;
do i = 1 to 50;
new_variable = 0;
output;
end;
do i = 1 to 50;
new_variable = 1;
output;
end;
run;
data new_data;
merge existing_data dummy;
run;
Is there a way to do it all in one go?