How to generate a unique random vector on each iteration? - sas

I'm new to SAS and would like to produce a plot for each of several random numeric vectors.
To do that, I wrapped my proc iml code in a macro and tried to invoke it before plotting inside the macro generate_scatter_plot, but I get the same set of points on each iteration.
Can somebody please explain the proper way to do this in SAS?
%MACRO generate_random_points();
proc iml;
N = 6;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
submit rands;
data my_data;
input x y ##;
datalines;
&rands
;
run;
endsubmit;
%MEND;
%MACRO generate_scatter_plot();
/* call execute('%generate_random_points();'); */
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
data _null_;
do i = 1 to 20;
call execute('%generate_scatter_plot();');
end;
run;
I find SAS different from the rest of the languages out there.
Thank you in advance to all who are willing to help!

IML is not needed; a data step loop can generate the random values.
I'm assuming you want to learn macro programming.
CALL EXECUTE is needed to invoke a macro from inside a data step; outside a data step you can call the macro directly.
CALL EXECUTE can also generate code itself, much as a macro does.
The MPRINT and MLOGIC options help when debugging macro code; without them the generated code is not written to the log.
The following expands a bit on your logic to demonstrate how macros work.
options mprint mlogic;
%macro generate_random_points(Num=);
*Macro to generate random numbers;
*number of points generated are equal to the NUM=parameter;
data my_data;
do i=1 to &num.;
x=rand('uniform');
y=rand('uniform');
output;
end;
run;
%mend;
%macro generate_scatter_plot(Num_Points=);
*create random data with specified points;
%generate_random_points(Num=&Num_Points);
*graph data;
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
*Run macro with different parameters in loop;
data _null_;
do i = 3 to 5;
call execute(catt('%generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
option nomprint nomlogic;
And a slight variation on your process:
data _null_;
do i = 3 to 5;
call execute(catt("Title 'Num Points = ", i, " '; ", ' %generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
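Since CALL EXECUTE is only needed inside a data step, the same loop can also be written with a macro %DO loop. The following is a minimal sketch, assuming %generate_scatter_plot is already defined as above; the wrapper name plot_loop and its parameters from= and to= are made up for illustration.

```sas
/* Hypothetical wrapper: loops in the macro language instead of a data step */
%macro plot_loop(from=3, to=5);
  %local k;
  %do k = &from %to &to;
    /* Double quotes so that &k resolves inside the title */
    title "Num Points = &k";
    %generate_scatter_plot(Num_Points=&k);
  %end;
  title; /* clear the title afterwards */
%mend plot_loop;

%plot_loop(from=3, to=5);
```

A %DO loop is only valid inside a macro definition, which is why the loop is wrapped in its own macro here.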

If you are working in IML, you should not need the SAS macro language to generate code.
You already showed how to generate the random numbers into an IML matrix.
And you can use a SUBMIT/ENDSUBMIT block to call your PROC SGPLOT code.
What you seem to be missing is the IML syntax for writing a matrix to a dataset. https://blogs.sas.com/content/iml/2011/04/18/writing-data-from-a-matrix-to-a-sas-data-set.html
proc iml;
N = 6;
x = t(1:N);
y = j(N,1);
call randgen(y, 'Uniform');
create my_data var {x y};
append;
close my_data;
submit;
proc sgplot data=my_data;
scatter x=x y=y;
run;
endsubmit;
quit;

Although you are using IML to pass data as text into a datalines statement, you really do not need to do this. There are simpler ways of achieving your goal.
SAS does everything through datasets. They're analogous to Data Frames in Pandas. If you want to create a random vector of data, you'll create it within a dataset and use that within other procedures. datalines should be avoided in production whenever possible. There are some very special cases where it is useful, but it's mainly used for sample data or prototyping.
SAS will randomly generate data based on the system clock unless you set a seed through call streaminit(). You should always get new points. A much simpler way to achieve your results is shown below. The below macro will generate a new random dataset and plot it each time you call it.
%macro generate_scatter_plot(n=100);
data random;
do i = 1 to &n;
x = rand('uniform');
y = rand('uniform');
output;
end;
drop i;
run;
proc sgplot data=random;
scatter x=x y=y;
run;
%mend;
%generate_scatter_plot(n=100);
%generate_scatter_plot(n=1000);
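If you ever want the opposite behaviour, that is, the same points on every run, call streaminit() before the first rand() call. This is a small sketch; the seed 12345 and the dataset name random_fixed are arbitrary choices for illustration.

```sas
data random_fixed;
  /* A fixed positive seed makes the stream reproducible across runs */
  call streaminit(12345);
  do i = 1 to 5;
    x = rand('uniform');
    y = rand('uniform');
    output;
  end;
  drop i;
run;
```

Leaving out the call streaminit() line returns to the clock-based behaviour described above.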

Related

Drop variables with all-zero values from a SAS data set

I often work with a large number of variables that have zero or empty values only, but I could not find a SAS command to drop these unwanted variables. I know we can use SAS/IML, but I encountered such cases many times and would like to have a macro that may help me without having to type the variable names to avoid errors. Here is my code for removing variables with zero values only. It works to produce a cleaned output data set y from a raw data set x without using the names of the variables. I hope others could have a better solution or help me to make mine better.
%Macro dropZeroV(x, y) ;
proc means data = &x. ;
var _numeric_;
output out = sumTab ; run;
proc transpose data = sumTab(drop = _TYPE_) out= sumt; var _Numeric_; id _STAT_; run;
%let Vlst =;
proc sql noprint;
select _NAME_ into : dropLst separated by ' '
from sumT
where Max=0 and Min =0;
data &y.;
set &x.; drop &dropLst.;
run;
proc print data = &y.; run;
%Mend dropZeroV;
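Use the STACKODS option and ODS OUTPUT SUMMARY to get the table in the needed format in one step rather than several. This version requests only the sum, on the reasoning that a sum of 0 means all values are 0. Note that this holds only when the variable has no negative values, since positive and negative values can cancel; keep the Min and Max check if negatives are possible. You may also want to round to avoid issues with numeric precision.
The PROC MEANS and PROC TRANSPOSE steps become:
ods select none;
proc means data= &x. stackods sum;
var _numeric_;
ods output summary = sumT;
run;
ods select all; /* restore normal ODS output afterwards */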
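For completeness, here is a sketch of how the whole macro might look with this change. It assumes the sumT table produced by STACKODS has columns named Variable and Sum (check with PROC CONTENTS on your installation), and the macro name dropZeroV2 is invented for this example. As with the sum-based test itself, it assumes no variable has negative values that cancel out.

```sas
%macro dropZeroV2(x, y); /* hypothetical name */
  %local dropLst;        /* starts out empty */

  ods select none;
  proc means data=&x. stackods sum;
    var _numeric_;
    ods output summary=sumT;
  run;
  ods select all;

  /* Collect the names of variables whose sum is zero */
  proc sql noprint;
    select Variable into :dropLst separated by ' '
    from sumT
    where Sum = 0;
  quit;

  data &y.;
    set &x.;
    /* Only emit a DROP statement when something qualified */
    %if %length(&dropLst) %then %do;
      drop &dropLst.;
    %end;
  run;
%mend dropZeroV2;
```

Declaring dropLst as %local keeps the list from leaking into the global symbol table and guarantees it is empty when no variable qualifies.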

proc report print null dataset

I have a null dataset such as
data a;
if 0;
run;
Now I wish to use proc report to print this dataset. Of course, there will be nothing in the report, but I want the report to contain one sentence saying "It is a null dataset". Any ideas?
Thanks.
You can test to see if there are any observations in the dataset first. If there are observations, then use the dataset, otherwise use a dummy dataset that looks like this and print it:
data use_this_if_no_obs;
msg = 'It is a null dataset';
run;
There are plenty of ways to test datasets to see if they contain any observations or not. My personal favorite is the %nobs macro found here: https://stackoverflow.com/a/5665758/214994 (other than my answer, there are several alternate approaches to pick from, or do a google search).
Using this %nobs macro we can then determine the dataset to use in a single line of code:
%let ds = %sysfunc(ifc(%nobs(iDs=sashelp.class) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
Here's some code showing the alternate outcome:
data for_testing_only;
if 0;
run;
%let ds = %sysfunc(ifc(%nobs(iDs=for_testing_only) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
I've used proc print to simplify the example, but you can adapt it to use proc report as necessary.
For the no-data report, you don't need to know how many observations are in the data, just that there are none. This example shows how I would approach the problem.
Create example data with zero observations.
data class;
stop;
set sashelp.class;
run;
Check for no observations and, if there are none, add one observation with missing values for all variables. Note that no observations are ever read from class in this step.
data class;
if eof then output;
stop;
modify class end=eof;
run;
Make the report.
proc report data=class missing;
column _all_;
define _all_ / display;
define name / order;
compute before name;
retain_name=name;
endcomp;
compute after;
if not missing(retain_name) then l=0;
else l=40;
msg = 'No data for this report';
line msg $varying. l;
endcomp;
run;

Create 100 copies of a dataset in SAS

I need to create 100 copies of a data set (which has 3 variables), but one of the variables needs to be assigned randomly (1 through 1000).
I know I can use 100 data statements, but I don't want to go down that road!
Let's say I have data set A and want to create data sets A1 to A100. I used the following code:
data A1--A100;
set A;
do i=1 to 1000;
var3=int(ranuni(0) * 1000 + 1);
output A1--A1000;
end;
run;
But SAS does not generate anything at all.
You can't do it via any shortcut like that. You could use the macro language to create the 1000 dataset names and 1000 output statements.
However, more than likely you shouldn't do this. Instead, have one dataset with a BY variable, and then in whatever you're going to do (MCMC or whatever) use that BY variable with the BY statement.
data want;
set have;
do byvar=1 to 1000;
var3 = int(ranuni(7)*1000+1);
output;
end;
run;
Also, don't use ranuni(0). Always use a positive seed (and save it) so you can replicate your results.
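To illustrate the BY-variable idea, here is a minimal sketch that summarises var3 separately for each replicate in want. PROC MEANS stands in for whatever analysis you actually run, and the output names stats and mean_var3 are made up for the example.

```sas
/* The data step above cycles byvar within each input row,
   so sort by byvar before using it in a BY statement */
proc sort data=want;
  by byvar;
run;

proc means data=want noprint;
  by byvar;   /* one analysis per replicate */
  var var3;
  output out=stats mean=mean_var3;
run;
```

One dataset with a BY statement is usually easier to manage than hundreds of separate datasets, and most procedures accept BY.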
Here is the answer; I hope it helps:
data want;
set have;
do dset=1 to 101;
rand=ranuni(4011120);
if dset=1 then real=1; else real=0;
output;
end;
run;
proc sort data=want;
by dset rand;
run;
data want2;
set want; /* read the sorted dataset created above */
if real=0 then rank= mod(_N_,366);
if real then realrank=rank;
run;
proc sort data=want2;
by dset dayofyear;
run;

Using a vector generated in SAS/IML as a macro variable

I am writing a macro that will run PROC MIXED with the level-1 residual variance fixed to a near-zero value using the PARMS statement. I am trying to generate the bulk of the starting values for the PARMS statement using SAS/IML, something like:
%macro test (dataset= , classroom= , preds= , outcome=);
proc iml;
/*count number of variables*/
%let nvars = 0;
%do %while(%qscan(&preds,&nvars+1,%str( )) ne %str());
%let nvars = %eval(&nvars+1);
%end;
/*determine location of level-1 residual in the start value vector*/
%let error_location = %eval(((&nvars*(&nvars-1))/2)+&nvars+1);
/*create vector of start values from lower triangle of identity matrix*/
start_vector = symsqr(I(&nvars));
%let starts = %str(start_vector[label=""]);
/*analyze data*/
proc mixed data=&dataset noprofile method=ml;
class &classroom;
model &outcome = &preds /noint;
random &preds /type=un sub=&classroom g;
parms
&starts
.00000001 /hold= &error_location;
run;
quit;
%mend;
The code works fine without the PARMS statement in the PROC MIXED code. When I run the code as is, however, SAS apparently puts the literal string 'start_vector[label=""]' after PARMS rather than listing the values generated by IML.
How can I avoid this error and have SAS specify the values contained in START_VECTOR as starting values for the PARMS statement?
You should use the SYMPUT or SYMPUTX routines in SAS/IML to convert a vector to a macro variable.
This is one way to get a vector into a single string in a macro variable.
proc iml;
start = {"Hi","Bye"};
call symput("start",rowcat(start`));
%put &start;
quit;
With a numeric vector, you need to use CHAR to convert it:
proc iml;
start_vector = j(1,5); /* 1 x 5 row vector of ones; j(5) would create a 5 x 5 matrix */
call symputx("start_vector",rowcat(char(start_vector)));
%put &start_vector;
quit;
With a numeric matrix, you need to use SHAPE to flatten it:
proc iml;
start_vector = j(5,5);
call symputx("start_vector",rowcat(shape(char(start_vector),1)));
%put &start_vector;
quit;
Your problem and two solutions are discussed in the article "Passing values from PROC IML into SAS Procedures."
Do you have to wrap this in a macro? If so, the SUBMIT and ENDSUBMIT statements won't work, since they can't be called from a macro. However, since SAS/IML enables you to define and call modules with arguments, I usually avoid the macro language and define a module that takes arguments, then call the module directly.
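As a rough sketch of the module approach, the start-value vector from the question could be built by a function module that takes the number of predictors as an argument. The module name MakeStarts and its argument nvars are made up for illustration.

```sas
proc iml;
/* Hypothetical module: lower triangle of an identity matrix as a column vector */
start MakeStarts(nvars);
  return( symsqr(I(nvars)) );
finish;

starts = MakeStarts(3);
print starts;
quit;
```

The argument replaces the &nvars macro variable, so the count can be computed in IML and passed in directly.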

SAS - How to get last 'n' observations from a dataset?

How can you create a SAS data set from another dataset using only the last n observations of the original dataset? This is easy when you know the value of n. If I don't know 'n', how can this be done?
This assumes you have a macro variable that says how many observations you want. NOBS tells you the number of observations in the dataset currently without reading the whole thing.
%let obswant=5;
data want;
set sashelp.class nobs=obscount;
if _n_ gt (obscount-&obswant.);
run;
Using Joe's example of a macro variable to specify the number of observations you want, here is another answer:
%let obswant = 10;
data want;
do _i_=nobs-(&obswant-1) to nobs;
set have point=_i_ nobs=nobs;
output;
end;
stop; /* Needed to stop data step */
run;
This should perform better since it only reads the specific observations you want.
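One caveat with POINT= is that the starting index falls below 1 when the dataset has fewer rows than requested. A hedged variant that clamps the start is sketched below; it uses sashelp.class in place of have so that it can be run as is.

```sas
%let obswant = 10;
data want;
  /* max(1, ...) keeps the first index valid for short datasets */
  do _i_ = max(1, nobs - &obswant + 1) to nobs;
    set sashelp.class point=_i_ nobs=nobs;
    output;
  end;
  stop; /* still needed: POINT= never triggers end-of-file */
run;
```

If the dataset is empty, the loop body never runs and the step stops without writing any observations.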
If the dataset is large, you might not want to read the whole dataset. Instead, you could first get the total number of observations from the dataset's metadata and then start reading at the right position. So, if you want the last &number observations:
data t;
input x;
datalines;
1
2
3
4
;
%let dsid=%sysfunc(open(t));
%let num=%sysfunc(attrn(&dsid,nlobs));
%let rc=%sysfunc(close(&dsid));
%let number = 2;
data tt;
set t (firstobs = %eval(&num.-&number.+1));
run;
For the sake of variety, here's another approach (not necessarily a better one)
%let obswant=5;
proc sql noprint;
select nlobs-&obswant.+1 into :obscalc
from dictionary.tables
where libname='SASHELP' and upcase(memname)='CLASS';
quit;
data want;
set sashelp.class (firstobs=&obscalc.);
run;
You can achieve this using the _nobs_ and _N_ variables. First, use the NOBS= option to create a temporary variable (_nobs_) that stores the total number of observations. Then compare the automatic variable _N_ to _nobs_.
data a;
set sashelp.class nobs=_nobs_;
if _N_ gt _nobs_ -5;
run;