Create 100 copies of a dataset in SAS - sas

I need to create 100 copies of a data set (which has 3 variables), but one of the variables needs to be assigned randomly (1 through 1000).
I know I can use 100 DATA statements, but I don't want to go down that road!
Let's say I have data set A and want to create data sets A1 to A100. I used the following code:
data A1--A100;
set A;
do i=1 to 1000;
var3=int(ranuni(0) * 1000 + 1);
output A1--A1000;
end;
run;
but SAS does not generate anything at all

You can't do it via any shortcut like that. You could use the macro language to create the 1000 dataset names and 1000 output statements.
However, more than likely you shouldn't do this. Instead, have one dataset with a BY variable, and then in whatever you're going to do (MCMC or whatever) use that BY variable with the BY statement.
data want;
set have;
do byvar=1 to 1000;
var3 = int(ranuni(7)*1000+1);
output;
end;
run;
Also, don't use ranuni(0). Always use a positive seed (and save it) so you can replicate your results.
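A minimal sketch of how the single stacked dataset might be used downstream, assuming the dataset want from the code above. PROC MEANS is only a stand-in for whatever analysis you actually run, and the output names stats and mean_var3 are made up for illustration. Because the loop writes the copies observation by observation, the data has to be sorted by byvar before a BY statement can be used:

```sas
* Group the rows of each copy together;
proc sort data=want;
by byvar;
run;
* Run the analysis once per copy (PROC MEANS is a placeholder);
proc means data=want noprint;
by byvar;
var var3;
output out=stats mean=mean_var3;
run;
```

The result has one row per value of byvar, which is usually easier to work with than 1000 separate datasets.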

Here is an answer; I hope it helps:
data want;
set have;
do dset=1 to 101;
rand=ranuni(4011120);
if dset=1 then real=1; else real=0;
output;
end;
run;
proc sort data=want;
by dset rand;
run;
data want2;
set want;
if real=0 then rank= mod(_N_,366);
if real then realrank=rank;
run;
proc sort data=want2;
by dset dayofyear;
run;

Related

how to generate unique random vector on each iteration?

I'm new to SAS, and I would like to produce a plot for each random numeric vector.
To do that I wrapped my proc iml in a macro and tried to invoke it before calling the macro generate_scatter_plot, but I get the same set of points on each iteration.
Can somebody please explain the proper way to do this in SAS?
%MACRO generate_random_points();
proc iml;
N = 6;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
submit rands;
data my_data;
input x y ##;
datalines;
&rands
;
run;
endsubmit;
%MEND;
%MACRO generate_scatter_plot();
/* call execute('%generate_random_points();'); */
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
data _null_;
do i = 1 to 20;
call execute('%generate_scatter_plot();');
end;
run;
I find SAS different from the rest of languages out there.
Thank you in advance to all who are willing to help!
IML is not needed; a data step loop can generate the random values.
Assuming you're interested in learning macro programming:
CALL EXECUTE is needed to call a macro from inside a data step; outside a data step you can call the macro directly.
CALL EXECUTE can also generate code itself, much as a macro does.
The MPRINT and MLOGIC options help when debugging macro code; without them the generated code is not written to the log.
The following expands a bit on your logic to demonstrate what macros can do.
options mprint mlogic;
%macro generate_random_points(Num=);
*Macro to generate random numbers;
*number of points generated are equal to the NUM=parameter;
data my_data;
do i=1 to &num.;
x=rand('uniform');
y=rand('uniform');
output;
end;
run;
%mend;
%macro generate_scatter_plot(Num_Points=);
*create random data with specified points;
%generate_random_points(Num=&Num_Points);
*graph data;
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
*Run macro with different parameters in loop;
data _null_;
do i = 3 to 5;
call execute(catt('%generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
option nomprint nomlogic;
And a slight variation on your process:
data _null_;
do i = 3 to 5;
call execute(catt("Title 'Num Points = ", i, " '; ", ' %generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
If you are working in IML you should not have any need to use the SAS macro language to generate code.
You already showed how you can generate the random numbers into an IML matrix.
And you can use the SUBMIT/ENDSUBMIT block to call your PROC SGPLOT code.
What you seem to be missing is the IML syntax for converting a matrix into a dataset. https://blogs.sas.com/content/iml/2011/04/18/writing-data-from-a-matrix-to-a-sas-data-set.html
proc iml;
N = 6;
x = t(1:N);
y = j(N,1);
call randgen(y, 'Uniform');
create my_data var {x y};
append;
close my_data;
submit;
proc sgplot data=my_data;
scatter x=x y=y;
run;
endsubmit;
quit;
Although you are using IML to pass data as text into a datalines statement, you really do not need to do this. There are simpler ways of achieving your goal.
SAS does everything through datasets. They're analogous to Data Frames in Pandas. If you want to create a random vector of data, you'll create it within a dataset and use that within other procedures. datalines should be avoided in production whenever possible. There are some very special cases where it is useful, but it's mainly used for sample data or prototyping.
SAS will randomly generate data based on the system clock unless you set a seed through call streaminit(). You should always get new points. A much simpler way to achieve your results is shown below. The below macro will generate a new random dataset and plot it each time you call it.
%macro generate_scatter_plot(n=100);
data random;
do i = 1 to &n;
x = rand('uniform');
y = rand('uniform');
output;
end;
drop i;
run;
proc sgplot data=random;
scatter x=x y=y;
run;
%mend;
%generate_scatter_plot(n=100);
%generate_scatter_plot(n=1000);
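If you do want the same points on every run, for example to reproduce a plot, a seed can be fixed with call streaminit. A minimal sketch, where the seed value 12345 and the dataset name random_fixed are arbitrary choices for illustration:

```sas
data random_fixed;
* Fixed seed so the same values are generated on every run;
call streaminit(12345);
do i = 1 to 100;
x = rand('uniform');
y = rand('uniform');
output;
end;
drop i;
run;
```

Removing the call streaminit line returns to the default behaviour of a different set of points each run.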

Drop variables with all-zero values from a SAS data set

I often work with a large number of variables that have zero or empty values only, but I could not find a SAS command to drop these unwanted variables. I know we can use SAS/IML, but I encountered such cases many times and would like to have a macro that may help me without having to type the variable names to avoid errors. Here is my code for removing variables with zero values only. It works to produce a cleaned output data set y from a raw data set x without using the names of the variables. I hope others could have a better solution or help me to make mine better.
%Macro dropZeroV(x, y) ;
proc means data = &x. ;
var _numeric_;
output out = sumTab ; run;
proc transpose data = sumTab(drop = _TYPE_) out= sumt; var _Numeric_; id _STAT_; run;
%let Vlst =;
proc sql noprint;
select _NAME_ into : dropLst separated by ' '
from sumT
where Max=0 and Min =0;
data &y.;
set &x.; drop &dropLst.;
run;
proc print data = &y.; run;
%Mend dropZeroV;
Use STACKODS and the ODS Summary table to get the statistics in the format needed in one step rather than several. This limits the statistics to the sum. Note that a sum of 0 implies all values are 0 only when no value is negative, so keep the min and max checks if negative values are possible. You may also want to round to avoid issues with numeric precision.
The PROC MEANS and PROC TRANSPOSE steps become:
ods select none;
proc means data= &x. stackods sum;
var _numeric_;
ods output summary = sumT;
run;
ods select all;
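Put together, the whole macro might look like the sketch below. It is a variant built on assumptions rather than the original poster's code: it asks for MIN and MAX instead of SUM so that negative values cannot cancel out, it relies on the stacked ODS table exposing columns named Variable, Min and Max, and it skips the DROP statement when no variable qualifies.

```sas
%macro dropZeroV(x, y);
%local dropLst;
%let dropLst=;
/* Suppress printed output while still capturing the ODS table */
ods select none;
proc means data=&x. stackods min max;
var _numeric_;
ods output summary=sumT;
run;
ods select all;
/* Collect the names of variables whose min and max are both zero */
proc sql noprint;
select Variable into :dropLst separated by ' '
from sumT
where Min=0 and Max=0;
quit;
data &y.;
set &x.;
%if %length(&dropLst.) %then %do;
drop &dropLst.;
%end;
run;
%mend dropZeroV;
```

A call such as %dropZeroV(x, y) would then write the cleaned copy of x to y. Variables that are entirely missing have missing Min and Max, so this sketch leaves them in place.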

Create new variables from format values

What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the current one without the F and with an underscore plus the level (example: FUMERT1F level 0 would become FUMERT1_0).
After that I need to recode the variables following this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: The answers from Joe and from data_null_ both work, but Stack Overflow won't let me accept more than one answer.
Update to add an underscore to the end of each name. It looks like there is no option in PROC TRANSREG to put an underscore between the variable name and the value of the class variable, so we can just do a temporary rename: create name=newname pairs to rename each class variable to end in an underscore, and matching pairs to rename them back, using the CAT functions and SQL INTO macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;

Delete N highest from a dataset in sas

I have a bunch of SAS datasets of various lengths, and I need to trim the n highest and n lowest values of a variable.
When I only needed to trim the single highest and lowest, I did this:
DATA VDBP273_first_night_Systolic;
SET VDBP273_first_night end=eof;
IF _N_ =1 then delete;
if eof then delete;
run;
And it worked fine.
Now I need to do something more like this
PROC SORT DATA=foo OUT=foo_sorted;
BY bar;
run;
DATA foo_out;
SET foo_sorted end=eof;
IF _N_ <= 5 then delete;
if eof *OR THE 4 right before it* then delete;
run;
I'm sure this is easy, but it's stumping me. How can I delete the last 5 observations of this sorted data set?
Since you are presorting your data and then trying to eliminate the top n and bottom n records, you can easily solve your problem using the OBS= and FIRSTOBS= dataset options.
proc sql noprint;
select count(*) -5 into:counter from sashelp.class ;
quit;
proc sort data=sashelp.class out=have;by height;run;
proc print data=have;run;
data want;
set have(firstobs=6 obs=&counter);
run;
proc print data=want;run;
You can use the NOBS= option on the SET statement to store the total number of observations, which then means you can do something similar to your code to exclude the top/bottom n records of the sorted data.
I'd recommend putting the number of records to be excluded in a macro variable; it is easier to read and change than a hard-coded value.
%let excl = 6;
data want;
set sashelp.class nobs=numobs;
if &excl.< _n_ <=(numobs-&excl.);
run;
Or sort in descending order and apply the same step as before to trim the other end. Note that DESCENDING goes before the variable name:
proc sort data=have out=want; by descending var1; run;
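For comparison, a sketch that trims both ends in one pass after a single sort, assuming five observations are to be removed from each end and using sashelp.class sorted by height as stand-in data (the names sorted and trimmed are illustrative):

```sas
%let excl = 5;
proc sort data=sashelp.class out=sorted;
by height;
run;
data trimmed;
set sorted nobs=numobs;
* Keep only the rows strictly between the excluded head and tail;
if &excl. < _n_ <= numobs - &excl.;
run;
```

Changing the value of excl adjusts how many rows are dropped from each end without touching the data step.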

SAS - How to get last 'n' observations from a dataset?

How can you create a SAS data set from another dataset using only the last n observations from original dataset. This is easy when you know the value of n. If I don't know 'n' how can this be done?
This assumes you have a macro variable that says how many observations you want. NOBS tells you the number of observations in the dataset currently without reading the whole thing.
%let obswant=5;
data want;
set sashelp.class nobs=obscount;
if _n_ gt (obscount-&obswant.);
run;
Using Joe's example of a macro variable to specify the number of observations you want, here is another answer:
%let obswant = 10;
data want;
do _i_=nobs-(&obswant-1) to nobs;
set have point=_i_ nobs=nobs;
output;
end;
stop; /* Needed to stop data step */
run;
This should perform better since it only reads the specific observations you want.
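One caveat with POINT= is that asking for more observations than the dataset holds would produce a non-positive pointer value. A sketch of a guarded version, assuming the same have dataset and obswant macro variable; the max() call is an addition for illustration, not part of the answer above:

```sas
%let obswant = 10;
data want;
* Start at observation 1 when fewer than the requested rows exist;
do _i_ = max(1, nobs - &obswant. + 1) to nobs;
set have point=_i_ nobs=nobs;
output;
end;
stop;
run;
```

Because NOBS= is assigned at compile time, nobs can be used in the DO statement before the SET statement executes.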
If the dataset is large, you might not want to read the whole dataset. Instead, you can first retrieve the total number of observations and then use FIRSTOBS= to start reading at the right place. For example, to keep the last &number observations:
data t;
input x;
datalines;
1
2
3
4
;
%let dsid=%sysfunc(open(t));
%let num=%sysfunc(attrn(&dsid,nlobs));
%let rc=%sysfunc(close(&dsid));
%let number = 2;
data tt;
set t (firstobs = %eval(&num.-&number.+1));
run;
For the sake of variety, here's another approach (not necessarily a better one)
%let obswant=5;
proc sql noprint;
select nlobs-&obswant.+1 into :obscalc
from dictionary.tables
where libname='SASHELP' and upcase(memname)='CLASS';
quit;
data want;
set sashelp.class (firstobs=&obscalc.);
run;
You can achieve this using the _nobs_ and _N_ variables. First, create a temporary variable to store the total number of observations. Then compare the automatic variable _N_ to _nobs_.
data a;
set sashelp.class nobs=_nobs_;
if _N_ gt _nobs_ -5;
run;