Trying to use the LAG function in SAS to replicate a piece of code in a migration into SAS DI, however there doesn't seem to be the same function in SAS DI at all.
Using SAS DI 4.21 currently, with a view to move up to 4.9 soon.
So my question is, is there an alternative way of replicating the following code in SAS DI:
DATA work.dm_chg_bal;
SET tmp_bal_chg;
FORMAT dt2 date9.;
acct_id2 = LAG1(acct_id);
app_suf2 = LAG1(app_suf);
dt2 = LAG1(start_dt);
RUN;
Cheers,
For this I would use a User Written transform. There is really no issue with doing this so long as you are diligent enough to perform the variable mapping - hence retaining the data lineage in metadata.
An explanation of this is available here.
I don't know the DI Studio transformations well (I typically only use the User Written transform).
I wonder if there is a transformation you could trick into generating:
data work.dm_chg_bal;
set tmp_bal_chg;
output;
set tmp_bal_chg(rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
run;
or
data work.dm_chg_bal;
if _n_ > 1 then set tmp_bal_chg(rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
set tmp_bal_chg;
run;
If not, I'm sure there are data transformations that would allow you to make two copies of the dataset, one with ID=_n_ and the other with ID=_n_+1, and then merge by ID. That is, generate:
data main;
set tmp_bal_chg;
ID = _n_ ;
run;
data lag;
set tmp_bal_chg (rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
ID = _n_ + 1;
run;
data work.dm_chg_bal;
merge main (in=a)
lag (keep=id acct_id2 app_suf2 dt2 in=b)
;
by id;
if a;
run;
Related
Is it possible to score a data set with a model created by PROC ARIMA in SAS?
This is the code I have that is not working:
proc arima data=work.data;
identify var=x crosscorr=(y(7) y(30));
estimate outest=work.arima;
run;
proc score data=work.data score=work.arima type=parms predict out=pred;
var x;
run;
When I run this code I get an error from the PROC SCORE portion that says "ERROR: Variable x not found." The x column is in the data set work.data.
proc score does not support autocorrelated variables. The simplest way to get an out-of-sample score is to combine both proc arima and a data step. Here's an example using sashelp.air.
Step 1: Generate historical data
We leave out the year 1960 as our score dataset.
data have;
set sashelp.air;
where year(date) < 1960;
run;
Step 2: Generate a model and forecast
The nooutall option tells proc arima to only produce the 12 future forecasts.
proc arima data=have;
identify var=air(12);
estimate p=1 q=(2) method=ml;
forecast lead=12 id=date interval=month out=forecast nooutall;
run;
Step 3: Score
Merge together your forecast and full historical dataset to see how well the model did. I personally like the update statement because it will not replace anything with missing values.
data want;
update forecast(in=fcst)
sashelp.air(in=historical);
by Date;
/* Generate fit statistics */
Error = Forecast-Air;
PctError = Error/Air;
AbsPctError = abs(PctError);
/* Helpful for bookkeeping */
if(fcst) then Type = 'Score';
else if(historical) then Type = 'Est';
format PctError AbsPctError percent8.2;
run;
You can take this code and convert it into a generalized macro for yourself. That way in the future, if you wanted to score something, you could simply call a macro program to get what you need.
What i want to do: I need to create a new variables for each value labels of a variable and do some recoding. I have all the value labels output from a SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that i need to recode the variables on this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one of data_null_ are working but stackoverflow won't let me pin more than one right answer.
Update to add an _ underscore to the end of each name. It looks like there is not option for PROC TRANSREG to put an underscore between the variable name and the value of the class variable so we can just do a temporary rename. Create rename name=newname pairs to rename class variable to end in underscore and to rename them back. CAT functions and SQL into macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;
I have a data set of patient information where I want to count how many patients (observations) have a given diagnostic code. I have 9 possible variables where it can be, in diag1, diag2... diag9. The code is V271. I cannot figure out how to do this with the "WHERE" clause or proc freq.
Any help would be appreciated!!
Your basic strategy to this is to create a dataset that is not patient level, but one observation is one patient-diagnostic code (so up to 9 observations per patient). Something like this:
data want;
set have;
array diag[9];
do _i = 1 to dim(diag);
if not missing(diag[_i]) then do;
diagnosis_Code = diag[_i];
output;
end;
end;
keep diagnosis_code patient_id [other variables you might want];
run;
You could then run a proc freq on the resulting dataset. You could also change the criteria from not missing to if diag[_i] = 'V271' then do; to get only V271s in the data.
An alternate way to reshape the data that can match Joe's method is to use proc transpose like so.
proc transpose data=have out=want(keep=patient_id col1
rename=(col1=diag)
where=(diag is not missing));
by patient_id;
var diag1-diag9;
run;
I need to create 100 copies of a data set (which has 3 variables) but one of the variables need to be assign randomly (1 through 1000)
I know I can use 100 data statement but I don't want to go down that road!
Let say I have data set A and want to create data set A1 to A100, I used the following code;
data A1--A100;
set A;
do i=1 to 1000;
var3=int(ranuni(0) * 1000 + 1);
output A1--A1000;
end;
run;
but SAS does not generate anything at all
You can't do it via any shortcut like that. You could use the macro language to create the 1000 dataset names and 1000 output statements.
However, more than likely you shouldn't do this. Instead, have one dataset with a BY variable, and then in whatever you're going to do (MCMC or whatever) use that BY variable with the BY statement.
data want;
set have;
do byvar=1 to 1000;
var3 = int(ranuni(7)*1000+1);
output;
end;
run;
Also, don't use ranuni(0). Always use a positive seed (and save it) so you can replicate your results.
Here is the answer, hope it could help;
data want;
set have;
do dset=1 to 101;
rand=ranuni(4011120);
if dset=1 then real=1; else real=0;
output;
end;
run;
proc sort data=want;
by dset rand;
run;
data want2;
set permut;
if real=0 then rank= mod(_N_,366);
if real then realrank=rank;
run;
proc sort data=want2;
by dset dayofyear;
run;
I currently have a dataset with 200 variables. From those variables, I created 100 new variables. Now I would like to drop the original 200 variables. How can I do that?
Slightly better would be, how I can drop variables 3-200 in the new dataset.
sorry if I was vague in my question but basically I figured out I need to use --.
If my first variable is called first and my last variable is called last, I can drop all the variables inbetween with (drop= first--last);
Thanks for all the responses.
As with most SAS tasks, there are several alternatives. The easiest and safest way to drop variables from a SAS data set is with PROC SQL. Just list the variables by name, separated by a comma:
proc sql;
alter table MYSASDATA
drop name, age, address;
quit;
Altering the table with PROC SQL removes the variables from the data set in place.
Another technique is to recreate the data set using a DROP option:
data have;
set have(drop=name age address);
run;
And yet another way is using a DROP statement:
data have;
set have;
drop name age address;
run;
Lots of options - some 'safer', some less safe but easier to code. Let's imagine you have a dataset with variables ID, PLNT, and x1-x200 to start with.
data have;
id=0;
plnt=0;
array x[200];
do _t = 1 to dim(x);
x[_t]=0;
end;
run;
data want;
set have;
*... create new 100 variables ... ;
*option 1:
drop x1-x200; *this works when x1-x200 are numerically consecutive;
*option 2:
drop x1--x200; *this works when they are physically in order on the dataset -
only the first and last matter;
run;
*Or, do it this way. This would also work with SQL ALTER TABLE. This is
the safest way to do it.;
proc sql;
select name into :droplist separated by ' ' from dictionary.columns
where libname='WORK' and memname='HAVE' and name not in ('ID','PRNT');
quit;
proc datasets lib=work;
modify want;
drop &droplist.;
quit;
If all of the variables you want to drop are named so they all start the same (like old_var_1, old_var_2, ..., old_var_n), you could do this (note the colon in drop option):
data have;
set have(drop= old_var:);
run;
data want;
set have;
drop VAR1--VARx;
run;
Would love to know if you can do this by position.
Definitely works with variable names separated by double dash (--).
I have some macros that would allow this here
You could run that whole set of macros, or just run list_vars(), is_blank(), num_words, find_word, remove_word, remove_words , nth_word().
Using these it would be:
%let keep_vars = keep_this and_this also_this;
%let drop_vars = %list_vars(old_dataset);
%let drop_vars = %remove_words(&drop_vars , &keep_vars);
data new_dataset (drop = &drop_vars );
set old_dataset;
/*stuff happens*/
run;
This will keep the three variables keep_this and_this also_this but drop everything else in the old dataset.