I have data set with probabilities to purchase a particular product per observation. Here is an example:
DATA probabilities;
INPUT id P_prod1 P_prod2 P_prod3 ;
DATALINES;
1 0.02 0.5 0.32
2 0.6 0.08 0.12
3 0.8 0.34 0.001
;
I need to calculate the median for each product. Here's how I do that:
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type _freq_) median=median;
run;
%mend;
At this point I can get the median for each product by calling
%get_median(P_product1);
Now, the last thing that I want to do is to assign the numeric result for the median to a macro variable. My best guess for how to do that would be something like:
%let med_P_prod1=%get_median(P_prod1);
but unfortunately that does not work.
Can someone help, please?
Cheers!
The simplest solution is to define a %global macro variable and set the let statement to the numeric result inside the macro.
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type _freq_) median=median;
run;
%global macroresult;
proc sql;
select median into :macroresult separated by ' ' from median_data;
quit;
%mend;
(That SQL statement is equivalent to LET in that it defines a macro variable, but it is better at getting results from data.)
I'd also recommend just using the dataset in your code rather than putting the value in a macro variable.
Related
I want to produce a macro that converts the variable names of a dataset into the variables' labels. I intend to apply this macro for large datasets where manually changing the variable names would be impractical.
I came across this code online from the SAS website, which looked promising but produced errors. I made slight edits to remove some errors. It now works for their sample dataset but not mine. Any assistance with improving this code to work with my sample dataset would be greatly appreciated!
SAS sample dataset (works with code):
data t1;
label x='this_x' y='that_y';
do x=1,2;
do y=3,4;
z=100;
output;
end;
end;
run;
My sample dataset (does not work with code):
data t1;
input number group;
label number = number_lab group = group_lab;
datalines;
1 1
1 .
2 1
2 .
3 2
3 .
4 1
4 .
5 2
5 .
6 1
6 .
;
run;
Code:
%macro chge(dsn);
%let dsid=%sysfunc(open(&dsn));
%let cnt=%sysfunc(attrn(&dsid,nvars));
%do i= 1 %to &cnt;
%let var&i=%sysfunc(varname(&dsid,&i));
%let lab&i=%sysfunc(varlabel(&dsid,&i));
%if lab&i = %then %let lab&i=&&var&i;
%end;
%let rc=%sysfunc(close(&dsid));
proc datasets;
modify &dsn;
rename
%do j = 1 %to &cnt;
%if &&var&j ne &&lab&j %then %do;
&&var&j=&&lab&j
%end;
%end;
quit;
run;
%mend chge;
%chge(t1)
proc contents;
run;
My code produces the following error messages:
ERROR 73-322: Expecting an =.
ERROR 76-322: Syntax error, statement will be ignored.
Mainly you are not closing the RENAME statement with a semi-colon. But it also looks like you have the RUN and QUIT statements in the wrong order.
But note that there is no need for that complex %sysfunc() macro code to get the list of names and labels. Since you are already generating a PROC DATASETS step your macro can generate other SAS code also. Then your macro will be clearer and easier to debug.
%macro chge(dsn);
%local rename ;
proc contents data=&dsn noprint out=__cont; run;
proc sql noprint ;
select catx('=',nliteral(name),nliteral(label))
into :rename separated by ' '
from __cont
where name ne label and not missing(label)
;
quit;
%if (&sqlobs) %then %do;
proc datasets nolist;
modify &dsn;
rename &rename ;
run;
quit;
%end;
%mend chge;
If the list of rename pairs is too long to fit into a single macro variable then you could resort to generating two series of macro variables using PROC SQL and then add back your %DO loop.
Here is SAS log from testing on your sample file.
4156 %chge(t1);
MPRINT(CHGE): proc contents data=t1 noprint out=__cont;
MPRINT(CHGE): run;
NOTE: The data set WORK.__CONT has 2 observations and 41 variables.
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.08 seconds
cpu time 0.01 seconds
MPRINT(CHGE): proc sql noprint ;
MPRINT(CHGE): select catx('=',nliteral(name),nliteral(label)) into :rename
separated by ' ' from __cont where name ne label and not missing(label) ;
MPRINT(CHGE): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
MPRINT(CHGE): proc datasets nolist;
MPRINT(CHGE): modify t1;
MPRINT(CHGE): rename group=group_lab number=number_lab ;
NOTE: Renaming variable group to group_lab.
NOTE: Renaming variable number to number_lab.
MPRINT(CHGE): run;
NOTE: MODIFY was successful for WORK.T1.DATA.
MPRINT(CHGE): quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.12 seconds
cpu time 0.00 seconds
Note that if I now try to run it again on the modified dataset it does not rename anything.
4157 %chge(t1);
MPRINT(CHGE): proc contents data=t1 noprint out=__cont;
MPRINT(CHGE): run;
NOTE: The data set WORK.__CONT has 2 observations and 41 variables.
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MPRINT(CHGE): proc sql noprint ;
MPRINT(CHGE): select catx('=',nliteral(name),nliteral(label)) into :rename
separated by ' ' from __cont where name ne label and not missing(label) ;
NOTE: No rows were selected.
MPRINT(CHGE): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.08 seconds
cpu time 0.00 seconds
you're missing a semi-colon here:
&&var&j=&&lab&j
if you still get an error, turn on these options and let us know where the error occurs.
options symbolgen mprint;
I have 10 different variables in 10 different tables with the VARNAME and MISSING PERCENT.
Out of these 10, lets say 5 do not have the "MISSING PERCENT" and I want to include these observation with 0% Missing. For now, it eliminates this observation in the final output.
data Final_Output_All_Missing;
length VARNAME $ 30;
merge work.Final_Output_MOLD work.Final_output_tbm_stage2
work.final_output_article7
work.final_output_tbm_stage1 work.final_output_bladder
work.final_output_batch_id;
by varname;
keep VARNAME PERCENT;
run;
VARNAME MISSING PERCENT
BLADDER 0.10
MOLD 0.06
TBM_STAGE1 0.18
TBM_STAGE2 99.9
Secondly, I have already merged the different tables containing different variables(0% still needs to be merged) as shown below:
After merging, I want to see the output in this format. is it possible for me to get in this format?
BLADDER MOLD TBM_STAGE1 TBM_STAGE2
1. 0.10% 0.06% 0.18% 99.9%
Appreciate your help!
The Code below will:
Replace missing values with 0 (I added missing values in the data)
Adds Percent format xx.xx%
Transpose the data
Code:
/*Create input data*/
data have;
informat VARNAME $10.;
input VARNAME $ MISSING_PERCENT ;
length VARNAME $10;
datalines;
BLADDER 0.10
MOLD 0.06
TBM_STAGE1 0.18
TBM_STAGE2 99.9
TBM_STAGE3 .
TBM_STAGE4 .
;;;;;;
run;
/*Replace missing values with 0, and Add % format */
proc sql;
create table work.new as
select
VARNAME,
coalesce(MISSING_PERCENT,0)/100 as MISSING_PERCENT format=percent8.2
from work.have;
quit;
/*Transpose Data*/
proc transpose data=work.new
out=work.want name=VARNAME;
id VARNAME;
run;
Input:
Formatted:
Output:
Thank you for the answer. It is perfectly giving what I want.
But, I do have a list of 160 Variables and I think to hard code it, will take more effort. What I did is I excluded the Variables which have 0% Missing Percent and extracted the ones which have missing percent.
This is how I did it to extract the Missing Percent:
> data Final_Output_&var;
set Final_Output_&var;
VARNAME = "&var";
%if &t=char %then
%do;
where put (&var., $missfmt.) in ("Missing");
%end;
%else
%do;
where put (&var., missfmt.) in ("Missing");
%end;
run;
Thanks again for your quick reply !!!
Best Regards,
Pankaj
I want to calculate the 95th percentile of a distribution. I think I cannot use proc means because I need the value, while the output of proc means is a table. I have to use the percentile to filter the dataset and create another dataset with only the observations greater than the percentile.
Clearly I don't want to use the numeric value..because I want to use it in a macro.
Don't put summary statistics into macro variables. You risk loss of precision.
This is based on your cryptic description of the problem.
proc means...
output out=pct95 pct95=
run;
data subset;
if _n_ eq 1 then set pct95;
set data;
if value < pct95;
run;
You can suppress proc means from outputting your results in a new tab using the noprint option. Try this:
proc means data = your_data noprint;
var variable_name;
output out = your_data2 p95= / autoname;
run;
Previous Posts :
Variable check and summary out
Macro that outputs table with testing results of SAS table
Question/Problem
From the previous posts, I thought I was able to run the macro and produce the desired results. However, after finally getting a report back that the output is not working I'm really confused as to why I'm getting the error that there were missing variables. It appears as if the data set is not being loaded after sub-setting. I'm able to process basic summary statistic tables, but when I load the macro the output is not working.
Why is the data set not loading? Does a macro require a certain type of data set?
Note : A limitation is that I do not have access to the data set, so I must send code to be run and won't get results for a few days. It's a very long and frustrating process, but I'm sure some can relate.
The code that is causing problems is the macro (in beginning of code) and the very last section which calls the macro with the data set.
Error Log :
Code :
# Filename : Census2007_Hawaii_BearingCoffee_BigIsland.sas
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend mymacro;
/***********************************************************************/
*Define file path statics;
Libname def 'P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets';
Libname abc "P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets";
option obs=max;
/* Initialize database */
DATA def.Census2007_Hawaii_Coffee;
SET abc.census2007_hawaii_SubSet_Coffee;
**<create the variables used in the macro> **;
RUN;
/* Clearance Test Results */
%clearance_test(input_dataset=def.census2007_hawaii_SubSet_Coffee, output_dataset=test_data ,variable_to_consider= OIR OIRO ROA ROAO SProfit
LProfit SProfitAcre LProfitAcre Profitable MachineandRent UtilityandFuel LaborH LaborO FertilizerandChem MaintandCustom
Interest Tax Dep Others TFPE_cal operators workers operatorsandworkers)
A Complete/Verifiable Example :
This has been tested on the remote machine and works perfectly.
/* Create test data set*/
data business_data;
do firm = 1 to 3;
revenue = rand("uniform");
costs = rand("uniform");
profits = rand("uniform");
vcost = rand("uniform");
output;
end;
run;
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend mymacro;
/* Print summary table, run macro, and print clearance test table */
PROC MEANS data = business_data n sum mean median std;
VAR revenue costs profits vcost;
RUN;
%clearance_test(input_dataset=business_data, output_dataset=test_data ,
variable_to_consider=revenue costs profits vcost)
proc print data = test_data; run;
This is where a minimal, complete verifiable example (MCVE) would be helpful for testing whether your problem is a problem with the code, or the data.
Here's the code above, but with a SASHELP dataset (those are built-in to SAS so everyone has them).
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend clearance_test;
%clearance_test(input_dataset=sashelp.cars, output_dataset=work.test, variable_to_consider=mpg_city mpg_highway);
That's the exact macro, just using a different input dataset. It works correctly on my machine (the flag variables are meaningless since the data isn't right for them, but the code works).
Run the same on your colleague's machine, and if it runs, then you know the data is the problem (ie, the dataset doesn't have the variables you think it does). If it doesn't run, then you have some other problem (perhaps an issue with how it's being submitted, maybe you end up with spurious characters or something).
I'm trying to do a loop over a liststo perform ranks for several variables.
And then I do the loop with:
options mprint;
%macro ranks(listado);
%let count=%sysfunc(countw(&listado));/*counw= count words in a string*/
%do i=1 %to &count;
%put 'count' &count;
%let vari=%qscan(&listado,&i,%str(,));
%put 'vari' &vari;
proc rank data=labo2.J_tabla_modelo groups=10 out=labo2.tmp;
var &vari.;
ranks rk_&vari.;
run;
%end;
%mend;
%ranks(%str(G_MERGE6_t1_monto6,A_CLI_monto_sucursal_1,A_CLI_monto_sucursal_2,
A_CLI_monto_sucursal_3, A_CLI_monto_sucursal_4,A_M_0705_monto));
I get the following error:
ERROR: Number of VAR statement variables was not equal to the number of RANKS statement variables.
Don't know how to solve it. Because if I run the code written by the macro works.
Thanks!
First off, don't do the macro loop this way. It's messy. Generate a list of macro calls.
Second, you're not really wanting to do that either, here. What you need to generate is this:
proc rank data=whatever out=whatever;
var v1 v2 v3 v4;
ranks r_v1 r_v2 r_v3 r_v4;
run;
What you are generating is a bunch of different PROC RANKs, which isn't ideal.
How I'd do it:
data my_vars;
length vari $32;
input vari $;
datalines;
G_MERGE6_t1_monto6
A_CLI_monto_sucursal_1
A_CLI_monto_sucursal_2
A_CLI_monto_sucursal_3
A_CLI_monto_sucursal_4
A_M_0705_monto
;;;;
run;
proc sql;
select cats('r_',vari)
into :ranklist separated by ' '
from my_vars;
select vari
into :varlist separated by ' '
from my_vars;
quit;
proc rank data=whatever out=whatever groups=10;
var &varlist;
rank &ranklist;
run;
If you actually do need the individual PROC RANK calls, then you need to figure out how to deal with the output, and then do a similar approach.
(Also, odds are that first datastep isn't needed- you probably have that data somewhere, like in dictionary.columns).