Do not want columns with null value in Proc Report - sas

I am trying to create a report like this one:
But I am getting a report like this:
I am using PROC REPORT to do this. I don't want the columns in which all the values are zero. AFG, ALL, CapMkt, IIS and PWMDIv are the sublevels of CIB and CRE.
The code I used is:
proc report data=CIB_PWM nowd;
column OWNER_NAME OWNER_EMP_NBR RM_Division RM_Region Owner_Team
Owner_YTD_Rev Owner_Prior_YTD_Rev Owner_Rolling_12_Rev
Attd_Rpt_LOB,(Attd_Rpt_SubLOB, (N));
define Attd_Rpt_LOB / Across ' ' ;
define Attd_Rpt_SubLOB / Across ' ';
define N / ' ';
define OWNER_NAME / group 'OWNER_NAME' format=$488. /*missing order=formatted*/;
define OWNER_EMP_NBR / group 'OWNER_EMP_NBR' format=$80. missing order=formatted;
define RM_Division / group 'RM_Division' missing;
define RM_Region / group 'RM_Region' missing;
define Owner_Team / group 'OWNER_HO_MKT_DEPT' format=$324. missing order=formatted;
define Owner_YTD_Rev / analysis SUM 'Owner_YTD_Rev' format=DOLLAR15. missing;
define Owner_Prior_YTD_Rev / analysis SUM 'Owner_Prior_YTD_Rev' format=DOLLAR15. missing;
define Owner_Rolling_12_Rev / analysis SUM 'Owner_Rolling_12_Rev' format=DOLLAR15. missing;
run;
quit;
Do you think I need to add any more options for this?
Your help will be very much appreciated.
Thank you

Shankar:
You probably figured something out by now.
The 'want' image shows columns with all zeroes, so it doesn't align with the statement "I don't want the columns which has All the values zero."
Perhaps you can post some sample data demonstrating the problem.
Without seeing more, I would suggest creating one or more pre-report steps that look for sets of rows contributing to 'not to be reported' cases and mark them with an exclude_flag. The report step would then be able to use the statement
where not exclude_flag;
If this extra processing still does not produce the desired report, you may need to perform a data transpose step and write a different PROC REPORT step.

Related

SAS set statement using colon and creating a filename variable

So using SAS, I have a number of SAS monthend datasets named as follows:
mydata_201501
mydata_201602
mydata_201603
mydata_201604
mydata_201605
...
mydata_201612
Each has account information at a particular monthend. I want to stack the datasets into one dataset using the colon wildcard rather than writing out the full set statement, as follows:
data mynewdata;
set mydata_:;
run;
However, there is no datestamp variable within the datasets, so when I stack them I will lose the monthend information for each account. I want to know which row refers to which monthend for each account. Is there a way to automatically create a variable that names the table each row comes from? For example, the long-winded way would be this:
data mynewdata;
set mydata_201501 (in=a) mydata_201502 (in=b) mydata_201503 (in=c)...;
if a then tablename = 'mydata_201501';
if b then tablename = 'mydata_201502';
if c...
run;
but is there a quicker way using colon along these lines?
data mynewdata;
set mydata_:;
tablename = _tablelabel_;
run;
thanks
I always find clicking on comment links annoying, so hopefully here's the answer in your context. Use the INDSNAME= SET statement option to assign the dataset name to a variable:
data mynewdata;
set mydata_: indsname=_tablelabel_;
tablename = _tablelabel_;
run;
N.B. you can call _tablelabel_ whatever you want, and you may wish to change it so it doesn't look like a SAS generated variable name.
INDSNAME= only became a SAS SET statement option in version 9.2
Just to be clear, with my particular code, where the datasets were named mydata_yyyymm and I wanted a monthend variable with a datestamp, I was able to produce this using the solution provided by mjsqu as follows (the OBS= and KEEP= dataset options are included in case they are required):
data mynewdata;
set mydata_: (obs=100 keep=xxx xxx) indsname=_tablelabel_;
format monthend yymmdd10.;
monthend = input(scan(_tablelabel_,-1,'_'),yymmn6.);
run;
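One detail worth noting about the step above: the YYMMN6. informat returns the first day of the month, and the INDSNAME= variable holds the two-level name (for example WORK.MYDATA_201501). A minimal sketch, assuming you want the actual last day of the month, wraps the INPUT call in INTNX. The two small input datasets and their variables (account, balance) are made up for illustration, and _source_ is just an arbitrary name for the INDSNAME= variable.

```sas
/* Made-up sample inputs; real datasets would already exist */
data mydata_201501; account = 1; balance = 100; run;
data mydata_201502; account = 1; balance = 120; run;

data mynewdata;
  /* _source_ is temporary and is not written to the output dataset */
  set mydata_: indsname=_source_;
  length tablename $41;
  format monthend yymmdd10.;
  /* Keep the full two-level name, e.g. WORK.MYDATA_201501 */
  tablename = _source_;
  /* Text after the last underscore is yyyymm; shift to the month's last day */
  monthend = intnx('month', input(scan(_source_, -1, '_'), yymmn6.), 0, 'end');
run;
```

With these inputs, the row from mydata_201501 should get monthend = 2015-01-31 rather than 2015-01-01.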

Assigning index to two concatenated tables in SAS?

I have two tables with exactly the same column headers and one row each. I have code to concatenate them, which works fine.
data concatenation;
set CURR_CURR CURR_30;
run;
However, there is no index in the output to say which row corresponds to which table.
I've tried using 'create index' and 'index create' already, but they don't work syntactically. I simply want to add a column of strings and move it in front of all the other columns in the data set.
Use the INDSNAME= option on the SET statement, plus a variable to store the information.
If you put the LENGTH statement ahead of your SET statement, the new variable will be created as the first column.
Just a note that this isn't the same as an 'index'. An index in SAS has a different meaning, which isn't what you're trying to create here.
data concatenation;
length dset source $50;
set CURR_CURR CURR_30 indsname=source;
dset=source;
run;
Reeza's answer is very similar to something I figured out that worked as well. Here's my version as an alternative.
data concatenation;
length id $ 10;
set CURR_CURR (in=a) CURR_30 (in=b);
if a then id = 'curr_curr';
else if b then id = 'curr_30';
run;

How to collapse data while retaining other variables?

I am trying to collapse my data using proc sql. However, I noticed that when I tried to collapse my data I lost a bunch of variables that I wanted to keep. I am trying to collapse my data based on the variable MRN (which is numeric). The other variables I want to keep are CITY and SITE (these are character values), and these are constant for each unique MRN, so collapsing them should be fine.
Here is the code I am using
proc sql;
create table collapsed_data as
select distinct mrn,
sum(msk_tx_yes) as msk_tx_yes,
sum(msk_cancel_tx_yes) as msk_cancel_tx_yes,
sum(msk_ca_yes) as msk_ca_yes,
sum(msk_cancel_ca_yes) as msk_cancel_ca_yes,
sum(msk_dc_yes) as msk_dc_yes,
sum(conc_psych_tx_yes) as conc_psych_tx_yes,
sum(conc_psych_ca_yes) as conc_psych_ca_yes,
sum (conc_psych_dc_yes) as conc_psych_dc_yes,
sum (conc_yes) as conc_yes,
sum (psych_yes) as psych_yes,
sum (foot_prog) as foot_prog,
sum (hand_prog) as hand_prog,
sum (surg_prog) as surg_prog,
sum (sx_yes) as sx_yes
from temp_collapsed_data
group by mrn;
quit;
I'm not sure how to use the SELECT and DISTINCT functions together.
I thought maybe I could add the variables CITY and STATE after SELECT, while keeping DISTINCT, but it doesn't seem to work.
I want to be able to keep CITY and STATE in the new table along with the new summed variables I am making. How can I achieve this without turning CITY and STATE into dummy coded variables? I would like to keep them as character values if possible.
Anyone know how I can achieve this?
Your code is already correct. Just add the variables to the SELECT statement.
proc sql;
create table collapsed_data as
select distinct mrn, city, site,
sum(msk_tx_yes) as msk_tx_yes,
sum(msk_cancel_tx_yes) as msk_cancel_tx_yes,
sum(msk_ca_yes) as msk_ca_yes,
sum(msk_cancel_ca_yes) as msk_cancel_ca_yes,
sum(msk_dc_yes) as msk_dc_yes,
sum(conc_psych_tx_yes) as conc_psych_tx_yes,
sum(conc_psych_ca_yes) as conc_psych_ca_yes,
sum (conc_psych_dc_yes) as conc_psych_dc_yes,
sum (conc_yes) as conc_yes,
sum (psych_yes) as psych_yes,
sum (foot_prog) as foot_prog,
sum (hand_prog) as hand_prog,
sum (surg_prog) as surg_prog,
sum (sx_yes) as sx_yes
from temp_collapsed_data
group by mrn;
quit;
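The DISTINCT keyword ensures that the result does not contain two rows with the same information. Because CITY and SITE are not in the GROUP BY clause, PROC SQL remerges the sums back onto every input row (and says so in the log); DISTINCT then collapses those identical rows to one per MRN.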
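As an alternative sketch that does not depend on DISTINCT, you can list every non-aggregated column in the GROUP BY clause. Since CITY and SITE are constant within each MRN, this gives the same one-row-per-MRN result. The sample rows below are made up, and only two of the summed variables are shown to keep it short.

```sas
/* Made-up sample data: two rows for MRN 1, one row for MRN 2 */
data temp_collapsed_data;
  input mrn city $ site $ msk_tx_yes msk_ca_yes;
  datalines;
1 Boston A 1 0
1 Boston A 1 1
2 Denver B 0 1
;
run;

proc sql;
  create table collapsed_data as
  select mrn, city, site,
         sum(msk_tx_yes) as msk_tx_yes,
         sum(msk_ca_yes) as msk_ca_yes
  from temp_collapsed_data
  /* Grouping by all character columns keeps them without a remerge */
  group by mrn, city, site;
quit;
```

If CITY or SITE ever differs within an MRN, this version returns one row per distinct combination, which makes the inconsistency easy to spot.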

How can I write a DATA step that will drop all variables from the input dataset except the ones that I explicitly define within the dataset?

I want to generate a new SAS dataset using table foo as the input, with a one-to-one correspondence between its records and those of the output dataset bar. I want to drop variables from foo by default, but I also require that all of the fields of foo be available (to derive new variables) and that some variables from foo be kept (if explicitly indicated).
I'm currently managing an explicit list of variables to drop= but it results in long and unwieldy syntax in the data-set-option declaration.*
DATA bar (drop=id data_value2);
set foo;
new_id = id;
data_value1 = data_value1; /* Explicitly included for clarity */
new_derived_data_value = data_value2 * 2; /* etc. */
format new_id $fmt_id.
data_value1 $fmt_dat.
new_derived_data_value $fmt_ddat.
;
RUN;
The output table I want should have only the fields new_id, data_value1 and new_derived_data_value.
I'm looking for the most syntactically succinct way of reproducing the same effect as :
SELECT
id AS new_id
,data_value1
,data_value2 * 2 AS new_derived_data_value
FROM foo
How can I write a DATA step that will drop all variables from the input dataset except the ones that I explicitly define within the dataset?
* Update: I could use aaa--hhh type notation, but even this can be unwieldy if the ordering of the variables changes over time or I later decide I'd like to keep variable ddd.
I would store the variable names in a macro list, obtained from the DICTIONARY tables. You can then drop them all easily in a data step. e.g.
proc sql noprint;
select name into :vars separated by ' '
from dictionary.columns
where libname = 'SASHELP' and memname='CLASS';
quit;
data want (drop=&vars.);
set sashelp.class;
name1=name;
age1=age;
run;
Keith's solution is the best production solution, but a quick alternative assuming you know the first and last variables in the dataset:
data want;
set sashelp.class;
drop name--weight;
name1=name;
age1=age;
run;
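Another option, if the list of variables you want to keep is shorter than the list you want to drop, is the KEEP= dataset option on the output dataset. It is applied only when rows are written, so every variable from the input is still available inside the step. A minimal sketch using the names from the question follows; the contents of foo, including other_field, are made up for illustration.

```sas
/* Made-up input table with one extra field that should not survive */
data foo;
  id = 'A1';
  data_value1 = 'x';
  data_value2 = 21;
  other_field = 'unused';
run;

/* KEEP= on the output: only the listed variables are written to bar */
data bar (keep=new_id data_value1 new_derived_data_value);
  set foo;
  new_id = id;
  /* data_value2 is still readable here even though it is not kept */
  new_derived_data_value = data_value2 * 2;
run;
```

This stays correct if variables are later added to foo or reordered, since anything not named in KEEP= is dropped automatically.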

SAS - code to dynamically count columns and sum each of them

I’m wondering if someone can help with a coding problem I have.
Background: I have a project that imports some files and uses the data in those files to perform projections. The contents of the files determine some aspects of the size of the output that follows. Put simply, values in the loaded data drive the size and shape of the tables that follow, and this can vary.
The following code is an example of the problem.
The data loaded will have a variable year start (note wf2009, 2009 is the first year) and variable range (this example goes from 2009 to 2030, but this will vary too).
proc summary data= labeled_proj_data_hc;
class jurisdiction specialty measure;
types jurisdiction*specialty*measure;
VAR wf2009--wf2030;
output out= sum_labeled_proj_data_hc
sum(wf2009) = y2009
sum(wf2010) = y2010
sum(wf2011) = y2011
sum(wf2012) = y2012;
run;
Where I’m not sure how to proceed is:
sum(wf2009) = y2009
sum(wf2010) = y2010
sum(wf2011) = y2011
sum(wf2012) = y2012;
In the sequence of lines calling for the sum of their respective columns, how can I make this dynamic so that the start year is populated from a variable and it increments yearly until the last year, which is also variable?
Has anyone solved a similar problem?
Cheers,
Is renaming the variables necessary? If not then you can use the : wildcard operator to access all variables that begin with 'wf', then just put SUM= in the output statement, which will preserve the original names.
So your proc summary would look like this.
proc summary data= labeled_proj_data_hc;
class jurisdiction specialty measure;
types jurisdiction*specialty*measure;
VAR wf: ;
output out= sum_labeled_proj_data_hc
sum=;
run;
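If the renaming is necessary, here is a rough sketch that drives both sides of the SUM specification from two macro variables. It assumes the columns are named wf<year> for every year in the range with no gaps, and that the OUTPUT statement accepts a numbered range list for the output names as well as for the analysis variables (it does as far as I know, but check your log). The macro variable names first_year and last_year and the sample rows are made up.

```sas
/* Hypothetical year range; in practice these could be derived from the data */
%let first_year = 2009;
%let last_year  = 2012;

/* Small made-up input with one wf column per year */
data labeled_proj_data_hc;
  input jurisdiction $ specialty $ measure $ wf2009 wf2010 wf2011 wf2012;
  datalines;
NSW GP HC 10 11 12 13
NSW GP HC 1 2 3 4
VIC GP HC 5 6 7 8
;
run;

proc summary data=labeled_proj_data_hc;
  class jurisdiction specialty measure;
  types jurisdiction*specialty*measure;
  /* Numbered range list: wf2009, wf2010, ..., up to the last year */
  var wf&first_year-wf&last_year;
  output out=sum_labeled_proj_data_hc
    /* Sum each wf<year> column into the matching y<year> column */
    sum(wf&first_year-wf&last_year) = y&first_year-y&last_year;
run;
```

Only the two %let lines need to change when the year range changes.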