I am working with Claims data in which we have four different encounter types: Dental, Institutional, Professional and Pharmacy. I need to develop separate reports for different organizations in SAS Proc Tabulate which show claims by these four types. Some organizations have all four types and others have only 3. But if a plan does not have one particular type, I still have to show it with missing values and a header in the report.
Say Plan A has only Institutional, Pharmacy and Professional types and no Dental claims. I want to show all the four types listed out with missing values for Dental. Can this be achieved with a Proc tabulate step? (Or will I have to modify the data set to put null values for each type?) Any help will be much appreciated.
This is the main code:
proc tabulate data=servmnth format=COMMA.0;
class plan_alias encounter_type service_Month;
var netted_claim_count;
format encounter_type $enc.;
table encounter_type= ' ' ALL='Total', sum=' '*netted_claim_count= 'Netted
Claims by Service Month'*service_Month = ' ';
where plan_alias="&plan.";
run;
It was all good until I realized some plans were straight up missing one or two encounter types. I tried putting a format for encounter type using proc format but that did not work. I am wondering if there is anything I can attach to encounter_type variable in the table statement which would show 4 rows no matter what.
The classdata= of the tabulate should be an explicit crossing of all the dimensional values you want to appear in the output. Class data can be externally enforced from lists of allowed values, or generated from the data= prior to the tabulate.
Consider some sample claim count data, with the special property that for each plan the encounter_types are restricted to a distinct set of values.
data claim_counts;
do claim_id = 1 to 1e5;
plan_id = floor(16 * ranuni(123));
if plan_id = 0 then continue;
date = mdy(ceil(12*ranuni(123)),1,2017);
claim_count = ceil(abs(12*rannor(123)));
member_id = 1e8 + plan_id * 1e6 + floor(1e6 * ranuni(123));
do until (band(plan_id,2**(encounter_type-1)));
encounter_type = ceil(4*ranuni(123));
end;
output;
end;
format date yymmd.;
run;
The class data can be constructed from different lists of allowed values
* preordained, external class data (defined dimension data);
data plans;
do plan_id = 0 to 15;
output;
end;
run;
data encounter_types;
do encounter_type = 1 to 4;
output;
end;
run;
data months;
do month = 1 to 12;
date = mdy(month,1,2017);
end;
format date yymmd.;
run;
* cross all allowed dimensional values for defined coverage;
proc sql;
create table classdata as
select plan_id, encounter_type, date
from plans, encounter_types, months;
quit;
Or constructed from the dimensional coverage observed in the data
proc sql;
create table classdata2 as
select plan_id, encounter_type, date
from (select distinct plan_id from claim_counts)
, (select distinct encounter_type from claim_counts)
, (select distinct date from claim_counts)
;
In large data sets the observed coverage is highly likely to match the defined coverage
Finally, use the classdata= option in the tabulate statement. In the following I also added plan_id to the table in the page dimension
proc tabulate data=claim_counts classdata=classdata;
class plan_id encounter_type date;
var claim_count;
table
plan_id
,
encounter_type = 'type' ALL
,
sum = ' ' * claim_count*f=comma9. * date = ' '
;
;
run;
The page dimension will allow you to create output for more than one plan_id at a time and it will identify the plan for which the table below belongs to. Plan identification could also by done using a by statement or a title statements.
Related
I have a dataset with 20 columns all starting with the name morb_, which are all 1 or 2, coded as No and Yes. There is an additional column called Pat_TNO which is the patient reference number. Patients have more than one row.
I wish to create a new dataset which summarises whether each patient has had at least one of each type of event. So far the code I have written works perfectly, but is there a way to simplify it using an array?
proc sql;
select
Pat_TNO,
max(morb_1) as morb_1 format yn.,
max(morb_2) as morb_2 format yn. /* etc etc */
from morbidity
group by Pat_TNO;
quit;
COumn names aren't morb_1 and morb_2, rather morb_amputation, morb_mi, morb_tia, etc.
proc summary data=morbidity nway missing;
class pat_tno;
output out=max max(morb_:) = ;
run;
Probably I am simply using the wrong search terms: I have a large table have with multiple columns (e.g. x, y, z) and various rows, and I intend to replace a single value within with a value I have saved in a macro variable called new_value.
Reading out the current value old_value of the respective cell is straight forward:
%let new_value = 4711;
proc sql noprint;
select z into: old_value
from have
where x = 42 and y = 21;
quit;
%put --- f(42,21) = &old_value. ---;
How do I update the column z for those cases where x=42 and y=21 with &new_value?
I would also happy with a data step if it is resasonably fast. I would like to just modify the table, and not create a new one, because the table is really huge.
References
SAS Help Center on UPDATE tells me that I cannot use a logical operator.
SAS Help Center: Updating values of a table
Override/update af variables value in proc sql using case when statement
Let's do temporary copy of SASHELP.CLASS data for performing an exercise.
data class;
set sashelp.class;
run;
Select one string value
%let new_value=90;
proc sql noprint;
select weight into :old_value from class where name='Thomas' and Age=11;
quit;
%put --- f(Tomas,11) = &old_value. ---;
Update in SQL is pretty simple, you have got conditions:
proc sql;
update class set weight=&new_value where name='Thomas' and Age=11;
quit;
PROC SQL could be used for update more than one table via VIEW, but it's another question.
In the data step you can use transactions (but data sorting is required, which is not free). Let work with initial copy of CLASS dataset:
proc sort data=class;
by name sex age;
run;
Prepare example transaction data (could be more then one records):
data transaction;
set class;
where name='Thomas';
weight=&new_value;
run;
proc sort data=transaction;
by name sex age;
run;
And make update:
data class;
update class transaction;
by name sex age;
run;
Keys name, sex and age here are just for example.
I have to join 2 tables on a key (say XYZ). I have to update one single column in table A using a coalesce function. Coalesce(a.status_cd, b.status_cd).
TABLE A:
contains some 100 columns. KEY Columns ABC.
TABLE B:
Contains just 2 columns. KEY Column ABC and status_cd
TABLE A, which I use in this left join query is having more than 100 columns. Is there a way to use a.* followed by this coalesce function in my PROC SQL without creating a new column from the PROC SQL; CREATE TABLE AS ... step?
Thanks in advance.
You can take advantage of dataset options to make it so you can use wildcards in the select statement. Note that the order of the columns could change doing this.
proc sql ;
create table want as
select a.*
, coalesce(a.old_status,b.status_cd) as status_cd
from tableA(rename=(status_cd=old_status)) a
left join tableB b
on a.abc = b.abc
;
quit;
I eventually found a fairly simple way of doing this in proc sql after working through several more complex approaches:
proc sql noprint;
update master a
set status_cd= coalesce(status_cd,
(select status_cd
from transaction b
where a.key= b.key))
where exists (select 1
from transaction b
where a.ABC = b.ABC);
quit;
This will update just the one column you're interested in and will only update it for rows with key values that match in the transaction dataset.
Earlier attempts:
The most obvious bit of more general SQL syntax would seem to be the update...set...from...where pattern as used in the top few answers to this question. However, this syntax is not currently supported - the documentation for the SQL update statement only allows for a where clause, not a from clause.
If you are running a pass-through query to another database that does support this syntax, it might still be a viable option.
Alternatively, there is a way to do this within SAS via a data step, provided that the master dataset is indexed on your key variable:
/*Create indexed master dataset with some missing values*/
data master(index = (name));
set sashelp.class;
if _n_ <= 5 then call missing(weight);
run;
/*Create transaction dataset with some missing values*/
data transaction;
set sashelp.class(obs = 10 keep = name weight);
if _n_ > 5 then call missing(weight);
run;
data master;
set transaction;
t_weight = weight;
modify master key = name;
if _IORC_ = 0 then do;
weight = coalesce(weight, t_weight);
replace;
end;
/*Suppress log messages if there are key values in transaction but not master*/
else _ERROR_ = 0;
run;
A standard warning relating to the the modify statement: if this data step is interrupted then the master dataset may be irreparably damaged, so make sure you have a backup first.
In this case I've assumed that the key variable is unique - a slightly more complex data step is needed if it isn't.
Another way to work around the lack of a from clause in the proc sql update statement would be to set up a format merge, e.g.
data v_format_def /view = v_format_def;
set transaction(rename = (name = start weight = label));
retain fmtname 'key' type 'i';
end = start;
run;
proc format cntlin = v_format_def; run;
proc sql noprint;
update master
set weight = coalesce(weight,input(name,key.))
where master.name in (select name from transaction);
run;
In this scenario I've used type = 'i' in the format definition to create a numeric informat, which proc sql uses convert the character variable name to the numeric variable weight. Depending on whether your key and status_cd columns are character or numeric you may need to do this slightly differently.
This approach effectively loads the entire transaction dataset into memory when using the format, which might be a problem if you have a very large transaction dataset. The data step approach should hardly use any memory as it only has to load 1 row at a time.
I have a table in SAS which consists of data from stock exchange. One of its columns holds information about date. I would like to create subtables, each one hold data from only one specific year.
Assuming you want to do this (often, this is an inferior option as analyses run separately by year can be run from one dataset using by year;, but certainly this can sometimes be appropriate), the gold standard method for doing this is the hash table, as the hash table can produce unlimited tables based on the data. I will edit in a method for doing this with hash table if I have time while running things this afternoon; it's the 'hashing' method described on this page.
Hashing code, adapted from the sascommunity.org page above:
data have;
call streaminit(7);
do year=1998 to 2014;
do id= 1 to 10;
x=rand('Uniform');
output;
end;
end;
run;
data _null_ ;
dcl hash byyear () ;
byyear.definekey ('k') ; if `id` or similar is a safe unique ID you could use that here, otherwise `k` is your unique identifier - hash requires unique;
byyear.definedata ('year','id','x') ;
byyear.definedone () ;
do k = 1 by 1 until ( last.year ) ;
set have;
by year ;
byyear.add () ;
end ;
dsetname=cats('year',year);
byyear.output (dataset: dsetname) ;
run ;
There is a similar set of methods that revolve around using a macro to generate the code. This paper goes into detail about one method to do that; I won't explain it in detail as I consider it inferior to the hash method (even if it is lower CPU time, it is more complicated to write than either a pure macro method or a pure hash method) but in certain cases it could be better.
A simple example of the macro method using the conceptual have aframe defined:
proc sql;
select distinct(cats('year',year(date))) into :dsetlist
separated by ' '
from have;
select distinct(cats('%outputyear(year=',year(date),')')) into :outputlist
separated by ' '
from have;
quit;
%macro outpuyear(year=);
if year(date)=&year. then output year&dset.;
%mend outputyear;
data &dsetlist.;
set have;
&outputlist.;
run;
data year1 year2 year3 yearN;
set stockdata;
if year(date) = 2014 then
output year1;
else if year(date) = 2013 then
output year2;
else if year(date) = 2012 then
output year3;
else
output yearN;
run;
You could also use case statements I guess.
I've been trying to make my code more efficient and this is the original code, but I think it can be written in one step.
data TABLE;set ORIGINAL_DATA;
Multi=percent*total_units;
keep Multi Type;
proc sort; by Type;
proc means noprint data=TABLE1; by Type; var Multi;output out=Table2(drop= _type_ _freq_)sum=Multi;run;
proc means noprint data=Table1; var Multi;output out=Table3(drop= _type_ _freq_) sum=total ;run;
proc sql;
create table TABLE4as
select a.Type, a.Multi label="Multi", b.total label="total"
from TABLE2 a, TABLE3 b
order by Type;
quit;
data TABLE5;set TABLE4;
pct=(MULTI/total)*100;
run;
I am able to split up part of it, but I can't figure out how to get the PCT part in my code. This is what I have.
proc sql;
create table TABLE1 as
select distinct type, sum(percent*total_units) as MULTI label "MULTI",
MULTI/(percent*total_units)) as PCT
from ORIGINAL_DATA
group by type;
quit;
I had to edit some of the code but I think the general idea should make sense.
The main problem is I cannot call upon the MULTI column because it is just being created but I want to create a percentage of the total for each type.
The "SAS" way to do something like this is to use a CLASS statement with PROC MEANS. That will calculate statistics on all the interaction levels in the data (identified by the TYPE variable). The row where TYPE=0 will be the "total" value, representing the value of that statistic for the entire data set.
In your case, we can take advantage of the fact that PROC MEANS will create the output data set sorted by TYPE and by the variables listed in the CLASS statement. That means we can just read the first observation and save it's value for calculating percentages.
It's probably easier to just show some code:
data TABLE;
set ORIGINAL_DATA;
Multi = percent * total_units;
keep Multi Type;
run;
proc means noprint data=TABLE;
class Type;
var multi;
output out=next sum=;
run;
data want;
retain total;
set next;
if _n_ = 1 then do;
/* The first obs will be the _TYPE_=0 record */
total = multi;
delete;
end;
pct = (multi / total) * 100;
drop total _freq_ _type_;
run;
Notice that you do not need to sort the data before using PROC MEANS. That's because we are using a CLASS statement rather than a BY statement. The data step is using the first observation in the data set created by MEANS (the TYPE=0 record) to retain the total sum of your variable. The delete statement keeps it out of the result.
CLASS statements with PROC MEANS are very useful. Take a few minutes to read up on how the TYPE variable is calculated, especially if you try using more than one class variable.
You can skip the initial data step by using the WEIGHT option in VAR statement of PROC MEANS (this will effectively do the multiplication for you). You can also use PROC TABULATE instead of PROC MEANS, as tabulate can calculate the percentage. I believe the following code will produce your required output in one go.
ods noresults;
proc tabulate data=have out=want (drop=_: rename=(total_units_sum=total total_units_pctsum_0=pct));
class type;
var total_units / weight=percent;
table type, total_units*(sum pctsum);
run;
ods results;
If you need one step, maybe this will work, but it's not actually efficient, since it processes data twice, once for detail by TYPE, once for total.
proc sql;
create table TABLE1 as
select
d.type
, sum(d.percent*d.total_units) as MULTI label "MULTI"
, calculated MULTI/s.total as PCT
from ORIGINAL_DATA d,
( select sum(percent*total_units) as total
from ORIGINAL_DATA) s
group by type
;
quit;
For more efficiency, but in more than one steps you could simply replace tables withe views in your original code:
data TABLE; => data TABLE / view=TABLE;
create table TABLE4 => create view TABLE4