Hi, I would like to create tables dynamically from the following sample data: four new data sets, one for each PAYEE_ID (522, 622, 743, and 888). I want all of the fields in each of the four new data sets, but only the rows with the top 3 AMT_BILLED values for each PAYEE_ID.
PAYEE_ID PAYEENAME MSG_CODE MSG_DESCRIPTION AMT_BILLED percentbilled claimscounts PercentLines TotalAmount TotNumofClaims
522 MakeBelieve Center 1 AA text field 1 10000 4% 50 16% 275000 305
522 MakeBelieve Center 1 BB text field 2 20000 7% 40 13% 275000 305
522 MakeBelieve Center 1 6N text field 3 30000 11% 30 10% 275000 305
522 MakeBelieve Center 1 5U text field 4 25000 9% 20 7% 275000 305
522 MakeBelieve Center 1 1F text field 5 90000 33% 100 33% 275000 305
522 MakeBelieve Center 1 2E text field 6 100000 36% 65 21% 275000 305
622 Invisible Center 2 A4 text field 1 600 2% 9 7% 34300 134
622 Invisible Center 2 D2 text field 2 700 2% 31 23% 34300 134
622 Invisible Center 2 D4 text field 3 8000 23% 11 8% 34300 134
622 Invisible Center 2 DS text field 4 10000 29% 62 46% 34300 134
622 Invisible Center 2 F8 text field 5 15000 44% 21 16% 34300 134
743 Pretend Center 1 1K text field 1 440 1% 2 1% 41040 246
743 Pretend Center 1 1N text field 2 3000 7% 7 3% 41040 246
743 Pretend Center 1 1V text field 3 400 1% 4 2% 41040 246
743 Pretend Center 1 2W text field 4 15000 37% 63 26% 41040 246
743 Pretend Center 1 3B text field 5 500 1% 2 1% 41040 246
743 Pretend Center 1 3H text field 6 7700 19% 41 17% 41040 246
743 Pretend Center 1 3Z text field 7 14000 34% 127 52% 41040 246
888 It's A MakeBelieve One B7 text field 1 68000 38% 257 29% 178449 886
888 It's A MakeBelieve One B8 text field 2 5000 3% 47 5% 178449 886
888 It's A MakeBelieve One B9 text field 3 200 0% 138 16% 178449 886
888 It's A MakeBelieve One BB text field 4 1562 1% 18 2% 178449 886
888 It's A MakeBelieve One BO text field 5 39999 22% 3 0% 178449 886
888 It's A MakeBelieve One BZ text field 6 40000 22% 2 0% 178449 886
888 It's A MakeBelieve One C2 text field 7 500 0% 5 1% 178449 886
888 It's A MakeBelieve One C5 text field 8 7865 4% 395 45% 178449 886
888 It's A MakeBelieve One C7 text field 9 8649 5% 14 2% 178449 886
888 It's A MakeBelieve One CR text field 10 5674 3% 1 0% 178449 886
888 It's A MakeBelieve One CX text field 11 1000 1% 6 1% 178449 886
I'm new to SAS, and this would really help me out. Thank you so much!
proc sort data=sampleData out=sampleData_s;
by payee_id descending amt_billed;
run;
The descending keyword puts the largest amt_billed first within each payee_id, which is what you need if by 'top' you mean largest. Drop it (here and in the data step below) if you mean smallest.
Once the data are sorted, you can read them into a data step and use first. and last. processing, e.g.
data partial_solution(drop=count);
retain count 0;
set sampleData_s;
by payee_id descending amt_billed;
if first.payee_id then count=0;
count+1;
if count le 3 then output;
run;
To output to different dataset names:
proc sort data=sampleData(keep=payee_id) out=all_payee_ids nodupkey;
by payee_id;
run;
data _null_;
length id_list $10000; * needs to be long enough to contain all ids;
* if you do not state this, sas will default;
* length to first value;
retain id_list;
set all_payee_ids end=eof;
id_list = catx('|', id_list, payee_id);
if eof then call symputx('macroVarIdList', id_list);
run;
You've now got a pipe-separated list of all your ids. You can loop through them to create the names for your datasets. You need to do this because SAS needs to know the names of the datasets you want to output to up front, e.g.
data ds1 ds2 ds3 ds4;
set some_guff;
if blah then output ds1;
else if blahblah then output ds2;
else output ds3;
output ds4;
run;
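So with the macro var loop (%do is only valid inside a macro definition, so the step is wrapped in one):
%macro splitByPayee;
%local i j nrVars thisPayeeId;
%let nrVars=%sysfunc(countw(&macroVarIdList,|));
data
%do i = 1 %to &nrVars;
dataset_%scan(&macroVarIdList,&i,|)
%end;
;
set partial_solution;
%do j = 1 %to &nrVars;
%let thisPayeeId=%scan(&macroVarIdList,&j,|);
/* remove the quotes if payee_id is numeric */
if payee_id = "&thisPayeeId" then output dataset_&thisPayeeId.;
%end;
run;
%mend splitByPayee;
%splitByPayee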
I am trying to create a side-by-side, dual-axis bar chart in proc sgplot for data that is based on dates. I am stuck on one last thing: I cannot shift the bars using the discreteoffset option on vbar, because I am using type=time on the xaxis. If I comment that out, the bars are shifted but the xaxis tick values look clumsy. Is there any other option that can move the bars for date/time data? Following is my SAS code.
data input;
input people visits outcome date date9.;
datalines;
41 448 210 1-Jan-18
43 499 207 1-Feb-18
45 544 221 1-Mar-18
49 564 239 1-Apr-18
39 575 236 1-May-18
37 549 210 1-Jun-18
51 602 263 1-Jul-18
32 586 208 1-Aug-18
52 557 225 1-Sep-18
41 534 227 1-Oct-18
48 499 217 1-Nov-18
44 514 235 1-Dec-18
31 582 281 1-Jan-19
33 545 269 1-Feb-19
38 574 259 1-Mar-19
29 564 247 1-Apr-19
29 642 274 1-May-19
28 556 216 1-Jun-19
20 531 187 1-Jul-19
31 604 226 1-Aug-19
19 513 186 1-Sep-19
24 483 185 1-Oct-19
28 401 156 1-Nov-19
18 450 158 1-Dec-19
21 418 178 1-Jan-20
28 396 149 1-Feb-20
43 488 177 1-Mar-20
33 539 205 1-Apr-20
57 631 244 1-May-20
54 695 291 1-Jun-20
58 732 309 1-Jul-20
62 681 301 1-Aug-20
42 654 291 1-Sep-20
57 749 365 1-Oct-20
60 627 249 1-Nov-20
56 623 244 1-Dec-20
54 712 298 1-Jan-21
62 655 262 1-Feb-21
;
run;
proc sgplot data=input;
format date monyy7.;
styleattrs datacolors=(Red DarkBlue) datacontrastcolors=(black black) datalinepatterns=(solid);
vbar date / response=visits discreteoffset=-0.17 barwidth=0.3;
vbar date / response=outcome discreteoffset=0.17 barwidth=0.3;
vline date / response=people y2axis lineattrs=(color=black thickness=3);
xaxis display=(nolabel) /*fitpolicy=rotate valuesrotate=vertical*/ type=time /*interval=month*/;
yaxis grid label='Label1' values=(0 to 800 by 100);
y2axis label='Label2' values=(0 to 70 by 10);
keylegend / title="";
run;
Output I am getting:
Output I want: (With shifted bars, but it is changing dates)
Appreciate any help!
Thank you.
Reshape the data with transpose so the variables wanted side by side become categorical, i.e. name value pairs. The name can be used in vbar as the group= with groupdisplay=cluster.
Note: The xaxis type=time appears to perform special checks based on the format of the vbar variable, and will render a pretty two-line axis label when that format is date9. I've never seen this discussed in the documentation.
Example:
Uses name= in the plotting statements so the keylegend can look prettier.
proc transpose data=input out=plot;
by date;
copy people;
var visits outcome;
run;
proc sgplot data=plot;
vbar date / response=col1 group=_name_ groupdisplay=cluster name='relatedcounts';
vline date / response=people group=_name_ y2axis lineattrs=(color=black thickness=3) name='people';
xaxis
type = time
interval = month
;
format date date9.;
yaxis grid label='Related counts' values=(0 to 800 by 100);
y2axis label='# People' values=(0 to 70 by 10);
keylegend 'relatedcounts' / title="";
run;
Will produce
Hello,
I want to write a dynamic program that flags the start and end dates of events that are nested within the consolidated dates shown at the top of each Pt.ID in the attached example. I can easily do this if there is only one such consolidated period per Pt.ID. However, there could be more than one consolidated period per Pt.ID (as shown for the second Pt.ID, 1002). As shown in the example, events that fall within the consolidated period(s) are flagged as "Y" in the flag variable, and events that do not fall within a consolidated period are flagged as "N". How can I write a program that accounts for all of the consolidated periods per Pt.ID, compares them with the dates of the rest of that patient's events, and flags the events that fall within any of those consolidated periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
I am trying to find the number of days that matches a given reference number of days or, failing that, the number of days closest to the reference days.
I have coded this far, but I am not sure how to go forward.
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 560 86 181
3 2016-02-12 560 82 263
3 2016-02-12 560 69 332
3 2016-02-12 560 77 409
So now I want to pick out the last value close to the reference days, and the next total_days should start from zero again to find the next window. How can I do this?
Here is the code that I wrote:
data want;
do until (totaldays <= ref_days);
set have;
by ID ref_days notsorted;
if first.id then totaldays=0;
else totaldays+lags;
end;
run;
Required Output:
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 300 86 181
3 2016-02-12 300 82 263
3 2016-02-12 300 69 .
3 2016-02-12 300 77 146
A while ago I did something similar to this with proc sql. It calculates all the distances and takes the closest one. It works with a moderately sized dataset. Hopefully it is of some use.
proc sql;
select * from
(
select *,
abs(t1.link-t2.link) as dist /*In your case these would be dateVars*/
from test1 t1
left join test2 t2
on 1=1) group by system1 having dist=min(dist);
quit;
There was some talk that the left join on 1=1 is a bit silly (as a full outer join would suffice, or something). However, this worked for the problem in question.
I am a complete newb to SAS and all I know is basic SQL. I am currently taking a regression class and having trouble with SAS code.
I am trying to input two columns of data where x variable is State; y variable is # of accidents for a simple regression.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only reads numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, there are issues at a few different levels here.
ODS GRAPHICS statements go before and after procs, not inside them.
When reading a character variable you need to tell SAS, for example by using an informat.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable and you can't do regression with a character variable. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
I have a dataframe with variables: id, 2001a, 2001b, 2002a, 2002b, 2003a, 2003b, etc.
I am trying to figure out a way to pivot the data so the variables are: id, year, a, b
The 16.2 documentation refers to some reshaping and pivoting, but that seemed to speak more towards hierarchical columns.
Any suggestions?
I am thinking about creating a hierarchical dataframe, but am not sure how to map the year in the original variable names to a created hierarchical column
sample df:
id 2001a 2001b 2002a 2002b 2003a etc.
1 242 235 5735 23 1521
2 124 168 135 1361 1
3 436 754 1 24 5124
etc.
Here is a way to create hierarchical columns.
import pandas as pd

df = pd.DataFrame({'2001a': [242,124,236],
'2001b':[242,124,236],
'2002a': [242,124,236],
'2002b': [242,124,236],
'2003a': [242,124,236]})
# raw string so that \d is passed to the regex engine unchanged
df.columns = df.columns.str.split(r'(\d+)', expand=True)
df
2001 2002 2003
a b a b a
0 242 242 242 242 242
1 124 124 124 124 124
2 236 236 236 236 236
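The hierarchical columns get part of the way there. To reach the id, year, a, b layout the question asks for, one possible sketch is to melt the frame to long form, split each column name into a year and a field, and pivot back. The sample values are taken from the question, restricted to complete years, and the names `long`, `parts`, `field`, and `tidy` are illustrative rather than anything from the original post. Passing a list to `pivot(index=...)` assumes a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

# Sample data mirroring the question's layout (complete years only).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "2001a": [242, 124, 436],
    "2001b": [235, 168, 754],
    "2002a": [5735, 135, 1],
    "2002b": [23, 1361, 24],
})

# Wide -> long: one row per (id, original column name).
long = df.melt(id_vars="id", var_name="name", value_name="value")

# Split names such as "2001a" into year="2001" and field="a".
parts = long["name"].str.extract(r"^(?P<year>\d+)(?P<field>\D+)$")
long = long.join(parts)
long["year"] = long["year"].astype(int)

# Long -> tidy: one row per (id, year), one column per field.
tidy = (
    long.pivot(index=["id", "year"], columns="field", values="value")
        .reset_index()
        .rename_axis(columns=None)
)
print(tidy)
```

If some year is missing one of the fields (like 2003b in the question's sample), the corresponding cell should come out as NaN rather than raising an error.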