I am a complete newbie to SAS and only know basic SQL. I am currently taking a regression class and having trouble with SAS code.
I am trying to input two columns of data, where the x variable is State and the y variable is the number of accidents, for a simple regression.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only reads numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, some different levels of issues here.
ODS GRAPHICS go before and after procs, not inside them.
When reading a character variable you need to tell SAS using an informat.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable, and you can't do regression with a character variable. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
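As a small variation on the data step above, the informat can also be attached directly in the INPUT statement with the colon modifier. This is only a sketch: the data set name states_demo is made up for illustration, and it keeps just three of the rows.

```sas
/* Sketch: read a character variable with list input.             */
/* states_demo is a hypothetical name used only for this example. */
data states_demo;
    /* :$32. reads State as character, up to 32 characters, */
    /* stopping at the first blank                           */
    input State :$32. Sum_OF_Deaths;
    datalines;
Alabama 526
District_of_Columbia 0
New_Hampshire 0
;
run;

/* ODS GRAPHICS statements sit outside the procedure */
ods graphics on;
proc print data=states_demo;
run;
ods graphics off;
```

Without the character informat (or a $ after State), SAS tries to read the state names as numbers, which is why every observation ended up with missing values.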
I am trying to create a side-by-side, dual-axis bar chart in proc sgplot for data that is based on dates. I am stuck on one last thing: I cannot shift the bars using the discreteoffset option on vbar because I am using type=time on the xaxis. If I comment that out, the bars are shifted, but the xaxis tick values look cluttered. Is there any other option that can move the bars for date/time data? Following is my SAS code.
data input;
input people visits outcome date date9.;
datalines;
41 448 210 1-Jan-18
43 499 207 1-Feb-18
45 544 221 1-Mar-18
49 564 239 1-Apr-18
39 575 236 1-May-18
37 549 210 1-Jun-18
51 602 263 1-Jul-18
32 586 208 1-Aug-18
52 557 225 1-Sep-18
41 534 227 1-Oct-18
48 499 217 1-Nov-18
44 514 235 1-Dec-18
31 582 281 1-Jan-19
33 545 269 1-Feb-19
38 574 259 1-Mar-19
29 564 247 1-Apr-19
29 642 274 1-May-19
28 556 216 1-Jun-19
20 531 187 1-Jul-19
31 604 226 1-Aug-19
19 513 186 1-Sep-19
24 483 185 1-Oct-19
28 401 156 1-Nov-19
18 450 158 1-Dec-19
21 418 178 1-Jan-20
28 396 149 1-Feb-20
43 488 177 1-Mar-20
33 539 205 1-Apr-20
57 631 244 1-May-20
54 695 291 1-Jun-20
58 732 309 1-Jul-20
62 681 301 1-Aug-20
42 654 291 1-Sep-20
57 749 365 1-Oct-20
60 627 249 1-Nov-20
56 623 244 1-Dec-20
54 712 298 1-Jan-21
62 655 262 1-Feb-21
;
run;
proc sgplot data=input;
format date monyy7.;
styleattrs datacolors=(Red DarkBlue) datacontrastcolors=(black black) datalinepatterns=(solid);
vbar date / response=visits discreteoffset=-0.17 barwidth=0.3;
vbar date / response=outcome discreteoffset=0.17 barwidth=0.3;
vline date / response=people y2axis lineattrs=(color=black thickness=3);
xaxis display=(nolabel) /*fitpolicy=rotate valuesrotate=vertical*/ type=time /*interval=month*/;
yaxis grid label='Label1' values=(0 to 800 by 100);
y2axis label='Label2' values=(0 to 70 by 10);
keylegend / title="";
run;
Output I am getting:
Output I want: (With shifted bars, but it is changing dates)
Appreciate any help!
Thank you.
Reshape the data with transpose so the variables wanted side by side become categorical, i.e. name value pairs. The name can be used in vbar as the group= with groupdisplay=cluster.
Note: The xaxis type=time appears to perform special checks based on the format of the vbar variable, and will render a pretty two-line axis label when that format is date9. I've never seen this discussed in the documentation.
Example:
Uses name= in the plotting statements so the keylegend can look prettier.
proc transpose data=input out=plot;
by date; /* input is already sorted by date; it has no rowid variable */
copy people;
var visits outcome;
run;
proc sgplot data=plot;
vbar date / response=col1 group=_name_ groupdisplay=cluster name='relatedcounts';
vline date / response=people group=_name_ y2axis lineattrs=(color=black thickness=3) name='people';
xaxis
type = time
interval = month
;
format date date9.;
yaxis grid label='Related counts' values=(0 to 800 by 100);
y2axis label='# People' values=(0 to 70 by 10);
keylegend 'relatedcounts' / title="";
run;
Will produce
Piggy backing on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so each month has its own respective column for Actual while summing the numeric values by Account.
So for example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when there is no need to disaggregate further by month; however, I'm not sure whether it's possible to do so (I tried adding a 'by month' clause, which didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get summaries, same as last time. Please read the documentation on PROC MEANS to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data wide. Since the budget amount is consistent across rows, you'll be fine.
I'm guessing your next set of questions will be how to sort the columns correctly, because your months won't sort, and how to reference them dynamically to calculate the month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
Output:
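For the sample rows above, a quick print is one way to check the result. The values in the comment were worked out by hand from the sample data, so treat them as an expectation rather than captured output.

```sas
/* Check the transposed table produced by the two steps above */
proc print data=want noobs;
    var account budget_total Month_June Month_May;
run;

/* Hand-computed from the 16 sample rows:
   account   budget_total   Month_June   Month_May
   Phone         1809           266          256
   Postage       2094           153          230   */
```

The month columns come out in the order the ID values are first met in temp, which is alphabetical here (June before May). That is the column-sorting issue mentioned above.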
I cannot think of a way to generate this report using just one PROC. You will need to do some post processing of PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then Annual_Budget_sum = Annual_Budget;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months also here. Can use some macros here to make the code compact and expandable for future enhancements */
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;
Hello,
I want to write a dynamic program that flags the start and end dates of events that are nested within the consolidated dates shown at the top of each Pt.ID in the attached example. I can easily do this if there is only one such consolidated period per Pt.ID. However, there could be more than one consolidated period per Pt.ID (as shown for the second Pt.ID, 1002). As shown in the example, events that fall within a consolidated period are flagged as "Y" in the flag variable, and events that don't fall within a consolidated period are flagged as "N". How can I write a program that accounts for all of the consolidated periods per Pt.ID, compares them with the dates of the rest of that patient's events, and flags the events that fall within any of those consolidated periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
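If the Y/N coding from the question is needed instead of 1/0, one possible follow-up step is sketched below. The names want_flag and Flag are chosen here for illustration and are not from the original post.

```sas
/* Sketch: turn the 1/0 INCLUDED variable from WANT into a Y/N flag. */
/* want_flag and Flag are hypothetical names for this example only.  */
data want_flag;
    set want;
    length Flag $1;
    /* IFC returns the second argument when the condition is true, */
    /* otherwise the third                                          */
    Flag = ifc(Included = 1, 'Y', 'N');
run;
```

Note that the consolidated period rows (those with a missing Event_ID) match themselves in the join, so they also come out as 'Y'; filter them out first if that is not wanted.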
I am trying to find the number of days that matches a given reference number of days or, failing that, the number of days closest to the reference days.
I have coded up to this point, but I am not sure how to go forward.
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 560 86 181
3 2016-02-12 560 82 263
3 2016-02-12 560 69 332
3 2016-02-12 560 77 409
So now I want to bring out the last value close to the reference days, and the next total_days should start from zero again to find the next window. How can I do this?
Here is a code that I wrote
data want;
do until (totaldays <= ref_days);
set have;
by ID ref_days notsorted;
if first.id then totaldays=0;
else totaldays+lags;
end;
run;
Required Output:
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 300 86 181
3 2016-02-12 300 82 263
3 2016-02-12 300 69 .
3 2016-02-12 300 77 146
A while ago I did something similar to this via PROC SQL. It calculates all the distances and takes the closest one. It works with a moderately sized dataset. Hopefully it is of some use.
proc sql;
select * from
(
select *,
abs(t1.link-t2.link) as dist /*In your case these would be dateVars*/
from test1 t1
left join test2 t2
on 1=1) group by system1 having dist=min(dist);
;
quit;
There was some talk that the left join on 1=1 is a bit silly (as a full outer join would suffice, or something). However, this worked for the problem in question.
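For what it's worth, joining every row to every row is what PROC SQL's CROSS JOIN does, so the same idea can probably be written without the on 1=1 condition. The sketch below is self-contained; the tables test1 and test2, their contents, and the names system2, link1, link2 and closest are all made up for illustration.

```sas
/* Hypothetical sample tables, for illustration only */
data test1;
    input system1 $ link;
    datalines;
A 10
B 25
;
run;

data test2;
    input system2 $ link;
    datalines;
X 12
Y 30
Z 24
;
run;

proc sql;
    create table closest as
    select system1, link1, system2, link2, dist
    from (
        /* every test1 row paired with every test2 row */
        select t1.system1, t1.link as link1,
               t2.system2, t2.link as link2,
               abs(t1.link - t2.link) as dist
        from test1 t1 cross join test2 t2
    )
    group by system1
    /* keep only the nearest match(es) per system1; SAS remerges min(dist) */
    having dist = min(dist);
quit;
```

With these made-up rows, A pairs with X (distance 2) and B pairs with Z (distance 1). Ties would return more than one row per group.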
I have a table in SAS which looks like this.
year Country Host Code Value
2010 India Pak 220 111
2010 India Aus 220 123
2010 India NZ 220 23
2010 India SA 240 43
2010 India WI 250 124
2010 India SRI 250 325
2010 India ZIM 280 235
I want to transform this table to the following form:
Country Code Pak_2010 Aus_2010 NZ_2010 SA_2010 WI_2010 SRI_2010 ZIM_2010
India 220 111 123 23 0 0 0 0
India 240 0 0 0 43 0 0 0
India 250 0 0 0 0 124 325 0
India 280 0 0 0 0 0 0 235
For each combination of country, code and host, there will be one value.
Can anyone please suggest code for doing this transformation?
This is a classic proc transpose, separating your ID variables with a delimiter:
PROC TRANSPOSE
DATA=yourInput
OUT=yourOutput(drop=_name_)
DELIMITER=_;
BY Country Code;
ID Host Year;
VAR Value;
RUN;
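One thing to be aware of: PROC TRANSPOSE leaves a missing value wherever a Country/Code group has no row for a given host, whereas the desired table shows 0. A minimal, self-contained sketch of one way to fill those in follows; the output name yourOutput0 and the array and index names are assumptions made for this example.

```sas
/* Sample rows from the question */
data yourInput;
    input year Country $ Host $ Code Value;
    datalines;
2010 India Pak 220 111
2010 India Aus 220 123
2010 India NZ 220 23
2010 India SA 240 43
2010 India WI 250 124
2010 India SRI 250 325
2010 India ZIM 280 235
;
run;

/* Input must be sorted by the BY variables; the sample already is */
proc transpose data=yourInput out=yourOutput(drop=_name_) delimiter=_;
    by Country Code;
    id Host Year;
    var Value;
run;

/* Replace missing cells with 0 (yourOutput0 is a hypothetical name) */
data yourOutput0;
    set yourOutput;
    /* _numeric_ also picks up Code, which is never missing here */
    array cells {*} _numeric_;
    do i = 1 to dim(cells);
        if missing(cells{i}) then cells{i} = 0;
    end;
    drop i;
run;
```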