I am trying to do ARIMA model estimation for 5 different variables. The data consist of 16 months of point-of-sale (POS) data. How do I approach this complicated ARIMA modelling?
Furthermore, I would like to do:
A simple moving average of each product group
A Holt-Winters exponential smoothing model
The data are as follows, by date and product group:
Date Gloves ShoeCovers Socks Warmers HeadWear
apr-14 11015 3827 3465 1264 772
may-14 11087 2776 4378 1099 1423
jun-14 7645 1432 4490 674 670
jul-14 10083 7975 2577 1558 8501
aug-14 13887 8577 6854 1305 15621
sep-14 9186 5213 5244 1183 6784
oct-14 7611 4279 4150 977 6191
nov-14 6410 4033 2918 507 8276
dec-14 4856 3552 3192 450 4810
jan-15 17506 7274 3137 2216 3979
feb-15 21518 5672 8848 1838 2321
mar-15 17395 5200 5712 1604 2282
apr-15 11405 4531 5185 1479 1888
may-15 11509 5690 4370 1145 2369
jun-15 9945 2610 4884 882 1709
jul-15 8707 5658 4570 1948 6255
Any skilled forecasters out there willing to help? Much appreciated!
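Two of the non-ARIMA parts of the question can be sketched in plain Python: a simple moving average and exponential smoothing. This is an illustration under stated assumptions, not a fitted model:

- The moving average uses a 3-month trailing window.
- The smoothing constants are hand-picked (alpha=0.5, beta=0.3), not estimated.
- With a 12-month season, a seasonal Holt-Winters model generally needs at least two full years of data to initialise its seasonal terms. With only 16 months, the sketch therefore uses the non-seasonal variant, Holt's linear trend method.

ARIMA itself is better left to a statistics package. The function names are hypothetical, and GLOVES holds the Gloves column from the table above.

```python
# Gloves column from the question, in month order (16 observations).
GLOVES = [11015, 11087, 7645, 10083, 13887, 9186, 7611, 6410,
          4856, 17506, 21518, 17395, 11405, 11509, 9945, 8707]


def simple_moving_average(series, window=3):
    """Trailing mean of the last `window` values; the first window-1 points get None."""
    return [
        sum(series[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(series))
    ]


def holt_linear(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear-trend exponential smoothing (Holt-Winters without a seasonal term).

    alpha smooths the level and beta smooths the trend; both are illustrative
    guesses, not fitted values. Returns (one-step-ahead fitted values, forecasts).
    """
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    fitted = [series[0]]
    for y in series[1:]:
        fitted.append(level + trend)  # prediction made before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    forecasts = [level + h * trend for h in range(1, horizon + 1)]
    return fitted, forecasts


if __name__ == "__main__":
    print(simple_moving_average(GLOVES))
    print(holt_linear(GLOVES)[1])  # three-month-ahead forecast
```

For example, `simple_moving_average(GLOVES)[2]` is the mean of the first three months of Gloves sales. The same two functions can be applied to each of the other product columns in turn.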
I have data consisting of customers' route details and their store scores.
Raw data with the overall ranking for all customers:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
5371 ABC Chicago CG 1200 5 1
2098 HGT Kansas KK 6500 4.8 2
7680 POE Arizona QW 3300 4.2 3
3476 POE Arizona CV 3300 4 4
6272 KUN Florida ANF 7800 3.9 5
3220 ABC Chicago AF 1200 3.6 6
7266 IOR Califor LU 4500 3.2 7
3789 POE Arizona TR 3300 3 8
9383 KAR Newyork IO 5600 3 9
1583 KUN Florida BOT 7800 2.8 10
8219 ABC Chicago Bb 1200 2.5 11
3734 ABC Chicago AA 1200 2 12
6900 POE Arizona HAL 3300 1.8 13
8454 KUN Florida UYO 7800 1.5 14
Filters
Distname ALL
State ALL
Routecode ALL
This is the overall ranking for all customers, without any filters selected. When I select filters (like Dist name, route code, or store score), I want the rank to be recalculated according to the selected filters. E.g.:
Dist_Code|Dist_Name|State|Store_name|Route_code|Store_score|Rank
7680 POE Arizona QW 3300 4.2 1
3476 POE Arizona CV 3300 4 2
3789 POE Arizona TR 3300 3 3
6900 POE Arizona HAL 3300 1.8 4
Filter
Distname POE
State Arizona
Routecode 3300
The store score is based on parameters that I calculated in a model using Python.
Currently it is a string column in Power BI. I tried some DAX, but without success.
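The logic being asked for is to rank only the rows that survive the current filters. It can be sketched outside Power BI. This Python sketch is not DAX, and the function `rank_filtered` is hypothetical. It works in three steps:

1. Apply the filters.
2. Convert the text score to a number, because sorting text compares characters, so "10" would sort below "9".
3. Number the remaining rows by descending score.

Ties keep their input order. This reproduces the ranks 8 and 9 given to the two stores scoring 3 in the table above.

```python
# Rows from the question: (Dist_Code, Dist_Name, State, Store_name, Route_code, Store_score as text)
ROWS = [
    (5371, "ABC", "Chicago", "CG", 1200, "5"),
    (2098, "HGT", "Kansas", "KK", 6500, "4.8"),
    (7680, "POE", "Arizona", "QW", 3300, "4.2"),
    (3476, "POE", "Arizona", "CV", 3300, "4"),
    (6272, "KUN", "Florida", "ANF", 7800, "3.9"),
    (3220, "ABC", "Chicago", "AF", 1200, "3.6"),
    (7266, "IOR", "Califor", "LU", 4500, "3.2"),
    (3789, "POE", "Arizona", "TR", 3300, "3"),
    (9383, "KAR", "Newyork", "IO", 5600, "3"),
    (1583, "KUN", "Florida", "BOT", 7800, "2.8"),
    (8219, "ABC", "Chicago", "Bb", 1200, "2.5"),
    (3734, "ABC", "Chicago", "AA", 1200, "2"),
    (6900, "POE", "Arizona", "HAL", 3300, "1.8"),
    (8454, "KUN", "Florida", "UYO", 7800, "1.5"),
]


def rank_filtered(rows, dist_name=None, state=None, route_code=None):
    """Return (Store_name, score, rank) for rows matching the filters; None means ALL."""
    selected = [
        r for r in rows
        if (dist_name is None or r[1] == dist_name)
        and (state is None or r[2] == state)
        and (route_code is None or r[4] == route_code)
    ]
    # Convert the text score to a number before sorting; sorted() is stable,
    # so tied scores keep their original order.
    ordered = sorted(selected, key=lambda r: float(r[5]), reverse=True)
    return [(r[3], float(r[5]), i + 1) for i, r in enumerate(ordered)]


if __name__ == "__main__":
    print(rank_filtered(ROWS, dist_name="POE", state="Arizona", route_code=3300))
```

In Power BI itself, this is typically done with a measure combining RANKX and ALLSELECTED, so that the rank follows the slicers. Note that RANKX gives tied values the same rank by default, unlike this sketch.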
I want to depict the evolution of a variable called share over time. I do so using tsline, but the resulting graph looks off: although my data start in May 1989 and end in December 1993, the trendline is drawn so that it begins in January 1989 and ends in mid-1993.
gen double time3 = monthly(time2, "YM")
format time3 %tm
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)1994m5, format(%tmY) labsize(small))
I know that Stata stores dates as integers, and I tried replacing the year-month values in tlabel with integers. Since the time variable is defined as months since 1960m1, 1989m5 is stored internally as 352 and 1993m12 as 407. I learned this by running dis tm(1989m5). But even with tlabel(352(12)407), the trendline is not drawn correctly. Does anyone have an idea how to fix this? This is what the graph looks like now.
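The date arithmetic in the question can be checked with a short Python sketch. It assumes only the "YYYY-MM" layout of time2 in the sample data. `ym_to_stata_tm` and `stata_tm_to_label` are hypothetical helpers that mimic monthly(time2, "YM") and the default %tm display for that layout; they are not Stata's implementation.

```python
def ym_to_stata_tm(text):
    """Convert a 'YYYY-MM' string to a Stata %tm integer (months since 1960m1, which is 0)."""
    year, month = (int(part) for part in text.split("-"))
    return (year - 1960) * 12 + (month - 1)


def stata_tm_to_label(value):
    """Render a %tm integer the way the default %tm format shows it, e.g. 352 -> '1989m5'."""
    year, month_index = divmod(value, 12)
    return f"{1960 + year}m{month_index + 1}"


if __name__ == "__main__":
    print(ym_to_stata_tm("1989-05"), ym_to_stata_tm("1993-12"))  # 352 407
    print(stata_tm_to_label(352), stata_tm_to_label(407))        # 1989m5 1993m12
```

This confirms that the integers in the question (352 and 407) are the right internal values for the first and last months.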
This is a subsample of the data that I used:
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"1989-10" 357 .24629080118694363
"1989-11" 358 .23008849557522124
"1989-12" 359 .17638036809815952
"1990-01" 360 .20521653543307086
"1990-02" 361 .1754473161033797
"1990-03" 362 .17401960784313725
"1990-04" 363 .14173998044965788
"1990-05" 364 .1669970267591675
"1990-06" 365 .1398838334946757
"1990-08" 367 .10461689587426326
"1990-09" 368 .14965312190287414
"1990-10" 369 .1921182266009852
"1990-11" 370 .18038617886178862
"1990-12" 371 .19577735124760076
"1991-01" 372 .10562685093780849
"1991-02" 373 .09596928982725528
"1991-03" 374 .1941747572815534
"1991-04" 375 .1889106967615309
"1991-05" 376 .1794234592445328
"1991-06" 377 .1968390804597701
"1991-08" 379 .17846309403437816
"1991-09" 380 .19425173439048563
"1991-10" 381 .14556962025316456
"1991-11" 382 .15569143932267168
"1991-12" 383 .1694015444015444
"1992-01" 384 .20812928501469147
"1992-02" 385 .257590597453477
"1992-03" 386 .2204724409448819
"1992-04" 387 .22096456692913385
"1992-05" 388 .21601941747572814
"1992-06" 389 .1675025075225677
"1992-07" 390 .22176591375770022
"1992-09" 392 .15128968253968253
"1992-10" 393 .15841584158415842
"1992-11" 394 .1849112426035503
"1992-12" 395 .19642857142857142
"1993-01" 396 .22469252601702933
"1993-02" 397 .2796528447444552
"1993-03" 398 .290811339198436
"1993-04" 399 .24108910891089108
"1993-05" 400 .2562437562437562
"1993-06" 401 .22127872127872128
"1993-07" 402 .27874743326488705
"1993-09" 404 .3391472868217054
"1993-10" 405 .3840155945419103
"1993-11" 406 .45184824902723736
"1993-12" 407 .43987975951903807
end
format %tm time3
[/CODE]
The graph you've posted doesn't seem surprising.
Using the data and code you've posted
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"2019-10" 717 .13052208835341367
"2019-11" 718 .13559059987631417
"2019-12" 719 .13997555012224938
end
format %tm time3
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)2019m12, format(%tmY) labsize(small))
it's hard to see what might be a problem.
tsline doesn't purport to draw a trend line, just a line graph for the data specified.
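One plausible explanation of the apparent shift is offered here as an interpretation, not a confirmed diagnosis. With format(%tmY), each tick shows only its year, but tlabel(1989m5(12)1994m5) places the ticks in May. The hypothetical Python helper below lists those ticks with the year label each would display.

```python
def stata_tm(year, month):
    """Stata %tm integer: months since 1960m1 (which is 0)."""
    return (year - 1960) * 12 + (month - 1)


def tick_years(first, step, last):
    """Ticks of a Stata-style numlist first(step)last, paired with the year-only
    label that format(%tmY) would show for each tick."""
    return [(t, 1960 + t // 12) for t in range(first, last + 1, step)]


if __name__ == "__main__":
    for tick, label in tick_years(stata_tm(1989, 5), 12, stata_tm(1994, 5)):
        print(tick, "is a May tick labelled", label)
    # The last observation, 1993m12, lies this many months past the "1993" tick:
    print(stata_tm(1993, 12) - stata_tm(1993, 5))
```

The tick labelled "1989" is May 1989, where the data begin. The last observation (1993m12) sits 7 months past the tick labelled "1993". A reader who assumes the labels mark January therefore sees the series start in January 1989 and end around mid-1993. That is a four-month shift, which matches the description in the question. Placing the ticks at January, e.g. tlabel(1989m1(12)1994m1, format(%tmY) labsize(small)), would align each label with the start of its year.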
Correlation Loading Plot from Proc PLS in SAS
Hi All,
I used Proc PLS to do a multivariate analysis and got the plot attached. How can I remove the green points in the picture? I think they are the observations' correlation values: I have 90 observations, and each of them has a loading value on factor1 and factor2, so 90 green points are shown in the picture. Which option can suppress them?
For example, the data look like this:
par1 par2 par3 par4 par5 par6 par7 location
2680 0.546089996 237 1 0.172 2.25 305 5
3750 0.54836587 140 1.55 0.111 1.06 425 5
3590 0.54878718 168 1.27 0.131 0.969 516 5
2390 0.549510935 183 1.07 0.096 1.84 260 5
3780 0.549631747 140 1.12 0.118 1.98 472 5
2790 0.549934008 200 1.1 0.221 2.13 321 5
2880 0.5499945 227 1.14 0.185 1.54 439 5
2910 0.550357733 259 1.31 0.116 1.31 289 5
2420 0.550842789 177 1.32 0.044067423 1.95 260 5
3850 0.550964187 128 1.41 0.117 1.08 471 5
3530 0.552425146 165 1.23 0.11 1.57 494 5
2730 0.552913856 223 1.03 0.17 2 330 5
3130 0.553158535 252 1.02 0.174 2.13 322 5
3040 0.553709856 272 1.21 0.155 1.97 317 5
3830 0.554139421 153 1.27 0.137 1.47 455 5
3930 0.554569654 164 1.17 0.116 1.5 481 5
2430 0.554569654 136 1.3 0.198 2.11 226 8
3630 0.555247085 137 1.17 0.1 1.75 413 5
2490 0.555432126 176 1.06 0.113 1.39 236 5
3490 0.555555556 166 1.28 0.044444444 1.65 465 5
3840 0.556173526 164 1.23 0.0949 1.66 470 5
2480 0.556173526 239 1.28 0.102 2.2 238 5
3760 0.556173526 191 1.33 0.131 2.12 447 5
3850 0.556173526 174 1.35 0.241 2.42 381 3
3410 0.557413601 174 1.14 0.107 1.48 419 5
2960 0.559284116 229 1.08 0.165 1.99 304 5
3410 0.559284116 137 1.19 0.291 2.17 375 8
3300 0.560538117 121 1.13 0.153 1.82 352 8
3090 0.560538117 134 1.16 0.167 1.17 416 4
3210 0.560538117 124 1.09 0.172 0.82 390 4
3950 0.560538117 130 1.29 0.199 1.89 440 4
3300 0.561167228 131 1.06 0.242 2.45 367 8
2210 0.561167228 162 0.885 0.288 3.32 208 4
3170 0.561797753 126 1.3 0.151 1.31 388 4
2740 0.561797753 96.1 1.22 0.245 0.827 254 3
3750 0.561797753 144 1.08 0.257 2.62 366 3
3640 0.562429696 120 1.32 0.159 1.63 347 8
3210 0.563063063 148 1.29 0.206 2.18 352 8
2300 0.563697858 179 0.936 0.181 2.29 223 2
3410 0.564334086 141 0.856 0.136 2.03 370 8
3500 0.564334086 126 1.38 0.177 1.45 355 8
3470 0.564334086 101 0.989 0.222 1.84 349 3
2260 0.564334086 171 0.942 0.224 2.08 219 2
2220 0.564334086 180 0.956 0.281 1.84 219 4
2340 0.564971751 165 1.05 0.228 2.25 240 8
2380 0.564971751 161 0.976 0.287 1.6 214 4
3220 0.56561086 148 1.21 0.121 0.568 520 6
3920 0.566251416 176 1.08 0.045300113 2.26 637 6
3830 0.566251416 137 1.48 0.203 1.23 387 3
2510 0.566251416 152 1.24 0.222 1.84 223 8
2760 0.566251416 168 0.994 0.282 1.31 280 4
2640 0.566251416 154 0.979 0.345 1.52 291 4
3570 0.566893424 165 1.33 0.155 2.18 505 6
3170 0.566893424 126 1.08 0.162 1.41 341 4
3700 0.566893424 159 1.3 0.17 1.64 449 4
3250 0.566893424 104 1.32 0.2 1.37 372 8
3740 0.566893424 159 1.23 0.216 1.69 409 1
3380 0.566893424 163 1.53 0.245 2.19 367 3
3240 0.56753689 136 1.07 0.153 1.88 383 4
3400 0.56753689 109 1.36 0.161 1.16 420 4
3760 0.56753689 150 0.93 0.169 1.68 537 4
3560 0.56753689 123 1.03 0.193 2.32 374 8
2360 0.56753689 163 0.697 0.235 1.94 243 8
2430 0.56753689 166 0.762 0.247 2.31 231 8
3330 0.568181818 148 1.11 0.174 2 393 4
3080 0.568181818 139 1.13 0.188 2.08 349 8
3230 0.568181818 116 1.23 0.199 1.77 328 8
2180 0.568181818 144 1.01 0.215 2.13 207 8
2520 0.568181818 128 0.809 0.369 1.65 306 4
3320 0.568828214 152 1.15 0.14 1.65 395 4
2300 0.568828214 134 0.908 0.221 1.56 233 8
3730 0.568828214 141 1.58 0.238 1.96 405 3
3800 0.568828214 160 1.24 0.241 2.2 402 3
2440 0.568828214 153 1.03 0.258 1.89 223 4
3910 0.568828214 209 1.26 0.275 2.26 350 3
4010 0.569476082 139 1.28 0.045558087 1.7 602 6
2340 0.570125428 167 1.1 0.18 1.57 208 2
2360 0.570125428 176 0.704 0.2 1.6 219 2
3490 0.570776256 171 1.43 0.269 2.4 360 3
2620 0.571428571 132 1.09 0.202 1.8 224 8
3740 0.571428571 172 1.27 0.256 1.92 355 3
3600 0.57208238 128 1.16 0.17 1.94 434 4
3360 0.57208238 150 1.18 0.171 1.81 353 1
3620 0.57208238 131 1.28 0.177 2.24 360 3
3560 0.57208238 139 1.15 0.229 1.9 366 3
2740 0.572737686 277 0.876 0.171 1.71 290 10
2340 0.572737686 148 0.964 0.231 1.18 250 6
2760 0.572737686 168 0.905 0.303 2.1 264 4
2890 0.572737686 204 0.857 0.331 2.32 272 2
The code is:
proc pls data=check method=rrr;
class location;
model par1-par7=location;
run;
In general, I don't think there's a simple way to do what you're looking for. You may want to construct your own graph.
You can get the template for the graph; I'll paste it here. Unfortunately, all of the data on the graph is drawn by a single statement, so it's not helpful to just comment out one line: if you comment out the scatterplot x=CORRX y=CORRY statement, you remove all of the data. I also don't think the ODS Graphics Editor will be able to do this.
You would probably be best off constructing your own chart using this template as a base, but calling it from PROC SGRENDER so you can control how the data come in.
Here's the template, and you'll see the spot I'm talking about:
proc template;
define statgraph Stat.PLS.Graphics.CorrLoadPlot;
dynamic Radius1 Radius2 Radius3 Radius4 xLabel xShortLabel yLabel
yShortLabel CorrX CorrXLab TraceX CorrY CorrYLab TraceY _byline_
_bytitle_ _byfootnote_;
BeginGraph /;
entrytitle "Correlation Loading Plot";
layout overlayequated / equatetype=square commonaxisopts=(
tickvaluelist=(-1.0 -0.75 -0.5 -0.25 0 0.25 0.5 0.75 1.0) viewmin=
-1 viewmax=1) xaxisopts=(label=XLABEL shortlabel=XSHORTLABEL
offsetmin=0.05 offsetmax=0.05 gridDisplay=auto_off) yaxisopts=(
label=YLABEL shortlabel=YSHORTLABEL offsetmin=0.05 offsetmax=0.05
gridDisplay=auto_off);
ellipseparm semimajor=RADIUS1 semiminor=RADIUS1 slope=0.0 xorigin=
0.0 yorigin=0.0 / clip=true display=(outline) outlineattrs=(
pattern=dash) datatransparency=0.75;
scatterplot x=XCIRCLE1LABEL y=YCIRCLE1LABEL / markercharacter=
CIRCLE1LABEL datatransparency=0.75 primary=true;
ellipseparm semimajor=RADIUS2 semiminor=RADIUS2 slope=0.0 xorigin=
0.0 yorigin=0.0 / clip=true display=(outline) outlineattrs=(
pattern=dash) datatransparency=0.75;
scatterplot x=XCIRCLE2LABEL y=YCIRCLE2LABEL / markercharacter=
CIRCLE2LABEL datatransparency=0.75 primary=true;
ellipseparm semimajor=RADIUS3 semiminor=RADIUS3 slope=0.0 xorigin=
0.0 yorigin=0.0 / clip=true display=(outline) outlineattrs=(
pattern=dash) datatransparency=0.75;
scatterplot x=XCIRCLE3LABEL y=YCIRCLE3LABEL / markercharacter=
CIRCLE3LABEL datatransparency=0.75 primary=true;
ellipseparm semimajor=RADIUS4 semiminor=RADIUS4 slope=0.0 xorigin=
0.0 yorigin=0.0 / clip=true display=(outline) outlineattrs=(
pattern=dash) datatransparency=0.75;
scatterplot x=XCIRCLE4LABEL y=YCIRCLE4LABEL / markercharacter=
CIRCLE4LABEL datatransparency=0.75 primary=true;
scatterplot x=CORRX y=CORRY / group=CORRGROUP Name="ScatterVars"
markercharacter=CORRLABEL rolename=(_id1=_ID1 _id2=_ID2 _id3=
_ID3 _id4=_ID4 _id5=_ID5) tip=(y x group markercharacter _id1
_id2 _id3 _id4 _id5) tiplabel=(y=CORRXLAB x=CORRYLAB group=
"Corr Type" markercharacter="Corr ID");
SeriesPlot x=TRACEX y=TRACEY / tip=(y x) tiplabel=(y=CORRYLAB x=
CORRXLAB);
endlayout;
if (_BYTITLE_)
entrytitle _BYLINE_ / textattrs=GRAPHVALUETEXT;
else
if (_BYFOOTNOTE_)
entryfootnote halign=left _BYLINE_;
endif;
endif;
EndGraph;
end;
run;
I would consider posting this on communities.sas.com and seeing if one of the developers can give you more specific information; Sanjay and Dan often post there and may well be able to give you a simpler answer.
I am a complete newbie to SAS, and all I know is basic SQL. I'm currently taking a regression class and having trouble with SAS code.
I am trying to input two columns of data for a simple regression, where the x variable is State and the y variable is the number of accidents.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only read numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, there are issues at a few different levels here.
ODS GRAPHICS statements go before and after procs, not inside them.
When reading a character variable, you need to tell SAS that it is character, for example with an informat.
The code below allows you to read in the data. However, your regression has several issues. For one, State is a character variable, and you can't use a character variable in a PROC REG model. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
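A rough Python analogy (not how SAS is implemented) can show why the log reports 51 observations read but 0 used:

1. List input without a $ or a character informat tries to read State as a number.
2. Every state name fails to parse, so State is missing in all 51 rows.
3. PROC REG drops any row with a missing model variable.

The helper names are hypothetical, and only the first three data lines are used.

```python
def read_numeric(token):
    """Parse a token as a number; None stands in for SAS's missing value '.'."""
    try:
        return float(token)
    except ValueError:
        return None


DATALINES = ["Alabama 526", "Alaska 0", "Arizona 150"]  # first rows of the question's data


def read_rows(lines, state_is_character):
    """Split each line into (State, Sum_OF_Deaths), reading State as text or as a number."""
    rows = []
    for line in lines:
        state_token, deaths_token = line.split()
        state = state_token if state_is_character else read_numeric(state_token)
        rows.append((state, read_numeric(deaths_token)))
    return rows


def rows_used(rows):
    """Count complete cases: rows with no missing value in any model variable."""
    return sum(1 for row in rows if None not in row)


if __name__ == "__main__":
    numeric_state = read_rows(DATALINES, state_is_character=False)
    print(numeric_state)             # [(None, 526.0), (None, 0.0), (None, 150.0)]
    print(rows_used(numeric_state))  # 0, like "Number of Observations Used 0"
```

With `state_is_character=True`, the same lines read cleanly and State keeps its text, which is what the informat achieves. The regression problem remains, because State is still not a numeric variable.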
Hi, I would really like to create dynamic tables based on the following sample data: create 4 new data sets based on PAYEE_ID 522, 622, 743, and 888. I want all of the fields in the 4 new data sets, but only the top 3 AMT_BILLED rows for each PAYEE_ID.
PAYEE_ID PAYEENAME MSG_CODE MSG_DESCRIPTION AMT_BILLED percentbilled claimscounts PercentLines TotalAmount TotNumofClaims
522 MakeBelieve Center 1 AA text field 1 10000 4% 50 16% 275000 305
522 MakeBelieve Center 1 BB text field 2 20000 7% 40 13% 275000 305
522 MakeBelieve Center 1 6N text field 3 30000 11% 30 10% 275000 305
522 MakeBelieve Center 1 5U text field 4 25000 9% 20 7% 275000 305
522 MakeBelieve Center 1 1F text field 5 90000 33% 100 33% 275000 305
522 MakeBelieve Center 1 2E text field 6 100000 36% 65 21% 275000 305
622 Invisible Center 2 A4 text field 1 600 2% 9 7% 34300 134
622 Invisible Center 2 D2 text field 2 700 2% 31 23% 34300 134
622 Invisible Center 2 D4 text field 3 8000 23% 11 8% 34300 134
622 Invisible Center 2 DS text field 4 10000 29% 62 46% 34300 134
622 Invisible Center 2 F8 text field 5 15000 44% 21 16% 34300 134
743 Pretend Center 1 1K text field 1 440 1% 2 1% 41040 246
743 Pretend Center 1 1N text field 2 3000 7% 7 3% 41040 246
743 Pretend Center 1 1V text field 3 400 1% 4 2% 41040 246
743 Pretend Center 1 2W text field 4 15000 37% 63 26% 41040 246
743 Pretend Center 1 3B text field 5 500 1% 2 1% 41040 246
743 Pretend Center 1 3H text field 6 7700 19% 41 17% 41040 246
743 Pretend Center 1 3Z text field 7 14000 34% 127 52% 41040 246
888 It's A MakeBelieve One B7 text field 1 68000 38% 257 29% 178449 886
888 It's A MakeBelieve One B8 text field 2 5000 3% 47 5% 178449 886
888 It's A MakeBelieve One B9 text field 3 200 0% 138 16% 178449 886
888 It's A MakeBelieve One BB text field 4 1562 1% 18 2% 178449 886
888 It's A MakeBelieve One BO text field 5 39999 22% 3 0% 178449 886
888 It's A MakeBelieve One BZ text field 6 40000 22% 2 0% 178449 886
888 It's A MakeBelieve One C2 text field 7 500 0% 5 1% 178449 886
888 It's A MakeBelieve One C5 text field 8 7865 4% 395 45% 178449 886
888 It's A MakeBelieve One C7 text field 9 8649 5% 14 2% 178449 886
888 It's A MakeBelieve One CR text field 10 5674 3% 1 0% 178449 886
888 It's A MakeBelieve One CX text field 11 1000 1% 6 1% 178449 886
I'm new to SAS, and this would really help me out. Thank you so much!
proc sort data=sampleData out=sampleData_s;
by payee_id descending amt_billed;
run;
Use descending because by 'top' you mean largest; the BY statement in the data step must match this sort order.
Once the data are sorted, you can read them into a data step and use BY-group processing (FIRST. and LAST. variables), e.g.
data partial_solution(drop=count);
retain count 0;
set sampleData_s;
by payee_id descending amt_billed;
if first.payee_id then count=0;
count+1;
if count le 3 then output;
run;
To output to different dataset names:
proc sort data=sampleData(keep=payee_id) out=all_payee_ids nodupkey;
by payee_id;
run;
data _null_;
length id_list $10000; * needs to be long enough to contain all ids;
* without a LENGTH statement, SAS gives a variable;
* first assigned by CATX a length of 200;
retain id_list;
set all_payee_ids end=eof;
id_list = catx('|', id_list, payee_id);
if eof then call symputx('macroVarIdList', id_list);
run;
You've now got a pipe-separated list of all your ids. You can loop through it, using the ids to create names for your datasets. You need to do this because SAS needs to know the names of the datasets you want to output to up front, e.g.
data ds1 ds2 ds3 ds4;
set some_guff;
if blah then output ds1;
else if blahblah then output ds2;
else output ds3;
output ds4;
run;
So with the macro var loop:
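%let nrVars=%sysfunc(countw(&macroVarIdList,|));

/* %DO loops are only allowed inside a macro definition */
%macro split_by_payee;
  data
    %do i = 1 %to &nrVars;
      dataset_%scan(&macroVarIdList,&i,|)
    %end;
  ;
    set partial_solution;
    %do j = 1 %to &nrVars;
      %let thisPayeeId=%scan(&macroVarIdList,&j,|);
      /* assumes payee_id is character; drop the quotes if it is numeric */
      if payee_id = "&thisPayeeId" then output dataset_&thisPayeeId.;
    %end;
  run;
%mend split_by_payee;

%split_by_payee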
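The overall logic of the answer can be sketched in Python to check the expected result against the sample data:

1. Sort by PAYEE_ID with AMT_BILLED descending.
2. Keep a per-payee counter.
3. Write each payee's first 3 rows to its own table.

Only three columns are kept for brevity, and `top_n_per_payee` is a hypothetical name. The dictionary of lists stands in for the separate SAS data sets.

```python
from collections import defaultdict

# (PAYEE_ID, MSG_CODE, AMT_BILLED) from the sample table; other columns omitted for brevity.
ROWS = [
    ("522", "AA", 10000), ("522", "BB", 20000), ("522", "6N", 30000),
    ("522", "5U", 25000), ("522", "1F", 90000), ("522", "2E", 100000),
    ("622", "A4", 600), ("622", "D2", 700), ("622", "D4", 8000),
    ("622", "DS", 10000), ("622", "F8", 15000),
    ("743", "1K", 440), ("743", "1N", 3000), ("743", "1V", 400),
    ("743", "2W", 15000), ("743", "3B", 500), ("743", "3H", 7700),
    ("743", "3Z", 14000),
    ("888", "B7", 68000), ("888", "B8", 5000), ("888", "B9", 200),
    ("888", "BB", 1562), ("888", "BO", 39999), ("888", "BZ", 40000),
    ("888", "C2", 500), ("888", "C5", 7865), ("888", "C7", 8649),
    ("888", "CR", 5674), ("888", "CX", 1000),
]


def top_n_per_payee(rows, n=3):
    """Split rows by payee id, keeping the n largest AMT_BILLED rows for each id."""
    # Like PROC SORT BY payee_id DESCENDING amt_billed.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    tables = defaultdict(list)
    for row in ordered:
        # The list length plays the role of the count reset at FIRST.payee_id.
        if len(tables[row[0]]) < n:
            tables[row[0]].append(row)
    return dict(tables)


if __name__ == "__main__":
    for payee_id, rows in top_n_per_payee(ROWS).items():
        print(payee_id, [amount for _, _, amount in rows])
```

On the sample data this keeps the following rows:

- 522: 100000, 90000, 30000
- 622: 15000, 10000, 8000
- 743: 15000, 14000, 7700
- 888: 68000, 40000, 39999

These are the rows the SAS steps should leave in the four output data sets.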