I am trying to calculate a 3-month moving average of the following data by Product and by Country (I only have two countries here). Is there a way to do so?
Here is the sales table I have:
Date Product Country Sales
201101 Sofa US 100
201102 Sofa US 200
201103 Sofa US 250
201104 Sofa US 300
201101 Sofa CA 250
201102 Sofa CA 300
201103 Sofa CA 250
201104 Sofa CA 300
201101 Chair US 300
201102 Chair US 300
201103 Chair US 300
201104 Chair US 300
201101 Chair CA 300
201102 Chair CA 300
201103 Chair CA 300
201104 Chair CA 300
I tried something like the following, but the moving average is only calculated by country. Is there a way to have it calculated by country and by product? Any ideas will be appreciated. Thanks :)
PROC SORT DATA=Sales;
BY Country Product Date;
RUN;
PROC EXPAND DATA=Sales out =ma;
By Country Product;
CONVERT Value=Value_ma/transformin=(setmiss 0) transformout=(movave 3);
run;
After my comment I tested a bit. I think concatenating product and country gives the result you are looking for (I hope I have not misunderstood something):
data have;
input Date $ Product $ Country $ Sales ;
datalines;
201101 Sofa US 100
201102 Sofa US 200
201103 Sofa US 250
201104 Sofa US 300
201101 Sofa CA 250
201102 Sofa CA 300
201103 Sofa CA 250
201104 Sofa CA 300
201101 Chair US 300
201102 Chair US 300
201103 Chair US 300
201104 Chair US 300
201101 Chair CA 300
201102 Chair CA 300
201103 Chair CA 300
201104 Chair CA 300
;
run;
data have ;
set have;
copr=catx("_",Product,country);
run;
PROC SORT DATA=have;
BY copr Date;
RUN;
PROC EXPAND DATA=have out =ma ;
By copr;
CONVERT sales=average / transformin=(setmiss 0) transformout=(movave 3);
run;
proc print data=ma;
var date product country average;
where time > 1;
run;
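For what it's worth, PROC EXPAND accepts more than one BY variable, so the concatenated key may not be strictly needed. Below is a minimal sketch, assuming the `have` dataset above and a SAS/ETS licence; `ma_by2` is only an illustrative output name. Note that the CONVERT statement has to name a variable that exists in the input (`Sales` here, whereas the attempt in the question referenced `Value`), which may be part of why that attempt did not behave as expected.

```sas
/* Sort by both grouping variables, then by date within each group */
proc sort data=have;
  by Country Product Date;
run;

/* Trailing 3-observation moving average within each Country/Product group */
proc expand data=have out=ma_by2;   /* ma_by2 is a hypothetical name */
  by Country Product;
  convert Sales=average / transformin=(setmiss 0) transformout=(movave 3);
run;
```

As in the answer above, no ID statement is used, so the window is counted in observations rather than calendar months; that only matches "3 months" if every group has one row per month with no gaps.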
I would like to create a matrix visual like the one below and add data bars as conditional formatting to the "Sales Percentage" column, with different user-defined max and min values for each country.
I have the following dummy data:
Salesperson  Country  Product        Sales Percentage  Total Sales
Gina         Canada   City Bike      0.02              232
Gina         Canada   Mountain Bike  0.56              2800
Gina         Italy    City Bike      0.32              213
Gina         Italy    Mountain Bike  0.21              1050
Gina         USA      City Bike      0.11              122
Gina         USA      Mountain Bike  0.43              2150
John         Canada   City Bike      0.32              333
John         Canada   Mountain Bike  0.34              442
John         Italy    City Bike      0.12              2132
John         Italy    Mountain Bike  0.67              1233
John         USA      City Bike      0.22              3300
John         USA      Mountain Bike  0.45              7300
Mary         Canada   City Bike      0.21              121
Mary         Canada   Mountain Bike  0.53              2650
Mary         Italy    City Bike      0.32              213
Mary         Italy    Mountain Bike  0.12              600
Mary         USA      City Bike      0.11              123
Mary         USA      Mountain Bike  0.12              600
The matrix looks like this after showing columns as rows and putting "Sales Percentage" and "Total Sales" as values, Country as columns and Product + Salesperson as rows:
I can add databars when I right click the Sales Percentage under values but I can only enter one user defined min and max value for the whole "Sales Percentage" column. Is it possible to have different maximum value for data bars based on the Country? For example to create a target value of 35% for Canada, 40% for USA and 50% for Italy. So in other words the data bar would be full when the Sales Percentage for Canada reaches 35% and full when Sales Percentage for USA reaches 40% and so on.
This isn't possible with your current setup. The best you could do to approximate this is as follows.
Create a measure as follows:
% Canada = CALCULATE(SUM('Table'[Total Sales]), 'Table'[Country ] = "Canada")
Do the same for USA and Italy and then add them as values to your matrix.
You can now select individual targets for each country.
I am not able to get a row with ALL using row percentages. I would like the first row to give sum and percentage for column totals. So the percent under borderline for ALL should display 1861 * 100/5049=36.8% and under Desirable to display 1399 * 100/5049=27.7%. Currently it is displaying 100% and I need to change that.
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table (all smoking_status sex),
(all chol_status)*(n*f=8. colpctn) ;
run;
The output is
                    All            Cholesterol Status
                                   Borderline     Desirable      High
                         N  ColPctN     N  ColPctN     N  ColPctN     N  ColPctN
All                   5049   100.00  1861   100.00  1399   100.00  1789   100.00   <- change the cholesterol % to denominator 5049
Smoking Status
  Heavy (16-25)       1029    20.38   383    20.58   285    20.37   361    20.18
  Light (1-5)          563    11.15   192    10.32   174    12.44   197    11.01
  Moderate (6-15)      563    11.15   217    11.66   170    12.15   176     9.84
  Non-smoker          2436    48.25   886    47.61   655    46.82   895    50.03
  Very Heavy (> 25)    458     9.07   183     9.83   115     8.22   160     8.94
Sex
  Female              2770    54.86   959    51.53   803    57.40  1008    56.34
  Male                2279    45.14   902    48.47   596    42.60   781    43.66
I think the closest you can get is this:
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table all*rowpctn=' ' (smoking_status sex)*(n=' '*f=8. colpctn=' '),
(all) (chol_status) ;
run;
That's not what you want, though, and doesn't really look very good. It's the only option that comes out of proc tabulate, though, as Tabulate won't let you assign statistics to both the rows and the columns - you have to pick one.
PROC REPORT will do what you want, with some effort. However, you could also run this in a two step process - output the tabulate to a dataset, fix the row percentages, then re-print it, either in Report or Tabulate, not asking it to percentage things that time.
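As a quick cross-check of the figures the ALL row is supposed to show, the one-way distribution can be computed on its own. This is only a sketch: it assumes the denominator of 5049 arises because PROC TABULATE drops observations with a missing value in any CLASS variable (its default when the MISSING option is not used), and the WHERE statement below imitates that.

```sas
/* Distribution of chol_status over the rows PROC TABULATE would keep */
proc freq data=sashelp.heart;
  where not missing(chol_status)
    and not missing(smoking_status)
    and not missing(sex);
  tables chol_status / nocum;   /* Percent column = count * 100 / total */
run;
```

The Percent column from this step is the kind of value that would replace the 100.00 entries if you go the two-step route and patch the tabulate output dataset before re-printing.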
Piggy backing on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so each month has its own respective column for Actual while summing the numeric values by Account.
So for example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when there is no need to disaggregate further by month; however, I'm not sure whether it's possible to do so (I tried adding a BY month statement, which didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get the summaries, same as last time. Please read the documentation on PROC MEANS to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data wide. Since the budget amount is consistent across rows you'll be fine.
I'm guessing your next set of questions will be how to sort the columns correctly (because your month names won't sort in calendar order) and how to reference them dynamically to calculate the month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
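On the column-ordering concern mentioned above, one possible workaround is to carry a numeric month through the same two steps. This is a sketch under assumptions: `have_num`, `temp_num`, `want_num`, `month_num` and the `Actual_M` prefix are illustrative names, and the day `01` and year `2000` are dummies that exist only so the DATE9. informat can parse the month name.

```sas
/* Derive a numeric month (May -> 5, June -> 6) from the month name */
data have_num;
  set have;
  month_num = month(input(cats('01', substr(Month, 1, 3), '2000'), date9.));
run;

/* Same summary as above, but classed on the numeric month */
proc means data=have_num noprint nway;
  class account month_num;
  var actual annual_budget;
  output out=temp_num sum=actual_total budget_total;
run;

/* Columns come out as Actual_M5, Actual_M6, ... */
proc transpose data=temp_num out=want_num prefix=Actual_M;
  by account budget_total;
  var actual_total;
  id month_num;
run;
```

Because the CLASS output is ordered by the numeric month, the transposed columns should appear in calendar order as long as the first account has data for every month.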
I cannot think of a way to generate this report using just one PROC. You will need to do some post processing of PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then Annual_Budget_sum = Annual_Budget;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months also here. Can use some macros here to make the code compact and expandable for future enhancements */
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;
I am a complete newb to SAS and I only know basic SQL. I am currently taking a regression class and having trouble with SAS code.
I am trying to input two columns of data where x variable is State; y variable is # of accidents for a simple regression.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only read numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, there are a few different levels of issues here.
ODS GRAPHICS statements go before and after procs, not inside them.
When reading a character variable you need to tell SAS, either with an informat or with a $ in the INPUT statement. Without that, State is read as numeric, every value comes out missing, and that is why all 51 observations were dropped.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable and you can't do regression with a character variable. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
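To illustrate the two points about reading the data and placing ODS GRAPHICS, here is a small sketch. It uses the colon modifier on the INPUT statement as an alternative to a separate INFORMAT statement; `accidents_demo` is a hypothetical dataset name, only three rows from the question are included, and PROC UNIVARIATE is just a stand-in for a procedure that produces a graph.

```sas
data accidents_demo;                 /* hypothetical dataset name */
  /* :$32. reads a character value of up to 32 chars, stopping at the blank */
  input State :$32. Sum_OF_Deaths;
  datalines;
Alabama 526
Alaska 0
District_of_Columbia 0
;
run;

ods graphics on;                     /* before the procedure */
proc univariate data=accidents_demo;
  var Sum_OF_Deaths;
  histogram Sum_OF_Deaths;
run;
ods graphics off;                    /* after the procedure */
```

This only shows that the data now reads in with non-missing values; it does not address what the regression model itself should be.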
Is it possible to use PCTSUM in PROC TABULATE to calculate what percentage a sub-group (or even a sub-sub-group) takes up compared to the overall group? It's probably best to provide an example.
Here's a sample dataset:
data sample;
input make $ model $ owned rented;
datalines;
Toyota Corolla 400 224
Toyota Camry 750 700
Honda Civic 650 519
Honda Accord 225 203
;
I know the following PROC TABULATE line will give me what percentage of vehicles are rented by make
proc tabulate data=sample;
class make model;
var owned rented;
table (make='Vehicle Make' all), owned='Total Owned'*sum rented='Rented'*(sum='Total Rented' pctsum<owned>='Pct Rented');
run;
Like so:
Veh Make TotOwned TotRent PctRent
Honda 875 722 82.51%
Toyota 1150 924 80.35%
All 2025 1646 81.28%
But is it possible to break that down by model so that it tells us not what percentage of Civics are rented (519/650=79.8%) but what percentage of all Hondas are rented Civics (519/875=59.3%)?
How do I write the PROC TABULATE line so that it shows me this:
Veh Make VehModel TotOwned TotRent PctRent
Honda Accord 225 203 23.20%
Civic 650 519 59.31%
All 875 722 82.51%
Toyota Camry 750 700 60.87%
Corolla 400 224 19.48%
All 1150 924 80.35%
All 2025 1646 81.28%
Note that the 23.2% and 59.31% of the Honda models total up to the 82.51% of the Honda subtotal.
Thanks for any help you can provide.
The best I can do is to split the table into pages, and use PAGEPCTSUM. You could output this to a dataset and then re-print it (if you are using the printed output) using another PROC TABULATE or a PROC REPORT or similar.
proc tabulate data=sample;
class make model;
var owned rented;
table (make='Vehicle Make'), model='Vehicle Model',
owned='Total Owned'*sum rented='Rented'*
(sum='Total Rented' pagepctsum<owned>='Pct Rented');
run;
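If the goal is mainly the numbers rather than the TABULATE layout, a different technique may be simpler: PROC SQL can divide each model's rented count by the total owned for its make. This is a sketch, not a TABULATE solution; it relies on PROC SQL remerging the group total back onto the detail rows (SAS writes a note about this in the log), and `pct_rented` is an illustrative column name.

```sas
proc sql;
  select make   label='Vehicle Make',
         model  label='Vehicle Model',
         owned  label='Total Owned',
         rented label='Total Rented',
         /* sum(owned) is the total for the whole make because of GROUP BY make */
         rented / sum(owned) as pct_rented format=percent8.2 label='Pct Rented'
  from sample
  group by make
  order by make, model;
quit;
```

For the Civic row this computes 519 / 875, the figure asked for in the question. It does not produce the per-make and overall "All" subtotal rows; those would need a separate summary step.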