Is duplication okay in random forest modelling? (SAS)

I am using random forest modelling for a project, in SAS Enterprise Guide.
My data covers 12 months of history with a rolling payment history. The table below summarizes it.
acc   payment_3months  dueamt_3mons  month
2314  100              200           01
2314  300              200           02
3241  450              450           01
3241  500              500           02
3241  250              350           03
Does this data look okay to feed into a random forest, or should I aggregate it, removing the duplicates so that there is a single row per acc value?
Please advise.
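If you do go the aggregation route, a minimal sketch of what "one row per acc" could look like is below, written in Python rather than SAS purely for illustration. The feature names (mean_payment, last_payment, mean_pay_ratio and so on) and the choice of aggregates are hypothetical, not something the question specifies. One thing worth checking if you keep the monthly rows instead: a random train/validation split can put rows from the same account on both sides, which tends to make validation scores look better than they will be on new accounts.

```python
from collections import defaultdict

# Rows from the table above: (acc, payment_3months, dueamt_3mons, month)
rows = [
    ("2314", 100, 200, "01"),
    ("2314", 300, 200, "02"),
    ("3241", 450, 450, "01"),
    ("3241", 500, 500, "02"),
    ("3241", 250, 350, "03"),
]

def aggregate_per_account(rows):
    """Collapse the monthly rows into one feature row per account.

    The features below are illustrative (hypothetical) choices, not a
    recommended feature set.
    """
    by_acc = defaultdict(list)
    for acc, payment, due, month in rows:
        by_acc[acc].append((month, payment, due))

    features = {}
    for acc, recs in by_acc.items():
        recs.sort()  # months are zero-padded strings, so text order = time order
        payments = [p for _, p, _ in recs]
        dues = [d for _, _, d in recs]
        features[acc] = {
            "mean_payment": sum(payments) / len(payments),
            "mean_due": sum(dues) / len(dues),
            "last_payment": payments[-1],  # payment in the latest month
            "mean_pay_ratio": sum(p / d for p, d in zip(payments, dues)) / len(recs),
            "n_months": len(recs),
        }
    return features

features = aggregate_per_account(rows)
```

Each account then becomes one row; for example, 3241 gets n_months = 3 and last_payment = 250.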


How to create a DAX measure for finding values in a previous month, not using a Date field?

I am currently trying to create a report that shows how customers behave over time, but instead of doing this by date, I am doing it by customer age (number of months since they first became a customer). So using a date field isn't really an option, since one customer may have started in Dec 2016 and another in Jun 2017.
What I'm trying to find is the month-over-month change in units purchased. If I were using a date field, I know that I could use
[Previous Month Total] = CALCULATE(SUM([Total Units]), PREVIOUSMONTH([FiscalDate]))
I also considered using EARLIER(), but I don't think it would work in this case, because it requires a row context that I'm not sure I could create. Below is a simplified version of the table that I'll be using.
ID Date Age Units
219 6/1/2017 0 10
219 7/1/2017 1 5
219 8/1/2017 2 4
219 9/1/2017 3 12
342 12/1/2016 0 500
342 1/1/2017 1 280
342 2/1/2017 2 325
342 3/1/2017 3 200
342 4/1/2017 4 250
342 5/1/2017 5 255
How about something like this?
PrevTotal =
VAR CurrAge = SELECTEDVALUE(Table3[Age])
RETURN CALCULATE(SUM(Table3[Units]), ALL(Table3[Date]), Table3[Age] = CurrAge - 1)
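The CurrAge variable holds the Age evaluated in the current filter context. You then plug CurrAge - 1 into a filter in the CALCULATE call, while ALL(Table3[Date]) clears any filter on Date that could otherwise exclude the row for the previous age.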
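To check the logic outside Power BI, here is a small Python sketch of what PrevTotal computes for each customer: the Units total at Age - 1 for the same ID, with no reference to Date. The function name previous_age_totals is hypothetical, and the sketch assumes the measure is evaluated per ID (for example, a visual with ID and Age on the rows); with no ID filter in context, the DAX measure would instead sum all customers at Age - 1.

```python
# Rows from the table above: (ID, Age, Units); Date is deliberately unused
rows = [
    (219, 0, 10), (219, 1, 5), (219, 2, 4), (219, 3, 12),
    (342, 0, 500), (342, 1, 280), (342, 2, 325),
    (342, 3, 200), (342, 4, 250), (342, 5, 255),
]

def previous_age_totals(rows):
    """Return {(id, age): (units, prev_units, change)}.

    prev_units is the Units total for the same ID at age - 1
    (None at the customer's first age), mirroring the PrevTotal idea.
    """
    units = {}
    for cid, age, u in rows:
        # SUM per (ID, Age), like SUM(Table3[Units]) in the measure
        units[(cid, age)] = units.get((cid, age), 0) + u
    result = {}
    for (cid, age), u in units.items():
        prev = units.get((cid, age - 1))
        change = None if prev is None else u - prev
        result[(cid, age)] = (u, prev, change)
    return result

totals = previous_age_totals(rows)
```

For customer 219 at Age 1 this gives a previous total of 10 and a change of -5; at Age 0 there is no previous value.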

How to determine the number of filled drums, and the room left in each drum

Not quite a homework problem, but it may as well be:
You have a long list of positive integer values stored in column A. These are packets in unit U.
A Drum can fit up to 500 U, but you cannot break up packets.
How many drums are required for any given list of values in column A?
This does not have to be the most efficient answer, processing in row order is absolutely fine.
I think you should be able to solve this with a formula, but the closest I got was
=CEILING(SUM(A1:A1000)/500;1)
Of course, this breaks up packets.
Additionally, this problem requires me to find the room left in each drum used, but the emphasis of this question is on the number of drums required.
This cannot be done with a single simple formula, because each drum and each packet needs to be counted. However, contrary to my comment, a spreadsheet works well for this particular problem, and there is no need for a macro.
First, set B2 to 500 for use in other formulas. If column A is not yet filled, use the formula =RANDBETWEEN(1,B$2) to add some values.
Column C is the main formula that determines how full each drum is. Set C2 to =A2. C3 is =IF(C2+A3>B$2,A3,C2+A3). Fill C3 down to fill the remaining rows.
For column D, use =IF(C2+A3>B$2,B$2-C2,""). However, the last row of column D uses a shorter formula, =B$2-C21; change 21 to whatever the last row is.
Finally in column E we find the answer, which is simply =COUNT(D2:D21).
Packets (A)  Drum Size (B)  How Full (C)  Room left in each drum used (D)  Number of filled drums (E)
-----------  -------------  ------------  -------------------------------  --------------------------
206          500            206           294                              13
309                         309
68                          377
84                          461           39
305                         305           195
387                         387           113
118                         118
8                           126           374
479                         479           21
492                         492           8
120                         120
291                         411           89
262                         262
108                         370           130
440                         440           60
88                          88
100                         188
102                         290           210
478                         478           22
87                          87            413
For OpenOffice Calc, use semicolons ; instead of commas , in formulas.
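The spreadsheet columns implement a simple "next fit" rule: keep filling the current drum until the next packet would overflow it, then close that drum and start a new one. A minimal Python sketch of the same rule follows, which can be handy for cross-checking column E on a long list. The function name next_fit_drums and the capacity parameter are illustrative; the value 500 comes from the question.

```python
def next_fit_drums(packets, capacity=500):
    """Fill drums in row order without splitting packets ("next fit").

    Returns (drum_count, room_left), where room_left[i] is the unused
    space in drum i (column D in the spreadsheet).
    """
    if any(p <= 0 or p > capacity for p in packets):
        raise ValueError("each packet must be between 1 and capacity")
    room_left = []
    fill = 0  # how full the current drum is (column C)
    for p in packets:
        if fill + p > capacity:
            room_left.append(capacity - fill)  # packet doesn't fit: close drum
            fill = p                           # and start a new one with it
        else:
            fill += p
    if packets:
        room_left.append(capacity - fill)  # the last drum is closed at the end
    return len(room_left), room_left

# Packets from column A of the table above
packets = [206, 309, 68, 84, 305, 387, 118, 8, 479, 492,
           120, 291, 262, 108, 440, 88, 100, 102, 478, 87]
drums, room = next_fit_drums(packets)
```

On these 20 packets it returns 13 drums, matching column E, and the same room-left values as column D. For comparison, =CEILING(SUM(...)/500;1) gives 10 for the same packets, because it implicitly allows packets to be split.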

SAS ARIMA modelling for 5 different variables

I am trying to do an ARIMA model estimation for 5 different variables. The data consists of 16 months of point-of-sale figures. How do I approach this complicated ARIMA modelling?
Furthermore, I would like to fit:
A simple moving average of each product group
A Holt-Winters exponential smoothing model
The data, by month and product group, is as follows:
Date    Gloves  ShoeCovers  Socks  Warmers  HeadWear
apr-14  11015   3827        3465   1264     772
may-14  11087   2776        4378   1099     1423
jun-14  7645    1432        4490   674      670
jul-14  10083   7975        2577   1558     8501
aug-14  13887   8577        6854   1305     15621
sep-14  9186    5213        5244   1183     6784
oct-14  7611    4279        4150   977      6191
nov-14  6410    4033        2918   507      8276
dec-14  4856    3552        3192   450      4810
jan-15  17506   7274        3137   2216     3979
feb-15  21518   5672        8848   1838     2321
mar-15  17395   5200        5712   1604     2282
apr-15  11405   4531        5185   1479     1888
may-15  11509   5690        4370   1145     2369
jun-15  9945    2610        4884   882      1709
jul-15  8707    5658        4570   1948     6255
Any skilled forecasters out there willing to help? Much appreciated!
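As a starting point for the two smoothing methods, here is a minimal Python sketch applied to the Gloves column. A full Holt-Winters model adds a seasonal component, and 16 monthly observations are fewer than two 12-month cycles, which makes seasonal initialization hard; the sketch therefore uses Holt's linear trend method, i.e. Holt-Winters without the seasonal term. The window of 3 and the smoothing constants alpha=0.5 and beta=0.3 are arbitrary assumptions, not fitted values; in practice they would be tuned, for example by minimizing one-step-ahead forecast error.

```python
# Gloves column from the table above (apr-14 .. jul-15, 16 months)
gloves = [11015, 11087, 7645, 10083, 13887, 9186, 7611, 6410,
          4856, 17506, 21518, 17395, 11405, 11509, 9945, 8707]

def moving_average(series, window=3):
    """Trailing simple moving average; the first window-1 points get None."""
    return [
        None if i < window - 1 else sum(series[i - window + 1:i + 1]) / window
        for i in range(len(series))
    ]

def holt_linear(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear (additive trend, no seasonality) exponential smoothing.

    Returns point forecasts for the next `horizon` periods.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialization
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

gloves_ma = moving_average(gloves)
gloves_forecast = holt_linear(gloves)
```

The same two functions can be applied to each of the other product-group columns in turn.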

SAS reading a file in long format

I have a file in long format, like so:
name weight month cal
bob 80 01 5000
ben 70 01 4989
mary 60 01 3000
bob 81 02 4999
ben 68 02 6000
mary 57 02 2800
...
I would like to fit N linear regressions of weight on cal, one for each month.
I know how to read the data into a dataset and how to fit a regression model.
I am not sure how to do this in a loop over the N months...
Any pointers?
Many thanks!
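In SAS itself, the usual way to avoid an explicit loop is BY-group processing: sort the data by month, then add a BY month statement to the regression procedure so it fits one model per month. The same grouping idea is sketched below in Python with plain least squares so it runs without extra libraries; fit_by_month is a hypothetical helper name.

```python
from collections import defaultdict

# Rows from the table above: (name, weight, month, cal)
rows = [
    ("bob", 80, "01", 5000), ("ben", 70, "01", 4989), ("mary", 60, "01", 3000),
    ("bob", 81, "02", 4999), ("ben", 68, "02", 6000), ("mary", 57, "02", 2800),
]

def fit_by_month(rows):
    """Fit weight = intercept + slope * cal separately for each month
    by ordinary least squares; returns {month: (intercept, slope)}."""
    groups = defaultdict(list)
    for _name, weight, month, cal in rows:
        groups[month].append((cal, weight))
    fits = {}
    for month, pts in sorted(groups.items()):
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        slope = sxy / sxx  # requires at least two distinct cal values per month
        fits[month] = (mean_y - slope * mean_x, slope)
    return fits

monthly_fits = fit_by_month(rows)
```

With the six sample rows it returns one (intercept, slope) pair for month 01 and one for month 02.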

Is there any DAX expression for calculating the SUM of a few rows based on an index?

Suppose I have the following data:
MachineNumber | Duration
01 | 234
01 | 200
01 | 150
02 | 320
02 | 120
02 | 100
I want a DAX expression that adds 234 + 200 + 150, because those rows belong to machine 01, and returns the sum.
If you want to see each machine as its own row in a table visual, you have to turn off the automatic summarization (Sum) on your MachineNumber column.
You can also do a transformation in the Power Query editor, changing the data type of MachineNumber to Text instead of a number.
To find the total duration per machine, I chose the Sum option for the Duration field, and the per-machine totals were displayed.
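Conceptually, summing Duration with MachineNumber as the grouping column is a group-by total. A short Python sketch of that computation is below, with MachineNumber kept as text as suggested above so that "01" keeps its leading zero; the function name total_duration_by_machine is illustrative.

```python
from collections import defaultdict

# Rows from the table above: (MachineNumber as text, Duration)
rows = [("01", 234), ("01", 200), ("01", 150),
        ("02", 320), ("02", 120), ("02", 100)]

def total_duration_by_machine(rows):
    """Sum Duration per MachineNumber."""
    totals = defaultdict(int)
    for machine, duration in rows:
        totals[machine] += duration
    return dict(totals)

totals = total_duration_by_machine(rows)
```

Machine 01 comes out as 234 + 200 + 150 = 584 and machine 02 as 540.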