Dynamic Rolling Window in SAS for correlation calculation - sas

Problem: I have a data set as below -
Comp date time returns
1 12-Aug-97 10:23:38 0.919292648
1 12-Aug-97 10:59:43 0.204139521
1 13-Aug-97 11:03:12 0.31909242
1 14-Aug-97 11:10:02 0.989339371
1 14-Aug-97 11:19:27 0.08394389
1 15-Aug-97 11:56:17 0.481199854
1 16-Aug-97 13:53:45 0.140404929
1 17-Aug-97 10:09:03 0.538569786
2 14-Aug-97 11:43:49 0.427344962
2 14-Aug-97 11:48:32 0.154836294
2 15-Aug-97 14:03:47 0.445415114
2 15-Aug-97 9:38:59 0.696953041
2 15-Aug-97 13:59:23 0.577391987
2 15-Aug-97 9:10:12 0.750949097
2 15-Aug-97 10:22:38 0.077787596
2 15-Aug-97 11:07:57 0.515822161
2 16-Aug-97 11:37:26 0.862673945
2 17-Aug-97 11:42:33 0.400670247
2 19-Aug-97 11:59:34 0.109279307
These are nothing but share price returns for every company at a date and time level.
I need to calculate autocorrelation(degree 1) of returns over a period of 10 days for each Comp and date value combination. As you can see, my time series is not continuous, it has breaks for weekends and public holidays. In such cases, if i need to take a 10 day range, I can't use a intnk function as adding 10 days to the date column might include a saturday/sunday for which I don't have data for and hence, my autocorrelation value will be compromised. How do I make this range dynamic?
I found this question Calculating rolling correlations in SAS that I thought might help but then again, there is the same intnx problem.

You can use the INTERVALDS system option to define a custom interval that fits your needs. See this article for more details.
The basic concept is that you create a dataset containing all of your possible dates (or datetimes) and define an interval value for each one, then tell SAS via the system option to use that dataset when you use a particular interval name. Then use INTNX as normal.
Otherwise, you could just do a PROC FREQ of your data to get the unique days, and then use that to create a day counter; then instead of creating your fromDate with intnx, you can just use SQL to grab the row with a date 10 less than current date.

Related

DAX selecting and displaying the max value of all selected records

Problem
I'm trying to calculate and display the maximum value of all selected rows alongside their actual values in a table in Power BI. When I try to do this with the measure MaxSelectedSales = MAXX(ALLSELECTED(FactSales), FactSales[Value]), the maximum value ends up being repeated, like this:
If I add additional dimensions to the output, even more rows appear.
What I want to see is just the selected rows in the fact table, without the blank values. (i.e., only four rows would be displayed for SaleId 1 through 4).
Does anyone know how I can achieve my goal with the data model shown below?
Details
I've configured the following model.
The DimMarket and DimSubMarket tables have two rows each, you can see their names above. The FactSales table looks like this:
SaleId
MarketId
SubMarketId
Value
IsCurrent
1
1
1
100
true
2
2
1
50
true
3
1
2
60
true
4
2
2
140
true
5
1
1
30
false
6
2
2
20
false
7
1
1
90
false
8
2
2
200
false
In the table output, I've filtered FactSales to only include rows where IsCurrent = true by setting a visual level filter.
Your max value (the measure) is a scalar value (a single value only). If you put a scalar value in a table with the other records, the value just get repeated. In general mixing scalar values and records (tables) does not really bring any benefit.
Measures like yours can be better displayed in a KPI or Multi KPI visual (normally with the year, that you get the max value per year).
If you just want to display the max value of selected rows (for example a filter in your table), use this measure:
Max Value = MAX(FactSales[Value])
This way all filter which are applied are considered in the measures calculation.
Here is a sample:
I've found a solution to my problem, but I'm slightly concerned with query performance. Although, on my current dataset, things seem to perform fairly well.
MaxSelectedSales =
MAXX(
FILTER(
SELECTCOLUMNS(
ALLSELECTED(FactSales),
"id", FactSales[SaleId],
"max", MAXX(ALLSELECTED(FactSales), FactSales[Value])
),
[id] = MAX(FactSales[SaleId])
),
[max]
)
If I understand this correctly, for every row in the output, this measure will calculate the maximum value across all selected FactSales rows, set it to a column named max and then filter the table so that only the current FactSales[SaleId] is selected. The performance hit comes from the fact that MAX needs to be executed for every row in the output and a full table scan would be done when that occurs.
Posted on behalf of the question asker

Power Bi DAX: Divide Value and set it for each week

I have a target value of 20 for January but it is 20 for the month, i need to show this over each week. I have to divide the target by 4 if there are 4 weeks in a month and by 5 if there are 5 weeks in a month. It is as simple as that, i am using a line and clustered column chart to display my data, i need the target spread out into each week of the month. I also need another field to do this but im sure i can replicate your formula and make it applicable.
I have added a WeeksinMonth column that counts how many weeks are in a particular month e.g January has 5 weeks and February has 4 weeks. I need an IF statement that will divide the target and value by how many weeks in a month. e.g if month has 5 weeks then divide target and value by 5, if month has 4 weeks divide target and value by 4.
I have a calendar table with week values which i can used to put the divided target into and also the desired output i made in excel (See attached).
How will i go about this?
Desired Output
Calendar Table
enter code hereYou can create an extra column in your calendar table:
WeekMax =
var stOfMonth = Weeks[Start of Week].[Month]
var stOfYear = Weeks[Start of Week].[year]
return CALCULATE(max(Weeks[Weekinmonth]);FILTER(Weeks;Weeks[Start of Week].[Month] = stOfMonth && Weeks[Start of Week].[Year] = stOfYear))
Make sure you have a relation in your model between calendar and target date. Now you can use this number for that week/date to divide by.
You can now add an extra column in your target table:
WeeklyTarget = Targets[Target]/Related(Calendar[WeekMax])

Increment ID# by 1 if Same month ArrayFormula

I'm trying to set up an array formula in a google sheet to save filling in a simple formula for ID#s.
The sheet is populated by a google form, so it receives a timestamp. Let's say these are orders.
If the month of the order matches that of the previous I want to increase the ID# by one, essentially counting this months orders. The complete ID# is actually made up of several factors, the order count being just one of them (so that they are unique), but for the sake of this exercise, I'll keep it simple.
If the month of the order does not match the previous, then safe to say we've entered the new month and the ID should restart at 01.
I have a column that has the extracted month from the timestamp. So the data looks like this:
A B
ID# MONTH
1 1
2 1
3 1
4 1
5 1
6 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
I can't get the arrayformula to work! I've tried numerous countIfs and Ifs, something like
=ARRAYFORMULA(if(len(B2:B),if(B3:B<>B2:B,1,A2:A+1),""))
Does anyone have any suggestions for this?
I found it hard to Google for and have tried a few search terms!
try:
=ARRAYFORMULA(IF(B1:B<>"", COUNTIFS(B1:B, B1:B, ROW(B1:B), "<="&ROW(B1:B)), ))

Creating and doing Market basket analysis from raw data

I have a data set with me which have many items and their sales data in terms of amount and quantity sold rolled up per week. I want to figure out that is there some correlation between the two or not, trying to access that if sales of one item affecting the other's sale or not, in terms of any positive or negative effect.
Consider the following type of data:
Week # Product # Sale($) Quantity
Week 1 Product 1 1 1
Product 2 2 1
Product 3 3 1
Week 2 Product 1 3 2
Product 3 2 1
Product 6 2 2
Week 3 Product 4 2 1
Product 3 1 2
Product 5 4 2
So,from the above data on week basis, I want to figure out that how can I convert this data into a form of market basket data with the above set of parameters available with me. Since, there isn't any market basket data available.
The parameters I could think of is :
To use the count or occurrences of each product in a given week.
To use the total quantity sold
To use the total sales to find correlation.
So, basically I have to come up with how can an item be correlated to the other of the affinity of one product with the other product.No matter it is positively correlated or negative correlated. The only issue is I do not have any primary key to bind the items with a basket or an order number since it's rolled up sales.
Any answers or help in this topic is highly appreciable. In case you find it incomplete, you can let me know for any further clarity.
You can't do this because you have no information about the co-occurrence. You also have data muddled from daily grain to weekly grain. Aggregates won't permit this.

Modifying data in SAS: copying part of the value of a cell, adding missing data and labeling it

I have three different questions about modifying a dataset in SAS. My data contains: the day and the specific number belonging to the tag which was registred by an antenna on a specific day.
I have three separate questions:
1) The tag numbers are continuous and range from 1 to 560. Can I easily add numbers within this range which have not been registred on a specific day. So, if 160-280 is not registered for 23-May and 40-190 for 24-May to add these non-registered numbers only for that specific day? (The non registered numbers are much more scattered and for a dataset encompassing a few weeks to much to do by hand).
2) Furthermore, I want to make a new variable saying a tag has been registered (1) or not (0). Would it work to make this variable and set it to 1, then add the missing variables and (assuming the new variable is not set for the new number) set the missing values to 0.
3) the last question would be in regard to the format of the registered numbers which is along the line of 528 000000000400 and 000 000000000054. I am only interested in the last three digits of the number and want to remove the others. If I could add the missing numbers I could make a new variable after the data has been sorted by date and the original transponder code but otherwise what would you suggest?
I would love some suggestions and thank you in advance.
I am inventing some data here, I hope I got your questions right.
data chickens;
do tag=1 to 560;
output;
end;
run;
data registered;
input date mmddyy8. antenna tag;
format date date7.;
datalines;
01012014 1 1
01012014 1 2
01012014 1 6
01012014 1 8
01022014 1 1
01022014 1 2
01022014 1 7
01022014 1 9
01012014 2 2
01012014 2 3
01012014 2 4
01012014 2 7
01022014 2 4
01022014 2 5
01022014 2 8
01022014 2 9
;
run;
proc sql;
create table dates as
select distinct date, antenna
from registered;
create table DatesChickens as
select date, antenna, tag
from dates, chickens
order by date, antenna, tag;
quit;
proc sort data=registered;
by date antenna tag;
run;
data registered;
merge registered(in=INR) DatesChickens;
by date antenna tag;
Registered=INR;
run;
data registeredNumbers;
input Numbers $16.;
datalines;
528 000000000400
000 000000000054
;
run;
data registeredNumbers;
set registeredNumbers;
NewNumbers=substr(Numbers,14);
run;
I do not know SAS, but here is how I would do it in SQL - may give you an idea of how to start.
1 - Birds that have not registered through pophole that day
SELECT b.BirdId
FROM Birds b
WHERE NOT EXISTS
(SELECT 1 FROM Pophole_Visits p WHERE b.BirdId = p.BirdId AND p.date = ????)
2 - Birds registered through pophole
If you have a dataset with pophole data you can query that to find if a bird has been through. What would you flag be doing - finding a bird that has never been through any popholes? Looking for dodgy sensor tags or dead birds?
3 - Data code
You might have more joy with the SUBSTRING function
Good luck