Increment ID# by 1 if Same month ArrayFormula - if-statement

I'm trying to set up an array formula in a Google Sheet so I don't have to fill a simple formula down for the ID#s.
The sheet is populated by a Google Form, so each row receives a timestamp. Let's say these are orders.
If the month of an order matches that of the previous order, I want to increase the ID# by one, essentially counting this month's orders. The complete ID# is actually made up of several parts, the order count being just one of them (so that IDs are unique), but for the sake of this exercise I'll keep it simple.
If the month of the order does not match the previous one, it's safe to say we've entered a new month and the ID should restart at 01.
I have a column containing the month extracted from the timestamp, so the data looks like this:
A B
ID# MONTH
1 1
2 1
3 1
4 1
5 1
6 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
I can't get the ARRAYFORMULA to work! I've tried numerous COUNTIFS and IF combinations, something like:
=ARRAYFORMULA(if(len(B2:B),if(B3:B<>B2:B,1,A2:A+1),""))
Does anyone have any suggestions?
I found it hard to Google for, and I've tried a few search terms.

try this in A2 (the headers are in row 1):
=ARRAYFORMULA(IF(B2:B<>"", COUNTIFS(B2:B, B2:B, ROW(B2:B), "<="&ROW(B2:B)), ))
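To see what the COUNTIFS formula computes row by row, here is a minimal Python sketch (Python only for illustration; both function names are hypothetical). The first function mirrors the COUNTIFS(range, range, ROW(range), "<="&ROW(range)) pattern: it counts every row so far with the same month value. The second implements the rule described in the question literally, restarting at 1 whenever the month differs from the previous row. They agree on the sample data, but if a month number reappears later (for example December of the following year), the COUNTIFS version keeps counting instead of restarting.

```python
def ids_like_countifs(months):
    """For each row, count rows so far (inclusive) with the same month value,
    like COUNTIFS(B2:B, B2:B, ROW(B2:B), "<="&ROW(B2:B))."""
    seen = {}
    ids = []
    for m in months:
        seen[m] = seen.get(m, 0) + 1
        ids.append(seen[m])
    return ids


def ids_reset_on_change(months):
    """Restart at 1 whenever the month differs from the previous row."""
    ids = []
    for i, m in enumerate(months):
        if i > 0 and m == months[i - 1]:
            ids.append(ids[-1] + 1)
        else:
            ids.append(1)
    return ids


months = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
print(ids_like_countifs(months))    # [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4]
print(ids_reset_on_change(months))  # same result for this data

# A month number that comes back later is where the two rules differ.
print(ids_like_countifs([12, 1, 12]), ids_reset_on_change([12, 1, 12]))  # [1, 1, 2] [1, 1, 1]
```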

Related

How to apply conditional formatting to rows?

I have some requirements: I need to format rows by their efficiency. Power BI currently formats by column and has no default setting for rows. This is what it looks like now:
but I need it to look like this:
I'm not good at DAX yet, so I don't understand how to do this, i.e. how to format values per row. For example:
1 1 0 4 5 6 10
2 0 3 0 5 2 4
3 1 3 1 4 2 1
Here "0" is the min of the 1st row and "10" is its max; in row 2 the min is "0" and the max is "5", and so on.
Can you help me? Any advice or links would be appreciated. Thank you!
I tried creating additional tables for sorting and some manipulation of measures, but none of these methods works by row.
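The per-row rule in the question amounts to rescaling each value between its own row's min and max. The Python sketch below shows only that arithmetic (it is not DAX, and `row_scale` is a hypothetical helper): 0.0 marks the row minimum and 1.0 the row maximum, which is the position a per-row color scale would use. How to feed such a value into Power BI's conditional formatting is not covered here.

```python
def row_scale(row):
    """Position of each value between its own row's min (0.0) and max (1.0)."""
    lo, hi = min(row), max(row)
    if hi == lo:  # flat row: there is no range to scale over
        return [0.0 for _ in row]
    return [(v - lo) / (hi - lo) for v in row]


# Rows from the example above (the row label is the dict key).
table = {
    1: [1, 0, 4, 5, 6, 10],
    2: [0, 3, 0, 5, 2, 4],
    3: [1, 3, 1, 4, 2, 1],
}
for key, row in table.items():
    print(key, [round(s, 2) for s in row_scale(row)])
```

Because each row is scaled independently, a 5 counts as the maximum in row 2 but only as the midpoint in row 1.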

Merging Tables Correctly in SAS

Hi, I am trying to merge two tables: the Form A scores table I made, which is now CalculatingScores, and DomainsFormA, which holds the domain number for each question. I need to merge them by QuestionNum. Here is my code:
proc sql;
create table combined as
select *
from CalculatingScores inner join DomainsFormA
on CalculatingScores.Scores=DomainsFormA.QuestionNum;
quit;
proc print data=combined (obs=15);
run;
This is what I want my merged table to look like, but for 15 observations:
Form Student QuestionNum Scores DomainNum
A 1 1 0 5
A 1 2 1 4
A 1 3 0 5
But my table looks more like this:
Form Student QuestionNum Scores DomainNum
A 1 2 1 5
A 1 4 1 5
A 1 5 1 5
My entire Scores column for these 15 observations has a value of 1, and my DomainNum column only has values of 5. My Student and Form columns are correct, but I need varied scores and varied domain numbers. Any ideas on how to solve this? Maybe I need an ORDER BY clause?
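You appear to be joining on the wrong columns. You coded
on CalculatingScores.Scores=DomainsFormA.QuestionNum
which joins a score to a question number. Since the scores appear to be 0 or 1, only rows with a score of 1 find a match (question 1), which is why every merged row shows Scores=1 and the same DomainNum. An ORDER BY would only change the row order, not which rows match.
Perhaps you should be coding
on CalculatingScores.QuestionNum=DomainsFormA.QuestionNum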
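To see why joining on Scores reproduces the output in the question, here is a small Python sketch: a nested-loop inner join standing in for the PROC SQL join. The helper `inner_join` and the three sample rows (taken from the desired-output table) are illustrative assumptions. On a column-name clash it keeps the left table's value, to mimic CREATE TABLE ... SELECT * keeping the first table's QuestionNum.

```python
# Sample rows taken from the desired-output table in the question.
calculating_scores = [
    {"Form": "A", "Student": 1, "QuestionNum": 1, "Scores": 0},
    {"Form": "A", "Student": 1, "QuestionNum": 2, "Scores": 1},
    {"Form": "A", "Student": 1, "QuestionNum": 3, "Scores": 0},
]
domains_form_a = [
    {"QuestionNum": 1, "DomainNum": 5},
    {"QuestionNum": 2, "DomainNum": 4},
    {"QuestionNum": 3, "DomainNum": 5},
]


def inner_join(left, right, left_key, right_key):
    """Keep every (left, right) pair whose keys are equal."""
    result = []
    for l in left:
        for r in right:
            if l[left_key] == r[right_key]:
                result.append({**r, **l})  # left values overwrite right on clashes
    return result


# Wrong key: a score (0/1) matched against a question number.
by_scores = inner_join(calculating_scores, domains_form_a, "Scores", "QuestionNum")
# Intended key: question number on both sides.
by_question = inner_join(calculating_scores, domains_form_a, "QuestionNum", "QuestionNum")

for row in by_scores:
    print(row)  # only the Scores=1 row survives, paired with question 1's DomainNum
for row in by_question:
    print(row)
```

With the wrong key, the surviving row reads A, 1, 2, 1, 5, which matches the first row of the unexpected output above.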

Sorting between groups based on a variable other than the one grouped on

I would like to use Pandas groupby to sort groups according to a value within each group. This value is not the one used for the grouping.
I am working with public transport data which tells me the stops and arrival times of different bus trips. Here is a sample of the dataframe (called stopTimes):
trip_id stop_sequence arrival_time
1 3 15:08:00
2 2 16:01:00
1 1 09:00:40
2 3 16:45:00
2 1 07:05:30
1 2 12:03:00
I would like to sort the trips according to the arrival time at the first stop. So the result of the sorting for the above dataframe would be:
trip_id stop_sequence arrival_time
2 1 07:05:30
2 2 16:01:00
2 3 16:45:00
1 1 09:00:40
1 2 12:03:00
1 3 15:08:00
I have been able to achieve this result already by:
timeSortedTrips = stopTimes.loc[stopTimes['stop_sequence']==1].sort_values('arrival_time')['trip_id']
stopTimes['trip_id'] = pd.Categorical(stopTimes['trip_id'],timeSortedTrips)
stopTimes = stopTimes.sort_values(['trip_id','arrival_time'])
However, I am curious: can I achieve this using groupby? If so, would it be more efficient? Additionally, I am new to Python, so if you have even better ideas to do this sorting please point me in that direction.
You can group by trip_id and, within each group, sort by arrival_time:
stopTimes.arrival_time = pd.to_datetime(stopTimes.arrival_time)
stopTimes = stopTimes.groupby("trip_id", as_index=False).apply(lambda x: x.sort_values("arrival_time"))
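Sorting inside each group orders the stops within a trip, but groupby emits the groups in trip_id order, so trips are not ordered by their first-stop arrival. One way to keep groupby in the picture is to broadcast each trip's first-stop time back to its rows with `groupby(...).transform("first")` and then sort once. A minimal sketch on the sample data; the helper column `first_arrival` is a hypothetical name. It uses `pd.to_timedelta` so times are stored as durations since midnight, which suits GTFS-style feeds whose times may run past 24:00:00.

```python
import pandas as pd

# Sample rows from the question.
stopTimes = pd.DataFrame({
    "trip_id": [1, 2, 1, 2, 2, 1],
    "stop_sequence": [3, 2, 1, 3, 1, 2],
    "arrival_time": ["15:08:00", "16:01:00", "09:00:40",
                     "16:45:00", "07:05:30", "12:03:00"],
})

# Store times of day as durations since midnight.
stopTimes["arrival_time"] = pd.to_timedelta(stopTimes["arrival_time"])

# For every row, the arrival time at its trip's first stop (lowest stop_sequence).
# Assignment aligns on the index, so the temporary sort does not reorder stopTimes.
stopTimes["first_arrival"] = (
    stopTimes.sort_values("stop_sequence")
    .groupby("trip_id")["arrival_time"]
    .transform("first")
)

# One sort: trips by first-stop arrival, then stops in order within each trip.
result = (
    stopTimes.sort_values(["first_arrival", "trip_id", "stop_sequence"])
    .drop(columns="first_arrival")
    .reset_index(drop=True)
)
print(result)
```

This avoids calling a Python lambda once per group, as `groupby(...).apply` does, which is usually slower on large frames; timing it on your own data is the safest way to compare.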

Creating and doing Market basket analysis from raw data

I have a data set with many items and their sales data, in terms of amount and quantity sold, rolled up per week. I want to find out whether there is any correlation between items, i.e. to assess whether the sales of one item affect another's sales, either positively or negatively.
Consider the following type of data:
Week # Product # Sale($) Quantity
Week 1 Product 1 1 1
Week 1 Product 2 2 1
Week 1 Product 3 3 1
Week 2 Product 1 3 2
Week 2 Product 3 2 1
Week 2 Product 6 2 2
Week 3 Product 4 2 1
Week 3 Product 3 1 2
Week 3 Product 5 4 2
So, from the weekly data above, I want to figure out how to convert it into market basket data using the parameters I have, since no actual market basket data is available.
The parameters I could think of are:
To use the count or occurrences of each product in a given week.
To use the total quantity sold
To use the total sales to find correlation.
So, basically, I have to work out how one item is correlated with another, i.e. the affinity of one product for another, whether the correlation is positive or negative. The only issue is that I don't have any primary key, such as a basket or order number, to tie the items together, since the sales are rolled up.
Any answers or help on this topic would be much appreciated. If you find the question incomplete, let me know and I'll clarify.
You can't do this, because you have no information about co-occurrence, i.e. which items appeared in the same basket. Your data has also been rolled up from daily to weekly grain, and aggregates like these won't permit market basket analysis.
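Weekly totals cannot say which items were bought together, but they can support the third idea from the question: checking whether two products' weekly totals move together. A minimal Python sketch of that, with hypothetical helper names; it assumes a product with no row in a given week sold 0 that week, and with only three weeks the coefficient is purely illustrative, not evidence of any affinity.

```python
from math import sqrt

# (week, product, sale $, quantity) rows from the example table.
rows = [
    (1, "Product 1", 1, 1), (1, "Product 2", 2, 1), (1, "Product 3", 3, 1),
    (2, "Product 1", 3, 2), (2, "Product 3", 2, 1), (2, "Product 6", 2, 2),
    (3, "Product 4", 2, 1), (3, "Product 3", 1, 2), (3, "Product 5", 4, 2),
]


def weekly_series(rows, product, weeks, col=3):
    """Weekly totals of column `col` (3 = quantity, 2 = sale) for one product.
    Weeks with no row for the product count as 0 (an assumption)."""
    totals = {w: 0 for w in weeks}
    for r in rows:
        if r[1] == product:
            totals[r[0]] += r[col]
    return [totals[w] for w in weeks]


def pearson(x, y):
    """Pearson correlation of two equal-length series; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


weeks = sorted({r[0] for r in rows})
p1 = weekly_series(rows, "Product 1", weeks)
p3 = weekly_series(rows, "Product 3", weeks)
r = pearson(p1, p3)
print(p1, p3, round(r, 3))
```

A negative value here only says the two weekly totals tended to move in opposite directions over these weeks; it says nothing about shared baskets.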

Dynamic Rolling Window in SAS for correlation calculation

Problem: I have a data set like the one below:
Comp date time returns
1 12-Aug-97 10:23:38 0.919292648
1 12-Aug-97 10:59:43 0.204139521
1 13-Aug-97 11:03:12 0.31909242
1 14-Aug-97 11:10:02 0.989339371
1 14-Aug-97 11:19:27 0.08394389
1 15-Aug-97 11:56:17 0.481199854
1 16-Aug-97 13:53:45 0.140404929
1 17-Aug-97 10:09:03 0.538569786
2 14-Aug-97 11:43:49 0.427344962
2 14-Aug-97 11:48:32 0.154836294
2 15-Aug-97 14:03:47 0.445415114
2 15-Aug-97 9:38:59 0.696953041
2 15-Aug-97 13:59:23 0.577391987
2 15-Aug-97 9:10:12 0.750949097
2 15-Aug-97 10:22:38 0.077787596
2 15-Aug-97 11:07:57 0.515822161
2 16-Aug-97 11:37:26 0.862673945
2 17-Aug-97 11:42:33 0.400670247
2 19-Aug-97 11:59:34 0.109279307
These are simply share price returns for each company at the date and time level.
I need to calculate the lag-1 autocorrelation of returns over a period of 10 days for each Comp and date combination. As you can see, my time series is not continuous: it has breaks for weekends and public holidays. In such cases, if I need a 10-day range, I can't use the INTNX function, because adding 10 days to the date column might include a Saturday or Sunday for which I don't have data, and my autocorrelation value will be compromised. How do I make this range dynamic?
I found the question "Calculating rolling correlations in SAS", which I thought might help, but it has the same INTNX problem.
You can use the INTERVALDS system option to define a custom interval that fits your needs. See this article for more details.
The basic concept is that you create a dataset containing all of your possible dates (or datetimes) and define an interval value for each one, then tell SAS via the system option to use that dataset when you use a particular interval name. Then use INTNX as normal.
Otherwise, you could just run PROC FREQ on your data to get the unique days and use that to create a day counter; then, instead of creating your fromDate with INTNX, you can use SQL to grab the row whose day counter is 10 less than the current row's.
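The day-counter idea can be sketched in Python for illustration (in SAS the counter would come from the distinct dates as described). The assumptions here are hypothetical: the counter numbers the distinct dates present in the whole data set, the window is the 10 trading days ending on the current date, and lag-1 autocorrelation is the usual sample estimate, sum over t of (x_t - mean)(x_{t+1} - mean) divided by the sum of (x_t - mean)^2, over returns ordered by date and time. Because the window counts trading days, dates with no data (weekends, holidays) are skipped automatically.

```python
from datetime import date, datetime

# (Comp, date, time, returns) rows from the question.
RAW = [
    (1, "12-Aug-97", "10:23:38", 0.919292648), (1, "12-Aug-97", "10:59:43", 0.204139521),
    (1, "13-Aug-97", "11:03:12", 0.31909242), (1, "14-Aug-97", "11:10:02", 0.989339371),
    (1, "14-Aug-97", "11:19:27", 0.08394389), (1, "15-Aug-97", "11:56:17", 0.481199854),
    (1, "16-Aug-97", "13:53:45", 0.140404929), (1, "17-Aug-97", "10:09:03", 0.538569786),
    (2, "14-Aug-97", "11:43:49", 0.427344962), (2, "14-Aug-97", "11:48:32", 0.154836294),
    (2, "15-Aug-97", "14:03:47", 0.445415114), (2, "15-Aug-97", "9:38:59", 0.696953041),
    (2, "15-Aug-97", "13:59:23", 0.577391987), (2, "15-Aug-97", "9:10:12", 0.750949097),
    (2, "15-Aug-97", "10:22:38", 0.077787596), (2, "15-Aug-97", "11:07:57", 0.515822161),
    (2, "16-Aug-97", "11:37:26", 0.862673945), (2, "17-Aug-97", "11:42:33", 0.400670247),
    (2, "19-Aug-97", "11:59:34", 0.109279307),
]

# Parse date and time into one timestamp; %y maps 97 to 1997.
records = [(comp, datetime.strptime(f"{d} {t}", "%d-%b-%y %H:%M:%S"), ret)
           for comp, d, t, ret in RAW]

# Day counter over the distinct dates that actually occur in the data.
trading_days = sorted({ts.date() for _, ts, _ in records})
day_no = {d: i for i, d in enumerate(trading_days)}


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation; None if undefined (< 2 values or no variance)."""
    n = len(x)
    if n < 2:
        return None
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return None
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return num / denom


def rolling_autocorr(records, window=10):
    """Map (comp, date) to the lag-1 autocorrelation of that company's returns
    over the `window` trading days ending on that date."""
    out = {}
    for comp in sorted({c for c, _, _ in records}):
        rows = sorted((ts, r) for c, ts, r in records if c == comp)  # date/time order
        for cur in sorted({ts.date() for ts, _ in rows}):
            k = day_no[cur]
            xs = [r for ts, r in rows if k - window < day_no[ts.date()] <= k]
            out[(comp, cur)] = lag1_autocorr(xs)
    return out


result = rolling_autocorr(records)
for key, value in result.items():
    print(key, value)
```

Using one counter for the whole data set means a company's window can include trading days on which that particular company had no rows; numbering dates per company instead would be a small change if that is preferred.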