Optimizing / speeding up calculation time in Google Sheets - if-statement

I have asked a few questions related to this personal project of mine already on this platform, and this should be the last one since I am so close to finishing. Below is the link to a mock example spreadsheet I've created, which mimics what my actual project does but it contains less sensitive information and is also smaller in size.
Mock Spreadsheet
Basic rundown of the spreadsheet:
Pulls data from a master schedule which is controlled/edited by another party into the Master Schedule tab.
In the columns adjacent to the imported data, an array formula expands the master schedule by classroom in case some of the time slots designate multiple rooms. Additional formulas adjust the date, start time, and end time to be capped within the current day's 24-hour period. The start time of each class is also made to be an hour earlier.
In the Room Schedule tab, an hourly calendar is created based on the room number in the first column, and only corresponds to the current day.
I have tested the spreadsheet extensively with multiple scenarios, and I'm happy with how everything works except for the calculation time. I figured the two volatile functions I use would take some processing time just by themselves, and I certainly didn't expect this to be lightning-fast especially without using a script, but the project that I am actually implementing this method for is much larger and takes a very long time to update. The purpose of this spreadsheet is to allow users to find an open room and "reserve" it by clicking the checkbox next to it (which will consequently color the entire row red) allowing everyone else to know that it is now taken.
I'd like to know if there is any way to optimize / speed up my spreadsheet, or to not update it every time a checkbox is clicked and instead update it "manually", similar to what OP is asking here. I am not familiar with Apps Script nor am I well-versed in writing code overall, but I am willing to learn - I just need a push in the right direction since I am going into this blind. I know the number of formulas in the Room Schedule tab is probably working against me yet I am so close to what I wanted the final product to be, so any help or insight is greatly appreciated!
Feel free to ask any questions if I didn't explain this well enough.

to speed up things you should avoid usage of the same formulae per each row and make use of arrayformulas. for example:
=IF(AND(TEXT(K3,"m/d")<>$A$1,(M3-L3)<0),K3+1,K3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3:K, "m/d")<>$A$1)*((M3:M-L3:L)<0), K3:K+1, K3:K+0), ))
=IF(AND(TEXT(K3,"m/d")=$A$1,(M3-L3)<0),TIMEVALUE("11:59:59 PM"),M3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3,"m/d")=$A$1)*((M3-L3)<0), TIMEVALUE("11:59:59 PM"), M3:M+0), ))

Related

Django query to find value of objects from previous period (if exists)

I have a simple django project and I am trying to keep track of ranks for certain objects to see how they change over time. For example, what was the rank of US GDP (compared to other countries) over last 3 years. Below is the postgres db structure I am working with:
Below is what I am trying to achieve:
What I am finding challenging is that the previous period value may or may not exist and it's possible that even the entity may or may not be in the pervious period. Period can be year, quarter or months but for a specific record it can be either of one and stays consistently same for all the years for that record.
Can someone guide me in the right direction to write a query to achieve those tables? I am trying to avoid writing heavy forloop queries because there may be 100s of entities and many years of data.
So far I have only been able to achieve the below output:
I am just trying to figure out how to use annotate to fetch previous period values and ranks but I am pretty much stuck.

Comparing data of payments

At my work we have two systems, one that collects the customers payments automatically every month. And one that manages the memberships of those customers. Sadly our outdated technology doesn’t communicate to each other so we don’t know if a customer actually paid for their membership without manually auditing them.
I’ve been put in charge of this process and boy does it take awhile to do.
I have limited knowledge of C++ and was looking into maybe writing a program to do the comparisons for me.
I have two ideas on how to implement this, and was wondering what you guys thought. If these would be best or if it’s even possible or if there’s a better solution?
Current Setup: We have a list of all members in excel, with how much each should be paying, we then go through the actual money collected and check to make sure everyone’s payment went through and was processed and not declined.
Option 1: have a multi-dimensional array of strings. Read the excel file into this array it would have three Columns, first name, last name, amount they should be paying. This would be put in alphabetical order to help with the searching. I would then export the transactions in css file format and read each line one at a time. When it reads a line it would search the array for the same first and last name. Once found it would take the amount paid confirm it said processed and not declined and if so would subtract it from the customers amount they should be paying. In the end if every customers amount they should be paying is equal to 0 then everyone paid.
Option 2: is similar to option 1 just instead of using a multidimensional array it would use two css files. And not put the items into the array at the start.
Thoughts? Is this a smart way to combat this problem? I’m a newbie programmer so I’m just looking for suggestions/advice.
Your solutions would work, but are suited for small datasets. I don't now what your constraints are, but I think that a more elegant solution would be to setup a database on the first system first(instead of the excel file).
Are you allowed to create a database? How many customers are in the excel file?

Update in data warehouse fact table

Reading upon many Kimball design tips regarding fact tables (transaction, accumulating, periodic) etc. I'm still vague what should I do with my case of updating a fact table which I believe is not that uncommon. To the case.
We're processing complaints from clients, and we want to be able to reflect current status of complaint in the Data Warehouse. Our complaints have a workflow of statuses they go through, different assignees that deal with them on time, but for our analysis this is irrelevant as of now. We would like to review what the current situation on complaint is.
To my understanding the grain of the fact table would be single complaint, with columns (irrelevant for this question whether it should be junk dimension, degenerate etc) such as:
Complaint Number
Current Status
Current Status Date
Current Assignee
Type of complaint
As far as I understand, since we don't want to view the process history, but instead see what the current status of the process is, storing multiple rows for each complaint representing it's state is an overkill, so instead we store only one row per complaint and update it.
Now, is my reasoning correct to do that? In above case, complaint number and type of complaint store values that don't change, while "Current" columns do and we need to update the row, so we could implement Change Data Capture mechanism (just like we do for dimensions right now) to compare incoming rows from source system for this fact with currently stored fact rows to improve time cost of such operation.
It honestly looks like a Dimension table with mixed SCD Type 0&1 for me, but it stores facts of receiving complaints.
SO Post for reference: Fact table with information that is regularly updatable in source system
Edit
I'm aware that I could use accumulating fact table with time stamps which is somewhat SCD Type 2 alike but the end user doesn't really care about the history of the process. There are more facts involved in the analysis later on, so separating this need from data warehouse doesn't really work in this case.
I’ve encountered similar use cases in the past, where an accumulating snapshot would be the default solution.
However, the accumulating snapshot doesn’t allow processes with varying length. I’ve designed a different pattern, when 2 rows are added for each event: if an object goes from state A to state B you first insert a row with state A and quantity -1, then a new one with state B and quantity +1.
The end result allows:
- no updates necessary, only inserts;
- map-reduce friendly;
- arbitrary length processes;
- counting how many of each in each state at any point in time (with the help of a periodic snapshot for performance reasons);
- how many entered or left any state at any point in time.;
- calculate time in each state and age overall.
Details in 5 blog posts here (with implementation in Pentaho Data Integration):
http://ubiquis.co.uk/dwh/status-change-fact-table-part-1-the-problem/

Suggestions for statistical model/approach to “Pattern recognition for non-uniform time data”

I have a dataset from which I would like to detect recurring patterns (i.e: daily, weekly, monthly). The dataset only contains a time stamp (datetime), and the spacing is non-uniform.
The observations in the data reflect the exact time when this one person passes my window. He does this several times a day (on a single day he walks by my window approx 10-30 times), and I am trying to see, if there is any pattern (there might also be some seasonality, sudden changes in previous behavior and other interesting stuff going on).
Does anyone have a suggestion for a statistical model/approach that might be helpful in figuring out if there is any pattern in this behavior? Hopefully, I’ll be able to predict when he will pass my window again ;)
How would you approach this?
Any help would really be appreciated.

Modelling EVERY day in Django

I have a booking system for something where the price can change based on the day. The admins for the site can make these changes. If a booking crosses the boundary of a daily rate, they pay pro-rata for the rates they used.
I'm losing confidence in how this is implemented. There are at least two ways:
Having Rates that specify their validity (start, end fields) and then working out which of those apply. But which overlapping ones take priority? Etc. Nasty. This is what we're trying to do and cannot currently answer sufficiently well.
The same except that there is some form of unique quality to date so that no two rates can overlap. The problem here is we'd need to split existing Rates on insert and rejoin two on delete/edit, etc if they had the same value. We'd need to make sure there were no gaps. It requires some heavy ORM overriding.
Keeping a DayRate table with every day defined. This means keeping a load of extra data around but most bookings are for tens of days, not thousands so I'm not worried about the database bandwidth requirements here. Date would be primary-unique and I'd just do a range filter for grabbing which ones I need to factor in.
The problem is generating these dates ahead of time. I know that as soon as I implement this, somebody will make a booking for 2032. Is there a good way around this or should we limit them?
None of these answers seems great and I have to imagine that I'm not the first guy with a booking system. Is there a better way of keeping track of a rate over a contiguous (possibly infinite) amount of time?