average hourly traffic over the year - rrdtool

After hours of searching the web (including SO), I am asking the community for advice. RRD seems to be the right tool for this, but I have not been able to get a straight answer so far.
My question is: is it possible to get RRD to output a graph for the day that averages data from the past year?
In other words, I want the "view span" to be one day long, but the "data span" to extend over the last 12 months, so that the value at 6pm, for example, is computed as the average of ALL traffic measured at 6pm over the last 12 months.
Any hints, or instructions welcomed!

There is no direct way to create such a graph. In theory it would be possible using multiple DEF lines together with the SHIFT operation to build such a chart ... you would have to use a program to generate the necessary command line, though.
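For illustration, here is a rough sketch (in Python) of how such a command line could be generated. The file name traffic.rrd and data-source name traffic are placeholders, not taken from the question; the idea is one DEF per past week (weekly offsets keep the hour of day aligned), a SHIFT to move each one into today's window, and a CDEF that averages them with the AVG RPN operator:

import subprocess
import time

RRD_FILE = "traffic.rrd"   # placeholder file name
DS_NAME = "traffic"        # placeholder data-source name
WEEK = 7 * 24 * 3600       # weekly offset keeps the hour of day aligned
WEEKS = 52

now = int(time.time())
day_start = now - (now % 86400)   # midnight today (UTC)
day_end = day_start + 86400

args = ["rrdtool", "graph", "avg_day.png",
        "--start", str(day_start), "--end", str(day_end)]

names = []
for i in range(1, WEEKS + 1):
    name = f"d{i}"
    names.append(name)
    # one DEF per past week, then SHIFT it forward into today's window
    args.append(f"DEF:{name}={RRD_FILE}:{DS_NAME}:AVERAGE"
                f":start={day_start - i * WEEK}:end={day_end - i * WEEK}")
    args.append(f"SHIFT:{name}:{i * WEEK}")

# RPN: push all 52 values, push the count; AVG ignores UNKNOWN values
args.append("CDEF:avg=" + ",".join(names) + f",{WEEKS},AVG")
args.append("LINE2:avg#0000ff:52-week average")

subprocess.run(args, check=True)

Note that the offsets are plain seconds, so DST changes can shift the hour-of-day alignment by an hour for part of the year.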


Django query to find value of objects from previous period (if exists)

I have a simple Django project and I am trying to keep track of ranks for certain objects to see how they change over time, for example, what the rank of US GDP (compared to other countries) was over the last 3 years. Below is the Postgres DB structure I am working with:
Below is what I am trying to achieve:
What I am finding challenging is that the previous period's value may or may not exist, and even the entity itself may or may not be present in the previous period. A period can be a year, a quarter, or a month, but for a specific record it is always one of these and stays the same across all the years for that record.
Can someone point me in the right direction to write a query that produces those tables? I am trying to avoid heavy for-loop queries because there may be hundreds of entities and many years of data.
So far I have only been able to achieve the below output:
I am just trying to figure out how to use annotate to fetch previous-period values and ranks, but I am pretty much stuck.
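Since the actual model isn't shown above, the following is only a rough sketch against a hypothetical model Metric(entity, period, value). Window functions can annotate the rank within each period and pull the same entity's previous-period value without any per-entity loops:

from django.db import models
from django.db.models import F, Window
from django.db.models.functions import Lag, Rank

class Metric(models.Model):
    # hypothetical schema, a stand-in for the poster's actual table
    entity = models.CharField(max_length=100)
    period = models.DateField()
    value = models.FloatField()

qs = Metric.objects.annotate(
    # rank of each entity within its period, highest value first
    rank=Window(
        expression=Rank(),
        partition_by=[F("period")],
        order_by=F("value").desc(),
    ),
    # the same entity's value in its previous period (NULL if none exists)
    prev_value=Window(
        expression=Lag("value"),
        partition_by=[F("entity")],
        order_by=F("period").asc(),
    ),
)

Two caveats: Lag looks at the entity's immediately preceding row, so if an entity skips a period the "previous" value comes from an older period; and the previous rank cannot be computed in the same query (window functions cannot be nested), so that part usually goes into a second query or a subquery over the annotated results.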

Optimizing / speeding up calculation time in Google Sheets

I have asked a few questions related to this personal project of mine already on this platform, and this should be the last one since I am so close to finishing. Below is the link to a mock example spreadsheet I've created, which mimics what my actual project does but it contains less sensitive information and is also smaller in size.
Mock Spreadsheet
Basic rundown of the spreadsheet:
Pulls data from a master schedule which is controlled/edited by another party into the Master Schedule tab.
In the columns adjacent to the imported data, an array formula expands the master schedule by classroom in case some of the time slots designate multiple rooms. Additional formulas adjust the date, start time, and end time to be capped within the current day's 24-hour period. The start time of each class is also made to be an hour earlier.
In the Room Schedule tab, an hourly calendar is created based on the room number in the first column, and only corresponds to the current day.
I have tested the spreadsheet extensively with multiple scenarios, and I'm happy with how everything works except for the calculation time. I figured the two volatile functions I use would take some processing time just by themselves, and I certainly didn't expect this to be lightning-fast especially without using a script, but the project that I am actually implementing this method for is much larger and takes a very long time to update. The purpose of this spreadsheet is to allow users to find an open room and "reserve" it by clicking the checkbox next to it (which will consequently color the entire row red) allowing everyone else to know that it is now taken.
I'd like to know if there is any way to optimize / speed up my spreadsheet, or to not update it every time a checkbox is clicked and instead update it "manually", similar to what OP is asking here. I am not familiar with Apps Script nor am I well-versed in writing code overall, but I am willing to learn - I just need a push in the right direction since I am going into this blind. I know the number of formulas in the Room Schedule tab is probably working against me yet I am so close to what I wanted the final product to be, so any help or insight is greatly appreciated!
Feel free to ask any questions if I didn't explain this well enough.
To speed things up, you should avoid repeating the same formula on every row and make use of array formulas. For example, replace:
=IF(AND(TEXT(K3,"m/d")<>$A$1,(M3-L3)<0),K3+1,K3+0)
with:
=ARRAYFORMULA(IF(K3:K<>"",
 IF((TEXT(K3:K, "m/d")<>$A$1)*((M3:M-L3:L)<0), K3:K+1, K3:K+0), ))
and replace:
=IF(AND(TEXT(K3,"m/d")=$A$1,(M3-L3)<0),TIMEVALUE("11:59:59 PM"),M3+0)
with:
=ARRAYFORMULA(IF(K3:K<>"",
 IF((TEXT(K3:K, "m/d")=$A$1)*((M3:M-L3:L)<0), TIMEVALUE("11:59:59 PM"), M3:M+0), ))
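One thing to watch for: once the ARRAYFORMULA in row 3 fills the whole column, the old per-row copies of the formula below it must be deleted, otherwise the array result cannot expand and Sheets will show a #REF! error.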

Why Amazon Forecast cannot train the predictor?

While training my predictor I came across this error and I am stuck on how to fix it.
I have two datasets: a "Target time-series data" with 9234 rows and a single "item_id", and a "Related time-series data" with the same number of rows, since I only have a single id.
I'm setting up the predictor with a forecast window of 180 days, which is exactly the difference between the two numbers that appear in the error: 9414 - 9234 = 180.
We were unable to train your predictor.
Please ensure there are no missing values for any items in the related time series, All items need data until 2020-03-15 00:00:00.0. For example, following items have missing data: item: brl only has 9234/9414 required datapoints starting 1994-06-07 00:00:00.0, please refer to documentation for additional details.
Since my data has no missing values and is on a daily basis, why is it returning this error?
My data starts on 1994-06-07 and ends on 2019-09-17. Why should I have 9414 data points rather than 9234?
Should I take 180 days out of my "Target time-series data"?
The future values of the related time-series data must be known.
Example of a good related-time series: You know past and future days in which marketing has or will send email newsletters promoting the product you're forecasting. You can use this data as a related-time series.
Example of a bad related time series: You notice that Google searches for your brand correlate with the sales of your product, so you want to use them as a related time series. Since you don't know how many searches will occur in the future, you can't use this as a related time series.
In your case, you have TARGET_TIME_SERIES data for 9234 days and you want to predict demand for the next 180 days. That means your RELATED_TIME_SERIES data should cover 9414 days.
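As a rough illustration (not Amazon-specific API code), this is how a daily related series could be padded to cover the 180-day horizon beyond the last target date. The item id brl and the dates come from the error message; the values are placeholders and must be replaced with the actually known future values before uploading:

import pandas as pd

HORIZON = 180  # forecast horizon in days

# stand-in for the real related series: item "brl", daily from 1994-06-07
related = pd.DataFrame({
    "item_id": "brl",
    "timestamp": pd.date_range("1994-06-07", "2019-09-17", freq="D"),
})
related["value"] = 0.0  # placeholder values

last_target_date = related["timestamp"].max()

# the related series must extend HORIZON days past the last target date,
# and those future values have to be known (no gaps allowed)
future = pd.DataFrame({
    "item_id": "brl",
    "timestamp": pd.date_range(last_target_date + pd.Timedelta(days=1),
                               periods=HORIZON, freq="D"),
    "value": 0.0,  # replace with the known/planned future values
})

related_extended = pd.concat([related, future], ignore_index=True)
print(len(related), "->", len(related_extended))  # 9234 -> 9414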
Edit: I have not tested this with Amazon's forecasting product. I'm basing my answer on working with Facebook Prophet (which is one of the models Amazon Forecast uses). Please let me know if my solution worked.

Google AutoML Importing text items very slow

I'm importing text items to Google's AutoML. Each row contains around 5000 characters and I'm adding 70K of these rows. This is a multi-label data set. There is no progress bar or indication of how long this process will take. It's been running for a couple of hours. Is there any way to calculate the time remaining or the total estimated time? I'd like to add additional data sets, but I'm worried that this will be a very long process before the training even begins. Any sort of formula to create even a semi-wild guess would be great.
-Thanks!
I don't think that's possible today, but I filed a feature request [1] that you can follow for updates. I asked for it for both importing data and training, since it could be useful for training too.
I tried training with 50K records (~300 bytes/record) and the load took more than 20 mins, after which I killed it. I retried with 1K, which ran for 20 mins and then emailed me an error message saying I had multiple labels per input (yes, so what? training data is going to have some of those) and I had >100 labels. I simplified the classification buckets and re-ran. It took another 20 mins and was successful. Then I ran training, which took 3 hours and billed me $11. That maps to $550 for 50K records, assuming linear behavior. The prediction results were not bad for a first pass, but I got the feeling that it is throwing a super large neural net at the problem. It would help if they said what NN it was and its dimensions. They do say "beta" :)
Don't waste your time trying to use Google for text classification. I am a heavy GCP user, but Microsoft LUIS is far better, more precise, and so much faster that I can't believe both products are trying to solve the same problem.
LUIS has much better documentation, supports more languages, has a much better test interface, and is way faster. I don't know yet whether it is cheaper because the pricing model is different, but we are willing to pay more.

Most efficient way to process complex histogram data?

I'm currently implementing a histogram that will show very large-scale data using Qt, and I have some doubts about which data structure(s) I should be using for my problem. I will be displaying the number of queries received from users of an application, in a single application that shows different histograms upon clicking different "show me this data etc." buttons. The cases I need to display are as follows:
1) Display the histogram of total queries per month. There are 4 months of data here; I kept four variables and incremented them as I caught queries belonging to those months in the CSV file.
2) Display the histogram of total queries per day in a selected month. I was thinking of using 4 QVectors to represent the months for this one, incrementing an element of the vector (a day) each time I come across that day; e.g. if the vector represents the month of August, then whenever I come across a record dated 2011-08-XY, I will increment the (XY + 1)th element of that vector by 1. My second alternative is to use 4 QLinkedLists for the sake of better complexity, but I'm not sure the approaches I've come up with are efficient enough, and I'm willing to listen to any other ideas.
3) Here's where things get a bit complicated: display the histogram of total queries per hour in a selected day and month. The amount of data represented grows vastly here, and I don't know which data structure (or combination of structures) I should use to implement this one. A list of lists, perhaps?
Any ideas on my problems in 2) and 3) would be helpful. Thanks in advance.
Actually, it shouldn't be too unmanageable to always track queries per hour. Assuming the number of queries per hour never exceeds the maximum int value, that's only 24 ints per day (32 or 64 bits each, depending on your machine). Assuming 32-bit ints, you could fit up to 28 years' worth of data per MB.
There's no need to store the month/year: your program can work that out. Just assign hour 0 to the earliest point in your data, which you keep as a constant, then work out the date based on the hours passed since then.
This avoids having to have a list of lists or anything fancy; just use a flat array where each index is the number of hours since hour 0 and each element holds the number of queries for that hour.
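To illustrate the idea (shown in Python for brevity; in the Qt code this would just be a plain array or QVector<int> of counts), hour 0 is assumed here to be 2011-08-01 and every query increments one slot:

from datetime import datetime

HOUR0 = datetime(2011, 8, 1)    # assumed earliest point in the data
counts = [0] * (24 * 365)       # one int per hour; a year fits easily

def hour_index(ts: datetime) -> int:
    return int((ts - HOUR0).total_seconds() // 3600)

def add_query(ts: datetime) -> None:
    counts[hour_index(ts)] += 1

def queries_by_hour(day: datetime) -> list:
    # case 3: the 24 hourly bins of one selected day
    start = hour_index(day)
    return counts[start:start + 24]

def queries_per_day(day: datetime) -> int:
    # case 2: a daily total is just the sum of that day's 24 hourly bins
    return sum(queries_by_hour(day))

def queries_per_month(year: int, month: int) -> int:
    # case 1: a monthly total is again just a sum over a contiguous slice
    first = datetime(year, month, 1)
    nxt = datetime(year + (month == 12), month % 12 + 1, 1)
    return sum(counts[hour_index(first):hour_index(nxt)])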
Why don't you simply use a classical database?
When you start asking this kind of question, I think it is a good time to consider a more robust structure. There are multiple data structures implemented inside any DB, each optimized for different access patterns. You should consider at least lookup, insertion, deletion, and range queries. There is no structure that is better than the others on all of these costs, so there is always a trade-off.
Qt has some database classes you can use. I have never used the Qt SQL library, but I think you should give it a shot. Fortunately, there is a Qt SQL programming guide at the end of the linked page.