How do I get the most recent report data? - google-admin-sdk

I'm trying to build a tool that collects a few data points from a user usage report with
https://www.googleapis.com/admin/reports/v1/usage/{user}/all/dates/{yyyy-mm-dd}
Since the data is delayed, how do I get the most recent report? If I query today's date (2013-11-22), I get something like:
Data for dates later than 2013-11-19 is not yet available. Please check back later
Is there a set number of days or hours before reports become available, or do I have to work backwards by trial and error until I get a successful response?

I believe there is a delay of about 48 hours for the reports as of right now. However, if Google is able to improve on that, you'll want your app to be able to take advantage of those improvements without any changes needed.
I suggest you make a first attempt using today's date. When that fails, parse the error response to grab the last date for which report data is available, and use that value. This way you make at most two attempts, and if Google reduces the delay to 24 hours or less, your app takes immediate advantage of that change.
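The two-attempt approach above can be sketched in Python. This is a minimal sketch, assuming the error text keeps the wording quoted in the question; `fetch_report` is a hypothetical callable standing in for the actual HTTP request, and the regular expression may need adjusting if Google changes the message.

```python
import re
from datetime import date

# Matches the date in messages such as:
# "Data for dates later than 2013-11-19 is not yet available. Please check back later"
_LATEST_DATE_RE = re.compile(r"later than (\d{4})-(\d{2})-(\d{2})")


def latest_available_date(error_message):
    """Return the last date with report data, or None if the message has no date."""
    match = _LATEST_DATE_RE.search(error_message)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def fetch_most_recent_report(fetch_report, today):
    """Try today's date first, then retry once with the date named in the error.

    fetch_report is a hypothetical callable: it takes a date and returns
    (report, None) on success or (None, error_message) on failure.
    """
    report, error = fetch_report(today)
    if error is None:
        return today, report
    fallback = latest_available_date(error)
    if fallback is None:
        raise RuntimeError("Unrecognized error: " + error)
    report, error = fetch_report(fallback)
    if error is not None:
        raise RuntimeError("Retry failed: " + error)
    return fallback, report
```

Passing the request function in as an argument keeps the date-parsing logic independent of whichever HTTP client the tool uses.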

Related

Powerquery & Zabbix API - DataSource.Error (500) Internal Server Error when asking for much data

I am using Powerquery to fetch the data from Zabbix using their API. It works fine when I fetch the data for some days, but as I increase the period and the amount of data surpasses the millions of rows I just get the error below after some time waiting, and the query doesn't return anything.
I am using Web.contents to get the data as follows:
I have added that timeout, as you can see above, but the error happens well before 5 minutes have passed. How should I solve this? Are there ways to fetch large amounts of data in Power Query in parts, rather than all at once? Or does this error happen because of connection parameters inherent to the Zabbix configuration?
My team changed all the possible parameters regarding server memory and nothing seems to have worked. One thing to note: Power Query hits the same (500) Internal Server Error whether I request 3 days or 30 days of data, but in the first case the error appears much faster, while in the latter it takes much longer and eventually fails the same way.
Thanks!
You are hitting the PHP memory limit, so you should raise the maximum memory.
For example, in a standard Apache setup you should edit /etc/httpd/conf.d/zabbix.conf and change the php_value memory_limit to a greater value (restart Apache!).
The default is 128M; the "right" setting depends on the memory available on the system and the maximum data size you want to get.
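On the question of fetching the data in parts: one option is to split the period into smaller windows and issue one request per window, so that no single response has to fit in PHP's memory. The sketch below is in Python rather than Power Query and only computes the windows; it assumes each pair would be passed as the `time_from` and `time_till` parameters of a Zabbix `history.get` call, and `split_time_range` is a hypothetical helper name.

```python
def split_time_range(start, end, chunk_seconds):
    """Yield (time_from, time_till) pairs of Unix timestamps covering [start, end].

    The pairs do not overlap, assuming both bounds are treated as inclusive.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    time_from = start
    while time_from <= end:
        time_till = min(time_from + chunk_seconds - 1, end)
        yield time_from, time_till
        time_from = time_till + 1
```

For example, `list(split_time_range(0, 9, 4))` gives three windows; in practice `chunk_seconds` might be one day (86400) and each window would be fetched and appended in turn.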

variable number of tweets using python

I am running the following code and I get a variable number of tweets each time I run it, at intervals of more than 15 minutes. Sometimes I get 1400 tweets, other times 1200, 1000 or 1600. Can't I get a fixed number of tweets every time I run the code, even if I change the keyword?
for tweet in tweepy.Cursor(api.search, q="#narendramodi", rpp=100).items(200):
Your search does not specify any id limit.
Because of pagination, the Twitter Search API looks for the latest tweets every time you call it. Since tweets are added continuously, a simple call to the Search API returns the most recent ones, and you'll get a different number of tweets depending on how many were posted while you were querying. See Working with Timelines.
Please also note that Twitter Search API focuses on relevance rather than completeness of the results. See The Search API.
If you want to iterate over tweets, starting from the moment you run your application and continuing to older tweets, I recommend passing max_id in the parameters of your next query, set to the id field of the last result from your previous query, as suggested here.
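The max_id approach can be sketched as follows. This is a minimal sketch, not tweepy's own API: `search` is a hypothetical callable that wraps whatever client call you use, takes `(query, count, max_id)`, and returns a list of dicts with an `"id"` key, restricted to tweets whose id is at most `max_id`.

```python
def collect_tweets(search, query, limit, page_size=100):
    """Page backwards through results using max_id until `limit` tweets are collected.

    search is a hypothetical callable taking (query, count, max_id); it returns
    the newest matching tweets when max_id is None.
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        page = search(query, min(page_size, limit - len(tweets)), max_id)
        if not page:
            break  # no older tweets left
        tweets.extend(page)
        # max_id is inclusive, so subtract 1 to avoid fetching the last tweet again.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets
```

Because each page is anchored to the oldest id already seen, newly posted tweets do not shift the pages. The count is still only fixed when at least `limit` matching tweets are available.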

Is there any possibility that deleted data can be recovered back in SAS?

I am working in a production environment. Yesterday I accidentally made permanent changes to the Master dataset while trying to pull a sample from it into the work directory. Unfortunately, there is no backup of this data.
I wanted to execute this:
Data work.facttable;
Set Master.facttable(obs=10);
run;
instead of this, accidentally I executed the following:
data Master.facttable;
set Master.facttable(obs=10);
run;
You can clearly see what sort of blunder it was!
Facttable had been building up for nearly 2 years; it was 250GB and had millions of rows. Now it has 10 rows and is 128kb :(
I am very worried about how to recover the data. It is crucial for the business teams, and I have no idea how to proceed to get it back.
I know that SAS doesn't support any rollback options or recovery process. We don't use the audit trail method either.
I am just wondering whether there is still any way to get the data back in spite of all this.
Details: The dataset is assigned on the SPDE engine. I checked the data files (.dpf), but all of them have disappeared except yesterday's data file, which is 128kb.
You appear to have exhausted most of the simple options already:
Restore from external/OS-level backup
Restore from previous generation via the gennum= data set option (only available if the genmax option was set to 1+ when creating the dataset).
Restore from SAS audit trail
I think that leaves you with just 2 options:
Rebuild the dataset from the underlying source(s), if you still have them.
Engage the services of a professional data recovery company, who might be able to recover some or all of the deleted files, depending on the complexity of your storage environment, and how much of the original 250GB has since been overwritten.
Either way, it sounds as though this may prove to have been an expensive mistake.

Stream Analytics Output

I have a project that uses an event hub to receive data, which is sent every second and received by a website using SignalR. This is all working fine. I have been storing the data in blob storage via a Stream Analytics job, but this is really slow to access, and with the amount of data I am receiving from just 6 devices, it will get even slower as that grows. I need to access the data to display historical graphs on the website, which are then topped up with the live data coming in.
I don't really need to store the data every second, so I thought about storing it only every 30 seconds instead, but in a SQL DB. What I am trying to do is still receive the data every second but only store it every 30. I have tried a tumbling window, but from what I can see, this just dumps everything every 30 seconds instead of the single entries.
Am I misunderstanding the Tumbling, Sliding and Hopping windows? I am guessing I cannot use them in this way. If that is the case, I am guessing the only way to do it would be to have the output DB as an input, so I can cross-reference the timestamp with the current time?
Unless anyone has any other ideas? Any help would be appreciated.
Thanks
"Am I misunderstanding the Tumbling, Sliding and Hopping windows?"
You are correct that this will put all events within the Tumbling/Sliding/Hopping window together. However, a window is only valid within a GROUP BY clause, which requires an aggregate function over the group.
There is an aggregate function, Collect(), which creates an array of the events within a group.
I think this should be possible when you group every event within a 30 second tumbling window using Collect(), then in the next step, CROSS APPLY each record, which should output all received events within the 30 seconds.
With Grouper AS (
SELECT Collect() AS records
FROM Input TIMESTAMP BY time
GROUP BY TumblingWindow(second, 30)
)
SELECT
record.ArrayValue.FieldA AS FieldA,
record.ArrayValue.FieldB AS FieldB
INTO Output
FROM Grouper
CROSS APPLY GetArrayElements(Grouper.records) AS record
If you are trying to aggregate 30 entries into one summary row every 30 seconds then a tumbling window is a good choice. Something like the following should work:
SELECT System.TimeStamp AS OutTime, TollId, COUNT(*) as cnt, sum(TollCharge) as TollCharge
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, TumblingWindow(second, 30)
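To make clear what a tumbling window aggregation produces, here is a rough Python equivalent of the query above. It is only an illustration of the grouping logic, not how Stream Analytics is implemented: results are keyed by window start (the query reports System.TimeStamp instead), and the handling of events that fall exactly on a window boundary is simplified.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds=30):
    """Group events into fixed, non-overlapping windows and aggregate each group.

    events is an iterable of (timestamp_seconds, toll_id, toll_charge) tuples.
    Returns {(window_start, toll_id): (count, total_charge)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for timestamp, toll_id, charge in events:
        # Each event belongs to exactly one window.
        window_start = timestamp - timestamp % window_seconds
        bucket = totals[(window_start, toll_id)]
        bucket[0] += 1
        bucket[1] += charge
    return {key: (count, total) for key, (count, total) in totals.items()}
```

Each (window, TollId) pair yields one summary row, which is why a tumbling window suits summarizing but not keeping a single raw event per interval.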
Thanks for the response. I have been speaking to my contact at Microsoft and he suggested something similar; I had also found something like that in various examples online. What I actually want to do is only update the database every 30 seconds: I receive an event, store it, and do not store another until 30 seconds have passed. I am not sure how I can do that with an ASA job, to be honest, as I need a record of the last time the database was updated. I already have a connection to the event hub from my website, so in the receiver I am going to perform a simple check and then store the data from there.
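The receiver-side check described here could look roughly like this in Python. It is a sketch under assumptions: `ThrottledStore` and `should_store` are hypothetical names, timestamps are in seconds, and the last-stored time is tracked per device in memory, so it resets if the receiver restarts.

```python
class ThrottledStore:
    """Keep at most one event per device per interval; drop the rest."""

    def __init__(self, interval_seconds=30):
        self.interval_seconds = interval_seconds
        self._last_stored = {}  # device_id -> timestamp of last stored event

    def should_store(self, device_id, timestamp):
        """Return True (and record the time) if enough time has passed for this device."""
        last = self._last_stored.get(device_id)
        if last is not None and timestamp - last < self.interval_seconds:
            return False
        self._last_stored[device_id] = timestamp
        return True
```

The receiver would call `should_store` for every incoming event and write to the database only when it returns True, while still forwarding every event to the live display.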

SimpleDB Incremental Index

I understand SimpleDB doesn't have an auto-increment, but I am working on a script where I need to query the database by sending the ID of the last record I've already pulled and then pull all subsequent records. In normal SQL fashion, if there were 6200 records and I already had 6100 of them, I would query for records with an ID greater than 6100 when I run the script. Looking at the response object, I don't see anything I can use. It just seems like there should be a sequential index there. The other option I was thinking of would be a real timestamp. Any ideas are much appreciated.
Using a timestamp was perfect for what I needed to do. I followed this article (http://aws.amazon.com/articles/1232) to help me on my way. I would still welcome hearing from anyone who knows of a way to get an incremental index number.
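For anyone taking the same route, here is a minimal sketch of the timestamp approach in Python. SimpleDB compares values as strings, so the timestamp is written in a fixed-width UTC ISO 8601 form whose string order matches time order. The domain name `records` and attribute name `created_at` used in the example are assumptions, and the helper only builds the select expression; sending it is left to whichever client library you use.

```python
from datetime import datetime, timezone


def to_sortable_timestamp(moment):
    """Format a datetime as fixed-width UTC ISO 8601 so string order equals time order."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_incremental_select(domain, attribute, last_seen):
    """Build a select expression for items stamped after last_seen."""
    return "select * from `{0}` where `{1}` > '{2}' order by `{1}`".format(
        domain, attribute, to_sortable_timestamp(last_seen)
    )
```

After each run, the script would remember the largest timestamp it received and pass it as `last_seen` the next time, for example `build_incremental_select("records", "created_at", last_seen)`.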