How to clean Azure IoT Hub built-in endpoint data? Retention time set to 1 day, data staying for months

My IoT hub's built-in endpoint retention time is set to 1 day, but the data has not been automatically deleted in months. The retention time cannot be changed, neither with the slider nor by typing a number.
I followed the tutorials for reading data from the endpoint using .NET and Java; my .NET and Java apps successfully read the built-in endpoint data sent by my Raspberry Pi device to the IoT hub.
The problem is that when the apps are launched, they first read all the 8k+ messages stored somewhere in the endpoint.
My setup is for learning: a "pay-as-you-go" subscription with a free iothub tier.
Earlier today I used up my entire 8k daily message allowance when I ran the Java app, and all the messages were sent to a blob and then to my app (I believe that's what happened).
After my allowance was reset, I deleted the blob containers from my storage and re-ran the Java app. It still received all the 8k+ stored messages first, before starting to receive the new Raspberry Pi messages, but today's allowance was reduced by only about 250 messages, just the messages sent during the test.
Thank you for your help.
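For context on the startup replay: retention controls how long events remain readable on the built-in (Event Hubs-compatible) endpoint, and each reader chooses where in the retained stream it starts; the quickstart samples typically read from the start of the stream, which would explain why every run replays the stored backlog first. A minimal Python sketch, assuming the azure-eventhub package and the hub's Event Hub-compatible connection string (placeholders below; the .NET and Java clients have equivalent EventPosition options), that receives only events arriving after it connects:

    from azure.eventhub import EventHubConsumerClient

    # Values come from the IoT hub's "Built-in endpoints" blade; placeholders here.
    client = EventHubConsumerClient.from_connection_string(
        conn_str="<Event Hub-compatible connection string>",
        consumer_group="$Default",
        eventhub_name="<Event Hub-compatible name>",
    )


    def on_event(partition_context, event):
        print(partition_context.partition_id, event.body_as_str())


    with client:
        # "@latest" skips whatever is still retained and delivers only events that
        # arrive after the client connects; without a checkpoint store, every run
        # starts from this position again.
        client.receive(on_event=on_event, starting_position="@latest")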


How to do "live request batching" in gcloud

Here is my situation:
I have a rather slow TensorFlow model that runs on a GPU (2 to 3 seconds per prediction)
A prediction for a single 'entity' vs a prediction for 8 'entities' takes about the same time
This means I could be 8 times as efficient by simply combining multiple predictions in the same request
I have a service on AI Platform serving requests to that model
The service works for slow request rates but has trouble scaling up (anything over 4 QPS is too much to handle)
My question then is:
Is there a standard way / best practice for batching live client requests:
When receiving a request, wait a little bit for other requests
After a while, or when the number of requests reaches a set number, forward the requests in a single "batch" to another service.
If traffic is low, the delay will expire before the batch is full, but since traffic is low, that's not an issue
If traffic is high, the batch will be full before the delay, and the client will have to wait less
I have an almost-working solution with App Engine + Firebase (for hosting the shared 'queue'), but implementing the delay is giving me trouble (App Engine doesn't seem to like Python's threading.Timer).
I'd appreciate something that works with App Engine, but at this point I'm open to any suggestions (as long as they are applicable on Google Cloud).
Thanks!
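For reference, the pattern described above (flush on whichever comes first, a count or a timeout) looks roughly like this inside a single process; a minimal asyncio sketch with placeholder sizes. Note it only batches requests that reach the same instance, so it does not replace a shared queue once App Engine scales out:

    import asyncio


    class RequestBatcher:
        """Collect requests and flush them as one batch, either when max_size
        requests are queued or max_wait seconds after the first one arrived."""

        def __init__(self, flush_batch, max_size=8, max_wait=0.5):
            self._flush_batch = flush_batch   # async callable: list of inputs -> list of predictions
            self._max_size = max_size
            self._max_wait = max_wait
            self._pending = []                # list of (input, Future) pairs
            self._timer = None

        async def submit(self, item):
            fut = asyncio.get_running_loop().create_future()
            self._pending.append((item, fut))
            if len(self._pending) >= self._max_size:
                await self._flush()
            elif self._timer is None:
                self._timer = asyncio.ensure_future(self._flush_after_delay())
            return await fut                  # resolves with this item's prediction

        async def _flush_after_delay(self):
            await asyncio.sleep(self._max_wait)
            self._timer = None
            await self._flush()

        async def _flush(self):
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if not self._pending:
                return
            batch, self._pending = self._pending, []
            # flush_batch must return one result per input, in the same order.
            results = await self._flush_batch([item for item, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)

Each incoming web request would await batcher.submit(payload) and get back its own prediction once the batch it landed in is flushed.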
The ideal (but not the cheapest) solution is to use Dataflow.
When a prediction request comes in, publish it to Pub/Sub.
Deploy a Dataflow pipeline in streaming mode, with fixed windows of X minutes and an additional, non-accumulating trigger that fires after Y events in the window.
When the window's trigger fires (either on the number of messages or on the timer), do the batch processing, as in the sketch below.
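A minimal sketch of that pipeline with the Python Beam SDK; the window length, batch size, subscription path, and call_model_batch are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import trigger, window


    def call_model_batch(instances):
        # Hypothetical stand-in: send the whole list to the model in one request.
        print("would send a batch of %d prediction requests" % len(instances))


    # Runner/project/region flags omitted; streaming mode is required for Pub/Sub input.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadRequests" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/prediction-requests")  # placeholder
            | "Window" >> beam.WindowInto(
                window.FixedWindows(5 * 60),               # X = 5 minutes
                trigger=trigger.Repeatedly(trigger.AfterAny(
                    trigger.AfterCount(8),                 # Y = 8 requests fills a batch early
                    trigger.AfterProcessingTime(5 * 60))),
                accumulation_mode=trigger.AccumulationMode.DISCARDING)
            | "CollectBatch" >> beam.CombineGlobally(
                beam.combiners.ToListCombineFn()).without_defaults()
            | "Predict" >> beam.Map(call_model_batch)
        )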
You can imagine other designs that are simpler and cheaper.
Still publish the prediction requests to Pub/Sub.
#1: You can schedule a Cloud Function or a Cloud Run service every X minutes to pull the Pub/Sub subscription and then trigger the batch job. But it runs on a fixed schedule.
#2: When you publish a message to Pub/Sub, also store state in Firestore, for example: increment a counter and record the date of the first message published to Pub/Sub.
If the number of messages goes above your threshold, call your other process, which pulls the Pub/Sub subscription and runs the batch processing (as in #1), then reset the counter and the first-message date.
Also set up Cloud Scheduler to check, every minute, the first-message date in Firestore. If it is older than your time limit, call your other process, which pulls the Pub/Sub subscription and runs the batch processing (as in #1), then reset the counter and the first-message date. A sketch of the counter logic follows below.
Option #2 will generate a lot of Firestore reads and writes, but it will still be cheaper than Dataflow.
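A rough sketch of the publish-side bookkeeping for option #2, assuming the google-cloud-firestore client; the document path, threshold, and run_batch trigger are placeholders:

    import datetime
    from google.cloud import firestore

    db = firestore.Client()
    counter_ref = db.collection("batching").document("prediction-queue")  # placeholder document
    BATCH_SIZE = 8  # Y: flush as soon as this many requests are waiting


    def run_batch():
        # Hypothetical: call the service that pulls the subscription and runs the model.
        print("triggering batch processing")


    def record_publish():
        """Call this right after publishing a prediction request to Pub/Sub."""
        now = datetime.datetime.now(datetime.timezone.utc)

        @firestore.transactional
        def bump(transaction):
            snapshot = counter_ref.get(transaction=transaction)
            data = snapshot.to_dict() if snapshot.exists else {}
            count = data.get("count", 0) + 1
            first_at = data.get("first_message_at") or now
            transaction.set(counter_ref, {"count": count, "first_message_at": first_at})
            return count

        if bump(db.transaction()) >= BATCH_SIZE:
            run_batch()
            # Simplified reset; a real version would do this transactionally as well.
            counter_ref.set({"count": 0, "first_message_at": None})

The Cloud Scheduler job would read the same document every minute and call run_batch when first_message_at is older than the time limit.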

Postmates - webhook: determining the actual pickup_complete and delivered_complete

I am looking into the Postmates API and have been able to create a delivery, which was great. I also set up a webhook URL with ngrok to test the responses from Postmates, but I am totally stumped as to how to determine when the pickup was actually completed and when the dropoff/delivery was actually completed.
I saved all of the responses in a database, and each time I ran the test delivery I received exactly 70 calls to the webhook endpoint. Each time, 47 of them were of 'kind': 'event.delivery_status'. Here are the stats:
THIS IS ALL IN TEST MODE WITH THE SANDBOX...
11 of those are 'status':'pickup_complete'
14 of those are 'status':'pickup'
11 of those are 'status':'dropoff'
11 of those are 'status':'delivered'
All of the webhook responses with 'status': 'delivered' have 'data.courier_imminent': false.
I went to the web page at 'data.tracking_url', and when the page showed that the delivery was complete, I immediately checked the database to see how many records I had saved; I was only at 32 total records. This means the webhook kept sending me updates after the delivery was supposedly complete.
Lastly, these statuses do not arrive in order; they are totally random. In fact, the 6th-to-last record received had a pickup_complete status.
The real question:
How will I know when the pickup is actually complete, when the delivery is actually complete, etc.?
You'll receive a webhook of type event.delivery_status. One of the fields within the body of the payload will be {status: "delivered"}. This has been accurate so far. Postmates doesn't return a delivered_at timestamp, but you could create your own timestamp and store it along with the delivery for reporting.
As for the number of webhooks, Postmates has a delivery robot (called robo) that moves as if it were a real Postmate. You'll receive a lot of webhooks of type event.courier_update with the updated location.
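A minimal sketch of a webhook endpoint following that advice, assuming Flask; the field names ('kind', 'status', 'delivery_id') are taken from the question and the general payload shape, so they may need adjusting against the actual bodies you receive:

    from datetime import datetime, timezone

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    seen = set()       # sketch only; use your database in a real deployment
    timestamps = {}    # delivery_id -> {"pickup_complete": ..., "delivered": ...}


    @app.route("/postmates/webhook", methods=["POST"])
    def postmates_webhook():
        payload = request.get_json(force=True)
        if payload.get("kind") != "event.delivery_status":
            return jsonify(ok=True)                   # e.g. event.courier_update location pings

        delivery_id = payload.get("delivery_id")      # assumed field name
        status = payload.get("status")
        key = (delivery_id, status)
        if key in seen:                               # events repeat and arrive out of order
            return jsonify(ok=True)
        seen.add(key)

        if status in ("pickup_complete", "delivered"):
            # Record our own timestamp, since no delivered_at is provided.
            timestamps.setdefault(delivery_id, {})[status] = datetime.now(timezone.utc)
        return jsonify(ok=True)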

Google Cloud Pub/Sub Message Delivered More than Once Before Reaching the Acknowledgement Deadline

Background:
We configured a Cloud Pub/Sub topic to interact across multiple App Engine services.
There we have configured push-based subscribers, with the acknowledgement deadline set to 600 seconds.
Issue:
We have observed that Pub/Sub pushed the same message twice (more than twice for some other topics) to its subscribers. Looking at the logs, I can see these pushes happened with a gap of just 1 second. Ideally, as we have configured the ackDeadline to 600 seconds, Pub/Sub should re-attempt message delivery only after 600 seconds.
I need answers to the following:
Why was the same message delivered more than once, only 1 second apart?
Does Pub/Sub not honor the ackDeadline configuration before reattempting message delivery?
References:
- https://cloud.google.com/pubsub/docs/subscriber
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
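As a rough illustration of the idempotent-subscriber advice, here is a minimal Flask push endpoint that dedupes on the message ID (a sketch: the seen-set would live in a shared datastore in practice, and this only catches service-side redeliveries, since a message published twice gets two different message IDs):

    import base64
    import json

    from flask import Flask, request

    app = Flask(__name__)
    processed = set()  # sketch only; share this via Firestore/Redis/your DB across instances


    def handle(data):
        # Hypothetical business logic; it should itself be safe to run more than once.
        print("processing:", data)


    @app.route("/pubsub/push", methods=["POST"])
    def pubsub_push():
        envelope = json.loads(request.data)
        message = envelope["message"]
        message_id = message["messageId"]      # stays the same across redeliveries of one message
        if message_id in processed:
            return ("", 204)                   # already handled: ack again, do no work
        data = base64.b64decode(message.get("data", "")).decode("utf-8")
        handle(data)
        processed.add(message_id)
        return ("", 204)                       # a 2xx response acknowledges the push delivery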

For Facebook webhook updates, how long does it normally take for a page post to be relayed by a webhook to a callback URL?

I'm troubleshooting an issue with a Node application I've inherited serving as a webhook callback endpoint.
To debug, I'm posting messages to a page that the Facebook app associated with the endpoint is subscribed to, and following my Node app's log.
After several hours, I still see no update requests from Facebook for my page posts.
Comparing timestamps on the posts with my app's logs for the last update requests it received (several days ago), it appears there was about an 8-hour lag between the post and the update request.
I've searched the documentation for help but could only find this:
Update notifications are aggregated and sent in a batch of up to 1000 updates.
If any update sent to your server fails, we will retry immediately, then try a few more times with decreasing frequency over the next 24 hours. Your server should handle deduplication in these cases. Updates unaccepted for 24 hours will be dropped.
This gives me the impression that updates are not instantaneous. But are several hour delays the norm?
Can anybody with more experience with Graph API webhooks provide a ballpark for normal lag?
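Since the quoted documentation says updates arrive aggregated (up to 1000 per batch) and may be retried, the callback should walk the whole batch and deduplicate; a minimal sketch, assuming the usual Page webhook payload shape and an invented dedup key:

    from flask import Flask, request

    app = Flask(__name__)
    seen = set()  # sketch only; persist this in a real deployment


    def handle_change(page_id, change):
        # Hypothetical handler for one page update.
        print(page_id, change.get("field"), change.get("value"))


    @app.route("/webhook", methods=["POST"])
    def facebook_webhook():
        payload = request.get_json(force=True)
        for entry in payload.get("entry", []):          # one POST can carry many aggregated updates
            for change in entry.get("changes", []):
                key = (entry.get("id"), entry.get("time"), change.get("field"))  # assumed dedup key
                if key in seen:                         # retries can redeliver the same update
                    continue
                seen.add(key)
                handle_change(entry.get("id"), change)
        return "OK", 200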

Delayed SES Stats Updates

I am noticing that AWS SES stats are not updated in real time. After sending an email, it takes time for the sent count to increase on the SES dashboard. Sometimes it takes a few minutes, and sometimes it takes longer.
Has anyone else experienced this? Any thoughts?
On the assumption that the console is simply calling a standard API action (rather than using some kind of console-only backend service that is not documented or user-accessible; such things are not unheard of, but they are pretty rare in AWS, so it's a reasonably safe assumption), it looks like this is not really designed to be real-time. The stats are reported in 15-minute windows.
From the SES API reference:
GetSendStatistics
Returns the user's sending statistics. The result is a list of data points, representing the last two weeks of sending activity.
Each data point in the list contains statistics for a 15-minute interval.
— http://docs.aws.amazon.com/ses/latest/APIReference/API_GetSendStatistics.html
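If you want to see the same numbers outside the console, a minimal boto3 sketch calling that action (assuming credentials are configured; the region is a placeholder):

    import boto3

    ses = boto3.client("ses", region_name="us-east-1")  # placeholder region

    # Each data point covers a 15-minute interval over the last two weeks.
    stats = ses.get_send_statistics()
    for point in sorted(stats["SendDataPoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], "attempts:", point["DeliveryAttempts"],
              "bounces:", point["Bounces"], "complaints:", point["Complaints"])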
The AWS SES dashboard stats are purely a hint at performance, not something to rely on. If you want real-time notifications of sent emails, you will need to set up SNS notifications. Keep in mind that spam-complaint notifications can take up to a couple of days, as they are based on information the ISPs provide to Amazon. And complaints from within the Gmail evil-system will NEVER get to you.
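For the SNS route, a sketch of routing an identity's notifications to an SNS topic with boto3; the identity and topic ARN are placeholders, and the topic must already exist:

    import boto3

    ses = boto3.client("ses", region_name="us-east-1")  # placeholder region

    identity = "example.com"                                     # placeholder verified identity
    topic_arn = "arn:aws:sns:us-east-1:123456789012:ses-events"  # placeholder SNS topic

    # Send per-message delivery, bounce, and complaint notifications to SNS in near real time.
    for notification_type in ("Delivery", "Bounce", "Complaint"):
        ses.set_identity_notification_topic(
            Identity=identity,
            NotificationType=notification_type,
            SnsTopic=topic_arn,
        )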