How to get the current timestamp from the cursor position for Azure EventHub - azure-eventhub

I am using EventProcessorHost for reading messages from EventHub. It maintains checkpoints in blob storage in the following format:
{
  "PartitionId": "0",
  "Owner": "xxxxxxxxxxxxxxxxxx",
  "Token": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx",
  "Epoch": 7370,
  "Offset": "12116110271960",
  "SequenceNumber": 106597952
}
I want to know if there is a way to find out the timestamp of the events being read using the above information.
I plan to use this to build a simple application that shows read progress per partition and alerts when a partition's backlog is growing.

You can compare the Message Sequence of the current message being processed against the last known sequence number generated for the partition. The difference between these numbers is how far behind the latest message your processing has fallen, and therefore how many messages need to be processed to catch up.
I wrote an article that shows how to achieve this using Azure Functions, but the concept is the same: calculate the number of messages in the backlog using the Message Sequence/Partition Sequence technique, turn that into a metric, and record it somewhere. That lets you visualise it on a dashboard like Grafana and alert when it breaches a threshold; I use Azure Monitor and a dynamic metric alert to do this. I also use the metric to scale out my processing logic, so it's worth capturing.
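As a minimal sketch of that calculation in Python, using the v5 azure-eventhub and azure-storage-blob SDKs (the connection strings, container name, and blob path are placeholders; EventProcessorHost stores one checkpoint blob per partition). Note that the partition properties also include last_enqueued_time_utc, which answers the timestamp part of the question:

import json
from azure.eventhub import EventHubConsumerClient
from azure.storage.blob import BlobClient

# Read the EventProcessorHost checkpoint for partition 0 from blob storage.
# Container and blob names here are hypothetical placeholders.
checkpoint_blob = BlobClient.from_connection_string(
    "<storage-connection-string>",
    container_name="eventhub-checkpoints",
    blob_name="$Default/0",
)
checkpoint = json.loads(checkpoint_blob.download_blob().readall())

# Ask the Event Hub for the partition's latest sequence number and enqueue time.
client = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>",
    consumer_group="$Default",
    eventhub_name="my-hub",
)
with client:
    props = client.get_partition_properties(checkpoint["PartitionId"])

# Backlog = latest sequence number on the partition minus the checkpointed one.
backlog = props["last_enqueued_sequence_number"] - checkpoint["SequenceNumber"]
print(f"Partition {checkpoint['PartitionId']}: {backlog} events behind; "
      f"latest event enqueued at {props['last_enqueued_time_utc']}")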

Related

Metric for Number of unacknowledged messages older than 20 minutes

I am trying to set up alerts on Pub/Sub in GCP that monitor the number of old messages in a queue, specifically the number of unacknowledged messages older than 20 minutes.
I don't want to alert on the number of unacknowledged messages alone, because that number can shoot up when a huge batch of messages is suddenly published. And alerting only on the oldest unacknowledged message would fire for outlier messages that might be stuck in the queue (e.g. badly formatted messages).
I've tried to combine both metrics but couldn't work out how to filter on one of them:
fetch pubsub_subscription
| {
    t_0: metric 'pubsub.googleapis.com/subscription/num_undelivered_messages';
    t_1: metric 'pubsub.googleapis.com/subscription/oldest_unacked_message_age'
  }
| outer_join 0
# how do I now filter on oldest_unacked_message_age > 20 minutes and select num_undelivered_messages?
Also, as I understand Cloud Pub/Sub metrics, this won't work anyway: each metric is a single time-series value per subscription, with no information about individual messages (correct me if I am wrong).
I've also looked for a single metric that covers both, but couldn't find one.
You can create an alert on undelivered messages in Google Cloud Monitoring. Select the Pub/Sub subscription resource type, and then you can set a filter, for example based on the response_code. You can also create a new chart based on your needs.
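There is no single metric that joins the two, but one client-side workaround (a sketch only; the project and subscription IDs are placeholders) is to read both metrics with the Cloud Monitoring client library and report the undelivered count only where the oldest unacked age exceeds 20 minutes:

import time
from google.cloud import monitoring_v3

PROJECT = "projects/my-project"  # placeholder project

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 300}, "end_time": {"seconds": now}}
)

def latest_by_subscription(metric_type):
    """Return {subscription_id: most recent point value} for a Pub/Sub metric."""
    series = client.list_time_series(
        request={
            "name": PROJECT,
            "filter": f'metric.type = "{metric_type}"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    # Points are returned newest-first, so points[0] is the latest sample.
    return {
        ts.resource.labels["subscription_id"]: ts.points[0].value.int64_value
        for ts in series
    }

ages = latest_by_subscription("pubsub.googleapis.com/subscription/oldest_unacked_message_age")
counts = latest_by_subscription("pubsub.googleapis.com/subscription/num_undelivered_messages")

for sub, age in ages.items():
    if age > 20 * 60:  # oldest unacked message is older than 20 minutes
        print(f"{sub}: {counts.get(sub, 0)} undelivered messages, oldest is {age}s old")

As the question notes, the metrics carry no per-message detail, so this alerts on the total undelivered count whenever anything has been stuck for over 20 minutes, which is as close as the built-in metrics get.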

Is there a way to retrieve the count of messages in a PubSub subscription (in realtime)?

I want to achieve batch consumption of a Pub/Sub subscription, retrieving all the messages that were in the subscription at the beginning of my process. To do so, I use Pub/Sub's asynchronous pulling for Java, and the consumer.ack() and consumer.nack() functions to process exactly the number of messages that I want and make the subscription redeliver the messages that I have received but not yet processed. My problem is that I have not managed to find a way to retrieve the real-time count of messages in my subscription.
I have started requesting the pubsub.googleapis.com/subscription/num_undelivered_messages metric from Google Cloud Monitoring, but unfortunately the metric lags the real count of undelivered messages in the subscription by about 3 minutes.
Is there any way to retrieve this message count in real time?
There is no way to retrieve the message count in real time, no. Also keep in mind that such a number would not be sufficient to retrieve all of the messages that were in the subscription at the beginning of the process unless you can guarantee that no publishing is happening at the same time.
If there is publishing, then your subscriber could get those messages before messages published earlier, unless you are using ordered message delivery; and even then, those delivery guarantees are per ordering key, not a total ordering guarantee. If you can guarantee that there are no publishes during this time and/or you are only bringing the subscriber up periodically, then it sounds more like a batch case, which means you may want to consider a database or a GCS file as an alternative place to store the messages for processing.
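For reference, the ack/nack flow the question describes looks roughly like this with the Python Pub/Sub client (the question uses Java, but the shape is the same; should_process_now and handle are hypothetical stand-ins for your batching logic):

from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message):
    if should_process_now(message):  # hypothetical batch-cutoff predicate
        handle(message)              # hypothetical processing function
        message.ack()
    else:
        message.nack()               # redeliver later, outside this batch

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull_future.result(timeout=60)  # pull for one minute
    except TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()            # block until shutdown completes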

GCP Alert Filters Don't Affect Open Incidents

I have configured an alert to send an email when the sum of executions of cloud functions that finished in a status other than 'error' or 'ok' is above 0 (grouped by function name).
The alert definition is shown in a screenshot (image omitted); the secondary aggregator is delta.
The problem is that once the alert is open, it looks like the filters don't matter anymore: the alert stays open because it sees that the cloud function is triggered and finishes with any status (even an 'ok' status keeps it open, as long as it's triggered often enough).
ATM the only solution I can think of is to define a log-based metric that does the counting itself, and then base the alert on that custom metric instead of the built-in one.
Is there something that I'm missing?
Edit:
Adding another chart to show what I think might be the problem (image omitted): the graph won't go down to 0 but stays at 1, which is not how normal incidents behave.
According to the official documentation:
"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."
That made me think there are cases where the condition never gets an observation that would close the incident, which is confirmed here:
"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."
A lack of HTTP requests isn't a reason to close the incident, because the policy keeps using the last recorded value (the one that triggered the alert).
So alerting on HTTP requests is fine, but you need to close the incidents yourself. I think it would be better to use a custom metric instead if you want them to close automatically.
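As a sketch of that log-based-metric workaround with the Python logging client (the log filter below is a guess at the Cloud Functions log line format and the metric name is made up; adapt both to what you actually see in your logs):

from google.cloud import logging

client = logging.Client()

# Count function executions that finished in a status other than 'ok' or 'error'.
# This filter is an assumption about the log format; adjust it to your entries.
FILTER = (
    'resource.type="cloud_function" '
    'AND textPayload:"finished with status" '
    'AND NOT textPayload:"status: \'ok\'" '
    'AND NOT textPayload:"status: \'error\'"'
)

metric = client.metric(
    "abnormal-function-finishes",  # hypothetical metric name
    filter_=FILTER,
    description="Cloud Function executions that finished in an unexpected status",
)
if not metric.exists():
    metric.create()

An alert based on this metric only counts the abnormal finishes, so 'ok' executions can no longer keep the incident open.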

REST API for monitoring undelivered message in google cloud pubsub

I want to implement a service that monitors undelivered messages and sends a notification (or triggers further processing) when the count reaches a threshold.
I have already looked through Stackdriver. It provides monitoring and alerting, but as far as I can see in the Stackdriver Monitoring API it only provides an API to get the metricDescriptor, not one to get the undelivered message count.
Is there actually an API to get the metric values?
You can get the values via the projects.timeSeries.list method. You would set the name to projects/<your project>, filter to metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages", and end time (and if a range of values is desired, the start time as well) to a string representing a time in RFC3339 UTC "Zulu" format, e.g., 2018-10-04T14:00:00Z. If you want to look at a specific subscription, set the filter to metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.label.subscription_id = "<subscription name>".
The result will be one or more TimeSeries objects (depending on whether or not you specified a specific subscription) with the points field containing the data points for the specified time range, each of which will have the value's int64Value set to the number of messages that have not been acknowledged by subscribers.
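Since the question is about the REST API specifically, here is a minimal sketch that calls projects.timeSeries.list directly from Python via google-auth (the project is taken from Application Default Credentials; the subscription name and timestamps are placeholders):

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application Default Credentials with read access to Cloud Monitoring.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/monitoring.read"]
)
session = AuthorizedSession(credentials)

response = session.get(
    f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries",
    params={
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.label.subscription_id = "my-subscription"'  # placeholder
        ),
        "interval.startTime": "2018-10-04T14:00:00Z",
        "interval.endTime": "2018-10-04T14:10:00Z",
    },
)
for series in response.json().get("timeSeries", []):
    for point in series["points"]:
        print(point["interval"]["endTime"], point["value"]["int64Value"])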

Status of kinesis stream reader

How do I tell what percentage of the data in a Kinesis stream a reader has already processed? I know each reader has a per-shard checkpoint sequence number, and I can also get the StartingSequenceNumber of each shard from describe-stream; however, I don't know how far along in my data the reader currently is (I don't know the latest sequence number of the shard).
I was thinking of getting a LATEST iterator for each shard and reading the last record's sequence number; however, that doesn't work if no new data has arrived since I got the LATEST iterator.
Any ideas or tools for doing this out there?
Thanks!
I suggest you implement a custom metric or metrics in your applications to track this.
For example, you could embed a send timestamp in each Kinesis message and, on processing the message, record the time difference as an AWS CloudWatch custom metric (see the sketch at the end of this answer). This indicates how close your consumer is to the front of the stream.
You could also record the number of messages pushed (at the pushing application) and the number received at the Kinesis consumer. If you compare these in a chart on CloudWatch, you can check that the curves roughly follow each other, indicating that the consumer is keeping up with the workload.
You could also monitor your Kinesis consumer to see how often it idly waits for records (i.e., no results are returned by Kinesis, suggesting it is at the front of the stream and all records are processed).
Also note there is no way to track a "percent processed" for the stream, since Kinesis messages expire after 24 hours (so the total number of messages is constantly rolling). There is also no direct API to count the number of messages in your stream (unless you have recorded this yourself, as above).
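A sketch of the send-time approach in Python with boto3 (it assumes the producer embeds a sent_at epoch timestamp in each record's JSON payload; the namespace and metric name are made up for illustration):

import json
import time

import boto3

cloudwatch = boto3.client("cloudwatch")

def record_consumer_lag(record):
    """Publish how far behind the producer this consumer is, in seconds."""
    payload = json.loads(record["Data"])            # assumes a JSON payload
    lag_seconds = time.time() - payload["sent_at"]  # assumes the producer set sent_at
    cloudwatch.put_metric_data(
        Namespace="MyKinesisConsumer",              # hypothetical namespace
        MetricData=[{
            "MetricName": "ConsumerLagSeconds",
            "Value": lag_seconds,
            "Unit": "Seconds",
        }],
    )

Calling this from your record processor gives you a CloudWatch series you can chart and alert on.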
If you use the KCL, you can do this by comparing IncomingRecords from the built-in CloudWatch metrics for Kinesis with RecordsProcessed, a custom metric published by the KCL.
Then select a time range and an interval of, say, one day.
You would then get a chart of the two series (image omitted). In the original example, many more records were added than processed; by looking at the values at each point you can tell exactly whether your processor is behind or not.
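You can also pull the same two series programmatically and compare their totals, e.g. with boto3. The stream name, KCL application name, and the KCL metric dimension below are placeholders (the KCL publishes its metrics under a namespace equal to the application name):

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

def day_total(namespace, metric_name, dimensions):
    """Sum a metric over the last 24 hours in 1-hour buckets."""
    stats = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=datetime.utcnow() - timedelta(days=1),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in stats["Datapoints"])

incoming = day_total("AWS/Kinesis", "IncomingRecords",
                     [{"Name": "StreamName", "Value": "my-stream"}])
processed = day_total("my-kcl-app", "RecordsProcessed",
                      [{"Name": "Operation", "Value": "ProcessTask"}])  # dimension is an assumption
print(f"incoming={incoming:.0f} processed={processed:.0f} difference={incoming - processed:.0f}")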