Is there a way to implement a health check service for an Azure Event Hubs queue for resource monitoring?

Requirement -
From webMethods we are sending event-driven messages to Azure Event Hubs queues over HTTP. We are looking for a way to run a health check on the availability of the queue itself, rather than the landing zone, so that we can handle transient errors.
What are we trying to achieve -
We are basically trying to implement a transient error handler on top of resource monitoring in webMethods, to avoid unnecessary automated prod alerts for high-volume interfaces. We would also like an automated suspend-and-retry mechanism for feeds rather than doing it manually.
Please do let me know if there is a way to implement this solution.

There are four transient exceptions in Azure Event Hubs messaging that can be raised when using the .NET Framework API:
Microsoft.ServiceBus.Messaging.MessagingException
Microsoft.ServiceBus.Messaging.ServerBusyException
Microsoft.Azure.EventHubs.ServerBusyException
Microsoft.ServiceBus.Messaging.MessagingCommunicationException
In addition, there are three main approaches to monitoring the health status of Azure Event Hubs:
Metric Based Monitoring
Azure Event Hub Logging (Activity Log and Diagnostic Log)
Status Monitoring
Note: Monitoring the Azure resource status does not mean monitoring only the physical status of a resource; it also includes monitoring availability, performance, reliability, and consumption.
To learn more about these approaches and how to implement them, please refer to this third-party tutorial.
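The suspend-and-retry handling asked about in the question could be sketched roughly as below. This is a minimal illustration, not webMethods or Azure SDK code: TransientError is a hypothetical stand-in for exceptions such as ServerBusyException, and send is whatever callable posts the message to the Event Hub. The retry counts and delays are assumptions to tune.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure such as ServerBusyException."""


def send_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Only TransientError is retried; any other exception propagates at once.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # give up so the caller can suspend the feed and alert
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

An alert would then be raised only when the final attempt fails, which keeps short busy periods from paging anyone.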

Related

Which is the best place to consume a Kafka topic on Google Cloud Platform?

We have a microservices architecture developed on Google Cloud.
Currently the microservices all run on Cloud Run and talk to each other over REST (sync) or Pub/Sub (async).
It is an event-driven pattern, so when a service publishes that something happened (like "user_created") on the right Pub/Sub topic, many services receive that event through a push subscription on their HTTP endpoint.
Now we are moving to Kafka for its message ordering and replay features.
Unfortunately Kafka consumers are pull-based, so we need to change the way services receive events.
Since Cloud Run is a serverless solution that scales to zero, we cannot make it listen to a Kafka topic, because the service could shut down overnight when no requests arrive.
Some services can safely be updated by a scheduled cron job: every hour, for example, we make a GET request to the service, which downloads all new Kafka messages and updates itself accordingly.
But many other services need near-real-time updates to do their job.
So which Google Cloud Platform product is best suited to consuming a Kafka topic in this architecture?
Thanks!

Difference between AWS CloudWatch and AWS CloudWatch Events

I was studying Amazon Web Services fundamentals when I came across these two concepts:
Amazon CloudWatch
Amazon CloudWatch Events
Even after going through the official AWS documentation, I couldn't find the difference between the two, although Amazon says they are different. The excerpts are:
CloudWatch provides you with data and actionable insights to monitor
your applications, respond to system-wide performance changes,
optimize resource utilization, and get a unified view of operational
health. CloudWatch collects monitoring and operational data in the
form of logs, metrics, and events, providing you with a unified view
of AWS resources, applications, and services that run on AWS and
on-premises servers. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications
running smoothly.
Documentation of AWS CloudWatch
Amazon CloudWatch Events delivers a near real-time stream of system
events that describe changes in Amazon Web Services (AWS) resources.
Using simple rules that you can quickly set up, you can match events
and route them to one or more target functions or streams. CloudWatch
Events becomes aware of operational changes as they occur. CloudWatch
Events responds to these operational changes and takes corrective
action as necessary, by sending messages to respond to the
environment, activating functions, making changes, and capturing
state information.
Documentation of AWS CloudWatch Events
CloudWatch
CloudWatch is a monitoring service for your AWS resources. You can store your log files in it, and by default resources created within AWS log to CloudWatch (CW). You can also monitor the performance of resources, for example the CPU utilisation of your EC2 instances. You can set alarms on thresholds for your resources and get an SNS alert when they are crossed. For example, you can create an alarm for your DynamoDB table if write capacity is being exceeded. You can set an alarm for your billing too. So basically CW is used as a monitoring solution.
CloudWatch Events
CW Events is also part of CloudWatch. CloudWatch Events is helpful when you want to schedule something. Say you want to run your Lambda every other day: you can create a rule for that, or you can trigger your Lambda with an event pattern. A number of services are supported by CloudWatch Events, and you can use any of them as your target, not just Lambda. Event buses are used to send your events to other accounts as well. For example, if you have a CI/CD account and bake a new AMI there every month, you can use event buses to notify all accounts; after receiving the event from the event bus, the other accounts can trigger their own tasks.

Is there a service or framework in Native AWS for task management?

I am looking for a service or framework in native AWS which, given a CSV file, creates a task, processes that task asynchronously, returns a task ID or job ID to the client, and notifies the client when the task is completed. Some requirements for this:
Client should be able to check the progress of the task by job id at any time.
Processing of entire task can take more than 15 mins.
There should be a way for clients to see the reasons of failures.
All the business logic would be at line item level. (this is the only thing developer should care about)
Is there any built-in service or framework for that in native AWS? I know one can build this kind of service using SQS, Lambda, SNS, and DynamoDB, but I am asking whether there is an already available AWS offering that can do all of this.
The closest service to this concept is AWS Step Functions.
However, it would just be one component of a solution. You would still need to create the compute component by using Amazon EC2 or AWS Lambda. You would need to build the interface for users, add authentication, notifications, etc.
Bottom line: There is no AWS service that does what you describe. However, there are the building blocks if you wish to create one yourself.

Google Cloud - Detecting Offline Devices

I am rather new to Google Cloud IoT Core and the associated services, and have come across a problem for which I can find no "best practice" solution.
Using Google Cloud IoT Core to receive telemetry data from IoT devices, what is the best way to detect when an IoT sensor device goes offline or becomes silent? Other cloud-based IoT service implementations have built-in notification timeouts for generating alerts, but I can find nothing similar for Google IoT.
Example: A number of IoT Edge devices monitors the temperature of cold storage rooms, and pushes a measurement every minute to a Google Cloud IoT Core, via MQTT or HTTP through WiFi or mobile data connections. If the measured temperature exceeds acceptable limits, an alert message is triggered, and routed to operational service personnel.
However, if one of the IoT Edge sensors suddenly stops operating, for whatever reason, how can this be detected by Google Cloud IoT services? Obviously, the only sign of something being wrong is that no messages have been received from a certain DeviceID for a period substantially longer than the configured messaging interval, e.g. 2 x interval + grace_period. How can an alert be generated at that point to warn of a lack of telemetry data, possibly caused by a power failure, which needs to be addressed?
Is there any standard means by which an "IoT Device Presence" status can be automatically maintained for each device, based on the (lack of) received telemetry data from the device, in such a way that the state change (online/offline transitions) can cause alert messages to be generated?
Or will it require a separate scheduled service to iterate all (supposedly active) devices, measuring the duration since the last received telemetry (temperature) update, and updating the device presence status directly?
Assuming you just want disconnect events, there was a solution posted earlier that involves setting up Stackdriver logs that export messages to Pub/Sub. From there, you can handle the event in a Cloud Function to send an email, in a similar way to what is available in the other implementations you mention. It takes more time to set up, but is more flexible in terms of what you can do with connect/disconnect events.
Google Core IoT Device Offline Event or Connection Status
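For the scheduled-check approach raised in the question, the timeout rule (2 x interval + grace_period) could be sketched as follows. This is only an illustration and does not use any Google Cloud API: the function name, the last_seen mapping (device ID to Unix timestamp of the latest telemetry message) and the default values are assumptions, and in practice the timestamps would come from wherever the telemetry is stored.

```python
def offline_devices(last_seen, now, interval_s=60.0, grace_s=30.0):
    """Return the IDs of devices whose last telemetry is older than the timeout.

    last_seen maps device ID -> Unix timestamp of the most recent message.
    The timeout follows the question's rule of thumb: 2 x interval + grace period.
    """
    timeout = 2 * interval_s + grace_s
    return sorted(dev for dev, ts in last_seen.items() if now - ts > timeout)
```

A scheduled job could call this once per interval and compare the result with the previous run to detect online/offline transitions.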

How do I set up CloudWatch to detect when an EC2 instance goes down?

I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?
I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."
Amazon's Route 53 Health Check is the right tool for the job.
Route 53 can monitor the health and performance of your application as well as your web servers and other resources.
You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.
http://eladnava.com/monitoring-http-health-email-alerts-aws/
To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.
When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic, and then you'll receive an email for your alarm.
For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html
As answered already, use an ELB to monitor HTTP.
This is the list of available metrics for ELB:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics
To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
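As a rough sketch, the alarm on the ELB "Latency" metric could be described with the parameters below. The helper name, alarm name, period, and evaluation count are assumptions chosen for illustration; the namespace, metric, and dimension names are the ones CloudWatch uses for Classic Load Balancers. The function only builds the argument dictionary, so it runs without AWS credentials.

```python
def elb_latency_alarm(load_balancer_name, threshold_seconds, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm on ELB latency."""
    return {
        "AlarmName": f"{load_balancer_name}-high-latency",  # hypothetical naming scheme
        "Namespace": "AWS/ELB",
        "MetricName": "Latency",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,    # consecutive breaching periods (assumed)
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }


# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**elb_latency_alarm(name, 2.0, topic_arn))
```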
CloudWatch monitoring is just as you have discovered. You will be able to infer that one of your instances is frozen by taking a look at the metrics, but CloudWatch won't, for example, send you an email when your app is down or too slow.
If you are looking for some sort of notification when your app or instance is down, I suggest you use a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool like Nagios, which would be my preferred option.
Good practices that are always worthwhile in the long run: using load balancing (Amazon ELB), running more than one instance of your app, Auto Scaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.
My team has used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2 and node 2 to 1. If any request took more than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc), application servers (connection pools health, etc) and so on. It's worth every cent invested in setting it up.
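A peer check of the kind described above might look roughly like this. It is a sketch, not the original script: the function names, the expected status and body text, and the time limit are all assumptions, and sending the email is left to the caller.

```python
import time
import urllib.error
import urllib.request


def evaluate(status, body, elapsed_s, expected_status=200, expected_text="OK", max_seconds=2.0):
    """Return a list of problems; an empty list means the peer looks healthy."""
    problems = []
    if status != expected_status:
        problems.append(f"unexpected status {status}")
    if expected_text not in body:
        problems.append("unexpected response body")
    if elapsed_s > max_seconds:
        problems.append(f"slow response: {elapsed_s:.2f}s")
    return problems


def probe(url, timeout_s=5.0):
    """Fetch url and return (status, body, elapsed seconds); status 0 means no response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status, body = resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        status, body = err.code, ""  # server answered with an error status
    except (urllib.error.URLError, OSError):
        status, body = 0, ""  # connection refused, DNS failure, or timeout
    return status, body, time.monotonic() - start
```

Node 1 would run evaluate(*probe(node_2_url)) on a schedule and notify the admins whenever the returned list is not empty, and node 2 would do the same for node 1.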
CloudWatch recently added "status check" metrics that will answer one of your questions: whether an instance is down or not. It does not make a request to your web server; it performs a system check instead. As the previous answer suggests, use an ELB for HTTP health checks.
You could always have another instance for tools/testing, that instance would try the http request based on a schedule and measure the response time, then you could publish that response time with CloudWatch and set an alarm when it goes over a certain threshold.
You could even do that from the instance itself.
As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse those metrics (e.g. if you're using AWS OpsWorks), so you're going to have to report that custom metric on your own. However, you can set up an alarm built on a metric that always matches (in an OK state) and have the alarm trigger when the state changes to "INSUFFICIENT_DATA". Technically this means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, i.e. your instance is offline.
There are a bunch of ways to get instance health info. Here are a couple.
Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send them to CloudWatch to create an alarm.
Create a simple daemon on the server which writes to DynamoDB every second (this has better granularity than CloudWatch). Have a second process query the heartbeats and alert when they are missing.
Put all instances behind a load balancer with a dummy port open that gives a TCP response. Set up TCP health checks on the ELB, and alert on unhealthy instances.
Unless you use a product like Blue Matador (which automatically notifies you of production issues), it's actually quite painful to set something like this up, let alone maintain it. That said, if you're going down that road and want some help getting started with CloudWatch (terminology, alerts, logs, etc.), start with this blog: How to Monitor Amazon EC2 with CloudWatch
You can use a CloudWatch Events rule to monitor whenever any EC2 instance goes down. You can create an event rule from the CloudWatch console as follows:
In the CloudWatch console, choose Events -> Rules
For Event Pattern, under Service Name choose EC2
For Event Type, choose EC2 Instance State-change Notification
For Specific States, choose Stopped
In Targets, choose any previously created SNS topic for sending a notification
Source : Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule
This is not exactly a CloudWatch alarm, however this serves the purpose of monitoring/notification.
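For reference, the console steps above correspond to an event pattern roughly like the one built below. The helper name is an assumption; the source, detail-type, and state values follow the documented EC2 instance state-change event format. The SNS target still has to be attached to the rule separately.

```python
import json


def instance_state_pattern(states=("stopped",)):
    """Build a CloudWatch Events pattern matching EC2 instance state changes."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    }


# Print the JSON that could be pasted into the rule's event pattern editor.
print(json.dumps(instance_state_pattern(), indent=2))
```

Passing states=("stopped", "terminated") would also match instances that are terminated rather than stopped.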