Azure EventHub / Azure Monitoring - Create alert when incoming bytes exceed outgoing bytes, preferrably by a certain amount? - azure-eventhub

I want to know when my consumer group fails to handle the amount of events coming in into EventHub. Looking at the metrics, I think the symptom is incoming bytes exceed outgoing bytes.
In Azure portal, I only see alert condition when incoming bytes greater than a static number, which is not what I want. Is it even possible to set up condition like this?

I have checked Azure Monitor docs and alerts don't seem to support metric comparison.
One solution might be to create an Azure Logic App which can store and remember incoming/outgoing byte values and then compare them on a scheduled basis.
For variable support you can look at - https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-variables-store-values

Related

Estimate SQS processing time and load

I am going to use AWS SQS(regular queue, not FIFO) to process different client side metrics.
I’m expect to have ~400 messages per second (worst case).My SQS message will contain S3 location of the file.
I created an application, which will listen to my SQS Queue, and process messages from it.
By process I mean:
read SQS message ->
take S3 location from that SQS message ->
call S3 client ->
Read that file ->
Add a few additional fields —>
Publish data from this file to AWS Kinesis Firehose.
Similar process will be for each SQS message in the Queue. The size of S3 file is small, less than 0,5 KB.
How can calculate if I will be able to process those 400 messages per second? How can I estimate that my solution would handle x5 increase in data?
How can calculate if I will be able to process those 400 messages per second? How can I estimate that my solution would handle x5 increase in data?
Test it! Start with a small scale, and do the math to extrapolate from there. Make your test environment as close to what it will be in production as feasible.
On a single host and single thread, the math is simple:
1000 / AvgTotalTimeMillis = AvgMessagesPerSecond, or
1000 / AvgMessagesPerSecond = AvgTotalTimeMillis
How to approach testing this:
Start with a single thread and host, and generate some timing metrics for each step that you outlined, along with a total time.
Figure out your average/max/min time, and how many messages per second that translates to
400 messages per second on a single thread & host would be under 3ms per message. Hopefully this makes it obvious you need multiple threads/hosts.
Scale up!
Now that you know how much a single thread can handle, figure out how many threads a single host can effectively handle (you'll need to experiment). Consider batching messages where possible - SQS provides batch operations.
Use math to calculate how many hosts you need
If you need 5X that number, go up from there
While you're doing this math, consider any limits of the systems you're using:
Review the throttling limits of SQS / S3 / Firehose / etc. If you plan to use Lambda to do the work instead of EC2, it has limits too. Make sure you're within those limits, and consider contacting AWS support if you are close to exceeding them.
A few other suggestions based on my experience:
Based on your workflow outline & details, using EC2 you can probably handle a decent number of threads per host
M5.large should be more than enough - you can probably go smaller, as the performance bottleneck will likely be networking I/O to fetch and send messages.
Consider using autoscaling to handle message spikes for when you need to increase throughput, though keep in mind autoscaling can take several minutes to kick in.
The only way to determine this is to create a test environment that mirrors your scenario.
If your solution is designed to handle messages in parallel, it should be possible to scale-up your system to handle virtually any workload.
A good architecture would be to use AWS Lambda functions to process the messages. Lambda defaults to 1000 concurrent functions. So, if a function takes 3 seconds to run, it would support 333 messages per second consistently. You can request for the Lambda concurrency to be increased to handle higher workloads.
If you are using Amazon EC2 instead of Lambda functions, then it would just be a matter of scaling-out and adding more EC2 instances with more workers to handle whatever workload you desired.

CloudWatch Agent: batch size equal to "1" - is it a bad idea?

If I correctly understand, a CloudWatch Agent publishes events to CloudWatch by using a of kind of batching, the size of which is specified by the two params:
batch_count:
Specifies the max number of log events in a batch, up to 10000. The
default value is 1000.
batch_size
Specifies the max size of log events in a batch, in bytes, up to
1048576 bytes. The default value is 32768 bytes. This size is
calculated as the sum of all event messages in UTF-8, plus 26 bytes
for each log event.
I guess, that in order to eliminate a possibility of loosing any log data in case of a EC2 instance termination, the batch_count should be equal to 1 (because in case of the instance termination all logs will be destroyed). Am I right that this is only one way to achieve it, and how this can affect the performance? Will it have any noticeable side-effects?
Yes, it's a bad idea. You are probably more likely to lose data that way. The PutLogEvents API that the agent uses is limited to 5 requests per second per log stream (source). With a batch_count of 1, you'd only be able to publish 5 log events per second. If the application were to produce more than that consistently, the agent wouldn't be able to keep up.
If you absolutely can't afford to lose any log data, maybe you should be writing that data to a database instead. There will always be some risk of losing log data, even if with a batch_count of 1. The host could always crash before the agent polls the log file... which BTW is every 5 seconds by default (source).

Selecting message queue approach for multiple consumers in AWS

Please help selecting a MQ app/system/approach for the following use-case:
Check for incoming messages for a specific user -> read the message if available -> delete from the queue, ideally, staying within AWS.
Context:
Social networking app, users receiving messages, i.e.
I need to identify incoming messages by recipient ID.
The app is doing long-polls for new messages every 30 seconds.
Message size is <1Kb.
As per current estimates, I'll need 100M+ message checks per months in total (however, much less messages, these are just checks).
While users acknowledge messages choosing OK or Ignore, however not sure if ACK support is required from MQ system for that.
I'm in AWS. Initially thought of SQS, but the more I read the less it looks like a good match - cannot set message recipient ID in a way to filter by recipient, etc, however maybe I'm wrong.
One of the options I also thought about is to just use DynamoDB's "messages" table, partition key being userId and sort key being a messageId, thus I'll be able to easily query by a user, however concerned with costs.
If possible, I would much more prefer to stay within AWS or at least use SAAS like SQS, as being a 1-person startup I really want to avoid headaches supporting self-hosted system.
Thank you!
D
You are right on both these counts:
SQS won't work, because of the limitation you pointed.
DynamoDB would work, but cost a lot.
I can suggest the following:
Create a Redis cluster, possibly on Amazon ElastiCache.
In it, make one List per user.
Whenever a new message comes, append it to concerned User's list.
To deliver the message, just read from the User's list. Also, flush the queue if needed.
What I am suggesting is very similar to how Twitter manages each User's news-feed and home-feed.
It should also be cheap.

I need help clarifying a high level use-case of Amazon SQS

So I need a second pair of eyes to correct or confirm my understand standing of Amazon SQS. From my understanding, you can add an unlimited amount of messages to one queue. A message can be 256 KB in size, and if it needs to be larger than that, you can use amazon s3 to store 2 GB. Reading around online, it appears there are many use cases for this queuing service. For example one use case of SQS can act as a database buffer.
But here's what I'm looking to do.. I'm looking to make a real time messaging system. My current functionality acts like more of a message board, so the implementation just inserts into the database then reads the data and packages it into JSON to be inserted on SQLITE mobile phone. That works great, but I'm getting a lot of requests from people to make it real-time.
So what I'm wondering is can I utilize amazon SQS to write and read messages for a chat application? So in my theoretical use case of SQS would have a message queue to write to, and pull from the that queue every second to check for messages on mobile. But here's where I'm confused. Since you cannot "Query" a particular message from the queue, would it make sense to have a queue per user then a generic queue for the app server to read from? Or am I just talking crazy and should spend cognitive resources thinking about implementing an open connection on an Ec2 instance?
Any help would be great,
Thanks!
Have you thought about using Amazon SNS to push the chat messages to your mobile devices? Each user publishes to a topic and the readers subscribe to that topic. You just have to be ok with missing messages if the app isn't running.
If you only have a few (or maybe, less than 100) users, you could have thought of having one SQS queue per user. If that is not so, the solution won't be operationally feasible.
If you were to have one generic queue, SQS won't help because it doesn't allow querying for a given field in all available messages.
I can think of following options for your use case:
Setup one Redis cluster, possibly on Amazon ElastiCache. Have one message List per user.
One Messages table in MySQL, possibly on AWS RDS. This will provide an easy way to query messages for a given user.
You can also use DynamoDB in #2.

Logstash & Elasticsearch mass log process

I would like to know what's the best configuration for processing mass log files - I've enabled AWS new feature, ELB logs and would like to ship all to an elasticsearch using logstash,
I have almost 400 Million requests per day, which architecture should I choose?
Best Regards.
For Logstash/Elasticsearch it "all depends" on your events, how complex your Logstash configurations are, and how complex the resulting events are.
Best is to do a proof of concept starting one index with one shard on one machine and fire events at it until you find the limit.
Same procedure for processing events with Logstash.
Then you'll have a reference for the hardware necessary to process your volume of events.