AWS CloudWatchLog limit - amazon-web-services

I am trying to find a centralized solution so that I can move my application logging out of the database (RDS).
I was thinking of using CloudWatch Logs, but noticed that there is a limit on PutLogEvents requests:
The maximum rate of a PutLogEvents request is 5 requests per second per log stream.
Even if I break my logs into many streams (per EC2 instance and per log type: error, info, warning, debug), the limit of 5 requests per second is still very restrictive for an active application.
The other solution is to somehow accumulate logs and send each PutLogEvents request with a batch of log records, but that means I am forced to use a database to accumulate those records.
So the questions are:
Maybe I'm wrong and the limit of 5 requests per second is not so restrictive?
Is there any other solution that I should consider, for example DynamoDB?

PutLogEvents is designed to put several events per request by definition (as its name says: PutLogEvent"S") :) The CloudWatch Logs agent does this batching on its own, so you don't have to worry about it.
However, please note: I don't recommend generating too many logs (e.g. don't run debug mode in production), as CloudWatch Logs can become pretty expensive as your log volume grows.

My advice would be to use a Logstash solution on an AWS instance.
Alternatively, you can run Logstash on another existing instance or container.
https://www.elastic.co/products/logstash
It is designed for this purpose and it does it wonderfully.
CloudWatch is not primarily designed for your needs.
I hope this helps somehow.

If you are calling this API directly from your application: the short answer is that you need to batch your log events (the limit is 5 PutLogEvents requests per second per log stream, not 5 events).
If you are writing the logs to disk and pushing them afterwards, there is already an agent that knows how to push the logs (http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/QuickStartEC2Instance.html).
Meta: I would suggest that you prototype this and make sure that it works for the log volume that you have. Also, keep in mind that, because of how the CloudWatch API works, only one application/user can push to a log stream at a time (see the token you have to pass in), so you probably need to use multiple streams, one per user and maybe one per log type, to ensure that your applications are not competing for the same stream.
Meta Meta: think about how your application behaves if the logging subsystem fails, and whether you can live with the possibility of losing logs (i.e. is it critical for you to always have the guarantee that you will get the logs?). This will probably drive what you do and which solution you ultimately pick.
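To make the batching advice above more concrete, here is a minimal sketch of grouping events in memory before sending them, so no database is needed just to accumulate records. The function name batch_events is hypothetical, and the limits used (1,048,576 bytes per request, 26 bytes of overhead per event, 10,000 events per request) are the documented PutLogEvents limits as I understand them, so check them against the current documentation. The sketch does not handle the rule that a single batch must not span more than 24 hours.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed PutLogEvents limits; verify against the current AWS documentation.
MAX_BATCH_BYTES = 1_048_576   # total payload size per request
MAX_BATCH_EVENTS = 10_000     # number of events per request
PER_EVENT_OVERHEAD = 26       # bytes counted per event on top of the message

LogEvent = Tuple[int, str]    # (timestamp in ms since epoch, message)


def batch_events(events: Iterable[LogEvent],
                 max_bytes: int = MAX_BATCH_BYTES,
                 max_events: int = MAX_BATCH_EVENTS) -> Iterator[List[dict]]:
    """Yield lists of events, each small enough for one PutLogEvents call."""
    batch: List[dict] = []
    size = 0
    # Events inside a batch are expected in chronological order.
    for timestamp, message in sorted(events, key=lambda e: e[0]):
        event_size = len(message.encode("utf-8")) + PER_EVENT_OVERHEAD
        # Close the current batch if adding this event would exceed a limit.
        if batch and (size + event_size > max_bytes or len(batch) >= max_events):
            yield batch
            batch, size = [], 0
        batch.append({"timestamp": timestamp, "message": message})
        size += event_size
    if batch:
        yield batch
```

Each yielded list has the shape expected by the logEvents parameter of PutLogEvents, so flushing once per second per stream would stay well under the 5 requests per second limit.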

Related

Are there fixed conditions for how long a log stream is open?

I'm writing a concurrent tailing utility for watching multiple AWS CloudWatch log groups across many regions simultaneously. In CloudWatch Logs, there are log groups, which contain many log streams that are rotated occasionally. Thus, to tail a log group, one must find the latest log stream, read it in a loop, occasionally check for a new log stream, and start reading that in a loop.
I can't seem to find any documentation on this, but is there a set of published conditions upon which I can conclude that a log stream has been "closed?" I'm assuming I'll need to have multiple tasks tailing multiple log streams in a group up until a certain cut-off point, but I don't know how to logically determine that a log stream has been completed and to abandon tailing it.
Does anyone know whether such published conditions exist?
I don't think you'll find that published anywhere.
If AWS had some mechanism to know that a log stream was "closed" or would no longer receive log entries, I believe their own console for a stream would make use of it somehow. As it stands, when you view even a very old stream in the console, it will show this message at the bottom:
I know it is not a direct answer to your question, but I believe that is strong indirect evidence that AWS can't tell when a log stream is "closed" either. Resuming auto retry on an old log stream generates traffic that would be needless, so if they had a way to know the stream was "closed" they would disable that option for such streams.
The documentation says:
A log stream is a sequence of log events that share the same source.
Since each new "source" will create a new log stream, and since CloudWatch supports many different services and options, there won't be a single answer. It depends on too many factors. For example, with the Lambda service, each lambda container will be a new source, and AWS Lambda may create new containers based on many factors like lambda execution volume, physical work in its data center, outages, changes to lambda code, etc. And that is just for one potential stream source for log streams.
You've probably explored options, but these may give some insights into ways to achieve what you're looking to do:
The CLI has an option to tail that will include all log streams in a group: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/tail.html though if you're building your own utility the CLI won't likely be an option
Some options are discussed at how to view aws log real time (like tail -f) but there are no mentions of published conditions for when a stream is "closed"
When does AWS CloudWatch create new log streams? may yield some insights
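Since there seems to be no published "closed" condition, a tailing utility could fall back on a heuristic: stop tailing a stream once it has been idle for longer than a cutoff you choose. The sketch below is only that heuristic, not an AWS rule. The function name streams_to_tail and the cutoff are hypothetical; the input is assumed to have the shape of the logStreams list returned by DescribeLogStreams, with lastIngestionTime and creationTime in epoch milliseconds.

```python
from typing import Dict, List


def streams_to_tail(streams: List[Dict], now_ms: int, idle_cutoff_ms: int) -> List[str]:
    """Return names of streams that received data within the cutoff window.

    This is a heuristic: an idle stream may still receive events later.
    """
    active = []
    for stream in streams:
        # Streams that never ingested anything fall back to their creation time.
        last_seen = stream.get("lastIngestionTime", stream.get("creationTime", 0))
        if now_ms - last_seen <= idle_cutoff_ms:
            active.append(stream["logStreamName"])
    return active
```

Re-running this on every poll of the log group lets a stream re-enter the tailed set if it starts receiving events again.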

Triggering an AWS Lambda Function based on multiple (1k-10k) schedules

We have an AWS Lambda function which queries some data for a client from our DB and sends a report to the client. Some clients want daily reports, some might need weekly or monthly reports. The number of clients can go up to ~1000 and each client might have ~10 such reports.
So we are looking for a way to trigger the Lambda function with different parameters based on schedules set by each client.
For Example:
Client A wants daily report of their data to be sent to abc#clienta.com and Client B wants a weekly report of their data to be sent to xyz#clientb.com. So the Lambda function will be invoked twice on Sunday 12 AM (for both clients) and once on Monday-Saturday 12 AM (for Client A).
We found the following solutions on AWS, but both have some limitations.
Approach 1: Use CloudWatch Events
We can create a CloudWatch Events Rule for each client and each report that could trigger our Lambda function on each schedule.
Pros:
Simple setup, easy to implement.
Cons:
There is a limitation of 100 Event Rules per AWS Account. It's mentioned that we can contact AWS to get it increased, but we are not sure if it can be increased to the number we are looking for (Currently it is ~10k, but we would prefer a solution in which there is no such limit). Also, a limit of 100 per account gives an indication that this is not a suitable solution for such a use case.
Approach 2: Using Step Functions
For each client and each report, we can create one AWS State Machine. We can use the Iterator pattern in Step Functions to wait for a day/week/month and then re-invoke the Lambda Function.
Pros:
No limitations on number of State Machines, so this enables us to scale easily.
Cons:
Step Functions have a limitation that they can run for a year, at maximum. This will be a problem in our case because the users will need to get the reports for a much longer period. There is a way to overcome this in Step Functions. Just before it's about to reach the 1-year limit, we can cancel the execution and start a fresh execution. So overall, this solution looks complex.
Can someone suggest a better solution for this on AWS?
Do you really need a CloudWatch rule for each client? Why not do something like the following architecture?
Have CloudWatch kick off a Lambda that checks the schedules for all clients each day (or at whatever the most frequent report schedule you allow is). You don't want this to take a long time, so you just have it check a database of schedules (e.g. DynamoDB) and drop metadata about any reports that need to be generated onto an SQS queue (e.g. type of report, client information, destination email). Worst case, this executes and finds nothing to schedule, but that should only take seconds, so the cost of running it every day is very low.
Then you have a Lambda that consumes the queue and actually does the report generation and emailing. This report generator Lambda will scale and spin up as many instances as it needs to handle the messages on the queue. You can set a concurrency limit for the report generator Lambda to ensure it doesn't spin up too many at a time, if that is a concern once you have 1000s of clients.
The definition and deployment of all these components can easily be automated via AWS SAM.
Hope this alternate approach gives you a few more ideas.
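As a rough illustration of the daily "check schedules" step described above, the sketch below decides which reports are due on a given date and builds the metadata that would be dropped on the queue. Everything here is hypothetical: the ReportSchedule fields, the function names, and the conventions that weekly reports go out on Sundays and monthly reports on the 1st. Loading the schedules from the database and sending the messages to SQS are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class ReportSchedule:
    client_id: str
    report_type: str
    email: str
    frequency: str  # "daily", "weekly" (assumed: Sundays) or "monthly" (assumed: 1st)


def is_due(schedule: ReportSchedule, today: date) -> bool:
    """Decide whether a report should be generated on the given date."""
    if schedule.frequency == "daily":
        return True
    if schedule.frequency == "weekly":
        return today.weekday() == 6  # Monday is 0, Sunday is 6
    if schedule.frequency == "monthly":
        return today.day == 1
    raise ValueError(f"unknown frequency: {schedule.frequency}")


def due_messages(schedules: Iterable[ReportSchedule], today: date) -> List[dict]:
    """Build one queue message per report that is due today."""
    return [
        {"client_id": s.client_id, "report_type": s.report_type, "email": s.email}
        for s in schedules
        if is_due(s, today)
    ]
```

With a daily schedule for one client and a weekly one for another, this yields two messages on a Sunday and one on the other days, which matches the example in the question.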
You can combine both approaches to get the best result.
Step 1: Use Step Functions to run your Lambdas.
Step 2: Trigger your state machine from CloudWatch, based on Step Functions events (SUCCESS, FAILED, etc.).
This way, when step 1 fails or completes its 1-year run, a CloudWatch event can trigger it again, based on the JSON input you pass.

AWS Lambda best practices for Real Time Tracking

We currently run an AWS Lambda function that mainly just redirects the user to a different URL. The function is invoked via API Gateway.
For tracking purposes, we would like to create a widget on our dashboard that provides real-time insights into how many redirects are performed each second. The creation of the widget itself is not the problem.
My main question currently is which AWS service is best suited for telling our other services that an invocation took place. We plan to register the invocation in our database.
Some additional things:
low latency (< 5 seconds) in order to be real-time data
almost no added wait time for the user. We aim to redirect the user as fast as possible
Many thanks in advance!
Best Regards
Martin
I understand that your goal is to simply persist the information that an invocation happened somewhere with minimal impact on the response time of the Lambda.
For that purpose I'd probably use an SQS standard queue and just send a message to the queue that the invocation happened.
You can then have an asynchronous process (Lambda, Docker, EC2) process the messages from the queue and update your Dashboard.
Depending on the scalability requirements, looking into Kinesis Data Analytics might also be worth it.
It's a fully managed streaming data solution and the analytics part allows you to do sliding window analyses using SQL on data in the Stream.
In that case you'd write the info that something happened to the stream, which also has a low latency.
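On the consuming side of the queue or stream, the dashboard number is just a count of invocation messages per second. A minimal sketch, assuming each message carries the invocation time as epoch milliseconds (the function name and the message format are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable


def redirects_per_second(timestamps_ms: Iterable[int]) -> Dict[int, int]:
    """Count invocations per whole second, keyed by epoch second."""
    counts = Counter(ts // 1000 for ts in timestamps_ms)
    return dict(counts)
```

Because the redirect Lambda only has to send one small message, the aggregation cost stays out of the user's request path.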

Cron Jobs vs Task Scheduler table for scheduled emails

Preamble: I have a web app whose backend is based on a serverless architecture. It's basically an Amplify app hosted on AWS with a DynamoDB database. I've learnt it is possible to create a task scheduling system of sorts (more here). A quick summary of the article is: "It's possible to create a task scheduling table taking advantage of TTL and DynamoDB streams to execute a Lambda function at specific times. The TTL specifies a set time for a record to be deleted; we can capture this delete event in a DynamoDB stream and run some tasks based on information from the stream."
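For illustration, the stream-handling side of such a task table might look like the sketch below. It assumes the stream is configured to include old images, and it relies on TTL deletions being marked in the stream record with a userIdentity of type "Service" and principal "dynamodb.amazonaws.com". The function name expired_tasks is hypothetical, and the returned items are still in DynamoDB's attribute-value format.

```python
from typing import List


def expired_tasks(event: dict) -> List[dict]:
    """Return the old images of items removed by TTL from a stream event."""
    tasks = []
    for record in event.get("Records", []):
        identity = record.get("userIdentity", {})
        # Only deletions performed by the TTL process, not manual deletes.
        is_ttl_delete = (
            record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if is_ttl_delete:
            tasks.append(record["dynamodb"]["OldImage"])
    return tasks
```

Filtering on the identity matters because a task deleted by hand (for example, a cancelled email) would otherwise be executed as well.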
Problem:
The goal is to send a series of emails to users who sign up for our service. Each user that signs up gets a series of "Getting Started" emails. The first of the emails is sent 24 hours after a user signs up, the second 3 days later and the third exactly 7 days after sign up.
I see how a cron job would be suitable here, but it just seems a bit inefficient to me. I would basically have to search the users table for users whose sign up time falls between a specific 24 hour period and send the email to the users whereas with a Task scheduler table I could add a task to the table ( something like send first email to user300 with a TTL of when I want it to be sent ) and listen for delete events to run the task. No need to run a cron job daily, just a function that handles each task as it comes.
I think this is more of a performance vs. storage problem. A task scheduler table would take up space: if we add every email to be sent to a user as a task on the table each time a user signs up (each email to a specific user is its own task), then the table grows by 3n records for every n users who sign up. But this may not really be a problem, as tasks are deleted after they are run. I do not know the performance cost of using a cron job for this particular task, which is why I'm asking here. I may also be wrong, and the cost of running and updating this task scheduler table may be higher than that of the cron job.
I initially thought of setting up a dummy user table and running both the cron and the task scheduler and documenting cost of running both, but you can imagine how much time and effort that would take.
So I guess my question is which is a more efficient solution in terms of performance and cost?
There is no perfect solution here. Keep in mind that a DynamoDB TTL deletion can take up to 48h to actually happen, so it's probably unacceptable. Cron jobs with Lambda are cheap and easy to set up. You could also use SQS and populate it with a daily cron job. Yan Cui wrote a great article about this problem: https://theburningmonk.com/2019/03/dynamodb-ttl-as-an-ad-hoc-scheduling-mechanism/
This may not exactly be an answer. Based on the Medium article you linked, the author had a plausible reason why TTL and DynamoDB streams would be better than a cron job, which you reiterated. Setting up a cron job is easier and cheaper (free), and I doubt the performance will be that much worse unless the database is huge. I don't have any experience doing something like this, so I wouldn't know how large the database would have to be for it to make sense to switch over. Alternatively, you can have as many cron jobs as you want, so I don't see why you couldn't just set up a user-specific cron job whenever someone signs up.
You can set up a CloudWatch Event to fire a Lambda function on a regular schedule. The Lambda function can search a database for an applicable result set and perform other actions: send an email, a text message, etc.
Here is an AWS tutorial that covers a very similar use case with step by step instructions. This tutorial is implemented by using the AWS Java API (but you can implement it using other supported programming languages).
https://github.com/awsdocs/aws-doc-sdk-examples/tree/master/javav2/usecases/creating_scheduled_events
From a Cost perspective - Lambda allows 1M free requests per month. Details are here - https://aws.amazon.com/lambda/pricing/
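For the scheduled-Lambda approach, the per-user decision can be kept very small. The sketch below is one possible shape, not taken from the linked tutorial: the email names and the day offsets are assumptions (the question's timing for the second email is ambiguous), and the set of already-sent emails is assumed to be stored with the user record.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Assumed offsets from sign-up; adjust to the real campaign timing.
EMAIL_OFFSETS: Dict[str, timedelta] = {
    "getting_started_1": timedelta(days=1),
    "getting_started_2": timedelta(days=3),
    "getting_started_3": timedelta(days=7),
}


def emails_due(signed_up_at: datetime, already_sent: Set[str], now: datetime) -> List[str]:
    """Return the emails whose send time has passed and that were not sent yet."""
    return [
        name
        for name, offset in EMAIL_OFFSETS.items()
        if name not in already_sent and now >= signed_up_at + offset
    ]
```

Tracking what was already sent, instead of matching users against a fixed 24 hour window, means a missed or delayed run just sends the email on the next run rather than skipping it.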

Log delay in Amazon S3

I have recently started hosting in Amazon S3, and I need the log files to calculate statistics for the "get", "put" and "list" operations on the objects.
I've observed that the log files are organized weirdly. I don't know when a log will appear (not immediately; at least 20 minutes after the operation) or how many lines of logs one log file will contain.
After that, I need to download these log files and analyse them. But I can't figure out how often I should do this.
Can somebody help? Thanks.
What you describe (log files being made available with a delay and in unpredictable order) is exactly the behaviour AWS tells you to expect. This is in the nature of the distributed system AWS uses to provide the S3 service: the same request may be served by a different server each time. I have seen 5 different IP addresses being provided for publishing.
So the only solution is: accept the delay, measure the delay you experience, add some extra time, and learn to live with this total delay (I would expect something like 30 to 60 minutes, but statistics could tell more).
If you need the log records ordered, you either have to sort them yourself or look for a log processing solution. I have seen some applications being offered exactly for this purpose.
If you really need to get your logs with a very short delay, you have to produce the logs yourself. This means you have to write and run a frontend which gives access to your files on S3 and at the same time logs whatever you need.
I run such a solution: users get a user name, a password and the URL of my frontend. When they send a request, I check whether they provide proper credentials and whether they are allowed to see the given resource. If so, I create a temporary URL for that resource, valid for a few minutes, and redirect the request to it.
But such a frontend costs money (you have to run it somewhere) and is less robust than accessing AWS S3 directly.
Good luck, Lulu.
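Once the log files are downloaded, the statistics themselves do not depend on the order of the lines. A minimal sketch of counting operations, assuming the space-separated server access log layout in which the operation (for example REST.GET.OBJECT) follows the bucket owner, bucket, bracketed time, remote IP, requester and request ID. The function name is hypothetical and the field layout should be checked against the documented log format.

```python
import re
from collections import Counter
from typing import Iterable

# Skips the leading fields and captures the operation field.
_LINE = re.compile(r'^\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ (?P<operation>\S+) ')


def count_operations(lines: Iterable[str]) -> Counter:
    """Count log records per operation, ignoring lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match:
            counts[match.group("operation")] += 1
    return counts
```

Since the function only counts, it can be fed lines from any number of log files in any order, and the results of separate runs can simply be added together.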
A lot has changed since the question was originally posted. The delay is still there, but one of the OP's concerns was when to download the logs to analyze them.
One option right now would be to leverage Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/setup-event-notification-destination.html
This way, whenever an object is created in the access logs bucket, you can trigger a notification to SNS, SQS or Lambda, and download and analyze the log files based on that.
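If the notification goes straight to a Lambda function, the handler first has to work out which log objects were just created. A minimal sketch of that step, with a hypothetical function name (SNS and SQS wrap the same payload in their own envelope, which is not handled here):

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def new_log_objects(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs for objects created in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Each returned pair can then be downloaded and analyzed as soon as the log file lands, so there is no need to guess a polling interval.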