Storing CloudWatch Logs into S3 (some structured format) - amazon-web-services

I have a CloudWatch log group that continuously receives logging information from my AWS services.
I want to extract some of the logging information from this log group and store that data in S3 in a structured format (CSV, Parquet).
I will then use Athena to query this logging data.
I want some sort of automatic mechanism to send these logs continuously to S3.
Can anyone suggest a solution for this?

It looks like Athena is able to communicate directly with CloudWatch, as shown here. I'm not sure how performant this is or how costly it turns out to be.
The other option is to configure CloudWatch Logs to send data to Kinesis Data Firehose via a subscription filter, which then delivers it to S3.
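If you go the subscription route, a minimal boto3 sketch of attaching the subscription filter might look like this. The log group name, Firehose ARN, and IAM role ARN below are placeholders, not values from the question:

```python
import boto3

logs = boto3.client("logs")

# Hypothetical names/ARNs -- replace with your own resources.
LOG_GROUP = "/aws/lambda/my-service"
FIREHOSE_ARN = "arn:aws:firehose:us-east-1:123456789012:deliverystream/logs-to-s3"
ROLE_ARN = "arn:aws:iam::123456789012:role/CWLtoFirehoseRole"  # role CloudWatch Logs assumes

# Subscribe the log group to the Firehose delivery stream.
# An empty filterPattern forwards every log event; use a pattern to extract only some logs.
logs.put_subscription_filter(
    logGroupName=LOG_GROUP,
    filterName="to-firehose",
    filterPattern="",
    destinationArn=FIREHOSE_ARN,
    roleArn=ROLE_ARN,
)
```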

Related

How do I export AWS Lambda logs (prints) to Kinesis Data Streams?

I have been stuck on how to send Lambda logs (prints) directly to an Amazon Kinesis Data Stream. I have found a way to send logs from CloudWatch, but I would like to send every single print statement to Kinesis Data Streams. I also have a doubt: if I send data from CloudWatch, does it stream the print records to Kinesis in real time or not? In this case I would like to use Lambda as the producer and, via the Kinesis Data Stream, S3 as the consumer.
Below I have attached a workflow diagram of my setup.
You can also check out Lambda extensions, which allow direct ingestion of logs to custom destinations. This is helpful in case you want to avoid CloudWatch costs:
https://aws.amazon.com/blogs/compute/using-aws-lambda-extensions-to-send-logs-to-custom-destinations/
You have to create a CloudWatch Logs subscription filter for the Lambda's log group that you want to save to S3. So you would do:
CW Logs subscription ---> Firehose ---> S3
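For the Firehose-to-S3 piece of that pipeline, a hedged boto3 sketch is below. The bucket, role, and stream names are assumptions; note also that records arriving from a CloudWatch Logs subscription are gzip-compressed CloudWatch payloads, so you may need a Firehose transformation or a later processing step before Athena can query them cleanly.

```python
import boto3

firehose = boto3.client("firehose")

# Create a delivery stream that buffers incoming records and writes them to S3.
firehose.create_delivery_stream(
    DeliveryStreamName="logs-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseToS3Role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-log-archive-bucket",             # placeholder
        "Prefix": "cloudwatch-logs/",
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)
```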

Went over the CloudWatch event size limit for structured application logs, now what?

We have a service that outputs application logs to CloudWatch. We structure the logs as JSON and write them to stdout, which is forwarded by Fluent Bit to CloudWatch. We then have a stream set up to forward the logs from CloudWatch to S3, followed by Glue crawlers, Athena, and QuickSight for dashboards.
We got all this working, and I just saw today that there is a 256 KB event size limit in CloudWatch, which we went over for some of our application logs. How else can we get our logs out of our service into S3 (or maybe a different data store?) for analysis? Is CloudWatch not the right approach for this? The other option I thought of is to break up the application logs into multiple events (roughly as in the sketch below), but then we need to plumb through a joinable ID, as well as write ETL logic that does more complex joins. I was hoping to avoid that unless it's considered better practice than what we are doing.
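To illustrate the split-up idea, a rough sketch of chunking an oversized record under a shared join ID (all field names here are made up, not from our actual schema):

```python
import json
import uuid

MAX_EVENT_BYTES = 250_000  # stay safely under CloudWatch Logs' ~256 KB per-event limit

def split_log_record(record: dict) -> list[str]:
    """Split one oversized JSON log record into smaller events sharing a join ID."""
    payload = json.dumps(record)
    if len(payload.encode("utf-8")) <= MAX_EVENT_BYTES:
        return [payload]

    join_id = str(uuid.uuid4())
    chunk_size = MAX_EVENT_BYTES // 2          # leave headroom for the envelope fields
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    return [
        json.dumps({"join_id": join_id, "part": i, "total": len(chunks), "data": chunk})
        for i, chunk in enumerate(chunks)
    ]
```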
Thanks!

How to check the number of times a DynamoDB table has been accessed

I have a DynamoDB table, let's say sampleTable. I want to find out how many times this table has been accessed from the CLI. How do I check this?
PS: I have checked the metrics but couldn't find any particular metric which gives this information.
There is no CloudWatch metric to monitor API calls to DynamoDB.
However, there is CloudTrail (CT). You can go to CT's event history and look for API calls to DynamoDB from the last 90 days. You can export the history to a CSV file and investigate it offline as well.
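For a quick check from Python, a sketch using boto3's lookup_events is below. Keep in mind that event history mostly records control-plane calls; data-plane operations (GetItem, PutItem, etc.) only appear if you enable data events on a trail.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Count recorded events that reference the table from the question ("sampleTable").
paginator = cloudtrail.get_paginator("lookup_events")
count = 0
for page in paginator.paginate(
    LookupAttributes=[{"AttributeKey": "ResourceName", "AttributeValue": "sampleTable"}]
):
    count += len(page["Events"])

print(f"API calls involving sampleTable in the last 90 days: {count}")
```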
For ongoing monitoring of the API calls you can enable a CT trail, which will store event log details in S3 for as long as you require:
Logging DynamoDB Operations by Using AWS CloudTrail
If you have the trail created, you can use Amazon Athena to query the log data for the statistics of interest, such as the number of specific API calls to DynamoDB:
Querying AWS CloudTrail Logs
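As a sketch of that Athena step, assuming you have already created a cloudtrail_logs table over the trail's S3 prefix as the linked documentation describes (the result bucket and the requestparameters field name are assumptions and may differ for your events):

```python
import boto3

athena = boto3.client("athena")

# Count DynamoDB API calls against the table, grouped by API name.
QUERY = """
SELECT eventname, count(*) AS calls
FROM cloudtrail_logs
WHERE eventsource = 'dynamodb.amazonaws.com'
  AND json_extract_scalar(requestparameters, '$.tableName') = 'sampleTable'
GROUP BY eventname
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```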
Also, you could create a custom metric based on the trail's log data (once you configure CloudWatch Logs for the trail):
Analyzing AWS CloudTrail in Amazon CloudWatch
However, I don't think you can differentiate between API calls made using the CLI, the SDK, or by other means.

Save CloudWatch logs generated by a Glue crawler in another CloudWatch log group

Is there a way of saving logs generated by a crawler in a specific, newly created CloudWatch log group?
I want to use the finished-crawling log as a trigger for a Lambda function.
Many thanks in advance!
You can use the AWS CloudWatch Logs API to upload logs. Use CreateLogGroup and CreateLogStream to create your log stream, and then use PutLogEvents to upload your log.
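A minimal boto3 sketch of that flow (the group and stream names here are placeholders):

```python
import time
import boto3

logs = boto3.client("logs")

GROUP = "/custom/glue-crawler"   # hypothetical target log group
STREAM = "my-crawler-run"        # hypothetical log stream

logs.create_log_group(logGroupName=GROUP)   # raises ResourceAlreadyExistsException if it exists
logs.create_log_stream(logGroupName=GROUP, logStreamName=STREAM)

# Write one event; timestamp must be milliseconds since the epoch.
logs.put_log_events(
    logGroupName=GROUP,
    logStreamName=STREAM,
    logEvents=[{
        "timestamp": int(time.time() * 1000),
        "message": "Crawler finished",
    }],
)
```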
There are other options that might be more suitable for triggering a Lambda function though, depending on your exact use case, such as uploading the collected log to S3 and having the upload trigger the function, or even starting the Lambda function directly.

Best way to save "event" data on AWS

I have an app running on AWS and I need to save every "event" in a file.
An "event" happens, for instance, when a user logs in to the app. When that happens I need to save this information on a file (presumably I would need to save a time stamp and the session id)
I expect to have a lot of events (of the order of a million per month) and I was wondering what would be the best way to do this.
I thought of writing to S3, but I think I can't append to existing files.
Another option would be to redirect the "event" to standard output, but that would not be the smartest solution.
Any ideas? Also, this needs to be done in Python.
There are a bunch of options, and your choice would depend on the following factors:
Scale of these events.
Cost you are willing to bear.
How you intend on consuming them.
Options:
Log every event into a Kinesis Data Firehose delivery stream, which in turn can dump your data into S3 or Redshift, as per your configuration (see the sketch at the end of this answer).
Set up an Elasticsearch cluster. Log all events to a file on disk, and use Logstash to asynchronously push them into the cluster.
Create a DynamoDB table and log one event per row.
Using CloudWatch Logs you can export the logs to S3. You can use the CloudWatch Logs agent to send your application log file to CloudWatch.
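Since you want to do this in Python, here is a sketch of the first (Firehose) option. It assumes a delivery stream named app-events already exists and is configured to deliver to your S3 bucket:

```python
import json
import time
import boto3

firehose = boto3.client("firehose")

def record_event(session_id: str) -> None:
    """Send one login event to a (hypothetical) Firehose stream that lands in S3."""
    event = {"session_id": session_id, "timestamp": int(time.time())}
    firehose.put_record(
        DeliveryStreamName="app-events",  # assumed delivery stream name
        # Trailing newline keeps the S3 objects line-delimited JSON for easy querying.
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

record_event("abc-123")
```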