RDS export log files - amazon-web-services

I want to export error log , general log and slow-query log from RDS Mysql.
I am done with all the necessary settings on my DB Instance.
I have exported the general log and slow-query log to file. (log_output : FILE)
What is the best approach to do this.
I am thinking to use lambda for this. But I am not able to find a suitable way to trigger my lambda function, eg : When ever a new log file is created my lambda function must be triggered.
Is it possible to push this log files events to CloudWatch directly ?
I have gone through the documentation , but I am not able to find such mechanism.
How should I proceed ?

There is no Supported Event Source for RDS in Lambda. I would suggest using a table to store logs for each kind in the parameters. And call the rotation to remove the old data to save disk space.

Related

is it possible write watchtower logs in single file in aws

I have developed one small application. For logs , i'm using watchtower in aws. logs are working fine.Logs are inserted in cloudwatch by file wise logs in aws but i wants all logs to be registered in a single file only (for example api.views ) .is this possible? if yes, how?
solved this problem... i have written logs function in one file.. calling that function wherever i want to write logs.... so all logs are saving with same file name

Pointing multiple projects' log sinks to one bucket

I have a few GCP projects with log sinks to different storage buckets. I'd like to combine them into a single bucket. But the stackdriver export doesn't add any distinguishing information to the object names it creates; they all look like cloudaudit.googleapis.com/activity/2017/11/14/00:00:00_00:59:59_S0.json
What will happen if I start pushing them all to a single bucket? Will the different project sinks overwrite each other's objects? Is there any way to distinguish which project created the logs just from the object?
If not, I guess I should switch to pubsub sinks, and then write some code that produces objects with more desirable names. Are there any established patterns or examples for doing this?
Update: I filed https://issuetracker.google.com/issues/69371200 for this issue.
To enable this, just select custom destination on the sink and point to the bucket with this format: storage.googleapis.com/[BUCKET_ID].
I've just enabled this in a couple of my projects, as I'm curious to see the results when exporting to a bucket. However, I have been using a single BQ sink for all my projects, and the tables created have all the logs mixed, so no logs lost when using a single BQ sink.
I'm assuming for a GCS sink will work in the same way, but I'll tell you in a couple of days.
If a single bucket sink does not work, you can always use a single BQ sink (that will help in analyzing the logs), and when you no longer want to have them in BQ, export them and store the files wherever you want.
Also, since you'll be writing to your sink constantly, you can't use nearline or coldline, so the storage pricing is better in BQ than a regional bucket (0.02 USD/GB in BQ vs somewhere between 0.02 and 0.35 USD/GB for regional storage, depending on the region; BQ has 10GB free monthly, GCS 5GB).
I would generally recommend using a BQ sink, but I'll tell you what happens with my bucket logs.
Update:
A few hours later, and I've verified that shared bucket sinks work pretty much as you would expect. It concatenates logs chronologically regardless of the project origin, and only creates a single file for each time window. Hope this helps! (I still prefer BQ as a log sink...)
Update 2:
For the behavior you seek in the feature request, I would use BQ, but you could just as easily grep the project ID and separate the logs:
grep '"logName":"projects/<your-project-id>/' mixed-log.json > single-project-log.json
Or just get a cloud function triggered by bucket updates (so, every time you receive a log file in the sink) to run this for you.
Or namespace you buckets and have a cloud function moving them to wherever you need as soon as they are written.
The possibilities are endless!
If you have an organization or folder which includes all the projects that you want to collect logs from, then you can create a sink that collects from all projects in that org/folder.
Unfortunatlely, you cannot do this from the Cloud Console. Instead you must use gcloud with the --organization or --folder option or the API.

Is there a way to group my DynamoDB export tasks on one EMR cluster?

When I set up a re-occuring backup via the export function in the DynamoDB console, the task it creates automatically creates a new EMR cluster when it runs. Some of my tables need to be backed up but are fairly small. What I end up with is a huge number of large servers running to back up some relatively small tables. Is there any easy way to chain these tasks to run on one server group in series or parallel?
Yes, it is possible. There is not a direct way but needs some additional tweaking in the Data-Pipeline end. You are required to understand how Data-Pipeline actually runs your export job by default.
When you click on export button on DDB console, it takes you to Data-Pipelines console to create a Pipeline for the export.
After filling out the template, instead of running, you can use Edit in Architect feature to alter the current template which only works with one table.
On the architect page, if you observe the Activities section ,you will find EmrAcvity running a EMR STEP using the following param's . This EMR STEP will run the export job using parameters that you initially passed on the template. Note that it will also RunsOn EMRclusterforBackup resource which you can find in resource section.
s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}
To run export on other DDB tables using same EMR resource, you simply need to create another EMRActivity object by clicking Add and then add EMRActivity on architect. On this activity , you can use the same RunsOn as previous activity is using and in the STEP param's you can manually edit to to include other table name and its export path
like
s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,s3://myexport-bucket/table2/,table2,0.9
You can extend it for multiple tables.
Note: This can easily be done for multiple tables using a JSON file as Data-Pipeline definition , editing it to add more activities and parameters and then exporting it to Run later.

Filter AWS Cloudwatch Lambda's Log

I have a Lambda function and its logs in Cloudwatch (Log group and Log Stream). Is it possible to filter (in Cloudwatch Management Console) all logs that contain "error"? For example logs containing "Process exited before completing request".
In Log Groups there is a button "Search Events". You must click on it first.
Then it "changes" to "Filter Streams":
Now you should just type your filter and select the beginning date-time.
So this is kind of a side issue, but it was relevant for us. (I posted this to another answer on StackOverflow but thought it would be relevant to this conversation too)
We've noticed that tailing and searching logs gets really slow after a log group has a lot of Log Streams in it, like when an AWS Lambda Function has had a lot of invocations. This is because "tail" type utilities and searching need to connect to each log stream to run. Log Events get expired and deleted due to the policy you set on the Log Group itself, but the Log Streams never get cleaned up. I made a few little utility scripts to help with that:
https://github.com/four43/aws-cloudwatch-log-clean
Hopefully that save you some agony over waiting for those logs to get searched.
You can also use CloudWatch Insights (https://aws.amazon.com/about-aws/whats-new/2018/11/announcing-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/) which is an AWS extension to CloudWatch logs that gives a pretty powerful query and analytics tool. However it can be slow. Some of my queries take up to a minute. Okay, if you really need that data.
You could also use a tool I created called SenseLogs. It downloads CloudWatch data to your browser where you can do queries like you ask about. You can use either full text and search for "error" or if your log data is structured (JSON), you can use a Javascript like expression language to filter by field, eg:
error == 'critical'
Posting an update as CloudWatch has changed since 2016:
In the Log Groups there is a Search all button for a full-text search
Then just type your search:

AWS - want to upload multiple files to S3 and only when all are uploaded trigger a lambda function

I am seeking advice on what's the best way to design this -
Use Case
I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.
Naive Approach
The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
A basic implementation would be to take my existing lambda function which is invoked anytime my S3 bucket is written into, and have it check manually whether all the other files been saved.
The Lambda function would know (look in Dynamo to determine what we're looking for) and query S3 to see if the other files are in. If so, use SNS to trigger my other lambda that will do the other work.
Edit: Another approach is have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.
Thoughts
I don't think this is a good idea. Mainly because Lambda functions should be lightweight, and polling the database from within the Lambda function to get the S3 keys of all the uploaded files and then checking in S3 if they are there - doing this each time seems ghetto and very repetitive.
What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).
Edit In response to mbaird's suggestions below-
Option 1 (SNS) This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principal. That is, the client uploads the files and sends a notification (via SNS) that its work is done.
Option 2 (Dynamo streams) So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case, results in a table update vs. a SNS notification (Option 1). This update would trigger the Lambda function, as opposed to notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.
In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D
Option 3 with S3 triggers My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementations we use) query Dynamo and gets back N as the completed uploads (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!
So Option 1 wins.
If you don't want the client program responsible for invoking the Lambda function directly, then would it be OK if it did something a bit more generic?
Option 1: (SNS) What if it simply notified an SNS topic that it had completed a batch of S3 uploads? You could subscribe your Lambda function to that SNS topic.
Option 2: (DynamoDB Streams) What if it simply updated the DynamoDB record with something like an attribute record.allFilesUploaded = true. You could have your Lambda function trigger off the DynamoDB stream. Since you are already creating a DynamoDB record via the client, this seems like a very simple way to mark the batch of uploads as complete without having to code in knowledge about what needs to happen next. The Lambda function could then check the "allFilesUploaded" attribute instead of having to go to S3 for a file listing every time it is called.
Alternatively, don't insert the DynamoDB record until all files have finished uploading, then your Lambda function could just trigger off new records being created.
Option 3: (continuing to use S3 triggers) If the client program can't be changed from how it works today, then instead of listing all the S3 files and comparing them to the list in DynamoDB each time a new file appears, simply update the DynamoDB record via an atomic counter. Then compare the result value against the size of the file list. Once the values are the same you know all the files have been uploaded. The down side to this is that you need to provision enough capacity on your DynamoDB table to handle all the updates, which is going to increase your costs.
Also, I agree with you that SWF is overkill for this task.