I would like to debug an issue with DynamoDB. The error I get is:
The provided expression refers to an attribute that does not exist in the item
To do that, I'd like to log all requests made to a DynamoDB table on the AWS side (not from the Lambda code).
I have the RequestId from the error, and I want to be able to search for it to find the exact request with its parameters.
I have looked into AWS CloudTrail, but it seems to log only management operations, not every get and put made against DynamoDB.
Thanks
You will need to add this level of data plane logging to your application as currently CloudTrail only supports logging of control plane operations for DynamoDB.
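A minimal sketch of what application-side logging could look like in Python. The helper name `logged_call` and the logger name are hypothetical; `client` is assumed to be a boto3 DynamoDB client (or anything with the same methods), and the RequestId is read from the `ResponseMetadata` that boto3 attaches to responses and to `ClientError.response`.

```python
import json
import logging

logger = logging.getLogger("dynamodb.requests")  # hypothetical logger name


def logged_call(client, operation, **params):
    """Call client.<operation>(**params), logging the parameters and RequestId."""
    serialized = json.dumps(params, default=str)
    try:
        response = getattr(client, operation)(**params)
    except Exception as exc:
        # botocore's ClientError keeps the service response in `.response`;
        # other exceptions have no RequestId, so None is logged instead.
        raw = getattr(exc, "response", None)
        meta = raw.get("ResponseMetadata", {}) if isinstance(raw, dict) else {}
        logger.error(
            "dynamodb %s failed request_id=%s params=%s",
            operation, meta.get("RequestId"), serialized,
        )
        raise
    request_id = response.get("ResponseMetadata", {}).get("RequestId")
    logger.info(
        "dynamodb %s ok request_id=%s params=%s",
        operation, request_id, serialized,
    )
    return response
```

With something like this in place, a call such as `logged_call(dynamodb, "update_item", TableName=..., Key=..., UpdateExpression=...)` leaves a log line that can be searched by the RequestId from the error.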
I want to query AWS load balancer logs and have a report sent to me automatically on a schedule.
I am using Amazon Athena, with AWS Lambda to trigger Athena. I created the table based on the guide here: https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html
However, I have run into the following issues:
The logs bucket grows in size day by day. I have noticed that if an Athena query needs more than 5 minutes to return a result, it sometimes produces an "unknown error".
The maximum timeout for an AWS Lambda function is only 15 minutes, so I cannot keep increasing the Lambda timeout to wait for Athena to return a result (for example, if Athena needs more than 15 minutes).
Can you suggest a better solution to my problem? I am thinking of using the ELK stack, but I have no experience working with it. Can you show me the advantages and disadvantages of ELK compared to the combination of AWS Lambda and Amazon Athena? Thank you!
First off, you don't need to keep your Lambda running while the Athena query executes. StartQueryExecution returns a query identifier that you can then poll with GetQueryExecution to determine when the query finishes.
Of course, that doesn't work so well if you're invoking the query as part of a web request, but I recommend not doing that. And, unfortunately, I don't see that Athena is tied into CloudWatch Events, so you'll have to poll for query completion.
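The start-then-poll pattern described above can be sketched as follows in Python. The function names are hypothetical; `athena` is assumed to be a boto3 Athena client, and `database` and `output_location` are placeholders you would supply. A scheduled Lambda could call `start_query` and store the returned ID, and a later invocation could call `query_state` instead of waiting inside one long-running function.

```python
import time

# Athena reports QUEUED and RUNNING while a query is in flight.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def start_query(athena, sql, database, output_location):
    """Start the query and return its execution ID without waiting for it."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def query_state(athena, query_id):
    """Return the current state string of a previously started query."""
    response = athena.get_query_execution(QueryExecutionId=query_id)
    return response["QueryExecution"]["Status"]["State"]


def wait_for_query(athena, query_id, delay_seconds=5.0, max_attempts=60,
                   sleep=time.sleep):
    """Poll until the query reaches a terminal state; None if attempts run out."""
    for _ in range(max_attempts):
        state = query_state(athena, query_id)
        if state in TERMINAL_STATES:
            return state
        sleep(delay_seconds)
    return None
```

`wait_for_query` is only suitable where blocking is acceptable, such as a script or a batch job; in Lambda, checking `query_state` from a separate scheduled invocation avoids the 15-minute limit.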
With that out of the way, the problem with reading access logs from Athena is that it isn't easy to partition them. The example that AWS provides defines the table inside Athena, and the default partitioning scheme uses S3 paths that have segments /column=value/. However, ALB access logs use a simpler yyyy/mm/dd partitioning scheme.
If you use AWS Glue, you can define a table format that uses this simpler scheme. I haven't done that so can't give you information other than what's in the docs.
Another alternative is to limit the amount of data in your bucket. This can save on storage costs as well as reduce query times. I would do something like the following:
Bucket_A is the destination for access logs, and the source for your Athena queries. It has a life-cycle policy that deletes logs after 30 (or 45, or whatever) days.
Bucket_B is set up to replicate logs from Bucket_A (so that you retain everything, forever). It immediately transitions all replicated files to "infrequent access" storage, which cuts the cost in half.
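As an illustration of the Bucket_A side of this setup, the expiration rule could be built like this in Python. The function name and rule ID are hypothetical; the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The replication and storage-class settings for Bucket_B are not shown.

```python
def expiration_lifecycle(days=30, prefix=""):
    """Build a lifecycle configuration that deletes objects after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-access-logs-after-{days}-days",  # hypothetical ID
                "Filter": {"Prefix": prefix},  # empty prefix matches every object
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Applying it (assumes boto3 is installed and credentials are configured):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="Bucket_A",
#       LifecycleConfiguration=expiration_lifecycle(30),
#   )
```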
Elasticsearch is certainly a popular option. You'll need to convert the files in order to upload them. I haven't looked, but I'm sure there's a Logstash plugin that will do so. Depending on what you're looking to do for reporting, Elasticsearch may be better or worse than Athena.
I'm searching for a method to track the identities that are making modifications to my table, besides the application service itself. In the beginning I thought there could be two options, but:
CloudTrail - the documentation (Logging DynamoDB Operations by Using AWS CloudTrail) says, as far as I understood, I'd be only able to track changes made to the infrastructure itself, but not to the actual use of a table.
DynamoDB Streams - I'd guessed that the modifying identity is also passed in a stream event, but actually it's not. I'm using NEW_AND_OLD_IMAGES as the stream type.
Am I overlooking something or is there probably another possibility anywhere else? The streams event does pass me an EventID. Is this of use somewhere?
Grateful for any tips on how to solve this, even if it's a completely different approach.
AWS CloudTrail now supports logging for DynamoDB actions!
AWS CloudTrail Adds Logging of Data Events for Amazon DynamoDB
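For reference, here is a sketch in Python of how data event logging for DynamoDB might be switched on for an existing trail using advanced event selectors. The function name and the selector's `Name` are hypothetical, and the trail name in the usage comment is a placeholder; the field names (`eventCategory`, `resources.type`, `resources.ARN`) are the ones CloudTrail's advanced event selectors use.

```python
def dynamodb_data_event_selectors(table_arn=None):
    """Build advanced event selectors that capture DynamoDB data events.

    With table_arn=None the selector matches every table in the account;
    otherwise it is restricted to the given table ARN.
    """
    field_selectors = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::DynamoDB::Table"]},
    ]
    if table_arn is not None:
        field_selectors.append({"Field": "resources.ARN", "Equals": [table_arn]})
    return [{"Name": "DynamoDB data events", "FieldSelectors": field_selectors}]


# Applying it (assumes boto3 and an existing trail; "my-trail" is a placeholder):
#   import boto3
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="my-trail",
#       AdvancedEventSelectors=dynamodb_data_event_selectors(),
#   )
```

Data events are billed separately from management events, so restricting the selector to one table ARN may be worth it on busy accounts.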
I have an AWS DynamoDB table. How can I find out who has updated the records in my table (not the table itself)? I need details such as the logged-in user ID or the ARN of the AWS service that updated the records.
Updated: 8.16.2021
CloudTrail now supports tracking data events for DynamoDB:
https://aws.amazon.com/about-aws/whats-new/2021/03/aws-cloudtrail-adds-logging-of-data-events-for-amazon-dynamoDB/
DynamoDB does not let you inquire which user last modified a certain item. Nor does it log these data modification events anywhere. The DynamoDB Detective Security Best Practices page explains your options:
If all you want to log are administrative operations, such as table creation and deletion, then AWS CloudTrail is good enough for you. This feature gives you a log of all these administrative operations, and which user did which.
However, you said that you want to know about data-plane operations (PutItem, UpdateItem, etc.), not just control-plane operations. So CloudTrail is not good enough for you. The remaining option is to use DynamoDB Streams. This creates a "stream" of modification events to your database, where each event also records the user who did this modification. A dedicated application can listen to this stream, and either record the information of who-modified-what, or react to suspicious activity, or whatever you want to do with it.
Using Streams as suggested above is neither easy nor free to do. But without doing this, the information of which user modifies which item is simply not recorded anywhere by DynamoDB.
This is where CloudTrail would come in handy. CloudTrail can be attached to services, including DynamoDB, so you can see any operations on your tables.
Here is a tutorial for it:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/logging-using-cloudtrail.html
I have a DynamoDB table, let's say sampleTable. I want to find out how many times this table has been accessed from the CLI. How do I check this?
PS. I have checked the metrics but couldn't find any particular metric that gives this information.
There is no CloudWatch metric to monitor API calls to DynamoDB.
However, there is CloudTrail (CT). You can go to CT's event history and look for API calls to DynamoDB from the last 90 days. You can export the history to a CSV file and investigate offline as well.
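The same event history can also be read programmatically. Below is a rough Python sketch that counts DynamoDB events by name; the function name and the `max_pages` cap are hypothetical, and `cloudtrail` is assumed to be a boto3 CloudTrail client. It uses `lookup_events` filtered on the `EventSource` attribute and follows `NextToken` for pagination.

```python
from collections import Counter


def count_dynamodb_events(cloudtrail, max_pages=10):
    """Count event-history entries for DynamoDB, grouped by event name."""
    counts = Counter()
    kwargs = {
        "LookupAttributes": [
            {"AttributeKey": "EventSource",
             "AttributeValue": "dynamodb.amazonaws.com"}
        ]
    }
    for _ in range(max_pages):  # cap the number of API calls made
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            counts[event.get("EventName")] += 1
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return counts
```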
For ongoing monitoring of the API calls, you can create a CT trail, which will store event log details in S3 for as long as you require:
Logging DynamoDB Operations by Using AWS CloudTrail
Once you have the trail created, you can use Amazon Athena to query the log data for the statistics of interest, such as the number of specific API calls to DynamoDB:
Querying AWS CloudTrail Logs
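As a sketch of what such a query might look like, the Python helper below builds the SQL text. The function name is hypothetical, the table name `cloudtrail_logs` is an assumption (use whatever name you gave the table when you created it in Athena), and the column names follow the table definition in the AWS documentation for querying CloudTrail logs.

```python
def dynamodb_call_counts_sql(table="cloudtrail_logs",
                             since="2021-01-01T00:00:00Z"):
    """Return Athena SQL counting DynamoDB API calls per event name.

    `table` and `since` are interpolated directly, so pass trusted values only.
    eventtime is stored as an ISO 8601 string, so string comparison orders
    timestamps correctly.
    """
    return (
        "SELECT eventname, count(*) AS calls\n"
        f"FROM {table}\n"
        "WHERE eventsource = 'dynamodb.amazonaws.com'\n"
        f"  AND eventtime >= '{since}'\n"
        "GROUP BY eventname\n"
        "ORDER BY calls DESC"
    )
```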
Also, you could create a custom metric based on the trail's log data (once you configure CloudWatch Logs for the trail):
Analyzing AWS CloudTrail in Amazon CloudWatch
However, I don't think you can differentiate between API calls done using CLI, or SDK or by other means.
As per AWS docs, there's no Redshift-Lambda integration yet.
What we would like to do is monitor Redshift activity in order to do something when a Redshift table is created, a copy from S3 is made, or a bulk insert is performed.
Is there a way to register this kind of activity and then do something like run a Lambda function in order to run a small script or so?
Redshift provides an event notification mechanism. You can find a full list of the event categories and messages here. If that covers the kind of information you are interested in you can simply have your Lambda function add the SNS topic used by Redshift for event notification as an event source and your Lambda function will get called every time an event is sent by Redshift.
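A minimal Python sketch of a Lambda handler subscribed to that SNS topic. The handler name is hypothetical; the `Records[].Sns` layout is the standard shape Lambda receives from SNS. What Redshift puts in the message body is not assumed here, so the handler only extracts the subject and raw message text.

```python
def on_redshift_event(event, context):
    """Collect the subject and message text of each SNS record delivered."""
    notifications = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        notification = {
            "subject": sns.get("Subject"),
            "message": sns.get("Message"),
            "timestamp": sns.get("Timestamp"),
        }
        # Printed output from a Lambda function ends up in CloudWatch Logs.
        print(notification)
        notifications.append(notification)
    return notifications
```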
You can enable audit logs that end up in S3.
All the info you want is also available in various admin tables with prefixes like stl_, stv_ and pg_. For example, COPY commands from S3 are recorded in stl_load_commits, and stl_utilitytext has info on non-select queries like CREATE.
As for triggering events, you could have S3 trigger a Lambda function when one of the log files lands, or run occasional jobs that query the system tables and take action, using something like cron jobs or Airflow.
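The S3-triggered variant could start from something like the following Python sketch. The handler name is hypothetical; the `Records[].s3` layout is the standard S3 event notification shape, in which object keys arrive URL-encoded. Parsing the audit log file itself is left out.

```python
from urllib.parse import unquote_plus


def on_audit_log_object(event, context):
    """Return (bucket, key) for each object named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Keys are URL-encoded in the event, with spaces sent as '+'.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        objects.append((bucket, key))
    return objects
```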