Today I encountered an issue and I'm not sure if there's a solution or if it's a bug.
In our project we use S3, SQS, and API Gateway as an interface for S3. Whenever a new file is uploaded via the gateway, an event is published to SQS, and that works without problems.
Earlier today I deployed a new version of our service that consumes SQS messages. To test that everything works as expected, I created a new S3 bucket and a corresponding SQS queue. Then I started copying objects from the production bucket to the newly created one using the boto3 Python library.
After a while I noticed that for some files no SQS event was published. After some research it turned out that all such files are larger than 8 MB.
I also tried uploading a file using the AWS CLI just in case, but the result was the same.
However, when I upload a file from the AWS web console, I can see the SQS event published.
So everything works when uploading to S3 via API Gateway or the AWS web console, but not via the AWS CLI or boto3 (and presumably other SDKs).
Seems like a bug or some limitation but I couldn't find any documentation on it.
Has anyone experienced similar behaviour?
Thanks in advance for any tips.
I believe 8MB is the size at which the CLI (and SDK) will start performing multi-part upload operations. You probably need to enable notifications for the s3:ObjectCreated:CompleteMultipartUpload event on your S3 bucket.
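If that is the cause, a minimal boto3 sketch of updating the bucket's notification configuration might look like the following. The bucket name and queue ARN are placeholders, and note that this call replaces the bucket's existing notification configuration, so include any configurations you already rely on:

```python
import boto3

s3 = boto3.client("s3")

# Subscribe the queue to both single-part PUTs and completed multipart uploads
# (or use the "s3:ObjectCreated:*" wildcard to cover every creation path).
# The queue's access policy must also allow S3 to send messages to it.
s3.put_bucket_notification_configuration(
    Bucket="my-test-bucket",  # placeholder
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-test-queue",  # placeholder
                "Events": [
                    "s3:ObjectCreated:Put",
                    "s3:ObjectCreated:CompleteMultipartUpload",
                ],
            }
        ]
    },
)
```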
I make heavy use of Stackdriver sinks to BigQuery. It helps keep the data in a convenient, queryable form.
I am searching for the equivalent on AWS using CloudWatch, but it seems only S3 is integrated. Are there any workarounds, or should I code it myself?
This can be done with a workaround. It needs only one code component that reads from S3 and posts to Stackdriver; GCP sinks can pick it up from there.
Create a CloudWatch sink (export) to an S3 bucket.
Configure the S3 bucket to send SNS messages on object creation.
A process listens to those SNS notifications and copies the contents of the new objects into Stackdriver.
Since SNS and the CloudWatch-to-S3 export provide the delivery guarantees, a Lambda function triggered by the SNS notification messages can make the solution more seamless; a rough sketch of such a function follows.
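This is only a sketch, assuming the exported log objects are plain text and that GCP service-account credentials for the google-cloud-logging client are packaged with the function; the Stackdriver log name is hypothetical:

```python
import json
import boto3
from google.cloud import logging as gcp_logging  # needs GCP service-account credentials bundled with the function

s3 = boto3.client("s3")
gcp_logger = gcp_logging.Client().logger("cloudwatch-export")  # hypothetical Stackdriver log name

def handler(event, context):
    for record in event["Records"]:
        # The SNS message body is the S3 event notification, serialized as JSON.
        s3_event = json.loads(record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            # Forward each exported log line to Stackdriver; a sink can route it onwards.
            for line in body.splitlines():
                if line:
                    gcp_logger.log_text(line)
```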
I have a Node.js backend that sends images to a secondary API for transformations, and those images then appear in an S3 bucket. The problem is that the secondary API doesn't inform my API when a file is created in the bucket.
Is there some sort of long polling available for S3? Spamming GET requests doesn't feel right (and will also get expensive).
I'm considering adding a trigger on new files in S3 that invokes a Lambda, which puts a message into some sort of pub/sub message broker that I could then subscribe to, but this seems a bit too complicated?
From the S3 notification docs you can be notified via:
Amazon Simple Notification Service (Amazon SNS) topic
Amazon Simple Queue Service (Amazon SQS) queue
AWS Lambda
The relative benefits of each one are up to you, but don't poll S3 for changes; use one of these to be notified of them. You can choose to be notified about just new objects, or about deleted objects as well.
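As for the long-polling concern: once the bucket notifies an SQS queue, your consumer can long-poll that queue instead of polling S3. A rough sketch in Python/boto3 (the equivalent calls exist in the AWS SDK for JavaScript; the queue URL is a placeholder):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/image-events"  # placeholder

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: the call blocks up to 20 s waiting for messages
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        # S3 sends an initial test event without a "Records" key; skip it.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print("new object:", bucket, key)
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```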
I created a custom app that automatically uploads logs to s3.
Is there a way to push those logs to cloudwatch from s3 for analysis and alerting?
I'm aware that I can use a cloudwatch agent to push directly to cloudwatch from the app but there are complications involved in that option.
Thank you!
You could probably use CloudWatch Events to listen to S3 changes. I'm not sure whether you can get the data from the S3 file, or just a trigger saying that a new log has been added.
You could also use S3 event notifications (https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html) connected either to a Lambda or to SQS, and from there write the logs to CloudWatch (similar to what #marcin suggested).
A better solution, though a bit beyond the scope of the question, would be to send your logs through Kinesis Firehose and deliver them to CloudWatch and S3 from there.
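On the producing side, the Firehose route is just a put_record call; a minimal sketch, assuming a delivery stream named app-logs already exists with the destinations you want:

```python
import boto3

firehose = boto3.client("firehose")

# Push one log line into the delivery stream; Firehose handles buffering and delivery.
firehose.put_record(
    DeliveryStreamName="app-logs",  # hypothetical stream name
    Record={"Data": b"2024-01-01T00:00:00Z INFO something happened\n"},
)
```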
I'm not aware of any out-of-the-box mechanism for that provided by AWS. But I think it could be relatively easy to develop.
Namely, you can create an S3 notification for the PUT of a new log file from your app to S3. The event would trigger a Lambda function. The function would fetch the file and, using the AWS SDK (e.g. boto3's put_log_events), send the log events to CloudWatch Logs.
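A minimal sketch of such a function, assuming the log files are plain text, the log group already exists, and the function's role can read the bucket and write to CloudWatch Logs (the log group name is hypothetical):

```python
import time
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
logs = boto3.client("logs")

LOG_GROUP = "/myapp/s3-logs"  # hypothetical log group; must already exist

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # One log stream per uploaded file; create it if it doesn't exist yet.
        try:
            logs.create_log_stream(logGroupName=LOG_GROUP, logStreamName=key)
        except logs.exceptions.ResourceAlreadyExistsException:
            pass

        now_ms = int(time.time() * 1000)
        events = [{"timestamp": now_ms, "message": line}
                  for line in body.splitlines() if line]
        if events:
            # put_log_events accepts at most 10,000 events (~1 MB) per call;
            # a real implementation would batch larger files accordingly.
            logs.put_log_events(logGroupName=LOG_GROUP,
                                logStreamName=key,
                                logEvents=events)
```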
I have to read and process a file in an AWS Lambda function from an SFTP server that is not on AWS.
An external source puts the file on the SFTP server, which is not in AWS. Whenever a file has been uploaded completely, we want to detect that via AWS CloudWatch and then trigger an AWS Lambda function to process it.
Is this approach right? Is this even possible?
If it is possible, please suggest some steps. I checked AWS CloudWatch but was not able to find any trigger that watches files outside of AWS.
You need to create some sort of job that monitors your SFTP directory (e.g., using inotify) and then invokes your AWS Lambda function, using the access keys of an IAM user with programmatic access and sufficient permissions to invoke that function.
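The piece that invokes the Lambda from the SFTP host can be as small as the sketch below; the function name and region are assumptions, and you would hook notify_lambda into whatever inotify-based watcher you use:

```python
# Runs on the SFTP host. Assumes AWS credentials for an IAM user allowed to call
# lambda:InvokeFunction are configured (e.g. via ~/.aws/credentials).
import json

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")  # region is an assumption

def notify_lambda(path):
    """Invoke the processing Lambda asynchronously with the finished file's path."""
    lambda_client.invoke(
        FunctionName="process-sftp-file",   # hypothetical function name
        InvocationType="Event",             # async: don't block the watcher
        Payload=json.dumps({"path": path}).encode("utf-8"),
    )
```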
You could also create a scheduled AWS CloudWatch event (for example every 5 minutes) that triggers the Lambda function to check for new files, keeping a history of what has already been processed somewhere such as DynamoDB; a rough sketch of that variant follows. However, I would rather trigger the Lambda from the SFTP server itself. Detecting the upload on the AWS side works much better if you use AWS Transfer for SFTP instead of an on-premises SFTP server, because it uses S3 as the SFTP store, and S3 can emit an event on object upload and trigger a Lambda function.
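This is only a sketch of the scheduled-polling variant: it assumes paramiko is packaged with the function, the host, directory, and credentials shown are placeholders (use Secrets Manager in practice), and a DynamoDB table keyed on filename records what has already been processed:

```python
# Scheduled Lambda (e.g. every 5 minutes via a CloudWatch Events rule).
import boto3
import paramiko

TABLE = boto3.resource("dynamodb").Table("processed-sftp-files")  # hypothetical table

def handler(event, context):
    transport = paramiko.Transport(("sftp.example.com", 22))      # placeholder host
    transport.connect(username="user", password="password")       # placeholders; use Secrets Manager
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        for filename in sftp.listdir("/uploads"):                  # placeholder directory
            if TABLE.get_item(Key={"filename": filename}).get("Item"):
                continue  # already handled on a previous run
            process(sftp, "/uploads/" + filename)
            TABLE.put_item(Item={"filename": filename})
    finally:
        sftp.close()
        transport.close()

def process(sftp, path):
    with sftp.open(path) as fh:
        data = fh.read()
        # ... actual processing of the file contents goes here ...
```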
Can you modify the external source script?
If yes, you can send an SNS notification to a specific topic using the AWS CLI or a language-specific SDK.
Then you can have a Lambda, triggered by the SNS topic, to process your file.
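A minimal sketch of the publish call from the external source, using boto3 (the topic ARN, region, and message format are assumptions; the aws sns publish CLI command is equivalent):

```python
import boto3

sns = boto3.client("sns", region_name="us-east-1")  # region is an assumption

# Publish once the file has finished uploading to the SFTP server.
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:sftp-file-uploaded",  # placeholder
    Message='{"path": "/uploads/report.csv"}',                          # example payload
    Subject="File upload complete",
)
```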
I have a DynamoDB application and it seems to be running well, using normal throughput generally. However, once in a while it spikes pretty high (the latest kicked up over 300; normal is around 10-20 max). I've looked through the code and I'm having trouble figuring out what is causing these spikes. Is there any kind of history of calls in DynamoDB that could tell me exactly which calls caused the spiking?
You can enable CloudTrail logging for DynamoDB. It will deliver the log files to an S3 bucket. Taken directly from the AWS docs:
DynamoDB is integrated with CloudTrail, a service that captures low-level API requests made by or on behalf of DynamoDB in your AWS account and delivers the log files to an Amazon S3 bucket that you specify. CloudTrail captures calls made from the DynamoDB console or from the DynamoDB low-level API. Using the information collected by CloudTrail, you can determine what request was made to DynamoDB, the source IP address from which the request was made, who made the request, when it was made, and so on. To learn more about CloudTrail, including how to configure and enable it, see the AWS CloudTrail User Guide.
Please follow the AWS documentation on DynamoDB CloudTrail logging to enable it.
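Once the trail is delivering files, you can pull them from the bucket and filter for DynamoDB calls. A rough sketch, assuming the default CloudTrail delivery layout of gzipped JSON files with a top-level Records list (the bucket name and prefix are placeholders):

```python
import gzip
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-cloudtrail-bucket"              # placeholder
PREFIX = "AWSLogs/123456789012/CloudTrail/"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        records = json.loads(gzip.decompress(body))["Records"]
        for rec in records:
            # Keep only calls made to DynamoDB.
            if rec.get("eventSource") == "dynamodb.amazonaws.com":
                print(rec["eventTime"], rec["eventName"],
                      rec.get("sourceIPAddress"),
                      rec.get("userIdentity", {}).get("arn"))
```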