Amazon S3 Event Notification not triggering sometimes - amazon-web-services

We have an Amazon S3 bucket with event notifications set up for POST and multipart-upload-completed events. Initially we had them trigger a Lambda directly, but due to error-handling concerns we changed it to SQS to get the "backout" feature of SQS, which makes it easier to capture any message that errors.
The files are put into S3 from an SFTP server (an EC2 instance), and the events reach SQS in about 99.9% of cases, but every so often a file is missed...
We can easily spot this because the SQS queue in turn triggers a Lambda, and the first thing the Lambda does is rename the file with a ".processing" extension (sketched below); as soon as processing is completed, the file is moved to another bucket.
Now and again we find files with the original file name that never got the ".processing" extension, and there are no SQS messages or logs that show the Lambda picked them up. This happens maybe once in a thousand files...
Files are always transferred to the bucket the same way, but sometimes they arrive in large batches, and it seems to happen more frequently in large batches...
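For context, the rename step in the Lambda looks roughly like this (a minimal sketch; S3 has no native rename, so it is a copy plus delete, and the helper name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

def mark_processing(bucket: str, key: str) -> str:
    """Claim an incoming file by renaming it (copy + delete; S3 has no atomic rename)."""
    new_key = key + ".processing"
    s3.copy_object(Bucket=bucket, Key=new_key,
                   CopySource={"Bucket": bucket, "Key": key})
    s3.delete_object(Bucket=bucket, Key=key)
    return new_key
```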
What could be the reason some files are not triggering a notification?
Or what can I check to find out what could possibly be causing this?

Related

How to wait to trigger Lambda function until specific files are uploaded

I'm new to AWS and working on a Lambda function triggered by S3.
I have two JSON files and need to use the data from both to send a message to SQS.
For example, two files, "file_a.json" and "file_b.json", are uploaded to S3 at the same time.
file_a.json
{"data": "123"}
file_b.json
{"data": "456"}
Then, Lambda should send a queue message like ["123", "456"], built from the data in both files.
However, S3 invokes the Lambda function every time a single file is uploaded, and this makes it hard to send the message to SQS as described above. I'm thinking that if S3 could wait until both files are uploaded before triggering Lambda, that would solve the issue.
If anyone has faced a similar situation, I would like to know how to resolve this problem. Thanks!
S3 notifications are simple events in response to changes in S3, so you can't "make S3 wait" for both files; it's up to you to handle that in your Lambda.
You could store the message (or whatever data you need) somewhere such as S3, RDS, or Parameter Store, or, if you can deduce the other filename from the event, you can query S3 to check whether the other file exists yet and act accordingly.
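A minimal sketch of the second approach, assuming the paired filename can be derived from the uploaded key (the queue URL and the a/b pairing below are placeholders):

```python
import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Hypothetical pairing convention between the two files.
SIBLING = {"file_a.json": "file_b.json", "file_b.json": "file_a.json"}

def handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    other_key = SIBLING.get(key)
    if other_key is None:
        return  # not one of the paired files

    # Check whether the sibling file has arrived yet.
    try:
        s3.head_object(Bucket=bucket, Key=other_key)
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return  # sibling not uploaded yet; its own event will trigger us again
        raise

    # Both files exist: read them and send the combined message.
    values = []
    for k in (key, other_key):
        body = s3.get_object(Bucket=bucket, Key=k)["Body"].read()
        values.append(json.loads(body)["data"])
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(sorted(values)))
```

Note that if both uploads finish before either existence check runs, both invocations can see the other file and the combined message may be sent twice, so the consumer should tolerate duplicates.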

Lambda invocation on two SNS events at the same time

I have a use case where I need to read two files that are in a different account. I receive an SNS event with the filename, and I need to create an EMR cluster from the Lambda only if both files are available in the other account's S3 bucket.
Currently I am writing a dummy file to an S3 bucket in my account every time I receive an SNS event, and then creating the EMR cluster only after ensuring, on the second SNS event I receive, that the first file is available in my account's S3 bucket. This approach is working fine (see the sketch below).
But I am unable to solve the problem of what happens if we receive two files at the same time in the other S3 bucket and get two SNS events around the same time, since each invocation thinks the other file hasn't arrived yet.
How would I solve this problem?
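For reference, the dummy-file approach described above, as a rough sketch (the bucket, key layout, and helpers are placeholders; it still has the race from the question, since two concurrent invocations can each miss the other's marker):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
MARKER_BUCKET = "my-account-markers"  # placeholder bucket in our own account

def other_filename(name: str) -> str:
    # Hypothetical: derive the paired filename from a naming scheme.
    return name.replace("_a", "_b") if "_a" in name else name.replace("_b", "_a")

def create_emr_cluster() -> None:
    print("both files present; starting EMR cluster")  # stand-in for emr.run_job_flow

def handler(event, context):
    filename = event["Records"][0]["Sns"]["Message"]  # assumed message format

    # Record that this file has arrived.
    s3.put_object(Bucket=MARKER_BUCKET, Key=f"markers/{filename}")

    # Create the cluster only if the other file's marker already exists.
    try:
        s3.head_object(Bucket=MARKER_BUCKET, Key=f"markers/{other_filename(filename)}")
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return  # first of the pair; wait for the second event
        raise
    create_emr_cluster()
```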

Limit number of S3 uploads / SNS notifications

We have an external service that uploads files to our S3 bucket in account A. We raise SNS notifications on each upload. A lambda function in account B subscribes to these notifications.
This works well for us, except that if the external service misses a configuration step, it uploads more than 500 files at once (in a single directory), and our Lambda is triggered 500 times when that happens.
1. Is there a way to limit the number of files uploaded to a bucket within X minutes?
2. Is there a way to stop the lambda from getting invoked if it sees >500 SNS notifications together?
I am aware that placing an SQS queue between SNS and the Lambda would probably solve our problem. I want to know if there is another, easier, more convenient way to solve this.
I explored limiting the Lambda's concurrency so that it fails on throttling; however, SNS notifications are retried three times (which is also a good thing, and we don't want to lose that feature in the case of other errors), so we do not want to do that.
Note that instant processing is not a hard requirement for us. We can wait for around 5 minutes to process the SNS notification.
No, it is not possible to limit uploads to Amazon S3 within a given time period.
Nor is it possible to stop Lambda being invoked if it sees more than a given quantity of Amazon SNS notifications.
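For reference, the SQS-based setup mentioned in the question, as a rough sketch (the ARNs, URLs, and function name are placeholders): subscribing a queue to the topic and consuming it in batches turns 500 notifications into at most ~50 invocations, and the 5-minute latency budget from the question matches the maximum batching window.

```python
import boto3

sns = boto3.client("sns")
lam = boto3.client("lambda")

# Placeholder identifiers; the queue's access policy must also
# allow the SNS topic to send messages to it.
TOPIC_ARN = "arn:aws:sns:us-east-1:111111111111:upload-topic"
QUEUE_ARN = "arn:aws:sqs:us-east-1:222222222222:upload-queue"

# Subscribe the queue to the topic; raw delivery keeps the S3 event payload as-is.
sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol="sqs",
    Endpoint=QUEUE_ARN,
    Attributes={"RawMessageDelivery": "true"},
)

# Drive the Lambda from the queue in batches of up to 10 messages,
# waiting up to 300 s to fill a batch.
lam.create_event_source_mapping(
    EventSourceArn=QUEUE_ARN,
    FunctionName="process-uploads",
    BatchSize=10,
    MaximumBatchingWindowInSeconds=300,
)
```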

Missing s3 events in AWS SQS

I have an AWS Lambda function that is supposed to be triggered by messages from Simple Queue Service (SQS). The SQS queue is supposed to get a notification when a new JSON file is written to my S3 bucket, or when an existing JSON file in the bucket is overwritten. The event type for both cases is s3:ObjectCreated, and I see notifications for both cases in my SQS queue.
Now, the problem is that pretty frequently there is a new file in S3 (or an updated existing file), but there is no corresponding message in SQS! Many files are missed, and Lambda is not aware that they should be processed. In the Lambda I print the whole content of the received SQS payload into the log, and then try to find the missed files with something like aws --profile aaa logs filter-log-events --log-group-name /aws/lambda/name --start-time 1554357600000 --end-time 1554396561982 --filter-pattern "missing_file_name_pattern" but can't find anything, which means that the s3:ObjectCreated event was not generated for the missing file. Are there conditions that prevent s3:ObjectCreated events for new/updated S3 files? Is there a way to fix it, or a workaround of some kind, maybe?
According to AWS Documentation:
If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent. If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket. With versioning, every successful write will create a new version of your object and will also send an event notification.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
Also, why not trigger the Lambda directly from S3?
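Enabling versioning, for reference (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning so every successful write emits its own
# s3:ObjectCreated event, even for concurrent overwrites of one key.
s3.put_bucket_versioning(
    Bucket="my-bucket",  # placeholder
    VersioningConfiguration={"Status": "Enabled"},
)
```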
Two possibilities:
Some events may be delayed or not sent at all: "Amazon S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer. On very rare occasions, events might be lost.", although it is very rare.
You have made a mistake somewhere: either the Lambda is not printing what you expect when processing the message, or you are not searching the logs correctly.
You should also check on the SQS side that all the records were ingested and processed successfully.
Make sure that you have all of the object-created event types selected as triggers.
I had an issue where files larger than 8 MB were being uploaded as multipart uploads, which appear as a separate trigger from the PUT trigger.
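For reference, a notification configuration using the s3:ObjectCreated:* wildcard covers PUT, POST, COPY, and CompleteMultipartUpload in one rule (bucket and queue ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# The wildcard catches all object-created variants, including
# multipart-upload completion for large files.
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",  # placeholder
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-queue",  # placeholder
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```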

Poll periodically for new files in AWS S3 buckets having a lot of files?

I have a situation where I need to poll an AWS S3 bucket for new files.
Also, it's not just one bucket: there are 1000+ buckets, and these buckets could have a lot of files.
What are the usual strategies / designs for such a use case? I need to consume the new files on each poll. I cannot delete files from the bucket.
Instead of polling, you should subscribe to S3 event notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
These can be delivered to an SNS topic, an SQS queue, or trigger a Lambda function.
Well, in order to best answer that question we would need to know what kind of application / architecture is doing the polling and consuming, but the "AWS" way to do it is to have S3 send out a notification upon the creation of each file. The notification contains a reference to the S3 object and can go out to SNS, SQS, or, even better, Lambda, which can then trigger the application to spin up, consume the files, and shut down.
Now, if you're going to have a LOT of files, all of those SNS/SQS notifications could get costly, and some might then start looking at continuously polling S3 with the S3 SDK/CLI. However, keep in mind that there are costs associated with polling as well, and you should look at ways to decrease the number of files: for example, if you're using Kinesis Firehose to dump into S3, look at batching. Or you can batch the SQS consumption (see the sketch below). Try your best to stick with the event notifications; they're much more resilient.
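A rough sketch of batched SQS consumption, assuming a single queue collects the notifications (the queue URL and consumer are placeholders): each long poll pulls up to ten events per request instead of one.

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-files"  # placeholder

def process(body: str) -> None:
    print(body)  # stand-in for the real consumer of the S3 event

while True:
    # Long-poll for up to 10 notifications at a time instead of one by one.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```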