Rate Limit on S3 bucket - amazon-web-services

I have exposed my S3 bucket to a 3rd-party client, which uploads files into it. I want to put a rate limit on the bucket so that, in case of an anomaly on their end, I don't receive a flood of upload requests. The bucket feeds an SQS queue and DynamoDB, so a flood would lead to throttling in the queue and at the DB as well, and I would also be charged heavily. How do I prevent this?

It is not possible to configure a rate limit for Amazon S3. However, in some situations Amazon S3 itself imposes limits, returning 503 Slow Down errors when the request rate on a prefix climbs very quickly.
A way to handle this would be to process all uploads through API Gateway and your back-end service. However, this might lead to more overhead and costs than you are trying to save.
You could configure an AWS Lambda function to be triggered when a new object is created, then store information in a database to track the upload rate, but this again would involve more complexity and (a little) expense.
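A minimal sketch of that Lambda-plus-database idea, assuming a DynamoDB table with a partition key named window, an SNS topic for alerts, and a per-minute threshold; all of those names and values are placeholders:

```python
# Hypothetical sketch: Lambda triggered by s3:ObjectCreated:* events.
# It increments a per-minute counter in DynamoDB and publishes an SNS alert
# once the upload rate crosses a threshold. Table/topic names are placeholders.
import os
import time

import boto3

dynamodb = boto3.client("dynamodb")
sns = boto3.client("sns")

TABLE_NAME = os.environ.get("TRACKER_TABLE", "upload-rate-tracker")
ALERT_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]
THRESHOLD_PER_MINUTE = int(os.environ.get("THRESHOLD_PER_MINUTE", "1000"))


def handler(event, context):
    # One counter item per minute window.
    window = str(int(time.time() // 60))
    new_uploads = len(event.get("Records", []))

    response = dynamodb.update_item(
        TableName=TABLE_NAME,
        Key={"window": {"S": window}},
        UpdateExpression="ADD upload_count :n",
        ExpressionAttributeValues={":n": {"N": str(new_uploads)}},
        ReturnValues="UPDATED_NEW",
    )
    count = int(response["Attributes"]["upload_count"]["N"])

    if count > THRESHOLD_PER_MINUTE:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject="S3 upload rate anomaly",
            Message=f"{count} uploads this minute exceeds {THRESHOLD_PER_MINUTE}",
        )
```

This only detects the anomaly; acting on it (for example revoking the third party's credentials or tightening the bucket policy) would still be up to you.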

Related

aws sqs to s3 using lambda

Our upstream system is sending JSON messages to our SQS queue; we will have around 5 million messages per day.
I need to persist these messages to an S3 bucket for archiving and analytics purposes. I need to dequeue the messages and write them to S3 in files of roughly 100K messages each, using a Lambda function. We will have multiple small files created in the S3 bucket to facilitate quick processing. The Lambda would be triggered a few times a day. Any sample code for the Lambda function that I can use, or any pointers, would be appreciated.
Processing millions of objects in Amazon S3 is not advisable.
Software or services that attempt to use these objects will be very slow. For example, simply listing the contents of an Amazon S3 bucket can only return 1000 objects per API call. Even services such as Amazon Athena that process multiple files in parallel will be very slow in listing and reading that many objects.
An alternative approach would be to send the messages to an Amazon Kinesis Data Firehose, which can combine multiple messages together based on size or elapsed time. It can then store files that combine multiple messages in one, thereby reducing the number of objects created in the S3 bucket.
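If the queue is wired to a Lambda trigger, the function can simply forward each batch of messages to the Firehose delivery stream and let Firehose do the batching into S3 objects. A rough sketch, assuming a delivery stream named archive-to-s3 already exists and writes to the bucket:

```python
# Sketch: Lambda with an SQS trigger that forwards messages to Kinesis Data
# Firehose, which buffers them and writes combined objects to S3.
import os

import boto3

firehose = boto3.client("firehose")
STREAM_NAME = os.environ.get("DELIVERY_STREAM", "archive-to-s3")


def handler(event, context):
    # Firehose accepts up to 500 records per PutRecordBatch call; SQS event
    # batches are well under that by default.
    records = [
        {"Data": (record["body"] + "\n").encode("utf-8")}
        for record in event["Records"]
    ]
    if not records:
        return

    response = firehose.put_record_batch(
        DeliveryStreamName=STREAM_NAME,
        Records=records,
    )
    if response.get("FailedPutCount", 0) > 0:
        # Raising returns the batch to the queue so it is retried.
        raise RuntimeError(f"{response['FailedPutCount']} records failed")
```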
If you are dealing with 100K+ objects in Amazon S3, also consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.

AWS S3 Bucket Notifications when object changes storage class?

I'm looking for a way to be notified when an object in s3 changes storage class. I thought there would be a bucket event notification for this but I don't see it as an option. How can I know when an object moves from STANDARD to GLACIER? We have systems that depend on objects not being in GLACIER. If they change to GLACIER, we need to be made aware and handle them accordingly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types
You can use S3 server access logs to capture lifecycle-related changes, but I think that's about it:
Amazon S3 server access logs can be enabled in an S3 bucket to capture S3 Lifecycle-related actions such as object transition to another storage class
Taken from the AWS docs on lifecycle and other bucket configuration.
You could certainly roll your own notifications for storage class transitions, though it might be a bit more involved than you are hoping for. You need a separate bucket to receive your access logs. Set up an S3 notification for object creation in that logs bucket to trigger a Lambda function that processes each new log file. In the Lambda function, use Athena to query the logs and fire off an SNS alert, or perform some corrective action in code (a rough sketch follows below).
There are some limitations to be aware of, though: server access logging is delivered on a best-effort basis, which means you might not see the log records for a few hours.
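If you do go down the access-logs route, a rough sketch of the log-processing Lambda could look like the following. Instead of running an Athena query, it simply parses each new log file directly; the match on "TRANSITION" in the operation field and the SNS topic are assumptions to verify against the server access log format documentation.

```python
# Rough sketch: Lambda triggered by new objects in the access-logs bucket.
# It scans each log file for lifecycle transition operations and alerts via SNS.
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
ALERT_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assumed: lifecycle transition operations contain "TRANSITION" in the
        # operation field of the log line. Check the exact codes in the docs.
        transitions = [line for line in body.splitlines() if "TRANSITION" in line]

        if transitions:
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="S3 lifecycle transitions detected",
                Message="\n".join(transitions[:20]),  # first few matching lines
            )
```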
Updated 28/5/21
If logging is enabled, you should see the various lifecycle operations logged as they happen, give or take a few hours. If not, are you definitely meeting the minimum criteria for transitioning objects to Glacier? (e.g. it takes 30 days to transition from STANDARD to GLACIER).
As for:
The log record for a particular request might be delivered long after the request was actually processed, or it might not be delivered at all.
Consider S3's consistency model and the SLA on data durability: there is a small possibility of data loss for any object in S3. I think the risk of losing log records is relatively low, but it could happen.
You could also go for a more active approach: use the S3 API from a Lambda function triggered by CloudWatch Events (cron-like scheduling) to scan all the objects in the bucket and do something accordingly (send an email, take corrective action, etc.). Bear in mind this might get expensive depending on how often you run the Lambda and how many objects are in your bucket, but low volumes might even fit in the free tier depending on your usage.
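A minimal sketch of that scheduled-scan approach, assuming the bucket and SNS topic names are supplied as environment variables:

```python
# Sketch: a Lambda run on an EventBridge/CloudWatch Events schedule that lists
# the bucket and reports any objects that are no longer in STANDARD.
import os

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

BUCKET = os.environ["BUCKET_NAME"]
ALERT_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def handler(event, context):
    moved = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # ListObjectsV2 reports each object's storage class.
            if obj.get("StorageClass", "STANDARD") != "STANDARD":
                moved.append(f"{obj['Key']}: {obj['StorageClass']}")

    if moved:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"{len(moved)} objects no longer in STANDARD",
            Message="\n".join(moved[:100]),
        )
```

Note the cost caveat above: each LIST call covers at most 1,000 objects, so a large bucket means many requests per run.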
As of November 2021, you can now do this via Amazon EventBridge.
Simply create a new EventBridge rule for the S3 bucket that handles the Object Storage Class Changed event.
See https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
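For reference, here is a hedged sketch of creating such a rule with boto3 rather than the console. The rule name, bucket name, and SNS target ARN are placeholders; the bucket must also have EventBridge notifications enabled (a bucket-level setting), and the target needs a resource policy that allows EventBridge to publish to it.

```python
# Illustrative sketch: an EventBridge rule that matches the S3
# "Object Storage Class Changed" event for one bucket and sends it to SNS.
# All names and ARNs below are placeholders.
import json

import boto3

events = boto3.client("events")

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Storage Class Changed"],
    "detail": {"bucket": {"name": ["my-bucket"]}},
}

events.put_rule(
    Name="s3-storage-class-changed",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="s3-storage-class-changed",
    Targets=[{
        "Id": "notify",
        "Arn": "arn:aws:sns:us-east-1:123456789012:storage-class-alerts",
    }],
)
```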

How can I avoid AWS S3 throttling issues?

All my work that accesses AWS S3 is in the region us-east-2, and the AZ is us-east-2a.
But I saw some throttling errors from S3, so I am wondering: if I move some of my work to another AZ, like us-east-2b, could it mitigate the problem? (Or will it not help, since us-east-2a and us-east-2b actually point to the same endpoint?)
Thank you.
The throttling is not per AZ; it applies per prefix within a bucket. The quote below is from the AWS documentation.
You can send 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix in an S3 bucket. When you have an increased request rate to your bucket, Amazon S3 might return 503 Slow Down errors while it scales to support the request rate. This scaling process is called partitioning.
To avoid or minimize 503 Slow Down responses, verify that the number of unique prefixes in your bucket supports your required transactions per second (TPS). This helps your bucket leverage the scaling and partitioning capabilities of Amazon S3. Additionally, be sure that the objects and the requests for those objects are distributed evenly across the unique prefixes. For more information, see Best Practices Design Patterns: Optimizing Amazon S3 Performance.
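As a small illustration of the prefix guidance above, one common pattern is to prepend a short hash of each key so that requests spread across many prefixes. The prefix length and layout here are illustrative only:

```python
# Spread object keys across many prefixes by prepending a short hash.
# The 4-character prefix length is arbitrary; tune it to your request rate.
import hashlib


def partitioned_key(original_key: str, prefix_chars: int = 4) -> str:
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_chars]}/{original_key}"


print(partitioned_key("2024/06/01/report-0001.json"))
# e.g. "<4 hex chars>/2024/06/01/report-0001.json"
```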
If possible, enable exponential backoff and retries in your S3 client. If the application uploading to S3 is performance-sensitive, the suggestion would be to hand off to a background process that can upload to S3 at a later time.
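For the client-side part, a small sketch of exponential backoff with boto3: the adaptive retry mode retries 503 Slow Down responses with backoff and adds client-side rate limiting. The bucket and key below are placeholders.

```python
# Configure boto3 to retry throttled S3 requests with backoff.
import boto3
from botocore.config import Config

retry_config = Config(
    retries={
        "max_attempts": 10,  # total attempts, including the first
        "mode": "adaptive",  # backoff plus client-side rate limiting
    }
)

s3 = boto3.client("s3", config=retry_config)
s3.put_object(Bucket="my-bucket", Key="uploads/example.json", Body=b"{}")
```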

How to save data from a Lambda function into S3 when there is too much incoming data per millisecond?

I have a process that publishes data to AWS IoT Core, which triggers a Lambda function that inserts the payload into an Amazon S3 bucket.
The process sends around 1.2 million records within a few seconds, and when I check the bucket I see that I have lost around 10% of the data. If I add a sleep to the Lambda function, it runs beyond the 15-minute limit.
What is the solution for this scenario?
It appears that your requirement is to capture the events coming into IoT-Core and save them to Amazon S3.
It also sounds like your Lambda functions are being throttled because they hit the concurrency limit, and data is being lost. By default, there is a limit of 1,000 concurrent AWS Lambda executions per region. This could potentially be fixed by requesting an increase in the concurrency limit.
There is a diagram in How AWS IoT works that shows the message flow.
As shown in that diagram, the Rules engine can be used to send data to Amazon S3 without requiring Lambda. However, this creates a separate object in Amazon S3 for every message.
If you wish to combine messages together, you can Write to Kinesis Data Firehose Using AWS IoT. Firehose will buffer the data by time or size, and then write multiple messages into a single Amazon S3 object. This could be a good way to handle large volumes of data, and it also makes the resulting objects easier to work with because fewer objects are created. That makes them faster to query and process later (e.g. with Amazon Athena).
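For completeness, a sketch of creating such an IoT topic rule with a Firehose action via boto3; the rule name, topic filter, role ARN, and delivery stream name are all placeholders.

```python
# Illustrative: an IoT topic rule that sends matching MQTT messages straight
# to a Kinesis Data Firehose delivery stream (no Lambda in the path).
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="forward_to_firehose",
    topicRulePayload={
        "sql": "SELECT * FROM 'sensors/+/data'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "firehose": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-to-firehose",
                "deliveryStreamName": "iot-archive-to-s3",
                "separator": "\n",
            }
        }],
    },
)
```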
Going from an IoT Core rule directly to a Lambda can be fragile.
You can use Kinesis to buffer the data, or Firehose to stream it directly to S3. These are standard patterns that AWS recommends for IoT in the AWS Well-Architected Framework (https://d1.awsstatic.com/whitepapers/architecture/AWS-IoT-Lens.pdf).

S3 and large files backup

I have large files (database dumps) and I am looking for a place where I can store them. Is AWS S3 good for backups? I have already exceeded all the limits.
I have a few questions:
I am using the API and the CLI. Which option is cheaper for sending files: "aws s3api put-object" or "aws s3 cp"?
"2,000 Put, Copy, Post or List Requests of Amazon S3". How is consumption calculated? In HTTP requests or bytes? Ac Currently, Currently, I have level of consumption for 20 files per day: 2,000.00/2,000 Requests.
Are there any paid plans?
Everything you need to know is at the Request Pricing section of the S3 Pricing page.
Amazon S3 request costs are based on the request type, and are charged on the quantity of requests or the volume of data retrieved as listed in the table below. When you use the Amazon S3 console to browse your storage, you incur charges for GET, LIST, and other requests that are made to facilitate browsing. Charges are accrued at the same rate as requests that are made using the API/SDK.
Specific pricing is available at that page (not included here because it will change over time).