I have a development where I can send files to my folder in S3 one by one, but I would like to know if this can be done in batch, since I would like to send many files in a single transaction so as not to exceed the Salesforce governor limits.
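For context, a rough sketch of the one-by-one approach (shown here with boto3 purely for illustration; bucket and folder names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # One PutObject request per file -- this is the per-file pattern I would like to batch.
    for path in ["file1.csv", "file2.csv", "file3.csv"]:
        s3.upload_file(path, "my-bucket", "my-folder/" + path)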
I have a project with the following workflow:
Pull files from server and upload to S3
When files hit S3, a message is sent to a topic using SNS
The Lambda function subscribed to that topic then processes the files by doing calculations
So far, I have not experienced any issues, but I wonder if this is a use case for SQS?
How many messages can my Lambda function handle all at once? If I am pulling hundreds of files, is SQS a necessity?
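For reference, a minimal sketch of what an SNS-subscribed Lambda handler for this workflow could look like (all names are illustrative; the S3 event arrives embedded in the SNS message body):

    import json
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        for record in event["Records"]:
            # The SNS message body is the JSON S3 notification.
            s3_event = json.loads(record["Sns"]["Message"])
            for s3_record in s3_event["Records"]:
                bucket = s3_record["s3"]["bucket"]["name"]
                key = s3_record["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                # ... run the calculations on `body` here ...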
By default, the parallel invocation limit is set to 1000.
You can change that limit, but I have never hit that number so far.
As soon as a Lambda is done consuming its current request, it will be reused for another, so if you upload 1000 files you will probably only need about 100 Lambdas, unless one Lambda needs minutes to run.
AWS handles the queued triggers, so even if you upload 100,000 files, they will be consumed as soon as possible, depending on various criteria.
You can test it by creating many little files and uploading them all at once :)
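For example, such a test could look roughly like this (bucket name is a placeholder):

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client("s3")
    BUCKET = "my-test-bucket"  # placeholder

    def put_test_file(i):
        # Tiny object, just enough to fire the notification and invoke the Lambda.
        s3.put_object(Bucket=BUCKET, Key=f"test/file-{i}.txt", Body=b"hello")

    # Push 1000 little files "at once" and watch the Lambda concurrency scale up.
    with ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(put_test_file, range(1000)))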
For higher speed, upload them to a different bucket and simply move them from bucket to bucket (speed is higher this way).
Good luck!
I am facing a problem while uploading one or more files (i.e. images/videos) to an AWS S3 bucket using the aws_s3_client plugin.
1. It takes a long time to upload a 10 MB file.
2. Not able to track the upload progress percentage.
3. No option to upload multiple files at once (to the same bucket).
4. Every time we upload, we have to verify the IAM user access. (Why can't we use a single instance to verify once and keep the connection persistent/keep-alive until the application is closed?)
I am not very familiar with AWS services, so please suggest the best way to upload a file or multiple files to an AWS S3 bucket: faster, with an upload progress percentage, with multiple files uploaded at once, and with a persistent/keep-alive connection for verification.
For 1 and 2, use managed uploads; they provide an event to track upload progress and make uploads faster by using multipart upload. Be aware that multipart uploads only apply to files between 5 MB and 5 TB in size (5 MB is the minimum part size).
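The question is about the Flutter plugin, but as an illustration of the same idea (multipart plus a progress callback), here is a rough sketch with boto3's transfer manager; names and thresholds are placeholders:

    import os
    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")
    path = "video.mp4"                    # placeholder file
    size = os.path.getsize(path)
    uploaded = 0

    def on_progress(bytes_transferred):
        # The transfer manager calls this repeatedly with incremental byte counts.
        global uploaded
        uploaded += bytes_transferred
        print(f"{uploaded / size * 100:.1f}% uploaded")

    config = TransferConfig(multipart_threshold=5 * 1024 * 1024,  # switch to multipart above 5 MB
                            max_concurrency=8)                    # upload parts in parallel

    s3.upload_file(path, "my-bucket", "videos/video.mp4",
                   Config=config, Callback=on_progress)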
For 3, AWS S3 does not allow two objects with the same name or key in the same bucket. Depending on your requirements, you can turn on versioning in your bucket, and that will save different versions of the same file.
For 4, you can generate and use pre-signed URLs. Pre-signed URLs have configurable timeouts that you can adjust depending on how long you want the link to be available for an upload.
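A rough sketch of that for a single object key (names are placeholders); the client then uploads with plain HTTP and never needs AWS credentials:

    import boto3
    import urllib.request

    s3 = boto3.client("s3")

    # Generate a presigned PUT URL server-side; valid for one hour.
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-bucket", "Key": "uploads/image.jpg"},
        ExpiresIn=3600,
    )

    # The client (e.g. the app) uploads the file directly against that URL.
    with open("image.jpg", "rb") as f:
        req = urllib.request.Request(url, data=f.read(), method="PUT")
        urllib.request.urlopen(req)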
Use multipart upload. Multipart upload will upload files to S3 quickly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
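Under the hood, the multipart API from the linked guide boils down to three calls; a minimal sketch with boto3 (bucket, key, and part size are placeholders; every part except the last must be at least 5 MB):

    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY = "my-bucket", "big-file.bin"     # placeholders
    PART_SIZE = 8 * 1024 * 1024                   # >= 5 MB for every part except the last

    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    parts = []
    with open("big-file.bin", "rb") as f:
        part_number = 1
        while True:
            chunk = f.read(PART_SIZE)
            if not chunk:
                break
            resp = s3.upload_part(Bucket=BUCKET, Key=KEY, PartNumber=part_number,
                                  UploadId=mpu["UploadId"], Body=chunk)
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1

    s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})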
How can I transfer S3 files to an external server via a POST request without having to download the files locally? The REST API only allows transfers within AWS.
You need some compute for that. The easiest thing would be to do it with Lambda.
There are several ways of doing this, and they are use-case specific. Due to the lack of details in the question (are the files small, do you want to POST them automatically when new files are added to S3, how many files at once), it's difficult to provide a precise answer. Nevertheless, all of the possibilities would involve you writing some code to handle the transfer to the third party.
Some possibilities are:
For small files (a few megabytes, I would say), you could write a Lambda function for that. For pre-existing files, you could create an S3 Inventory report, which would trigger a Lambda function when done. The Lambda would take the list of files from the inventory, identify the files to upload to the third party, and perform the upload. For new files only, you could set up an S3 notification for new objects. The notification would trigger a Lambda function whenever a new file is uploaded to S3, and it would POST it to the third party (see the sketch after this list).
For large files, you would have to use containers or dedicated EC2 instances. They could periodically scan your bucket for new files, download them, and upload them to the third party.
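As a rough sketch of the small-files, new-objects-only variant: a Lambda triggered by an S3 notification reads the object and POSTs its bytes to the external server. The endpoint and headers below are made up; the real request depends on what the third party expects.

    import boto3
    import urllib.request

    s3 = boto3.client("s3")
    ENDPOINT = "https://example.com/upload"   # hypothetical third-party endpoint

    def handler(event, context):
        # Invoked by an S3 ObjectCreated notification.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

            # POST the bytes straight through; nothing is written to local disk.
            req = urllib.request.Request(
                ENDPOINT, data=body, method="POST",
                headers={"Content-Type": "application/octet-stream", "X-File-Name": key},
            )
            urllib.request.urlopen(req)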
I have a task where, on a scheduled basis, I need to check the number of files in a bucket (files are uploaded via a NAS) and then e-mail the total number using SES.
The e-mail part on its own is working fine. However, since I have over 40,000 files in the bucket, it takes 5 minutes or more to return the total file count.
From a design perspective, is it better to put this part of the logic on an EC2 machine and schedule the action there? Or are there better ways to do this?
Note, I don't have to list all the files. I simply want to get a total count of all the files in the bucket.
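For reference, counting by listing pages through every key (1000 per request), roughly like this with boto3, which is presumably why it takes minutes at 40,000+ objects:

    import boto3

    s3 = boto3.client("s3")

    # Every key is paged through just to produce a number.
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket"):   # placeholder bucket
        count += page.get("KeyCount", 0)

    print(count)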
How about having a Lambda triggered every time a file is put/deleted/etc.,
and, according to the event received, the Lambda updates a DynamoDB table that stores the count.
e.g.
If a file is added to S3, the Lambda will increase the count in the DynamoDB table by 1,
and in case of a file delete, the Lambda will decrease the count.
This way, I guess, you will always have the latest count without ever having to count the files.
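A minimal sketch of such a counter Lambda, assuming S3 notifications for both ObjectCreated and ObjectRemoved are wired to it (the table and attribute names are made up):

    import boto3

    table = boto3.resource("dynamodb").Table("s3-file-count")   # hypothetical table

    def handler(event, context):
        for record in event["Records"]:
            # +1 for creates, -1 for deletes, based on the S3 event name.
            delta = 1 if record["eventName"].startswith("ObjectCreated") else -1
            table.update_item(
                Key={"bucket": record["s3"]["bucket"]["name"]},   # assumes "bucket" is the partition key
                UpdateExpression="ADD file_count :d",
                ExpressionAttributeValues={":d": delta},
            )

The scheduled e-mail job then only has to read one DynamoDB item instead of listing the bucket.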
You did not mention how often you need to do this file count.
If it is daily or less often, you can activate Amazon S3 Inventory. It can provide a daily dump of all files in a bucket, from which you could perform a count.
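If the inventory is delivered in CSV format, the count can be taken straight from the daily report; a rough sketch (the manifest key and report layout depend on how the inventory is configured):

    import gzip
    import io
    import json
    import boto3

    s3 = boto3.client("s3")
    INV_BUCKET = "my-inventory-bucket"                                           # placeholder destination bucket
    MANIFEST_KEY = "my-bucket/daily-inventory/2024-01-01T00-00Z/manifest.json"   # placeholder

    manifest = json.loads(s3.get_object(Bucket=INV_BUCKET, Key=MANIFEST_KEY)["Body"].read())

    # Each entry in "files" is a gzipped CSV with one row per object in the source bucket.
    count = 0
    for data_file in manifest["files"]:
        body = s3.get_object(Bucket=INV_BUCKET, Key=data_file["key"])["Body"].read()
        with gzip.open(io.BytesIO(body), mode="rt") as f:
            count += sum(1 for _ in f)

    print(count)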
Trying to sync a large (millions of files) S3 bucket from the cloud to local storage seems to be a troublesome process for most S3 tools, as virtually everything I've seen so far uses the GET Bucket (list) operation, patiently getting the whole list of files in the bucket, then diffing it against the list of local files, then performing the actual file transfer.
This looks extremely suboptimal. For example, if one could list only the files in a bucket that were created or changed since a given date, this could be done quickly, as the list of files to be transferred would include just a handful, not millions.
However, given that the answer to this question is still true, it's not possible to do so with the S3 API.
Are there any other approaches to do periodic incremental backups of a given large S3 bucket?
On AWS S3 you can configure event notifications (e.g. s3:ObjectCreated:*) to request a notification when an object is created. These support SNS, SQS, and Lambda as destinations, so you can have an application that listens for the events and updates the statistics. You may also want to add a timestamp as part of the statistics. Then just "query" the result for a certain period of time and you will get your delta.
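A rough sketch of that idea: a Lambda on s3:ObjectCreated:* records each new key with the event's timestamp in DynamoDB, and the periodic backup job queries only the keys added since its last run. The table name and key schema (partition key "bucket", sort key "created_at") are assumptions for the example.

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("s3-changes")   # hypothetical table

    def handler(event, context):
        # Invoked by s3:ObjectCreated:* notifications.
        for record in event["Records"]:
            table.put_item(Item={
                "bucket": record["s3"]["bucket"]["name"],
                "created_at": record["eventTime"],            # ISO 8601 timestamp from the event
                "key": record["s3"]["object"]["key"],
            })

    def keys_added_since(bucket, since_iso):
        # The incremental backup only asks for keys newer than the last run.
        resp = table.query(
            KeyConditionExpression=Key("bucket").eq(bucket) & Key("created_at").gt(since_iso)
        )
        return [item["key"] for item in resp["Items"]]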