In this multipart upload example, one needs to save the upload ID and a set of etags corresponding to each uploaded part until the upload is "closed." If I lose my upload ID, I guess I can recover it by looking through open multipart uploads with ListMultipartUploads, but what if I lose an etag? Can those be recovered somehow, or must I abort the whole transfer and start over?
Once you have retrieved the upload ID from ListMultipartUploads, you can then use ListParts to get the list of parts (and their etags) that have been completed for this upload. You can use this information to then restart your upload from the last completed part.
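As a minimal sketch of that recovery path (assuming boto3; the bucket name, key, and file below are hypothetical), you can re-discover the upload ID and the already-uploaded parts like this:

```python
# A minimal sketch, assuming boto3; bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

bucket = "my-bucket"        # hypothetical bucket name
key = "large-file.bin"      # hypothetical object key

# If the upload ID was lost, find it among the in-progress uploads for this key.
uploads = s3.list_multipart_uploads(Bucket=bucket, Prefix=key)
upload_id = uploads["Uploads"][0]["UploadId"]

# Recover the parts (and their ETags) that were already uploaded.
# (Pagination is omitted here; ListParts returns up to 1000 parts per call.)
parts = s3.list_parts(Bucket=bucket, Key=key, UploadId=upload_id)
completed = [
    {"PartNumber": p["PartNumber"], "ETag": p["ETag"]}
    for p in parts.get("Parts", [])
]

# Resume by uploading the remaining parts, then finish with the full part list:
# s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
#                              MultipartUpload={"Parts": completed})
```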
Multipart Upload API and Permissions
Example of resuming multipart uploads using AWS SDK for iOS
I have a 121MB MP3 file I am trying to upload to my AWS S3 so I can process it via Amazon Transcribe.
The MP3 file comes from an MP4 file I stripped the audio from using FFmpeg.
When I try to upload the MP3, using the S3 object upload UI in the AWS console, I receive the below error:
InvalidPart
One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
The error refers to the MP3 as a multipart upload with a missing part, but it is not a multipart file.
I have re-run the MP4 file through FFmpeg 3 times in case the 1st file was corrupt, but that has not fixed anything.
I have searched Stack Overflow extensively and have not found a similar case where anyone uploading a single 5 MB+ file received the error I am getting.
I've also ruled out FFmpeg as the issue by saving the audio as an MP3 with VLC, but I receive the exact same error.
What is the issue?
121 MB is below the 160 GB single-object upload limit of the S3 console, the 5 GB single-object (PUT) limit of the REST API / AWS SDKs, and the 5 TB limit for multipart uploads, so I really can't see the issue.
Assuming the file exists and you have a stable internet connection (i.e., no corrupted uploads), you may have incomplete multipart upload parts in your bucket that are conflicting with the upload. Either follow this guide to remove them and try again, or create a new folder/bucket and re-upload.
You may also have a browser caching issue or extension conflict, so try incognito mode (with extensions disabled) or another browser if re-uploading to another bucket/folder doesn't work.
Alternatively, try the AWS CLI s3 cp command or a quick "S3 file upload" application in a supported SDK language to make sure that it's not a console UI issue.
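If you want to clear out any leftover incomplete multipart upload parts programmatically rather than through the console, here is a minimal sketch (assuming boto3; the bucket name is hypothetical):

```python
# A minimal sketch, assuming boto3; "my-bucket" is a hypothetical bucket name.
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"

# Find any in-progress (incomplete) multipart uploads left in the bucket
# and abort them, which also deletes their stored parts.
resp = s3.list_multipart_uploads(Bucket=bucket)
for upload in resp.get("Uploads", []):
    print("Aborting", upload["Key"], upload["UploadId"])
    s3.abort_multipart_upload(
        Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
    )
```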
Let's assume we generate a pre-signed URL for uploading a file, with an expiration time of 15 seconds, and we start uploading a large file. Must the upload complete within 15 seconds of the URL's generation, or can it go beyond that as long as the upload starts within the 15-second window?
The upload action must start before the expiry time; there is no known restriction on how long the upload may take to complete after it starts. Since S3 evaluates the permissions for uploading the file when the upload action starts, the upload is not affected by the time the actual transfer takes.
In your case, given the file size, note that if the upload fails for any reason, users won't be able to retry once the 15 seconds have passed.
Below are more details on this point from the "Uploading using pre-signed URLs" documentation:
That is, you must start the action before the expiration date and time. If the action consists of multiple steps, such as a multipart upload, all steps must be started before the expiration, otherwise you will receive an error when Amazon S3 attempts to start a step with an expired URL.
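To illustrate the point, here is a minimal sketch (assuming boto3 and the requests library; the bucket name, key, and file are hypothetical) of generating a short-lived pre-signed PUT URL and starting the upload before it expires:

```python
# A minimal sketch, assuming boto3 and the requests library;
# bucket, key, and file names are hypothetical.
import boto3
import requests

s3 = boto3.client("s3")

# Generate a pre-signed PUT URL that expires 15 seconds after generation.
url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/large-file.mp4"},
    ExpiresIn=15,
)

# The PUT must *begin* before the 15 seconds elapse; once S3 has accepted
# the request, the transfer itself can keep running past the expiry time.
with open("large-file.mp4", "rb") as f:
    resp = requests.put(url, data=f)
resp.raise_for_status()
```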
I have so far allowed users to upload images to my server and then used CF's FileGetMimeType() function to determine whether the MIME type is valid (e.g., jpg).
The problem is that FileGetMimeType() wants a full path to the file on the server to work. Amazon S3 is just a URL of where the image is stored. In order to get FileGetMimeType() to work, I have to first upload the image to Amazon S3 then download it again using CFHTTP and then determine the file type. This seems way less efficient than the old way.
So why not just upload to my own server first, determine the MIME type, and then upload to S3, right? I can't do that because some of these files are going to be huge, with thousands of users uploading at the same time. We're talking videos as well as images.
Is there an efficient way to upload files to an external server i.e. Amazon S3 and then get the MIME type somehow without having to download the file all over again? Can it be done on S3's end?
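One possible approach, shown here only as a minimal sketch (assuming boto3; the bucket and key are hypothetical), is to issue a HEAD request against the object: S3 returns the stored metadata, including the Content-Type, without transferring the file body. Note that this is the Content-Type set at upload time, not a type sniffed from the file contents, so it is not equivalent to FileGetMimeType()'s inspection.

```python
# A minimal sketch, assuming boto3; bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

# A HEAD request returns the object's metadata (including Content-Type)
# without downloading the body.
head = s3.head_object(Bucket="my-bucket", Key="uploads/photo.jpg")
print(head["ContentType"])    # e.g. "image/jpeg"
print(head["ContentLength"])  # object size in bytes
```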
We are using S3 for our image upload process, and we approve all the images that are uploaded to our website. The process is:
1. Clients upload images to S3 from JavaScript at a given path (using a token).
2. Once we get back the URL from S3, we save the S3 path in our database with the isApproved flag set to false in the photos table.
3. Once an image is approved by our executive, it starts displaying on our website.
The problem is that a user may change the image (to some obscene image) after the approval process, using the generated token. Can we somehow stop users from modifying images like this?
One temporary fix is to shorten the token lifetime (e.g., to 5 minutes) and approve images only after that interval has passed.
I saw this, but it didn't help, since versioning also replaces the already-uploaded image and moves the previously uploaded image to a new versioned path.
Any better solutions?
You should create a workflow around the uploaded images. The process would be:
1. The client uploads the image.
2. This triggers an Amazon S3 event notification to you/your system.
3. If you approve the image, move it to the public bucket that is serving your content.
4. If you do not approve the image, delete it.
This could be an automated process using an AWS Lambda function to update your database and flag photos for approval, or it could be done manually after receiving an email notification via Amazon SNS. The choice is up to you.
The benefit of this method is that nothing can be substituted once approved.
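As a minimal sketch of the approve/reject step (assuming boto3 in a Lambda or any backend; the bucket names and the idea of a separate staging bucket are hypothetical illustrations of the workflow above):

```python
# A minimal sketch, assuming boto3; bucket names and the staging-bucket
# layout are hypothetical.
import boto3

s3 = boto3.client("s3")

STAGING_BUCKET = "uploads-staging"   # where clients upload (hypothetical)
PUBLIC_BUCKET = "images-public"      # what the website serves (hypothetical)

def approve(key):
    """Copy an approved image to the public bucket, then remove the staged copy."""
    s3.copy_object(
        Bucket=PUBLIC_BUCKET,
        Key=key,
        CopySource={"Bucket": STAGING_BUCKET, "Key": key},
    )
    s3.delete_object(Bucket=STAGING_BUCKET, Key=key)

def reject(key):
    """Delete a rejected image from the staging bucket."""
    s3.delete_object(Bucket=STAGING_BUCKET, Key=key)
```

Because the upload token only grants access to the staging bucket, an approved object in the public bucket can no longer be overwritten by the client.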
I have a system in which we upload videos to AWS via multipart upload. I have set this process up as a workflow manager task. When the process completes, I update my database to mark the payload's status as complete.
If the payload is still not in completed status after 24 hours, I should delete the associated parts of the multipart upload from S3.
Here is what I have:
1. I have the video details (name).
2. I have the bucket in which I will be uploading the video.
When I call bucket.get_all_multipart_uploads(), I do not see the asset I uploaded to the system, i.e. I don't find the name of the video I put on S3. I am pretty new to this. Can anyone point me to the proper documentation and explain how to identify the uploads that are hanging on S3?
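As a minimal sketch of one way to find and clean up stale uploads (assuming boto3 rather than the older boto 2 call in the question; the bucket name is hypothetical), note that only *in-progress* multipart uploads appear in this listing, so a video whose upload has already completed will not show up:

```python
# A minimal sketch, assuming boto3 (the question's bucket.get_all_multipart_uploads()
# is the older boto 2 equivalent); the bucket name is hypothetical.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")
bucket = "video-uploads"   # hypothetical bucket name
cutoff = datetime.now(timezone.utc) - timedelta(hours=24)

# List multipart uploads that were started but never completed or aborted,
# and abort any that are more than 24 hours old (this also deletes their parts).
resp = s3.list_multipart_uploads(Bucket=bucket)
for upload in resp.get("Uploads", []):
    if upload["Initiated"] < cutoff:
        print("Aborting stale upload:", upload["Key"], upload["UploadId"])
        s3.abort_multipart_upload(
            Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
        )
```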