Why does my single part MP3 file fail to upload to AWS S3 due to an InvalidPart multipart file upload error? - amazon-web-services

I have a 121MB MP3 file I am trying to upload to my AWS S3 so I can process it via Amazon Transcribe.
The MP3 file comes from an MP4 file I stripped the audio from using FFmpeg.
When I try to upload the MP3, using the S3 object upload UI in the AWS console, I receive the below error:
InvalidPart
One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
The error makes reference to the MP3 being a multipart file and how the "next" part is missing but it's not a multipart file.
I have re-run the MP4 file through FFmpeg 3 times in case the 1st file was corrupt, but that has not fixed anything.
I have searched Stack Overflow extensively and have not found a similar case where someone uploading a single 5MB+ file received this error.
I've also ruled out FFmpeg as the cause by exporting the audio as an MP3 with VLC instead, but I receive exactly the same error.
What is the issue?
Here's the console in case it helps:

121MB is below the 160 GB S3 console single object upload limit, the 5GB single object upload limit using the REST API / AWS SDKs as well as the 5TB limit on multipart file upload so I really can't see the issue.
Assuming the file exists and you have a stable internet connection (so the upload isn't being corrupted in transit), you may have incomplete multipart upload parts left in your bucket that are conflicting with the new upload. Either follow this guide to remove them and try again, or try creating a new folder/bucket and re-uploading there.
You may also have a browser caching issue or an extension conflict, so if re-uploading to another bucket/folder doesn't work, try an incognito window (with extensions disabled) or another browser.
Alternatively, try the AWS CLI s3 cp command or a quick "S3 file upload" application in a supported SDK language to make sure that it's not a console UI issue.
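As a rough sketch of the "remove incomplete multipart uploads" suggestion above, the following lists and aborts every in-progress multipart upload in a bucket. It assumes a boto3-style S3 client is passed in (the `list_multipart_uploads` and `abort_multipart_upload` calls are boto3's); the function name and the bucket name are made up for illustration. Note that it aborts all in-progress uploads, including any that are still legitimately running.

```python
from typing import Any, List, Tuple


def abort_incomplete_uploads(s3_client: Any, bucket: str) -> List[Tuple[str, str]]:
    """Abort every in-progress multipart upload in `bucket`.

    `s3_client` is assumed to behave like boto3.client("s3").
    Returns the (key, upload_id) pairs that were aborted.
    """
    aborted = []
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_multipart_uploads(**kwargs)
        # "Uploads" is absent when the page contains no in-progress uploads.
        for upload in page.get("Uploads", []):
            s3_client.abort_multipart_upload(
                Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
            )
            aborted.append((upload["Key"], upload["UploadId"]))
        if not page.get("IsTruncated"):
            return aborted
        # Continue listing from where the previous page stopped.
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
```

With boto3 installed and credentials configured, this would be called as `abort_incomplete_uploads(boto3.client("s3"), "my-bucket")`, where `my-bucket` is a placeholder.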

Related

S3 trigger to perform a file conversion for a multi-part file type

I am working on converting shapefiles to geojson. Shapefiles are composed of at least 3 required files and as many as 8 separate files all residing in a folder. To convert to geojson you need all the constituent parts. Right now I have a batch conversion process that goes through all the shapefiles stored in an s3 bucket, downloads all the separate file parts and performs the conversion. What I'm trying to figure out now is how to run the file conversion process based on the upload of a single shapefile folder, hopefully using an s3 bucket trigger.
I have reviewed this answer (AWS - want to upload multiple files to S3 and only when all are uploaded trigger a lambda function) but in my case there is no frontend client (the answer presented in that question appears to be to signal a final event, but that is done from the client interface). Maybe I need to build one, but I was trying to handle this only in the backend (there is no frontend and no plans to have one). The 'user' would be dropping the files right into s3 directly without a file upload interface.
Of course, when someone uploads a folder with all the shapefile parts in it, the s3 trigger fires once for each part, but no single part can be converted on its own.
A few solutions I thought of but with their own problems:
1. I am converting the shapefiles to geojson and storing the geojson in a separate s3 bucket using a naming convention for the geojson based on the s3 file name. In theory you could always check if the geojson exists in the destination s3 bucket already and if not, run the conversion. But this still doesn't take care of the timing aspect of all the multiple parts of the file being uploaded. I could check the name but it would be triggered multiple times, fail on some and then ultimately (probably) succeed after all the parts are in place.
1a. Maybe some type of try/except error checking on the conversion mentioned above? meaning, for each file part uploaded, go ahead and try to download and convert. This seems fragile and potentially error prone. Also, I believe that a certain subset of all the files will likely produce a geojson without error but without all the metadata or complete set of data so a successful conversion may not actually be a success.
2. Using a database to track which files have been converted, which would basically be the same solution as 1 above.
3. Partly a question as a solution: on the s3 web console there is 'file' upload and 'folder' upload. To upload the shapefile folder containing all the component parts, you'd have to use the 'folder' option. The question then is, is there any way to know, from the event trigger perspective, that the operation was a folder upload, not just a file upload and to therefore wait until all the parts of the folder are uploaded OR if there is any event data in AWS that, when a FOLDER is uploaded it counts the underlying file parts (1 of 6, 2 of 6 etc) and could send an event after all the parts of the folder have been uploaded(?)
4. I also am aware of the 'multipart' upload which would, I think, do what I proposed in #3 above but that multipart 'tag' is only if you upload via sdk or cli. Unless the s3 console folder upload is underneath a multi-part upload?
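One way to make the "check before converting" idea from option 1 less fragile might be to have each trigger invocation list the keys under the uploaded folder and only start the conversion once the three mandatory shapefile components (.shp, .shx, .dbf) are all present. The sketch below shows just that completeness check; the function name is hypothetical, and it assumes all parts of a shapefile share the same key apart from the extension. Optional parts such as .prj could still arrive later, so the required set may need extending for your data.

```python
import os
from typing import Dict, Iterable, List, Set

# The three components every shapefile must have.
REQUIRED_EXTENSIONS = frozenset({".shp", ".shx", ".dbf"})


def complete_shapefiles(keys: Iterable[str]) -> List[str]:
    """Return the base names (key minus extension) whose required parts all exist."""
    parts: Dict[str, Set[str]] = {}
    for key in keys:
        base, ext = os.path.splitext(key)
        # Extensions are compared case-insensitively; base names are not.
        parts.setdefault(base, set()).add(ext.lower())
    return sorted(base for base, exts in parts.items() if REQUIRED_EXTENSIONS <= exts)
```

Each invocation that finds an incomplete set would simply exit; the invocation triggered by the last required part would see the full set and run the conversion.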

AWS CloudFront still caching when set to "CachingDisabled"

We are using an S3 bucket to hold zip files that customers have created and that are ready for them to download. We are using CloudFront only to handle the SSL. We have caching disabled.
The customer receives an email to download their zip file, and that works great. The S3 lifecycle removes the file after 2 weeks. Now, if they add more photos to their account and re-request their zip file, it overwrites the current zip file with the new version. So the link is exactly the same. But when they download, they get the previous zip file, not the new one.
Additionally, after the two weeks the file is removed, and if they try to download it they get an error that basically says they need to log in and re-request their photos. So they generate a new zip file, but their link still gives them the error message.
I could have the lambda that creates the zip file invalidate the file when it creates it, but I didn't think I needed to invalidate since we aren't caching?
Below is the screenshot of the caching policy I have selected in CloudFront
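For reference, if the lambda did end up invalidating the path after writing the zip, a minimal sketch could look like the following. It assumes a boto3-style CloudFront client (`create_invalidation` is boto3's call); the function name, distribution ID and path in the usage note are placeholders. It does not explain why a distribution with caching disabled would serve a stale file.

```python
import time
from typing import Any, Dict, List


def invalidate_paths(cloudfront_client: Any, distribution_id: str,
                     paths: List[str]) -> Dict[str, Any]:
    """Ask CloudFront to drop cached copies of `paths` (each must start with '/')."""
    batch = {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": str(time.time_ns()),
    }
    return cloudfront_client.create_invalidation(
        DistributionId=distribution_id, InvalidationBatch=batch
    )
```

A call might look like `invalidate_paths(boto3.client("cloudfront"), "EXAMPLEDISTID", ["/zips/customer-123.zip"])`.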

Access denied when trying to upload audio files to aws s3 bucket

I'm trying to upload audio files to a folder in my s3 bucket. I'm doing this by dragging and dropping from my laptop and hitting the upload button once I have dropped the last file. Some of the files failed to upload and instead gave me an error message saying
Access Denied. You don't have permissions to upload files and folders.
How do I fix that?
Adding to Frank Din's answer, I was able to upload a folder's 80+ images at once by selecting "Add Folder" instead of drag-and-dropping all the images at once.
I was eventually able to upload all the audio files.
I think the problem in my case was that I was trying to upload all the files practically at the same time by just dragging and dropping them all in the same breath.
I fixed that by uploading each file one at a time and only after the previous file was done uploading.

Using FileGetMimeType() with uploads to Amazon S3

I have so far allowed users to upload images to my server and then used CF's FileGetMimeType() function to determine if the MIME type is valid (e.g. jpg).
The problem is that FileGetMimeType() wants a full path to the file on the server to work. Amazon S3 is just a URL of where the image is stored. In order to get FileGetMimeType() to work, I have to first upload the image to Amazon S3 then download it again using CFHTTP and then determine the file type. This seems way less efficient than the old way.
So why not just upload to my own server first, determine the MIME type, and then upload to S3 right? I can't do that because some of these files are going to be huge with thousands of users uploading at the same time. We're talking videos as well as images.
Is there an efficient way to upload files to an external server i.e. Amazon S3 and then get the MIME type somehow without having to download the file all over again? Can it be done on S3's end?
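One possible approach, sketched here in Python rather than ColdFusion, is to avoid downloading the whole object and instead fetch only its first few bytes with a ranged GET (S3 honours the HTTP Range header; with boto3 that would be something like `get_object(Bucket=..., Key=..., Range="bytes=0-15")`), then compare them against known file signatures. This is a hand-rolled stand-in for FileGetMimeType(), not an S3 feature; the function name is hypothetical and it only recognises three image formats.

```python
from typing import Optional

# Leading "magic" bytes for a few common image formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_mime_type(head: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or None if unrecognised."""
    for magic, mime in _SIGNATURES:
        if head.startswith(magic):
            return mime
    return None
```

Video containers would need their own signatures added to the table.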

Merging files on AWS S3 (Using Apache Camel)

I have some files that are being uploaded to S3 and processed for some Redshift task. After that task is complete these files need to be merged. Currently I am deleting these files and uploading merged files again.
This eats up a lot of bandwidth. Is there any way the files can be merged directly on S3?
I am using Apache Camel for routing.
S3 allows you to use an S3 object URI as the source for a copy operation. Combined with S3's multi-part upload API, you can supply several S3 object URIs as the source keys for a multi-part upload.
However, the devil is in the details. S3's multi-part upload API has a minimum part size of 5MB. Thus, if any file in the series being concatenated is < 5MB, the upload will fail.
However, you can work around this by exploiting the loophole which allows the final upload piece to be < 5MB (allowed because this happens in the real world when uploading remainder pieces).
My production code does this by:
Interrogating the manifest of files to be uploaded
If the first part is under 5MB, download pieces* and buffer to disk until 5MB is buffered.
Append parts sequentially until file concatenation complete
If a non-terminus file is < 5MB, append it, then finish the upload and create a new upload and continue.
Finally, there is a bug in the S3 API. The ETag (which is really an MD5 file checksum on S3) is not properly recalculated at the completion of a multi-part upload. To fix this, copy the file on completion. If you use a temp location during concatenation, this will be resolved on the final copy operation.
* Note that you can download a byte range of a file. This way, if part 1 is 10K, and part 2 is 5GB, you only need to read in 5110K to meet the 5MB size needed to continue.
** You could also have a 5MB block of zeros on S3 and use it as your default starting piece. Then, when the upload is complete, do a file copy using byte range of 5MB+1 to EOF-1
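On the ETag point above: for multi-part uploads, the ETag S3 reports is widely observed to be the MD5 of the concatenated per-part MD5 digests, followed by a dash and the number of parts, rather than an MD5 of the whole object. That is an assumption based on commonly observed behaviour (for unencrypted or SSE-S3 objects), not something stated in this answer. A small sketch of that calculation, with a hypothetical function name:

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag a multi-part upload of `data` is commonly observed to get."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # MD5 each part separately, keeping the raw 16-byte digests.
    digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    # MD5 the concatenated digests, then append the part count.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

This would explain why the value differs from a plain MD5 of the file, and why a single-operation copy of the finished object yields a plain MD5 again.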
P.S. When I have time to make a Gist of this code I'll post the link here.
You can use Multipart Upload with Copy to merge objects on S3 without downloading and uploading them again.
You can find some examples in Java, .NET or with the REST API here.
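A minimal Python sketch of the Multipart Upload with Copy approach described in these answers, assuming a boto3-style S3 client (`head_object`, `create_multipart_upload`, `upload_part_copy`, `complete_multipart_upload` and `abort_multipart_upload` are boto3 calls). The function name is hypothetical, all objects are assumed to live in one bucket, and it simply refuses to proceed if a non-final source is under 5MB instead of implementing the buffering workaround above. It is not the production code the first answer refers to.

```python
from typing import Any, List

MIN_PART_SIZE = 5 * 1024 * 1024  # minimum size for every part except the last


def concatenate_objects(s3_client: Any, bucket: str,
                        source_keys: List[str], dest_key: str) -> str:
    """Concatenate `source_keys` into `dest_key` server-side; returns the new ETag."""
    # Only the final part may be smaller than the minimum part size.
    for key in source_keys[:-1]:
        size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        if size < MIN_PART_SIZE:
            raise ValueError(f"{key} is {size} bytes; non-final parts need {MIN_PART_SIZE}")

    upload_id = s3_client.create_multipart_upload(Bucket=bucket, Key=dest_key)["UploadId"]
    try:
        parts = []
        for number, key in enumerate(source_keys, start=1):
            # Each existing object becomes one part; no data leaves S3.
            result = s3_client.upload_part_copy(
                Bucket=bucket, Key=dest_key, UploadId=upload_id,
                PartNumber=number, CopySource={"Bucket": bucket, "Key": key},
            )
            parts.append({"ETag": result["CopyPartResult"]["ETag"], "PartNumber": number})
        done = s3_client.complete_multipart_upload(
            Bucket=bucket, Key=dest_key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
        return done["ETag"]
    except Exception:
        # Avoid leaving an incomplete upload (and its stored parts) behind.
        s3_client.abort_multipart_upload(Bucket=bucket, Key=dest_key, UploadId=upload_id)
        raise
```

Sources larger than the per-part maximum would need to be copied in byte ranges, which this sketch does not handle.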