Difference between upload() and putObject() for uploading a file to S3? - amazon-web-services

In the aws-sdk's S3 class, what is the difference between upload() and putObject()? They seem to do the same thing. Why might I prefer one over the other?

The advantages of using the AWS SDK's upload() over putObject() are as follows:
If the reported MD5 upon upload completion does not match, it retries.
If the file size is large enough, it uses multipart upload to upload parts in parallel.
It retries based on the client's retry settings.
It provides progress reporting.
It sets the ContentType based on the file extension if you do not provide it.
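For a quick comparison, here is a minimal sketch of the two calls with the v2 JavaScript SDK (bucket, key, and file names are placeholders):

const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();
const body = fs.readFileSync('video.mp4');

// putObject(): a single PUT request, no multipart handling.
s3.putObject({ Bucket: 'my-bucket', Key: 'video.mp4', Body: body }, (err, data) => {
  if (err) console.error(err); else console.log('putObject done', data.ETag);
});

// upload(): same parameters, but the SDK splits large payloads into parts and retries them.
s3.upload({ Bucket: 'my-bucket', Key: 'video.mp4', Body: body }, (err, data) => {
  if (err) console.error(err); else console.log('upload done', data.Location);
});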

upload() allows you to control how your object is uploaded. For example you can define concurrency and part size.
From their docs:
Uploads an arbitrarily sized buffer, blob, or stream, using intelligent concurrent handling of parts if the payload is large enough.
One specific benefit I've discovered is that upload() will accept a stream without a content length defined whereas putObject() does not.
This was useful as I had an API endpoint that allowed users to upload a file. The framework delivered the file to my controller in the form of a readable stream without a content length. Instead of having to measure the file size, all I had to do was pass it straight through to the upload() call.
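To illustrate both points, here is a rough sketch with the v2 SDK: an incoming readable stream is piped straight into upload(), with part size and concurrency tuned through the documented ManagedUpload options (bucket name and tuning values are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Sketch: no ContentLength is needed for the stream; the SDK handles the parts.
function handleUpload(readableStream, key) {
  const managed = s3.upload(
    { Bucket: 'my-bucket', Key: key, Body: readableStream },
    { partSize: 10 * 1024 * 1024, queueSize: 4 } // 10 MB parts, up to 4 uploaded in parallel
  );
  managed.on('httpUploadProgress', (p) => console.log('uploaded', p.loaded, 'bytes'));
  return managed.promise();
}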

When looking for the same information, I came across: https://aws.amazon.com/blogs/developer/uploading-files-to-amazon-s3/
This source is a little dated and appears to cover the Ruby SDK (it references upload_file() and put()), but it suggests that putObject() is intended for smaller objects than upload().
It recommends upload() and specifies why:
This is the recommended method of using the SDK to upload files to a
bucket. Using this approach has the following benefits:
Manages multipart uploads for objects larger than 15MB.
Correctly opens files in binary mode to avoid encoding issues.
Uses multiple threads for uploading parts of large objects in parallel.
Then covers the putObject() operation:
For smaller objects, you may choose to use #put instead.
EDIT: I was having problems with the .abort() operation on my .upload() and found this helpful: abort/stop amazon aws s3 upload, aws sdk javascript
Now my various other events from https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Request.html are firing as well! With .upload() I only had 'httpUploadProgress'.
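The pattern from that link, roughly: keep the ManagedUpload object returned by upload() and call abort() on it when needed (a sketch with the v2 SDK; bucket and file names are placeholders):

const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();

// Keep a reference to the ManagedUpload so it can be aborted later.
const managedUpload = s3.upload({
  Bucket: 'my-bucket',
  Key: 'big-file.zip',
  Body: fs.createReadStream('big-file.zip'),
});

managedUpload.on('httpUploadProgress', (p) => console.log('uploaded', p.loaded, 'bytes'));

managedUpload.send((err, data) => {
  if (err) return console.log('upload failed or was aborted:', err.message);
  console.log('upload finished:', data.Location);
});

// later, e.g. when the user cancels:
// managedUpload.abort();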

This question was asked almost six years ago and I stumbled across it while searching for information on the latest AWS Node.js SDK (V3). While V2 of the SDK supports the "upload" and "putObject" functions, the V3 SDK only supports "Put Object" functionality as "PutObjectCommand". The ability to upload in parts is supported via "UploadPartCommand" and "UploadPartCopyCommand", but the standalone "upload" function available in V2 is not, and there is no "UploadCommand".
So if you migrate to the V3 SDK, you will need to migrate to Put Object. Get Object is also different in V3: a Buffer is no longer returned; instead you get a readable stream (or a Blob in the browser). So if you used to read the data via "Body.toString()", you now have to implement a stream reader or handle Blobs.
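For example, reading a body with V3 in Node.js now looks roughly like this (a sketch; transformToString() only exists in more recent SDK versions, so the fallback collects the stream manually):

const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

const client = new S3Client({ region: 'us-east-1' });

async function getBodyAsString(bucket, key) {
  const { Body } = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  // Body is a readable stream in Node.js (a Blob in the browser), not a Buffer.
  if (typeof Body.transformToString === 'function') {
    return Body.transformToString();
  }
  const chunks = [];
  for await (const chunk of Body) chunks.push(chunk);
  return Buffer.concat(chunks).toString('utf8');
}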
EDIT: The upload functionality can be found in the AWS Node.js SDK (V3) under @aws-sdk/lib-storage. Here is a direct link: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_lib_storage.html
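A rough sketch of that replacement (bucket name, region, and tuning values are placeholders; partSize and queueSize are optional):

const fs = require('fs');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

async function uploadLargeFile(path, key) {
  const upload = new Upload({
    client: new S3Client({ region: 'us-east-1' }),
    params: { Bucket: 'my-bucket', Key: key, Body: fs.createReadStream(path) },
    partSize: 10 * 1024 * 1024, // similar tuning to v2's upload() options
    queueSize: 4,
  });
  upload.on('httpUploadProgress', (p) => console.log('uploaded', p.loaded, 'bytes'));
  return upload.done();
}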

Related

AWS Amplify Storage | Upload large file

Using AWS Amplify Storage, uploading a file to AWS S3 should be simple:
Storage.put(key, blob, options)
The above works without problems for smaller files (up to around 4 MB).
Uploading anything larger, e.g. a 25 MB video, does not work: Storage just freezes (the app does not freeze, only Storage). No error is returned.
Question: How can I upload larger files using AWS Amplify Storage?
Side note: Described behaviour appears both on Android and iOS.
Amplify now automatically segments large files into 5 MB chunks and uploads them using the Amazon S3 multipart upload process:
https://aws.amazon.com/about-aws/whats-new/2021/10/aws-amplify-javascript-file-uploads-storage/
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpu-process
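So a plain Storage.put() call should now handle large files; a minimal sketch, assuming a recent aws-amplify version (the progressCallback option is documented in the Storage API):

import { Storage } from 'aws-amplify';

// Sketch: Amplify chunks the file into a multipart upload internally.
async function uploadVideo(key, blob) {
  return Storage.put(key, blob, {
    contentType: 'video/mp4',
    progressCallback(progress) {
      console.log(`uploaded ${progress.loaded} of ${progress.total}`);
    },
  });
}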
After updating to
"aws-amplify": "^4.3.11",
"aws-amplify-react-native": "^6.0.2"
uploads over 100 MB no longer freeze the UI, and we also migrated to resumable uploads. When we used the older version "aws-amplify": "^3.1.1", the problems you mention were present.
Here is the pull request from December 2021 with the mentioned fixes:
https://github.com/aws-amplify/amplify-js/pull/8336
So the solution really is to upgrade the AWS Amplify library.
However, this approach works only on iOS.
Uploading large media files on Android results in a network error when calling fetch() (a required step before calling Storage.put).
Although the same method works perfectly on the web, uploading big files in React Native was/is not implemented optimally, since the whole file has to be loaded into memory via fetch().
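For completeness, a rough sketch of the resumable upload API mentioned above (assuming aws-amplify >= 4.3; the callback option names follow the Amplify Storage docs of that release):

import { Storage } from 'aws-amplify';

// Sketch: with resumable: true, Storage.put returns a task that can be paused and resumed.
export function startResumableUpload(key, file) {
  const task = Storage.put(key, file, {
    resumable: true,
    progressCallback: (p) => console.log('uploaded', p.loaded, 'of', p.total),
    completeCallback: (event) => console.log('finished:', event.key),
    errorCallback: (err) => console.error('upload failed', err),
  });
  // task.pause(); task.resume(); or Storage.cancel(task) as needed
  return task;
}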

AWS S3 C++: Should I use UploadFile() or PutObject() for uploading a file? Where are the differences? [duplicate]

Update wowza StreamPublisher schedule via REST API (or alternative)

Just getting started with Wowza Streaming Engine.
Objective:
Set up a streaming server which live streams existing video (from S3) at a pre-defined schedule (think of a tv channel that linearly streams - you're unable to seek through).
Create a separate admin app that manages that schedule and updates the streaming app accordingly.
Accomplish this with as little custom Java as possible.
Questions:
Is it possible to fetch / update streamingschedule.smil with the Wowza Streaming Engine REST API?
There are methods to retrieve and update specific SMIL files via the REST API, but they seem to apply only to files created through the manager; streamingschedule.smil needs to be created manually by hand.
Alternatively, is it possible to reference a streamingschedule.smil that exists on an S3 bucket? (In a similar way footage can be linked from S3 buckets with the use of the MediaCache module)
A comment here (search for '3a') seems to indicate it's possible, but there's a lot of noise in that thread.
What I've done:
Set up Wowza Streaming Engine 4.4.1 on EC2
Enabled REST API documentation
Created a separate S3 bucket and filled it with pre-recorded footage
Enabled MediaCache on the server which points to the above S3 bucket
Created a customised VOD edge application, with AppType set to Live and StreamType set to live in order to be able to point to the above (as suggested here)
Created a StreamPublisher module with a streamingschedule.smil file
The above all works and I have a working schedule with linearly streaming content pulled from an S3 bucket. Just need to be able to easily manipulate that schedule without having to manually edit the file via SSH.
So close! TIA
To answer your questions:
No. However, you can update it by creating an HTTP provider and having it handle the modifications to that schedule. Should you want more flexibility here, you can even extend the scheduler module so that it does not require that file at all.
Yes. You would have to modify the ServerListenerStreamPublisher solution to accomplish it. Currently it only looks at the local filesystem to read the streamingschedule.smil file.
Thanks,
Matt

Using java code to count the number of lines in a file on S3

Using Java code, is it possible to count the number of lines in a file on AWS S3 without downloading it to the local machine?
Depends what you mean by download.
There is no remote processing in S3 - you can't upload code that will execute in the S3 service. Possible alternatives:
If the issue is that the file is too big to store in memory or on your local disk, you can still download the file in chunks and process each chunk separately. Just use the Java InputStream (or whatever other API you are using), download a chunk, say 4 KB, process it (scan for line endings), and continue without storing anything to disk. The downside here is that you are still doing all this I/O from S3 to download the file to your machine. (A sketch of this chunked approach follows below.)
Use AWS Lambda - create a Lambda function that does the processing for you. The code runs in the AWS cloud, so there is no I/O to your machine, only inside the cloud. The function would do the same as the previous option, it just runs remotely.
Use EC2 - if you need more control over your code, a custom operating system, etc., you can have a dedicated VM on EC2 that handles this.
Given the information in your question, I would say that the lambda function is probably the best option.
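Whichever option you pick, the core logic is the same: stream the object and count newlines chunk by chunk instead of buffering the whole file. A rough sketch, shown here with the JavaScript V3 SDK for brevity (with the Java SDK the same pattern applies to the S3Object.getObjectContent() stream):

const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

// Sketch: count '\n' bytes as the object streams past; nothing is written to disk.
async function countLines(bucket, key) {
  const client = new S3Client({});
  const { Body } = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  let lines = 0;
  for await (const chunk of Body) {
    for (const byte of chunk) {
      if (byte === 0x0a) lines++;
    }
  }
  return lines;
}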

Use AWS Elastic Transcoder and S3 to stream HLSv4 without making everything public?

I am trying to stream a video with HLSv4. I am using AWS Elastic Transcoder and S3 to convert the original file (eg. *.avi or *.mp4) to HLSv4.
Transcoding is successful, producing several *.ts and *.aac files (with an accompanying *.m3u8 playlist for each media file) and a master *.m3u8 playlist linking to the media-file-specific playlists. I feel fairly comfortable everything is in order here.
Now the trouble: This is a membership site and I would like to avoid making every video file public. The way to do this typically with S3 is to generate temporary keys server-side which you can append to the URL. Trouble is, that changes the URLs to the media files and their playlists, so the existing *.m3u8 playlists (which provide references to the other playlists and media) do not contain these keys.
One option which occurred to me would be to generate these playlists on the fly as they are just text files. The obvious trouble is overhead, it seems hacky, and these posts were discouraging: https://forums.aws.amazon.com/message.jspa?messageID=529189, https://forums.aws.amazon.com/message.jspa?messageID=508365
After spending some time on this, I feel like I'm going around in circles and there doesn't seem to be a super clear explanation anywhere for how to do this.
So as of September 2015, what is the best way to use AWS Elastic Transcoder and S3 to stream HLSv4 without making your content public? Any help is greatly appreciated!
EDIT: Reposting my comment below with formatting...
Thank you for your reply, it's very helpful
The plan that's forming in my head is to keep the converted ts and aac files on S3 but generate the 6-8 m3u8 files plus the master playlist and serve them directly from the app server. So the user hits the "Play" page and jwplayer gets the master playlist from the app server (e.g. "/play/12/"). Server side, this loads the m3u8 files from S3 into memory and searches and replaces the media-specific m3u8 links so that they point to S3 with a freshly generated URL token.
So user-->jwplayer-->local master m3u8 (verify auth server side)-->local media m3u8s (verify auth server side)-->s3 media files (accessed with signed URLs and temporary tokens)
Do you see any issues with this approach? Such as "you can't reference external media from a playlist" or something similarly catch 22-ish?
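For what it's worth, the rewriting step I have in mind is mostly string processing plus signed URLs, roughly like this (a sketch with the v2 JavaScript SDK; bucket and key prefix are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Sketch: replace each media/playlist reference in the m3u8 with a freshly signed S3 URL.
function signPlaylist(m3u8Text, bucket, prefix) {
  return m3u8Text
    .split('\n')
    .map((line) => {
      if (line.startsWith('#') || line.trim() === '') return line; // leave tags and blank lines alone
      return s3.getSignedUrl('getObject', { Bucket: bucket, Key: prefix + line.trim(), Expires: 300 });
    })
    .join('\n');
}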
Dynamically generated playlists are one way to go. I actually implemented something like this as an Nginx module and it works very fast, though it's written in C and compiled, not PHP.
The person in your first link is more likely to have issues because of his/her 1-second chunk duration. This adds a lot of requests and overhead; the value recommended by Apple is 10 seconds.
There are solutions like HLS encrypted with AES-128 (supported by the Elastic Transcoder), which also adds overhead if you do it on the fly, and HLS with DRM like PHLS/Primetime, which will most likely get you into a lot of trouble on the client side.
There seems to be a way to do it with Amazon CloudFront. Please note that I haven't tried it personally and you need to check if it works on Android/iOS.
The idea is to use Signed Cookies instead of Signed URLs. They were apparently introduced in March 2015. The linked blog entry even uses HLS as an example.
Instead of dynamic URLs you send a Set-Cookie header after you authenticate the user. The cookie (hopefully) gets passed along with every request (playlist and segments) and CloudFront decides whether to allow access to your S3 bucket or not.
You can find the documentation here:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
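For illustration, a rough sketch of that Set-Cookie step (assuming an Express app and the v2 JavaScript SDK's CloudFront.Signer; the key pair ID, distribution domain, and cookie domain are placeholders):

const fs = require('fs');
const AWS = require('aws-sdk');
const express = require('express');

const app = express();
// The CloudFront key pair ID and private key path are placeholders.
const signer = new AWS.CloudFront.Signer('APKAEXAMPLEKEYID', fs.readFileSync('cf-private-key.pem', 'utf8'));

app.get('/play/:id', (req, res) => {
  // ...verify the user's membership here before signing anything...
  const policy = JSON.stringify({
    Statement: [{
      Resource: `https://dxxxxxxxx.cloudfront.net/videos/${req.params.id}/*`, // wildcards need a custom policy
      Condition: { DateLessThan: { 'AWS:EpochTime': Math.floor(Date.now() / 1000) + 3600 } },
    }],
  });
  const cookies = signer.getSignedCookie({ policy });
  for (const [name, value] of Object.entries(cookies)) {
    res.cookie(name, value, { domain: '.example.com', secure: true, httpOnly: true });
  }
  res.sendStatus(204); // the player then requests playlists and segments with the cookies attached
});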