AWS S3 file uploads: PHP SDK vs REST API - amazon-web-services

I need to upload a file to AWS Simple Storage Service from a PHP script. The script gets called from an external program, and for some unknown reason it bombs out as soon as I load the AWS PHP SDK. I've tried everything to get it to work without any success. I'm therefore considering using the AWS S3 REST API directly to upload the file.
My question is: what is the major drawback of using the REST API compared to the PHP SDK? I know the REST API will be a bit harder to use, but if I only need to upload files to S3, would it take significantly more time? Or would it be worth spending another half a day (hopefully) trying to get the script to run with the SDK?
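For a sense of what "a bit harder" means in practice: the REST API requires you to sign every request with AWS Signature Version 4 yourself, which is exactly the part the SDK normally hides. Below is a rough, untested sketch of a signed PUT in Node.js (the same steps apply in PHP using hash() and hash_hmac()); the bucket, key, file name, and region are placeholders and error handling is omitted.
// Minimal SigV4-signed PUT to S3 (illustrative only; bucket/key/region are placeholders).
const crypto = require("crypto");
const https = require("https");
const fs = require("fs");

const bucket = "my-bucket";              // assumption: your bucket name
const key = "uploads/photo.jpg";         // assumption: object key to write
const region = "us-east-1";
const accessKey = process.env.AWS_ACCESS_KEY_ID;      // assumes credentials in env vars
const secretKey = process.env.AWS_SECRET_ACCESS_KEY;

const body = fs.readFileSync("photo.jpg");
const payloadHash = crypto.createHash("sha256").update(body).digest("hex");
const amzDate = new Date().toISOString().replace(/[:-]|\.\d{3}/g, ""); // e.g. 20240101T000000Z
const dateStamp = amzDate.slice(0, 8);
const host = `${bucket}.s3.${region}.amazonaws.com`;

// 1. Canonical request: method, URI, query string, headers, signed headers, payload hash.
const signedHeaders = "host;x-amz-content-sha256;x-amz-date";
const canonicalRequest = [
  "PUT", `/${key}`, "",
  `host:${host}`, `x-amz-content-sha256:${payloadHash}`, `x-amz-date:${amzDate}`, "",
  signedHeaders, payloadHash,
].join("\n");

// 2. String to sign, scoped to date/region/service.
const scope = `${dateStamp}/${region}/s3/aws4_request`;
const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope,
  crypto.createHash("sha256").update(canonicalRequest).digest("hex")].join("\n");

// 3. Derive the signing key and compute the signature.
const hmac = (k, d) => crypto.createHmac("sha256", k).update(d).digest();
const signingKey = hmac(hmac(hmac(hmac("AWS4" + secretKey, dateStamp), region), "s3"), "aws4_request");
const signature = crypto.createHmac("sha256", signingKey).update(stringToSign).digest("hex");

// 4. Send the request with the Authorization header.
const req = https.request({
  host, method: "PUT", path: `/${key}`,
  headers: {
    "x-amz-date": amzDate,
    "x-amz-content-sha256": payloadHash,
    "Content-Length": body.length,
    Authorization: `AWS4-HMAC-SHA256 Credential=${accessKey}/${scope}, ` +
      `SignedHeaders=${signedHeaders}, Signature=${signature}`,
  },
}, (res) => console.log("S3 responded with", res.statusCode));
req.end(body);
With the SDK the same upload is a single call, so if signing details like the above are the only obstacle, half a day on getting the SDK to load may well be the cheaper path.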

Related

AWS Amplify Storage | Upload large file

Using AWS Amplify Storage, uploading a file to AWS S3 should be simple:
Storage.put(key, blob, options)
The above works without a problem for smaller files (no larger than around 4 MB).
Uploading anything larger, e.g. a 25 MB video, does not work: Storage just freezes (the app does not freeze, only Storage). No error is returned.
Question: How can I upload larger files using AWS Amplify Storage?
Side note: the described behaviour appears on both Android and iOS.
Amplify now automatically segments large files into 5 MB chunks and uploads them using the Amazon S3 multipart upload process:
https://aws.amazon.com/about-aws/whats-new/2021/10/aws-amplify-javascript-file-uploads-storage/
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpu-process
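In other words, no special API call is needed for large files; a plain Storage.put is enough, and you can watch the parts go up with the progressCallback option. A minimal sketch, with the key and content type chosen purely for illustration:
import { Storage } from "aws-amplify";

// Amplify splits large bodies into parts behind the scenes; progressCallback
// reports the bytes uploaded so far across all parts.
async function uploadVideo(file) {
  const result = await Storage.put("videos/demo.mp4", file, {   // key is an example
    contentType: "video/mp4",
    progressCallback(progress) {
      console.log(`Uploaded ${progress.loaded} of ${progress.total} bytes`);
    },
  });
  return result.key;
}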
After updating to
"aws-amplify": "^4.3.11",
"aws-amplify-react-native": "^6.0.2"
uploads over 100 MB no longer freeze the UI, and we also migrated to resumable uploads. With the older version, "aws-amplify": "^3.1.1", the problems you mention were present.
Here is the pull request from December 2021 with the mentioned fixes:
https://github.com/aws-amplify/amplify-js/pull/8336
So the solution is really to upgrade AWS Amplify library.
However, this approach works only on iOS.
Uploading big media files on Android results in a network error when calling fetch (a required step before calling the Storage.put method).
Although the same method works perfectly well on the web, in React Native uploading big files was/is not implemented optimally (bear in mind that the whole file has to be loaded into memory via fetch()).
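For reference, once you are on a version that includes that pull request, Storage.put can also be made resumable; it then returns an upload task you can pause and resume instead of a promise. A sketch, assuming Amplify JS v4.3+ and an illustrative key:
import { Storage } from "aws-amplify";

// With resumable: true, Storage.put returns an upload task rather than a promise.
function startResumableUpload(file) {
  const task = Storage.put("videos/large-demo.mp4", file, {  // key is an example
    resumable: true,
    progressCallback: (progress) =>
      console.log(`Uploaded ${progress.loaded}/${progress.total} bytes`),
    completeCallback: (event) => console.log("Finished:", event.key),
    errorCallback: (err) => console.error("Upload failed:", err),
  });
  return task; // call task.pause() / task.resume() as needed
}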

Difference between upload() and putObject() for uploading a file to S3?

In the aws-sdk's S3 class, what is the difference between upload() and putObject()? They seem to do the same thing. Why might I prefer one over the other?
The advantages of using the SDK's upload() over putObject() are:
- If the MD5 reported on upload completion does not match, it retries.
- If the file is large enough, it uses multipart upload to upload the parts in parallel.
- It retries based on the client's retry settings.
- It supports progress reporting.
- It sets the ContentType based on the file extension if you do not provide it.
upload() allows you to control how your object is uploaded. For example you can define concurrency and part size.
From their docs:
Uploads an arbitrarily sized buffer, blob, or stream, using intelligent concurrent handling of parts if the payload is large enough.
One specific benefit I've discovered is that upload() will accept a stream without a content length defined whereas putObject() does not.
This was useful as I had an API endpoint that allowed users to upload a file. The framework delivered the file to my controller in the form of a readable stream without a content length. Instead of having to measure the file size, all I had to do was pass it straight through to the upload() call.
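A rough sketch of both points (tuning concurrency and part size, and passing a length-less stream straight through), assuming the v2 JavaScript SDK and placeholder bucket/key names:
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Body can be a readable stream with no known length; upload() splits it into
// parts and uploads them concurrently. putObject() would reject such a stream.
function uploadStream(readableStream) {
  return s3.upload(
    { Bucket: "my-bucket", Key: "uploads/report.csv", Body: readableStream }, // names are examples
    { partSize: 10 * 1024 * 1024, queueSize: 4 } // 10 MB parts, 4 in flight at once
  ).promise();
}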
When looking for the same information, I came across: https://aws.amazon.com/blogs/developer/uploading-files-to-amazon-s3/
This source is a little dated (it references upload_file() and put() -- it may be about the Ruby SDK), but it suggests that putObject() is intended for smaller objects than upload().
It recommends upload() and specifies why:
This is the recommended method of using the SDK to upload files to a bucket. Using this approach has the following benefits:
- Manages multipart uploads for objects larger than 15MB.
- Correctly opens files in binary mode to avoid encoding issues.
- Uses multiple threads for uploading parts of large objects in parallel.
Then covers the putObject() operation:
For smaller objects, you may choose to use #put instead.
EDIT: I was having problems with the .abort() operation on my .upload() and found this question helpful: "abort/stop amazon aws s3 upload, aws sdk javascript".
Now my various other events from https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Request.html are firing as well! With .upload() I only had 'httpUploadProgress'.
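The trick from that question, roughly: s3.upload() returns a ManagedUpload object, so you can keep the handle, subscribe to its events, and abort it later. A sketch with placeholder names:
const AWS = require("aws-sdk");
const fs = require("fs");
const s3 = new AWS.S3();

// Keep the ManagedUpload handle instead of only awaiting a promise.
const managedUpload = s3.upload({
  Bucket: "my-bucket",                          // example bucket
  Key: "uploads/video.mp4",                     // example key
  Body: fs.createReadStream("video.mp4"),       // example source stream
});

managedUpload.on("httpUploadProgress", (progress) => {
  console.log(`Sent ${progress.loaded} of ${progress.total} bytes`);
});

managedUpload.send((err, data) => {
  if (err) console.error("Upload failed or was aborted:", err);
  else console.log("Stored at", data.Location);
});

// Later, e.g. when the user cancels:
// managedUpload.abort();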
This question was asked almost six years ago, and I stumbled across it while searching for information on the latest AWS Node.js SDK (V3). While V2 of the SDK supports the "upload" and "putObject" functions, the V3 SDK only exposes the "Put Object" functionality as "PutObjectCommand". Uploading in parts is supported via "UploadPartCommand" and "UploadPartCopyCommand", but the standalone "upload" function available in V2 is not, and there is no "UploadCommand".
So if you migrate to the V3 SDK, you will need to migrate to PutObjectCommand. GetObject is also different in V3: a Buffer is no longer returned; instead you get a readable stream or a Blob. So if you used to get the data via "Body.toString()", you now have to implement a stream reader or handle Blobs.
EDIT:
An upload equivalent is available in the AWS Node.js SDK (V3) as the Upload class in @aws-sdk/lib-storage. Here is a direct link: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_lib_storage.html
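For completeness, a short sketch of that V3 replacement (bucket, key, and region are placeholders): the Upload class from @aws-sdk/lib-storage gives you the same multipart/concurrency behaviour and progress events that upload() had in V2.
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

async function uploadLargeObject(body) {
  const upload = new Upload({
    client: new S3Client({ region: "us-east-1" }),                          // example region
    params: { Bucket: "my-bucket", Key: "uploads/big.bin", Body: body },    // example names
    queueSize: 4,                  // parts uploaded concurrently
    partSize: 10 * 1024 * 1024,    // 10 MB per part
  });

  upload.on("httpUploadProgress", (progress) => {
    console.log(`Uploaded ${progress.loaded} of ${progress.total} bytes`);
  });

  return upload.done();
}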

Using Java code to count the number of lines in a file on S3

Using Java code, is it possible to count the number of lines in a file on AWS S3 without downloading it to the local machine?
It depends on what you mean by "download".
There is no remote processing in S3 - you can't upload code that will execute inside the S3 service. Possible alternatives:
If the issue is that the file is too big to store in memory or on your local disk, you can still download the file in chunks and process each chunk separately. You just use the Java InputStream (or whatever other API you are using) to read a chunk, say 4 KB, process it (scan for line endings), and continue without storing anything to disk. The downside here is that you are still doing all the I/O to pull the file from S3 to your machine.
Use AWS Lambda - create a Lambda function that does the processing for you. This code runs in the Amazon cloud, so there is no I/O to your machine, only inside the cloud. The function would do the same as the previous option, it just runs remotely (a sketch follows below).
Use EC2 - if you need more control over your code, custom operating systems, etc., you can have a dedicated VM on EC2 that handles this.
Given the information in your question, I would say the Lambda function is probably the best option.
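As an illustration of the chunked/Lambda approach (sketched in Node.js here rather than Java, since the idea is identical): stream the object and count newline bytes as chunks arrive, without ever holding the whole file in memory. The assumption that the function is triggered by an S3 event, and the bucket/key wiring, are illustrative only.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

export const handler = async (event) => {
  // Assumes an S3 event trigger; bucket and key could equally come from the invocation payload.
  const { bucket, object } = event.Records[0].s3;
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: bucket.name, Key: decodeURIComponent(object.key) })
  );

  // Body is a readable stream in Node; count '\n' bytes chunk by chunk.
  let lines = 0;
  for await (const chunk of Body) {
    for (const byte of chunk) {
      if (byte === 0x0a) lines += 1;
    }
  }
  return { lines };
};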

Amazon S3 server multiple upload

I'm writing a program that saves images to Amazon S3. My test suite takes close to a minute to run because it has to perform multiple uploads straight to S3 in order to test various features of the photos.
What is the issue here, and how can I fix it?
You could:
1. Use multipart upload to speed it up; see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
2. Use a mock "no-op" class to fake the upload instantly; see https://code.google.com/p/mockito/
3. Use a local S3 emulation for testing; see https://github.com/jubos/fake-s3
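If you go the local-emulation route, the change is usually just pointing the SDK client at the fake endpoint in your test setup. A sketch with the v2 JavaScript SDK, assuming fake-s3 was started locally on port 4567 (the port used in its README; adjust if yours differs):
const AWS = require("aws-sdk");

// Test-only client: fake-s3 accepts any credentials, and path-style
// addressing avoids DNS tricks for bucket names.
const s3 = new AWS.S3({
  endpoint: "http://localhost:4567",
  s3ForcePathStyle: true,
  accessKeyId: "test",
  secretAccessKey: "test",
});

// The test then uploads exactly as production code would, only much faster.
s3.putObject({ Bucket: "test-bucket", Key: "photo.jpg", Body: Buffer.from("fake image") })
  .promise()
  .then(() => console.log("uploaded to the local fake S3"));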