I am using the AWS SDK for C++ to upload data to an S3 bucket, using the PutObject() method of the Aws::S3::S3Client class.
However, before uploading the data I need to know its content length: either I set the Content-Length in the request (of type Aws::S3::Model::PutObjectRequest), or the SDK tries to determine the size of the body by itself (it does that by seeking to the end of the payload stream to read its size).
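For reference, the fixed-length path looks roughly like this (bucket, key, and payload are simplified placeholders, not my actual code); both of the options above show up here as a fully buffered, seekable body plus an explicit Content-Length:

// Rough sketch of the fixed-length upload path; bucket name, key, and payload are placeholders.
#include <aws/core/Aws.h>
#include <aws/core/utils/memory/stl/AWSStringStream.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::S3::S3Client client;

        Aws::S3::Model::PutObjectRequest request;
        request.SetBucket("my-bucket");   // placeholder
        request.SetKey("my-object");      // placeholder

        // The body is a seekable stream that already holds the whole payload...
        auto body = Aws::MakeShared<Aws::StringStream>("PutObjectBody");
        *body << "already-compressed payload";
        request.SetBody(body);
        // ...and/or the length is supplied up front.
        request.SetContentLength(static_cast<long long>(body->str().size()));

        auto outcome = client.PutObject(request);
        if (!outcome.IsSuccess()) {
            // handle outcome.GetError()
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}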
Either way, in my case I don't yet know the content length of the data I am uploading. I am working on a server that compresses data, and I would have to wait for the compression to finish to know the final size of the compressed output. I don't want to wait for that; instead, I want to start uploading to S3 while the compressed data is being produced. The way that can be done over HTTP is to use "Transfer-Encoding: chunked" when sending the request to the S3 server.
Is there a way to use Aws::S3::S3Client to upload the data with "Transfer-Encoding: chunked"?
I downloaded the source of the AWS SDK for C++ and tweaked it a bit to force it to send "Transfer-Encoding: chunked" in the HTTP request, but the S3 server returned HTTP error 501 NotImplemented:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotImplemented</Code>
<Message>A header you provided implies functionality that is not implemented</Message>
<Header>Transfer-Encoding</Header>
<RequestId>8F55B09D484DD66C</RequestId>
<HostId>gZk6zaPcObsTfclz0zXvKGtPT5udzDKigIrm7laD3csG30vhx3pa0eFFS8nh6t9k7XkDeJRm9Z4=</HostId>
</Error>
Related
I have been trying to read the AWS Lambda@Edge documentation, but I still cannot figure out whether the following is possible.
Assume I have an object (image.jpg, with a size of 32922 bytes) and I have set up AWS as a static website, so I can retrieve:
$ GET http://example.com/image.jpg
I would like to be able to also expose:
$ GET http://example.com/image
Where the response body would be a multipart/related file (for example). Something like this:
--myboundary
Content-Type: image/jpeg;
Content-Length: 32922
MIME-Version: 1.0
<actual binary jpeg data from 'image.jpg'>
--myboundary
Is this something supported out of the box by the AWS Lambda@Edge API, or should I use another solution to create such a response? In particular, it seems that the response only deals with text or base64 (I would need binary in my case).
I was finally able to find the complete documentation. I eventually stumbled upon:
API Gateway - POST multipart/form-data
which refers to:
Enabling binary support using the API Gateway console
The above documentation specifies the steps needed to handle binary data. Note that you need to base64-encode the response from Lambda before passing it to API Gateway.
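As an illustration, with API Gateway's Lambda proxy integration the function returns the binary body as a base64 string and flags it as such; every value below is a placeholder:

{
  "statusCode": 200,
  "headers": { "Content-Type": "multipart/related; boundary=myboundary" },
  "body": "LS1teWJvdW5kYXJ5...",
  "isBase64Encoded": true
}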
My REST API occasionally needs to return a 413 'Payload too large' response.
As context: I use AWS with API Gateway and Lambda. Lambda has a maximum payload of 6 MB. Sometimes (less than 0.1% of requests) the payload is greater than 6 MB and my API returns a 413 status.
The way I deal with this is to provide an alternative way to request the data from the API: as a URL pointing to the data stored as a JSON file on S3. The file sits in a bucket with a lifecycle rule that automatically deletes it after a short period.
This works OK, but has the unsatisfying characteristic that a large payload request results in the client making 3 separate calls:
Make a standard request to the API and receive the 413 response
Make a second request to the API for the data stored at an S3 URL. I use an asURL=true parameter in the GET request for this.
Make a third request to retrieve the data from the S3 bucket
An alternative I'm considering is embedding the S3 URL in the 413 response. For example, embedding it in a custom header. This would avoid the need for the second call.
I could also change the approach so that every request is returned as an S3 URL but then 99.9% of the requests would unnecessarily make 2 calls rather than just 1.
Is there a best practice here, or equally, bad practices to avoid?
I would do it the way you said: embed the S3 URL in the 413 response. The responsibility for recovering from the 413 then falls on the client, which has to check for a 413 in the response and call S3. If the consumer is internal, that would be OK; it could be an inconvenience if the consumer is external.
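For example, the 413 could carry the link in a custom header; the header name below is just an illustration, not a standard one:

HTTP/1.1 413 Payload Too Large
X-Large-Payload-Url: https://my-bucket.s3.amazonaws.com/payloads/abc123.json?X-Amz-Signature=...
Content-Length: 0

The client then needs only one extra call, straight to S3, instead of two.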
So Google Storage has this great API for resumable uploads: https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload which I'd like to use to upload a large object in multiple chunks. However, this is done in a stream-processing pipeline where the total number of bytes in the stream is not known in advance.
According to the documentation of the API, you're supposed to use the Content-Range header to tell the Google Storage API that you're done uploading the file, e.g.:
PUT https://www.googleapis.com/upload/storage/v1/b/myBucket/o?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 1024
Content-Range: bytes 1023-2048/2048
[BYTES 1023-2048]
If I'm understanding how this works correctly, that bytes 1023-2048/2048 value of the Content-Range header is how Google Storage determines that you're uploading the last chunk of data and it can successfully finish the resumable upload session after it's done.
In my case however the total stream size is not known in advance, so I need to keep uploading until there's no more data to upload. Is there a way to do this using the Google Storage API? Ideally I'd like some way of manually telling the API "hey I'm done, don't expect any more data from me".
In my case however the total stream size is not known in advance,
In this case you need to send Content-Range: bytes 1023-2048/* in the PUT requests. Note, however, that these chunks must be multiples of 256 KiB:
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload#example_uploading_the_file
so I need to keep uploading until there's no more data to upload. Is there a way to do this using the Google Storage API?
Yes. You send the chunks with bytes NNNNN-MMMMM/*.
Ideally I'd like some way of manually telling the API "hey I'm done, don't expect any more data from me".
You do that by either (a) sending a chunk that is not a multiple of 256KiB, or (b) sending a chunk with bytes NNN-MMM/(MMM+1). That is, the last chunk contains the total size for the upload and indicates that it contains the last byte.
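To make that concrete, the tail of such an upload could look like this (offsets invented for illustration, reusing the upload_id from your example). The first request is an intermediate 256 KiB chunk with an unknown total; the second is the final chunk, which does not have to be a multiple of 256 KiB and states the real total size:

PUT https://www.googleapis.com/upload/storage/v1/b/myBucket/o?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 262144
Content-Range: bytes 0-262143/*

[BYTES 0-262143]

PUT https://www.googleapis.com/upload/storage/v1/b/myBucket/o?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 100000
Content-Range: bytes 262144-362143/362144

[BYTES 262144-362143]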
The documentation you linked states that:
Content-Length. Required unless you are using chunked transfer encoding. Set to the number of bytes in the body of this initial request.
So if you click that link to chunked transfer encoding, the HTTP spec will explain how to send chunks of data until the transfer is complete:
Chunked enables content streams of unknown size to be transferred as a sequence of length-delimited buffers, which enables the sender to retain connection persistence and the recipient to know when it has received the entire message.
It's likely not going to be easy to implement this on your own, so I suggest finding an HTTP client library that knows how to do this for you.
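For instance, libcurl can do the chunk framing for you when the upload size is left unspecified; here is a minimal sketch (the endpoint URL and the in-memory data source are placeholders, not anything from your setup):

// Minimal sketch of a chunked PUT with libcurl: the library emits
// "Transfer-Encoding: chunked" framing because no Content-Length is set.
#include <curl/curl.h>
#include <algorithm>
#include <cstring>
#include <string>

// libcurl calls this whenever it wants more body data; returning 0 ends the upload.
static size_t read_cb(char *buffer, size_t size, size_t nitems, void *userdata) {
    std::string *src = static_cast<std::string *>(userdata);
    size_t n = std::min(src->size(), size * nitems);
    std::memcpy(buffer, src->data(), n);
    src->erase(0, n);
    return n;  // 0 means "no more data": libcurl then sends the final, empty chunk
}

int main() {
    std::string payload = "data produced on the fly...";  // placeholder for the real stream
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    struct curl_slist *headers = nullptr;
    headers = curl_slist_append(headers, "Transfer-Encoding: chunked");

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/upload");  // placeholder endpoint
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);             // issue a PUT fed by the read callback
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_cb);
    curl_easy_setopt(curl, CURLOPT_READDATA, &payload);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    // CURLOPT_INFILESIZE is deliberately not set, so no Content-Length is sent.

    CURLcode res = curl_easy_perform(curl);
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : 1;
}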
I have been trying to perform an AWS S3 REST API call to upload a document to an S3 bucket. The document is in the form of a byte array.
PUT /Test.pdf HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: **********
Content-Type: application/pdf
Content-Length: 5039151
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-date: 20180301T055442Z
When we perform the API call, it returns status 411, i.e. Length Required. We have already added the Content-Length header with the byte array length as its value, but the issue persists. Please help resolve it.
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD is only used with the non-standards-based chunk upload API. This is a custom encoding that allows you to write chunks of data to the wire. This is not the same thing as the Multipart Upload API, and is not the same thing as Transfer-Encoding: chunked (which S3 doesn't support for uploads).
It's not clear why this would result in 411 Length Required but the error suggests that S3 is not happy with the format of the upload.
For a standard PUT upload, x-amz-content-sha256 must be set to the hex-encoded SHA-256 hash of the request body, or the string UNSIGNED-PAYLOAD. The former is recommended, because it provides an integrity check. If for any reason your data were to become corrupted on the wire in a way that TCP failed to detect, S3 would automatically reject the corrupt upload and not create the object.
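For reference, a plain (non-streaming) PUT of the same object would then look roughly like this, with the placeholder below standing in for the real hex-encoded SHA-256 of the 5039151 body bytes (the Authorization signature has to be computed over that same value):

PUT /Test.pdf HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: **********
Content-Type: application/pdf
Content-Length: 5039151
x-amz-content-sha256: <hex-encoded SHA-256 of the request body>
x-amz-date: 20180301T055442Z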
See also https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html
I'm trying to call a web service using the WSClient API from the Play Framework.
The main issue is that I want to transfer huge JSON payloads (more than 2 MB) without exceeding the maximum payload size.
To do so, I would like to compress the request using gzip (with the HTTP header Content-Encoding: gzip). In the documentation, the parameter play.ws.compressionEnabled is mentioned, but it only seems to enable WSResponse compression.
I have tried to manually compress the payload (using a GZipOutputStream) and to set the header Content-Encoding: gzip, but the server throws an io.netty.handler.codec.compression.DecompressionException: Unsupported compression method 191 in the GZIP header.
How can I correctly compress my request?
Thanks in advance
Unfortunately I don't think you can compress the request (it is not supported by Netty, the underlying library). You can find more info in https://github.com/AsyncHttpClient/async-http-client/issues/93 and https://github.com/netty/netty/issues/2132