gzip request from Play Framework's WSClient - web-services

I'm trying to call a web service using the WSClient API from the Play Framework.
The main issue is that I want to transfer huge JSON payloads (more than 2 MB) without exceeding the server's maximum payload size.
To do so, I would like to compress the request using gzip (with the HTTP header Content-Encoding: gzip). The documentation mentions the parameter play.ws.compressionEnabled, but it only seems to enable compression of the WSResponse.
I have tried to manually compress the payload (using a GZIPOutputStream) and to set the header Content-Encoding: gzip, but the server throws an io.netty.handler.codec.compression.DecompressionException: Unsupported compression method 191 in the GZIP header.
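For reference, the attempt looks roughly like this (a minimal sketch using the JDK's java.net.http.HttpClient instead of WSClient, with a placeholder URL). One common pitfall with this approach is letting the compressed bytes go through a String/charset conversion, which corrupts the GZIP header, so the sketch keeps them as raw bytes throughout:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipRequestSketch {

    // Compress the JSON payload with GZIP; closing the stream flushes the GZIP trailer.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(plain);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"example\":\"huge payload\"}";             // stand-in for the real 2 MB+ body
        byte[] compressed = gzip(json.getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/api")) // placeholder URL
                .header("Content-Type", "application/json")
                .header("Content-Encoding", "gzip")                  // declare that the body is gzipped
                .POST(HttpRequest.BodyPublishers.ofByteArray(compressed)) // raw bytes, no String round-trip
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}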
How can I correctly compress my request?
Thanks in advance

Unfortunately, I don't think you can compress the request (it is not supported by Netty, the underlying library). You can find more info at https://github.com/AsyncHttpClient/async-http-client/issues/93 and https://github.com/netty/netty/issues/2132

Related

Do Amazon CloudFront or Azure CDN support dynamic compression for HTTP range requests?

AWS CloudFront and Azure CDN can dynamically compress files under certain circumstances. But do they also support dynamic compression for HTTP range requests?
I couldn't find any hints in their documentation, only in the Google Cloud Storage docs.
Azure:
Range requests may be compressed into different sizes. Azure Front Door requires the content-length values to be the same for any GET HTTP request. If clients send byte range requests with the accept-encoding header that leads to the Origin responding with different content lengths, then Azure Front Door will return a 503 error. You can either disable compression on Origin/Azure Front Door or create a Rules Set rule to remove accept-encoding from the request for byte range requests.
See: https://learn.microsoft.com/en-us/azure/frontdoor/standard-premium/how-to-compression
AWS:
HTTP status code of the response
CloudFront compresses objects only when the HTTP status code of the response is 200, 403, or 404.
--> A range request's response has status code 206
See:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/206
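For reference, the kind of request I'm asking about looks like this (a sketch with a placeholder URL); a range-aware origin or CDN answers it with 206 Partial Content:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeRequestSketch {
    public static void main(String[] args) throws Exception {
        // Ask for the first 1024 bytes of the object and for a compressed response at the same time.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/large-file.json")) // placeholder URL
                .header("Range", "bytes=0-1023")
                .header("Accept-Encoding", "gzip")
                .GET()
                .build();

        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());

        // 206 = Partial Content; per the CloudFront docs above, such responses are not compressed by CloudFront.
        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue("Content-Encoding").orElse("identity"));
    }
}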
• Yes, Azure CDN also supports dynamic compression for HTTP range requests; there it is known as 'object chunking'. You can describe object chunking as dividing the file to be retrieved from the origin server into smaller chunks of 8 MB. When a large file is requested, the CDN retrieves smaller pieces of the file from the origin. After the CDN POP server receives a full or byte-range file request, the CDN edge server requests the file from the origin in chunks of 8 MB.
• After the chunk arrives at the CDN edge, it's cached and immediately served to the user. The CDN then prefetches the next chunk in parallel. This prefetch ensures that the content stays one chunk ahead of the user, which reduces latency. This process continues until the entire file is downloaded (if requested), all byte ranges are available (if requested), or the client terminates the connection.
Also, this capability of object chunking relies on the ability of the origin server to support byte-range requests; if the origin server doesn't support byte-range requests, requests to download data greater than 8 MB in size will fail.
See the following link for more details:
https://learn.microsoft.com/en-us/azure/cdn/cdn-large-file-optimization#object-chunking
Also see the following link for more details on the types of compression supported by Azure CDN profiles:
https://learn.microsoft.com/en-us/azure/cdn/cdn-improve-performance#azure-cdn-standard-from-microsoft-profiles
Some tests have shown that when dynamic compression is enabled in AWS CloudFront, range support is disabled, so the Range and If-Range headers are removed from all requests.

Wrap JPEG image in a multipart header using AWS Lambda@Edge

I have been trying to read the AWS Lambda@Edge documentation, but I still cannot figure out if the following option is possible.
Assume I have an object (image.jpg, with size 32922 bytes) and I have set up AWS static website hosting. So I can retrieve:
$ GET http://example.com/image.jpg
I would like to be able to also expose:
$ GET http://example.com/image
Where the response body would be a multipart/related file (for example). Something like this:
--myboundary
Content-Type: image/jpeg;
Content-Length: 32922
MIME-Version: 1.0
<actual binary jpeg data from 'image.jpg'>
--myboundary
Is this something supported out of the box by the AWS Lambda@Edge API, or should I use another solution to create such a response? In particular, it seems that the response only deals with text or base64 (I would need binary in my case).
I was finally able to find complete documentation. I eventually stumbled upon:
API Gateway - POST multipart/form-data
which refers to:
Enabling binary support using the API Gateway console
The above documentation specifies the steps to handle binary data. Note that you need to base64-encode the response from Lambda to pass it to API Gateway.
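As an illustration (plain Java, without the Lambda runtime or API Gateway event types; the file name and boundary are placeholders), assembling the multipart/related body and base64-encoding it for API Gateway could look like this:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class MultipartResponseSketch {

    // Wrap the raw JPEG bytes in a multipart/related body with a single part.
    static byte[] buildMultipartBody(byte[] jpeg, String boundary) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String partHeaders = "--" + boundary + "\r\n"
                + "Content-Type: image/jpeg\r\n"
                + "Content-Length: " + jpeg.length + "\r\n"
                + "MIME-Version: 1.0\r\n"
                + "\r\n";
        body.write(partHeaders.getBytes(StandardCharsets.US_ASCII));
        body.write(jpeg);                                             // binary part content, untouched
        body.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.US_ASCII)); // closing boundary
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] jpeg = Files.readAllBytes(Paths.get("image.jpg"));     // the 32922-byte object from the question
        byte[] multipart = buildMultipartBody(jpeg, "myboundary");

        // API Gateway's binary support expects the Lambda response body as a base64 string.
        String base64Body = Base64.getEncoder().encodeToString(multipart);
        System.out.println(base64Body.substring(0, 60) + "...");
    }
}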

How to upload an object to AWS S3 with "Transfer-Encoding: chunked"

I am using the AWS SDK for C++ to upload data to an S3 bucket. I am using the method PutObject() of the class Aws::S3::S3Client to upload the data.
However, before uploading the data, I need to know the content length of the data I am uploading. I either need to set the content length in the request (the request is of type Aws::S3::Model::PutObjectRequest), or the SDK will try to determine the size of the body by itself (it does that by seeking to the end of the payload data to read its size).
Either way, in my case I don't yet know the content length of the data I am uploading. I am working on a server that compresses data. I would have to wait for the compression to end to know the final size of the compressed data, and I don't want to wait for that. Instead, I want to start uploading to S3 as the compressed data is being produced. The way that can be done is to use "Transfer-Encoding: chunked" when sending the HTTP request to the S3 server.
Is there a way to use Aws::S3::S3Client to upload the data with "Transfer-Encoding: chunked"?
I've downloaded the source of the AWS SDK for C++ and tweaked it a bit to force sending "Transfer-Encoding: chunked" in the HTTP request, but the S3 server returned HTTP error code 501 NotImplemented:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotImplemented</Code>
<Message>A header you provided implies functionality that is not implemented</Message>
<Header>Transfer-Encoding</Header>
<RequestId>8F55B09D484DD66C</RequestId>
<HostId>gZk6zaPcObsTfclz0zXvKGtPT5udzDKigIrm7laD3csG30vhx3pa0eFFS8nh6t9k7XkDeJRm9Z4=</HostId>
</Error>
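For reference, the same kind of chunked upload can be produced outside the SDK with plain Java (a sketch only: request signing is omitted and the bucket URL is a placeholder, so this merely shows how a client streams a body of unknown length with "Transfer-Encoding: chunked"):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ChunkedUploadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; a real request would also need SigV4 authentication headers.
        URL url = new URL("https://mybucket.s3.amazonaws.com/compressed.bin");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);

        // A chunk length <= 0 means "use the default chunk size"; this makes the client send
        // "Transfer-Encoding: chunked" instead of a Content-Length header.
        connection.setChunkedStreamingMode(0);

        try (OutputStream out = connection.getOutputStream()) {
            // Write data as it becomes available; the total size is never declared.
            for (int i = 0; i < 10; i++) {
                out.write(("compressed block " + i + "\n").getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        }
        System.out.println(connection.getResponseCode()); // see how the server reacts to the chunked body
    }
}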

Streaming upload to Google Storage API when the final stream size is not known

So Google Storage has this great API for resumable uploads: https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload which I'd like to utilize to upload a large object in multiple chunks. However, this is done in a stream processing pipeline where the total number of bytes in the stream is not known in advance.
According to the documentation of the API, you're supposed to use the Content-Range header to tell the Google Storage API that you're done uploading the file, e.g.:
PUT https://www.googleapis.com/upload/storage/v1/b/myBucket/o?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 1024
Content-Range: bytes 1024-2047/2048
[BYTES 1024-2047]
If I'm understanding how this works correctly, that bytes 1024-2047/2048 value of the Content-Range header is how Google Storage determines that you're uploading the last chunk of data, so it can successfully finish the resumable upload session once it's done.
In my case however the total stream size is not known in advance, so I need to keep uploading until there's no more data to upload. Is there a way to do this using the Google Storage API? Ideally I'd like some way of manually telling the API "hey I'm done, don't expect any more data from me".
In my case however the total stream size is not known in advance,
In this case you need to send Content-Range: bytes 1024-2047/* in the PUT requests. Note, however, that these chunks must be multiples of 256 KiB:
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload#example_uploading_the_file
so I need to keep uploading until there's no more data to upload. Is there a way to do this using the Google Storage API?
Yes. You send the chunks with bytes NNNNN-MMMMM/*.
Ideally I'd like some way of manually telling the API "hey I'm done, don't expect any more data from me".
You do that by either (a) sending a chunk whose size is not a multiple of 256 KiB, or (b) sending a chunk with bytes NNN-MMM/(MMM+1). That is, the last chunk contains the total size of the upload and indicates that it contains the last byte.
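Putting that together, a minimal sketch of the upload loop (assuming the resumable session URI has already been obtained from the initial request, using the JDK HttpClient purely for illustration, and assuming a non-empty stream) could look like this:

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ResumableUploadSketch {

    private static final int CHUNK = 256 * 1024;      // non-final chunks must be a multiple of 256 KiB

    // sessionUri is the resumable session URL (the uploadType=resumable&upload_id=... URL above).
    static void upload(InputStream data, String sessionUri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        long offset = 0;
        byte[] current = data.readNBytes(CHUNK);
        while (true) {
            byte[] next = data.readNBytes(CHUNK);     // read ahead so we know whether this is the last chunk
            boolean last = next.length == 0;

            // Intermediate chunks use "bytes start-end/*"; the final chunk replaces "*" with the
            // total size, which is what tells the API that the upload is complete.
            String total = last ? String.valueOf(offset + current.length) : "*";
            String contentRange = "bytes " + offset + "-" + (offset + current.length - 1) + "/" + total;

            HttpRequest request = HttpRequest.newBuilder(URI.create(sessionUri))
                    .header("Content-Range", contentRange)
                    .PUT(HttpRequest.BodyPublishers.ofByteArray(current))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(contentRange + " -> " + response.statusCode()); // typically 308 until the final chunk

            if (last) {
                return;
            }
            offset += current.length;
            current = next;
        }
    }
}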
The documentation you linked states that:
Content-Length. Required unless you are using chunked transfer encoding. Set to the number of bytes in the body of this initial request.
So if you click that link to chunked transfer encoding, the HTTP spec will explain how to send chunks of data until the transfer is complete:
Chunked enables content streams of unknown size to be transferred as a
sequence of length-delimited buffers, which enables the sender to
retain connection persistence and the recipient to know when it has
received the entire message.
It's likely not going to be easy to implement this on your own, so I suggest finding an HTTP client library that knows how to do this for you.
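For illustration, with the JDK HttpClient this amounts to supplying the body as an InputStream: the body publisher does not declare a Content-Length, so over HTTP/1.1 the request is sent with chunked transfer encoding (the upload URL below is a placeholder):

import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChunkedClientSketch {
    public static void main(String[] args) throws Exception {
        // A pipe: one thread produces data of unknown total size, the HTTP client consumes it as it arrives.
        PipedOutputStream producer = new PipedOutputStream();
        PipedInputStream consumer = new PipedInputStream(producer);

        new Thread(() -> {
            try (producer) {
                for (int i = 0; i < 100; i++) {
                    producer.write(("data block " + i + "\n").getBytes());
                }
            } catch (Exception ignored) {
            }
        }).start();

        // ofInputStream reports an unknown content length, so the body goes out chunked until the stream ends.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/upload")) // placeholder URL
                .PUT(HttpRequest.BodyPublishers.ofInputStream(() -> consumer))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}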

AWS S3 upload API call returning 411 status

I have been trying to perform an AWS S3 REST API call to upload a document to an S3 bucket. The document is in the form of a byte array.
PUT /Test.pdf HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: **********
Content-Type: application/pdf
Content-Length: 5039151
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-date: 20180301T055442Z
When we perform the API call, it returns response status 411, i.e. Length Required. We have already added the Content-Length header with the byte array length as its value, but the issue persists. Please help resolve the issue.
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD is only used with the non-standards-based chunk upload API. This is a custom encoding that allows you to write chunks of data to the wire. This is not the same thing as the Multipart Upload API, and is not the same thing as Transfer-Encoding: chunked (which S3 doesn't support for uploads).
It's not clear why this would result in 411 Length Required, but the error suggests that S3 is not happy with the format of the upload.
For a standard PUT upload, x-amz-content-sha256 must be set to the hex-encoded SHA-256 hash of the request body, or the string UNSIGNED-PAYLOAD. The former is recommended, because it provides an integrity check. If for any reason your data were to become corrupted on the wire in a way that TCP failed to detect, S3 would automatically reject the corrupt upload and not create the object.
See also https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html
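As a sketch, the hex-encoded payload hash can be computed like this (the rest of the SigV4 signing process is not shown, and the body below is just a stand-in for the real byte array):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PayloadHashSketch {

    // Hex-encoded SHA-256 of the request body: the value expected in x-amz-content-sha256
    // for a standard (non-streaming) PUT.
    static String payloadHash(byte[] body) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] documentBytes = "stand-in for the PDF byte array".getBytes(StandardCharsets.UTF_8);
        System.out.println("x-amz-content-sha256: " + payloadHash(documentBytes));
        System.out.println("Content-Length: " + documentBytes.length);
    }
}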