Error: EntityTooLarge Status 400 AmazonAWS - amazon-web-services

I'm trying to upload a video from a Cordova app to an Amazon AWS S3 bucket from an Android/iPhone. But it's failing sometimes, giving sporadic reports of this error from AWS bucket:
http_status:400,
<Code>EntityTooLarge</Code>
Some of the files are tiny, some around 300mb or so.
What can I do to resolve this at the AWS end?

The 400 Bad Request error is sometimes used by S3 to indicate conditions that make the request in some sense invalid -- not just syntactically invalid, which is the traditional sense of 400 errors.
EntityTooLarge
Your proposed upload exceeds the maximum allowed object size.
400 Bad Request
http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
Note the word "proposed." This appears to be a reaction to the Content-Length request header you are sending. You may want to examine that. Perhaps the header is inconsistent with the actual size of the file, or the file is being detected as larger than it actually is.
Note that while the maximum object size in S3 is 5 TiB, the maximum upload size is 5 GiB. (Objects larger than 5 GiB have to be uploaded in multiple parts.)

413 errors occur when the request body is larger than the server is configured to allow. I believe its not the error which AWS S3 is throwing because they support 5 TB size of object.
If you are first accepting this video in your app and from there you are making request to amazon S3, then your server is not configure to accept the large entities in request.
Refer -set-entity-size for different servers. if your server is not listed here, then you need to figure out how to increase entity size for your server.

Related

LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413

I have node/express + serverless backend api which I deploy to Lambda function.
When I call an api, request goes to API gateway to lambda, lambda connects to S3, reads a large bin file, parses it and generates an output in JSON object.
The response JSON object size is around 8.55 MB (I verified using postman, running node/express code locally). Size can vary as per bin file size.
When I make an api request, it fails with the following msg in cloudwatch,
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413
I can't/don't want to change this pipeline : HTTP API Gateway + Lambda + S3.
What should I do to resolve the issue ?
the AWS lambda functions have hard limits for the sizes of the request and of the response payloads. These limits cannot be increased.
The limits are:
6MB for Synchronous requests
256KB for Asynchronous requests
You can find additional information in the official documentation here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
You might have different solutions:
use EC2, ECS/Fargate
use the lambda to parse and transform the bin file into the desired JSON. Then save this JSON directly in an S3 public bucket. In the lambda response, you might return the client the public URL/URI/FileName of the created JSON.
For the last solution, if you don't want to make the JSON file visible to whole the world, you might consider using AWS Amplify in your client or/and AWS Cognito in order to give only an authorised user access to the file that he has just created.
As noted in other questions, API Gateway/Lambda has limits on on response sizes. From the discussion I read that latency is a concern additionally.
With these two requirements Lambda are mostly out of the question, as they need some time to start up (which can be lowered with provisioned concurrency) and do only have normal network connections (whereas EC2,EKS can have enhanced networking).
With this requirements it would be better (from AWS Point Of View) to move away from Lambda.
Looking further we could also question the application itself:
Large JSON objects need to be generated on demand. Why can't these be pre-generated asynchronously and then downloaded from S3 directly? Which would give you the best latency and speed and can be coupled with CloudFront
Why need the JSON be so large? Large JSONs also need to be parsed on the client side requiring more CPU. Maybe it can be split and/or compressed?

Do Amazon CloudFront or Azure CDN support dynamic compression for HTTP range requests?

AWS CloudFront and Azure CDN can dynamically compress files under certain circumstances. But do they also support dynamic compression for HTTP range requests?
I couldn't find any hints in the documentations only on the Google Cloud Storage docs.
Azure:
Range requests may be compressed into different sizes. Azure Front Door requires the content-length values to be the same for any GET HTTP request. If clients send byte range requests with the accept-encoding header that leads to the Origin responding with different content lengths, then Azure Front Door will return a 503 error. You can either disable compression on Origin/Azure Front Door or create a Rules Set rule to remove accept-encoding from the request for byte range requests.
See: https://learn.microsoft.com/en-us/azure/frontdoor/standard-premium/how-to-compression
AWS:
HTTP status code of the response
CloudFront compresses objects only when the HTTP status code of the response is 200, 403, or 404.
--> Range-Request has status code 206
See:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/206
• Yes, Azure CDN also supports dynamic compression for HTTP range requests wherein it is known as ‘object chunking’. You can describe object chunking as dividing the file to be retrieved from the origin server/resource into smaller chunks of 8 MB. When a large file is requested, the CDN retrieves smaller pieces of the file from the origin. After the CDN POP server receives a full or byte-range file request, the CDN edge server requests the file from the origin in chunks of 8 MB.
• After the chunk arrives at the CDN edge, it's cached and immediately served to the user. The CDN then prefetches the next chunk in parallel. This prefetch ensures that the content stays one chunk ahead of the user, which reduces latency. This process continues until the entire file is downloaded (if requested), all byte ranges are available (if requested), or the client terminates the connection.
Also, this capability of object chunking relies on the ability of the origin server to support byte-range requests; if the origin server doesn't support byte-range requests, requests to download data greater than 8mb size will fail.
Please find the below link for more details regarding the above: -
https://learn.microsoft.com/en-us/azure/cdn/cdn-large-file-optimization#object-chunking
Also, find the below link for more clarification on the types of compression and the nature of compression for Azure CDN profiles that are supported: -
https://learn.microsoft.com/en-us/azure/cdn/cdn-improve-performance#azure-cdn-standard-from-microsoft-profiles
Some tests have shown when dynamic compression is enabled in AWS CloudFront the range support is disabled. So Range and If-Range headers are removed from all request.

Elastic search 403 Request throttled due to too many requests /_bulk

I am trying to sync 1 million record to ES, and I am doing it using bulk API in batch of 2k.
But after inserting around 25k-32k, elastic search is giving following exception.
Unable to parse response body: org.elasticsearch.ElasticsearchStatusException
ElasticsearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [**********], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk]; nested: ResponseException[method [POST], host [************], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk];
I am using aws elastic search.
I think, I need to implement wait strategy to handle it, something like keep checking es status and call bulk insert if status all of ES okay.
But not sure how to implement it? Does ES offers anything pre-build for it?
Or Anything better way to handle this?
Thanks in advance.
Update:
I am using AWS elastic search version 6.8
Thanks #dravit for including my previous SO answer in the comment, after following the comments it seems OP wants to improve the performance of bulk indexing and want exponential backoff, which i don't think Elasticsearch provides out of the box.
I see that you are putting a pause of 1 second after every second which will not work in all the cases, and if you have large number of batches and documents to be indexed, for sure it will take a lot of time. There are few more suggestions from my side to improve the performance.
Follow my tips to improve the reindex speed in Elasticsearch and see what all things listed here is applicable and doing them improves speed by what factor.
Find a batching strategy which best suits to your environment, I am not sure but this article from #spinscale who is the developer of java high level rest client might help or you can ask a question on https://discuss.elastic.co/, I remembered he shared a very good batching strategy in one of his webinar but couldn't find the link of it.
Notice various ES metrics apart from bulk threadpool and queue size, and see if your ES still has capacity can you increase the queue size and increase the rate by which you can send requests to ES.
Check the error handling guide here
If you receive persistent 403 Request throttled due to too many requests or 429 Too Many Requests errors, consider scaling vertically. Amazon Elasticsearch Service throttles requests if the payload would cause memory usage to exceed the maximum size of the Java heap.
Scale your application vertically or increase delay between requests.

Possible to set Accept-Ranges header on Amazon S3

I have an application with very short-lived(5s) access tokens, paranoid client, and some of their users are accessing the S3 stored files using mobile connections so the lag can be quite high.
I've noticed that Amazon forcefully sends out the Accept-Ranges header on all requests, and I'd like to disable that for the files in question. So it would always download the entire file the first time around instead of downloading it chunks.
The main offender I've noticed for this is Chromes built-in PDF viewer. It'll start viewing the PDF, get a 200 response. Then it'll reconnect with a 206 header and start downloading the file in two chunks. If Chrome is too slow to start the download of all chunks before the access token expires it'll keep spamming requests towards S3 (600+ requests when I closed the window).
I've tried setting the header by changing it in the S3 console but while it says it saved it successfully it gets cleared instantly. I also tried to set the header with the signed request, as you can do for Content-Disposition for example, but S3 ignored the passed in header.
Or is there any other way to force a client to download the entire file at once?
Seems like it's not possible. Made the token expire later in hope it would take care of most cases.
But in case it doesn't make the client happy I will try and proxy it locally and remove all headers I don't like. Following this guide, https://coderwall.com/p/rlguog.

How to remove RequestTimeTooSkewed check from Amazon?

I have a Java 7 "agent" program running on several client machines (mostly Windows XP). My "agent" uploads client files to Amazon S3 and often I get this error:
RequestTimeTooSkewed
I know this is because the client's computer system time difference is too large compared to Amazon's. Here's my problem: I can't control the client's computer (system) time! So, I don't want Amazon to care about time differences.
I heard about jets3t, but I'm hoping not having to resort to yet another tool (agent footprint must remain small).
Any ideas how to remove this check and get rid of this pesky error?
Error detail:
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 59C9614D15006F23, AWS Error Code: RequestTimeTooSkewed, AWS Error Message: The difference between the request time and the current time is too large., S3 Extended Request ID: v1pGBm3ed2J9dZ3sG/3aDrG3DUGSlt3Ac+9nduK2slih2wyaAnc1n5Jrt5TkRzlV
The error is coming from the S3 service, not from the client, so there really isn't anything you can do other than correct the clock on the client. That check is being done on the service to help detect and prevent replay attacks so it's an important part of the overall security of the service.
Trying a different client-side SDK won't help.