Does boto2 use http or https to upload files to s3? - amazon-web-services

I noticed that uploading small files to an S3 bucket is very slow: a file of about 100 KB takes 200 ms to upload. Both the bucket and our app are in Oregon; the app is hosted on EC2.
I googled it and found some blogs, e.g. http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/
It mentions that plain HTTP can bring a significant speed gain over HTTPS.
We're using boto 2.45. Does boto use HTTPS or HTTP by default, and is there a parameter to configure this behaviour?
Thanks in advance!

The boto3 client includes a use_ssl parameter:
use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.
Looks like it's time for you to move to boto3!
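For reference, a minimal sketch of a boto3 S3 client created with SSL disabled (the region, bucket and key names are placeholders, and credentials are assumed to come from the environment or instance profile):
import boto3
# Create an S3 client that talks plain HTTP instead of HTTPS
s3 = boto3.client("s3", region_name="us-west-2", use_ssl=False)
# Upload a small local file (placeholder bucket/key)
s3.upload_file("report.csv", "my-bucket", "uploads/report.csv")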

I tried boto3, which has a nice use_ssl parameter in its client constructor. However, it turned out that boto3 is significantly slower than boto2 for our workload; there are already many posts online about this issue.
Finally, I found that boto2 has a similar parameter, is_secure:
from boto.s3.connection import S3Connection
self.s3Conn = S3Connection(config.AWS_ACCESS_KEY_ID, config.AWS_SECRET_KEY, host=config.S3_ENDPOINT, is_secure=False)
Setting is_secure to False saves us about 20 ms. Not bad.

Related

LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413

I have a node/express + serverless backend API which I deploy as a Lambda function.
When I call the API, the request goes through API Gateway to Lambda; the Lambda connects to S3, reads a large bin file, parses it and generates a JSON object as output.
The response JSON object is around 8.55 MB (verified with Postman while running the node/express code locally); the size varies with the bin file size.
When I make an API request, it fails with the following message in CloudWatch:
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413
I can't/don't want to change this pipeline : HTTP API Gateway + Lambda + S3.
What should I do to resolve the issue?
AWS Lambda functions have hard limits on the sizes of the request and response payloads. These limits cannot be increased.
The limits are:
6 MB for synchronous invocations
256 KB for asynchronous invocations
You can find additional information in the official documentation here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
You have a few possible solutions:
use EC2 or ECS/Fargate instead
use the Lambda to parse and transform the bin file into the desired JSON, save that JSON directly to a public S3 bucket, and return to the client only the public URL/URI/file name of the created JSON (a short sketch follows below)
For the last solution, if you don't want to make the JSON file visible to the whole world, you might consider using AWS Amplify in your client and/or AWS Cognito so that only the authorised user can access the file they have just created.
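As a rough sketch of that second option in Python with boto3 (the bucket name, object key and the parse_bin_file helper are placeholders for your existing logic), using a pre-signed URL as one way to avoid a fully public object: the handler writes the generated JSON to S3 and returns only a short URL instead of the 8+ MB payload.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-output-bucket"  # placeholder bucket name

def handler(event, context):
    result = parse_bin_file(event)  # assumed: your existing parsing logic
    key = "results/output.json"     # placeholder object key
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=json.dumps(result).encode("utf-8"),
                  ContentType="application/json")
    # Return a short-lived pre-signed URL instead of the large body
    url = s3.generate_presigned_url("get_object",
                                    Params={"Bucket": BUCKET, "Key": key},
                                    ExpiresIn=3600)
    return {"statusCode": 200, "body": json.dumps({"resultUrl": url})}
The client then fetches the JSON from S3 directly, so the Lambda response stays well under the 6 MB limit.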
As noted in other answers, API Gateway/Lambda has limits on response sizes. From the discussion I read that latency is an additional concern.
With these two requirements, Lambda is mostly out of the question: functions need some time to start up (which can be reduced with provisioned concurrency) and only have normal network connections (whereas EC2/EKS can have enhanced networking).
Given these requirements, it would be better (from an AWS point of view) to move away from Lambda.
Looking further, we could also question the application itself:
Large JSON objects are generated on demand. Why can't these be pre-generated asynchronously and then downloaded from S3 directly? That would give you the best latency and speed and can be coupled with CloudFront.
Why does the JSON need to be so large? Large JSON payloads also have to be parsed on the client side, which requires more CPU. Maybe it can be split and/or compressed?

Bypassing Cloud Run 32mb error via HTTP2 end to end solution

I have an API query that runs during a POST request on one of my views to populate my dashboard page. The response size is ~35 MB, greater than the 32 MB limit set by Cloud Run, and I was wondering how I could bypass this.
My setup is a hypercorn server serving my Django web app as an ASGI app, with a minimum of 2 instances, 1 GB RAM and 2 CPUs per instance. I have run this Docker container locally; I can't reduce the amount of data required and I also don't want to store the data due to costs, so this seems to be the cheapest route. I understand that I could get around this with an HTTP/2 end-to-end solution, but I am unable to do so currently, and I haven't created any additional hypercorn configurations. Any pointers or ideas would be appreciated!
The Cloud Run HTTP response limit is 32 MB and cannot be increased.
One suggestion is to compress the response data. Python has compression libraries such as gzip and zlib, and Django can also compress responses for you:
import gzip

data = b"Lots of content to compress"
cdata = gzip.compress(data)  # compressed bytes
# return the compressed data in the response
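A minimal follow-up sketch of how the compressed bytes could be returned from a Django view (build_dashboard_data() is a hypothetical stand-in for the existing query/serialization):
import gzip
import json
from django.http import HttpResponse

def dashboard(request):
    payload = json.dumps(build_dashboard_data())  # hypothetical helper
    body = gzip.compress(payload.encode("utf-8"))
    response = HttpResponse(body, content_type="application/json")
    response["Content-Encoding"] = "gzip"  # the browser decompresses transparently
    return response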
Cloud Run also supports HTTP/1.1 server-side streaming, which has no response size limit; all you need to do is use chunked transfer encoding.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
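A minimal sketch of that approach in Django (build_dashboard_rows() is a hypothetical generator for the dashboard data; because no Content-Length is set, the response is sent with chunked transfer encoding):
import json
from django.http import StreamingHttpResponse

def dashboard_stream(request):
    def rows():
        # Yield the payload piece by piece instead of building one 35 MB body
        for row in build_dashboard_rows():  # hypothetical generator
            yield json.dumps(row) + "\n"
    # Streamed responses avoid the 32 MB buffered-response limit
    return StreamingHttpResponse(rows(), content_type="application/x-ndjson")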

Does AWS CPP S3 SDK support "Transfer acceleration"

I enabled "Transfer acceleration" on my bucket. But I dont see any improvement in speed of Upload in my C++ application. I have waited for more than 20 minutes that is mentioned in AWS Documentation.
Does the SDK support "Transfer acceleration" by default or is there a run time flag or compiler flag? I did not spot anything in the SDK code.
thanks
Currently, there isn't a configuration option that simply turns on transfer acceleration. You can, however, use the endpoint override in the client configuration to set the accelerated endpoint.
What I did to enable (working) transfer acceleration:
Set "Transfer Acceleration" to Enabled in the bucket configuration in the AWS console.
Add the s3:PutAccelerateConfiguration permission to the IAM user used by my C++ application.
Add the following code to the S3 client configuration (bucket_ is your bucket name; the final URL must match the one shown in the AWS console under "Transfer Acceleration"):
Aws::Client::ClientConfiguration config;
/* other configuration options */
config.endpointOverride = bucket_ + ".s3-accelerate.amazonaws.com";
Request acceleration on the bucket before the transfer (via PutBucketAccelerateConfiguration):
auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("Uploader",
    Aws::Auth::AWSCredentials(id_, key_), config);
Aws::S3::Model::PutBucketAccelerateConfigurationRequest bucket_accel;
bucket_accel.SetAccelerateConfiguration(
    Aws::S3::Model::AccelerateConfiguration().WithStatus(
        Aws::S3::Model::BucketAccelerateStatus::Enabled));
bucket_accel.SetBucket(bucket_);
s3Client->PutBucketAccelerateConfiguration(bucket_accel);
You can check in the detailed AWS SDK logs that your code is using the accelerated endpoint, and you can also verify that before the transfer starts there is a call to /?accelerate.
What worked for me:
Enabling S3 Transfer Acceleration within AWS console
When configuring the client, use only the accelerated endpoint (without the bucket name):
clientConfig->endpointOverride = "s3-accelerate.amazonaws.com";
@gabry - your solution was extremely close. I think the reason it wasn't working for me was perhaps due to SDK changes since it was originally posted, as the difference is relatively small, or maybe because I am constructing put object templates for requests used with the transfer manager.
Looking through the logs (Debug level), the SDK automatically prepends the bucket passed to TransferManager::UploadFile() to the overridden endpoint. With the bucket also included in the override, I was getting unresolved-host errors because the requested host looked like:
[DEBUG] host: myBucket.myBucket.s3-accelerate.amazonaws.com
This way I could keep the same S3_BUCKET macro name and only apply the endpoint override selectively when instantiating a new configuration for uploads, e.g.
<<
...
auto putTemplate = new Aws::S3::Model::PutObjectRequest();
putTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->putObjectTemplate = *putTemplate;
auto multiTemplate = new Aws::S3::Model::CreateMultipartUploadRequest();
multiTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->createMultipartUploadTemplate = *multiTemplate;
transferMgr = Aws::Transfer::TransferManager::Create(*transferConfig);
auto transferHandle = transferMgr->UploadFile(localFile, S3_BUCKET, s3File);
transferMgr = Aws::Transfer::TransferManager::Create(*transferConfig);
...
>>

Possible to set Accept-Ranges header on Amazon S3

I have an application with very short-lived (5 s) access tokens and a paranoid client, and some of their users access the S3-stored files over mobile connections, so the lag can be quite high.
I've noticed that Amazon forcefully sends the Accept-Ranges header on all requests, and I'd like to disable that for the files in question, so that the client always downloads the entire file the first time around instead of downloading it in chunks.
The main offender I've noticed for this is Chrome's built-in PDF viewer. It starts viewing the PDF and gets a 200 response, then reconnects with a range request, gets a 206, and starts downloading the file in two chunks. If Chrome is too slow to start downloading all the chunks before the access token expires, it keeps spamming requests at S3 (600+ requests by the time I closed the window).
I've tried setting the header in the S3 console, but while it says it saved successfully, the change gets cleared instantly. I also tried to set the header via the signed request, as you can for Content-Disposition for example, but S3 ignored the passed-in header.
Or is there any other way to force a client to download the entire file at once?
It seems it's not possible. I made the token expire later in the hope it would take care of most cases.
But in case that doesn't make the client happy, I will try to proxy the files locally and strip the headers I don't like, following this guide: https://coderwall.com/p/rlguog.

How to remove RequestTimeTooSkewed check from Amazon?

I have a Java 7 "agent" program running on several client machines (mostly Windows XP). My "agent" uploads client files to Amazon S3 and often I get this error:
RequestTimeTooSkewed
I know this happens because the client computer's system time differs too much from Amazon's. Here's my problem: I can't control the client's system time! So I don't want Amazon to care about time differences.
I've heard about JetS3t, but I'm hoping not to have to resort to yet another tool (the agent's footprint must remain small).
Any ideas how to remove this check and get rid of this pesky error?
Error detail:
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 59C9614D15006F23, AWS Error Code: RequestTimeTooSkewed, AWS Error Message: The difference between the request time and the current time is too large., S3 Extended Request ID: v1pGBm3ed2J9dZ3sG/3aDrG3DUGSlt3Ac+9nduK2slih2wyaAnc1n5Jrt5TkRzlV
The error is coming from the S3 service, not from the client, so there really isn't anything you can do other than correct the clock on the client. That check is done on the service side to help detect and prevent replay attacks, so it's an important part of the overall security of the service.
Trying a different client-side SDK won't help.