Bypassing Cloud Run 32mb error via HTTP2 end to end solution - django

I have an api query that runs during a post request on one of my views to populate my dashboard page. I know the response size is ~35mb (greater than the 32mb limits set by cloud run). I was wondering how I could by pass this.
My configuration is set via a hypercorn server and serving my django web app as an asgi app. I have 2 minimum instances, 1gb ram, 2 cpus per instance. I have run this docker container locally and can't bypass the amount of data required and also do not want to store the data due to costs. This seems to be the cheapest route. Any pointers or ideas would be helpful. I understand that I can bypass this via http2 end to end solution but I am unable to do so currently. I haven't created any additional hypecorn configurations. Any help appreciated!

The Cloud Run HTTP response limit is 32 MB and cannot be increased.
One suggestion is to compress the response data. Django has compression libraries for Python or just use zlib.
import gzip
data = b"Lots of content to compress"
cdata = gzip.compress(s_in)
# return compressed data in response

Cloud Run supports HTTP/1.1 server side streaming, which has unlimited response size. All you need to do is use chunked transfer encoding.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding

Related

Do Amazon CloudFront or Azure CDN support dynamic compression for HTTP range requests?

AWS CloudFront and Azure CDN can dynamically compress files under certain circumstances. But do they also support dynamic compression for HTTP range requests?
I couldn't find any hints in the documentations only on the Google Cloud Storage docs.
Azure:
Range requests may be compressed into different sizes. Azure Front Door requires the content-length values to be the same for any GET HTTP request. If clients send byte range requests with the accept-encoding header that leads to the Origin responding with different content lengths, then Azure Front Door will return a 503 error. You can either disable compression on Origin/Azure Front Door or create a Rules Set rule to remove accept-encoding from the request for byte range requests.
See: https://learn.microsoft.com/en-us/azure/frontdoor/standard-premium/how-to-compression
AWS:
HTTP status code of the response
CloudFront compresses objects only when the HTTP status code of the response is 200, 403, or 404.
--> Range-Request has status code 206
See:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/206
• Yes, Azure CDN also supports dynamic compression for HTTP range requests wherein it is known as ‘object chunking’. You can describe object chunking as dividing the file to be retrieved from the origin server/resource into smaller chunks of 8 MB. When a large file is requested, the CDN retrieves smaller pieces of the file from the origin. After the CDN POP server receives a full or byte-range file request, the CDN edge server requests the file from the origin in chunks of 8 MB.
• After the chunk arrives at the CDN edge, it's cached and immediately served to the user. The CDN then prefetches the next chunk in parallel. This prefetch ensures that the content stays one chunk ahead of the user, which reduces latency. This process continues until the entire file is downloaded (if requested), all byte ranges are available (if requested), or the client terminates the connection.
Also, this capability of object chunking relies on the ability of the origin server to support byte-range requests; if the origin server doesn't support byte-range requests, requests to download data greater than 8mb size will fail.
Please find the below link for more details regarding the above: -
https://learn.microsoft.com/en-us/azure/cdn/cdn-large-file-optimization#object-chunking
Also, find the below link for more clarification on the types of compression and the nature of compression for Azure CDN profiles that are supported: -
https://learn.microsoft.com/en-us/azure/cdn/cdn-improve-performance#azure-cdn-standard-from-microsoft-profiles
Some tests have shown when dynamic compression is enabled in AWS CloudFront the range support is disabled. So Range and If-Range headers are removed from all request.

Kastrel tuning for large (>1MB) response json

I have .net core MVC API implementation. In my controller I try to query for 800 records from DB. In result my response body size is abound 6MB. In that case response time is over 6s. My service is in AWS cloud.
I made several tests to make diagnostic of service. In all these scenarios I still ready 800 records from DB. Here is list of my experiments:
Return only 10 records - my response time were under 800ms always and size of response body 20kB.
Return only 100 records - my response time were over 800ms but not timeouts and size of response body 145kB
Try to use my custom json serialization in controller await JsonSerializer.SerializeAsync(HttpContext.Response.Body, limitedResult); - a bit better result but like 10% only
Return only 850 records - my response time were over 6s but and size of response body 6MB
In service I don't have problem with memory or with restart service.
Looks like for Kastrel the problem is to serve large response data.
My objections are connected with buffers I/O which for large response will use disk what can affect performance of AWS docker image.
Question is how to optimize Kastrel to serve large response size?
UPDATE:
I enabled zip compression on server side. My files are compressed quite good because of json format. But result is exact the SAME. Network bandwidth is not a problem. So looks like between my controller and compression is bottleneck. Any suggestion how to configure .net core service to handle large response (>1 MB)?
Since you are already sure that network is not an issue - this mostly points to time being spent in serialization. You can try running the application on local machine and using a profiler such as PerfView to see where the time is spent most for big json.

How can my Heroku Flask web application support N users concurrently downloading an image file?

I am working on a Flask web application using Heroku. As part of the application, users can request to download an image from the server. That calls a function which has to then retrieve multiple images from my cloud storage (about 500 KB in total), apply some formatting, and return a single image (about 60 KB). It looks something like this:
#app.route('/download_image', methods=['POST'])
def download_image():
# Retrieve about 500 KB of images from cloud storage
base_images = retrieve_base_images(request.form)
# Apply image formatting into a single image
formatted_image = format_images(base_images)
# Return image of about 60 KB for download
formatted_image_file = io.BytesIO()
formatted_image.save(formatted_image_file, format='JPEG')
formatted_image_data = formatted_image_file.getvalue()
return Response(formatted_image_data,
mimetype='image/jpeg',
headers={'Content-Disposition': 'attachment;filename=download.jpg'})
My Procfile is
web: gunicorn my_app:app
How can I design/configure this to support N concurrent users? Let's say, for example, I want to make sure my application can support 100 different users all requesting to download an image at the same time. With several moving parts, I am unsure how to even go about doing this.
Also, if someone requests a download but then loses internet connection before their download is complete, would this cause some sort of lock that could endlessly stall, or would that thread/process automatically timeout after a short period and therefore be handled smoothly?
I currently have 1 dyno (on the Heroku free plan). I am willing to add more dynos if needed.
Run multiple Gunicorn workers:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
…
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
Note that Heroku sets a default WEB_CONCURRENCY for you based on your dyno size. You can probably handle a small number of concurrent requests right now.
However, you're not going to get anywhere close to 100 on a free dyno. This section appears between the previous two in the documentation:
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a free, hobby or standard-1x dyno. Your application may allow for a variation of this, depending on your application’s specific memory requirements.
Even if your application is very lightweight you probably won't be able to go above 6 workers on a single small dyno. Adding more dynos and / or increasing the number of dynos you run will be required.
Do you really need to support 100 concurrent requests? If you have four workers going, four users' requests can be served at the same time. If a fifth makes a request, that request just won't get responded to until one of the workers frees up. That's usually reasonable.
If your request takes an unreasonable amount of time to complete you have a few options besides adding more workers:
Can you cache the generated images?
Can you return a response immediately, create the images in a background job, and then notify the user that the images are ready? With some fancy front-end work this can be fairly transparent to the end user.
The right solution will depend on your specific use case. Good luck!

How do i check if cache is working on aws s3

How do I check if cacheControl is working on aws s3 file upload?
'CacheControl' => 'max-age=2592000',
In Google Chrome, go to inspect element and keep the network tab open. Then request the file in browser using the http url. You will be able to see whether you get it from the server or cache looking at the size column. It will show as (from memory cache), (from disk cache) or the file size if its received from server. Also you can observe 'Not modified' when its received from a cache.

Does boto2 use http or https to upload files to s3?

I noticed that uploading small files to S3 bucket is very slow. For a file with size of 100KB, it takes 200ms to upload. Both the bucket and our app are in Oregon. App is hosted on EC2.
I googled it and found some blogs; e.g. http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/
It's mentioned that http can bring much speed gain than https.
We're using boto 2.45; I'm wondering whether both uses https or http by default? Or is there any param to configure this behavior in boto?
Thanks in advance!
The boto3 client includes a use_ssl parameter:
use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.
Looks like it's time for you to move to boto3!
I tried boto3, which has a nice parameter "use_ssl" in connection constructor. However, it turned out that boto3 is significantly slower than boto2.... there're actually already many posts online about this issue.
Finally, I found that, in boto2, there's also a similar param "is_secure"
self.s3Conn = S3Connection(config.AWS_ACCESS_KEY_ID, config.AWS_SECRET_KEY, host=config.S3_ENDPOINT, is_secure=False)
Setting is_secure to False saves us about 20ms. Not bad..........