How to improve web-service API throughput? - amazon-web-services

I'm new to creating web services, so I'd like to know what I'm missing on the performance side (assuming I'm missing something).
I've built a simple Flask app. Nothing fancy: it just reads from the DB and responds with the result.
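The endpoint is roughly shaped like this (a simplified sketch, not the real code; the route and the DB helper are placeholders):
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/items/<int:item_id>')               # placeholder route
def get_item(item_id):
    row = fetch_row_from_db(item_id)             # hypothetical helper; this DB read is the slow part
    return jsonify(row)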
uWSGI is used for the WSGI layer. I've run multiple tests and set processes=2 and threads=5 based on performance monitoring.
processes = 2
threads = 5
enable-threads = True
AWS ALB is used as the load balancer. The uWSGI + Flask app is dockerized and launched in ECS (3 containers, 1 vCPU each).
For each DB hit, the Flask app takes 1-1.5 seconds to get the data. There is no other lag on the app side. I know the query can be optimised, but assuming the request processing time stays at 1-1.5 seconds, can the throughput be increased?
The throughput I'm seeing is ~60 requests per second. I feel it's too low. Is there any way to increase the throughput with the same infra?
Am I missing something here, or is the throughput reasonable given that the DB hit takes 1.5 seconds?
Note: it's synchronous.
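For context, my own back-of-envelope ceiling for a fully synchronous stack, assuming every request holds a worker thread for the full DB call:
containers, processes, threads = 3, 2, 5
workers = containers * processes * threads     # 30 requests can be in flight at once
print(workers / 1.5, workers / 1.0)            # ~20-30 requests/second ceiling at 1-1.5 s each
# The measured ~60 req/s is above this, so presumably many requests finish faster than 1-1.5 s.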

Related

Performance of a Redis cluster with AWS ElastiCache

I am running performance tests against a Redis cluster with cluster mode enabled (AWS ElastiCache, default.redis6.x.cluster.on), monitored with Datadog.
For the test we built an application that just does a SET and a GET on Redis. We spin up threads that call this app, and with those threads we make around 100 calls simultaneously, so Redis also receives around 100 calls (a KEYS 'abc' operation).
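The test app boils down to something like this (a simplified sketch using redis-py's cluster client, not the real code; the endpoint is a placeholder):
from concurrent.futures import ThreadPoolExecutor
import time

from redis.cluster import RedisCluster  # redis-py >= 4.1

rc = RedisCluster(host='localhost', port=6379)  # replace with the ElastiCache configuration endpoint

def one_call(i):
    start = time.perf_counter()
    rc.set(f'abc:{i}', 'value')
    rc.get(f'abc:{i}')
    return (time.perf_counter() - start) * 1000  # latency in ms

# ~100 simultaneous calls, mirroring the test described above
with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = list(pool.map(one_call, range(100)))
print(sorted(latencies)[-5:])  # slowest calls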
What we saw is that with only 1 to 10 threads (10 simultaneous calls), every Redis call takes 1-5 milliseconds, but when we make 100 or more calls simultaneously, it takes 300-500 ms for every call.
I would like to know if there is a way to establish the expected average performance in this scenario: is it normal to see around 300-500 ms (milliseconds)? What would be good leads for analysing the performance of Redis?
Thank you

How can I optimise requests/second under peak load for Django, uWSGI and Kubernetes

We have an application that experiences some pretty short, sharp spikes - generally about 15-20 mins long with a peak of 150-250 requests/second, but roughly an average of 50-100 requests/second over that time. p50 response times are around 70 ms (whereas p90 is around 450 ms).
The application is generally just serving models from a database/memcached cluster, but also sometimes makes requests to third-party APIs (tracking, Stripe, etc.).
This is a Django application running with uwsgi, running on Kubernetes.
I'll spare you the full uwsgi/kube settings, but the TLDR:
# uwsgi
master = true
listen = 128 # Limited by Kubernetes
workers = 2 # Limited by CPU cores (2)
threads = 1
# Of course much more detail here that I can load test...but will leave it there to keep the question simple
# kube
Pods: 5-7 (horizontal autoscaling)
If we assume a 150 ms average response time, I'd roughly calculate a total capacity of ~93 requests/second - somewhat short of our peak. In our logs we often see uWSGI listen queue of socket ... full messages, which makes sense.
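That figure comes from total concurrency divided by response time; as a throwaway script with my numbers plugged in (purely illustrative):
pods, workers_per_pod, threads = 7, 2, 1       # peak of the autoscaling range
avg_response_s = 0.150
print(pods * workers_per_pod * threads / avg_response_s)  # ~93 requests/second (5 pods gives ~67)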
My question is...what are our options here to handle this spike? Limitations:
It seems the 128 listen queue is determined by the kernel, and the kube docs suggest it's unsafe to increase this.
Our Kube nodes have 2 cores. The general advice seems to be to set the number of workers to 2 * cores (possibly + 1), so we're pretty much at our limit here. Increasing to 3 doesn't seem to have much impact.
Multiple threads in Django can apparently cause weird bugs.
Is our only option to keep scaling this horizontally at the Kubernetes level? Aside from making our queries/caching as efficient as possible, of course.

How can my Heroku Flask web application support N users concurrently downloading an image file?

I am working on a Flask web application using Heroku. As part of the application, users can request to download an image from the server. That calls a function which has to then retrieve multiple images from my cloud storage (about 500 KB in total), apply some formatting, and return a single image (about 60 KB). It looks something like this:
import io

from flask import Flask, Response, request

app = Flask(__name__)

@app.route('/download_image', methods=['POST'])
def download_image():
    # Retrieve about 500 KB of images from cloud storage
    base_images = retrieve_base_images(request.form)
    # Apply image formatting into a single image
    formatted_image = format_images(base_images)
    # Return image of about 60 KB for download
    formatted_image_file = io.BytesIO()
    formatted_image.save(formatted_image_file, format='JPEG')
    formatted_image_data = formatted_image_file.getvalue()
    return Response(formatted_image_data,
                    mimetype='image/jpeg',
                    headers={'Content-Disposition': 'attachment;filename=download.jpg'})
My Procfile is
web: gunicorn my_app:app
How can I design/configure this to support N concurrent users? Let's say, for example, I want to make sure my application can support 100 different users all requesting to download an image at the same time. With several moving parts, I am unsure how to even go about doing this.
Also, if someone requests a download but then loses internet connection before their download is complete, would this cause some sort of lock that could endlessly stall, or would that thread/process automatically timeout after a short period and therefore be handled smoothly?
I currently have 1 dyno (on the Heroku free plan). I am willing to add more dynos if needed.
Run multiple Gunicorn workers:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
…
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
Note that Heroku sets a default WEB_CONCURRENCY for you based on your dyno size. You can probably handle a small number of concurrent requests right now.
However, you're not going to get anywhere close to 100 on a free dyno. This section appears between the previous two in the documentation:
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a free, hobby or standard-1x dyno. Your application may allow for a variation of this, depending on your application’s specific memory requirements.
Even if your application is very lightweight, you probably won't be able to go above 6 workers on a single small dyno. Adding more dynos and/or running larger dynos will be required.
Do you really need to support 100 concurrent requests? If you have four workers going, four users' requests can be served at the same time. If a fifth makes a request, that request just won't get responded to until one of the workers frees up. That's usually reasonable.
If your request takes an unreasonable amount of time to complete you have a few options besides adding more workers:
Can you cache the generated images?
Can you return a response immediately, create the images in a background job, and then notify the user that the images are ready? With some fancy front-end work this can be fairly transparent to the end user.
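A minimal sketch of that second option, assuming RQ backed by Redis (neither of which is mentioned above; Celery or any other job queue would work as well), reusing retrieve_base_images and format_images from the question and a hypothetical /request_image route:
import io

from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue

app = Flask(__name__)
queue = Queue(connection=Redis())  # connection details are placeholders (e.g. a Redis add-on)

def build_image(form_data):
    # Same work as download_image, but done by an RQ worker dyno instead of the web dyno
    formatted_image = format_images(retrieve_base_images(form_data))
    buffer = io.BytesIO()
    formatted_image.save(buffer, format='JPEG')
    return buffer.getvalue()  # in practice, store the result somewhere the client can fetch it

@app.route('/request_image', methods=['POST'])
def request_image():
    job = queue.enqueue(build_image, dict(request.form))
    # Respond immediately; the client polls (or is notified) using the job id
    return jsonify({'job_id': job.get_id()}), 202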
The right solution will depend on your specific use case. Good luck!

Django 1.11 PostgreSQL - "SET TIME ZONE" command on every session

We are working through a couple of performance issues on one of our web sites, and we have noticed that the command "SET TIME ZONE 'America/Chicago'" is executed so often that, over a 24-hour period, just under 1 hour (around 4% of total DB CPU resources) is spent running that command.
Note that the "USE_TZ" setting is False, so based on my understanding, everything should be stored as UTC in the database, and only converted in the UI to the local time zone when necessary.
Do you have any ideas on how we can remove this strain on the database server?
For Postgres, Django always sets the time zone: either the server's local zone (when USE_TZ = False) or UTC (when USE_TZ = True). That way Django supports "live switching" of settings.USE_TZ for the PostgreSQL DB backend.
How have you actually determined that this is the bottleneck?
Usually SET TIME ZONE is only called when a connection to the DB is created. Maybe you should use persistent connections by setting settings.DATABASES[...]['CONN_MAX_AGE'] = GREATER_THAN_ZERO (docs); a minimal settings sketch follows the list below. That way connections will be reused and you'll have fewer calls to SET TIME ZONE. But if you use that approach you should also take a closer look at your PostgreSQL configuration:
max_connections should be greater than 1 + the maximum concurrency of your WSGI server + the maximum number of simultaneous cron jobs that use Django (if you have them) + the maximum concurrency of your Celery workers (if you have them) + any other potential sources of connections to Postgres
if you are running a cron job that calls pg_terminate_backend, then make sure that CONN_MAX_AGE is greater than the "idle timeout"
if you are running Postgres on a VPS, then in some cases there might be limits on the number of open sockets
if you are using something like PgBouncer, then it may already be reusing connections
if you are killing the server that serves your Django project with SIGKILL (kill -9), then it may leave some unclosed connections to the DB (but I'm not sure)
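A minimal sketch of the CONN_MAX_AGE setting mentioned above (database name and credentials are placeholders):
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',          # placeholder
        'USER': 'myuser',        # placeholder
        'PASSWORD': 'secret',    # placeholder
        'HOST': 'localhost',     # placeholder
        'CONN_MAX_AGE': 600,     # reuse connections for 10 minutes instead of opening one per request
    }
}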
I think this may also happen if you use django.utils.timezone.activate, but I'm not sure of it. That can be the case if you call it manually in your code or via a middleware.
Another possible explanation: the way you are "profiling" your requests actually shows you the time of the whole transaction.

Scrapy + Splash returns a lot of 504 Time Out errors

I have followed Splash's FAQ for production setups and my system currently looks like this:
1 Scrapy container with 6 concurrent requests.
1 HAProxy container that load-balances to the Splash containers.
2 Splash containers with 3 slots each.
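On the Scrapy side, that setup corresponds to roughly these settings (a sketch; the scrapy-splash middleware wiring is taken from its README and my real settings may differ):
# settings.py (sketch)
SPLASH_URL = 'http://haproxy:8050'  # requests go through the HAProxy container
CONCURRENT_REQUESTS = 6

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'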
I use docker stats to monitor my setup and I never get more than 7% CPU usage or more than 55% Memory usage.
I still get a lot of
DEBUG: Retrying <GET https://the/url/ via http://haproxy:8050/execute> (failed 1 times): 504 Gateway Time-out
For every successful request I get 6-7 of these timeouts.
I have experimented with changing the number of slots on the Splash containers and the number of concurrent requests. I've also tried running a single Splash container behind the HAProxy. I keep getting these errors.
I'm running on an AWS EC2 t2.micro instance, which has 1 GB of memory.
I suspect that the issue is still related to the Splash instances getting flooded. Is there any advice you can give me to reduce the load on the Splash instances? Is there a good ratio between slots and concurrent requests? Should I throttle requests?