Django REST framework Query Throughput on AWS - django

I've trying to run a simple django server on amazon lightsail.
The server is supposed to be a backend that serves game state to twitch viewers via an extension. I'm currently trying to have each viewer poll my backend every second.
Trying a very bear bones read to start I'm essentially storing the game state in memory using django's cache (there's only one tracked 'player' for now) and serving it back to users via the django REST framework.
Given that my requests and responses are quite small I'd expect to easily serve at least a few hundred users but I seem to top out of the sustainable CPU usage at around 50 users. Is this expected for this setup? Are there optimizations I can make or obvious bottlenecks?
Here's my backend that gets polled every second (I update settings post-git pull to debug=False and update the databse settings)
https://github.com/boardengineer/extension
Normal backend reads come in at about 800 bytes.
I tried returning empty responses to test if the response size should be optimized but it only made a small difference.
I thought of removing header content to further reduce response size but I don't think i found a way to do so correctly.
I also tried removing some of my middleware hoping to make the server more lean but I failed to find any middleware to remove

Related

What exactly is caching and how do I add it to an app I have on heroku?

I have a data science type application where I am getting public information from FPDS and SAM gov't website. The site is currently on Heroku.
I would like cache views so if a person is researching more than one company they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge that is what cashing does?
Second, I am looking at flash-caching and it doesn't appear to be that difficult to implement to the route's I would like to cache.
Now the question is on Heroku, you wouldn't use simplecashe would you? Would you use a different cache strategy? From the docs, the CASHE_TYPE can be simple, redis, memcached and several more. On Heroku would I need to store the cache on something like Redis or can I store it in memory? Ideally, to get everything up and running I would like the cache to be in memory.
Late answer to your question. Caching can be a number of techniques on client and server side to achieve a goal of reduced traffic, network transport, or speed.
I'll focus on one aspect from what you are asking: a redis integration with flask to achieve faster response from a flask app environment. Redis is 'blindingly' fast, imo, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can interrupt the view route to get the response from a named redis database, so that my flask server is not bound up in eternally regenerating the same contents, which in turn saves a good few cycles of the main back-end database. Of course, if the contents of that view/report change, I have to separately take care of that. Most importantly, Redis includes an expiry value for each entry, so one way of handling stale contents is to delete the redis contents ahead of the expiry time.
Let me know if you want sample code to demonstrate this.

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two roundtrips to the database when saving a record even if the save is in the same transaction because at least in postgres the changes are written to the database and comitting the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O which is currently the case. Even if we would use async I/O generating the revision's data takes CPU time which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
Which may be unique to you, however most web apps read much more often than they write to the db. In fact it's fairly common to see optimisations done, to help scaling a web app, which trade off more complicated 'save' operations to get faster reads. An example would be denormalisation where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
Django app solution is more 'obvious' and 'maintainable'... the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working and doesn't need someone to remember to run the custom SQL on the db server when you move to a new server
With a db trigger solution you can be certain it will run whenever a record is changed by any means... whereas with Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method/save signals. Sometimes this is desirable however.
Another thing I'd point out is that your production webserver will be multiprocess/multithreaded... so although, yes, a lengthy db write will block the webserver it will only block the current process. Your webserver will have other processes which are able to server other requests concurrently. So it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.

Using memcached with a dynamic django backend

My Django backend is always dynamic. It serves an iOS app similar to that of Instagram and Vine where users upload photos/videos and their followers can comment and like the content. Just for the sake of this question, imagine my backend serves an iOS app that is exactly like Instagram.
Many sources claim that using memcached can improve performance because it decreases the amount of hits that are made to the database.
My question is, for a backend that is already in dynamic in nature (always changing since users are uploading new pictures, commenting, liking, following new users etc..) what can I possibly cache?
It's a problem I've been thinking about for quite some time. I could cache the user profile data, but other than that, I don't know where else memcached would be useful.
Other sources mentioned using it everywhere in the backend where a 'GET' call is made but then I would need to set a suitable time limit to expire the cache since the app is always dynamic. What are your solutions and suggestions for getting around this problem?
You would cache whatever is being most frequently accessed from your Database. Make a list of the most frequent requests to get data from the database and cache the data in that priority.
Cache the most frequent requests based on category of the pictures
Cache based on users - power users go into cache (those which do a lot of data access)
Cache the most recent inserts (in case you have a page which shows the recently added posts/pictures)
I am sure you can come up with more scenarios. I am positive memcached (or any other caching) will help, even though your app is very 'dynamic'.

implementing memcached: native django's modules or on (front server, reverse proxy) nginx

I have a setup like, nginx in the front for serving static files and reverse proxying to apache for django via mod_wsgi and I want to implement memcached in my setup. I don't have a huge traffic that my server will not handle today but it will get larger soon, it's best to be ready before.
I see two options for me: The first one is using django's native memcached module which handles many things automatically (afaik, confirm on comments pls), such as when a database entry is updated, it removes the related key, and maybe user authenticated pages (confirm please).
The other one is implementing memcached on nginx. The responsible structure for caching should be the front server seems more semantic to me; I am not quite sure of that but it is like division of responsibility. However, if I choose this was, I have to write more code for releasing cache keys on updates and user auth's. This will take some time of course, but I am in no rush.
The first one is the easy way, second one is harder but seems more logical. What would be the best option in terms of manageability and response times and the work required to implement? Would it worth it?
Also, there is only one site I am hosting that would require caching right now, but it will be more sites in the future and they may not be based on python. You might want to consider this.
There may be an advantage to going the nginx route... but I'm not seeing it.
The advantages to using Django's module:
You can set data to cache, such as expensive queries and API call results, rather than be locked into caching the whole view.
It's easy, and then you can get back to making your application cool.

Speeding up page loads while using external API's

I'm building a site with django that lets users move content around between a bunch of photo services. As you can imagine the application does a lot of api hits.
for example: user connects picasa, flickr, photobucket, and facebook to their account. Now we need to pull content from 4 different apis to keep this users data up to date.
right now I have a a function that updates each api and I run them all simultaneously via threading. (all the api's that are not enabled return false on the second line, no it's not much overhead to run them all).
Here is my question:
What is the best strategy for keeping content up to date using these APIs?
I have two ideas that might work:
Update the apis periodically (like a cron job) and whatever we have at the time is what the user gets.
benefits:
It's easy and simple to implement.
We'll always have pretty good data when a user loads their first page.
pitfalls:
we have to do api hits all the time for users that are not active, which wastes a lot of bandwidth
It will probably make the api providers unhappy
Trigger the updates when the user logs in (on a pageload)
benefits:
we save a bunch of bandwidth and run less risk of pissing off the api providers
doesn't require NEARLY the amount of resources on our servers
pitfalls:
we either have to do the update asynchronously (and won't have
anything on first login) or...
the first page will take a very long time to load because we're
getting all the api data (I've measured 26 seconds this way)
edit: the design is very light, the design has only two images, an external css file, and two external javascript files.
Also, the 26 seconds number comes from the firebug network monitor running on a machine which was on the same LAN as the server
Personally, I would opt for the second method you mention. The first time you log in, you can query each of the services asynchronously, showing the user some kind of activity/status bar while the processes are running. You can then populate the page as you get the results back from each of the services.
You can then cache the results of those calls per user so that you don't have to call the apis each time.
That lightens the load on your servers, loads your page fast, and provides the user with some indication of activity (along with incrimental updates to the page as their content loads). I think those add up to the best User Experience you can provide.