Does running Django on Google App Engine, as opposed to the default webapp2 framework, consume additional resources? Are there any metrics?
From the mailing list, I've seen multiple comments on Django instances taking significantly longer to start up if cold. This would be noticeable if your app is rarely used.
From my app's log it looks like the first request took about 4 seconds to start up.
My Django instances use 41-43MB. Not sure about webapp2.
If you're using Django-nonrel as an ORM, I'm sure a few cycles are spent in the ORM translation, but I doubt it's significant. My Tastypie REST layer can take anywhere between 40 and 120 ms for the same request; it looks like that performance is dictated more by the datastore than anything else.
Related
I have a data science type application where I am getting public information from the FPDS and SAM government websites. The site is currently on Heroku.
I would like to cache views so that if a person is researching more than one company, they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge, that is what caching does?
Second, I am looking at Flask-Caching, and it doesn't appear to be that difficult to implement for the routes I would like to cache.
Now the question is: on Heroku, you wouldn't use SimpleCache, would you? Would you use a different cache strategy? From the docs, CACHE_TYPE can be simple, redis, memcached, and several more. On Heroku, would I need to store the cache in something like Redis, or can I store it in memory? Ideally, to get everything up and running, I would like the cache to be in memory.
Late answer to your question. Caching covers a number of techniques on the client and server side used to reduce traffic, network transport, or response time.
I'll focus on one aspect of what you are asking: a Redis integration with Flask to get faster responses from a Flask app. Redis is 'blindingly' fast, IMO, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can intercept the view route and fetch the response from a named Redis database, so that my Flask server is not bound up eternally regenerating the same contents, which in turn saves the main back-end database a good few cycles. Of course, if the contents of that view/report change, I have to take care of that separately. Most importantly, Redis includes an expiry value for each entry, so one way of handling stale contents is to delete the Redis entry ahead of the expiry time.
Here is some sample code to demonstrate this.
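It's a minimal sketch, assuming a local Redis instance; the /report route and the generate_report() helper are illustrative stand-ins:

```python
import redis
from flask import Flask

app = Flask(__name__)

# Assumes a Redis server on localhost; decode_responses returns str instead of bytes.
r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

REPORT_TTL = 300  # seconds before a cached report expires


def generate_report():
    # Hypothetical expensive query/rendering work stands in here.
    return "<html>...report...</html>"


@app.route("/report")
def report():
    cached = r.get("report:all")
    if cached is not None:
        return cached  # served straight from Redis, no regeneration

    html = generate_report()
    r.set("report:all", html, ex=REPORT_TTL)  # store with an expiry
    return html
```

When the underlying data changes, calling r.delete("report:all") ahead of the expiry handles the stale-contents case mentioned above.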
I am using Django and Tastypie for a REST API.
For profiling, I am using django-silk and below is a summary of requests:
How do I profile the complete flow? The time taken outside of database queries is (382 - 147) ms on average. How do I figure out the bottleneck and optimize/scale? I did use @silk_profile() on the get_object_list method for this resource, but even this method doesn't seem to be the bottleneck.
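For reference, the profiling hook was applied roughly like this (the resource name is illustrative):

```python
from silk.profiling.profiler import silk_profile
from tastypie.resources import ModelResource


class CompanyResource(ModelResource):  # illustrative resource, Meta omitted
    @silk_profile(name="CompanyResource.get_object_list")
    def get_object_list(self, request):
        return super().get_object_list(request)
```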
I used caching to decrease response time, but that didn't help much; what are the other options?
When testing with loader.io, the peak the server can handle is 1000 requests per 30 seconds (which seems very low). Other than caching (which I already tried), what might help?
Here's a bunch of suggestions:
bring the number of queries per request below at least 5 (34 per request is really bad; see the sketch after this list)
install django-debug-toolbar and have a look at where the time is spent
use gunicorn or uWSGI behind a reverse proxy (nginx)
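On the first point, most query explosions in Django/Tastypie come from touching related objects row by row. A sketch with hypothetical models:

```python
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)


class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)


# N+1 pattern: 1 query for the books plus 1 extra query per book.
for book in Book.objects.all():
    print(book.author.name)

# One JOINed query instead; use prefetch_related() for M2M/reverse relations.
for book in Book.objects.select_related("author"):
    print(book.author.name)
```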
You have too many queries; even if they are relatively fast, you spend some time reaching the database, etc. Also, if you have external cache storage (for example, Redis), it can take some time to connect to it.
To investigate slow parts of the code you have two options:
Use a profiler. Profiling on a local PC may make little sense if you have a distributed system deployed across several machines.
Add tracing points to your code that record a message and the current time (something like https://gist.github.com/dbf256/0f1d5d7d2c9aa70bce89). Deploy the patched code, exercise it with your load-testing tool, and check the logs; a minimal sketch of such a tracing point follows.
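A minimal version of such a tracing point, similar in spirit to the linked gist, is a context manager that logs elapsed time:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def trace(label):
    """Log how long the wrapped block took, tagged with a label."""
    start = time.time()
    try:
        yield
    finally:
        logger.info("%s took %.1f ms", label, (time.time() - start) * 1000)


# Usage around any suspect section, e.g. inside a view or resource method:
# with trace("get_object_list"):
#     objects = self.get_object_list(request)
```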
I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, so that the user sees only 10, 25, however many results at a time. That then also lets you limit your query to fetch only that many records at a time, starting from a particular index based on the page number.
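In Django that paging is essentially free via the built-in Paginator, which turns into a LIMIT/OFFSET query on the Postgres side; the LogEntry model and page size here are illustrative:

```python
from django.core.paginator import Paginator
from django.shortcuts import render


def log_search(request):
    # LogEntry is a hypothetical model over the read-only log table.
    rows = LogEntry.objects.filter(
        server=request.GET["server"],
        timestamp__range=(request.GET["start"], request.GET["end"]),
    ).order_by("timestamp")

    # 25 rows per page; Paginator slices the queryset lazily, so Postgres
    # only runs a LIMIT/OFFSET query for the page actually being viewed.
    page = Paginator(rows, 25).get_page(request.GET.get("page"))
    return render(request, "logs/results.html", {"page": page})
```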
Caching search result pages is not a particularly good idea regardless, though. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, and you'll end up wasting RAM caching result sets that will never be used again. Also, something like logs should be real-time: if you return a cached result set, there might be new, relevant results that are not included, undermining the usefulness of your search.
As mentioned above, there are limits to what problems caching can solve. Since you are just building this application, I see no reason why you couldn't plug in Django Haystack and Whoosh and see how they perform; switching to one of the more enterprise-grade search backends later is a breeze.
If you are trying to diagnose slow queries in your MySQL backend and are using a Django frontend, how do you tie the slow queries reported by the backend to specific querysets in the Django frontend code?
I think you have no alternative besides logging every Django query for the suspicious querysets.
See this answer on how to access the actual query for a given queryset.
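In short, Django exposes the SQL on the queryset itself, so you can log it right next to the suspect code (the Book queryset here is a hypothetical example):

```python
import logging

from django.db import connection

logger = logging.getLogger(__name__)

qs = Book.objects.filter(title__startswith="A")  # hypothetical suspect queryset
logger.warning("about to run: %s", qs.query)  # str(qs.query) is the SQL Django will send

# With DEBUG = True, Django also records every query executed so far:
for q in connection.queries:
    print(q["time"], q["sql"])
```

Grepping the MySQL slow query log for the logged SQL then ties each slow query back to the queryset that produced it.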
If you install django-devserver, it will show you the queries that are being run and the time they take in your shell when using runserver.
Another alternative is django-debug-toolbar, which will do the same in a side panel-overlay on your site.
Either way, you'll need to test it out in your development environment. However, neither really pinpoints the offending queries directly; they work on a per-request basis. As a result, you'll have to think a little about which of your views use the database most heavily and/or deal with exceptionally large amounts of data, but by cherry-picking likely-candidate views and inspecting the query times on those pages, you should be able to get a handle on which particular queries are the worst.
I have a setup with nginx in front, serving static files and reverse proxying to Apache for Django via mod_wsgi, and I want to add memcached to it. My traffic isn't so huge that my server can't handle it today, but it will get larger soon, and it's best to be ready beforehand.
I see two options. The first is using Django's native memcached support, which handles many things automatically (AFAIK; please confirm in the comments), such as removing the related key when a database entry is updated, and maybe user-authenticated pages (again, please confirm).
The other is implementing memcached in nginx. Making the front server responsible for caching seems more semantic to me; I am not quite sure of that, but it feels like the right division of responsibility. However, if I choose this way, I have to write more code to release cache keys on updates and to handle user auth. That will take some time, of course, but I am in no rush.
The first option is the easy way; the second is harder but seems more logical. Which would be the better option in terms of manageability, response times, and the work required to implement? Would it be worth it?
Also, there is only one site I am hosting that requires caching right now, but there will be more sites in the future, and they may not be based on Python. You might want to consider this.
There may be an advantage to going the nginx route... but I'm not seeing it.
The advantages to using Django's module:
You can choose exactly what data to cache, such as expensive queries and API call results, rather than being locked into caching the whole view.
It's easy, and then you can get back to making your application cool.
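For example, the low-level cache API that the first bullet refers to looks like this; the key name, model, and timeout are illustrative:

```python
from django.core.cache import cache


def top_companies():
    # Hypothetical expensive query; cache the result rather than the whole view.
    result = cache.get("top-companies")
    if result is None:
        result = list(Company.objects.order_by("-revenue")[:10])
        cache.set("top-companies", result, timeout=300)  # keep for 5 minutes
    return result
```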