I am using a django with apache mod_wsgi, my site has dynamic data on every page and all of the media (css, images, js) is stored on amazon S3 buckets liked via "http://bucket.domain.com/images/*.jpg" inside the markup . . . . my question is, can varnish still help me speed up my web server?
I am trying to knock down all the stumbling blocks here. Is there something else I should be looking at? I have made a query profiler for my code at each page renders at about 0.120 CPU seconds which seems quick enough, but when I use ab -c 5 -n 100 http://mysite.com/ the results are only Requests per second: 12.70 [#/sec] (mean) . . .
I realize that there are lots of variables at play, but I am looking for some guidance on things I can do and thought Varnish might be the answer.
UPDATE
here is a screenshot of my profiler
The only way you can improve your performance is if you measure what is slowing you down. Though it's not the best profiler in the world, Django has good integration with the hotshot profiler (described here) and you can figure out what is taking those 0.120 cpu seconds.
Are you using 2 cpus? If that's the case than perhaps the limitation is in the db when you use ab? I only say that because 0.120 * 12.70 is 1.5 which means that there's .5 seconds waiting for something. This could also be IO or something.
Adding another layer for no reason such as varnish is generally not a good idea. The only case where something like varnish would help is if you have slow clients with poor connections hold onto threads, but the ab test is not hitting this condition and frankly it's not a large enough issue to warrant the extra layer.
Now, the next topic is caching, which varnish can help with. Are your pages customized for each user, or can it be static for long periods of time? Often times pages are static except for a simple login status screen -- in this case consider off loading that login status to javascript with cookies. If you are able to cache entire pages then they would be extremely fast in ab. However, the next problem is that ab is not really a good benchmark of your site, since users aren't going to just sit at one page and hit f5 repeatedly.
A few things to think about before you go installing varnish:
First off, have you enabled the page caching middleware within Django?
Are etags set up and working properly?
Is your database server performing optimally?
Have you considered setting up caching using memcached within your code for common results? (particularly front pages and static pages displayed to non-logged-in users)
Except for query heavy dynamic pages that absolutely must display substantially different data for each user, .12 seconds seems like a long time to serve a page. Look at how you can work caching into your app to improve that performance. If you have a page that is largely static other than a username or something similar, cache the computed part of the page.
When Django's caching is working properly, ab on public pages should be lightning fast. If you aren't using any of the other features of Apache, consider using something lighter and faster like lighttpd or nginx.
Eric Florenzano has put together a pretty useful project called django-newcache that implements some advanced caching behavior. If you're running into the limitations of the built-in caching, consider it. http://github.com/ericflo/django-newcache
Related
I have to ask a more or less non-typical SO question and hope you don't mind. Right now I am developing my very first web application. I did set up an AJAX function that requests some data from a third party API and populates my html containers with the returned data.
Right now I query one single object and populate 3 html containers with around 15 lines of Javascript code. When i activate the process/function by clicking a button on my frontend, it needs around 6-7 seconds until the html content is updated.
Is this a reasonable time? The user experience honestly will be more than bad considering that I will have to query and manipulate far more data (I build a one-site dashboard related to soccer data).
There might be very controversal answers to that question, but what would be a fair enough time for the process to run using standard infrastructure? 1-2 seconds? (I will deploy the app on heroku or digitalocean and will implement a proper caching environment to handle "regular visitors").
Right now
I use a virtualenv and django server for the development
and a demo server from the third party API which might be slowed down for whatever reason (how to check this?)
which might effect the current time needed (there will be many more variables obv.).
Looking forward to your answers.
I personally think (probably a lot people might too) 6-7 secs is a significant delay for rendering a small page. The cause of this issue might not came from django directly. Check for the following:
I use a virtualenv and django server for the development
you may be running django devserver, production server might make things bit faster (use django-debug-toolbar to find what causing the delay)
Do db index in your model.
a demo server from the third party API which might be slowed down for whatever reason
use chrome developer tools 'network' tab to watch how long that third party call takes. it might not visible there if you call api in your view.py. in that case, add some timing code there to calculate how long it takes to return.
I am using django and tastypie for REST API.
For profiling, I am using django-silk and below is a summary of requests:
How do I profile the complete flow? Time taken except for database queries is (382 - 147) ms on average. How do I figure out the bottleneck and optimize/scale? I did use #silk_profile() for the get_object_list method for this resource, but even this method doesn't seem to be bottleneck.
I used caching for decreasing response time, but that didn't help much, what are the other options?
When testing using loader.io, the peak the server can handle is 1000 requests per 30 secs (which seems very low). Other than caching (which I already tried) what might help?
Here's a bunch of suggestions:
bring the query per request at least below 5 per request (34 per request is really bad)
install django toolbar and have a look where the time is spent
use gunicorn or uwsgi behind a reverse proxy (NGINX)
You have too much queries, even if they are relatively fast you spend
some time to reach database etc. Also if you have external cache
storage (for example, redis) it could take some time to connect
there.
To investigate slow parts of the code you have two options:
Use a profiler - profiling at local PC could make no sense if you have distributed system deployed to several machines
Add tracing points to your code that will record some message and current time (something like https://gist.github.com/dbf256/0f1d5d7d2c9aa70bce89). Deploy this patched code and test it with your load-testing tool and check logs.
I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, such that the user sees only 10, 25, whatever results at a time. That then allows you to also limit your query to only fetch 10, 25, whatever records at a time starting from a particular index based on the page number.
Caching search result pages is not a particularly good idea, regardless, though. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, and you'll end up wasting RAM to cache result sets that will never be used again. Also, something like logs should be real-time. If you return a cached result set, there might be new, relevant results that are not included, obscuring the usefulness of your search.
As mentioned above you have limitations on what problems caching can solve. As you are building this application, then I see no reason why you couldn't just plug in Django Haystack and Whoosh and see how it performs, then switching to some of the other more Enterprise search backends is a breeze.
I have been visiting some sites hosted on GAE and I found them to be very slow.
Pretty much all of them take longer than usual to load.
Time: (in seconds) [ YSlow ]
9.9 giftag.com
3.1 hotskills.net
1.9 jeeyo.net
1.5 appspot.com
Is it that App Engine Cloud is too slow, Bigtable is too slow ... or what?
You're using the YSlow plugin to measure this, and YSlow tells you why the site is slow (the cunning name is the clue). For example, in the case of gifttag.com, YSlow reports that:
This page has 9 external Javascript
scripts. Try combining them into one.
This page has 3 external stylesheets.
Try combining them into one. This page
has 13 external background images. Try
combining them with CSS sprites.
So it's get an 'E' grade for that. That's going to kill the perceived load performance of the site.
None of this has anything to do with appengine.
YSlow has nothing to do with the speed of the web app on the server side since it's a completely client side speed measurement (css, javascript, browser rendering, image loading, etc). But on the other side, I have heard that your application may be slow on App Engine if doesn't have much hits and traffic. This makes the App Engine not to cache the python runtime environment (have cold start), so this can make significant difference in performance of applications with low traffic.
Analysis: Google App Engine alluring, will be hard to escape
GAE's data access is in the order of seconds compared to a database which is measured in milliseconds. The difference is that BigTable scales to the millions of concurrent access due to the inherent isolation level of Read Uncommitted and the relaxed consistency.
No RDBMS can compute with that and still give consistency guarantees. To be honest, you don't really want to because for some applications you want strong guarantees over scalability.
We're hosting a django service for some clients using really really poor and intermittent connectivity. Satellite and GPRS connectivity in parts of Africa that haven't benefited from the recent fiber cables making landfall.
I've consolidated the javascripts and used minificatied versions, tried to clean up the stylesheets, and what not...
Like a good django implementer, I'm letting apache serve up all the static information like css and JS and other static media. I've enabled apache modules deflate (for gzip) and expired to try to minimize retransmission of the javascript packages (mainly jQuery's huge cost). I've also enabled django's gzip middleware (but that doesn't seem to do much in combination with apache's deflate).
Main question - what else is there to do to optimize bandwidth utilization?
Are there django optimizations in headers or what not to make sure that "already seen data" will not travel over the network?
The django caching framework seems to be tailored towards server optimization (minimize hitting the database) - how does that translate to actual bandwidth utilization?
what other tweaks on apache are there to make sure that the browser won't try to get data it already has?
Some of your optimizations are important for wringing better performance out of your server, but don't confuse them with optimizing bandwidth utilization. In other words gzip/deflate are relevant but Apache serving static content is not (even though it is important).
Now, for your problem you need to look at three things: how much data is being sent, how many connections are required to get the data, and how good are the connections.
You mostly have the first area covered by using deflate/gzip, expires, minimization of javascript etc. so I can only add one or two things you might not know about. First, you should upgrade to Django 1.1, if you haven't already, because it has better support for ETags/Expires headers for your Django views. You probably already have those headers working properly for static data from Apache but if you're using older Django they (probably) aren't being set properly on your dynamic views.
For the next area, number of connections, you need to consolidate your javascript and css files into as few files as possible to reduce the number of connections. Also very helpful can be consolidating your image files into a single "sprite" image. There are a few Django projects to handle this aspect: django-compress, django-media-bundler (which is the only one that will create image sprites), and you can also see this SO answer.
For the last area of how good are the connections you should look at global CDN as suggested by Alex, or at the very least host your site at an ISP closer to your users. This could be tough for Africa, which in my experience can't even get decent connectivity into European ISP's (at least southern Africa... northern Africa might be better).
You could delegate jQuery to a CDN which may have better connectivity with Africa, e.g., google (and, it's a free service!-). Beyond that I recommend anything every written (or spoken on video, there's a lot of that!-) by Steve Souders -- while his talks and books and essays are invaluable to EVERY web developer I think they're particularly precious to ones serving a low-bandwidth audience (e.g., one of his tips in his latest books and talks is about a substantial fraction of the world's browsers NOT getting compression benefits from deflate or gzip -- it's not so much about the browsers themselves, but about proxies and firewalls doing things wrong, so "manual compression" is STILL important then!).
This is definitely not an area I've had a lot of experience in, but looking into Django's ConditionalGetMiddleware might prove useful. I think it might help you solve the first of your bullet points.
EDIT: This might be a good place to start: http://docs.djangoproject.com/en/dev/topics/conditional-view-processing/