ZF2 Doctrine2 App is very slow because of doctrine method calls - doctrine-orm

The request time for the homepage of my app is about 5 seconds, although there are only 6 database queries. So I decided to install Xdebug with Webgrind on my local server to profile my app. There I can see that I have a huge number of Doctrine method calls, but I don't really know how to interpret this in order to minimize the number of those calls. Maybe someone could give me a hint.
RestaurantRepository
public function findByCity(City $city) {
    $queryBuilder = $this->createQueryBuilder('restaurant');
    $queryBuilder->addSelect('cuisines')
        ->addSelect('openingHours')
        ->addSelect('address')
        ->addSelect('zipCode')
        ->addSelect('city')
        ->leftJoin('restaurant.cuisines', 'cuisines')
        ->leftJoin('restaurant.openingHours', 'openingHours')
        ->leftJoin('restaurant.meals', 'meals')
        ->innerJoin('restaurant.address', 'address')
        ->innerJoin('address.zipCode', 'zipCode')
        ->innerJoin('zipCode.city', 'city')
        ->where('zipCode.city = :city')
        ->andWhere('restaurant.state <= :state')
        ->setParameter('city', $city)
        ->setParameter('state', Restaurant::STATE_ENABLED)
        ->orderBy('restaurant.state', 'ASC')
        ->addOrderBy('restaurant.name', 'ASC');
    return $queryBuilder->getQuery()->getResult();
}

You probably load all associations of some of your entities. It is hard to say exactly where the problem is without more information about your entity definitions and the queries you are executing.
The Doctrine documentation has some suggestions for improving performance (one of them is about lazy-loading associations) that might help you get on your way.

Install and enable the ZendDeveloperToolbar module. There you can check how many DB calls you are making with each action.
As you can see in the image, there's a lot of hydration going on under the hood. There are a lot of tutorials on the net on how NOT to use Doctrine. I can't tell anything more without looking at what you are doing with your entities.
Also make sure you have enabled caching in production mode so Doctrine doesn't have to parse mapping information on each request, which is very expensive. You are probably using the Annotation Driver, which is the slowest one.
I can also see you are using the Zend autoloader, which is inefficient compared to Composer's. Simply add your modules' src directories to the autoload section of the composer.json file and let Composer do the autoloading.

Related

Slow page generation in Django with 50+ sql queries per page

In my Django app I noticed that pages with a large number of SQL queries load considerably slower than other pages. I'm not new to web development, and I mainly deal with a resource hog like Drupal, but even Drupal, with its 150 - 200 SQL queries per page, generates a page in 0.5 - 0.7 sec.
Django, on the other hand, performs really badly with a more or less average number of queries per page. For example, one of my pages generates 60 queries like this:
SELECT `gamenode_gamenode`.`id`, `gamenode_gamenode`.`title`, `gamenode_gamenode`.`short_desc`, `gamenode_gamenode`.`full_desc`, `gamenode_gamenode`.`slug`, `gamenode_gamenode`.`type`, `gamenode_gamenode`.`source_gameid`, `gamenode_gamenode`.`created`, `gamenode_gamenode`.`updated`, `gamenode_gamenode`.`status`, `gamenode_gamenode`.`promote`, `gamenode_gamenode`.`sticky`, `gamenode_gamenode`.`hit_count`, `gamenode_gamenode`.`game_rank`, `gamenode_gamenode`.`share_count`, `gamenode_gamenode`.`like_count`, `gamenode_gamenode`.`comment_count` FROM `gamenode_gamenode` WHERE `gamenode_gamenode`.`id` = 1058
and outputs the data as a simple string and it takes 1200ms to generate a page! I did this just for a test to generate many fairly simple queries. If I lower the number of queries to 10 - 15, page generation time will come back to more or less acceptable number.
So my question is: why is Django so slow when there are many SQL queries on the page? I did similar comparisons using Rails, Symfony and Drupal, and all these "resource hogs" performed way better than Django. Am I doing something wrong, or is there some "secret" setting to make things faster in Django? Or do Djangonauts consider such times normal and just strive to write code which produces as few queries as possible? Please help me figure this out.
Yes, Django's ORM is pretty slow. You have three choices for dealing with this:
1) Complain about it.
2) Switch to another web application framework.
3) Make some effort to understand why your application is generating so many database queries, and learn how to use Django's ORM effectively so as to reduce the number of queries.
(1) might be psychologically satisfying but won't solve your problem; (2) is off-topic here at Stack Overflow (but you might look at Wikipedia's Comparison of web application frameworks).
We can help you with (3), but only if you show us some more of your code. The query you quoted looks like a typical query that Django would generate for a call to get():
GameNode.objects.get(id = 1058)
You shouldn't be running more than a couple of queries like this on a page: if you want to get many GameNodes you need to get them in a single query:
GameNode.objects.filter(<criteria>)
Or if the GameNode objects are related to some other object by a foreign key on another model that you are querying, then you could fetch all the related GameNode objects by using Django's select_related() method.
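To make the difference concrete, here is a minimal sketch using plain sqlite3 rather than Django itself, so it runs anywhere. The table and helper names are made up for illustration. The first helper issues one query per id, which is what calling get() in a loop does; the second fetches the same rows with a single IN query, which is roughly what GameNode.objects.filter(id__in=ids) would generate.

```python
import sqlite3


def build_db(n_rows=60):
    """Create an in-memory table standing in for gamenode_gamenode."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gamenode (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO gamenode (id, title) VALUES (?, ?)",
        [(i, "game %d" % i) for i in range(1, n_rows + 1)],
    )
    return conn


def fetch_one_by_one(conn, ids):
    """One query per id, like calling get() in a loop."""
    rows = []
    for pk in ids:
        cur = conn.execute("SELECT id, title FROM gamenode WHERE id = ?", (pk,))
        rows.append(cur.fetchone())
    return rows, len(ids)  # rows plus the number of queries issued


def fetch_in_bulk(conn, ids):
    """A single query with IN, like filter(id__in=ids)."""
    placeholders = ",".join("?" for _ in ids)
    sql = "SELECT id, title FROM gamenode WHERE id IN (%s) ORDER BY id" % placeholders
    return conn.execute(sql, list(ids)).fetchall(), 1
```

Both helpers return the same rows; only the number of round trips to the database differs, and that per-query overhead is what adds up on a page with 60 lookups.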
There's almost always a way to speed things up (see this testimonial) but we need to know the details before we can say how to do it.

Running multiple sites on the same python process

In our company we make news portals for a pretty big number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or custom template, our software is already pretty much the same for all sites. Right now our deployment strategy is one gunicorn instance for each site (with 1-17 workers each, depending on the site traffic), on a 16-core server with 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether it's being used or not. Now with the new sites we would need to add more RAM to serve not that many more requests, so basically it doesn't scale. Also, since we are moving from this model where each site is independent, each site has its own database, and I quite like it that way, especially since we are using relational databases (MySQL, but migrating to PostgreSQL), so it's much easier to shard this way.
I'm doing some research and experimenting with running all sites on one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it came to it. The problem is that django assumes in a lot of places that only one site is running per process, so for what I've thought of so far I'd have to implement:
A middleware that takes the HTTP_HOST from the request and places an identifier on a threadlocal variable.
A template loader that uses that variable to load custom templates accordingly.
Monkey patch django.db.models.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would replace the managers with one that would first call db_manager(identifier) on the original manager and then call the intended method. I would also need to override the save and delete methods to always include the using=identifier parameter.
I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
And this is just what I came up with without even implementing it and seeing where it breaks; I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially with the extra maintenance effort I'll need, but I don't see any alternatives and would love to learn that someone already solved this in a better way. Of course I could also stop using Django altogether (I already have many reasons to do so), but that would mean a major rewrite and having to maintain two incompatible branches of the software until the new one reached feature parity with the Django version, so to me it seems even worse than all the ugly hacks.
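For reference, the first item in that list (a middleware that reads HTTP_HOST and stores an identifier in a thread-local) could look roughly like the sketch below. It follows the Django-style middleware shape (a callable wrapping get_response) but does not import Django. HOST_TO_SITE, get_current_site and the site identifiers are assumptions made up for illustration.

```python
import threading

# Hypothetical mapping from host name to a per-site identifier (e.g. a DB alias).
HOST_TO_SITE = {
    "news-a.example.com": "site_a",
    "news-b.example.com": "site_b",
}

_current = threading.local()


def get_current_site():
    """Return the identifier set for the request handled by this thread, if any."""
    return getattr(_current, "site", None)


class CurrentSiteMiddleware:
    """Django-style middleware: a callable built around get_response."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        host = request.META.get("HTTP_HOST", "")
        hostname = host.split(":")[0].lower()  # drop an optional port
        _current.site = HOST_TO_SITE.get(hostname)
        try:
            return self.get_response(request)
        finally:
            # Clear the value so it cannot leak into the next request on this thread.
            _current.site = None
```

A template loader or a manager wrapper would then call get_current_site() instead of receiving the identifier as an argument. Note that this only works with workers that handle one request per thread at a time.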
I've recently developed an e-commerce system with similar requirements -- many instances running from the same project sharing almost everything. The previous version of the system was a bunch of independent installations (~30) so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.
You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work around it. Here is a brief description of what I did.
I could see a synergy between what I wanted to achieve and django.contrib.sites. Also because many third-party Django apps out there know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which is a very naive approach to the multi-host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
Next I created an app encapsulating all the functionality related to the multi host aspect of my project. In my case the app was called stores and among other things it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.
The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.
The middleware class's job was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, it would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. Also, it looked at the store's urlconf and, if it was not None, it would set request.urlconf to apply its special URL requirements. After that, the current Store instance was stored in request.store. This has proven to be incredibly useful, because I was able to do things like this in my views:
def homepage(request):
    featured = Product.objects.filter(featured=True, store=request.store)
    ...
request.store became a natural additional dimension of the request object throughout the project for me.
Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:
def get_absolute_url(self, to='/'):
    """
    Return an absolute url to this `Store` or to `to` on this store.
    The URL includes http:// and the domain name of the store.
    `to` can be an object with `get_absolute_url()` or an absolute path as string.
    """
    if isinstance(to, basestring):
        path = to
    elif hasattr(to, 'get_absolute_url'):
        path = to.get_absolute_url()
    else:
        raise ValueError(
            'Invalid argument (need a string or an object with get_absolute_url): %s' % to
        )
    url = 'http://%s%s%s' % (
        self.domain,
        # This setting allowed for a sane development environment
        # where I just set it to ".dev:8000" and configured `dnsmasq`.
        # The same value was also removed from the `Host` value in the middleware
        # before looking up the `Store` in database.
        settings.DOMAIN_SUFFIX,
        path
    )
    return url
So I could easily generate URLs to objects on other than the current store, e.g.:
# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product))
This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.
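The Host-to-store matching described above, including the removal of the DOMAIN_SUFFIX value mentioned in the code comment, might be sketched like this. It is only an illustration of the idea, not the original middleware: the function names are invented, and a plain dict stands in for the Store table.

```python
DOMAIN_SUFFIX = ".dev:8000"  # assumed development value, as in the comment above


def store_domain_from_host(host, suffix=DOMAIN_SUFFIX):
    """Strip the development suffix from a Host header value."""
    host = host.lower()
    if suffix and host.endswith(suffix):
        return host[: -len(suffix)]
    return host


def find_store(host, stores_by_domain, suffix=DOMAIN_SUFFIX):
    """Look up the store for a Host header; a dict stands in for the Store table."""
    return stores_by_domain.get(store_domain_from_host(host, suffix))
```

In production the suffix would presumably be an empty string, in which case the Host value is used unchanged.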

Django persistent site-wide memory

I am new to Django, and probably using it in a way that's not normal.
That said, I would like to find a way to have site-wide memory.
To explain:
I have a very simple setup where one computer will make posts to the site every few seconds.
I want this data to be saved off somewhere.
I want everyone who is viewing the webpage to see updates based on this data in near real time via some JavaScript.
So, using the sample code below:
Computer A would do a POST to set_data and set data to "data set".
Computers B, C, D, etc. would then do a GET to get_data and see "data set".
Unfortunately, B, C, and D just see "".
I have a feeling that what I need is memcached, but I am on a HostGator shared server and cannot install it. In the meantime I am just writing the data to files. This works but is really inefficient, and I am hoping to serve a large user base.
Thanks for any help.
#view.py
data = ""

def set_data(request):
    data = request.POST['data']
    return HttpResponse("")

def get_data(request):
    return HttpResponse(data)
memcached is lossy, hence doesn't fulfil "persistent".
Files are fine, but switch to accessing them via mmap.
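A minimal sketch of the mmap approach, using only the standard library. The fixed slot size, the zero-byte padding and the function names are assumptions for illustration. There is no locking here, so a reader could in principle observe a partially written value.

```python
import mmap
import os
import tempfile

SLOT_SIZE = 64  # fixed-size slot so the mapping never has to grow


def create_slot(path):
    """Create the backing file, filled with zero bytes."""
    with open(path, "wb") as f:
        f.write(b"\x00" * SLOT_SIZE)


def write_slot(path, text):
    """Overwrite the slot with text, padded with zero bytes."""
    data = text.encode("utf-8")
    if len(data) > SLOT_SIZE:
        raise ValueError("value too large for slot")
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SLOT_SIZE) as m:
            m[:] = data.ljust(SLOT_SIZE, b"\x00")
            m.flush()


def read_slot(path):
    """Read the slot back and strip the zero padding."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), SLOT_SIZE, access=mmap.ACCESS_READ) as m:
            return m[:].rstrip(b"\x00").decode("utf-8")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "slot.bin")
    create_slot(path)
    write_slot(path, "data set")
    print(read_slot(path))
```

A long-lived process would normally keep the mapping open rather than reopening the file on every call; this sketch reopens it each time only to stay short.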
Persistent storage is also called database (although for some cases Django's cache backend might work as well). Don't ever try to use global variables in web development.
Whether you should use a Django model or the cache backend really depends on your use case, but you just described a contrived example (or does your web app consist of a getter and a setter?).
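As a rough illustration of the database route, here is a sketch with plain sqlite3 instead of a Django model, so that it is self-contained. The kv table and the helper names are invented for the example. Because the value lives in a file-backed database rather than in a module-level variable, every process that opens the same file sees the same data.

```python
import sqlite3


def open_store(path):
    """Open (and if needed create) the key/value table in a database file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_data(conn, value):
    """Store the latest value, replacing any previous one."""
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES ('data', ?)", (value,))
    conn.commit()


def get_data(conn):
    """Return the stored value, or an empty string if nothing was stored yet."""
    row = conn.execute("SELECT value FROM kv WHERE key = 'data'").fetchone()
    return row[0] if row else ""
```

With Django, the equivalent would be a small model with one row, or the cache framework with a database or file backend if losing the value occasionally is acceptable.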

Best way to move API from CodeIgniter to Django

In the beginning we made a project using CodeIgniter, with some controllers that connected an external NAS to the database via its web interface. To cut a long story short, we had a bunch of URLs that required an API key, to prevent outside sources from calling the API.
The API existed for various tasks the NAS had to do (manage orders, upload data/images etc.), so we had a few different controllers (ie. one for Orders, Images, etc.) So the API folder looked something like this:
controllers/apiv1/
orders.php
images.php
...
Something along the lines of this:
class Orders extends ApiController {
    function Orders()
    {
        parent::ApiController();
    }

    function get_paid()
    {
        $shop = self::get_shop();
        $this->load->model('order');
        echo json_encode($this->order->by_status($shop->shop_id, Order::STATUS_PAID));
    }
}
Where the ApiController just checked the APIKey against the Shop that it was trying to access.
Now we are moving the project to Django, and I was wondering what the best way to set up this API again would be. I was thinking of making an API app for the project, importing the models into views.py, and writing functions for everything. My problem: is there a way to break everything up nicely (into separate files for each of the various things)? Or should I just have views.py full of everything and sort it out in the URLs?
Or is there a better way? If possible I would like to separate the API into versions (api/v1, api/v2, etc.) so that we can just route the URLs to the new API without affecting the old one. This may come in handy if we have various NAS devices using different versions of the API (hard to explain why...).
You could try using something like Django Piston or Django-tastypie to quickly get something working. The big advantage over using normal Django views is that you get most of the CRUD and serialization to JSON/Yaml/XML done for you.
Tastypie comes with a built-in shared-secret key authentication mechanism, and it's not difficult to find the equivalent code for Piston.
EDIT: BTW, I've been working with both Piston and Tastypie recently. I find Tastypie is easier to set up and the code base looks cleaner. That said, it lacks some features (coming in 1.0 though), which makes it impossible for me to use it at the moment. Piston is very easy to shoehorn into whatever you need, but the code seems to be growing stagnant, the author doesn't seem to be very responsive about open issues, and you'll probably end up having your own fork with the bugfixes you need for your application to work properly. Not an ideal situation.
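If you end up writing plain views instead, the shared-secret check that ApiController did in the CodeIgniter version can be expressed as a decorator. The sketch below is framework-free and purely illustrative: API_KEYS, ApiError, the api_key query parameter and paid_orders are all made-up names, and it is not how Piston or Tastypie implement their authentication.

```python
import functools
import hmac

# Hypothetical key store: shop id -> API key. A real app would query a model.
API_KEYS = {"shop-1": "s3cret-key"}


class ApiError(Exception):
    """Raised when a request does not carry a valid key for its shop."""


def require_api_key(view):
    """Reject the call unless the request carries the shop's API key."""

    @functools.wraps(view)
    def wrapper(request, shop_id, *args, **kwargs):
        supplied = request.GET.get("api_key", "")
        expected = API_KEYS.get(shop_id)
        # compare_digest avoids leaking how many leading characters matched.
        if expected is None or not hmac.compare_digest(
            supplied.encode("utf-8"), expected.encode("utf-8")
        ):
            raise ApiError("invalid API key")
        return view(request, shop_id, *args, **kwargs)

    return wrapper


@require_api_key
def paid_orders(request, shop_id):
    # Placeholder body; the original returned the shop's paid orders as JSON.
    return {"shop": shop_id, "status": "paid"}
```

Each group of views (orders, images, ...) can then live in its own module under a package per API version, with the decorator applied to every view.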

What is a sane way to perform a radical Django Model migration in a production environment?

I have an existing django web app that is in use. I have to radically migrate one key model in my design to a completely new design, but I want to cache all of the existing data for that model and migrate them to the new records in production when ready to deploy.
I can afford to bring my website down for a few hours one night and do whatever I need to do to migrate. What are some sane ways I can do this migration?
It seems any migration would need to:
1) Dump all of the existing data into some format, such as SQL, JSON, XML
2) Migrate the model to the new format
3) Reload the data into the new model using a conversion script
I also thought of trying to store all of the existing data in some other model called "OldModel" (if Model is the name of the existing model) and then migrating the data live.
There is a project to help with migrations that I've heard of: South.
Having said that, I admit we've not used it. We still plan our migrations using a file of SQL statements. Madness, I know, but it has the advantage of testability. You can run it as many times as necessary during development and staging testing before the "big deploy". It can be source controlled, diffed, etc. It can also, therefore, be called from a larger deployment script. Of course, we back up production before running it :-)
If your database does journaling, using the old-fashioned method has the added advantage that there is a transaction history that can be rolled back.
Experiments we've run with JSON, XML and "OldModel" -> "NewModel" style dumps have scaled pretty poorly. Mind you, YMMV... we have quite a large database. By using a script, you can run on your production database without having to offload or reload vast amounts of data. This way even a complicated migration can take seconds, rather than hours.
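A small sketch of that script-based approach, using sqlite3 so it can be run as-is. The article table, the slug column and the statements in MIGRATION are invented for the example. The whole list of statements runs inside one transaction, so a failure halfway through leaves the database as it was.

```python
import sqlite3

# Hypothetical migration: in practice these statements would live in a .sql file.
MIGRATION = [
    "ALTER TABLE article ADD COLUMN slug TEXT",
    "UPDATE article SET slug = lower(replace(title, ' ', '-'))",
]


def connect(path):
    # isolation_level=None disables implicit transactions so that
    # BEGIN/COMMIT/ROLLBACK can be issued explicitly below.
    return sqlite3.connect(path, isolation_level=None)


def run_migration(conn, statements):
    """Apply all statements in one transaction; roll back if any of them fails."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
```

Whether schema changes can be rolled back like this depends on the database: SQLite and PostgreSQL support it, but check your own database's documentation before relying on it.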
There are around 5 or 6 tools to help automate some portion of migrations. Several of them are listed in this question and I'll add the others just for completeness.
Next, see S. Lott's answer to this question about migration workflows for a great idea on using version numbers in the model name to make migrations easier, including structuring a standalone script to properly convert the tables. To my mind this is vastly superior to serializing the data for export and then trying to build your new tables by importing.
Finally, I haven't been able to think of a way to do a hot migration properly and haven't seen any hints from anywhere else either, so maintenance downtime is inevitable.
Make all migrations in steps!
If you need to add a field, go ahead and add it, with a default value or being optional. This is safe.
If you need to make an existing optional field required, give it a default first.
If you need to make an existing field with a default not have a default, drop the default after fixing all the code that creates instances.
If you need to change the type of a field, first add a new field that inherits the value from the current one. Then, run a script to update the existing instances to populate the new field. Third, change all the code that uses the old field to use the new one. Finally, when no code is left using the original, you can drop it.
For every situation there is a small step you can make. For every bigger change, you can break it down into little ones. This is one place iterative development pays off. Keep good backups in place and don't be afraid to push often! Make the small changes quickly to see if they work.
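As an illustration of the "change the type of a field" steps, here is a sketch with sqlite3. The item table, the text price column and the integer price_cents column are assumptions made up for the example. Only the first two steps are code; the last two happen in the application and in a later migration.

```python
import sqlite3


def step1_add_new_field(conn):
    # The new column is optional (NULL allowed), so existing code keeps working.
    conn.execute("ALTER TABLE item ADD COLUMN price_cents INTEGER")


def step2_backfill(conn):
    # Populate the new field from the old text field, only where it is still empty,
    # so the script can safely be run more than once.
    conn.execute(
        "UPDATE item "
        "SET price_cents = CAST(ROUND(CAST(price AS REAL) * 100) AS INTEGER) "
        "WHERE price_cents IS NULL"
    )
    conn.commit()


# Step 3: switch application code from price to price_cents and deploy.
# Step 4: once nothing reads price any more, drop the old column in a later migration.
```

Each step can be deployed and verified separately, which is the point of breaking the change down.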
If you are more comfortable with the Django ORM than with raw SQL, you might consider using Model -> BackupModel -> TestModel -> Model, where all but the last step can be performed without dropping data.
def backup(InModel, OutModel):
    in_objs = InModel.objects.all()
    for obj in in_objs:
        out_obj = OutModel.convert_from(InModel, obj)
        out_obj.save()
Here, you would just make sure that all your models have convert_from methods implemented. These should all be trivial conversions except for BackupModel -> TestModel. In the other cases, nothing but the class would change, all data being identically preserved.
The advantage to this is that before you go rewriting all your interfaces, you can play around with TestModel and make sure that your conversions were what you thought they'd be. If everything goes wrong, you convert from BackupModel->Model, and everything is okay. In a worst-case scenario, you give up on Django's ORM, run back to SQL, and simply rename all your tables that begin with backupmodel__* to model__* in your database.
Disclaimer: I've never done this.