I developed a ZF2 application with Doctrine 2 as the ORM. The app is extremely slow, so I installed webgrind on my server to locate the bottleneck. There I saw an invocation count of roughly 8,000 calls for the Doctrine\ORM\UnitOfWork->createEntity method; in another application with the same base system the count is 56. On the page shown in the screenshot I have three database queries taking 13.94 ms, and one of those queries joins one other table.
How can I find out where that huge number of calls comes from?
If you need further information, please let me know.
Related
I've got a web page loading pretty slowly, so I installed the Django Debug Toolbar. I'm pretty new at this, so I'm trying to figure out what I can do with it.
I can see the database did 264 queries in 205 ms. That looks kind of high. I'm pretty sure I can cut down on that by adding some indexes and just writing better queries. But my question is: what is a "good" number that I should be trying to hit here? What is generally accepted as "fast enough", beyond which further optimization isn't really worth it? 50 ms? 20 ms?
Also, this same page shows 2500 ms of user CPU. That sounds terrible to me, and I'm surprised it's so much higher than the database time, which I assumed was the bottleneck. Is this maybe an indication that I'm trying to do too much in Python code instead of at the database layer? Would reducing the number of SQL queries help with CPU (waiting between queries?)? Again, is there some well-known target response time I should be aiming for?
I'm looking for a snappy response from my clients. Right now when I click around I can feel a "pregnant pause" before the pages load.
By default, accessing related model fields results in one extra query per model per row. Look into select_related() and prefetch_related(); this usually cuts down the number of queries and speeds things up by a lot. I think the debug toolbar shows you the actual queries; if not, enable SQL logging before doing any query optimizations. Once you have cut the number of queries down to a minimum (no extra queries per row), look for the slowest query and use SQL's EXPLAIN to see whether indexes are being used -- this is another area where things can get slow, especially with big data.
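For example, here is a rough sketch of the difference, using a hypothetical Author/Book pair of models (the names are just for illustration, not from your project):

# Hypothetical models, purely for illustration.
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

# Naive loop: one extra query per row to fetch each book's author.
for book in Book.objects.all():
    print(book.author.name)

# select_related() pulls the author in via a join, so this loop
# issues a single SQL statement.
for book in Book.objects.select_related('author'):
    print(book.author.name)

# prefetch_related() is the equivalent for reverse / many-to-many
# relations: one extra query in total instead of one per author.
for author in Author.objects.prefetch_related('book_set'):
    print([b.title for b in author.book_set.all()])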
Usually the database is the bottleneck, unless you are doing some major looping in your code. If you believe the Python code is slow, you need to profile it; otherwise it's just guessing.
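A minimal way to start profiling is Python's built-in cProfile; the slow_function below is just a stand-in for whatever code you suspect:

import cProfile
import pstats

def slow_function():
    # Placeholder for the code you suspect is slow.
    return sum(i * i for i in range(10 ** 6))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print the 10 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)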
I have Neo4j v2.1.6 (default configuration) and Neo4j.rb v4.1.0. All queries are slow, around 50 ms, even though I have only 5 nodes in the db.
For example:
User.find_by(person_id: 826268332)
CYPHER 47ms MATCH (n:`User`) WHERE (n.person_id = {n_person_id}) RETURN n LIMIT {limit_1} | {:n_person_id=>826268332, "limit_1"=>1}
Where could the problem be?
I'm one of the core maintainers of Neo4j.rb, along with Brian Underwood, who replied above. This is not exactly a full answer since we need to know more about your system to answer that, but I'm posting this here because it's too much for one comment.
My money is on something wrong with your DB or your system. We had a similar issue reported -- slow queries when working locally, no cause able to be determined -- for a user running Windows. See Neo4j.rb version 3.0 slow performance RoR, over 1024ms for all queries. We weren't able to pin it down. Locally, running that exact same query, I see 13ms the first time I run it and ~3ms every time after that. Indexing won't make a difference in a DB that small.
Ways to limit the chance of a problem and generally improve performance:
Use Ruby MRI 2.2.0
Use Neo4j 2.1.6 or 2.2.0
Use Mac or Linux, not Windows
Require the oj and oj_mimic_json gems in your app
You will see longer responses for a query like that if your db and app server are in two different networks.
Regarding the comment that this simple query is much faster in MongoDB and PostgreSQL: yes, it's going to be. Both of those return simple queries faster than Neo4j.rb for at least two reasons:
The Ruby gems for connecting to those DBs do not use a REST interface; they use custom binary protocols.
Both of those are optimized for returning single records quickly; Neo4j is optimized for returning large groups of records quickly.
Before releasing Neo4j.rb 4.0, I did a ton of benchmarks against PostgreSQL and MongoDB and found the same results: they crush us when returning single objects. (PostgreSQL is amazing technology in general.) As soon as you start looking for related objects, though, things balance out, and as you add complexity the difference becomes even more significant. I don't have any numbers to share, unfortunately, but I'll write a blog post about it sometime soon if I have the time.
That is strange. In the neo4j gem I often see simple queries run in around 1-5 ms.
For debugging, what if you did this?
User.where(yeti_person_id: 826268332).first
Also, what does this give you?
puts User.where(yeti_person_id: 826268332).to_cypher
So I implemented Haystack with Elasticsearch a week ago in our BETA application. One thing I've noticed is that getting some data (a large amount) back to our users (for example, listing all the users in the application) is much faster by going through Haystack than through Django's ORM. Now, I will be releasing a REST service (with TastyPie) within the next few weeks to serve tablet clients, as I want to be able to access the information from iPads, Nexus tablets and so on.
One thing I was wondering, is when should I be querying the ORM vs Haystack/ElasticSearch? For example, if the user on the tablet is requesting a specific set of users, should we let TastyPie query the ORM, or go to ElasticSearch?
If we look at this answer, Django: Haystack or ORM, we can all agree that a DB is made to retrieve and write data. However, could we say that retrieval can be faster with Haystack/Elasticsearch once the search index has been updated?
I am a bit confused as to when to use which: should we not just be querying Haystack if it is so much faster?!
To make things clear, I guess you're talking about querying Elasticsearch via Haystack without later instantiating any objects for your search results with data from your database.
Some points to consider besides the ones mentioned in the other post:
A search engine like Elasticsearch is highly optimized when dealing with full-text searches (with SQL, this depends heavily on the database/engine you are using).
Queries that involve a lot of relations/joins will most likely be easier to handle with the ORM, but on the other hand you can, for example, store data from foreign-key relations in a denormalized fashion when using ES, which could give you a performance boost. Of course you can denormalize your database tables as well, but this is quite often considered bad practice unless you know exactly what you are doing, e.g. when solving a specific performance bottleneck.
ES is relatively easy to scale, while scaling your SQL DB might be more complicated.
Most likely this is a decision that depends very much on your use case, the amount of data to process, and the queries you intend to run. So the best thing, as always, is to do some benchmarking yourself and compare the two solutions. But don't do any premature optimisation: one big advantage of sticking with the ORM is that it keeps things simple, since you don't have to worry much about the integrity of your data or maintain an additional system.
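As a very rough sketch of such a benchmark (the model, field and query below are made up for illustration, not taken from your project), you could time both paths side by side:

import time
from haystack.query import SearchQuerySet
from myapp.models import UserProfile  # hypothetical model

def time_it(label, make_query, runs=10):
    start = time.time()
    for _ in range(runs):
        list(make_query())  # force evaluation of the lazy result set
    print(label, (time.time() - start) / runs)

# ORM path: hits the relational database.
time_it('orm', lambda: UserProfile.objects.filter(city='Montreal'))

# Haystack path: hits the Elasticsearch index instead (assumes 'city'
# is a field on the UserProfile search index).
time_it('haystack', lambda: SearchQuerySet().models(UserProfile).filter(city='Montreal'))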
Launching my second-ever Django site.
I've had problems in the past with Django's ORM (basically, the SQL it was generating just wasn't what I wanted and even using things like select_related() I couldn't wrangle it into what it should've been) -- I ended up just writing all my DB queries by hand in my views and using this function, taken from the Django docs, to turn the cursor's responses into usable dictionaries:
def dictfetchall(cursor, returnMultiDictAnyway=False):
    "Returns all rows from a cursor as a list of dicts (or a single dict when there is exactly one row and returnMultiDictAnyway is False)"
    desc = cursor.description
    rows = [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]
    if len(rows) == 1 and not returnMultiDictAnyway:
        return rows[0]
    return rows
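For reference, a typical call site in one of my views looks roughly like this (the table and query here are just illustrative):

from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT id, title FROM reviews_album WHERE rating >= %s", [8])
albums = dictfetchall(cursor, returnMultiDictAnyway=True)
# albums is now a list of dicts, e.g. [{'id': 1, 'title': '...'}, ...]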
I'm almost ready to launch my site but I'm finding pretty huge performance problems on the two different webservers I've tried hosting the app with.
Locally, it doesn't run blazingly fast, but I generally put that down to my machine being a little slow in general. I don't have the numbers to hand (I'll add them later), but the SQL times aren't crazily high and I've made the effort to optimise MySQL (adding missing indexes etc.).
Here's the app, running on two different webhosts (using bit.ly to avoid Google spidering these URLs, sorry!):
http://bit.ly/10iEWYt (hosted on Dreamhost, using Passenger WSGI)
http://bit.ly/UZ9adS (hosted on WebFaction, also using WSGI)
At the moment I have DEBUG=False on both of those hosts (so there shouldn't be a loading penalty) and a 15-minute file-based cache on each one. On the Dreamhost one I have an experimental cronjob hitting the homepage every 15 minutes in an effort to keep the Python server alive -- this doesn't seem to have done much.
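For completeness, the cache is configured along these lines in settings.py (the cache directory here is just a placeholder):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',  # any writable directory
        'TIMEOUT': 60 * 15,  # 15 minutes
    }
}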
If you try those links you should see how long it takes for the server to respond as you click around, even including the cache (try going from the homepage to another page then back home).
I've tried this profiling middleware but I'm not really sure how to interpret the results (I can add them to this post later on when I'm home) -- in any case, the functions/lines it pointed to were all inside Django's own code, so I struggled to relate that to my own views etc.
Is it likely that the dictfetchall() method above could be an issue here? I use that to work with the results of every DB query on the site (~5-10 per page, most on the homepage). I do have a few included templates but nothing too crazy. I have a context processor for common things like showing album reviews, which I use all over the place. I'm stumped about what else could be causing this slowness.
Thanks, hope this is enough info to be helpful.
EDIT: okay, here's a profiling trace of the site homepage: http://pastebin.com/raw.php?i=c7kHNXAZ -- struggling to interpret it, to be honest.
Also, I looked at the Debug Toolbar stats: 8 SQL queries in 246 ms (I'm currently looking at optimising these further), but a total render time of 3235 ms (locally). This is what's confusing me.
Note: I'm using Postgres 9.x and Django ORM
I have some functions in my application which open a transaction, run a few queries, then spend a couple of full seconds on other things (3rd-party API access, etc.), and then run a few more queries. The queries aren't very expensive, but I've been concerned that, by keeping many transactions open for so long, I'll eventually bog down my database or run out of connections or something. How big of a deal is this, performance-wise?
Keeping a transaction open has pros and cons.
On the plus side, every transaction has an overhead cost. If you can do a couple of related things in one transaction, you normally win performance.
However, you acquire locks on rows or whole tables along the way (especially with any kind of write operation). These are automatically released at the end of the transaction. If other processes might wait for the same resources, it is a very bad idea to call external processes while the transaction stays open.
Maybe you can make the call to the 3rd party API before you acquire any locks, and then run all the queries in swift succession afterwards?
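In Django terms that could look roughly like the following sketch (assuming a Django version that has transaction.atomic; the Order model and the API helper are placeholders):

from django.db import transaction
from myapp.models import Order  # hypothetical model

def process_order(order_id):
    # Do the slow third-party call first, outside any transaction,
    # so no rows stay locked while we wait on the network.
    api_result = call_third_party_api(order_id)  # hypothetical helper

    # Then run the related queries in one short transaction.
    with transaction.atomic():
        order = Order.objects.select_for_update().get(pk=order_id)
        order.status = api_result['status']
        order.save()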
Read about checking locks in the Postgres Wiki.
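As a quick illustration, you can peek at ungranted locks from within Django by querying pg_locks (the Wiki has more complete queries):

from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT pid, locktype, mode, granted FROM pg_locks WHERE NOT granted")
for pid, locktype, mode, granted in cursor.fetchall():
    print(pid, locktype, mode, granted)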
While not an exact answer, I can't recommend this presentation highly enough.
“PostgreSQL When It’s Not Your Job” at DjangoCon US
It is from this year's DjangoCon, so there should also be a video, hopefully soon.
Plus check out the author's blog; it's a gold mine of useful information on Postgres as a whole and Django in particular. You'll find interesting info about transaction handling there.