I have a website that is experiencing some slow initial response time. The site is built with Django, and runs on an Apache2 server on Ubuntu. I have been using the Django Debug Toolbar for debugging and optimization.
When making a request to a user profile page, the browser is 'waiting' for ~800ms and receiving for ~60ms for the initial request. However, the Django Debug Toolbar shows that CPU time and SQL query time together add up to only ~425ms.
Chrome devtools:
Django debug toolbar:
Even a request to the index page (which has no SQL queries and almost no processing - it just responds with the template) shows ~250ms wait time.
I tried temporarily upgrading the VM to a much more powerful CPU, but that did not (noticeably) change this metric.
This leads me to believe that the wait is not due to inefficient code or database latency, but instead due to some Apache or Ubuntu settings.
After the initial response, the other requests to load page resources (js files, images etc) have a much more reasonable wait time of ~20ms.
What could account for the relatively large initial 'waiting' time?
What tools can I use to get a better picture of where that time is going?
Related
I have a working Django application that is running locally using an sqlite3 database without problem. However, when I change the Django database settings to use my external AWS RDS database all my pages start taking upwards of 40 seconds to load. I have checked my AWS metrics and my instance is not even close to being fully utilized. When I make a request to a view with no database read/write operations I also get the same problem. My activity monitor shows my local CPU spiking with each request. It shows a process named 'WindowsServer' using most of the CPU during each request.
I am aware that more latency is expected when using a remote database, but I don't think it should result in 40-second page loads. What other problems could be causing this behaviour?
AWS database monitoring
Local machine
So your computer is connecting to a server at Amazon, and the problem is latency. Production servers should be in the same place as the DB servers (or should have a very good connection to them, so that latency is as low as possible).
--edit--
So we need more details. What is your ISP? What are your connection properties (uplink, downlink)? What are the ping times to servers in AWS?
I am currently using Ajax: the page refreshes on a set interval, which gives a real-time feel. But I fear that in the long run, if the app scales, clients will have problems. Since the page refreshes every second, I think it may overload the server. What is the best option for a real-time chat app using Django, other than Redis (not free, and it has limits)?
The app should be scalable up to 1000 concurrent connections.
The server is a VPS with 4 cores, 4 GB RAM and 4 TB bandwidth.
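To put rough numbers on the polling concern, here is a back-of-the-envelope sketch. It assumes all 1000 clients are online around the clock, each polls once per second, the 4 TB figure is a monthly transfer quota of 4e12 bytes, and a month has 30 days. None of these assumptions are confirmed above, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 86_400


def polling_budget(clients, poll_interval_s, monthly_quota_bytes, days=30):
    """Estimate the load and bandwidth budget of interval polling.

    Returns (requests per second, requests per month,
    bytes available per request before the quota is exhausted).
    """
    requests_per_second = clients / poll_interval_s
    requests_per_month = requests_per_second * SECONDS_PER_DAY * days
    bytes_per_request = monthly_quota_bytes / requests_per_month
    return requests_per_second, requests_per_month, bytes_per_request


if __name__ == "__main__":
    # Assumed figures: 1000 clients, 1 s interval, 4e12 bytes per month.
    rps, per_month, budget = polling_budget(1000, 1, 4e12)
    print(f"{rps:.0f} requests/s, {per_month:.3g} requests/month")
    print(f"about {budget:.0f} bytes per request (headers included)")
```

Under these assumptions the server has to sustain 1000 requests per second, and each request/response pair may use only about 1.5 KB, which HTTP headers alone can consume.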
UPDATE
It appears this issue is caused by a bug related specifically to using Axis2 with ColdFusion and we have been able
to replicate the issue in our production environment on two different servers by
switching between Axis1 and Axis2. My original tests to compare the
two were apparently thwarted by an override in an Application.cfc
which forced Axis2.
We ran into a memory leak today which forced us to speed up the resolution to this issue. It resembled the leak
discussed here though we aren't sure if it is the exact same
problem
(https://www.hass.de/content/coldfusion-10-webservice-leaking-memory-trusted-cache-leaks-memory).
Our primary webservices are in Axis1 and we only switched to Axis2 for
this new set of webservices because we needed document literal style
for SalesForce and with Axis1 an invalid wsdl was being created (did
not properly describe all object types in arrays). So now we have it as
Axis1 and using a manually manipulated wsdl. Not entirely sure if it
will work out with SalesForce but as far as a general fix this works.
I am investigating an issue with our coldfusion based soap webservices in our production environment. It appears that the time between the return statement in the webservices method code and actually receiving a response can be significant and appears to directly correspond to the size of the response and/or number of objects.
In development a particular request that returns 1000 records takes about 6 seconds to return. However in production that same hit takes 50+ seconds to return. I added some timing code and found that the actual function code takes less than 1 second to run at the start of the request, meaning that generating the response is taking coldfusion about 50 seconds in production. Hitting the webservice with simple http request does not have the same slowness so seems to be soap/axis specific. The resulting xml is about 1MB which I have compared and found no differences. I also copied out settings from cfadmin in both environments to compare and could find no performance related setting differences.
Both environments are at the same CF 10 update level. The server monitor shows no significant memory usage. I also ran the request from the server itself to make sure that a slow connection or https was not slowing things down, but the results are the same.
Any suggestions or solution would be appreciated.
Additional notes...
CPU sits at about 17% for most of the time of the request which is a lot of work to be doing. Something is happening very inefficiently
I tried switching instance to Axis1 and back again followed by an instance restart and additional tests with no change in results
One possibility is that you have them throttled - check the "request tuning" in your CF administrator. By default the setting for "number of simultaneous web service requests" is 10. Are you looping and hitting the server? In production is there more traffic?
In the server monitor, enable profiling and monitoring, then click on "statistics". On the far right there is a little chart icon. Click on it and you will see a chart and a counter legend in the top right. Then run your code. Does "web services running" reach a threshold and cross into "web services queued"? If so, you need to increase that threshold.
One more clue: in the server monitor, do NOT run "memory profiling" for more than a few seconds, say 30. If you do, you will have performance problems for sure.
I was wondering why I always get the wrong message right after I correct my code. Before long, though, the program returns the right result. Is that normal?
Sounds like you are not restarting the built-in django development server manually after making the changes:
The development server automatically reloads Python code for each
request, as needed. You don’t need to restart the server for code
changes to take effect. However, some actions like adding files don’t
trigger a restart, so you’ll have to restart the server in these
cases.
As documentation says, sometimes in order to see the changes, you need to restart the server manually.
Also, sometimes Django dev server reloader doesn't see the changes right away and it takes some time for the server to notice changes and trigger restart. If you see this often, restart the server manually.
Also note that in Django 1.7 kernel signals are used to autoreload the server on linux - this should make it pick up the changes and restart faster:
Changed in Django 1.7:
If you are using Linux and install pyinotify, kernel signals will be
used to autoreload the server (rather than polling file modification
timestamps each second). This offers better scaling to large projects,
reduction in response time to code modification, more robust change
detection, and battery usage reduction.
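As an illustration of the timestamp-polling approach the quote mentions, here is a minimal sketch. It is not Django's actual reloader code; the function names are hypothetical. It also shows why a file that is not yet on the watch list goes unnoticed.

```python
import os
import tempfile


def snapshot_mtimes(paths):
    """Return {path: modification time} for the watched files that exist."""
    mtimes = {}
    for path in paths:
        try:
            mtimes[path] = os.stat(path).st_mtime
        except OSError:
            continue  # file vanished between polls; ignore it
    return mtimes


def changed_files(previous, current):
    """Paths whose mtime differs from the previous snapshot, or that are new."""
    return sorted(p for p, m in current.items() if previous.get(p) != m)


def demo():
    """Simulate one polling cycle around an edit and a newly added file."""
    with tempfile.TemporaryDirectory() as tmp:
        watched = os.path.join(tmp, "views.py")
        with open(watched, "w") as f:
            f.write("x = 1\n")
        before = snapshot_mtimes([watched])

        # Simulate an edit by pushing the modification time forward.
        st = os.stat(watched)
        os.utime(watched, (st.st_atime, st.st_mtime + 10))

        # A file created later is not on the watch list, so polling misses it.
        added = os.path.join(tmp, "new_module.py")
        open(added, "w").close()

        after = snapshot_mtimes([watched])
        return changed_files(before, after), added in after
```

A poller built this way would call `snapshot_mtimes` roughly once per second and restart the server whenever `changed_files` returns a non-empty list, so a change can take up to one polling interval to be noticed.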
We are using Django 1.3.1 and Postgres 9.1
I have a view which just fires multiple selects to get data from the database.
The Django documentation mentions that when a request completes, a ROLLBACK is issued if only select statements were fired during the call to a view. But I am seeing a lot of "idle in transaction" entries in the log, especially when I have more than 200 requests. I don't see any commit or rollback statements in the postgres log.
What could be the problem? How should I handle this issue?
First, I would check out the related post What does it mean when a PostgreSQL process is “idle in transaction”? which covers some related ground.
One cause of "Idle in transaction" can be developers or sysadmins who
have entered "BEGIN;" in psql and forgot to "commit" or "rollback".
I've been there. :)
However, you mentioned your problem is related to have a lot of
concurrent connections. It sounds like investigating the "locks" tip
from the post above may be helpful to you.
A couple more suggestions: this problem may be secondary. The primary
problem might be that 200 connections is more than your hardware and
tuning can comfortably handle, so everything gets slow, and when things
get slow, more things are waiting for other things to finish.
If you don't have a reverse proxy like Nginx in front of your web app,
consider adding one. It can run on the same host without additional
hardware. The reverse proxy will serve to regulate the number of
connections to the backend Django web server, and thus the number of
database connections-- I've been here before with having too many
database connections and this is how I solved it!
With Apache's prefork model, there is a 1:1 correspondence between the
number of Apache workers and the number of database connections,
assuming something like Apache::DBI is in use. Imagine someone connects
to the web server over a slow connection. The web and database server
take care of the request relatively quickly, but then the request is
held open on the web server unnecessarily long as the content is
dribbled back to the client. Meanwhile, the database connection slot is
tied up.
By adding a reverse proxy, the backend server can quickly deliver a
reply back to the reverse proxy and then free the backend worker and
database slot. The reverse proxy is then responsible for getting the
content back to the client, possibly holding open its own connection
for longer. You may have 200 connections to the reverse proxy up front,
but you'll need far fewer workers and db slots on the backend.
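The effect can be estimated with Little's law (average slots in use = arrival rate × time each request holds a slot). The sketch below uses made-up numbers purely for illustration: 100 requests per second, 50 ms for the backend to produce a response, and 2 s for a slow client to download it.

```python
def busy_backend_slots(requests_per_second, hold_ms):
    """Average backend workers (and DB slots) in use, via Little's law."""
    return requests_per_second * hold_ms / 1000


# Hypothetical workload, not measured from any real system.
RATE = 100          # requests per second
APP_MS = 50         # backend time to build the response
CLIENT_MS = 2000    # time a slow client needs to receive it

# Without a proxy the worker is held until the client has the full response.
without_proxy = busy_backend_slots(RATE, APP_MS + CLIENT_MS)

# With a buffering proxy the worker is released once the response is built.
with_proxy = busy_backend_slots(RATE, APP_MS)

if __name__ == "__main__":
    print(f"without proxy: {without_proxy:.0f} slots")
    print(f"with proxy:    {with_proxy:.0f} slots")
```

With these assumed figures the backend needs about 205 slots without the proxy and about 5 with it. The slow transfers still happen, but they occupy cheap proxy connections instead of workers and database connections.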
If you graph the db slots with MRTG or similar, you'll see how many
slots you are actually using, and can tune down max_connections in
PostgreSQL, freeing those resources for other things.
You might also look at pg_top to
help monitor what your database is up to.
I understand this is an older question, but this article may describe the problem of idle transactions in django.
Essentially, Django's TransactionMiddleware will not explicitly COMMIT a transaction if it is not marked dirty (usually triggered by writing data). Yet, it still BEGINs a transaction for all queries even if they're read only. So, pg is left waiting to see if any more commands are coming and you get idle transactions.
The linked article shows a small modification to the transaction middleware to always commit (basically remove the condition that checks if the transaction is_dirty). I'll be trying this fix in a production environment shortly.
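The mechanism can be illustrated without Django or PostgreSQL. The sketch below uses Python's built-in `sqlite3` module as a stand-in: a read-only "view" opens a transaction explicitly and, unless it commits at the end, the connection stays inside that transaction. This is not the middleware code or the linked article's patch; the table and function names are hypothetical.

```python
import sqlite3


def make_connection():
    """In-memory database with one row; autocommit so BEGIN/COMMIT are explicit."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    return conn


def run_read_only_view(conn, commit_at_end):
    """Run a SELECT inside a transaction, optionally closing it afterwards.

    Returns the rows and whether the connection is still in a transaction.
    """
    conn.execute("BEGIN")
    rows = conn.execute("SELECT name FROM users").fetchall()
    if commit_at_end:
        # The "always commit" variant: end the transaction even though
        # nothing was written.
        conn.execute("COMMIT")
    return rows, conn.in_transaction


if __name__ == "__main__":
    conn = make_connection()
    print(run_read_only_view(conn, commit_at_end=False))
    conn.execute("ROLLBACK")  # clean up the transaction left open above
    print(run_read_only_view(conn, commit_at_end=True))
```

In the first call the connection is left in an open transaction after the read, which is the state PostgreSQL reports as "idle in transaction". In the second call the unconditional commit closes it.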