I have an Apache2 and Django (mod_wsgi) setup that provides a RESTful API. I have a set of automated tests for it that execute ~1000 API requests (pure HTTP GET/POST/PUT/DELETE) in sequential order.
The problem is that every 80 requests or so, I get a strange lag/timeout of exactly 5s or 10s. See the timestamp examples here:
Request 1: 2013-08-30T03:49:20.915
Response 1: 2013-08-30T03:49:30.940
Request 2: 2013-08-30T03:50:32.559
Response 2: 2013-08-30T03:50:37.597
I can't figure out why this happens. I have an Apache config with KeepAlive Off (the recommended setting for Django) but otherwise a standard install for Ubuntu 12.04 LTS.
I'm running the tests from the same server where the webserver is. I first thought this was some kind of DNS cache thing, but I've added the hostname I'm requesting to /etc/hosts and the problem persists.
The system is idle and has lots of CPU and memory available when these lags/timeouts happen.
The lag is not specific to a certain request (URL), it seems kinda random.
Considering that it's always exactly to the millisecond 5s or 10s, it feels like this is some specific setting somewhere causing this.
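The delay in each pair can be computed directly from the timestamps above. A minimal sketch in Python, where `elapsed_seconds` is an illustrative helper name rather than anything from the test suite:

```python
from datetime import datetime

# Timestamp layout used in the examples above (millisecond precision).
FMT = "%Y-%m-%dT%H:%M:%S.%f"


def elapsed_seconds(request_ts: str, response_ts: str) -> float:
    """Return the seconds between a request and its response timestamp."""
    start = datetime.strptime(request_ts, FMT)
    end = datetime.strptime(response_ts, FMT)
    return (end - start).total_seconds()


pairs = [
    ("2013-08-30T03:49:20.915", "2013-08-30T03:49:30.940"),
    ("2013-08-30T03:50:32.559", "2013-08-30T03:50:37.597"),
]
for req, resp in pairs:
    print(f"{elapsed_seconds(req, resp):.3f} s")
```

This prints 10.025 s and 5.038 s, so each stall looks like a fixed 10 s or 5 s wait on top of a few tens of milliseconds of normal request time.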
In case it provides some insight, watch my talk from PyCon US.
http://lanyrd.com/2013/pycon/scdyzk/
The talk deals with things like process churn and startup costs. One thing you shouldn't do is set maximum requests if you don't really need it.
Also consider trying New Relic to help diagnose where the issue is. That will save a lot of guessing about whether it is a web application or backend service infrastructure issue.
As far as seeing how such monitoring can help, watch another one of my PyCon talks.
http://lanyrd.com/2012/pycon/spcdg/
This was a DNS issue; adding the domain name I used locally to /etc/hosts actually solved the problem. I just hadn't rebooted the server for the changes to take effect. I thought restarting networking would take care of that, but apparently not.
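To check whether name resolution is the slow step, one option is to time the lookup on its own. The sketch below is only an illustration: `time_lookup` is a made-up helper and `localhost` is a placeholder for the hostname the tests actually request. For what it's worth, a stall of exactly 5s or 10s would be consistent with the glibc resolver's default query timeout of 5 seconds with one retry, though that is a guess about this particular setup.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo, port=80):
    """Return (seconds_taken, error) for one name lookup of hostname.

    resolver is injectable so the timing logic can be tested without DNS.
    """
    start = time.perf_counter()
    error = None
    try:
        resolver(hostname, port)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.perf_counter() - start, error


if __name__ == "__main__":
    # Placeholder hostname; replace it with the one the tests request.
    for attempt in range(3):
        seconds, error = time_lookup("localhost")
        print(f"lookup {attempt}: {seconds:.3f} s, error={error!r}")
```

If the lookups come back in milliseconds while the HTTP requests still stall, DNS is probably not the culprit and the next place to look is the web server itself.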
UPDATE
It appears this issue is caused by a bug related specifically to using Axis2 with ColdFusion, and we have been able to replicate the issue in our production environment on two different servers by switching between Axis1 and Axis2. My original tests to compare the two were apparently thwarted by an override in an Application.cfc which forced Axis2.
We ran into a memory leak today which forced us to speed up the resolution to this issue. It resembled the leak discussed here, though we aren't sure if it is the exact same problem (https://www.hass.de/content/coldfusion-10-webservice-leaking-memory-trusted-cache-leaks-memory).
Our primary webservices are in Axis1 and we only switched to Axis2 for this new set of webservices because we needed document literal style for SalesForce and with Axis1 an invalid wsdl was being created (did not properly describe all object types in arrays). So now we have it as Axis1 and using a manually manipulated wsdl. Not entirely sure if it will work out with SalesForce but as far as a general fix this works.
I am investigating an issue with our ColdFusion-based SOAP webservices in our production environment. It appears that the time between the return statement in the webservice's method code and actually receiving a response can be significant, and it appears to correspond directly to the size of the response and/or the number of objects.
In development, a particular request that returns 1000 records takes about 6 seconds to return. However, in production that same hit takes 50+ seconds to return. I added some timing code and found that the actual function code takes less than 1 second to run at the start of the request, meaning that generating the response is taking ColdFusion about 50 seconds in production. Hitting the webservice with a simple HTTP request does not have the same slowness, so it seems to be SOAP/Axis specific. The resulting XML is about 1MB; I have compared it between the two environments and found no differences. I also copied out settings from cfadmin in both environments to compare and could find no performance-related setting differences.
Both environments are at the same CF 10 update level. The server monitor shows no significant memory usage. I also ran the request from within the server to make sure there wasn't a slow connection or HTTPS slowing things down, but the results are the same.
Any suggestions or solution would be appreciated.
Additional notes...
CPU sits at about 17% for most of the time of the request, which is a lot of work to be doing. Something is happening very inefficiently.
I tried switching the instance to Axis1 and back again, followed by an instance restart and additional tests, with no change in results.
One possibility is that you have them throttled - check the "request tuning" in your CF administrator. By default the setting for "number of simultaneous web service requests" is 10. Are you looping and hitting the server? In production is there more traffic?
In the server monitor, enable profiling and monitoring, then click on "statistics". On the far right there is a little chart icon. Click on it and you will see a chart and a counter legend in the top right. Then run your code. Does "web services running" reach a threshold and cross into "web services queued"? If so, you need to increase that threshold.
One more clue: in the server monitor, do NOT run "memory profiling" for more than a few seconds, say 30. If you do, you will have performance problems for sure.
I am using a ColdFusion MX8 server, and one of the scheduled tasks had been running for 2 years, but suddenly, since 01/12/2014, scheduled tasks are not running. When I browse to the file in a browser, it runs successfully without error.
I am not sure whether there is an update or license expiration problem. I am aware that in the middle of this year Adobe ended support for ColdFusion 8.
The most common cause of this problem is external to the server. When you say you browsed to the file and it worked in a browser, it is very important to know whether that test was performed on the server desktop. Knowing that you can browse to the file from your desktop or laptop is of little value.
The most common source of issues like this is a change in the DNS or network stack that is interfering with resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address, your server suddenly can't browse to your domain. Or the IP served by the server for the domain in question goes from being 127.0.0.1 to some other IP that the server can't access correctly due to a reverse proxy, LB, or some other rule. Finally, sometimes Apache or IIS is altered so that an IP that was previously serviced (127.0.0.1 being the most common example) no longer responds.
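A quick way to see what the server itself resolves the domain to is to run a lookup on the server and classify the result. This is a rough Python sketch rather than ColdFusion, so treat it purely as an illustration: `describe_resolution` is a hypothetical helper and `localhost` stands in for your real domain.

```python
import ipaddress
import socket


def describe_resolution(hostname, resolver=socket.getaddrinfo):
    """Return a sorted list of (address, kind) pairs that hostname resolves to.

    kind is "loopback", "private" or "public".
    """
    results = set()
    for info in resolver(hostname, None):
        # sockaddr is the fifth field; its first item is the IP address.
        # Drop any IPv6 scope suffix such as "%eth0" before parsing.
        address = info[4][0].split("%")[0]
        ip = ipaddress.ip_address(address)
        if ip.is_loopback:
            kind = "loopback"
        elif ip.is_private:
            kind = "private"
        else:
            kind = "public"
        results.add((address, kind))
    return sorted(results)


if __name__ == "__main__":
    # Placeholder hostname; run this on the server with the task's domain.
    for address, kind in describe_resolution("localhost"):
        print(address, kind)
```

If the answer on the server differs from what your desktop sees (for example a public address where you expected 127.0.0.1), that points at the DNS or routing change described above.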
If it is something intrinsic to the scheduler service then Frank's advice is pretty good - especially look for "proxy scheduler" entries in the log - they can give you good clues. I would also log the results of a scheduled task to a file, then check the file. If it exists then your scheduled tasks ARE running - they are just not succeeding. Good luck!
I've seen the cf scheduling service crash in CF8. The rest of CF is unaffected.
Have you tried restarting the server?
Here are your concerns:
Your File (works since you tested it manually).
Your Scheduled Task (failed).
Your ColdFusion Application (Service) (any changes here?).
Your Server (what about here?).
To test your problem create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't, then you have a larger problem. Since the ColdFusion server sits atop the JVM, there could be something happening there. Things just don't stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging/renaming the file structure to make it more secure, that would break your task.
So going back: if your test schedule works, then determine what is different between the two. Note that you have logging capabilities in CF8.
If you are not directly in charge of maintaining this server, then I would recommend asking around to see if there was recent maintenance and, if so, what was done to the server.
First of all, please let me be clear that I am a Windows user and very new to the web world. For the past months I have been learning both Python and Django, and it has been a great experience for me. Now I have created a small project that I would like to deploy on a production server. Since Django has its built-in development server, there was no problem for me so far. But now that I have to deploy it to a production server, I googled around and found Nginx + uWSGI or Nginx + Gunicorn to be the best options. And as uWSGI and Gunicorn are incompatible with Windows, I think I should adopt Ubuntu or another Unix system.
So my questions are:
Just to be clear, as I will have to work with one of the above, please explain to me why I need two servers.
If I have to adopt the Ubuntu environment, do I have to learn Ubuntu shell scripting, SSH and other stuff? Or will the hosting provider help me do that?
Please let me know what else I need for the above.
Thank you so much for your time, and please pardon me if my question is a lame one. Hoping for helpful answers.
A typical configuration involves two server processes (which can be run together on the same actual hardware or virtual server) so that the proxy server in front can buffer slow clients. For instance: a slow client will connect to nginx with a request. Nginx will pass the request on to Gunicorn and Gunicorn will respond. Nginx will then consume the Gunicorn response immediately, freeing up the Gunicorn resources right away. At that point, the slow client can take as much time as it wants to consume the response from Nginx without tying up much in the way of server resources. Alternatives to the two-server-process model are to use async workers with Gunicorn and put Gunicorn itself in front, or to use an async-sync combo like Waitress. Nginx in front has the added benefit of doubling as a ready-to-use statics server, though.
Note that "slow clients" can describe: mobile phones that lose their connection and leave the TCP socket hanging until timeout mid-request; mobile phones that are just slow; unreliable connections of all types; hostile denial-of-service clients who are deliberately trying to use server resources; sometimes any old connection that has a hiccup or malfunction for any reason. So this is a problem that will affect nearly any site.
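To make the two-process setup a bit more concrete, here is a sketch of the Gunicorn half as a `gunicorn.conf.py` file (Gunicorn config files are plain Python). The port and worker count are assumptions for illustration, not tuned values, and the Nginx half would need a matching `proxy_pass` to the same address.

```python
# gunicorn.conf.py -- a sketch, not a tuned production config.
import multiprocessing

# Listen only on localhost: Nginx, running on the same machine,
# is assumed to be the only client that talks to Gunicorn directly.
bind = "127.0.0.1:8000"

# Plain sync workers are usually enough behind a buffering proxy,
# because Nginx absorbs the slow clients.
worker_class = "sync"

# A common starting point: two workers per core, plus one.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30
```

You would start it with something like `gunicorn -c gunicorn.conf.py myproject.wsgi`, where `myproject` is a placeholder for your own Django project name.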
You won't need shell scripting per se but getting used to Ubuntu will take some time. There is a lot to learn even outside of scripting, like how to use the package manager, how to configure packages once they're installed in ways that won't confound future updates, etc. And you will definitely have to learn to use SSH; it is one of the most fundamental server administration tools in the *nix world.
An alternative to learning to use Ubuntu or another server platform is to use a Platform-as-a-Service option like Heroku, as PaaS hosting providers really will take care of all of that stuff for you. I recommend this approach. That having been said, even though I think PaaS is a good option for people who want to focus on development and not server admin regardless of their level of skill, it's also true that a little bit of experience with Linux server platforms goes a long way in helping you to understand the environment that your code runs in. So even if you go with PaaS, you would still benefit from tinkering with Ubuntu a little (or a lot).
Another benefit from a PaaS is that normally their infrastructure handles the Nginx part of the deal (buffering of slow requests via proxy). This is the case with Heroku, for instance. So you won't have to worry about that part of the infrastructure at all.
This part of the question is too broad to answer, but let me know in the comments if you need clarification.
I'm doing it almost exactly like in this tutorial: http://michal.karzynski.pl/blog/2013/06/09/django-nginx-gunicorn-virtualenv-supervisor/
Nginx is my proxy to the Django app running on Gunicorn and also serves the static files, virtualenv provides my Python environment, and supervisor keeps my app running.
It's possible you will run into some errors if you're not using PostgreSQL; ask and I will help (I used MySQL in the past, now it's PostgreSQL).
Firstly, there's no need to use Ubuntu if you're happier with Windows. I don't know if nginx works on Windows, but I'd be very surprised if it doesn't (in fact, here are the nginx docs for installing on Windows). Apache, meanwhile, definitely does work on Windows. The Django documentation has a full explanation of how to set up Apache/mod_wsgi to serve Django.
You don't need two servers. I'm not sure why you think you do: the usual reason for that is to have the static assets on a separate server, but you don't mention that as a reason. Since you're only talking about a small site, though, you don't even need to do that. One server configured to serve both Django and the static assets will do fine. Again, the docs explain exactly how to do that.
I know that it's bad to use the Django web server in production. There's been at least one Stack Overflow question on this already.
But I'm wondering about where to draw the line between development and production? If I'm only allowing HTTP access to one (or a few) IP addresses, then I know I'm in development. What if I open it to all IP addresses, but only e-mail a couple friends to see what they think of what I've built?
As far as I can tell, the problems with using the Django server are:
(1) It's single-threaded
(2) Security
I don't think (1) is likely to be an issue if I'm only sharing it with a few people. For (2)--what's the worst-case scenario? Does it make a difference that I'm running on an Amazon EC2 server that I could very easily restart from a backup if something bad happened?
Well, the answer is actually very simple: you've left development when you have something you must protect, such as real user personal information or real data in your database that you'd be afraid to lose.
Security isn't a concern until these things are present. The rule about not using the dev server in "production" is guidance, not mandatory. You can fire up the dev server in your production environment any time you want. However, you'd be silly to do so and then open up universal access to it, once your site is truly live and in use by the world.
Setting up mod_wsgi (or some other WSGI container) on a development machine takes all of 5 minutes, and can help you sort out deployment issues before you actually reach deployment. So really, why ever use the development server if you don't have to?
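For context on what a WSGI container actually needs from your project: it imports a callable, conventionally named `application`, and calls it once per request. Django generates one for you in the project's wsgi.py. The sketch below is a hand-written stand-in using only the standard library, not Django's own code, and `call_app` is a hypothetical helper that plays the role of the container.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Smallest useful WSGI app: answer every request with plain text."""
    body = b"Hello from a WSGI application\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app):
    """Call app the way a container would and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake GET request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app(application))
```

Because the interface is this small, swapping the development server for mod_wsgi or another container mostly means pointing the container at that one callable.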
While building this web service and the app that calls it, we have noticed that the first call to the web service each day is extremely slow. It will even time out on some days. However, every call after that works great. Can anybody shed light on why this might be and how we can get rid of this pain?
Thanks in advance!
If it's an ASP.NET web service, it may be the CLR initializing and loading and verifying the assemblies for the first time. You may want to consider pre-compilation.
Agree with the other answers on caching, initialization, etc. As far as a workaround, one possibility may be to set up some sort of daily task (SQL Server job, Windows service, something else?) to simulate a hit to the service each day, so that your users don't experience this first slow request.
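The daily warm-up hit can be as small as one scripted GET. Here is a sketch in Python, assuming a placeholder URL (`http://localhost/service.asmx` is made up) and a hypothetical `warm_up` helper; schedule it with whatever you already have (Task Scheduler, cron, a SQL Server job).

```python
import urllib.request


def warm_up(url, opener=urllib.request.urlopen, timeout=120):
    """Send one GET to url; return the HTTP status, or None on failure.

    The generous timeout is deliberate: the first hit is the slow one.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except OSError:  # URLError, HTTPError and socket timeouts all derive from it
        return None


if __name__ == "__main__":
    # Placeholder URL; point this at a cheap method of the real service.
    print(warm_up("http://localhost/service.asmx"))
```

Note that this only hides the slow start from users; it does not explain it, so it is best combined with the checks suggested in the other answers.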
If it is an ASP.NET web service, then you might want to check the settings of the application pool the web service is running in, especially the idle timeout which defaults to 20 minutes in IIS7.
Configuring IIS7 idle-timeout
Even if it is not an ASP.NET web service, other web servers will have equivalent configuration settings you have to tweak to keep your web service alive overnight.
Can you duplicate the same behavior on your database? It could just be the db needing to optimise the query for the first run (Maybe the parameter is today's date?).
Are there a lot of static constructors, or is there a lot of setup code, in the Global.asax class? Because IIS recycles worker processes periodically, the startup code may be running again.
The rule for optimization is: don't guess. Put in profiling to find out exactly what is slow, and then work to make that faster. Everything already posted provides excellent tips on where to start looking for slowness.
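As a language-neutral illustration of "put in profiling", the idea is simply to wrap each suspect step in a timer and compare the numbers. A minimal Python sketch (the service in question may well be .NET, so this shows the shape of the idea only; `timed` and the step labels are invented for the example):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink):
    """Record how long the with-block took, in seconds, under sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


timings = {}
with timed("simulated slow step", timings):
    time.sleep(0.05)  # stand-in for a database call, service startup, etc.
with timed("simulated fast step", timings):
    sum(range(1000))

# Print the slowest step first.
for label, seconds in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{label}: {seconds:.3f} s")
```

Comparing the first call of the day against a later call with timers like these should show which step accounts for the difference.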