ColdFusion server crashing on hourly basis

ColdFusion server crashing on hourly basis - coldfusion

I am facing serious ColdFusion Server crashing issue. I have many live sites on that server so that is serious and urgent.
Following are the system specs:
Windows Server 2003 R2, Enterprise X64 Edition, Service Pack 2
ColdFusion (8,0,1,195765) Enterprise Edition
Following are the hardware specs:
Intel(R) Xeon(R) CPU E7320 #2.13 GHZ, 2.13 GHZ
31.9 GB of RAM
It is crashing on the hourly bases. Can somebody help me to find out the exact issue? I tried to find it through ColdFusion log files but i do not find anything over there. Every times when it crashes, i have to reset the ColdFusion services to get it back.
Edit1
When i saw the runtime log files "ColdFusion-out165.log" so i found following errors
error ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
04/18 16:19:44 error ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Here are my current JVM settings:
As you can see my JVM setting are
Minimum JVM Heap Size (MB): 512
Maximum JVM Heap Size (MB): 1024
JVM Arguments
-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib
Note:- when i tried to increase Maximum JVM Heap size to 1536 and try to reset coldfusion services, it does not allow me to start them and give the following error.
"Windows could not start the ColdFusion MX Application Server on Local Computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2."
Should i not able to set my maximum heap size to 1.8 GB, because i am using 64 bit operating system. Isn't it?

How much memory you can give to your JVM is predicated on the bitness off your JVM, not your OS. Are you running a 64-bit CF install? It was an uncommon thing to do back in the CF8 days, so worth asking.
Basically the error is stating you're using too much RAM for how much you have available (which you know). I'd be having a look at how much stuff you're putting into session and application scope, and culling back stuff that's not necessary.
Objects in session scope are particularly bad: they have a far bigger footprint than one might think, and cause more trouble than they're worth.
I'd also look at how many inactive but not timed-out sessions you have, with a view to being far more agressive with your session time-outs.
Have a look at your queries, and get rid of any SELECT * you have, and cut them back to just the columns you need. Push dataprocessing back into the DB rather than doing it in CF.
Farm scheduled tasks off onto a different CF instance.
Are you doing anything with large files? Either reading and processing them, or serving them via <cfcontent>? That can chew memory very quickly.
Are all your function-local variables in CFCs properly VARed? Especially ones in CFCs which end up in shared scopes.
Do you accidentally have debugging switched on?
Are you making heavy use of custom tags or files called in with <cfmodule>? I have heard apocryphyal stories of custom tags causing memory leaks.
Get hold of Mike Brunt or Charlie Arehart to have a look at your server config / app (they will obviously charge consultancy fees).
I will update this as I think of more things to look out for.

Turn on ColdFusion monitor in the administrator. Use it to observe behavior. Find long running processes and errors.
Also, make sure that memory monitoring is turned off in the ColdFusion Server Monitor. That will bring down a production server easily.

#Adil,
I have same kind of issue but it wasn't crashing it but CPU usage going high upto 100%, not sure it relevant to your issue but atleast worth to look.
See question at below URL:
Strange JRUN issue. JRUN eating up 50% of memory for every two hours
My blog entry for this
http://www.thecfguy.com/post.cfm/strange-coldfusion-issue-jrun-eating-up-to-50-of-cpu
For me it was high traffic site and storing client variables in registry which was making thing going wrong.
hope this help.

Related

How to fix a ColdFusion app that runs much slower with sandbox security on?

Recently we upgraded to ColdFusion 11 Enterprise and noticed that the full-fledged sandbox security tends to have a way bigger overhead than the Standard edition (CF10).
What can one do to make an existing CF app perform well with sandbox security?

Here are my findings so far:
install VisualVM by adding -Dcom.sun.management.jmxremote.port=8701 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false to CF Admin's JVM Arguments. Learn how to use it and pay special attention to the CPU Snapshot & Hotspot tab. http://boncode.blogspot.ca/2010/04/cf-java-using-free-visualvm-tool-to.html. FYI CF Server Monitor in the Enterprise edition is utterly useless because its memory/performance profiling overhead is way too big to be viable for a live production server, and it doesn't perform well under load to give you any useful data of what could be going wrong.
Disable IPv6, and add [serverip] [serverip] to the OS's hostfile to speed up the default DNS reverse proxy lookup on creating new physical DB connection by Security Manager. See: On Linux, Java issues reverse DNS lookups when a socket is opened. Why, and how can I stop it? (FYI, Windows is affected to)
remove as much <cfmodule> and <cfinclude> as possible as they will end up with many java.io.File.canRead() and java.io.File.exists() which will stress the disk IO under load. Even SSD suffers under load. I have tried Trusted Cache and it does not help. Instead, try using cached CFC's in application scope and make sure the code are thread safe and local-var'ed.
eliminate the use of <cfinterface>, inheritance with extends, and getMetaData() as much as possible as they will eventually calls java.io.File.lastModified() which will stress the disk IO under load. Bug?
eliminate the use of access="package" as it will end up with many java.security.AccessController.checkPermission calls.
less objects per request the better, as each object instantiation has a higher cost with the extra java.security.AccessController.checkPermission call.

Why are my coldfusion soap webservices 10 times slower in production vs development?

UPDATE
It appears this issue is caused by a bug related specifically to using Axis2 with ColdFusion and we have been able
to replicate the issue in our production environment on two different servers by
switching between Axis1 and Axis2. My original tests to compare the
two were apparently thwarted by an override in an Application.cfc
which forced Axis2.
We ran into a memory leak today which forced us to speed up the resolution to this issue. It resembled the leak
discussed here though we aren't sure if it is the exact same
problem
(https://www.hass.de/content/coldfusion-10-webservice-leaking-memory-trusted-cache-leaks-memory).
Our primary webservices are in Axis1 and we only switched to Axis2 for
this new set of webservices because we needed document literal style
for SalesForce and with Axis1 an invalid wsdl was being created (did
not properly describe all object types in arrays). So now we have it as
Axis1 and using a manually manipulated wsdl. Not entirely sure if it
will work out with SalesForce but as far as a general fix this works.
I am investigating an issue with our coldfusion based soap webservices in our production environment. It appears that the time between the return statement in the webservices method code and actually receiving a response can be significant and appears to directly correspond to the size of the response and/or number of objects.
In development a particular request that returns 1000 records takes about 6 seconds to return. However in production that same hit takes 50+ seconds to return. I added some timing code and found that the actual function code takes less than 1 second to run at the start of the request, meaning that generating the response is taking coldfusion about 50 seconds in production. Hitting the webservice with simple http request does not have the same slowness so seems to be soap/axis specific. The resulting xml is about 1MB which I have compared and found no differences. I also copied out settings from cfadmin in both environments to compare and could find no performance related setting differences.
Both environments are at the same CF 10 update level. The server monitor shows no significant memory usage. I also ran the request from in the server to make sure there wasn't some slow connection issues or https slowing things down but the results are the same.
Any suggestions or solution would be appreciated.
Additional notes...
CPU sits at about 17% for most of the time of the request which is a lot of work to be doing. Something is happening very inefficiently
I tried switching instance to Axis1 and back again followed by an instance restart and additional tests with no change in results

One possibility is that you have them throttled - check the "request tuning" in your CF administrator. By default the setting for "number of simultaneous web service requests" is 10. Are you looping and hitting the server? In production is there more traffic?
In server monitor enable profiling and monitoring, then click on "statistics". On the far right there is a little chart icon. click on it and you will see a chart and a counter legend in the top right. Then run your code. Does the "web services running" reach a threshold and cross into "web services queued" - if so you need to increase that threshold.
One more clue - in the server monitor do NOT run the "memory profiling for more than a few seconds - say 30. If you don't you will have performance problems for sure.

What could be causing seemingly random AWS EC2 server to Crash? (Error couldn't establish database connection)

To begin, I am running a Wordpress site on an AWS EC2 Ubuntu Micro instance. I have already confirmed that this is NOT an error with Wordpress/mysql.
Seemingly at random the site will go down and I'll get the "Error establishing database connection" message. The server says that it is running just fine, and rebooting usually fixes the issue, however I'd like to figure out the cause and resolve the issue so this can stop happening (it's been the past 2 weeks now that it goes down almost every other day.)
It's not a spike in traffic, or at least Google Analytics hasn't shown the site as having any spikes in traffic (it averages about 300 visits per day.)
What's the cause, and how can this be fixed?

Sounds like you might be running into the throttling that is a limitation on t1.micro. If you use too much CPU cycles you will be throttled.
See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html#available-cpu-resources-during-spikes

The next time this happens I would check some general stats on the health of the instance. You can get a feel for the high-level health of the instance using the 'top' command (http://linuxaria.com/howto/understanding-the-top-command-on-linux?lang=en). Be sure to look for CPU and memory usage. You may find a process (pid) that is consuming a lot of resources and starving your app.
More likely, something within your application (how did you come to the conclusion that this is not a Wordpress/MySQL issue?) is going out of control. Possibly there is a database connection not being released? To see what your app is doing, find the process id (pid) for your app:
ps aux | grep "php"
and get a thread dump for that process: kill -3 to get java thread dump. This will help you see where your application's threads are stuck (if they are).
Typically it's good practice to execute two thread dumps a few seconds apart and compare trends in both. If there is an issue in the application, you should see a lot of threads stuck at the same point.
You might also want to checkout what MySQL is seeing (https://dev.mysql.com/doc/refman/5.1/en/show-processlist.html).
mysql> SHOW FULL PROCESSLIST
Hope this helps, let us know what you find!

Django mod_wsgi Hosting Server Requirements

I have normal Content Management Website developed in Django. My Client has a server with 256 MB RAM. He wants to deploy this site in wsgi mode. 256 MB RAM is sufficient or not?
I don't have any knowledge about Server RAM requirements and all. Any help will be appreciated
I have gone through this doc of wsgi
But it doesn't have any info about system Specifications.
What is the minmum RAM needed for running a Django application in wsgi mode?

How much memory you need depends on how many instances of the web application you intend to run at the same time. How many you need is going to be dictated by factors such as whether your code base is thread safe and so whether you can run it in a multithreaded configuration, or whether you will have to run a multi process configuration with single threaded processes.
So, do you even know how much memory one instance (process) uses when it is running your application?
The underlying web server has very little to do with memory used, because your application is going to totally dwarf how much memory the web server uses.
Some quick tips.
Don't use embedded mode of mod_wsgi, use daemon mode.
Go watch my PyCon US 2012 talk. http://lanyrd.com/2012/pycon/spcdg/
Do some monitoring of your web application to determine how much memory it uses.
Get an idea of what traffic volumes you need to handle.
Only once you have some real data about your applications memory requirements, how much load you need to handle and the response times of your application will you be able to work out the configuration and how much memory you need.

what is the operating system?
how many connections are needed?
what is the traffic that it needs to handle?
256MB does not seem realistic at first for CMS type of workload unless there is very little traffic and the operating system is striped down to the minimum needed.
Here is some data:
http://nichol.as/benchmark-of-python-web-servers

VisualVM and Coldfusion 8: why no memory sampling available?

We are trying to use VisualVM to track down some memory leakage in CF8, however, cannot get the tool to work 100%. Basically, everything comes up, except the Memory sampling. Says that the "JVM is not supported".
However, all the other features work (we can do CPU sampling, just not memory). Found this kind of weird that we can do everything else but the memory stuff, so am wondering if maybe we need to specify another JVM argument to allow this?
Some other info:
We are connecting locally via 127.0.0.1 or localhost.
I installed the Visual GC plugin, and it cannot connect either.
VisualVM and JRUN/CF8 are both using the same Java version (1.6.0_31), however, they are not pulled from the same location (maybe this matters). VisualVM uses the installed JDK, whereas JURN/CF8 uses just the binaries that we copied locally to the CF8 installation folder.
Installed another plugin that shows JVM properties, and it says that the JVM is not "attachable". Don't know what that means, but am just wanting to mention it.
Any help with this would be greatly appreciated. If we can just get that memory sampling, I think we can get on top of our performance issues that have plagued us here recently. Thanks in advance!
EDIT:
Also, just checked, and JRUN is being started under "administrator", whereas I am launching VisualVM under a different user. Maybe this is relevant?

Yes, it is relevant that you are running VisualVM under different user. Memory Sampling uses Attach API, which only works if you are running monitored application and VisualVM as the same user. This is also reason that the JVM properties reports that your application is not attachable. If you run VisualVM as "administrator", it will automatically detect your Coldfusion 8 application and the Memory sampler will work.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js