Is it possible that a database (connected to ColdFusion 9 via a datasource connection) being unavailable could cause ColdFusion to become unresponsive? (The database is used for a single one-off, lightly trafficked app.)
Recently, maintenance on a connected Oracle database (Oracle JDBC) has made that database unavailable at two different times. Coincidentally, at both of those times, ColdFusion pages on our site became unavailable or terribly slow to load (static HTML pages seemed to load fine, for the most part). Restarting the ColdFusion application server service would fix the problem, but only for minutes. The first time, while the application server was responsive, we unchecked the "Maintain connections" checkbox. I'm not sure whether this had any effect; shortly afterward the Oracle database came back online, and we didn't seem to have the problem any more.
The second time that database was offline, we experienced a very similar issue with our website: ColdFusion pages becoming really slow or unavailable altogether. During one of the windows when I could access the CF Administrator, I updated the datasource and checked "Disable connections". Then I stopped and restarted both the CF ODBC agent and ODBC server services. After that, the problem seemed to stop, but I don't know enough to tell whether that was causation or coincidence.
Anyone have insights on this?
Server setup: Windows Server 2003 SP2, ColdFusion 9, IIS 6
There are a number of ways to slow a database to a crawl, if not stop it completely. If, for example, hackers are attacking your database through port 1433 with attempted logins several times a second, that can slow it down, and if they get in they can of course do whatever they want. When this happened to me I found a record of the attacks in the Event logs; the solution is better network security that intercepts such attacks and never lets them actually talk to the database. Or, if your site is vulnerable to SQL injection, hackers could be messing with your database that way too, but network security wouldn't necessarily help in that case.

It doesn't require hackers to degrade the performance of your database, however. You could be having a problem with allocated disk space for transaction logs or indexes filling up, or, heaven forbid, an imminent hardware failure showing early symptoms. You're backing up your database often, I hope, and off the server.

To answer your question: yes, ColdFusion can and will become unresponsive when pages are called that call the database, and it will usually display error messages once the database finally times out and never sends the requested data to ColdFusion. You can protect against that to some extent with cftry tags around your queries that display clean, polite error messages instead of ColdFusion's ugly ones if the database fails to return data (see the sketch below); at least your site continues to look professional that way.

One project I worked on used a shared SQL Server database that often got overloaded and slowed down terribly, and there was nothing I could do to improve that situation. What I did to keep the site functioning was to maintain a DB backup in the form of an MS Access database (yeah, it was inappropriate, but it worked when SQL Server wouldn't), and any time SQL Server failed I had the application set up to automatically use code that called the Access database instead.
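A minimal sketch of that cftry approach, in case it helps (the datasource and table names here are made up, not from your app):

```cfml
<!--- Hypothetical datasource/table; swap in your own names. --->
<cftry>
    <cfquery name="getOrders" datasource="myOracleDSN">
        SELECT order_id, order_date
        FROM orders
    </cfquery>
    <cfcatch type="database">
        <!--- A clean, polite message instead of ColdFusion's default error page --->
        <p>Sorry, our data service is temporarily unavailable. Please try again shortly.</p>
        <cfabort>
    </cfcatch>
</cftry>
```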
These are some ideas for you to think about if you are continuing to have problems. I see nobody has even tried to answer your question in the last six months, and that's kinda been my experience with the quality of assistance this site has offered me, too. I hope my thoughts can be of some use to you.
I installed ColdFusion 2018 recently, and with the installation less than a month old (and my understanding of the technology even younger), my ColdFusion service has stopped working. I have tried a number of things and referred to a number of articles; in many similar cases where the service was not accessible, people were able to get it resolved, but some of the more obscure causes of this error remain untouched and unknown.
Whenever I try to restart the service, I get the error shown below:
“Windows could not start the ColdFusion 8 Application Server on Local Computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to server-specific code error 2.”
Without much understanding, I started googling. Looking into every one of these posts, I tried:
Configuring the JRE and relaunching the service, checking the JAVA_HOME variable and jvm.config (see the jvm.config sketch after this list)
Running the batch files in every possible combination to see if anything clicked
Checking whether the installed Java version works and is compatible with the installed ColdFusion version
Fiddling with the "SessionStorage" var in the neo-runtime.xml file, as some suggested
and a few other tricks, coupled with numerous service restart attempts and a few machine reboots as well.
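For the first item, the key line to check is java.home in jvm.config; a minimal sketch, assuming a default Windows install location:

```
# {cf_root}/cfusion/bin/jvm.config (typical location; adjust to your install)
# Point this at a JRE/JDK supported by your ColdFusion release. A bad or
# missing path here is a classic cause of the service failing to start.
java.home=C:/ColdFusion2018/jre
```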
A service that renders ColdFusion pages shouldn't just shut down abruptly like this. To add to the agony, the CF Admin also depends on the service, and hence it does not work either.
Any pointers to any potential solutions?
I have an app on Heroku, running on Puma:
workers 2
threads_count 3
pool 5
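For context, those settings live in config/puma.rb and config/database.yml; roughly like this (a sketch of my setup, not exact code):

```ruby
# config/puma.rb -- matches the settings above
workers 2        # 2 worker processes
threads 3, 3     # 3 min / 3 max threads per worker

# The pool of 5 is set separately, in config/database.yml:
#   production:
#     pool: 5
# Each worker process gets its own pool, and it needs at least as many
# connections as it has threads (3 here), so a pool of 5 is enough.
```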
It looks like some requests get stuck in the middleware, and it makes the app very slow (VERY!).
I have seen other people's threads about this problem but no solution so far.
Please let me know if you have any hints.
I work for Heroku support, and Middleware/Rack/ActiveRecord::QueryCache#call is commonly reported as a problem by New Relic. Unfortunately, it's usually a red herring; each time, the source of the problem lies elsewhere.
QueryCache is where Rails first tries to check out a connection for use, so any problems with a connection will show up here as a request getting 'stuck' waiting. This doesn't necessarily mean the database server is out of connections (if you have Librato charts for Postgres, they will show this). It more likely means something is causing certain database connections to enter a bad state, and new requests for a connection are left waiting. This can occur in older versions of Puma where multiple threads are used and reaping_frequency is set: if some connections get into a bad state while others are reaped, this will cause problems.
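If you are on one of those older versions and suspect the reaper, one thing to try (a sketch, not a guaranteed fix) is removing reaping_frequency from config/database.yml so the reaper never runs:

```yaml
# config/database.yml -- only relevant on the older versions noted above
production:
  adapter: postgresql
  pool: 5
  # reaping_frequency: 10   # remove or comment out to disable the reaper
```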
Some high-level suggestions are as follows:
Upgrade Ruby & Puma
If using the rack-timeout gem, upgrade that too
These upgrades often help. If not, there are other options to look into, such as switching from threads to worker-based processes or using a Postgres connection pooler such as PgBouncer. We have more suggestions on configuring concurrent web servers for use with Postgres here: https://devcenter.heroku.com/articles/concurrency-and-database-connections
I will answer my own question:
I simply had to check all queries to my DB. One of them was taking a VERY long time, and even though it was not requested often, it would slow down the whole server for quite some time afterwards (even after the process was done, there was a sort of "traffic jam" on the server).
Solution:
Check all the queries to your database and fix the slowest ones (that might simply mean breaking one down into a few steps, or making it run at night when there is no traffic, etc.).
Once these queries are fixed, everything should go back to normal.
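For anyone wondering how to find the slow ones: assuming the pg_stat_statements extension is available and enabled on your database, something like this surfaces the worst offenders (the column is total_time on older Postgres versions, total_exec_time on 13+):

```sql
-- Requires the pg_stat_statements extension to be installed and enabled.
SELECT calls,
       total_time,                      -- cumulative ms spent in this statement
       total_time / calls AS avg_ms,    -- average per call
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```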
I recently started seeing a spike in time spent in ActiveRecord::QueryCache#call. After looking at the source, I decided to try clearing said cache using ActiveRecord::Base.connection.clear_query_cache from a Rails console attached to the production environment. The error I got back was PG::ConnectionBad: could not fork new process for connection: Cannot allocate memory, which led me to this other SO question: Heroku Rails could not fork new process for connection: Cannot allocate memory
UPDATE
It appears this issue is caused by a bug related specifically to using Axis2 with ColdFusion, and we have been able to replicate the issue in our production environment on two different servers by switching between Axis1 and Axis2. My original tests to compare the two were apparently thwarted by an override in an Application.cfc which forced Axis2.
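For anyone hunting for the same kind of override: if I understand the CF10 setting correctly, it is this.wssettings.version.publish in Application.cfc that pins the Axis version (verify against the docs for your update level); the override looked roughly like this:

```cfml
// Application.cfc -- sketch of the kind of override that forced Axis2
component {
    this.name = "MyApp";
    // "2" publishes webservices with Axis2; "1" falls back to Axis1
    this.wssettings.version.publish = "2";
}
```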
We ran into a memory leak today which forced us to speed up the resolution of this issue. It resembled the leak discussed here, though we aren't sure if it is the exact same problem (https://www.hass.de/content/coldfusion-10-webservice-leaking-memory-trusted-cache-leaks-memory).
Our primary webservices are on Axis1, and we only switched to Axis2 for this new set of webservices because we needed document-literal style for Salesforce, and with Axis1 an invalid WSDL was being created (it did not properly describe all object types in arrays). So now we have it on Axis1 using a manually manipulated WSDL. I'm not entirely sure it will work out with Salesforce, but as a general fix this works.
I am investigating an issue with our ColdFusion-based SOAP webservices in our production environment. It appears that the time between the return statement in the webservice method's code and actually receiving a response can be significant, and appears to correspond directly to the size of the response and/or the number of objects.
In development, a particular request that returns 1000 records takes about 6 seconds to return. However, in production that same hit takes 50+ seconds. I added some timing code and found that the actual function code takes less than 1 second to run at the start of the request, meaning that generating the response is taking ColdFusion about 50 seconds in production. Hitting the webservice with a simple HTTP request does not show the same slowness, so it seems to be SOAP/Axis specific. The resulting XML is about 1MB, which I have compared and found no differences. I also copied out settings from cfadmin in both environments to compare and could find no performance-related setting differences.
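The timing code was along these lines (a sketch; doWebserviceWork and the log file name are placeholders for the real method):

```cfml
<!--- Measure only the function body, then compare against the total
      request time the client sees; the gap is response generation. --->
<cfset start = getTickCount()>
<cfset result = doWebserviceWork()>
<cflog file="wsTiming" text="Function body took #getTickCount() - start# ms">
```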
Both environments are at the same CF 10 update level. The server monitor shows no significant memory usage. I also ran the request from within the server to make sure there weren't slow connection issues or HTTPS slowing things down, but the results are the same.
Any suggestions or solution would be appreciated.
Additional notes...
CPU sits at about 17% for most of the duration of the request, which is a lot of work to be doing. Something is happening very inefficiently.
I tried switching the instance to Axis1 and back again, followed by an instance restart and additional tests, with no change in results.
One possibility is that you have them throttled: check "Request Tuning" in your ColdFusion Administrator. By default, the setting for "number of simultaneous web service requests" is 10. Are you looping and hitting the server? Is there more traffic in production?
In the server monitor, enable profiling and monitoring, then click on "Statistics". On the far right there is a little chart icon; click on it and you will see a chart with a counter legend in the top right. Then run your code. Does "web services running" reach a threshold and cross into "web services queued"? If so, you need to increase that threshold.
One more clue: in the server monitor, do NOT run memory profiling for more than a few seconds - say 30. If you do, you will have performance problems for sure.
I am using a ColdFusion MX8 server, and one of the scheduled tasks had been running for 2 years, but suddenly, as of 01/12/2014, scheduled tasks stopped running. When I browsed to the file in a browser, the file ran successfully without error.
I am not sure whether there is an update or license-expiration problem. I am aware that in the middle of this year Adobe ended support for ColdFusion 8.
The most common cause of this kind of problem is external to the server. When you say you browsed to the file and it worked in a browser, it is very important to know whether that test was performed on the server desktop. Knowing that you can browse to the file from your own desktop or laptop is of small value.
The most common source of issues like this is a change in the DNS or network stack that is interfering with resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address, your server can no longer browse to your domain. Or the IP served for the domain in question goes from being 127.0.0.1 to some other IP that the server can't access correctly due to a reverse proxy, load balancer, or some other rule. Finally, sometimes Apache or IIS is altered so that an IP that previously was serviced (127.0.0.1 being the most common example) no longer responds.
If it is something intrinsic to the scheduler service, then Frank's advice is pretty good - especially look for "proxy scheduler" entries in the log; they can give you good clues. I would also log the results of a scheduled task to a file, then check the file. If it exists, your scheduled tasks ARE running - they are just not succeeding. Good luck!
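A sketch of logging a task's output to a file with cfschedule (the task name, URL, and paths here are made up; the same publish-to-file options are on the task's screen in the CF Administrator):

```cfml
<cfschedule action="update"
            task="myNightlyTask"
            operation="HTTPRequest"
            url="http://127.0.0.1/tasks/nightly.cfm"
            startDate="01/01/2015"
            startTime="02:00"
            interval="daily"
            publish="yes"
            file="nightly_task_output.log"
            path="C:\cf_task_logs\">
```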
I've seen the cf scheduling service crash in CF8. The rest of CF is unaffected.
Have you tried restarting the server?
Here are your concerns:
Your File (works since you tested it manually).
Your Scheduled Task (failed).
Your ColdFusion application (service) (any changes here?).
Your server (what about here?).
To test your problem, create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't, then you have a larger problem. Since the ColdFusion server sits atop the JVM, there could be something happening there. Things just don't stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging/renaming the file structure to make it more secure, that would break your task.
So going back: if your test schedule works, then determine what is different between the two. Note you have logging capabilities: Logging abilities for CF8
If you are not directly in charge of maintaining this server, then I would recommend asking around to see if there was recent maintenance, and if so, what was done to the server.
We are using Django 1.3.1 and Postgres 9.1
I have a view which just fires multiple selects to get data from the database.
The Django documentation mentions that when a request completes, a ROLLBACK is issued if only SELECT statements were fired during the call to a view. But I am seeing a lot of "idle in transaction" entries in the log, especially when I have more than 200 requests. I don't see any COMMIT or ROLLBACK statements in the Postgres log.
What could be the problem? How should I handle this issue?
First, I would check out the related post What does it mean when a PostgreSQL process is “idle in transaction”? which covers some related ground.
One cause of "idle in transaction" can be developers or sysadmins who have entered "BEGIN;" in psql and forgot to "commit" or "rollback". I've been there. :)
However, you mentioned your problem is related to having a lot of concurrent connections. It sounds like investigating the "locks" tip from the post above may be helpful to you.
A couple more suggestions: this problem may be secondary. The primary problem might be that 200 connections is more than your hardware and tuning can comfortably handle, so everything gets slow, and when things get slow, more things end up waiting for other things to finish.
If you don't have a reverse proxy like Nginx in front of your web app, consider adding one. It can run on the same host without additional hardware. The reverse proxy will serve to regulate the number of connections to the backend Django web server, and thus the number of database connections - I've been here before with too many database connections, and this is how I solved it!
With Apache's prefork model, there is a 1:1 correspondence between the number of Apache workers and the number of database connections, assuming something like Apache::DBI is in use. Imagine someone connects to the web server over a slow connection. The web and database servers take care of the request relatively quickly, but then the request is held open on the web server unnecessarily long as the content dribbles back to the client. Meanwhile, the database connection slot is tied up.
By adding a reverse proxy, the backend server can quickly deliver a reply back to the reverse proxy and then free the backend worker and database slot. The reverse proxy is then responsible for getting the content back to the client, possibly holding open its own connection for longer. You may have 200 connections to the reverse proxy up front, but you'll need far fewer workers and db slots on the backend.
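A minimal Nginx sketch of that setup (the port and backend address are assumptions; use whatever your Django server listens on):

```nginx
# Nginx absorbs the slow client: the Django backend (here on :8000) and its
# database connection are freed as soon as the response is handed off.
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```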
If you graph the db slots with MRTG or similar, you'll see how many slots you are actually using, and you can tune down max_connections in PostgreSQL, freeing those resources for other things.
You might also look at pg_top to help monitor what your database is up to.
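To see the idle-in-transaction sessions directly, you can also query pg_stat_activity; on 9.1 the information is in current_query (newer Postgres versions moved it to a separate state column):

```sql
-- Postgres 9.1 syntax: idle-in-transaction backends show up in current_query.
SELECT procpid, usename, current_query, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```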
I understand this is an older question, but this article may describe the problem of idle transactions in Django.
Essentially, Django's TransactionMiddleware will not explicitly COMMIT a transaction if it is not marked dirty (usually triggered by writing data). Yet it still BEGINs a transaction for all queries, even if they are read-only. So Postgres is left waiting to see whether any more commands are coming, and you get idle transactions.
The linked article shows a small modification to the transaction middleware to always commit (basically removing the condition that checks whether the transaction is_dirty). I'll be trying this fix in a production environment shortly.
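I haven't reproduced the article's exact patch here, but on Django 1.3 the shape of an "always commit" middleware is roughly this (a sketch under that assumption, not the article's verbatim code):

```python
# Sketch: like Django 1.3's TransactionMiddleware, but process_response
# commits unconditionally instead of checking is_dirty(), so Postgres
# sees an explicit COMMIT even for SELECT-only views.
from django.db import transaction

class AlwaysCommitTransactionMiddleware(object):
    def process_request(self, request):
        transaction.enter_transaction_management()
        transaction.managed(True)

    def process_exception(self, request, exception):
        if transaction.is_dirty():
            transaction.rollback()
        transaction.leave_transaction_management()

    def process_response(self, request, response):
        if transaction.is_managed():
            transaction.commit()   # no is_dirty() check here
            transaction.leave_transaction_management()
        return response
```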