I've got CF10 running on a dev box, Windows 7, 64 bit.
Periodically, every minute or so, the CPU usage for CF10 will spike up to 100% for about 20 seconds and come back down. It's pretty regular.
I've found it difficult to diagnose this issue. I've seen talk of client variables purges, logging, monitoring and all manner of things - but I've turned these all off to no avail.
With VisualVM, I've managed to track the issue down to the 'scheduler' threads. I have 5 of these in a waiting state. Periodically each will run, bumping up the CPU dramatically.
Taking a thread dump, it seems that all these threads are calling java.io.WinNTFileSystem.getBooleanAttributes - something I've seen mentioned a few times as potentially problematic.
UPDATE: Recently I've been playing with onSessionEnd on another app, and discovered that the scheduler-x threads appear to be internal to ColdFusion - my onSessionEnd tasks always seem to run in one of these threads.
Looking in the temp folder, I can see that a lot of EH Cache folders have been made which I think are to do with query caching. The apps I have running make use of this fairly extensively. I thought clearing the temp folder out might improve performance but it has had no effect.
It's worth noting that if I start the CF service without actually calling any of my apps, the problem does not occur. That might suggest the issue is with the apps themselves, however they do not cause any issue in production - only on this box.
There are no scheduled tasks set up either.
Below is an example of one of the threads causing high CPU. I'd appreciate any help in diagnosing what this thread is doing and why, as well as how to potentially stop it from using so much resources.
"scheduler-2" - Thread t#84
java.lang.Thread.State: RUNNABLE
at java.io.WinNTFileSystem.getBooleanAttributes(Native Method)
at java.io.File.isDirectory(File.java:849)
at coldfusion.watch.Watcher.accept(Watcher.java:352)
at java.io.File.listFiles(File.java:1252)
at coldfusion.watch.Watcher.getFiles(Watcher.java:386)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.checkWatchedDirectories(Watcher.java:166)
at coldfusion.watch.Watcher.run(Watcher.java:216)
at coldfusion.scheduling.ThreadPool.run(ThreadPool.java:211)
at coldfusion.scheduling.WorkerThread.run(WorkerThread.java:71)
My environment:
Win 7 64-bit
CF10 Update 12
JDK 1.8.0_11
The issue occurs on multiple versions of JVM - this version is currently used to make monitoring available.
My java settings:
Min heap size: 512mb
Max heap size: 1024mb
-server -XX:MaxPermSize=512m -XX:+UseParallelGC -Xbatch -Dcoldfusion.home={application.home} -Dcoldfusion.rootDir={application.home} -Dcoldfusion.libPath={application.home}/lib -Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true -Dcoldfusion.jsafe.defaultalgo=FIPS186Random -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8701 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
I'd be lying if I said I understood what all of these settings do!
Sorry if you're one of those people that believes all CF developers should be Java app stack experts. I am not.
Any help, much appreciated. ;)
Using FusionReactor 6, I was able to solve this for us today. We were using this.javaSettings to hot load java class files. The WatchInterval from this.javaSettings uses the DirectoryWatcher at the specified watch number. In our case, I had lowered it to one second.
How I solved it: I set a breakpoint in FusionReactor and could see that it was constantly stuck scanning the directory above the one I specified in this.javasettings. This directory has enough files and subfolders, that it looks like one DirectoryWatcher was unable to finish before the next one was created. Had ColdFusion just stuck to the subfolder, I specified in this.javaSettings, it would not have been a problem.
Example:
This.javaSettings = {
loadPaths = ["\externals\lib\"]
, loadColdFusionClassPath = true
, reloadOnChange = true
, watchInterval = 1
};
In the above case, lib has just 5 files. However, "externals" is loaded with stuff. In the breakpoint, it was typically looking at stuff in "externals."
Do you have scheduled tasks running that use the CFFILE tag? They tend to be resource hogs. Spinning these into their own threads may help with the CPU spike.
another thought:
looking at the JVM,
•Min heap size: 512mb
•Max heap size: 1024mb
These establish the minimum and maximum memory available to the java virtual machine
-server -XX:MaxPermSize=512m
This is the amount of memory dedicated to the java permanent memory generation.
you've got half of your JVM allocated memory dedicated to the permanent generation, try bumping up the maximum heap size to 2048mb. and restarting the ColdFusion service. It could go higher based on whether or not you're running a 64Bit operating system or not.
Related
I have a .net Core web service which seems to slowly increase its cpu usage.
meaning at the first day it won't go past 10%, the second day it can go up to 20% and so on.
Using the TOP command in linux, all my webservices seems to sometime be shown there (probably when a request is made) and afterward disappear.
This specific process after running for a while just stays there constantly consuming cpu even when no request has been made.
the API still working fine, it seems like there are some threads that just keeps hanging and consuming cpu. last time I checked I had 5 threads that consumed 3-4% cpu and didn't die for some reason.
My guess is that in some specific scenario a thread just stays alive consuming cpu.
The app runs on ubuntu machine, my first step was trying to create a dump file with ProcDump so I can analyze those threads and maybe find where they are hanging.
ProcDump generates a huge 21gb file, which trying to analyze with lldb throws out of memory exception. even tried transferring it to a windows machine to debug with windbg , no help there as it couldn't open the file.
As there is no specific exception or anything I can't really share any piece of code as I have no idea where the issue is... just kind a hoping for some suggestion that might help me get to a solution or at least understand where the problem is.
Thanks a lot for reading, cheers
You could try using something like jetBrains’ DotMemory, they also have a fairly high level but helpful guide https://www.jetbrains.com/help/dotmemory/How_to_Find_a_Memory_Leak.html it also worth checking your startup file and double checking the services you’ve registered are used in the correct way ie not added as scoped when they should be transient or even a singleton etc
so iv'e been at it for a while.
Eventually found out that my problem was with HttpClient
Probably some bad mix of static class and creating new instances of HttpClient that causes the issue Iv'e explained above.
Solved it by utilizing HttpClientFactory as explained here -
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.1
Lesson learned :)
A little late but Procdump for Linux just added .NET Core 3 support that generates much more managable sized core dumps. It automatically detects if the target process is .NET Core and does the right thing (i.e., no need to specify switches).
my first question on stack,
I'm running cf10 Enterprise on windows 2003 server AMD Opteron 2.30 Ghz with 4gb ram. Im using cfindex action = update to index over 1k pdfs
I'm getting jvm memory errors and the page is being killed when its run as a scheduled task in the early hours of the morning.
This is the code in the page :
cfindex collection= "pdfs" action = "update" type= "path" extensions = ".pdf" recurse="yes" urlpath="/site/files/" key="D:\Inetpub\wwwroot\site\files"
JVM.config contents
java.home=s:\ColdFusion10\jre
application.home=s:\ColdFusion10\cfusion
java.args=-server -Xms256m -Xmx1024m -XX:MaxPermSize=192m -XX:+UseParallelGC -Xbatch -Dcoldfusion.home={application.home} -Dcoldfusion.rootDir={application.home} -Dcoldfusion.libPath={application.home}/lib -Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true -Dcoldfusion.jsafe.defaultalgo=FIPS186Random -Dcoldfusion.classPath={application.home}/lib/updates,{application.home}/lib,{application.home}/lib/axis2,{application.home}/gateway/lib/,{application.home}/wwwroot/WEB-INF/flex/jars,{application.home}/wwwroot/WEB-INF/cfform/jars
java.library.path={application.home}/lib,{application.home}/jintegra/bin,{application.home}/jintegra/bin/international,{application.home}/lib/oosdk/classes/win
java.class.path={application.home}/lib/oosdk/lib,{application.home}/lib/oosdk/classes
Ive also tried going higher than 1024mb for -Xmx however cf would not restart until i tokk it back to 1024mb
Could it be a rogue pdf or do i need more ram on the server ?
Thanks in advance
I would say you probably need more RAM. CF10 64 bit with 4Gigs of RAM is pretty paltry. As an experiment why don't you try indexing half of the files. Then try the other half (or divide them up however appropriate). If in each case the process completes and mem use remains normal or recovers to normal then there is your answer. You have ceiling on RAM.
Meanwhile, more information would be helpful. Can you post your JVM settings (the contents of your jvm.config file). If you are using the default heap size (512megs) then you may have room (not much but a little) to increase. Keep in mind it is the max heap size and not the size of physical RAM that constrains your CF engine - though obviously your heap must run within said RAM.
Also keep in mind taht Solr runs in it's own jvm with it's own settings. Chekc out this post for information on that - though I suspect it is your CF heap that is being overrun.
I am using an old version of a metastorm workflow designer.
We support this while we rewriting it in Microsoft Technologies.
After a few changes the "MAP" (*.epc) has become exceedingly slow to work with and "PUBLISH".
The publish writes the map and its binaries to the DB which then a service will pick up and execute.
However the publish "hangs" never completing and taking from a completion time of 15 min to in excess of 3 hours but still not completing.
I can see the CPU is being hammered but memory seems fine.
I ran process monitor but it does not show me much which leads me to believe the process is doing something either than the norm or the map has grown to a point which is leading it to destruction.
My question: How else can I profile this black box exe?
I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files in either lsof or /proc/{pid}/fd are the about expected 2000-5000.
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmaped files were somehow taking up one file descriptor per thread, then the numbers add up. That said, doing a recursive count on the things in /proc/{pid}/tasks/*/fd currently lists 215575 fds, so it would seem that it should be already hitting the limits and it's not, so that seems unlikely.
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. After upgrading to 10.04, after one month, there hasn't been a single instance of this problem. The configuration didn't change, so I'm lead to believe that this must have been a kernel bug.
your setup uses a huge chunk of code that may be guilty of leaking too; the JVM. Maybe you can switch between the sun and the opensource jvms as a way to check if that code is not by chance guilty. Also there are different garbage collector strategies available for the jvm. Using a different one or different sizes will cause more or less garbage collects (which in java includes the closing of a descriptor).
I know its kinda far fetched, but it seems like all the other options you already followed ;)
For the last few years we've been randomly seeing this message in the output logs when running scheduled tasks in ColdFusion:
Recursion too deep; the stack overflowed.
The code inside the task that is being called can vary, but in this case it's VERY simple code that does nothing but reset a counter in the database and then send me an email to tell me it was successful. But I've seen it happen with all kinds of code, so I'm pretty sure it's not the code that's causing this problem.
It even has an empty application.cfm/cfc to block any other code being called.
The only other time we see this is when we are restarting CF and we are attempting to view a page before the service has fully started.
The error rarely happens, but now we have some rather critical scheduled tasks that cause issues if they don't run. (Hence I'm posting here for help)
Memory usage is fine. The task that ran just before it reported over 80% free memory. Monitoring memory through the night doesn't show any out-of-the-ordinary spikes. The machine has 4 gigs of memory and nothing else running on it but the OS and CF. We recently tried to reinstall CF to resolve the problem, but it did not help. It happens on several of our other servers as well.
This is an internal server, so usage at 3am should be nonexistent. There are no other scheduled tasks being run at that time.
We've been seeing this on our CF7, CF8, and CF9 boxes (fully patched).
The current box in question info:
CF version: 9,0,1,274733
Edition: Enterprise
OS: Windows 2003 Server
Java Version: 1.6.0_17
Min JVM Heap: 1024
Max JVM Heap: 1024
Min Perm Size: 64m
Max Perm Size: 384m
Server memory: 4gb
Quad core machine that rarely sees more than 5% CPU usage
JVM settings:
-server -Dsun.io.useCanonCaches=false -XX:PermSize=64m -XX:MaxPermSize=384m -XX:+UseParallelGC -XX:+AggressiveHeap -Dcoldfusion.rootDir={application.home}/../
-Dcoldfusion.libPath={application.home}/../lib
-Doracle.jdbc.V8Compatible=true
Here is the incredible complex code that failed to run last night, but has been running for years, and will most likely run tomorrow:
<cfquery datasource="common_app">
update import_counters
set current_count = 0
</cfquery>
<cfmail subject="Counters reset" to="my#email.com" from="my#email.com"></cfmail>
If I missed anything let me know. Thank you!
We had this issue for a while after our server was upgraded to ColdFusion 9. The fix seems to be in this technote from Adobe on jRun 4: http://kb2.adobe.com/cps/950/950218dc.html
You probably need to make some adjustments to permissions as noted in the technote.
Have you tried reducing the size of your heap from 1024 to say 800 something. You say there is over 80% of memory left available so if possible I would look at reducing the max.
Is it a 32 or 64 bits OS? When assigning the heap space you have to take into consideration all the overhead of the JVM (stack, libraries, etc.) so that you don't go over the OS limit for the process.
what you could try is to set Minimum JVM Heap Size to the same as your Maximum JVM Heap Size (MB) with in your CF administrator.
Also update the JVM to the latest (21) or at least 20.
In the past i've always upgraded the JVM whenever something wacky started happening as that usually solved the problem.