I am using django 1.5 and gunicorn (sync workers)
Workers memory usage grow with time
i try to use Dozer to find the reason, but get:
AssertionError: Dozer middleware is not usable in a multi-process environment
How can i found the reason of leaking, any ideas?
We had this issue recently (with memory usage slowly climbing until the server(s) could not keep up).
We set gunicorn max_requests variable and it solved our problem. We set ours to 1000, although I'm not certain what the "ideal" setting would be.
http://docs.gunicorn.org/en/latest/configure.html#max-requests
Although I'm not sure if there may potentially be some reason why this became a problem to begin with.
I solved it by upgrading django to 1.5.1 (1.5 has some memory leak bugs)
Related
I have a .net Core web service which seems to slowly increase its cpu usage.
meaning at the first day it won't go past 10%, the second day it can go up to 20% and so on.
Using the TOP command in linux, all my webservices seems to sometime be shown there (probably when a request is made) and afterward disappear.
This specific process after running for a while just stays there constantly consuming cpu even when no request has been made.
the API still working fine, it seems like there are some threads that just keeps hanging and consuming cpu. last time I checked I had 5 threads that consumed 3-4% cpu and didn't die for some reason.
My guess is that in some specific scenario a thread just stays alive consuming cpu.
The app runs on ubuntu machine, my first step was trying to create a dump file with ProcDump so I can analyze those threads and maybe find where they are hanging.
ProcDump generates a huge 21gb file, which trying to analyze with lldb throws out of memory exception. even tried transferring it to a windows machine to debug with windbg , no help there as it couldn't open the file.
As there is no specific exception or anything I can't really share any piece of code as I have no idea where the issue is... just kind a hoping for some suggestion that might help me get to a solution or at least understand where the problem is.
Thanks a lot for reading, cheers
You could try using something like jetBrains’ DotMemory, they also have a fairly high level but helpful guide https://www.jetbrains.com/help/dotmemory/How_to_Find_a_Memory_Leak.html it also worth checking your startup file and double checking the services you’ve registered are used in the correct way ie not added as scoped when they should be transient or even a singleton etc
so iv'e been at it for a while.
Eventually found out that my problem was with HttpClient
Probably some bad mix of static class and creating new instances of HttpClient that causes the issue Iv'e explained above.
Solved it by utilizing HttpClientFactory as explained here -
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.1
Lesson learned :)
A little late but Procdump for Linux just added .NET Core 3 support that generates much more managable sized core dumps. It automatically detects if the target process is .NET Core and does the right thing (i.e., no need to specify switches).
I'm using wso2esb 4.7.0.I'm facing problem of memory overflow.Actually if their is less load on system then it works fine but as per load increases the system get slower.upto 80 to 90% system handles itself somehow but above than that it can't allocate the new objects.After that it gives timeout error or server get down.i have to restart the system.
For the performance improvement i have tried thrashing.It helps me littlebit but not that much..still system causes problem.Is their any suggestion to improve system performance?
WSO2 ESB document contains some performance tuning parameters. Did you go though them. Please find it from here for ESB 4.7.0. If it is OOM issue, Sometime you may have not increase the memory foot print of ESB. You can increase it using following parameters in wso2server script file.
-Xms1024m -Xmx2048m -XX:MaxPermSize=1024m
I've got CF10 running on a dev box, Windows 7, 64 bit.
Periodically, every minute or so, the CPU usage for CF10 will spike up to 100% for about 20 seconds and come back down. It's pretty regular.
I've found it difficult to diagnose this issue. I've seen talk of client variables purges, logging, monitoring and all manner of things - but I've turned these all off to no avail.
With VisualVM, I've managed to track the issue down to the 'scheduler' threads. I have 5 of these in a waiting state. Periodically each will run, bumping up the CPU dramatically.
Taking a thread dump, it seems that all these threads are calling java.io.WinNTFileSystem.getBooleanAttributes - something I've seen mentioned a few times as potentially problematic.
UPDATE: Recently I've been playing with onSessionEnd on another app, and discovered that the scheduler-x threads appear to be internal to ColdFusion - my onSessionEnd tasks always seem to run in one of these threads.
Looking in the temp folder, I can see that a lot of EH Cache folders have been made which I think are to do with query caching. The apps I have running make use of this fairly extensively. I thought clearing the temp folder out might improve performance but it has had no effect.
It's worth noting that if I start the CF service without actually calling any of my apps, the problem does not occur. That might suggest the issue is with the apps themselves, however they do not cause any issue in production - only on this box.
There are no scheduled tasks set up either.
Below is an example of one of the threads causing high CPU. I'd appreciate any help in diagnosing what this thread is doing and why, as well as how to potentially stop it from using so much resources.
"scheduler-2" - Thread t#84
java.lang.Thread.State: RUNNABLE
at java.io.WinNTFileSystem.getBooleanAttributes(Native Method)
at java.io.File.isDirectory(File.java:849)
at coldfusion.watch.Watcher.accept(Watcher.java:352)
at java.io.File.listFiles(File.java:1252)
at coldfusion.watch.Watcher.getFiles(Watcher.java:386)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.getFiles(Watcher.java:397)
at coldfusion.watch.Watcher.checkWatchedDirectories(Watcher.java:166)
at coldfusion.watch.Watcher.run(Watcher.java:216)
at coldfusion.scheduling.ThreadPool.run(ThreadPool.java:211)
at coldfusion.scheduling.WorkerThread.run(WorkerThread.java:71)
My environment:
Win 7 64-bit
CF10 Update 12
JDK 1.8.0_11
The issue occurs on multiple versions of JVM - this version is currently used to make monitoring available.
My java settings:
Min heap size: 512mb
Max heap size: 1024mb
-server -XX:MaxPermSize=512m -XX:+UseParallelGC -Xbatch -Dcoldfusion.home={application.home} -Dcoldfusion.rootDir={application.home} -Dcoldfusion.libPath={application.home}/lib -Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true -Dcoldfusion.jsafe.defaultalgo=FIPS186Random -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8701 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
I'd be lying if I said I understood what all of these settings do!
Sorry if you're one of those people that believes all CF developers should be Java app stack experts. I am not.
Any help, much appreciated. ;)
Using FusionReactor 6, I was able to solve this for us today. We were using this.javaSettings to hot load java class files. The WatchInterval from this.javaSettings uses the DirectoryWatcher at the specified watch number. In our case, I had lowered it to one second.
How I solved it: I set a breakpoint in FusionReactor and could see that it was constantly stuck scanning the directory above the one I specified in this.javasettings. This directory has enough files and subfolders, that it looks like one DirectoryWatcher was unable to finish before the next one was created. Had ColdFusion just stuck to the subfolder, I specified in this.javaSettings, it would not have been a problem.
Example:
This.javaSettings = {
loadPaths = ["\externals\lib\"]
, loadColdFusionClassPath = true
, reloadOnChange = true
, watchInterval = 1
};
In the above case, lib has just 5 files. However, "externals" is loaded with stuff. In the breakpoint, it was typically looking at stuff in "externals."
Do you have scheduled tasks running that use the CFFILE tag? They tend to be resource hogs. Spinning these into their own threads may help with the CPU spike.
another thought:
looking at the JVM,
•Min heap size: 512mb
•Max heap size: 1024mb
These establish the minimum and maximum memory available to the java virtual machine
-server -XX:MaxPermSize=512m
This is the amount of memory dedicated to the java permanent memory generation.
you've got half of your JVM allocated memory dedicated to the permanent generation, try bumping up the maximum heap size to 2048mb. and restarting the ColdFusion service. It could go higher based on whether or not you're running a 64Bit operating system or not.
I wasn't using ritz-nrepl before, and nrepl took around 10 sec which is long but still bearable since I don't restart it that often.
When I tried out ritz-repl, it took nearly 30s to boot, and consumes around 1.3G memory.
This makes me reluctant to use it.
I even threw in a SSD hoping it can increase the speed, because I heard someone mention that he "hardly notice the lein repl startup time" using ubuntu + ssd. But I can't tell the difference myself between ssd and hdd. I don't know if I did something wrong or if its just a myth.
There might be ways to reduce the startup time of an nrepl server that includes ritz but for the most part you will be stuck with at least the 10 seconds it takes to boot up the jvm on your machine. For me that is kind of an unacceptable delay when doing interactive development.
As an alternative you can use a smarter code reloading approach using the clojure.tools.namespace library. It basically keeps a dependency graph in memory and reloads only those namespaces that have been changed since you last refresh.
This will work out of the box for some but not all Clojure code. See the 'Preparing Your Application' section of the readme for more info on those edge cases to avoid.
Hope this helps!
For the last few years we've been randomly seeing this message in the output logs when running scheduled tasks in ColdFusion:
Recursion too deep; the stack overflowed.
The code inside the task that is being called can vary, but in this case it's VERY simple code that does nothing but reset a counter in the database and then send me an email to tell me it was successful. But I've seen it happen with all kinds of code, so I'm pretty sure it's not the code that's causing this problem.
It even has an empty application.cfm/cfc to block any other code being called.
The only other time we see this is when we are restarting CF and we are attempting to view a page before the service has fully started.
The error rarely happens, but now we have some rather critical scheduled tasks that cause issues if they don't run. (Hence I'm posting here for help)
Memory usage is fine. The task that ran just before it reported over 80% free memory. Monitoring memory through the night doesn't show any out-of-the-ordinary spikes. The machine has 4 gigs of memory and nothing else running on it but the OS and CF. We recently tried to reinstall CF to resolve the problem, but it did not help. It happens on several of our other servers as well.
This is an internal server, so usage at 3am should be nonexistent. There are no other scheduled tasks being run at that time.
We've been seeing this on our CF7, CF8, and CF9 boxes (fully patched).
The current box in question info:
CF version: 9,0,1,274733
Edition: Enterprise
OS: Windows 2003 Server
Java Version: 1.6.0_17
Min JVM Heap: 1024
Max JVM Heap: 1024
Min Perm Size: 64m
Max Perm Size: 384m
Server memory: 4gb
Quad core machine that rarely sees more than 5% CPU usage
JVM settings:
-server -Dsun.io.useCanonCaches=false -XX:PermSize=64m -XX:MaxPermSize=384m -XX:+UseParallelGC -XX:+AggressiveHeap -Dcoldfusion.rootDir={application.home}/../
-Dcoldfusion.libPath={application.home}/../lib
-Doracle.jdbc.V8Compatible=true
Here is the incredible complex code that failed to run last night, but has been running for years, and will most likely run tomorrow:
<cfquery datasource="common_app">
update import_counters
set current_count = 0
</cfquery>
<cfmail subject="Counters reset" to="my#email.com" from="my#email.com"></cfmail>
If I missed anything let me know. Thank you!
We had this issue for a while after our server was upgraded to ColdFusion 9. The fix seems to be in this technote from Adobe on jRun 4: http://kb2.adobe.com/cps/950/950218dc.html
You probably need to make some adjustments to permissions as noted in the technote.
Have you tried reducing the size of your heap from 1024 to say 800 something. You say there is over 80% of memory left available so if possible I would look at reducing the max.
Is it a 32 or 64 bits OS? When assigning the heap space you have to take into consideration all the overhead of the JVM (stack, libraries, etc.) so that you don't go over the OS limit for the process.
what you could try is to set Minimum JVM Heap Size to the same as your Maximum JVM Heap Size (MB) with in your CF administrator.
Also update the JVM to the latest (21) or at least 20.
In the past i've always upgraded the JVM whenever something wacky started happening as that usually solved the problem.