There's a server that might be experiencing PostgreSQL database connection leaks. That server has also maxed out its CPU at times (as indicated by %user being extremely high when running sar -u). Could database connection leaks be causing the abnormally high CPU usage?
This can happen if the connections are busy running queries that take forever and consume CPU.
Use operating system tools on the PostgreSQL server to see which processes consume CPU. On Linux that would be top.
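One rough smell test for leaked connections, sketched in Python (the `ps` sample below is made up for illustration): leaked connections tend to show up as postgres backends that have been alive far longer than any sane pooled connection, so you can flag long-lived backend PIDs and then check what those PIDs are doing in top or pg_stat_activity.

```python
import re

# sample `ps -o pid,etime,comm` output (made-up PIDs, for illustration)
SAMPLE = """\
  PID     ELAPSED COMM
 4012 10-03:22:11 postgres
 4188       05:11 postgres
 9077       00:02 postgres
"""

def long_lived_backends(ps_output, min_days=1):
    """Pick postgres backends whose elapsed time is at least min_days -
    a rough smell test for leaked (never-closed) connections."""
    hits = []
    for line in ps_output.splitlines()[1:]:
        pid, etime, comm = line.split()
        m = re.match(r"(\d+)-", etime)  # ps etime looks like [[DD-]HH:]MM:SS
        if comm == "postgres" and m and int(m.group(1)) >= min_days:
            hits.append(int(pid))
    return hits

print(long_lived_backends(SAMPLE))  # [4012]
```

A backend being old is not proof of a leak, but cross-referencing those PIDs against top's CPU column tells you whether the suspected leaked connections are also the CPU consumers.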
I have a newbie question here. I'm new to cloud platforms and Linux; I'm using Google Cloud now and wondering about choosing a machine configuration.
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The image below shows the last 6 hours of CPU usage for a Python script I'm running. It's obviously using less than 2% of the CPU most of the time, but there's a small spike. Should I care about the spike? Also, how high should my CPU usage get before I upgrade? If a script I'm running uses 50-60% of the CPU most of the time, I assume I'm safe, but what's the maximum before you should upgrade?
What if my machine is too slow? Will it make the app crash, or just slow it down?
It depends.
Some applications will just respond slower. Some will fail if they have timeout restrictions. Some applications will begin to thrash, meaning the app suddenly becomes very, very slow.
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use a 50% rule so that my service can handle burst traffic or denial-of-service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc.), it is time to upgrade that resource.
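The 50% rule above can be sketched as a trivial check (the resource names and readings below are hypothetical):

```python
def needs_upgrade(utilization, headroom_target=0.50):
    """Apply the '50% rule': flag a resource for an upgrade once its
    sustained utilization exceeds the target, leaving room for bursts."""
    return utilization > headroom_target

# hypothetical sustained readings, as fractions of capacity
readings = {"cpu": 0.02, "memory": 0.61, "disk_io": 0.35}
to_upgrade = [name for name, util in readings.items() if needs_upgrade(util)]
print(to_upgrade)  # ['memory']
```

The point is that the rule applies per resource: a machine can be nearly idle on CPU (like the 2% in the question) while another resource is the one that needs attention.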
Also, consider that there are other services you might want to add. Examples are load balancers, Cloud Storage, CDNs, firewalls such as Cloud Armor, etc. Those types of services tend to offload requirements from your service and make it more resilient, available, and performant. The biggest plus is that your service is usually faster for the end user. Some of those services are so cheap that I almost always deploy them.
You should choose a machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern, you should create a managed instance group that automatically scales based on CPU usage. Usually 80-85% is a good value for the maximum CPU target. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
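As a sketch of the target-tracking idea behind CPU-based autoscaling (this is my own illustration, not the actual GCE algorithm), the group is resized so that average CPU falls back under the target:

```python
import math

def desired_replicas(current, avg_cpu, target=0.80, max_replicas=10):
    """Scale-out rule in the spirit of CPU-based autoscaling: choose the
    smallest group size that brings average CPU back under the target."""
    needed = math.ceil(current * avg_cpu / target)
    return max(1, min(max_replicas, needed))

# 4 instances averaging 95% CPU against an 80% target -> grow to 5
print(desired_replicas(current=4, avg_cpu=0.95))  # 5
```

Conversely, with average CPU well below the target the same rule shrinks the group, which is why the 80-85% target keeps costs down while still absorbing load.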
You should also consider the availability your workload needs, to keep costs efficient. See the link below for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option
I've experimented with the Poco HTTP server and found it consumes some CPU even when completely idle. This is not high usage, but if we have a lot of instances running it may become a problem.
For network services using poll, it's normal to use a small amount of CPU time permanently. Nginx and Redis also have some CPU consumption when idle. To achieve zero CPU usage on idle, you will have to use a different approach to network communication.
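The distinction can be seen with Python's select.poll, as a minimal sketch: a loop that polls with a timeout wakes up periodically even with no traffic (hence the small idle CPU), whereas blocking indefinitely on the descriptor costs nothing while idle.

```python
import select
import socket

# an idle listening socket: no client ever connects in this sketch
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()

p = select.poll()
p.register(srv.fileno(), select.POLLIN)

# timeout-based polling: returns [] after ~100 ms of idling; a server
# looping on this wakes up ~10 times per second even with zero traffic
events = p.poll(100)
print(events)  # []

# p.poll(None) instead would block until traffic arrives,
# consuming no CPU while idle
srv.close()
```

Servers often keep a finite timeout anyway to service timers and housekeeping between wakeups, which is one reason "some CPU on idle" is normal for this design.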
Redis can give sub-millisecond response times. That's a great promise. I'm testing Heroku Redis and I get 1 ms up to about 8 ms for a ZINCRBY. I'm using microtime() in PHP to wrap the call. This Heroku Redis (I'm on the free plan) is a shared instance with resource contention, so I expect response times for identical queries to vary, and they certainly do.
I'm curious about the cause of the performance difference vs. Redis installed on my MacBook Pro via Homebrew, where there's obviously no network latency. Does this mean that any cloud Redis (i.e., connecting over the network, say within AWS) will always be quite a bit slower than running Redis on the same physical machine as the application, thus eliminating network latency?
There is also resource contention in these cloud offerings, unless you choose a private server, which costs a lot more.
Some numbers: my local MacBook Pro consistently gives 0.2 ms for the identical ZINCRBY that takes between 1 ms and 8 ms on Heroku Redis.
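The microtime() wrapping described above looks like this in Python (a sketch; time_call is a helper name I made up, and sum stands in for the Redis call):

```python
import time

def time_call(fn, *args, **kwargs):
    """Run one call and return (result, elapsed_ms) - the same idea as
    wrapping a single Redis command with microtime() in PHP."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# timing a cheap stand-in for the ZINCRBY round trip
result, ms = time_call(sum, range(1000))
print(result, round(ms, 3))
```

Note that a single wrapped call measures one sample of a noisy distribution; repeating the call and looking at the spread is what reveals the 1-8 ms variance described in the question.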
Is network latency the cause of this?
No, probably not.
The typical latency of a 1 Gbit/s network is about 200 µs. That's 0.2 ms.
What's more, in AWS you're probably on 10 Gbit/s at least.
As this page in the Redis manual explains, the main cause of the latency variation between these two environments is almost certainly higher intrinsic latency arising from running in a Linux container. (There's a Redis command to test this on any particular system: redis-cli --intrinsic-latency 100; see the manual page above.)
In other words, network latency is not the dominant cause of the variation seen here.
Here is a checklist (from the Redis manual page linked above):
- If you can afford it, prefer a physical machine over a VM to host the server.
- Do not systematically connect/disconnect to the server (especially true for web-based applications). Keep your connections as long-lived as possible.
- If your client is on the same host as the server, use Unix domain sockets.
- Prefer aggregated commands (MSET/MGET), or commands with variadic parameters (if possible), over pipelining.
- Prefer pipelining (if possible) over sequences of round trips.
- Redis supports server-side Lua scripting to cover cases that are not suitable for raw pipelining (for instance, when the result of a command is an input for the following commands).
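The round-trip arithmetic behind the aggregated-commands advice can be sketched with a toy client (ToyRedis below is a stand-in I wrote for illustration, not redis-py): N separate SETs cost N round trips, while one MSET costs one.

```python
class ToyRedis:
    """Stand-in for a Redis client; each method call counts as one
    network round trip so access patterns can be compared."""
    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def set(self, key, value):
        self.round_trips += 1          # one round trip per command
        self.store[key] = value

    def mset(self, mapping):
        self.round_trips += 1          # one round trip for the whole batch
        self.store.update(mapping)

r = ToyRedis()
for i in range(100):
    r.set(f"k{i}", i)                  # 100 commands, 100 round trips
naive = r.round_trips

r2 = ToyRedis()
r2.mset({f"k{i}": i for i in range(100)})  # same data, 1 round trip
print(naive, r2.round_trips)           # 100 1
```

With a ~1 ms round trip, that difference is roughly 100 ms versus 1 ms of wall-clock time for the same writes, which is why aggregation matters most when the network, not the server, dominates latency.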
We have been using AWS ElastiCache for our applications. We had initially set a CPU alarm threshold of 22% (on a 4-core node, that is effectively 90% of one core), based on the recommended thresholds. But we often see the CPU utilization going well over 25%, to values like 28% or 34%.
What I am trying to understand is how this is theoretically possible, considering Redis is single-threaded. The only way I can think this can happen is if maintenance operations on other cores bump the CPU usage above 25%. Even if the cluster is highly loaded, it should cap CPU usage at 25% and probably start timing out clients. Can someone help me understand under what scenarios the CPU usage of a single-threaded Redis instance can exceed 100% of a single core?
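The arithmetic behind that alarm threshold, as a sketch (the 90% per-core target is the figure from the question, not an AWS constant):

```python
def node_cpu_alarm(cores, per_core_target=0.90):
    """Node-level CPUUtilization alarm for single-threaded Redis on an
    N-core node: the per-core target spread across all cores."""
    return per_core_target / cores * 100

print(round(node_cpu_alarm(4), 2))  # 22.5 -> the ~22% threshold on 4 cores
```

This conversion exists because CloudWatch reports CPU for the whole node, while the Redis event loop can saturate only one of its cores.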
The Redis event loop is single-threaded; the Redis process itself is not. There are a couple of extra threads to offload some I/O-bound operations. Normally, these threads should not consume much CPU.
However, Redis also forks child processes to take care of heavy-duty operations like AOF rewrites or RDB saves. Each forked process generally consumes 100% of a CPU core (unless the operation is slowed down by I/O), on top of the Redis event loop's consumption.
If you find the CPU consumption regularly high, it may be due to a wrong AOF or RDB configuration (i.e., the Redis instance rewrites the AOF or generates a dump too frequently).
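For reference, the relevant knobs live in redis.conf; the values below are illustrative defaults, not tuning recommendations:

```conf
# RDB: snapshot only after 3600 s have passed and >= 1 key has changed
save 3600 1

# AOF: rewrite only once the file has doubled since the last rewrite
# and is at least 64 MB, rather than rewriting continually
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```

Overly aggressive values here (very short save intervals, tiny rewrite thresholds) are what cause the frequent forks and the extra core's worth of CPU described above.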
I am facing a serious ColdFusion server crashing issue. I have many live sites on that server, so this is serious and urgent.
Following are the system specs:
Windows Server 2003 R2, Enterprise X64 Edition, Service Pack 2
ColdFusion (8,0,1,195765) Enterprise Edition
Following are the hardware specs:
Intel(R) Xeon(R) CPU E7320 @ 2.13 GHz
31.9 GB of RAM
It is crashing on an hourly basis. Can somebody help me find the exact issue? I tried to find it through the ColdFusion log files but did not find anything there. Every time it crashes, I have to restart the ColdFusion services to get it back.
Edit 1
When I looked at the runtime log file "ColdFusion-out165.log", I found the following errors:
error ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
04/18 16:19:44 error ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Here are my current JVM settings:
As you can see, my JVM settings are:
Minimum JVM Heap Size (MB): 512
Maximum JVM Heap Size (MB): 1024
JVM Arguments
-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib
Note: when I tried to increase the maximum JVM heap size to 1536 and restart the ColdFusion services, they would not start and gave the following error:
"Windows could not start the ColdFusion MX Application Server on Local Computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2."
Shouldn't I be able to set my maximum heap size to 1.8 GB, since I am using a 64-bit operating system?
How much memory you can give to your JVM depends on the bitness of your JVM, not your OS. Are you running a 64-bit CF install? It was an uncommon thing to do back in the CF8 days, so it's worth asking. A 32-bit JVM on Windows typically cannot allocate a contiguous heap much beyond roughly 1.2-1.8 GB, which would explain why the service refuses to start at 1536 MB.
Basically the error is stating that you're using more RAM than you have available (which you know). I'd look at how much stuff you're putting into the session and application scopes, and cull back anything that's not necessary.
Objects in session scope are particularly bad: they have a far bigger footprint than one might think, and cause more trouble than they're worth.
I'd also look at how many inactive but not timed-out sessions you have, with a view to being far more aggressive with your session time-outs.
Have a look at your queries: get rid of any SELECT * and cut them back to just the columns you need. Push data processing into the DB rather than doing it in CF.
Farm scheduled tasks off onto a different CF instance.
Are you doing anything with large files? Either reading and processing them, or serving them via <cfcontent>? That can chew memory very quickly.
Are all your function-local variables in CFCs properly VARed? Especially ones in CFCs which end up in shared scopes.
Do you accidentally have debugging switched on?
Are you making heavy use of custom tags or files called in with <cfmodule>? I have heard apocryphal stories of custom tags causing memory leaks.
Get hold of Mike Brunt or Charlie Arehart to have a look at your server config / app (they will obviously charge consultancy fees).
I will update this as I think of more things to look out for.
Turn on ColdFusion monitor in the administrator. Use it to observe behavior. Find long running processes and errors.
Also, make sure that memory monitoring is turned off in the ColdFusion Server Monitor. That will bring down a production server easily.
@Adil,
I have the same kind of issue, but it wasn't crashing; instead, CPU usage was going up to 100%. Not sure it's relevant to your issue, but it's at least worth a look.
See the question at the URL below:
Strange JRUN issue. JRUN eating up 50% of memory for every two hours
My blog entry about this:
http://www.thecfguy.com/post.cfm/strange-coldfusion-issue-jrun-eating-up-to-50-of-cpu
For me it was a high-traffic site storing client variables in the registry, which was making things go wrong.
Hope this helps.