We are running WSO2 ESB 5.0.0, and intermittently the server's CPU usage starts to climb gradually, the APIs slow down, and eventually the server stops responding. To recover, we restart the ESB servers, which then return to their normal working state. Could anyone let me know what the issue might be?
Is there a limit such that the ESB can handle only x API calls per second, or only x open connections per second? Any input and suggestions would be helpful!
Configuration -
We have 2 ESBs and 2 MBs running in cluster mode. The issue is seen on both ESBs.
ESB - 16 GB RAM, 8 GB cache
We can see the number of ESTABLISHED connections varying from 100 to 500 depending on the number of incoming requests.
Thanks
There are limitations to the number of requests that can be handled by the ESB server. This depends on a number of factors such as backend latency, mediation implementations, request payloads, etc.
For example, consider a scenario where you use a mediator such as the script mediator to process a large payload (which is not recommended). The transformation may take a considerable amount of time, leaving threads blocked at the script mediator. By default, the passthrough message processor thread pool is defined as 500, so this can lead to a situation where no threads are available to process new requests, resulting in delayed responses and, in the worst case, an out-of-memory scenario.
Therefore, with the available information we cannot determine the exact cause of the issue. From the description above, however, we can suspect a problem with the available threads (given the slow responses). You can capture thread dumps and thread usage in your environment and analyze them to find the possible cause. Please refer to the documentation [1], [2] to see how to capture a thread dump and thread usage, and to [3] for guidance on analyzing them.
Also, capture and analyze a heap dump in your environment.
[1]-https://docs.wso2.com/display/CLUSTER420/Troubleshooting+in+Production+Environments
[2]-https://gist.github.com/bsenduran/02e8bf024fcaaa7707a6bb2321e097a8
[3]-https://medium.com/@prabushi/analyse-thread-dump-with-process-instructions-c5490b97e2d1
I have been diving into a Stackdriver Trace integration on Google Cloud Run. I can get it to work with the agent, but I am bothered by a few questions.
Given that
The Stackdriver agent aggregates traces in a small buffer and sends them periodically.
CPU access is restricted when a Cloud Run service is not handling a request.
There is no shutdown hook for Cloud Run services; you can't clear the buffer before shutdown: the container just gets a SIGKILL. This is a signal you can't catch from your application.
Running a background process that sends information outside of the request-response cycle seems to violate the Knative container runtime contract.
The collection of logging data is documented and does not require me to run an agent, but there is no such solution for telemetry.
I found one report of someone experiencing lost traces on Cloud Run using the agent-based approach.
How Google does it
I went into the source code of the Cloud Endpoints ESP (the Cloud Run integration is in beta) to see whether they solve it in a different way, but the same pattern is used there: there is a buffer of traces (1 s) and it is flushed periodically.
Question
While my tracing integration seems to work in my test setup, I am worried about incomplete and missing traces when I run this in a production environment.
Is this a hypothetical problem or a real issue?
It looks like the right way to approach this is to write telemetry to logs, instead of using an agent process. Is that supported with Stackdriver Trace?
Is this a hypothetical problem or a real issue?
If you consider a Cloud Run service receiving a single request, then it is definitely a problem, as the library will not have time to flush the data before the CPU of the container instance gets throttled.
However, in real life use cases:
A Cloud Run service often receives requests continuously or frequently, which means that its container instances will either have CPU continuously or have CPU available from time to time.
It is OK to drop traces: if some traces are not collected because the instance is shut down, it is likely that you have already collected a diverse enough set of samples before this happens. Also, you might only be interested in the aggregated reports, in which case collecting every individual trace does not matter.
Note that trace libraries usually sample the requests they trace; they rarely trace 100% of requests.
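To make that sampling point concrete, here is a minimal sketch of configuring a probability sampler with the OpenCensus Go library; the 10% fraction is an arbitrary value chosen for this example, not something Cloud Run or Stackdriver prescribes.

```go
package main

import "go.opencensus.io/trace"

func main() {
	// Sample roughly 10% of requests instead of tracing every one.
	// 0.1 is an arbitrary fraction chosen for this sketch.
	trace.ApplyConfig(trace.Config{
		DefaultSampler: trace.ProbabilitySampler(0.1),
	})
	// ... register an exporter and serve requests as usual ...
}
```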
It looks like the right way to approach this is to write telemetry to logs, instead of using an agent process. Is that supported with Stackdriver Trace?
No, Stackdriver Trace takes its data from the spans sent to its API. Note that to send data to Stackdriver Trace, you can use libraries like OpenCensus and OpenTelemetry; the proprietary Stackdriver Trace libraries are no longer the recommended way.
You're right. This is a fair concern since most tracing libraries tend to sample/upload trace spans in the background.
Since (1) your CPU is scaled nearly to zero when the container isn't handling any requests and (2) the container instance can be killed at any time due to inactivity, you cannot reliably upload the trace spans collected in your app. As you said, it may sometimes work since we don't fully stop the CPU, but it won't always work.
It appears that some of the Stackdriver (and/or OpenTelemetry, f.k.a. OpenCensus) libraries let you control the lifecycle of pushing trace spans.
For example, this Go package for the OpenCensus Stackdriver exporter has a Flush() method that you can call before completing your request, rather than relying on the runtime to periodically upload the trace spans: https://godoc.org/contrib.go.opencensus.io/exporter/stackdriver#Exporter.Flush
I assume other tracing libraries in other languages also expose similar Flush() methods; if not, please let me know in the comments, as that would be a valid feature request for those libraries.
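For illustration, here is a minimal Go sketch of that pattern using the exporter package linked above; the project ID, span name, and handler body are placeholders, and note that flushing on every request adds a little latency to each response.

```go
package main

import (
	"log"
	"net/http"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/trace"
)

func main() {
	// "my-project" is a placeholder project ID for this sketch.
	exporter, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "my-project"})
	if err != nil {
		log.Fatalf("failed to create Stackdriver exporter: %v", err)
	}
	trace.RegisterExporter(exporter)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		_, span := trace.StartSpan(r.Context(), "handle-request")
		// ... do the actual work of the request here ...
		span.End()

		// Flush buffered spans before the response completes, so that the
		// instance losing CPU afterwards cannot drop them.
		exporter.Flush()

		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```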
Cloud Run now supports sending SIGTERM. If your application handles SIGTERM, it'll get 10 seconds of grace time before shutdown; a short sketch of handling this follows below.
You can use the 10 seconds to:
Flush buffers that have unsent data
Close connections to other systems
Docs: Container runtime contract
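Here is a minimal Go sketch of using that grace period; the flush function is a stand-in for whatever buffers or connections your application actually needs to clean up.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Stand-in for real cleanup work, e.g. flushing a trace exporter or
	// closing connections to other systems.
	flush := func() { log.Println("flushing unsent data and closing connections") }

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	go func() {
		<-sigs
		// Cloud Run sends SIGKILL roughly 10 seconds after SIGTERM,
		// so keep this work short.
		flush()
		os.Exit(0)
	}()

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```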
I'm trying to implement some throttling on our REST API. A typical approach is to block requests (with a 403 or 429 response) after a certain threshold. However, I've seen one API that adds a delay to the response instead.
As you make calls to the API, we will be looking at your average calls per second (c/s) over the previous five-minute period. Here's what will happen:
over 3c/s and we add a 2 second delay
over 5c/s and we add a 4 second delay
over 7c/s and we add a 5 second delay
From the client's perspective, I see this being better than getting back an error. The worst that can happen is that you'll slow down.
I am wondering how this can be achieved without negatively impacting the app server, i.e. to add those delays, the server needs to keep the request open, which keeps more and more request processors busy and leaves less capacity for new incoming requests.
What's the best way to accomplish this? (i.e. is this something that can be done on the web server / load balancer so that the application server is not negatively affected? Is there some kind of a throttling layer that can be added for this purpose?)
We're using Django/Tastypie, but the question is more on the architecture/conceptual level.
If you are using a synchronous application server, which is the most common setup for Django applications (for example gunicorn with the default --worker-class sync), then adding such a delay in the application would indeed have a very bad impact on performance: a worker handling a delayed request is blocked for the whole delay period.
But you can use an asynchronous application server (for example gunicorn with --worker-class gevent), and then the overhead should be negligible. A worker handling a delayed request can serve other requests while the delay is in progress.
Doing this in the reverse proxy server may be a better option, because it lets you adjust the policy easily and flexibly. There is an external nginx module for exactly this kind of thing; a rough sketch of the general idea follows below.
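To make the reverse-proxy idea concrete (this is not the nginx module itself, just a rough sketch in Go), the proxy below estimates the average call rate and adds the delays quoted in the question; the upstream address, the 5-minute window, and the single global counter (rather than per-client tracking) are all simplifying assumptions.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// delayFor maps an average call rate (calls/sec) to the added delay,
// mirroring the tiers quoted in the question.
func delayFor(rate float64) time.Duration {
	switch {
	case rate > 7:
		return 5 * time.Second
	case rate > 5:
		return 4 * time.Second
	case rate > 3:
		return 2 * time.Second
	default:
		return 0
	}
}

func main() {
	// Assumed upstream app server address for this sketch.
	target, _ := url.Parse("http://127.0.0.1:8000")
	proxy := httputil.NewSingleHostReverseProxy(target)

	var (
		mu          sync.Mutex
		count       int
		windowStart = time.Now()
	)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Crude global average over a rolling 5-minute window; a real
		// throttle would track rates per client (e.g. per API key).
		mu.Lock()
		if time.Since(windowStart) > 5*time.Minute {
			count, windowStart = 0, time.Now()
		}
		count++
		elapsed := time.Since(windowStart)
		if elapsed < time.Second {
			elapsed = time.Second
		}
		rate := float64(count) / elapsed.Seconds()
		mu.Unlock()

		// Each request runs in its own goroutine, so sleeping here does
		// not block other requests the way a synchronous worker would.
		time.Sleep(delayFor(rate))
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because each request is handled in its own goroutine, the sleep ties up almost no resources, which is the same reason the gevent worker class works well in the application-server approach.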
I am writing a scheduled task which I would like to run frequently.
The problem is that I do not want this task to be run if the server is experiencing a high traffic load.
Is there any way, other than getting the free/total/max memory from Java, to try and figure out whether this task should continue?
GetMetricData() is going to give you a very good indication of how busy your server is, i.e. how many requests are running and how many are queued as well as other info.
It's the same info that you get from running cfstat from the command line (you'll find that under {cfroot}\bin\cfstat.exe).
However, knowing how busy you are at the very moment might not be very useful to you if you just call that function once. It might be better for you to log performance data to file or to a database table using Windows perfmon. You can then get the average number of running/queued requests over the past 5 minutes (or whatever) and make your decision on whether to run your task.
There's an easy way to retrieve the memory usage information.
http://misterdai.wordpress.com/2009/11/25/retrieving-coldfusion-memory-usage/
For CPU load I think you can get it from getMetricData(), but there are other methods too. Since this is my first Stack Overflow post I'm only allowed one link :P It's on my blog, so just do a search for CPU when you look at the link above.
You might find it useful to dig into getMetricData() for the performance monitoring stats. It's a good way of telling how busy your server is by the number of running and queued requests.
Hope this helps,
Dave (aka Mister Dai)
Use the ColdFusion AdminApi. Call http://servername/CFIDE/adminapi/servermonitor.cfc in your browser to get the cfcdocs of the component. It gives you many methods to check the health of your CF server instance.
I am debugging an ASMX web service that receives "bursts" of requests, i.e. it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important, however, is that each request is dealt with sequentially and no parallel processing takes place. I do not want any concurrent request processing due to the external components called by the web service. Is there any way I can force the web service to handle requests sequentially?
I have seen the maxconnection attribute in the machine.config, but this seems to only work for outbound connections, whereas I wish to throttle the incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are using IIS6 on Win2003.
What I've done in the past is to simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
In IIS7 you can set a limit on the number of connections allowed to a web site. Can you use IIS7?
I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Also, is the scalability built in right out of the box? Do you just add additional servers and nodes and it automatically scales; is that how it works? Can it handle 500,000+ concurrent connections with supporting servers?
I am trying to create an enterprise-level web services framework that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is that you just add more servers and are able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine go, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the JVM's thread limit effectively caps the number of concurrent requests the server can handle.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application, it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
Ain't quite right. Lift is a highly stateful framework. For example, if a user requests a form, then he can only post the request back to the same machine the form came from, because the form processing action is saved in the server state.
And this is actually something that hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to define sticky sessions on the load balancer, which will direct a user to the same machine for the duration of a session.
Jetty may be the point of entry, but an actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like @dre's reply, as he correctly points out that Lift's statefulness is a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (not the content) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as @dre said, sticky session configuration on the load balancer in front and adding more instances. But since request handling in Lift is done with a thread + actor combination, you can expect one instance to handle more requests than normal frameworks. This gives it an edge over having sticky sessions in other frameworks, i.e. each individual instance's capacity to process more requests may help you to scale.
You also have Akka integration for Lift, which would be another advantage here.