What is the optimal value for Phusion Passenger PassengerMaxRequestQueueSize - concurrency

I know this depends on the box hardware, but for example if 100 processes are set, the default queue is also 100. Does it make sense to increase PassengerMaxRequestQueueSize to 200 or 300? This probably depends on free memory. Thoughts?
The best answer would explain the setting, ideally with one or two examples, assuming the server processes a request in 2-3 seconds.
Thanks in advance!

Why you should limit queuing
Any requests that aren't immediately handled by an application process are queued. Queuing is usually bad: it often means that your server cannot handle the requests quickly enough.
A larger queue means that requests are less likely to be dropped. But this comes with a drawback: during busy times, the larger the queue, the longer your visitors have to wait before they see a response. This causes them to click reload, making the queue even longer (their previous request will stay in the queue; the OS does not know that they've disconnected until it tries to send data back to the visitor), or causes them to leave in frustration.
So having a limit on the queue is a good thing. It limits the impact of the above situation.
You should ensure that requests are queued as little as possible. That could mean:
Making your app faster (if your workload is CPU bound).
Upgrading to faster hardware (if your workload is CPU bound).
Increasing your app's concurrency settings (if your workload is I/O bound), e.g. by increasing the number of processes or threads.
If you cannot prevent requests from being queued, then the next best thing to do is to keep the queue short, and to display a friendly error message upon reaching the queue limit. Something like, "We're sorry, a lot of people are visiting us right now. Please try again later." The documentation for PassengerMaxRequestQueueSize tells you how to do that.
Optimal value for the queue size
It's hard to say what the optimal queue size should be. A good rule of thumb is: set the request queue size to the maximum number of requests you can handle in one second. Depending on your situation you may have to tweak things a little bit.
This rule of thumb comes from the notion of expected burst traffic. How many simultaneous requests do you expect on your server?
Suppose that your queue size is 100, and that for whatever reason you receive 150 requests at the same time. Suppose that your server is fast enough to handle 150 requests in half a second, so you know it's not a performance problem. But if you have a request queue size of 100, then 50 of those requests will be dropped with a "Request queue full" error.
In such a situation, you should set the queue size to the maximum number of concurrent requests that you think you can safely handle without performance issues.
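As a worked example under the question's assumptions (all numbers here are illustrative): with 100 processes and an average of 2.5 seconds per request, capacity and queue wait come out roughly as:

100 processes / 2.5 s per request   ≈ 40 requests handled per second
300 queued requests / 40 per second ≈ 7.5 s wait for the last queued request

So raising the queue from 100 to 200-300 only helps if your visitors will tolerate several extra seconds of waiting during a burst; otherwise the larger queue just delays the error they would have received anyway. In Apache the two knobs would look like this (hypothetical values):

PassengerMaxPoolSize 100
PassengerMaxRequestQueueSize 300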

This SO question and the Passenger docs here talk more about working with this. If you want more information about why this is happening on your server, you can try running passenger-status (you usually need to run this as root).
If you would like to set a custom error page when visitors see this issue you can use the following (in Apache) to set a custom error page:
PassengerErrorOverride on
ErrorDocument 503 /error503.html
As mentioned by Hongli, you can also raise PassengerMaxRequestQueueSize to queue more requests. You can even set it to 0, which disables the limit (for most situations this is not an optimal solution, however).
For reference, the default error message a visitor to your site will see when bumping against this limit is:
This website is under heavy load
We're sorry, too many people are accessing this website at the same time. We're working on this problem. Please try again later.

Related

How to do a very large number of HTTP requests in the shortest time

So we have a huge database with around 300,000 URLs. These URLs have to be pinged to get data from them (the URLs are radio stations that are playing songs; the data is the metadata).
Some of them are sometimes inactive and sometimes active.
At any given time, around 80,000 are active. Some respond slowly, some respond quickly. I have a server, and I am thinking of doing this in C++.
My goal is to ping and parse (or crawl) them within 1 minute and keep repeating the process, because the information (the song playing on them) can change over time, mostly every 2-7 minutes. But I am not sure if it is possible.
What should be my approach to do it?
I have thought of creating two programs. One would test whether each URL is active and run twice a day, also recording how long it generally takes to respond: does it usually respond slowly, and is it responding slower than usual right now?
The other would do the actual crawling, where the fastest URLs are crawled first, with some dedicated threads for URLs that respond faster.
I would love better ideas or solutions for this. Can anyone tell me how to do the maths to find the number of dedicated threads I should allot to each group to get the results in the least time?
You don't need more CPU performance (that's not your bottleneck at the moment), but you do need to avoid stalling on the network layer: if the request timeout is 60 seconds and you have 16 threads and hit 16 very slow servers (which will eventually time out), you are stalled for 60 seconds, processing nothing else.
So I would start with, let's say, 500 threads (and a 15-30 s timeout, if you know the very slow radios can fit even that), keep some statistics about their turnaround, and keep adding more worker threads dynamically for every request which didn't get a response within 2-3 seconds. 80,000/500 = 160, so each "normally quick" worker thread then has to ping around 160 URLs; if each takes 2 seconds, that's still 320 s ≈ 5 min! So 500 sounds like a minimum.
That said, having 500+ threads will somewhat burden CPU and memory (I'm not sure by how much; with a decent thread/memory implementation, 500 doesn't sound like much for a modern x86 CPU with GBs of RAM, and even 5000 still sounds reasonable), but I would worry a lot more about the network layer and about possible firewalls along the way; you need a server-grade network for that volume of requests (if I tried something like that from home, my own router would filter me out with its default settings, detecting it as some kind of DoS attack).
So get some statistics on how long a request takes on average, then take your target time (2-7 min) and divide the URL count accordingly: with an average ping of 5 s and a round time of 3 min, you need at least 300,000/(3*60/5) ≈ 8,333 threads. Then you will have to profile your app to verify that with 8,000+ threads it will not choke on something else and really handles the task as expected.
(The other option is to fire asynchronous HTTP requests from a single thread, but that sort of creates its own threads for each task anyway, so I would rather manage the threads myself and use synchronous HTTP calls.)
Thinking about the dynamic-grow mechanics: you can keep counters of how many new requests were added in the last second and how many finished (either responded or failed). After a few seconds of running, these form a "throughput" statistic, and if throughput is under the desired threshold, you can add more threads.
About active/inactive: keep the response time/last-seen/last-check together with the URL, and add some logic to check a URL only when it makes sense (e.g. not within the next 60 s if it just responded, or recheck an inactive one only 6 h after the last test). You also need to avoid checking the same URL in two different threads at the same time, so some central manager code should feed the threads with targets, perhaps through a thread-safe FIFO queue. You can actually use the queue's size to estimate how well the worker threads are keeping up, and add more threads when you see it is not emptying fast enough; that avoids adding the statistics code to the threads themselves. A minimal sketch of this follows below.
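To make the manager/queue idea concrete, here is a minimal C++ sketch (the 500-thread count and the ping_url() call are placeholders taken from the discussion above, not a tuned implementation):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Thread-safe FIFO feeding the worker threads. Its size tells the manager
// whether the workers are keeping up (a growing queue = add more threads).
class UrlQueue {
    std::queue<std::string> q_;
    std::mutex mu_;
    std::condition_variable cv_;
public:
    void push(std::string url) {
        { std::lock_guard<std::mutex> lock(mu_); q_.push(std::move(url)); }
        cv_.notify_one();
    }
    std::string pop() {  // blocks until a URL is available
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string url = std::move(q_.front());
        q_.pop();
        return url;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mu_);
        return q_.size();
    }
};

void worker(UrlQueue& queue) {
    for (;;) {
        std::string url = queue.pop();
        // ping_url(url);  // placeholder for your synchronous HTTP call,
        //                 // with the 15-30 s timeout suggested above
    }
}

int main() {
    UrlQueue queue;
    std::vector<std::thread> workers;
    for (int i = 0; i < 500; ++i)  // start with ~500 threads, grow as needed
        workers.emplace_back(worker, std::ref(queue));
    // A manager loop would push due URLs here and watch queue.size()
    // to decide when to spawn additional workers.
    for (auto& w : workers) w.join();  // runs forever in this sketch
}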

Limit number of CFHTTP requests sent every x seconds

I'm making an application that will continually send CFHTTP requests to a server to search for items, as well as sending further CFHTTP requests to perform actions on any returned results.
The issue I'm having is that the server has a maximum threshold of 3 requests per second, and even when I try to implement a sleep call every 4 milliseconds it doesn't work properly: although it delays, the CFHTTP requests can queue up if they take a couple of seconds to return, so it then tries to send multiple in the same second, exceeding the threshold.
Is there a way I can ensure that there are never more than 3 active CFHTTP requests?
I think you are going to need to implement some sort of logging widget as part of your process. The log would keep track of request frequency; if the threshold has already been reached, you would just skip that iteration of your CFHTTP call. I don't mean a file or database log, but something implemented in the application or even request scope, depending on your implementation. There is no way to throttle CFHTTP itself: it is basically a very simplistic wrapper around a Java HTTP library, which then goes straight to the underlying operating system.
If you're limiting concurrent requests, then first part of this answer applies. If you're looking to limit the number of requests per second, then the bit at the end applies. The question kind of asks both things.
If I understand correctly, you've got a number of threads (either as requests CF is processing or threads CF has created itself) which all need to make calls to the same rate-limited domain. What you need is a central way of co-ordinating access, combined with a nice way of controlling program execution.
I don't know of any native limits that CF might support (I'd be happy to be proven wrong), so you're likely to have to implement your own. The cheap'n'nasty way to do this is to increment and decrement an allowed_connections variable in a long-lived scope such as application. The downsides are that you have to implement the checking all over the place, and that if there are no spare connections you'll have to wait somehow.
Really what you have is a resource pool (of allowed HTTP connections) and I'm guessing that you want your code to wait until a connection is free. CF does this kind of thing already for database connections.
In your case, there isn't really a need to keep anything in a pool (as HTTP connections aren't long-lived), other than a permit to use the resource. Java provides a class which ought to provide what you're after, the Semaphore.
I've not tried it but in theory, something like the snippet below ought to work:
//Application.cfc: onApplicationStart()
//A Semaphore with 3 permits = at most 3 concurrent HTTP calls.
application.http_pool = CreateObject("java", "java.util.concurrent.Semaphore").init(3);
//Meanwhile, elsewhere in your code
application.http_pool.acquire();  //blocks until a permit is free
try {
    //Make my HTTP call
} finally {
    application.http_pool.release();  //always return the permit, even on error
}
You could even wrap the HTTP object to provide this functionality without having to use the acquire/release each time, which would make it more reliable.
EDIT
If you're looking to limit rates, look at Guava's RateLimiter, which has the same general interface as the Semaphore above but implements rate limiting for you. You'd need to add Guava to ColdFusion's classpath, use JavaLoader, or use CF10, which has class-loading facilities built in.
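For a feel of what a rate limiter like that does under the hood, here is a minimal token-bucket sketch (in C++ for concreteness; the class name and the 3-permits-per-second figure are just illustrations of the question's limit, not Guava's actual implementation):

#include <algorithm>
#include <chrono>
#include <mutex>
#include <thread>

// Minimal token bucket: allows `rate` permits per second, blocking callers
// until a permit is available. Tokens accrue continuously over time.
class RateLimiter {
    const double rate_;   // permits per second
    double tokens_;       // permits currently available
    std::chrono::steady_clock::time_point last_;
    std::mutex mu_;
public:
    explicit RateLimiter(double rate)
        : rate_(rate), tokens_(rate), last_(std::chrono::steady_clock::now()) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            auto now = std::chrono::steady_clock::now();
            std::chrono::duration<double> elapsed = now - last_;
            last_ = now;
            tokens_ = std::min(rate_, tokens_ + elapsed.count() * rate_);
            if (tokens_ >= 1.0) { tokens_ -= 1.0; return; }
            lock.unlock();  // sleep roughly until the next token accrues
            std::this_thread::sleep_for(
                std::chrono::duration<double>((1.0 - tokens_) / rate_));
            lock.lock();
        }
    }
};

// Usage: RateLimiter limiter(3.0);  // the 3 requests/second limit
//        limiter.acquire();         // then send the HTTP request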

IIS App Pool Monitoring Infinite Loops (or inappropriate load)

I'm just wondering if there is any way I can handle the case where our web service might get stuck in an infinite loop. I know the first answer is not to have an infinite loop, and we have tested the system and no loops should occur. But just as a fallback, is there a way of putting something on the IIS app pool to say: if the CPU has been running at, say, 99% for more than 1 minute, then recycle the app pool?
Thanks in advance
There is no IIS-built-in way of doing something like that (the recycle options let you recycle at a set time each day, after a set number of minutes, on hitting virtual or private memory limits, or on hitting a particular number of requests - nothing CPU-based).
You could build your own monitor that would watch for certain events (like CPU going above 99% for a minute) and causes a recycle to happen (there are various programmatic ways to do this).
In IIS 7.0+ this can be done very easily (although instead of recycling the Application Pool, it will terminate the process and then restart it when resetInterval has been reached). See:
http://www.iis.net/configreference/system.applicationhost/applicationpools/add/cpu
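For example, a hypothetical pool configured in applicationHost.config to kill (and later restart) the worker process once it averages 90% CPU; the limit attribute is expressed in 1/1000ths of a percent, so 90000 = 90%:

<add name="MyAppPool">
  <cpu limit="90000" action="KillW3wp" resetInterval="00:05:00" />
</add>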

How to set up a ZeroMQ architecture to deal with workers of different speeds

[As a small bit of context: I am new to networking and ZeroMQ, but I did spend quite a bit of time on the guide and examples.]
I have the following challenge (done in C++, but irrelevant to the question). I have a single source that generates tasks. I have multiple engines that need to process those tasks, and send back the result.
First attempt:
I created a client with a ZMQ_PUSH socket. The engines have a ZMQ_PULL socket. To get the answers back to the client, I created the reverse: a ZMQ_PUSH on the workers and a ZMQ_PULL on the client. It worked out of the box - only for me to find out that after some time the client ran out of memory, since I was pushing far more requests than the workers could process. I need some backpressure.
Second attempt:
I added a counter on the client that took care of only pushing when no more than say 1000 tasks were 'in progress'. The out of memory issue was solved, since I was never having more than 1000 'in progress' tasks. But ... some workers were slower than others. Since PUSH/PULL uses fair queueing, the amount of work for that slow worker kept increasing and increasing...until the slowest worker had all 1000 requests queued and the others were starved. I was not using my workers effectively.
Now, what architecture could I use that solves the issue of 'workers with different speed'? Is the 'count the number of in progress tasks' approach a good way of balancing the number of pushed requests? Or is there a way I can PUSH tasks to the workers, and the pushing blocks on a predefined point? Can I do that with HWM?
I am sure this problem is of such a generic nature that I should be able to easily deal with this. Can anyone point me in the right direction?
Thanks!
We used the Paranoid Pirate Protocol (http://rfc.zeromq.org/spec:6), but in the case of many very small jobs, where the overhead of communication might be high, a credit-based flow control pattern might be more efficient: http://unprotocols.org/blog:15
In both cases it is necessary for the requester to assign jobs directly to individual workers. This is abstracted away, of course, and depending on the use case it could be made available as a sync call which returns when all tasks have been processed. A rough sketch of the credit-based variant follows below.
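Not a full protocol implementation, just a rough C++ sketch of the credit-based idea against the plain libzmq C API (the framing, the single hard-coded worker, and the credit of 10 are illustrative simplifications; a real program would track a credit count per worker identity):

// Requester side: a ROUTER socket plus an explicit per-worker credit count.
// Workers connect with DEALER sockets and send a "READY" message first;
// results come back the same way. Frames arrive as [identity][body].
#include <zmq.h>

enum { CREDIT = 10 };  // max unanswered tasks per worker

int main(void) {
    void *ctx = zmq_ctx_new();
    void *router = zmq_socket(ctx, ZMQ_ROUTER);
    zmq_bind(router, "tcp://*:5555");

    char id[256];
    int id_len = 0;
    int known = 0;        // have we seen the worker yet?
    int outstanding = 0;  // tasks sent to this worker but not yet answered

    for (;;) {
        zmq_pollitem_t items[1] = { { router, 0, ZMQ_POLLIN, 0 } };
        zmq_poll(items, 1, 100);  // 100 ms

        if (items[0].revents & ZMQ_POLLIN) {
            char body[256];
            id_len = zmq_recv(router, id, sizeof id, 0);  // worker identity
            zmq_recv(router, body, sizeof body, 0);       // READY or a result
            if (known && outstanding > 0)
                outstanding--;  // a result frees one slot for this worker
            known = 1;
        }

        // Send tasks only while this worker has free slots, so a slow
        // worker can never accumulate a long private backlog.
        while (known && outstanding < CREDIT /* && tasks remain */) {
            zmq_send(router, id, id_len, ZMQ_SNDMORE);
            zmq_send(router, "task", 4, 0);
            outstanding++;
        }
    }
}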

C++ process CPU usage jumps - detecting the cause

Given: multithreaded (~20 threads) C++ application under RHEL 5.3.
When testing under load, top shows that CPU usage jumps around in the 10-40% range every second.
The design is mostly pretty simple - most of the threads implement the active object design pattern: each thread has a thread-safe queue, requests from other threads are pushed onto the queue, and the thread just polls the queue and processes incoming requests. A processed request causes a new request to be pushed to the next processing thread.
The process has several TCP/UDP connections, over each of which data is received/sent under high load.
I know I have not provided sufficient data. This is a pretty big application, and I'm not familiar with all of its parts. It was ported from Windows to Linux over the ACE library (used for the networking part).
Supposing the problem is in the application and not an external one, what techniques/tools/approaches can be used to discover the problem? For example, I suspect it may be caused by some mutex contention.
I faced a similar problem some time back, and here are the steps that helped me.
1) Start by using strace to see where the application is spending its time executing system calls.
2) Use OProfile to profile both the application and the kernel.
3) If you are using an SMP system, look at the NUMA settings; in my case they caused havoc. /proc/<appPID>/numa_maps gives a quick look at how memory is being accessed; NUMA misses can cause the jumps.
4) You mentioned TCP connections in your app. Check that the MTU size is set to the right value and, depending on the type of data being transferred, enable or disable Nagle's algorithm appropriately; a sketch of the latter follows below.
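On the Nagle point, turning the algorithm off on a given TCP socket is a single setsockopt call (a minimal sketch; whether you want it on or off depends on whether you send many small latency-sensitive messages or large bulk transfers):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm on an already-connected TCP socket `fd`:
// small writes go out immediately instead of being coalesced, trading
// bandwidth efficiency for lower per-message latency.
int disable_nagle(int fd) {
    int flag = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag);
}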