I have a class object that acts as a server. It receives requests from anywhere and pushes each one onto its request queue (the producer). A consumer thread pops requests from the queue and, based on the request, calls the appropriate class method to service it. Currently the consumer pops a request and runs the handler synchronously. What I want is for the consumer thread to pop a request and launch the handler asynchronously, so that it can pop the next request from the queue immediately.
One solution I have tried is for the consumer to pop a request from the queue, create a boost::thread, and run the appropriate function in the new thread. I have stored the thread pointers in a std::vector and have also tried boost::thread_group. So far so good, but there is a problem with this solution.
Once I have serviced more than 150 requests there are more than 150 threads, and after that pthread does not create new threads, giving the error "pthread_create: Resource temporarily unavailable", which I believe means the current process has run out of stack space so new threads cannot be created.
Question #1 My request handlers do not contain while (1); they just do some work and exit without waiting for anything, so I expect my earlier threads to have completed their processing and returned from the thread handler function. If a thread has completed its processing and exited, shouldn't its stack have been cleaned up?
One solution to this problem is I can set the stack size of the thread, but that will still raise this error after say 1000 threads.
So my requirement is that I must clean up completed threads periodically (say, when the thread pointer vector exceeds 100 entries, or every minute, or something like that).
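A minimal sketch of that kind of cleanup, written with std::thread rather than boost::thread so it is self-contained. The names ThreadReaper, launch, reap and join_all are hypothetical, and the sketch assumes launch and reap are only called from the single consumer thread. Each handler sets a flag when it finishes, and reap() joins only those threads; a thread that has exited but was never joined or detached can still hold resources.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: launches handlers on their own threads and joins
// the ones that have finished, so exited threads do not pile up unjoined.
class ThreadReaper {
public:
    ~ThreadReaper() { join_all(); }

    // Start `handler` on a new thread and remember it.
    void launch(std::function<void()> handler) {
        assert(handler);
        auto done = std::make_shared<std::atomic<bool>>(false);
        std::thread t([handler, done] {
            handler();
            done->store(true);  // set just before the thread function returns
        });
        entries_.push_back(Entry{std::move(t), done});
    }

    // Join every thread whose handler has finished; returns how many were joined.
    std::size_t reap() {
        std::size_t joined = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->done->load()) {
                it->thread.join();  // returns quickly: the handler already finished
                it = entries_.erase(it);
                ++joined;
            } else {
                ++it;
            }
        }
        return joined;
    }

    // Join everything that is still tracked (for example at shutdown).
    void join_all() {
        for (auto& e : entries_) e.thread.join();
        entries_.clear();
    }

    std::size_t pending() const { return entries_.size(); }

private:
    struct Entry {
        std::thread thread;
        std::shared_ptr<std::atomic<bool>> done;
    };
    std::list<Entry> entries_;
};
```

The consumer could call reap() after every N launches or on a timer. This still creates one thread per request, so it only bounds the number of unjoined threads, not the cost of creating them.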
Question #2 Besides launching a new thread as described above, what other asynchronous function call mechanisms should I try? Is boost::function + boost::bind asynchronous? Is it a good solution for the situation I have described? Say my system is supposed to be online 24/7/365 and receives more than 1000 requests each day.
Update #1
So I found a problem in my design. I said in question #1 that my request handlers contain just plain calls, which I found is not true. A handler downloads a file from a server synchronously, which is a blocking operation. I should download the file asynchronously.
There is no point in creating more threads than the underlying system can run concurrently if the request handlers do not perform any blocking operation.
So, as Alex mentioned, having more than one consumer thread (I think 5 are enough) popping requests from the queue, together with an asynchronous file download, will solve my issue.
One solution is to have multiple consumer threads, each of which pops a work item off the queue and deals with it synchronously. This lets you manage concurrency (avoiding oversubscription) while still processing multiple items at a time. You also remove the overhead of launching a new thread for every item, which I'd predict is one of your bottlenecks.
You should ensure your queue is designed for multiple consumers.
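As a rough sketch of that arrangement using only the standard library (the class name WorkerPool and its members are made up for illustration; this is not a Boost API): a fixed number of consumer threads share one mutex-protected queue and each handles one request at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: N consumer threads share one request queue.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { consume(); });
    }

    // Lets the consumers drain the remaining requests, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Producer side: enqueue a request and wake one consumer.
    void submit(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(std::move(request));
        }
        ready_.notify_one();
    }

private:
    // Consumer loop: pop one request at a time and handle it synchronously.
    void consume() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
                if (requests_.empty()) return;  // stopping and nothing left to do
                request = std::move(requests_.front());
                requests_.pop();
            }
            request();  // run outside the lock so other consumers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> requests_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Usage would be along the lines of WorkerPool pool(5); pool.submit([] { /* handle one request */ });. The mutex makes the queue safe for multiple consumers, which is the property mentioned above.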
I have never used this implementation, but a thread pool might help.
You already use Boost and you download files, so it would be pretty natural to use Boost.Asio for networking and for all the other multithreading/async related stuff, like a central dispatcher.
First of all, I'd recommend creating a pool of threads and running the Asio dispatcher over them: like here. Use Asio async networking to download files: example here. When a file is downloaded, just process it.
This approach is pretty scalable and you won't have to worry about async networking or multithreading synchronization (rather tricky stuff). Boost.Asio provides good examples of how to accomplish this.
TLDR;
I need to run the gRPC C++ client library on a single thread. From what I can tell, initializing gRPC creates two threads for executors (the default executor and the resolver executor) and one or two threads for timers (from timer_manager). I can turn these threads off after creation, but I can't figure out how to prevent them from being created. Is there a way to stop their creation using any of the APIs?
Explanation
Threading in Completion Queue, Executors, and Execution Contexts
Let's say we have a cpp file with a completion queue:
#include <grpcpp/grpcpp.h>

using grpc::CompletionQueue;

CompletionQueue* globalCompletionQueuePtr;

int main()
{
    // Constructing the queue is what triggers grpc_init() (see below).
    globalCompletionQueuePtr = new CompletionQueue;
}
Having done this we then have this sequence kick off:
Creating a CompletionQueue in this way initializes grpc (grpc_init() in init.cc)
grpc_init will then call grpc_iomgr_init which then calls InitAll off of grpc_core::Executor
In executor.cc, InitAll creates the default and resolver executors and then calls Init() on each.
Init then calls SetThreading(true) which goes about starting up an execution thread for each executor.
Now we have two threads spun up separately from the main thread, one for the default executor and one for the resolver executor. Without looking any further into this, I can remove the threads by calling grpc_core::Executor::SetThreadingAll(false); after creating the completion queue, but this means the threads are created, start work, and are then terminated.
Questions about executors, the completion queue, and execution contexts:
How do the executors relate to the poll engine? I see that the executors run closures, but are they responsible for executing all closures? I can run closures when they are turned off, so I must assume that's happening on the main thread. Is that right?
Calling AsyncNext on the completion queue above drives the operations on the queue to finish as the documentation says. I can push operations onto the queue (with grpc_cq_begin_op and grpc_cq_end_op) and I can grab the underlying pollset, create a pollent, and use that to schedule calls myself. In this way, it looks like the queue tracks the state of operations but is not itself responsible for the operations doing work. Is that right?
I know that certain calls into grpc need the grpc_core::ExecCtx exec_ctx; context object created on the stack. How does the stack ctx interact with the resolver and default executors? Does it?
Is it possible to init grpc without the executors? Calling SetThreading(false) seems to keep the library working, but I don't want to create threads and then kill them.
Threading in Timers
Separate from the completion queue, after the iomgr init:
grpc_init in init.cc later calls grpc_iomgr_start in iomgr.cc which calls grpc_timer_manager_init in timer_manager.cc
The last thing grpc_timer_manager_init does is call start_threads()
start_threads() checks g_threaded to see if it needs to start some threads and then does so by calling start_timer_thread_and_unlock
Now there's a timer thread which will figure out how long until the next timer fires, sleep until that time, then wake up and fire the timers. If we run out of threads, we will start up another thread as long as we are in threaded mode (g_threaded). The code basically puts us in threaded mode no matter what, but there is a call grpc_timer_manager_set_threading(false); from timer_manager that will stop all the timer threads.
Questions about timers:
For these timer threads, what is their main use relative to grpc calls? Are the timers mostly an internal construct or are they used by the public API in some way? Are they responsible for enforcing the deadlines on closures?
Is there a way to init grpc without the timer threads? I can turn them off as stated above, but it has the same problem as the executors in that they get created and then I destroy them afterwards.
Will turning off the timer threads have any negative impact on the operation of gRPC, such as deadlines no longer working? Will gRPC spin up new threads even after calling grpc_timer_manager_set_threading? Are the timers resolved on the main thread in a coroutine-like way, similar to the closures, by calling AsyncNext on the queue without threads? Is it already doing that?
Extra Context
Finally, is there anything else in the library that will spin up threads that I'm blind to seeing?
The reason I ask all these questions is that I need to run grpc inside an application where the application provides a single thread for the library to run on. Performance degradation from the lack of threads is not a concern.
If anything I have said here is inaccurate, please do correct me. I know I am working with an imperfect understanding of the grpc cpp library.
Thanks in advance for any answers and anyone who takes the time to read through this and provide support. I greatly appreciate it!
UPDATE: Why do I need a single thread?
I have a specific hardware environment where the actual application that will run the gRPC client will be managing several threads. Each thread will get a time slice to run in and must be done at the end of that time slice. While extra threads can spin up during that thread's time slice, they must all be finished when the time slice is over so that the next thread, when given its time slice, has all the hardware resources available to it.
gRPC does not have any support for single threading. At the very least, we use threads for running timers and the resolver, along with a few other tasks, as you noticed.
You can avoid thread blowup by using the async server API rather than the sync one, which creates a new thread per RPC.
But nothing is single-threaded.
I am working on designing a websocket server which receives a message and saves it to an embedded database. For reading the messages I am using boost asio. To save the messages to the embedded database I see a few options in front of me:
Save the messages synchronously as soon as I receive them over the same thread.
Save the messages asynchronously on a separate thread.
I am pretty sure the second answer is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Use one io service per thread and use the post function to communicate between threads. Here I have to worry about lock contention. Should I?
Use Unix domain sockets to pass messages between threads. No lock contention as far as I understand. Here I can probably use the BOOST_ASIO_DISABLE_THREADS macro to get some performance boost.
Also, I believe it would help to have multiple IO threads which would receive messages in a round robin fashion to save to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
The messages are exactly 8 bytes in length.
Cannot use an external database. The database must be embedded in the running process.
I am thinking about using RocksDB as the embedded database.
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt) you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that using strand::dispatch() will execute in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
After benchmarking/profiling I found the Facebook Folly implementation of MPMCQueue to be the fastest by a margin of at least 50%. If I use the non-blocking write method, the socket thread has almost no overhead and the IO threads remain busy. The number of system calls is also much lower than with other queue implementations.
The SPSC queue in Boost, combined with a condition variable, is slower. I am not sure why that is. It might have something to do with the adaptive spin that the Folly queue uses.
Also, message passing (Unix domain datagram sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.
You probably only need one io_service -- you can create additional threads which will process events occurring within the io_service by providing boost::asio::io_service::run as the thread function. This should scale well for receiving 8-byte messages from clients over the network socket.
For storing the messages in the database, it depends on the database & interface. If it's multi-threaded, then you might as well just send each message to the DB from the thread that received it. Otherwise, I'd probably set up a boost::lockfree::queue where a single reader thread pulls items off and sends them to the database, and the io_service threads append new messages to the queue when they arrive.
Is that the most efficient approach? I dunno. It's definitely simple, and gives you a baseline that you can profile if it's not fast enough for your situation. But I would recommend against designing something more complicated at first: you don't know whether you'll need it at all, and unless you know a lot about your system, it's practically impossible to say whether a complicated approach would perform any better than the simple one.
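#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <boost/lockfree/queue.hpp>

std::atomic<bool> Finished{false};
std::mutex cond_mutex;
std::condition_variable cond_var;
extern const std::chrono::milliseconds poll_interval;  // placeholder: choose your own timeout
uint64_t receive_from_network();                       // placeholder, defined elsewhere
void add_to_database(uint64_t message);                // placeholder, defined elsewhere

void Consumer(boost::lockfree::queue<uint64_t>& message_queue) {
    // Connect to database...
    while (!Finished) {
        // It's OK to consume_all() even if there's nothing in the queue.
        message_queue.consume_all([](uint64_t m) { add_to_database(m); });
        std::unique_lock<std::mutex> lock(cond_mutex);
        cond_var.wait_for(lock, poll_interval);  // timed wait, so a missed signal only delays us
    }
}

void Producer(boost::lockfree::queue<uint64_t>& message_queue) {
    while (!Finished) {
        uint64_t m = receive_from_network();
        message_queue.push(m);
        cond_var.notify_all();
    }
}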
Assuming that requiring C++11 is not too hard a constraint in your situation, I would try using std::async to make an asynchronous call to the embedded DB.
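A minimal sketch of what that could look like. FakeDb is a hypothetical in-memory stand-in for the embedded database (it is not the RocksDB API), and save_async and save_batch are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the embedded database: an in-memory log.
class FakeDb {
public:
    bool put(std::uint64_t message) {
        std::lock_guard<std::mutex> lock(mutex_);
        rows_.push_back(message);
        return true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rows_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint64_t> rows_;
};

// Start the write on another thread and return immediately.
// The caller must keep the future: the destructor of a future returned by
// std::async blocks until the task has finished.
std::future<bool> save_async(FakeDb& db, std::uint64_t message) {
    return std::async(std::launch::async, [&db, message] { return db.put(message); });
}

// Save a batch concurrently, wait for every write, and return how many succeeded.
std::size_t save_batch(FakeDb& db, const std::vector<std::uint64_t>& messages) {
    std::vector<std::future<bool>> pending;
    for (std::uint64_t m : messages) pending.push_back(save_async(db, m));
    std::size_t ok = 0;
    for (auto& f : pending) ok += f.get() ? 1 : 0;
    assert(ok <= messages.size());
    return ok;
}
```

Note that std::launch::async typically means a new thread per call, so for a steady stream of 8-byte messages this is probably more expensive than a long-lived consumer thread reading from a queue. It is mostly attractive for its simplicity.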
I would like to have a way to add async tasks from multiple threads and execute them sequentially in a C++ boost::asio application.
Update: I would like to set up server-to-server communication over a single persistent socket, and I need to sequence multiple requests through it. It needs to keep incoming requests in a queue, send the one at the front, wait for its response, and then pick up the next. I'm trying to avoid ZeroMQ because it needs a dedicated thread.
Update2: OK, here is what I ended up with: the concurrent worker threads are "queued" for use of the server-to-server socket with a simple mutex. The communication is a blocking write, wait for the response, read, then release the mutex. Simple, isn't it :)
From the ASIO documentation:
Asynchronous completion handlers will only be called from threads that are currently calling io_service::run().
If you're already calling io_service::run() from multiple threads, you can wrap your async calls in an io_service::strand as described here.
Not sure if I understand you correctly either, but what's wrong with the approach in the client chat example? Messages are posted to the io_service thread, queued while a write is in progress and popped/sent in the write completion handler. If more messages were added in the meantime, the write handler launches the next async write.
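To make that pattern concrete, here is a small sketch with the networking stubbed out. WriteQueue and its members are hypothetical names; start_write stands in for an asynchronous write such as async_write and is assumed to arrange for on_write_complete() to be called later. All calls are assumed to happen on one thread, as they would if everything ran on the io_service thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch of "queue while a write is in progress".
class WriteQueue {
public:
    explicit WriteQueue(std::function<void(const std::string&)> start_write)
        : start_write_(std::move(start_write)) {
        assert(start_write_);
    }

    // Queue a message; start a write only if none is in flight.
    void deliver(std::string message) {
        const bool write_in_progress = !pending_.empty();
        pending_.push_back(std::move(message));
        if (!write_in_progress) start_write_(pending_.front());
    }

    // Completion handler: drop the sent message and launch the next write, if any.
    void on_write_complete() {
        assert(!pending_.empty());
        pending_.pop_front();
        if (!pending_.empty()) start_write_(pending_.front());
    }

    std::size_t queued() const { return pending_.size(); }

private:
    std::function<void(const std::string&)> start_write_;
    std::deque<std::string> pending_;  // front() is the message currently being written
};
```

The message being written stays at the front of the queue until its completion handler runs, so "queue not empty" doubles as the "write in progress" flag and at most one write is outstanding on the socket.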
Based on your comment to Sean, I also don't understand the benefit of having multiple threads calling io_service::run since you can only execute one async_write/async_read on one persistent socket at a time i.e. you can only call async_write again once the handler has returned? The number of calling threads might require you to lock the queue with a mutex though.
AFAICT the benefit of having multiple threads calling io_service::run is to increase the scalability of a server that is serving multiple requests simultaneously.
I have already used wininet to send some synchronous HTTP requests. Now, I want to go one step further and want to request some content asynchronously.
The goal is to get something "reverse proxy"-like. I send an HTTP request which gets answered delayed - as soon as someone wants to contact me. My thread should continue as if there was nothing in the meanwhile, and a callback should be called in this thread as soon as the response arrives. Note that I don't want a second thread which handles the reply (if it is necessary, it should only provide some mechanism which interrupts the main thread to invoke the callback there)!
Update: Maybe, the best way to describe what I want is a behaviour like in JavaScript where you have only one thread but can send AJAX requests which then result in a callback being invoked in this main thread.
Since I want to understand how it works, I don't want library solutions. Does anybody know some good tutorial which explains me how to achieve my wanted behavior?
My thread should continue as if there was nothing in the meanwhile, and a callback should be called in this thread as soon as the response arrives.
What you're asking for here is basically COME FROM (as opposed to GO TO). This is a mythical instruction which doesn't really exist. The only way you can get your code called is to either poll in the issuing thread, or to have a separate thread which is performing the synchronous IO and then executing the callback (in that thread, or in yet another spawned thread) with the results.
When I was working in C++ with sockets I set up a dedicated thread to iterate over all the open sockets, poll for data which would be available without blocking, take the data and stuff it in a buffer, sending the buffer to a callback on a given circumstance (EOL, EOF, that sort of thing).
Unless your main thread is listening to something like a message queue there isn't really a way to just hijack it and start it executing code other than what it is currently doing.
Take a look at how boost::asio works; it basically lets you do connects, reads, writes, etc. asynchronously. For example, you start an async read from the primary (or any) thread, and asio then uses overlapped IO to ask the OS to notify it of IO completion. When the async read completes, your callback will be executed by one of the worker threads.
All you need to do is to be sure to call io_service::run() with either your main thread or a worker thread to handle the IO completion queue. Any threads that you call run with will be the ones that execute the callback.
Asio has some guarantees that make this method of multithreading fairly robust if you follow the rules.
Take a look at the documentation for asio even if you don't plan to use it, a lot of the patterns and ideas are quite interesting if this is something you want to tackle yourself.
If you don't want to look at it, remember, on Windows the method of doing async IO is called "Overlapped IO".
In my application I have two threads
a "main thread" which is busy most of the time
an "additional thread" which sends out some HTTP request and which blocks until it gets a response.
However, the HTTP response can only be handled by the main thread, since it relies on its thread-local storage and on non-thread-safe functions.
I'm looking for a way to tell the main thread when a HTTP response was received and the corresponding data. The main thread should be interrupted by the additional thread and process the HTTP response as soon as possible, and afterwards continue working from the point where it was interrupted before.
One way I can think about is that the additional thread suspends the main thread using SuspendThread, copies the TLS from the main thread using some inline assembler, executes the response-processing function itself and resumes the main thread afterwards.
Another way I have thought of is setting a breakpoint at some specific address in the second thread's callback routine, so that the main thread gets notified when the second thread's instruction pointer hits that breakpoint, and therefore knows that the HTTP response has been received.
However, neither method seems nice at all; they hurt even to think about, and they don't look really reliable.
What can I use to interrupt my main thread, telling it that it should be polite and process the HTTP response before doing anything else? Answers without dependencies on libraries are appreciated, but I would also accept a dependency if it provides a nice solution.
A follow-up question (regarding the QueueUserAPC solution) was answered and explained that there is no safe method to get push behaviour in my case.
This may be one of those times where one works themselves into a very specific idea without reconsidering the bigger picture. There is no singular mechanism by which a single thread can stop executing in its current context, go do something else, and resume execution at the exact line from which it broke away. If it were possible, it would defeat the purpose of having threads in the first place. As you already mentioned, without stepping back and reconsidering the overall architecture, the most elegant of your options seems to be using another thread to wait for an HTTP response, have it suspend the main thread in a safe spot, process the response on its own, then resume the main thread. In this scenario you might rethink whether thread-local storage still makes sense or if something a little higher in scope would be more suitable, as you could potentially waste a lot of cycles copying it every time you interrupt the main thread.
What you are describing is what QueueUserAPC does. But the notion of using it for this sort of synchronization makes me a bit uncomfortable. If you don't know that the main thread is in a safe place to interrupt it, then you probably shouldn't interrupt it.
I suspect you would be better off giving the main thread's work to another thread so that it can sit and wait for you to send it notifications to handle work that only it can handle.
PostMessage or PostThreadMessage usually works really well for handing off bits of work to your main thread. Posted messages are handled before user input messages, but not until the thread is ready for them.
I might not understand the question, but CreateSemaphore and WaitForSingleObject should work. If one thread is waiting for the semaphore, it will resume when the other thread signals it.
Update based on the comment: The main thread can call WaitForSingleObject with a wait time of zero. In that case the call returns immediately whether or not the semaphore is signaled, and the return value tells you which. The main thread could then check it on a periodic basis.
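The same kind of non-blocking check can be sketched in portable C++ with std::future::wait_for and a zero timeout, used here only as a stand-in for WaitForSingleObject with a zero wait. The names is_ready and work_until_response are made up, and the "work" is a placeholder.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Returns true if the result is already available, without blocking.
template <typename T>
bool is_ready(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Main-thread loop: keep doing its own work and check for the response
// between steps. `response` is assumed to be fulfilled by the thread that
// waits on the HTTP request. Returns the number of work steps performed.
int work_until_response(std::future<std::string>& response, std::string& body) {
    assert(response.valid());
    int steps = 0;
    while (!is_ready(response)) {
        ++steps;                    // placeholder for one unit of main-thread work
        std::this_thread::yield();
    }
    body = response.get();          // the result is consumed on the calling thread
    return steps;
}
```

The additional thread would hold the matching std::promise and call set_value when the HTTP response arrives; the main thread never blocks, it only notices the result at its next check.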
It looks like the answer should be discoverable from Microsoft's MSDN. Especially from this section on 'Synchronizing Execution of Multiple Threads'
If your main thread is a GUI thread, why not send a Windows message to it? That's what we all do to interact with the win32 GUI from worker threads.
One way to do this that is deterministic is to periodically check whether an HTTP response has been received.
It's better for you to say what you're trying to accomplish.
In this situation I would do a couple of things. First and foremost, I would restructure the work that the main thread is doing so that it is broken into pieces that are as small as possible. That gives you a series of safe places to break execution at. Then you want to create a work queue, probably using the Microsoft SList (interlocked singly linked list). The SList will give you the ability to have one thread adding while another reads without the need for locking.
Once you have that in place you can essentially make your main thread run in a loop over each piece of work, checking periodically to see if there are requests to handle in the queue. Long-term what is nice about an architecture like that is that you could fairly easily eliminate the thread localized storage and parallelize the main thread by converting the slist to a work queue (probably still using the slist), and making the small pieces of work and the responses into work objects which can be dynamically distributed across any available threads.
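Roughly, the loop could look like the sketch below. It deliberately uses a mutex-protected std::queue instead of the SList so that it stays portable, and Mailbox and run_pieces are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mailbox: any thread may post, only the main thread drains.
class Mailbox {
public:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Run everything queued so far on the calling thread; returns the count.
    std::size_t drain() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, tasks_);  // take the whole batch, release the lock early
        }
        const std::size_t n = batch.size();
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
        }
        return n;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Main-thread loop: run the work one small piece at a time and handle any
// posted responses at the safe point between pieces.
std::size_t run_pieces(const std::vector<std::function<void()>>& pieces, Mailbox& mailbox) {
    std::size_t handled = 0;
    for (const auto& piece : pieces) {
        assert(piece);
        piece();
        handled += mailbox.drain();
    }
    handled += mailbox.drain();  // also covers the case where there are no pieces
    return handled;
}
```

Because the posted tasks run inside drain(), they execute on the main thread and can safely touch its thread-local storage; the worst-case latency is the length of the longest piece of work.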