pthread_create error 11 with only 5 simultaneous threads - C++

I have a problem in a multi-threaded program. My program has 4 threads that are always running.
We will name these threads 1 to 4.
The goal of my program is to communicate over a socket with a peripheral.
Thread number 4 is used to send messages to the peripheral.
Each time a message is sent to the peripheral, I call:
pthread_create(&m_hThreadMsgReader, NULL, &ThreadMsgReader, (void*) &argStruct);
This creates a thread that listens for the response on the socket. The thread returns when the socket no longer contains any messages, i.e. when
iRet = recv(m_iSocket, pcRecBuf, DEFAULT_READ_DATA_LEN, 0); makes iRet take the value 0.
So with the current implementation, a data poll is made on the peripheral every minute (aside from other commands sent via user input).
The problem is that after a few hours, pthread_create fails with error 11 (EAGAIN). I've seen on Stack Overflow that this means the system might not have enough resources, or that there are too many threads.
But I don't understand this, because in the QtCreator debugger I can only see threads 1 to 4. I know I might have created around 300 threads over time, but the thread list only ever contains 4, which means all the other threads have terminated.
So I don't really understand whether the limit applies to the number of threads created over the whole lifetime of the process or to the number of threads that exist at the same time.
Should I just find a way to have a single thread do the listening, even if it will sometimes listen on an empty socket? Is my implementation a bad pattern?

You should call pthread_join to free the resources acquired by each terminated thread. Alternatively, create the threads with the PTHREAD_CREATE_DETACHED attribute; in that case you don't need to call pthread_join.
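As a minimal sketch (ThreadMsgReader and argStruct follow the question; everything else here is an assumption), this is one way to create the reader thread detached so its resources are released automatically when it returns:

#include <pthread.h>
#include <cstdio>

// Hypothetical reader entry point, standing in for the ThreadMsgReader from the question.
void* ThreadMsgReader(void* arg) {
    // ... recv() loop until the peer stops sending / closes the connection ...
    return nullptr;
}

int startDetachedReader(void* argStruct) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Detached threads release their stack and bookkeeping as soon as they exit,
    // so they never accumulate as unjoined "zombie" threads.
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_t tid;
    int rc = pthread_create(&tid, &attr, &ThreadMsgReader, argStruct);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_create failed: %d\n", rc);
    }
    return rc;
}

Equivalently, you could keep the default attributes and call pthread_detach(tid) right after pthread_create, or pthread_join the previous reader before starting a new one.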

How does a thread pool allow me to handle many client connections?

I want to handle 300 to 400 client connections, but I do not want to create a thread for each client connection (or is there anything wrong with creating 400 threads?).
So I have read that I should use a thread pool to fix this problem, but I am unable to understand how a thread pool actually fixes it. In my understanding of a thread pool, there is a limited number of threads that take tasks. But once a thread takes a recv() task it will immediately block if there is nothing to read! So shouldn't the solution be a mechanism that lets me know whether there is something to read before actually attempting to read it? How exactly does a thread pool fix my problem of handling many client connections?
Edit: Changed read() to recv().
As user743414 already pointed out, too many threads are not a good idea. But the main problem lies IMHO in your blocking read. You should only call read when there is something to read. The usual way is to use select to find out which socket has something to read and dispatch that socket to a worker thread from the thread pool.
On Windows you should use WSASockets.
You use select in a single thread. Then you use the result of select (which tells you which sockets need attention) to dispatch the connection to a worker thread.
You wrote that you use Microsoft. Take this sample:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms742219(v=vs.85).aspx
search for the code
//-----------------------------------------
// If data has been received, echo the received data
// from DataBuf back to the client
iResult = WSASend(AcceptSocket, &DataBuf, 1, &RecvBytes, Flags, &AcceptOverlapped, NULL);
if (iResult != 0) {
    wprintf(L"WSASend failed with error = %d\n", WSAGetLastError());
}
You would replace this part with your thread pool, like this (pseudocode):
mythreadpool *thread = takeOrCreateThreadFromThreadPool();
thread->callWith(&DataBuf, &RecvBytes);
You will find many different but good thread-pool implementations that use methods like that.
Creating 300-400 threads would work, but isn't the best solution. "Context switch" is the keyword you have to search for here: context switches are expensive.
Another problem with that many threads is that each thread gets 1 MB of stack memory, and that memory is limited. You could easily try this and check how many threads you can create.
With a thread pool you have one thread which receives requests and then hands them to your thread pool to work on.
So you wouldn't have a thread which blocks while waiting for reads. Your thread pool only works when there is something to read.
Another, better option on Windows would be I/O completion ports. Similar techniques are also available on Linux.
The thread pool helps because you probably will not have all 400 connections constantly sending and receiving data, so your app needs only a handful of threads to manage them all.
A single thread can monitor all the connections (using select, for instance), and as soon as select unblocks it loops through all the sockets that need attention and passes them to the thread pool. If select reported that a socket has received data, then read will not block (and you can still set the read timeout to 0).
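As a rough sketch of that pattern (POSIX select; ThreadPool and handleClient are hypothetical names, not taken from the answers above), the monitoring thread might look like this:

#include <sys/select.h>
#include <functional>
#include <vector>

// Hypothetical thread-pool interface: any implementation with a submit() that
// enqueues work for a worker thread will do.
struct ThreadPool {
    void submit(std::function<void()> task);
};

void handleClient(int fd);  // does the recv() and processing; runs on a worker thread

void dispatchLoop(const std::vector<int>& clientFds, ThreadPool& pool) {
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = 0;
        for (int fd : clientFds) {
            FD_SET(fd, &readfds);
            if (fd > maxfd) maxfd = fd;
        }
        // Block until at least one socket is readable; no worker is tied up waiting.
        if (select(maxfd + 1, &readfds, nullptr, nullptr, nullptr) <= 0)
            continue;
        for (int fd : clientFds) {
            if (FD_ISSET(fd, &readfds))
                pool.submit([fd] { handleClient(fd); });  // recv() on fd will not block now
        }
    }
}

The key point is that only the dispatch thread ever waits; the pool's handful of workers are only handed sockets that already have data.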

boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >

I need some help with this exception. I am implementing an NPAPI plugin to be able to use local sockets from browser extensions; to do that I am using the Firebreath framework.
For sockets and connectivity I am using Boost Asio with async calls and a thread pool of 5 worker threads.
I also have a deadline timer per thread to implement a transmission timeout.
My extension's workflow with the plugin is as follows:
1. Open socket 1 (this starts an async_receive and the deadline async_wait)
2. Write to socket 1
3. Get response 1
4. Open socket 2
5. Write to socket 2
6. Write to socket 1
7. Close socket 1 (socket.cancel(), deadline.cancel(), socket.shutdown(), socket release)
8. Get response 2
9. Write to socket 2
10. Close socket 2
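For reference, a minimal sketch of the "Close socket 1" step (step 7), assuming socket_ and deadline_ are the Boost.Asio objects referred to in the parentheses above; the function name and signature are illustrative:

#include <boost/asio.hpp>

void closeConnection(boost::asio::ip::tcp::socket& socket_,
                     boost::asio::deadline_timer& deadline_) {
    boost::system::error_code ec;
    socket_.cancel(ec);      // pending async_receive completes with operation_aborted
    deadline_.cancel();      // pending async_wait completes with operation_aborted
    socket_.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ec);
    socket_.close(ec);       // errors are ignored here for brevity
}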
As everything is cross-language and async it is really hard to debug, but all open, write, and close calls come from JavaScript, and it is the read handler on socket 1 that calls open 2, write 2, write 1 and close 1, in that order.
Maybe everything I am telling you is unrelated, as the call stack when the exception is thrown does not show any of my functions and only shows that it is inside a malloc that calls _heap_alloc_dbg_impl.
As it is, it usually fails on the 2nd or 3rd full cycle, and it seems to happen between steps 5 and 7.
But I think that it must be Asio related, as doing everything with a single worker thread just crashes with the exception on the first cycle.
I am open to publishing more information and code if you need it.
Update 1:
Update 2:
The 10 threads are launched with:
workPtr.reset(new boost::asio::io_service::work(io_service));
for (int i = 0; i < 10; ++i) {
    m_threadGroup.create_thread(boost::bind(&boost::asio::io_service::run, &io_service));
}
I don't know what launched the 11th thread (_threadstartex).
On another thread (not the one that VS claims is causing the crash) there is a join_all() in progress because my class is being destroyed, but I think it shouldn't be, so maybe this crash is due to another exception and to Firebreath's process of closing everything when it crashes.
I found the errors by continuing to inspect the other threads. I found that the principal class that Firebreath was invoking was in the process of being destroyed. Inspecting a little more, I found that it was totally my fault: I have a class for storing socket information that needed to use a function in the principal class (I didn't like it, but it was the only way I found to do it), so I added a shared_ptr to the principal class to it. So after destroying one of those SocketInfo objects, as there were no other owners left, the pointer's reference count reached 0 and the principal class was being destroyed.
What is funny is that sockets usually close normally after being used, so I see no reason why this was not triggered when no sockets were open, and why it only happened when 2 sockets were opened and closed in a row.
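A hypothetical reconstruction of that ownership bug (class names are illustrative, not taken from the code above):

#include <memory>

class PrincipalClass;  // the Firebreath-facing class that should outlive the sockets

struct SocketInfo {
    // Holding a shared_ptr here means the last SocketInfo to be destroyed can
    // take the principal class down with it once no other owner remains.
    std::shared_ptr<PrincipalClass> owner;
    // A weak_ptr, locked only while a callback actually needs the owner, is the
    // usual way to keep the back-reference without owning the object:
    // std::weak_ptr<PrincipalClass> owner;
};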
Anyway, I also had a shared_from_this error with the deadline handler, but that seemed unrelated.
And now it seems to be working as expected with any number of threads.

Priority of kernel modules and SCHED_RR threads

I have an embedded Linux platform (the Beagleboard, running Angstrom Linux) with two devices connected:
a Laser range finder (Hokuyo UTM 30) connected via USB
a custom external board connected via SPI
We have written a Linux kernel module which is responsible for the SPI data transfer. It has an IRQ handler in which spi_async is called, which in turn causes an async callback method to be called.
My C++ application consists of three threads:
a main thread for data processing
a laser polling thread
an SPI polling thread
I am experiencing problems which seem to be caused by how the components described above interact.
When I switch off the USB device (laser range finder), I receive all SPI messages correctly (1 message every 3 ms; message length divided by data rate is < 1 ms), independent of thread scheduling.
When I switch on the USB device and run my program with normal thread scheduling (SCHED_OTHER, priority 0, no nice level set), about 1% of the messages are "lost" because the callback method of spi_async is still running when the next IRQ occurs. (I could handle this case differently in order not to lose the messages, so this is not a big issue.)
With the USB device turned on, when I run the program with SCHED_RR and
priority = 10 for the main thread
priority = 10 for the SPI reading thread
priority = 4 for the USB/laser polling thread
then I am losing 40% of the messages, because the IRQ is triggered again before the spi callback method is called! (I could still maybe find a workaround, but the problem is that I need fast response times, which can no longer be reached in this case.) I need to use the thread scheduling and the laser device, so I am looking for a way to solve this case.
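For reference, a minimal sketch of how SCHED_RR priorities like these are typically applied to a thread; this is not code from the question, and it requires root or CAP_SYS_NICE:

#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Illustrative only: apply an RT round-robin priority to an existing thread.
bool setRealtimePriority(pthread_t thread, int priority) {
    sched_param param{};
    param.sched_priority = priority;   // e.g. 10 for the main/SPI threads, 4 for the laser thread
    int rc = pthread_setschedparam(thread, SCHED_RR, &param);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_setschedparam failed: %d\n", rc);
        return false;
    }
    return true;
}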
Question 1:
My assumption was that IRQ handlers and the callbacks triggered by spi_async in kernel space have a higher priority than any thread running in user space (no matter whether SCHED_RR or SCHED_OTHER). This would mean that switching to SCHED_RR in my application shouldn't slow down the SPI transfer, but this assumption seems to be very wrong. Is it?
Question 2:
How can I determine what is happening here? Which debugging aids exist? (Or maybe you don't need any further information?) The main question for me is: why do I experience the problems only when the laser device is turned on? Could the USB driver consume that much time?
----- EDIT:
I have made the following observation:
The spi_async callback calls wake_up_interruptible(&mydata->readq); (with wait_queue_head_t readq;). From user space (my app) I call a function which results in poll_wait(file, &mydata->readq, wait); and when the poll returns, the user space calls read().
When my application runs with SCHED_OTHER, I can see that the callback method finishes before the read() method in my kernel module is entered.
When my application runs with SCHED_RR, read() is entered before the callback exits.
This seems to prove that the priority of the user-space threads is higher than the priority of the callback's context. Is there any way to change this behaviour and still have SCHED_RR for my application's threads?
Not all kernel threads have an RT priority. Imagine a thread that periodically wakes up to do some background work: you don't want this thread to preempt your RT thread. So I guess your first assumption is wrong.
Based on your other questions:
your main processing loop receives SPI data through a queue
the SPI processing thread feeds the main processing queue
It seems your main processing thread gets in the way of the spi driver thread responsible for the SPI data transfer.
Here is what happens:
an IRQ is fired
spi_async is called, which means a data transfer is queued that will be picked up by a thread created by the spi master driver.
the spi master thread competes with your main processing thread and the laser thread, but this kernel thread has no RT priority, so it loses every time one of the RR threads is running.
What you can do is go back to normal scheduling while playing with the various CONFIG_PREEMPT_ options. Or mess with the spi master driver to ensure that any delayed work is queued with enough priority, or not queued at all.
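One commonly used alternative to patching the driver, offered here only as a hedged sketch, is to give the spi master's worker kernel thread an RT priority from user space. Finding the right PID is an assumption on my part: you would look it up with ps, and the kthread's name depends on the kernel and driver version.

#include <sched.h>
#include <sys/types.h>
#include <cstdio>

// Illustrative only: raise the spi worker kthread above the SCHED_RR user
// threads so SPI transfers are no longer starved. Requires root.
bool boostSpiWorker(pid_t spiWorkerPid, int priority) {
    sched_param param{};
    param.sched_priority = priority;   // must be higher than the user threads' priority of 10
    if (sched_setscheduler(spiWorkerPid, SCHED_FIFO, &param) != 0) {
        std::perror("sched_setscheduler");
        return false;
    }
    return true;
}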

breaking out from socket select

I have a loop which basically calls this every few seconds (after the timeout):
while (true) {
    if (finished)
        return;
    switch (select(FD_SETSIZE, &readfds, 0, 0, &tv)) {
        case SOCKET_ERROR: /* report bad stuff etc. */ return;
        default: break;
    }
    // do stuff with the incoming connection
}
So basically every few seconds (the interval specified by tv), it reactivates the listening.
This runs on thread B (not the main thread). There are times when I want to end this acceptor loop immediately from thread A (the main thread), but it seems I have to wait until the time interval finishes.
Is there a way to interrupt the select function from another thread so that thread B can quit instantly?
The easiest way is probably to use pipe(2) to create a pipe and add the read end to readfds. When the other thread wants to interrupt the select(), it just writes a byte to the write end, which is consumed afterward.
Yes, you create a connected pair of sockets. Thread B adds one side of the pair to its select set, and when thread A wants to interrupt it, A writes a byte to the other side. Once A writes, B exits select; do not forget to read this byte from the socket.
This is the most standard and common way to interrupt a select.
Notes:
Under Unix, use socketpair to create the pair of sockets; under Windows it is a little bit tricky, but googling for "Windows socketpair" will give you sample code.
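A minimal sketch of that self-pipe trick on POSIX (variable and function names are illustrative): interruptFd[0] is watched by select in thread B, and thread A writes to interruptFd[1] to wake it up.

#include <sys/select.h>
#include <unistd.h>

int interruptFd[2];

void setupInterruptPipe() {
    pipe(interruptFd);                       // create the pipe once, before starting thread B
}

// Called from thread A (the main thread) to break thread B out of select().
void interruptSelect() {
    char byte = 0;
    write(interruptFd[1], &byte, 1);
}

// Inside thread B's loop: watch the read end alongside the real sockets.
// Returns false when the loop should exit.
bool waitForWork(fd_set* readfds, int maxfd, timeval* tv) {
    FD_SET(interruptFd[0], readfds);
    if (interruptFd[0] > maxfd) maxfd = interruptFd[0];
    if (select(maxfd + 1, readfds, 0, 0, tv) <= 0)
        return true;                         // timeout or error: keep looping / handle error
    if (FD_ISSET(interruptFd[0], readfds)) {
        char byte;
        read(interruptFd[0], &byte, 1);      // consume the wake-up byte
        return false;                        // signal the acceptor loop to quit
    }
    return true;                             // real socket activity: handle it
}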
Can't you just make the timeout sufficiently short (like 10 ms or so)?
These "just create a dummy connection"-type solutions seem like a hack. I personally think that if an application is well designed, concurrent tasks never have to be interrupted forcefully; the worker just has to check often enough (this is also a reason why boost.threads do not have a terminate function).
Edit: Made this answer CV. It is bad, but it might help others understand why it is bad, which is explained in the comments.
You can use a shutdown(Sock, SHUT_RDWR) call from the main thread to come out of the waiting select call. This will also make your other thread exit before the timeout, so you don't need to wait until the timeout expires.
cheers. :)