pthread scheduling problems - c++

I have two threads in a producer-consumer pattern. The code works for a while, but then the consumer thread gets starved, and after that the producer thread gets starved.
When it is working, the program outputs:
Send Data...semValue = 1
Recv Data...semValue = 0
Send Data...semValue = 1
Recv Data...semValue = 0
Send Data...semValue = 1
Recv Data...semValue = 0
Then something changes, the threads get starved, and the program outputs:
Send Data...semValue = 1
Send Data...semValue = 2
Send Data...semValue = 3
...
Send Data...semValue = 256
Send Data...semValue = 257
Send Data...semValue = 258
Recv Data...semValue = 257
Recv Data...semValue = 256
Recv Data...semValue = 255
...
Recv Data...semValue = 0
Send Data...semValue = 1
Recv Data...semValue = 0
Send Data...semValue = 1
Recv Data...semValue = 0
I know threads are scheduled by the OS, and can run at different rates and in random order. My question: when I call YieldThread (which calls pthread_yield), shouldn’t the Talker give the Listener a chance to run? Why am I getting this bizarre scheduling?
A snippet of the code is below. The Thread and Semaphore classes are abstraction classes. I went ahead and stripped out the queue that passes data between the threads so I could eliminate that variable.
const int LOOP_FOREVER = 1;
class Listener : public Thread
{
public:
Listener(Semaphore* dataReadySemaphorePtr)
: Thread("Listener"),
dataReadySemaphorePtr(dataReadySemaphorePtr)
{
//Intentionally left blank.
}
private:
void ThreadTask(void)
{
while(LOOP_FOREVER)
{
this->dataReadySemaphorePtr->Wait();
printf("Recv Data...");
YieldThread();
}
}
Semaphore* dataReadySemaphorePtr;
};
class Talker : public Thread
{
public:
Talker(Semaphore* dataReadySemaphorePtr)
: Thread("Talker"),
dataReadySemaphorePtr(dataReadySemaphorePtr)
{
//Intentionally left blank
}
private:
void ThreadTask(void)
{
while(LOOP_FOREVER)
{
printf("Send Data...");
this->dataReadySemaphorePtr->Post();
YieldThread();
}
}
Semaphore* dataReadySemaphorePtr;
};
int main()
{
Semaphore dataReadySemaphore(0);
Listener listener(&dataReadySemaphore);
Talker talker(&dataReadySemaphore);
listener.StartThread();
talker.StartThread();
while (LOOP_FOREVER); //Wait here so threads can run
}

No. Unless you are using a lock to prevent it, even if one thread yields its quantum, there's no requirement that the other thread receives the next quantum.
In a multithreaded environment, you can never ever ever make assumptions about how processor time is going to be scheduled; if you need to enforce correct behavior, use a lock.

Believe it or not, it runs that way because it's more efficient. Every time the processor switches between threads, it performs a context switch that wastes a certain amount of time. My advice is to let it go unless you have another requirement like a maximum latency or queue size, in which case you need another semaphore for "ready for more data" in addition to your "data ready for listening" one.
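A minimal sketch of the two-semaphore handshake suggested above. The question's own Thread and Semaphore classes are not shown, so the Semaphore below is a hypothetical stand-in built on std::mutex and std::condition_variable, and run_ping_pong and readyForMore are names invented for this illustration. The second semaphore starts at 1, which caps the backlog at one item and forces the two threads to alternate.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the question's Semaphore class.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) { assert(initial >= 0); }
    void Post() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Runs `rounds` send/receive pairs and returns the event log:
// 'S' for each send, 'R' for each receive.
std::string run_ping_pong(int rounds) {
    Semaphore dataReady(0);     // posted by the talker, as in the question
    Semaphore readyForMore(1);  // posted by the listener; 1 = room for one item
    std::string log;
    std::mutex logMutex;

    std::thread talker([&] {
        for (int i = 0; i < rounds; ++i) {
            readyForMore.Wait();  // block until the listener has caught up
            { std::lock_guard<std::mutex> g(logMutex); log += 'S'; }
            dataReady.Post();
        }
    });
    std::thread listener([&] {
        for (int i = 0; i < rounds; ++i) {
            dataReady.Wait();     // block until there is something to read
            { std::lock_guard<std::mutex> g(logMutex); log += 'R'; }
            readyForMore.Post();
        }
    });
    talker.join();
    listener.join();
    return log;
}
```

With this arrangement run_ping_pong(3) returns "SRSRSR" regardless of how the OS schedules the threads, because each thread blocks until the other has taken its turn. Starting readyForMore at a larger value would allow that many sends to queue up before the talker blocks.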

Related

IOCP when read is busy?

While writing some IOCP server-client code, I saw some odd behavior in IOCP.
The scenario goes like this:
Register the socket with IOCP
A recv event is caught by GetQueuedCompletionStatus
while (!mExitFlag)
{
bool bSuccess = ::GetQueuedCompletionStatus(mIocpHandle, &dwIoSize, (PULONG_PTR)&client, (LPOVERLAPPED*)&ioData, INFINITE);
logger->st = std::chrono::steady_clock::now();
// ... queue to recv worker
}
Then read the buffer (char[]) that backs the WSABUF registered with IOCP
int dataLength = recvBytes; // when iocp completed
int pktLength = Serializer::toInt32(mBuffer + mDataPos);
if (dataLength > 0 && pktLength == 0)
{
using namespace std::chrono;
char buffer[512];
if (mBuffer[mDataPos] == 0)
{
// take snapshot
memcpy(buffer, &mBuffer[mDataPos], dataLength);
}
while (mBuffer[mDataPos] == 0) { }
// elapsed < 1ms
auto elapsed_in_microseconds = CTimer::count<microseconds>(mLogger->st);
printf("elapsed %llu us", elapsed_in_microseconds);
int val = mBuffer[mDataPos]; // this gives positive value
throw std::runtime_error("serializer failed to read packet length");
}
The snapshot in buffer[512] always contains dataLength bytes of zeros.
After a few microseconds, mBuffer (the registered WSABUF) is filled with the data.
I checked, with logging, that the pending recv and its handling run in a single thread.
I observed that this only happens when the client sends a large amount of data in a short time.
When the client sends data with an interval between sends (10 ms?), it is fine.
Does anyone know about this IOCP issue?
Perhaps the solution is to wait on the buffer while the receive buffer is busy.

How can I parallelize transmission and reception of UDP packets in an object

What I have to write is a class which basically has 3 methods:
send(message, dst), which keeps sending the message to dst (perhaps with a delay, using sleep(t) with an increasing t) until it receives an ACK.
listen(), which receives messages and sends ACKs. If the message is itself an ACK, it stops the thread that sent the acknowledged message.
shutdown(), which stops all communication (every thread) and writes a log.
For the ACK mechanism, I thought of tables:
host_map[port][ipadress][id] // Also used for other things which require a (port, address) => id mapping, and because all hosts have ids.
ACK[id][message][thread_to_stop] // Will be used to stop threads, except I don't know how to store info about the thread here, or where to put a join(), if I even need one
A vector of threads (or thread ids, I don't know) to stop all the threads when I call shutdown().
I want send() and listen() to be parallelized. In other words, listen() should not block the program so it should be in a thread. Also it needs to keep receiving stuff so it would need a while loop.
In my main.cpp :
A link(my_ip, port);
link.listen();
for (int i = 0; i < 5; i++)
link.send(std::to_string(i), dst_ip, dst_port);
This is supposed to create 5 threads, each with a while loop that sends i, sleeps a little, and repeats until I receive an ACK.
I am new to C++ and have never done multi-threading before, so I don't even know if I can do this, or where to put my join(), if there is one.
Another idea, which I don't know is possible, is to have a queue inside my class Link that keeps sending whatever is in it, with send(msg) just adding to the queue.
Here is something I made in A.hpp.
void sendm(std::string m, in_addr_t dst_ip, unsigned short dst_port){
int id = resolveId(dst_port, dst_ip);
int seq_number = resolveSeq(); // TODO
std::thread th(&A::sendMessage, this, m, dst_ip, dst_port); // construct the thread directly; "= (a, b, c)" would apply the comma operator
// I need something to be able to add th or an ID in my ACK table and thread vector.
th.join();
}
void sendMessage(std::string m, in_addr_t dst_ip, unsigned short dst_port){
//dst
struct in_addr dst_ipt;
struct sockaddr_in dst;
dst_ipt.s_addr = dst_ip;
dst.sin_family = AF_INET;
dst.sin_port = htons(dst_port);
dst.sin_addr = dst_ipt;
int t = 50;
while(true){
std::cout << "send message m = " << m << "\n";
//Sends a message to dst_ip through dst_port and increments the number of messages sent.
if (sendto(obj_socket, m.c_str(), m.size(), 0, reinterpret_cast<const sockaddr*>(&dst), sizeof(dst)) < 0){
std::cerr << " Error sendto\n";
exit(EXIT_FAILURE);
};
std::cout<< "message sent\n";
std::this_thread::sleep_for(std::chrono::milliseconds(t));
t+=10;
}
}
void receive(){
char buffer[1500];
sockaddr_in from;
socklen_t fromlen = sizeof(from);
ssize_t tmp = recvfrom(obj_socket, buffer, 1500, 0, reinterpret_cast<sockaddr*>(&from), &fromlen);
if (tmp < 0){
std::cout << "Exit\n";
std::cerr << "Error receive from\n";
exit(EXIT_FAILURE);
} else {
std::string m(buffer, static_cast<std::size_t>(tmp)); // payload of the received datagram
int id = resolveId(from.sin_port, from.sin_addr.s_addr);
if (!verifySomething(m, id)) {
doSomethingCool(m, id);
}
}
}
My listen() would just be a threaded version of while(true)receive();
Idk if this version compiles to be honest. I keep changing it every 2 minutes. Without the while loop in the send() and the threading, it works so far.
I didn't really implement the ACK mechanism yet.
Thank you for reading and for your help.
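One possible way to structure the "resend until ACK, then stop" part, sketched under assumptions rather than taken from the question: the class name Retransmitter and its methods start, ack and shutdown are hypothetical, and sendOnce stands in for whatever actually calls sendto. Each pending message gets its own thread that sleeps on a condition variable between sends, so an ACK or a shutdown wakes it immediately, and every join() lives in shutdown().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical helper: one retransmission thread per unacknowledged message.
class Retransmitter {
public:
    ~Retransmitter() { shutdown(); }

    // Keep calling sendOnce for message `seq` until ack(seq) or shutdown().
    // Sequence numbers must be unique; after shutdown() new workers exit at once.
    void start(int seq, std::function<void()> sendOnce) {
        std::lock_guard<std::mutex> lock(m_);
        assert(entries_.count(seq) == 0);
        Entry& e = entries_[seq];  // map references stay valid until erased
        e.worker = std::thread([this, &e, sendOnce] {
            std::chrono::milliseconds delay(50);  // same 50 ms + 10 ms steps as the question
            std::unique_lock<std::mutex> lk(m_);
            while (!e.acked && !stopAll_) {
                lk.unlock();
                sendOnce();  // e.g. a sendto() call
                lk.lock();
                // Sleep for `delay`, but wake early on ACK or shutdown.
                cv_.wait_for(lk, delay, [this, &e] { return e.acked || stopAll_; });
                delay += std::chrono::milliseconds(10);
            }
        });
    }

    // Called by the listener when an ACK arrives. Returns false if `seq`
    // is unknown or was already acknowledged.
    bool ack(int seq) {
        {
            std::lock_guard<std::mutex> lock(m_);
            auto it = entries_.find(seq);
            if (it == entries_.end() || it->second.acked) return false;
            it->second.acked = true;
        }
        cv_.notify_all();
        return true;
    }

    // Stop every worker and join them all.
    void shutdown() {
        std::map<int, Entry> taken;
        {
            std::lock_guard<std::mutex> lock(m_);
            stopAll_ = true;
            taken.swap(entries_);  // swap keeps element references valid
        }
        cv_.notify_all();
        for (auto& kv : taken)
            if (kv.second.worker.joinable()) kv.second.worker.join();
    }

private:
    struct Entry {
        std::thread worker;
        bool acked = false;
    };
    std::mutex m_;
    std::condition_variable cv_;
    std::map<int, Entry> entries_;
    bool stopAll_ = false;
};
```

In this sketch send() would call start(seq, ...) and return immediately, the listening thread would call ack(seq) when it decodes an ACK, and shutdown() is the only place that joins. Acknowledged workers finish on their own and are reaped at shutdown, which avoids joining from inside the listener.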

Strange behaviour of GetQueuedCompletionStatus when used from thread pool worker threads

I've been testing to combine the IO Completion Ports with the worker threads from the Thread Pool and stumbled on a behaviour I can't explain. In particular, while the following code:
int data;
for (int i = 0; i < NUM; ++i)
PostQueuedCompletionStatus(cp, 1, NULL, reinterpret_cast<LPOVERLAPPED>(&data));
{
std::thread t([&] ()
{
LPOVERLAPPED aux;
DWORD cmd;
ULONG_PTR key;
for (int i = 0; i < NUM; ++i)
{
if (!GetQueuedCompletionStatus(cp, &cmd, &key, &aux, 0))
break;
++count;
}
});
t.join();
}
works perfectly fine and receives NUM status notifications (with NUM being a large number, 100000 or more), similar code that uses a thread pool work object, reads one status notification per work item, and reposts the work item after reading it fails after reading a couple of hundred status notifications. Having the following global variables (please don't mind the names):
HANDLE cport;
PTP_POOL pool;
TP_CALLBACK_ENVIRON env;
PTP_WORK work;
std::size_t num_calls;
std::mutex mutex;
std::condition_variable cv;
bool job_done;
and the callback function:
static VOID CALLBACK callback(PTP_CALLBACK_INSTANCE instance_, PVOID pv_, PTP_WORK work_)
{
LPOVERLAPPED aux;
DWORD cmd;
ULONG_PTR key;
if (GetQueuedCompletionStatus(cport, &cmd, &key, &aux, 0))
{
++num_calls;
SubmitThreadpoolWork(work);
}
else
{
std::unique_lock<std::mutex> l(mutex);
std::cout << "No work after " << num_calls << " calls.\n";
job_done = true;
cv.notify_one();
}
}
the following code:
{
job_done = false;
std::unique_lock<std::mutex> l(mutex);
num_calls = 0;
cport = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
pool = CreateThreadpool(nullptr);
InitializeThreadpoolEnvironment(&env);
SetThreadpoolCallbackPool(&env, pool);
work = CreateThreadpoolWork(callback, nullptr, &env);
for (int i = 0; i < NUM; ++i)
PostQueuedCompletionStatus(cport, 1, NULL, reinterpret_cast<LPOVERLAPPED>(&data));
SubmitThreadpoolWork(work);
cv.wait_for(l, std::chrono::milliseconds(10000), [] { return job_done; } );
}
would report "No work after ..." after 250 or so calls to GetQueuedCompletionStatus, although NUM was set to 1000000. Even more curious is that changing the wait from 0 to, say, 10 milliseconds would increase the number of successful calls to a couple of hundred thousand and would occasionally read all 1000000 notifications. I don't really understand this, since all status notifications were posted before submitting the work object for the first time.
Is it possible that there really is a problem with combining completion ports and a thread pool, or is there something wrong in my code? Please don't go into why I would want to do this. I was investigating the possibilities and stumbled on this. In my view it should work, and I can't figure out what's wrong. Thank you.
I've tried running this code; the issue seems to be the NumberOfConcurrentThreads parameter supplied to CreateIoCompletionPort. Passing 1 means that the first pool thread that executes the callback becomes associated with the I/O completion port, but since the thread pool may execute the callback on a different thread, GetQueuedCompletionStatus will fail when this happens. From the documentation:
The most important property of an I/O completion port to consider carefully is the concurrency value. The concurrency value of a completion port is specified when it is created with CreateIoCompletionPort via the NumberOfConcurrentThreads parameter. This value limits the number of runnable threads associated with the completion port. When the total number of runnable threads associated with the completion port reaches the concurrency value, the system blocks the execution of any subsequent threads associated with that completion port until the number of runnable threads drops below the concurrency value.
Although any number of threads can call GetQueuedCompletionStatus for a specified I/O completion port, when a specified thread calls GetQueuedCompletionStatus the first time, it becomes associated with the specified I/O completion port until one of three things occurs: The thread exits, specifies a different I/O completion port, or closes the I/O completion port. In other words, a single thread can be associated with, at most, one I/O completion port.
So to use io completion with thread pool you need to set the number of concurrent threads to the size of the thread pool (that you can set using SetThreadpoolThreadMaximum).
::DWORD const threads_count{1};
cport = ::CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, threads_count);
...
pool = ::CreateThreadpool(nullptr);
::SetThreadpoolThreadMaximum(pool, threads_count);

UDP sendto packet sent signal

I'm developing an application that sends a lot of messages over a UDP socket.
Sometimes packets were lost, and after some tests I concluded that the socket was busy.
So I put a tiny sleep between calls to sendto, trying to prevent a new send from starting before the last one ends.
It worked, but I want to use a better approach, such as handling a signal or some other notification that tells me the previous send is done.
Is there anything like that?
I'm using C++ language on a Linux environment.
The below code snippet shows what I'm doing:
#define MAX_SIZE 4096
string long_msg = GetLongMessage();
if (!long_msg.empty()) {
long int to_send = long_msg.size();
while (to_send) {
long int ret = sendto(socket_fd,
&long_msg[long_msg.size() - to_send],
(to_send > MAX_SIZE ? MAX_SIZE : to_send), 0,
reinterpret_cast<struct sockaddr*>(&addr_client),
addr_client_len);
if (ret > 0) {
to_send -= ret;
sleep(10);
} else {
// Log error
}
}
}
Edit: The intent of this question is to find a way to detect whether a UDP socket is busy due to a previous send call, not to discuss TCP vs UDP advantages/disadvantages.

Linux poll on serial transmission end

I'm implementing RS485 on an ARM development board using a serial port and a GPIO for data enable.
I set data enable high before sending, and I want it set low after transmission is complete.
It can be simply done by writing:
//fd = open("/dev/ttyO2", ...);
DataEnable.Set(true);
write(fd, data, datalen);
tcdrain(fd); //Wait until all data is sent
DataEnable.Set(false);
I wanted to change from blocking mode to non-blocking and use poll with fd. But I don't see any poll event corresponding to 'transmission complete'.
How can I get notified when all data has been sent?
System: linux
Language: c++
Board: BeagleBone Black
I don't think it's possible. You'll either have to run tcdrain in another thread and have it notify the main thread, or use a timeout on poll and check after each wakeup whether the output has been drained.
You can use the TIOCOUTQ ioctl to get the number of bytes in the output buffer and tune the timeout according to baud rate. That should reduce the amount of polling you need to do to just once or twice. Something like:
enum { writing, draining, idle } write_state;
while(1) {
int write_event, timeout = -1;
...
if (write_state == writing) {
poll_fds[poll_len].fd = write_fd;
poll_fds[poll_len].events = POLLOUT;
write_event = poll_len++;
} else if (write_state == draining) {
int outq;
ioctl(write_fd, TIOCOUTQ, &outq);
if (outq == 0) {
DataEnable.Set(false);
write_state = idle;
} else {
// 10 bits per byte, 1000 millisecond in a second
timeout = outq * 10 * 1000 / baud_rate;
if (timeout < 1) {
timeout = 1;
}
}
}
int r = poll(poll_fds, poll_len, timeout);
...
if (write_state == writing && r > 0 && (poll_fds[write_event].revents & POLLOUT)) {
DataEnable.Set(true); // Gets set even if already set.
int n = write(write_fd, write_data, write_datalen);
write_data += n;
write_datalen -= n;
if (write_datalen == 0) {
write_state = draining;
}
}
}
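The timeout arithmetic in the loop above can be isolated as a small function. This is only a sketch: the name drain_timeout_ms is hypothetical, and the 10 bits per byte figure assumes one start bit, 8 data bits and one stop bit with no parity.

```cpp
#include <cassert>

// Estimate, in milliseconds, how long `queued_bytes` take to leave the port
// at `baud_rate`, assuming 10 bits on the wire per byte. Integer division
// rounds down, so the result is clamped to at least 1 ms.
int drain_timeout_ms(int queued_bytes, int baud_rate) {
    assert(baud_rate > 0 && queued_bytes >= 0);
    long long ms = static_cast<long long>(queued_bytes) * 10 * 1000 / baud_rate;
    return ms < 1 ? 1 : static_cast<int>(ms);
}
```

For example, 96 queued bytes at 9600 baud give 96 * 10 * 1000 / 9600 = 100 ms. Because the division truncates, the estimate can be slightly short, in which case the loop simply polls once more.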
Stale thread, but I have been working on RS-485 with a 16550-compatible UART under Linux and find:
tcdrain works, but it adds a delay of 10 to 20 msec. It seems to be polled.
The value returned by TIOCOUTQ seems to count bytes in the OS buffer, but NOT bytes in the UART FIFO, so it may underestimate the delay required if transmission has already started.
I am currently using CLOCK_MONOTONIC to timestamp each send, calculating when the send should be complete, then checking that time against the next send and delaying if necessary. Sucks, but seems to work.
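A rough sketch of that timestamp bookkeeping, using std::chrono::steady_clock as the C++ counterpart of a monotonic clock. The function names wire_time and remaining_wait are hypothetical, and the 10 bits per byte figure is an assumption (8N1 framing). It also assumes transmission starts at the recorded timestamp, which ignores any latency between write() and the first bit on the wire.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Time on the wire for `bytes` at `baud_rate`, assuming 10 bits per byte.
std::chrono::microseconds wire_time(std::size_t bytes, unsigned baud_rate) {
    assert(baud_rate > 0);
    return std::chrono::microseconds(
        static_cast<long long>(bytes) * 10 * 1000000 / baud_rate);
}

// Given when the previous write started and how many bytes it carried,
// return how long to wait from `now` until the line should be idle
// (zero if the transmission should already be complete).
std::chrono::microseconds remaining_wait(Clock::time_point send_start,
                                         std::size_t bytes,
                                         unsigned baud_rate,
                                         Clock::time_point now) {
    Clock::time_point done = send_start + wire_time(bytes, baud_rate);
    if (now >= done) return std::chrono::microseconds(0);
    return std::chrono::duration_cast<std::chrono::microseconds>(done - now);
}
```

Before the next send, or before dropping data enable, the caller would sleep for remaining_wait(last_start, last_len, baud, Clock::now()). Passing `now` in explicitly keeps the calculation easy to check without real hardware.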