Boost scoped lock assert fails - c++

I'm using Boost 1.41 in a linux app that receives data on one thread and sticks it in a queue, another thread pops it off the queue and processes it. To make it thread safe I'm using scoped locks.
My problem is that very infrequently the lock function fails in the read function with the message:
void boost::mutex::lock() Assertion '!pthread_mutext_lock(&m)' failed
It is very infrequent, on last run, it took 36 hours (~425M transactions) before it failed. The read and write functions are listed below, its always in the read function that the Assert arises
Write to queue
void PacketForwarder::Enqueue(const byte_string& newPacket, long sequenceId)
{
try
{
boost::mutex::scoped_lock theScopedLock(pktQueueLock);
queueItem itm(newPacket,sequenceId);
packetQueue.push(itm);
if (IsConnecting() && packetQueue.size() > MaximumQueueSize)
{
// Reached maximum queue size while client unavailable; popping.
packetQueue.pop();
}
}
catch(...)
{
std::cout << name << " Exception was caught:" << std::endl;
}
}
Read from queue
while ( shouldRun )
{
try
{
if (clientSetsHaveChanged)
{
tryConnect();
}
size_t size = packetQueue.size();
if (size > 0)
{
byte_string packet;
boost::mutex::scoped_lock theQLock(pktQueueLock);
queueItem itm = packetQueue.front();
packet = itm.data;
packetQueue.pop();
BytesSent += packet.size();
trySend(packet);
}
else
{
boost::this_thread::sleep(boost::posix_time::milliseconds(50));
}
}
catch (...)
{
cout << name << " Other exception in send packet" << endl;
}
I've googled and found a few problems when destroying scoped_locks but nothing on failing to get a lock. I have also had a search through boost release notes and Trac logs to see if this has been identified as an issue by anyone else. I thought my code was about as simple as it gets but obviously something is up. Any thoughts?
TIA
Paul

There is one thread-safety issue in your program, in this piece of code:
size_t size = packetQueue.size();
if (size > 0)
{
byte_string packet;
boost::mutex::scoped_lock theQLock(pktQueueLock);
queueItem itm = packetQueue.front();
packet = itm.data;
packetQueue.pop();
// ...
}
The issue here is that between the time you checked the queue size and the time you got the lock some other reader thread might take the last item out of the queue, which will cause front() and pop() to fail. Unless you have only one reader thread, you need the size check to be under the lock as well.
I do not know if this is the reason of the assertion failure though. The assertion means the call to pthread_mutex_lock returned a non-zero value signaling an error. Unfortunately, Boost does not show which exactly of possible pthread_mutex_lock errors has happened.

Related

boost::interprocess message queue not removing

Using boost::interprocess message queue, I ran into a problem with removing the queue, was hoping someone could provide an explanation for the behavior, or correct my misunderstanding.
I'm on a Linux RHEL7 box. I have two processes, one that creates and feeds the queue, and the other one that opens the message queue and reads from it. For clarity, the initial process is started first, before the second process that does the reading/removing msgs from queue.
The one that creates, is create_only_t. I want it to fail if it already exists. The creation of the first queue always fails however. The specific exception it throws is File exists.
When switched to an open_or_create_t queue, it works fine. I took this information as I wasn't cleaning it up properly, so I made sure I was trying to remove the queue before I tried to create it, as well as after the process finished sending all messages.
I would log if the remove was successful. One of my assumptions is that if the remove returns true it means it successfully removed the queue.
The boost docs for remove reads: "Removes the message queue from the system. Returns false on error.", I wasn't sure if maybe a true just means that it had a successful 'attempt' at removing it. Upon further looking at another Boost Inprocess pdf it explains:
The remove operation might fail returning false if the shared memory does not exist, the file is open or the file is still memory mapped by other processes
Either case, I feel like one would expect the queue to be removed if it always returns true, which is currently my case.
Still when trying to do a 'create_t' message queue it will continue to fail, but 'open_or_create_t' still works.
I had a hard time understanding the behavior, so I also tried to remove the message queue twice in a row before trying to initialize a create_t queue to see if the second one would fail/return false, however both returned true (which was not what I expected based on what the documentation said). The first remove should it be successful means the second one should fail as there should no longer be a message queue that exists anymore.
I've attached a snippet of my create process code. And I'll note, that this error happens without the "open process" being run.
Maybe I'm missing something obvious, thank you in advance.
try {
bool first_removal = remove(msg_queue_name);
if (first_removal) {
log_info("first removal - true"); // this log always prints
bool second_removal = remove(msg_queue_name);
if (second_removal ) {
log_info("removal was again true"); // this also always prints
} else {
log_info("second removal - false");
}
} else {
log_info("did not remove queue before creation");
}
log_info("attempting to initialize msg queue");
message_queue mq(ooc, msg_queue_name, max_num_msgs, max_msg_size); // this is where it will fail (File exists)
while(1) {
// insertion logic, but does not get here
}
} catch ( interprocess_exception& err ) {
log_error(err.what()); // File exists
bool removal_after_failure = remove(msg_queue_name);
if (removal_after_failure) {
log_info("Message queue was successfully removed"); // always logs here after every failure
} else {
log_warn("Message queue was NOT removed");
}
}
It worked for me.
Then it dawned on me. You're probably using namespace. Don't. For this reason:
bool first_removal = remove(msg_queue_name);
This doesn't call the function you expect. It calls ::remove from the C standard library.
Simply qualify your call:
bool first_removal = message_queue::remove(msg_queue_name);
Measures
What you can do:
write hygienic code
avoid using namespace directives
avoid ADL traps
use warnings (-Wall -Wextra -pedantic at least)
use linters. See below
check your assumptions (a simple trip into the debugger would have shown you what's happening)
Linters?
E.g. clang-tidy reported:
test.cpp|27 col 30| warning: implicit conversion 'int' -> bool [readability-implicit-bool-conversion]
|| bool first_removal = remove(msg_queue_name);
Suggesting to write:
bool first_removal = remove(msg_queue_name) != 0;
This tipped me off something might be fishy.
Fixes
Several of these fixes later and the code runs
Live On Coliru
#include <boost/interprocess/ipc/message_queue.hpp>
#include <chrono>
#include <iostream>
namespace bip = boost::interprocess;
using bip::message_queue;
using bip::interprocess_exception;
using namespace std::chrono_literals;
using C = std::chrono::high_resolution_clock;
static constexpr char const* msg_queue_name = "my_mq";
static constexpr bip::open_or_create_t ooc;
static constexpr auto max_num_msgs = 10;
static constexpr auto max_msg_size = 10;
static auto log_impl = [start=C::now()](auto severity, auto const& ... args) {
std::clog << severity << " at " << std::fixed << (C::now()-start)/1.ms << "ms ";
(std::clog << ... << args) << std::endl;
};
static auto log_error = [](auto const& ... args) { log_impl("error", args...); };
static auto log_warn = [](auto const& ... args) { log_impl("warn", args...); };
static auto log_info = [](auto const& ... args) { log_impl("info", args...); };
int main() {
try {
bool first_removal = message_queue::remove(msg_queue_name);
if (first_removal) {
log_info("first removal - true"); // this log always prints
bool second_removal = message_queue::remove(msg_queue_name);
if (second_removal) {
log_info("removal was again true"); // this also always prints
} else {
log_info("second removal - false");
}
} else {
log_info("did not remove queue before creation");
}
log_info("attempting to initialize msg queue");
message_queue mq(
ooc, msg_queue_name, max_num_msgs,
max_msg_size); // this is where it will fail (File exists)
log_info("Start insertion");
} catch (interprocess_exception& err) {
log_error(err.what()); // File exists
bool removal_after_failure = message_queue::remove(msg_queue_name);
if (removal_after_failure) {
log_info("Message queue was successfully removed"); // always logs
// here after
// every failure
} else {
log_warn("Message queue was NOT removed");
}
}
}
Printing, on Coliru:
info at 22.723521ms did not remove queue before creation
info at 22.879425ms attempting to initialize msg queue
error at 23.098989ms Function not implemented
warn at 23.153540ms Message queue was NOT removed
On my system:
info at 0.148484ms first removal - true
info at 0.210316ms second removal - false
info at 0.232181ms attempting to initialize msg queue
info at 0.299645ms Start insertion
./sotest
info at 0.099407ms first removal - true
info at 0.173156ms second removal - false
info at 0.188026ms attempting to initialize msg queue
info at 0.257117ms Start insertion
Of course now your logic can be greatly simplified:
Live On Coliru
int main() {
try {
bool removal = message_queue::remove(msg_queue_name);
log_info("attempting to initialize msg queue (removal:", removal, ")");
message_queue mq(
ooc, msg_queue_name, max_num_msgs,
max_msg_size); // this is where it will fail (File exists)
log_info("insertion");
} catch (interprocess_exception const& err) {
bool removal = message_queue::remove(msg_queue_name);
log_info(err.what(), " (removal:", removal, ")");
}
}
Printing
info at 0.462333ms attempting to initialize msg queue (removal:false)
info at 0.653085ms Function not implemented (removal:false)
Or
info at 0.097283ms attempting to initialize msg queue (removal:true)
info at 0.239138ms insertion

boost::asio write: Broken pipe

I have a TCP server that handles new connections, when there's a new connection two threads will be created (std::thread, detached).
void Gateway::startServer(boost::asio::io_service& io_service, unsigned short port) {
tcp::acceptor TCPAcceptor(io_service, tcp::endpoint(tcp::v4(), port));
bool UARTToWiFiGatewayStarted = false;
for (;;) { std::cout << "\nstartServer()\n";
auto socket(std::shared_ptr<tcp::socket>(new tcp::socket(io_service)));
/*!
* Accept a new connected WiFi client.
*/
TCPAcceptor.accept(*socket);
socket->set_option( tcp::no_delay( true ) );
// This will set the boolean `Gateway::communicationSessionStatus` variable to true.
Gateway::enableCommunicationSession();
// start one thread
std::thread(WiFiToUARTWorkerSession, socket, this->SpecialUARTPort, this->SpecialUARTPortBaud).detach();
// start the second thread
std::thread(UARTToWifiWorkerSession, socket, this->UARTport, this->UARTbaud).detach();
}
}
The first of two worker functions look like this (here I'm reading using the shared socket):
void Gateway::WiFiToUARTWorkerSession(std::shared_ptr<tcp::socket> socket, std::string SpecialUARTPort, unsigned int baud) {
std::cout << "\nEntered: WiFiToUARTWorkerSession(...)\n";
std::shared_ptr<FastUARTIOHandler> uart(new FastUARTIOHandler(SpecialUARTPort, baud));
try {
while(true == Gateway::communicationSessionStatus) { std::cout << "WiFi->UART\n";
unsigned char WiFiDataBuffer[max_incoming_wifi_data_length];
boost::system::error_code error;
/*!
* Read the TCP data.
*/
size_t length = socket->read_some(boost::asio::buffer(WiFiDataBuffer), error);
/*!
* Handle possible read errors.
*/
if (error == boost::asio::error::eof) {
// this will set the shared boolean variable from "true" to "false", causing the while loop (from the both functions and threads) to stop.
Gateway::disableCommunicationSession();
break; // Connection closed cleanly by peer.
}
else if (error) {
Gateway::disableCommunicationSession();
throw boost::system::system_error(error); // Some other error.
}
uart->write(WiFiDataBuffer, length);
}
}
catch (std::exception &exception) {
std::cerr << "[APP::exception] Exception in thread: " << exception.what() << std::endl;
}
std::cout << "\nExiting: WiFiToUARTWorkerSession(...)\n";
}
And the second one (here I'm writing using the thread-shared socket):
void Gateway::UARTToWifiWorkerSession(std::shared_ptr<tcp::socket> socket, std::string UARTport, unsigned int baud) {
std::cout << "\nEntered: UARTToWifiWorkerSession(...)\n";
/*!
* Buffer used for storing the UART-incoming data.
*/
unsigned char UARTDataBuffer[max_incoming_uart_data_length];
std::vector<unsigned char> outputBuffer;
std::shared_ptr<FastUARTIOHandler> uartHandler(new FastUARTIOHandler(UARTport, baud));
while(true == Gateway::communicationSessionStatus) { std::cout << "UART->WiFi\n";
/*!
* Read the UART-available data.
*/
auto bytesReceived = uartHandler->read(UARTDataBuffer, max_incoming_uart_data_length);
/*!
* If there was some data, send it over TCP.
*/
if(bytesReceived > 0) {
boost::asio::write((*socket), boost::asio::buffer(UARTDataBuffer, bytesReceived));
std::cout << "\nSending data to app...\n";
}
}
std::cout << "\nExited: UARTToWifiWorkerSession(...)\n";
}
For stopping this two threads I do the following thing: from the WiFiToUARTWorkerSession(...) function, if the read(...) fails (there's an error like boost::asio::error::eof, or any other error) I set the Gateway::communicationSessionStatus boolean switch (which is shared (global) by the both functions) to false, this way the functions should return, and the threads should be killed gracefully.
When I'm connecting for the first time, this works well, but when I'm disconnecting from the server, the execution flow from the WiFiToUARTWorkerSession(...) goes through else if (error) condition, it sets the while condition variable to false, and then it throws boost::system::system_error(error) (which actually means Connection reset by peer).
Then when I'm trying to connect again, I got the following exception and the program terminates:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): write: Broken pipe
What could be the problem?
EDIT: From what I found about this error, it seems that I write(...) after the client disconnects, but how could this be possible?
EDIT2: I have debugged the code even more and it seems that one thread (on which runs the UARTToWifiWorkerSession(...) function) won't actually exit (because there's a blocking read(...) function call at where the execution flow stops). This way that one thread will hang until there's some data received by the read(...) function, and when I'm reconnecting there will be created another two threads, this causing some data racing problems.
Can someone confirm me that this could be the problem?
The actual problem was that the function UARTToWifiWorkerSession(...) didn't actually exit (because of a blocking read(...) function, this causing two threads (the hanging one, and one of the latest two created ones) to write(...) (without any concurrency control) using the same socket.
The solution was to set a read(...) timeout, so I can return from the function (and thus destroy the thread) without pending from some input.

strange behavior in concurrently executing a function for objects in queue

My program has a shared queue, and is largely divided into two parts:
one for pushing instances of class request to the queue, and the other accessing multiple request objects in the queue and processing these objects. request is a very simple class(just for test) with a string req field.
I am working on the second part, and in doing so, I want to keep one scheduling thread, and multiple (in my example, two) executing threads.
The reason I want to have a separate scheduling thread is to reduce the number of lock and unlock operation to access the queue by multiple executing threads.
I am using pthread library, and my scheduling and executing function look like the following:
void * sched(void* elem) {
queue<request> *qr = static_cast<queue<request>*>(elem);
pthread_t pt1, pt2;
if(pthread_mutex_lock(&mut) == 0) {
if(!qr->empty()) {
int result1 = pthread_create(&pt1, NULL, execQueue, &(qr->front()));
if (result1 != 0) cout << "error sched1" << endl;
qr->pop();
}
if(!qr->empty()) {
int result2 = pthread_create(&pt2, NULL, execQueue, &(qr->front()));
if (result2 != 0) cout << "error sched2" << endl;
qr->pop();
}
pthread_join(pt1, NULL);
pthread_join(pt2, NULL);
pthread_mutex_unlock(&mut);
}
return 0;
}
void * execQueue(void* elem) {
request *r = static_cast<request*>(elem);
cout << "req is: " << r->req << endl; // req is a string field
return 0;
}
Simply, each of execQueue has one thread to be executed on, and just outputs a request passed to it through void* elem parameter.
sched is called in main(), with a thread, (in case you're wondering how, it is called in main() like below)
pthread_t schedpt;
int schresult = pthread_create(&schedpt, NULL, sched, &q);
if (schresult != 0) cout << "error sch" << endl;
pthread_join(schedpt, NULL);
and the sched function itself creates multiple(two in here) executing threads and pops requests from the queue, and executes the requests by calling execQueue on multiple threads(pthread_create and then ptrhead_join).
The problem is the weird behavior by the program.
When I checked the size and the elements in the queue without creating threads and calling them on multiple threads, they were exactly what I expected.
However, when I ran the program with multiple threads, it prints out
1 items are in the queue.
2 items are in the queue.
req is:
req is: FIRST! �(x'�j|1��rj|p�rj|1����FIRST!�'�j|!�'�j|�'�j| P��(�(��(1���i|p��i|
with the last line constantly varying.
The desired output is
1 items are in the queue.
2 items are in the queue.
req is: FIRST
req is: FIRST
I guess either the way I call the execQueue on multiple threads, or the way I pop() is wrong, but I could not figure out the problem, nor could I find any source to refer to for a correct usage.
Please help me on this. Bear with me for clumsy use of pthread, as I am a beginner.
Your queue holds objects, not pointers to objects. You can address the object at the front of the queue via operator &() as you are, but as soon as you pop the queue that object is gone and that address is no longer valid. Of course, sched doesn't care, but the execQueue function you sent that address do certainly does.
The most immediate fix for your code is this:
Change this:
pthread_create(&pt1, NULL, execQueue, &(qr->front()));
To this:
// send a dynamic *copy* of the front queue node to the thread
pthread_create(&pt1, NULL, execQueue, new request(qr->front()));
And your thread proc should be changed to this:
void * execQueue(void* elem)
{
request *r = static_cast<request*>(elem);
cout << "req is: " << r->req << endl; // req is a string field
delete r;
return nullptr;
}
That said, I can think of better ways to do this, but this should address your immediate problem, assuming your request object class is copy-constructible, and if it has dynamic members, follows the Rule Of Three.
And here's your mildly sanitized c++11 version just because I needed a simple test thingie for MSVC2013 installation :)
See it Live On Coliru
#include <iostream>
#include <thread>
#include <future>
#include <mutex>
#include <queue>
#include <string>
struct request { std::string req; };
std::queue<request> q;
std::mutex queue_mutex;
void execQueue(request r) {
std::cout << "req is: " << r.req << std::endl; // req is a string field
}
bool sched(std::queue<request>& qr) {
std::thread pt1, pt2;
{
std::lock_guard<std::mutex> lk(queue_mutex);
if (!qr.empty()) {
pt1 = std::thread(&execQueue, std::move(qr.front()));
qr.pop();
}
if (!qr.empty()) {
pt2 = std::thread(&execQueue, std::move(qr.front()));
qr.pop();
}
}
if (pt1.joinable()) pt1.join();
if (pt2.joinable()) pt2.join();
return true;
}
int main()
{
auto fut = std::async(sched, std::ref(q));
if (!fut.get())
std::cout << "error" << std::endl;
}
Of course it doesn't actually do much now (because there's no tasks in the queue).

How to trace resource deadlocks?

I've wrote a timer using std::thread - here is how it looks like:
TestbedTimer::TestbedTimer(char type, void* contextObject) :
Timer(type, contextObject) {
this->active = false;
}
TestbedTimer::~TestbedTimer(){
if (this->active) {
this->active = false;
if(this->timer->joinable()){
try {
this->timer->join();
} catch (const std::system_error& e) {
std::cout << "Caught system_error with code " << e.code() <<
" meaning " << e.what() << '\n';
}
}
if(timer != nullptr) {
delete timer;
}
}
}
void TestbedTimer::run(unsigned long timeoutInMicroSeconds){
this->active = true;
timer = new std::thread(&TestbedTimer::sleep, this, timeoutInMicroSeconds);
}
void TestbedTimer::sleep(unsigned long timeoutInMicroSeconds){
unsigned long interval = 500000;
if(timeoutInMicroSeconds < interval){
interval = timeoutInMicroSeconds;
}
while((timeoutInMicroSeconds > 0) && (active == true)){
if (active) {
timeoutInMicroSeconds -= interval;
/// set the sleep time
std::chrono::microseconds duration(interval);
/// set thread to sleep
std::this_thread::sleep_for(duration);
}
}
if (active) {
this->notifyAllListeners();
}
}
void TestbedTimer::interrupt(){
this->active = false;
}
I'm not really happy with that kind of implementation since I let the timer sleep for a short interval and check if the active flag has changed (but I don't know a better solution since you can't interrupt a sleep_for call). However, my program core dumps with the following message:
thread is joinable
Caught system_error with code generic:35 meaning Resource deadlock avoided
thread has rejoined main scope
terminate called without an active exception
Aborted (core dumped)
I've looked up this error and as seems that I have a thread which waits for another thread (the reason for the resource deadlock). However, I want to find out where exactly this happens. I'm using a C library (which uses pthreads) in my C++ code which provides among other features an option to run as a daemon and I'm afraid that this interfers with my std::thread code. What's the best way to debug this?
I've tried to use helgrind, but this hasn't helped very much (it doesn't find any error).
TIA
** EDIT: The code above is actually not exemplary code, but I code I've written for a routing daemon. The routing algorithm is a reactive meaning it starts a route discovery only if it has no routes to a desired destination and does not try to build up a routing table for every host in its network. Every time a route discovery is triggered a timer is started. If the timer expires the daemon is notified and the packet is dropped. Basically, it looks like that:
void Client::startNewRouteDiscovery(Packet* packet) {
AddressPtr destination = packet->getDestination();
...
startRouteDiscoveryTimer(packet);
...
}
void Client::startRouteDiscoveryTimer(const Packet* packet) {
RouteDiscoveryInfo* discoveryInfo = new RouteDiscoveryInfo(packet);
/// create a new timer of a certain type
Timer* timer = getNewTimer(TimerType::ROUTE_DISCOVERY_TIMER, discoveryInfo);
/// pass that class as callback object which is notified if the timer expires (class implements a interface for that)
timer->addTimeoutListener(this);
/// start the timer
timer->run(routeDiscoveryTimeoutInMilliSeconds * 1000);
AddressPtr destination = packet->getDestination();
runningRouteDiscoveries[destination] = timer;
}
If the timer has expired the following method is called.
void Client::timerHasExpired(Timer* responsibleTimer) {
char timerType = responsibleTimer->getType();
switch (timerType) {
...
case TimerType::ROUTE_DISCOVERY_TIMER:
handleExpiredRouteDiscoveryTimer(responsibleTimer);
return;
....
default:
// if this happens its a bug in our code
logError("Could not identify expired timer");
delete responsibleTimer;
}
}
I hope that helps to get a better understanding of what I'm doing. However, I did not to intend to bloat the question with that additional code.

EOF in boost::async_read with thread_pull and boost 1.54

I have a strange problem with my server application. My system is simple: I have 1+ devices and one server app that communicate over a network. Protocol has binary packets with variable length, but fixed header (that contain info about current packet size). Example of packet:
char pct[maxSize] = {}
pct[0] = 0x5a //preambule
pct[1] = 0xa5 //preambule
pct[2] = 0x07 //packet size
pct[3] = 0x0A //command
... [payload]
The protocol is built on the principle of a command-answer.
I use boost::asio for communication - io_service with thread pull (4 threads) + async read/write operation (code example below) and create a "query cycle" - each 200ms by timer:
query one value from device
get result, query second value
get result, start timer again
This work very well on boost 1.53 (Debug and Release). But then i switch to boost 1.54 (especially in Release mode) magic begins. My server successfuly starts, connects to device and starts "query cycle". For about 30-60 seconds everything work well (I receive data, data is correct), but then I start receive asio::error on last read handle (always in one place). Error type: EOF. After recieving the error, I must disconnect from device.
Some time of googling give me info about EOF indicate that other side (device in my case) initiated disconnect procedure. But, according to the logic of the device it can not be true.
May somebody explain what's going on? May be i need set some socket option or defines? I see two possible reason:
my side init disconnect (with some reason, that i don't know) and EOF is answer of this action.
some socket timeout firing.
My environment:
OS: Windows 7/8
Compiler: MSVC 2012 Update 3
Sample code of main "query cycle". Is adapted from official boost chat example All code simplified for reduce space :)
SocketWorker - low level wrapper for sockets
DeviceWorker - class for device communication
ERes - internal struct for error store
ProtoCmd and ProtoAnswer - wrapper for raw array command and answer (chat_message
analog from boost chat example)
lw_service_proto namespace - predefined commands and max sizes of packets
So, code samples. Socket wrapper:
namespace b = boost;
namespace ba = boost::asio;
typedef b::function<void(const ProtoAnswer answ)> DataReceiverType;
class SocketWorker
{
private:
typedef ba::ip::tcp::socket socketType;
typedef std::unique_ptr<socketType> socketPtrType;
socketPtrType devSocket;
ProtoCmd sendCmd;
ProtoAnswer rcvAnsw;
//[other definitions]
public:
//---------------------------------------------------------------------------
ERes SocketWorker::Connect(/*[connect settings]*/)
{
ERes res(LGS_RESULT_ERROR, "Connect to device - Unknow Error");
using namespace boost::asio::ip;
boost::system::error_code sock_error;
//try to connect
devSocket->connect(tcp::endpoint(address::from_string(/*[connect settings ip]*/), /*[connect settings port]*/), sock_error);
if(sock_error.value() > 0) {
//[work with error]
devSocket->close();
}
else {
//[res code ok]
}
return res;
}
//---------------------------------------------------------------------------
ERes SocketWorker::Disconnect()
{
if (devSocket->is_open())
{
boost::system::error_code ec;
devSocket->shutdown(bi::tcp::socket::shutdown_send, ec);
devSocket->close();
}
return ERes(LGS_RESULT_OK, "OK");
}
//---------------------------------------------------------------------------
//query any cmd
void SocketWorker::QueryCommand(const ProtoCmd cmd, DataReceiverType dataClb)
{
sendCmd = std::move(cmd); //store command
if (sendCmd .CommandLength() > 0)
{
ba::async_write(*devSocket.get(), ba::buffer(sendCmd.Data(), sendCmd.Length()),
b::bind(&SocketWorker::HandleSocketWrite,
this, ba::placeholders::error, dataClb));
}
else
{
cerr << "Send command error: nothing to send" << endl;
}
}
//---------------------------------------------------------------------------
// boost socket handlers
void SocketWorker::HandleSocketWrite(const b::system::error_code& error,
DataReceiverType dataClb)
{
if (error)
{
cerr << "Send cmd error: " << error.message() << endl;
//[send error to other place]
return;
}
//start reading header of answer (lw_service_proto::headerSize == 3 bytes)
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Data(), lw_service_proto::headerSize),
b::bind(&SocketWorker::HandleSockReadHeader,
this, ba::placeholders::error, dataClb));
}
//---------------------------------------------------------------------------
//handler for read header
void SocketWorker::HandleSockReadHeader(const b::system::error_code& error, DataReceiverType dataClb)
{
if (error)
{
//[error working]
return;
}
//decode header (check preambule and get full packet size) and read answer payload
if (rcvAnsw.DecodeHeaderAndGetCmdSize())
{
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Answer(), rcvAnsw.AnswerLength()),
b::bind(&SocketWorker::HandleSockReadBody,
this, ba::placeholders::error, dataClb));
}
}
//---------------------------------------------------------------------------
//handler for andwer payload
void SocketWorker::HandleSockReadBody(const b::system::error_code& error, DataReceiverType dataClb)
{
//if no error - send anwser to 'master'
if (!error){
if (dataClb != nullptr)
dataClb(rcvAnsw);
}
else{
//[error process]
//here i got EOF in release mode
}
}
};
Device worker
class DeviceWorker
{
private:
const static int LW_QUERY_TIME = 200;
LWDeviceSocketWorker sockWorker;
ba::io_service& timerIOService;
typedef std::shared_ptr<ba::deadline_timer> TimerPtr;
TimerPtr queryTimer;
bool queryCycleWorking;
//[other definitions]
public:
ERes DeviceWorker::Connect()
{
ERes intRes = sockWorker.Connect(/*[connect settings here]*/);
if(intRes != LGS_RESULT_OK) {
//[set result to error]
}
else {
//[set result to success]
//start "query cycle"
StartNewCycleQuery();
}
return intRes;
}
//---------------------------------------------------------------------------
ERes DeviceWorker::Disconnect()
{
return sockWorker.Disconnect();
}
//---------------------------------------------------------------------------
void DeviceWorker::StartNewCycleQuery()
{
queryCycleWorking = true;
//start timer
queryTimer = make_shared<ba::deadline_timer>(timerIOService, bt::milliseconds(LW_QUERY_TIME));
queryTimer->async_wait(boost::bind(&DeviceWorker::HandleQueryTimer,
this, boost::asio::placeholders::error));
}
//---------------------------------------------------------------------------
void DeviceWorker::StopCycleQuery()
{
//kill timer
if (queryTimer)
queryTimer->cancel();
queryCycleWorking = false;
}
//---------------------------------------------------------------------------
//timer handler
void DeviceWorker::HandleQueryTimer(const b::system::error_code& error)
{
if (!error)
{
ProtoCmd cmd;
//query for first value
cmd.EncodeCommandCore(lw_service_proto::cmdGetAlarm, 1);
sockWorker.QueryCommand(cmd, boost::bind(&DeviceWorker::ReceiveAlarmCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive first value
void DeviceWorker::ReceiveAlarmCycle(ProtoAnswer adata)
{
//check and fix last bytes (remove \r\n from some commands)
adata.CheckAndFixFooter();
//[working with answer]
if (queryCycleWorking)
{
//query for second value
ProtoCmd cmd;
cmd.EncodeCommandCore(lw_service_proto::cmdGetEnergyLevel, 1);
sockWorker.QueryCommand(cmd, b::bind(&DeviceWorker::ReceiveEnergyCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive second value
void DeviceWorker::ReceiveEnergyCycle(ProtoAnswer edata)
{
//check and fix last bytes (remove \r\n from some commands)
edata.CheckAndFixFooter();
//[working with second value]
//start new "query cycle"
if (queryCycleWorking)
StartNewCycleQuery();
}
};
Any ideas are welcome :)
edit:
After several test I see anower picture:
this issue reproduce on boost 1.54 only (Debug and Release mode, Release - much more faster), with boost 1.53 no more error (maybe i poorly clean my code then rebuild first times....)
with boost 1.54 and 1 thread (instead of 4) all work well
I also spend some time with debugger and boost source and making some conclusion:
When i receive EOF my data is already fully received.
This EOF indicate that is nothing to transfer in this operation, i.e. socket result flag is 0 (no error), but boost operation flag if EOF (transfer bytes == 0)
At this moment I am forced to switch on boost 1.53...
I had the exact same problem and I am quite sure that this is a bug of boost::asio 1.54.0
Here is the bug report.
The solution is effectively to get back to 1.53, although there is a patch available for 1.54 in the bug report page.
If your application works fine with a single thread invoking io_service::run() but fails with four threads, you very likely have a race condition. This type of problem is difficult to diagnose. Generally speaking you should ensure your devSocket has at most one outstanding async_read() and async_write() operation. Your current implementation of SocketWorker::QueryCommand() unconditionally invokes async_write() which may violate the ordering assumption documented as such
This operation is implemented in terms of zero or more calls to the
stream's async_write_some function, and is known as a composed
operation. The program must ensure that the stream performs no other
write operations (such as async_write, the stream's async_write_some
function, or any other composed operations that perform writes) until
this operation completes.
The classic solution to this problem is to maintain a queue of outgoing messages. If a previous write is outstanding, append the next outgoing message to the queue. When the previous write completes, initiate the async_write() for the next message in the queue. When using multiple threads invoking io_service::run() you may need to use a strand as the linked answer does.