I am using the asio library and I am trying to asynchronously connect to a socket with use_future. My code works 90% of the time but segfaults occasionally. Here is the code I have; I narrowed it down and found that it prints checkpoints 2 and 3 but not 4.
I tried running it under lldb and I get this:
Process 63910 stopped
* thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x100072e08)
frame #0: 0x000000010002648c XYZ`std::__1::enable_if<is_error_code_enum<asio::error::basic_errors>::value, std::__1::error_code&>::type std::__1::error_code::operator=<asio::error::basic_errors>(this=0x0000000100072e08, __e=operation_aborted) at system_error:349:20
346 error_code&
347 >::type
348 operator=(_Ep __e) _NOEXCEPT
-> 349 {*this = make_error_code(__e); return *this;}
350
351 _LIBCPP_INLINE_VISIBILITY
352 void clear() _NOEXCEPT
Target 0: (XYZ) stopped.
Any ideas why this may segfault, or whether there is any fault in my logic?
struct Peer {
    std::string address;
    uint16_t port;
};

class bar
{
private:
    std::shared_ptr<asio::ip::tcp::endpoint> endpoint;
    std::shared_ptr<asio::io_context> context;
    std::shared_ptr<asio::ip::tcp::socket> socket;

public:
    bool foo(Peer peer);
};

bool bar::foo(Peer peer) {
    endpoint = std::move(std::make_shared<asio::ip::tcp::endpoint>(asio::ip::make_address(peer.address), peer.port));
    context = std::move(std::make_shared<asio::io_context>());
    socket = std::move(std::make_shared<asio::ip::tcp::socket>(*context));

    std::chrono::milliseconds span(100);
    std::chrono::milliseconds zero_s(0);

    std::cout << "checkpoint 2" << std::endl;
    std::future<void> connect_status = socket->async_connect(*endpoint, asio::use_future);
    context->run_for(span);
    std::cout << "checkpoint 3" << std::endl;

    if (connect_status.wait_for(zero_s) == std::future_status::timeout)
        return false;

    std::cout << "checkpoint 4" << std::endl;
    connect_status.get();
    return true;
}
You can also see a version that compiles here: https://compiler-explorer.com/z/4WesMM4nP.
Edit:
Thanks to everyone who chimed in. I figured out the error in my code; it was a pretty simple mistake. The order of destruction is the reverse of construction. I repeatedly call bar::foo (which reassigns the socket and context) during my program's execution. However, my foo method has flawed logic: the old context is destroyed at the move assignment while the old socket still depends on it, so destroying that socket afterwards touches an io_context that is already gone. It seems my pinpointing of the checkpoints was inaccurate; I backtraced in lldb and confirmed that this is indeed the problem. If you are running into a similar problem, a refined version of my code looks like this:
// Destroy in reverse dependency order before re-assigning, so that the
// socket never outlives the io_context it was created from.
socket.reset();
context.reset();
endpoint.reset();

endpoint = std::make_shared<asio::ip::tcp::endpoint>(asio::ip::make_address(peer.address), peer.port);
context = std::make_shared<asio::io_context>();
socket = std::make_shared<asio::ip::tcp::socket>(*context);
It could be that you are simply not handling the exceptions from context->run (e.g. the connection is refused/reset by peer).
I got the hunch because make_error_code is naturally part of constructing the system_error exception.
This should not, in itself, normally segfault. However, if you are running with dynamically linked boost::system and you are not linking against the same version of Boost during build as at runtime, the error categories might not be compatible.
It's also possible that different dynamically loaded modules are linking incompatible versions.
This is a pretty long shot, but I provide the hint just in case it sparks ideas.
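For what it's worth, here is a minimal sketch of handling those exceptions inside the question's foo, assuming standalone asio, where use_future reports failures by rethrowing a std::system_error from get() (illustrative, not the asker's actual fix):

    // Sketch: catch the error that get() rethrows (e.g. connection refused)
    // instead of letting it propagate out of foo and terminate the program.
    if (connect_status.wait_for(zero_s) == std::future_status::timeout)
        return false;
    try {
        connect_status.get(); // rethrows the stored asio error, if any
    } catch (const std::system_error& e) {
        std::cerr << "connect failed: " << e.what() << std::endl;
        return false;
    }
    return true;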
Related
I am currently working on a project where I use the MQTT protocol for communication.
There is a Session class in a dedicated file which basically just sets up the publish handler, i.e. the callback that is invoked when this client receives a message (the handler checks whether the topic matches "ZEUXX/var", then deserializes the binary content of the frame and subsequently unsubscribes from the topic):
session.hpp:
class Session
{
public:
    Session()
    {
        comobj = MQTT_NS::make_sync_client(ioc, "localhost", "1883", MQTT_NS::protocol_version::v5);
        using packet_id_t = typename std::remove_reference_t<decltype(*comobj)>::packet_id_t;

        // Setup client
        comobj->set_client_id(clientId);
        comobj->set_clean_session(true);

        /* If someone sends commands to this client */
        comobj->set_v5_publish_handler( // use v5 handler
            [&](MQTT_NS::optional<packet_id_t> /*packet_id*/,
                MQTT_NS::publish_options pubopts,
                MQTT_NS::buffer topic_name,
                MQTT_NS::buffer contents,
                MQTT_NS::v5::properties /*props*/) {
                std::cout << "[client] publish received. "
                          << " dup: " << pubopts.get_dup()
                          << " qos: " << pubopts.get_qos()
                          << " retain: " << pubopts.get_retain() << std::endl;

                std::string_view topic = std::string_view(topic_name.data(), topic_name.size());
                std::cout << " -> topic: " << topic << std::endl;

                if (topic.substr(0, 9) == "ZEUXX/var")
                {
                    std::cout << "[client] reading variable name: " << topic.substr(10, topic.size() - 9) << std::endl;
                    auto result = 99; // dummy variable, normally a std::variant of float, int32_t, uint8_t
                                      // obtained by deserializing the binary content of the frame
                    std::cout << comobj->unsubscribe(std::string{topic});
                }
                return true;
            });
    }

    void readvar(const std::string &varname)
    {
        comobj->publish(serialnumber + "/read", varname, MQTT_NS::qos::at_most_once);
        comobj->subscribe(serialnumber + "/var/" + varname, MQTT_NS::qos::at_most_once);
    }

    void couple()
    {
        comobj->connect();
        ioc.run();
    }

    void decouple()
    {
        comobj->disconnect();
        std::cout << "[client] disconnected..." << std::endl;
    }

private:
    std::shared_ptr<
        MQTT_NS::callable_overlay<
            MQTT_NS::sync_client<MQTT_NS::tcp_endpoint<as::ip::tcp::socket, as::io_context::strand>>>>
        comobj;
    boost::asio::io_context ioc;
};
The client is based on a boost::asio::io_context object which happens to be the origin of my confusion. In my main file I have the following code.
main.cpp:
#include "session.hpp"
int main()
{
Session session;
session.couple();
session.readvar("speedcpu");
}
Essentially, this creates an instance of the class Session and the couple member invokes the boost::asio::io_context::run member. This runs the io_context object's event processing loop and blocks the main thread, i.e. the third line in the main function will never be reached.
I would like to initiate a connection (session.couple) and subsequently do my publish and subscribe commands (session.readvar). My question is: How do I do that correctly?
Conceptually, what I aim for is best expressed by the following Python code:
client.connect("localhost", 1883)
# client.loop_forever()  # that's what happens at the moment; the program
# doesn't continue from here

# The processing loop gets started; however, it does not block the program and
# one can send publish commands subsequently.
client.loop_start()
while True:
    client.publish("ZEUXX/read", "testread")
    time.sleep(20)
Running the io_context object in a separate thread seems not to be working the way I tried it; any suggestions on how to tackle this problem? What I tried is the following:
Adaptation in session.hpp:
// Adapt the couple function to run io_context in a separate thread
void couple()
{
    comobj->connect();
    std::thread t(boost::bind(&boost::asio::io_context::run, &ioc));
    t.detach();
}
Adaptations in main.cpp:
int main(int argc, char** argv)
{
    Session session;
    session.couple();
    std::cout << "successfully started io context in separate thread" << std::endl;
    session.readvar("speedcpu");
}
The std::cout line is now reached, i.e. the program no longer gets stuck in the couple member of the class by io_context.run(). However, directly after this line I get an error: "The network connection was aborted by the local system".
The interesting thing is that when I use t.join() instead of t.detach() there is no error, but with t.join() I get the same behavior as when I call io_context.run() directly, namely blocking the program.
Given your comment to the existing answer:
io_context.run() never return because it never runs out of work (it is being kept alive from the MQTT server). As a result, the thread gets blocked as soon as I enter the run() method and I cannot send any publish and subscribe frames anymore. That was when I thought it would be clever to run the io_context in a separate thread to not block the main thread. However, when I detach this separate thread, the connection runs into an error, if I use join however, it works fine but the main thread gets blocked again.
I'll assume you know how to get this running successfully in a separate thread. The "problem" you're facing is that, since the io_context doesn't run out of work, calling thread::join will block as well: it waits for the thread to stop executing. The simplest solution is to call io_context::stop before thread::join. From the official docs:
This function does not block, but instead simply signals the io_context to stop. All invocations of its run() or run_one() member functions should return as soon as possible. Subsequent calls to run(), run_one(), poll() or poll_one() will return immediately until restart() is called.
That is, calling io_context::stop will cause the io_context::run call to return ("as soon as possible") and thus make the related thread joinable.
You will also want to save the reference to the thread somewhere (possibly as an attribute of the Session class) and only call thread::join after you've done the rest of the work (e.g. called the Session::readvar) and not from within the Session::couple.
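A rough sketch of that arrangement, reusing the question's Session members (the runner member and the shutdown order are my assumption, not tested against the MQTT library):

    class Session
    {
    public:
        void couple()
        {
            comobj->connect();
            runner = std::thread([this] { ioc.run(); }); // do not detach
        }

        void decouple()
        {
            comobj->disconnect();
            ioc.stop();            // makes ioc.run() return "as soon as possible"
            if (runner.joinable())
                runner.join();     // joins promptly now
        }

    private:
        // ... comobj and ioc as in the question ...
        std::thread runner;        // requires <thread>
    };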
When io_context runs out of work, it returns from run().
If you don't post any work, run() will always immediately return. Any subsequent run() also immediately returns, even if new work was posted.
To re-use io_context after it completed, use io_context.reset(). In your case, better to
use a work guard (https://www.boost.org/doc/libs/1_73_0/doc/html/boost_asio/reference/executor_work_guard.html), see many of the library examples
don't even "run" the ioc in couple() if you already run it on a background thread
If you need synchronous behaviour, don't run it on a background thread.
Also keep in mind that you need to afford graceful shutdown, which is strictly harder with a detached thread: after all, you can't join() it to know when it has exited.
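For illustration, a minimal work-guard sketch against the Boost 1.73 API linked above (the surrounding program is made up):

    #include <boost/asio.hpp>
    #include <thread>

    int main()
    {
        boost::asio::io_context ioc;
        // The guard counts as outstanding work, so run() keeps going even
        // while the queue is momentarily empty.
        auto guard = boost::asio::make_work_guard(ioc);
        std::thread runner([&ioc] { ioc.run(); });

        // ... post work, run the MQTT client, etc. ...

        guard.reset(); // let run() return once the real work is drained
        runner.join();
    }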
This is my first question on Stack Overflow and I'm new to C++. I hope you can all forgive my ignorance of the probably obvious problem here, but I'm at a loss.
Basically, I'm just trying to catch events emitted by a Node.js server in my C++ client. I've successfully compiled my binary (importing boost and socket.io) after much headache. I'm trying to emit an event through a websocket connection, but I first need to ensure the connection is successful. I've been mostly following the tutorial at this link: https://socket.io/blog/socket-io-cpp/. I've also been following the source code, which can be found here: https://github.com/socketio/socket.io-client-cpp/tree/master/examples/QT
For some reason, I seem to be getting a segfault when I access my _io pointer in my bound function (in the onConnected function of the SocketHandler class).
I'm sure I'm doing something silly, but any help is appreciated. Maybe I'm misunderstanding the use of the std::bind function? I'm coming from a mostly javascript world.
main.cpp
#include "sockethandler.h"
int main()
{
    SocketHandler sh;
}
sockethandler.cpp
#include <iostream>
#include "sockethandler.h"

const char name[13] = "raspberry_pi";

SocketHandler::SocketHandler() :
    _io(new client())
{
    using std::placeholders::_1;
    _io->set_socket_open_listener(std::bind(&SocketHandler::OnConnected, this, _1));
    _io->connect("http://127.0.0.1:3000");
    _io->socket()->on("bot-command", [&](sio::event& ev) {
        std::cout << "GOT IT!" << "\n";
        // handle login message
        // post to UI thread if any UI updating.
    });
}

void SocketHandler::OnConnected(std::string const& nsp)
{
    std::cout << "CONNECTED" << "\n";
    // I can access a private class variable such as _a as a string here
    _io->socket()->emit("join");
}
sockethandler.h
#ifndef SOCKETHANDLER_H
#define SOCKETHANDLER_H

#include <sio_client.h>

using namespace sio;

class SocketHandler {
public:
    explicit SocketHandler();

private:
    void OnConnected(std::string const& nsp);
    std::unique_ptr<client> _io;
};

#endif // SOCKETHANDLER_H
Pretty sure the socket.io library you are using is threaded. Your object is created, sets up the callback (which includes references to itself), the constructor exits, main exits, and the automatic (stack) variable sh is destroyed. Then the socket.io library tries to run the callback, which no longer has references to a valid object, and it crashes. Put a debug statement in your SocketHandler destructor, like cerr << "destructor called" << endl;, and I'm pretty sure you'll always see it printed before the program crashes.
To prove it to yourself, put a sleep(10); (or whatever) as the last line of your main to stall it from exiting, and I'm guessing you'll see your program succeed.
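That experiment would look something like this (the sleep duration is arbitrary):

    #include <chrono>
    #include <thread>
    #include "sockethandler.h"

    int main()
    {
        SocketHandler sh;
        // Keep main alive so the library's internal thread can deliver events.
        std::this_thread::sleep_for(std::chrono::seconds(10));
    }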
I have a multi-threaded websocketpp server. With no clients connected when I quit the program and relaunch, it works with no issues.
However, when a client is connected and I quit/relaunch, the program throws this error:
[2017-08-06 15:36:05] [info] asio listen error: system:98 ()
terminate called after throwing an instance of 'websocketpp::exception'
what(): Underlying Transport Error
Aborted
I believe I have a proper disconnect sequence going, and I see the following messages (my own debug info) when I initiate the quit sequence:
[2017-08-06 15:35:55] [control] Control frame received with opcode 8
on_close
[2017-08-06 15:35:55] [disconnect] Disconnect close local:[1000] remote:[1000]
Quitting :3
Waiting for thread
What does the asio error mean? I am hoping someone has seen this before so that I can begin troubleshooting. Thanks!
EDIT:
I am adapting the stock broadcast_server example where
typedef std::map<connection_hdl, connection_data, std::owner_less<connection_hdl> > con_list;
con_list m_connections;
Code to close connections:
lock_guard<mutex> guard(m_connection_lock);
std::cout << "Closing Server" << std::endl;

websocketpp::lib::error_code ec;
con_list::iterator it = m_connections.begin();
while (it != m_connections.end())
{
    m_server.close(it->first, websocketpp::close::status::normal, "", ec);
    if (ec)
    {
        std::cout << "> Error initiating client close: " << ec.message() << std::endl;
    }
    it = m_connections.erase(it); // erase invalidates the iterator; use the returned one
}
Also, in the destructor of the broadcast_server class I have an m_server.stop().
Whenever there's a websocketpp::exception, I first check anywhere I'm explicitly using the endpoint, in your case m_server.
For instance, it could be somewhere you are calling m_server.send(...). Since you're multithreading, it's very possible that one of the threads is trying to utilize a connection_hdl after it has already been closed by a different thread.
In that case, it's usually a websocketpp::exception with an "invalid state" message. I'm not sure about the Underlying Transport Error.
You can use breakpoints to spot the culprit (or put a bunch of cout sequences in different methods, and see which sequence is broken before the exception is thrown), or use a try/catch:
try {
    m_server.send(hdl, ...);
    // or
    m_server.close(hdl, ...);
    // or really anything you're trying to do using `m_server`.
} catch (const websocketpp::exception &e) {
    // For safety, I actually catch `const std::exception` so that it grabs
    // any potential exceptions out there.
    std::cout << "Exception in method foo() because: " << e.what() /* log the cause of the exception */ << std::endl;
}
Otherwise, I have noticed that it will sometimes throw an exception when you're trying to close a connection_hdl, even if no other thread is seemingly accessing it. If you put it in a try/catch, it still throws the exception, but since that doesn't terminate the program, the handler eventually gets closed.
Also, maybe try m_server.pause_reading(it->first) before calling close() to freeze activity from that handler.
After a second look, I think the exception you're getting is thrown where you listen with m_server.listen(...). Try surrounding it with a try/catch and logging a custom message.
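Something along these lines (the port number is illustrative):

    try {
        m_server.listen(9002);
        m_server.start_accept();
    } catch (const websocketpp::exception &e) {
        std::cout << "listen failed: " << e.what() << std::endl;
    }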
I've wrapped libcurl for a C++ daemon I've deployed on Google Container Engine. Everything works splendidly except for one small problem: it segfaults whenever I call curl_slist_free_all(). It doesn't happen on Ubuntu 14 or 16, nor on macOS; it only occurs in the GKE Docker environment with Debian 8.7. This is literally my only bug and it's been bothering me for weeks.
I've wrapped resource handles with RAII style containers for exception safety (yeah, yeah... I use exceptions) and leak protection. The easy_init and easy_cleanup are in the CurlSession constructors and destructors. The global_init & cleanup are in the HTTP constructors and destructors.
I validated that there are no double-free situations, spelunked the libcurl code, and still cannot fathom why this is happening only on this OS env. I managed to attach a debugger and isolated it to the single slist cleanup call.
The only way I can get my code to work is to leak in every other env, which isn't a deal breaker, I'd just rather my memory profiler gave me a clean bill of health.
Any insight or shared pain appreciated.
My header slist wrapper:
HTTP::Headers::Headers() : slist{nullptr} {}

HTTP::Headers::Headers(const HeaderKeyValues &headers)
    : slist{nullptr}
{
    for (const auto& header : headers) add(header.first, header.second);
}

HTTP::Headers::~Headers() {
    curl_slist_free_all(slist); // <- seems to crash on Google's Debian image
    slist = nullptr;
}

void HTTP::Headers::add(const std::string& key, const std::string& value)
{
    std::ostringstream os;
    os << key << ": " << value;
    slist = curl_slist_append(slist, os.str().c_str());
    if (!slist) {
        LOG(fatal) << "Failed appending to header list";
        throw std::runtime_error{"Failed appending to header list"};
    }
}
Subset of the dispatcher:
HTTP::Response HTTP::dispatch(const Request& req) const {
    CurlSession session;
    const auto handle = session.handle;

    Headers headerList{req.headers};
    if (req.chunked)
        headerList.add("Transfer-Encoding", "chunked");
    // more ... //

    if (headerList.notEmpty())
        curl_easy_setopt(handle, CURLOPT_HTTPHEADER, headerList.slist);

    // perform the actual request
    CURLcode result = curl_easy_perform(handle);
I suspect this was some sort of subtle incompatibility between the Docker build image and the Docker deploy image that only manifested when running on GKE.
In my case it was like this:
if (strcmp(req->headers, "")) {
    curl_slist_free_all(list); // segfault
}

if (strcmp(req->headers, "")) {
    // no segfault
}
and req->headers was NULL. Whenever I removed the curl_slist_free_all line, the compiler did not emit any code for the if statement at all as an optimization step, so strcmp was not called. That strcmp on the NULL pointer was what was actually causing the segfault, not curl_slist_free_all(list);.
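In other words, the fix for that case is to guard the comparison against the NULL pointer, something like this sketch:

    // Check the pointer before strcmp touches it.
    if (req->headers && strcmp(req->headers, "") != 0) {
        curl_slist_free_all(list);
    }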
I'm trying to implement the actor computation model on top of threads in C++ using boost::thread.
But the program throws a weird exception during execution. The exception isn't stable, and sometimes the program works correctly.
Here is my code:
actor.hpp
class Actor {
public:
    typedef boost::function<int()> Job;

private:
    std::queue<Job> d_jobQueue;
    boost::mutex d_jobQueueMutex;
    boost::condition_variable d_hasJob;
    boost::atomic<bool> d_keepWorkerRunning;
    boost::thread d_worker;

    void workerThread();

public:
    Actor();
    virtual ~Actor();

    void execJobAsync(const Job& job);
    int execJobSync(const Job& job);
};
actor.cpp
namespace {

int executeJobSync(std::string *error,
                   boost::promise<int> *promise,
                   const Actor::Job *job)
{
    int rc = (*job)();
    promise->set_value(rc);
    return 0;
}

}

void Actor::workerThread()
{
    while (d_keepWorkerRunning) try {
        Job job;
        {
            boost::unique_lock<boost::mutex> g(d_jobQueueMutex);
            while (d_jobQueue.empty()) {
                d_hasJob.wait(g);
            }
            job = d_jobQueue.front();
            d_jobQueue.pop();
        }
        job();
    }
    catch (...) {
        // Log error
    }
}

void Actor::execJobAsync(const Job& job)
{
    boost::mutex::scoped_lock g(d_jobQueueMutex);
    d_jobQueue.push(job);
    d_hasJob.notify_one();
}

int Actor::execJobSync(const Job& job)
{
    std::string error;
    boost::promise<int> promise;
    boost::unique_future<int> future = promise.get_future();
    {
        boost::mutex::scoped_lock g(d_jobQueueMutex);
        d_jobQueue.push(boost::bind(executeJobSync, &error, &promise, &job));
        d_hasJob.notify_one();
    }
    int rc = future.get();
    if (rc) {
        ErrorUtil::setLastError(rc, error.c_str());
    }
    return rc;
}

Actor::Actor()
: d_keepWorkerRunning(true)
, d_worker(&Actor::workerThread, this)
{
}

Actor::~Actor()
{
    d_keepWorkerRunning = false;
    {
        boost::mutex::scoped_lock g(d_jobQueueMutex);
        d_hasJob.notify_one();
    }
    d_worker.join();
}
The exception actually thrown is boost::thread_interrupted, on the int rc = future.get(); line. But from the boost docs I can't work out the reason for this exception. The docs say:
Throws: - boost::thread_interrupted if the result associated with *this is not ready at the point of the call, and the current thread is interrupted.
But my worker thread can't be in an interrupted state.
When I use gdb and set "catch throw", I see that the backtrace looks like:
throw thread_interrupted
boost::detail::interruption_checker::check_for_interruption
boost::detail::interruption_checker::interruption_checker
boost::condition_variable::wait
boost::detail::future_object_base::wait_internal
boost::detail::future_object_base::wait
boost::detail::future_object::get
boost::unique_future::get
I looked into the boost sources but can't work out why interruption_checker decided that the worker thread was interrupted.
So, C++ gurus, please help me. What do I need to do to get correct code?
I'm using:
boost 1_53
Linux version 2.6.18-194.32.1.el5 Red Hat 4.1.2-48
gcc 4.7
EDIT
Fixed it! Thanks to Evgeny Panasyuk and Lazin. The problem was in TLS management: boost::thread and boost::thread_specific_ptr use the same TLS storage for their purposes. In my case, a problem arose when they both tried to change this storage on creation (unfortunately I did not work out the details of why this happens), so the TLS became corrupted.
I replaced boost::thread_specific_ptr in my code with a __thread-specified variable.
Offtopic: during debugging I found memory corruption in an external library and fixed it =)
EDIT 2
I found the exact problem... It is a bug in GCC =)
The _GLIBCXX_DEBUG compilation flag breaks the ABI.
You can see the discussion on the boost bugtracker:
https://svn.boost.org/trac/boost/ticket/7666
I have found several bugs:
The Actor::workerThread function double-unlocks d_jobQueueMutex (in the originally posted code): the first unlock is the manual d_jobQueueMutex.unlock();, the second happens in the destructor of boost::unique_lock<boost::mutex>.
You should prevent one of the unlocks, for example by releasing the association between the unique_lock and the mutex:
g.release(); // <------------ PATCH
d_jobQueueMutex.unlock();
Or add an additional code block plus a default-constructed Job.
It is also possible that workerThread will never leave the following loop:
while (d_jobQueue.empty()) {
    d_hasJob.wait(g);
}
Imagine the following case: d_jobQueue is empty, Actor::~Actor() is called; it sets the flag and notifies the worker thread:
d_keepWorkerRunning = false;
d_hasJob.notify_one();
workerThread wakes up in the while loop, sees that the queue is empty, and sleeps again.
It is common practice to send a special final job to stop the worker thread:
~Actor()
{
    execJobSync([this]() -> int
    {
        d_keepWorkerRunning = false;
        return 0;
    });
    d_worker.join();
}
In this case, d_keepWorkerRunning is not required to be atomic.
LIVE DEMO on Coliru
EDIT:
I have added the event queue code into your example.
You have a concurrent queue in both EventQueueImpl and Actor, but for different types. It is possible to extract the common part into a separate entity, concurrent_queue<T>, which works for any type. It would be much easier to debug and test the queue in one place than to catch bugs scattered across different classes.
So, you can try to use this concurrent_queue<T> (on Coliru).
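In case the link goes stale, the general shape of such a queue is roughly this (a minimal mutex + condition-variable sketch, not the actual Coliru code):

    #include <boost/thread/mutex.hpp>
    #include <boost/thread/condition_variable.hpp>
    #include <queue>

    template <typename T>
    class concurrent_queue {
        std::queue<T> q;
        boost::mutex m;
        boost::condition_variable has_item;

    public:
        void push(const T& value) {
            {
                boost::mutex::scoped_lock g(m);
                q.push(value);
            }
            has_item.notify_one();
        }

        T pop() { // blocks until an element is available
            boost::mutex::scoped_lock g(m);
            while (q.empty())
                has_item.wait(g);
            T value = q.front();
            q.pop();
            return value;
        }
    };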
This is just a guess. I think some code may actually be calling boost::thread::interrupt(). You can set a breakpoint on this function and see what code is responsible for it. You can test for interruption in execJobSync:
int Actor::execJobSync(const Job& job)
{
    if (boost::this_thread::interruption_requested())
        std::cout << "Interruption requested!" << std::endl;

    std::string error;
    boost::promise<int> promise;
    boost::unique_future<int> future = promise.get_future();
The most suspicious code in this case is code that holds a reference to the thread object.
It is good practice to make your boost::thread code interruption-aware anyway. It is also possible to disable interruption for some scope.
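For example, disabling interruption across a scope looks like this (the surrounding function is made up):

    #include <boost/thread/thread.hpp>

    void do_blocking_work()
    {
        // While 'di' is alive, interruption points in this thread will not
        // throw boost::thread_interrupted; the previous interruption state
        // is restored when di goes out of scope.
        boost::this_thread::disable_interruption di;
        // ... blocking calls such as future.get() or condition waits ...
    }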
If this is not the case, you need to check code that works with thread-local storage, because the thread interruption flag is stored in the TLS; maybe some of your code overwrites it. You can check for interruption before and after such a code fragment.
Another possibility is that your memory is corrupted, if no code is calling boost::thread::interrupt() and you don't work with TLS. This is the hardest case; try a dynamic analyzer such as valgrind or clang's memory sanitizer.
Offtopic:
You probably need to use a concurrent queue. std::queue will be very slow because of high memory contention, and you will end up with poor cache performance. A good concurrent queue allows your code to enqueue and dequeue elements in parallel.
Also, an actor is not something that is supposed to execute arbitrary code. An actor's queue should receive simple messages, not functions! You're writing a job queue :) You should take a look at some actor system like Akka or libcppa.
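To make the contrast concrete, a message-style actor interface might look roughly like this sketch (the message types here are illustrative, not from the question):

    #include <boost/variant.hpp>
    #include <string>

    // Plain data messages instead of arbitrary callables.
    struct ReadVar  { std::string name; };
    struct Shutdown {};
    typedef boost::variant<ReadVar, Shutdown> Message;

    // The worker loop dispatches on the message type; returns false to stop.
    bool handle(const Message& msg)
    {
        if (const ReadVar* rv = boost::get<ReadVar>(&msg)) {
            // ... perform the read for rv->name ...
            return true;
        }
        return false; // Shutdown
    }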