Race condition in shared_ptr doesn't happen - c++

Why is there no race condition in my code?
According to the source here: http://en.cppreference.com/w/cpp/memory/shared_ptr
If multiple threads of execution access the same shared_ptr without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur;
class base
{
public:
std::string val1;
};
class der : public base
{
public:
std::string val2;
int val3;
char val4;
};
int main()
{
std::mutex mm;
std::shared_ptr<der> ms(new der());
std::thread t1 = std::thread([ms, &mm]() {
while (1)
{
//std::lock_guard<std::mutex> lock(mm);
std::string some1 = ms->val2;
int some2 = ms->val3;
char some3 = ms->val4;
ms->val2 = "1232324";
ms->val3 = 1232324;
ms->val4 = '1';
}
});
std::thread t2 = std::thread([ms, &mm]() {
while (1)
{
//std::lock_guard<std::mutex> lock(mm);
std::string some1 = ms->val2;
int some2 = ms->val3;
char some3 = ms->val4;
ms->val2 = "123435";
ms->val3 = 123435;
ms->val4 = '3';
}
});
std::shared_ptr<base> bms = ms;
std::thread t3 = std::thread([bms]() {
while (1)
{
bms->val1 = 434;
}
});
while (1)
{
std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
}

Data races do not yield compilation failure; they yield undefined behavior. That behavior could be "works fine". Or "appears to work fine but subtly breaks something 12 minutes later". Or "immediately fails."
Just because code appears to work doesn't mean it actually does. This is more true for threading code than any other kind.
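In the question's code each lambda captures ms by value, so every thread works on its own shared_ptr copy; the quoted cppreference sentence is about several threads touching the same shared_ptr object. The unsynchronized reads and writes go to the object the pointers share, and that is where the data race is. A minimal sketch of guarding the pointee with a mutex, assuming a hypothetical Record type and run_writers helper that are not part of the original code:

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the question's `der` class.
struct Record {
    std::string text;
    int count = 0;
};

// Starts two writer threads. Each thread gets its own copy of the
// shared_ptr (the lambda is copied into the thread), and every access
// to the shared Record is serialized through one mutex.
int run_writers(int iterations) {
    auto rec = std::make_shared<Record>();
    std::mutex m;

    auto work = [rec, &m, iterations](const char* label) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);  // protects *rec, not the shared_ptr
            rec->text = label;
            ++rec->count;
        }
    };

    std::thread t1(work, "first");
    std::thread t2(work, "second");
    t1.join();
    t2.join();
    return rec->count;  // safe to read: both writers have finished
}
```

Calling run_writers(1000) should return 2000, because no increment is lost while the lock is held.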

I would recommend using Helgrind, one of the Valgrind tools.
Race conditions are sometimes very hard to find when debugging multithreaded programs.
To run this tool, you need Valgrind installed on your computer; then run:
valgrind --tool=helgrind ./Your_Compiled_File arg1 arg2 ...

Related

Why am I getting a race condition?

I'm trying to combine multiple CGAL meshes into one single geometry.
I have the following sequential code that works perfectly fine:
while (m_toCombine.size() > 1) {
auto mesh1 = m_toCombine.front();
m_toCombine.pop_front();
auto mesh2 = m_toCombine.front();
m_toCombine.pop_front();
bool result = CGAL::Polygon_mesh_processing::corefine_and_compute_union(mesh1, mesh2, mesh2);
m_toCombine.push_back(mesh2);
}
Where m_toCombine is a std::list<Triangle_mesh_exact>.
Triangle_mesh_exact is a type of CGAL mesh (triangulated polyhedron geometry). But I don't think it's really relevant to the problem.
Unfortunately, this process is way too slow for my intended application, so I decided to use the "divide and conquer" concept and combine meshes in parallel:
class Combiner
{
public:
Combiner(const std::list<Triangle_mesh_exact>& toCombine) :
m_toCombine(toCombine) {};
~Combiner() {};
Triangle_mesh_exact combineMeshes();
void combineMeshes2();
private:
std::mutex m_listMutex, m_threadListMutex;
std::mutex m_eventLock;
std::list<MiniThread> m_threads;
std::list<Triangle_mesh_exact> m_toCombine;
std::condition_variable m_eventSignal;
std::atomic<bool> m_done = false;
//void poll(int threadListIndex);
};
Triangle_mesh_exact Combiner::combineMeshes()
{
std::unique_lock<std::mutex> uniqueLock(m_eventLock, std::defer_lock);
int runningCount = 0, finishedCount = 0;
int toCombineCount = m_toCombine.size();
bool stillRunning = false;
bool stillCombining = true;
while (stillCombining || stillRunning) {
uniqueLock.lock();
//std::lock_guard<std::mutex> lock(m_listMutex);
m_listMutex.lock();
Triangle_mesh_exact mesh1 = std::move(m_toCombine.front());
m_toCombine.pop_front();
toCombineCount--;
Triangle_mesh_exact mesh2 = std::move(m_toCombine.front());
m_toCombine.pop_front();
toCombineCount--;
m_listMutex.unlock();
runningCount++;
auto thread = new std::thread([&, this, mesh1, mesh2]() mutable {
//m_listMutex.lock();
CGAL::Polygon_mesh_processing::corefine_and_compute_union(mesh1, mesh2, mesh2);
std::lock_guard<std::mutex> lock(m_listMutex);
m_toCombine.push_back(mesh2);
toCombineCount++;
finishedCount++;
m_eventSignal.notify_one();
//m_listMutex.unlock();
});
thread->detach();
while (toCombineCount < 2 && runningCount != finishedCount) {
m_eventSignal.wait(uniqueLock);
}
stillRunning = runningCount != finishedCount;
stillCombining = toCombineCount >= 2;
uniqueLock.unlock();
}
return m_toCombine.front();
}
Unfortunately, despite being extra careful, I'm getting memory access violation crashes or errors related to the destructors of mesh1 or mesh2.
Am I missing something?
Instead of complicating things, check what the standard library already offers:
std::reduce (see cppreference.com)
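// Takes both meshes by value: corefine_and_compute_union needs non-const
// references, and the operation passed to std::reduce must not modify
// the elements of the range.
Triangle_mesh_exact combine(Triangle_mesh_exact a, Triangle_mesh_exact b)
{
    auto success = CGAL::Polygon_mesh_processing::corefine_and_compute_union(a, b, b);
    // Note: an exception that escapes an algorithm run with std::execution::par
    // results in a call to std::terminate.
    if (!success) throw my_combine_exception{};
    return b;
}
Triangle_mesh_exact combineAll()
{
    if (m_toCombine.size() == 1) return m_toCombine.front();
    if (m_toCombine.empty()) throw std::invalid_argument("");
    // std::list iterators are bidirectional, so use std::next instead of begin() + 1.
    return std::reduce(std::execution::par,
                       std::next(m_toCombine.begin()), m_toCombine.end(),
                       m_toCombine.front(), combine);
}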
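If a parallel execution policy is not available in your toolchain, the same divide-and-conquer idea can be sketched with std::async. This is only an illustration under assumptions: tree_reduce is a hypothetical helper, it works on any movable type T, and the combine operation is assumed to be associative (as a mesh union is). It is not the CGAL-specific code from the question.

```cpp
#include <cstddef>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

// Combines all items pairwise, round by round. Every pair in a round is
// combined in its own asynchronous task; an unpaired last item is carried
// over to the next round unchanged.
template <typename T, typename Combine>
T tree_reduce(std::vector<T> items, Combine combine) {
    if (items.empty()) throw std::invalid_argument("nothing to combine");

    while (items.size() > 1) {
        std::vector<std::future<T>> tasks;
        for (std::size_t i = 0; i + 1 < items.size(); i += 2) {
            // Each task owns its two inputs, so no locking is needed.
            tasks.push_back(std::async(std::launch::async, combine,
                                       std::move(items[i]),
                                       std::move(items[i + 1])));
        }

        std::vector<T> next;
        for (auto& task : tasks) next.push_back(task.get());  // rethrows task exceptions
        if (items.size() % 2 == 1) next.push_back(std::move(items.back()));
        items = std::move(next);
    }
    return std::move(items.front());
}
```

For meshes, combine would be a function that takes two meshes by value and returns their union. Because each task receives its own inputs by value, the threads share no list and no counters, which removes the bookkeeping that the hand-written version has to protect.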

QLibrary functions work slow on first call

I'm using QLibrary to load functions from one .dll file.
I successfully load it and successfully resolve the functions.
But when I call a function from that .dll for the first time, it runs very slowly (even if it is a very simple one). When I call it again, the speed is just fine (it returns immediately, as it should).
What is the reason for such behaviour? I suspect some caching somewhere.
Edit 1: Code:
typedef int(*my_type)(char *t_id);
QLibrary my_lib("Path_to_lib.dll");
my_lib.load();
if(my_lib.isLoaded()){
my_type func = (my_type)my_lib.resolve("_func_from_dll");
if(func){
char buf[50] = {0};
char buf2[50] = {0};
//Next line works slow
qint32 resultSlow = func(buf);
//Next line works fast
qint32 resultFast = func(buf2);
}
}
I wouldn't blame QLibrary: func simply takes long the first time it's invoked. I bet that you'll have identical results if you resolve its address using platform-specific code, e.g. dlopen and dlsym on Linux. QLibrary doesn't really do much besides wrapping the platform API. There's nothing specific to it that would make the first call slow.
There is some code smell in doing file I/O in the constructors of presumably generic classes: do the users of the class know that the constructor may block on disk I/O and thus ideally shouldn't be invoked from the GUI thread? Qt makes doing this task asynchronously fairly easy, so I'd at least try to be nice that way:
class MyClass {
    QLibrary m_lib;
    // Indices into the vector of resolved functions; named so that they
    // don't clash with the my_func() accessor below.
    enum { my_func_index = 0, other_func_index = 1 };
    QFuture<QVector<QFunctionPointer>> m_functions;
    my_type my_func() {
        static my_type value;
        // result() blocks until the background load has finished.
        if (Q_UNLIKELY(!value) && m_functions.result().size() > my_func_index)
            value = reinterpret_cast<my_type>(m_functions.result().at(my_func_index));
        return value;
    }
public:
    MyClass() {
        m_lib.setFileName("Path_to_lib.dll");
        // Load the library and resolve the functions on a worker thread.
        m_functions = QtConcurrent::run([this] {
            m_lib.load();
            if (m_lib.isLoaded()) {
                QVector<QFunctionPointer> funs;
                funs.push_back(m_lib.resolve("_func_from_dll"));
                funs.push_back(m_lib.resolve("_func2_from_dll"));
                return funs;
            }
            return QVector<QFunctionPointer>();
        });
    }
    void use() {
        if (my_func()) {
            char buf1[50] = {0}, buf2[50] = {0};
            QElapsedTimer timer;
            timer.start();
            auto result1 = my_func()(buf1);
            qDebug() << "first call took" << timer.restart() << "ms";
            auto result2 = my_func()(buf2);
            qDebug() << "second call took" << timer.elapsed() << "ms";
        }
    }
};
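To check that the delay comes from the function itself rather than from QLibrary, the same measurement can be reproduced without Qt. The sketch below is an assumption-heavy illustration: func_with_lazy_init is a hypothetical stand-in for the DLL function, and the one-time setup it performs (simulated with a 50 ms sleep) is only one possible reason for a slow first call, not something the question confirms. time_call_ms is likewise a hypothetical helper.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-in for the DLL function: it performs an expensive
// one-time setup the first time it is called.
int func_with_lazy_init(char* id) {
    static const bool initialized = [] {
        // Simulated setup cost (for example, loading tables or opening a device).
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        return true;
    }();
    (void)initialized;
    id[0] = 'x';
    return 0;
}

// Returns the wall-clock duration of one call to f(arg), in milliseconds.
template <typename F>
double time_call_ms(F f, char* arg) {
    auto start = std::chrono::steady_clock::now();
    f(arg);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing two consecutive calls with time_call_ms shows the first one paying for the setup and the second one returning almost immediately, which is the same pattern the question describes.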

Segmentation fault(core dumped) in multi threading using boost threads

When I try to run my program with more than one thread, it works fine for a while (some seconds or minutes) but eventually fails with a segmentation fault (core dumped) or a double free (fasttop) error.
Here is the function that the threads run.
//used in the Function
[Added] typedef folly::ProducerConsumerQueue<std::string*> PcapTask;
struct s_EntryItem {
Columns* p_packet; //has some arbitrary method and variables
boost::mutex _mtx;
};
//_buffersConnection.wait_and_pop()
Data wait_and_pop() {
boost::mutex::scoped_lock lock(the_mutex);
while (the_queue.empty()) {
the_condition_variable.wait(lock);
}
Data popped_value = the_queue.front();
the_queue.pop();
return popped_value;
}
struct HandlerTask {
std::string year;
folly::ProducerConsumerQueue<std::string*> queue = NULL;
};
-----------------------------------------
//The function which threads run
void Connection() {
std::string datetime, year;
uint32_t srcIPNAT_num, srcIP_num;
std::string srcIP_str, srcIPNAT_str, srcIPNAT_str_hex;
int counter = 0;
while (true) {
//get new task
HandlerTask* handlerTask = _buffersConnection.wait_and_pop();
PcapTask* pcapTask = handlerTask->queue;
year = handlerTask->year;
counter = 0;
do {
pcapTask->popFront();
s_EntryItem* entryItem = searchIPTable(srcIP_num);
entryItem->_mtx.lock();
if (entryItem->p_packet == NULL) {
Columns* newColumn = new Columns();
newColumn->initConnection(srcIPNAT_str, srcIP_str, datetime, srcIP_num);
entryItem->p_packet = newColumn;
addToSequanceList(newColumn);
} else {
bool added = entryItem->p_packet->addPublicAddress(srcIPNAT_str_hex, datetime);
if (added == false) {
removeFromSequanceList(entryItem->p_packet);
_bufferManager->addTask(entryItem->p_packet);
Columns* newColumn = new Columns();
newColumn->initConnection(srcIPNAT_str, srcIP_str, datetime, srcIP_num);
//add to ip table
entryItem->p_packet = newColumn;
addToSequanceList(newColumn);
}
}
entryItem->_mtx.unlock();
++_totalConnectionReceived;
} while (true);
delete pcapTask;
delete handlerTask;
}
}
You can use Valgrind; it's very easy. Build your app in a debug configuration and pass the program executable to valgrind. It can report a wide spectrum of programming errors that occur in your app at run time. The price of using Valgrind is that the program runs considerably slower (sometimes tens of times slower) than without it. For example, when your program frees a block of memory a second time, Valgrind will tell you where that memory was freed the first time.
I'm not sure that it's the problem, but...
Are you sure that you must call delete on pcapTask?
I mean: you delete it, but queue in struct HandlerTask is a class member, not a pointer to a class.
Suggestion: try commenting out the line
delete pcapTask;
at the end of Connection()
--- EDIT ---
Looking at your added typedef, I can confirm that (if I'm not wrong) there is something strange in your code.
pcapTask is defined as a PcapTask pointer, that is, a pointer to folly::ProducerConsumerQueue<std::string*>; you initialize it with a folly::ProducerConsumerQueue<std::string*> object (not a pointer).
I'm surprised that your code compiles.
I think you should, first of all, resolve this contradiction.
p.s.: sorry for my bad English.
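One way to rule out double deletes of tasks is to let smart pointers own them, so that no code path calls delete by hand. The following is a minimal sketch under assumptions: it uses the standard library instead of Boost and folly, and Task and TaskQueue are hypothetical stand-ins for HandlerTask and the _buffersConnection queue.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for HandlerTask. The inner queue is a plain
// member, so it is destroyed together with the task and never deleted
// separately.
struct Task {
    std::string year;
    std::queue<std::string> lines;
};

// Thread-safe queue that transfers ownership of each task to exactly one consumer.
class TaskQueue {
public:
    void push(std::unique_ptr<Task> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Blocks until a task is available, then hands it to the caller.
    std::unique_ptr<Task> wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        std::unique_ptr<Task> task = std::move(queue_.front());
        queue_.pop();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::unique_ptr<Task>> queue_;
};
```

A worker then writes auto task = queue.wait_and_pop(); and the task is freed exactly once, when that unique_ptr goes out of scope.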

What's wrong with sequential consistency here?

I'm playing with lock-free algorithms in C and C++ and recently stumbled upon a behavior I don't quite understand. If you have the following code, running it will give you something like
reader started
writer started
iters=79895047, less=401131, eq=48996928, more=30496988
Aren't std::atomics expected to be sequentially consistent? If so, why does the reader sometimes see b being updated before a? I also tried various tricks involving memory fences, with no success. The full compilable code can be seen at https://github.com/akamaus/fence_test
What's wrong with the example?
std::atomic<uint> a(0);
std::atomic<uint> b(0);
volatile bool stop = false;
void *reader(void *p) {
uint64_t iter_counter = 0;
uint cnt_less = 0,
cnt_eq = 0,
cnt_more = 0;
uint aa, bb;
printf("reader started\n");
while(!stop) {
iter_counter++;
aa = a.load(std::memory_order_seq_cst);
bb = b.load(std::memory_order_seq_cst);
if (aa < bb) {
cnt_less++;
} else if (aa > bb) {
cnt_more++;
} else {
cnt_eq++;
}
}
printf("iters=%lu, less=%u, eq=%u, more=%u\n", iter_counter, cnt_less, cnt_eq, cnt_more);
return NULL;
}
void *writer(void *p) {
printf("writer started\n");
uint counter = 0;
while(!stop) {
a.store(counter, std::memory_order_seq_cst);
b.store(counter, std::memory_order_seq_cst);
counter++;
}
return NULL;
}
Sequentially consistent memory ordering implies that the modification order (of the atomic objects manipulated with seq cst) observed by all threads is consistent. The program behaves as if all those operations happen interleaved in a single total order. Consider the following cases:
Writer      Reader
            a == 0
a = 1
b = 1
            b == 1
Result: aa < bb.
Writer      Reader
a = 1
            a == 1
            b == 0
b = 1
Result: aa > bb.
With a lock, e.g. a mutex, you can make sure that the operations don't interleave.
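As a rough sketch of that last point (count_mismatches is a hypothetical helper, and plain unsigned variables replace the atomics because the mutex now provides the synchronization): the writer updates both values inside one critical section and the reader loads both inside another, so the reader can never observe the state between the two stores.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Runs one writer and one reader. Returns how many times the reader
// observed a != b while the writer updated both under the same mutex.
unsigned count_mismatches(unsigned writes) {
    unsigned a = 0, b = 0;           // protected by m
    std::mutex m;
    std::atomic<bool> stop(false);
    unsigned mismatches = 0;         // written only by the reader thread

    std::thread reader([&] {
        while (!stop.load()) {
            std::lock_guard<std::mutex> lock(m);
            if (a != b) ++mismatches;
        }
    });

    std::thread writer([&] {
        for (unsigned i = 1; i <= writes; ++i) {
            std::lock_guard<std::mutex> lock(m);
            a = i;                   // both stores happen inside one critical section
            b = i;
        }
        stop.store(true);
    });

    writer.join();
    reader.join();
    return mismatches;
}
```

With the lock in place the function returns 0 for any number of writes, at the cost of the reader and the writer blocking each other.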

Synchronization between threads without overload

I can't find a good way to implement mutual exclusion on a common resource shared between different threads.
I've got many methods (from a class) that access a database heavily; this is one of them:
string id = QUERYPHYSICAL + toString(ID);
wait();
mysql_query(connection, id.c_str());
MYSQL_RES *result = mysql_use_result(connection);
while (MYSQL_ROW row = mysql_fetch_row(result)){
Physical[ID - 1].ID = atoi(row[0]);
Physical[ID - 1].NAME = row[1];
Physical[ID - 1].PEOPLE = atoi(row[2]);
Physical[ID - 1].PIRSTATUS = atoi(row[3]);
Physical[ID - 1].LIGHTSTATUS = atoi(row[4]);
}
mysql_free_result(result);
signal();
The methods wait and signal do these things:
void Database::wait(void) {
while(!this->semaphore);
this->semaphore = false;
}
void Database::signal(void) {
this->semaphore = true;
}
But in this case my CPU usage goes above 190% (reading from /proc/loadavg). What should I do to reduce the CPU load and make the system more efficient? I'm on an 800 MHz Raspberry Pi.
You can use a pthread_mutex_t: initialize it in the constructor, lock it in wait, unlock it in signal, and destroy it in the destructor.
Like this:
class Mutex{
pthread_mutex_t m;
public:
Mutex(){
pthread_mutex_init(&m,NULL);
}
~Mutex(){
pthread_mutex_destroy(&m);
}
void wait() {
pthread_mutex_lock(&m);
}
void signal() {
pthread_mutex_unlock(&m);
}
} ;
You also should check the return value of the pthread_mutex functions: 0 for success, non zero means error.
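A possible way to act on those return values is to turn a non-zero code into an exception. This is a sketch, not a definitive implementation: CheckedMutex and count_with_mutex are hypothetical names, and throwing is just one of several reasonable error-handling choices.

```cpp
#include <pthread.h>

#include <cstring>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Same wait/signal interface as the Mutex class, but every pthread call
// is checked: the functions return 0 on success and an error number otherwise.
class CheckedMutex {
    pthread_mutex_t m;

    static void check(int rc, const char* what) {
        if (rc != 0)
            throw std::runtime_error(std::string(what) + ": " + std::strerror(rc));
    }

public:
    CheckedMutex() { check(pthread_mutex_init(&m, NULL), "pthread_mutex_init"); }
    ~CheckedMutex() { pthread_mutex_destroy(&m); }  // result ignored: destructors should not throw
    CheckedMutex(const CheckedMutex&) = delete;
    CheckedMutex& operator=(const CheckedMutex&) = delete;

    void wait() { check(pthread_mutex_lock(&m), "pthread_mutex_lock"); }
    void signal() { check(pthread_mutex_unlock(&m), "pthread_mutex_unlock"); }
};

// Demonstration: several threads increment one counter; blocked threads
// sleep inside pthread_mutex_lock instead of spinning on a flag.
int count_with_mutex(int threads, int perThread) {
    CheckedMutex mutex;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                mutex.wait();
                ++counter;
                mutex.signal();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Unlike the busy-waiting while(!this->semaphore) loop, a thread blocked in pthread_mutex_lock is put to sleep by the kernel, so it does not burn CPU time while it waits.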