Boost lock-free queue is triggering clang's thread sanitizer - c++

I created a lock-free object pool using boost's lock-free queue, version 1.68. While it seems to be working fine, I keep getting these warnings from clang's thread sanitizer in the function pop().
WARNING: ThreadSanitizer: data race (pid=32452)
Write of size 2 at 0x7b4400081e00 by thread T54 (mutexes: write M3349, write M3517):
#0 boost::lockfree::detail::tagged_index::set_index(unsigned short) /opt/boost/boost/lockfree/detail/freelist.hpp:292:15 (Exchange-tests+0xb37c09)
#1 boost::lockfree::detail::fixed_size_freelist<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, boost::lockfree::detail::runtime_sized_freelist_storage<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, std::allocator<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node> > >::deallocate_impl(unsigned short) /opt/boost/boost/lockfree/detail/freelist.hpp:592:33 (Exchange-tests+0xb3c31f)
#2 void boost::lockfree::detail::fixed_size_freelist<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, boost::lockfree::detail::runtime_sized_freelist_storage<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, std::allocator<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node> > >::deallocate<true>(unsigned short) /opt/boost/boost/lockfree/detail/freelist.hpp:580:13 (Exchange-tests+0xb3c23b)
#3 void boost::lockfree::detail::fixed_size_freelist<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, boost::lockfree::detail::runtime_sized_freelist_storage<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node, std::allocator<boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::node> > >::destruct<true>(boost::lockfree::detail::tagged_index) /opt/boost/boost/lockfree/detail/freelist.hpp:478:9 (Exchange-tests+0xb3c1d7)
#4 bool boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::pop<unsigned long>(unsigned long&) /opt/boost/boost/lockfree/queue.hpp:418:39 (Exchange-tests+0xb3c102)
#5 boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::pop(unsigned long&) /opt/boost/boost/lockfree/queue.hpp:375:16 (Exchange-tests+0xb3bda8)
#6 ObjectPool<MarketsController>::borrowObj()
...
Previous atomic read of size 4 at 0x7b4400081e00 by thread T52 (mutexes: write M3339, write M3518):
#0 __tsan_atomic32_load /tmp/final/llvm.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:535:3 (Exchange-tests+0x503e11)
#1 std::atomic<boost::lockfree::detail::tagged_index>::load(std::memory_order) const /usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/atomic:250:2 (Exchange-tests+0xb37e97)
#2 bool boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::pop<unsigned long>(unsigned long&) /opt/boost/boost/lockfree/queue.hpp:394:54 (Exchange-tests+0xb3beca)
#3 boost::lockfree::queue<unsigned long, boost::lockfree::fixed_sized<true> >::pop(unsigned long&) /opt/boost/boost/lockfree/queue.hpp:375:16 (Exchange-tests+0xb3bda8)
#4 ObjectPool<MController>::borrowObj()
...
Following Boost's code, I see that boost/lockfree/detail/freelist.hpp:292 is:
void set_index(index_t i)
{
index = i; // this line
}
and boost/lockfree/queue.hpp:394:54 is:
tagged_node_handle next = head_ptr->next.load(memory_order_acquire);
I'm not sure how to react to this. How worried should I be? Is this a bug in Boost or in the sanitizer, or did I do something wrong?
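One way to tell whether the reports come from your own pool logic or from the queue's internals is to run the same test under ThreadSanitizer with the lock-free queue swapped for a mutex-protected one. The sketch below assumes the pool hands out slot indices, as the queue<unsigned long> in the trace suggests. The class name MutexObjectPool and the returnObj/get members are hypothetical; only borrowObj appears in the trace. If TSan stays quiet with this version, that would suggest the remaining reports are confined to boost::lockfree's internals rather than to how the pool uses them.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical TSan baseline: same shape as a pool built on a queue of free
// slot indices, but the free list is a std::queue guarded by a std::mutex.
template <typename T>
class MutexObjectPool {
public:
    explicit MutexObjectPool(std::size_t capacity) : objects_(capacity) {
        for (std::size_t i = 0; i < capacity; ++i) free_.push(i);
    }

    // Hands out the index of a free object; returns false if the pool is empty.
    bool borrowObj(std::size_t& index) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return false;
        index = free_.front();
        free_.pop();
        return true;
    }

    // Puts a previously borrowed index back on the free list.
    void returnObj(std::size_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push(index);
    }

    // Access to the pooled object; only the current borrower should touch it.
    T& get(std::size_t index) { return objects_[index]; }

private:
    std::vector<T> objects_;
    std::queue<std::size_t> free_;
    std::mutex mutex_;
};
```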


How does an unsigned char vector deallocation crash a program with a segfault...? [closed]

I'm literally just deallocating a vector of unsigned chars during normal object destruction, and it crashes with a segfault in the free() call made by _Vector_base's deallocation:
[Switching to Thread 17648.0x3528]
0x00007ff9ba0a9606 in ntdll!RtlAllocateHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) back
#0 0x00007ff9ba0a9606 in ntdll!RtlAllocateHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x00007ff9ba0a5d21 in ntdll!RtlFreeHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2 0x00007ff9b9839c9c in msvcrt!free () from C:\WINDOWS\System32\msvcrt.dll
#3 0x00000000004bc540 in __gnu_cxx::new_allocator<unsigned char>::deallocate(unsigned char*, unsigned long long) ()
#4 0x00000000004ea87b in std::allocator_traits<std::allocator<unsigned char> >::deallocate(std::allocator<unsigned char>&, unsigned char*, unsigned long long) ()
#5 0x00000000004df392 in std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_deallocate(unsigned char*, unsigned long long) ()
#6 0x00000000004df436 in std::_Vector_base<unsigned char, std::allocator<unsigned char> >::~_Vector_base() ()
#7 0x000000000050110d in std::vector<unsigned char, std::allocator<unsigned char> >::~vector() ()
#8 0x0000000000420dd4 in Text::~Text() ()
#9 0x000000000041c9f7 in Scene::clearOnScreenText() ()
#10 0x0000000000410a52 in Application::NextScene() ()
#11 0x0000000000412a41 in Application::update() ()
#12 0x00000000004119a8 in Application::Run()::{lambda()#2}::operator()() const ()
#13 0x000000000041501a in std::_Function_handler<void (), Application::Run()::{lambda()#2}>::_M_invoke(std::_Any_data const&) ()
#14 0x00000000004cad92 in std::function<void ()>::operator()() const ()
#15 0x000000000051fba9 in std::intervalThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void ()>, std::function<void ()>, std::function<void ()>, std::function<bool ()>, long long, std::vector<long long, std::allocator<long long> >*, bool)::{lambda()#1}::operator()() const ()
Just three questions:
How is this even possible?
What did I monumentally do wrong...?
Most importantly, how does one fix this?
Side note:
I have had nothing but problems with deallocating memory recently (specifically in this program), so could there be something wrong with MinGW, or is GDB perhaps not reading the stack correctly? Debugging symbols are off, and the optimization level is 0.
How is this even possible?
With undefined behavior, anything is possible. :) More helpfully, what's probably happening is that your heap has been corrupted (e.g. by a bad memory write somewhere), and free(), called from new_allocator<unsigned char>::deallocate(), tried to follow a bad pointer in the heap's metadata, which caused the crash. The damage itself had been done silently, sometime earlier in your program's execution.
Another possibility is that clearOnScreenText() called delete on an invalid (but non-NULL) Text * pointer. In that case, when Text::~Text() runs the destructor of the std::vector<unsigned char> member variable, it is trying to destroy a "vector object" that is really just arbitrary bytes that are not a valid state for a vector, with catastrophic consequences.
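If the second explanation applies, one way to rule it out is to stop managing the Text objects with raw new/delete. Below is a minimal sketch, assuming Scene keeps its on-screen text in a container. Only Text, Scene and clearOnScreenText come from the trace; the glyphs member, addText, textCount and the live counter are hypothetical. With std::unique_ptr owning each Text, every object is destroyed exactly once, and no stale pointer remains to be deleted again. This does not help if the heap is being corrupted elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's Text class.
struct Text {
    std::vector<unsigned char> glyphs;  // the kind of member whose destructor crashed
    static int live;                    // counts live Text objects for this demo
    Text() { ++live; }
    ~Text() { --live; }
};
int Text::live = 0;

// Hypothetical stand-in for Scene: the container owns every Text it holds.
class Scene {
public:
    Text& addText() {
        onScreenText_.push_back(std::make_unique<Text>());
        return *onScreenText_.back();
    }

    // Destroys each Text exactly once; the container is left empty, so a
    // second call cannot delete anything twice.
    void clearOnScreenText() { onScreenText_.clear(); }

    std::size_t textCount() const { return onScreenText_.size(); }

private:
    std::vector<std::unique_ptr<Text>> onScreenText_;
};
```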
Most importantly, how does one fix this?
If you can run your code on Linux, valgrind is a valuable tool in situations like this. Under Windows there are similar tools, for example Dr. Memory, or the page heap checks in Microsoft's Application Verifier. Short of that, you might have to play "twenty questions" with the code: comment out various parts of the program until the crash goes away, add them back until the crash returns, and repeat until you understand which parts of the code must execute to reproduce the crash. Once you've figured out which code to look at, you can start trying to figure out what is wrong with it. Very tedious, but sometimes that is the only way.
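While bisecting, it can also help to temporarily replace unchecked indexing with std::vector::at(). An out-of-range at() throws std::out_of_range on the offending line instead of silently corrupting the heap. With MinGW's libstdc++, defining _GLIBCXX_DEBUG makes container operator[] checked in a similar way. The copyGlyphs function below is hypothetical and only illustrates the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies src into dst element by element.
// dst.at(i) throws std::out_of_range if dst is too small, pinpointing the bad
// write, whereas dst[i] would write past the buffer and crash much later.
void copyGlyphs(const std::vector<unsigned char>& src,
                std::vector<unsigned char>& dst) {
    for (std::size_t i = 0; i < src.size(); ++i)
        dst.at(i) = src[i];
}
```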

malloc() memory corruption by writing an int array only for a specific amount

I get a memory corruption error in my code:
*** Error in `./match': malloc(): memory corruption: 0x0000000001036470 ***
Aborted (core dumped)
Strangely, this only happens once the input has about 30-40 elements.
Since I couldn't figure out what's going on, I did some research and read that valgrind is a good choice. I'm very new to this, so I ran valgrind with two sets of options (I'm not sure the second one makes sense). Both point at the following code lines:
void match(vector<APD> apdvec_database, const char* OUTPUTFILE, bool EXTRACT_APD_VALUE)
{
(...)
const int N = apdvec_database.size();
vector<vector<double> > cost (N, vector<double>(N,0));
(...)
// Calculate amount of edges
double number=0;
double n=N;
double k=2;
number=binom(n, k);
// convert edges
int node_num;
long int edge_num;
int* edges;
double* weights;
node_num=N;
edge_num=(int)number;
[line 561:] edges = new int[2*edge_num];
weights = new double[edge_num];
cout << "\t\t " << node_num << " APDs and " << edge_num << " edges calculated" << endl;
int e=0;
for(size_t i = 0; i<apdvec_database.size(); i++)
{
for(size_t j = 0; j<apdvec_database.size(); j++)
{
if(i!=j)
{
// Assign edges among themselves
[line 575:] edges[2*e] = (int)i;
edges[2*e+1] = (int)j;
// Use only triangular matrix with diagonals
if(j>i){ weights[e] = (double)cost[i][j];}
else{ if((int)+(int)j<node_num) weights[e] = (double)cost[i][j]; }
e++;
}
}
}
Based on the output, I marked two specific lines with [line ...] at the beginning. The output is:
valgrind options: --tool=memcheck --leak-check=yes
==24195== at 0x40D5CE: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:575)
==24195== by 0x410866: main (match.cpp:680)
==24195== Address 0x8be5a38 is 0 bytes after a block of size 11,448 alloc'd
==24195== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24195== by 0x40D4A6: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:561)
==24195== by 0x410866: main (match.cpp:680)
==24195==
==24195== Invalid write of size 4
==24195== at 0x40D5D2: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:576)
==24195== by 0x410866: main (match.cpp:680)
==24195== Address 0x8be5a3c is 4 bytes after a block of size 11,448 alloc'd
==24195== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24195== by 0x40D4A6: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:561)
==24195== by 0x410866: main (match.cpp:680)
==24195==
==24195== Invalid write of size 8
==24195== at 0x40D607: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:580)
==24195== by 0x410866: main (match.cpp:680)
==24195== Address 0x158e86c8 is 0 bytes after a block of size 11,448 alloc'd
==24195== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24195== by 0x40D4D3: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:562)
==24195== by 0x410866: main (match.cpp:680)
==24195==
==24195== Invalid write of size 8
==24195== at 0x40D5EA: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (match.cpp:579)
==24195== by 0x410866: main (match.cpp:680)
==24195== Address 0x158e87a0 is 128 bytes inside a block of size 184 free'd
==24195== at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24195== by 0x508A5B8: THashList::Delete(char const*) (THashList.cxx:199)
==24195== by 0x5515ACF: TDirectoryFile::Close(char const*) (TDirectoryFile.cxx:562)
==24195== by 0x550A549: TFile::Close(char const*) (TFile.cxx:935)
==24195== by 0x410623: database_irradiated(char const*, bool, std::vector<APD, std::allocator<APD> >&, double&) (match.cpp:317)
==24195== by 0x410820: main (match.cpp:655)
==24195==
valgrind options: -fno-inline
==23041== Invalid write of size 4
==23041== at 0x416D17: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041== Address 0x8be5a38 is 0 bytes after a block of size 11,448 alloc'd
==23041== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23041== by 0x416C6E: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041==
==23041== Invalid write of size 4
==23041== at 0x416D1B: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041== Address 0x8be5a3c is 4 bytes after a block of size 11,448 alloc'd
==23041== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23041== by 0x416C6E: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041==
==23041== Invalid write of size 8
==23041== at 0x416D72: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041== Address 0x158e86c8 is 0 bytes after a block of size 11,448 alloc'd
==23041== at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23041== by 0x416C9B: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041==
==23041== Invalid write of size 8
==23041== at 0x416D45: match(std::vector<APD, std::allocator<APD> >, char const*, bool) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417AA9: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041== Address 0x158e87a0 is 128 bytes inside a block of size 184 free'd
==23041== at 0x4C2C2BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23041== by 0x508A5B8: THashList::Delete(char const*) (THashList.cxx:199)
==23041== by 0x5515ACF: TDirectoryFile::Close(char const*) (TDirectoryFile.cxx:562)
==23041== by 0x550A549: TFile::Close(char const*) (TFile.cxx:935)
==23041== by 0x417901: database_irradiated(char const*, bool, std::vector<APD, std::allocator<APD> >&, double&) (in /home/ben/analysis/APDs/match/Combine/match)
==23041== by 0x417A6D: main (in /home/ben/analysis/APDs/match/Combine/match)
==23041==
Interestingly, under valgrind the program neither hangs nor aborts with the malloc() error; it even finishes successfully with reasonable output.
I forgot to follow up here: in the end I had to declare each variable explicitly. There was probably a conflict because I included the CERN ROOT framework, but declaring each variable carefully solved it.
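The valgrind numbers are also consistent with a counting mismatch in the loop itself. edge_num is binom(N, 2), the number of unordered pairs, so edges holds 2*edge_num ints and weights holds edge_num doubles. Both blocks in the log are 11,448 bytes (2 × 1431 × 4 = 1431 × 8), and 1431 = 54·53/2, which fits N = 54. The i != j loop, however, visits N(N-1) ordered pairs (2862 for N = 54), twice as many, so e runs past edge_num and the first invalid write lands exactly 0 bytes after the block. Below is a minimal sketch of a loop whose count matches the allocation. It assumes only the upper triangle (j > i) is wanted, as the comment in the code suggests; EdgeList and buildEdges are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the flattened graph the question builds.
struct EdgeList {
    std::vector<int> edges;       // two entries (i, j) per edge
    std::vector<double> weights;  // one entry per edge
};

// Visits each unordered pair {i, j} once (j > i), so the edge count is
// N*(N-1)/2, matching binom(N, 2) used to size the arrays.
EdgeList buildEdges(const std::vector<std::vector<double>>& cost) {
    const std::size_t n = cost.size();
    const std::size_t edgeCount = n * (n - 1) / 2;  // 0 for n == 0 or 1
    EdgeList out;
    out.edges.reserve(2 * edgeCount);
    out.weights.reserve(edgeCount);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            out.edges.push_back(static_cast<int>(i));
            out.edges.push_back(static_cast<int>(j));
            out.weights.push_back(cost[i][j]);  // upper triangle only
        }
    }
    assert(out.weights.size() == edgeCount);
    return out;
}
```

If the matching code really needs both orientations (i, j) and (j, i), the alternative is to keep the original loop and size the arrays for N*(N-1) edges instead.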

Can valgrind report a memory address of a lost block (for debugging recursive function calls)?

This question is the most similar to mine, but it's rather old, so I wonder if anything has changed since then.
The valgrind output for me is:
==29443== 109 (16 direct, 93 indirect) bytes in 2 blocks are definitely lost in loss record 270 of 309
==29443== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29443== by 0x4F4E8DB: grl::Configuration::Configuration(grl::Configuration const&) (configuration.h:192)
==29443== by 0x4F49973: grl::YAMLConfigurator::load(YAML::Node const&, grl::Configuration*, std::string const&) (configurable.cpp:74)
==29443== by 0x4F499FC: grl::YAMLConfigurator::load(YAML::Node const&, grl::Configuration*, std::string const&) (configurable.cpp:75)
==29443== by 0x4F499FC: grl::YAMLConfigurator::load(YAML::Node const&, grl::Configuration*, std::string const&) (configurable.cpp:75)
==29443== by 0x4F499FC: grl::YAMLConfigurator::load(YAML::Node const&, grl::Configuration*, std::string const&) (configurable.cpp:75)
==29443== by 0x40C78E: grl::YAMLConfigurator::load(std::string, grl::Configuration*, std::string const&) (configurable.h:321)
==29443== by 0x40B897: main (deployer.cpp:180)
The program is configured at startup by recursively reading a YAML file and storing all required parameters in a map as (name, allocated address) pairs. I can print these pairs, so if valgrind could tell me the addresses of the lost blocks, I could look up the corresponding parameter names and check why they are not freed.
If this is not possible, what else can I use?
You can run your program under valgrind+gdb, using vgdb.
See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
Then you can use various valgrind memcheck monitor commands to run a leak search and get the addresses and sizes of the leaked blocks.
See http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands
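Once the monitor commands give you addresses, the (name, address) map from the question can be turned into a reverse lookup. Below is a minimal sketch, assuming the configuration code can record each allocation's address and size at the point where it stores the parameter. AllocationRegistry is a hypothetical class, not part of valgrind or the question's code. Keying a std::map by start address lets upper_bound find the block that contains a given address, not only one that starts exactly there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry mapping each allocation's start address to the name
// and size of the configuration parameter that owns it.
class AllocationRegistry {
public:
    void record(const void* ptr, std::size_t size, std::string name) {
        blocks_[reinterpret_cast<std::uintptr_t>(ptr)] = Entry{size, std::move(name)};
    }

    // Returns the name of the block containing addr, or nullptr if none does.
    const std::string* nameFor(std::uintptr_t addr) const {
        auto it = blocks_.upper_bound(addr);  // first block starting after addr
        if (it == blocks_.begin()) return nullptr;
        --it;                                 // last block starting at or before addr
        if (addr - it->first < it->second.size) return &it->second.name;
        return nullptr;
    }

private:
    struct Entry {
        std::size_t size;
        std::string name;
    };
    std::map<std::uintptr_t, Entry> blocks_;
};
```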

"Conditional jump or move depends on uninitialised value", but the allocating function is not listed in the stack trace. How?

This is the part of some Valgrind log:
==1652== Conditional jump or move depends on uninitialised value(s)
==1652== at 0x868DBFC: Dfm_db::io::Layer_cell_writer::end_cell() (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_db_io_layer.C:224)
==1652== by 0x862C9FD: Dfm_db::Hdb_layer_writer::end_cell() (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/../Isrc/dfm/dfm_db_io_layer.h:916)
==1652== by 0x861197F: Dfm_db::Hdb_writer::save_layer_geometries(Dfm_db::Hdb_layer_writer&, Drc_Hierarchical_database&, unsigned long, Drc_Hierarchical_geometry_type, bool, bool, bool, bool, Dfm_produced_layer_type) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_db_hdb_io.C:2362)
==1652== by 0x8610755: Dfm_db::Hdb_writer::save_layer(Dfm_db::Database*, Drc_Hierarchical_database&, unsigned long, unsigned long, Dfm_db::Geometry_types, char const*, Dfm_db::Run_info&, Dfm_db::Layer_origin const&, char const*, bool, char const*, char const*, bool, Pdb_security*, bool, int, bool, bool, Dfm_db::must_be, Dfm_produced_layer_type) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_db_hdb_io.C:2102)
==1652== by 0x8595B1E: Dfm_db::Database::save_layer(char const*, unsigned long, bool) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_database.C:2490)
==1652== by 0x8594D39: Dfm_db::Database::save_layers(std::map, std::less, std::allocator > > >&) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_database.C:2317)
==1652== by 0x85937B8: Dfm_db::Database::save_revision(bool) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_database.C:2082)
==1652== by 0x4C6AE76: Cockpit_cli::save_revision(Dfm_db::must_be) (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/cockpit_db_rev_cli.C:520)
==1652== by 0x4C15153: cockpit_save_revision (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/cockpit_db_hier_rev_cmds.C:529)
==1652== by 0xD8BAC67: TclEvalObjvInternal (in /amy/ic_wg_server/CACHED_WG_SERVER/ic/comp/exports.v0-0_6-19-2015_engr-aoi/mgc_home/pkgs/icv_lib.aoi/lib64/libcalibre_utils.so)
==1652== by 0xD8E3255: TclExecuteByteCode (in /amy/ic_wg_server/CACHED_WG_SERVER/ic/comp/exports.v0-0_6-19-2015_engr-aoi/mgc_home/pkgs/icv_lib.aoi/lib64/libcalibre_utils.so)
==1652== by 0xD8E7280: TclCompEvalObj (in /amy/ic_wg_server/CACHED_WG_SERVER/ic/comp/exports.v0-0_6-19-2015_engr-aoi/mgc_home/pkgs/icv_lib.aoi/lib64/libcalibre_utils.so)
==1652== Uninitialised value was created by a stack allocation
==1652== at 0x859F6DC: Dfm_db::Database::get_pl_index_level() const (/home/lvardany/tmp_IWA/ic/lv/aoi-asserts-valg/dfm/Isrc/dfm_database.C:4325)
The last line is a result of the --track-origins option, which shows exactly in which function an uninitialised value was created. The only puzzling part of this output for me is that this function doesn't appear in the call stack. Also, --num-callers was set to 20.
My question is how is it possible that the last function doesn't appear in a call stack?
Quite easily. An example:
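#include <functional>
#include <iostream>

std::function<void()> callback;

void foo()
{
    int x;  // uninitialised stack variable: the "stack allocation" valgrind reports
    // The lambda captures x by reference and outlives foo().
    callback = [&]() { if (x > 5) std::cout << "hi"; };
}

int main()
{
    foo();       // foo() returns; its frame, including x, is gone
    callback();  // x is read here, but foo is no longer on the call stack
                 // (undefined behaviour, which is exactly what valgrind flags)
}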
Here, foo will not appear in the callstack of invoking callback.
You're thinking too linearly.
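The example above relies on undefined behaviour. Below is a well-defined sketch of the same idea: the function that creates a value has already returned by the time another function reads it, so it cannot appear on the stack at the point of use. The Frame guard and g_stack are hypothetical, a hand-maintained stand-in for the real call stack that a debugger or valgrind walks. produceLevel only loosely mirrors get_pl_index_level(), whose value could likewise have been returned or copied before end_cell() read it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy call stack: each Frame pushes its name on entry and pops
// it on exit, mimicking the "at ... by ..." lines of a valgrind report.
std::vector<std::string> g_stack;

struct Frame {
    explicit Frame(std::string name) { g_stack.push_back(std::move(name)); }
    ~Frame() { g_stack.pop_back(); }
};

// Creates a value on its own stack frame, then returns it.
int produceLevel() {
    Frame f("produceLevel");
    int level = 3;
    return level;  // produceLevel's frame is popped after this
}

// Uses the value later and snapshots the stack at the point of use.
std::vector<std::string> consumeLevel(int level) {
    Frame f("consumeLevel");
    if (level > 2) { /* the "conditional jump" would happen here */ }
    return g_stack;  // copied before f is destroyed
}

std::vector<std::string> run() {
    Frame f("run");
    int level = produceLevel();  // value created in one callee...
    return consumeLevel(level);  // ...but used in a different one
}
```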

What does the second column in the gdb stack trace mean?

I have a stack trace that looks like this:
#3 0x00007fffde86c206 in GetMedia (p_ml=0xb91560, id=<value optimized out>, select=ML_MEDIA, reload=<value optimized out>) at ../../../modules/media_library/sql_media_library.c:1170
#4 0x00007fffde86a7d0 in GetInputItemFromMedia (p_ml=0xb91560, i_media=12276000) at ../../../modules/media_library/sql_media_library.c:1204
#5 0x00007ffff6765eab in ml_CreateInputItem (this=0x7784f0) at ../../../../include/vlc_media_library.h:887
#6 MLModel::popupInfo (this=0x7784f0) at ../../../../modules/gui/qt4/components/playlist/media_library/ml_model.cpp:528
#7 0x00007ffff67a7204 in MLModel::qt_metacall (this=0x7784f0, _c=<value optimized out>, _id=17710, _a=<value optimized out>) at components/playlist/media_library/ml_model.moc.cpp:79
#8 0x00007ffff4ec8e3f in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) () from /usr/lib/libQtCore.so.4
I'm wondering what the second column signifies, and what its absence signifies. As can be seen, frame #6 does not have this address, and I believe my problem (a segfault) is related to it.
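That column contains the code address for that frame. For every frame except the innermost, it is the return address: the point in this function where execution resumes once the function on the line above it returns. Its absence here most likely reflects inlining. ml_CreateInputItem (frame #5, defined in a header) was probably inlined into MLModel::popupInfo (frame #6), so the two frames share one physical stack frame and gdb prints the address only once. gdb also omits the address when the program counter sits exactly at the start of a source line.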
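To see the return-address idea in code, GCC and Clang provide the extension __builtin_return_address(0). Inside a function, it yields the address execution will return to in the caller, which is what gdb prints for the caller's frame. The sketch below is a demonstration only; the function and variable names are hypothetical. noinline keeps the callee a real call, so there is a real return address. The volatile store keeps the compiler from merging the two calls.

```cpp
#include <cassert>

// Written on every call so the two calls below cannot be merged into one.
void* volatile g_lastReturnAddress = nullptr;

// GCC/Clang extension: returns the address this function will return to.
__attribute__((noinline)) void* returnAddress() {
    void* ra = __builtin_return_address(0);
    g_lastReturnAddress = ra;
    return ra;
}

// Two distinct call sites, hence two distinct return addresses.
void twoCallSites(void*& first, void*& second) {
    first = returnAddress();
    second = returnAddress();
}
```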