I have a program that uses thread_local std::shared_ptr to manage some objects that are mainly accessed thread-locally. However, when the thread is joined and the thread-local shared_ptr is being destructed, there is always a SIGSEGV under the debugger if the program is compiled with MinGW (Windows 10). Here is a minimal example that reproduces the bug:
// main.cpp
#include <memory>
#include <thread>

void f() {
    thread_local std::shared_ptr<int> ptr = std::make_shared<int>(0);
}

int main() {
    std::thread th(f);
    th.join();
    return 0;
}
How to compile:
g++ main.cpp -o build\main.exe -std=c++17
Compiler version:
>g++ --version
g++ (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
When run under gdb, it raises SIGSEGV in the new thread while the main thread is waiting in join(). It works fine when compiled with gcc or clang (Linux) and MSVC (Windows).
I tried to debug it and found that a contiguous segment of memory containing the thread-local shared_ptr is overwritten with repeated 0xfeeefeee before destruction, during a call to RtlpWow64SetContextOnAmd64. The stack frames:
RtlpWow64SetContextOnAmd64 0x00007ffd8f4deb5f
RtlpWow64SetContextOnAmd64 0x00007ffd8f4de978
SbSelectProcedure 0x00007ffd8f4ae2e0
CloseHandle 0x00007ffd8ce3655b
pthread_create_wrapper 0x00007ffd73934bac
_beginthreadex 0x00007ffd8e9baf5a
_endthreadex 0x00007ffd8e9bb02c
BaseThreadInitThunk 0x00007ffd8ec87614
RtlUserThreadStart 0x00007ffd8f4c26a1
The assembly:
...
mov %rax,(%rdi)
movdqu %xmm0,(%rsi) ; <------ erased here
call 0x7ffd8f491920 ; <ntdll!RtlReleaseSRWLockShared>
mov $0x1,%r9d
mov 0x30(%rsp),%rbx
...
Later, when the shared_ptr is destructed and the destructor reads 0xfeeefeee, the SIGSEGV occurs.
I want to know:
Why is MinGW (or a Windows library?) erasing the thread-local storage before destruction? In my opinion, erasing memory should only happen after destruction. I notice that if join() is replaced by detach(), the program exits normally. Maybe join() does something that instructs the new thread to erase the storage?
Is such behavior a violation of the standard? I think the standard should forbid erasing the memory before destruction. Please correct me if I'm mistaken.
This is a longstanding, known, and still-open bug in MinGW; see the corresponding issue with analyses and links on GitHub: https://github.com/msys2/MINGW-packages/issues/2519
Yes, this violates the standard: it shouldn't crash.
Basically, the order of destruction is incorrect, as you already suspected. 0xfeeefeee is the magic number used by HeapFree() to mark freed memory. See e.g. this post.
To quote lhmouse:
So here comes the rule of thumb: Don't use thread_local on GCC for MinGW targets.
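For completeness, here is a minimal sketch of one way to sidestep the bug in code like the reproducer above (my own suggestion, not from the linked issue): avoid giving thread_local an object with a non-trivial destructor, and manage the lifetime inside the thread function instead, so nothing has to run during TLS teardown. This of course loses the "shared across calls within the same thread" property of thread_local.

#include <memory>
#include <thread>

void f() {
    // Instead of: thread_local std::shared_ptr<int> ptr = ...;
    // keep the object as an ordinary local (or pass it around explicitly),
    // so it is destroyed before the thread function returns and the buggy
    // MinGW TLS destruction path is never exercised.
    auto ptr = std::make_shared<int>(0);
    // ... use *ptr ...
}

int main() {
    std::thread th(f);
    th.join();
    return 0;
}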
Background:
I have discovered something of an interesting edge case relating to static memory initialization across multiple threads. Specifically, I am using Howard Hinnant's TZ library which has been working fine for the rest of my code across many different threads.
Now, I am developing a logging class which relies on yet another thread and a condition variable. Unfortunately, when I attempt to format a chrono time_point using date::make_zoned(date::locate_zone("UTC"), tp), the library crashes. Upon digging through tz.cpp, I found that the time zone database returned internally evaluates to NULL. This all comes from the following snippet:
tzdb_list&
get_tzdb_list()
{
    static tzdb_list tz_db = create_tzdb();
    return tz_db;
}
As can be seen, the database list is stored statically. With a few printf()s and some time with GDB, I can see that the same db is returned for multiple calls from the main thread, but NULL is returned when it is called from my logger thread.
If, however, I change the declaration of tz_db to:
static thread_local tzdb_list tz_db = create_tzdb();
Everything works as expected. This is not surprising, as thread_local causes each thread to do the heavy lifting of creating its own standalone instance of tzdb_list. Obviously this is wasteful of memory and can easily cause problems later. As such, I really don't see this as a viable solution.
Questions:
What is it about the invocation of one thread versus another that would cause static memory to behave differently? If anything, I would expect the opposite of what is happening (e.g. the threads 'fighting' over initialized memory, not one of them receiving a NULL pointer).
How is it possible for a returned static reference to have multiple different values in the first place (in my case, valid memory versus NULL)?
With thread_local built into the library I get wildly different memory locations on opposite ends of the addressable region; why? I suspect that this has to do with where thread memory is allocated versus the main process memory but do not know the exact details of thread allocation regions.
Reference:
My logging thread is created with:
outputThread = std::thread(Logger::outputHandler, &outputQueue);
And the actual output handler / invocation of the library (LogMessage is just a typedef for std::tuple):
void Logger::outputHandler(LogQueue *queue)
{
    LogMessage entry;
    std::stringstream ss;
    while (1)
    {
        queue->pop(entry); // Blocks on a condition variable
        ss << date::make_zoned(date::locate_zone("UTC"), std::get<0>(entry))
           << ":" << levelId[std::get<1>(entry)]
           << ":" << std::get<3>(entry) << std::endl;
        // Printing stuff
        ss.str("");
        ss.clear();
    }
}
Additional code and output samples available on request.
EDIT 1
This is definitely a problem in my code. When I strip everything out my logger works as expected. What is strange to me is that my test case in the full application is just two prints in main and a call to the logger before manually exiting. None of the rest of the app initialization is run but I am linking in all support libraries at that point (Microsoft CPP REST SDK, MySQL Connector for C++ and Howard's date library (static)).
It is easy for me to see how something could be stomping on this memory but, even in the "full" case in my application, I don't know why the prints on the main thread would work while the next line calling into the logger would fail. If something were going sideways at init, I would expect all calls to break.
I also noticed that if I make my logger static the problem goes away. Of course, this changes the memory layout so it doesn't rule out heap / stack smashing. What I do find interesting is that I can declare the logger globally or on the stack at the start of main() and both will segfault in the same way. If I declare the logger as static, however, both global and stack-based declaration work.
Still trying to create a minimal test case which reproduces this.
I am already linking with -lpthread; have been pretty much since the inception of this application.
OS is Fedora 27 x86_64 running on an Intel Xeon. Compiler:
$ g++ --version
g++ (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
It appears that this problem was caused by a bug in tz.cpp which has since been fixed.
The bug was that there was a namespace-scope variable whose initialization was not guaranteed to happen in the proper order. This was fixed by turning that variable into a function-local static to ensure the proper initialization order.
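For illustration, here is a minimal sketch of the construct-on-first-use idiom described above (placeholder type and factory of my own, not the actual tz.cpp code):

// Placeholder type and factory, standing in for the real tz.cpp ones.
struct tzdb_list { /* ... */ };
tzdb_list create_tzdb() { return {}; }

// A function-local static is initialized on first call, and since C++11
// that initialization is guaranteed to be thread-safe, so every thread
// calling get_tzdb_list() sees a fully constructed object.
tzdb_list& get_tzdb_list()
{
    static tzdb_list tz_db = create_tzdb();
    return tz_db;
}

Any other namespace-scope object that needs the database should obtain it through get_tzdb_list() rather than referring to a namespace-scope variable, so the initialization order stays well defined.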
My apologies to all who might have been impacted by this bug. And my thanks to all those who have reported it.
I've come across a hard-to-debug situation in one of my real projects where I was accidentally accessing a reference to a local variable inside a lambda that had been moved. The access was done from another thread, but the moved lambda was kept alive until the second thread finished.
The bug only occurred with optimizations disabled and was caused by careless refactoring.
I've created a minimal example (available here on wandbox) that reproduces the issue:
#include <chrono>
#include <iostream>
#include <thread>
#include <utility>

using namespace std::chrono_literals;

struct state
{
    int x = 100;
};

template <typename TF>
void eat1(TF&& f)
{
    // Call the lambda.
    f();

    // Simulate waiting for the second thread
    // to finish.
    std::this_thread::sleep_for(1000ms);
}

template <typename TF>
void eat0(TF&& f)
{
    // Move the lambda to some other handler.
    eat1(std::forward<TF>(f));
}

void use_state(state& s)
{
    // Will print `100`.
    std::cout << s.x << "\n";

    // Separate thread. Note that `s` is captured by
    // reference.
    std::thread t{[&s]
    {
        // Simulate computation delay.
        std::this_thread::sleep_for(500ms);

        // Will print garbage.
        std::cout << s.x << "\n";
    }};

    t.detach();
}

int main()
{
    eat0([]
    {
        // Local lambda variable that will be accessed
        // after the lambda is moved.
        state s;

        // Function that takes `s` by reference and
        // accesses it in a separate thread after the
        // lambda is moved.
        use_state(s);
    });
}
Surprisingly, none of the sanitizers and warning flags managed to help here.
I've tried the following combinations of compilers and sanitizers, with
-Wall -Wextra -Wpedantic -g -O0
flags always enabled:
Compilers: g++ 6.1.1 on Arch Linux x64; clang++ 3.8.0 on Arch Linux x64; g++ 5.3.1 on Fedora x64; clang++ 3.7.0 on Fedora x64.
Sanitizers: -fsanitize=address, -fsanitize=undefined, -fsanitize=thread.
None of the combinations produced any helpful diagnostic. I expected either AddressSanitizer to tell me I was accessing a dangling reference, UndefinedBehaviorSanitizer to catch the UB while accessing it, or ThreadSanitizer to tell me a separate thread was accessing an invalid memory location.
Is there a reliable way to diagnose this problem? Should I post this example to any of the sanitizers' bug trackers as a feature request/defect?
valgrind's memcheck tool caught this problem at default settings. However, this kind of nasty bug has a chance of escaping memcheck, so I am not sure the problem would be caught in the real program.
The fact that the first lambda was moved is not relevant to the problem (though it may have complicated the debugging process). The problem is due to accessing a local variable of a function invocation that has finished its execution (again, the fact that the access happened from a different thread just made the investigation more difficult, but didn't contribute to the bug in any other way). The fact that the first lambda was kept alive should by no means protect you: the local variables belong to the lambda invocation, not to the lambda itself.
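To illustrate that point, here is a minimal sketch of my own (an assumption about the essence of the bug, with the move and the lambda nesting stripped away): a detached thread dereferences a reference to a local variable of an invocation that has already finished.

#include <chrono>
#include <iostream>
#include <thread>

using namespace std::chrono_literals;

void g()
{
    int x = 100;  // local variable of this invocation of g()
    std::thread{[&x]
    {
        std::this_thread::sleep_for(500ms);
        std::cout << x << "\n";  // dangling access: g() has already returned
    }}.detach();
}   // x is destroyed here

int main()
{
    g();
    std::this_thread::sleep_for(1000ms);  // keep the process alive
}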
The title pretty much conveys all relevant information, but here's a minimal repro:
#include <atomic>
#include <cstdio>
#include <memory>

int main() {
    auto ptr = std::make_shared<int>(0);
    bool is_lockless = std::atomic_is_lock_free(&ptr);
    printf("shared_ptr is lockless: %d\n", is_lockless);
}
Compiling this with the following compiler options produces a lock-free shared_ptr implementation:
g++ -std=c++11 -march=native main.cpp
While this doesn't:
g++ -std=c++11 -march=native -pthread main.cpp
GCC version: 5.3.0 (on Linux, using libstdc++), tested on multiple machines that should have the necessary atomic instructions to make this work.
Is there any way to force the lock-free implementation (I'd need the lock-free version, regardless of performance)?
There are two separate things:
Manipulation of the reference counter in the control block (or equivalent thing) is typically implemented with lock-free atomics whenever possible. This is not what std::atomic_is_lock_free tells you.
libstdc++'s __shared_ptr is templated on the lock policy, so you can explicitly use
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
if you know what you are doing.
std::atomic_is_lock_free tells you whether the atomic access functions (std::atomic_{store, load, exchange, compare_exchange} etc.) on shared_ptr are lock-free. Those functions are used to concurrently access the same shared_ptr object, and typical implementations will use a mutex.
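For illustration, here is a small sketch of my own of the kind of access std::atomic_is_lock_free is reporting on here: the free-function atomic operations on a single shared_ptr object shared between threads, which typical implementations protect with an internal mutex (hence the "not lock-free" answer).

#include <cstdio>
#include <memory>
#include <thread>

int main() {
    std::shared_ptr<int> p = std::make_shared<int>(0);

    // Another thread replaces the pointee; the same shared_ptr object `p`
    // is accessed concurrently, so the atomic free functions are required.
    std::thread writer([&p] {
        std::atomic_store(&p, std::make_shared<int>(42));
    });

    std::shared_ptr<int> snapshot = std::atomic_load(&p);  // atomic read of p
    std::printf("value: %d\n", *snapshot);

    writer.join();
}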
If you use shared_ptr in a threaded environment, you NEED to have locks [of some kind - they could be implemented as atomic increment and decrement, but there may be places where a "bigger" lock is required to ensure no races]. The lockless version only works when there is only one thread. If you are not using threads, don't link with -lpthread.
I'm sure there is some tricky way to convince the compiler that you are not REALLY using the threads for your shared pointers, but you are REALLY in fragile territory if you do - what happens if a shared_ptr is passed to a thread? You may be able to guarantee that NOW, but someone will probably accidentally or on purpose introduce one into something that runs in a different thread, and it all breaks.
While working with clang's thread sanitizer we noticed data race warnings. We think it's due to std::string's copy-on-write technique not being thread safe, but we could be wrong. We reduced the warning we were seeing to this code:
#include <memory>
#include <string>
#include <thread>

using std::make_shared;
using std::string;

void test3() {
    std::unique_ptr<std::thread> thread;
    {
        auto output = make_shared<string>();
        std::string str = "test";
        thread.reset(new std::thread([str, output]() { *output += str; }));
        // The str string now goes out of scope, but due to COW
        // the captured string may not have its own copy of the content yet.
    }
    thread->join();
}
When compiled with thread sanitizer enabled:
clang++ -stdlib=libc++ -std=c++11 -O0 -g -fsanitize=thread -lpthread -o test main.cpp
or
clang++ -std=c++11 -O0 -g -fsanitize=thread -lpthread -o test main.cpp
And when run multiple times, it eventually produces this warning:
WARNING: ThreadSanitizer: data race (pid=30829)
Write of size 8 at 0x7d0c0000bef8 by thread T62:
#0 operator delete(void*) <null>:0
...
Previous write of size 1 at 0x7d0c0000befd by thread T5:
#0 std::__1::char_traits<char>::assign(char&, char const&) string:639
...
Is this a false positive from the thread sanitizer or is it a real data race? If the latter,
can it be worked around without changing the code (e.g. by passing some flags to the compiler)? Is this a known bug in the string implementation (or something else)?
UPDATE: clang --version outputs:
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix
UPDATE: The cpp I use to reproduce this warning.
[edit] The assumptions below turned out to be faulty; see the link in the comments. T5, not T62, is the thread spawned in the code above.
It would be useful to understand the thread IDs, but I assume that T5 is the main thread and T62 is the spawned thread. It looks like the copy is made on the main thread (before the new thread is spawned) and destroyed on the new thread (obviously). This is safe because the new thread cannot race with the main thread before it exists.
Hence, this is a thread sanitizer bug. It failed to check whether thread T62 existed at the time of the previous write.
This is quite tricky. I've summarized the logic in your code below:
In thread T62:
    Create string s (with reference count)
    Create output_1 pointing to s in the thread storage for T62
    Create thread T5
    Create output_2 pointing to s in the thread storage for T5
    Sync point
In thread T5:
    Append to s ** MODIFY **
    Thread-safe decrement of reference count for s (not a sync point)
    End of output_2 lifetime
    Exit
In thread T62:
    Thread-safe decrement of reference count for s (not a sync point)
    End of output_1 lifetime
    Deallocate s ** MODIFY **
    Join
    Sync point
In thread T62:
    Destroy T5
As far as I can tell, the standard makes no guarantees about synchronization with regard to calling the shared_ptr deleter:
(20.8.2.2/4) For purposes of determining the presence of a data race, member functions shall access and modify only the shared_ptr and weak_ptr objects themselves and not objects they refer to.
I take this to mean that any modifications that do actually happen to the pointed-to object while calling a member function of the shared_ptr, such as any modifications the deleter might make, are considered to be outside the scope of the shared_ptr, and therefore it is not the responsibility of the shared_ptr to make sure they do not introduce a data race. For example, the modifications made to the string by T5 may not be visible to T62 by the time thread T62 tries to destroy it.
However, Herb Sutter, in his "Atomic<> weapons" talk, indicated he saw it as a bug to have the atomic decrement of the reference count in the shared_ptr destructor without both acquire and release semantics, but I'm not sure how it violates the standard.
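For reference, here is a sketch of the reference-count release pattern being discussed (my own illustration of a typical implementation, not the libstdc++/libc++ source): the decrement uses release ordering, and the owner that drops the count to zero issues an acquire fence before destroying the object, so all writes made through other owners happen-before the destruction.

#include <atomic>

struct control_block {
    std::atomic<long> use_count{1};
};

template <typename T>
void release(control_block* cb, T* object)
{
    // Release decrement: publishes this owner's writes to whichever
    // owner ends up destroying the object.
    if (cb->use_count.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire fence: makes the other owners' writes visible here
        // before the object is deleted.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete object;
        delete cb;
    }
}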
I have a multithreaded project and I've run it through valgrind with --tool=helgrind, which showed me a few errors. I'm using a mutex there exactly the way I found described on the net; can you please show me what's wrong?
#include <iostream>
#include <pthread.h>

#define MAX_THREADS 100
#define MAX_SESSIONS 100

static pthread_mutex_t M_CREATE_SESSION_LOCK = PTHREAD_MUTEX_INITIALIZER;
.....
void connection::proccess(threadVarsType &THREAD) {
    ....
    pthread_mutex_lock(&M_CREATE_SESSION_LOCK);
    unsigned int ii;
    for (ii=0; ii<MAX_SESSIONS; ii++) {
        if (SESSION[ii]==NULL) {
            break;
        }
    }
    if (ii==MAX_SESSIONS-1) {
        ....
        pthread_mutex_unlock(&M_CREATE_SESSION_LOCK); // unlock session mutex
        ....
        return;
    } else {
        ....
        pthread_mutex_unlock(&M_CREATE_SESSION_LOCK); // unlock session mutex
        ....
    }
    ....
}
and the error messages:
==4985== Thread #1's call to pthread_mutex_lock failed
==4985== with error code 22 (EINVAL: Invalid argument)
....
==4985== Thread #1 unlocked an invalid lock at 0x4E7B40
==4985== at 0x32CD8: pthread_mutex_unlock (hg_intercepts.c:610)
....
==4985== Thread #1's call to pthread_mutex_unlock failed
==4985== with error code 22 (EINVAL: Invalid argument)
....
==4985== Thread #1's call to pthread_mutex_lock failed
==4985== with error code 22 (EINVAL: Invalid argument)
....
==4985== Thread #1 unlocked an invalid lock at 0x4E7B40
==4985== at 0x32CD8: pthread_mutex_unlock (hg_intercepts.c:610)
....
==4985== Thread #1's call to pthread_mutex_unlock failed
==4985== with error code 22 (EINVAL: Invalid argument)
First, always check the return values of your function calls. If a pthread call fails, it's a good choice to just call abort(), which will core-dump if you have that enabled, or drop into the debugger if you are running under one.
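A minimal sketch of that advice applied to the mutex in the question (illustrative wrappers of my own, not code from the project):

#include <cstdlib>
#include <pthread.h>

static pthread_mutex_t M_CREATE_SESSION_LOCK = PTHREAD_MUTEX_INITIALIZER;

// Wrap every pthread call so that a failure dies immediately at the call site.
static void checked_lock(pthread_mutex_t *m)
{
    if (pthread_mutex_lock(m) != 0)
        abort();   // core-dump here instead of limping on with a bad lock
}

static void checked_unlock(pthread_mutex_t *m)
{
    if (pthread_mutex_unlock(m) != 0)
        abort();
}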
The pthread function calls really should never fail, which means that something is seriously wrong with your program. In a C or C++ program something that commonly causes mysterious failures is memory corruption. Use valgrind in its normal modes to check for that.
Another thing that can cause pthread calls to fail is to not compile using -pthread. If using GCC you should compile and link using gcc with a command like gcc -pthread. That will link the pthread library and it will set some preprocessor defines that may be important for your system's header files.
Some systems will successfully compile and link a program that is using pthread calls without linking it to the pthread libraries. This is done so that a program or library can be made thread-safe without actually using threads. The thread calls will be linked to dummy functions unless the real pthread library is linked. That can lead to some function calls failing.
So make sure you are building with the correct compiler options to include the pthread libraries.
Another possible cause is if you are building on some whacked-out half-and-half hybrid OS where it started as Linux 2.4 and got upgraded to Linux 2.6 NPTL at some point (I worked on something like this once). If you are attempting to compile against old header files with an outdated definition of PTHREAD_MUTEX_INITIALIZER or the wrong size for the type of pthread_mutex_t then that could cause the problem.
That error suggests something is wrong with the initialization of the mutex. It's hard to say what, but make sure you're initializing it in the right place.
On the Helgrind docs page, they mention that there can be false positives that are supposed to be suppressed ... somehow you might be bumping into those, since on the surface it does not seem like you're using pthread mutexes incorrectly.
Here's what they write:
Helgrind's error checks do not work properly inside the system threading library itself (libpthread.so), and it usually observes large numbers of (false) errors in there. Valgrind's suppression system then filters these out, so you should not see them.
If you see any race errors reported where libpthread.so or ld.so is the object associated with the innermost stack frame, please file a bug report at http://www.valgrind.org/.
They also note that you should be using a "supported Linux distribution" ... they don't mention what exactly that means, but if you're using a non-Linux OS, that could also possibly cause some of these "false positives". It might be worth asking the development team to see what they say about this.
The error EINVAL on a call to pthread_mutex_lock means one of two things.
The mutex was created with the protocol attribute having the value PTHREAD_PRIO_PROTECT and the calling thread's priority is higher than the mutex's current priority ceiling.
or
The value specified by mutex does not refer to an initialised mutex object.
The second one seems more likely. Try initializing the mutex in your main function with int error = pthread_mutex_init(&M_CREATE_SESSION_LOCK, NULL); and check if there is an error, instead of initializing it with the macro like you are currently.
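For example, a minimal sketch of that suggestion (assuming the mutex stays a file-scope variable as in the question):

#include <cstdio>
#include <pthread.h>

static pthread_mutex_t M_CREATE_SESSION_LOCK;

int main()
{
    // Runtime initialization with an explicit error check, instead of
    // relying on the PTHREAD_MUTEX_INITIALIZER macro.
    int error = pthread_mutex_init(&M_CREATE_SESSION_LOCK, NULL);
    if (error != 0) {
        std::fprintf(stderr, "pthread_mutex_init failed: %d\n", error);
        return 1;
    }

    // ... create threads, run the connection-handling code ...

    pthread_mutex_destroy(&M_CREATE_SESSION_LOCK);
    return 0;
}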