I am debugging a huge legacy codebase with a heap-use-after-free issue. AddressSanitizer has told me where the allocation happened, where the memory was freed and where it was used afterwards, which has been massively useful.
To debug it, I need to track this faulty memory address (i.e. the pointer) and how it propagates through the code from allocation to the bad use. It may be copied into other variables (free variables or class member variables) on its way to the bad use. Is there any way to track the address propagation using a debugger?
PS: I know the original pointer variable that holds the address returned by the allocation (from ASan).
On macOS there are the so-called lldb.macosx.heap scripts, which are available system-wide and provide extra diagnostic commands for lldb. Suppose you have a minimal program like this (where quite a few pointers refer to a deleted int):
#include <iostream>
void func(int* var) {
    int* newPtr = var;
    std::cout << *newPtr << std::endl;
}
int main() {
    int* ptr = new int{2};
    delete ptr;
    int* ptr2 = ptr;
    func(ptr);
}
First, in a debug session, you need to load these tools:
(lldb) command script import lldb.macosx.heap
"malloc_info", "ptr_refs", "cstr_refs", "find_variable", and "objc_refs" commands have been installed, use the "--help" options on these commands for detailed help.
Then, with execution stopped at the std::cout line and the address newPtr points to visible (in my case 0x000060000000c000), you can print all other pointers that point to the same address:
(lldb) ptr_refs 0x000060000000c000
0x00007ff7bfeff1f0: stack in frame #0 of thread #1: tid 0x14e058 in variable at 0x7ff7bfeff1f0:
(int *) newPtr = 0x000060000000c000
0x00007ff7bfeff1f8: stack in frame #0 of thread #1: tid 0x14e058 in variable at 0x7ff7bfeff1f8:
(int *) var = 0x000060000000c000
0x00007ff7bfeff210: stack in frame #1 of thread #1: tid 0x14e058
0x00007ff7bfeff218: stack in frame #1 of thread #1: tid 0x14e058 in variable at 0x7ff7bfeff218:
(int *) ptr2 = 0x000060000000c000
0x00007ff7bfeff220: stack in frame #1 of thread #1: tid 0x14e058 in variable at 0x7ff7bfeff220:
(int *) ptr = 0x000060000000c000
Finally, in lldb you can inspect what frame #1 actually is with the frame command:
(lldb) frame select 1
frame #1: 0x0000000100003115 CPPPlayground`main at main.cpp:19:5
16 int* ptr = new int{2};
17 delete ptr;
18 int* ptr2 = ptr;
-> 19 func(ptr);
^
20 }
(lldb) frame info
frame #1: 0x0000000100003115 CPPPlayground`main at main.cpp:19:5
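If you can edit the code (or are not on macOS), a different, swapped-in technique is to temporarily replace the raw pointer with a logging wrapper. The TracedPtr class below is hypothetical and not from any library. Every copy shares a single history, and a debugger breakpoint inside its copy constructor and copy assignment operator stops at each place the address is propagated, where `bt` gives the backtrace. This sketch assumes you can change the types along the path. Raw pointers obtained through get() escape the tracking.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical debugging aid (not from any library): wraps a raw pointer and
// appends to a log shared by all copies whenever the pointer is propagated.
// Set breakpoints in the copy constructor / copy assignment to get a
// backtrace at every propagation site.
template <typename T>
class TracedPtr {
public:
    explicit TracedPtr(T* p)
        : ptr_(p), log_(std::make_shared<std::vector<std::string>>()) {
        log_->push_back("created");
    }

    TracedPtr(const TracedPtr& other) : ptr_(other.ptr_), log_(other.log_) {
        log_->push_back("copy-constructed");   // breakpoint here
    }

    TracedPtr& operator=(const TracedPtr& other) {
        ptr_ = other.ptr_;
        log_ = other.log_;                     // join the source's history
        log_->push_back("copy-assigned");      // breakpoint here
        return *this;
    }

    T& operator*() const { return *ptr_; }
    T* get() const { return ptr_; }            // raw copies made via get() are NOT tracked
    const std::vector<std::string>& history() const { return *log_; }

private:
    T* ptr_;
    std::shared_ptr<std::vector<std::string>> log_;
};
```

Because the class declares a copy constructor, no implicit move constructor is generated, so moves also go through the logged copy path.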
Our code has just started crashing because a thread that calls a memory-allocation function loses the pointer to the memory pool.
The pointer is initialised before the threads are started, but when the thread uses it to call the allocation code, it's zero.
In our init code we have
poolptr = InitMemoryPool ()
This sets it to a non zero memory address
In our .mm code on the thread we have
unsigned char *p = (unsigned char *)MyAlloc(poolptr, amount);
When the code gets into the MyAlloc function, poolptr is 0
Do I need my poolptr pointer to be volatile? Even so, its value is set up before the thread starts and never changes, so if the compiler is assuming it's const, why doesn't it have it set correctly?
Also, this has worked fine for years and just started going wrong yesterday, simultaneously on two people's machines.
Any ideas ?
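On the volatile question, volatile is not the tool here. In standard C++, the completion of the std::thread constructor synchronizes with the start of the new thread's function, and POSIX lists pthread_create among the functions that synchronize memory. A plain global written before the thread is created is therefore visible inside it. The sketch below uses hypothetical stand-ins for InitMemoryPool and MyAlloc, because their real definitions are not shown. If the value still reads as 0 in the real code, something is probably overwriting it after initialisation. A watchpoint on poolptr (for example lldb's `watchpoint set variable poolptr`) is one way to catch that write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>

// Hypothetical stand-ins for the poster's InitMemoryPool()/MyAlloc();
// the real pool API is not shown in the question.
struct Pool { std::size_t used = 0; };

Pool* InitMemoryPool() { return new Pool{}; }

void* MyAlloc(Pool* pool, std::size_t amount) {
    if (pool == nullptr) return nullptr;   // mirrors the reported failure mode
    pool->used += amount;
    return std::malloc(amount);
}

Pool* poolptr = nullptr;  // plain global, deliberately NOT volatile

// Writes poolptr, then starts a thread that reads it.
// Returns whether the worker got memory from a non-null pool.
bool allocate_on_worker(std::size_t amount) {
    poolptr = InitMemoryPool();            // written before the thread exists
    bool ok = false;
    std::thread worker([&ok, amount] {
        // Thread creation synchronizes-with the start of this lambda, so the
        // write to poolptr above is guaranteed to be visible here.
        unsigned char* p = static_cast<unsigned char*>(MyAlloc(poolptr, amount));
        ok = (p != nullptr);
        std::free(p);
    });
    worker.join();                         // join makes the write to `ok` visible here
    delete poolptr;
    poolptr = nullptr;
    return ok;
}
```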
I don't do what you describe. What eventually worked for me is the following:
I call my function or method and, inside it, allocate local instances of a class on the heap with new. The data to be returned is allocated the same way. A new thread then has access to that heap area if the heap pointer is passed as a plain parameter, i.e. t = new thread(parameter);
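// clist, mathclass, thread and Exception are the answerer's own classes (not shown).
void* function_or_method() {
    clist *lstp;
    string *_ps;
    bool b;
    try {
        lstp = NULL;
        lstp = new clist;            // result container, returned to the caller
        _ps = new string;
        lstp->set((void *)_ps);
        mathclass *math;
        thread *_thread;
        math = new mathclass();
        if (NULL == math)            // note: plain new throws std::bad_alloc rather than returning NULL
            throw Exception();
        b = math->set(lstp);         // hand the heap data to the worker object
        if (!b) {
            throw Exception();
        }
        _thread = new thread(math);  // the new thread receives the heap pointer as its parameter
        _thread->join();
        delete _thread;
        _thread = NULL;
    } catch (const exception& e) {
        clog << "exception: logging" << endl;
    }
    return (void*)lstp;
}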
This is plain C++ with some C idioms. I hope it helps a bit.
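Here is a minimal sketch of the same pattern in standard C++. It assumes std::thread and std::unique_ptr in place of the answer's own thread, clist and mathclass classes. Result and this version of function_or_method are hypothetical stand-ins. The data is allocated on the heap, the thread receives a pointer to it as a parameter, and the caller joins before handing the result back.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>

// Hypothetical result type standing in for the answer's clist/string pair.
struct Result { std::string text; };

// Heap-allocates the result, lets a worker thread fill it in through a
// borrowed pointer, waits for the worker, then returns ownership to the caller.
std::unique_ptr<Result> function_or_method() {
    auto result = std::make_unique<Result>();
    Result* raw = result.get();                 // the thread borrows; the caller keeps ownership
    std::thread worker([raw] { raw->text = "computed on worker"; });
    worker.join();                              // the heap data must outlive the thread: join first
    return result;
}
```

Using unique_ptr instead of returning a void* makes ownership explicit, so nothing needs a manual delete.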
I'm using a lock-free stack (via tagged pointers) to manage a pool of small memory blocks. The list nodes are created and destroyed in place when the blocks are inserted into, and removed from, the pool.
This is a very simplified test program that only pops from the stack, so there is no ABA problem and no tagged pointers. It is sufficient to demonstrate the race I'm running into:
#include <atomic>
#include <list>
#include <new>
#include <thread>
#include <type_traits>
struct Node {
    Node() = default;
    Node(Node *n) { next.store(n); }
    std::atomic<Node *> next;
};
using Memory = std::aligned_storage_t<sizeof(Node)>;
struct Stack {
    bool pop_and_use() {
        for (Node *current_head = head.load(); current_head;) {
            Node *next = current_head->next.load(); // READ RACE
            if (head.compare_exchange_weak(current_head, next, std::memory_order_seq_cst)) {
                current_head->~Node();
                Memory *mem = reinterpret_cast<Memory *>(current_head);
                new (mem) int{0}; // use memory with non-atomic write (WRITE RACE)
                return true;
            }
        }
        return false;
    }
    void populate(Memory *mem, int count) {
        for (int i = 0; i < count; ++i) {
            head = new (mem + i) Node(head.load());
        }
    }
    std::atomic<Node *> head{};
};
int main() {
    Memory storage[10000];
    Stack test_list;
    test_list.populate(storage, 10000);
    std::thread worker([&test_list]() {
        while (test_list.pop_and_use()) {}
    });
    while (test_list.pop_and_use()) {}
    worker.join();
    return 0;
}
Thread sanitizer reports the following error:
clang++-10 -fsanitize=thread tsan_test_2.cpp -o tsan_test_2 -O2 -g2 -Wall -Wextra && ./tsan_test_2
LLVMSymbolizer: error reading file: No such file or directory
==================
WARNING: ThreadSanitizer: data race (pid=35998)
Atomic read of size 8 at 0x7fff48bd57b0 by thread T1:
#0 __tsan_atomic64_load <null> (tsan_test_2+0x46d88e)
#1 std::__atomic_base<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/atomic_base.h:713:9 (tsan_test_2+0x4b3e6c)
#2 std::atomic<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/atomic:452:21 (tsan_test_2+0x4b3e6c)
#3 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:17:39 (tsan_test_2+0x4b3e6c)
#4 main::$_0::operator()() const /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:40:22 (tsan_test_2+0x4b3e6c)
#5 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60:14 (tsan_test_2+0x4b3e6c)
#6 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95:14 (tsan_test_2+0x4b3e6c)
#7 decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244:13 (tsan_test_2+0x4b3e6c)
#8 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253:11 (tsan_test_2+0x4b3e6c)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196:13 (tsan_test_2+0x4b3e6c)
#10 <null> <null> (libstdc++.so.6+0xbd6de)
Previous write of size 4 at 0x7fff48bd57b0 by main thread:
#0 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:21:9 (tsan_test_2+0x4b3d5d)
#1 main /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:43:20 (tsan_test_2+0x4b3d5d)
Location is stack of main thread.
Location is global '??' at 0x7fff48bad000 ([stack]+0x0000000287b0)
Thread T1 (tid=36000, running) created by main thread at:
#0 pthread_create <null> (tsan_test_2+0x4246bb)
#1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd994)
#2 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 (libc.so.6+0x21b96)
SUMMARY: ThreadSanitizer: data race (/home/BOSDYN/akhripin/tmp/tsan_test_2+0x46d88e) in __tsan_atomic64_load
==================
ThreadSanitizer: reported 1 warnings
The problem arises when both threads read the same value of current_head, but one of them completes the pop and overwrites the node before the other has a chance to read current_head->next.
This is similar to the problem discussed in "Why would 'deleting' nodes in this lock-free stack class would cause race condition?", except that here the memory is not actually deallocated.
I know that from the machine's perspective this race is benign (if the read race occurs, the compare-and-swap will not succeed), but I think this still gets into undefined-behavior territory in C++.
Is there any way to write this code without getting a race condition?
Is there any way to annotate the code so that ThreadSanitizer ignores it? I experimented with __tsan_acquire and __tsan_release but could not find anything that worked consistently.
Update: I'm fairly convinced there is no way to perform the atomic read safely in standard C++, because the object simply no longer exists. But can I go from relying on undefined behavior to relying on implementation-defined behavior? What's the best I can do on typical architectures and toolchains (x86/ARM, gcc/clang)?
Update 2: One implementation-specific approach that seems to work is to replace the load with inline assembly:
inline Node *load_next_wrapper(Node *h) {
    Node *ret;
    asm volatile("movq (%1), %0" : "=r"(ret) : "r"(&h->next));
    return ret;
}
This is both architecture- and compiler-specific, but I think it does replace "undefined" behavior with "implementation-defined" behavior.
Tagged pointers are fine if you simply want to reuse the same nodes in the data structure, i.e. you don't destroy them but put them on a free list so they can be reused when you need a new node in the next push operation. In that case tagged pointers are sufficient to prevent the ABA problem, but they are no solution to the memory reclamation problem that you face here.
Another object of some type will be constructed in the same location. Eventually, it will be destroyed and the memory would return to the pool.
This is the real issue: you are destroying the object and reusing the memory for something else. As many others have already explained in the comments, this causes undefined behavior. I am not sure what you mean by "return to the pool"; return to the memory manager? Ignoring the UB for a moment, you are right that this race is usually benign from the hardware perspective, but if you do release the memory at some point, you could actually run into a segmentation fault (e.g. if the memory manager decides to return the memory to the OS).
How to avoid undefined behavior in this scenario
If you want to reuse the memory for something else, you have to use a memory reclamation scheme like lock-free reference counting, hazard pointers, epoch based reclamation or DEBRA. These can ensure that an object is only destroyed once it is guaranteed that all references to it have been dropped, so it can no longer be accessed by any thread.
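The sketch below makes the hazard-pointer idea concrete. It is deliberately minimal and uses a hypothetical class. It is not xenium's API and is far less general than a real implementation. It assumes a fixed number of threads, each identified by an index and owning one hazard slot and one retired list. A popping thread announces the node it is about to dereference and then re-checks the head, which relies on the default sequentially consistent atomics. Popped nodes are retired rather than reused immediately, and reclaim frees only nodes that no thread currently announces. Here reclaim calls delete. In the question's pool, the same check would gate returning the memory to the pool.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: at most kMaxThreads threads, each passing its own index.
constexpr std::size_t kMaxThreads = 4;

struct Node {
    explicit Node(int v) : value(v) {}
    int value;
    std::atomic<Node*> next{nullptr};
};

class HazardStack {
public:
    HazardStack() {
        for (auto& slot : hazards_) slot.store(nullptr);
    }

    ~HazardStack() {  // assumes no other thread is still using the stack
        while (Node* n = pop(0)) delete n;
        for (auto& list : retired_)
            for (Node* n : list) delete n;
    }

    void push(Node* n) {
        Node* old = head_.load();
        do {
            n->next.store(old);
        } while (!head_.compare_exchange_weak(old, n));
    }

    // Pops on behalf of thread `tid`. The head is announced in the hazard slot
    // before its `next` field is read, so that read never touches freed memory.
    Node* pop(std::size_t tid) {
        std::atomic<Node*>& hazard = hazards_[tid];
        Node* h = head_.load();
        while (h != nullptr) {
            hazard.store(h);                   // announce: "I am about to read h"
            if (head_.load() != h) {           // h may already have been popped
                h = head_.load();
                continue;
            }
            Node* next = h->next.load();       // h cannot be reclaimed while announced
            if (head_.compare_exchange_weak(h, next)) break;
            // on failure, h now holds the current head; the loop re-announces it
        }
        hazard.store(nullptr);
        return h;
    }

    // Instead of reusing a popped node immediately, hand it over here.
    void retire(std::size_t tid, Node* n) { retired_[tid].push_back(n); }

    // Frees every node retired by `tid` that no thread currently announces.
    // Returns how many nodes were freed.
    std::size_t reclaim(std::size_t tid) {
        std::vector<Node*> still_hazardous;
        std::size_t freed = 0;
        for (Node* n : retired_[tid]) {
            if (is_hazardous(n)) {
                still_hazardous.push_back(n);
            } else {
                delete n;
                ++freed;
            }
        }
        retired_[tid].swap(still_hazardous);
        return freed;
    }

private:
    bool is_hazardous(const Node* n) const {
        for (const auto& slot : hazards_)
            if (slot.load() == n) return true;
        return false;
    }

    std::atomic<Node*> head_{nullptr};
    std::array<std::atomic<Node*>, kMaxThreads> hazards_;
    std::array<std::vector<Node*>, kMaxThreads> retired_;
};
```

Because an announced node is never freed or reused, it also cannot reappear at the head while a thread holds it, which is what protects the pop's compare-and-swap from ABA.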
My xenium library provides C++ implementations of various reclamation schemes (including all those previously mentioned) that you could use in this situation.
As usual, we use pthread_setspecific to bind a dynamically allocated block to a global key.
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t glob_var_key; /* presumably created elsewhere with pthread_key_create() */

void do_something()
{
    //get thread specific data
    int* glob_spec_var = pthread_getspecific(glob_var_key);
    *glob_spec_var += 1;
}
void* thread_func(void *arg)
{
    int *p = malloc(sizeof(int));
    *p = 1;
    pthread_setspecific(glob_var_key, p);
    do_something();
    pthread_setspecific(glob_var_key, NULL);
    free(p);
    pthread_exit(NULL);
}
However, if I simplify thread_func to this:
void do_something(int* p)
{
    //get thread specific data
    int* glob_spec_var = p; //pthread_getspecific(glob_var_key);
    *glob_spec_var += 1;
}
void* thread_func(void *arg)
{
    int *p = malloc(sizeof(int));
    *p = 1;
    // pthread_setspecific(glob_var_key, p);
    do_something(p);
    // pthread_setspecific(glob_var_key, NULL);
    free(p);
    pthread_exit(NULL);
}
It does exactly the same thing as the previous version, and the pointer p is also different in each thread. So why do we have to bind the memory to a key rather than just keep the pointer?
You can indeed implement your own thread-specific storage by allocating it within the thread start function, then passing a pointer to it to every thread function, and freeing it on thread exit.
The advantage of the pthread_setspecific() / pthread_getspecific() interface is that it lets you avoid that bookkeeping. In particular, passing the pointer to your thread-specific storage down through all of your code paths, in case some leaf function needs it, is quite onerous.
It also means that library code can access thread-local storage without requiring the library user to set it up and pass it in to the library on every call.
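Here is a minimal sketch of that library case. The names lib_counter_increment and count_on_new_thread are hypothetical. The library creates its key once with pthread_once and lazily allocates each thread's storage. It also registers a destructor with pthread_key_create, so the storage is freed when a thread exits. Callers never pass a pointer in. Key destructors run when a thread exits, but not for the main thread when the process ends by returning from main. C11 _Thread_local and C++11 thread_local offer a language-level alternative.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>

// Hypothetical library-internal state: the caller never sees or passes it.
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void free_counter(void* p) { delete static_cast<int*>(p); }  // runs at thread exit
static void make_key() { pthread_key_create(&counter_key, free_counter); }

// Hypothetical library function: bumps a per-thread counter without any
// pointer being threaded through the caller's code.
int lib_counter_increment() {
    pthread_once(&counter_once, make_key);
    int* c = static_cast<int*>(pthread_getspecific(counter_key));
    if (c == nullptr) {                 // first call on this thread
        c = new int(0);
        pthread_setspecific(counter_key, c);
    }
    return ++*c;
}

// Runs `times` increments on a fresh thread and returns that thread's final count.
int count_on_new_thread(int times) {
    int last = 0;
    std::thread t([&last, times] {
        for (int i = 0; i < times; ++i) last = lib_counter_increment();
    });
    t.join();
    return last;
}
```

Each thread starts from its own zero, so counts never leak between threads even though no state is passed around.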
It seems the breakpoint is due to heap corruption.
Here's a snapshot of two frames from the call stack:
First:
void QString::free(Data *d)
{
#ifdef QT3_SUPPORT
    if (d->asciiCache) {
        QMutexLocker locker(asciiCacheMutex());
        Q_ASSERT(asciiCache);
        asciiCache->remove(d);
    }
#endif
    qFree(d);//Breakpoint here, d = 0x08c9efd4
}
Second:
void qFree(void *ptr)
{
    ::free(ptr); //Breakpoint here, ptr = 0x00000000
}
What confuses me is that the pointer is 0x08c9efd4 before it is passed to qFree and suddenly becomes NULL inside qFree.
What could cause the pointer to change suddenly?
I am creating 5 threads here using ThrdFunc. Each thread updates the list box.
I was expecting messages like the following, and initially they do come this way:
Thread1:Adding msg
Thread2:Adding msg
Thread3:Adding msg
But after some time I get messages like:
Thread0:Adding msg
Thread18967654:Adding msg
Thread18967654:Adding msg
Thread18967654:Adding msg
This is the code:
for (int i = 0; i < 6; i++)
{
    nThreadNo = i + 1;
    hWndProducer[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)ProducerThrdFunc, (void*)&nThreadNo, 0, &dwProducerThreadID[i]);
    if (hWndProducer[i] == NULL)
    {
        //ErrorHandler(TEXT("CreateThread"));
        ExitProcess(3);
    }
}
DWORD WINAPI ThrdFunc(LPVOID n)
{
    int *nThreadNo = (int*)n;
    char chThreadNo[3];
    memset(chThreadNo, 0, 3);
    while (1)
    {
        itoa(*nThreadNo, chThreadNo, 10);
        char* pMsg1 = new char[100];
        char* pMsg2 = new char[100];
        memset(pMsg1, 0, 100);
        memset(pMsg2, 0, 100);
        strcpy(pMsg1, " Thread No:");
        strcat(pMsg1, chThreadNo);
        strcat(pMsg1, " Adding Msg:");
        PostMessage(stThreadInfoProd.hWndHandle, UWM_ONUPDATEPRODUCERLIST, (WPARAM)pMsg1, 0);
    }
    return 0;
}
Most likely nThreadNo is allocated on the stack. You're giving each thread a pointer to one of its elements.
Once the function creating the threads returns, the array is no longer valid, but the thread functions are still pointing to it. The memory the threads are holding pointers to will most likely be overwritten, causing what was originally the thread ID to be overwritten with garbage.
Anything you pass to another thread should generally be allocated on the heap, either via malloc-type functions or new, preferably new since this is C++.
For example, instead of int nThreadNo[6], use int* nThreadNo = new int[6]. However, keep in mind that you will have to delete[] the memory nThreadNo points to when you're done with it.
Well, I can't be sure because you've not given all your code.
However, it looks like nThreadNo is a local variable, defined on the stack of the main thread. You are passing the address of this variable to the threads, but you should be passing the value, or passing some heap allocated memory.
What you are doing is morally equivalent to returning from a function a pointer to a local variable, e.g.
int* foo()
{
    int i;
    return &i;
}
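The simplest way to make your code behave is to pass the value itself rather than its address (casting through INT_PTR avoids pointer-truncation warnings on 64-bit builds):
CreateThread(..., (LPVOID)(INT_PTR)nThreadNo, ...
int nThreadNo = (int)(INT_PTR)n;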
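Here is a portable sketch of the pass-by-value fix. It uses std::thread in place of CreateThread so it runs outside Windows, and start_numbered_workers is a hypothetical name. Each worker captures its own number by value, and the messages are collected under a mutex instead of being posted to a window. No worker can see another iteration's value or a dead stack frame.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Starts `count` workers; each receives its own number by value.
// Returns the messages sorted so the result order is deterministic.
std::vector<std::string> start_numbered_workers(int count) {
    std::vector<std::string> messages;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        int thread_no = i + 1;                    // copied into the lambda below
        workers.emplace_back([thread_no, &messages, &m] {
            std::lock_guard<std::mutex> lock(m);  // serialize access to `messages`
            messages.push_back("Thread No:" + std::to_string(thread_no) + " Adding Msg:");
        });
    }
    for (auto& w : workers) w.join();             // all workers finish before we return
    std::sort(messages.begin(), messages.end());
    return messages;
}
```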
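nThreadNo has to outlive the threads (for example, by being global) because you are giving your new threads a pointer to it. Note that a single shared variable is still overwritten on every loop iteration, so each thread should get its own copy or its own value.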