Mysterious C++ threading crash - c++

The following code strangely crashes when entering the run function. None of the printfs trigger, and single-stepping into the function will cause a crash. Is it undefined behaviour, or a compiler bug? I'm on MacOS compiling with Apple clang 12.0.0. It crashes with EXC_BAD_ACCESS (code = 2).
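#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>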
struct Chunk {
// the array must be this big
// other array sizes don't trigger the bug consistently
uint8_t array[524288];
};
std::thread generator_thread;
std::mutex generator_lock;
std::vector<Chunk*> to_generate;
std::condition_variable data_ready;
std::atomic<bool> running;
void begin();
void run();
void begin() {
running = true;
auto func = [] {
run();
};
generator_thread = std::thread(func);
}
void run() {
printf("Running in generator\n");
while (running) {
printf("Running in loop\n");
Chunk *task;
// take a chunk from the queue
{
std::unique_lock<std::mutex> lock(generator_lock);
data_ready.wait(lock, [] { return to_generate.size() > 0 || !running; });
if (!running) break;
task = to_generate.back();
to_generate.pop_back();
}
printf("deref chunk\n");
// Despite never being executed in this example, this line merely existing
// will cause a crash when entering the run function.
Chunk chunk = *task;
// *task.chunk;
// ^^^ Only having the line above will not trigger the bug; it must be assigned
}
}
int main(int argc, const char *argv[]) {
begin();
while (true) {
printf("run\n");
}
return 0;
}

So, when you change your function to pre-reserve a stack frame with space for a half-megabyte object ... it crashes right at the start of the function when setting up that stack frame?
That's probably because you made sizeof Chunk equal to the entire default stack size of a secondary thread on macOS, which is 512 KB. If you single-step into the function at the instruction level, you should be able to see the instruction that triggers the fault, and it will likely be part of the stack frame setup in the function prologue.
All of this is implementation-specific (the stack frames, the per-thread stack size), but putting really big things on the stack is generally a bad idea.
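A minimal sketch of one way to keep the large object off the thread's stack, assuming a heap allocation is acceptable here. The helper name copy_chunk_to_heap is hypothetical and not part of the original code; inside run(), the line Chunk chunk = *task; would become a call to it.

```cpp
#include <cstdint>
#include <memory>

struct Chunk {
    std::uint8_t array[524288];  // 512 KB, same size as in the question
};

// Hypothetical helper: copies the chunk into heap storage, so the calling
// function's stack frame only needs room for a pointer, not 512 KB.
std::unique_ptr<Chunk> copy_chunk_to_heap(const Chunk& source) {
    return std::make_unique<Chunk>(source);
}
```

The copy is released automatically when the returned std::unique_ptr goes out of scope, so no explicit delete is needed.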

Related

Expanding on where to put a critical section when dealing with loader threads and large amounts of data

I've read tutorials on the subject and watched YouTube videos on it, and I think I understand the reasoning behind why critical sections are used. There are a lot of reasons, but one of them is that modern multi-core CPUs have internal caches to increase performance (L1 and L2). This might result in one core reading old information if stale memory is stored in the cache associated with that core. Adding critical sections forces those caches to be refreshed.
I'm trying to increase my understanding of where this critical section has to be put, which I feel most online information sources fail to explain well. That's why I'm asking you pros here! :)
Let me give you a short example:
#include <cstdio>
#include <iostream>
#include <thread>
#include <mutex>
#include <list>
static constexpr auto OneMegabyte = 1 * 1024 * 1024;
struct Result
{
char data[OneMegabyte];
};
std::mutex mutex;
std::list<Result*> results;
void write_lots_of_data(Result* result)
{
auto f = fopen("large_file.txt", "rb");
fread(result->data, OneMegabyte, 1, f);
fclose(f);
}
void read_lots_of_data(Result* result)
{
//
}
void thread()
{
auto result = new Result();
// Writes one megabyte of data from somewhere into Memory::values
write_lots_of_data(result);
std::lock_guard<std::mutex> l(mutex);
results.push_back(result);
}
int main(int argc, char** argv)
{
std::thread t(&thread);
while (true)
{
std::lock_guard<std::mutex> l(mutex);
if (results.empty())
continue;
auto first_result = results.begin();
read_lots_of_data(*first_result);
results.erase(first_result);
break;
}
return 0;
}
Access to results is protected from being read and written at the same time - this I understand. But what about the actual memory put into the results list? Do I have to put the critical section before the write_lots_of_data method to be safe or is it enough to safely protect the results list?
Your code is correct.
The mutex is guarding the list and not the data contained within the list node. Therefore, as you have written it, the filling of the data into the Result struct is unprotected and does not need protection.
When reading, only the list needs protection. The reading of the results does not need protection. Your code should work as written. It can be improved in one way:
Result* get_result()
{
std::lock_guard<std::mutex> l(mutex);
if (not results.empty())
{
Result* first_result = results.front();
results.pop_front();
return first_result;
}
return nullptr;
}
int main(int argc, char** argv)
{
std::thread t(&thread);
while (true)
{
Result* first_result = get_result();
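if (first_result)
{
read_lots_of_data(first_result);
delete first_result;
break; // stop after the first result, as in the original loop
}
}
t.join(); // a joinable std::thread must be joined before it is destroyed
return 0;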
}
Limiting the lock to operate only on the list would help both performance and readability. The reader will realize that only the list is being locked and not the reading operation of the result.
Before you go too far, consider using std::unique_ptr<Result> as well in your code rather than new/delete.
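A minimal sketch of what the std::unique_ptr<Result> suggestion could look like, assuming the list itself owns the results. The names results_mutex and push_result are illustrative and not from the original code. As before, the lock covers only the list operations, never the filling or reading of the data.

```cpp
#include <list>
#include <memory>
#include <mutex>
#include <utility>

static constexpr auto OneMegabyte = 1 * 1024 * 1024;

struct Result {
    char data[OneMegabyte];
};

std::mutex results_mutex;                     // guards only the list below
std::list<std::unique_ptr<Result>> results;

// Producer side: fill the Result first, then hand ownership to the list.
void push_result(std::unique_ptr<Result> result) {
    std::lock_guard<std::mutex> lock(results_mutex);
    results.push_back(std::move(result));
}

// Consumer side: take ownership of the oldest result, or nullptr if none.
std::unique_ptr<Result> get_result() {
    std::lock_guard<std::mutex> lock(results_mutex);
    if (results.empty()) {
        return nullptr;
    }
    std::unique_ptr<Result> first = std::move(results.front());
    results.pop_front();
    return first;
}
```

The consumer no longer needs delete: the Result is freed when the returned pointer goes out of scope.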

Object in stack of `main` function is overwritten when first task runs (FreeRTOS)

I try to explain my problem with a simple example
typedef function<bool()> TaskCallback;
class Task
{
public:
Task(TaskCallback task_callback) : task_callback(task_callback)
{
long_string_test = "This is a long string 0123456789ABCDEF 0123456789ABCDEF 0123456789ABCDEF";
xTaskCreate(Task::RunTask, "task_name", 2560, this, 3, &task_handle);
}
~Task()
{
while(1); //Breakpoint: The destructor is never called
}
private:
static void RunTask(void* params)
{
Task* _this = static_cast<Task*>(params);
_this->task_callback(); //The program crashes here because task_callback doesn't exist
}
string long_string_test;
TaskCallback task_callback;
TaskHandle_t task_handle;
};
main.cpp
static bool Init_task() { return true; }
void main()
{
Task task(Init_task);
vTaskStartScheduler();
//We should never get here as control is now taken by the FreeRTOS scheduler
while(1);
}
If I check the value of the string long_string_test through the debugger in the RunTask function, I find that it has a strange value, as if the string had been destroyed.
But the destructor of Task class was never called.
If I change the "main.cpp" as below the program works correctly, I think the compiler does some sort of optimization:
static bool Init_task() { return true; }
Task task(Init_task);
void main()
{
vTaskStartScheduler();
//We should never get here as control is now taken by the FreeRTOS scheduler
while(1);
}
p.s. obviously compiler optimizations are disabled
As part of the vTaskStartScheduler call, prvPortStartFirstTask will reset the stack pointer. I can imagine that this will eventually cause other code to overwrite parts of the Task object on the discarded stack space allocated for main. You could set a data breakpoint with the debugger, but I would consider the main stack space trashed when the first task starts.
I think the best solution here is indeed to allocate the Task object statically or possibly with a heap allocation (if your system allows it).
@Botje You're right. I changed my example to verify what you said.
main.cpp
int* test;
static void RunTask(void* params)
{
Print(*test); //The "test" pointer has a random value
}
void main()
{
int temp = 9999;
test = &temp;
xTaskCreate(RunTask, "task_name", 2560, NULL, 3, NULL);
vTaskStartScheduler(); //It seems that FreeRTOS clears the main() stack
//We should never get here as control is now taken by the FreeRTOS scheduler
while(1);
}

How to correctly handle SIGBUS so I can continue to search an address?

I'm currently working on a project running on a heavily modified version of Linux patched to be able to access a VMEbus. Most of the bus-handling is done, I have a VMEAccess class that uses mmap to write at a specific address of /dev/mem so a driver can pull that data and push it onto the bus.
When the program starts, it has no idea where the slave board it's looking for is located on the bus so it must find it by poking around: it tries to read every address one by one, if a device is connected there the read method returns some data but if there isn't anything connected a SIGBUS signal will be sent to the program.
I tried several solutions (mostly using signal handling) but after some time, I decided on using jumps. The first longjmp() call works fine but the second call to VMEAccess::readWord() gives me a Bus Error even though my handler should prevent the program from crashing.
Here's my code:
#include <iostream>
#include <string>
#include <sstream>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <csetjmp>
#include "types.h"
#include "VME_access.h"
VMEAccess *busVME;
int main(int argc, char const *argv[]);
void catch_sigbus (int sig);
void exit_function(int sig);
volatile BOOL bus_error;
volatile UDWORD offset;
jmp_buf env;
int main(int argc, char const *argv[])
{
struct sigaction sigIntHandler;
sigIntHandler.sa_handler = exit_function;
sigemptyset(&sigIntHandler.sa_mask);
sigIntHandler.sa_flags = 0;
sigaction(SIGINT, &sigIntHandler, NULL);
/* */
struct sigaction sigBusHandler;
sigBusHandler.sa_handler = catch_sigbus;
sigemptyset(&sigBusHandler.sa_mask);
sigBusHandler.sa_flags = 0;
sigaction(SIGBUS, &sigBusHandler, NULL);
busVME = new VMEAccess(VME_SHORT);
offset = 0x01FE;
setjmp(env);
printf("%d\n", sigismember(&sigBusHandler.sa_mask, SIGBUS));
busVME->readWord(offset);
sleep(1);
printf("%#08x\n", offset+0xC1000000);
return 0;
}
void catch_sigbus (int sig)
{
offset++;
printf("%#08x\n", offset);
longjmp(env, 1);
}
void exit_function(int sig)
{
delete busVME;
exit(0);
}
As mentioned in the comments, using longjmp in a signal handler is a bad idea. After jumping out of a signal handler, your program is effectively still in the signal handler, so calling any non-async-signal-safe function afterwards leads to undefined behavior. Using siglongjmp won't really help here; quoting man signal-safety:
If a signal handler interrupts the execution of an unsafe function, and the handler terminates via a call to longjmp(3) or siglongjmp(3) and the program subsequently calls an unsafe function, then the behavior of the program is undefined.
And just for example, this (siglongjmp) did cause some problems in libcurl code in the past, see here: error: longjmp causes uninitialized stack frame
I'd suggest to use a regular loop and modify the exit condition in the signal handler (you modify the offset there anyway) instead. Something like the following (pseudo-code):
volatile sig_atomic_t had_sigbus = 0; // written from the signal handler
int main(int argc, char const *argv[])
{
...
for (offset = 0x01FE; offset is sane; ++offset) {
had_sigbus = 0;
probe(offset);
if (!had_sigbus) {
// found
break;
}
}
...
}
void catch_sigbus(int)
{
had_sigbus = 1;
}
This way it's immediately obvious that there is a loop, and the whole logic is much easier to follow. And there are no jumps, so it should work for more than one probe :) But obviously probe() must also handle the failed call (the one interrupted by SIGBUS) internally, and probably return an error. If it does return an error, the had_sigbus flag might not be necessary at all.
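The loop-and-flag idea can be sketched as follows. This is only a simulation under stated assumptions: probe() is a hypothetical stand-in for VMEAccess::readWord() that uses raise(SIGBUS) for every offset except a pretend device offset, so the handler can return normally. A real bus fault would still need the failed access to be handled inside probe(), as noted above. The names install_sigbus_handler and find_device are also illustrative.

```cpp
#include <signal.h>
#include <cstdint>

// Written by the handler; volatile sig_atomic_t is safe to set from a handler.
static volatile sig_atomic_t had_sigbus = 0;

extern "C" void catch_sigbus(int) { had_sigbus = 1; }

// Installs the handler with sigaction(); returns true on success.
bool install_sigbus_handler() {
    struct sigaction sa {};
    sa.sa_handler = catch_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGBUS, &sa, nullptr) == 0;
}

// Hypothetical stand-in for the real bus read: "nothing connected" is
// simulated by raising SIGBUS synchronously.
static void probe(std::uint32_t offset, std::uint32_t device_offset) {
    if (offset != device_offset) {
        raise(SIGBUS);
    }
}

// Returns the first offset in [first, last] that probes cleanly, or -1.
long find_device(std::uint32_t first, std::uint32_t last,
                 std::uint32_t device_offset) {
    for (std::uint32_t offset = first; offset <= last; ++offset) {
        had_sigbus = 0;
        probe(offset, device_offset);
        if (!had_sigbus) {
            return static_cast<long>(offset);  // found
        }
    }
    return -1;
}
```

Calling install_sigbus_handler() once and then find_device(0x01FE, 0x0210, 0x0205) walks the offsets one by one and stops at the first one that does not set the flag.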

Use of set_jmp/longjmp in C++ is not working

I am trying to implement a simple user-level thread library in C. One thread starts, and this thread calls a second thread. The second thread runs correctly, but when it exits the program crashes. Here is my code.
//**********************************************
#include <setjmp.h>
typedef void *any_ptr;
/* Define Boolean type, and associated constants. */
typedef int Boolean;
typedef void (*ThreadFunc)(any_ptr);
#define TRUE ((Boolean)1)
#define FALSE ((Boolean)0)
typedef struct TheadSystem
{
queue<any_ptr> readyQ;
// currently executing thread
jmp_buf lastContext; // location on which the system jumps after all threads have exited
char name[30]; // optional name string of a thread, may be used for debugging
jmp_buf context; // saved context of this thread
signal_t *sig; // signal that wakes up a waiting thread
ThreadFunc func; // function that this thread started executing
any_ptr arg;
}TheadSystem;
void t_start(ThreadFunc f, any_ptr v, char *name);
void t_yield();
void block();
void unblock();
void t_sig(Condition cond, any_ptr val, Boolean queue_signal);
void t_fork(ThreadFunc f, any_ptr v, char *name);
void t_exit(int val);
My implementation of threads.h
#include "threads.h"
#include<iostream>
#include<queue>
using namespace std;
TheadSystem th;
queue<any_ptr> blocked_queue;
jmp_buf set_env,ready_env,yeild_buf;
void t_start(ThreadFunc f, any_ptr v, char *name){
if(!th.ready_queue.empty()){
cout<<"sorry thread already started now you have to create by t_fork:"<<endl;
}
else{
th.ready_queue.push(th.context);
if(!setjmp(th.context)){
memcpy(th.lastContext,th.context,sizeof(jmp_buf));
th.arg=v;
th.func=f;
//memcpy(th.currentThread->context,set_env,sizeof(jmp_buf));
//cout<<"when jmp buf set then:"<<endl;
th.ready_queue.push(th.context);
th.func(th.arg);
}
//cout<<"after come back from long jump:"<<endl;
}
}
void t_yield(){
jmp_buf *j=(jmp_buf *)th.ready_queue.front();
th.ready_queue.front()=th.context;
longjmp(*j,2);
}
void t_fork(ThreadFunc f, any_ptr v, char *name){
memcpy(th.lastContext,th.context,sizeof(jmp_buf));
if(!setjmp(th.context)){
f(v);
th.ready_queue.push(th.context);
}else
{
}
}//end of t_fork
void t_exit(int val){
cout<<"before long jump in t_exit"<<endl;
jmp_buf *j=(jmp_buf *)th.ready_queue.front();
th.ready_queue.pop();
longjmp(*j,2);
}
void block(){
blocked_queue.push(th.context);
jmp_buf *j=(jmp_buf *)th.ready_queue.front();
th.ready_queue.pop();
longjmp(*j,2);
}
void unblock(){
th.ready_queue.push(th.context);
jmp_buf *j=(jmp_buf *)blocked_queue.front();
blocked_queue.pop();
longjmp(*j,2);
}
my test case is
#include<iostream>
#include<setjmp.h>
#include<stdio.h>
#include "threads.h"
#include<queue>
using namespace std;
void fun2(any_ptr v){
cout<<"in 2nd function:"<<endl;
t_exit(0);
}
void helloworld(any_ptr v){
cout<<"in hello world from start"<<endl;
t_fork(fun2,NULL,"no value");
cout<<"after second thread:"<<endl;
cout<<"before exit"<<endl;
t_exit(0);
}
void main(){
cout<<"1 start"<<endl;
t_start(helloworld, NULL, "my function");
cout<<"main function"<<endl;
}//end of void main
Here is one problem:
In the t_start function you do this:
th.ready_queue.push(th.context);
The ready_queue is a queue of pointers, but th.context is not a pointer.
Then in the t_yield function you do
jmp_buf *j=(jmp_buf *)th.ready_queue.front();
So you push non-pointer objects and pop them as pointers. If you try to access a non-pointer object as a pointer, you have undefined behavior.
Your code, if it compiles without errors, should at least give you a lot of warnings, and if you only get a few warnings then I suggest you enable more. When the compiler gives you a warning, it's often a sign that you are doing something you should not be doing, such as something that leads to undefined behavior. Just silencing the warnings by e.g. type-casting is a very bad solution, as it doesn't actually fix the cause of the warning.
Also, using void* is a strong sign that bad code is coming. Don't use it if you can avoid it, and in this case it's really not needed in most of the places you use it (like the ready_queue).
There are SEVERAL problems with this code, some of which Joachim Pileborg points out.
Another problem is that you only have one context, which you are using multiple times to store different data, yet expect the data to be there when you come back.
The solution is to split your ThreadSystem and your Thread (the actual context of a thread) into separate objects:
struct Thread
{
jmp_buf context; // saved context of this thread
void* arg;
ThreadFunc func; // function that this thread started executing
};
After removing stuff that isn't currently used, the ThreadSystem looks like this:
struct ThreadSystem
{
queue<Thread*> ready_queue;
};
The thread creation/exit functions now look like this:
void t_start(ThreadFunc f, void* v)
{
if(!sys.ready_queue.empty()){
cout<<"sorry thread already started now you have to create by t_fork:"<<endl;
}
else{
Thread* th = new Thread;
sys.ready_queue.push(th);
if(!setjmp(th->context)){
th->arg=v;
th->func=f;
cout << "&th->context=" << &th->context << endl;
th->func(th->arg);
}
}
}
void t_fork(ThreadFunc f, void* v){
Thread* th = new Thread;
th->func = f;
th->arg = v;
if(!setjmp(th->context))
{
cout << "&th->context=" << &th->context << endl;
f(v);
sys.ready_queue.push(th);
}
}//end of t_fork
void t_exit(int val){
cout<<"before long jump in t_exit"<<endl;
Thread* th=sys.ready_queue.front();
sys.ready_queue.pop();
// Memory leak here. We can't delete `th`, and still have a context.
longjmp(th->context,2);
}
But as you can see, there is a problem in destroying the thread - so some other solution would have to be found for this. I'm not sure this is a great solution, but this works (to the limited degree of executing the test-code posted), where the original code didn't.
OK. My first pass at this was inadequate as I didn't spend sufficient time understanding the original code.
The code is buggy and messy, but probably fixable. When you push th.context onto the ready_queue you need to save the whole buffer, not just the buffer address. Probably many other problems.
Update 1
Solved first problem by wrapping the jmp_buf in a struct declaration and then making ready_queue and blocked_queue queues of structs. Then a simple assign will transfer the buffer contents.
struct SJBuff
{
jmp_buf jb;
};
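A small sketch of why the wrapper helps, assuming only the SJBuff struct above: jmp_buf is an array type and cannot be assigned or stored in a queue by value, while a struct containing it can. The function name queue_roundtrip_preserves_buffer and the fill pattern are illustrative only; no setjmp or longjmp is performed here.

```cpp
#include <csetjmp>
#include <cstring>
#include <queue>
#include <type_traits>

struct SJBuff {
    std::jmp_buf jb;
};

// The wrapper is copyable by plain assignment, which is what lets
// std::queue<SJBuff> store whole buffers instead of addresses.
static_assert(std::is_copy_assignable<SJBuff>::value,
              "SJBuff must be copyable by assignment");

// Pushes a buffer by value, pops it again, and checks the bytes survived.
bool queue_roundtrip_preserves_buffer() {
    SJBuff original;
    std::memset(&original, 0x5A, sizeof original);  // placeholder contents

    std::queue<SJBuff> ready_queue;
    ready_queue.push(original);            // copies the entire buffer

    SJBuff restored = ready_queue.front(); // copies it back out
    ready_queue.pop();
    return std::memcmp(&original, &restored, sizeof original) == 0;
}
```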
Second problem: in t_start(), don't push th.context before it is first initialised.
else
{
// remove this line
// th.readyQ.push(th.context);
if(!setjmp(th.context.jb))
{
End Update 1
Notwithstanding that, I really cannot recommend using setjmp(). Modern architectures have moved on, and just saving a few registers does not really capture enough state. I shudder to think what an optimizing compiler might do to your code. Pipelining, conditional execution, lazy evaluation, extra registers, unscheduled system interrupts, ...
If you focus on your real objectives, there is probably a better way to do it.

Thread Safe queue in C++

Is this the correct way to make a Thread Safe Queue in C++ which can handle unsigned char* arrays of binary data?
Notice that the data is produced from the main thread and not from a created pthread, which makes me question whether the pthread_mutex_t will actually work correctly on the push and pop.
Thread Safe Queue
#include <queue>
#include <pthread.h>
class ts_queue
{
private:
std::queue<unsigned char*> _queue_;
pthread_mutex_t mutex;
pthread_cond_t cond;
public:
ts_queue()
{
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond, NULL);
}
void push(unsigned char* data)
{
pthread_mutex_lock(&mutex);
_queue_.push(data);
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
}
void pop(unsigned char** popped_data)
{
pthread_mutex_lock(&mutex);
while (_queue_.empty() == true)
{
pthread_cond_wait(&cond, &mutex);
}
*popped_data = _queue_.front();
_queue_.pop();
pthread_mutex_unlock(&mutex);
}
};
CONSUMER TEST:
void *consumer_thread(void *arguments)
{
ts_queue *tsq = static_cast<ts_queue*>(arguments);
while (true)
{
unsigned char* data = NULL;
tsq->pop(&data);
if (data != NULL)
{
// Eureka! Received from the other thread!!!
// Delete it so memory keeps free.
// NOTE: In the real scenario for which I need
// this class, the data received are bitmap pixels
// and at this point it would be processed
delete[] data;
}
}
return 0;
}
PRODUCER TEST:
int main()
{
ts_queue tsq;
// Create the consumer
pthread_t consumer;
pthread_create(&consumer, NULL, consumer_thread, &tsq);
// Start producing
while(true)
{
// Push data.
// Expected behaviour: memory should never run out, as the
// consumer should receive the data and delete it.
// NOTE: test_data in the real purpose scenario for which I
// need this class would hold bitmap pixels, so it's meant to
// hold binary data and not a string
unsigned char* test_data = new unsigned char [8192];
tsq.push(test_data);
}
return 0;
}
How do you know the consumer never gets the data? When I try your program out, I get a segmentation fault, and GDB tells me the consumer did get a pointer, but it's an invalid one.
I believe your problem is a data race on the _queue_ member. push() calls _queue_.push(data) (a write to _queue_) while holding push_mutex, and pop() calls _queue_.front() (a read of _queue_) and _queue_.pop() (another write to _queue_) while holding pop_mutex. Because these are two different mutexes, push() and pop() can run at the same time, so both threads end up writing (and reading) _queue_ concurrently: a classic data race.
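One way to rule out that race is to guard every access to the underlying queue with a single mutex. The following is a sketch using the standard library rather than raw pthreads, which is a substitution and not the asker's code. The class name TsQueue and the Buffer alias are illustrative, and owning the data through std::unique_ptr<unsigned char[]> is an assumption about how the buffers are managed.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

class TsQueue {
public:
    using Buffer = std::unique_ptr<unsigned char[]>;

    // Takes ownership of the buffer and wakes one waiting consumer.
    void push(Buffer data) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(data));
        }
        cond_.notify_one();
    }

    // Blocks until a buffer is available, then returns ownership of it.
    Buffer pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        Buffer data = std::move(queue_.front());
        queue_.pop();
        return data;
    }

private:
    std::queue<Buffer> queue_;      // only touched while mutex_ is held
    std::mutex mutex_;              // the single lock shared by push and pop
    std::condition_variable cond_;
};
```

It does not matter whether push() is called from the main thread or from a created thread; the mutex works the same either way, and the consumer frees each buffer simply by letting the returned Buffer go out of scope.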