gcov not working with pthreads, WSL-2, Clion

gcov not working with pthreads, WSL-2, Clion - c++

Background:
This seems to work fine if I don't use any threads or just spawn 1 thread, which makes this all the more confusing.
Clion Project here
Problem:
I set up a basic example project that starts 2 threads and does some printing to the console from main thread, thread 2, and thread 3.
#include <iostream>
#include <thread>
void thread1()
{
for(int i = 0; i < 10000; i++)
{
std::cout << "thread1" << std::endl;
}
}
void thread2()
{
for(int i = 0; i < 10000; i++)
{
std::cout << "thread2" << std::endl;
}
}
int main()
{
std::cout << "Hello, World!" << std::endl;
std::thread threadObj(thread1);
std::thread threadObj2(thread2);
for(int i = 0; i < 10000; i++)
{
std::cout<<"MainThread"<<std::endl;
}
threadObj.join();
std::cout<<"Exit of Main function"<<std::endl;
return 0;
}
Compiling using:
--coverage -pthread -g -std=gnu++2a
When I run in clion using "Run 'EvalTest' with Coverage", I get the following error:
Could not find code coverage data
So it's not producing the gcov files needed, but it works fine if I comment out the following line of code:
int main()
{
std::cout << "Hello, World!" << std::endl;
std::thread threadObj(thread1);
// std::thread threadObj2(thread2);
for(int i = 0; i < 10000; i++)
{
std::cout<<"MainThread"<<std::endl;
}
threadObj.join();
std::cout<<"Exit of Main function"<<std::endl;
return 0;
}

Needed to do threadObj.join() and threadObj2.join(). So code looks like:
int main()
{
std::cout << "Hello, World!" << std::endl;
std::thread threadObj(thread1);
std::thread threadObj2(thread2);
for(int i = 0; i < 10000; i++)
{
std::cout<<"MainThread"<<std::endl;
}
threadObj.join();
threadObj2.join(); // need to join both thread for gcov to work properly
std::cout<<"Exit of Main function"<<std::endl;
return 0;
}

Related

Atomics and Multi-Threading in C++

So I'm messing around with atomic and thread and made this program that reads and writes to an array based on whether an atomic is engaged.
When it compiles, however, the outputs seems to vary.
EDIT: by vary, I mean "k" and "out" are not identical on compilation.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
std::atomic<bool> busy (false);
std::atomic_int out (0);
void do_thing(std::vector<int>* inputs) {
while(!busy) {
std::this_thread::yield();
}
int next = inputs->back();
inputs->pop_back();
out.store(out.load(std::memory_order_relaxed) + next, std::memory_order_relaxed);
}
int main(void) {
std::vector<int> inputs;
for(int i = 0; i < 100; i++) inputs.push_back(rand() % 10);
// base
int k = 0;
for(auto& el : inputs) k += el;
std::cout << k << std::endl;
// threaded
try {
std::vector<std::thread> threads;
for(int i = 0; i < 100; i++) threads.push_back(std::thread(do_thing, &inputs));
busy = true;
for (auto& th : threads) th.join();
std::cout << out.load(std::memory_order_relaxed);
std::cout << std::endl;
}
catch (const std::exception& ex) {
std::cout << ex.what() << std::endl;
}
return 0;
}
Could anyone point out what is happening?
I have a feeling this could be done better with a mutex, but was curious about why it happens anyhow.
Cheers,
K

Parallel programming for tasks in main function - c++

is it possible to define two tasks and let them work parallel in C++? I found something about parallel funtcions, but not about parallel tasks in main function like that:
int main()
// task 1
int a = 0;
for(int i = 0; i < 150; i++){
a++;
std::cout << a << std::endl;
// do more stuff
}
// task 2
int b = 0;
for(int i = 0; i < 150; i++){
b++;
std::cout << b << std::endl;
// do more stuff
}
}
Race conditions etc. can't occur.
Thank's for helping !

Yes, you can run the two in parallel, using std::async and lambdas. Here is an example:
int main()
{
auto f1 = std::async(std::launch::async, [](){
// task 1
int a = 0;
for(int i = 0; i < 10; i++){
a++;
std::cout << a << std::endl;
// do more stuff
}
});
auto f2 = std::async(std::launch::async, [](){
// task 2
int b = 100;
for(int i = 0; i < 10; i++){
b++;
std::cout << b << std::endl;
// do more stuff
}
});
f1.wait();
f2.wait();
}
(You'll get messed console output from this, because you need to guard access to the console with mutex or another similar resource.)

How do I correctly use std::mutex in C++ without deadlocks and/or races?

I am trying to debug a program that I am trying to run in parallel. I am at a loss for why I have both deadlocks and race conditions when I attempt to compile and run the code in C++. Here is all the relevant code that I have written thus far.
// define job struct here
// define mutex, condition variable, deque, and atomic here
std::deque<job> jobList;
std::mutex jobMutex;
std::condition_variable jobCondition;
std::atomic<int> numberThreadsRunning;
void addJobs(...insert parameters here...)
{
job current = {...insert parameters here...};
jobMutex.lock();
std::cout << "We have successfully acquired the mutex." << std::endl;
jobList.push_back(current);
jobCondition.notify_one();
jobMutex.unlock();
std::cout << "We have successfully unlocked the mutex." << std::endl;
}
void work(void) {
job* current;
numberThreadsRunning++;
while (true) {
std::unique_lock<std::mutex> lock(jobMutex);
if (jobList.empty()) {
numberThreadsRunning--;
jobCondition.wait(lock);
numberThreadsRunning++;
}
current = &jobList.at(0);
jobList.pop_front();
jobMutex.unlock();
std::cout << "We are now going to start a job." << std::endl;
////Call an expensive function for the current job that we want to run in parallel.
////This could either complete the job, or spawn more jobs, by calling addJobs.
////This recursive behavior typically results in there being thousands of jobs.
std::cout << "We have successfully completed a job." << std::endl;
}
numberThreadsRunning--;
std::cout << "There are now " << numberThreadsRunning << " threads running." << std::endl;
}
int main( int argc, char *argv[] ) {
//Initialize everything and add first job to the deque.
std::thread jobThreads[n]
for (int i = 0; i < n; i++) {
jobThreads[i] = std::thread(work);
}
for (int i = 0; i < n; i++) {
jobThreads[i].join();
}
}
The code compiles, but depending on random factors, it will either deadlock at the very end or have a segmentation fault in the middle while the queue is still quite large. Does anyone know more about why this is happening?
...
EDIT:
I have edited this question to include additional information and a more complete example. While I certainly don't want to bore you with the thousands of lines of code I actually have (an image rendering package), I believe this example better represents the type of problem I am facing. The example given in the answer by Alan Birtles only works on very simple job structure with very simple functionality. In the actual job struct, there are multiple pointers to different vectors and matrices, and therefore we need pointers to the job struct, otherwise the compiler would fail to compile because the constructor function was "implicitly deleted".
I believe the error I am facing has to do with the way I am locking and unlocking the threads. I know that the pointers are also causing some issues, but they probably have to stay. The function thisFunction() represents the function that needs to be run in parallel.
#include <queue>
#include <deque>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <iostream>
#include <cmath>
struct job {
std::vector<std::vector<int>> &matrix;
int num;
};
bool closed = false;
std::deque<job> jobList;
std::mutex jobMutex;
std::condition_variable jobCondition;
std::atomic<int> numberThreadsRunning;
std::atomic<int> numJobs;
struct tcout
{
tcout() :lock(mutex) {}
template < typename T >
tcout& operator<< (T&& t)
{
std::cout << t;
return *this;
}
static std::mutex mutex;
std::unique_lock< std::mutex > lock;
};
std::mutex tcout::mutex;
std::vector<std::vector<int>> multiply4x4(
std::vector<std::vector<int>> &A,
std::vector<std::vector<int>> &B) {
//Only deals with 4x4 matrices
std::vector<std::vector<int>> C(4, std::vector<int>(4, 0));
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
for (int k = 0; k < 4; k++) {
C.at(i).at(j) = C.at(i).at(j) + A.at(i).at(k) * B.at(k).at(j);
}
}
}
return C;
}
void addJobs()
{
numJobs++;
std::vector<std::vector<int>> matrix(4, std::vector<int>(4, -1)); //Create random 4x4 matrix
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
matrix.at(i).at(j) = rand() % 10 + 1;
}
}
job current = { matrix, numJobs };
std::unique_lock<std::mutex> lock(jobMutex);
std::cout << "The matrix for job " << current.num << " is: \n";
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
std::cout << matrix.at(i).at(j) << "\t";
}
std::cout << "\n";
}
jobList.push_back(current);
jobCondition.notify_one();
lock.unlock();
}
void thisFunction(std::vector<std::vector<int>> &matrix, int num)
{
std::this_thread::sleep_for(std::chrono::milliseconds(rand() * 500 / RAND_MAX));
std::vector<std::vector<int>> product = matrix;
std::unique_lock<std::mutex> lk(jobMutex);
std::cout << "The imported matrix for job " << num << " is: \n";
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
std::cout << product.at(i).at(j) << "\t";
}
std::cout << "\n";
}
lk.unlock();
int power;
if (num % 2 == 1) {
power = 3;
} else if (num % 2 == 0) {
power = 2;
addJobs();
}
for (int k = 1; k < power; k++) {
product = multiply4x4(product, matrix);
}
std::unique_lock<std::mutex> lock(jobMutex);
std::cout << "The matrix for job " << num << " to the power of " << power << " is: \n";
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
std::cout << product.at(i).at(j) << "\t";
}
std::cout << "\n";
}
lock.unlock();
}
void work(void) {
job *current;
numberThreadsRunning++;
while (true) {
std::unique_lock<std::mutex> lock(jobMutex);
if (jobList.empty()) {
numberThreadsRunning--;
jobCondition.wait(lock, [] {return !jobList.empty() || closed; });
numberThreadsRunning++;
}
if (jobList.empty())
{
break;
}
current = &jobList.front();
job newcurrent = {current->matrix, current->num};
current = &newcurrent;
jobList.pop_front();
lock.unlock();
thisFunction(current->matrix, current->num);
tcout() << "job " << current->num << " complete\n";
}
numberThreadsRunning--;
}
int main(int argc, char *argv[]) {
const size_t n = 1;
numJobs = 0;
std::thread jobThreads[n];
std::vector<int> buffer;
for (int i = 0; i < n; i++) {
jobThreads[i] = std::thread(work);
}
for (int i = 0; i < 100; i++)
{
addJobs();
}
{
std::unique_lock<std::mutex> lock(jobMutex);
closed = true;
jobCondition.notify_all();
}
for (int i = 0; i < n; i++) {
jobThreads[i].join();
}
}

Here is a fully working example:
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <iostream>
struct job { int num; };
bool closed = false;
std::deque<job> jobList;
std::mutex jobMutex;
std::condition_variable jobCondition;
std::atomic<int> numberThreadsRunning;
struct tcout
{
tcout() :lock(mutex) {}
template < typename T >
tcout& operator<< (T&& t)
{
std::cout << t;
return *this;
}
static std::mutex mutex;
std::unique_lock< std::mutex > lock;
};
std::mutex tcout::mutex;
void addJobs()
{
static int num = 0;
job current = { num++ };
std::unique_lock<std::mutex> lock(jobMutex);
jobList.push_back(current);
jobCondition.notify_one();
lock.unlock();
}
void work(void) {
job current;
numberThreadsRunning++;
while (true) {
std::unique_lock<std::mutex> lock(jobMutex);
if (jobList.empty()) {
numberThreadsRunning--;
jobCondition.wait(lock, [] {return !jobList.empty() || closed; });
numberThreadsRunning++;
}
if (jobList.empty())
{
break;
}
current = jobList.front();
jobList.pop_front();
lock.unlock();
std::this_thread::sleep_for(std::chrono::milliseconds(rand() * 500 / RAND_MAX));
tcout() << "job " << current.num << " complete\n";
}
numberThreadsRunning--;
}
int main(int argc, char *argv[]) {
const size_t n = 4;
std::thread jobThreads[n];
for (int i = 0; i < n; i++) {
jobThreads[i] = std::thread(work);
}
for (int i = 0; i < 100; i++)
{
addJobs();
}
{
std::unique_lock<std::mutex> lock(jobMutex);
closed = true;
jobCondition.notify_all();
}
for (int i = 0; i < n; i++) {
jobThreads[i].join();
}
}
I've made the following changes:
Never call lock() or unlock() on a std::mutex, always use std::unique_lock (or similar classes). You were calling jobMutex.unlock() in work() for the mutex you had locked with std::unique_lock, std::unique_lock would then call unlock for the second time leading to undefined behaviour. If an exception was thrown in addJobs then as you weren't using std::unique_lock at all the mutex would remain locked.
You need to use a predicate for jobCondition.wait otherwise a spurious wakeup could cause the wait to return while jobList is still empty.
I've added a closed variable to make the program exit when there's no more work to do
I've added a definition of job
In work you take a pointer to an item on the queue then pop it off the queue, as the item no longer exists the pointer is dangling. You need to copy the item before popping the queue. If you want to avoid the copy either make your job structure movable or change your queue to store std::unique_ptr<job> or std::shared_ptr<job>
I've also added a thread safe version of std::cout, this isn't strictly necessary but stops your output lines overlapping each other. Ideally you should use a proper thread safe logging library instead as locking a mutex for every print is expensive and if you have enough prints will make your program practically single threaded

Replace job* current; with job current; and then current = jobList.at(0);. Otherwise you end up with a pointer to an element of jobList that does not exist after jobList.pop_front().
Replace if (jobList.empty()) with while(jobList.empty()) to handle spurious wakeups.

Why is my For loop with Open MP producing a different output?

I have to snippets, one of which is standard, and one other which uses OpenMP. This is the first:
#include <iostream>
int main(){
double a = 0;
for(int i =0; i<= 100000; i++){
a+=1;
}
std::cout << a << std::endl;
return 0;
}
This code gives me: 100001, so it is OK; there is no problem.
The problem is that, for the second snippet:
#include <iostream>
int main(){
double a = 0;
#pragma omp parallel for
for(int i = 0; i<=100000; i++){
a+=1;
}
std::cout << a << std::endl;
return 0;
}
The result of this code is: 55246, and I don't know why I get this result instead of 100001.

Threads: Fine with 10, Crash with 10000

Can someone please explain why the following code crashes:
int a(int x)
{
int s = 0;
for(int i = 0; i < 100; i++)
s += i;
return s;
}
int main()
{
unsigned int thread_no = 10000;
vector<thread> t(thread_no);
for(int i = 0; i < 10; i++)
t[i] = std::thread(a, 10);
for(thread& t_now : t)
t_now.join();
cout << "OK" << endl;
cin.get();
}
But WORKS with 10 threads? I am new to multithreading and simply don't understand what is happening?!

This creates a vector of 10,000 default-initialized threads:
unsigned int thread_no = 10000;
vector<thread> t(thread_no);
You're running into the difference between "capacity" and "size". You didn't just create a vector large enough to house 10,000 threads, you created a vector of 10,000 threads.
See the following (http://ideone.com/i7LBQ6)
#include <iostream>
#include <vector>
struct Foo {
Foo() { std::cout << "Foo()\n"; }
};
int main() {
std::vector<Foo> f(8);
std::cout << "f.capacity() = " << f.capacity() << ", size() = " << f.size() << '\n';
}
You only initialize 10 of the elements as running threads
for(int i = 0; i < 10; i++)
t[i] = std::thread(a, 10);
So your for loop is going to see 10 initialized threads and then 9,990 un-started threads.
for(thread& t_now : t)
t_now.join();
You might want to try using t.reserve(thread_no); and t.emplace_back(a, 10);
Here's a complete example with renaming.
int threadFn(int iterations)
{
int s = 0;
for(int i = 0; i < iterations; i++)
s += i;
return s;
}
int main()
{
enum {
MaximumThreadCapacity = 10000,
DesiredInitialThreads = 10,
ThreadLoopIterations = 100,
};
vector<thread> threads;
threads.reserve(MaximumThreadCapacity);
for(int i = 0; i < DesiredInitialThreads; i++)
threads.emplace_back(threadFn, ThreadLoopIterations);
std::cout << threads.size() << " threads spun up\n";
for(auto& t : threads) {
t.join();
}
std::cout << "threads joined\n";
}
---- EDIT ----
Specifically, the crash you are getting is the attempt to join a non-running thread, http://ideone.com/OuLMyQ
#include <thread>
int main() {
std::thread t;
t.join();
return 0;
}
stderr
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
I point this out because you should be aware there is a race condition even with a valid thread, if you do
if (t.joinable())
t.join();
it's possible for 't' to become non-joinable between the test and the action. You should always put a t.join() in a try {} clause. See http://en.cppreference.com/w/cpp/thread/thread/join
Complete example:
int threadFn(int iterations)
{
int s = 0;
for(int i = 0; i < iterations; i++)
s += i;
return s;
}
int main()
{
enum {
MaximumThreadCapacity = 10000,
DesiredInitialThreads = 10,
ThreadLoopIterations = 100,
};
vector<thread> threads;
threads.reserve(MaximumThreadCapacity);
for(int i = 0; i < DesiredInitialThreads; i++)
threads.emplace_back(threadFn, ThreadLoopIterations);
std::cout << threads.size() << " threads spun up\n";
for(auto& t : threads) {
try {
if(t.joinable())
t.join();
} catch (std::system_error& e) {
switch (e.code()) {
case std::errc::invalid_argument:
case std::errc::no_such_process:
continue;
case std::errc::resource_deadlock_would_occur:
std::cerr << "deadlock during join - wth!\n";
return e.code();
default:
std::cout << "error during join: " << e.what() << '\n';
return e.code();
}
}
}
std::cout << "threads joined\n";
}

You create a vector that has 10000 elements in it, you then populate the first ten and you wait for all the the threads inside the vector to join. Your program crashes because you forgot to set the other 9990.
for(int i = 0; i < 10; i++) // Wrong
for(int i = 0; i < thread_no; i++) // Correct

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

gcov not working with pthreads, WSL-2, Clion - c++

Related

Atomics and Multi-Threading in C++

Parallel programming for tasks in main function - c++

How do I correctly use std::mutex in C++ without deadlocks and/or races?

Why is my For loop with Open MP producing a different output?

Threads: Fine with 10, Crash with 10000

Categories

Resources