Most basic parallelization with C++11 threads fails - c++

I am trying to use the C++11 threading library with g++ 4.7.
First, a question: is it expected that a future release will no longer require linking the pthread library by hand?
My program is:
#include <iostream>
#include <vector>
#include <thread>

void f(int i) {
    std::cout << "Hello world from : " << i << std::endl;
}

int main() {
    const int n = 4;
    std::vector<std::thread> t;
    for (int i = 0; i < n; ++i) {
        t.push_back(std::thread(f, i));
    }
    for (int i = 0; i < n; ++i) {
        t[i].join();
    }
    return 0;
}
I compile with:
g++-4.7 -Wall -Wextra -Winline -std=c++0x -pthread -O3 helloworld.cpp -o helloworld
And it returns:
Hello world from : Hello world from : Hello world from : 32
2
pure virtual method called
terminate called without an active exception
Segmentation fault (core dumped)
What is the problem and how can I solve it?
UPDATE:
Now using a mutex:
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>

static std::mutex m;

void f(int i) {
    m.lock();
    std::cout << "Hello world from : " << i << std::endl;
    m.unlock();
}

int main() {
    const int n = 4;
    std::vector<std::thread> t;
    for (int i = 0; i < n; ++i) {
        t.push_back(std::thread(f, i));
    }
    for (int i = 0; i < n; ++i) {
        t[i].join();
    }
    return 0;
}
It returns:
pure virtual method called
Hello world from : 2
terminate called without an active exception
Aborted (core dumped)
UPDATE 2:
Hmm... It works with my default GCC (g++ 4.6) but fails with the version of GCC I compiled by hand (g++ 4.7.1). Is there an option I forgot when compiling g++ 4.7.1?

General edit:
In order to prevent multiple threads from using cout simultaneously, which would result in character interleaving, proceed as follows:
1) declare, before the declaration of f():
static std::mutex m;
2) then guard the "cout" line with:
m.lock();
std::cout<<"Hello world from : "<<i<<std::endl;
m.unlock();
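A slightly more idiomatic variant is std::lock_guard, which releases the mutex automatically even if the guarded code throws; a minimal sketch of f() rewritten this way (same mutex m as above):
void f(int i) {
    std::lock_guard<std::mutex> lock(m); // locks m here, unlocks when lock goes out of scope
    std::cout << "Hello world from : " << i << std::endl;
}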
Apparently, linking against the pthread library is a must, for reasons that are not entirely clear to me. At least on my machine, not linking against -lpthread results in a core dump, while adding -lpthread results in a properly functioning program.
The possibility of character interleaving when cout is accessed from different threads without locking is covered here:
https://stackoverflow.com/a/6374525/1284631
more exactly: "[ Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note ]"
OTOH, a race condition is guaranteed to be avoided, at least by the C++11 standard (beware: the gcc/g++ implementation of this standard is still experimental).
Note that the Microsoft implementation (see: http://msdn.microsoft.com/en-us/library/c9ceah3b.aspx, credit to @SChepurin) is stricter than the standard (apparently it guarantees that character interleaving is avoided), but this might not be the case for the gcc/g++ implementation.
This is the command line that I use to compile (both the updated and the original code versions; everything works well on my PC):
g++ threadtest.cpp -std=gnu++11 -lpthread -O3
OTOH, without -lpthread it compiles, but I get a core dump (gcc 4.7.2 on 64-bit Linux).
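As an aside, the -pthread driver option (as in the question's command line) is generally preferable to a bare -lpthread, since it sets the relevant preprocessor defines in addition to linking the library:
g++ threadtest.cpp -std=gnu++11 -pthread -O3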
I understand that you are using two different versions of the gcc/g++ compiler on the same machine. Just be sure to use them properly (without mixing different library versions).


boehm-gc with C++11's thread library

As we know, using boehm-gc from multiple threads requires calling GC_register_my_thread with the stack base obtained from GC_get_stack_base. But this does not seem to work well with the C++11 thread library, e.g. std::thread... How can I use boehm-gc with the C++11 thread library?
(I use VS2013)
edit: This is the tested code. std::thread works, but std::future does not (it stops on _CrtIsValidHeapPointer):
#include <iostream>
#include <thread>
#include <future>
#define GC_THREADS
#include <gc.h>
#include <gc_cpp.h>
#pragma comment(lib, "gcmt-lib")

void foo()
{
    GC_stack_base sb;
    GC_get_stack_base(&sb);
    GC_register_my_thread(&sb);
    int *ptr;
    for (int i = 0; i < 10; i++)
    {
        ptr = new (GC) int;
        *ptr = 1;
    }
    GC_unregister_my_thread();
}

int main()
{
    GC_INIT();
    GC_allow_register_threads();
    std::cout << "test for std::thread";
    std::thread thrd(foo);
    thrd.join();
    std::cout << " [sucs]\n";
    std::cout << "test for std::future";
    std::future<void> fu = std::async(std::launch::async, foo);
    fu.get();
    std::cout << " [sucs]\n";
    std::cin.get();
}
edit: here is a capture of the stack trace (sorry that it isn't in English, but I don't think it matters anyway),
and here is a debug message
HEAP[TestGC.exe]: Invalid address specified to RtlValidateHeap( 00E80000, 00C92F80 )
While debugging, I found that the error occurs after fu.get().
edit: The error doesn't occur with /MD (or /MDd)...
(I think the GC might touch the library's pointers (namespace Concurrency), but that is just a guess.)
Before you start using the collector, and before you create the threads, make sure that you issue both:
GC_INIT, and
GC_allow_register_threads.
Then, in every thread, follow up with (see the sketch below):
GC_get_stack_base/GC_register_my_thread, and eventually
GC_unregister_my_thread.
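A minimal RAII sketch of that per-thread registration (the helper type gc_thread_guard is a hypothetical name, not part of the boehm-gc API):
#define GC_THREADS
#include <gc.h>

struct gc_thread_guard { // hypothetical RAII helper
    gc_thread_guard() {
        GC_stack_base sb;
        GC_get_stack_base(&sb);
        GC_register_my_thread(&sb); // register the current thread with the collector
    }
    ~gc_thread_guard() {
        GC_unregister_my_thread(); // unregister on scope exit
    }
};
Declaring a gc_thread_guard at the top of the thread function keeps the register/unregister calls paired even on early returns or exceptions.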
You didn't say what you are compiling with, but it works for gcc 4.8 (with -std=c++11).
EDIT: The OP was able to resolve the issue by following the instructions above and compiling the code with the /MD[d] flags for the multi-threaded dynamic MSVCR100 runtime. The issue remained unresolved for the multi-threaded statically compiled runtime.

How to compile OpenMP code using g++

I have a problem with compiling OpenMP code.
Take the following code:
#include <iostream>
#include <pthread.h>
#include <omp.h>
#include <semaphore.h>
#include <stack>
using namespace std;

sem_t empty, full;
stack<int> stk;

void produce(int i)
{
    sem_wait(&empty);
    cout << "produce " << i * i << endl;
    stk.push(i * i);
    sem_post(&full);
}

void consume1(int &x)
{
    sem_wait(&full);
    int data = stk.top();
    stk.pop();
    x = data;
    sem_post(&empty);
}

void consume2()
{
    sem_wait(&full);
    int data = stk.top();
    stk.pop();
    cout << "consume2 " << data << endl;
    sem_post(&empty);
}

int main()
{
    sem_init(&empty, 0, 1);
    sem_init(&full, 0, 0);
    pthread_t t1, t2, t3;
    omp_set_num_threads(3);
    int TID = 0;
    #pragma omp parallel private(TID)
    {
        TID = omp_get_thread_num();
        if (TID == 0)
        {
            cout << "There are " << omp_get_num_threads() << " threads" << endl;
            for (int i = 0; i < 5; i++)
                produce(i);
        }
        else if (TID == 1)
        {
            int x;
            while (true)
            {
                consume1(x);
                cout << "consume1 " << x << endl;
            }
        }
        else if (TID == 2)
        {
            int x;
            while (true)
            {
                consume1(x);
                cout << "consume2 " << x << endl;
            }
        }
    }
    return 0;
}
Firstly, I compile it using:
g++ test.cpp -fopenmp -lpthread
And I get the right answer: there are 3 threads in total.
But when I compile it like this:
g++ -c test.cpp -o test.o
g++ test.o -o test -fopenmp -lpthread
there is only ONE thread.
Can anyone tell me how to compile this code correctly? Thank you in advance.
OpenMP is a set of code-transforming pragmas, i.e. they are only applied at compile time. You cannot apply code transformations to already compiled object code (OK, you can, but it is a far more involved process and outside the scope of what most compilers do these days). You need -fopenmp during the link phase only so that the compiler automatically links the OpenMP runtime library libgomp; it does nothing else to the object code.
On a side note, although technically correct, your code does OpenMP in a very non-OpenMP way. First, you have reimplemented the OpenMP sections construct. The parallel region in your main function could be rewritten more idiomatically:
#pragma omp parallel sections
{
    #pragma omp section
    {
        cout << "There are " << omp_get_num_threads() << " threads" << endl;
        for (int i = 0; i < 5; i++)
            produce(i);
    }
    #pragma omp section
    {
        int x;
        while (true)
        {
            consume1(x);
            cout << "consume1 " << x << endl;
        }
    }
    #pragma omp section
    {
        int x;
        while (true)
        {
            consume1(x);
            cout << "consume2 " << x << endl;
        }
    }
}
(If you get SIGILL while running this code with more than three OpenMP threads, you have encountered a bug in GCC that will be fixed in the upcoming release.)
Second, you might want to take a look at the OpenMP task construct. With it you can queue pieces of code to be executed concurrently as tasks by any idle thread. Unfortunately it requires a compiler that supports OpenMP 3.0, which rules MSVC++ out of the equation, but only if you care about portability to Windows (and you obviously don't, because you are using POSIX threads).
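For illustration, a minimal sketch of the same producer/consumer pairing expressed with tasks (assuming an OpenMP 3.0 compiler; this mirrors the question's code and is not meant as a drop-in replacement):
#pragma omp parallel
#pragma omp single // one thread creates the tasks, any idle thread executes them
{
    #pragma omp task
    for (int i = 0; i < 5; i++)
        produce(i);
    #pragma omp task
    {
        int x;
        while (true)
        {
            consume1(x);
            cout << "consume1 " << x << endl;
        }
    }
}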
The OpenMP pragmas are only enabled when compiled with -fopenmp. Otherwise they are completely ignored by the compiler. (Hence, only 1 thread...)
Therefore, you will need to add -fopenmp to the compilation of every single module that uses OpenMP. (As opposed to just the final linking step.)
g++ -c test.cpp -o test.o -fopenmp
g++ test.o -o test -fopenmp -lpthread

Why is the OpenMP version slower?

I am experimenting with OpenMP. I wrote some code to check its performance. On a single 4-core Intel CPU with Kubuntu 11.04, the following program compiled with OpenMP is around 20 times slower than the same program compiled without OpenMP. Why?
I compiled it with g++ -g -O2 -funroll-loops -fomit-frame-pointer -march=native -fopenmp
#include <math.h>
#include <iostream>
using namespace std;

int main()
{
    long double i = 0;
    long double k = 0.7;
    #pragma omp parallel for reduction(+:i)
    for (int t = 1; t < 300000000; t++) {
        for (int n = 1; n < 16; n++) {
            i = i + pow(k, n);
        }
    }
    cout << i << "\t";
    return 0;
}
The problem is that the variable k is considered to be a shared variable, so it has to be synchronized between the threads.
A possible solution to avoid this is:
#include <math.h>
#include <iostream>
using namespace std;

int main()
{
    long double i = 0;
    #pragma omp parallel for reduction(+:i)
    for (int t = 1; t < 30000000; t++) {
        long double k = 0.7;
        for (int n = 1; n < 16; n++) {
            i = i + pow(k, n);
        }
    }
    cout << i << "\t";
    return 0;
}
Following the hint of Martin Beckett in the comment below: instead of declaring k inside the loop, you can also declare k const and outside the loop, as sketched below.
Otherwise, ejd is correct: the problem here does not seem to be bad parallelization, but bad optimization when the code is parallelized. Remember that GCC's OpenMP implementation is still quite young and far from optimal.
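That variant would look something like this (a sketch of the commenter's suggestion; const documents that k is never written, so sharing it between threads is harmless):
const long double k = 0.7; // const and outside the parallel loop
#pragma omp parallel for reduction(+:i)
for (int t = 1; t < 30000000; t++) {
    for (int n = 1; n < 16; n++) {
        i = i + pow(k, n);
    }
}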
Fastest code:
for (int i = 0; i < 100000000; i ++) {;}
Slightly slower code:
#pragma omp parallel for num_threads(1)
for (int i = 0; i < 100000000; i ++) {;}
2-3 times slower code:
#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}
no matter what is in between { and }. A simple ; or a more complex computation gives the same results. I compiled under Ubuntu 13.10 64-bit, using both gcc and g++, trying different parameters (-ansi -pedantic-errors -Wall -Wextra -O3), running on an Intel quad-core 3.5 GHz.
I guess thread management overhead is at fault? It doesn't seem smart for OpenMP to create a thread every time you need one and destroy it afterwards. I thought there would be four (or eight) threads either running whenever needed or sleeping.
I am observing similar behavior on GCC. However, I am wondering if in my case it is somehow related to a template or inline function. Is your code also within a template or inline function? Please look here.
However, for very short for loops you may observe some small overhead related to thread switching, as in your case:
#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}
If your loop runs for a seriously long time, a few ms or even seconds, you should observe a performance boost when using OpenMP, but only when you have more than one CPU core. The more cores you have, the higher the performance you reach with OpenMP.
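For instance, a sketch of the kind of loop where the parallel version should win on a multi-core machine (the per-iteration workload here is arbitrary, chosen only to be large enough to amortize the thread management overhead):
double sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 1000; i++) {
    double s = 0;
    for (int j = 0; j < 1000000; j++)
        s += sin(j * 0.001); // substantial work per iteration
    sum += s;
}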

ICC segfaulting with variable length arrays

So, when compiled with the basic icc bob.cpp -o bob and run, the following code segfaults:
#include <string>

int foo() {
    return 6;
}

int main() {
    std::string t[foo()];
}
The following two similar programs, however, seem to run fine.
#include <string>

int foo() {
    return 6;
}

int main() {
    int f = foo();
    std::string t[f];
}
and
#include <string>

int foo() {
    return 6;
}

int main() {
    std::string t[6];
}
I'm a bit confused about what's going on. Apparently, variable-length arrays are non-standard; this was a surprise to me, since I've always used g++, which supports them. However, if they're not supported by ICC, why would this compile? Also, why would example 2 "work"?
What is the correct code here, and, if the first snippet is incorrect, why does it compile, and why does it then segfault?
I'm using icc (ICC) 12.0.2 20110112 on a 2011 x86_64 Intel(R) Core(TM) i5.
Thanks
Well, while it is true that C++ has no variable-length arrays (C99 does though), apparently ICC does support them as an extension, since your code actually compiles (and since your second snippet actually runs without crashing).
If the first version crashes, then it must be a bug in ICC's implementation of that non-standard extension.
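If portability matters, a standard-compliant alternative is to avoid the VLA entirely and use std::vector; a minimal sketch:
#include <string>
#include <vector>

int foo() {
    return 6;
}

int main() {
    // size fixed at run time, but fully standard C++
    std::vector<std::string> t(foo());
}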