boost::thread works in debug, doesn't in release - C++

I started working with threads about an hour ago and am having some trouble: the debug build does what I expect and the release build crashes.
Debug
g++ -c -g -MMD -MP -MF build/Debug/GNU-Linux-x86/foo.o.d -o build/Debug/GNU-Linux-x86/foo.o foo.cpp
Whatever 2222222222
Release
g++ -c -O2 -MMD -MP -MF build/Release/GNU-Linux-x86/foo.o.d -o build/Release/GNU-Linux-x86/foo.o foo.cpp
Whatever
RUN FAILED (exit value 1, total time: 49ms)
Class
#include "foo.h"
#define NUMINSIDE 10
foo::foo()
{
inside = new int[NUMINSIDE];
}
void foo::workerFunc(int input)
{
for (int i = 0; i < NUMINSIDE; i++)
{
inside[i] += input;
}
}
void foo::operate()
{
std::cout << "Whatever" << std::endl;
boost::thread thA(boost::bind(&foo::workerFunc, this, 1));
boost::thread thB(boost::bind(&foo::workerFunc, this, 1));
thA.join();
thB.join();
for (int i = 0; i < NUMINSIDE; i++)
{
std::cout << this->inside[i] << std::endl;
}
}
main
int main(int argc, char** argv)
{
    foo* myFoo = new foo();
    myFoo->operate();
    return 0;
}

You have not initialized the inside array. Add initialization code to foo::foo():
foo::foo()
{
    inside = new int[NUMINSIDE];
    for (int i = 0; i < NUMINSIDE; i++)
    {
        inside[i] = 0;
    }
}
It only appears to work in debug because reading uninitialized memory is undefined behavior: the debug build happens to hand you zeroed memory, while the optimized build does not have to.
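Alternatively, you can let the language do the zeroing; a minimal sketch, assuming foo.h declares inside as an int* as in the question:

foo::foo()
{
    // new int[N]() value-initializes the array, so every element starts at 0
    inside = new int[NUMINSIDE]();
}

A std::vector<int> member initialized to NUMINSIDE zeros would also work and avoids the manual new (and the missing delete[]) entirely.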

Related

My DPC++ program can't run on an Intel ATS-P GPU. Why?

The DPC++ code is very simple: it just has a local array, sets the array's values to 0, and does a memory barrier.
#include <CL/sycl.hpp>
using namespace sycl;

#define WRAP_SIZE 32

int main() {
    sycl::gpu_selector selector;
    queue exec_queue(selector);
    int num_blocks = 128;
    int num_threads = 256;
    int casBeg = 0;
    int casEnd = 2;
    auto device_mem = sycl::malloc_device(227000 * sizeof(double), exec_queue);
    exec_queue.submit([&](sycl::handler& cgh)
    {
        sycl::stream out{ 4096, 128, cgh };
        auto sharedmem = sycl::accessor<int, 1, sycl::access_mode::read_write, sycl::access::target::local>(11, cgh);
        cgh.parallel_for(
            sycl::nd_range<1>(num_blocks * num_threads, num_threads),
            [=](sycl::nd_item<1> item_ct1) [[intel::reqd_sub_group_size(WRAP_SIZE)]] {
                int blkId = item_ct1.get_group(0);
                int tid = item_ct1.get_local_id(0);
                int stride = item_ct1.get_local_range().get(0);
                out << "inter\n";
                if (tid == 0)
                    for (int i = 0; i < 11; ++i)
                        sharedmem[i] = 0;
                item_ct1.barrier(sycl::access::fence_space::local_space);
            });
    }).wait();
    return 0;
}
The build command is
dpcpp -DMKL_ILP64 -lmkl_sycl -lmkl_intel_ilp64 -lmkl_tbb_thread -lmkl_core -pthread -std=c++17 -O0 -o <project_name> <code_name>.cpp
The compiled program works fine on the P690 GPU, but does not work on the NDK Intel ATS-P GPU.
Why? Thanks.
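One way to narrow this down (my suggestion, not part of the original post) is to print what the target device reports for supported sub-group sizes before requesting WRAP_SIZE = 32 via reqd_sub_group_size; a kernel that requires a sub-group size the device doesn't support cannot launch on it. A minimal standalone sketch, reusing the same GPU selection as above:

#include <CL/sycl.hpp>
#include <iostream>

int main() {
    // Same device selection as the original program.
    sycl::queue exec_queue{ sycl::gpu_selector{} };
    auto dev = exec_queue.get_device();

    std::cout << "Device: "
              << dev.get_info<sycl::info::device::name>() << "\n";
    std::cout << "Max work-group size: "
              << dev.get_info<sycl::info::device::max_work_group_size>() << "\n";

    std::cout << "Supported sub-group sizes:";
    for (auto s : dev.get_info<sycl::info::device::sub_group_sizes>())
        std::cout << " " << s;
    std::cout << "\n";
    return 0;
}

Comparing this output on the P690 and on the ATS-P should show whether the sub-group sizes or work-group limits differ between the two devices.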

Why does Callgrind make an atomic load never-ending?

I wrote a small program that works perfectly fine until it's being dynamically instrumented by Callgrind:
$ g++ -std=c++11 -pthread -g -ggdb -o program.exe program.cpp
$ time valgrind --tool=callgrind ./program.exe
The code:
#include <atomic>
#include <thread>
#include <iostream>

constexpr int CST_TARGET = 10 * 1000;

std::atomic<bool> g_lock = {false};
std::atomic<bool> g_got_work = {true};
int g_passer = 0;
long long g_total = 0;

void producer() {
    while (1) {
        while (g_lock.load(std::memory_order_seq_cst));
        if (g_passer >= CST_TARGET) {
            g_got_work.store(false, std::memory_order_seq_cst);
            return;
        }
        ++g_passer;
        g_lock.store(true, std::memory_order_seq_cst);
    }
}

void consumer() {
    while (g_got_work.load(std::memory_order_seq_cst)) {
        if (g_lock.load(std::memory_order_seq_cst)) {
            g_total += g_passer;
            g_lock.store(false, std::memory_order_seq_cst);
        }
    }
}

int main() {
    std::atomic<int> val(0);
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    std::cout << "g_passer = " << g_passer << std::endl;
    std::cout << "g_total = " << g_total << std::endl;
    return 0;
}
The instrumented run still hadn't finished after 10 minutes, so I terminated it and looked at the KCachegrind stats: there are hundreds of millions to billions of calls to std::atomic<bool>::load(...).
Any idea which part of Callgrind alters the behaviour of the atomic calls so that they never make progress? The program itself runs in milliseconds without Callgrind.
Using --fair-sched=yes should solve the problem. Valgrind serializes all threads onto a single core, and with the default scheduler a thread that never blocks can keep the CPU for a long time, so your busy-wait loops can starve the other thread almost indefinitely; --fair-sched=yes hands the CPU to waiting threads in FIFO order, so the spinning thread cannot starve the other one.
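For completeness, the timed run from the question with that flag added:
$ time valgrind --tool=callgrind --fair-sched=yes ./program.exe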

LLVM run PassManager (non-legacy)

How do I run a non-legacy PassManager? I have tried the following, but an exception is thrown when the run function tries to invalidate the analysis manager. Is there something else I should do for initialization?
llvm::AnalysisManager<Module> mm;
PassBuilder builder;
auto pm = builder.buildModuleOptimizationPipeline(PassBuilder::OptimizationLevel::O3);
pm.run(module, mm);
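For reference, the usual cause of that exception is that the analysis managers were never registered and cross-wired. A minimal sketch of the standard new-PassManager setup (assuming LLVM 6-era headers and a llvm::Module &module already in scope; the function name runO3 is mine):

#include <llvm/Analysis/CGSCCPassManager.h>
#include <llvm/Analysis/LoopAnalysisManager.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/PassManager.h>
#include <llvm/Passes/PassBuilder.h>

void runO3(llvm::Module &module) {
    // One analysis manager per IR level.
    llvm::LoopAnalysisManager lam;
    llvm::FunctionAnalysisManager fam;
    llvm::CGSCCAnalysisManager cgam;
    llvm::ModuleAnalysisManager mam;

    llvm::PassBuilder pb;
    // Register the analyses and wire the proxies between the managers;
    // without this step the pass manager cannot look up or invalidate
    // analyses and fails at run time.
    pb.registerModuleAnalyses(mam);
    pb.registerCGSCCAnalyses(cgam);
    pb.registerFunctionAnalyses(fam);
    pb.registerLoopAnalyses(lam);
    pb.crossRegisterProxies(lam, fam, cgam, mam);

    llvm::ModulePassManager mpm =
        pb.buildPerModuleDefaultPipeline(llvm::PassBuilder::OptimizationLevel::O3);
    mpm.run(module, mam);
}

With the proxies registered, the buildModuleOptimizationPipeline call from the question works the same way.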
These snippets illustrate how to set up and run custom function and module passes on a .c/.cpp file, complete with a makefile. This works with LLVM 6, which is fairly recent (March 2018). Note that FunctionPass/ModulePass registered via RegisterPass and loaded with opt -load still go through the legacy pass registration, so treat this as a practical way to run out-of-tree passes rather than as the new PassManager API itself.
HelloWorld.cpp:
#include <llvm/Pass.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/raw_ostream.h>

namespace {

struct Hello : public llvm::FunctionPass {
    static char ID;
    Hello() : llvm::FunctionPass{ID} {}

    bool runOnFunction(llvm::Function &F) override {
        llvm::errs() << "Hello ";
        llvm::errs().write_escaped(F.getName()) << "\n";
        return false;
    }
};

struct Hello2 : public llvm::ModulePass {
    static char ID;
    Hello2() : llvm::ModulePass{ID} {}

    bool runOnModule(llvm::Module &M) override {
        llvm::errs() << "Name of the module ";
        llvm::errs().write_escaped(M.getName()) << "\n";
        for (auto iter = M.getFunctionList().begin(); iter != M.getFunctionList().end(); ++iter) {
            llvm::errs() << "Function name:" << iter->getName() << "\n";
        }
        return false;
    }
};

} // namespace

char Hello::ID = 0;
static llvm::RegisterPass<Hello> X("Hello",
                                   "Hello World Pass",
                                   false,
                                   false);

char Hello2::ID = 1;
static llvm::RegisterPass<Hello2> Y("Hello2",
                                    "Hello World2 pass",
                                    false,
                                    false);
Corresponding makefile:
LLVM_VERSION=
LLVM_INCLUDEDIR = `llvm-config-6.0 --includedir`
LLVM_FLAGS = `llvm-config-6.0 --cxxflags --ldflags --system-libs --libs all`
CXX = clang++-6.0
CXXFLAGS = -g -std=c++11 -O3 -I $(LLVM_INCLUDEDIR)

Hello.so:
	$(CXX) -fPIC $(CXXFLAGS) HelloWorld.cpp $(LLVM_FLAGS) -shared -o Hello.so

Hello: Hello.so

testfile:
	clang++-6.0 -emit-llvm -c test.cpp -o test.bc

runFunctionPassOnTestFile: Hello testfile
	opt-6.0 -load ./Hello.so -Hello < test.bc > /dev/null

runModulePassOnTestfile: Hello testfile
	opt-6.0 -load ./Hello.so -Hello2 < test.bc > /dev/null

clean:
	rm *.o *.so *.out *~

DBG:
	#echo LLVM INCLUDE DIRS $(LLVM_INCLUDEDIR) $(test)
A simple file to test everything on, test.cpp:
#include <stdio.h>
#include <stdlib.h>
int a = 4;
int c = 5;
int d = 6;
int e = 7;
int bar() { int *a = (int*) malloc(4); e = 1; return 1;}
int foo() { return 2; }
int barfoo() { return 3; }
int main() {
    printf("Testing testing\n");
    return 0;
}

Why does the first call to readline() slow down all subsequent calls to fnmatch()?

The following program illustrates the issue:
Makefile:
CFLAGS = -O3 -std=c++0x
LDFLAGS = -lreadline

test: test.o
	g++ $(CFLAGS) $< $(LDFLAGS) -o $@

test.o: test.cpp Makefile
	g++ $(CFLAGS) -c $<
test.cpp:
#include <fnmatch.h>
#include <readline/readline.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

static double time()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return ts.tv_sec + (1e-9 * (double)ts.tv_nsec);
}

static void time_fnmatch()
{
    for (int i = 0; i < 2; i++)
    {
        double t = time();
        for (int i = 0; i < 1000000; i++)
        {
            fnmatch("*.o", "testfile", FNM_PERIOD);
        }
        fprintf(stderr, "%f\n", time()-t);
    }
}

int main()
{
    time_fnmatch();
    char *input = readline("> ");
    free(input);
    time_fnmatch();
}
Output:
0.045371
0.044537
>
0.185246
0.181607
Before calling readline(), the fnmatch calls are about 4x faster. Although
this performance difference is worrying, I'm most interested in finding out
what exactly the readline() call might be doing to the program state that
would have this effect on other library calls.
Just a guess: readline initialization probably calls setlocale.
When a program starts up, it is in the C locale; a call to setlocale(LC_ALL, "") will enable the default locale, and these days, the default locale usually uses UTF-8, in which case many string operations become more complex. (Even just iterating over a string.)
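If you want to test that guess without readline, you can perform the suspected locale switch yourself; a minimal sketch (the setlocale call is the only addition to the question's program):

#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>
#include <time.h>

static double now()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return ts.tv_sec + (1e-9 * (double)ts.tv_nsec);
}

static void time_fnmatch()
{
    for (int i = 0; i < 2; i++)
    {
        double t = now();
        for (int j = 0; j < 1000000; j++)
            fnmatch("*.o", "testfile", FNM_PERIOD);
        fprintf(stderr, "%f\n", now() - t);
    }
}

int main()
{
    time_fnmatch();             // runs in the default C locale
    setlocale(LC_ALL, "");      // what readline is suspected of doing
    time_fnmatch();             // runs in the environment's (often UTF-8) locale
}

If the second set of timings slows down by the same factor as after readline(), the locale explanation fits; switching back with setlocale(LC_ALL, "C") should restore the original speed.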

pthread and semaphore not working for me in OS X Mavericks 10.9

I have the following simple program involving pthreads and a semaphore. I am on OS X Mavericks 10.9. I use a makefile to compile the program (rather than Xcode), with C++11.
#include <pthread.h>
#include <semaphore.h>
#include <cassert>
#include <iostream>

#define ASSERT(a) if(!(a)) abort()

using namespace std;

sem_t countMutex;
int myCount = 0;

void *doThread(void *data) {
    int *pi = reinterpret_cast<int *>(data);
    sem_wait(&countMutex);
    for (int i = 0; i < 100; ++i) {
        myCount += 1;
    }
    sem_post(&countMutex);
    pthread_exit( NULL );
}

void LaunchThread() {
    const int kNumThreads = 10;
    pthread_t tids[kNumThreads];
    int threadData[kNumThreads];
    pthread_attr_t attr;
    pthread_t tid;
    int retVal = 0;

    retVal = pthread_attr_init(&attr);
    ASSERT(retVal == 0);
    retVal = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    ASSERT(retVal == 0);

    sem_init(&countMutex, 0, 1);

    myCount = 0;
    for (int i = 0; i < kNumThreads; ++i) {
        threadData[i] = i;
        retVal = pthread_create(&tids[i], &attr, &doThread, &threadData[i]);
        if (retVal != 0) {
            cerr << "cannot create thread" << endl;
            return;
        }
    }

    retVal = pthread_attr_destroy(&attr);
    ASSERT(retVal == 0);

    void *status = NULL;
    for (int i = 0; i < kNumThreads; ++i) {
        retVal = pthread_join(tids[i], &status);
        if (retVal != 0) {
            cerr << "cannot join thread " << i << ", " << tids[i] << endl;
            return;
        }
        cout << "completed thread " << i << ", " << tids[i] << endl;
    }
    cout << "value of myCount: " << myCount << endl;

    sem_destroy(&countMutex);
    //sem_unlink(&countMutex);

    pthread_exit( NULL );
}

int main(int argc, char **argv) {
    LaunchThread();
    return 0;
}
The makefile for compiling this is
CXX = clang++
CXXFLAGS = -g -Wall -Wno-deprecated -std=c++11 -pthread -D DEBUG -g3 $(INCLUDES)
LDFLAGS = $(LIBS)
OBJS = main.o
PROG = test

all: $(PROG)

$(PROG): $(OBJS)
	$(CXX) -v -o $(PROG) main.o $(LDFLAGS)

%.o: %.cpp
	$(CXX) -c $(CXXFLAGS) $<

clean:
	rm $(OBJS); rm test
The program ought to report a value of 1000 for myCount, but the result is inconsistent across multiple runs.
Eg:
completed thread 0, 0x107dca000
completed thread 1, 0x107e4d000
completed thread 2, 0x107ed0000
completed thread 3, 0x107f53000
completed thread 4, 0x107fd6000
completed thread 5, 0x108059000
completed thread 6, 0x1080dc000
completed thread 7, 0x10815f000
completed thread 8, 0x1081e2000
completed thread 9, 0x108265000
value of myCount: 900
Unnamed POSIX semaphores are not supported on OS X. If you check your return codes you will see sem_init fail with an error to that effect. You need to use named semaphores.
Use sem_open instead of sem_init. Don't use sem_destroy; use sem_close and sem_unlink instead.
Then you will be good to go.
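A minimal sketch of that change (the semaphore name "/countMutex" and the 0644 mode are my choices, not from the answer):

#include <semaphore.h>
#include <fcntl.h>   // O_CREAT
#include <cstdio>    // perror

sem_t *countMutex = nullptr;   // named-semaphore handle instead of a sem_t object

int main() {
    // sem_open replaces sem_init; "/countMutex" is an arbitrary name.
    countMutex = sem_open("/countMutex", O_CREAT, 0644, 1);
    if (countMutex == SEM_FAILED) {
        perror("sem_open");
        return 1;
    }

    sem_wait(countMutex);
    // ... critical section, e.g. myCount += 1 ...
    sem_post(countMutex);

    // sem_close/sem_unlink replace sem_destroy for named semaphores.
    sem_close(countMutex);
    sem_unlink("/countMutex");
    return 0;
}

In the program above, the workers would then call sem_wait(countMutex)/sem_post(countMutex) on the pointer instead of taking the address of a sem_t.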