Program crashing with C++ DLL using OpenMP

I have a C++ program that uses OpenMP, and I need to port it into a DLL so I can call it from Python. It returns an array of double values, which are calculated using a lot of for loops with OpenMP pragmas. I doubted this would work, so I started with a small test program that calculates the value of Pi in a loop with different precision values; that way I could measure performance and check that OpenMP works properly. The plain (non-OpenMP) implementation works fine from both Python and C++, but the OpenMP variant gives a runtime error in Python (exception: access violation writing 0x000000000000A6C8) and crashes without an error in C++. The OpenMP variant also works fine when it is built as a regular executable rather than a DLL. The DLL is built with a makefile. The app that uses the DLL is built into an executable with g++ with no flags (its source code is in UnitMain.cpp). All the relevant code and the Makefile are below (I left out some files and functions for brevity).
UPD: I tried the Microsoft compiler and it works; I also tested a Linux shared library on WSL/g++ and that works too. It looks like the problem is specific to gcc on Windows, so I'll try another version of gcc (my current version is this):
Thread model: posix gcc version 8.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)
UnitFunctions.cpp
#include "UnitFunctions.h"
#include <omp.h>
#include <stdio.h>
#include <string.h>
typedef long long int64_t;
double pi(int64_t n) {
    double sum = 0.0;
    int64_t sign = 1;
    for (int64_t i = 0; i < n; ++i) {
        sum += sign/(2.0*i+1.0);
        sign *= -1;
    }
    return 4.0*sum;
}
void calcPiOmp(double* arr, int N) {
    int64_t base = 10e5;
    #pragma omp parallel for
    for(int i = 0; i < N; ++i) {
        arr[i] = pi(base+i);
    }
}
UnitMain.cpp
#include <windows.h>
#include <iostream>
using namespace std;
struct DllHandle
{
    DllHandle(const char * const filename)
        : h(LoadLibrary(filename)) {}
    ~DllHandle() { if (h) FreeLibrary(h); }
    const HINSTANCE Get() const { return h; }
private:
    HINSTANCE h;
};
int main()
{
    const DllHandle h("Functions.DLL");
    if (!h.Get())
    {
        MessageBox(0,"Could not load DLL","UnitCallDll",MB_OK);
        return 1;
    }
    typedef const void (*calcPiOmp_t) (double*, int);
    const auto calcPiOmp = reinterpret_cast<calcPiOmp_t>(GetProcAddress(h.Get(), "calcPiOmp"));
    double arr[80];
    calcPiOmp(arr, 80);
    cout << arr[0] << endl;
    return 0;
}
Makefile
all: UnitEntryPoint.o UnitFunctions.o
	g++ -m64 -fopenmp -s -o Functions.dll UnitEntryPoint.o UnitFunctions.o
UnitEntryPoint.o: UnitEntryPoint.cpp
	g++ -m64 -fopenmp -c UnitEntryPoint.cpp
UnitFunctions.o: UnitFunctions.cpp
	g++ -m64 -fopenmp -c UnitFunctions.cpp
A Python script
import numpy as np
import ctypes as ct
cpp_fun = ct.CDLL('./Functions.dll')
cpp_fun.calcPiNaive.argtypes = [np.ctypeslib.ndpointer(), ct.c_int]
cpp_fun.calcPiOmp.argtypes = [np.ctypeslib.ndpointer(), ct.c_int]
arrOmp = np.zeros(N).astype('float64')
cpp_fun.calcPiOmp(arrOmp, N)

Related

How to solve multiple definition of `_start' error?

I am working on a project where I have to compare incoming data from a sensor. The main source code is in C++ along with a C source file, a .S file, and a .h file. When I am trying to link those files it shows an error and I don't have any clue as to what the error is. Any help regarding the problem will be very much appreciated.
My Makefile looks like:
all : main.cpp irq.c irq.h bootstrap.S
	riscv32-unknown-elf-gcc -c irq.c bootstrap.S -march=rv32g -mabi=ilp32d -nostartfiles -Wl,--no-relax
	riscv32-unknown-elf-g++ -c main.cpp -march=rv32g -mabi=ilp32d
	riscv32-unknown-elf-g++ -o main main.o irq.o bootstrap.o -march=rv32g -mabi=ilp32d
dump-elf: all
	riscv32-unknown-elf-readelf -a main
dump-code: all
	riscv32-unknown-elf-objdump -D main
dump-comment: all
	objdump -s --section .comment main
clean:
	rm -f main
main.cpp
#include "stdint.h"
extern "C"{
#include "irq.h"
}
#include<stdio.h>
#include<iostream>
using namespace std;
static volatile char * const TERMINAL_ADDR = (char * const)0x20000000;
static volatile char * const SENSOR_INPUT_ADDR = (char * const)0x50000000;
static volatile uint32_t * const SENSOR_SCALER_REG_ADDR = (uint32_t * const)0x50000080;
static volatile uint32_t * const SENSOR_FILTER_REG_ADDR = (uint32_t * const)0x50000084;
bool has_sensor_data = 0;
void sensor_irq_handler() {
    has_sensor_data = 1;
}
void dump_sensor_data() {
    while (!has_sensor_data) {
        asm volatile ("wfi");
    }
    has_sensor_data = 0;
    for (int i=0; i<64; ++i) {
        *TERMINAL_ADDR = *(SENSOR_INPUT_ADDR + i) % 92 + 32;
    }
    *TERMINAL_ADDR = '\n';
}
int main() {
    register_interrupt_handler(2, sensor_irq_handler);
    *SENSOR_SCALER_REG_ADDR = 5;
    *SENSOR_FILTER_REG_ADDR = 2;
    for (int i=0; i<3; ++i)
        dump_sensor_data();
    return 0;
}
irq.c
https://github.com/agra-uni-bremen/riscv-vp/blob/master/sw/simple-sensor/irq.c
irq.h
https://github.com/agra-uni-bremen/riscv-vp/blob/master/sw/simple-sensor/irq.h
bootstrap.S
https://github.com/agra-uni-bremen/riscv-vp/blob/master/sw/simple-sensor/bootstrap.S
The output should be 64 random characters with interrupts.
The Error is:
/opt/riscv/lib/gcc/riscv32-unknown-elf/8.3.0/../../../../riscv32-unknown-elf/bin/ld: /tmp/cckjuDlw.o: in function `.L0 ':
(.text+0x0): multiple definition of `_start'; /opt/riscv/lib/gcc/riscv32-unknown-elf/8.3.0/../../../../riscv32-unknown-elf/lib/crt0.o:(.text+0x0): first defined here
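You're using the -nostartfiles option, but in the wrong place.
You have it on a compilation step (the command with -c), where it has no effect; it belongs on the link step, i.e. the final riscv32-unknown-elf-g++ -o main command. Without it there, the linker still pulls in the default crt0.o, whose _start clashes with the one from your own startup code.
-Wl, options are also only used when linking, so -Wl,--no-relax should move to that command too.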

The Cost of C++ Exceptions and setjmp/longjmp

I wrote a test to measure the cost of C++ exceptions with threads.
#include <cstdlib>
#include <iostream>
#include <vector>
#include <thread>
static const int N = 100000;
static void doSomething(int& n)
{
    --n;
    throw 1;
}
static void throwManyManyTimes()
{
    int n = N;
    while (n)
    {
        try
        {
            doSomething(n);
        }
        catch (int n)
        {
            switch (n)
            {
            case 1:
                continue;
            default:
                std::cout << "error" << std::endl;
                std::exit(EXIT_FAILURE);
            }
        }
    }
}
int main(void)
{
    int nCPUs = std::thread::hardware_concurrency();
    std::vector<std::thread> threads(nCPUs);
    for (int i = 0; i < nCPUs; ++i)
    {
        threads[i] = std::thread(throwManyManyTimes);
    }
    for (int i = 0; i < nCPUs; ++i)
    {
        threads[i].join();
    }
    return EXIT_SUCCESS;
}
Here's the C version that I initially wrote for fun.
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <glib.h>
#define N 100000
static GPrivate jumpBuffer;
static void doSomething(volatile int *pn)
{
    jmp_buf *pjb = g_private_get(&jumpBuffer);
    --*pn;
    longjmp(*pjb, 1);
}
static void *throwManyManyTimes(void *p)
{
    jmp_buf jb;
    volatile int n = N;
    (void)p;
    g_private_set(&jumpBuffer, &jb);
    while (n)
    {
        switch (setjmp(jb))
        {
        case 0:
            doSomething(&n);
        case 1:
            continue;
        default:
            printf("error\n");
            exit(EXIT_FAILURE);
        }
    }
    return NULL;
}
int main(void)
{
    int nCPUs = g_get_num_processors();
    GThread *threads[nCPUs];
    int i;
    for (i = 0; i < nCPUs; ++i)
    {
        threads[i] = g_thread_new(NULL, throwManyManyTimes, NULL);
    }
    for (i = 0; i < nCPUs; ++i)
    {
        g_thread_join(threads[i]);
    }
    return EXIT_SUCCESS;
}
The C++ version runs very slow compared to the C version.
$ g++ -O3 -g -std=c++11 test.cpp -o cpp-test -pthread
$ gcc -O3 -g -std=c89 test.c -o c-test `pkg-config glib-2.0 --cflags --libs`
$ time ./cpp-test
real 0m1.089s
user 0m2.345s
sys 0m1.637s
$ time ./c-test
real 0m0.024s
user 0m0.067s
sys 0m0.000s
So I ran the callgrind profiler.
For cpp-test, __cxa_throw was called exactly 400,000 times with self-cost of 8,000,032.
For c-test, __longjmp_chk was called exactly 400,000 times with self-cost of 5,600,000.
The whole cost of cpp-test is 4,048,441,756.
The whole cost of c-test is 60,417,722.
I guess something much more than simply saving the state of the jump-point and later resuming is done with C++ exceptions. I couldn't test with larger N because the callgrind profiler will run forever for the C++ test.
What is the extra cost involved in C++ exceptions making it many times slower than the setjmp/longjmp pair at least in this example?
This is by design.
C++ exceptions are expected to be exceptional in nature and are optimized thusly. The program is compiled to be most efficient when an exception does not happen.
You can verify this by commenting out the exception from your tests.
In C++:
//throw 1;
$ g++ -O3 -g -std=c++11 test.cpp -o cpp-test -pthread
$ time ./cpp-test
real 0m0.003s
user 0m0.004s
sys 0m0.000s
In C:
/*longjmp(*pjb, 1);*/
$ gcc -O3 -g -std=c89 test.c -o c-test `pkg-config glib-2.0 --cflags --libs`
$ time ./c-test
real 0m0.008s
user 0m0.012s
sys 0m0.004s
What is the extra cost involved in C++ exceptions making it many times slower than the setjmp/longjmp pair at least in this example?
g++ implements zero-cost model exceptions, which have no effective overhead* when an exception is not thrown. Machine code is produced as if there were no try/catch block.
The cost of this zero-overhead is that a table lookup must be performed on the program counter when an exception is thrown, to determine a jump to the appropriate code for performing stack unwinding. This puts the entire try/catch block implementation within the code performing a throw.
Your extra cost is a table lookup.
*Some minor timing voodoo may occur, as the presence of a PC lookup table may affect memory layout, which may affect CPU cache misses.

map<shared_ptr<TiXmlDocument>, double> used in this offload region is not bitwise copyable

I am using Intel C++ Compiler v14.0.3. This following code troubles me:
#include <tinyxml/tinyxml.h>
#include <memory>
#include <map>
#include "offload.h"
using namespace std;
typedef map<shared_ptr<TiXmlDocument>, double,
            less<shared_ptr<TiXmlDocument> >,
            __offload::shared_allocator<pair<shared_ptr<TiXmlDocument>, double> > > xmlanddbl;
__declspec(target(mic)) _Cilk_shared xmlanddbl m;
int main(void)
{
    const int maxct = 10;
    for(int i = 0; i < 10; i++){
        shared_ptr<TiXmlDocument> doc(new TiXmlDocument);
        if(doc && doc->LoadFile("something.xml")){
            m.insert(make_pair(doc, 0.0));
        }
    }
    for(int ct = 0; ct < maxct; ct++){
        #pragma offload target(mic) mandatory
        #pragma omp parallel
        #pragma omp single
        {
            for(auto it = m.begin(); it != m.end(); it++){
                #pragma omp task firstprivate(it)
                {
                    someclass obj(it->first);
                    it->second = obj.eval();
                }
            }
            #pragma omp taskwait
        }
        somefunction(m);
    }
    return 0;
}
Compiler gives this message:
$ icpc -c thiscode.cpp -O2 -openmp -parallel -std=c++11 -I./include
thiscode.cpp(24): error: variable "m" used in this offload region is not bitwise copyable
#pragma offload target(mic) mandatory
^
compilation aborted for thiscode.cpp (code 2)
I've read this page. But I could not think of how to transfer this data.
What can I do?
Sorry for my poor english.
Thank you.
I could not think of how to transfer this data. What can I do?
From the document you linked:
"If the data exchanged between CPU and coprocessor is more complex than simple scalars and bit-wise copyable arrays, you may consider using the _Cilk_shared/_Cilk_offload constructs."
#pragma offload will not work because std::map is too complex to be bitwise copyable.

BigInteger java method to gmp c++

I want to convert this Java code to C++.
The code is:
BigInteger value = new BigInteger(125, RandomNumber);
BigInteger clone = new BigInteger(value.toByteArray());
How can I write this code in C++ using the GMP library?
Please anyone help me.
Thanks.
In C++ you can do it like this:
#include <gmpxx.h>
#include <gmp.h>
#include <iostream>
using namespace std;
int main(){
    mpz_class value;
    mpz_class clone;
    gmp_randclass r(gmp_randinit_default);
    value = r.get_z_bits(125);
    clone = value;
    cout << value << endl;
    cout << clone << endl;
    return 0;
}
and compile with
g++ file.cpp -lgmpxx -lgmp
To get libgmpxx.a installed, pass --enable-cxx to ./configure when building GMP.
The following is copied from Wikipedia:
Here is an example of C code showing the use of the GMP library to multiply and print large numbers:
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
int main(void)
{
    mpz_t x;
    mpz_t y;
    mpz_t result;
    mpz_init(x);
    mpz_init(y);
    mpz_init(result);
    mpz_set_str(x, "7612058254738945", 10);
    mpz_set_str(y, "9263591128439081", 10);
    mpz_mul(result, x, y);
    gmp_printf("\n %Zd\n*\n %Zd\n--------------------\n%Zd\n\n", x, y, result);
    /* free used memory */
    mpz_clear(x);
    mpz_clear(y);
    mpz_clear(result);
    return EXIT_SUCCESS;
}
This code calculates the value of 7612058254738945 × 9263591128439081.
Compiling and running this program gives this result. (The -lgmp flag is used if compiling on Unix-type systems.)
7612058254738945
*
9263591128439081
--------------------
70514995317761165008628990709545

Why is my program giving a totally different output when I compile with mingw as compared to g++

When I compile this code (using the Mersenne Twister found here: http://www-personal.umich.edu/~wagnerr/MersenneTwister.html ):
#include <iostream>
#include <cmath>
#include "mtrand.h"
using namespace std;
double pythag(double x, double y) {
    double derp=0;
    derp=(x*x)+(y*y);
    derp=sqrt(derp);
}
int main() {
    double x=0;
    double y=0;
    double pi=0;
    double hold1=0;
    double hold2=0;
    double hits=0;
    MTRand mt;
    mt.seed();
    // cout.precision(10);
    for(long i=1; i<=100000000000l; i++) {
        x=abs(mt.rand());
        y=abs(mt.rand());
        if(pythag(x,y)<=1) {
            hits++;
        }
        if(i%100000l==0) {
            pi=(4*hits)/i;
            cout << "\r" << i << " " << pi ;
        }
    }
    cout <<"\n";
    return 42;
}
with g++ ("g++ pi.cc -o pi") and run the resulting application, I get the output I wanted: a running tally of pi calculated using the Monte Carlo method.
But when I compile with MinGW g++ ("i686-pc-mingw32-g++ -static-libstdc++ -static-libgcc pi.cc -o pi.exe"), I always get a running tally of 0.
Any help is greatly appreciated.
Perhaps it's because you omitted the return statement:
double pythag(double x, double y) {
    double derp=0;
    derp=(x*x)+(y*y);
    derp=sqrt(derp);
    // You're missing this!!!
    return derp;
}
I'd be surprised that you didn't get any warnings or errors on this.
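pythag() does not return anything, as Loki is trying to say without telling you the exact answer. Flowing off the end of a non-void function is undefined behavior, so the value the caller sees is unspecified and can differ from one compiler to another.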
Why do you return 42 in main()?! 8-)