Segmentation Fault with Global Variable in MPICH 1.6 - c++

Consider the following simple program:
#include <mpi.h>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <string>
#include <vector>
using std::cout;
using std::string;
using std::vector;
vector<float> test;
#ifdef GLOBAL
string hostname;
#endif
int main(int argc, char** argv) {
    int rank; // The node id of this processor.
    int size; // The total number of nodes.
#ifndef GLOBAL
    string hostname;
#endif
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    cout << "Joining the job as processor: " << rank << std::endl;
    {
        char buf[2048] = "HELLO";
        hostname.assign(buf, 2048);
    }
    test.push_back(1.0f);
    cout << "Hostname: " << hostname << "::" << test[0] << std::endl;
    MPI_Finalize();
    return 0;
}
If I compile/run this with:
mpicxx -c test.cc && mpicxx -lstdc++ test.o -o test && ./test
there is no segmentation fault, but if I run it with:
mpicxx -DGLOBAL -c test.cc && mpicxx -lstdc++ test.o -o test && ./test
then there is a segmentation fault at the hostname.assign() line. In addition, if I remove this assignment, there is a segmentation fault in the string destructor once main returns, so the assign call isn't the actual culprit.
Notice that the only difference is where the "global" variable hostname gets declared.
I am compiling with MPICH2 version 1.6 and don't really have the option to change this, since I am running on a supercomputer.
If I remove MPI_Init, etc., the error goes away, which leads me to believe that something unexpected is happening between MPI and this global variable.
I found some other examples of this happening to people online, but they all resolved their issues by installing a new version of MPICH, which again is not a possibility for me.
Moreover, I want to know WHY this is happening, not just a way around it.
Thanks for your time.

Ok, after quite a bit of debugging I have found that the MVAPICH2-1.6 library defines a variable called hostname in:
mpid/ch3/channels/mrail/src/rdma/ch3_shmem_coll.c
Here is the line (55 in this version of the file):
char hostname[SHMEM_COLL_HOSTNAME_LEN];
The compiler didn't complain about the name clash here, but this is almost certainly the culprit since changing the variable name in my program removed the error. I imagine this is changed in later versions of MVAPICH2, but I will file the bug if not.
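A clash like this is possible because, with GCC-style name mangling, a variable in the global namespace keeps its plain name as its linker symbol, so a global std::string hostname and a C array named hostname inside a library can end up referring to the same symbol. Besides renaming, one way to avoid it is to give the variable internal linkage. The following is a minimal sketch under that assumption; set_hostname and get_hostname are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Everything in an unnamed namespace has internal linkage, so this
// `hostname` is not exported as a global linker symbol and cannot be
// merged with a `hostname` defined inside a library.
namespace {
std::string hostname;
}

// Hypothetical helper: stores a NUL-terminated name in the file-local variable.
void set_hostname(const char* name) {
    assert(name != nullptr);
    hostname.assign(name);  // copies up to the terminating '\0'
}

// Hypothetical helper: read access to the file-local variable.
const std::string& get_hostname() {
    return hostname;
}
```

Declaring the variable static at file scope has the same effect. Note also that hostname.assign(buf, 2048) in the question copies all 2048 bytes, trailing zeros included, whereas assign(buf) stops at the first '\0'.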

Related

seg fault when I change local class name

The segmentation fault goes away when I change the name of my class instance, and I don't understand why. I have built a class called Environ, and I create an instance and call it in main. What I have found is that when I change the local variable name in main from this_environ to environ, I get a segmentation fault, and none of my Environ variables have been initialised. Has anyone come across this, or does anyone understand why it would be an issue? The interesting thing is that this isn't an issue when I compile on my Ubuntu machine...
#include <vector>
#include <map>
#include <iostream>
//#include "Environ.hpp"
// Namespaces
using namespace std;
class Environ {
public:
    // Public objects.
    vector<unsigned> years_;
    void initialise() {
        cerr << "entering initialsie" << endl;
        years_ = {12,32,23};
    }
};
int main() {
    cout << "!!!Hello World!!!" << endl; // prints !!!Hello World!!!
    Environ this_environ;
    this_environ.initialise();
    cout << "Finished initialisation" << endl;
    system("PAUSE");
    return 0;
}
For reproducibility: I am building on Windows 10 with gcc version 5.1.0, using the following build call:
g++ -std=c++0x -O2 -g3 -Wall -c -fmessage-length=0
I have just discovered that environ is a macro in stdlib.h, so it is most likely not a good idea to give a variable this name. The macro is defined on line 633 of stdlib.h (perhaps I should mention my GCC is from here):
#define sys_errlist _sys_errlist
#define sys_nerr _sys_nerr
#define environ _environ
char *__cdecl ecvt(double _Val,int _NumOfDigits,int *_PtDec,int *_PtSign) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
char *__cdecl fcvt(double _Val,int _NumOfDec,int *_PtDec,int *_PtSign) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
char *__cdecl gcvt(double _Val,int _NumOfDigits,char *_DstBuf) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
char *__cdecl itoa(int _Val,char *_DstBuf,int _Radix) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
char *__cdecl ltoa(long _Val,char *_DstBuf,int _Radix) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
int __cdecl putenv(const char *_EnvString) __MINGW_ATTRIB_DEPRECATED_MSVC2005;
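A minimal sketch of the safer approach: keep the class, but avoid an identifier that the C library may already claim. On POSIX systems environ is also the name of a global variable declared by the C library, so the name is best avoided everywhere. The #ifdef check below is one way to see whether a particular toolchain defines the name as a macro; environ_is_macro and first_year are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

class Environ {
public:
    std::vector<unsigned> years_;
    void initialise() { years_ = {12, 32, 23}; }
};

// Reports whether this toolchain's headers define `environ` as a macro.
// The answer depends on the platform and the headers included above.
bool environ_is_macro() {
#ifdef environ
    return true;
#else
    return false;
#endif
}

// Uses a variable name that no C library header is expected to claim.
unsigned first_year() {
    Environ this_environ;
    this_environ.initialise();
    assert(!this_environ.years_.empty());
    return this_environ.years_.front();
}
```

Calling environ_is_macro() on each build machine would show whether the macro is present there, which may explain why the same source behaves differently on Windows and Ubuntu.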

C++ on ubuntu hello world

I'm trying to write my first program in the Ubuntu terminal using C++.
I created a new cpp file named aaa with
"nano aaa.cpp"
then inside I wrote
#include<iostream>
using std::cout;
using std::endl;
int main(int argc, car** argv)
{
cout << "hello" << endl;
return 0;
}
I saved and exited, but when I tried typing
g++ aaa.cpp
I got the error
error: ‘endl’ was not declared in this scope
cout << "hello" << endl;
Where did I go wrong?
I tried
$ sudo apt-get remove g++ libstdc++-6.4.7-dev
$ sudo apt-get install build-essential g++-multilib
but it was no good.
Any help?
Stylistically, I prefer to be explicit: std::cout and std::endl.
#include <iostream>
int main(int argc, char** argv) {
std::cout << "hello" << std::endl;
return 0;
}
This also fixes a typo of yours (char, not car) and repairs the #include.
This works as expected:
$ g++ -Wall -pedantic -o foo2 foo2.cpp
$ ./foo2
hello
$
If you wanted to, you could also use
using namespace std;
but as stated, I prefer the more explicit form.
Edit: Nothing is as much fun as debating the beancounters. The OP likely has another error he is not sharing. His code, repaired for char, actually builds:
$ cat foo3.cpp
#include <iostream>
using std::cout;
using std::endl;
int main(int argc, char** argv) {
cout << "hello" << endl;
return 0;
}
$ g++ -Wall -pedantic -o foo3 foo3.cpp
$ ./foo3
hello
$
Ubuntu 16.04, g++ 5.4.0
First, make sure you have the tools you need to compile C++ code on Ubuntu. Run the following command, which installs the basics for compiling C and C++ code, including gcc, g++, and make:
sudo apt-get install build-essential
Now that you have everything you need, I suggest writing std::cout and std::endl explicitly. That way you don't import everything in the std namespace that you aren't using, and it is clear where each name comes from.
Note: you have an error in the argument list of main: car should be char.
#include<iostream>
int main(int argc, char** argv)
{
std::cout << "hello" << std::endl;
return 0;
}
Now you can compile and run your code as follows (in this example the executable is called "hello"):
g++ -Wall -o hello aaa.cpp
./hello

Calling function in shared library (Linux) get Segmentation Fault

I was trying to write a basic example of opening a shared library and calling a function in it for practice, but I always get "segmentation fault" when the executable actually runs. Here is the source code:
main.cpp:
#include<iostream>
#include<dlfcn.h>
using namespace std;
typedef void (*API)(unsigned int);
int main(int argc,char** argv){
    void* dl;
    API api;
    unsigned int tmp;
    //...
    dl=dlopen("pluginA.so",RTLD_LAZY);
    api=(API)dlsym(dl,"API");
    cin>>tmp;
    (*api)(tmp);
    dlclose(dl);
    //...
    return 0;
}
pluginA.cpp:
#include<iostream>
using namespace std;
extern "C" void API(unsigned int N){switch(N){
case 0:cout<<"1\n"<<flush;break;
case 1:cout<<"2\n"<<flush;break;
case 2:cout<<"4\n"<<flush;break;
case 4:cout<<"16\n"<<flush;break;}}
I compiled the two part with the following command:
g++ -shared -o pluginA.so -fPIC plugin.cpp
g++ main.cpp -ldl
Here is the output
Segmentation fault (core dumped)
BTW, I also tried calling api(tmp) directly rather than (*api)(tmp), and that doesn't work either. Since api is a pointer, does (*api) make more sense?
I'm not sure what I should do. There are many tutorials online about calling functions in a shared library, but most of them aren't fully coded, or they don't actually work.
I'm also not sure what to do with "__attribute__((visibility("default")))". Should I even write it?
EDIT1
Thanks for giving me so much advice. I finally found out that the whole problem was a typo in the compile command... I mistakenly typed pluginA.o instead of pluginA.so, and that's why it didn't work...
Anyway, here is my revised program, with error handling added and a more complete system:
main.cpp:
#include<dirent.h>
#include<dlfcn.h>
#include<iostream>
#include<cstring>
using namespace std;
typedef bool (*DLAPI)(unsigned int);
int main(){
    DIR* dldir=opendir("dl");
    if(!dldir){
        cerr<<"ERROR:CANNOT OPEN DIRECTORY dl"<<endl;
        return -1;}
    struct dirent* dldirf;
    void* dl[255];
    DLAPI dlapi[255];
    unsigned char i,dlc=0;
    char dldirfname[512]="./dl/"; // room for "./dl/" plus a full d_name
    unsigned int n;
    // Try to load every non-hidden entry of ./dl as a plugin.
    while(dlc<255&&(dldirf=readdir(dldir))!=NULL){
        if(dldirf->d_name[0]=='.')continue;
        strcat(dldirfname,dldirf->d_name);
        dl[dlc]=dlopen(dldirfname,RTLD_LAZY);
        if(!dl[dlc])cout<<dlerror()<<endl;else{
            dlapi[dlc]=(DLAPI)dlsym(dl[dlc],"API");
            if(!dlapi[dlc])cout<<dlerror()<<endl;else dlc++;}
        dldirfname[5]='\0';} // cut the name back to "./dl/"
    closedir(dldir);
    if(dlc==0){
        cerr<<"ERROR:NO DL LOADED"<<endl;
        return -1;}
    // Read numbers until input ends, so the dlclose loop below is reachable.
    while(cin>>n){
        for(i=0;i<dlc;i++)if((*dlapi[i])(n))break;
        if(i==dlc)cout<<"NOT FOUND"<<endl;}
    for(i=0;i<dlc;i++)dlclose(dl[i]);
    return 0;}
You should read the documentation of dlopen(3) and dlsym, and you should always handle failure. So code:
dl=dlopen("./pluginA.so",RTLD_LAZY);
if (!dl) { fprintf(stderr, "dlopen failure: %s\n", dlerror());
exit (EXIT_FAILURE); };
api=(API)dlsym(dl,"API");
if (!api) { fprintf(stderr, "dlsym failure: %s\n", dlerror());
exit (EXIT_FAILURE); };
The documentation of dlopen explains why you want to pass ./pluginA.so with a ./ prefix.
Finally, you should always compile with all warnings and debug info, so:
g++ -Wall -Wextra -g -shared -o pluginA.so -fPIC plugin.cpp
g++ -Wall -Wextra -g -rdynamic main.cpp -ldl
(It is useful to link the main program with -rdynamic so that the plugin can access its symbols.)
You may want to dlclose(dl) just before the end of main (calling or returning from a dlsym-ed function will crash your program if you dlclose too early). You might even skip the dlclose entirely (i.e. accept some resource leak). In my experience you can usually dlopen many hundreds of thousands of shared objects (see my manydl.c).
Only once your program is debugged should you add an optimization flag like -O or -O2 (and perhaps remove the debugging flag -g, but I don't recommend that for beginners).
You should perhaps read Drepper's paper: How To Write Shared Libraries.
I corrected your code a bit and added error checking. Try it to get an idea of what's going on:
#include<iostream>
#include<dlfcn.h>
using namespace std;
typedef void (*API)(unsigned int);
int main(int argc,char** argv)
{
    API api;
    unsigned int tmp;
    //...
    void* handle = dlopen("pluginA.so", RTLD_LAZY);
    if (!handle)
    {
        std::cerr << dlerror() << std::endl;
        return 1;
    }
    dlerror();
    api = reinterpret_cast<API>(dlsym(handle, "API"));
    if (!api)
    {
        std::cerr << dlerror() << std::endl;
        return 2;
    }
    cin>>tmp;
    (*api)(tmp);
    dlclose(handle);
    //...
    return 0;
}
Finally, why did it fail? Use the right path: "./pluginA.so" rather than "pluginA.so", or give the full path to your plugin.
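On the visibility attribute raised in the question: with GCC and Clang, symbols are exported by default, so the attribute only matters when the library is built with -fvisibility=hidden. The sketch below shows how a plugin entry point could be marked explicitly. The function name plugin_power_of_two is hypothetical, and the body simply mirrors the 0 -> 1, 1 -> 2, 2 -> 4, 4 -> 16 mapping of the question's plugin, assuming a 32-bit unsigned int.

```cpp
#include <cassert>

// extern "C" keeps the symbol name unmangled so dlsym can find it by name.
// The visibility attribute forces the symbol to be exported even if the
// library is compiled with -fvisibility=hidden; otherwise it is redundant.
extern "C" __attribute__((visibility("default")))
unsigned int plugin_power_of_two(unsigned int n) {
    assert(n < 32);   // assumes 32-bit unsigned int; larger shifts are undefined
    return 1u << n;   // 0 -> 1, 1 -> 2, 2 -> 4, 4 -> 16
}
```

A loader would look the function up with dlsym(handle, "plugin_power_of_two") and cast the result to a matching function pointer type.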

GCC iostream fstream error in Ubuntu 13.10

I am using Ubuntu 13.10. I am getting some errors for the following code.
#include <stdlib.h>
#include <stdio.h>
#include <fstream.h>
int main(int argc, char *argv[])
{
error.set_program_name(argv[0]);
if ( argc != 2 )
{
// printf(argv[0] + " usage: fifo_client [string] \n");
/// cout << argv[0] << " usage: fifo_client [string]" << endl;
exit(EXIT_FAILURE);
}
ofstream out(fifo_file);
if(out)
out << argv[1] << endl;
return(EXIT_SUCCESS);
}
If I compile the above program, a.c, using the command
gcc a.c -o a
a.c:1:20: fatal error: iostream: No such file or directory
#include <iostream>
^
compilation terminated.
I don't know what the problem is.
Use g++ instead of gcc. gcc could compile a c++ file if it had the right extension (.cpp for instance) or with the right arguments (-x c++) but adding the arguments needed to link with the C++ libraries is far too complex to avoid the simple solution.
The problem is that you're mixing C & C++ code and compiling it using GCC.
Try
#include <fstream>
using namespace std;
instead of #include <fstream.h>.
Anyway, your source code is not complete enough to make a correct suggestion.
I ran your code through my compiler and got the following error:
test2.c:3:21: fatal error: fstream.h: No such file or directory
#include <fstream.h>
^
compilation terminated.
So I think your question has a typo.
It is because you are mixing C and C++ code; fstream is part of C++. Try compiling with g++.
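Putting the advice above together, here is a minimal sketch of the file-writing part in standard C++. The question's fifo_file and error objects are not shown in the original, so the path is passed in as a parameter here, and write_line is a hypothetical helper name.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Writes one line of text to the file (or FIFO) at `path`.
// Returns false if the file could not be opened or the write failed.
bool write_line(const std::string& path, const std::string& text) {
    assert(!path.empty());
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    out << text << std::endl;
    return static_cast<bool>(out);
}
```

From main this could be called as write_line(path, argv[1]) after checking argc. The file should be saved with a C++ extension such as .cpp and built with g++ rather than gcc.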

dlopen() gives unresolved symbol error when .so tries to use a class from the main executable. Why?

I'm on Linux, and the question concerns shared objects that use C++ classes.
The problem comes when my shared objects try to use resources linked into the main executable. I have the following code:
loader.cpp:
#include <dlfcn.h>
#include <iostream>
#include "CommonInfo.h"
using namespace std;
int main(int argc, char** argv) {
    for(int i=1; i<argc; ++i) {
        string pth = "./";
        pth.append(argv[i]);
        void* dh = dlopen(pth.c_str(), RTLD_NOW);
        if(dh==NULL) {
            cerr << dlerror() << endl;
            return 1;
        }
        CommonInfo::GetInfoFunc getInfo = (CommonInfo::GetInfoFunc)(dlsym(dh,"getInfo"));
        if(getInfo==NULL) {
            cerr << dlerror() << endl;
            return 1;
        }
        CommonInfo* info = getInfo();
        cout << "INFO: " << info->getX() << endl;
        delete info;
    }
    return 0;
}
CommonInfo.h:
#include <string>
class CommonInfo {
public:
typedef CommonInfo* (*GetInfoFunc)();
private:
std::string x;
public:
CommonInfo(const std::string& nx);
std::string getX() const;
};
EDIT:
I accidentally forgot to copy and paste the source of CommonInfo.cpp here. Of course, it is present during compilation, so CommonInfo.cpp:
#include "CommonInfo.h"
CommonInfo::CommonInfo(const std::string& nx) : x(nx) {
}
std::string CommonInfo::getX() const {
return x;
}
A Plugin.h header:
#include "CommonInfo.h"
extern "C" CommonInfo* getInfo();
A very simple Plugin.cpp:
#include <iostream>
#include "Plugin.h"
#include "CommonInfo.h"
using namespace std;
CommonInfo* getInfo() {
return new CommonInfo("I'm a cat!");
}
Compiling is done with:
g++ -rdynamic -ldl -Werror CommonInfo.cpp loader.cpp -o loader
g++ -shared -fPIC -Werror Plugin.cpp -o Plugin.so
Running:
./loader Plugin.so
And there goes the error:
./loader: symbol lookup error: ./Plugin.so: undefined symbol: _ZN10CommonInfoC1ERKSs
Indeed, looking inside Plugin.so with nm Plugin.so | grep -i CommonInfo gives a 'U' for this symbol (undefined), which is perfectly OK.
Also, looking inside the loader binary with nm loader | grep -i CommonInfo, I could find the symbol marked 'T', which is also OK.
The question is, shouldn't the dlfcn functions resolve the symbol in question from the main binary? Without this feature it becomes quite hard to use this stuff... Do I have to write a class factory function for CommonInfo, load it with dlfcn from the plugin, and call that?
Thanks in advance,
Dennis
I haven't looked closely at your code, but I have in the past found behavior like you describe in the title when I did not link the executable with -E. (Or -Wl,-E when linking with gcc rather than ld.)
Note that not all platforms let the shared libraries take symbols from the calling binary. Linux and the *BSDs allow you to. But if you ever want to port to, say, Windows, you will not be able to use this pattern. I believe there are also some Unix-type OS's that won't let you do this. (It's been a while so I don't remember... Maybe it was Solaris?)
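If portability to such platforms matters, one alternative (an assumption on my part, not something taken from the question's code) is to share only an abstract interface defined entirely in a header. The plugin then implements the interface itself and needs no out-of-line symbol from the executable. In this sketch InfoInterface, CatInfo and describe_plugin are hypothetical names, and everything is placed in one file only so that it can be compiled on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Shared header part: only pure virtual functions and an inline destructor,
// so there is no constructor or method the plugin must import from the host.
class InfoInterface {
public:
    virtual ~InfoInterface() {}
    virtual std::string getX() const = 0;
};

// Plugin part: the concrete class lives entirely inside the plugin.
class CatInfo : public InfoInterface {
public:
    std::string getX() const override { return "I'm a cat!"; }
};

// Plugin entry point, looked up by name with dlsym in a real loader.
extern "C" InfoInterface* getInfo() {
    return new CatInfo;
}

// Host part: uses the object only through the interface.
std::string describe_plugin() {
    std::unique_ptr<InfoInterface> info(getInfo());
    assert(info != nullptr);
    return info->getX();
}
```

In a real build the last function would sit in the loader, the middle two pieces in Plugin.cpp, and the interface in a header included by both.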