Simple console program will not exit if cudaMalloc is called - C++

The following simple program never exits if the cudaMalloc call is executed. Commenting out just the cudaMalloc causes it to exit normally.
#include <iostream>
using std::cout;
using std::cin;

#include "cuda.h"
#include "cutil_inline.h"

void PrintCudaVersion(int version, const char *name)
{
    int versionMaj = version / 1000;
    int versionMin = (version - (versionMaj * 1000)) / 10;
    cout << "CUDA " << name << " version: " << versionMaj << "." << versionMin << "\n";
}

void ReportCudaVersions()
{
    int version = 0;
    cudaDriverGetVersion(&version);
    PrintCudaVersion(version, "Driver");
    cudaRuntimeGetVersion(&version);
    PrintCudaVersion(version, "Runtime");
}

int main(int argc, char **argv)
{
    //CUresult r = cuInit(0);               << These two lines were in original post
    //cout << "Init result: " << r << "\n"; << but have no effect on the problem
    ReportCudaVersions();

    void *ptr = NULL;
    cudaError_t err = cudaSuccess;
    err = cudaMalloc(&ptr, 1024*1024);
    cout << "cudaMalloc returned: " << err << " ptr: " << ptr << "\n";
    err = cudaFree(ptr);
    cout << "cudaFree returned: " << err << "\n";
    return 0;
}
This is running on Windows 7, CUDA 4.1 driver, CUDA 3.2 runtime. I've traced the return from main through the CRT to ExitProcess(), from which it never returns (as expected), but the process never ends either. From VS2008 I can stop debugging OK. From the command line, I must kill the console window.
Program output:
Init result: 0
CUDA Driver version: 4.1
CUDA Runtime version: 3.2
cudaMalloc returned: 0 ptr: 00210000
cudaFree returned: 0
I tried making the allocation amount so large that cudaMalloc would fail. It did and reported an error, but the program still would not exit. So it apparently has to do with merely calling cudaMalloc, not the existence of allocated memory.
Any ideas as to what is going on here?
EDIT: I was wrong in the second sentence - I have to eliminate both the cudaMalloc and the cudaFree to get the program to exit. Leaving either one in causes the hang.
EDIT: Although there are many references to the fact that CUDA driver versions are backward compatible, this problem went away when I reverted the driver to V3.2.

It seems like you're mixing the driver API (cuInit) with the runtime API (cudaMalloc).
I don't know if anything funny happens (or should happen) behind the scenes, but one thing you could try is to remove the cuInit and see what happens.
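Beyond that, it may help to reduce the repro to the runtime API alone and tear the context down explicitly before main() returns. This is just a sketch under a couple of assumptions: cuda.h and cutil_inline.h are deliberately dropped, and cudaThreadExit() is the CUDA 3.2-era call for destroying the context (later superseded by cudaDeviceReset()):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);   // version of the installed driver
    cudaRuntimeGetVersion(&runtimeVer); // version of the runtime linked against
    std::printf("driver %d, runtime %d\n", driverVer, runtimeVer);

    void *ptr = NULL;
    cudaError_t err = cudaMalloc(&ptr, 1024 * 1024);
    std::printf("cudaMalloc: %s\n", cudaGetErrorString(err));
    err = cudaFree(ptr);
    std::printf("cudaFree: %s\n", cudaGetErrorString(err));

    // Destroy the implicitly created context before returning. If the
    // hang lives in the context teardown that normally runs at process
    // exit, doing it here should make that visible inside main().
    err = cudaThreadExit();
    std::printf("cudaThreadExit: %s\n", cudaGetErrorString(err));
    return 0;
}

If this version exits cleanly, the problem is specific to teardown at process exit rather than to allocation itself, which would fit the driver/runtime version mismatch noted in the edit.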

Related

How to recover from segmentation fault in C++?

I have some production-critical code that has to keep running.
Think of the code as:
while (true) {
    init();
    do_important_things(); // segfault here
    clean();
}
I can't trust the code to be bug-free, and I need to be able to log problems to investigate later.
This time, I know for a fact somewhere in the code there is a segmentation fault getting thrown, and I need to be able to at least log that, and then start everything over.
Reading here, there are a few solutions, but each one is followed by a flame war claiming the solution will actually do more harm than good, with no real explanation. I also found this answer, which I am considering using, but I'm not sure it is good for my use case.
So, what is the best way to recover from a segmentation fault in C++?
I suggest that you create a very small program - one you can make really safe - that monitors the buggy program. If the buggy program exits in a way you don't like, the monitor restarts it.
Posix example:
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cstdio>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "USAGE: " << argv[0] << " program_to_monitor <arguments...>\n";
        return 1;
    }
    while (true) {
        pid_t child = fork(); // create a child process
        if (child == -1) {
            std::perror("fork");
            return 1;
        }
        if (child == 0) {
            execvp(argv[1], argv + 1); // start the buggy program
            perror(argv[1]);           // starting failed
            std::exit(0);              // exit with 0 to not trigger a retry
        }
        // Wait for the buggy program to terminate and check the status
        // to see if it should be restarted.
        if (int wstatus; waitpid(child, &wstatus, 0) != -1) {
            if (WIFEXITED(wstatus)) {
                if (WEXITSTATUS(wstatus) == 0) return 0; // normal exit, terminate
                std::cerr << argv[0] << ": " << argv[1] << " exited with "
                          << WEXITSTATUS(wstatus) << '\n';
            }
            if (WIFSIGNALED(wstatus)) {
                std::cerr << argv[0] << ": " << argv[1]
                          << " terminated by signal " << WTERMSIG(wstatus);
                if (WCOREDUMP(wstatus)) std::cerr << " (core dumped)";
                std::cerr << '\n';
            }
            std::cerr << argv[0] << ": Restarting " << argv[1] << '\n';
        } else {
            std::perror("wait");
            break;
        }
    }
}
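If you also want the crashing process itself to leave a trace in your logs, it can install a SIGSEGV handler that writes a message using only async-signal-safe calls and then re-raises the signal, so the monitor above still sees WIFSIGNALED. A minimal sketch - the log file name is made up, and note that after a segfault the process state is suspect, which is why the handler only logs and dies rather than trying to continue:

#include <csignal>
#include <fcntl.h>
#include <unistd.h>

extern "C" void segv_handler(int sig)
{
    // Only async-signal-safe calls here: open/write/close, not fprintf.
    int fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd != -1) {
        const char msg[] = "caught SIGSEGV, terminating\n";
        write(fd, msg, sizeof msg - 1);
        close(fd);
    }
    // Restore the default action and re-raise, so the process still
    // terminates by signal and the monitor's WIFSIGNALED branch fires.
    std::signal(sig, SIG_DFL);
    std::raise(sig);
}

int main()
{
    std::signal(SIGSEGV, segv_handler);
    // ... the while (true) { init(); do_important_things(); clean(); } loop ...
}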

Trying to compile example code from Octave's Standalone Programs example, getting segfault on first line

I am trying to learn how to embed Octave in my C++ code. When running the second example from here, the code compiles fine, but running it produces a segmentation fault on the first line, when initializing the interpreter. I'm not extremely adept at C++, but even when looking it up I can't find any answers.
The original code had octave::feval instead of feval; that threw a different error (a namespace error), so I just removed the qualification and added parse.h to the includes. I doubt this is at all related to the issue, but it is a modification I made.
#include <iostream>
#include <octave/oct.h>
#include <octave/octave.h>
#include <octave/parse.h>
#include <octave/interpreter.h>

int
main (void)
{
  // Create interpreter.
  octave::interpreter interpreter;

  try
    {
      int status = interpreter.execute ();
      if (status != 0)
        {
          std::cerr << "creating embedded Octave interpreter failed!"
                    << std::endl;
          return status;
        }

      octave_idx_type n = 2;
      octave_value_list in;
      for (octave_idx_type i = 0; i < n; i++)
        in(i) = octave_value (5 * (i + 2));

      octave_value_list out = feval ("gcd", in, 1);

      if (out.length () > 0)
        std::cout << "GCD of ["
                  << in(0).int_value ()
                  << ", "
                  << in(1).int_value ()
                  << "] is " << out(0).int_value ()
                  << std::endl;
      else
        std::cout << "invalid\n";
    }
  catch (const octave::exit_exception& ex)
    {
      std::cerr << "Octave interpreter exited with status = "
                << ex.exit_status () << std::endl;
    }
  catch (const octave::execution_exception&)
    {
      std::cerr << "error encountered in Octave evaluator!" << std::endl;
    }

  return 0;
}
The output is supposed to be:
GCD of [10, 15] is 5
I am using Linux Ubuntu 18.04 with Octave 4.2.2
The documentation I was looking at is for a different version than the one installed on my computer: I have 4.2, but I was reading the 4.4 docs, which use different code for the task I was trying to accomplish.

how to attach to an existing shared memory segment

I am having trouble with shared memory. I have one process that creates and writes to a shared memory segment just fine, but I cannot get a second process to attach to that same existing segment. My second process can create a new shared segment if I use the IPC_CREAT flag, but I need to attach to the existing shared segment that was created by the 1st process.
This is my code in the 2nd process:
int nSharedMemoryID = 10;
key_t tKey = ftok("/dev/null", nSharedMemoryID);
if (tKey == -1) {
    std::cerr << "ERROR: ftok(id: " << nSharedMemoryID << ") failed, " << strerror(errno) << std::endl;
    exit(3);
}
std::cout << "ftok() successful " << std::endl;

size_t nSharedMemorySize = 10000;
int id = shmget(tKey, nSharedMemorySize, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
if (id == -1) {
    std::cerr << "ERROR: shmget() failed, " << strerror(errno) << std::endl << std::endl;
    exit(4);
}
std::cout << "shmget() successful, id: " << id << std::endl;

unsigned char *pBaseSM = (unsigned char *)shmat(id, (const void *)NULL, SHM_RDONLY);
if (pBaseSM == (unsigned char *)-1) {
    std::cerr << "ERROR: shmat() failed, " << strerror(errno) << std::endl << std::endl;
    exit(5);
}
std::cout << "shmat() successful " << std::endl;
The problem is that the 2nd process always errors out on the call to shmget() with a "No such file or directory" error. But this is the exact same code I used in the 1st process, and it works just fine there. The 1st process, which created the shared segment, can write to it, and I can see the segment with "ipcs -m". Also, if I take the shmid reported by "ipcs -m" and hard-code it into my 2nd process, the 2nd process can attach to it just fine. So the problem seems to be generation of the common id that both processes use to identify a single shared segment.
I have several questions:
(1) Is there an easier way to get the shmid of an existing shared memory segment? It seems crazy that I have to pass three separate parameters from the 1st process (which created the segment) to the 2nd process just so the 2nd process can get the same shared segment. I can live with having to pass two parameters: the file name, like "/dev/null", and the shared id (nSharedMemoryID in my code). But having to pass the size of the segment to shmget() just to get the shmid seems senseless, because I have no idea how much memory was actually allocated (because of page-size rounding), so I cannot be certain it is the same.
(2) Does the segment size that I use in the 2nd process have to be the same as the size used to initially create the segment in the 1st process? I have tried specifying 0, but I still get errors.
(3) Likewise, do the permissions have to be the same? That is, if the shared segment was created with read/write for user/group/world, can the 2nd process just use read for user? (Same user for both processes.)
(4) And why does shmget() fail with the "No such file or directory" error when the file "/dev/null" obviously exists for both processes? I am assuming that the 1st process does not put some kind of lock on that node, because that would be senseless.
Thanks for any help anyone can give. I have been struggling with this for hours--which means I am probably doing something really stupid and will ultimately embarrass myself when someone points out my error :-)
thanks,
-Andres
(1) As a different way: the attaching process scans the user's existing segments, tries to attach with the needed size, and checks for a "magic byte sequence" at the beginning of the segment (to exclude other programs of the same user). Alternatively, you can check whether the process attached is the one that you expect. If one of these steps fails, this process is the first one, and it creates the segment... cumbersome, yes - I saw it in code from the '70s.
Otherwise, you could evaluate the POSIX-compliant shm_open() alternative - it should be simpler, or at least more modern (see the sketch just after this list)...
(2) Regarding the size, it's important that the size specified be less than or equal to the size of the existing segment, so there is no issue if it was rounded up to the next memory page size. You get the EINVAL error only if it's larger.
(3) The mode flags are only relevant when the segment is first created (fairly sure).
(4) The fact that shmget() fails with "No such file or directory" means only that it hasn't found a segment with that key (being pedantic now: not id - with id we usually refer to the value returned by shmget(), used subsequently) - have you checked that the tKey is the same? Your code works fine on my system; I just added a main() around it.
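Here is that shm_open() sketch - minimal and self-contained, with a made-up name "/my_segment"; the creator and a later attacher differ only in whether O_CREAT is passed and ftruncate() is called:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    const char *name = "/my_segment"; // both processes must agree on this
    const size_t size = 10000;

    // Creator passes O_CREAT and sets the size; a second process would
    // call shm_open(name, O_RDWR, 0) and skip the ftruncate().
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd); // the mapping stays valid after closing the descriptor

    // ... use p like the shmat() pointer ...

    munmap(p, size);
    shm_unlink(name); // whichever process owns cleanup does this
    return 0;
}

(On older glibc you may need to link with -lrt.)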
EDIT: attached the working program
#include <iostream>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char **argv) {
    int nSharedMemoryID = 10;
    if (argc > 1) {
        nSharedMemoryID = atoi(argv[1]);
    }

    key_t tKey = ftok("/dev/null", nSharedMemoryID);
    if (tKey == -1) {
        std::cerr << "ERROR: ftok(id: " << nSharedMemoryID << ") failed, " << strerror(errno) << std::endl;
        exit(3);
    }
    std::cout << "ftok() successful. key = " << tKey << std::endl;

    size_t nSharedMemorySize = 10000;
    int id = shmget(tKey, nSharedMemorySize, 0);
    if (id == -1) {
        std::cerr << "ERROR: shmget() failed (WILL TRY TO CREATE IT NEW), " << strerror(errno) << std::endl << std::endl;
        id = shmget(tKey, nSharedMemorySize, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH | IPC_CREAT);
        if (id == -1) {
            std::cerr << "ERROR: shmget() failed, " << strerror(errno) << std::endl << std::endl;
            exit(4);
        }
    }
    std::cout << "shmget() successful, id: " << id << std::endl;

    unsigned char *pBaseSM = (unsigned char *)shmat(id, (const void *)NULL, SHM_RDONLY);
    if (pBaseSM == (unsigned char *)-1) {
        std::cerr << "ERROR: shmat() failed, " << strerror(errno) << std::endl << std::endl;
        exit(5);
    }
    std::cout << "shmat() successful " << std::endl;
}
EDIT: output
$ ./a.out 33
ftok() successful. key = 553976853
ERROR: shmget() failed (WILL TRY TO CREATE IT NEW), No such file or directory
shmget() successful, id: 20381699
shmat() successful
$ ./a.out 33
ftok() successful. key = 553976853
shmget() successful, id: 20381699
shmat() successful
SOLUTION - after in-chat (wow, SO has a chat!) discussion:
In the end, the problem was that the original code later called shmctl() with IPC_RMID, so that the segment would be removed once the last process detached - but the call was made before the other process had attached.
The problem is that this in fact makes the segment private: its key is shown as 0x00000000 by ipcs -m, and it cannot be attached anymore by other processes - it is in fact marked for lazy deletion.
I just want to post the result of all the help Sigismondo gave me, and the solution to this issue, in case anyone else has the same problem.
The clue was using "ipcs -m" and noticing that the key value was 0, which means that the shared segment is private, and so the 2nd process could not attach to it.
An additional quirk was this: I was calling the following:
int nReturnCode = shmctl(id, IPC_RMID, &m_stCtrlStruct);
My intent was to set the mode for the segment so that it would be deleted once all processes using it had exited. However, this call has the side effect of making the segment private, even though it was created without the IPC_EXCL flag.
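For reference, the ordering that avoids this - a sketch only, which assumes you have some way of knowing that every reader has already attached (a handshake not shown here):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstdio>

// IPC_RMID does not remove the segment immediately: it marks it for
// destruction once the attach count drops to zero. But it also hides the
// key (ipcs -m shows 0x00000000), so new shmget() lookups stop working.
// Therefore: call it only after every process that needs the segment has
// already called shmat().
void mark_for_removal_after_attach(int id)
{
    if (shmctl(id, IPC_RMID, NULL) == -1)
        perror("shmctl(IPC_RMID)");
}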
Hopefully this will help anyone else who trips across this issue.
And, many, many thanks to Sigismondo for taking the time to help me--I learned a lot from our chat!
-Andres

boost removing managed_shared_memory when process is attached

I have 2 processes: process 1 creates a boost managed_shared_memory segment and process 2 opens this segment. Process 1 is then restarted, and the start of process 1 has the following:
struct vshm_remove
{
    vshm_remove()
    {
        boost::interprocess::shared_memory_object::remove("VMySharedMemory");
    }
    ~vshm_remove()
    {
        boost::interprocess::shared_memory_object::remove("VMySharedMemory");
    }
} vremover;
I understand that when process 1 starts or ends, the remove method will be called on my shared memory, but shouldn't it only remove it if process 2 is not attached to it? I am attaching to the shared memory in process 2 using the following:
boost::interprocess::managed_shared_memory *vfsegment;
vfsegment = new boost::interprocess::managed_shared_memory(boost::interprocess::open_only, "VMySharedMemory");
I am noticing that the shared memory is removed regardless of Process 2 being connected.
I don't believe that there is any mention in the documentation that shared_memory_object::remove will fail if a process is attached.
Please see this section for reference: Removing shared memory. Particularly:
This function can fail if the shared memory objects does not exist or it's opened by another process.
This means that a call to shared_memory_object::remove("foo") will attempt to remove shared memory named "foo" no matter what.
The implementation of that function (source here) reflects that behavior:
inline bool shared_memory_object::remove(const char *filename)
{
    try {
        //Make sure a temporary path is created for shared memory
        std::string shmfile;
        ipcdetail::tmp_filename(filename, shmfile);
        return ipcdetail::delete_file(shmfile.c_str());
    }
    catch(...) {
        return false;
    }
}
In my experience with released production code, I've had success not calling shared_memory_object::remove until I no longer need access to the shared memory.
I wrote a very simple example main program that you might find helpful. It will attach to, create, or remove shared memory depending on how you run it. After compiling, try the following steps:
1. Run with c to create the shared memory (1.0K by default) and insert dummy data
2. Run with a to attach to the shared memory and read the dummy data (reading happens in a loop, every 10 seconds by default)
3. In a separate session, run with r to remove the shared memory
4. Run again with a to try to attach. Notice that this will (almost certainly) fail, because the shared memory was (again, almost certainly) removed during the previous step
5. Feel free to kill the process from the second step
As to why step 2 above continues to be able to access the data after a call to shared_memory_object::remove, please see Constructing Managed Shared Memory. Specifically:
When we open a managed shared memory
A shared memory object is opened.
The whole shared memory object is mapped in the process' address space.
Most likely, because the shared memory object is mapped into the process' address space, the shared memory file itself is no longer directly needed.
I realize that this is a rather contrived example, but I thought something more concrete might be helpful.
#include <cctype>   // tolower()
#include <iostream>
#include <string>
#include <unistd.h> // sleep()
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>

int main(int argc, char *argv[])
{
    using std::cerr; using std::cout; using std::endl;
    using namespace boost::interprocess;

    if (argc == 1) {
        cout << "usage: " << argv[0] << " <command>\n 'c' create\n 'r' remove\n 'a' attach" << endl;
        return 0;
    }

    const char * shm_name = "shared_memory_segment";
    const char * data_name = "the_answer_to_everything";

    switch (tolower(argv[1][0])) {
    case 'c':
        if (shared_memory_object::remove(shm_name)) { cout << "removed: " << shm_name << endl; }
        managed_shared_memory(create_only, shm_name, 1024).construct<int>(data_name)(42);
        cout << "created: " << shm_name << "\nadded int \"" << data_name << "\": " << 42 << endl;
        break;
    case 'r':
        cout << (shared_memory_object::remove(shm_name) ? "removed: " : "failed to remove: ") << shm_name << endl;
        break;
    case 'a':
        {
            managed_shared_memory segment(open_only, shm_name);
            while (true) {
                std::pair<int *, std::size_t> data = segment.find<int>( data_name );
                if (!data.first || data.second == 0) {
                    cerr << "Allocation " << data_name << " either not found or empty" << endl;
                    break;
                }
                cout << "opened: " << shm_name << " (" << segment.get_segment_manager()->get_size()
                     << " bytes)\nretrieved int \"" << data_name << "\": " << *data.first << endl;
                sleep(10);
            }
        }
        break;
    default:
        cerr << "unknown command" << endl;
        break;
    }
    return 0;
}
One additional interesting thing - add one more case:
case 'w':
    {
        managed_shared_memory segment(open_only, shm_name);
        std::pair<int *, std::size_t> data = segment.find<int>( data_name );
        if (!data.first || data.second == 0) {
            cerr << "Allocation " << data_name << " either not found or empty" << endl;
            break;
        }
        *data.first = 17;
        cout << "opened: " << shm_name << " (" << segment.get_segment_manager()->get_size()
             << " bytes)\nretrieved int \"" << data_name << "\": " << *data.first << endl;
    }
    break;
The additional option 'w' attaches the memory and writes '17' ("the most random random number") into it instead. With this you can do the following:
Console 1: Do 'c', then 'a'. Reports the memory created with value 42.
Console 2: Do 'w'. On Console1 you'll see that the number is changed.
Console 2: Do 'r'. The memory is successfully removed, Console 1 still prints 17.
Console 2: Do 'c'. It will report memory as created with value 42.
Console 2: Do 'a'. You'll see 42, Console 1 still prints 17.
This confirms - provided it works the same way on all platforms, and Boost declares that it does - that you can use this approach to send memory blocks from one process to another: the "producer" only needs confirmation that the "consumer" has attached the block, and can then remove it. The consumer also doesn't have to detach the previous block before attaching the next one.
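A minimal sketch of that producer/consumer handshake, under stated assumptions: the segment name "block_name" and the flag name "consumer_attached" are made up, and the crude polling loop stands in for a proper interprocess condition variable or semaphore:

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/shared_memory_object.hpp>
#include <unistd.h> // sleep()

using namespace boost::interprocess;

// Producer: create the block, wait until the consumer has attached,
// then remove the name. The consumer's mapping survives the removal.
void produce()
{
    shared_memory_object::remove("block_name");
    managed_shared_memory seg(create_only, "block_name", 4096);
    seg.construct<bool>("consumer_attached")(false);
    // ... construct the payload objects in 'seg' here ...

    bool *flag = seg.find<bool>("consumer_attached").first;
    while (!*flag)  // crude polling; a real program would use an
        sleep(1);   // interprocess_condition or named_semaphore
    shared_memory_object::remove("block_name");
}

// Consumer: attach, acknowledge, and keep using the mapping even after
// the producer has removed the name.
void consume()
{
    managed_shared_memory seg(open_only, "block_name");
    *seg.find<bool>("consumer_attached").first = true;
    // ... read the payload for as long as needed ...
}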

Weird glibc error with writing out a file

I'm getting a strange error:
*** glibc detected *** findbasis: free(): invalid next size (normal): 0x0000000006a32ce0 ***
When I try to close() a std::ofstream:
void writeEvectors(int l, parameters params, PetscReal* evectors, int basis_size)
{
    for (int n = 1 + l; n <= params.nmax(); n++)
    {
        std::stringstream fname(std::ios::out);
        fname << params.getBasisFunctionFolder() << "/evectors_n" << std::setw(4) << std::setfill('0') << n << "_l" << std::setw(3) << std::setfill('0') << l;
        std::ofstream out(fname.str().c_str(), std::ios::binary);
        std::cerr << "write out file:" << fname.str() << " ...";
        out.write((char*)(evectors + n * basis_size), sizeof(PetscReal) * basis_size);
        std::cerr << "done1" << std::endl;
        if (out.fail() || out.bad())
            std::cerr << "bad or fail..." << std::endl;
        out.close();
        std::cerr << "done2" << std::endl;
    }
    std::cout << "done writing out all evectors?" << std::endl;
}
When run, this program never reaches "done2" (or "bad or fail..."); however, "done1" is reached. Also, the data that is written out is good (as in, what I expect).
I'm honestly at a loss as to why this happens; I can't think of any reason close() would fail.
Thanks for any help.
(I'm beginning to think it is some sort of compiler bug/error. I'm running GCC 4.1.2 (!) (RHEL 5 I believe) through mpicxx)
The glibc error means the heap has been corrupted: "free(): invalid next size" is glibc noticing that an allocation's bookkeeping data has been overwritten, typically by a buffer overrun somewhere earlier. If you run inside Valgrind, a memory-error detector, it ought to give you a more helpful explanation of the error.
Running under Valgrind is fairly painless - just compile the executable with the -g option to include debugging symbols (assuming you're using the GNU compiler), then in your Linux terminal enter valgrind ./your_executable and see what happens.