cudaErrorIllegalAdress in Profiler - c++

I have a CUDA program that uses thrust in some places but also normal kernels.
The problem is: When I run the program standalone, everything works fine. When I run it in the profiler (Visual profiler or nvprof cmd line) the program crashes in a thrust::inclusive_scan operation with a cudaErrorIllegalAdress error. The crash happens always in the profiler and always at the same position. Furthermore, I have multiple Iterations like:
void foo(){ cudaProfilerStart();
for(...){//...
thrust::inclusive_scan(...);//...
}
cudaProfilerStop();
}
for(...) foo();
The crash always happens on the first call to inclusive_scan in the 2nd iteration.
I'm cusing CUDA 6.5 on Win7 with a Quadro K5000.
Any ideas what can cause this or how to narrow it down? Maybe a way to get the adress of the failed access? cuda-memcheck cannot be used with nvprof AFAIK(?)
If I remove the calls to cudaProfilerStart/Stop it seems to work ok. Strange enough, it DID work today morning with them although I did not introduce any changes (did some code editing but reverted everything via git) Also the behaviour does not change if I disable/enable profile-from-start (with cudaProfilerStart/Stop in place)
Minimal working example:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <thrust/device_vector.h>
#include <cuda_profiler_api.h>
void foo(){
thrust::device_vector<int> d_in(100), d_out(100);
thrust::inclusive_scan(d_in.begin(), d_in.end(), d_out.begin());
cudaError_t res = cudaDeviceSynchronize();
std::cout << cudaGetErrorString(res) << std::endl;
}
int main(){
cudaProfilerStart();
foo();
cudaProfilerStop();
foo(); // Crash here
cudaDeviceReset();
return 0;
}
Some more scenarios:
Start(); foo(); Stop(); foo() crash
Start(); foo(); Stop(); Start(); foo() OK
Start(); foo(); Stop(); any_other_kernel(); Start(); foo() crash

This behaviour appears to be due to a limitation in the CUDA 7.0 and earlier profiler system. A fix will be available in the CUDA 7.5 release toolkit.
[This answer has been assembled from comments and added as a community wiki entry to get the question off the unanswered queue]

Related

BUS error during thread invocation

The following code causes bus error in Rasbian distro on a rasberry pi mod2 system.
#include <thread>
#include <iostream>
class bar {
public:
void startFoo() {
std::thread t(&bar::foo, this); //bus error because of this line
t.join();
}
void foo() {
std::cout << "hello from member function" << std::endl;
}
};
int main()
{
bar b;
b.startFoo();
return 0;
}
This link states that bus error occurs when your processor cannot even attempt the memory access requested. But in my code i am accessing the class own member function in thread. I cannot relate how it is causing bus error. Could someone clarify me? P.S. The source code was cross compiled on a Ubuntu OS running on a x86 PC and the binary was tested on Rasberry pi (ARM).

pure virtual method called without active exception - run-time err0r

this is a very basic code, after running it, i have this run-time error.
class A{
A(){...
}
~A(){...
t.detach();
}
start_tread(){
t=std::thread(&A::back_groud_job, this);
}
void back_groud_job(){...}
}
main///
A a =new A();
a.start_thread()'
////just a skileton
this code runs fine under windows vs and ,mingw.
on linux g++ i am having this run-time error, i read something about a bug, but it was g++4.6, i am using g++4.9...
what do i miss, and how do i fix this?

Catching exception from a specialized method

Let's consider the following three files.
tclass.h:
#include <iostream>
#include <vector>
template<typename rt>
class tclass
{
public:
void wrapper()
{
//Storage is empty
for(auto it:storage)
{
}
try
{
thrower();
}
catch(...)
{
std::cout << "Catch in wrapper\n";
}
}
private:
void thrower(){}
std::vector<int> storage;
};
spec.cpp:
#include "tclass.h"
//The exact type does not matter here, we just need to call the specialized method.
template<>
void tclass<long double>::thrower()
{
//Again, the exception may have any type.
throw (double)2;
}
main.cpp:
#include "tclass.h"
#include <iostream>
int main()
{
tclass<long double> foo;
try
{
foo.wrapper();
}
catch(...)
{
std::cerr << "Catch in main\n";
return 4;
}
return 0;
}
I use Linux x64, gcc 4.7.2, the files are compiled with this command:
g++ --std=c++11 *.cpp
First test: if we run the program above, it says:
terminate called after throwing an instance of 'double'
Aborted
Second test: if we comment for(auto it:storage) in the tclass.h file, the program will catch the exception in main function. WATWhy? Is it a stack corruption caused by an attempt to iterate over the empty vector?
Third test: lets uncomment back the for(auto it:storage) line and move the method specialization from spec.cpp to main.cpp. Then the exception is caught in wrapper. How is it possible and why does possible memory corruption not affect this case?
I also tried to compile it with different optimization levels and with -g, but results were the same.
Then I tried it on Windows 7 x64, VS2012 express, compiling with x64 version of cl.exe with no extra command line arguments. At the first test this program produced no output, so I think it just crashed silently, so the result is similar with Linux version. For the second test it produced no output again, so result is different from Linux. For the third test the result was similar with Linux result.
Are there any errors in this code so they can lead to such behavior? May the results of the first test be caused by possible bug in compilers?
With your code, I have with gcc 4.7.1:
spec.cpp:6: multiple definition of 'tclass<long double>::thrower()'
You may correct your code by declaring the specialization in your .h as:
template<> void tclass<long double>::thrower();

GDB does not break on some lines of code when using multiple source files

I have a program that I'd like to debug by setting a breakpoint in a non-default constructor, but the breakpoint I set is never hit. Below is an example program where this problem comes up. There is no problem hitting breakpoints set in the main function, but any breakpoints set in the Domain.cpp file are ignored:
Main.cpp:
#include <iostream>
#include "Domain.h"
int main()
{
Domain y;
std::cout << y.x << std::endl; // <- No problem setting breakpoint here
return 0;
}
Domain.cpp:
#include "Domain.h"
Domain::Domain()
{
x = 4; // <- A breakpoint here is skipped
}
Domain.h:
#ifndef DOMAIN_H_
#define DOMAIN_H_
class Domain
{
public:
int x;
public:
Domain();
};
#endif /* DOMAIN_H_ */
However, the problem does not exist if I put everything into a single file:
Main2.cpp:
#include <iostream>
int main()
{
class Domain
{
public:
int x;
Domain()
{
x = 4; // <- No problem setting breakpoint here now!
};
};
Domain y;
std::cout << y.x << std::endl;
return 0;
}
Why is this the case? How can I change this so that I'm able to set breakpoints when I use multiple files?
I can confirm that the breakpoints aren't working both when I run the debugger manually in a terminal and when I run it through Eclipse CDT, where I get the same error discussed in this question which was apparently never answered:
Why does Eclipse CDT ignore breakpoints?
I am using:
Eclipse Kepler
Mac OSX 10.8.4
gdb 6.3.5 (Apple version)
gcc 4.2.1 with -O0 and -g3 flags
Please be patient with me. I'm still learning the ropes.
You are likely hitting this GDB bug.
This bug has long been fixed, but your version of GDB is very old (and Apple is unlikely to update it).
This is a very interesting anomaly that I would like to explore further, but I suspect it's related to Eclipse's default GCC settings. Many super basic functions like these get optimized out when they hit the compiler. (one time I tried to track a simple for loop, but the viable was removed entirely on GCC's highest optimization settings)

QueueUserAPC - throwing exception crashes, possible mingw bug

Solved: I upgraded from mingw 4.6.2 to 4.7.0 and it works perfectly, guess it was just a bug
I started to do some research on how terminate a multithreaded application properly and I found those 2 post(first, second) about how to use QueueUserAPC to signal other threads to terminate.
I thought I should give it a try, and the application keeps crashing when I throw the exception from the APCProc.
Code:
#include <stdio.h>
#include <windows.h>
class ExitException
{
public:
char *desc;
DWORD exit_code;
ExitException(char *desc,int exit_code): desc(desc), exit_code(exit_code)
{}
};
//I use this class to check if objects are deconstructed upon termination
class Test
{
public:
char *s;
Test(char *s): s(s)
{
printf("%s ctor\n",s);
}
~Test()
{
printf("%s dctor\n",s);
}
};
DWORD CALLBACK ThreadProc(void *useless)
{
try
{
Test t("thread_test");
SleepEx(INFINITE,true);
return 0;
}
catch (ExitException &e)
{
printf("Thread exits\n%s %lu",e.desc,e.exit_code);
return e.exit_code;
}
}
void CALLBACK exit_apc_proc(ULONG_PTR param)
{
puts("In APCProc");
ExitException e("Application exit signal!",1);
throw e;
return;
}
int main()
{
HANDLE thread=CreateThread(NULL,0,ThreadProc,NULL,0,NULL);
Sleep(1000);
QueueUserAPC(exit_apc_proc,thread,0);
WaitForSingleObject(thread,INFINITE);
puts("main: bye");
return 0;
}
My question is why does this happen?
I use mingw for compilation and my OS is 64bit.
Can this be the reason?I read that you shouldn't call QueueApcProc from a 32bit app for a thread which runs in a 64bit process or vice versa, but this shouldn't be the case.
EDIT: I compiled this with visual studio's c++ compiler 2010 and it worked flawlessly, it is possible that this is a bug in gcc/mingw?
I can reproduce the same thing with VS2005. The problem is that the compiler optimizes the catch away. Why? Because according to the C++ standard it's undefined what happens if an extern "C" function exits with an exception. So the compiler assumes that SleepEx (which is extern "C") does not ever throw. After inlining of Test::Test and Test::~Test it sees that the printf doesn't throw either, and consequently if something in this block exits via an exception
Test t("thread_test");
SleepEx(INFINITE,true);
return 0;
the behavior is undefined!
In MSVC the code doesn't work with the /EHsc switch in Release build, but works with /EHa or /EHs, which tell it to assume that C function may throw. Perhaps GCC has a similar flag.