OpenCL crashes on call to clGetPlatformIDs - c++

I am new to OpenCL. I am working on a Core i5 machine with Intel(R) HD Graphics 4000, running Windows 7. I installed the newest Intel driver with OpenCL support, and GpuCapsViewer confirms that OpenCL support is set up. I developed a simple HelloWorld program using the Intel OpenCL SDK. The program compiles successfully, but when run it crashes on the call to clGetPlatformIDs() with a segmentation fault. This is my code:
#include <iostream>
#include <CL/opencl.h>

int main() {
    std::cout << "Test OCL without driver" << std::endl;
    cl_int err;
    // Initialized so that the error branch below prints a defined value.
    cl_uint num_platforms = 0;
    // Passing 0 and NULL asks only for the number of available platforms.
    err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err == CL_SUCCESS) {
        std::cout << "Success. Platforms available: " << num_platforms
                  << std::endl;
    } else {
        std::cout << "Error. Platforms available: " << num_platforms
                  << std::endl;
    }
    std::cout << "Test OCL without driver" << std::endl;
    std::cout << "Press button to exit." << std::endl;
    std::cin.get();
    return 0;
}
How can it be that GpuCapsViewer successfully confirms OpenCL support and can use it to run its demos, but I can't run my code? Both must be using the same functions, right?
I have been working on this for days and have even tried reinstalling the drivers. Any ideas?
GpuCapsViewer says:
DRIVER: R295.93 (r295_00-233) / 10.18.10.3496 (3-11-2014)
OPENGL: OpenGL 4.2 (GeForce GT 630M/PCIe/SSE2 with 290 ext.)
OPENCL: OpenCL 1.1, GeForce GT 630M compute units:2@950MHz
CUDA: GeForce GT 630M CC:2.1, multiprocessors:2@950MHz
PHYSX: GPU PhysX (NVIDIA GeForce GT 630M)
MULTI-GPU: no multi-GPU support (2 physical GPUs)
UPDATE:
Compilation line:
g++ -I"C:\Program Files (x86)\Intel\OpenCL SDK\4.4\include" -O0 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"Test3.d" -MT"Test3.d" -o "Test3.o" "../Test3.cpp"
Finished building: ../Test3.cpp
Linker line:
g++ -L"C:\Program Files (x86)\Intel\OpenCL SDK\4.4\lib\x64" -o "TestOpenCL" ./HelloWorld.o ./HelloWorld2.o ./Test3.o -lOpenCL
Finished building target: TestOpenCL
OS: Windows 7 Ultimate Version 6.1 (Build 7601: Service Pack 1)
UPDATE 2, Crash Information:
Problem Event Name: APPCRASH
Application Name: TestOpenCL.exe
Application Version: 0.0.0.0
Application Timestamp: 53bc6ac5
Fault Module Name: TestOpenCL.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 53bc6ac5
Exception Code: c0000005
Exception Offset: 0000000000002cc0
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 1033
Additional Information 1: 56e3
Additional Information 2: 56e3743a8a234df3bdeba0b507471c44
Additional Information 3: 8fe0
Additional Information 4: 8fe0ef5706153941955de850e5612393
UPDATE 3:
I used Dependency Walker (http://dependencywalker.com/) as a substitute for dumpbin. It generates the following warnings:
Warning: At least one delay-load dependency module was not found.
Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module.
The warnings seem to refer to the following DLLs, which are all marked with an "Error opening file. The system cannot find the file specified (2)" error message:
API-MS-WIN-CORE-COM-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-ERROR-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-ROBUFFER-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-STRING-L1-1-0.DLL
API-MS-WIN-SHCORE-SCALING-L1-1-0.DLL
DCOMP.DLL
IESHIMS.DLL
UPDATE 4, GDB BACKTRACE:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000402cc0 in clGetPlatformIDs ()
(gdb) backtrace full
#0 0x0000000000402cc0 in clGetPlatformIDs ()
No symbol table info available.
#1 0x0000000000402af3 in main () at ../Test3.cpp:11
err = 0
num_platforms = 0
platform = 0x0
(gdb) backtrace
#0 0x0000000000402cc0 in clGetPlatformIDs ()
#1 0x0000000000402af3 in main () at ../Test3.cpp:11
UPDATE 5, GDB DISASS:
(gdb) disass
Dump of assembler code for function clGetPlatformIDs:
=> 0x0000000000402cc0 <+0>: jmpq *0x4b74e8(%rip) # 0x8ba1ae
0x0000000000402cc6 <+6>: nop
0x0000000000402cc7 <+7>: nop
End of assembler dump.
UPDATE 6, GDB INFO SHARED:
(gdb) INFO SHARED
From To Syms Read Shared Object Library
0x0000000077191000 0x00000000773384e0 Yes (*) C:\Windows\system32\ntdll.dll
0x0000000077071000 0x000000007718eab4 Yes (*) C:\Windows\system32\kernel32.dll
0x000007fefc081000 0x000007fefc0eb13c Yes (*) C:\Windows\system32\KernelBase.dll
0x000007fedf8d1000 0x000007fedf8e96aa Yes (*) C:\Windows\system32\OpenCL.dll
0x000007fefe101000 0x000007fefe1da628 Yes (*) C:\Windows\system32\advapi32.dll
0x000007fefe061000 0x000007fefe0fe4bc Yes (*) C:\Windows\system32\msvcrt.dll
0x000007fefdcc1000 0x000007fefdcde39a Yes (*) C:\Windows\SYSTEM32\sechost.dll
0x000007fefc6a1000 0x000007fefc7cc914 Yes (*) C:\Windows\system32\rpcrt4.dll
(*): Shared library is missing debugging information.
Binary file, x64 and include folders:
https://drive.google.com/file/d/0BxKA63T2GnKMRW02QWZnam5lSGM/edit?usp=sharing
UPDATE 7, GPUcaps situation:
GPUcaps detects 2 GPUs:
GPU 1: Intel(R) HD Graphics 4000
GPU 2: NVIDIA GeForce GT 630M
You can see the screenshot here:
https://drive.google.com/file/d/0BxKA63T2GnKMa00tU1gydGNJeXc/edit?usp=sharing
UPDATE 8:
Per @antiduh's answer, I have been trying to link directly against the OpenCL.dll present in the Windows\System32 folder. I am using mingw64. I get this:
Invoking: Cross G++ Linker
g++ -L"C:\Windows\System32" -o "TestOpenCL" ./HelloWorld.o ./HelloWorld2.o ./Test3.o -lOpenCL
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/OpenCL.dll when searching for -lOpenCL
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/OpenCL.dll when searching for -lOpenCL
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lOpenCL
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/msvcrt.dll when searching for -lmsvcrt
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/advapi32.dll when searching for -ladvapi32
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/shell32.dll when searching for -lshell32
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/user32.dll when searching for -luser32
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/kernel32.dll when searching for -lkernel32
d:/ws/apps_inst/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.7.1/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:\Windows\System32/msvcrt.dll when searching for -lmsvcrt
UPDATE 9:
I can now compile, link and run the sample code manually with the following line.
g++ -I. s.cpp -L. -lOpenCL
I simplified everything and it just worked. This is obviously very different from the compile and link commands used by Eclipse. Any idea which of the parameters used by Eclipse cause the problem? And also, why does Eclipse first compile to object files and then link them, in two separate steps?

There are three ways for a program to use an external library:
Static linkage: Directly insert the library into your executable. The external library, presented as a .lib file, contains nothing but packaged .obj files. Your program invokes functions from the library as normal. The linker extracts the executable code from the .lib, inserts it into your executable, and performs full, complete linkage against it. It is as if the imported functions had been compiled from your own source code.
Load-time dynamic linkage, aka 'implicit linking': Load the library when you launch the program. The external library is presented as a .dll containing executable code, plus a .lib file describing the exports of the .dll. The linker uses the .lib to understand how to call the .dll at run time, and puts deferred bindings into your program. When the OS launches your program, it performs 'load-time' linking: it looks up all of the deferred bindings, attempts to find the .dll file, finishes the linkage of the deferred bindings in your program, and then lets your program run.
"Pure" run-time dynamic linkage, aka 'explicit linking': Directly calling LoadLibrary. Your program has no specific references to any .lib or .dll. Your program starts running and itself calls LoadLibrary with a string path to a .dll. LoadLibrary maps the .dll into your virtual address space, and then your program calls GetProcAddress to get a function pointer to the function you want to call. You then use that function pointer to make calls.
You can't normally link against a dll without the .lib. The compiler wants to resolve those function call references to real addresses, but we don't want to put in real addresses since we want DLLs to be loaded into any arbitrary memory address (DLLs are 'relocatable').
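By contrast, explicit linking needs no .lib at all. The following is only a minimal sketch of that approach: load_symbol is a hypothetical helper name, not part of any SDK, and the non-Windows branch (dlopen/dlsym) is there just so the sketch also builds outside Windows. On older glibc versions you may need to add -ldl when linking.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical helper: load `library` at run time and look up `symbol` in it.
// Returns nullptr if the library cannot be loaded or the symbol is missing.
// The library handle is deliberately never released, so the returned address
// stays valid for the lifetime of the process.
void* load_symbol(const char* library, const char* symbol) {
    assert(library != nullptr && symbol != nullptr);
#ifdef _WIN32
    HMODULE module = LoadLibraryA(library);
    if (module == nullptr) {
        return nullptr;
    }
    return reinterpret_cast<void*>(GetProcAddress(module, symbol));
#else
    void* module = dlopen(library, RTLD_NOW);
    if (module == nullptr) {
        return nullptr;
    }
    return dlsym(module, symbol);
#endif
}
```

The caller would cast the result to the right function pointer type before calling it, and must check for nullptr first. Nothing here goes through an import library, so a mismatch between a .lib and a .dll cannot occur; the price is that every imported function has to be looked up by hand.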
From my understanding, a .lib used as an import library contains stubs that the main program links directly against, so all calls in the program go through the stubs. The stubs in turn reference an 'Import Address Table' (IAT). When the OS loads a DLL into memory for a process, it fills out the IAT with the real addresses. The stub then just calls the DLL by making an indirect jump through the right slot in the IAT.
So if a DLL MathLib has an exported function Factorial that my exe is importing, then the import .lib file has an actual function Factorial that my exe statically links against. That Factorial in the .lib looks like the following pseudocode:
int Factorial( int value ) {
// Read MathLib's IAT which should always be at address 0x8ba100.
// Factorial's real address gets stored in slot 2, so add 8 to the address
// to read from.
__asm jmp *0x8ba108; // nb this is an indirect jump.
}
And then we hope that when the OS loads that DLL, that IAT is filled out correctly, else we jump into nothingness.
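To make the stub-and-table idea concrete, here is a toy model in plain C++. It is only an illustration under my own assumptions: g_iat, BindImports and CallFactorialThroughStub are made-up names, and a real IAT is laid out by the linker and filled in by the OS loader, not by program code.

```cpp
#include <cassert>

// Toy import address table: one slot per imported function.
using FactorialFn = int (*)(int);
static FactorialFn g_iat[1] = { nullptr };  // slot 0: Factorial, not bound yet

// Stands in for the real code that lives inside the DLL.
static int RealFactorial(int value) {
    int result = 1;
    for (int i = 2; i <= value; ++i) {
        result *= i;
    }
    return result;
}

// Stands in for the OS loader: binding means writing addresses into slots.
void BindImports() {
    g_iat[0] = &RealFactorial;
}

// Stands in for the stub from the import library. A real stub is a bare
// indirect jump with no check, so an unfilled slot means a crash. This
// sketch reports the empty slot instead of jumping through it.
bool CallFactorialThroughStub(int value, int* out) {
    FactorialFn target = g_iat[0];
    if (target == nullptr) {
        return false;  // slot never filled: the real stub would fault here
    }
    *out = target(value);
    return true;
}

// Small usage example: the call only works once the table has been bound.
void DemoStubUsage() {
    int result = 0;
    BindImports();
    bool ok = CallFactorialThroughStub(5, &result);
    assert(ok && result == 120);
}
```

The point of the model is the failure mode: if nothing ever writes the slot, or writes it somewhere the stub does not look, the jump lands on garbage.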
So I think what happened is that you were compiling against one .lib, but 'load-time' linking against the wrong opencl.dll. The IAT was never created, or was created in the wrong place, and so you jumped into nothingness; that's why this line created a segfault:
0x0000000000402cc0 <+0>: jmpq *0x4b74e8(%rip) # 0x8ba1ae
So let's figure out why we linked wrong. There could be three sets of opencl.dll/opencl.lib files on your computer:
The opencl.lib/dll that comes from Khronos, which is actually just a stub/loader library that figures out what real providers are on your computer and dispatches function calls to the right one.
The opencl.lib/dll that comes from Intel with their SDK and drivers.
The opencl.lib/dll that comes from Nvidia with their drivers.
Which of these files did you actually have? My estimate is this:
The opencl.dll that came from Khronos got installed into c:\windows\system32.
There is no opencl.lib from Khronos.
There was probably no opencl.lib from Nvidia, since you didn't have their SDK installed.
You probably had an opencl.lib and opencl.dll from Intel, since you did have their SDK installed.
You were definitely linking against the Intel opencl.lib, but appeared to be loading the Khronos opencl.dll in c:\windows\system32. One solution would be to make the program load the Intel opencl.dll at run time by putting that dll in your program's directory.
However, you state that you were able to make things work using this compilation line:
g++ -I. s.cpp -L. -lOpenCL
There's something neat about gcc on Windows: in order to link against a library, you don't need to have the .lib. Gcc figures it out for you by inspecting the dll; other people have figured out how to do the same when someone gives them a dll but no lib. With most other compilers, especially Visual Studio, you need both a .lib and a .dll to link against something. That's why the Windows SDK installs hundreds of .lib files (kernel32.lib, for example). It turns out that the linker could infer this information if it wanted to, but .lib files persist as an archaic mechanism.
Anyway, you ran that above gcc link line, it found a suitable opencl.dll using the search path, invented its own .lib for it, and compiled against it; you launched your program, it used that same search path to get an opencl.dll, it was the same one you compiled against, so your program runs. Whew.
I still have some suggestions:
Find an opencl.lib and opencl.dll pair that come from Khronos's "Installable Client Driver" (ICD) Loader. That loader will then figure out how to bind to a particular provider (Nvidia, Intel, etc.) at run time.
Distribute the Khronos opencl.dll with your application so that you will never accidentally load-time link against the wrong file.
Uninstall the Intel SDK, assuming it's providing opencl.lib/opencl.dll files that are specific to Intel.
Some more relevant questions on libs and dlls:
When building a DLL file, does the generated LIB file contain the DLL name?
Why are LIB files beasts of such a duplicitous nature?

Related

Can't statically link with libnvcuvid.so

I am trying to use the latest NVIDIA Video SDK, specifically - its NVDEC (hardware video decoder lib). I had been using the previous version for a while and it was loading function pointers in runtime from libnvcuvid.so, which on my machine is located in:
/usr/lib/nvidia-396/
It contains the following related items:
/usr/lib/nvidia-396/libnvcuvid.so
/usr/lib/nvidia-396/libnvcuvid.so.1
/usr/lib/nvidia-396/libnvcuvid.so.396.18
Now, in the latest SDK 8.1, function pointers are no longer loaded at run time; instead, all the API methods are marked extern and static linking is used. On Windows they provide nvcuvid.lib, but on Linux there are only the above-mentioned SOs. My IDE targets that directory correctly (triple checked; if I remove the path, the linker complains that it can't find the lib). I also pass libnvcuvid.so to the linker in exactly the same way as I pass cuda.so and cudart.so for linking against the CUDA API. But I am still getting
"undefined reference"
for all cuvid functions declared in the latest header. As you can see, my driver version is also up to date (8.1 requires at least 390).
Why doesn't it link?
UPDATE (linker):
/usr/bin/g++ -o bin/xxxxx_xxx_d @"xxxxx_xxx.txt" -L. -LDebug
-L/usr/lib/nvidia-396 -L/usr/local/cuda-9.1/lib64 -lcuda -lcudart -lnvcuvid .....

explicitly link intel icpc openmp

I have the Intel compiler installed at $HOME/tpl/intel. I have a simple hello_omp.cpp with OpenMP enabled:
#include <omp.h>
#include <iostream>
int main ()
{
#pragma omp parallel
{
std::cout << "Hello World" << std::endl;
}
return 0;
}
I compile with ~/tpl/intel/bin/icpc -O3 -qopenmp hello_omp.cpp but when I run I get the following error:
./a.out: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory.
I would like to explicitly link against the Intel compiler's runtime library during the make process, without using LD_LIBRARY_PATH. How can I do that?
You have 2 simple solutions for your problem:
Linking statically with the Intel run time libraries:
~/tpl/intel/bin/icpc -O3 -qopenmp -static-intel hello_omp.cpp
Pros: you don't have to care where the Intel run time environment is installed on the machine where you run the binary, or even whether it is installed at all;
Cons: your binary becomes bigger, and you cannot select a different (ideally more recent) run time environment even when one is available.
Adding the search path for dynamic library into the binary using the linker option -rpath:
~/tpl/intel/bin/icpc -O3 -qopenmp -Wl,-rpath=$HOME/tpl/intel/lib/intel64 hello_omp.cpp
Notice the use of -Wl, to transmit the option to the linker.
I guess that is closer to what you were after than the first solution I proposed, so I'll let you work out what the pros and cons are for you in comparison.
The Intel compiler ships a compilervars.sh script in its bin directory. When sourced, it sets the appropriate environment variables, such as LD_LIBRARY_PATH, LIBRARY_PATH and PATH, to the directories that host the OpenMP runtime library and other compiler-specific libraries like libsvml (short vector math library) or libimf (a more optimized version of libm).

MFXInit() in libmfx.a segfaults when called from shared object

(While Intel's forum is a more natural place to ask this question I'm posting it here hoping for more activity than Intel's total lack thereof -- so far)
I'm unable to create a dynamic link library that uses the Intel Media SDK (Linux server) to manipulate h264 video, and I noticed a problem in the design of the MFX library. The way I understand it, programs are supposed to link to the static library, like:
$ g++ .... -L/opt/intel/mediasdk/lib/lin_x64 -lmfx
However, this libmfx.a library appears to delegate all calls to a dlopened dynamic library /opt/intel/mediasdk/lib64/libmfxhw64.so. It is worth noting that function names (and signatures) exposed by static and dynamic libraries are identical, which is kind of confusing and dangerous.
While I don't understand the rationale behind this design, it would not be a problem by itself were it not that some static/global initialization within the library apparently wreaks havoc when the (static) libmfx.a is included in a shared object, i.e.:
+------+ +-----------+
| main | <-- | mylib.so |
+------+ | | +---------------+
| libmfx.a | (dlopen) | libmfxhw64.so |
| <------------- |
|+---------+| |+-------------+|
||MFXInit()|| || MFXInit() ||
||... || || ... ||
|| || || ||
+===========+ +===============+
The above library could be assembled like this:
$ g++ -shared -o mylib.so my1.o my2.o -lmfx
And then (dynamically) linked to main.o like so:
$ g++ -o main main.o mylib.so -ldl
(Note that the additional libdl is necessary to allow libmfx.a to dlopen() libmfxhw64.so.)
Unfortunately, upon the first MFXInit() call, the program causes a segmentation fault (accessing address 0x0000400). GDB backtrace:
#0 0x0000000000000400 in ?? ()
#1 0x00007ffff61fb4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x00007ffff7bd3a1f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) () from ./lib-a.so
#3 0x00007ffff7bd12b1 in MFXInit () from ./lib-a.so
#4 0x00007ffff7bd09c8 in test_mfx () at lib.c:12
#5 0x0000000000400744 in main (argc=1, argv=0x7fffffffe0d8) at main.c:8
(Observe that MFXInit() at stackframe #3 is the one in libmfx.a whereas the one at #1 is in libmfxhw64.so.)
Note that there is no crash when mylib is created as a static library. Using breakpoints and disassembler, I managed to make following backtrace snapshot where in both cases #1 is at MFXInit+424, but they appear to hit different versions of MFXQueryVersion (absolute addresses are meaningless due to relocation):
#0 0x00007ffff6411980 in MFXQueryVersion () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#1 0x00007ffff640c4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x000000000040484f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) ()
#3 0x00000000004020e1 in MFXInit ()
#4 0x0000000000401800 in test_mfx () at lib.c:12
#5 0x0000000000401794 in main (argc=1, argv=0x7fffffffe0e8) at main.c:8
Because both static and shared Intel libs expose the same API functions, I can link straight into libmfxhw64.so guts directly, but I suppose that bypassing the static "dispatcher" is without warranty(?)
Could someone explain Intel's idea behind said design? Spec., why provide a static library that only delegates to an .so that has identical interface?
Also, it appears that the SEGV is caused by static/global data in either libmfx.a or libmfxhw64.so. Is there a way to force a specific execution order on dynamically loaded static/global sections? What is the best approach to debug these kinds of problems?
Tested with Intel Media SDK R2 (Ubuntu 12) and Intel Media SDK 2015R3-R5 (CentOS 7, 1.13/1.15) on Intel Haswell i7-4790 @ 3.6 GHz
If you have a working Intel MSDK setup, please compile my example code to confirm the issue.
At the very end of the file "readme-dispatcher-linux.pdf" in recent releases of the dispatcher source code, there is this:
There is slight difference between using Dispatcher library from
executable module or from shared object. To mitigate symbol conflict
between itself and SDK shared object on Linux*, application should:
1) link against libdispatch_shared.a instead of libmfx.a
2) define MFX_DISPATCHER_EXPOSED_PREFIX before any SDK includes
I have used this, and it works to address the symbol conflict issue you describe.
You can find this file, if you install "Intel Media Server Studio Professional 2016". There is a free community edition. The source files and the PDF will be found at /opt/intel/mediasdk/opensource/
(OK, since no one seems eager, I'll do the inelegant thing and post an answer to my own question).
After considerable research trying to break the unintentional circular linking, I discovered that the ld option --exclude-libs provides solace. Essentially, I was looking for a way to force removal of any libmfx.a symbols after using them to resolve dependencies in lib.o while creating the DLL. This could be accomplished by creating the so like this:
g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx -Wl,--exclude-libs=libmfx
Once the library is created like this, Bob's your uncle:
g++ -o main-so-a main.o lib-a.so -ldl
(Note that libdl is still needed because Intel's MFX (now inside lib-a.so) still uses dlopen to discover libmfxhw64.so)
From the ld man page:
--exclude-libs lib,lib,...
Specifies a list of archive libraries from which symbols should not be
automatically exported. The library names may be delimited by commas or
colons. Specifying "--exclude-libs ALL" excludes symbols in all archive
libraries from automatic export. This option is available only for the
i386 PE targeted port of the linker and for ELF targeted ports. For i386
PE, symbols explicitly listed in a .def file are still exported,
regardless of this option. For ELF targeted ports, symbols affected
by this option will be treated as hidden.
So, essentially the trick is to make sure that the relevant ELF symbols are marked hidden. Normally this would be handled through #pragmas by the library developers (i.e. Intel), but due to their negligence this needs to be retrofitted in this case.
I suppose the same could have been accomplished with a --version-script map file, but that might have turned out to be more fragile since we want to fully encapsulate libmfx.a anyway.
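For reference, this is roughly what marking symbols hidden at the source level looks like with GCC or Clang. It is only a sketch: the demo_ names and the DEMO_API macro are made up for illustration and have nothing to do with Intel's actual sources.

```cpp
#include <cassert>

// Everything between push(hidden) and pop gets hidden ELF visibility: it can
// be called from anywhere inside the same shared object, but it is not
// exported, so it cannot clash with a same-named symbol in another library.
#if defined(__GNUC__) && !defined(_WIN32)
#pragma GCC visibility push(hidden)
#endif
int demo_internal_init() { return 1; }
#if defined(__GNUC__) && !defined(_WIN32)
#pragma GCC visibility pop
#endif

// Public entry points are explicitly marked with default visibility.
#if defined(__GNUC__) && !defined(_WIN32)
#define DEMO_API __attribute__((visibility("default")))
#else
#define DEMO_API
#endif

DEMO_API int demo_public_init() { return demo_internal_init() + 1; }

// Usage: callers only ever go through the public entry point.
void DemoVisibilityUsage() {
    assert(demo_public_init() == 2);
}
```

When a library is built this way, linking its archive into a shared object does not re-export the internal symbols, which is the effect --exclude-libs achieves after the fact.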

Cannot Debug Shared Library - Symbols Not Loading Properly

I am currently writing a small library, and I want to check it for leaks (among other things); however, for some reason, gdb is not loading the library symbols. I have read many other posts on here (and various other places on the internet) about this, however, I cannot seem to find a solution. Here is what is going on:
I am compiling the shared library with the following flags (these are included in both the final shared library as well as all object files):
CFLAGS=-Wall -O0 -g -fPIC
Likewise, I am compiling the binary memtest (the client application for the library) to check for memory leaks and such with these flags
CFLAGS=-Wall -O0 -g
Now, I inserted a NULL pointer into the library to test if I could trace through it and "debug" the pointer (i.e. it's making it crash). So I try to run it through gdb, but it's a no go. The output of info sharedlibrary is the same for both the executable and the core:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
... Some libraries I am not worried about debugging...
0x00d37340 0x00d423a4 Yes (*) /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0 <--- My lib
.... and some more....
(*): Shared library is missing debugging information.
As you can see, it's not loading the debug information. I am uncertain as to why this is. I have built and linked everything with the -g flag, and I even tried -ggdb and -g3, but nothing seems to work properly. When I load a core dump, this is what I see:
...some libs...
Reading symbols from /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0...done.
Loaded symbols for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
...some more libs...
Notice how my library does not give a (no debugging symbols found) error - anyone have any ideas why? As I said before, I am unable to debug this through running the program gdb ./memtest or through debugging the core file.
Thanks for your help.
EDIT It may also be important to note, that (if you didn't realize by path) this library is a local shared library (i.e. I'm using -Wl,-rpath to link/load it)
EDIT2 It seems my version of GDB was out-of-date. Now, I have updated to the latest version from the CVS server (I have also tried latest release version 7.2) and it can "load" symbols. My info sharedlibrary now reads this:
0x00e418b0 0x00e4be74 Yes /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
However, I am still unable to step through any functions (in the shared library) - anyone have any ideas?
EDIT3 I have also tried to step through linking against a static library (libMyLIB.a) but it still isn't working. My OS is CentOS 5.6; does anyone know of any issues with this system? Also, just another confirmation that my symbols are being loaded (it just can't step through any shared lib function for some reason)
(gdb) sharedlibrary MyLIB
Symbols already loaded for /home/raged/MyLIB/memtest/../lib/libMyLIB.so.0
I found the reason this wasn't working: I was calling an old function call to initialize a pointer in my test executable. Since the object was never being created, I could never step into the library. Once I updated the function call, all worked well.
That said, if anyone experiences similar issues while all symbols appear to be loaded, be sure to check that all pointers are initialized properly even if they have the correct type.

GCC debugger stack trace displays wrong file name and line number

I am trying to port a fairly large C++ project to using g++ 4.0 on Mac OS X. My project compiles without errors, but I can't get GDB to work properly. When I look at the stack by typing "bt" on the GDB command line, all file names and line numbers displayed are wrong.
For example, according to the GDB stack trace, my main() function is supposed to be in stdexcept from the Mac OS X SDK, which does not make any sense.
What could cause GDB to malfunction so badly? I've already checked for #line and #file statements in my code and made sure that the code only has unix line endings. I've also cleaned and rebuilt the project. I've also tried debugging a Hello World project and that one did not have the same problem.
Could the problem have to do with one of the third party libraries I am linking and the way those are compiled? Or is it something completely different?
Here are two exemplary calls to gcc and ld as executed by Xcode. AFAIK all cpp-files in my project are compiled and linked with the same parameters.
/Developer/usr/bin/gcc-4.0 -x c++ -arch i386 -fmessage-length=0 -pipe -Wno-trigraphs -fpascal-strings -fasm-blocks -O0 -fpermissive -Wreturn-type -Wunused-variable -DNO_BASS_SOUND -D_DEBUG -DXCODE -D__WXMAC__ -isysroot /Developer/SDKs/MacOSX10.5.sdk -mfix-and-continue -fvisibility-inlines-hidden -mmacosx-version-min=10.4 -gdwarf-2 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -D__WXDEBUG__ -D__WXMAC__ -c "/Users/adriangrigore/Documents/Gemsweeper Mac/TSDLGameBase.cpp" -o "/Users/adriangrigore/Documents/Gemsweeper Mac/build/Gemsweeper Mac.build/Debug/Gemsweeper Mac.build/Objects-normal/i386/TSDLGameBase.o"
/Developer/usr/bin/g++-4.0 -arch i386 -isysroot /Developer/SDKs/MacOSX10.5.sdk "-L/Users/adriangrigore/Documents/Gemsweeper Mac/build/Debug" -L/Developer/SDKs/MacOSX10.5.sdk/usr/local/lib -L/opt/local/lib "-F/Users/adriangrigore/Documents/Gemsweeper Mac/build/Debug" -F/Users/adriangrigore/Library/Frameworks -F/Developer/SDKs/MacOSX10.5.sdk/Library/Frameworks -filelist "/Users/adriangrigore/Documents/Gemsweeper Mac/build/Gemsweeper Mac.build/Debug/Gemsweeper Mac.build/Objects-normal/i386/Gemsweeper Mac.LinkFileList" -mmacosx-version-min=10.4 /opt/local/lib/libboost_program_options-mt.a /opt/local/lib/libboost_filesystem-mt.a /opt/local/lib/libboost_serialization-mt.a /opt/local/lib/libboost_system-mt.a /opt/local/lib/libboost_thread-mt.a "/Users/adriangrigore/Documents/Gemsweeper Mac/3rd party/FreeImage/Dist/libfreeimage.a" "/Users/adriangrigore/Documents/Gemsweeper Mac/3rd party/cpuinfo-1.0/libcpuinfo.a" -L/usr/local/lib -framework IOKit -framework Carbon -framework Cocoa -framework System -framework QuickTime -framework OpenGL -framework AGL -lwx_macd_richtext-2.8 -lwx_macd_aui-2.8 -lwx_macd_xrc-2.8 -lwx_macd_qa-2.8 -lwx_macd_html-2.8 -lwx_macd_adv-2.8 -lwx_macd_core-2.8 -lwx_base_carbond_xml-2.8 -lwx_base_carbond_net-2.8 -lwx_base_carbond-2.8 -framework SDL -framework Cocoa -o "/Users/adriangrigore/Documents/Gemsweeper Mac/build/Debug/Gemsweeper Mac.app/Contents/MacOS/Gemsweeper Mac"
Please note that I have already asked a similar question regarding the Xcode debugger here, but I am reposting since I just learned that this is in fact not Xcode's fault, but a problem with GCC / ld / GDB.
Edit: My project makes use of the following third-party libraries: SDL, Boost, wxWidgets. I am not sure if this matters for this problem, but I just wanted to mention it just in case it does.
I've tried compiling an Xcode SDL project template and did not experience the same problem, so it must be due to something special in my project.
Second Edit: As I just found out, I made a mistake while searching files for the string "This is an automatically generated". I just found several dozen files with the same string, all belonging to FreeImage, one of the third-party libraries I am using. So the problem seems to be related to FreeImage, but I am still not sure how to proceed.
I got those symptoms, when my gdb version didn't match my g++ version.
Try to get the newest gdb.
Your cpp files certainly have debug symbols in them (the -gdwarf-2 option).
Do you use a separate dSYM file for the debug symbols? Or are they inside the object files. I would first try to use DWARF in dSYM files and see if that helps (or vice versa)
The third-party libraries appear to be release builds though (unless you renamed them yourself, of course). For example, I know for sure that Boost uses the -d moniker in library names to denote debug libraries (e.g. libboost_filesystem-mt-d.a).
Now, this shouldn't really pose a problem, it should just mean you can't step into the calls made to third party libraries. (at least not make any sense of it when you do ;) But since you have problems, it might be worth a try to link with debug versions of those libraries...
Are you compiling with optimization on? I've found that O2 or higher messes with the symbols quite a bit, making gdb and core files pretty much useless.
Also, be sure you are compiling with the -g option.
Could it be that you are using SDL? SDL redefines main, so your main will be named SDL_main, and the SDL parts might be heavily optimized, so down there you'll have problems getting good gdb output.
...just a thought
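To illustrate the renaming mechanism: a header can rename a function with a single #define, after which the compiler, linker and debugger only ever see the new name. The sketch below uses made-up names (app_entry, renamed_app_entry) rather than main itself, so it can be compiled next to a real main.

```cpp
#include <cassert>

// Stand-in for what a library header might do. SDL uses a macro of this
// shape to turn the application's main into SDL_main.
#define app_entry renamed_app_entry

// The programmer writes app_entry, but the preprocessor rewrites the name,
// so the function that actually gets defined is renamed_app_entry.
int app_entry(int argc) { return argc + 41; }

#undef app_entry

// After the #undef, no function called app_entry exists anywhere;
// only the renamed symbol can be called.
int call_renamed_entry() {
    int result = renamed_app_entry(1);
    assert(result == 42);
    return result;
}
```

This is why a backtrace in such a project shows the library's own entry point at the bottom of the stack, with your code appearing under the renamed symbol.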
For a test, you could check if addr2line gives you expected values. If so, this would indicate that there's nothing wrong with the ELF generated by your compile/link parameters and casts all suspicion on GDB. If not, then suspicion is still on both the tools and the ELF file.
I've tried compiling an XCode SDL project template and did not experience the same problem, so it must be due to something special in my project.
Correct. Your project settings are the thing that is different.
You will need to disable the debug optimizations in the Xcode project settings for the debug build. Xcode unfortunately makes GDB jump to weird lines (out of order) when you would expect it to move sequentially.
Go to your project settings. Set the following
1) Instruction Scheduling = None
2) Optimization Level = None [-O0]
3) ZERO_LINK = None
Your problems should go away after doing this.
From your flags the debug information should be in the object files.
Do your project settings build the executable in one location and then move the final executable to another location when completed? If so, gdb may not be finding the object files and thus not correctly retrieving the debug information from them.
Just a guess.
I encountered this several years ago when transitioning from the Codewarrior compilers to Xcode. I believe the way to get around this is to put the flag "-fno-inline-functions" in Other C Flags (for Dev only).
This problem was more pronounced on the PowerPC architecture for us.
What about if you remove the "-fvisibility-inlines-hidden" and "-mfix-and-continue" flags?
I've never had the "fix and continue" feature work properly for me.
wxWidgets also defines its own main if you use its IMPLEMENT_APP() macro.
From here
As in all programs there must be a "main" function. Under wxWidgets main is implemented using this macro, which creates an application instance and starts the program.
IMPLEMENT_APP(MyApp)
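As a rough picture of how such a macro can supply the entry point, here is a hypothetical miniature. DEMO_IMPLEMENT_APP, DemoAppBase and demo_main are invented for this sketch; the real wxWidgets macro expands to considerably more and defines main itself.

```cpp
#include <cassert>

// Minimal application base class for the sketch.
struct DemoAppBase {
    virtual ~DemoAppBase() = default;
    virtual bool OnInit() = 0;
};

// The macro generates the entry function: it creates an instance of the
// given application class and starts it. In a real framework the generated
// function would be main; it is called demo_main here so the sketch can
// coexist with an existing main.
#define DEMO_IMPLEMENT_APP(AppClass)      \
    int demo_main() {                     \
        AppClass app;                     \
        return app.OnInit() ? 0 : 1;      \
    }

struct MyApp : DemoAppBase {
    bool OnInit() override { return true; }
};

// One line in user code, and the entry point exists.
DEMO_IMPLEMENT_APP(MyApp)

// Usage: the generated entry function reports success as 0.
void DemoAppUsage() {
    assert(demo_main() == 0);
}
```

With a macro like this, the source file never contains a function literally named main, so a second main coming from somewhere else can easily go unnoticed.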
See my answer here
I have now downloaded and compiled the FreeImage sources and yes, the file b44ExpLogTable.cpp is compiled into libfreeimage.a. The problem looks to be that the script gensrclist.sh just collects all .cpp files without skipping the one that has a main in it. That script generates a file named Makefile.srcs, but one is already supplied. (Running it on my Leopard failed because of some problem with sh; it worked when I changed sh to bash.)
Before you have changed anything this gives an a.out
c++ libfreeimage.a
The file Makefile.srcs has already been created so you should be able to remove the file b44ExpLogTable.cpp from it. Then do
make -f Makefile.osx clean
make -f Makefile.osx
When this is done the above c++ libfreeimage.a should give the following error
Undefined symbols:
"_main", referenced from:
start in crt1.10.5.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
I have a new thing you can try.
Just before your own main you can write
#ifdef main
# error main is defined
#endif
int main(int argc, char *argv[]) {
This should give an error if you have some header that redefines main.
If you define your own, you might get a warning pointing to where the previous definition was made:
#define main foo
int main(int argc, char *argv[]) {
You can also try to undef just before your main
#undef main
int main(int argc, char *argv[]) {