Working with VexCL "compiling binaries" - C++

I want to make a program that will be distributed to customers, so I want to protect my kernel code from hackers. Someone told me that the AMD driver somehow puts the kernel source inside the binary, so a hacker can log the kernel with an AMD device.
As I am not yet experienced with VexCL, what is the proper compile line to distribute only binaries?
For example, with CUDA I can type: nvcc -gencode arch=compute_10,code=sm_10 myfile.cu -o myexec
What is the equivalent in VexCL?
Also, does VexCL work on Mac OS? With which IDE? (This is a future task, as I have no experience with Mac OS yet.)
My previous experience with OpenCL was with the STDCL library, but it is buggy on Windows and has no Mac support.

I am the developer of VexCL, and I have also replied to your question here.
VexCL generates OpenCL/CUDA kernels for the expressions you use in your code at runtime. Moreover, it allows the user to dump the generated kernel sources to the standard output stream. For example, if you save the following to a hello.cpp file:
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env);
    vex::vector<double> x(ctx, 1024);
    vex::vector<double> y(ctx, 1024);
    y = 2 * sin(M_PI * x) + 1;
}
then compile it with
g++ -o hello hello.cpp -std=c++11 -I/path/to/vexcl -lOpenCL -lboost_system
then set VEXCL_SHOW_KERNELS=1 and run the compiled binary:
$ export VEXCL_SHOW_KERNELS=1
$ ./hello
you will see the kernel that was generated for the expression y = 2 * sin(M_PI * x) + 1:
#if defined(cl_khr_fp64)
#  pragma OPENCL EXTENSION cl_khr_fp64: enable
#elif defined(cl_amd_fp64)
#  pragma OPENCL EXTENSION cl_amd_fp64: enable
#endif
kernel void vexcl_vector_kernel
(
    ulong n,
    global double * prm_1,
    int prm_2,
    double prm_3,
    global double * prm_4,
    int prm_5
)
{
    for(size_t idx = get_global_id(0); idx < n; idx += get_global_size(0))
    {
        prm_1[idx] = ( ( prm_2 * sin( ( prm_3 * prm_4[idx] ) ) ) + prm_5 );
    }
}
VexCL can also cache the compiled binaries (in the $HOME/.vexcl folder by default), and it saves the source code along with the cache.
On the one hand, the sources that you see are automatically generated and therefore not very human-friendly. On the other hand, they are still more convenient to read than, e.g., a disassembled binary. I am afraid there is nothing you can do to keep the sources away from 'hackers', except perhaps modifying the VexCL source code to suit your needs. The MIT license allows you to do that, and if you are ready to do this, I could provide you with some guidance.
Mind you, the NVIDIA OpenCL driver does its own caching, and it also stores the kernel sources together with the cached binaries (in the $HOME/.nv/ComputeCache folder). I don't know if it is possible to alter this behavior, so 'hackers' could still get the kernel sources from there. I don't know if AMD does a similar thing, but maybe that is what your source meant by "log the kernel with AMD device".
Regarding MacOS compatibility, I don't have a MacOS machine to do my own testing, but I have had reports that VexCL does work there. I am not sure what IDE was used.

Related

Can't compile with mingw linking a library on Linux to create executable for Windows

I'm trying to compile C/C++ code from my Debian partition to generate some executable files for Windows.
Running $ uname -a on the command line gives Linux machine 5.14.0-2-amd64 #1 SMP Debian 5.14.9-2 (2021-10-03) x86_64 GNU/Linux. My processor is an Intel® Core™ i5-1035G4 CPU @ 1.10GHz × 8, with a Mesa Intel® Iris(R) Plus Graphics (ICL GT1.5) integrated GPU.
A minimal example to show my current situation includes the following code (called code.cpp):
#include <iostream>
#include <CL/opencl.hpp>
int main()
{
    std::vector <cl::Platform> all_platforms; //Get all platforms
    cl::Platform::get(&all_platforms);
    if (all_platforms.size() == 0)
    {
        std::cout << "No platforms found. Check OpenCL installation." << std::endl;
        exit(1);
    }
    int pz = all_platforms.size();
    std::cout << "Platforms size: " << pz << std::endl;
    for (int i = 0; i < pz; i++)
    {
        cl::Platform default_platform = all_platforms[i];
        std::cout << "Using platform: " << default_platform.getInfo<CL_PLATFORM_NAME>() << std::endl;
    }
    return(0);
}
which uses OpenCL to print all recognized platforms. I compile it with g++ code.cpp -o code.out -lOpenCL. The executable file code.out works fine and does what you would expect. I have another program, written in C, which uses GSL (GNU Scientific Library) and also works well when linked with -lgsl (so I don't think there is a problem with my code or the regular compilation process). Both OpenCL and GSL were installed from the official repositories (~# apt install ...) with no problem at all. When I execute code.out the output is
Platforms size: 2
Using platform: Intel(R) OpenCL HD Graphics
Using platform: Portable Computing Language
I installed mingw (via ~# apt install mingw-w64) to create executable files to be run on Windows, and for basic programs (i.e. without "external" libraries) it works well (replacing gcc by x86_64-w64-mingw32-gcc or i686-w64-mingw32-gcc). However for the code written above (and for the one using GSL) it doesn't work. Most of the error outputs are very similar for both examples, and I will show the command line outputs for the code using OpenCL.
When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -lOpenCL the output is
code.cpp:2:10: fatal error: CL/opencl.hpp: No such file or directory
2 | #include <CL/opencl.hpp>
| ^~~~~~~~~~~~~~~
compilation terminated.
I thought this meant that I needed to be more specific when linking and including, so I gave the explicit path where the headers are located (found them via dpkg -S opencl.hpp or dpkg -S gsl*.h), and the .so file for OpenCL was found via dpkg -S *OpenCL.so, while the one for GSL was found using dpkg -S *gsl.so. When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -I/usr/include/ -L/usr/lib/x86_64-linux-gnu/libOpenCL.so the output is
In file included from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/cwchar:44,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/bits/postypes.h:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iosfwd:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ios:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ostream:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iostream:39,
from code.cpp:1:
/usr/include/wchar.h:27:10: fatal error: bits/libc-header-start.h: No such file or directory
27 | #include <bits/libc-header-start.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Therefore it seems that MinGW needs additional instructions to properly find, include and/or link the libraries. I don't know how to solve this problem. Those are my attempts based on some answers I've found, and the documentation provided by MinGW says nothing about this. The exact same problem occurs no matter if I use x86_64-w64-mingw32-g++ or i686-w64-mingw32-g++, or their gcc counterparts.
When cross-compiling make sure you are only linking things targeting the same platform together. In other words, your dependencies (and their dependencies) must be for the same target platform. You can't link with those libraries for your build platform.
So if you have a Windows 64-bit application that depends on OpenCL, you will need to link it against a Windows 64-bit build of OpenCL.
The OpenCL sources can be found here:
https://github.com/KhronosGroup/OpenCL-Headers
https://github.com/KhronosGroup/OpenCL-ICD-Loader
so you would need to build those first.
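Before building the Windows versions of the dependencies, it can help to confirm which platform a given compiler actually targets. The following is a minimal sketch and not part of the original answer: the function name target_platform is made up for illustration, while _WIN64, _WIN32 and __linux__ are macros that the compiler predefines for its target.

```cpp
#include <string>

// Hypothetical helper (name chosen for illustration): reports the platform
// the compiler is generating code for, based on predefined target macros.
std::string target_platform() {
#if defined(_WIN64)
    return "windows-64";   // e.g. built with x86_64-w64-mingw32-g++
#elif defined(_WIN32)
    return "windows-32";   // e.g. built with i686-w64-mingw32-g++
#elif defined(__linux__)
    return "linux";        // e.g. built with the native g++
#else
    return "other";
#endif
}
```

Printing the result from main() in a binary built with x86_64-w64-mingw32-g++ should report a Windows target, which is a reminder that every library handed to that compiler must be a Windows build as well.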

Determine the number of cores at compile time in C/C++

Is there a way to determine how many physical cores a target machine has at compile time in C/C++ in Linux under GCC?
I am aware of other methods like std::thread::hardware_concurrency() in C++11 or sysconf(_SC_NPROCESSORS_ONLN), but I am curious to know if there is actually a way to obtain this information at compile time.
You can query the information during the build process and pass it into the program as a preprocessor definition.
Example
g++ main.cpp -D PROC_COUNT=$(grep -c ^processor /proc/cpuinfo)
where main.cpp is
#include <iostream>

int main() {
    std::cout << PROC_COUNT << std::endl;
    return 0;
}
Edit
As pointed out in the comments, if the target machine differs from the build machine, you will need to replace grep -c ^processor /proc/cpuinfo with something that queries the number of processors on the target machine. The details depend on what form of access you have to the target machine during the build.
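A slightly more defensive variant of the same idea is sketched below. It assumes PROC_COUNT is passed with -D as in the command above; the fallback value of 1 and the names kBuildTimeCores and effective_core_count are illustrative choices, not part of the original answer. Note also that the processor entries in /proc/cpuinfo count logical processors, which can be more than the number of physical cores.

```cpp
#include <thread>

// PROC_COUNT is expected from the build command (-D PROC_COUNT=...).
// The fallback of 1 is an assumption for builds that do not pass it.
#ifndef PROC_COUNT
#define PROC_COUNT 1
#endif

// A true compile-time constant, usable e.g. as an array size.
constexpr unsigned kBuildTimeCores = PROC_COUNT;
static_assert(kBuildTimeCores >= 1, "PROC_COUNT must be at least 1");

// Hypothetical helper: prefer the runtime value when it is available,
// otherwise fall back to the value baked in at build time.
unsigned effective_core_count() {
    const unsigned runtime = std::thread::hardware_concurrency();
    return runtime != 0 ? runtime : kBuildTimeCores;
}
```

The static_assert turns a bad or empty build-time value into a compile error, and the runtime fallback covers the case where the binary runs on a machine other than the one it was built on.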

OpenCL not finding platforms?

I am trying to utilize the C++ API for OpenCL. I have installed my NVIDIA drivers and I have tested that I can run the simple vector addition program provided here. I can compile this program with following gcc call and the program runs without problem.
gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include
However, I would very much prefer to use the C++ API as opposed to the very verbose host code needed for C.
I downloaded the C++ bindings from Khronos from here and placed the cl.hpp file in the same location as my other cl.h file. The code uses some C++11 so I can compile the code with:
g++ main.cpp -o vectorAddition_cpp -std=c++11 -l OpenCL -I/usr/local/cuda-6.5/include
but when I try to run the program I get the error:
clGetPlatformIDs(-1001)
I also tried the example provided here, which gave a more helpful error message.
No platforms found. Check OpenCL installation!
The particular code which provides this error is this:
std::vector<cl::Platform> all_platforms;
cl::Platform::get(&all_platforms);
if(all_platforms.size()==0){
    std::cout<<" No platforms found. Check OpenCL installation!\n";
    exit(1);
}
This seems so strange given that the C implementation runs without problem. Any insights would be sincerely appreciated.
EDIT
The C implementation actually isn't running correctly. Each addition is printed to equal zero. Checking the ret_num_platforms also returns 0. For some reason my setup is failing to find my GPU. What could I have missed? My install consists of the nvidia-340 driver and cuda-6.5 installed via apt-get and the .run file respectively.
My sincerest thanks to @pasternak for helping me troubleshoot this problem. To solve it, however, I ended up having to avoid essentially all Ubuntu apt-get installs and use the CUDA run file for the full installation. Here is what fixed the problem.
Purge existing nvidia and cuda implementations (sudo apt-get purge cuda* nvidia-*)
Download cuda-6.5 toolkit from the CUDA toolkit archive
Reboot computer
Switch to tty1 (Ctrl-Alt-F1)
Stop the X server (sudo stop lightdm)
Run the cuda run file (sh cuda_6.5.14_linux_64.run)
Select 'yes' and accept all defaults
Required reboot
Switch to tty1, stop the X server, run the cuda run file again, and select 'yes' and the defaults for everything (including the driver again)
Update PATH to include /usr/local/cuda-6.5/bin and LD_LIBRARY_PATH to include /usr/local/cuda-6.5/lib64
Reboot again
Compile main.c program (gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include)
Verify works with ./vectorAddition
C++ API
Download cl.hpp file from Khronos here noting that it is version 1.1
Place cl.hpp file in /usr/local/cuda-6.5/include/CL with other cl headers.
Compile main.cpp (g++ main.cpp -o vectorAddition_cpp -std=c++11 -l OpenCL -I/usr/local/cuda-6.5/include)
Verify it works (./vectorAddition_cpp)
All output from both programs show the correct output for addition between vectors.
I personally find it interesting that Ubuntu's NVIDIA drivers don't seem to play well with the CUDA toolkits. Possibly this only affects the older versions, but it is still very unexpected.
It is hard to say without running the specific code on your machine but looking at the difference between the example C code you said was working and the cl.hpp might give us a clue. In particular, notice that the C example uses the following line to simply read a single platform ID:
cl_platform_id platform_id = NULL;
cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
Notice that it passes 1 as its first argument. This assumes that at least one OpenCL platform exists and requests that the first one found be placed in platform_id. Additionally, note that even though the return code is assigned to "ret", it is not actually used to check whether an error was returned.
Now if we look at the implementation of the static method used to query the set of platforms in cl.hpp, i.e. cl::Platform::get:
static cl_int get(
    VECTOR_CLASS<Platform>* platforms)
{
    cl_uint n = 0;
    cl_int err = ::clGetPlatformIDs(0, NULL, &n);
    if (err != CL_SUCCESS) {
        return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
    }
    cl_platform_id* ids = (cl_platform_id*) alloca(
        n * sizeof(cl_platform_id));
    err = ::clGetPlatformIDs(n, ids, NULL);
    if (err != CL_SUCCESS) {
        return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
    }
    platforms->assign(&ids[0], &ids[n]);
    return CL_SUCCESS;
}
we see that it first calls
::clGetPlatformIDs(0, NULL, &n);
Notice that the first parameter is 0 and the second is NULL, which tells the OpenCL runtime to return only the number of available platforms in "n". If this succeeds, it then goes on to request the actual "n" platform IDs.
So the difference is that the C version does not check that at least one platform exists and simply assumes that one does, while the cl.hpp variant does check the result, so it is probably this first call that is failing.
The most likely reason for all this is that the ICD is not correctly installed. You can see this thread for an example of how to fix this issue:
ERROR: clGetPlatformIDs -1001 when running OpenCL code (Linux)
I hope this helps.
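For reference, the -1001 in clGetPlatformIDs(-1001) is the value of CL_PLATFORM_NOT_FOUND_KHR, which the ICD loader returns when it finds no installed platforms. A small lookup such as the one below makes return codes readable when they are checked rather than ignored. This is only a sketch: describe_cl_error is a made-up helper, it lists just a handful of codes, and the numeric values are copied from the Khronos OpenCL headers so that it compiles without OpenCL installed.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL return codes to their names.
// Values are those defined in the Khronos OpenCL headers; the list is
// deliberately incomplete.
std::string describe_cl_error(int code) {
    switch (code) {
        case 0:     return "CL_SUCCESS";
        case -1:    return "CL_DEVICE_NOT_FOUND";
        case -6:    return "CL_OUT_OF_HOST_MEMORY";
        case -30:   return "CL_INVALID_VALUE";
        case -32:   return "CL_INVALID_PLATFORM";
        case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";
        default:    return "unlisted OpenCL error " + std::to_string(code);
    }
}
```

In the C example, passing the unused "ret" value to a helper like this right after clGetPlatformIDs would have reported the missing platform immediately instead of printing zeros later.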

armadillo requested size is too large

I am using Armadillo 4.300.0. I am operating on a dense matrix of size 2840260x103, which I load from a .csv file of approximately 3.7GB. I have enabled "ARMA_64BIT_WORD" in my application as well as in config.hpp under the armadillo_bits directory.
#if !defined(ARMA_64BIT_WORD)
#define ARMA_64BIT_WORD
#endif
I am compiling with GCC 4.9 and running on Ubuntu 12.04. When I run the application I get the following error. Interestingly, it occasionally runs too: for example, if I try some 10 times, it runs some of the time.
error: Mat::init(): requested size is too large
terminate called after throwing an instance of 'std::logic_error'
what(): Mat::init(): requested size is too large
Do I need to take care of something else?
Ramki.
This problem was solved with the Intel MKL library by compiling with -DMKL_ILP64 -m64. Typically we focus only on the link flags, but it is important to note that these flags must also be passed to the gcc command during the compile phase. I am not sure how to enable this for the OpenMPI library. Also, libarmadillo.so must link with mkl_ilp64 instead of mkl_lp64. Follow the instructions below.
Building and installing armadillo :
export CXX=icpc
export CC=icpc
export PATH=$PATH:/home/ramki/intel/bin:
Edit $armadillo_root/cmake_aux/Modules/ARMA_FindMKL.cmake, include the PATHS correctly.
Edit $armadillo_root/cmake_aux/Modules/ARMA_FindMKL.cmake, change mkl_lp64 to mkl_ilp64
Edit $armadillo_root/CMakeLists.txt and (1) Change CMAKE_SHARED_LINKER_FLAGS to include the link line by intel link advisor and (2) Change CMAKE_CXX_FLAGS as given by intel link advisor
Run ./configure and make sure that the MKL library is used for BLAS and LAPACK, that icpc is the compiler, and that the rest looks right.
Run make .
Verify the linked libraries by running ldd libarmadillo.so. Mainly verify whether it is linked with mkl_ilp64 library and mkl blas and lapack libraries.
Now run make install DESTDIR=local path.
This should work.
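To put the sizes involved in perspective, the arithmetic for the matrix from the question can be checked at compile time. This is only an illustrative sketch: it assumes 8-byte doubles, the names are made up, and it does not by itself establish the cause of the error. It merely shows how the element count and byte count relate to the 32-bit integers used by the LP64 interface, as opposed to the 64-bit integers used by ILP64.

```cpp
#include <cstdint>
#include <limits>

// Matrix dimensions taken from the question.
constexpr std::int64_t kRows = 2840260;
constexpr std::int64_t kCols = 103;

// Element count and storage size, assuming 8-byte doubles.
constexpr std::int64_t kElements = kRows * kCols;
constexpr std::int64_t kBytes =
    kElements * static_cast<std::int64_t>(sizeof(double));

// True when the value is representable as a signed 32-bit integer.
constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

static_assert(kElements == 292546780, "element count of a 2840260x103 matrix");
static_assert(fits_in_int32(kElements), "element count fits in 32 bits");
static_assert(!fits_in_int32(kBytes), "byte count exceeds a signed 32-bit int");
```

So the element count (292,546,780) still fits in a signed 32-bit integer, while the storage size (2,340,374,240 bytes) does not.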

Compiling on Vortex86: "Illegal instruction"

I'm using an embedded PC which has a Vortex86-SG CPU, running Ubuntu 10.04 with kernel 2.6.34.10-vortex86-sg. Unfortunately we can't compile a new kernel, because we don't have any source code, not even drivers or patches.
I have to run a small project written in C++ with OpenFrameworks. The framework compiles correctly with each script in of_v0071_linux_release/scripts/linux/ubuntu/install_*.sh.
I noticed that in order to compile against Vortex86/Ubuntu 10.04, the following options must be added in every config.make file:
USER_CFLAGS = -march=i486
USER_LDFLAGS = -lGLEW
With these options it compiles without errors, but the generated binary doesn't start at all:
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# ./emptyExample
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# echo $?
132
Strace last lines:
munmap(0xb77c3000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], NULL, 8) = 0
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin#
Any ideas on how to solve this problem?
I know I am a bit late on this but I recently had my own issues trying to compile the kernel for the vortex86dx. I finally was able to build the kernel as well. Use these steps at your own risk as I am not a Linux guru and some settings you may have to change to your own preference/hardware:
Download and use a Linux distribution that runs on a similar kernel version that you plan on compiling. Since I will be compiling Linux 2.6.34.14, I downloaded and installed Debian 6 on virtual box with adequate ram and processor allocations. You could potentially compile on the Vortex86DX itself, but that would likely take forever.
Make sure the dependencies are installed: # apt-get install ncurses-dev kernel-package
Download kernel from kernel.org (I grabbed Linux-2.6.34.14.tar.xz). Extract files from package.
Grab the config file from the DMP FTP site: ftp://vxmx:gc301@ftp.dmp.com.tw/Linux/Source/config-2.6.34-vortex86-sg-r1.zip. Please note the vxmx user name. Copy the config file to the freshly extracted Linux source folder.
Grab the patch at ftp://vxdx:gc301@ftp.dmp.com.tw/Driver/Linux/config%26patch/patch-2.6.34-hda.zip. Please note the vxdx user name. Copy it to the kernel source folder.
Patch Kernel: #patch -p1 < patchfilename
configure kernel with #make menuconfig
Load Alternate Configuration File
Enable generic x86 support
Enable Math Emulation
I disabled generic IDE support because I will be using legacy mode (selectable in the BIOS)
Under Device Drivers -> Ethernet (10 or 100Mbit) -> Make sure RDC R6040 Fast Ethernet Adapter Support is selected
USB support -> Select Support for Host-side USB, EHCI HCD (USB 2.0) support, OHCI HCD support
Save the config as .config
Check the serial ports: edit .config manually and make sure CONFIG_SERIAL_8250_NR_UARTS = 4 (or more if you have additional ports) and CONFIG_SERIAL_8250_RUNTIME_UARTS = 4 (or more if you have additional ports). If you are going to use more than 4 serial ports, make sure CONFIG_SERIAL_8250_MANY_PORTS is set.
compile kernel headers and source: #make-kpkg --initrd kernel_image kernel_source kernel_headers modules_image