Multiprocess headless OpenGL processing on EC2 GPU instances

My core problem is that I need to run multiple OpenGL executables concurrently on an EC2 GPU instance, and I'm observing non-deterministic segfaults when I try. The same program runs fine (with concurrency) on my MacBook Pro.
The application works as follows:
A Python script launches multiple worker executables (i.e. concurrent subprocess.call() invocations from a multiprocessing.pool.ThreadPool), as sketched below. The script provides a JSON file as worker input, and each worker writes its JSON output to a file.
Each worker is a C++ program that does some headless image rendering in OpenGL using fragment shaders and a render-to-texture pipeline. I've tried using both GLUT and GLX rendering contexts.
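For concreteness, the launch pattern is roughly the following (a minimal sketch; the worker binary name, job file names, and pool size are placeholders, not my actual code):

# Minimal sketch of the launcher; "./worker" and the file names are
# placeholders for the real binary and job files.
import subprocess
from multiprocessing.pool import ThreadPool

def run_worker(job):
    in_path, out_path = job
    # Each call blocks its pool thread until that worker process exits.
    return subprocess.call(["./worker", in_path, out_path])

jobs = [("job_%d.json" % i, "out_%d.json" % i) for i in range(8)]
pool = ThreadPool(processes=8)
exit_codes = pool.map(run_worker, jobs)
pool.close()
pool.join()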
I'm confident that neither the Python script nor the C++ workers have major bugs, because the whole application runs fine when:
running a single worker on the EC2 GPU instance
running one or more workers on my MacBook (OS X 10.7.4)
The specific error I observe is that one or more of the workers will segfault inside an OpenGL call (e.g. glTexSubImage2D, glDrawElements, etc.) after a few minutes of execution. Sometimes I've seen failures in the GLX context-setup stage (e.g. glXCreateNewContext or glXChooseFBConfig). If I start more workers (i.e. higher concurrency), I see errors sooner. If I start fewer workers, it can take 15-30 minutes before a crash.
I believe that I'm having some sort of OpenGL context or driver issue. I've tried setting up my context using both GLUT and GLX and neither seems to help.
My procedure for creating the EC2 instance is very close to the instructions given here: http://hpc.nomad-labs.com/archives/139. The specific packages I install are:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libegl1-mesa libglu1-mesa-dev mesa-utils mesa-utils-extra llvm-dev imagemagick libboost-all-dev python2.6 python-imaging python-matplotlib python-numpy python-scipy firefox clang python-setuptools python-scipy libatlas-dev ccache libpng12-dev libmagick++-dev glew-utils xvfb x11-utils qiv xinit
On both OSX and Linux, the C++ worker links: GL GLU glut pthread m X11.
I generated my xorg.conf using:
$ nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
Before running my program, I run:
$ startx &
$ export DISPLAY=:0
I've tried some non-nvidia drivers, but they don't seem to help either.
I've also consulted the FAQ on parallel processing with OpenGL: http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html
The guide suggests that multithreaded GLX on Ubuntu doesn't work (and I've confirmed that personally... :) but it seems that multiprocess GLX should be feasible and stable.
Does anybody have any ideas as to:
why the OpenGL/GLX calls might be failing? Am I indeed seeing a driver issue? It seems like the Mac GPU drivers have some sort of 'magic feature' that aids concurrent OpenGL usage. Are there any Ubuntu/Linux drivers with the same feature?
are there best practices for running multiple OpenGL executables concurrently on an EC2 GPU instance (or any headless Ubuntu/Linux machine, for that matter)? Can anybody point me towards open source software that does this?

... 1 year later ...
I've found that a lot of the time the GPU won't be enabled on headless machines.
Try VNCing in first and see if that helps.

Related

Caffe2 does not detect GPU

I'd like to use Caffe2 with GPU support. I successfully installed Caffe2 (Ubuntu 16.04, Python 2.7) in a conda environment (command: conda install pytorch-nightly -c pytorch).
The install succeeded (I checked it with the command: python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure", and it says "Success").
However, when I check the Caffe2 GPU build (command: python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'), it returns 0.
I already have CUDA, cuDNN, and NCCL, and I don't understand why Caffe2 does not detect the available GPU.
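For reference, here is the same check as a small script (the same calls as the one-liners above):

# Same GPU-build check as the one-liners above, as a small script.
from caffe2.python import workspace

# 0 means Caffe2 cannot see a usable CUDA device.
print("NumCudaDevices:", workspace.NumCudaDevices())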
I guess you are going to implement Detectron (otherwise nobody would want to use this dumb Caffe2 these days).
I'm pretty sure this is caused by a mismatch between the CUDA and cuDNN versions. I was stuck on this problem for a while (it's hard to know which versions are correct for Caffe2); finally, I found two solutions at almost the same time. Both of them work for me.
First, just update the Nvidia driver to the latest version. Mine is updated to 410.78; you can simply update the driver by selecting the desired one under System Settings -> Software & Updates -> Additional Drivers.
Don't forget to restart your PC.
Then there are two ways to set it up.
Build the environment with Docker.
It is simple and fast. You just install Docker (as well as nvidia-docker for GPU usage) and use this pre-built environment with these commands:
sudo docker pull ylashin/detectron
sudo nvidia-docker run --rm -it ylashin/detectron
Then you can test your Caffe2 with that NumCudaDevices command.
It works for me!
See the full walkthrough here, thanks to the author's efforts:
Build a Detectron environment with Docker
If you have some problem with the Docker installation (especially with nvidia-docker), you can just skip to the next one.
Use Detectron2
The newest Detectron was published recently (actually three days ago!). We can now work with that one instead, and it is supported by PyTorch.
Here is the Detectron2:
Detectron2 link
Just go with the latest one; you can even run everything in Google Colab, which is much easier.

PyOpenGL headless rendering

I'm using PyOpenGL+glfw for rendering.
When trying to do the same on a headless machine (e.g. a server), glfw.init() fails:
glfw.GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
Fatal Python error: Couldn't create autoTLSkey mapping
Aborted (core dumped)
I found some information about headless rendering, but only for using OpenGL directly, not through Python.
EDIT: I understand that glfw may simply not be able to support this. A solution without glfw, using something else, would also work...
The solution is to use xvfb for a virtual framebuffer.
The problem is that the glfw installed on Ubuntu via apt-get install libglfw3 libglfw3-dev is too old for this, so we need to compile it from source.
Here is a full working docker example:
docker run --name headless_test -ti ubuntu /bin/bash
# Inside the ubuntu shell:
apt update && apt install -y python3 python3-pip git python-opengl xvfb xorg-dev cmake
pip3 install pyopengl glfw
mkdir /projects
git clone https://github.com/glfw/glfw.git /projects/glfw
cd /projects/glfw
cmake -DBUILD_SHARED_LIBS=ON .
make
export PYGLFW_LIBRARY=/projects/glfw/src/libglfw.so
xvfb-run python3 some_script_using_pyopengl_and_glfw.py
And here is the basic PyOpenGL code to use it:
from OpenGL.GL import *
from OpenGL.GLU import *
import glfw

DISPLAY_WIDTH = 640   # any offscreen size works
DISPLAY_HEIGHT = 480

if not glfw.init():
    raise RuntimeError("glfw.init() failed")
# Set window hint NOT visible
glfw.window_hint(glfw.VISIBLE, False)
# Create a windowed mode window and its OpenGL context
window = glfw.create_window(DISPLAY_WIDTH, DISPLAY_HEIGHT, "hidden window", None, None)
if not window:
    glfw.terminate()
    raise RuntimeError("glfw.create_window() failed")
# Make the window's context current
glfw.make_context_current(window)
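To sanity-check that the hidden-window context really works under xvfb-run, you can extend the snippet to render a frame and read the pixels back (the clear color here is arbitrary):

# Continuation of the snippet above: render one frame and read it back.
glClearColor(1.0, 0.0, 0.0, 1.0)
glClear(GL_COLOR_BUFFER_BIT)
# pixels holds DISPLAY_WIDTH * DISPLAY_HEIGHT * 3 bytes of RGB data;
# getting real data back confirms headless rendering produced output.
pixels = glReadPixels(0, 0, DISPLAY_WIDTH, DISPLAY_HEIGHT, GL_RGB, GL_UNSIGNED_BYTE)
glfw.destroy_window(window)
glfw.terminate()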
GLFW does not support headless OpenGL at all.
https://www.glfw.org/docs/latest/context.html#context_offscreen
GLFW doesn't support creating contexts without an associated window.
This isn't an unusual limitation; the problem is that the normal way to create an OpenGL context is through the X server. There are now alternatives using EGL, which is relatively new. You will need to use an EGL wrapper for Python.
See: OpenGL without X.org in linux
If you want to use OpenGL without a display environment on Linux (e.g. without an X server), the best approach is to use EGL. What EGL does is separate OpenGL context management from the windowing system, so it lets you create a context without a visible window.
If you are using an Nvidia graphics card, you have to install the proprietary driver in order to use it. Along with the driver comes a library called GLVND, which includes the EGL implementation your app needs to link against.
Please refer to following links to learn how to use EGL:
Pro Tip: Linking OpenGL for Server-Side Rendering
EGL Eye: OpenGL Visualization without an X Server
PS. If your EGL API cannot find any devices, you probably linked the wrong EGL library; the EGL library must match the driver.
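For a rough idea of what the EGL path looks like from Python, here is a sketch using PyOpenGL's EGL bindings (the config attributes and pbuffer size are illustrative, and this assumes the driver's EGL library is the one being loaded):

# Headless EGL context sketch via PyOpenGL's EGL bindings (no X server).
import ctypes
from OpenGL import EGL

display = EGL.eglGetDisplay(EGL.EGL_DEFAULT_DISPLAY)
major, minor = EGL.EGLint(), EGL.EGLint()
EGL.eglInitialize(display, ctypes.pointer(major), ctypes.pointer(minor))

# Ask for a pbuffer-capable desktop-OpenGL config.
config_attribs = (EGL.EGLint * 11)(
    EGL.EGL_SURFACE_TYPE, EGL.EGL_PBUFFER_BIT,
    EGL.EGL_RED_SIZE, 8, EGL.EGL_GREEN_SIZE, 8, EGL.EGL_BLUE_SIZE, 8,
    EGL.EGL_RENDERABLE_TYPE, EGL.EGL_OPENGL_BIT,
    EGL.EGL_NONE)
config = EGL.EGLConfig()
num_configs = EGL.EGLint()
EGL.eglChooseConfig(display, config_attribs, ctypes.pointer(config), 1,
                    ctypes.pointer(num_configs))

EGL.eglBindAPI(EGL.EGL_OPENGL_API)
pbuffer_attribs = (EGL.EGLint * 5)(
    EGL.EGL_WIDTH, 640, EGL.EGL_HEIGHT, 480, EGL.EGL_NONE)
surface = EGL.eglCreatePbufferSurface(display, config, pbuffer_attribs)
context = EGL.eglCreateContext(display, config, EGL.EGL_NO_CONTEXT, None)
EGL.eglMakeCurrent(display, surface, surface, context)
# Regular OpenGL calls (from OpenGL.GL) can now run without an X server.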

QXcbIntegration: Cannot create platform OpenGL context, neither GLX nor EGL are enabled

I have a Unix binary built with Qt and OpenGL which I'm trying to execute on linux-64. It is a simple visual program that shows 2D and 3D graphics.
I have installed all the necessary dependencies, such as the Qt and OpenGL libraries.
However, I am stuck with the following error when trying to execute the binary:
QXcbIntegration: Cannot create platform OpenGL context, neither GLX nor EGL are enabled
However, the binary does eventually run, but with some features missing, such as 3D graphics.
My setup includes a virtual linux-64 guest using VirtualBox, Vagrant, X11 forwarding, and a Mac host machine.
Eventually I realised that OpenGL 3.3 doesn't work easily on virtual machines yet. I had to boot from an Ubuntu USB drive and work from there after installing the latest Mesa 3D package.
This shows a similar issue, and the developer in the comments said "our 3D support is not very clean in Linux guests, hence the warnings". You can give VMware a try.
After some time spent trying to get OpenGL working on a particular locked-down Linux box, I ended up going back to Qt Creator 2.5.2.
http://download.qt.io/archive/qtcreator/2.5/
http://download.qt.io/archive/qtcreator/2.5/qt-creator-linux-x86_64-opensource-2.5.2.bin
After getting it onto the Linux box...
chmod u+x *.bin
./qt-creator-linux-x86_64-opensource-2.5.2.bin
And after a short installer, Qt Creator is working!
Basically, QtQuick is a requirement in any Qt Creator built after 2.5 (i.e. Qt 5.x), and QtQuick NEEDS OpenGL libraries and support.
Hope that helps.
I saw this problem when executing a Qt app from a dash prompt (Ubuntu 16.04 uses dash by default). I changed to a bash prompt and rebuilt my Qt app, and the error is gone.
To switch the default shell from dash to bash, I used the command below.
sudo dpkg-reconfigure dash

Is there a VM that I can do OpenGL 3+ with? VirtualBox and VMware don't

I am trying to write some openFrameworks (C++) code in a VM. My host is Windows 8, and I've tried both Arch Linux and Ubuntu guests. My host computer runs the graphics code just fine with an Nvidia Optimus setup and 8GB of RAM.
I do my main development in Visual Studio, but I prefer to create Android and test packages from Linux. For this reason I just want to fire up a VM and take care of business. The problem is that some of my graphics apps need OpenGL 3+.
Has anybody else had the same problem and solved it?
Give up on VirtualBox. VB's OpenGL guest support craps out at 2.1, even then only after you install VB Guest Additions from the command line with switches and then add some Registry keys to actually enable the OpenGL guest drivers.
If you're willing to shell out money, VMware Fusion for Mac and VMware Workstation for Windows both support DirectX 10 and OpenGL 3.3.
A bit late to the party here, but hopefully helpful for someone encountering similar issues these days:
The mesa software renderer now supports OpenGL 4.5, so for me, the solution is to disable 3D acceleration in the settings of the VirtualBox machine!
The mesa software OpenGL support then takes over and provides its capabilities. It's for sure not that fast, but for my purpose (testing whether an OpenGL application starts and displays something under linux) it's sufficient!
Tested both on Fedora 34 and Ubuntu 20.04.
Try VirtualBox and prepend MESA_GL_VERSION_OVERRIDE=3.0 MESA_GLSL_VERSION_OVERRIDE=130 to your Linux command line. Some of the OpenGL 3 functions may work, though not all of them will. I used that to bring up Civ5; the animations did not show up, nor did the on-screen fonts.
If you want to see the source code:
VirtualBox uses Chromium 1.9, which is OpenGL 2.1. This can be verified with the glxinfo command. Use the following commands to track down the VirtualBox OpenGL library file:
$ ldd /usr/bin/glxinfo
$ apt-file search /usr/lib/x86_64-linux-gnu/libGL.so.1.2
$ LIBGL_DEBUG=verbose glxinfo
Then follow links:
$ ls -l x86_64-linux-gnu/dri/
lrwxrwxrwx Apr 14 2014 vboxvideo_dri.so -> ../../VBoxOGL.so
$ apt-file search /usr/lib/VBoxOGL.so
virtualbox-dbg: /usr/lib/debug/usr/lib/VBoxOGL.so
virtualbox-guest-x11: /usr/lib/VBoxOGL.so
$ dpkg -l virtualbox*
ii virtualbox-guest-x11 4.1.18-dfsg-2+deb7 amd64
$ apt-file list virtualbox-guest-x11
...
The source code tarball was virtualbox-4.3.10-dfsg.orig.tar.gz from the trusty repo. The version string can be grepped with $ grep -r CR_OPENGL_VERSION_STRING * and $ grep -r CR_VERSION_STRING * in the source directory.
Update 6/1/2017: Someone told me that KVM works for Civ5. A quick search turned up a thread titled "GPU Passthrough with KVM: Have Your Cake and Eat it Too". The thread is too long to read, but I hope it can be useful to somebody.

Getting OpenCL to work on Linux laptop with Optimus technology

I have an installation of Kubuntu 13.10 on my laptop, which has an Nvidia GT 555M with Optimus technology. I am having some trouble getting my C++ OpenCL code to compile.
The error I keep getting is Cannot find -lOpenCL. Doing a quick search with the GNU find utility gives me the following:
/usr/lib32/nvidia-319/libOpenCL.so.1
/usr/lib32/nvidia-319/libOpenCL.so
/usr/lib32/nvidia-319/libOpenCL.so.1.0
/usr/lib32/nvidia-319/libOpenCL.so.1.0.0
/usr/lib/x86_64-linux-gnu/libOpenCL.so
/usr/lib/nvidia-319/libOpenCL.so.1
/usr/lib/nvidia-319/libOpenCL.so
/usr/lib/nvidia-319/libOpenCL.so.1.0
/usr/lib/nvidia-319/libOpenCL.so.1.0.0
I have the following OpenCL development packages installed:
opencl-headers
nvidia-opencl-dev
I also tried the utility clinfo to see if I get any information, but I get the following error:
clinfo: error while loading shared libraries: libOpenCL.so.1: cannot open shared object file: No such file or directory
Does anyone have any experience setting up a Linux development environment with OpenCL on their Optimus laptop?
I was under the impression that I would not need to do anything fancy to get this working.
EDIT: OK, it seems the reason I could not compile was that I was mixing up headers and libraries. The following compiles my code fine:
g++ -std=c++11 -I /usr/local/cuda-5.5/include vadd.cpp -L /usr/lib/nvidia-331 -lOpenCL
I am getting another error during runtime now (but at least I managed to compile!). The error is as follows:
ERROR: clGetPlatformIDs
-1001
From doing some research, this seems to mean I probably do not have the ICD portion of Nvidia's toolkit installed? What I cannot understand is where to find it!
You should install the Nvidia Cuda SDK. It contains OpenCL development libraries and includes.
You don't need development packages or libraries (OpenCL is already there and working; you're just getting a runtime error, so the ICD is present). What you need is a platform ready to execute the OpenCL code, i.e. a GPU plus a driver.
You need to install the proprietary nVIDIA driver, either by using the Ubuntu tools or by installing the package nvidia-current.
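Once the driver is installed, a quick way to confirm that an ICD is registered is a pyopencl check (this assumes you install pyopencl, e.g. via pip install pyopencl; it exercises the same clGetPlatformIDs call that fails with -1001 above):

# Quick ICD sanity check; an exception or empty output here corresponds
# to clGetPlatformIDs failing with -1001 in the C++ code above.
import pyopencl as cl

for platform in cl.get_platforms():
    print(platform.name, [device.name for device in platform.get_devices()])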
Maybe you have to install Bumblebee, a tool for using CUDA on Nvidia cards with Optimus technology.
I do not use Kubuntu, but I got this working under Mageia 6 Linux, so I guess it should be pretty similar. In my case my laptop had both Intel and Nvidia (GeForce GTX 980M) graphics cards. My intention was to run only compiled OpenCL code, without any setup of the Xorg graphical server.
So, as advised above by DarkZeros, I did it using only the proprietary Nvidia driver (in my case downloaded from the Nvidia page). Then, as the root user:
./NVIDIA-Linux-x86_64-375.39.run --no-opengl-files
It asked me if I wanted to modify my Xorg configuration; I said "NO". This installed the nvidia kernel modules. Next, I modified /etc/modules to tell Linux that it should load them at system startup (this might be different on Kubuntu):
[root@localhost ~]# cat /etc/modules
nvidia
nvidia-uvm
nvidia-drm
nvidia-modeset
And that was really it. Reboot your system, and loading the modules should also automatically create the correct nvidia device files under the /dev directory:
[root@localhost ~]# ls /dev/nvidia*
/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools
I got my inspiration from ftp://download.nvidia.com/XFree86/Linux-x86/295.59/README/optimus.html