Switched OS version, Vulkan program stopped working - C++

A program I was developing stopped working when I upgraded from Ubuntu 20.04 to 21.04.
I am using Conan to install my dependencies, including what would normally come from the SDK (the loader, the headers, the validation layers), and I suspect that is where the problem lies.
vkcube and vulkaninfo run, so Vulkan itself is fine.
The Conan package versions (which match the versioning of the official GitHub repos for each project) are:
"vulkan-headers/1.2.184",
"vulkan-loader/1.2.182",
"vulkan-validationlayers/1.2.182"
I get the following from vulkaninfo:
vulkaninfo | grep Instance
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Vulkan Instance Version: 1.2.182
Instance Extensions: count = 18
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
maxMultiviewInstanceIndex = 2147483647
maxMultiviewInstanceIndex = 2147483647
drawIndirectFirstInstance = true
vertexAttributeInstanceRateDivisor = true
vertexAttributeInstanceRateZeroDivisor = true
drawIndirectFirstInstance = true
vertexAttributeInstanceRateDivisor = true
vertexAttributeInstanceRateZeroDivisor = false
I have also tried downloading the latest SDK and running the setup-env.sh script to see if that fixes it, but it doesn't seem to do anything.
The exact problem is a segmentation fault when calling:
physical_device.getQueueFamilyProperties(); (I am using the .hpp header)
I am not entirely sure why things stopped working. I suspect I might have a mismatch between, for example, my vulkan.hpp header and my Vulkan library, but I don't know how to check, and I am not sure that is actually the problem.
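As an aside (not from the original post), one way to check for such a header/loader mismatch is to compare the compile-time header version against what the loader reports at runtime. A minimal sketch with vulkan.hpp, assuming a Vulkan 1.1+ loader:

// Hedged sketch: compare the version baked into vulkan.hpp at compile time
// with the version the installed loader reports at runtime.
#include <vulkan/vulkan.hpp>
#include <iostream>

int main()
{
    std::cout << "Header: "
              << VK_VERSION_MAJOR(VK_HEADER_VERSION_COMPLETE) << "."
              << VK_VERSION_MINOR(VK_HEADER_VERSION_COMPLETE) << "."
              << VK_VERSION_PATCH(VK_HEADER_VERSION_COMPLETE) << "\n";

    uint32_t loaderVersion = vk::enumerateInstanceVersion();
    std::cout << "Loader: "
              << VK_VERSION_MAJOR(loaderVersion) << "."
              << VK_VERSION_MINOR(loaderVersion) << "."
              << VK_VERSION_PATCH(loaderVersion) << "\n";
    return 0;
}

A difference only in the patch component (1.2.184 headers against a 1.2.182 loader, as in the package list above) is by itself normally harmless.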
I get this as well when running vulkaninfo | grep GPU:
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
GPU id = 0 (AMD RADV RAVEN2 (ACO))
GPU id = 1 (llvmpipe (LLVM 12.0.0, 256 bits))
GPU id = 0 (AMD RADV RAVEN2 (ACO))
GPU id = 1 (llvmpipe (LLVM 12.0.0, 256 bits))
GPU id : 0 (AMD RADV RAVEN2 (ACO)):
GPU id : 1 (llvmpipe (LLVM 12.0.0, 256 bits)):
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
GPU0:
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
GPU1:

This Reddit comment seems to have been the solution:
https://www.reddit.com/r/Fedora/comments/krz20h/vulkan_swrast_lavapipe_is_getting_used_instead_of/
specifically:
After reading through Archwiki, I've discovered that lavapipe is indeed being used instead of radeon.
exporting VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json fixes the issue.
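For reference, a minimal sketch (not from the thread) that enumerates the physical devices through vulkan.hpp; running it with and without VK_ICD_FILENAMES exported makes it easy to confirm which ICD the loader actually selects, and it exercises the same getQueueFamilyProperties() call that was crashing:

// Hedged sketch: list devices and their queue family counts to verify
// which ICD (radv vs. lavapipe) the loader picked up.
#include <vulkan/vulkan.hpp>
#include <iostream>

int main()
{
    vk::ApplicationInfo appInfo("icd-check", 1, nullptr, 0, VK_API_VERSION_1_2);
    vk::InstanceCreateInfo createInfo({}, &appInfo);
    vk::Instance instance = vk::createInstance(createInfo);

    for (const auto& device : instance.enumeratePhysicalDevices())
    {
        auto props = device.getProperties();
        auto queueFamilies = device.getQueueFamilyProperties();
        std::cout << props.deviceName.data() << ": "
                  << queueFamilies.size() << " queue families\n";
    }

    instance.destroy();
    return 0;
}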

Related

Build and test V8 from AOSP for the older version on Ubuntu 18.04.5 LTS

I am trying to run the unit tests for the V8 present in AOSP's Lollipop release (external/chromium_org/v8) by following the documentation at https://v8.dev/docs/build, but the build keeps failing.
Steps followed:
Export the depot_tools path
gclient sync
install dependencies using ./build/install-build-deps.sh (this script was not present by default in the source code, so I had to copy it manually from a newer version)
gm x64.release
I have installed all the dependencies and followed all the steps from the documentation mentioned above, but when I do:
gm x64.release
the build fails with the following output:
# echo > out/x64.release/args.gn << EOF
is_component_build = false
is_debug = false
target_cpu = "x64"
use_goma = false
v8_enable_backtrace = true
v8_enable_disassembler = true
v8_enable_object_print = true
v8_enable_verify_heap = true
EOF
# gn gen out/x64.release
ERROR at //build/config/BUILDCONFIG.gn:71:7: Undefined identifier
if (os == "chromeos") {
^-
I have tried building it with gn as well by following the manual workflow, but I end up with the same errors. I also tried setting the os variable to linux in the gn args list, but there as well I get the undefined identifier error.
I see that the V8 used in the AOSP project differs a lot, in terms of files, from the main source code at the same version. The helper script tools/dev/gm.py is also not present by default, so I am using one from a newer version. It would be great if anyone could suggest a different set of steps I should be following, or other resources I can refer to, in order to build the V8 present in the AOSP project.
Version: V8 3.29.88.17
OS: Ubuntu 18.04.5 LTS
Architecture: x86_64
3.29 is seriously old; I'm not surprised that it won't build with current tools. Rule of thumb: when building old software, use the tools that were used to build it back then.
In the case at hand: try make x64.release.check -jN with N being the number of CPU cores you have.
I see that the v8 used in the AOSP project differs a lot in terms of files from the main source code with the same version.
The "lollipop-release" branch contains V8 3.27.34.15, whereas "lollipop-mr1-release" contains V8 3.29.88.17 which you quoted. Does that explain the differences?

time() Function Segfaults with g++ -static Compiled on Centos 6 and Run on Debian 10

Folks! I am writing software that must install and run on as many flavors of Linux as possible, but I must be able to compile it on a single Jenkins slave.
Right now this mostly works. But I have run into a case whereby a special combination of things will produce a segfault on Debian 10, but not on any of my numerous other supported flavors of Linux. I have been able to reproduce this in 3 different apps (some of which have been working for years), including the simplified prototype I have listed below.
// g++ -g -o ttt -static tt.cpp
// The above compile of this code on Centos 6 will produce a segfault
// when run on Debian 10, but not on any other tested flavor of Linux.
// Dozens of them. The version of g++ is 4.7.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int _argc, char* _argv[])
{
    srand(time(0));
    printf("success\n");
    return 0;
}
What I have found by running each of my 3 apps on Debian 10 with gdb is that they segfault under these conditions:
They must be compiled with the -static flag. If I don't use -static, it works fine, regardless of which flavor it is compiled on.
They must call the time() function. It doesn't really matter how I call it, but it must be called. I tried the usual suspects like passing a NULL pointer and passing a real pointer. It always segfaults when the app is compiled statically.
They must be compiled on Centos 6 and run on Debian 10. If I compile statically on Debian 10, the prototype works fine.
So here are my constraints that I am working under.
I have to compile on one Linux slave, because I am distributing just one binary. Keeping track of multiple binaries and which one goes on what Linux flavor is not really an option.
I have to compile statically or it creates incompatibilities on other supported flavors of Linux.
I have to use an older Linux flavor, also for the sake of compatibility. It doesn't have to be Centos, but it has to produce binaries that will run on Centos, as well as numerous other flavors.
I have to use g++ 4.7 also for code compatibility.
In your answers, I am hoping for some kind of code trick. Maybe a good, reliable replacement for the time() function. Or a suggestion for another flavor of Linux that is compatible with Debian 10.
Bonus points would go to whoever is able to explain the black magic of why a basic, ubiquitous function like time() would be completely compatible on Debian 9, but segfaults on Debian 10 ONLY when it is compiled statically on Centos 6...
EDIT:
strace on the Centos 6 server:
execve("./ttt", ["./ttt"], [/* 37 vars */]) = 0
uname({sys="Linux", node="testcent6", ...}) = 0
brk(0) = 0x238c000
brk(0x238d180) = 0x238d180
arch_prctl(ARCH_SET_FS, 0x238c860) = 0
brk(0x23ae180) = 0x23ae180
brk(0x23af000) = 0x23af000
gettimeofday({1585687633, 358976}, NULL) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f682c82f000
write(1, "success\n", 8success
) = 8
exit_group(0) = ?
+++ exited with 0 +++
strace on the Debian 10 server:
execve("./ttt", ["./ttt"], 0x7fff0430dfd0 /* 18 vars */) = 0
uname({sysname="Linux", nodename="deletemedebian10", ...}) = 0
brk(NULL) = 0x1f6f000
brk(0x1f70180) = 0x1f70180
arch_prctl(ARCH_SET_FS, 0x1f6f860) = 0
brk(0x1f91180) = 0x1f91180
brk(0x1f92000) = 0x1f92000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffff600400} ---
+++ killed by SIGSEGV +++
Segmentation fault
The executable is trying to use the vsyscall interface to implement the syscall used for the time function.
This interface has been deprecated in favor of the vdso for a long time. It was dropped completely a while back, but can still be emulated.
Debian 10 seems to have disabled the vsyscall emulation, which is done for security reasons because it may make attacks easier. You should be able to re-enable the emulation by passing the kernel command line option vsyscall=emulate at startup, of course with the mentioned security repercussions, if that is an option.
The glibc version on CentOS 6 seems to be 2.12, which is too old to make use of the vdso. So to compile a compatible binary for newer kernel configurations, you need at least glibc 2.14 instead. I don't know whether this can be easily installed on CentOS or whether it will work correctly with the kernel shipped with it.
You should also consider whether you really need a fully static binary. You could link everything statically except libc.
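On the "code trick" side of the question: a hedged sketch of one possible workaround, assuming an x86-64 target, is to bypass glibc's time() and issue the raw system call yourself, so the binary never touches the legacy vsyscall page:

// Hedged sketch: obtain the current time via the raw syscall instead of
// glibc's time(), which in this old static build jumps to the vsyscall
// page at 0xffffffffff600400.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static time_t portable_time(void)
{
    return (time_t)syscall(SYS_time, (time_t*)0);
}

int main(int _argc, char* _argv[])
{
    srand(portable_time());
    printf("success\n");
    return 0;
}

Alternatively, following the last paragraph above, replacing -static with -static-libgcc -static-libstdc++ keeps libc dynamic, so the target machine's own glibc (and its vdso) handles the time lookup.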

OpenGL program with tensorflow C++ gives failed call to cuInit : CUDA_ERROR_OUT_OF_MEMORY

I have trained a model with no issues using TensorFlow on Python. I am now trying to integrate inference for this model into a pre-existing OpenGL-enabled software. However, I get a CUDA_ERROR_OUT_OF_MEMORY during cuInit (that is, even earlier than loading the model, just at session creation). It does seem that OpenGL has taken some memory (around 300 MiB), as shown by gpustat or nvidia-smi.
Is it possible there is a clash as both TF and OpenGL are trying to access/allocate the GPU memory? Has anyone encountered this problem before? Most references I found googling around concern model loading time, not session/CUDA initialization. Is this completely unrelated to OpenGL and I am just barking up the wrong tree? A simple TF C++ inference example works. Any help is appreciated.
Here is the tensorflow logging output, for completeness:
2018-01-08 12:11:38.321136: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-08 12:11:38.379100: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY
2018-01-08 12:11:38.379388: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: rosenblatt
2018-01-08 12:11:38.379413: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: rosenblatt
2018-01-08 12:11:38.379508: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.98.0
2018-01-08 12:11:38.380425: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.98 Thu Oct 26 15:16:01 PDT 2017 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)"""
2018-01-08 12:11:38.380481: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.98.0
2018-01-08 12:11:38.380497: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.98.0
EDIT: Removing all references to OpenGL resulted in the same problem, so it has nothing to do with a clash between the libraries.
OK, the problem was the use of the sanitizer in the debug version of the binary. Both the release version and the debug version without the sanitizer work as expected.
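For anyone who really is fighting over GPU memory with OpenGL (which turned out not to be the cause here), a minimal sketch of the usual mitigation with the TF 1.x C++ API is to stop TensorFlow from reserving the whole GPU at session creation:

// Hedged sketch: create a session that grows GPU allocations on demand
// instead of claiming all free memory up front.
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/public/session_options.h"

tensorflow::Session* createSession()
{
    tensorflow::SessionOptions options;
    options.config.mutable_gpu_options()->set_allow_growth(true);
    // Optionally cap the fraction of GPU memory TensorFlow may use.
    options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.5);

    tensorflow::Session* session = nullptr;
    tensorflow::Status status = tensorflow::NewSession(options, &session);
    if (!status.ok())
    {
        // Inspect status.ToString() for the failure reason.
        return nullptr;
    }
    return session;
}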

OpenCV 3.0 install WITH_OPENCL_SVM=ON on Machine with OpenCL 1.1

So I installed OpenCV with WITH_OPENCL_SVM=ON, thinking that I was eventually going to get a GPU with OpenCL 2.0 on it (I currently only have 1.1). However, now when I try to run any program with OpenCV I get an
Error on line 2629 (ocl.cpp): CL_DEVICE_SVM_CAPABILITIES via clGetDeviceInfo failed: -30
I believe this happens when I create a cv::UMat. The line of code in ocl.cpp depends on HAVE_OPENCL_SVM being defined; I assume it then checks for the SVM capability and fails that check because I don't have OpenCL 2.0. I tried:
#undef HAVE_OPENCL_SVM
in my code, and also modifying cvconfig.h (I honestly don't know how/when that file is referenced) so that it is not defined, as well as un-defining it there again... and the error persists.
Thanks!
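As a hedged sketch (not confirmed by this thread, and assuming the CPU fallback is acceptable), one runtime workaround to try is to disable OpenCV's OpenCL path before any UMat work, so that no OpenCL device query, SVM or otherwise, should be issued:

// Hedged sketch: turn off the transparent OpenCL path at runtime so the
// SVM capability query in ocl.cpp is (presumably) never reached.
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

int main()
{
    cv::ocl::setUseOpenCL(false);  // fall back to the plain CPU implementation

    cv::UMat frame(480, 640, CV_8UC3, cv::Scalar::all(0));
    // ... rest of the program ...
    return 0;
}

Whether this avoids the failing clGetDeviceInfo call in a build compiled with WITH_OPENCL_SVM=ON is an assumption; rebuilding OpenCV with that flag off is the more certain fix.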

Compiling on Vortex86: "Illegal instruction"

I'm using an embedded PC which has a Vortex86-SG CPU, running Ubuntu 10.04 with kernel 2.6.34.10-vortex86-sg. Unfortunately we can't compile a new kernel, because we don't have any source code, not even drivers or patches.
I have to run a small project written in C++ with OpenFrameworks. The framework compiles fine after running each of the install scripts in of_v0071_linux_release/scripts/linux/ubuntu/install_*.sh.
I noticed that in order to compile against Vortex86/Ubuntu 10.04, the following options must be added to every config.make file:
USER_CFLAGS = -march=i486
USER_LDFLAGS = -lGLEW
In effect, it compiles without errors, but the generated binary doesn't start at all:
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# ./emptyExample
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin# echo $?
132
Strace last lines:
munmap(0xb77c3000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], NULL, 8) = 0
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Illegal instruction
root@jb:~/openframeworks/of_v0071_linux_release/apps/myApps/emptyExample/bin#
Any idea to solve this problem?
I know I am a bit late on this, but I recently had my own issues trying to compile the kernel for the Vortex86DX, and I was finally able to build it. Use these steps at your own risk, as I am not a Linux guru, and you may have to change some settings to match your own preferences/hardware:
Download and use a Linux distribution that runs a kernel version similar to the one you plan on compiling. Since I will be compiling Linux 2.6.34.14, I downloaded and installed Debian 6 on VirtualBox with adequate RAM and processor allocations. You could potentially compile on the Vortex86DX itself, but that would likely take forever.
Made sure I had the dependencies: #apt-get install ncurses-dev kernel-package
Download kernel from kernel.org (I grabbed Linux-2.6.34.14.tar.xz). Extract files from package.
Grab the config file from the DMP FTP site: ftp://vxmx:gc301@ftp.dmp.com.tw/Linux/Source/config-2.6.34-vortex86-sg-r1.zip. Please note the vxmx user name. Copy the config file to the freshly extracted Linux source folder.
Grab the patch at ftp://vxdx:gc301@ftp.dmp.com.tw/Driver/Linux/config%26patch/patch-2.6.34-hda.zip. Please note the vxdx user name. Copy it to the kernel source folder.
Patch Kernel: #patch -p1 < patchfilename
configure kernel with #make menuconfig
Load Alternate Configuration File
Enable generic x86 support
Enable Math Emulation
I disabled generic IDE support because I will be using legacy mode (selectable in BIOS)
Under Device Drivers -> Ethernet (10 or 100Mbit) -> Make sure RDC R6040 Fast Ethernet Adapter Support is selected
USB support -> Select Support for Host-side USB, EHCI HCD (USB 2.0) support, OHCI HCD support
Save config as .config
Check serial ports: edit .config manually and make sure CONFIG_SERIAL_8250_NR_UARTS = 4 (or more if you have additional ports) and CONFIG_SERIAL_8250_RUNTIME_UARTS = 4 (or more if you have additional ports). If you are going to use more than 4 serial ports, make sure CONFIG_SERIAL_8250_MANY_PORTS is set.
Compile kernel headers and source: #make-kpkg --initrd kernel_image kernel_source kernel_headers modules_image