Segfault generated by glEnableVertexArrayAttrib - opengl

I'm writing a simple OpenGL program using go-gl. While the program runs fine on most machines, it fails with a segfault when run under Windows on my laptop (oddly enough, it works under Linux). The culprit is my call to glEnableVertexArrayAttrib. I've attached the stack trace and the relevant code below.
Partial stack trace:
Exception 0xc0000005 0x8 0x0 0x0
PC=0x0
signal arrived during external code execution
github.com/go-gl/gl/v3.3-core/gl._Cfunc_glowEnableVertexArrayAttrib(0x0, 0x1)
github.com/go-gl/gl/v3.3-core/gl/_obj/_cgo_gotypes.go:4141 +0x41
github.com/go-gl/gl/v3.3-core/gl.EnableVertexArrayAttrib(0x1)
C:/Users/mpron/go/src/github.com/go-gl/gl/v3.3-core/gl/package.go:5874 +0x3a
github.com/caseif/cubic-go/graphics.prepareVbo(0x1, 0xc0820086e0, 0xc0820a7e70)
C:/Users/mpron/go/src/github.com/caseif/cubic-go/graphics/block_renderer.go:145 +0x108
Relevant code:
gl.GenVertexArrays(1, &vaoHandle)
gl.BindVertexArray(vaoHandle)
gl.BindBuffer(gl.ARRAY_BUFFER, handle)
gl.BufferData(gl.ARRAY_BUFFER, len(*vbo) * 4, gl.Ptr(*vbo), gl.STATIC_DRAW)
gl.EnableVertexArrayAttrib(vaoHandle, positionAttrIndex) // line 145
gl.VertexAttribPointer(positionAttrIndex, 3, gl.FLOAT, false, 12, nil)

I had made a subtle mistake: I was calling glEnableVertexArrayAttrib, which is only available since OpenGL 4.5, instead of glEnableVertexAttribArray, which has been available since OpenGL 2.0. The former is the direct-state-access variant that toggles attribute arrays on an explicitly named VAO, which isn't needed at all in this context; on a driver that doesn't expose it, the loader leaves a null function pointer behind, which is why the crash lands at PC=0x0. The fix is simply to call gl.EnableVertexAttribArray(positionAttrIndex) while the VAO is bound.
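To make the distinction concrete, here is the difference in plain C OpenGL terms (vao and positionAttrIndex stand in for the handles used in the code above):
// OpenGL 2.0+ path: enables the attribute array on the VAO that is currently bound.
glBindVertexArray(vao);
glEnableVertexAttribArray(positionAttrIndex);
// OpenGL 4.5 / ARB_direct_state_access path: names the VAO explicitly, no bind needed.
// On a context/driver without GL 4.5 this entry point doesn't exist, so the loader
// leaves a null pointer behind and the call jumps to address 0.
glEnableVertexArrayAttrib(vao, positionAttrIndex);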

Related

Vulkan --- vkAcquireNextImageKHR throws std::out_of_range when certain queue families are used

TL;DR
vkAcquireNextImageKHR throws std::out_of_range when certain queue families are used. Is this expected behavior? How to debug?
Detailed description
The Vulkan program I use is based on vulkan-tutorial.com. I discovered that my VkPhysicalDevice has three queue families, each flagged with VK_QUEUE_GRAPHICS_BIT and present support:
uint32_t queueFamilyCount;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);
std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

std::vector<uint32_t> graphicsQueueFamilyIndices;
std::vector<uint32_t> presentQueueFamilyIndices;

int i = 0;
for (const auto& queueFamily : queueFamilies)
{
    if (queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT)
    {
        graphicsQueueFamilyIndices.push_back(i);
    }

    VkBool32 presentSupport = false;
    vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentSupport);
    if (presentSupport)
    {
        presentQueueFamilyIndices.push_back(i);
    }
    ++i;
}
// graphicsQueueFamilyIndices = {0, 1, 2}
// presentQueueFamilyIndices = {0, 1, 2}
These are later used when creating the logical device, the swapchain (all of the queue families have present capability), and the command pool. The program then calls
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, semaphore, VK_NULL_HANDLE, &imageIndex);
Using any combination of graphics and present queue family indices other than 0, i.e. (1, 1), (1, 2), (2, 1) or (2, 2), causes this API call to throw an uncaught std::out_of_range. The lldb output is as follows:
2019-12-01 11:36:35.599882+0100 main[22130:167876] flock failed to lock maps file: errno = 35
2019-12-01 11:36:35.600165+0100 main[22130:167876] flock failed to lock maps file: errno = 35
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: Index out of range
Process 22130 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff675c949a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff675c949a <+10>: jae 0x7fff675c94a4 ; <+20>
0x7fff675c949c <+12>: movq %rax, %rdi
0x7fff675c949f <+15>: jmp 0x7fff675c33b7 ; cerror_nocancel
0x7fff675c94a4 <+20>: retq
Target 0: (main) stopped.
The same error occurs when using an index that doesn't refer to any queue family at all, like 123. I'm using the VK_LAYER_KHRONOS_validation layer, which doesn't raise any complaint.
Questions
(1) Is this the expected behavior when passing a wrong queue family index to Vulkan?
(2) Are there validation layers that are capable of catching this error and making it more verbose?
(3) Why do these choices of queue families cause this error?
Details
Using queue family indices (1, 1) for graphics and present queue families during logical device creation while using index 0 for everything else already causes vkAcquireNextImage to raise the error. Of course, VK_LAYER_KHRONOS_validation raises the following warning upon command pool creation:
Validation layer: vkCreateCommandPool: pCreateInfo->queueFamilyIndex (= 0) is not one of the queue families given via VkDeviceQueueCreateInfo structures when the device was created. The Vulkan spec states: pCreateInfo::queueFamilyIndex must be the index of a queue family available in the logical device device. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCreateCommandPool-queueFamilyIndex-01937)
I'm using MoltenVK (from the Vulkan SDK, version 1.1.126.0) on macOS Catalina 10.15.1.
Workarounds
Using version 1.1.121.1 of the SDK prevents the throw from occurring.
Requesting a device queue from queue family 0 alongside any other device queues one might require also prevents the throw from occurring.
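For illustration, here is a minimal sketch of that second workaround during logical device creation (createDeviceWithFamilyZero, graphicsFamily and presentFamily are hypothetical names; error handling omitted):
#include <vulkan/vulkan.h>
#include <set>
#include <vector>

// Request a queue from family 0 in addition to whatever families were actually selected.
VkDevice createDeviceWithFamilyZero(VkPhysicalDevice physicalDevice,
                                    uint32_t graphicsFamily, uint32_t presentFamily)
{
    std::set<uint32_t> families = { 0u, graphicsFamily, presentFamily };

    float priority = 1.0f;
    std::vector<VkDeviceQueueCreateInfo> queueInfos;
    for (uint32_t family : families)
    {
        VkDeviceQueueCreateInfo queueInfo{};
        queueInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
        queueInfo.queueFamilyIndex = family;
        queueInfo.queueCount = 1;
        queueInfo.pQueuePriorities = &priority;
        queueInfos.push_back(queueInfo);
    }

    const char* extensions[] = { VK_KHR_SWAPCHAIN_EXTENSION_NAME };
    VkDeviceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    createInfo.queueCreateInfoCount = static_cast<uint32_t>(queueInfos.size());
    createInfo.pQueueCreateInfos = queueInfos.data();
    createInfo.enabledExtensionCount = 1;
    createInfo.ppEnabledExtensionNames = extensions;

    VkDevice device = VK_NULL_HANDLE;
    vkCreateDevice(physicalDevice, &createInfo, nullptr, &device);
    return device;
}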
Issue on GitHub
This has now been raised as an issue on GitHub [here].
That seems to be a bug in MoltenVK. Inspection of the MoltenVK source indicates that it always implicitly uses queue 0 of queue family 0 for vkAcquireNextImage. The fact that you have no problems if you create that queue explicitly, or if you just use a fence, tells me MoltenVK probably forgets to initialize that implicit queue properly for itself.
The GitHub Issue is filed at KhronosGroup/MoltenVK#791.

Memory error when calling gl.GenVertexArrays

I've been using Go's go-gl package for quite a while now. Everything was working 100% until I did some refactoring, and now I'm getting the strangest error:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime stack:
runtime.throw(0x65d0fe, 0x2a)
/usr/lib/go/src/runtime/panic.go:596 +0x95
runtime.sigpanic()
/usr/lib/go/src/runtime/signal_unix.go:274 +0x2db
runtime.asmcgocall(0x8, 0x97ed40)
/usr/lib/go/src/runtime/asm_amd64.s:633 +0x70
goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x5b8ad0, 0xc420049c00, 0xc4200001a0)
/usr/lib/go/src/runtime/cgocall.go:131 +0xe2 fp=0xc420049bc0 sp=0xc420049b80
github.com/go-gl/gl/v4.5-core/gl._Cfunc_glowGenVertexArrays(0x0, 0xc400000001, 0xc42006c7d8)
github.com/go-gl/gl/v4.5-core/gl/_obj/_cgo_gotypes.go:4805 +0x45 fp=0xc420049c00 sp=0xc420049bc0
github.com/go-gl/gl/v4.5-core/gl.GenVertexArrays(0x1, 0xc42006c7d8)
...
runtime.main()
/usr/lib/go/src/runtime/proc.go:185 +0x20a fp=0xc420049fe0 sp=0xc420049f88
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420049fe8 sp=0xc420049fe0
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1
exit status 2
shell returned 1
I was wondering if anyone has a solution. I've updated my drivers, and an empty OpenGL scene works 100% as long as it doesn't generate any vertex arrays.
Here is my go env:
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/<user>/Projects/<project>"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build983667275=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
The function making the call:
var vertexArrayID uint32
// ERROR ON LINE BELOW.
gl.GenVertexArrays(1, &vertexArrayID)
gl.BindVertexArray(vertexArrayID)
// Vertex buffer
var vertexBuffer uint32
gl.GenBuffers(1, &vertexBuffer)
gl.BindBuffer(gl.ARRAY_BUFFER, vertexBuffer)
gl.BufferData(gl.ARRAY_BUFFER, len(verticies)*4, gl.Ptr(verticies), gl.STATIC_DRAW)
Thank you
It turns out the OpenGL context was being created after this function call instead of before it. go-gl only loads the OpenGL function pointers when gl.Init() is called with a context current, so calling gl.GenVertexArrays before that jumps through a null pointer (hence pc=0x0 in the trace). Very strange that the empty scene still worked and it only crashed when trying to generate vertex arrays.
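For reference, the ordering that matters is sketched below in C++ with GLFW and the glad loader as stand-ins (an assumption for illustration only; the go-gl and go-glfw bindings follow the same pattern, with gl.Init() playing the loader's role):
#include <glad/glad.h>    // any GL function loader works; glad is just an example
#include <GLFW/glfw3.h>

int main()
{
    glfwInit();
    GLFWwindow* window = glfwCreateWindow(800, 600, "demo", nullptr, nullptr);
    glfwMakeContextCurrent(window);                        // 1. make a context current
    gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);    // 2. then load the GL entry points

    GLuint vao = 0;
    glGenVertexArrays(1, &vao);                            // 3. only now is this call safe

    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}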

Wait for kernel to finish OpenCL

My OpenCL program doesn't always finish before further host (C++) code is executed. The OpenCL code is only executed up to a certain point (which appears to be random). The code is shortened a bit, so there may be a few things missing.
cl::Program::Sources sources;
string code = ResourceLoader::loadFile(filename);
sources.push_back({ code.c_str(),code.length() });
program = cl::Program(OpenCL::context, sources);
if (program.build({ OpenCL::default_device }) != CL_SUCCESS)
{
    exit(-1);
}
queue = CommandQueue(OpenCL::context, OpenCL::default_device);
kernel = Kernel(program, "main");
Buffer b(OpenCL::context, CL_MEM_READ_WRITE, size);
queue.enqueueWriteBuffer(b, CL_TRUE, 0, size, arg);
buffers.push_back(b);
kernel.setArg(0, this->buffers[0]);
vector<Event> wait{ Event() };
Version 1:
queue.enqueueNDRangeKernel(kernel, NDRange(), range, NullRange, NULL, &wait[0]);
Version 2:
queue.enqueueNDRangeKernel(kernel, NDRange(), range, NullRange, &wait, NULL);
Both versions are followed by:
wait[0].wait();
queue.finish();
Version 1 just does not wait for the OpenCL program. Version 2 crashes the program (at queue.enqueueNDRangeKernel):
Exception thrown at 0x51D99D09 (nvopencl.dll) in foo.exe: 0xC0000005: Access violation reading location 0x0000002C.
How would one make the host wait for the GPU to finish here?
EDIT: queue.enqueueNDRangeKernel returns -1000, while it returns 0 for a rather small kernel.
Version 1 says to signal wait[0] when the kernel is finished - which is the right thing to do.
Version 2 is asking your clEnqueueNDRangeKernel() to wait for the events in wait before it starts that kernel [which clearly won't work].
On its own, queue.finish() [or clFinish()] should be enough to ensure that your kernel has completed.
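Put together, a minimal corrected sketch (using the C++ wrapper objects already set up in the question; kernelDone is just an illustrative name) would be:
cl::Event kernelDone;
// No wait list; ask the enqueue to signal kernelDone when the kernel finishes.
cl_int err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, range, cl::NullRange,
                                        nullptr, &kernelDone);
if (err != CL_SUCCESS) { /* handle the error */ }

kernelDone.wait();   // blocks the host until the kernel has completed
// or, equivalently for this purpose:
queue.finish();      // blocks until everything enqueued so far has completed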
Since you haven't called clCreateUserEvent, and you haven't passed the event into anything else that initializes it, the second variant doesn't work.
It is rather bad that it crashes [it should return "invalid event" or some such - but presumably the driver you are using doesn't have a way to check that the event hasn't been initialized]. I'm reasonably sure the driver I work with will issue an error for this case - but I try to avoid getting it wrong...
I have no idea where -1000 comes from - it is not one of the standard OpenCL error codes, nor a reasonable return value from the CL C++ wrappers. Whether the kernel is small or large [and/or completes in a short or long time] shouldn't affect the return value from the enqueue, since all that SHOULD do is enqueue the work [with no guarantee that it even starts until a queue.flush() or clFlush() is performed]. Waiting for it to finish should happen elsewhere.
I do most of my work via the raw OpenCL API rather than the C++ wrappers, which is why I'm referring to what the raw API does.
I faced a similar problem with OpenCL, where some packets of a data stream were not processed by OpenCL.
I realized it only happens while the notebook is plugged into a docking station.
Maybe this helps someone.
(There were no clFlush or clFinish calls.)

OpenCL / OpenGL Implicit Synchronization on AMD Tahiti

I'm having a problem with the "implicit synchronization" of OpenCL and OpenGL on an AMD Tahiti (AMD Radeon HD 7900 Series) device. The device has the cl/gl extensions, cl_khr_gl_sharing, and cl_khr_gl_event.
When I run the program, which is just a simple VBO-update kernel drawn as a white line with a simple shader, it hiccups like crazy, stalling what looks to be 2-4 frames on every update. I can confirm that it isn't the CL kernel or GL shader that I'm using to update and draw the buffer, because if I put glFinish and commandQueue.finish() before and after the acquire and release of the GL objects for the CL update, everything works as it should.
So, I figured that I needed to "enable" the event extension...
#pragma OPENCL EXTENSION cl_khr_gl_event : enable
...in the CL program, but that throws errors. I assume this isn't a client-facing extension and is supposed to just work as "expected", which is why I can't enable it.
The third behavior that I noticed: if I take out the glFinish() and commandQueue.finish() and run it under CodeXL debug, the implicit synchronization works. That is, without any changes to the code base, like forcing synchronization with finish, CodeXL allows for implicit synchronization. So implicit sync clearly works, but I can't get it to work by just running the application regularly through Visual Studio, short of forcing synchronization.
Clearly I'm missing something, but I honestly can't see it. Any thoughts or explanations would be greatly appreciated, as I'd love to keep the synchronization implicit.
I'm guessing you're not using the GLsync-cl_event synchro (GL_ARB_cl_event and cl_khr_gl_event extensions), which is why adding cl/glFinish and the overhead from CodeXL are helping.
My guess is your code looks like:
A1. clEnqueueNDRangeKernel
A2. clEnqueueReleaseObjects
[here is where you inserted clFinish]
B1. glDraw*
B2. wgl/glXSwapBuffers
[here is where you inserted glFinish]
C1. clEnqueueAcquireObjects
[repeat from A1]
Instead, you should:
CL->GL synchro: have clEnqueueReleaseObjects create an (output) event to be passed to glCreateSyncFromCLeventARB, then use glWaitSync (NOT glClientWaitSync - which in this case would be the same as clFinish).
GL->CL synchro: have clEnqueueAcquireObjects take an (input) event, which will be created with clCreateFromGLsync, taking a sync object from glFenceSync
Overall, it should be:
A1. `clEnqueueNDRangeKernel`
[Option 1.1:]
A2. `clEnqueueReleaseObjects`( ..., 0, NULL, &eve1)
[Option 1.2:]
A2. `clEnqueueReleaseObjects`( ..., 0, NULL, NULL)
A2'. `clEnqueueMarker`(&eve1)
A3. sync1 = glCreateSyncFromCLeventARB(eve1)
* clReleaseEvent(eve1)
A4. glWaitSync(sync1)
* glDeleteSync(sync1)
B1. glDraw*
B2. wgl/glXSwapBuffers
B3. sync2 = glFenceSync
B4. eve2 = clCreateFromGLSync(sync2)
* glDeleteSync(sync2)
[Option 2.1:]
C1. clEnqueueAcquireObjects(, ..., 1, &eve2, NULL)
* clReleaseEvent(eve2)
[Option 2.2:]
B5. clEnqueueWaitForEvents(1, &eve2)
* clReleaseEvent(eve2)
C1. clEnqueueAcquireObjects(, ..., 0, NULL, NULL)
[Repeat from A1]
(Options 1.2 / 2.2 are better if you don't exactly know in advance what will be the last enqueue before handing control over to the other API)
As a side note, I assumed you're not using an out-of-order queue for OpenCL (there really shouldn't be a need for one in this case) - if you did, you of course have to also synchro clEnqueueAcquire -> clEnqueueNDRange -> clEnqueueRelease.
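For concreteness, here is a rough C++ sketch of this handshake for a single shared VBO (Option 1.1 and Option 2.1), with the exact extension entry points spelled out: glCreateSyncFromCLeventARB from GL_ARB_cl_event and clCreateEventFromGLsyncKHR from cl_khr_gl_event. It assumes those entry points have already been loaded (clCreateEventFromGLsyncKHR normally has to be fetched via clGetExtensionFunctionAddressForPlatform), and error checking is omitted; treat it as an illustration of the call sequence rather than drop-in code:
#include <GL/glew.h>   // assumed loader exposing glCreateSyncFromCLeventARB (GL_ARB_cl_event)
#include <CL/cl.h>
#include <CL/cl_gl.h>

// Hypothetical per-frame handoff for one shared VBO.
void syncClGlFrame(cl_command_queue clQueue, cl_context clContext, cl_mem sharedVbo)
{
    // A2 (Option 1.1): release the shared buffer and ask for a completion event.
    cl_event releaseDone = nullptr;
    clEnqueueReleaseGLObjects(clQueue, 1, &sharedVbo, 0, nullptr, &releaseDone);
    clFlush(clQueue);                                   // ensure the CL work is submitted

    // A3/A4: turn the CL event into a GL sync and wait on the GPU, not the CPU.
    GLsync clDone = glCreateSyncFromCLeventARB(clContext, releaseDone, 0);
    clReleaseEvent(releaseDone);
    glWaitSync(clDone, 0, GL_TIMEOUT_IGNORED);          // NOT glClientWaitSync
    glDeleteSync(clDone);

    // B1/B2: draw with the VBO and swap buffers here.

    // B3/B4: fence the GL commands and turn the fence into a CL event.
    GLsync glDone = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();                                          // the fence must reach the GL server
    cl_int err = CL_SUCCESS;
    cl_event glDoneEvent = clCreateEventFromGLsyncKHR(clContext, glDone, &err);
    glDeleteSync(glDone);

    // C1 (Option 2.1): the next acquire waits on the GL fence.
    clEnqueueAcquireGLObjects(clQueue, 1, &sharedVbo, 1, &glDoneEvent, nullptr);
    clReleaseEvent(glDoneEvent);
}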

Sigsegv when recompiled for arm architecture

I am trying to figure out where I made a mistake in my C++ POCO code. When running it on Ubuntu 14, the program runs correctly, but when recompiled for ARM via gnueabi it just crashes with SIGSEGV.
This is the trace output at the point where it crashes:
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.2.101")}, 16) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x6502a8c4} ---
+++ killed by SIGSEGV +++
And this is the code where it crashes (it should connect to the TCP server):
this->address = SocketAddress(this->host, (uint16_t)this->port);
this->socket = StreamSocket(this->address); // !HERE
Note that I am catching all exceptions (like ECONNREFUSED), and the program fails cleanly when it can't connect. It's only when it does connect to the server that it crashes.
When I try to run it under Valgrind, it aborts with an error; I have no idea what a shadow memory range is:
==4929== Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
The full log is here: http://pastebin.com/Ky4RynQc
Thank you
I don't know why, but this compiled badly on Ubuntu; when compiled on Fedora (same script, same build settings, same GNU toolchain), it works.
Thank you guys for your comments.