GLSL loop array indexing not working on ATI? - glsl

I have a simple loop in GLSL to compute a Bezier curve, and it works flawlessly on NVIDIA hardware.
However, it crashes on ATI cards, even though I am using version 1.20, which, IIRC, introduced non-const array access.
I tried with later versions (1.30 and 1.50), but still no luck.
If I unroll my loop, the code works again.
What am I missing?
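To make the setup concrete, here is a minimal sketch of the kind of loop in question, embedded the usual way as a source string for glShaderSource (the control-point count, uniform names, and coefficients are illustrative assumptions, not the actual shader):

    // Hypothetical shader fragment; the non-constant index cp[i] inside the
    // loop is the construct that GLSL 1.20 allows but that some ATI/AMD
    // compilers of that era reportedly mishandled.
    const char* bezierSrc = R"(
        #version 120
        uniform vec3  cp[4];     // cubic Bezier control points (count assumed)
        uniform float coeff[4];  // precomputed Bernstein coefficients

        vec3 bezier()
        {
            vec3 p = vec3(0.0);
            for (int i = 0; i < 4; ++i)   // non-const index into a uniform array
                p += coeff[i] * cp[i];
            return p;
        }
    )";

Unrolling the loop by hand (p += coeff[0] * cp[0]; and so on) is what the workaround mentioned above amounts to.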

It's a driver bug (probably fixed by now, but I'm not sure).
Relevant reading: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=280190&page=1

Difference between EPECK and CGAL::Simple_cartesian<CGAL::Gmpq>

Hopefully someone can give me some insight into why my program works if I use CGAL::Simple_cartesian<CGAL::Gmpq> as the kernel, but crashes if I use CGAL::Exact_predicates_exact_constructions_kernel (EPECK).
Problem: I let one CGAL::Polygon_2 (diamond-shaped) fall onto another polygon (a square). As soon as the tip of the diamond-shaped polygon touches the square, the program crashes during the call to do_intersect(diamond, square) (probably with a stack overflow) if I use the EPECK kernel. My understanding was that this should always work, since the kernel is exact, and because no construction is involved I thought I should even be able to use CGAL::Exact_predicates_inexact_constructions_kernel.
The call stack seems to start looping at the call BOOST_PP_REPEAT_FROM_TO(2, 9, CGAL_LAZY_REP, _).
Solution: If I replace EPECK with CGAL::Simple_cartesian<CGAL::Gmpq>, it works.
I am willing to use this as the solution, but I want to be certain that it actually is a solution and not something that causes a problem further down the line. Some understanding of why the problem occurs would also be nice, since I thought CGAL should be able to handle this with EPECK even if it is a degenerate case.
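A stripped-down sketch of the setup described above (the coordinates are made up; only the kernel alias changes between the working and the crashing run):

    #include <CGAL/Exact_predicates_exact_constructions_kernel.h>
    #include <CGAL/Simple_cartesian.h>
    #include <CGAL/Gmpq.h>
    #include <CGAL/Polygon_2.h>
    #include <CGAL/Boolean_set_operations_2.h>   // do_intersect for polygons

    // Swap between the two kernels to compare the behaviour described above.
    // using K = CGAL::Exact_predicates_exact_constructions_kernel;  // EPECK: crashes as described
    using K = CGAL::Simple_cartesian<CGAL::Gmpq>;                    // works
    using Polygon = CGAL::Polygon_2<K>;

    int main()
    {
        // Diamond whose bottom tip exactly touches the top edge of the square
        // (a degenerate "touching" configuration; coordinates are assumptions).
        Polygon diamond, square;
        diamond.push_back(K::Point_2( 0, 1));
        diamond.push_back(K::Point_2( 1, 2));
        diamond.push_back(K::Point_2( 0, 3));
        diamond.push_back(K::Point_2(-1, 2));

        square.push_back(K::Point_2(-2, -1));
        square.push_back(K::Point_2( 2, -1));
        square.push_back(K::Point_2( 2,  1));
        square.push_back(K::Point_2(-2,  1));

        return CGAL::do_intersect(diamond, square) ? 0 : 1;
    }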
Additional Info:
I have built it on 3 computers, with 2 MSVC compiler versions and 2 CGAL versions, all with comparable results.
MSVC: 14.10 & 14.12
CGAL: 4.10 & 4.12
gmp: 5.01
boost: 1.67.0
windows sdk: 10.0.17134 & 10.0.14393.0
Windows 10 64 bit

Mac vs Windows: Eigen::Vector3f(0,0,0).normalized()

Why am I seeing different results for this simple 3d vector operation using Eigen on Mac and Windows?
I wrote some simulation code on my MacBook Pro (macOS 10.12.6) and tested it extensively. As soon as my colleague tried using it on Windows, he had problems. He gave me a specific failing case. It worked for me. As we dug in, it came down to an attempt to normalize a 3d zero vector, so an attempt to divide by zero. He got (nan, nan, nan) while I got (0, 0, 0). In the context where it happened, the zero result was a soft/harmless fail, which is why I had not noticed it in my testing.
Clearly the vector-of-nans is the right answer. I tried it in an Ubuntu build running under Vagrant and got (-nan, -nan, -nan).
Does anyone know why I get (0, 0, 0) on macOS? I think by default Xcode is using LLVM. The Ubuntu build used clang.
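For anyone wanting to check their own setup, a minimal reproduction is only a few lines (this is a sketch, not code from the original project):

    #include <iostream>
    #include <Eigen/Dense>

    // Print what normalized() returns for an exactly-zero vector on the
    // current platform / Eigen version / compiler-flag combination.
    int main()
    {
        Eigen::Vector3f v = Eigen::Vector3f::Zero();
        Eigen::Vector3f n = v.normalized();
        std::cout << n.transpose() << "\n";                      // (0 0 0) or (nan nan nan)
        std::cout << std::boolalpha << n.hasNaN() << std::endl;  // did we get NaNs?
        return 0;
    }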
My suspicion is that you have a newer Eigen version on macOS. The behavior of normalize() was changed some time ago:
https://bitbucket.org/eigen/eigen/commits/12f866a746
There was a discussion about the expected behavior here: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=977
Check your compiler flags. You probably have fast math enabled (-ffast-math in gcc). This enables -ffinite-math-only (again, gcc) which, and I quote:
Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.
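Whichever of the two explanations applies (Eigen version or fast-math flags), a defensive guard makes the code independent of both. A minimal sketch (the epsilon and the zero fallback are assumptions about what the simulation wants):

    #include <Eigen/Dense>

    // Normalize v, but return a well-defined fallback for (near-)zero vectors
    // instead of relying on what normalized() does with 0/0 on a given
    // platform or under -ffast-math.
    inline Eigen::Vector3f safeNormalized(const Eigen::Vector3f& v,
                                          float eps = 1e-12f)
    {
        const float n = v.norm();
        if (n > eps)
            return v / n;
        return Eigen::Vector3f::Zero();   // explicit choice for the degenerate case
    }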

No performance gain with OpenCV 3.2 on OpenCL (TAPI)

Absolute TAPI beginner here. I recently ported my CV code to use UMat instead of Mat, since my CPU was at its limit; especially the morphological operations seemed to consume quite a bit of computing power.
Now, with UMat, I cannot see any change in my framerate: it is exactly the same whether I use UMat or not, and Process Explorer reports no GPU usage whatsoever. I did a small test with a few calls of dilation and closing on a full HD image -- no effect.
Am I missing something here? I'm using the latest OpenCV 3.2 build for Windows and a GTX 980 with driver 378.49. cv::ocl::haveOpenCL() and cv::ocl::useOpenCL() both return true and cv::ocl::Context::getDefault().device( 0 ) also gives me the correct device, everything looks good as far as I can tell. Also, I'm using some custom CL code via cv::ocl::Kernel which is definitely invoked.
I realize it is naive to think that just changing Mat to UMat will result in a huge performance gain (although every one of the very limited number of resources covering TAPI that I can find online suggests exactly that). Still, I was hoping to get some gain for starters and then optimize step by step from there. However, the fact that I can't detect any GPU usage whatsoever highly irritates me.
Is there something I have to watch out for? Maybe my usage of TAPI prevents streamlined execution of the OpenCL code, for example through accidental/hidden readbacks I'm not aware of? Do you see any way of profiling the code in that respect?
Are there any how-to's, best practices or common pitfalls for using TAPI? Things like "don't use local UMat instances within functions", "use getUMat() instead of copyTo()", "avoid calls of function x since it will cause a cv::ocl::flush()", things of that sort?
Are there OpenCV operations that are not ported to OpenCL yet? Is there any documentation on that? In the OpenCV source code I saw that, if built with the HAVE_OPENCL flag, the functions try to run CL code via the CV_OCL_RUN macro, but a few conditions are checked beforehand, otherwise it falls back to the CPU. It does not seem like I have any possibility to figure out whether the GPU or the CPU was actually used, apart from stepping into each and every OpenCL function with the debugger. Am I right?
Any ideas/experiences apart from that? I'd appreciate any input that relates to this matter.
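One crude way to answer the "is the GPU actually used" part is to toggle OpenCL globally and time the same operation both ways. This is only a sketch (the image and kernel sizes are arbitrary, not taken from the project above):

    #include <cstdint>
    #include <iostream>
    #include <opencv2/opencv.hpp>
    #include <opencv2/core/ocl.hpp>

    // Time repeated dilations on a UMat, draining the OpenCL queue before and
    // after so the measurement includes the actual GPU work.
    static double timeDilate(const cv::UMat& src, const cv::Mat& kernel)
    {
        cv::UMat dst;
        cv::dilate(src, dst, kernel);   // warm-up (kernel compile, upload)
        cv::ocl::finish();
        const int64_t t0 = cv::getTickCount();
        for (int i = 0; i < 100; ++i)
            cv::dilate(src, dst, kernel);
        cv::ocl::finish();              // wait for pending GPU work
        return (cv::getTickCount() - t0) / cv::getTickFrequency();
    }

    int main()
    {
        cv::Mat img(1080, 1920, CV_8UC1);
        cv::randu(img, 0, 255);
        cv::UMat uimg = img.getUMat(cv::ACCESS_READ);
        cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));

        cv::ocl::setUseOpenCL(false);
        std::cout << "CPU:    " << timeDilate(uimg, k) << " s\n";
        cv::ocl::setUseOpenCL(true);
        std::cout << "OpenCL: " << timeDilate(uimg, k) << " s\n";
        return 0;
    }

If the two timings are identical, the OpenCL path is most likely not being taken for that operation at all.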

GLSL compiler silently ignores missing parentheses after EmitVertex / EndPrimitive

I've recently bumped into a rendering issue caused by my inadvertently omitting the pair of parentheses right after EmitVertex and EndPrimitive in an OpenGL geometry shader. To my surprise, the GLSL compiler didn't throw any compile error and quietly let it pass. The end result is a blank screen, since no vertex is emitted by the geometry shader.
I'm wondering whether this is a bug in the compiler or whether there is some other reason for it.
BTW, I have tested it on an NVIDIA Titan X with Win7 and a GTX 750M with Win8. Both show the same problem.
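For reference, the typo boils down to something like this (a cut-down geometry shader, not the original one):

    // Geometry shader source as a C string, as it would be fed to glShaderSource.
    const char* gsSrc = R"(
        #version 330 core
        layout(points) in;
        layout(points, max_vertices = 1) out;

        void main()
        {
            gl_Position = gl_in[0].gl_Position;
            EmitVertex;     // accepted without error on the drivers above, but emits nothing
            EndPrimitive;   // same problem here
            // Correct form:
            // EmitVertex();
            // EndPrimitive();
        }
    )";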
That is definitely a bug if it works the way you described (e.g. EmitVertex; compiling without error).
You can always validate your shaders using Khronos' reference compiler if you are ever in doubt. That sort of thing is already common in D3D-based software, where shaders are pre-compiled when the software is built; GL does not support hardware-independent pre-compiled shaders, so you would only use this as a validation step rather than part of the compile process.
Even though it will not save runtime, it would not be a bad idea to work that into your software's build procedure so you do not have to wait until your software is actually deployed to catch simple parse errors like this. Otherwise, you will often only learn about these things after a driver version change or the software runs on a GPU it was never tested on before.

OpenGL - Samplers in opengl 3.1?

I'm using samplers quite frequently in my application and everything has been working fine.
The problem is, I can only use OpenGL 3.1 on my laptop. According to the documentation, samplers are only available in OpenGL 3.3 or higher, but here's where I'm getting a bit confused.
I can use 'glGenSamplers' just fine: no errors are generated and the sampler ID seems fine as well. But when using 'glBindSampler' on a valid texture, I get a 'GL_INVALID_VALUE' error.
Can anyone clear this up for me? If samplers aren't available in OpenGL 3.1, why can I use glGenSamplers without a problem?
What can I do to provide backwards compatibility? I'm guessing my only option, if samplers aren't available, is to set the texture parameters every time the texture is used for rendering?
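For concreteness, the fallback guessed at above would look roughly like this each time a texture is set up or bound (the filter/wrap values are only examples):

    #include <GL/glew.h>   // or whichever loader/header set the project already uses

    // Without sampler objects, re-apply the sampling state on the texture
    // object itself whenever it is (re)configured for rendering.
    void applyTextureState(GLuint tex)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    }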
There are two possibilities:
Your graphics card/driver supports ARB_sampler_objects; in that case it is unsurprising that the function is present. Feel free to use it.
The function is present anyway. In this case, strange as it sounds, you are not allowed to use it.
Check whether glGetStringi(GL_EXTENSIONS, ...) returns the sampler objects extension at some index (see the sketch at the end of this answer). Only functionality from extensions that the implementation advertises as "supported" is allowed to be used.
If you find some functions despite no support, they might work anyway, but they might as well not. It's undefined.
Note that although you would normally expect the function to be named glGenSamplersARB when it comes from an ARB extension, that is not the case here, since this is a "backwards extension" that provides selected functionality, identical to what is present in a later GL version, on hardware that isn't able to provide the full functionality of that later version.
(About the error code, note the comment by Brett Hale: glBindSampler expects a texture unit index as its first argument, not a texture object name, and an out-of-range value generates GL_INVALID_VALUE.)
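A sketch of the extension check mentioned above (loader/header setup is assumed; only the check itself matters here):

    #include <cstring>
    #include <GL/glew.h>   // or whichever loader/header set the project already uses

    // Scan the advertised extension list for ARB_sampler_objects before
    // touching glGenSamplers/glBindSampler at all.
    bool hasSamplerObjects()
    {
        GLint count = 0;
        glGetIntegerv(GL_NUM_EXTENSIONS, &count);
        for (GLint i = 0; i < count; ++i)
        {
            const char* ext =
                reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i));
            if (ext && std::strcmp(ext, "GL_ARB_sampler_objects") == 0)
                return true;
        }
        return false;
    }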