Quick and dirty profiling for OpenGL compute shaders? - opengl

I have a GPU implementation of Marching Cubes for procedurally generated isosurfaces as part of a BSc project, implemented in OpenFrameworks using OpenGL compute shaders. It's up and running, debugged and working at decent frame rates, but I'd like to see if I can optimize it further and eliminate any potential bottlenecks (in particular, there are a lot of atomics in there and I'd like to find out if these are slowing it down much).
The algorithm uses 6 chained compute shaders with memory barriers in between:
get samples
get surface cubes (atomic counter)
get unique vertices (atomic counter)
get polygon indices (atomic counter)
get final vertex positions
get surface normals (lots of atomic adds)
Basically what I'm trying to find out is how long the algorithm spends in each separate shader, but the async nature of OpenGL makes this difficult. I installed GDebugger, which comes highly recommended and was used to profile this implementation (which uses OpenCL for compute and OpenGL for rendering). Unfortunately it breaks my program. :-( The standalone exe works fine under Win7, but trying to run it inside GDebugger throws this for each shader:
[ error ] ofShader: checkProgramLinkStatus(): program failed to link
[ error ] ofShader: ofShader: program reports:
Compute info
------------
(0) : error C7006: no work group size specified
So I need an alternative way of profiling. I've considered doing something like this:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
// mapping the buffer should block until the compute shader has finished writing it
void* ptr = glMapNamedBuffer(someBufferIJustUsed, GL_READ_ONLY);
cout << ofGetElapsedTimef() << endl;
glUnmapNamedBuffer(someBufferIJustUsed);
Hopefully this will block until the shader is done writing to the buffer, and give me at least a ballpark figure for completion time. But if anyone can point out a serious flaw in this approach or knows of a better way, please let me know.
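To make the idea concrete, here is roughly what I have in mind for one stage (a sketch only - the shader object, buffer name and dispatch sizes are placeholders for whatever my real code uses):
// time one stage, e.g. "get surface cubes"
surfaceCubesShader.begin();
glDispatchCompute(groupsX, groupsY, groupsZ);
surfaceCubesShader.end();
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
// mapping should stall until the compute writes to this buffer have completed
void* ptr = glMapNamedBuffer(surfaceCubesBuffer, GL_READ_ONLY);
cout << "surface cubes done at " << ofGetElapsedTimef() << endl;
glUnmapNamedBuffer(surfaceCubesBuffer);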

Related

GLSL time for function in shader

I would like to know if there is a way to calculate the time (or the number of operations) that a function takes in a GLSL program.
Being quite new to GLSL and the way the GPU works, it's hard and time-consuming for me to optimize a GLSL shader, and my multipass rendering is very laggy.
So my goal would be to focus more on the slower functions.
Is there anything that could help me?
I'm working in VS2015, and sadly, my GPU doesn't allow NSight to work.
Shaders run parallelized on the GPU. You can't find the number of operations per shader, because you don't really know how many "GPU cores" are running or how the GPU compiler optimized the shaders.
You can measure the time elapsed for a draw command. See, for example,
here, here and here
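For example, a minimal sketch with an OpenGL timer query might look like this (GL_TIME_ELAPSED is core since OpenGL 3.3; this assumes a current context, loaded query functions and <iostream>):
GLuint query;
glGenQueries(1, &query);
glBeginQuery(GL_TIME_ELAPSED, query);
// ... issue the draw call(s) or compute dispatch you want to time here ...
glEndQuery(GL_TIME_ELAPSED);
// retrieving the result waits for the timed commands to finish on the GPU
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
std::cout << "GPU time: " << elapsedNs / 1.0e6 << " ms" << std::endl;
glDeleteQueries(1, &query);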

OpenGL constructing and using data on the GPU

I am not a graphics programmer, I use C++ and C mainly, and every time I try to go into OpenGL, every book and every resource starts like this:
GLfloat Vertices[] = {
some, numbers, here,
some, more, numbers,
numbers, numbers, numbers
};
Or they may even be vec4.
But then you do something like this:
for(int i = 0; i < 10000; i++)
for(int j = 0; j < 10000; j++)
make_vertex();
And you get a problem. That loop is going to take a significant amount of time to finish - and if the make_vertex() function is anything like a saxpy or something of the sort, it is not just a problem... it is a big problem. For example, let us assume I wish to create fractal terrain. For any modern graphics card this would be trivial.
I understand the paradigm goes like this: write the vertices manually -> send them over to the GPU -> the GPU does vertex processing, geometry, rasterization, all the good stuff. I am sure it all makes sense. But why do I have to do the entire 'send it over' step? Is there no way to skip that entire intermediary step and just create vertices on the GPU, and draw them, without the obvious bottleneck?
I would very much appreciate at least a point in the right direction.
I also wonder if there is a possible solution without delving into compute shaders or CUDA. Does OpenGL or GLSL not provide a suitable random function which can be executed in parallel?
I think what you're asking for could work by generating height maps with a compute shader, and mapping that onto a grid with fixed spacing which can be generated trivially. That's a possible solution off the top of my head. You can use GL Compute shaders, OpenCL, or CUDA. Details can be generated with geometry and tessellation shaders.
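As a rough illustration of the height-map idea, here is a hypothetical GLSL compute shader (embedded as a C++ string) that fills a height texture; the hash function is only a stand-in for a real noise/fractal terrain function:
const char* heightMapCS = R"(
#version 430
layout(local_size_x = 16, local_size_y = 16) in;
layout(r32f, binding = 0) uniform writeonly image2D heightMap;

// cheap hash-based pseudo-random value, stand-in for proper noise
float hash(vec2 p) {
    return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453);
}

void main() {
    ivec2 texel = ivec2(gl_GlobalInvocationID.xy);
    float h = hash(vec2(texel) * 0.01);
    imageStore(heightMap, texel, vec4(h, 0.0, 0.0, 0.0));
}
)";
The vertex shader for the fixed-spacing grid would then sample this texture and displace each grid vertex by the stored height.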
As for preventing the camera from clipping, you'd probably have to use transform feedback and do a check per frame to see if the direction you're moving in will intersect the geometry.
Your entire question seems to be built on a huge misconception, that vertices are the only things which need to be "crunched" by the GPU.
First, you should understand that GPUs are far superior to CPUs when it comes to parallelism (heck, GPUs sacrifice conditional control jumping for the sake of parallelism). Second, shaders and these buffers you make are all stored on the GPU after being uploaded by the CPU. The reason you don't just create all vertices on the GPU? It's the same reason you load an image from the hard drive instead of creating a raw 2D array and filling it up with your pixel data inline. Even then, your image would be stored in the executable program file, which is stored on the hard disk and only loaded to memory when you run it. In an actual application, you'll want to load your graphics off assets stored somewhere (usually the hard drive).
Why not let the GPU load the assets from the hard drive by itself? The GPU isn't connected to the hardware's storage directly, only to the system's main memory via a bus. That's because to connect to any storage directly, the GPU would have to deal with the file system, which is managed by the OS. That's one of the things the CPU is faster at doing, since we're dealing with serialized data.
Now what shaders deal with is this data you upload to the GPU (vertices, texture coordinates, textures, etc.). In ancient OpenGL, no one had to write any shaders. Graphics drivers came with a built-in pipeline which handled regular rendering requests for you. You'd provide it with 4 vertices, 4 texture coordinates and a texture among other things (transformation matrices, etc.), and it'd draw your graphics for you on the screen. You could go a bit farther and add some lights to your scene and maybe customize a few things about it, but things were still pretty restrictive. Newer OpenGL specifications gave more freedom to the developer by allowing them to rewrite parts of the pipeline with shaders. The developer became responsible for transforming vertices into place and doing all sorts of other calculations related to lighting, etc.
I would very much appreciate at least a point in the right direction.
I am guessing it has something to do with uniforms, but really, with me skipping pages, I really cannot understand how a shader program runs or what the lifetime of the variables is.
uniforms are variables you can send to the shaders from the CPU every frame, before you use the program to render graphics. When you use the saturation slider in Photoshop or Gimp, it (probably) sends the saturation factor value to the shader as a uniform of type float. uniforms are what you use to communicate little settings like these to your shaders from your application.
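For instance, a minimal sketch (programId and the uniform name are placeholders for whatever your application and shader actually use):
// once, after linking the program
GLint saturationLoc = glGetUniformLocation(programId, "saturation");
// every frame, before drawing with this program
glUseProgram(programId);
glUniform1f(saturationLoc, 0.8f);  // send the saturation factor to the shader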
To use a shader program, you first have to set it up. A shader program consists of at least 2 types of shaders linked together, a vertex shader and a fragment shader. You use some OpenGL functions to upload your shader sources to the GPU, compile and then link them, and you get back the program's ID. To use this program, you simply glUseProgram(programId) and everything following this call will use it for drawing. The vertex shader is the code that runs on the vertices you send, to position them on the screen correctly. This is where you can do transformations on your geometry like scaling, rotation, etc. A fragment shader runs at a later stage, using interpolated values output from the vertex shader to define the color and the depth of every fragment of what you're drawing. This is where you can do post-processing effects on your pixels.
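A bare-bones version of that setup might look like this (a sketch with error checking omitted; vertexSrc and fragmentSrc are placeholder const char* strings holding your shader sources):
GLuint vs = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vs, 1, &vertexSrc, nullptr);
glCompileShader(vs);
GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 1, &fragmentSrc, nullptr);
glCompileShader(fs);
GLuint programId = glCreateProgram();
glAttachShader(programId, vs);
glAttachShader(programId, fs);
glLinkProgram(programId);
// everything drawn after this call goes through your shaders
glUseProgram(programId);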
Anyway, I hope I've helped make a few things clearer for you, but I can only tell you that there are no shortcuts. OpenGL has quite a steep learning curve, but it all connects and things start to make sense after a while. If you're getting bored of books and such, then consider taking the code snippets from every lesson, compiling them, and messing around with them while trying to rationalize as you go. You'll have to resort to written documents eventually, but hopefully then things will fit more easily into your head once you have some experience with the implementation components. Good luck.
Edit:
If you're trying to generate vertices on the fly using some algorithm, then try looking into Geometry Shaders. They may give you what you want.
You probably want to use CUDA for the things you are used to doing in C or C++, and let OpenGL access the rasterizer and other graphics stuff.
OpenGL and CUDA interact quite nicely. A good entry point for customizing the contents of a buffer object is here: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g0fd33bea77ca7b1e69d1619caf44214b , with the cudaGraphicsGLRegisterBuffer method.
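A rough sketch of that entry point (host-side CUDA runtime calls; vbo stands for an existing GL buffer object, and the kernel is hypothetical):
#include <cuda_gl_interop.h>

cudaGraphicsResource* resource = nullptr;
// register the GL buffer with CUDA once
cudaGraphicsGLRegisterBuffer(&resource, vbo, cudaGraphicsRegisterFlagsNone);
// each frame: map it, get a device pointer, let a kernel fill it, unmap
cudaGraphicsMapResources(1, &resource, 0);
float* devPtr = nullptr;
size_t numBytes = 0;
cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &numBytes, resource);
// myFillVerticesKernel<<<blocks, threads>>>(devPtr, ...);  // hypothetical kernel writing vertex data
cudaGraphicsUnmapResources(1, &resource, 0);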
You may also want to have a look at the nbody sample from the NVIDIA GPU SDK samples that come with current CUDA installs.

GPU programming for image processing

I'm working on a project aimed at controlling a biped humanoid robot. Unfortunately we have a very limited set of hardware resources (an RB110 board and its mini PCI graphics card). I'm planning to port image processing tasks from the CPU to the graphics card processor if possible, but I've never done it before... I was advised to use OpenCV, but that seems to be impossible because our graphics card processor (Volari Z9s) is not supported by the framework. Then I found an interesting post on Linux Journal. The author used OpenGL to process frames retrieved from a v4l device.
I'm a little confused about the relationship between the hardware API and OpenGL/OpenCV. In order to utilize a GPU, does the hardware need to be supported by graphics programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
I googled a lot about my hardware; unfortunately the vendor (XGI Technology) seems to be more or less extinct...
In order to utilize a GPU, does the hardware need to be supported by graphics programming frameworks (OpenGL/OpenCV)? Where can I find such an API?
OpenCL and OpenGL are both translated to hardware instructions by the GPU driver, so you need a driver for your operating system that supports these frameworks. Most GPU drivers support some version of OpenGL so that should work.
The OpenGL standard is maintained by the Khronos Group and you can find some tutorials at nehe.
How OpenGL works
OpenGL accepts triangles as input and draws them according to the state it has when the draw is issued. Most OpenGL functions are there to change the operations performed by manipulating this state. Image manipulation can be done by loading the input image as a texture and drawing several vertices with the texture active, resulting in a new image (or, more generically, a new 2D grid of data).
From version 2 onwards (or with the right ARB extensions) the operations performed on the image can be controlled with GLSL programs called vertex and fragment shaders (there are more shader types, but these are the oldest). A vertex shader will be called once per vertex; the results of this are interpolated and forwarded to the fragment shader. A fragment shader will be called every time a new fragment (pixel) is written to the result.
Now this is all about reading and writing images, how to use it for object detection?
Use vertices to span the input texture over the whole viewport. Instead of computing RGB colors and storing them in the result, you can write a fragment shader that computes grayscale images / gradient images and then checks these textures for each pixel: whether the pixel is in the center of a circle with a specific size, part of a line, or just has a relatively high gradient compared to its surroundings (a good feature), or really anything else you can find a good parallel algorithm for. (I haven't done this myself.)
The end result has to be read back to the CPU (sometimes you can use shaders to scale the data down before doing this). OpenCL gives it a less graphics-like feel and gives a lot more freedom, but is less well supported.
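As a tiny illustration of that kind of per-pixel work, a grayscale-conversion fragment shader for such a full-screen pass might look like this (a sketch with made-up uniform/varying names):
const char* grayscaleFS = R"(
#version 330 core
uniform sampler2D inputImage;
in vec2 texCoord;
out vec4 fragColor;

void main() {
    vec3 rgb = texture(inputImage, texCoord).rgb;
    // standard luminance weights
    float gray = dot(rgb, vec3(0.299, 0.587, 0.114));
    fragColor = vec4(vec3(gray), 1.0);
}
)";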
First of all, you need shader support (GLSL or asm).
The usual way is to render a full-screen quad with your image (texture) and apply a fragment shader. This is called post-processing, and it is limited by the instruction set and the other limitations your hardware has. At a basic level it allows you to apply a simple (single) function to a large data set in parallel, producing another data set. But branching (if it is supported) is the first performance enemy, because a GPU consists of a couple of SIMD blocks.

C++ & DirectX - setting shader

Does someone know a fast way to invoke shader processing via DirectX?
Right now I'm setting shaders using D3DXCreateEffectFromFile calls, which create shaders at runtime (once per shader) from *.fx files.
Rendering part for every object (every patch in my case - see further) then means something like:
// --------------------
// Preprocessing
effect->Begin();
effect->BeginPass(0);
effect->SetMatrix (or Vector or whatever - internal shader parameters) (...)
effect->CommitChanges();
// --------------------
// Geometry rendering
// Pass the geometry to render
// ...
// --------------------
// Postprocessing
// End 'effect' passes
effect->EndPass();
effect->End();
This is okay, but the profiler shows weird things - preprocessing (see code) takes about 60% of the time (I'm rendering a terrain object of 256 patches where every patch contains about 10k vertices).
Actual geometry rendering takes ~35% and postprocessing - 5% of total rendering time.
This seems pretty strange to me, and I guess that the D3DXEffect interface may not be the best solution for this sort of thing.
I've got 2 questions:
1. Do I need to implement my own shader controller / wrapper (probably, low-level) and where should I start from?
2. Would compiling shaders help to somehow improve parameter setting performance?
Maybe somebody knows how to solve this kind of problem / some implemented shader interface, or could give some advice about how this kind of problem is solved in modern game engines.
Thank you.
Actual geometry rendering takes ~35% and postprocessing - 5% of total rendering time
If you want to profile shader performance you need to use NVPerfHud or something similar. Using a CPU profiler and measuring ticks is not going to help you - rendering is often asynchronous.
Do I need to implement my own shader controller / wrapper (probably, low-level)
Using your own shader wrapper isn't a bad idea - I never liked ID3DXEffect anyway.
With your own wrapper you'll have a total control of resources and program behavior.
Whether you need it or not is for you to decide. With ID3DXEffect you have no guarantee that the implementation is as fast as it could be - it could be wasting CPU cycles doing something you don't really need. The D3DX library contains a few classes that are useful but aren't guaranteed to be efficient (ID3DXEffect, ID3DXMesh, all the animation-related and skin-related functions, etc.).
and where should I start from?
D3DXAssembleShader, IDirect3DDevice9::CreateVertexShader, IDirect3DDevice9::CreatePixelShader on DirectX 9, D3D10CompileShader on DirectX 10. Also download the DirectX SDK and read the shader documentation/tutorials.
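To give a flavor of the low-level path on DirectX 9, a rough sketch (using D3DXCompileShaderFromFile for HLSL rather than D3DXAssembleShader; device, the file name, entry point and worldViewProj are placeholders, and error handling is omitted):
ID3DXBuffer* code = nullptr;
ID3DXConstantTable* constants = nullptr;
D3DXCompileShaderFromFile("shader.vsh", nullptr, nullptr,
                          "main", "vs_3_0", 0, &code, nullptr, &constants);

IDirect3DVertexShader9* vertexShader = nullptr;
device->CreateVertexShader((const DWORD*)code->GetBufferPointer(), &vertexShader);

// per draw: bind the shader and feed its constants directly
device->SetVertexShader(vertexShader);
device->SetVertexShaderConstantF(0, (const float*)&worldViewProj, 4);  // a 4x4 matrix occupies 4 registers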
Would compiling shaders help to somehow improve parameter setting performance?
Shaders are automatically compiled when you load them. You could try compiling with different optimization settings, but don't expect miracles.
Are you using a DirectX profiler or just timing your client code? Profiling DirectX API calls using timers in the client code is generally not that effective because it's not necessarily synchronously processing your state updates/draw calls as you make them. There's a lot of optimization that goes on behind the scenes. Here is an article about this for DX9 but I'm sure this hasn't changed for later versions:
http://msdn.microsoft.com/en-us/library/bb172234(VS.85).aspx
I've used effects before in DirectX and the system generally works fine. It provides some nice features that might be a pain to implement yourself at a lower-level, so I would stick with it for the moment.
As bshields suggested, your timing information might be inaccurate. It sounds likely that the drawing is actually taking the most time, by comparison.
The shader is compiled when it's loaded. Precompiling will save you a half-second of startup time, but so long as the shader doesn't change during runtime, you won't see any actual speed increase. Precompiling is also kind of a pain, if you're still testing a shader. You can do it with the final copy, but unless you have a lot of shaders, you won't get much benefit while loading them.
If you're creating the shaders every frame or every time your geometry is rendered, that's probably the issue. Unless the shader itself (not parameters) changes every frame, you should create the effect once and reuse that.
I don't remember where SetParameter calls go, but you may want to check the docs to make sure your SetMatrix is in the right spot. Setting parameters after the pass has started won't help anything, certainly not speed. Make sure that's set up correctly. Also, set parameters as rarely as possible, there is some slight overhead involved. Per-frame sets will give you a notable slow-down, if you have too many.
All in all, the effects system does work fine in most cases and you shouldn't be seeing what you are. Make sure your profiling is correct, your shader is valid and optimized, and your calls are in the right places.

How to do ray tracing in modern OpenGL?

So I'm at a point where I should begin lighting my flatly colored models. The test application is a test case for implementing only the latest methods, so I realized that ideally it should be implementing ray tracing (since, theoretically, it might be ideal for real-time graphics in a few years).
But where do I start?
Assume I have never done lighting in old OpenGL, so I would be going directly to non-deprecated methods.
The application has currently properly set up vertex buffer objects, vertex, normal and color input and it correctly draws and transforms models in space, in a flat color.
Is there a source of information that would take one from flat colored vertices to all that is needed for a proper end result via GLSL? Ideally with any other additional lighting methods that might be required to complement it.
I would not advise trying actual ray tracing in OpenGL, because you need a lot of hacks and tricks for that and, if you ask me, there is no point in doing this anymore at all.
If you want to do ray tracing on the GPU, you should go with a GPGPU language such as CUDA or OpenCL, because it makes things a lot easier (but still far from trivial).
To illustrate the problem a bit further:
For raytracing, you need to trace the secondary rays and test for intersection with the geometry. Therefore, you need access to the geometry in some clever way inside your shader; however, inside a fragment shader you cannot access the geometry unless you store it encoded in some texture. The vertex shader also does not provide you with this geometry information natively, and geometry shaders only know the neighbors, so here the trouble already starts.
Next, you need acceleration data structures to get any reasonable frame rates. However, traversing e.g. a kd-tree inside a shader is quite difficult and, if I recall correctly, there are several papers solely on this problem.
If you really want to go this route, though, there are a lot of papers on this topic; it should not be too hard to find them.
A ray tracer requires extremely well designed access patterns and caching to reach good performance. However, you have only little control over these inside GLSL, and optimizing the performance can get really tough.
Another point to note is that, at least to my knowledge, real-time ray tracing on GPUs is mostly limited to static scenes because e.g. kd-trees only work (well) for static scenes. If you want to have dynamic scenes, you need other data structures (e.g. BVHs, iirc?) but you constantly need to maintain those. If I haven't missed anything, there is still a lot of research going on just on this issue.
You may be confusing some things.
OpenGL is a rasterizer. Forcing it to do raytracing is possible, but difficult. This is why raytracing is not "ideal for real time graphics in a few years". In a few years, only hybrid systems will be viable.
So, you have three possibilities.
Pure raytracing. Render only a fullscreen quad, and in your fragment shader, read your scene description packed in a buffer (like a texture), traverse the hierarchy, and compute ray-triangle intersections (see the sketch after this list).
Hybrid raytracing. Rasterize your scene the normal way, and use raytracing in your shader on some parts of the scene that really require it (refraction, ... but it can be simulated in rasterization).
Pure rasterization. The fragment shader does its normal job.
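To make the first option a bit more concrete, here is a toy fragment shader for the fullscreen-quad approach that intersects the primary ray with a single hard-coded triangle (a sketch only; a real tracer would fetch its geometry from a texture or buffer and traverse an acceleration structure, as described above):
const char* raytraceFS = R"(
#version 330 core
uniform vec2 resolution;  // viewport size in pixels
out vec4 fragColor;

// Moller-Trumbore ray/triangle intersection; returns hit distance or -1.0
float intersect(vec3 ro, vec3 rd, vec3 v0, vec3 v1, vec3 v2) {
    vec3 e1 = v1 - v0, e2 = v2 - v0;
    vec3 p = cross(rd, e2);
    float det = dot(e1, p);
    if (abs(det) < 1e-6) return -1.0;
    vec3 tv = ro - v0;
    float u = dot(tv, p) / det;
    if (u < 0.0 || u > 1.0) return -1.0;
    vec3 q = cross(tv, e1);
    float v = dot(rd, q) / det;
    if (v < 0.0 || u + v > 1.0) return -1.0;
    return dot(e2, q) / det;
}

void main() {
    // primary ray through this pixel; pinhole camera at the origin looking down -z
    vec2 ndc = (gl_FragCoord.xy / resolution) * 2.0 - 1.0;
    vec3 ro = vec3(0.0);
    vec3 rd = normalize(vec3(ndc, -1.5));

    // one hard-coded triangle stands in for the scene description
    float t = intersect(ro, rd, vec3(-1.0, -1.0, -4.0), vec3(1.0, -1.0, -4.0), vec3(0.0, 1.0, -4.0));
    fragColor = (t > 0.0) ? vec4(1.0, 0.5, 0.2, 1.0) : vec4(0.0, 0.0, 0.0, 1.0);
}
)";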
What exactly do you want to achieve? I can improve the answer depending on your needs.
Anyway, this SO question is highly related. Even if this particular implementation has a bug, it's definitely the way to go. Another possibility is OpenCL, but the concept is the same.
As of 2019, ray tracing is an option for real-time rendering, but it requires high-end GPUs most users don't have.
Some of these GPUs are designed specifically for ray tracing.
OpenGL currently does not support hardware-accelerated ray tracing.
DirectX 12 on Windows does have support for it. It is recommended to wait a few more years before creating a ray-tracing-only renderer, although it is possible using DirectX 12 with current desktop and laptop hardware. Support on mobile may take a while.
OpenGL (GLSL) can be used for ray (path) tracing; however, there are a few better options: NVIDIA OptiX (CUDA toolkit - cross-platform), DirectX 12 (with the NVIDIA ray tracing extension DXR - Windows only), Vulkan (NVIDIA ray tracing extension VKR - cross-platform, and widely used), Metal (only works on macOS), Falcor (a DXR-, VKR-, and OptiX-based framework), and Intel Embree (CPU ray tracing only).
I found some of the other answers to be verbose and wordy. For visual examples that YES, functional ray tracers absolutely CAN be built using the OpenGL API, I highly recommend checking out some of the projects people are making on https://www.shadertoy.com/ (Warning: lag)
To answer the topic: OpenGL has no RTX extension, but Vulkan has, and interop is possible. Example here: https://github.com/nvpro-samples/gl_vk_raytrace_interop
As for the actual question: to light the triangles, there are tons of techniques; look up "forward", "forward+" or "deferred" renderers. The technique to use depends on your goal. The simplest and best-looking these days is image-based lighting (IBL) with physically based shading (PBS). Basically, you use a cubemap and blur it more or less depending on the glossiness of the object. For a simple object viewer you don't need more.
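A minimal sketch of that cubemap idea (GLSL; the uniform and varying names are made up, and it assumes the environment map comes with prefiltered/blurred mip levels):
const char* iblFS = R"(
#version 330 core
uniform samplerCube envMap;   // prefiltered environment cubemap
uniform float roughness;      // 0 = mirror-like, 1 = very blurry reflection
uniform float maxMipLevel;    // index of the blurriest mip in the cubemap
in vec3 worldNormal;
in vec3 viewDir;
out vec4 fragColor;

void main() {
    // rougher surfaces sample blurrier mips of the environment map
    vec3 r = reflect(-normalize(viewDir), normalize(worldNormal));
    vec3 specular = textureLod(envMap, r, roughness * maxMipLevel).rgb;
    fragColor = vec4(specular, 1.0);
}
)";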