I am building a real-time signal processing and display system using an NVIDIA Tesla C2050 GPU. The design is such that the signal processing part runs as a separate program and does all of its computation using CUDA. In parallel, if needed, I can start a separate display program which displays the processed signal using OpenGL. Since the design was to run these as independent processes, I do not use any CUDA-OpenGL interoperability. The two programs exchange data with each other over a UNIX stream socket.
The signal processing program spends most of its time using the GPU for the CUDA work. I refresh the OpenGL frame every 50 ms, while each CUDA run takes roughly 700 ms and two sequential runs are usually separated by 30-40 ms. When I run the programs one at a time (i.e. only the CUDA or only the OpenGL part is running), everything works perfectly. But when I start them together, the display is not what it is supposed to be, even though the CUDA part produces the correct output. I have checked the socket implementation and I am fairly confident that the sockets are working correctly.
My question is: since I have a single GPU, no CUDA-OpenGL interoperability, and both processes use the GPU regularly, is it possible that context switching between the CUDA kernels and the OpenGL rendering is causing the interference? Should I change the design to a single program that runs both parts with CUDA-OpenGL interoperability?
Devices of compute capability 5.0 and below cannot run graphics and compute work concurrently. The Tesla C2050 does not support any form of pre-emption, so while a CUDA kernel is executing the GPU cannot be used to render the OpenGL commands. CUDA-OpenGL interop does not solve this issue.
If you have a single GPU then the best option is to break the CUDA kernels into shorter launches so that the GPU can switch between compute and graphics. In the case above, no single CUDA kernel launch should execute for more than 50 ms - GLRenderTime.
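As a rough illustration of what "shorter launches" means, the one long-running kernel can be replaced by a loop of launches that each process a slice of the data, so the driver gets a chance to schedule OpenGL work at the kernel boundaries. This is only a minimal sketch; process_chunk, N and CHUNK are hypothetical placeholders, not code from the question.

```cpp
// Minimal sketch: split one ~700 ms kernel into many short launches so the
// GPU can interleave OpenGL rendering between them.
#include <cuda_runtime.h>
#include <algorithm>

__global__ void process_chunk(float* data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] *= 2.0f;              // stand-in for the real signal processing
}

void run_in_chunks(float* d_data, int N)
{
    const int CHUNK = 1 << 20;                 // sized so one launch stays well under 50 ms
    const int threads = 256;
    for (int offset = 0; offset < N; offset += CHUNK) {
        int count = std::min(CHUNK, N - offset);
        int blocks = (count + threads - 1) / threads;
        process_chunk<<<blocks, threads>>>(d_data, offset, count);
        cudaDeviceSynchronize();               // explicit yield point before the next slice
    }
}
```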
Using a second GPU to do the graphics rendering would be the better option.
I understand that Tesseract already uses OpenCL to offload some of the compute-intensive modules to the GPU, or spreads the work across the available CPU cores. Now, if I split an image into multiple parts and send each part to Tesseract for text extraction, will I get any additional speed-up?
This really depends on your hardware architecture. If, for example, you have multiple GPUs in your machine, you could configure two Tesseract instances so that each one uses a different GPU. A common case is when you have an Intel HD Graphics 4xxx that comes with the Intel Core processor plus an additional dedicated GPU. Still, if the devices are not identical, load balancing will not be a trivial task.
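To make the splitting idea concrete, here is a rough sketch of running two Tesseract instances on the two halves of an image in parallel threads. The image dimensions, buffer layout and language are assumptions for illustration; which OpenCL device (if any) each instance ends up using depends on the build and the hardware, as described above.

```cpp
// Sketch: OCR the top and bottom halves of a grayscale image in parallel
// using two independent TessBaseAPI instances.
#include <tesseract/baseapi.h>
#include <string>
#include <thread>
#include <vector>

std::string ocr_region(const unsigned char* pixels, int width, int height,
                       int bytes_per_line)
{
    tesseract::TessBaseAPI api;
    if (api.Init(nullptr, "eng") != 0)          // default tessdata path, English
        return {};
    api.SetImage(pixels, width, height, /*bytes_per_pixel=*/1, bytes_per_line);
    char* text = api.GetUTF8Text();
    std::string result = text ? text : "";
    delete[] text;
    api.End();
    return result;
}

int main()
{
    const int width = 2000, height = 3000;             // hypothetical page size
    std::vector<unsigned char> image(width * height);  // filled elsewhere

    std::string top, bottom;
    std::thread t1([&] { top    = ocr_region(image.data(), width, height / 2, width); });
    std::thread t2([&] { bottom = ocr_region(image.data() + width * (height / 2),
                                             width, height - height / 2, width); });
    t1.join();
    t2.join();
    // top + bottom now hold the extracted text for the two halves.
}
```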
I'm making a small demo application while learning OpenGL 3.3, using GLFW. My problem is that a release build runs at about 120 fps, while a debug build runs at about 15 fps. Why would that be?
It's a demo shooting lots of particles that move and rotate.
If the app isn't optimized and spends a long time executing non-OpenGL code, the OpenGL device can easily sit idle.
You should profile the app without the OpenGL commands (as if you had an infinitely fast OpenGL device) and check your FPS. If it is still very slow, that is an indication that your app is CPU-bound (probably in release mode too).
In addition, if you're setting debug options in OpenGL/GLSL, poor performance wouldn't be a big surprise.
Debug mode should be used to debug the app, and 15 fps still gives you a more or less interactive experience.
If the particle system is animated on the CPU (OpenGL only renders it), you should consider a GPU-accelerated solution.
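If the demo runs on an NVIDIA GPU, one possible approach (alternatives include transform feedback or compute shaders) is to update the particle vertex buffer with a CUDA kernel through CUDA-OpenGL interop. The sketch below is illustrative only; the VBO, particle count and time step are assumed to exist elsewhere in the demo, and the motion update is a placeholder.

```cpp
// Sketch: animate particle positions on the GPU by writing directly into an
// OpenGL vertex buffer from a CUDA kernel (CUDA-OpenGL interop).
#include <GL/gl.h>
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>

__global__ void advance_particles(float4* pos, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        pos[i].x += 0.1f * dt;     // stand-in for the real motion/rotation update
        pos[i].y += 0.2f * dt;
    }
}

void update_particles(GLuint vbo, int n, float dt)
{
    static cudaGraphicsResource* resource = nullptr;
    if (!resource)
        cudaGraphicsGLRegisterBuffer(&resource, vbo, cudaGraphicsMapFlagsNone);

    float4* d_pos = nullptr;
    size_t bytes = 0;
    cudaGraphicsMapResources(1, &resource);
    cudaGraphicsResourceGetMappedPointer((void**)&d_pos, &bytes, resource);

    const int threads = 256;
    advance_particles<<<(n + threads - 1) / threads, threads>>>(d_pos, n, dt);

    cudaGraphicsUnmapResources(1, &resource);  // VBO is now ready for drawing
}
```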
Possible Duplicate:
How does OpenGL work at the lowest level?
When we write a program that uses the OpenGL library, for example for the Windows platform, on a machine whose graphics card supports OpenGL, what happens is this:
We develop our program in a programming language, linking the graphics code against OpenGL (e.g. in Visual C++).
We compile and link the program for the target platform (e.g. Windows).
When we run the program, since we have a graphics card that supports OpenGL, the driver installed on Windows is responsible for managing the graphics: the CPU sends the required data to the chip on the graphics card (e.g. an NVIDIA GPU), which draws the results.
In this context we speak of graphics acceleration, because the work of computing the final framebuffer of our graphic representation is offloaded from the CPU.
In this environment, when the GPU driver receives the data, how does it leverage the capabilities of the GPU to accelerate the drawing? Does it translate the received instructions and data into something like CUDA to exploit the GPU's parallelism? Or does it just copy the data received from the CPU into specific areas of device memory? I don't quite understand this part.
Finally, if I had a card that did not support OpenGL, would the driver installed in Windows detect the problem? Would we get an error, or would the CPU calculate our framebuffer instead?
You would do better to look into computer gaming sites. They frequently publish articles on how 3D graphics works and how "artefacts" present themselves when there are errors in games or drivers.
You can also read articles on the architecture of 3D libraries like Mesa or Gallium.
Overall, drivers have a set of methods for implementing each piece of functionality of Direct3D, OpenGL, or another standard API. When they load, they check the hardware. You can have a cheap video card or an expensive one, a recent one or one released 3 years ago: that is different hardware. So the driver tries to map each API feature to an implementation that can be used on the given computer, whether accelerated by the GPU, accelerated by the CPU (e.g. using SSE4), or even some more basic implementation.
The driver then tries to estimate the GPU load. Sometimes a function could be accelerated, yet the GPU (especially a low-end one) is already overloaded by other tasks, in which case the driver may calculate on the CPU instead of waiting for a GPU time slot.
When you make a mistake there are always several possibilities, depending on the intelligence and quality of the driver.
Maybe the driver will fix the error for you, ignoring your commands and running its own set of them instead.
Maybe the driver will return some error code to your program.
Maybe the driver will execute the command as is. If you issued a paint command with red colour instead of green, that is an error, but the kind the driver cannot know about. Search for "3D artefacts" on PC gaming related sites.
In the worst case your error will interact with a bug in the driver, and your computer will crash and reboot.
Of course all those adaptive strategies are rather complex and indeterminate, which is why 3D drivers tend to be closed and the know-how of their internals closely guarded.
Search sites dedicated to 3D gaming, and perhaps also to 3D modelling: they rate video cards by which are better to buy, and sometimes when they review new chip families they write rather detailed essays about the technical internals of all this.
Regarding the question of how the driver leverages the GPU:
Some of the things that a driver does: it compiles your GPU programs (vertex, fragment, etc. shaders) into the machine instructions of the specific card, uploads the compiled programs to the appropriate area of device memory, and arranges for the programs to be executed in parallel on the many, many graphics cores on the card.
It uploads the graphics data (vertex coordinates, textures, etc.) to the appropriate type of graphics card memory, using various hints from the programmer, for example whether the data is updated frequently, infrequently, or not at all.
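For example, the usage hint passed to glBufferData is one such piece of information. In the sketch below (buffer IDs and sizes are made up for illustration) the driver is told which buffer is static and which is rewritten every frame, and it can place the data accordingly.

```cpp
// Sketch: buffer usage hints tell the driver where and how to place the data.
#include <GL/glew.h>   // or any loader that provides the GL 1.5+ buffer API
#include <cstddef>

void upload_buffers(GLuint staticVbo, GLuint dynamicVbo,
                    const float* meshVerts, size_t meshBytes,
                    const float* particleVerts, size_t particleBytes)
{
    // Mesh geometry that never changes: the driver will usually keep this
    // in fast video memory.
    glBindBuffer(GL_ARRAY_BUFFER, staticVbo);
    glBufferData(GL_ARRAY_BUFFER, meshBytes, meshVerts, GL_STATIC_DRAW);

    // Data rewritten every frame: the driver may place this where CPU writes
    // are cheap and stream it to the GPU.
    glBindBuffer(GL_ARRAY_BUFFER, dynamicVbo);
    glBufferData(GL_ARRAY_BUFFER, particleBytes, particleVerts, GL_STREAM_DRAW);
}
```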
It may also utilize special units in the graphics card for transferring data to and from host memory; for example, some NVIDIA cards have a DMA unit (some Quadro cards may have two or more), which can upload, say, textures in parallel with the usual driver operation (other transfers, drawing, etc.).
I've got a pretty old ATI HD 3400 video card, which has no OpenCL support, so I'm wondering if I can actually play around with the OpenGL libraries provided by the ATI Catalyst driver?
If my algorithm runs inside the function registered with glutDisplayFunc(displayFunc), that is, inside displayFunc(), is it actually costing CPU power or GPU power?
GLUT is just a library which manages platform-specific window and GL context creation. The function you pass to glutDisplayFunc is just called by GLUT at the appropriate time and context for the platform you're running on; it is not executed on the GPU.
It is not possible to have code that you've compiled in the normal fashion as part of a larger program run on the GPU.
However, the individual graphics operations run inside of your display func do of course perform the rendering on the GPU; the CPU is still computing which graphics operation to execute, but not actually rendering the results. Each gl function is a normal CPU function, but what it does is send a command through your system bus to your graphics card, which then does the actual rendering.
Furthermore, these operations are asynchronous; the gl functions don't wait for your GPU to finish the operation before letting your program continue. This is useful because your CPU and GPU can both be working simultaneously — the GPU draws graphics while the CPU figures out what graphics to draw. On the other hand, if you do need communication in the other direction — such as glReadPixels — then the CPU has to wait for the GPU to catch up. This is also the difference between glFlush and glFinish.
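A simple way to observe this asynchrony is to time a batch of draw calls with and without glFinish inside the display callback. The sketch below uses placeholder immediate-mode drawing purely for illustration; the timings it prints will show that submitting commands is much cheaper than waiting for the GPU to finish them.

```cpp
// Sketch: gl* calls return almost immediately because the commands are only
// queued; glFinish blocks until the GPU has actually completed them.
#include <GL/glut.h>
#include <chrono>
#include <cstdio>

void displayFunc()
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    for (int i = 0; i < 1000; ++i) {
        glBegin(GL_TRIANGLES);                 // placeholder draw work
        glVertex2f(-0.5f, -0.5f);
        glVertex2f( 0.5f, -0.5f);
        glVertex2f( 0.0f,  0.5f);
        glEnd();
    }
    auto t1 = clock::now();                    // commands submitted, GPU may still be busy

    glFinish();                                // wait for the GPU to finish everything
    auto t2 = clock::now();

    std::printf("submit: %lld us, GPU finish: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());

    glutSwapBuffers();
}
```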
I'm programming a simple OpenGL program on a multi-core computer that has a GPU. The GPU is a basic GeForce with PhysX, CUDA and OpenGL 2.1 support. When I run this program, is it the host CPU that executes the OpenGL-specific commands, or are they transferred directly to the GPU?
Normally that's a function of the drivers you're using. If you're just using vanilla VGA drivers, then all of the OpenGL computations are done on your CPU. With modern graphics cards and production drivers, however, calls to OpenGL routines that your graphics card's GPU can handle in hardware are performed there; those that the GPU can't perform are handed off to the CPU.