OpenGL big 3D texture (>2GB) is very slow [closed] - c++

My graphics card is a GTX 1080 Ti. I want to use an OpenGL 3D texture. The pixel (voxel) format is GL_R32F. OpenGL did not report any errors when I initialized the texture and rendered with it.
When the 3D texture was small (512x512x512), my program ran fast (~500 FPS).
However, when I increased the size to 1024x1024x1024 (4GB), the FPS dropped dramatically to less than 1. When I monitored the GPU memory usage, it did not exceed 3GB, even though the texture alone is 4GB and I have 11GB in total.
When I changed the pixel format to GL_R16F, it worked again: the FPS went back to 500 and the GPU memory consumption was about 6.2GB.
My hypothesis is that the 4GB 3D texture is not really in GPU memory but in CPU memory instead, and that the driver passes this data from CPU memory to GPU memory again and again in every frame. As a result, performance suffers.
My first question is whether my hypothesis is correct. If it is, why does this happen even though I have plenty of GPU memory? And how do I force OpenGL data to reside in GPU memory?
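For context, here is a minimal, hypothetical sketch of how such a texture is typically allocated with immutable storage (this is not the asker's actual code; voxelData stands in for the 1024^3 float array). At 4 bytes per texel, 1024x1024x1024 GL_R32F texels come to exactly 4 GiB, while GL_R16F halves that to 2 GiB:

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_3D, tex);
    // One mip level, 32-bit float red channel, 1024^3 voxels = 4 GiB of texel data.
    glTexStorage3D(GL_TEXTURE_3D, 1, GL_R32F, 1024, 1024, 1024);
    glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, 0, 1024, 1024, 1024,
                    GL_RED, GL_FLOAT, voxelData);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);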

My first question is whether my hypothesis is correct?
It is not implausible, at least.
If it is, why does it happen even though I have plenty of GPU memory?
That's something for your OpenGL implementation to decide. Note that this might also be a driver bug, or some internal limit.
How do I force OpenGL data to reside in GPU memory?
You can't. OpenGL does not have a concept of Video RAM or System RAM or even a GPU. You specify your buffers and textures and other objects and make the draw calls, and it is the GL implementation's job to map this to the actual hardware. However, there are no performance guarantees whatsoever - you might encounter a slow path or even a fallback to software rendering when you do certain things (with the latter being really uncommon in recent times, but conceptually, it is very possible).
If you want control over where to place data, when to actually transfer it, and so on, you have to use a more low-level API like Vulkan.
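If you just want to observe what the driver did (without being able to control it), NVIDIA exposes the vendor-specific GL_NVX_gpu_memory_info extension. A diagnostic sketch, assuming an NVIDIA driver that advertises this extension:

    // Token value from the GL_NVX_gpu_memory_info extension specification.
    #ifndef GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX
    #define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
    #endif

    GLint availableKiB = 0;
    glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &availableKiB);
    // Query this before and after creating the 4 GB texture: if the available
    // video memory barely changes, the texture most likely lives in system memory.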

Related

How to optimize rendering of dynamic geometry? [closed]

As far as I know, batching and instancing are used to decrease the number of draw calls for static meshes. But what about dynamic meshes? How can I reduce the number of draw calls for them? Instancing and batching create a big overhead because you need to recalculate the positions on the CPU every frame. Or is it better to draw dynamic meshes with separate draw calls?
There are a few performance considerations to keep in mind:
Every glDraw..() comes with some overhead, so you want to minimize those. That's one reason that instancing is such a performance boon. (Better cache behavior is another.)
Host-to-device data transfers (glBufferData()) are even slower than draw calls. So, we try to keep data on the GPU (vertex buffers, index buffers, textures) rather than transmitting it each frame.
In your case, there are a couple of ways to get performant dynamic meshes.
Fake it. Do you really need dynamic meshes - specifically, ones where you must generate new mesh data? Or can you achieve the same thing via transforms in your shaders? (A sketch of this follows at the end of this answer.)
Generate the mesh on the GPU. This could be done in a compute shader (for best performance) or in geometry and/or tessellation shaders. This comes with its own overhead, but, since everything happens on the GPU, you aren't hit with the much more expensive glDraw...() calls or host-to-GPU copies.
Note that geometry shaders are relatively slow, but they're still faster than copying a new vertex and index buffer from the CPU to the GPU.
If your "dynamic" mesh has a finite number of states, just keep them all on the GPU and switch between them as necessary.
If this were another API such as Vulkan, you could potentially generate the mesh in a separate thread and transfer it to the GPU while drawing other things. That is a very complex topic, as is just about everything relating to the explicit graphics APIs.
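To illustrate option 1 ("fake it"), here is a minimal sketch in which the mesh data stays on the GPU and only a single matrix uniform crosses the bus each frame. The names program, vao, indexCount, uModelLoc and computeAnimatedTransform are placeholders, not code from the question:

    // Per frame: no vertex data is re-uploaded, only one 4x4 matrix uniform.
    GLfloat model[16];                       // column-major model matrix
    computeAnimatedTransform(model, t);      // hypothetical: fills the matrix from animation parameters
    glUseProgram(program);
    glBindVertexArray(vao);                  // vertex/index buffers were uploaded once at load time
    glUniformMatrix4fv(uModelLoc, 1, GL_FALSE, model);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);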

OpenGL Shader Uniform becomes slower [closed]

The Problem
I'm developing some procedural terrain generation in C++ with OpenGL. As my IDE I'm using Microsoft VS2017. I can run the "experiment" without problems, but after about two hours of development the program slows down. Within ten minutes or so the framerate drops from over 100 to 20, and shortly after that my GPU doesn't even manage to render one frame per second. When launching the program, it also takes an eternity to load the shaders and link the programs.
Possible Causes
After some debugging and profiling within VS2017, it turns out that over 98% of the time the CPU is waiting for the GPU to complete shader uniform operations. This includes finding the locations of uniform variables and loading three matrices into uniform variables.
Troubleshooting Steps
I've tried various things to improve the situation, including the following, but I could not fix the problem without restarting my computer:
Copy .exe and assets to another folder
Copy .exe and assets to another physical device
Relaunch VS2017
Decrease GPU and memory clock in MSI Afterburner
Check graphics card VRAM usage
Close background applications
My Computer
In case this information helps someone, here it is:
Intel® Core™ i5-6600K @ 3.5GHz
EVGA GeForce GTX 1060 6GB GDDR5
MSI Z170-A PRO
2x8GB DDR4-2133
Thermaltake 530W PSU
2x1TB HDD in RAID1 (Has the project on it)
128GB SSD
512GB HDD
Thanks in advance,
Elias
All of your "troubleshooting" steps are voodoo. It doesn't matter which IDE you use (it's just a glorified editor anyway). It doesn't matter where in the filesystem your executable resides (it's just a block on a storage device with a page mapping into the OS). Decreasing the GPU and/or memory clock helps with stability if you're running into thermal problems, but it will not influence such creeping performance problems (besides, if there were a thermal problem, you'd notice it within minutes, not hours).
Sudden drops in performance after a system runs for some time can almost always be attributed to resource exhaustion, forcing the system to swap data around. The cause for resource exhaustion is improper allocation management, i.e. an imbalance between allocating something and freeing it again.
This is what you have to debug. For OpenGL every glGen…/glCreate… must be balanced by a matching glDelete…. For every use of new in your code, there must be a balancing delete (and for new …[] there must be a delete[] …).
If you push objects into a container (like std::vector, std::list, std::map and so on), make sure you also carry out the garbage, i.e. dispose of objects you no longer use.
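As an illustration of that balancing (a generic sketch, not code from the question), a small RAII wrapper guarantees that every glGenBuffers is matched by a glDeleteBuffers, even on early returns or exceptions:

    class GlBuffer {
    public:
        GlBuffer()  { glGenBuffers(1, &id_); }
        ~GlBuffer() { glDeleteBuffers(1, &id_); }
        GlBuffer(const GlBuffer&) = delete;            // prevent accidental double-delete
        GlBuffer& operator=(const GlBuffer&) = delete;
        GLuint id() const { return id_; }
    private:
        GLuint id_ = 0;
    };
    // The same pattern applies to textures (glGenTextures/glDeleteTextures),
    // shaders (glCreateShader/glDeleteShader) and to plain new/delete pairs.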

OpenGL - GPU memory exceeded, possible scenarios

I can use glTexImage2D or glBufferData to send some data to the GPU memory. Let's assume that I ask the driver to send more data to the GPU but the GPU memory is already full. I will probably get GL_OUT_OF_MEMORY. What might happen to the rendering thread? What are the possible scenarios? Is it possible that the rendering thread will be terminated?
It depends on the actual OpenGL implementation. But the most likely scenario is that you'll just encounter a serious performance drop while things keep working.
OpenGL uses an abstract memory model, in which the actual implementation treats the GPU's own memory as a cache. In fact, in most OpenGL implementations, texture data doesn't even go directly to the GPU when you load it. Only when it's actually required for rendering does it get loaded into GPU RAM. If more textures are in use than fit into GPU RAM, textures are swapped in and out of GPU RAM as needed to complete the rendering.
Older GPU generations required a texture to fit completely into their RAM. GPUs that came out after 2012 can access texture subsets from host memory as required, thereby lifting that limit. In fact, you're more likely to run into maximum texture dimension limits than into memory limits (been there, done that).
Of course, other, less well-developed OpenGL implementations may bail out with an out-of-memory error. But at least for AMD and NVIDIA, that's not an issue.
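If you want to handle that case explicitly rather than rely on the driver's swapping behaviour, you can check for the error right after the upload. A minimal sketch with placeholder variables (tex, width, height, pixels are not from the question):

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    if (glGetError() == GL_OUT_OF_MEMORY) {
        // The texture may be left in an unusable state; delete it and retry with
        // smaller dimensions or a more compact internal format instead of crashing.
        glDeleteTextures(1, &tex);
    }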

Having a lot of vertices [closed]

I'm trying to draw OpenGL 3D terrain; however, I started to wonder whether there is a huge CPU overhead if I have a lot of vertices without drawing any triangles with them.
There can be some overhead, but it should not be huge. A lot of this is highly platform dependent.
GPUs mostly use address spaces that are different from what the CPU uses. To make memory pages accessible to the GPU, the pages have to be mapped into the GPU address space. There is some per-page overhead to create these mappings. Memory pages accessed by the GPU may also have to be pinned/wired to prevent them from being paged off while the GPU is accessing them. Again, there can be some per-page overhead to wire the pages.
As long as the buffer remains mapped, you only pay the price for these operations once, and not for each frame. But if resource limits are reached, either by your application, or in combination with other applications that are also using the GPU, your buffers may be unmapped, and the overhead can become repeated.
If you have enormous buffers, and are typically only using a very small part of them, it may be beneficial to split your geometry into multiple smaller buffers. Of course that's only practical if you can group your vertices so that you will mostly use the vertices from only a small number of buffers for any given frame. There is also overhead for binding each buffer, so having too many buffers is definitely not desirable either.
If the vertices you use for a draw call are in a limited index range, you can also look into using glDrawRangeElements() for drawing. With this call, you provide an index range that can be used by the draw call, which gives the driver a chance to map only part of the buffer instead of the entire buffer.
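A minimal sketch of that call, with placeholder names for the VAO and ranges:

    glBindVertexArray(terrainVao);                      // hypothetical VAO holding the large buffers
    glDrawRangeElements(GL_TRIANGLES,
                        minVertex, maxVertex,           // smallest and largest vertex index referenced
                        indexCount, GL_UNSIGNED_INT,
                        (const void*)(firstIndex * sizeof(GLuint)));  // byte offset into the index buffer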
Data that resides in memory but is not actively accessed just occupies memory and has no impact on processor clock cycle consumption. This holds for any kind of data in any kind of memory.

Using pixel shader to perform fast computation? [closed]

I wish to run a very simple function a lot of times.
At first I thought about inlining the function (it's only four lines long), so I figured that placing it in the header would do that automatically. gprof said that was a good idea. However, I heard that pixel shaders are optimized for this kind of purpose, and I was wondering whether that is true. I have a simple function that takes 6 numbers and I wish to run it N times. Would a pixel shader speed things up?
Maybe a GPU could speed up your function, maybe not. It depends greatly on the function. GPUs are good at parallel execution: while a consumer-grade x86 CPU has 8 cores at most, a graphics card can execute far more calculations in parallel. But the bottleneck is often the transfer of data between GPU RAM and system RAM. If your function isn't actually that computationally expensive, that overhead may overshadow any gain.
In the end you can just try yourself, measure it, and see for yourself which is faster.
You might want to take a look at OpenCL, the most widely supported standard for moving computation to the graphics card.
If you are living in Windows land, there is also DirectCompute, which is part of DirectX, as well as C++ AMP (Accelerated Massive Parallelism). There is also CUDA, but it only supports NVIDIA GPUs.
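Since the surrounding questions are OpenGL-based, here is a hedged sketch of the same idea using an OpenGL 4.3 compute shader instead of a pixel shader: one GPU thread per element, with the data in a shader storage buffer. It assumes a current GL context and loaded function pointers (e.g. via GLEW or GLAD), and the doubling in the shader is just a stand-in for your own four-line function:

    #include <GL/glew.h>   // or your GL loader of choice
    #include <vector>

    static const char* kComputeSrc = R"(
    #version 430
    layout(local_size_x = 256) in;
    layout(std430, binding = 0) buffer Data { float values[]; };
    void main() {
        uint i = gl_GlobalInvocationID.x;
        if (i >= uint(values.length())) return;
        values[i] = values[i] * 2.0 + 1.0;   // placeholder for the "simple function"
    }
    )";

    void runOnGpu(std::vector<float>& data)
    {
        // Compile and link the compute program.
        GLuint shader = glCreateShader(GL_COMPUTE_SHADER);
        glShaderSource(shader, 1, &kComputeSrc, nullptr);
        glCompileShader(shader);
        GLuint program = glCreateProgram();
        glAttachShader(program, shader);
        glLinkProgram(program);

        // Upload the input, dispatch one thread per element, read the result back.
        GLuint ssbo;
        glGenBuffers(1, &ssbo);
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
        glBufferData(GL_SHADER_STORAGE_BUFFER, data.size() * sizeof(float),
                     data.data(), GL_DYNAMIC_COPY);
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);

        glUseProgram(program);
        glDispatchCompute((GLuint)((data.size() + 255) / 256), 1, 1);
        glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

        glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0,
                           data.size() * sizeof(float), data.data());

        glDeleteBuffers(1, &ssbo);
        glDeleteProgram(program);
        glDeleteShader(shader);
    }

Note that the upload and readback are exactly the host-to-device transfers the answer warns about, so this only pays off when the per-element work is heavy enough or the data already lives on the GPU.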