Having a lot of vertices [closed] - opengl

I'm trying to draw OpenGL 3D terrain; however, I started to wonder whether there is a huge CPU cost if I keep a lot of vertices in buffers without drawing any triangles with them.

There can be some overhead, but it should not be huge. A lot of this is highly platform dependent.
GPUs mostly use address spaces that are different from what the CPU uses. To make memory pages accessible to the GPU, the pages have to be mapped into the GPU address space. There is some per-page overhead to create these mappings. Memory pages accessed by the GPU may also have to be pinned/wired to prevent them from being paged off while the GPU is accessing them. Again, there can be some per-page overhead to wire the pages.
As long as the buffer remains mapped, you pay the price for these operations only once, not every frame. But if resource limits are reached, either by your application alone or in combination with other applications that are also using the GPU, your buffers may be unmapped, and you may pay the overhead repeatedly.
If you have enormous buffers, and are typically only using a very small part of them, it may be beneficial to split your geometry into multiple smaller buffers. Of course that's only practical if you can group your vertices so that you will mostly use the vertices from only a small number of buffers for any given frame. There is also overhead for binding each buffer, so having too many buffers is definitely not desirable either.
If the vertices you use for a draw call are in a limited index range, you can also look into using glDrawRangeElements() for drawing. With this call, you provide an index range that can be used by the draw call, which gives the driver a chance to map only part of the buffer instead of the entire buffer.

Data that resides in memory but is not actively accessed just occupies memory and has no impact on processor clock cycle consumption. This holds for any kind of data in any kind of memory.

How to optimize rendering of dynamic geometry? [closed]

As I know, batching and instancing are used to decrease the number of draw calls for static meshes. But what about dynamic meshes? How can I optimize the number of draw calls for them? Instancing and batching create big overhead because you need to recalculate positions on the CPU every frame. Or is it better to draw dynamic meshes with separate draw calls?
There are a few performance considerations to keep in mind:
Every glDraw..() comes with some overhead, so you want to minimize those. That's one reason that instancing is such a performance boon. (Better cache behavior is another.)
Host-to-device data transfers (glBufferData()) are even slower than draw calls. So, we try to keep data on the GPU (vertex buffers, index buffers, textures) rather than transmitting it each frame.
In your case, there are a couple of ways to get performant dynamic meshes.
Fake it. Do you really need a dynamic mesh, specifically one where you must generate new mesh data? Or can you achieve the same effect via transforms in your shaders?
Generate the mesh on the GPU. This could be done in a compute shader (for best performance) or in geometry and/or tessellation shaders. This comes with its own overhead, but since everything happens on the GPU, you aren't hit with the much more expensive glDraw...() calls or host-to-GPU copies.
Note that geometry shaders are relatively slow, but they're still faster than copying a new vertex + index buffer from the CPU to the GPU.
If your "dynamic" mesh has a finite number of states, just keep them all on the GPU and switch between them as necessary.
If this were another API such as Vulkan, you could potentially generate the mesh in a separate thread and transfer it to the GPU while drawing other things. That is a very complex topic, as is just about everything relating to the explicit graphics APIs.

How to overwrite all the free disk space with 0x00? [closed]

How can I overwrite all free disk space with zeros, like the cipher command in Windows? For example:
cipher /w:C:\
This overwrites the free disk space in three passes. How can I do this in C or C++? (I want to do it in one pass, as fast as possible.)
You can create a set of files and write random bytes to them until the available disk space is filled. These files should be removed before the program exits.
The files must be created on the device you wish to clean.
Multiple files may be required on some file systems, due to file size limitations.
It is important to use different, non-repeating random sequences in these files to defeat file-system compression and deduplication strategies that could reduce the amount of disk space actually written.
Note also that the OS may have quota systems that prevent you from filling the available disk space, and other processes may behave erratically when disk space runs out.
Removing the files may cause the OS to skip the cache-flushing mechanism, leaving some blocks never written to disk; a sync() system call or equivalent might be required. Syncing at the hardware level might be further delayed, so waiting some time before removing the files may be necessary.
Repeating this process with different random seeds reduces the odds of recovering the old data through surface analysis with advanced forensic tools. Those tools are far from perfect (unluckily so for anyone hoping they would save a lost Bitcoin wallet), but they may prove effective in other, more problematic circumstances.
Using random bytes serves a double purpose:
prevent some file systems from optimizing the blocks, compressing or sharing them instead of writing to the media, which would leave the existing data in place instead of overwriting it.
increase the difficulty of recovering previously written data with advanced hardware recovery tools, much like the security envelopes with random patterns printed on the inside that prevent exposing a letter's contents by simply holding the envelope up to a strong light.

Drawing multiple objects in Vulkan [closed]

I've been working on creating a game with a Vulkan rendering backend and I therefore need to be able to draw multiple objects on the screen. I've worked with OpenGL before so I have a little experience in graphics programming.
I've gone through the tutorial at https://vulkan-tutorial.com/ and I more or less understand the basic ideas; however, there are so many moving parts that I find it difficult to wrap my head around it all. So far I'm able to draw a single object with an essentially arbitrary number of vertices and a UBO, but no more than one object, since I don't know what to decouple from the main renderer object into a drawable object.
I don't really know which pieces need to live in a Vulkan-context object and which parts are independent and live on each drawable object. I've tried doing some research, and most of what I can find has to do with instancing, which isn't really what I'm looking to do just yet.
There are many options for how to organize this, so there's no single answer. But generally speaking, a drawable scene object will need:
A VkPipeline describing the state and shaders to use when drawing the object.
One or more VkDescriptorSets that bind resource references in the shader to state and data in Vulkan API objects.
The actual data or data-containing-objects (images, buffers) to use for the drawable.
For the latter two, you don't necessarily have distinct API objects for each drawable. Instead, your drawable might just have the data and you obtain an API object (like a descriptor set) and fill it in each time you draw, then recycle it. Or you might have a single VkBuffer shared by many drawables, with each drawable's geometry occupying a sub-range of the buffer.
Your "context", on the other hand, will have a VkDevice, VkQueue(s), and various pool objects like command buffer pools, descriptor pools. You'll also usually have some way of keeping track of things that should be recycled or destroyed when command buffers have completed (usually at frame granularity).

OpenGL big 3D texture (>2GB) is very slow [closed]

My graphics card is GTX 1080 ti. I want to use OpenGL 3D texture. The pixel (voxel) format is GL_R32F. OpenGL did not report any errors when I initialized the texture and rendered with the texture.
When the 3D texture was small (512x512x512), my program ran fast (~500FPS).
However, when I increased the size to 1024x1024x1024 (4GB), the FPS dropped dramatically to less than 1. When I monitored GPU memory usage, it did not exceed 3GB, even though the texture size is 4GB and I have 11GB in total.
When I changed the pixel format to GL_R16F, it worked again: the FPS went back to 500 and GPU memory consumption was about 6.2GB.
My hypothesis is that the 4GB 3D texture is not really on the GPU but on the CPU memory instead. In every frame, the driver is passing this data from CPU memory to GPU memory again and again. As a result, it slows down the performance.
My first question is whether my hypothesis is correct. If it is, why does this happen even though I have plenty of GPU memory? And how do I force OpenGL data to reside in GPU memory?
My first question is whether my hypothesis is correct?
It is not implausible, at least.
If it is, why it happens even I have plenty of GPU memory?
That's something for your OpenGL implementation to decide. Note that this also might be some driver bug. It might also be some internal limit.
How do I enforce any OpenGL data to reside on GPU memory?
You can't. OpenGL does not have a concept of Video RAM or System RAM or even a GPU. You specify your buffers and textures and other objects and make the draw calls, and it is the GL implementation's job to map this to the actual hardware. However, there are no performance guarantees whatsoever - you might encounter a slow path or even a fallback to software rendering when you do certain things (with the latter being really uncommon in recent times, but conceptually, it is very possible).
If you want control over where to place data, when to actually transfer it, and so on, you have to use a more low-level API like Vulkan.

Load time, traversal time, memory usage for different data segments C/C++ [closed]

I would like to know more about the traversal time of variables in different data segments. For example, let's say we want to fill an array with 100,000 ints. What would be the difference in traversal time if the array is on the stack, on the heap, or in the data segment? Would it make any difference if we used a much bigger or much smaller array? For instance, if the traversal time on the heap is 2x for 100,000 elements and 1x for the stack, would this proportion be the same for a different size (10,000,000)? Also, what would be the difference in the process's load time and overall memory usage? Thanks!
EDIT: How can I determine this in code? Is there any function to calculate execution time, "traversal time", and the other things I am trying to find out?
To answer your edited question: you can use timers. Start a timer before executing your code and stop it right after; then subtract (stop - start) to get the elapsed time.
Already answered here
Memory is memory
What I mean by that: there are no physically different memories for the different segments (stack, heap, etc.). Moreover, main memory is Random Access Memory. One property of RAM is that accessing data takes the same amount of time regardless of where the data physically sits on the chip and regardless of previous accesses (contrast this with tape memory, or even hard disks). So access to RAM is indiscriminately the same whether we're talking about the heap, the stack, or anything else.
Cache to the rescue
That being said, that's not the whole story. Modern architectures have caches. The discussion of caches is too broad to have here, but the gist is that caches are smaller, more expensive, but faster memories that "cache" data from RAM. So in real scenarios, data that was accessed before (temporal locality) or that is near previously accessed data (spatial locality) will most likely be fed to the CPU faster, because it is available in cache.
Ok, that's nice, but what segment is faster?
As a rule of thumb, we say that stack memory is faster than heap memory. That confused me at first, when I was thinking only in terms of paragraph 1; but take paragraph 2 into account and it makes sense: because of its usage pattern, the stack is almost always in cache.
So... use stack?
Unfortunately it isn't as simple as that. It never is, especially when you analyze low-level performance. The stack can't be very large. And sometimes, even if you could put your data on the stack, there are other reasons why it is preferable to put it on the heap. So, I am sorry (not really) to tell you that the answer is never simple or black and white. All you can practically do is profile your application and see for yourself. That's relatively easy. Interpreting the results, and knowing how to improve them, is a whole other beast.
if, for instance, the traversal time on the heap is 2x for 100,000 elements and 1x for the stack, would this proportion be the same for a different size (10,000,000)?
Even for the heap alone, performance isn't linear. Why? Caches again. When the data you access fits in cache, performance plays nice; then you see a spike as your data grows beyond the cache size. On relatively older systems you could see three clearly delimited regions, corresponding to the three cache levels in a computer: a spike each time your data stops fitting in one level and spills into the next, and when it doesn't fit in cache at all, performance goes downhill. Modern processors have "smart cache", which with some black magic makes it appear more as if you have one big cache instead of three levels.