Both SDL and Game Maker have the concept of surfaces: images that you can modify on the fly and then display. I'm using OpenGL 1 and I'd like to know whether OpenGL has this concept of a surface.
The only approaches I came up with were:
Every frame create / destroy a new texture based on needs.
Every frame, update said texture based on needs.
These approaches don't seem very performant, but I see no alternative. Maybe this is how surfaces are implemented in the engines mentioned.
Yes, these two are the ways you would do it in OpenGL 1.0. I don't think there are any other means as far as the 1.0 spec is concerned.
Link : https://www.opengl.org/registry/doc/glspec10.pdf
Do note that textures are stored in device (GPU) memory, which is fast to access for shading, and the approaches above copy data from host (CPU) memory to device memory. Hence the performance hit is the cost of the host-device copy.
Why are you limited to the OpenGL 1.0 spec? If you go higher, you start getting more options:
Use GLSL shaders to directly edit content from one texture and output the same to another texture. Processing will be done on the GPU and a device-device copy is as fast as it gets.
Use CUDA. Map a texture to a CUDA array, use your kernel to modify the content. Or use OpenCL for non-NVIDIA cards.
This would be the better scenario, and it pays off as long as the modification can be executed in parallel.
I would suggest trying the CPU copy method first, as it might be fast enough for your needs. The host-device copy is getting faster with the latest hardware. You might be able to get real-time 60 fps or higher even with this copy, unless you plan to do this for a lot of textures.
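To make the CPU copy method a bit more concrete, here is a minimal sketch of a host-side "surface" that remembers which rectangle was touched, so that only that region needs to be sent across the bus each frame. The Surface and Rect types and all their member names are made up for illustration; the actual upload is only shown as a comment because it needs a live GL context, and glTexSubImage2D requires OpenGL 1.1 or later rather than 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types: a rectangle and a CPU-side pixel buffer with
// dirty-region tracking. Neither is part of OpenGL, SDL or Game Maker.
struct Rect { int x, y, w, h; };

class Surface {
public:
    Surface(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height, 0u) {
        clearDirty();
    }

    // Write one RGBA pixel and grow the dirty rectangle to include it.
    void setPixel(int x, int y, std::uint32_t rgba) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        pixels_[static_cast<std::size_t>(y) * width_ + x] = rgba;
        minX_ = std::min(minX_, x); maxX_ = std::max(maxX_, x);
        minY_ = std::min(minY_, y); maxY_ = std::max(maxY_, y);
    }

    bool dirty() const { return maxX_ >= minX_; }

    // Bounding box of everything modified since the last clearDirty().
    Rect dirtyRect() const {
        if (!dirty()) return Rect{0, 0, 0, 0};
        return Rect{minX_, minY_, maxX_ - minX_ + 1, maxY_ - minY_ + 1};
    }

    // Copy the dirty rectangle into a tightly packed buffer that could be
    // handed to an upload call as-is.
    std::vector<std::uint32_t> packDirty() const {
        const Rect r = dirtyRect();
        std::vector<std::uint32_t> out;
        out.reserve(static_cast<std::size_t>(r.w) * r.h);
        for (int row = 0; row < r.h; ++row) {
            const std::uint32_t* src =
                &pixels_[static_cast<std::size_t>(r.y + row) * width_ + r.x];
            out.insert(out.end(), src, src + r.w);
        }
        return out;
    }

    // Call after uploading, so the next frame starts with an empty region.
    void clearDirty() { minX_ = width_; minY_ = height_; maxX_ = -1; maxY_ = -1; }

private:
    int width_, height_;
    std::vector<std::uint32_t> pixels_;
    int minX_, minY_, maxX_, maxY_;
};

// Per frame, with a texture bound (assumes OpenGL 1.1+):
//   if (s.dirty()) {
//       Rect r = s.dirtyRect();
//       std::vector<std::uint32_t> p = s.packDirty();
//       glTexSubImage2D(GL_TEXTURE_2D, 0, r.x, r.y, r.w, r.h,
//                       GL_RGBA, GL_UNSIGNED_BYTE, p.data());
//       s.clearDirty();
//   }
```

A single bounding box is the simplest choice; if edits are scattered far apart it can cover most of the image, in which case uploading the whole texture costs about the same.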
Related
I have a texture in Unity which I will modify frequently.
Now there are two options:
I can make changes to the texture by calling SetPixels and then Texture2D.Apply. I think Apply actually copies the data from the CPU to the GPU.
Alternatively, I can modify the texture in native code by getting the texture's native handle and modifying it with the glTexSubImage2D function.
I have read that Apply copies only the changed pixels to the GPU, not the full texture, but I really doubt that is possible. If it is true, does this mean that calling Texture2D.Apply is equivalent to glTexSubImage2D in terms of performance?
If not, what should I use if I need good performance? I actually don't want to go to the native side, as I would have to manage native code for the different graphics APIs supported by Unity, such as OpenGL, DX, etc.
Texture2D.Apply() and glTexSubImage2D are both used to update a texture. They perform the same action, but there are differences between them.
GetPixels, SetPixels and Texture2D.Apply() are done on the CPU.
You should only use GetPixels, SetPixels and Texture2D.Apply() if you need access to individual pixels. A good example of this is when you want to send the texture data over the network.
glTexSubImage2D is done on the GPU and does not require SetPixels or GetPixels.
glTexSubImage2D is much faster than GetPixels, SetPixels and Texture2D.Apply().
If not, what should I use if I need good performance. I actually dont want to go to native side as I will have to manage the native code on for different graphics APIs supported by Unity like opengl,
You mentioned that you will be modifying the image frequently, so do not use GetPixels, SetPixels and Texture2D.Apply(). I know it is the easiest solution, but it is very slow.
For the best performance:
1. Use glTexSubImage2D
Pass Texture.GetNativeTexturePtr() to the native C++ side as an IntPtr, then use glTexSubImage2D to modify the texture directly. I noticed that most of your questions are about C++ and OpenGL, so this shouldn't be hard for someone like you.
As for supporting different graphics APIs, the first to support is OpenGL because that's supported on all major platforms. From the Editor, change the Graphics API to OpenGL then start coding. It should work on Windows, Mac, Linux, Android and iOS. If you want to support Direct3D, Metal and Vulkan then go for them too. You just don't have to. OpenGL is enough for this.
2. Use Shaders
You can combine Unity shaders and compute shaders and still get more performance than glTexSubImage2D, because the work then happens on the GPU instead of the CPU. I personally find shaders complicated, so #1 should be your priority.
Yes, glTexSubImage2D can be used to update a smaller rectangular portion of a larger texture.
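For anyone unsure what "a smaller rectangular portion" means in terms of memory, below is a CPU-only reference model of the operation. It is not an OpenGL call: updateSubImage and its parameter names are hypothetical, chosen to mirror the xoffset/yoffset/width/height arguments of glTexSubImage2D, and it assumes one 32-bit value per texel with tightly packed source rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference model (hypothetical helper): copy a w x h block of texels from
// `src` into a texW x texH texture at (xoffset, yoffset).
// Returns false where the real call would raise an error for a region that
// does not fit inside the texture.
bool updateSubImage(std::vector<std::uint32_t>& texture, int texW, int texH,
                    int xoffset, int yoffset, int w, int h,
                    const std::vector<std::uint32_t>& src) {
    if (texW < 0 || texH < 0 || xoffset < 0 || yoffset < 0 || w < 0 || h < 0)
        return false;
    if (xoffset + w > texW || yoffset + h > texH)
        return false;
    if (texture.size() != static_cast<std::size_t>(texW) * texH)
        return false;
    if (src.size() < static_cast<std::size_t>(w) * h)
        return false;

    for (int row = 0; row < h; ++row) {
        // Source rows are packed; destination rows are texW texels apart.
        const std::uint32_t* from = src.data() + static_cast<std::size_t>(row) * w;
        std::uint32_t* to = texture.data()
                          + static_cast<std::size_t>(yoffset + row) * texW + xoffset;
        std::copy(from, from + w, to);
    }
    return true;
}
```

Texels outside the rectangle are left untouched, which is why updating a small region is cheaper than re-specifying the whole image.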
This is probably a stupid question, but I can't find good examples of how to approach this, or whether it's even possible. I've just finished a project where I used GDI to BitBlt stuff onto a DIB buffer and then swap that onto the screen HDC, basically making my own swap chain and drawing with OpenGL.
So then I thought, can I do the same thing using DirectX 11? But I can't seem to find where the DIB/buffer I need to change even is.
Am I even thinking about this correctly? Any ideas on how to handle this?
Yes, you can. NVIDIA exposes vendor-specific extensions called NV_DX_interop and NV_DX_interop2. With these extensions, you can directly access a DirectX surface (when it resides on the GPU) and render to it from an OpenGL context. There should be minimal (driver-only) overhead for this operation, and the CPU will almost never be involved.
Note that while this is a vendor-specific extension, Intel GPUs support it as well.
However, don't do this simply for the fun of it or if you control all the source code for your application. This kind of interop scenario is meant for cases where you have two legacy/complicated codebases and interop is a cheaper/better option than porting all the logic to the other API.
Yeah, you can do it: both OpenGL and D3D support writable textures and locking them to get at the pixel data.
Simply render your scene in OpenGL to a texture, lock it, read the pixel data and pass it directly into the locked D3D texture's pixel data, unlock it, then do whatever you want with the texture.
Performance would be dreadful, of course: you're stalling the GPU multiple times in a single "operation" and forcing it to synchronize with the CPU (which is passing the data) and the bus (for memory access). Plus there would be absolutely no benefit at all. But if you really want to try it, you can do it.
I think at least some old graphics drivers used to crash if glClear wasn't used, and glClear is probably faster in a lot of cases, but why? How are 3D graphics drivers usually implemented such that these uses would have different results?
On a high level, it can be faster because the OpenGL implementation knows ahead of time that the whole buffer needs to be set to the same color/value. The more you know about what exactly needs to be done, the more you can take advantage of possible accelerations.
Let's say setting a whole buffer to the same value is more efficient than setting the same pixels to variable values. With a glClear(), you know already that all pixels will have the same value. If you draw a screen sized quad with a fragment shader that emits a constant color, the driver would either have to recognize that situation by analyzing the shaders, or the system would have to compare the values coming out of the shader, to know that all pixels have the same value.
The reason why setting everything to the same value can be more efficient has to do with framebuffer compression and related technologies. GPUs often don't actually write each pixel out to the framebuffer, but use various kinds of compression schemes to reduce the memory bandwidth needed for framebuffer writes. If you imagine almost any kind of compression, all pixels having the same value is very favorable.
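As a toy illustration of that last point, here is a run-length encoder applied to a buffer of pixels. This is emphatically not how any real GPU compresses its framebuffer (the vendor schemes are block-based and mostly unpublished); runLengthEncode is a made-up helper that only shows why "all pixels have the same value" is the best possible input for a compressor.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy run-length encoder: each entry is (pixel value, number of repeats).
// Purely illustrative; real framebuffer compression works differently.
std::vector<std::pair<std::uint32_t, std::size_t>>
runLengthEncode(const std::vector<std::uint32_t>& pixels) {
    std::vector<std::pair<std::uint32_t, std::size_t>> runs;
    for (std::uint32_t p : pixels) {
        if (!runs.empty() && runs.back().first == p) {
            ++runs.back().second;                  // extend the current run
        } else {
            runs.emplace_back(p, std::size_t{1});  // start a new run
        }
    }
    return runs;
}
```

A cleared 1920x1080 buffer collapses to a single (color, count) entry, while a buffer in which every neighboring pixel differs produces one entry per pixel. A clear tells the hardware up front that it is in the first case; a full-screen quad does not.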
To give you some ideas about the published vendor specific technologies, here are a few sources. You can probably find more with a search.
Article talking about new framebuffer compression method in relatively recent AMD cards: http://techreport.com/review/26997/amd-radeon-r9-285-graphics-card-reviewed/2.
NVIDIA patent on zero bandwidth clears: http://www.google.com/patents/US8330766.
Blurb on ARM web site about Mali framebuffer compression: http://www.arm.com/products/multimedia/mali-technologies/arm-frame-buffer-compression.php.
Why is it faster? Because it is a function that bypasses most calculations that other types of drawings have to go through.
Alpha function, blend function, logical operation, stenciling, texture mapping, and depth-buffering are ignored by glClear
Source
Why do some drivers crash without it? It's hard to say, but it probably has something to do with the implementation details of OpenGL. The function does what it's supposed to do, but might do more that you don't know about.
OpenGL might infer from this function call other tasks that it needs to perform.
The purpose here isn't rendering, but gpgpu; it's for image blurring:
given an image, I need to blur it with a fixed given separable kernel (see e.g. Separable 2D Blur Kernel).
For GPU processing, a good popular method would be to first filter the lines, then filter the columns; and using the vertex shader and the fragment shader to do so (*)
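For reference, the "lines first, then columns" method (*) can be sketched on the CPU as below. This is only meant to pin down what the two shader passes compute, not to be fast. The function names are made up, the image is assumed to be single-channel float in row-major order, the kernel is assumed to have odd length, and borders are handled by clamping to the edge (similar in spirit to GL_CLAMP_TO_EDGE sampling).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One 1-D convolution pass over a w x h single-channel image.
// horizontal == true filters along rows, false filters along columns.
// Assumes kernel.size() is odd; out-of-range taps are clamped to the edge.
std::vector<float> blurPass(const std::vector<float>& img, int w, int h,
                            const std::vector<float>& kernel, bool horizontal) {
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = horizontal ? std::min(std::max(x + k, 0), w - 1) : x;
                const int sy = horizontal ? y : std::min(std::max(y + k, 0), h - 1);
                sum += kernel[static_cast<std::size_t>(k + radius)]
                     * img[static_cast<std::size_t>(sy) * w + sx];
            }
            out[static_cast<std::size_t>(y) * w + x] = sum;
        }
    }
    return out;
}

// Separable blur: rows first, then columns, using the same 1-D kernel.
std::vector<float> separableBlur(const std::vector<float>& img, int w, int h,
                                 const std::vector<float>& kernel) {
    return blurPass(blurPass(img, w, h, kernel, true), w, h, kernel, false);
}
```

With a kernel of n taps this costs 2n reads per pixel instead of the n*n a full 2-D kernel would need, which is the reason for splitting it into two passes in the first place.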
However, if I have a fixed-size kernel, I think I can use a quickly computed mipmap that is close to the level I want, and then upsample it (as was suggested here).
The question is therefore: will an opengl-created mipmap be faster than a mipmap I create myself using the method of (*)?
Put another way: is mipmap creation optimized on the GPU itself? Will it always outperform (speed-wise) user-written GLSL code, or would it depend on the graphics card?
Edit:
Thanks for the replies (Kahler, Jean-Simon Brochu). However, I still haven't seen any resources that explicitly say whether mipmap generation by the GPU is faster than any user-created mipmaps because of dedicated mipmap-generation hardware...
OpenGL does not care how the functions are implemented.
OpenGL is a set of specifications, and glGenerateMipmap is one of the functions it specifies.
Anyone can write a software renderer or develop a video card compliant with the specification. If it passes the tests, it's ~OpenGL certified~
That means no function is required to be performed on the CPU, on the GPU, or anywhere in particular; it just has to produce the results OpenGL expects.
Now for the practical side:
Nowadays, you can just assume the mipmap generation is done by the video card, because the major vendors have adopted this approach.
If you really want to know, you will have to check the specific video card you are programming for.
As for performance, assume you can't beat the video card.
Even if you come up with some highly optimized code running on some high-tech, full-featured CPU, you will have to upload the mipmaps you generated to the GPU, and this operation alone will probably take more time than letting the GPU do the work after you've uploaded the full-resolution texture.
And if you program the mipmapping as a shader, it is still unlikely to beat the hard-coded (maybe even hard-wired) built-in function. (And that is the code alone, not counting the fact that the built-in may schedule better, run as a separate process, etc.)
This site explains the glGenerateMipmap history better =))
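To show what a user-written version would have to compute for each level, here is a minimal CPU sketch of one mip step using a 2x2 box filter. The box filter is an assumption made for illustration: the filter a driver actually applies in glGenerateMipmap is up to the implementation. downsample2x2 is a hypothetical helper, and it assumes a single-channel float image with even width and height.

```cpp
#include <cstddef>
#include <vector>

// One mip level: average each 2x2 block of the source into one texel.
// Assumes w and h are even; the result is (w/2) x (h/2), row-major.
std::vector<float> downsample2x2(const std::vector<float>& src, int w, int h) {
    const int ow = w / 2;
    const int oh = h / 2;
    std::vector<float> out(static_cast<std::size_t>(ow) * oh);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            const std::size_t top = static_cast<std::size_t>(2 * y) * w + 2 * x;
            const std::size_t bottom = top + static_cast<std::size_t>(w);
            out[static_cast<std::size_t>(y) * ow + x] =
                0.25f * (src[top] + src[top + 1] + src[bottom] + src[bottom + 1]);
        }
    }
    return out;
}
```

Repeating this until the image is 1x1 gives the full chain. Doing it on the CPU means every one of those levels then has to be uploaded, which is the extra cost described above.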
To what extent does OpenGL's GLSL utilize SLI setups? Is it utilized at all at the point of execution or only for end rendering?
Similarly, I know that OpenCL is alien to SLI but assuming one has several GPUs, how does it compare to GLSL in multiprocessing?
Since it might depend on the application, e.g. common transformation, or ray tracing, can you offer insight on differences depending on application type?
The goal of SLI is to divide the rendering workload across several GPUs. First, the graphics driver uses either a sort-first or a time-decomposition approach (GPU0 works on frame n while GPU1 works on frame n+1). Then the pixels are copied from one GPU to the other.
That said, SLI has nothing to do with the shading language used by OpenGL (the way the pixels are drawn doesn't really matter).
For OpenCL, I would say that you have to divide your workload between the GPUs yourself, but I am not sure.
If you want to take advantage of multiple GPUs with OpenCL, you will have to create command queues for each device and run kernels on each device after splitting up the workload.
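The "splitting up the workload" part is plain arithmetic and can be sketched without any OpenCL at all. Below, Range and splitWork are made-up names; the idea is that each device gets a contiguous [begin, end) slice of the work items, and each slice would then be enqueued on that device's own command queue (for example through the global work offset and global work size arguments of clEnqueueNDRangeKernel). An even split assumes the devices are roughly equally fast, which may not hold for your hardware.

```cpp
#include <cstddef>
#include <vector>

// Half-open range of work items assigned to one device.
struct Range { std::size_t begin, end; };

// Split itemCount work items into deviceCount contiguous ranges whose sizes
// differ by at most one. Returns an empty vector if there are no devices.
std::vector<Range> splitWork(std::size_t itemCount, std::size_t deviceCount) {
    std::vector<Range> ranges;
    if (deviceCount == 0) return ranges;
    const std::size_t base = itemCount / deviceCount;
    const std::size_t extra = itemCount % deviceCount;  // first `extra` devices get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < deviceCount; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.push_back(Range{begin, begin + len});
        begin += len;
    }
    return ranges;
}
```

For example, 10 items over 3 devices gives the slices [0, 4), [4, 7) and [7, 10). Each device also needs its own copy of (or access to) the input data for its slice, which is usually where the real effort goes.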
See http://developer.nvidia.com/object/sli_best_practices.html
Basically, you have to instruct the driver that you want to use SLI, and in which mode. After this, the driver will (almost) seamlessly do all the work for you.
Alternate Frame Rendering : no sync needed, so better performance, but more lag
Split Frame Rendering : lots of sync, some vertices are processed twice, but less lag.
As for your GLSL vs OpenCL comparison, I don't know of any good benchmark. I'd be interested, though.