What does gl.glTexImage2D do? The docs say it "uploads texture data". But does this mean the whole image is in GPU memory? I'd like to use one large image file for texture mapping. Further: can I simply use a VBO for uv and position coordinates to draw the texture?
Right, I am using words the wrong way here. What I meant was carrying a 2D array of UV coordinates and a 2D array of model to subsample a larger PNG image (in texture memory) onto individual tile models. My confusion here lies in not knowing how fast these fetches can take. Lets say I have a 5000x5000 pixel image. I load it as a texture. Then I create my own algorithm for fetching portions of it to draw. Where do I save myself the bandwidth for drawing these tiles? If I implement an LOD algorithm to determine which tiles are close, which are far and which are out of the camera frustum how do manage each these tiles in memory? Loaded question I know but I am struggling to find the best implementation to get started. I am developing for mobile devices with OpenGL ES 2.0.
What exactly happens when you call glTexImage2D() is system dependent, and there's no way for you to know, unless you have developer tools that allow you to track GPU and memory usage.
The only thing guaranteed is that the data you pass to the call has been consumed by the time the call returns (since the API definition allows you to modify/free the data after the call), and that the data is accessible to the GPU when it's used for rendering. Between that, anything is fair game. Keep in mind that OpenGL is a very asynchronous API. When you make API calls, the corresponding work is mostly queued up for later execution by the GPU, and is generally not completed by the time the calls return. This can include calls for uploading data.
Also, not all GPUs have "GPU memory". In fact, if you look at them by quantity, very few of them do. Mobile GPUs have caches, but mostly not VRAM in the sense of traditional discrete GPUs. How VRAM and caches are managed is highly system dependent.
With all the caveats above, and picturing a GPU that has VRAM: While it's possible that they can load the data into VRAM in the glTexImage2D() call, I would be surprised if that was commonly done. It just wouldn't make much sense to me. When a texture is loaded, you have no idea how soon it will be used for rendering. Since you don't know if all textures will fit in VRAM (and they often will not), you might have to evict it from VRAM before it was ever used. Which would obviously be very wasteful. As a general strategy, I think it will be much more efficient to load the texture data into VRAM only when you have a draw call that uses it.
Things would be somewhat different if the driver could be very confident that all texture data will fit in VRAM. But with OpenGL, there's really no reasonable way to know this ahead of time. And things get even more complicated since at least on desktop computers, you can have multiple applications running at the same time, while VRAM is a shared resource.
You are correct.
glteximage2d is the function that actually moves the texture data across to the gpu.
you will need to create the texture object first using glGenTextures() and then bind it using glBindTexture().
there is a good example of this process in the opengl redbook
example
you can then use this texture with a VBO. There are many ways to accomplish this, but interleaving your vertex coordinates, texture coordinates, and vertex normals and then telling the GPU how to unpack them with several calls to glVertexAttribPointer is the best bet as far as performance.
you are on the right track with VBOs, the old fixed pipeline GL stuff is depricated so you should just learn VBO from the outset.
this book is not 100% up to date, but it is complete and free and should serve as a great place to start learning VBO Open GL Book
Related
I've successfully updated my rendering engine to use uniform buffer objects and instancing.
The problem is that, since I do a first frustum culling pass every frame in order to know the objects I need to draw, I have to update the buffers every frame because the objects I draw could change every time, and this isn't the most efficient thing.
How could I make this more efficient?
The only thing I could think of is to not do the frustum culling so all the buffers remain static and I don't need to update them all the times, but not doing frustum culling I'd end up draw a lot of unnecessary objects.
Updating uniform buffers is fairly cheap, to be honest. You are quite limited in size and that prevents you from doing anything too crazy.
What you need to focus on to make this efficient is actually accommodating incomplete commands that are queued up. You are more likely to run into problems where the driver/GPU is forced to stop working on the next frame/command due to poor data write patterns than you are to run into data transfer rate limitations. The problem is always going to be avoiding situations where you might write to portions of data that are still in use by the GPU (it is often working on data 1-2 frames behind the CPU).
You have multiple options depending on your target version, and the OpenGL Wiki has a general overview of buffer streaming approaches.
You will have to do some performance testing to say for sure, but I suspect that CPU-side frustum culling combined with buffer orphaning of your instance UBO will give good results. Rather than reusing any data from previous frames, you would just stream the entire instance UBO from CPU to GPU each frame and let the GPU discard the old UBO when it finishes each frame.
My question concerns the most efficient way of performing geometric image transformations on the GPU. The goal is essentially to remove lens distortion from aquired images in real time. I can think of several ways to do it, e.g. as a CUDA kernel (which would be preferable) doing an inverse transform lookup + interpolation, or the same in an OpenGL shader, or rendering a forward transformed mesh with the image texture mapped to it. It seems to me the last option could be the fastest because the mesh can be subsampled, i.e. not every pixel offset needs to be stored but can be interpolated in the vertex shader. Also the graphics pipeline really should be optimized for this. However, the rest of the image processing is probably going to be done with CUDA. If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering, or can this be achieved anyway through the CUDA/OpenGL interop somehow? The aim is not to display the image, the processing will take place on a server, potentially with no display attached. I've heard this could crash OpenGL if bringing up a window.
I'm quite new to GPU programming, any insights would be much appreciated.
Using the forward transformed mesh method is the more flexible and easier one to implement. However performance wise there's no big difference, as the effective limit you're running into is memory bandwidth, and the amount of memory bandwidth consumed does only depend on the size of your input image. If it's a fragment shader, fed by vertices or a CUDA texture access that's causing the transfer doesn't matter.
If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering,
On Windows: Yes, but the window can be an invisible one.
On GLX/X11 you need an X server running, but you can use a PBuffer instead of a window to get a OpenGL context.
In either case use a Framebuffer Object as the actual drawing destination. PBuffers may corrupt their primary framebuffer contents at any time. A Framebuffer Object is safe.
or can this be achieved anyway through the CUDA/OpenGL interop somehow?
No, because CUDA/OpenGL interop is for making OpenGL and CUDA interoperate, not make OpenGL work from CUDA. CUDA/OpenGL Interop helps you with the part you mentioned here:
However, the rest of the image processing is probably going to be done with CUDA.
BTW; maybe OpenGL Compute Shaders (available since OpenGL-4.3) would work for you as well.
I've heard this could crash OpenGL if bringing up a window.
OpenGL actually has no say in those things. It's just a API for drawing stuff on a canvas (canvas = window or PBuffer or Framebuffer Object), but it doesn't deal with actually getting a canvas on the scaffolding, so to speak.
Technically OpenGL doesn't care if there's a window or not. It's the graphics system on which the OpenGL context is created. And unfortunately none of the currently existing GPU graphics systems supports true headless operation. NVidia's latest Linux drivers may allow for some crude hacks to setup a truly headless system, but I never tried that, so far.
I'm kind of stuck on the logic behind an SDL2 texture. To me, they are pointless since you cannot draw to them.
In my program, I have several surfaces (or what were surfaces before I switched to SDL2) that I just blitted together to form layers. Now, it seems, I have to create several renderers and textures to create the same effect since SDL_RenderCopy takes a texture pointer.
Not only that, but all renderers have to come from a window, which I understand, but still fouls me up a bit more.
This all seems extremely bulky and slow. Am I missing something? Is there a way to draw directly to a texture? What are the point of textures, and am I safe to have multiple (if not hundreds) of renderers in place of what were surfaces?
SDL_Texture objects are stored as close as possible to video card memory and therefore can easily be accelerated by your GPU. Resizing, alpha blending, anti-aliasing and almost any compute-heavy operation can harshly be affected by this performance boost. If your program needs to run a per-pixel logic on your textures, you are encouraged to convert your textures into surfaces temporarily. Achieving a workaround with streaming textures is also possible.
Edit:
Since this answer recieves quite the attention, I'd like to elaborate my suggestion.
If you prefer to use Texture -> Surface -> Texture workflow to apply your per-pixel operation, make sure you cache your final texture unless you need to recalculate it on every render cycle. Textures in this solution are created with SDL_TEXTUREACCESS_STATIC flag.
Streaming textures (creation flag is SDL_TEXTUREACCESS_STREAMING) are encouraged for use cases where source of the pixel data is network, a device, a frameserver or some other source that is beyond SDL applications' full reach and when it is apparent that caching frames from source is inefficient or would not work.
It is possible to render on top of textures if they are created with SDL_TEXTUREACCESS_TARGET flag. This limits the source of the draw operation to other textures although this might already be what you required in the first place. "Textures as render targets" is one of the newest and least widely supported feature of SDL2.
Nerd info for curious readers:
Due to the nature of SDL implementation, the first two methods depend on application level read and copy operations, though they are optimized for suggested scenarios and fast enough for realtime applications.
Copying data from application level is almost always slow when compared to post-processing on GPU. If your requirements are more strict than what SDL can provide and your logic does not depend on some outer pixel data source, it would be sensible to allocate raw OpenGL textures painted from you SDL surfaces and apply shaders (GPU logic) to them.
Shaders are written in GLSL, a language which compiles into GPU assembly. Hardware/GPU Acceleration actually refers to code parallelized on GPU cores and using shaders is the prefered way to achieve that for rendering purposes.
Attention! Using raw OpenGL textures and shaders in conjunction with SDL rendering functions and structures might cause some unexpected conflicts or loss of flexibility provided by the library.
TLDR;
It is faster to render and operate on textures than surfaces although modifying them can sometimes be cumborsome.
Through creating a SDL2 Texture as a STREAMING type, one can lock and unlock the entire texture or just an area of pixels to perform direct pixel operations. One must create prior a SDL2 Surface, and link with lock-unlock as follows:
SDL_Surface surface = SDL_CreateSurface(..);
SDL_LockTexture(texture, &rect, &surface->pixels, &surface->pitch);
// paint into surface pixels
SDL_UnlockTexture(texture);
The key is, if you draw to texture of larger size, and the drawing is incremental ( e.g. data graph in real time ) be sure to only lock and unlock the actual area to update. Otherwise the operations will be slow, with heavy memory copying.
I have experienced reasonable performance and the usage model is not too difficult to understand.
In SDL2 it is possible to render off-screen / render directly to a texture. The function to use is:
int SDL_SetRenderTarget(SDL_Renderer *renderer, SDL_Texture *texture);
This only works if the renderer enables SDL_RENDERER_TARGETTEXTURE.
I'm rolling my very first game engine :D. I'm working on the texture resource manager now, and I want to do it right.
Is it bad in any way to just fill up all of the ActiveTexture units that the driver supports? The alternative would be to conserve these slots and only set textures when they are actually needed, at the expense of more glBindTexture calls.
The way you asked your question I think you suffer from a misconception between texture objects i.e. the texture storage, and texture units i.e. the machinery behind multitexturing.
OpenGL has texture object and texture units. Texture objects hold the data, texture units map the data of the texture object that's bound to them into the rendering process.
Usually one uploads all the textures needed for a scene into texture objects. And for each render batch that makes use of common material settings binds the textures to the right texture units in the rendering process.
I think it's also to be noted if you're going to be using tons of textures, i.e uploading to the GPU you need to consider an approach to minimize the VRAM usage. In any case what I think might be a benifit on your behalf is stb_dxt which is a DXT1/DXT5 compressor wrote in ansi C. You can compress your textures and upload them using CompressedTexture2D, this way all textures you upload to the GPU will take 1/8th their normal space.
At some point compressing all the textures at runtime will slow down the loading of your game, which is why I would suggest using DDS textures, you can use nvidia-texture tools for converting / managing compression of your textures externally.
So in a OpenGL rendering application, is it usually better to create and maintain a vertex buffer throughout the life of an application and just swap out the data every frame with glBufferData, or is it better to just delete the VBO and recreate it every frame?
Intuition tells me it's better to swap out data, but a few sample programs I've seen does the latter, so I'm kind of confused.
I read Nvidia's whitepaper on VBOs, but as I'm a newbie to opengl, it didn't make a whole lot of sense.
Thanks in advance for and advice
Since you're generating a whole new set of data each frame the documentation seems to indicate that GL_STREAM_DRAW is the Right Way to go about things.
The significant thing about VBOs is that they are render data buffers that are stored in graphics memory and not in the computer's main memory. That makes their usage very efficient when the data in them isn't being updated (too) frequently, because every time you do that, the computer will have to transfer (potentially huge amounts of) data from main to graphics memory - which is slow.
So the ideal case is to put all required render data into VBOs once and then only manipulate them via OpenGL functions like matrix transformation or via shaders.
So you would e.g. put each mesh's world space coordinates and texture coordinates into VBOs and never directly touch them again; you'd use the modelview matrix, lighting functions and shaders to render them.
You can do more to optimize VBO usage, but that's the basics as I have understood them.
Find some good hints and more details here: How do I use OpenGL 3.x VBOs to render a dynamic world?