What does OpenGL Bindless Texture function glMakeTextureHandleNonResident ACTUALLY do?

I have a working prototype that tests bindless textures. I have a camera that pans over 6 GB of textures, while I only have 2 GB of VRAM. An inner frustum is used to get the list of objects in the viewport for rendering, and an outer frustum is used to queue in (make resident) the textures that will soon be rendered. All other textures, if they are resident, are made non-resident using the function glMakeTextureHandleNonResident.
The program runs, but the GPU's VRAM behaves as if it has a GC step that clears VRAM at random intervals. When this happens, my rendering freezes completely, then skips to the proper frame, eventually getting back up to 60 FPS. I suspect that glMakeTextureHandleNonResident doesn't actually pull the texture out of VRAM "when" it is called. Does anyone know EXACTLY what the GPU is doing with that call?
GPU: Nvidia 750GT M

Bindless textures essentially expose a translation table on the hardware so that you can reference textures using an arbitrary integer (handle) in a shader rather than GL's traditional bind-to-image-unit mechanics; they don't allow you to directly control GPU memory residency.
Sparse textures actually sound more like what you want. Note that both of these things can be used together.
Making a handle non-resident does not necessarily evict the texture memory from VRAM; it just removes the handle from said translation table. The driver can defer eviction of the texture memory until some future time, exactly as you have discovered.
You can read more about this in the extension specification for GL_ARB_bindless_texture.
void glMakeImageHandleResidentARB (GLuint64 handle, GLenum access):
"When an image handle is resident, the texture it references is not necessarily considered resident for the purposes of the AreTexturesResident command."
(18) Texture and image handles may be made resident or non-resident. How
does handle residency interact with texture residency queries from
OpenGL 1.1 (glAreTexturesResident or GL_TEXTURE_RESIDENT)?
The residency state for texture and image handles in this
extension is completely independent from OpenGL 1.1's GL_TEXTURE_RESIDENT
query. Residency for texture handles is a function of whether the
glMakeTextureHandleResidentARB has been called for the handle. OpenGL 1.1
residency is typically a function of whether the texture data are
resident in GPU-accessible memory.
When a texture handle is not made resident, the texture that it refers
to may or may not be stored in GPU-accessible memory. The
GL_TEXTURE_RESIDENT query may return GL_TRUE in this case. However, it does
not guarantee that the texture handle may be used safely.
When a texture handle is made resident, the texture that it refers to is
also considered resident for the purposes of the old GL_TEXTURE_RESIDENT
query. When an image handle is resident, the texture that it refers to
may or may not be considered resident for the query -- the resident
image handle may refer only to a single layer of a single mipmap level
of the full texture.
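The inner/outer frustum bookkeeping from the question can be sketched on the application side as a simple set difference. This is a minimal sketch, not the asker's code: the class name HandleResidency is hypothetical, and the two callbacks stand in for glMakeTextureHandleResidentARB and glMakeTextureHandleNonResidentARB so the logic can run without a GL context. It only avoids redundant residency calls; as explained above, it does not control when the driver actually evicts texture memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Application-side record of which bindless handles we have asked the
// driver to make resident. It says nothing about where the texel data lives.
class HandleResidency {
public:
    using Call = std::function<void(std::uint64_t)>;

    // The callbacks are placeholders for the real GL entry points.
    HandleResidency(Call makeResident, Call makeNonResident)
        : makeResident_(std::move(makeResident)),
          makeNonResident_(std::move(makeNonResident)) {
        assert(makeResident_ && makeNonResident_);
    }

    // 'wanted' is the set of handles inside the outer frustum this frame.
    void update(const std::unordered_set<std::uint64_t>& wanted) {
        // Collect handles that left the outer frustum, then drop them.
        std::vector<std::uint64_t> toDrop;
        for (std::uint64_t h : resident_)
            if (wanted.count(h) == 0) toDrop.push_back(h);
        for (std::uint64_t h : toDrop) {
            makeNonResident_(h);
            resident_.erase(h);
        }
        // Make newly wanted handles resident; skip ones already resident.
        for (std::uint64_t h : wanted)
            if (resident_.insert(h).second) makeResident_(h);
    }

    std::size_t residentCount() const { return resident_.size(); }

private:
    Call makeResident_;
    Call makeNonResident_;
    std::unordered_set<std::uint64_t> resident_;
};
```

Calling update() once per frame with the outer-frustum set issues one residency call per handle that actually changed state, and none for handles that stay in view.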


How to access framebuffer from CPU in Direct3D 11?

I am creating a simple framework for teaching fundamental graphics concepts under C++/D3D11. The framework is required to enable direct manipulation of the screen raster contents via a simple interface function (e.g. Putpixel( x,y,r,g,b )).
Under D3D9 this was a relatively simple goal achieved by allocating a surface buffer on the heap where the CPU would compose a surface. Then the backbuffer would be locked and the heap buffer's contents transferred to the backbuffer. As I understand it, it is not possible to access the backbuffer directly from the CPU under D3D11. One must prepare a texture resource and then draw it to the backbuffer via some fullscreen geometry.
I have considered two systems for such a procedure. The first comprises a D3D11_USAGE_DEFAULT texture and a D3D11_USAGE_STAGING texture. The staging texture is first mapped and then drawn to from the CPU. When the scene is complete, the staging texture is unmapped and copied to the default texture with CopyResource (which uses the GPU to perform the copy if I am not mistaken), and then the default texture is drawn to the backbuffer via a fullscreen textured quad.
The second system comprises a D3D11_USAGE_DYNAMIC texture and a frame buffer allocated on the heap. When the scene is composed, the dynamic texture is mapped, the contents of the heap buffer are copied over to the dynamic texture via the CPU, the dynamic texture is unmapped, and then it is drawn to the backbuffer via a fullscreen textured quad.
I was under the impression that textures created with read and write access and D3D11_USAGE_STAGING would reside in system memory, but the performance tests I have run seem to indicate that this is not the case. Namely, drawing a simple 200x200 filled rectangle via CPU is about 3x slower with the staging texture than with the heap buffer (exact same disassembly for both cases (a tight rep stos loop)), strongly hinting that the staging texture resides in the graphics adapter memory.
I would prefer to use the staging texture system, since it would allow both the work of rendering to the backbuffer and the work of copying from system memory to graphics memory to be offloaded onto the GPU. However, I would like to prioritize CPU access speed over such an ability in any case.
So which method would be optimal for this use case? Any hints, modifications of my two approaches, or suggestions of altogether different approaches would be greatly appreciated.
The dynamic and staging textures are both likely to be in system memory, but there is a good chance that your issue is write-combined memory. It is a cache mode in which single writes are coalesced together, but because the memory is uncached, every read pays the price of a full memory access. You have to be very careful, because even a C++ statement like *data = something; may sometimes lead to unwanted reads.
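One way to stay clear of reads from write-combined memory is to compose the frame in an ordinary heap buffer and touch the mapped pointer only with sequential, write-only row copies. The sketch below assumes the destination pointer and pitch come from a mapped resource (in D3D11 these would be the pData and RowPitch fields of D3D11_MAPPED_SUBRESOURCE); the function name uploadRows is hypothetical, and plain buffers are used here so the example runs without a device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a CPU-composed image into a mapped destination, one row at a time.
// The destination is only ever written, front to back, and never read.
// Pitches are in bytes and may be larger than the visible row width.
void uploadRows(std::uint8_t* dst, std::size_t dstPitch,
                const std::uint8_t* src, std::size_t srcPitch,
                std::size_t rowBytes, std::size_t height) {
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
}
```

All read-modify-write work (blending, clearing, rectangle fills) then happens in the cached heap buffer, and the mapped memory sees a single streaming pass per frame.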
There is nothing wrong with a dynamic texture, since the GPU can read system memory, but you need to be careful: create a few of them and cycle through them each frame with a map_nooverwrite, to avoid the costly driver buffer renaming that a discard triggers. And of course, never map for read and write, only for write, or you will introduce GPU/CPU synchronization and kill the parallelism.
Last, if you want a persistent surface and only a few putpixel calls per frame (or even a lot of them), I would go with an unordered access view and a compute shader that consumes a buffer of pixel positions and colors to update. That buffer would again be a dynamic buffer mapped with nooverwrite. With that solution, the main surface resides in video memory.
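The CPU side of that scheme amounts to queuing small update records instead of touching the surface. The following is a rough sketch under assumptions: the PixelUpdate layout, the PixelQueue class, and the 0xAARRGGBB packing are all hypothetical, and flush() is a CPU stand-in for what the compute shader would do with one thread per record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One queued write: position plus packed color (hypothetical layout).
struct PixelUpdate {
    std::uint32_t x, y, color;
};

class PixelQueue {
public:
    PixelQueue(std::uint32_t width, std::uint32_t height)
        : width_(width), height_(height) {}

    // Putpixel only appends a record; the surface is not touched here.
    void putPixel(std::uint32_t x, std::uint32_t y,
                  std::uint8_t r, std::uint8_t g, std::uint8_t b) {
        if (x >= width_ || y >= height_) return;  // clip off-surface writes
        std::uint32_t color = 0xFF000000u | (std::uint32_t(r) << 16) |
                              (std::uint32_t(g) << 8) | std::uint32_t(b);
        pending_.push_back({x, y, color});
    }

    // CPU stand-in for the compute pass: apply each record to the surface.
    void flush(std::vector<std::uint32_t>& surface) {
        assert(surface.size() == std::size_t(width_) * height_);
        for (const PixelUpdate& u : pending_)
            surface[std::size_t(u.y) * width_ + u.x] = u.color;
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::uint32_t width_, height_;
    std::vector<PixelUpdate> pending_;
};
```

In the real pipeline the pending records would be written into the dynamic buffer once per frame, so the amount of data crossing to the GPU scales with the number of putpixel calls rather than with the surface size.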
On a personal note, I would not even bother to teach CPU surface manipulation. It is almost always bad practice and a performance killer, and not the way to go on a modern GPU architecture. It was already not a fundamental graphics concept a decade ago.

OpenGL what does glTexImage2D do?

What does gl.glTexImage2D do? The docs say it "uploads texture data". But does this mean the whole image is in GPU memory? I'd like to use one large image file for texture mapping. Further: can I simply use a VBO for uv and position coordinates to draw the texture?
Right, I am using words the wrong way here. What I meant was carrying a 2D array of UV coordinates and a 2D array of model positions to subsample a larger PNG image (in texture memory) onto individual tile models. My confusion here lies in not knowing how fast these fetches can be. Let's say I have a 5000x5000 pixel image. I load it as a texture. Then I create my own algorithm for fetching portions of it to draw. Where do I save myself the bandwidth for drawing these tiles? If I implement an LOD algorithm to determine which tiles are close, which are far, and which are outside the camera frustum, how do I manage each of these tiles in memory? It's a loaded question, I know, but I am struggling to find the best implementation to get started. I am developing for mobile devices with OpenGL ES 2.0.
What exactly happens when you call glTexImage2D() is system dependent, and there's no way for you to know, unless you have developer tools that allow you to track GPU and memory usage.
The only thing guaranteed is that the data you pass to the call has been consumed by the time the call returns (since the API definition allows you to modify/free the data after the call), and that the data is accessible to the GPU when it's used for rendering. Between that, anything is fair game. Keep in mind that OpenGL is a very asynchronous API. When you make API calls, the corresponding work is mostly queued up for later execution by the GPU, and is generally not completed by the time the calls return. This can include calls for uploading data.
Also, not all GPUs have "GPU memory". In fact, if you look at them by quantity, very few of them do. Mobile GPUs have caches, but mostly not VRAM in the sense of traditional discrete GPUs. How VRAM and caches are managed is highly system dependent.
With all the caveats above, and picturing a GPU that has VRAM: While it's possible that they can load the data into VRAM in the glTexImage2D() call, I would be surprised if that was commonly done. It just wouldn't make much sense to me. When a texture is loaded, you have no idea how soon it will be used for rendering. Since you don't know if all textures will fit in VRAM (and they often will not), you might have to evict it from VRAM before it was ever used. Which would obviously be very wasteful. As a general strategy, I think it will be much more efficient to load the texture data into VRAM only when you have a draw call that uses it.
Things would be somewhat different if the driver could be very confident that all texture data will fit in VRAM. But with OpenGL, there's really no reasonable way to know this ahead of time. And things get even more complicated since at least on desktop computers, you can have multiple applications running at the same time, while VRAM is a shared resource.
You are correct.
glTexImage2D is the function that actually moves the texture data across to the GPU.
You will need to create the texture object first using glGenTextures() and then bind it using glBindTexture().
There is a good example of this process in the OpenGL Red Book.
You can then use this texture with a VBO. There are many ways to accomplish this, but interleaving your vertex coordinates, texture coordinates, and vertex normals and then telling the GPU how to unpack them with several calls to glVertexAttribPointer is the best bet as far as performance goes.
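The interleaved layout can be described with a plain struct whose size gives the stride and whose member offsets give the per-attribute pointers. This is a minimal sketch: the Vertex struct, the constant names, and the attribute indices 0 to 2 mentioned in the comments are assumptions for illustration, and the GL calls appear only as comments so the block compiles without GL headers.

```cpp
#include <cassert>
#include <cstddef>

// One interleaved vertex: position, texture coordinate, normal.
struct Vertex {
    float position[3];
    float uv[2];
    float normal[3];
};

// Stride and byte offsets that the glVertexAttribPointer calls would use.
constexpr std::size_t kStride = sizeof(Vertex);
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kUvOffset = offsetof(Vertex, uv);
constexpr std::size_t kNormalOffset = offsetof(Vertex, normal);

static_assert(kStride == 8 * sizeof(float), "Vertex must be tightly packed");

// With a VBO bound, the last argument of glVertexAttribPointer is a byte
// offset into the buffer, passed as a pointer value.
inline const void* attribOffset(std::size_t offset) {
    assert(offset < sizeof(Vertex));
    return reinterpret_cast<const void*>(offset);
}

// Intended use (assumed attribute indices 0, 1, 2):
//   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, kStride, attribOffset(kPositionOffset));
//   glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, kStride, attribOffset(kUvOffset));
//   glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, kStride, attribOffset(kNormalOffset));
```

Deriving the numbers from the struct keeps the attribute setup in sync if a field is later added or reordered.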
You are on the right track with VBOs. The old fixed-pipeline GL functionality is deprecated, so you should just learn VBOs from the outset.
This book is not 100% up to date, but it is complete and free and should serve as a great place to start learning VBOs: Open GL Book

OpenGL texture management

I'm rolling my very first game engine :D. I'm working on the texture resource manager now, and I want to do it right.
Is it bad in any way to just fill up all of the ActiveTexture units that the driver supports? The alternative would be to conserve these slots and only set textures when they are actually needed, at the expense of more glBindTexture calls.
From the way you asked your question, I think you are conflating texture objects, i.e. the texture storage, with texture units, i.e. the machinery behind multitexturing.
OpenGL has texture objects and texture units. Texture objects hold the data; texture units map the data of the texture object that's bound to them into the rendering process.
Usually one uploads all the textures needed for a scene into texture objects. Then, for each render batch that shares common material settings, one binds the textures to the right texture units.
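To illustrate why batching by material matters, here is a small sketch that counts how many glBindTexture calls a draw loop would issue if it rebinds only when the texture changes, before and after sorting the batches by texture. The Batch struct and both function names are hypothetical, and a single texture per batch is assumed for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A draw batch that samples one texture object (hypothetical record).
struct Batch {
    std::uint32_t texture;  // texture object name from glGenTextures
    std::uint32_t mesh;     // whatever identifies the geometry
};

// Number of binds a loop would issue if it rebinds only on a change.
std::size_t countBinds(const std::vector<Batch>& batches) {
    std::size_t binds = 0;
    std::uint32_t bound = 0;  // glGenTextures never returns the name 0
    for (const Batch& b : batches) {
        assert(b.texture != 0);
        if (b.texture != bound) {
            bound = b.texture;
            ++binds;
        }
    }
    return binds;
}

// Groups batches that share a texture; stable so draw order is otherwise kept.
void sortByTexture(std::vector<Batch>& batches) {
    std::stable_sort(batches.begin(), batches.end(),
                     [](const Batch& a, const Batch& b) {
                         return a.texture < b.texture;
                     });
}
```

With batches that alternate between two textures, sorting reduces the bind count from one per batch to one per texture.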
I think it's also worth noting that if you're going to be using tons of textures, i.e. uploading them to the GPU, you need to consider an approach that minimizes VRAM usage. Something that might benefit you is stb_dxt, a DXT1/DXT5 compressor written in ANSI C. You can compress your textures and upload them using glCompressedTexImage2D; compared with uncompressed 32-bit RGBA, the textures then take 1/8th (DXT1) or 1/4 (DXT5) of their normal space.
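The arithmetic behind the savings is straightforward: DXT formats store each 4x4 block of texels in a fixed number of bytes, 8 for DXT1 and 16 for DXT5, whereas uncompressed 32-bit RGBA needs 64 bytes for the same block. The helper names below are hypothetical, and the sketch covers a single mip level only.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per 4x4 texel block for the two formats mentioned above.
constexpr std::size_t kDxt1BlockBytes = 8;
constexpr std::size_t kDxt5BlockBytes = 16;

// Size of one mip level of a block-compressed texture.
// Dimensions that are not multiples of 4 round up to whole blocks.
constexpr std::size_t blockCompressedBytes(std::size_t width, std::size_t height,
                                           std::size_t bytesPerBlock) {
    return ((width + 3) / 4) * ((height + 3) / 4) * bytesPerBlock;
}

// Size of the same level stored as uncompressed 32-bit RGBA.
constexpr std::size_t rgba8Bytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

static_assert(rgba8Bytes(4, 4) / blockCompressedBytes(4, 4, kDxt1BlockBytes) == 8,
              "DXT1 is 8:1 against RGBA8");
static_assert(rgba8Bytes(4, 4) / blockCompressedBytes(4, 4, kDxt5BlockBytes) == 4,
              "DXT5 is 4:1 against RGBA8");
```

For a 1024x1024 texture this works out to 4 MiB uncompressed, 1 MiB as DXT5, and 512 KiB as DXT1.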
At some point compressing all the textures at runtime will slow down the loading of your game, which is why I would suggest using DDS textures. You can use NVIDIA Texture Tools to convert and manage the compression of your textures externally.

How to create textures within GPU

Can anyone please tell me how to use hardware memory to create textures in OpenGL? Currently I'm running my game in windowed mode; do I need to switch to fullscreen to make use of the hardware?
If I can create textures in hardware, is there a limit on the number of textures (other than the hardware memory)? And how can I then cache my textures in hardware? Thanks.
This should be covered by almost all texture tutorials for OpenGL.
For every texture you first need a texture name. A texture name is like a unique index for a single texture. Every name points to a texture object that can have its own parameters, data, etc. glGenTextures is used to get new names. I don't know if there is any limit besides the uint range (2^32). If there is then you will probably get 0 for all new texture names (and a gl error).
The next step is to bind your texture (see glBindTexture). After that all operations that use or affect textures will use the texture specified by the texture name you used as parameter for glBindTexture. You can now set parameters for the texture (glTexParameter) and upload the texture data with glTexImage2D (for 2D textures). After calling glTexImage you can also free the system memory with your texture data.
For static textures all this has to be done only once. If you want to use the texture you just need to bind it again and enable texturing (glEnable(GL_TEXTURE_2D)).
The size (width/height) of a single texture is limited by GL_MAX_TEXTURE_SIZE. This is normally 4096, 8192 or 16384. It is also limited by the available graphics memory, because the texture has to fit into it together with other resources like the framebuffer or vertex buffers. All textures together can be bigger than the available memory, but then they will be swapped.
In most cases the graphics driver should decide which textures are stored in system memory and which in graphics memory. You can however give certain textures a higher priority with either glPrioritizeTextures or with glTexParameter.
I wouldn't worry too much about where textures are stored, because the driver normally does a very good job with that. Textures that are used often are also more likely to be stored in graphics memory. If you set a priority, that's just a "hint" to the driver about how important it is for the texture to stay on the graphics card. It's also possible that the priority is completely ignored. You can also check where textures currently are with glAreTexturesResident.
Usually when you talk about generating a texture on the GPU, you're not actually creating texture images and applying them like normal textures. The simpler and more common approach is to use fragment shaders to procedurally calculate the color of each pixel in real time, from scratch, for every single frame.
The canonical example for this is to generate a Mandelbrot pattern on the surface of an object, say a teapot. The teapot is rendered with its polygons and texture coordinates by the application. At some stage of the rendering pipeline every pixel of the teapot passes through the fragment shader which is a small program sent to the GPU by the application. The fragment shader reads the 2D texture coordinates and calculates the Mandelbrot set color of the 2D coordinates and applies it to the pixel.
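The per-pixel work such a fragment shader performs is the classic escape-time iteration. Below is a CPU sketch of that calculation in C++ rather than GLSL, so it can be run and checked directly; the function names and the mapping from texture coordinates to the complex plane are assumptions chosen for illustration, not taken from the tutorial or book mentioned in this answer.

```cpp
#include <cassert>

// Escape-time iteration for c = (cx, cy). Returns the iteration at which
// |z| exceeded 2, or maxIter if the point never escaped.
int mandelbrotIterations(float cx, float cy, int maxIter) {
    assert(maxIter > 0);
    float zx = 0.0f, zy = 0.0f;
    for (int i = 0; i < maxIter; ++i) {
        if (zx * zx + zy * zy > 4.0f) return i;
        float nextX = zx * zx - zy * zy + cx;  // real part of z^2 + c
        zy = 2.0f * zx * zy + cy;              // imaginary part of z^2 + c
        zx = nextX;
    }
    return maxIter;
}

// Maps texture coordinates in [0,1]^2 to the region [-2,1] x [-1.5,1.5]
// (an assumed framing) and evaluates the iteration there.
int mandelbrotAtUv(float u, float v, int maxIter) {
    return mandelbrotIterations(-2.0f + 3.0f * u, -1.5f + 3.0f * v, maxIter);
}
```

A shader would turn the returned count into a color, for example by indexing a gradient, and run this once per fragment with the interpolated texture coordinates as input.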
Fullscreen mode has nothing to do with it. You can use shaders and generate textures even if you're in window mode. As I mentioned, the textures you create never actually occupy space in the texture memory, they are created on the fly. One could probably think of a way to capture and cache the generated texture but this can be somewhat complex and require multiple rendering passes.
You can learn more about it if you look up "GLSL" in google - the OpenGL shading language.
This somewhat dated tutorial shows how to create a simple fragment shader which draws the Mandelbrot set (page 4).
If you can get your hands on the book "OpenGL Shading Language, 2nd Edition", you'll find it contains a number of simple examples on generating sky, fire and wood textures with the help of an external 3D Perlin noise texture from the application.
To create a texture on GPU look into "render to texture" tutorials. There are two common methods: Binding a PBuffer context as texture, or using Frame Buffer Objects. PBuffer render to textures are the older method, and have the wider support. Frame Buffer Objects are easier to use.
Also, you don't have to switch to "fullscreen" mode for OpenGL to be hardware accelerated. In fact, OpenGL doesn't know about windows at all. A fullscreen OpenGL window is just that: a top-level window on top of all other windows, with no decorations and with the input focus grabbed. Some drivers bypass window masking and clipping code and employ a simpler, faster buffer swap method if the window with the active OpenGL context covers the whole screen, thus gaining a little performance, but with current hardware and software the effect is very small compared to other influences.

How do I set the color of a single pixel in a Direct3D texture?

I'm attempting to draw a 2D image to the screen in Direct3D, which I'm assuming must be done by mapping a texture to a rectangular billboard polygon projected to fill the screen. (I'm not interested in, or cannot use, Direct2D.) All the texture information I've found in the SDK describes loading a bitmap from a file and assigning a texture to use that bitmap, but I haven't yet found a way to manipulate a texture as a bitmap pixel by pixel.
What I'd really like is a function such as
void TextureBitmap::SetBitmapPixel(int x, int y, DWORD color);
If I can't set the pixels directly in the texture object, do I need to keep around a DWORD array that is the bitmap and then assign the texture to that every frame?
Finally, while I'm initially assuming that I'll be doing this on the CPU, the per-pixel color calculations could probably also be done on the GPU. Is there HLSL code that sets the color of a single pixel in a texture, or are pixel shaders only useful for modifying the display pixels?
First, your direct question:
You can, technically, set pixels in a texture. That requires the LockRect and UnlockRect API.
In D3D context, 'locking' usually refers to transferring a resource from GPU memory to system memory (thereby disabling its participation in rendering operations). Once locked, you can modify the populated buffer as you wish, and then unlock - i.e., transfer the modified data back to the GPU.
Generally locking was considered a very expensive operation, but since PCIe 2.0 that is probably not a major concern anymore. You can also specify a small (even 1-pixel) RECT as a 2nd argument to LockRect, thereby requiring the memory-transfer of a negligible data volume, and hope the driver is indeed smart enough to transfer just that (I know for a fact that in older nVidia drivers this was not the case).
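Once a rectangle is locked, writing a pixel is just address arithmetic on the returned base pointer and row pitch. The sketch below assumes a 32-bit format such as D3DFMT_A8R8G8B8; LockedBits is a stand-in for the pBits and Pitch fields of D3DLOCKED_RECT so the example runs without a device, and setBitmapPixel is a hypothetical free-function version of the method asked for in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for what a lock returns: base pointer and row pitch in bytes.
// The pitch may be larger than width * 4 because rows can be padded.
struct LockedBits {
    void* bits;
    int pitch;
};

// Writes one 32-bit pixel at (x, y); out-of-range coordinates are ignored.
void setBitmapPixel(const LockedBits& lock, int width, int height,
                    int x, int y, std::uint32_t color) {
    assert(lock.bits != nullptr && lock.pitch >= width * 4);
    if (x < 0 || y < 0 || x >= width || y >= height) return;
    std::uint8_t* row = static_cast<std::uint8_t*>(lock.bits) +
                        static_cast<std::size_t>(y) * lock.pitch;
    std::memcpy(row + static_cast<std::size_t>(x) * 4, &color, sizeof color);
}
```

Indexing rows by the pitch rather than by the width is the important detail; treating the locked memory as a tightly packed DWORD array produces skewed images whenever the driver pads rows.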
The more efficient (and more code-intensive) way of achieving that is indeed to never leave the GPU. If you create your texture as a render target (that is, specify D3DUSAGE_RENDERTARGET as its usage argument), you can set it as the destination of the pipeline before making any draw calls, and write a shader (perhaps passing parameters) to paint your pixels. Such usage of render targets is considered standard, and you should be able to find many code samples around, but unless you're already facing performance issues, I'd say that's overkill for a single 2D billboard.