Finding allocated texture memory on Intel - OpenGL

As a follow-up to a different question (OpenGL: adding higher resolution mipmaps to a texture), I'd like to emulate a feature of gDEBugger: finding the total size of the currently allocated textures, which I would use to decide between the different ways of solving that question.
The specific thing I'd like to do is figure out how gDEBugger fills in the info in "view / viewers / graphic and compute memory analysis viewer", and in particular the place in that window where it tells me the sum of the sizes of all currently loaded textures.
For NVIDIA cards it appears I can call "glGetIntegerv(GL_GPU_MEM_INFO_CURRENT_AVAILABLE_MEM_NVX, &mem_available);" just before starting the texture test and just after, take the difference, and get the desired result.
For ATI/AMD it appears I can call "wglGetGPUInfoAMD(0, WGL_GPU_RAM_AMD, GL_UNSIGNED_INT, 4, &mem_available);" before and after the texture test to get the wanted result.
For Intel video cards, however, I am not finding the right keywords to put into the various search engines to figure this out.
So, can anyone help me figure out how to do this with Intel cards, and confirm the method I'll use for ATI/AMD and NVIDIA cards?
Edit: it appears that for AMD/ATI cards what I wrote earlier might be the total memory; for the currently available memory I should instead use "glGetIntegerv( GL_TEXTURE_FREE_MEMORY_ATI, &mem_avail );".
Edit 2: for reference, here's what seems to be the most concise and precise source for what I wrote about the ATI/AMD and NVIDIA cards: http://nasutechtips.blogspot.ca/2011/02/how-to-get-gpu-memory-size-and-usage-in.html
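For anyone wanting to reproduce the measurement, here is a minimal sketch of the before/after approach in C++. It assumes GLEW is available, loadTextures() is a placeholder for the texture test, and the token names/values below are the ones from the GL_NVX_gpu_memory_info and GL_ATI_meminfo extension specs (the spec spells the NVX token GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX):

// Minimal sketch: measure how much video memory a texture test consumes by
// querying the "currently available" value before and after. Both queries
// report KiB. There is no equivalent public extension for Intel GPUs.
#include <GL/glew.h>
#include <cstdio>

#ifndef GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#endif
#ifndef GL_TEXTURE_FREE_MEMORY_ATI
#define GL_TEXTURE_FREE_MEMORY_ATI 0x87FC
#endif

void loadTextures();   // placeholder for the texture test

static GLint availableKiB()
{
    GLint kib = -1;                // -1 = no known query (e.g. Intel)
    if (glewIsSupported("GL_NVX_gpu_memory_info")) {
        glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &kib);
    } else if (glewIsSupported("GL_ATI_meminfo")) {
        GLint info[4] = { 0 };     // info[0] = total free texture memory pool
        glGetIntegerv(GL_TEXTURE_FREE_MEMORY_ATI, info);
        kib = info[0];
    }
    return kib;
}

void measureTextureMemory()
{
    GLint before = availableKiB();
    loadTextures();
    GLint after = availableKiB();
    if (before >= 0 && after >= 0)
        std::printf("textures occupy roughly %d KiB\n", before - after);
}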

Related

OpenGL constructing and using data on the GPU

I am not a graphics programmer; I use C++ and C mainly, and every time I try to get into OpenGL, every book and every resource starts like this:
GLfloat Vertices[] = {
some, numbers, here,
some, more, numbers,
numbers, numbers, numbers
};
Or they may even be vec4.
But then you do something like this:
for(int i = 0; i < 10000; i++)
for(int j = 0; j < 10000; j++)
make_vertex();
And you get a problem. That loop is going to take a significant amount of time to finish, and if the make_vertex() function is anything like a saxpy or something of the sort, it is not just a problem... it is a big problem. For example, let us assume I wish to create fractal terrain. For any modern graphics card this would be trivial.
I understand the paradigm goes like this: write the vertices manually -> send them over to the GPU -> the GPU does vertex processing, geometry, rasterization, all the good stuff. I am sure it all makes sense. But why do I have to do the entire 'send it over' step? Is there no way to skip that intermediary step entirely, and just create the vertices on the GPU and draw them, without the obvious bottleneck?
I would very much appreciate at least a point in the right direction.
I also wonder if there is a possible solution without delving into compute shaders or CUDA. Does OpenGL or GLSL not provide a suitable random function which can be executed in parallel?
I think what you're asking for could work by generating height maps with a compute shader and mapping them onto a grid with fixed spacing, which can itself be generated trivially. That's one possible solution off the top of my head. You can use GL compute shaders, OpenCL, or CUDA. Details can be added with geometry and tessellation shaders.
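As a rough sketch of what I mean (purely illustrative: the noise is a toy hash rather than real fractal noise, the 512x512 size is arbitrary, and error checking is omitted), a GL compute shader can fill an R32F height map that the vertex shader of a fixed grid then samples:

#include <GL/glew.h>

static const char* kHeightCS = R"(
#version 430
layout(local_size_x = 16, local_size_y = 16) in;
layout(r32f, binding = 0) writeonly uniform image2D heightMap;

// toy value noise; replace with proper fBm/Perlin for real terrain
float hash(vec2 p) { return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453); }

void main() {
    ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
    float h = hash(floor(vec2(gid) / 16.0));
    imageStore(heightMap, gid, vec4(h));
}
)";

GLuint buildHeightMap()
{
    // compile and link the compute program
    GLuint cs = glCreateShader(GL_COMPUTE_SHADER);
    glShaderSource(cs, 1, &kHeightCS, nullptr);
    glCompileShader(cs);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, cs);
    glLinkProgram(prog);

    // 512x512 single-channel float texture that receives the heights
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32F, 512, 512);

    glUseProgram(prog);
    glBindImageTexture(0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R32F);
    glDispatchCompute(512 / 16, 512 / 16, 1);
    glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);   // before the grid's vertex shader samples it
    return tex;
}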
As for preventing the camera from clipping, you'd probably have to use transform feedback and do a check per frame to see if the direction you're moving in will intersect the geometry.
Your entire question seems to be built on a huge misconception, that vertices are the only things which need to be "crunched" by the GPU.
First, you should understand that GPUs are far superior to CPUs when it comes to parallelism (heck, GPUs sacrifice conditional branching for the sake of parallelism). Second, shaders and the buffers you make are all stored on the GPU after being uploaded by the CPU. The reason you don't just create all vertices on the GPU is the same reason you load an image from the hard drive instead of creating a raw 2D array and filling it with pixel data inline: even then, your image would be stored in the executable file, which lives on the hard disk and is only loaded into memory when you run it. In an actual application, you'll want to load your graphics from assets stored somewhere (usually the hard drive). Why not let the GPU load the assets from the hard drive by itself? The GPU isn't connected to storage directly, only to the system's main memory via a bus. To talk to storage directly, the GPU would have to deal with the file system, which is managed by the OS, and that's one of the things the CPU is better at, since we're dealing with serialized data.
Now what shaders deal with is the data you upload to the GPU (vertices, texture coordinates, textures, etc.). In ancient OpenGL, no one had to write any shaders. Graphics drivers came with a built-in pipeline which handled regular rendering requests for you. You'd provide it with 4 vertices, 4 texture coordinates and a texture among other things (transformation matrices, etc.), and it'd draw your graphics on the screen for you. You could go a bit further and add some lights to your scene and maybe customize a few things about it, but things were still pretty constrained. Newer OpenGL specifications give more freedom to developers by allowing them to rewrite parts of the pipeline with shaders. The developer becomes responsible for transforming vertices into place and doing all sorts of other calculations related to lighting, etc.
I would very much appreciate at least a point in the right direction.
I am guessing it has something to do with uniforms, but really, with me skipping pages, I really cannot understand how a shader program runs or what the lifetime of the variables is.
Uniforms are variables you can send to the shaders from the CPU every frame, before you use the program to render graphics. When you use the saturation slider in Photoshop or GIMP, it (probably) sends the saturation factor to the shader as a uniform of type float. Uniforms are what you use to communicate small settings like these from your application to your shaders.
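To make that concrete, here is a tiny made-up example: pushing a float "saturation" value to a program object (the uniform name and value are just for illustration):

// the fragment shader would declare:  uniform float uSaturation;
GLint loc = glGetUniformLocation(programId, "uSaturation");
glUseProgram(programId);
glUniform1f(loc, 0.75f);    // e.g. the slider's current position
// ...draw calls issued now will see uSaturation == 0.75 in the shader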
To use a shader program, you first have to set it up. A shader program consists of at least two shaders linked together: a fragment shader and a vertex shader. You use some OpenGL functions to upload your shader sources to the GPU, issue compilation followed by linking, and you get back the program's ID. To use this program, you simply call glUseProgram(programId) and everything following this call will use it for drawing. The vertex shader is the code that runs on the vertices you send, to position them on the screen correctly. This is where you can do transformations on your geometry like scaling, rotation, etc. A fragment shader runs at a later stage, using interpolated values output from the vertex shader, to define the color and depth of every fragment of what you're drawing. This is where you can do post-processing effects on your pixels.
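In code, that setup boils down to something like this sketch (error checking via glGetShaderiv/glGetProgramiv and the info logs is left out for brevity):

GLuint compile(GLenum type, const char* src)
{
    GLuint shader = glCreateShader(type);
    glShaderSource(shader, 1, &src, nullptr);
    glCompileShader(shader);
    return shader;
}

GLuint buildProgram(const char* vertexSrc, const char* fragmentSrc)
{
    GLuint vs = compile(GL_VERTEX_SHADER, vertexSrc);
    GLuint fs = compile(GL_FRAGMENT_SHADER, fragmentSrc);
    GLuint program = glCreateProgram();
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);
    glDeleteShader(vs);   // the linked program keeps its own copy
    glDeleteShader(fs);
    return program;
}

// later:  glUseProgram(program);  // everything drawn after this uses it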
Anyway, I hope I've helped make a few things clearer to you, but I can only tell you that there are no shortcuts. OpenGL has quite a steep learning curve, but it all connects and things start to make sense after a while. If you're getting bored of books and such, consider taking the code snippets from every lesson, compiling them, and messing around with them while trying to rationalize as you go. You'll have to resort to written documentation eventually, but hopefully things will then fit more easily into your head once you have some experience with the implementation components. Good luck.
Edit:
If you're trying to generate vertices on the fly using some algorithm, then try looking into Geometry Shaders. They may give you what you want.
You probably want to use CUDA for the things you are used to doing in C or C++, and let OpenGL handle the rasterizer and the other graphics stuff.
OpenGL and CUDA interoperate quite nicely. A good entry point for customizing the contents of a buffer object is the cudaGraphicsGLRegisterBuffer method: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g0fd33bea77ca7b1e69d1619caf44214b
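Roughly, the interop pattern looks like the sketch below: register the GL buffer object once, then each frame map it, get a device pointer, let your CUDA kernel write vertices into it, unmap, and draw with plain OpenGL. Error checking is omitted and the kernel launch is left as a placeholder.

#include <GL/glew.h>
#include <cuda_gl_interop.h>

static cudaGraphicsResource* vboResource = nullptr;

void registerVbo(GLuint vbo)
{
    // once, after the VBO has been created and sized with glBufferData
    cudaGraphicsGLRegisterBuffer(&vboResource, vbo, cudaGraphicsRegisterFlagsWriteDiscard);
}

void generateVerticesOnGpu()
{
    void*  devPtr   = nullptr;
    size_t numBytes = 0;
    cudaGraphicsMapResources(1, &vboResource, 0);
    cudaGraphicsResourceGetMappedPointer(&devPtr, &numBytes, vboResource);

    // launch your CUDA kernel here to fill devPtr with vertex data

    cudaGraphicsUnmapResources(1, &vboResource, 0);
    // the buffer now belongs to OpenGL again: glBindBuffer + glDrawArrays, etc.
}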
You may also want to have a look at the nbody sample from the NVIDIA GPU SDK samples that come with current CUDA installs.

What is the difference between clearing the framebuffer using glClear and simply drawing a rectangle to clear the framebuffer?

I think at least some old graphics drivers used to crash if glClear wasn't used, and glClear is probably faster in a lot of cases. But why? How are 3D graphics drivers usually implemented such that these two approaches behave differently?
On a high level, it can be faster because the OpenGL implementation knows ahead of time that the whole buffer needs to be set to the same color/value. The more you know about what exactly needs to be done, the more you can take advantage of possible accelerations.
Let's say setting a whole buffer to the same value is more efficient than setting the same pixels to variable values. With a glClear(), you know already that all pixels will have the same value. If you draw a screen sized quad with a fragment shader that emits a constant color, the driver would either have to recognize that situation by analyzing the shaders, or the system would have to compare the values coming out of the shader, to know that all pixels have the same value.
The reason why setting everything to the same value can be more efficient has to do with framebuffer compression and related technologies. GPUs often don't actually write each pixel out to the framebuffer, but use various kinds of compression schemes to reduce the memory bandwidth needed for framebuffer writes. If you imagine almost any kind of compression, all pixels having the same value is very favorable.
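To make the comparison concrete, here's a sketch of the two paths the question contrasts (constantColorProgram and fullScreenQuadVao are assumed to have been created elsewhere):

// Path 1: a dedicated clear.  The driver knows up front that every pixel of
// the buffer ends up with the same value, so it can use fast/zero-bandwidth
// clears.
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

// Path 2: the same visual result via the ordinary pipeline.  The fragment
// shader just emits a constant:
//     #version 330 core
//     out vec4 color;
//     void main() { color = vec4(0.0, 0.0, 0.0, 1.0); }
// The driver would have to analyze the shader (or the shaded results) to
// discover that every pixel gets the same value.
glUseProgram(constantColorProgram);
glBindVertexArray(fullScreenQuadVao);
glDrawArrays(GL_TRIANGLES, 0, 6);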
To give you some ideas about the published vendor specific technologies, here are a few sources. You can probably find more with a search.
Article talking about new framebuffer compression method in relatively recent AMD cards: http://techreport.com/review/26997/amd-radeon-r9-285-graphics-card-reviewed/2.
NVIDIA patent on zero bandwidth clears: http://www.google.com/patents/US8330766.
Blurb on ARM web site about Mali framebuffer compression: http://www.arm.com/products/multimedia/mali-technologies/arm-frame-buffer-compression.php.
Why is it faster? Because glClear is a special-purpose operation that bypasses most of the calculations other types of drawing have to go through.
Alpha function, blend function, logical operation, stenciling, texture mapping, and depth-buffering are ignored by glClear
Source
Why do some drivers crash without it? It's hard to say, but it probably has to do with the implementation details of OpenGL. The function does what it's supposed to do, but it might also do more that you don't know about.
OpenGL might infer from this function call other tasks that it needs to perform.

Average triangles in DirectX 10

So I basically want to check some information for my project.
I have a GTX 460 video card. I wrote a DX10 program with 20k triangles drawn on the screen, and I get 28 FPS in a Release build. Each of those triangles is drawn with its own DrawIndexed call, so of course there is overhead from issuing that many draws.
But anyway, I would like to know: how many triangles could I draw on the screen with this hardware, and at what FPS? I think 20k triangles is not nearly enough to load some good models into a game scene.
Sorry for my terrible English.
It sounds like you are issuing a single draw call per triangle primitive. This is very bad, hence the horrid FPS. You should aim to draw as many triangles as possible per draw call, which can be done in a few ways:
Profile your code: both nVidia and AMD have free tools to help you find out why your code is slow, allowing you to focus where it really matters, so use them.
Index buffers & triangle strips to reduce bandwidth
Grouping of verts by material type/state/texture to improve batching
Instancing of primitive groups: draw multiple models/meshes in one call (see the sketch after this list)
Remove as much redundant state change (setting of shaders, textures, buffers, parameters) as possible; this goes hand-in-hand with the grouping mentioned earlier
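As a rough illustration of the draw-call difference (the device, buffers, and counts are assumed to be set up already, and the three calls below are alternatives rather than a sequence):

#include <d3d10.h>

void drawExamples(ID3D10Device* device, UINT indexCount, UINT instanceCount, UINT triangleCount)
{
    // What the question describes: one draw call per triangle -> huge CPU overhead.
    for (UINT t = 0; t < triangleCount; ++t)
        device->DrawIndexed(3, t * 3, 0);

    // Batched: one call submits the whole mesh.
    device->DrawIndexed(indexCount, 0, 0);

    // Instanced: one call draws the mesh many times; per-instance data comes
    // from a second vertex buffer declared with D3D10_INPUT_PER_INSTANCE_DATA.
    device->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}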
The DX SDK has examples of implementing each of these. The exact number of triangles you can draw at a decent FPS (either 30 or 60 if you want vsync) varies greatly depending on the complexity of shading the triangles; however, if drawn most simply, you should be able to push a few million with ease.
I would recommend taking a good look at the innards of an open-source DX11 engine (not many DX10 projects exist, but the API is almost identical), such as Hieroglyph 3, and going through the SDK tutorials.
There are also quite a few presentations on increasing performance with DX10, but profile your code before diving into the suggestions full-on. Here are a few from the hardware vendors themselves (with color-coded hints for nVidia vs AMD hardware):
GDC '08
GDC '09

Optimal pixel read-back strategy

I need to render certain scenes and read the whole image back into main memory. I've searched for this, and it seems that most video cards will accelerate the rendering but the read-back will be very slow. After a bit of research I only found this card mentioning "Hardware-Accelerated Pixel Read-Back".
The other approach would be software rendering, where the read-back problem doesn't exist, but then the rendering performance will be bad.
Likely I will have to implement both in order to find the optimal trade-off, but my question is about what other alternatives I have hardware-wise. I understand Quadro is aimed at the modelling and designer market segment, which is precisely the client target of this application. Does this mean I'm not likely to find better pixel read-back performance in other video card lines, e.g. Tesla or Fermi, which, by the way, don't even have video outputs?
I don't know if the performance would be any different, but you could at least try rendering to an off-screen buffer, then setting that as the texture of a full-screen quad (or outputting it to video in some other way).
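A minimal sketch of that kind of off-screen setup, using an FBO with a texture color attachment (error handling, the quad, and its shader are omitted; names and sizes are illustrative):

GLuint createOffscreenTarget(int width, int height, GLuint* colorTexOut)
{
    // texture that will receive the rendered image
    GLuint colorTex = 0;
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    // framebuffer object with that texture as its color attachment
    GLuint fbo = 0;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);

    // while the FBO is bound and complete: render the scene, then either bind
    // colorTex and draw a textured full-screen quad, or read it back with
    // glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, dst);
    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        fbo = 0;                                   // setup failed

    glBindFramebuffer(GL_FRAMEBUFFER, 0);          // back to the default framebuffer
    *colorTexOut = colorTex;
    return fbo;
}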

glDrawArrays+VBO increasing memory footprint

I am writing a Windows based OpenGL viewer application.
I am using the VBO + triangle strip + glDrawArrays method to render my meshes. Everything works perfectly on all machines.
On Windows desktops with nVidia Quadro cards, the working/peak working memory shoots up when I first call glDrawArrays.
On laptops with nVidia mobile graphics cards, however, the working memory / peak working memory does not shoot up. For the last few days I have been checking almost all forums/posts/tutorials about VBO memory issues. I have tried all combinations of VBO usage, like GL_STATIC_DRAW/GL_DYNAMIC_DRAW/GL_STREAM_DRAW and glMapBuffer/glUnmapBuffer. But nothing stops the memory from shooting up on my desktops.
I suspect that for VBOs with OpenGL 1.5 I am missing some flags.
PS: I have almost 500 to 600 VBOs in my application. I am using an array of structures (i.e. v, n, c, t together in a structure). And I am not aligning my VBOs to 16k memory.
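For reference, this is roughly what that interleaved array-of-structures layout looks like as a VBO; the exact field sizes and the GL_STATIC_DRAW hint here are assumptions for illustration, not my actual code:

#include <GL/glew.h>
#include <vector>

struct Vertex {
    float v[3];   // position
    float n[3];   // normal
    float c[4];   // color
    float t[2];   // texture coordinate
};

GLuint createMeshVbo(const std::vector<Vertex>& verts)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(Vertex), verts.data(), GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;
}
// at draw time: bind the VBO, set the gl*Pointer calls with stride
// sizeof(Vertex), then glDrawArrays(GL_TRIANGLE_STRIP, 0, vertexCount).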
Can anyone suggest how I should go about solving this issue? Any hints/pointers would be helpful.
Do you actually run out of memory, or does your application merely consume more memory than you expect? If it's the latter, why bother? If the OpenGL implementation keeps a working copy for itself, then this is probably for a reason. Also, there's little you can do on the OpenGL side to avoid this, since it's entirely up to the driver how it manages its resources. I think the best course of action, if you really want to keep the memory footprint low, is contacting NVidia, so that they can double-check whether this may be a bug in their drivers.