I was wondering if there is an easy way to query (programmatically) the OpenGL limits of the GPU for the following features:
- maximum 2D texture size
- maximum 3D texture size
- maximum number of vertex shader attributes
- maximum number of varying floats
- number of texture image units (in vertex shader, and in fragment shader)
- maximum number of draw buffers
I need to know these numbers before starting my GPU research project.
glGet() is your friend, with:
GL_MAX_3D_TEXTURE_SIZE
GL_MAX_TEXTURE_SIZE
GL_MAX_VERTEX_ATTRIBS
GL_MAX_VARYING_FLOATS
GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS (vertex shader)
GL_MAX_TEXTURE_IMAGE_UNITS (fragment shader)
GL_MAX_DRAW_BUFFERS
e.g.:
GLint result;
glGetIntegerv(GL_MAX_VARYING_FLOATS, &result);
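Putting it all together, a minimal sketch that queries every limit listed above (this assumes a current OpenGL context and uses the per-stage texture image unit enums rather than the fixed-function GL_MAX_TEXTURE_UNITS):

#include <cstdio>
#include <GL/glew.h>   // or your platform's OpenGL headers

// Assumes a current OpenGL context has already been created.
static void printLimit(const char* name, GLenum pname)
{
    GLint value = 0;
    glGetIntegerv(pname, &value);
    printf("%s = %d\n", name, value);
}

void printGpuLimits()
{
    printLimit("GL_MAX_TEXTURE_SIZE",               GL_MAX_TEXTURE_SIZE);
    printLimit("GL_MAX_3D_TEXTURE_SIZE",            GL_MAX_3D_TEXTURE_SIZE);
    printLimit("GL_MAX_VERTEX_ATTRIBS",             GL_MAX_VERTEX_ATTRIBS);
    printLimit("GL_MAX_VARYING_FLOATS",             GL_MAX_VARYING_FLOATS);
    printLimit("GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS", GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS);
    printLimit("GL_MAX_TEXTURE_IMAGE_UNITS",        GL_MAX_TEXTURE_IMAGE_UNITS);
    printLimit("GL_MAX_DRAW_BUFFERS",               GL_MAX_DRAW_BUFFERS);
}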
Not quite sure what your project is setting out to achieve, but you might be interested in OpenCL if it's general-purpose computing and you weren't already aware of it. In particular, CL/GL interop if there is a graphics element too and your hardware supports it.
As Damon pointed out in the comments, in practice it may be more complex than this for texture sizes. The problems arise because rendering may fall back from hardware to software for some texture sizes, and because the maximum size of a texture depends on the pixel format used. To work around this it is possible to use GL_PROXY_TEXTURE_* with glTexImage*.
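For example, a hedged sketch of a proxy-texture check (the 8192x8192 size and GL_RGBA8 format are just placeholders):

// Ask the driver whether an 8192x8192 GL_RGBA8 texture would actually be accepted.
// If the combination is unsupported, the proxy's width is reported as 0.
GLint width = 0;
glTexImage2D(GL_PROXY_TEXTURE_2D, 0, GL_RGBA8, 8192, 8192, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &width);
if (width == 0) {
    // This size/format combination is not supported.
}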
As a complement to what awoodland said, and in case you don't already know it, I think you should take a look at GLEW.
GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform.
http://glew.sourceforge.net/
Related
I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define GL_MAX_VIEWPORTS viewports. Then in the rendering phase, I transfer an array of GL_MAX_VIEWPORTS model-pose matrices to a mat4 uniform array in GPU memory (I am only interested in estimating position and orientation), and use gl_InvocationID in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
GL_MAX_VIEWPORTS is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there a workaround for the GL_MAX_VIEWPORTS limitation that is likely to be faster than calling my render function ceil(double(N)/GL_MAX_VIEWPORTS) times?
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:
a geometry shader that adds h*gl_InvocationID to the y coordinates of the vertices after perspective projection (where h is the desired viewport height) and passes gl_InvocationID on to the fragment shader; and
a fragment shader that discards fragments with y coordinates that satisfy y<gl_InvocationID*h || y>=(gl_InvocationID+1)*h.
But I was put off investigating this idea further by the fear that branching and discard would be very detrimental to performance.
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.
From a cursory glance at the paper, it seems to me that the actual viewport doesn't change. That is, you're still rendering to the same width/height and X/Y positions, with the same depth range.
What you want is to change which image you're rendering to, which is what gl_Layer is for: it selects the layer, within the layered array of images attached to the framebuffer, that you are rendering to.
So just set gl_ViewportIndex to 0 for all vertices. Or more specifically, don't set it at all.
The number of GS instancing invocations does not have to be a restriction; that's your choice. GS invocations can write multiple primitives, each to a different layer. So you could have each instance write, for example, 4 primitives, each to 4 separate layers.
Your only limitations should be the number of layers you can use (governed by GL_MAX_ARRAY_TEXTURE_LAYERS and GL_MAX_FRAMEBUFFER_LAYERS, both of which must be at least 2048), and the number of primitives and vertex data that a single GS invocation can emit (which is kind of complicated).
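If it helps, here is a rough host-side sketch of the layered-framebuffer setup this implies (the names width, height and numHypotheses are mine, not from the paper); the geometry shader then only needs to write gl_Layer for each emitted primitive:

// Create a 2D array texture with one layer per pose hypothesis.
GLuint colorTex;
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D_ARRAY, colorTex);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
             width, height, numHypotheses,   // depth = number of layers
             0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

// Attach the whole array (not a single layer) so that gl_Layer selects the target.
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, colorTex, 0);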
I have an art application I'm dabbling with that uses OpenGL for accelerated graphics rendering. I'd like to add the ability to draw arbitrary piecewise curves - pretty much the same sort of shapes that can be defined by the SVG 'path' element.
Rather than tessellating my paths into polygons on the CPU, I thought it might be better to pass an array of values in a buffer to my shader defining the pieces of my curve and then using an in/out test to check which pixels were actually inside. In other words, I'd be iterating through a potentially large array of data describing each segment in my path.
From what I remember back when I learned shader programming years ago, GPUs handle if statements by evaluating both branches and then throwing away the branch that wasn't used. This would effectively mean that it would end up silently running through my entire buffer even if I only used a small part of it (i.e., my buffer has the capacity to handle 1024 curve segments, but the simple rectangle I'm drawing only uses the first four of them).
How do I write my code to deal with this variable data? Can modern GPUs handle conditional code like this well?
GPUs can handle arbitrary-length buffers and conditionals (or fake them convincingly). The problem is that vertex and geometry shaders cannot generate an arbitrary number of triangles from a short description.
OpenGL 4.0 added two new types of shaders: Tessellation Control shaders and Tessellation Evaluation shaders. These shaders give you the ability to tessellate curves and surfaces on the GPU.
I found this tutorial to be quite useful in showing how to tessellate Bezier curves on the GPU.
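On the host side, using those stages mostly comes down to drawing patch primitives; a minimal sketch (assuming cubic Bezier segments of 4 control points each and a program that already includes tessellation control/evaluation shaders):

// Each patch consists of 4 control points (one cubic Bezier segment).
glPatchParameteri(GL_PATCH_VERTICES, 4);

// Draw numSegments patches; the tessellation evaluation shader evaluates
// the Bezier polynomial at every tessellated coordinate.
glDrawArrays(GL_PATCHES, 0, numSegments * 4);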
I am confused about how textures work in CUDA.
When I run deviceQuery on my GTX 780, I see this:
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
When I investigated the CUDA "particles" example, I found this:
checkCudaErrors(cudaBindTexture(0, oldPosTex, sortedPos, numParticles*sizeof(float4)));
where, in my case, I have raised numParticles to 1024 * 1024 * 2 (around 2.1 million).
How does this fit into a 1D texture?
Also, inside the kernels I've found this (please explain in more detail, as everything here is connected):
texture<float4, 1, cudaReadModeElementType> oldPosTex;
#define FETCH(t, i) tex1Dfetch(t##Tex, i)
and in a kernel:
float4 pos = FETCH(oldPos, sortedIndex);
What I also need to know: can I use this texture (with its defined size of numParticles*sizeof(float4)) in a framebuffer draw instead of drawing a VBO?
How does this fit into a 1D texture?
The texture hardware consists of two main parts: the texture filtering hardware and the texture cache. Texture filtering includes functionality such as interpolation, addressing by normalized floating-point coordinates, and handling of out-of-bounds addresses (clamp, wrap, mirror and border addressing modes). The texture cache can store data along a space-filling curve to maximize 2D spatial locality (and thereby the cache hit rate). It can also store data in a regular flat array.
The Maximum Texture Dimension Size refers to limitations in the texture filtering hardware, not the texture caching hardware. And so, it refers to limits you may hit when using functions like tex2D(), but not when using functions like tex1Dfetch(), which performs an unfiltered texture lookup. So, the code you gave is probably setting things up for tex1Dfetch().
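You can confirm the relevant limit on your own device; a short sketch (the runtime reports the linear-memory 1D limit, which is what applies to cudaBindTexture() + tex1Dfetch(), separately from the 65536 you see for the filtered 1D path):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Limit for 1D textures backed by a CUDA array (the filtered path).
    printf("maxTexture1D       = %d\n", prop.maxTexture1D);

    // Limit for 1D textures bound to linear device memory, as used with
    // cudaBindTexture() + tex1Dfetch(); this is far larger than 65536.
    printf("maxTexture1DLinear = %d\n", prop.maxTexture1DLinear);
    return 0;
}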
please explain in more detail, as everything here is connected
This question is too broad, which may be why your question was downvoted.
What I also need to know: can I use this texture (with its defined size of numParticles*sizeof(float4)) in a framebuffer draw instead of drawing a VBO?
This is not a CUDA question, as CUDA cannot draw anything. You should look into CUDA/OpenGL interop to see if your question is answered there. If it's not, you should create a new question and describe it more clearly.
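For reference, the usual interop pattern looks roughly like this (a sketch, assuming the particle positions live in an OpenGL buffer object named posVbo that CUDA writes into):

#include <cuda_gl_interop.h>

// Register the OpenGL VBO with CUDA once, after both contexts exist.
cudaGraphicsResource* posResource = NULL;
cudaGraphicsGLRegisterBuffer(&posResource, posVbo, cudaGraphicsMapFlagsNone);

// Each frame: map the buffer, let CUDA kernels write into it, unmap, then draw.
float4* dPos = NULL;
size_t numBytes = 0;
cudaGraphicsMapResources(1, &posResource, 0);
cudaGraphicsResourceGetMappedPointer((void**)&dPos, &numBytes, posResource);
// ... launch kernels that write particle positions into dPos ...
cudaGraphicsUnmapResources(1, &posResource, 0);
// ... glDrawArrays using posVbo as the vertex attribute source ...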
Is there a way to get results from a shader running on a GPU back to the program running on the CPU?
I want to generate a polygon mesh from simple voxel data based on a computationally costly algorithm on the GPU, but I need the result on the CPU for physics calculations.
Define "the results"?
In general, if you're doing GPGPU-style computations with OpenGL, you are going to need to structure your shaders around the needs of a rendering system. Rendering systems are designed to be one-way: data goes into them and an image is produced. Going backwards, having the rendering system produce data, is not generally how rendering systems are structured.
That doesn't mean you can't do it, of course. But you need to architect everything around the limitations of OpenGL.
OpenGL offers a number of hooks where you can write data from certain shader stages. Most of these require specialized hardware.
Fragment shader outputs
Any hardware capable of fragment shaders will obviously allow you to write to the current framebuffer you're rendering. Through the use of framebuffer objects and textures with floating-point or integer image formats, you can write pretty much any data you want to a variety of images. Once in a texture, you can simply call glGetTexImage to get the rendered pixel data. Or you can just do glReadPixels to get it if the FBO is still bound. Either way works.
The primary limitations of this method are:
The number of images you can attach to the framebuffer; this limits the amount of data you can write. On pre-GL 3.x hardware, FBOs were typically limited to only 4 images plus a depth/stencil buffer. In 3.x and better hardware, you can expect a minimum of 8 images.
The fact that you're rendering. This means that you need to set up your vertex data to position a triangle exactly where you want it to modify data. This is not a trivial undertaking. It's also difficult to get useful input data, since you typically want each texel to be fairly independent of the other. Structuring your fragment shader around these limitations is difficult. Not impossible, but non-trivial in many cases.
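To make the read-back step concrete, a minimal sketch (assuming you have just rendered into an FBO whose color attachment is a floating-point texture, and that width and height describe it):

// After rendering into the FBO, pull the floating-point results back to the CPU.
std::vector<float> results(width * height * 4);   // needs <vector>

glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, results.data());

// Alternatively, read straight from the attached texture:
// glBindTexture(GL_TEXTURE_2D, resultTex);
// glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, results.data());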
Transform Feedback
This OpenGL 3.0 feature allows the output from the Vertex Processing stage of OpenGL (vertex shader and optional geometry shader) to be captured in one or more buffer objects.
This is much more natural for capturing vertex data that you want to play with or render again. In your case, you'll need to read it back after rendering it, perhaps with a glGetBufferSubData call, or by using glMapBufferRange for reading.
The limitations here are that you generally only can capture 4 output values, where each value is a vec4. There are also some strict layout restrictions. Some OpenGL 3.x and 4.x hardware offers the ability to write data to multiple feedback streams, which can all be written into different buffers.
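Roughly, the capture-and-read-back flow looks like this (a sketch; "outPosition" stands in for whatever varying your vertex shader writes, and feedbackBuffer is a buffer object you have already created):

// Before linking the program: declare which varyings to capture.
const char* varyings[] = { "outPosition" };
glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);

// Bind a buffer to receive the captured vertices, then capture while drawing.
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, feedbackBuffer);
glEnable(GL_RASTERIZER_DISCARD);   // optional: skip rasterization entirely
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, vertexCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

// Read the captured data back to the CPU.
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0,
                   vertexCount * 4 * sizeof(float), cpuBuffer);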
Image Load/Store
This GL 4.2 feature represents the pinnacle of writing: you can bind an image (a buffer texture, if you want to write to a buffer), and just write to it. There are memory ordering constraints that you need to work within.
It's very flexible, but very complex. Besides the difficulty in using it properly, there are a number of limitations. The number of images you can write to will be fairly limited, perhaps 8 or so. And implementations may have total write limits, so that 8 images to write to may have to be shared by the fragment shader's outputs.
What's more, image outputs are only guaranteed for the fragment shader (and 4.3's compute shaders). That is, hardware is allowed to forbid you from using image load/store on non-FS/CS shader stages.
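For completeness, the host side of this path is short, even if the rules around it are not (a sketch; image unit 0 and GL_RGBA32F are arbitrary choices, and resultTex/cpuBuffer are assumed names):

// Bind level 0 of resultTex to image unit 0 so a shader can imageStore() into it.
glBindImageTexture(0, resultTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);

// ... run the draw call or compute dispatch that writes the data ...

// Make the image writes visible to the read-back below.
glMemoryBarrier(GL_TEXTURE_UPDATE_BARRIER_BIT);
glBindTexture(GL_TEXTURE_2D, resultTex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, cpuBuffer);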
I've been using DirectX (with XNA) for a while now, and have recently switched to OpenGL. I'm really loving it, but one thing has got me annoyed.
I've been trying to implement something that requires dynamic indexing in the vertex shader, but I've been told that this requires the equivalent of SM 4.0. However, I know that this works in DX even with SM 2.0, possibly even 1.0. XNA's instancing sample uses this to do instancing on SM 2.0-only cards: http://create.msdn.com/en-US/education/catalog/sample/mesh_instancing.
The compiler can't have been "unrolling" it into a giant list of if statements, since this would surely exceed the instruction limit on SM2 for our 250 instances.
So is DX doing some trickery that I can't do with OpenGL, can I manipulate OpenGL to do the same, or is it a hardware feature that OpenGL doesn't expose?
You can upload an array of light directions with something like glUniform3fv, and then (assuming I understand what you're trying to do correctly) you just need your vertex format to include an index into this array (so there will be lots of duplication of these indices if the index only changes once per mesh or something). If you don't already know, you can use glGetAttribLocation + glVertexAttribPointer to send arbitrary vertex attributes like this to the shader (as opposed to using the deprecated built-in attributes like gl_Vertex, gl_Normal, etc.).
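Something along these lines (a sketch; "lightDirs", "lightIndex" and MyVertex are hypothetical names, and lightDirections is assumed to hold numLights * 3 floats):

// The shader is assumed to declare: uniform vec3 lightDirs[MAX_LIGHTS];
GLint lightDirsLoc = glGetUniformLocation(program, "lightDirs");
glUniform3fv(lightDirsLoc, numLights, lightDirections);

// Feed each vertex an index into that array as a generic attribute named "lightIndex".
GLint indexLoc = glGetAttribLocation(program, "lightIndex");
glEnableVertexAttribArray(indexLoc);
glVertexAttribPointer(indexLoc, 1, GL_FLOAT, GL_FALSE,
                      sizeof(MyVertex), (const void*)offsetof(MyVertex, lightIndex));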
From your link:
Note that there is no single perfect instancing technique. This must be implemented in a different way on Windows compared to Xbox 360, and on Windows the ideal technique requires shader 3.0, but there is also a fallback approach that will work with shader 2.0. This sample implements several different instancing techniques, so it can work on both platforms and shader versions.
Note the emboldened part. So on that basis you should be able to do similar instancing on shader model 3. Shader model 2's instancing is usually performed using a matrix palette. It simply means you can render multiple meshes in one call by uploading a load of transformation matrices in one go. This reduces draw calls and improves speed.
Anyway, for OpenGL there was a lot of trouble finalising this extension, and hence you need shader model 4. You CAN, however, still put a per-vertex matrix palette index in your vertex structure and do matrix palette rendering using a shader...
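A rough sketch of that fallback (the names here are mine): upload the palette as a mat4 uniform array and index it with the per-vertex attribute, set up exactly as in the first answer above:

// The shader is assumed to declare: uniform mat4 matrixPalette[MAX_INSTANCES];
GLint paletteLoc = glGetUniformLocation(program, "matrixPalette");
glUniformMatrix4fv(paletteLoc, instanceCount, GL_FALSE, paletteData);  // paletteData: instanceCount * 16 floats

// Each vertex carries a paletteIndex attribute (uploaded with glVertexAttribPointer),
// and the vertex shader transforms with it:
//     gl_Position = viewProj * matrixPalette[int(paletteIndex)] * vec4(position, 1.0);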