I am programming an OpenGL renderer in C++. I want it to be as efficient as possible, with each vertex/normal/UV tex coord/tangent/etc. taking up as little memory as possible. I am using indices, line strips, and fans. I was thinking that 32-bit floating point is not necessary and 16-bit floating point should be fine, at least for some of these like normals and UVs. I can't seem to find any examples of this anywhere. I can find talk of "GL_HALF_FLOAT", but no real examples. Am I on the right track? Or is this not worth looking into? If anyone knows of an example of this, could they send a link to the source code?
In full OpenGL (unlike in OpenGL ES), shader code always operates with 32-bit floats. Starting with OpenGL 3.0, specifying vertex data as half-floats is supported, though. If the precision is sufficient for your needs, this can reduce memory usage for your vertex data, and reduce the bandwidth needed for vertex fetching.
Keep in mind that the precision of a half-float is only about 3-4 decimal digits. So the accuracy really is seriously limited.
As for how to use them, it's quite straightforward. If you have a pointer to half-float values, you store them in a VBO using glBufferData() or glBufferSubData(), just like you would for any other type. The glVertexAttribPointer() call will then look like this, using an attribute with 3 components as an example:
glVertexAttribPointer(loc, 3, GL_HALF_FLOAT, GL_FALSE, 0, (void*)0);
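Putting it together, a minimal setup could look like this sketch (vbo, loc, vertexCount and halfData are illustrative names; halfData is assumed to already hold the packed 16-bit values):

    // Upload packed half-float positions (3 components per vertex) and
    // describe them to OpenGL. halfData holds vertexCount * 3 uint16_t values.
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER,
                 vertexCount * 3 * sizeof(uint16_t),
                 halfData, GL_STATIC_DRAW);

    glEnableVertexAttribArray(loc);
    glVertexAttribPointer(loc, 3, GL_HALF_FLOAT, GL_FALSE, 0, (void*)0);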
The format of the data itself is described in the ARB_texture_float extension. While the spec doesn't name it officially, it is at least very close to the IEEE 754-2008 binary16 format (1 sign bit, 5 exponent bits, 10 mantissa bits). I wrote conversion code based on the Wikipedia description of that format before, and it worked fine for OpenGL usage.
Most languages don't have built-in types for half-floats, so you will either have to write a few lines of conversion code from float to half-float yourself, or use code that somebody else wrote.
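As an illustration, here is a minimal float-to-half conversion in C++ based on the binary16 layout described above (a sketch: it truncates instead of rounding to nearest-even, and doesn't preserve NaN payloads):

    #include <cstdint>
    #include <cstring>

    // Convert a 32-bit float to a 16-bit half-float bit pattern
    // (IEEE 754 binary16: 1 sign, 5 exponent, 10 mantissa bits).
    uint16_t floatToHalf(float value)
    {
        uint32_t f;
        std::memcpy(&f, &value, sizeof(f));

        uint16_t sign     = static_cast<uint16_t>((f >> 16) & 0x8000u);
        int32_t  exponent = static_cast<int32_t>((f >> 23) & 0xFFu) - 127 + 15;
        uint32_t mantissa = f & 0x007FFFFFu;

        if (exponent >= 31)                 // overflow, infinity, or NaN
            return static_cast<uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));
        if (exponent <= 0) {                // half-float denormal or zero
            if (exponent < -10)
                return sign;                // too small: flush to zero
            mantissa |= 0x00800000u;        // restore the implicit leading 1
            return static_cast<uint16_t>(sign | (mantissa >> (14 - exponent)));
        }
        return static_cast<uint16_t>(sign | (exponent << 10) | (mantissa >> 13));
    }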
The following resources about half-float conversion are from a quick search. I have no personal experience with any of them, and you should do your own search to find the one most suitable for your needs:
Interesting article from Intel, explaining possible performance benefits: https://software.intel.com/en-us/articles/performance-benefits-of-half-precision-floats. This also mentions that Intel processors have instructions for the conversion (e.g. there's a _mm256_cvtps_ph intrinsic to convert from float to half-float).
Open source library for half-float operations and conversions: http://half.sourceforge.net/.
gcc documentation saying that it supports a half-float type (__fp16) for ARM targets.
The specification for the OpenGL function glTexImage2D() gives a large table of accepted internalFormat parameters. I'm wondering, though, if it really matters what I set this parameter to, since the doc says
If an application wants to store the texture at a certain resolution or in a certain format, it can request the resolution and format with internalFormat. The GL will choose an internal representation that closely approximates that requested by internalFormat, but it may not match exactly.
which makes it seem as though OpenGL is just going to pick what it wants anyways. Should I bother getting an image's bit depth and setting the internalFormat to something like GL_RGBA8 or GL_RGBA16? All the code examples I've seen just use GL_RGBA...
which makes it seem as though OpenGL is just going to pick what it wants anyways.
This is very misleading.
There are a number of formats that implementations are required to support more or less exactly as described. Implementations are indeed permitted to store them in larger storage. But they're not permitted to lose precision compared to them. And there are advantages to using them (besides the obvious knowledge of exactly what you're getting).
First, it allows you to use specialized formats like GL_RGB10_A2, which is handy in certain situations (storing linear color values for deferred rendering, etc.). Second, FBOs are required to support any combination of image formats, but only if all of those image formats come from the list of required color formats for textures/renderbuffers (but not from the texture-only list). If you're using any other internal formats, FBOs can throw GL_FRAMEBUFFER_UNSUPPORTED at you.
Third, immutable texture storage functions require the use of sized internal formats. And you should use those whenever they're available.
In general, you should always use sized internal formats. There's no reason to use the generic ones.
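To make the difference concrete, a small sketch (the glTexStorage2D path needs GL 4.2 or ARB_texture_storage; width, height and pixels are illustrative):

    // Generic format: the implementation picks the actual storage.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    // Sized format: explicitly request 8 bits per channel.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    // Immutable storage requires a sized format.
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);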
Using a generic internal format tells OpenGL to choose whatever it "likes" best; you're telling it that you don't care. With an explicit internal format, you're telling OpenGL that you actually care about the internal representation (most likely because you need the precision). While an implementation is free to up- or downgrade if an exact match cannot be made, the usual fallback is to upgrade to the next higher format precision that can satisfy the requested demands.
Should I bother getting an image's bit depth and setting the internalFormat
If you absolutely require the precision, then yes. If your concerns are more about performance, then no, as the usual default of OpenGL implementations is to choose the internal format that gives the best performance when no specific format has been requested.
I'm working in C++ with large voxel grids in a scientific context, and I'm trying to decide which library to use. Only a fraction of the voxel grid holds values, but there might be several per voxel (e.g. a struct), which are determined by raytracing. I'm not trying to render anything, but I have to determine the potential number of rays passing through the entire target area, so an awful lot of ray-box computations will have to be calculated, preferably very fast...
So far, I found
OpenVDB http://www.openvdb.org/
Field3d http://sites.google.com/site/field3d/
The latter appeals a bit more, because it seems simpler/easier to use.
My question is: which of them is better suited for tasks that are not aimed at rendering/visualization? Which one is faster/better when computing a lot of ray-box intersections (no viewpoint-dependent culling possible)? Suggestions, anyone?
In any case, I want to use an existing C++ library and not write a kd-tree/octree etc. myself. I don't have the time to reinvent the wheel.
I would advise
OpenSceneGraph
Ogre3D
VTK
I have personally used the first two. However, VTK is also a popular alternative. All three of them support voxel based rendering.
For an application I'm developing, I need to be able to:
draw lines of different widths and colours
draw solid color filled triangles
draw textured (no alpha) quads
Very easy...but...
All coordinates are integer in pixel space, and, very important: reading back all the pixels from the framebuffer (glReadPixels) on two different machines, with two different graphics cards, running two different OSes (Linux and FreeBSD), must result in exactly the same sequence of bits (given an appropriate constant format conversion).
I think this is impossible to achieve safely using OpenGL and hardware acceleration, since I bet different graphics cards (from different vendors) may implement different rasterization algorithms. (The OpenGL specs are clear about this: they propose an algorithm, but they also state that implementations may differ under certain circumstances.)
Also, I don't really need hardware acceleration, since I will be rendering simple graphics at very low speed.
Do you think I can achieve this by just disabling hardware acceleration? What happens in that case under Linux? Will I fall back to the Mesa software rasterizer? And in that case, can I be sure it will always work, or am I missing something?
That you're reading back rendered pixels and strongly depend on their mathematical exactness/reproducibility sounds like a design flaw. What's the purpose of this action? If you, for example, need to extract some information from the image, why don't you try to extract this information from the abstract, vectorized information prior to rendering?
Anyhow, if you depend on external rendering code and there's no way to make your reading code more robust to small errors, you're signing up for lots of pain and maintenance work. Other people could break your code with every tiny patch, because that kind of pixel exactness down to the bit level is usually a non-issue when they're doing their unit tests etc. Let alone the infinite permutations of hardware and software layers that are possible, all of which might influence the exact pixel bits.
If you only need those two operations, lines (with different widths and colors) and quads (with/without texture), I recommend writing your own rendering/rasterizer code which operates on an 8-bit uint array representing the image pixels (R8G8B8). The operations you're proposing aren't too nasty, so if performance is unimportant, this might actually be the better way to go in the long run.
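To illustrate, a minimal sketch of such a rasterizer core (the struct layout and the choice of Bresenham are mine, not from any particular library); since it uses integer arithmetic only, the output is bit-identical on every machine and OS:

    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // An R8G8B8 pixel buffer plus an integer-only Bresenham line.
    struct Image {
        int width, height;
        std::vector<uint8_t> data;          // width * height * 3 bytes

        Image(int w, int h) : width(w), height(h), data(w * h * 3, 0) {}

        void setPixel(int x, int y, uint8_t r, uint8_t g, uint8_t b) {
            if (x < 0 || y < 0 || x >= width || y >= height) return;
            size_t i = (static_cast<size_t>(y) * width + x) * 3;
            data[i] = r; data[i + 1] = g; data[i + 2] = b;
        }

        void drawLine(int x0, int y0, int x1, int y1,
                      uint8_t r, uint8_t g, uint8_t b) {
            int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
            int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
            int err = dx + dy;
            for (;;) {
                setPixel(x0, y0, r, g, b);
                if (x0 == x1 && y0 == y1) break;
                int e2 = 2 * err;
                if (e2 >= dy) { err += dy; x0 += sx; }
                if (e2 <= dx) { err += dx; y0 += sy; }
            }
        }
    };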
I have a device to acquire X-ray images. Due to some technical constraints, the detector is made of multiple tilted, partially overlapping tiles with heterogeneous pixel sizes. The image is thus distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with homogeneous pixel size. I have already done this on the CPU, but I would like to give OpenGL a try, to use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use case. How should I proceed? How do I do this?
Images are 560x860 pixels, and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off rewriting the algorithm in OpenCL. The bonus then would be that you have something portable that will even use multi-core CPUs if no graphics accelerator card is available.
Rather than OpenGL, this sounds like a CUDA, or more generally GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture, writing a fragment shader that processes that data, and setting it up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
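The fragment shader for such a pass could be as small as this sketch (u_remap is a hypothetical lookup texture that stores, for each output pixel, the source coordinate to sample; how you encode your detector geometry into it is up to you):

    // Fragment shader: for each output pixel, look up where in the
    // distorted source image it should sample, then fetch that texel.
    #version 130
    uniform sampler2D u_source;   // the distorted source image
    uniform sampler2D u_remap;    // RG float texture: source UV per output pixel
    in vec2 v_texcoord;           // output-image coordinate in [0,1]
    out vec4 fragColor;

    void main()
    {
        vec2 srcUV = texture(u_remap, v_texcoord).rg;
        fragColor = texture(u_source, srcUV);
    }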
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing, but if (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate <x, y> to its output in homogeneous space:
For example, you would represent a transformation of scaling x by ½, scaling y by 1.2, and translating up and left by two units (assuming y points up) as:

    [ 0.5  0.0  -2.0 ]   [ x ]
    [ 0.0  1.2   2.0 ] * [ y ]
    [ 0.0  0.0   1.0 ]   [ 1 ]
and you can work out analogous transforms for rotation, shear, etc, as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
That said, once you have got your problem expressed as a linear transform, you can make it run a lot faster on the CPU too by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, etc.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
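For illustration, the whole inner loop can reduce to something like this sketch (all names and the uint16_t pixel type are assumptions; nearest-neighbour sampling for brevity):

    #include <cstdint>

    // Warp src into out through a 3x3 homogeneous transform M (row-major).
    void warp(const uint16_t* src, int srcW, int srcH,
              uint16_t* out, int outW, int outH, const float M[9])
    {
        for (int y = 0; y < outH; ++y) {
            for (int x = 0; x < outW; ++x) {
                float w  = M[6] * x + M[7] * y + M[8];
                int sx = static_cast<int>((M[0] * x + M[1] * y + M[2]) / w + 0.5f);
                int sy = static_cast<int>((M[3] * x + M[4] * y + M[5]) / w + 0.5f);
                out[y * outW + x] =
                    (sx >= 0 && sy >= 0 && sx < srcW && sy < srcH)
                        ? src[sy * srcW + sx] : 0;
            }
        }
    }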
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.
GDIPlus blend functions use bitmaps whose RGB channels are premultiplied by alpha, for efficiency. However, premultiplying by alpha is very costly, since you have to process each pixel one by one.
It seems that it would be a good candidate for SSE assembly. Is there someone here who would want to share their implementation? I know that this is hard work, so that's the reason I ask. I'm not trying to steal your work; you'll get all my consideration for sharing this if you can.
Edit: I'm not trying to do alpha blending in software. I'm trying to premultiply each color component of each pixel in an image by its alpha. I'm doing this because alpha blending is done by the formula dst = src*src.alpha + dst*(1 - src.alpha); however, the AlphaBlend Win32 function implements dst = src + dst*(1 - src.alpha) for optimization reasons. To get the correct result, src needs to equal src*src.alpha before calling AlphaBlend.
It would take me a bit of time to write, as I know little about assembly, so I was asking if someone would like to share their implementation. SSE would be great, as according to the paper the gain over alpha blending in software is 300%.
There's a good article found here. It's a bit old, but you might find something useful in the section where it uses MMX to implement alpha blending. This could easily be translated to SSE instructions to take advantage of the larger register size (128-bit).
MMX Enhanced Alpha Blending
Intel Application Notes here, with source code
Using MMX™ Instructions to Implement Alpha Blending
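For what it's worth, a rough SSE2 translation of that idea could look like this sketch (the function name and the div-by-255 trick are my own choices, assuming 8-bit BGRA pixels):

    #include <emmintrin.h>   // SSE2
    #include <cstdint>

    // Premultiply BGRA (8 bits per channel) pixels by their alpha,
    // four pixels per iteration; the remainder is handled by a scalar loop.
    void premultiplyAlphaSSE2(uint8_t* pixels, size_t pixelCount)
    {
        const __m128i zero    = _mm_setzero_si128();
        const __m128i one     = _mm_set1_epi16(1);
        // Keep the alpha channel itself unchanged by multiplying it by 255.
        const __m128i alphaFF = _mm_set_epi16(255, 0, 0, 0, 255, 0, 0, 0);
        const __m128i rgbMask = _mm_set_epi16(0, -1, -1, -1, 0, -1, -1, -1);

        size_t i = 0;
        for (; i + 4 <= pixelCount; i += 4) {
            __m128i px = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(pixels + i * 4));

            // Widen to 16 bits per channel: lo = pixels 0-1, hi = pixels 2-3.
            __m128i lo = _mm_unpacklo_epi8(px, zero);
            __m128i hi = _mm_unpackhi_epi8(px, zero);

            // Broadcast each pixel's alpha (word lanes 3 and 7) to its three
            // color lanes, and put 255 into the alpha lane itself.
            __m128i la = _mm_shufflehi_epi16(
                         _mm_shufflelo_epi16(lo, _MM_SHUFFLE(3,3,3,3)),
                         _MM_SHUFFLE(3,3,3,3));
            __m128i ha = _mm_shufflehi_epi16(
                         _mm_shufflelo_epi16(hi, _MM_SHUFFLE(3,3,3,3)),
                         _MM_SHUFFLE(3,3,3,3));
            la = _mm_or_si128(_mm_and_si128(la, rgbMask), alphaFF);
            ha = _mm_or_si128(_mm_and_si128(ha, rgbMask), alphaFF);

            lo = _mm_mullo_epi16(lo, la);
            hi = _mm_mullo_epi16(hi, ha);

            // Division by 255 via the usual (x + 1 + (x >> 8)) >> 8 trick.
            lo = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(lo, one),
                                              _mm_srli_epi16(lo, 8)), 8);
            hi = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(hi, one),
                                              _mm_srli_epi16(hi, 8)), 8);

            _mm_storeu_si128(reinterpret_cast<__m128i*>(pixels + i * 4),
                             _mm_packus_epi16(lo, hi));
        }
        for (; i < pixelCount; ++i) {       // scalar tail
            uint8_t* p = pixels + i * 4;
            unsigned a = p[3];
            for (int c = 0; c < 3; ++c) {
                unsigned x = p[c] * a;
                p[c] = static_cast<uint8_t>((x + 1 + (x >> 8)) >> 8);
            }
        }
    }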
You may want to have a look at the Eigen C++ template library. It allows you to use high level C++ code that uses optimized assembler with support for SSE/Altivec.
Fast. (See benchmark).
Expression templates make it possible to intelligently remove temporaries and enable lazy evaluation when that is appropriate; Eigen takes care of this automatically and handles aliasing too in most cases.
Explicit vectorization is performed for the SSE (2 and later) and AltiVec instruction sets, with graceful fallback to non-vectorized code. Expression templates make it possible to perform these optimizations globally for whole expressions.
With fixed-size objects, dynamic memory allocation is avoided, and the loops are unrolled when that makes sense.
For large matrices, special attention is paid to cache-friendliness.
Elegant. (See API showcase).
The API is extremely clean and expressive, thanks to expression templates. Implementing an algorithm on top of Eigen feels like just copying pseudocode. You can use complex expressions and still rely on Eigen to produce optimized code: there is no need for you to manually decompose expressions into small steps.
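As a tiny illustration (my own sketch, not from the Eigen docs), the premultiply step from the question could be written over whole channel arrays, and Eigen fuses and vectorizes the expression in a single loop:

    #include <Eigen/Dense>

    // Premultiply one color channel by alpha, both stored as float arrays
    // in the 0..255 range. Eigen compiles the whole expression into one
    // vectorized loop with no temporaries.
    void premultiply(Eigen::ArrayXf& channel, const Eigen::ArrayXf& alpha)
    {
        channel = channel * alpha / 255.0f;
    }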
Processing each pixel is not expensive with the native Win32 GDI APIs.
See MSDN