When allocating textures using glTexImage* functions, I know that I need to set glTexParameteri(GL_TEXTURE_MAX_LEVEL) to a reasonable value and specify all the levels up to that value, as described here.
I didn't expect this to be necessary for the glTexStorage* functions too, since they accept the number of levels as a parameter and allocate memory for that number of levels up-front. Still, I noticed I couldn't sample an immutable texture defined this way - until I called glGenerateMipmap or set GL_TEXTURE_MAX_LEVEL to levels-1.
I didn't find any official reason why it should be necessary and I expected immutable texture's parameters to be, well, immutable (and well-initialized). Can somebody confirm if (and why) this behaviour is correct? Or is it an AMD driver bug perhaps?
OK, I think I've got it:
The levels parameter of glTexStorage is indeed stored in the texture object, but as GL_TEXTURE_IMMUTABLE_LEVELS, not as GL_TEXTURE_MAX_LEVEL as I thought.
The GL_TEXTURE_MAX_LEVEL parameter therefore remains at its default large value. (It's still possible to change it manually: the immutable flag of a texture object only concerns the texture storage and its format, not the image data or the texture parameters.)
The texture immutability should affect LOD calculation in the following way according to the spec:
if TEXTURE_IMMUTABLE_FORMAT is TRUE, then level_base is clamped
to the range [0, level_immut - 1]
So leaving GL_TEXTURE_MAX_LEVEL intact (= 1000) for an immutable texture shall have the same effect as setting it to levels-1.
Verdict: driver bug; the driver apparently omits this clamping step.
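For reference, a minimal sketch of the allocation plus the workaround (the size, format, level count and the pixels pointer are placeholders, not from the original post):
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
const GLsizei levels = 4;
glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, 256, 256);   // immutable storage
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256,
                GL_RGBA, GL_UNSIGNED_BYTE, pixels);          // upload level 0 (placeholder data)
glGenerateMipmap(GL_TEXTURE_2D);                             // fill levels 1..levels-1
// Redundant for an immutable texture according to the spec, but it works
// around the driver behaviour described above:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, levels - 1);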
I know that I need to set glTexParameteri(GL_TEXTURE_MAX_LEVEL) to a reasonable value and specify all the levels up to that value, as described here.
Well, you don't have to. The default value for GL_TEXTURE_MAX_LEVEL is 1000 and hence larger than any image pyramid you'll ever reasonably use.
Still, I noticed I couldn't sample an immutable texture defined this way - until I called glGenerateMipmap or set GL_TEXTURE_MAX_LEVEL to levels-1.
Yes, that's because image storage is independent of image sampling. GL_TEXTURE_MAX_LEVEL is a texture parameter that affects image access at sampling time and is independent of the actual texture image storage (much like the LOD clamps GL_TEXTURE_MIN_LOD/GL_TEXTURE_MAX_LOD, which can even be set on a Sampler Object). You can also change the range of used image pyramid levels after image specification, if you want to select only a subrange of levels during rendering, or upload images into only a subset of the allocated image pyramid.
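For example (a hedged sketch; the level numbers are arbitrary), restricting sampling to a subrange of an already-allocated pyramid is just:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 2);   // first level used when sampling
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 5);    // last level used when sampling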
EDIT: reworded for clarification
Related
I use glTexImage2DMultisample with the fixedsamplelocations parameter set to false. Then texels may have different sample counts. How can I check the sample count for a texel in the fragment shader? Is the only solution to use textureSamples (ARB_shader_texture_image_samples)?
The sample count for an image is for the image, not for individual texels. So if some implementation assigns different numbers of samples to different texels, that's not something you can query. The implementation will have to behave as if it didn't do that, since all of OpenGL's APIs assume a fixed sample count for an image. This includes the APIs for reading and writing the current fragment's sample mask.
After watching videos and reading the documentation on DXR and DX12, I'm still not sure how to manage resources for DX12 raytracing (DXR).
There is quite a difference between rasterizing and raytracing in terms of resource management, the main difference being that rasterizing works with a lot of transient resources that can be bound on the fly, while raytracing needs all resources ready to go at the moment the rays are cast. The reason is obvious: a ray can hit anything in the whole scene, so we need to have every shader, every texture, every heap ready and filled with data before we cast a single ray.
So far so good.
My first test was adding all resources to a single heap - based on some DXR tutorials. The problem with this approach arises with objects that share the same shaders but use different textures. I defined one shader root signature for my single hit group, which I had to prepare before raytracing. But when creating a root signature, we have to say exactly which position in the heap corresponds to the SRV where the texture is located. Since there are many textures at different positions in the heap, I would need to create one root signature per object with different textures. This is of course not preferred, since, based on the documentation and common sense, we should keep the number of root signatures as small as possible.
Therefore, I discarded this test.
My second approach was creating a descriptor heap per object, which contained all local descriptors for this particular object (textures, constants, etc.). The global resources - the TLAS (Top Level Acceleration Structure), the output, and the camera constant buffer - were kept in a separate global heap. In this approach, I think I misunderstood the documentation by assuming I can add multiple heaps to a root signature. As I'm writing this post, I could not find a way of adding two separate heaps to a single root signature. If this is possible, I would love to know how, so any help is appreciated.
Here is the code I'm using for my root signature (using the DX12 helpers):
bool PipelineState::CreateHitSignature(Microsoft::WRL::ComPtr<ID3D12RootSignature>& signature)
{
    const auto device = RaytracingModule::GetInstance()->GetDevice();
    if (device == nullptr)
    {
        return false;
    }
    nv_helpers_dx12::RootSignatureGenerator rsc;
    rsc.AddRootParameter(D3D12_ROOT_PARAMETER_TYPE_SRV, 0); // "t0": vertices and colors
    // Add ranges pointing into the heap: the TLAS and the per-instance data
    rsc.AddHeapRangesParameter({
        {2 /*t2*/, 1, 0, D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1}, /* 2nd slot of the first heap */
        {3 /*t3*/, 1, 0, D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 3}, /* 4th slot of the first heap. Per-instance data */
    });
    signature = rsc.Generate(device, true);
    return signature.Get() != nullptr;
}
Now my last approach would be to create a heap per object containing all the necessary resources for that object (TLAS, CBVs, SRVs/textures, etc.) - effectively one heap per object. Again, as I was reading the documentation, this was not advised; the documentation states that we should group resources into global heaps. At this point, I have a feeling I'm mixing DX12 and DXR documentation and best practices by applying DX12 proposals in the DXR domain, which is probably wrong.
I also read partly through the Nvidia Falcor source code, and they seem to have one resource heap per descriptor type, effectively limiting the number of descriptor heaps to a minimum (makes total sense), but I did not yet find how a root signature is created with multiple separate heaps.
I feel like I'm missing one last puzzle piece to this mystery before it all falls into place and creates a beautiful image. So if anyone could explain how resource management (heaps, descriptors, etc.) should be handled in DXR when we want many objects with different resources, it would help me a lot.
So thanks in advance!
Jakub
With DXR you are already at shader model 6.3 (the lib_6_3 target), where dynamic indexing has much more official support than the "the last descriptor of a range is free to be indexed past its declared size" trick that was the "secret" approach in SM 5.1.
Now you have full "bindless" using a type var[] : register(t4, space1); declaration syntax, and you can index freely: var[1] will access register (t5, space1), and so on.
You can set up register ranges in the descriptor table, so if you have 100 textures you can make the range span 100 descriptors.
You can even declare other resources after the array variable, as long as you remember to skip past all the registers the array occupies. But it's easier to use different virtual register spaces:
cbuffer Frame     : register(b0, space0) { float4   ambiance; }
Texture2D all_albedos[] : register(t0, space1);
cbuffer PerObject : register(b1, space0) { float4x4 world; }
Now the array can go up to t100 and beyond without disturbing the space0 declarations that follow.
The limit on the register value is lifted in SM6. It's
up to max supported heap allocation
So all_albedos[3400].Sample(..) is a perfectly acceptable call (provided your heap has bound the views).
Unfortunately, DX12 gives you the feeling you can bind multiple heaps of the same type with the ID3D12GraphicsCommandList::SetDescriptorHeaps function, but if you try you'll get runtime errors:
D3D12 ERROR: ID3D12CommandList::SetDescriptorHeaps: pDescriptorHeaps[1] sets a descriptor heap type that appears earlier in the pDescriptorHeaps array.
Only one of any given descriptor heap type can be set at a time. [ EXECUTION ERROR #554: SET_DESCRIPTOR_HEAP_INVALID]
It's misleading so don't trust that plural s in the method name.
Really, if you have multiple heaps, it would only be for a triple-buffered circular update/usage scheme, or an upload (non-shader-visible) heap next to the shader-visible one, I suppose. Just put everything in your one heap, and let the descriptor tables index into it as needed.
A descriptor table is a very lightweight element; it's basically just three integers: a descriptor start, a span, and a virtual register space. Just use that - you can span 1000 textures if you have 1000 textures in your scene. You can get the material ID by embedding it in an indirection texture with unique UVs (like a lightmap), or in the vertex data, or simply from the hit group (if you set things up as 1 hit group = 1 object): the hit-group index, which is given by a system value in the shader, then becomes your texture index.
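For what it's worth, here is a hedged host-side sketch (using the d3dx12.h helpers rather than the nv_helpers_dx12 ones from the question) of a root signature with a single descriptor-table parameter whose unbounded SRV range matches the Texture2D all_albedos[] : register(t0, space1) declaration above; device is assumed to be a valid ID3D12Device*, and the local-root-signature flag mirrors the hit-group signature in the question:
// One unbounded SRV range in space1: t0, t1, t2, ... as far as the heap provides views.
CD3DX12_DESCRIPTOR_RANGE1 albedoRange;
albedoRange.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV,
                 UINT_MAX,   // unbounded size
                 0,          // t0
                 1,          // space1
                 D3D12_DESCRIPTOR_RANGE_FLAG_DESCRIPTORS_VOLATILE);

CD3DX12_ROOT_PARAMETER1 table;
table.InitAsDescriptorTable(1, &albedoRange, D3D12_SHADER_VISIBILITY_ALL);

CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC rsDesc;
rsDesc.Init_1_1(1, &table, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_LOCAL_ROOT_SIGNATURE);

Microsoft::WRL::ComPtr<ID3DBlob> blob, error;
D3DX12SerializeVersionedRootSignature(&rsDesc, D3D_ROOT_SIGNATURE_VERSION_1_1, &blob, &error);

Microsoft::WRL::ComPtr<ID3D12RootSignature> signature;
device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                            IID_PPV_ARGS(&signature));
One root signature like this can then serve every object; the per-object texture choice becomes purely an index in the shader.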
Dynamic indexing of HLSL 5.1 might be the solution to this issue.
https://learn.microsoft.com/en-us/windows/win32/direct3d12/dynamic-indexing-using-hlsl-5-1
With dynamic indexing, we can create one heap containing all the materials and use a per-object index in the shader to select the correct material at run time.
Therefore, we do not need multiple heaps of the same type, which isn't possible anyway: only one heap per heap type can be bound at a time.
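A hedged HLSL sketch of that idea (the names and the array size are made up; in a DXR hit shader you would use SampleLevel, since there are no implicit derivatives):
Texture2D    g_materials[64] : register(t1);   // all material textures from one heap range
SamplerState g_sampler       : register(s0);

cbuffer PerObject : register(b0)
{
    uint materialIndex;   // set per object / per hit group
};

float4 SampleAlbedo(float2 uv)
{
    return g_materials[materialIndex].SampleLevel(g_sampler, uv, 0);
}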
I'm trying to find ways to copy multidimensional arrays from host to device in OpenCL and thought an approach was to use an image, which can be a 1, 2, or 3 dimensional object. However, I'm confused because when reading a pixel from an image, the functions use vector datatypes. Normally I would think double pointer, but it doesn't sound like that is what is meant by vector datatypes. Anyway, here are my questions:
1) What is actually meant by a vector datatype, and why wouldn't we just specify 2 or 3 indices when denoting pixel coordinates? It looks like a single value such as float2 is being used to denote coordinates, but that makes no sense to me. I'm looking at the read_imageui and read_image* functions.
2) Can the input image just be a subset of the entire image, and the sampler a subset of the input image? I don't understand how the coordinates are actually specified here either, since read_image*() only seems to take a single value for the input and a single value for the sampler.
3) If doing linear algebra, should I just bite the bullet and translate the 1-D array data from the buffer into multi-dimensional arrays in OpenCL?
4) I'm still interested in images, so even if what I want to do is not best for images, could you still explain questions 1 and 2?
Thanks!
EDIT
I wanted to refine my question and ask, in the following khronos documentation they define...
int4 read_imagei (
image2d_t image,
sampler_t sampler,
int2 coord)
But nowhere can I find what image2d_t's definition or structure is supposed to be. The same thing goes for sampler_t and int2 coord. They seem like structs to me, or pointers to structs, since OpenCL is supposed to be based on ANSI C, but what are the fields of these structs, and how do I write the coord with what looks like a scalar?! I've seen the notation (int2)(x,y), but that's not ANSI C; that looks like Scala, haha. Things seem conflicting to me. Thanks again!
In general you can read from images in three different ways:
direct pixel access, no sampling
sampling, normalized coordinates
sampling, integer coordinates
The first one is what you want, that is, you pass integer pixel coordinates like (10, 43) and it will return the contents of the image at that point, with no filtering whatsoever, as if it were a memory buffer. You can use the read_image*() family of functions which take no sampler_t param.
The second one is what most people want from images, you specify normalized image coords between 0 and 1, and the return value is the interpolated image color at the specified point (so if your coordinates specify a point in between pixels, the color is interpolated based on surrounding pixel colors). The interpolation, and the way out-of-bounds coordinates are handled, are defined by the configuration of the sampler_t parameter you pass to the function.
The third one is the same as the second one, except the texture coordinates are not normalized, and the sampler needs to be configured accordingly. In some sense the third way is closer to the first, and the only additional feature it provides is the ability to handle out-of-bounds pixel coordinates (for instance, by wrapping or clamping them) instead of you doing it manually.
Finally, the different versions of each function, e.g. read_imagef, read_imagei, read_imageui are to be used depending on the pixel format of your image. If it contains floats (in each channel), use read_imagef, if it contains signed integers (in each channel), use read_imagei, etc...
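For example, here is a minimal kernel sketch that reads at integer pixel coordinates through a sampler (the third access mode above), assuming a CL_RGBA / CL_UNSIGNED_INT8 image; the names are placeholders. Note that (int2)(x, y) is just OpenCL C's vector-literal syntax, not a struct you have to define yourself:
__kernel void fetch_texels(__read_only image2d_t src,
                           sampler_t smp,
                           __global uint4 *dst,
                           int width)
{
    int2 coord = (int2)(get_global_id(0), get_global_id(1)); // a 2-component vector literal
    uint4 px = read_imageui(src, smp, coord);                // one component per channel
    dst[coord.y * width + coord.x] = px;                     // copy the texel into a plain buffer
}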
Writing to an image on the other hand is straightforward, there are write_image{f,i,ui}() functions that take an image object, integer pixel coordinates and a pixel color, all very easy.
Note that you cannot read and write to the same image in the same kernel! (I don't know if recent OpenCL versions have changed that). In general I would recommend using a buffer if you are not going to be using images as actual images (i.e. input textures that you sample or output textures that you write to only once at the end of your kernel).
About the image2d_t, sampler_t types, they are OpenCL "pseudo-objects" that you can pass into a kernel from C (they are reserved types). You send your image or your sampler from the C side into clSetKernelArg, and the kernel gets back a sampler_t or an image2d_t in the kernel's parameter list (just like you pass in a buffer object and it gets a pointer). The objects themselves cannot be meaningfully manipulated inside the kernel, they are just handles that you can send into the read_image/write_image functions, along with a few others.
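A hedged host-side sketch of that hand-off (OpenCL 1.2 C API; ctx, kernel, width, height and hostPixels are assumed to exist, and error handling is omitted):
cl_image_format fmt  = { CL_RGBA, CL_UNSIGNED_INT8 };
cl_image_desc   desc = { 0 };
desc.image_type   = CL_MEM_OBJECT_IMAGE2D;
desc.image_width  = width;
desc.image_height = height;

cl_int err;
cl_mem img = clCreateImage(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                           &fmt, &desc, hostPixels, &err);
cl_sampler smp = clCreateSampler(ctx, CL_FALSE /* unnormalized coords */,
                                 CL_ADDRESS_CLAMP_TO_EDGE, CL_FILTER_NEAREST, &err);

/* The kernel's image2d_t and sampler_t parameters receive these handles: */
clSetKernelArg(kernel, 0, sizeof(cl_mem),     &img);
clSetKernelArg(kernel, 1, sizeof(cl_sampler), &smp);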
As for the "actual" low-level difference between images and buffers, GPU's often have specially reserved texture memory that is highly optimized for "read often, write once" access patterns, with special texture sampling hardware and texture caches to optimize scatter reads, mipmaps, etc..
On the CPU there is probably no underlying difference between an image and a buffer, and your runtime likely implements both as memory arrays while enforcing image semantics.
I'm seeing terrible behavior with one of my textures in OpenGL.
After deleting a texture, I create a new one; it gets the same texture name as before, but the texture is incorrect. Also, glGetError always returns 0 on every line! I tried adding glFlush/glFinish after glDeleteTextures, but it doesn't change anything! The texture name seems locked somewhere... why?
It's single threaded; here is the behavior:
//myTexture == 24 is loaded and works correctly
GLboolean bIsTexture = glIsTexture(myTexture); //returns 1 => ok
glDeleteTextures(1,&myTexture);
bIsTexture = glIsTexture(myTexture); //returns 0 => ok
//Let's create a new texture
glGenTextures(1,&myTexture);//myTexture == 24 (as the glDelete was ok)
glBindTexture(GL_TEXTURE_2D,myTexture);
bIsTexture = glIsTexture(myTexture); //returns 0 => FAILS
Call glBindTexture with 0 before re-creating a texture whose name is still bound:
//Let's create a new texture
glBindTexture(GL_TEXTURE_2D,0); // free the old bind texture if deleted
glGenTextures(1,&myTexture);//myTexture == 24 (as the glDelete was ok)
glBindTexture(GL_TEXTURE_2D,myTexture);
bIsTexture = glIsTexture(myTexture); //returns 1 => Ok
Well, you're missing one important step in the commonly used sequence of operations for creating a new texture: Actually allocating it.
//Let's create a new texture
glGenTextures(1,&myTexture);//myTexture == 24 (as the glDelete was ok)
glBindTexture(GL_TEXTURE_2D,myTexture);
Here you must call either glTexImage2D or glTexStorage to actually allocate the texture object. Before doing so, there's no texture data associated with the generated texture name. This is important: the value(s) generated by glGenTextures are not textures, but texture names (i.e. handles), and although the spec says that binding such a name should already create a texture object, a buggy driver may interpret it wrongly.
glTexImage2D(…); // <<<<<
bIsTexture = glIsTexture(myTexture); //returns …
Update:
As Andon M. Coleman points out (thanks for that), binding a texture name to a texture target creates a texture object (associated with said name). So one should expect glIsTexture to return GL_TRUE in that case. Now here's how reality differs from the ideal world of the specification: an actual (buggy) driver may be wrongly implemented and consider a name to be associated with a texture object only once actual data storage is present, so it might be necessary to do the actual allocation to see the effect.
In practice you normally do the storage allocation quite soon after the name allocation. I presume the test suite the implementer of your driver used does not check for this corner case. Time for writing a bug report.
The only thing glIsTexture (...) does is let you know whether an OpenGL name (handle) belongs to a texture or not. In fact, OpenGL names are not actually tied to their final purpose until first use. In the case of your texture name, glIsTexture merely checks to see if the name is associated with a texture; this association takes place the first time you call glBindTexture (...) using the name. glIsTexture does not tell you if the name has an associated data store (e.g. you called glTexImage2D (...) or, in OpenGL 4+, glTexStorage (...)).
Here's an interesting bit of trivia: before OpenGL 3.0, the spec. allowed you to come up with any unused number for an object name and OpenGL would treat it like you had used a glGen___ (...) function to generate the name; this can still be done in compatibility profiles. That is how unimportant the name generation functions were in the grand scheme of things.
The big takeaway here is that names are given their function upon first use. More importantly, glIs___ (...) merely tells you whether a name is associated with a particular kind of OpenGL object (not whether it is a valid/initialized/... object).
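A minimal sketch of that distinction, with the behaviour the spec prescribes in the comments (the buggy driver from the question disagrees on the last line):
GLuint name;
glGenTextures(1, &name);                 // name is reserved, but no texture object exists yet
GLboolean a = glIsTexture(name);         // GL_FALSE: not yet associated with a texture
glBindTexture(GL_TEXTURE_2D, name);      // first bind creates the texture object
GLboolean b = glIsTexture(name);         // GL_TRUE per the spec, even with no storage allocated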
The official explanation of what I just mentioned comes from the OpenGL spec, which states:
The command:
void GenTextures( sizei n, uint *textures );
returns n previously unused texture names in textures. These names are marked as used, for the purposes of GenTextures only, but they acquire texture state and a dimensionality only when they are first bound, just as if they were unused.
The binding is effected by calling:
void BindTexture( enum target, uint texture );
with target set to the desired texture target and texture set to the unused name. The resulting texture object is a new state vector, comprising all the state and with the same initial values listed in section 8.21. The new texture object bound to target is, and remains, a texture of the dimensionality and type specified by target until it is deleted.
Since this is all that glIsTexture (...) is supposed to do, I would have to assume that this is a driver bug.
I'm trying to apply contrast and brightness to a bitmap in memory and I'm completely lost. Currently I'm trying to use Magick++ to do it, but if one of the other APIs would work better I'm all ears. I managed to find Magick::Image::sigmoidalContrast() for applying the contrast, but I can't figure out how to get it to work. I'm creating an image, passing it the buffer pointer, then calling that function, but it doesn't seem to change anything, so my first thought was that it's making a copy and modifying that. Even so, I have no idea how to get the data out of the Magick::Image object.
Here's what I got so far.
Magick::Image image(fBitmapData->mGetTextureWidth(), fBitmapData->mGetTextureHeight(), "RGBA", MagickCore::CharPixel, pixels);
image.sigmoidalContrast(1, 20.0);
The documentation is useless. After searching, I could only find hints that the first parameter is actually a boolean (even though it takes a size_t) that specifies whether to add or subtract the contrast, and I have no idea what to pass for the second value, so I'm just using 20.0 to test.
So does anyone know if this will work for contrast, and if not, then how do you apply contrast? And likewise I still have no idea how to apply brightness either and can't find any functions that look like they would work.
Figured it out: the function I was using for contrast was correct, and for brightness I ended up using image.modulate(brightness, 100.0, 100.0);. To get the data out of the image object, you can grab the pixels of the entire image by doing
const MagickCore::PixelPacket * magickPixels = image.getConstPixels(0, 0, image.columns(), image.rows());
And then copy the magickPixels data back into the original pixels that were passed into the image constructor. An important thing to note is that the member MagickCore::PixelPacket::opacity is not what you would think it is. If a pixel is completely transparent, you'd think the value would be 0, right? Well, for some reason ImageMagick does it the opposite way: for full transparency the value is the maximum (255 in 8-bit terms). This means you need to do 255 - opacity to get the usual alpha value.
Also be careful of the MAGICKCORE_QUANTUM_DEPTH that ImageMagick was compiled with, as this will change the values drastically. For my code MAGICKCORE_QUANTUM_DEPTH just happened to be defined as 16 so all of the values were a range of 0 to 65535, which I just fixed by doing realValue = magickValue >> 8 when copying the data back over since the texture data is unsigned char values.
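Putting those pieces together, here is a hedged sketch of the copy-back step, assuming MAGICKCORE_QUANTUM_DEPTH is 16 and that pixels is the same RGBA unsigned char buffer that was passed to the Image constructor:
const MagickCore::PixelPacket * magickPixels =
    image.getConstPixels(0, 0, image.columns(), image.rows());

const size_t count = image.columns() * image.rows();
for (size_t i = 0; i < count; ++i)
{
    pixels[i * 4 + 0] = (unsigned char)(magickPixels[i].red   >> 8);
    pixels[i * 4 + 1] = (unsigned char)(magickPixels[i].green >> 8);
    pixels[i * 4 + 2] = (unsigned char)(magickPixels[i].blue  >> 8);
    // ImageMagick stores opacity (0 = opaque, max = transparent), not alpha:
    pixels[i * 4 + 3] = (unsigned char)(255 - (magickPixels[i].opacity >> 8));
}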
Just for clarification on how to use these functions, since the documentation is horrible and completely wrong: the first parameter to sigmoidalContrast() is actually a boolean, even though the type is a size_t, that specifies whether to increase the contrast (true) or reduce it (false), and the second is a value in the range 0.00001 to 20.0. I say 0.00001 because 0.0 is an invalid value, so it just needs to be some decimal that is close to, but not exactly, 0.0.
For modulate() the documentation says that each value should be specified as 1.0 for no change, which is completely wrong. The values are actually a percentage so for no change you would specify 100.0.
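So, with the corrected semantics, the calls end up looking like this (the values are just examples):
image.sigmoidalContrast(1, 3.0);        // 1 = increase contrast; ~3 is a "typical" amount
image.modulate(120.0, 100.0, 100.0);    // brightness 120%, saturation and hue unchanged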
I hope that helps someone because it took me all damn day to figure this stuff out.
According to the ImageMagick website (this is for the command line, but it may be the same):
-sigmoidal-contrast contrastxmid-point
increase the contrast without saturating highlights or shadows.
Increase the contrast of the image using a sigmoidal transfer function without saturating highlights or shadows. Contrast indicates how much to increase the contrast. For example, near 0 is none, 3 is typical and 20 is a lot. Note that exactly zero is invalid, but 0.0001 is negligibly different from no change in contrast. mid-point indicates where midtones fall in the resultant image (0 is white; 50% is middle-gray; 100% is black). By default the image contrast is increased, use +sigmoidal-contrast to decrease the contrast.
To achieve the equivalent of a sigmoidal brightness change, use -sigmoidal-contrast brightnessx0% to increase brightness and +sigmoidal-contrast brightnessx0% to decrease brightness.
On the command line there is a newer brightness-contrast setting that may be in later versions of Magick++?
-brightness-contrast brightness{xcontrast}{%}
Adjust the brightness and/or contrast of the image.
Brightness and Contrast values apply changes to the input image. They are not absolute settings. A brightness or contrast value of zero means no change. The range of values is -100 to +100 on each. Positive values increase the brightness or contrast and negative values decrease the brightness or contrast. To control only contrast, set the brightness=0. To control only brightness, set contrast=0 or just leave it off.
You may also use -channel to control which channels to apply the brightness and/or contrast change. The default is to apply the same transformation to all channels.
Brightness and Contrast arguments are converted to offset and slope of a linear transform and applied using -function polynomial "slope,offset".
The slope varies from 0 at contrast=-100 to almost vertical at contrast=+100. For brightness=0 and contrast=-100, the result are totally midgray. For brightness=0 and contrast=+100, the result will approach but not quite reach a threshold at midgray; that is the linear transformation is a very steep vertical line at mid gray.
Negative slopes, i.e. negating the image, are not possible with this function. All achievable slopes are zero or positive.
The offset varies from -0.5 at brightness=-100 to 0 at brightness=0 to +0.5 at brightness=+100. Thus, when contrast=0 and brightness=100, the result is totally white. Similarly, when contrast=0 and brightness=-100, the result is totally black.
As the range of values for the arguments are -100 to +100, adding the '%' symbol is no different than leaving it off.
If Magick++ is like Imagick, it may be lagging a long way behind the ImageMagick command-line options.