Packing and unpacking a uint into float4 in DirectX - c++

I have a texture atlas that I'm generating from an array of uints. Sampling from it in my pixel shader, colors are coming out correctly. Here's the relevant HLSL:
Texture2D textureAtlas : register(t8);
SamplerState smoothSampler : register(s9)
{
Filter = MIN_MAG_MIP_LINEAR;
AddressU = Clamp;
AddressV = Clamp;
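};
struct PS_OUTPUT
{
float4 Color : SV_TARGET0;
float Depth : SV_DEPTH0;
};
PS_OUTPUT PixelShader(/* vertex shader outputs omitted */)
{
PS_OUTPUT output;
// among other things, u and v are calculated here
output.Color = textureAtlas.Sample(smoothSampler, float2(u,v));
// ... the depth code shown below also goes here
return output;
}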
This works great. With color working, I've extended the texture atlas to include depth information as well. There are only a few thousand depth values that I want, well under 24 bits worth (my depth buffer is 24 bits wide + an 8 bit stencil). The input depth values are uints, just like the colors, though of course in the depth case the values are going to be spread over four color channels and in the shader I want a single float between 0 and 1, so that will need to be computed from the sample. Here's the additional pixel shader code:
// u and v are recalculated for the depth portion of the texture atlas
float4 depthSample = textureAtlas.Sample(smoothSampler, float2(u,v));
float depthValue =
(depthSample.b * 65536.0 +
depthSample.g * 256.0 +
depthSample.r)
/ 65793.003921568627450980392156863;
output.Depth = depthValue;
The long constant here is 16777216/255, which should map the full 24-bit range down to a unorm value in [0, 1).
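Spelled out, the shader expression reduces to the 24-bit value divided by 2^24. Here R, G and B stand for the stored bytes (0..255), and each UNORM8 channel is assumed to be sampled as byte/255:

```latex
\[
\frac{\frac{B}{255}\cdot 65536 + \frac{G}{255}\cdot 256 + \frac{R}{255}}{16777216/255}
= \frac{65536\,B + 256\,G + R}{16777216}
= \frac{65536\,B + 256\,G + R}{2^{24}} \in [0,\,1)
\]
```

As a result, the largest 24-bit value, 2^24 - 1, maps to just below 1.0.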
Now, when I'm generating the texture, if I constrain the depth values to the range of 0..2048, the output depth is correct. However, if I allow the upper limit of the range to increase (even if it's simply by taking the input values and performing a left shift by 16), then the output depths will be slightly off. Not by much, just +/- 0.002, but it's enough to make the output look terrible.
Can anybody spot my bug here? Or, more generally, is there a better way of packing and unpacking uints into textures?
I'm working in shader model 4 level 9_3 and C++ 11.

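Your code is prone to precision loss: you're adding a relatively large number (up to 65536 + 256) to a small number (depthSample.r < 1), so in 32-bit floating point part of the low channel's precision is rounded away.
Also, make sure your (u,v) are at the center of the texel so that filtering doesn't blend neighboring texels. Alternatively, replace Sample with Load, which fetches a single texel without filtering.
Since you're using SM4, you can use the asuint and asfloat intrinsics to reinterpret the bits (a reinterpret cast).
You can also use float-format textures instead of R8G8B8A8.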
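The following C++ sketch illustrates the "keep it integer until the end" idea on the CPU side. The function names are illustrative, and the R = low byte / B = high byte order follows the shader in the question. The unpacking step first rounds each sampled channel back to an exact byte and only then combines the bytes, so a tiny term is never added to a huge one in float. In the pixel shader, the analogous step would be to round each channel (e.g. with round(c * 255.0)) before combining; whether that fully removes the error on 9_3 hardware is not verified here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 24-bit value into three 8-bit channels:
// R = low byte, G = middle byte, B = high byte.
std::array<std::uint8_t, 3> pack24(std::uint32_t value) {
    assert(value < (1u << 24) && "only 24 bits fit into R, G and B");
    return {static_cast<std::uint8_t>(value & 0xFFu),
            static_cast<std::uint8_t>((value >> 8) & 0xFFu),
            static_cast<std::uint8_t>((value >> 16) & 0xFFu)};
}

// Mirror the unpacking: a UNORM8 channel is sampled as byte / 255.
// Each channel is rounded back to an exact byte before the bytes are combined.
std::uint32_t unpack24(float r, float g, float b) {
    auto toByte = [](float c) { return static_cast<std::uint32_t>(c * 255.0f + 0.5f); };
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16);
}

// Final mapping to [0, 1), matching the shader's overall division by 2^24.
// Exact for any value below 2^24, since such values are representable in float.
float toDepth(std::uint32_t value) {
    return static_cast<float>(value) / 16777216.0f;
}
```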

Related

Attempt at making a Displacement Filter

So, basically, I'm trying to make an OBS filter that displaces pixels based on a lightmap/luminance map. I decided to learn how to make a filter by following this tutorial, but the tutorial doesn't explain much in terms of pixel displacement. So I made a function that gets the brightness value of a texture I input, and tested it by replacing each pixel's alpha value with the red value of that texture:
float4 get_displacement(float2 position)
{
float2 pattern_uv = position / pattern_size;
float4 pattern_sample = pattern_texture.Sample(linear_wrap, pattern_uv / scale);
return pattern_sample;
}
float4 pixel_shader(pixel_data pixel) : TARGET
{
float4 source_sample = image.Sample(linear_wrap, pixel.uv);
if (pattern_size.x <= 0){
return source_sample;
}
float2 position = pixel.uv * float2(width, height);
float4 lightmap = get_displacement(position);
return float4(source_sample.rgb, lightmap.r);
// return source_sample;  // unreachable after the return above
}
This results in the following (note: the green comes from a colour source behind the image, to show the alpha value).
But for some reason, when I try the same thing in the vertex_shader (the function that decides where the pixel is rendered), it doesn't seem to work:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
if (pattern_size.x <= 0){
pixel.pos = mul(float4(vertex.pos.xyz, 1.0), ViewProj);
return pixel;
}
float2 position = vertex.uv * float2(width, height);
float4 lightmap = get_displacement(position);
pixel.pos = mul(float4(vertex.pos.x + (lightmap.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
(Note: testRamp1 is used as a value that I can change from a slider inside of OBS via some filter Properties)
The result that I'm expecting is something similar to this
To see if the issue was from me changing the XY position, I tested it using this function:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + 100, vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
And it gave me the expected result.
I also replaced the 100 with the testRamp1 value, and it works just the same, following the value of the slider.
So I then tested whether the problem was that all the pixels had to move the same distance. I changed the function to this:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (vertex.uv.x * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
This gives me a squashed image when testRamp1 is set to a negative value, and a stretched image when it is set to a positive value.
But as soon as I try to read a value from an image, whether the pattern or the source image, it no longer works (not even the filter parameters appear). For example, I used this function to use the values of the source image:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
float4 source_sample = image.Sample(linear_wrap, vertex.uv);
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (source_sample.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
At this point, I'm at a loss as to what could be causing this issue.
First of all, the vertex shader is not what you want to use for this kind of effect. What you actually want to do is sample the image in the pixel shader, but offset the UV values slightly by your displacement before you pass them to the Sample function.
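The following C++ sketch shows the per-pixel logic on the CPU, so it makes no assumptions about OBS's API. For each output pixel, it reads the displacement map at that pixel's UV, offsets the UV horizontally, and samples the source at the offset position. The Image type, nearest-neighbour sampling and clamping are simplifications for illustration. In the real filter, this body would live in pixel_shader, with image.Sample(linear_wrap, ...) doing the sampling at the offset UV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal single-channel image with UV-based nearest-neighbour sampling
// (a stand-in for Texture2D::Sample; the real filter would filter linearly).
struct Image {
    int width, height;
    std::vector<float> pixels;  // row-major, one float per pixel

    float sample(float u, float v) const {
        // clamp addressing: keep UVs inside [0, 1]
        u = std::min(std::max(u, 0.0f), 1.0f);
        v = std::min(std::max(v, 0.0f), 1.0f);
        int x = std::min(static_cast<int>(u * width), width - 1);
        int y = std::min(static_cast<int>(v * height), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// What the pixel shader would do for every output pixel.
// strength plays the role of testRamp1, expressed here in UV units.
Image displace(const Image& source, const Image& displacementMap, float strength) {
    Image out{source.width, source.height, std::vector<float>(source.pixels.size())};
    for (int y = 0; y < source.height; ++y) {
        for (int x = 0; x < source.width; ++x) {
            // UV at the centre of this output pixel
            float u = (x + 0.5f) / source.width;
            float v = (y + 0.5f) / source.height;
            // read the lightmap, then offset the lookup into the source image
            float offset = displacementMap.sample(u, v) * strength;
            out.pixels[static_cast<std::size_t>(y) * source.width + x] = source.sample(u + offset, v);
        }
    }
    return out;
}
```

Because every output pixel computes its own offset, the displacement resolution matches the pixel resolution rather than the (much coarser) vertex count.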
The primary reason you don't want to do this in the vertex shader is that the number of vertices is usually much smaller than the number of pixels. In the worst case, you only have 4 vertices in total (one for each corner of your screen), so the granularity of what you can do in the vertex shader is rather coarse. (Note: I'm not too familiar with OBS filters and don't know how many vertices OBS dispatches, but certainly far fewer than the number of pixels on your screen.)
Now, the reason your vertex shader didn't work at all is a bit more technical. In short, you can't use Sample in a vertex shader; you have to use SampleLevel or SampleGrad instead (note that these functions require more parameters). This is because Sample automatically calculates a UV gradient between adjacent pixels to figure out the level of detail needed for your texture (whether or not it actually has multiple levels of detail). But the vertex shader operates on vertices, not on pixels, so the concept of an "adjacent pixel" doesn't make sense in a vertex shader; thus, the Sample method doesn't work there.
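The simplified C++ sketch below shows why Sample needs neighbouring pixels: the mip level is derived from how fast the UV changes from one pixel to the next (what ddx(uv) and ddy(uv) return in a pixel shader). It uses the common isotropic approximation lod = log2(max texel footprint); actual hardware may differ in detail. A vertex shader has no such neighbours to difference, so the level must be supplied explicitly, which is what SampleLevel's extra parameter does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Approximate mip level from screen-space UV derivatives.
// dUVdx / dUVdy: change in UV between horizontally / vertically adjacent pixels.
float mipLevel(Vec2 dUVdx, Vec2 dUVdy, float texWidth, float texHeight) {
    // convert UV deltas into texel deltas
    float lenX = std::hypot(dUVdx.x * texWidth, dUVdx.y * texHeight);
    float lenY = std::hypot(dUVdy.x * texWidth, dUVdy.y * texHeight);
    float rho = std::max(lenX, lenY);
    // one texel per pixel (or less) -> level 0; two texels per pixel -> level 1; ...
    return std::max(0.0f, std::log2(rho));
}
```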

OpenGL Terrain System, small height difference between GPU and CPU

A quick summary:
I have a simple quadtree-based terrain rendering system that builds terrain patches, which then sample a heightmap in the vertex shader to determine the height of each vertex.
The exact same calculation is done on the CPU for object placement and the like.
Super straightforward, but after adding some systems to procedurally place objects, I've discovered that they seem to be misplaced by a small amount. To debug this, I render a few crosses as single models over the terrain. The crosses (red, green, blue lines) represent the height read on the CPU, while the terrain mesh uses a shader to translate the vertices.
(I've also added a simple odd/even gap over each height value to rule out a simple offset issue. So those ugly cliffs are expected, the submerged crosses are the issue)
I'm explicitly using GL_NEAREST to be able to display the "raw" height value:
As you can see the crosses are sometimes submerged under the terrain instead of representing its exact height.
The heightmap is just a simple array of floats on the CPU and on the GPU.
How the data is stored
A simple vector<float> which is uploaded into a GL_RGB32F GL_FLOAT buffer. The floats are not normalized and my terrain usually contains values between -100 and 500.
How is the data accessed in the shader
I've tried a few things to rule out errors. The initial version:
vec2 terrain_heightmap_uv(vec2 position, Heightmap heightmap)
{
return (position + heightmap.world_offset) / heightmap.size;
}
float terrain_read_height(vec2 position, Heightmap heightmap)
{
return textureLod(heightmap.heightmap, terrain_heightmap_uv(position, heightmap), 0).r;
}
Basics of the vertex shader (the full shader code is very long, so I've extracted the part that actually reads the height):
void main()
{
vec4 world_position = a_model * vec4(a_position, 1.0);
vec4 final_position = world_position;
// snap vertex to grid
final_position.x = floor(world_position.x / a_quad_grid) * a_quad_grid;
final_position.z = floor(world_position.z / a_quad_grid) * a_quad_grid;
final_position.y = terrain_read_height(final_position.xz, heightmap);
gl_Position = projection * view * final_position;
}
To rule out the slightly different way the position is determined, I tested it using hard-coded values that are identical to how the C++ code reads the height:
return texelFetch(heightmap.heightmap, ivec2((position / 8) + vec2(1024, 1024)), 0).r;
Which gives the exact same result...
How is the data accessed in the application
In C++ the height is read like this:
inline float get_local_height_safe(uint32_t x, uint32_t y)
{
// this macro simply clips x and y to the heightmap bounds
// it does not interfere with the result
BB_TERRAIN_HEIGHTMAP_BOUND_XY_TO_SAFE;
uint32_t i = (y * _size1d) + x;
return buffer->data[i];
}
inline float get_height_raw(glm::vec2 position)
{
position = position + world_offset;
uint32_t x = static_cast<int>(position.x);
uint32_t y = static_cast<int>(position.y);
return get_local_height_safe(x, y);
}
float BB::Terrain::get_height(const glm::vec3 position)
{
return heightmap->get_height_raw({position.x / heightmap_unit_scale, position.z / heightmap_unit_scale});
}
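To compare the two paths directly, it can help to compute the texel index in exactly one place and call it from both the placement code and a debug check. The sketch below is hypothetical (texelCoord is not part of the question's code). It follows get_height_raw's order: divide by the unit scale, then add the offset. One detail worth checking: static_cast<int> truncates toward zero, whereas floor() (used in the vertex shader's grid snapping) rounds toward negative infinity, so the two agree only for non-negative inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical shared helper: world coordinate -> heightmap texel index.
// unitScale and worldOffset stand in for heightmap_unit_scale and world_offset.
// floor() makes negative inputs round the same way as GLSL floor().
std::int32_t texelCoord(float world, float unitScale, float worldOffset) {
    return static_cast<std::int32_t>(std::floor(world / unitScale + worldOffset));
}

// The same lookup with truncation, as static_cast<int> does in get_height_raw.
std::int32_t texelCoordTruncated(float world, float unitScale, float worldOffset) {
    return static_cast<std::int32_t>(world / unitScale + worldOffset);
}
```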
What have I tried:
Comparing the Buffers
I've dumped the first few hundred values from the vector and compared them with the floating-point buffer uploaded to the GPU, using Nvidia Nsight. They are equal, so there are no rounding/precision errors there.
Sampling method
I've tried texture, textureLod and texelFetch to rule out some issue there, they all give me the same result.
Rounding
The super strange thing: when I round all the height values, they are perfectly aligned, which just screams floating-point precision issues.
Position snapping
I've tried rounding, flooring and ceiling the position to ensure it always maps to the same texel. I also tried adding an epsilon offset to rule out a positional precision error (probably stupid, because the terrain is stable...).
Heightmap sizes
I've tried various heightmaps, also of different sizes.
Heightmap patterns
I've created a heightmap containing a pattern to ensure the position is not just offset.

Pass array of floats to fragment shader via texture

I am trying to pass an array of floats (in my case an audio wave) to a fragment shader via texture. It works but I get some imperfections as if the value read from the 1px height texture wasn't reliable.
This happens with many combinations of bar widths and amounts.
I get the value from the texture with:
precision mediump float;
...
uniform sampler2D uDisp;
...
void main(){
...
float columnWidth = availableWidth / barsCount;
float barIndex = floor((coord.x-paddingH)/columnWidth);
float textureX = min( 1.0, (barIndex+1.0)/barsCount );
float barValue = texture2D(uDisp, vec2(textureX, 0.0)).r;
...
If I use something else instead of the value from the texture, the issue doesn't seem to be there:
barValue = barIndex*0.1;
Any idea what could be the issue? Is using a texture for this purpose a bad idea?
I am using Pixi.JS as my WebGL framework, so I don't have access to low-level APIs.
With a gradient texture for the data and many bars, the problem becomes pretty evident.
Update: It looks like the issue relates to the consistency of the value of textureX.
Trying different formulas like barIndex/(barsCount-1.0) results in less noise. Wrapping it in a min definitely adds more noise.
It turned out the issue wasn't in reading the values from the texture, but in the drawing. Instead of using ifs, I switched to step and the problem went away.
vec2 topLeft = vec2(
paddingH + (barIndex*columnWidth) + ((columnWidth-barWidthInPixels)*0.5),
top
);
vec2 bottomRight = vec2(
topLeft.x + barWidthInPixels,
bottom
);
vec2 tl = step(topLeft, coord);
vec2 br = 1.0-step(bottomRight, coord);
float blend = tl.x * tl.y * br.x * br.y;
I guess comparisons of floats through IFs are not very reliable in shaders.
Generally, mediump is insufficient for texture coordinates for any non-trivial texture, so use highp where possible. highp isn't always available in fragment shaders on older GPUs, so depending on the platform this may not solve your problem.
If you know you are doing a 1:1 mapping, also use GL_NEAREST rather than GL_LINEAR, as the quantization effect is more likely to hide some of the precision side effects.
Given you probably know the number of columns and bars you can probably pre-compute some of the values on the CPU (e.g. precompute 1/columns and pass that as a uniform) at fp32 precision. Passing in small values between 0 and 1 is always much better at preserving floating point accuracy, rather than passing in big values and then dividing out.
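Here is a small sketch of the 1:1 mapping idea. It assumes a simplified model of GL_NEAREST in which texel i covers [i/N, (i+1)/N). Sampling at the texel centre (i + 0.5)/N leaves half a texel of margin on either side. A coordinate such as (barIndex + 1.0)/barsCount from the question, by contrast, lies exactly on the boundary between two texels, so tiny rounding differences (especially at mediump) can flip which one is read. nearestTexel and texelCenter are illustrative names, and invN stands for the precomputed 1/N uniform suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Texel chosen by (simplified) nearest-neighbour sampling with clamping
// for coordinate u in an n-texel-wide texture.
int nearestTexel(float u, int n) {
    int i = static_cast<int>(std::floor(u * n));
    return std::min(std::max(i, 0), n - 1);
}

// Texture coordinate of the centre of texel i; invN = 1/N computed once on the CPU.
float texelCenter(int i, float invN) {
    return (i + 0.5f) * invN;
}
```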

DirectX11: Height based texture blending

I currently have 3 textures being blended by slope in my terrain project. I do this by sampling each texture, determining the slope amount and setting the texture colour based on a lerp between two textures. This is the relevant snippet from my pixel shader:
static const float TEX_LOW_BOUND = 0.4f;
static const float TEX_HIGH_BOUND = 0.7f;
...
float4 texColour;
float4 lowColour = lowerTex.Sample(SWrap, pin.Tex);
float4 midColour = middleTex.Sample(SWrap, pin.Tex);
float4 hiColour = upperTex.Sample(SWrap, pin.Tex);
float slope = 1.0f - pin.Normal.y;
if (slope < TEX_LOW_BOUND)
{
texColour = lerp(lowColour, midColour, slope / TEX_LOW_BOUND);
}
else if (slope >= TEX_LOW_BOUND && slope < TEX_HIGH_BOUND)
{
texColour = lerp(midColour, hiColour, (slope - TEX_LOW_BOUND) * (1.0f / (TEX_HIGH_BOUND - TEX_LOW_BOUND)));
}
else if (slope >= TEX_HIGH_BOUND)
{
texColour = hiColour;
}
I want to add a final snow texture, to apply above a certain height. I get the height value in my vertex shader by using:
vout.WHeight = mul(vin.Pos, worldMatrix).y;
I can then just set the texture colour to the snow above a certain height using this in my pixel shader:
if (pin.WHeight > 35.0f)
{
texColour = snowTex.Sample(SWrap, pin.Tex);
}
Which produces the following:
How can I blend the edge of the snow with the other textures so that the edge isn't so harsh? Bear in mind that the other textures may already have been lerped, and I'd like to maintain the texture colour.
Thank you for your time
You can do basically the same thing you did when adding in the color for the snow caps, but what you need here is a ranged input to determine whether a pixel is close to the edge. There are several approaches to this. One method is to blend the pixel values with color addition or subtraction and then normalize the result back into the valid color range. Another is to apply multiple-texture blending. Given your condition above, if (pin.WHeight > 35.0f), we know that 35.0f is the height threshold above which you start applying the snow texture. Depending on your desired results, your range-based input might be something like: if ( height > 34.8f && height < 35.2f ) { apply texture blending or color blending; }.
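The ranged-input idea can also be written as a blend weight instead of an if. The weight is 0 below the band, 1 above it and linear in between. The already slope-blended colour is then lerped toward snow rather than replaced, which keeps the existing texture colour at the lower edge. The band 34.8 to 35.2 comes from the example above, and a wider band gives a softer edge. Below is a C++ sketch in which Colour and the function names are illustrative. In the pixel shader, the same lines would use saturate and lerp on pin.WHeight.

```cpp
#include <algorithm>
#include <cassert>

struct Colour { float r, g, b, a; };

// Linear interpolation, equivalent to HLSL lerp().
Colour lerp(const Colour& x, const Colour& y, float t) {
    return {x.r + (y.r - x.r) * t, x.g + (y.g - x.g) * t,
            x.b + (y.b - x.b) * t, x.a + (y.a - x.a) * t};
}

// 0 below snowStart, 1 above snowFull, linear in between
// (the same as HLSL saturate((h - start) / (full - start))).
float snowWeight(float height, float snowStart, float snowFull) {
    float t = (height - snowStart) / (snowFull - snowStart);
    return std::min(std::max(t, 0.0f), 1.0f);
}

// Blend the slope-based colour towards snow instead of overwriting it.
Colour applySnow(const Colour& texColour, const Colour& snowColour, float height) {
    return lerp(texColour, snowColour, snowWeight(height, 34.8f, 35.2f));
}
```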
A third method would be to fade in a transparent (alpha) layer over the top of the original layer, using the same ranged input to produce the desired output.
The only issue with this type of approach is that it may not look as realistic as you would like, because all the snow caps start at exactly the same height, creating an unrealistic perimeter.
A suggestion that stays close to your original approach may work out better. When applying the texture or color to your snow caps, you could use a nondeterministic algorithm that randomly selects heights within a minimum range at which to start texture blending; anything above a specific height beyond that would then smooth out to pure white. This way each mountain top would have a white cap, but the lower bounds would not all be at the same height.
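Below is a C++ sketch of the varying-snow-line suggestion. hash01 is a hypothetical cheap hash standing in for "random". It is deterministic, so the pattern stays fixed from frame to frame instead of flickering. The start height is jittered per ground cell within jitterRange, and the cap fades to full snow over fadeWidth above that. A per-cell hash gives a blocky pattern. A smooth noise texture sampled in the shader would probably look better and could replace hash01 directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical integer hash: grid cell (x, z) -> pseudo-random value in [0, 1).
float hash01(std::int32_t x, std::int32_t z) {
    std::uint32_t h = (static_cast<std::uint32_t>(x) * 73856093u) ^
                      (static_cast<std::uint32_t>(z) * 19349663u);
    h ^= h >> 13;
    h *= 0x5bd1e995u;
    h ^= h >> 15;
    return static_cast<float>(h & 0xFFFFFFu) / 16777216.0f;  // keep 24 bits -> [0, 1)
}

// Snow weight whose start height varies per cell within [baseHeight, baseHeight + jitterRange].
float jitteredSnowWeight(float worldX, float worldZ, float height,
                         float baseHeight, float jitterRange, float fadeWidth) {
    std::int32_t cx = static_cast<std::int32_t>(std::floor(worldX));
    std::int32_t cz = static_cast<std::int32_t>(std::floor(worldZ));
    float start = baseHeight + hash01(cx, cz) * jitterRange;
    float t = (height - start) / fadeWidth;
    return std::fmin(std::fmax(t, 0.0f), 1.0f);
}
```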

Generating a 3DLUT (.3dl file) for sRGB to CIELAB colorspace transformation

We already have a highly optimized class in our API to read 3D LUT (Nuke format) files and apply the transform to an image. So instead of iterating pixel by pixel and converting RGB values to Lab values (RGB->XYZ->Lab) using the complex formulae, I think it would be better to generate a lookup table for the RGB to Lab (or XYZ to Lab) transform. Is this possible?
I understand how a 3D LUT works for RGB to RGB transformations, but I am confused about RGB to Lab, as L, a and b have different ranges. Any hints?
EDIT:
Can you please explain to me how the LUT will work?
Here's one explanation: link
E.g., below is my understanding of a 3D LUT for an RGB->RGB transform.
A sample Nuke .3dl LUT file:
0 64 128 192 256 320 384 448 512 576 640 704 768 832 896 960 1023
R, G, B
0, 0, 0
0, 0, 64
0, 0, 128
0, 0, 192
0, 0, 256
.
.
.
0, 64, 0
0, 64, 64
0, 64, 128
.
.
Here, instead of generating a 1024*1024*1024 table for the source 10-bit RGB values, each of the R, G and B ranges is quantized to 17 values, generating a 4913-row table (17^3 = 4913).
The first line gives the possible quantized values (I think only the length and the max value matter here). Now suppose the source RGB value is (20, 20, 190): the output would be line #4, (0, 0, 192) (using some interpolation technique). Is that correct?
This one is for a 10-bit source; could you generate a similar one for 8-bit by changing the range to 0 to 255?
Similarly, how would you proceed for the sRGB->Lab conversion?
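The lookup described in the question can be sketched in C++ as follows. The sketch assumes B varies fastest in the file (as in the sample above) and uses the header line as the lattice positions. Each input component is located between two grid values. Nearest-neighbour lookup picks the closer one, while trilinear interpolation blends the eight surrounding entries. Under these assumptions, (20, 20, 190) falls between grid indices 0/1, 0/1 and 2/3. The nearest entry is zero-based row 3, i.e. the fourth data line (0, 0, 192), whereas interpolation returns a blend rather than exactly that line. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position of an input value on one LUT axis: it lies between grid[i] and grid[i + 1],
// at fraction t in [0, 1].
struct AxisPos { std::size_t i; double t; };

AxisPos locate(int value, const std::vector<int>& grid) {
    // last grid entry <= value (grid is sorted, e.g. 0, 64, ..., 960, 1023)
    std::size_t i = 0;
    while (i + 2 < grid.size() && grid[i + 1] <= value) ++i;
    double t = static_cast<double>(value - grid[i]) / (grid[i + 1] - grid[i]);
    return {i, t};
}

// Zero-based data row for lattice indices when B varies fastest:
// (0,0,0), (0,0,64), (0,0,128), ...
std::size_t rowIndex(std::size_t ri, std::size_t gi, std::size_t bi, std::size_t n) {
    return (ri * n + gi) * n + bi;
}

// Nearest lattice index along one axis.
std::size_t nearest(const AxisPos& p) { return p.t < 0.5 ? p.i : p.i + 1; }

// Trilinear interpolation of one output channel; lut holds n*n*n values in file order.
double trilinear(const std::vector<double>& lut, std::size_t n,
                 AxisPos r, AxisPos g, AxisPos b) {
    double result = 0.0;
    for (int dr = 0; dr < 2; ++dr)
        for (int dg = 0; dg < 2; ++dg)
            for (int db = 0; db < 2; ++db) {
                // weight of this corner of the surrounding cube
                double w = (dr ? r.t : 1.0 - r.t) * (dg ? g.t : 1.0 - g.t) * (db ? b.t : 1.0 - b.t);
                result += w * lut[rowIndex(r.i + dr, g.i + dg, b.i + db, n)];
            }
    return result;
}
```

For an 8-bit source, the same code works with a header whose values span 0..255.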
An alternative approach makes use of graphics hardware, aka "general-purpose GPU computing". There are several tools for this, e.g. OpenGL GLSL, OpenCL, CUDA, ... You can potentially gain a speedup of 100x or more compared to a CPU solution.
The most "compatible" solution is to use OpenGL with a special fragment shader that performs the computation. This means: upload your input image as a texture to the GPU, and render it into a (target) framebuffer with a special shader program that converts your RGB data to Lab (it could also use a lookup table, but on the GPU most float computations are faster than table/texture lookups, so we won't do that here).
First, port your RGB to Lab conversion function to GLSL. It should work on float numbers, so if you used integral values in your original conversion, get rid of them. OpenGL uses normalized ("clamped") values, i.e. float values between 0.0 and 1.0. It will look like this:
vec3 rgbToLab(vec3 rgb) {
vec3 lab = ...;
return lab;
}
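For reference, rgbToLab has to perform the standard conversion chain: sRGB decoding, linear RGB to XYZ with the D65 matrix, then XYZ to Lab. It is sketched here in C++ rather than GLSL. The final normalization step is an assumption of this sketch, not part of the answer. L lies in [0, 100] and a, b roughly in [-128, 127], so the values have to be rescaled to [0, 1] before being written to a normalized framebuffer (or stored in a LUT), and rescaled back when read.

```cpp
#include <cassert>
#include <cmath>

struct Lab { double L, a, b; };

// sRGB transfer function: encoded [0,1] -> linear [0,1]
double srgbToLinear(double c) {
    return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// CIELAB helper function f(t)
double labF(double t) {
    const double delta = 6.0 / 29.0;
    return t > delta * delta * delta ? std::cbrt(t) : t / (3.0 * delta * delta) + 4.0 / 29.0;
}

// sRGB (components in [0,1]) -> CIELAB, D65 reference white
Lab srgbToLab(double r, double g, double b) {
    double R = srgbToLinear(r), G = srgbToLinear(g), B = srgbToLinear(b);
    // linear sRGB -> CIE XYZ (D65)
    double X = 0.4124564 * R + 0.3575761 * G + 0.1804375 * B;
    double Y = 0.2126729 * R + 0.7151522 * G + 0.0721750 * B;
    double Z = 0.0193339 * R + 0.1191920 * G + 0.9503041 * B;
    // normalize by the D65 white point
    double fx = labF(X / 0.95047), fy = labF(Y / 1.0), fz = labF(Z / 1.08883);
    return {116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)};
}

// Assumed convention for storing Lab in [0,1]: L / 100, and a, b shifted by 128
// and divided by 255. Invert this when reading the values back.
Lab normalizeForStorage(const Lab& lab) {
    return {lab.L / 100.0, (lab.a + 128.0) / 255.0, (lab.b + 128.0) / 255.0};
}
```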
Then write the rest of the shader, which fetches a pixel of the (RGB) texture, calls the conversion function and writes the pixel to the color output variable (don't forget the alpha channel):
uniform sampler2D texture;
varying vec2 texCoord;
void main() {
vec3 rgb = texture2D(texture, texCoord).rgb;
vec3 lab = rgbToLab(rgb);
gl_FragColor = vec4(lab, 1.0);
}
The corresponding vertex shader should write texCoord values of (0,0) in the bottom left and (1,1) in the top right of a target quad filling the whole screen (framebuffer).
Finally, use this shader program in your application by rendering into a framebuffer of the same size as your image. Render a quad that fills the whole region (without setting any transformations, just render a quad from the 2D vertices (-1,-1) to (1,1)). Set the uniform texture to your RGB image, which you uploaded as a texture. Then read back the framebuffer from the device; it should contain your image in Lab color space.
Assuming your source colorspace is a triplet of bytes (RGB, 8 bits each) and both color spaces are stored in structs with the names SourceColor and TargetColor respectively, and you have a conversion function given like this:
TargetColor convert(SourceColor color) {
return ...
}
Then you can create a table like this:
TargetColor table[256][256][256]; // 16M * sizeof(TargetColor) => put on heap!
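for (int r = 0; r < 256; ++r)
for (int g = 0; g < 256; ++g)
for (int b = 0; b < 256; ++b)
// explicit casts: braced initialization from int to an 8-bit member would be a narrowing error
table[r][g][b] = convert({static_cast<uint8_t>(r), static_cast<uint8_t>(g), static_cast<uint8_t>(b)}); // (construct SourceColor from r,g,b)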
Then, for the actual image conversion, use an alternative convert function (I'd suggest that you write an image conversion class which takes a function pointer / std::function in its constructor, so it's easily exchangeable):
TargetColor convertUsingTable(SourceColor source) {
return table[source.r][source.g][source.b];
}
Note that the space consumption is 16M * sizeof(TargetColor) (assuming 32 bits per Lab color, this will be 64 MB), so the table should be heap-allocated (it can be stored in-class if your class is going to live on the heap, but it is better to allocate it with new[] in the constructor and store it in a smart pointer).
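Below is a sketch of the class suggested above. It assumes SourceColor is three bytes and TargetColor is whatever your conversion returns. ConversionTable is an illustrative name. The table lives on the heap behind a std::unique_ptr<TargetColor[]>, and the conversion is passed in as a std::function, so the per-pixel code only does one indexed read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>

struct SourceColor { std::uint8_t r, g, b; };  // assumed layout: one byte per channel

// Precomputed 256x256x256 lookup table, heap-allocated and owned by a smart pointer.
template <typename TargetColor>
class ConversionTable {
public:
    explicit ConversionTable(const std::function<TargetColor(SourceColor)>& convert)
        : table_(new TargetColor[kEntries]) {
        for (int r = 0; r < 256; ++r)
            for (int g = 0; g < 256; ++g)
                for (int b = 0; b < 256; ++b)
                    table_[index(r, g, b)] = convert(SourceColor{static_cast<std::uint8_t>(r),
                                                                 static_cast<std::uint8_t>(g),
                                                                 static_cast<std::uint8_t>(b)});
    }

    // Replacement for calling convert() per pixel: a single table read.
    TargetColor operator()(SourceColor c) const { return table_[index(c.r, c.g, c.b)]; }

private:
    static constexpr std::size_t kEntries = 256u * 256u * 256u;
    static std::size_t index(int r, int g, int b) {
        return (static_cast<std::size_t>(r) << 16) | (static_cast<std::size_t>(g) << 8) |
               static_cast<std::size_t>(b);
    }
    std::unique_ptr<TargetColor[]> table_;
};
```

Usage: build the table once, e.g. ConversionTable<TargetColor> lut(convert);, then call lut(pixel) in the image loop. Note that building the table costs 16M conversions up front, so it pays off only when many pixels are converted.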