Error when normalizing a vector in HLSL

I have completely re-written this first post to better show what the problem is.
I am using ps v1.4 (highest version I have support for) and keep getting an error.
It happens any time I use a function such as cos, dot, distance, sqrt, normalize, etc. on something that was passed into the pixel shader.
For example, I need to do "normalize(LightPosition - PixelPosition)" to use a point light in my pixel shader, but normalize gives me an error.
Some things to note:
I can use things like pow, abs, and radians with no error.
There is only an error if it is done on something passed from the vertex shader. (For example, I could take the sqrt of a local pixel shader variable with no error.)
I get the error from doing a function on ANY variable passed in, even texture coords, color, etc.
Inside the vertex shader I can do all of these functions on any variables passed in with no errors; it's only in the pixel shader that I get an error.
All the values passing from the vertex shader to the pixel shader are correct, because if I use software processing rather than hardware I get no error and a perfectly lit scene.
Since normalizing the vector is essentially where my error comes from, I tried creating my own normalizing function.
I call Norm(LightPosition - PixelPosition) and "Norm" looks like this -
float3 Norm(float3 v)
{
return v / sqrt(dot(v, v));
}
I still get the error, because I guess technically I'm still trying to take a sqrt inside the pixel shader.
The error isn't anything specific; it just says "error in application" on the line where I load my .fx file in C#.
I'm thinking it could actually be a compile error, because I have to use such old versions (vs 1.1 and ps 1.4).
When debugged using fxc.exe it tells me "can not map instruction to pixel shader instruction set"

Old GPUs didn't always support every instruction, especially in the pixel shader.
You might get away with a sqrt in the vertex shader, but for such an old version (1.1!) the fragment shader might be extremely limited.
I.e., this might not be a bug.
The workaround could be to skip HLSL and write your own shader assembly (but you might stumble onto the same problem there) and simulate the sqrt, say with a texture lookup and/or interpolation (if you can even have 2 textures in 1.0 :-p).
You can of course try to write a sqrt lookup/interpolation in HLSL, but it might be too big as well (I don't remember exactly, but IIRC 1.1 doesn't let you write very long shaders).
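Another workaround that often fits ps_1_x hardware is to do the normalize per vertex and let the interpolators carry the result into the pixel shader. A minimal sketch, with placeholder uniform and semantic names (WorldViewProj, World, LightPosition are not taken from the asker's .fx file):

float4x4 WorldViewProj;   // placeholder uniforms
float4x4 World;
float3   LightPosition;

struct VS_OUT
{
    float4 Pos      : POSITION;
    float3 Normal   : TEXCOORD0;
    float3 LightDir : TEXCOORD1;   // normalized here, in the vertex shader
};

VS_OUT VS(float4 pos : POSITION, float3 normal : NORMAL)
{
    VS_OUT o;
    o.Pos      = mul(pos, WorldViewProj);
    float3 wp  = mul(pos, World).xyz;
    o.Normal   = normalize(mul(normal, (float3x3)World));
    o.LightDir = normalize(LightPosition - wp);   // vs_1_1 handles this fine
    return o;
}

float4 PS(float3 normal : TEXCOORD0, float3 lightDir : TEXCOORD1) : COLOR
{
    // ps_1_4 can do dp3/mul/add; it just can't sqrt/rsq an interpolated value.
    // lightDir is only approximately unit length after interpolation.
    return saturate(dot(lightDir, normal));
}

The interpolated vector drifts slightly off unit length across large triangles, which is usually acceptable for diffuse lighting; the classic exact fix on this class of hardware is a normalization cube map lookup in the pixel shader.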


Side-effects of display_reset() and texture_set_interpolation()

It seems like display_reset() also resets the texture interpolation setting.
Can someone explain what happens internally when you mix texture_set_interpolation and display_reset? The docs tell me nothing, yet when I have interpolation enabled and switch on 8x AA, my sprite lines get pixely again. When I call texture_set_interpolation immediately after display_reset() it still doesn't work; I have to wait a few steps before I can enable it again and it removes the pixely lines.
This isn't optimal for my game initialisation code: it loads the user's preferences, calls display_reset(), and then calls texture_set_interpolation(), and it still ends up with pixely textures.
Note: I feel like the anti-alias changes do nothing. I have a smoke-line texture that I warp with a shader to make the smoke feel real, but the lines are very pixely; it looks the same at 0x antialias as it does at 8x antialias. It only looks smooth when I enable texture interpolation.

Using gl_SampleMask with multisample texture doesn't get per-sample blend?

I ran into a problem when using gl_SampleMask with a multisample texture.
To simplify the problem, here is an example.
I draw two triangles to a framebuffer with a 32x multisample texture attached.
The vertices of the triangles are (0,0) (100,0) (100,1) and (0,0) (0,1) (100,1).
In the fragment shader, I have code like this:
#extension GL_NV_sample_mask_override_coverage : require
layout(override_coverage) out int gl_SampleMask[];
...
out_color = vec4(1,0,0,1);
coverage_mask = gen_mask(int(gl_FragCoord.x / 100.0 * 8.0));
gl_SampleMask[0] = coverage_mask;
The function int gen_mask(int X) generates an integer with X ones in its binary representation.
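(For concreteness, a gen_mask matching that description could be as simple as the following; the clamp just keeps the shift defined when X reaches 32.)

int gen_mask(int X)
{
    // the low X bits set, e.g. X = 3 -> binary 111
    return (X >= 32) ? int(0xFFFFFFFFu) : ((1 << X) - 1);
}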
I'd expect to see 100 pixels filled with solid red.
But actually I got alpha-blended output: the pixel at (50,0) shows (1,0.25,0.25), which looks like two (1,0,0,0.5) fragments drawn onto a (1,1,1,1) background.
However, if I break the coverage_mask apart per sample instead, checking gl_SampleID in the fragment shader and writing (1,0,0,1) or (0,0,0,0) to the output color according to coverage_mask's bit for that gl_SampleID,
if ((coverage_mask >> gl_SampleID) & (1 == 1) ) {
out_color = vec4(1,0,0,1);
} else {
out_color = vec4(0,0,0,0);
}
I get 100 red pixels as expected.
I've checked the OpenGL wiki and documentation but didn't find why the behavior changes here.
I'm using an NVIDIA GTX 980 with driver version 361.43 on Windows 10.
I can put the test code on GitHub later if necessary.
When the texture has 32 samples, NVIDIA's implementation splits one pixel into four small fragments, each with 8 samples. So in each fragment shader invocation only an 8-bit gl_SampleMask is available.
OK, let's assume that's true. How do you suppose NVIDIA implements this?
Well, the OpenGL specification does not allow them to implement this by changing the effective size of gl_SampleMask. It makes it very clear that the size of the sample mask must be large enough to hold the maximum number of samples supported by the implementation. So if GL_MAX_SAMPLES returns 32, then gl_SampleMask must have 32 bits of storage.
So how would they implement it? Well, there's one simple way: the coverage mask. They give each of the 4 fragments a separate 8 bits of coverage mask that they write their outputs to. Which would work perfectly fine...
Until you overrode the coverage mask with override_coverage. This now means all 4 fragment shader invocations can write to the same samples as other FS invocations.
Oops.
I haven't directly tested NVIDIA's implementation to be certain of that, but it is very much consistent with the results you get. Each FS instance in your code will write to, at most, 8 samples. The same 8 samples. 8/32 is 0.25, which is exactly what you get: 0.25 of the color you wrote. Even though 4 FS's may be writing for the same pixel, each one is writing to the same 25% of the coverage mask.
There's no "alpha-blended output"; it's just doing what you asked.
As to why your second code works... well, you fell victim to one of the classic C/C++ (and therefore GLSL) blunders: operator precedence. Allow me to parenthesize your condition to show you what the compiler thinks you wrote:
((coverage_mask >> gl_SampleID) & (1 == 1))
Equality testing has a higher precedence than any bitwise operation. So it gets grouped like this. Now, a conformant GLSL implementation should have failed to compile because of that, since the result of 1 == 1 is a boolean, which cannot be used in a bitwise & operation.
Of course, NVIDIA has always had a tendency to play fast-and-loose with GLSL, so it doesn't surprise me that they allow this nonsense code to compile. Much like C++. I have no idea what this code would actually do; it depends on how a true boolean value gets transformed into an integer. And GLSL doesn't define such an implicit conversion, so it's up to NVIDIA to decide what that means.
The traditional condition for testing a bit is this:
(coverage_mask & (0x1 << gl_SampleID))
It also avoids undefined behavior if coverage_mask isn't an unsigned integer.
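Dropped into your second test, the corrected condition would look like this:

if ((coverage_mask & (1 << gl_SampleID)) != 0) {
    out_color = vec4(1.0, 0.0, 0.0, 1.0);
} else {
    out_color = vec4(0.0, 0.0, 0.0, 0.0);
}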
Of course, doing the condition correctly should give you... the exact same answer as the first one.

Load mesh file with TetGen in C++

I want to load a mesh file using TetGen library in C++ but I don't know the right procedure or what switches to activate in my code in order to show the Constrained Delaunay mesh.
I tried a basic load of a dinosaur mesh (from rocq.inria.fr) with the default behavior:
tetgenio in, out;
in.firstnumber = 0;
in.load_medit("TetGen\\parasaur1_cut.mesh",0);
tetgenbehavior *b = new tetgenbehavior();
tetrahedralize(b, &in, &out);
The shape is supposed to be like this:
When using TetView it works perfectly. But with my code I got the following result:
I tried activating the Piecewise Linear Complex (plc) flag to get the constrained Delaunay mesh:
b->plc = 1;
and I got just a few parts from the mesh:
Maybe there are more parts but I don't know how to get them.
That looks a lot like you might be loading a quad mesh as a triangle mesh or vice versa. One thing is clear: you are getting the floats from the file, since the boundaries of the object look roughly correct. Make certain you are loading a strictly triangle- or quad-based mesh. If it is a format you can load into Blender, I'd recommend loading it, triangulating it, and re-exporting it, just in case a poly snuck in there.
Another possibility is an off-by-one indexing error. Are you sure you are getting each triangle/quad in the correct order? Which is to say: make sure you are loading triangles 123 123 123 and NOT 1 231 231 231.
One other possibility: if this format indexes all of the vertices and then lists the indices of each face, you might be loading all of the vertices correctly and then getting the indices of the triangles/quads messed up, as described in the previous two paragraphs. I'm thinking this is the case, since it looks like all of your points are correct, but the lines connecting them are way off.
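One quick sanity check along those lines, assuming you read the tetrahedra back from the tetgenio output (field names as in tetgen.h): subtract out.firstnumber before using an index, or every point reference shifts by one.

for (int t = 0; t < out.numberoftetrahedra; ++t) {
    for (int c = 0; c < out.numberofcorners; ++c) {
        // honor firstnumber so 1-based files don't shift every index
        int idx = out.tetrahedronlist[t * out.numberofcorners + c] - out.firstnumber;
        const REAL *p = &out.pointlist[idx * 3];   // x, y, z of this corner
        // ... hand p[0], p[1], p[2] to your renderer
    }
}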

How do images work in an OpenCL kernel?

I'm trying to find ways to copy multidimensional arrays from host to device in OpenCL, and thought an approach was to use an image, which can be a 1-, 2-, or 3-dimensional object. However, I'm confused, because when reading a pixel from an image the examples use vector datatypes. Normally I would think double pointer, but it doesn't sound like that is what is meant by vector datatypes. Anyway, here are my questions:
1) What is actually meant by a vector datatype, and why wouldn't we just specify 2 or 3 indices when denoting pixel coordinates? It looks like a single value such as float2 is being used to denote coordinates, but that makes no sense to me. I'm looking at the functions read_imageui and read_image.
2) Can the input image just be a subset of the entire image, and the sampler be a subset of the input image? I don't understand how the coordinates are actually specified here either, since read_image() only seems to take a single value for the input and a single value for the sampler.
3) If doing linear algebra, should I just bite the bullet and translate 1-D array data from the buffer into multi-dim arrays in opencl?
4) I'm still interested in images, so even if what I want to do is not best for images, could you still explain questions 1 and 2?
Thanks!
EDIT
I wanted to refine my question and ask: in the following Khronos documentation they define...
int4 read_imagei (
image2d_t image,
sampler_t sampler,
int2 coord)
But nowhere can I find what image2d_t's definition or structure is supposed to be. The same thing goes for sampler_t and int2 coord. They seem like structs to me, or pointers to structs, since OpenCL is supposed to be based on ANSI C, but what are the fields of these structs, and how do I write the coord with what looks like a scalar?! I've seen the notation (int2)(x,y), but that's not ANSI C, that looks like Scala, haha. Things seem conflicting to me. Thanks again!
In general you can read from images in three different ways:
direct pixel access, no sampling
sampling, normalized coordinates
sampling, integer coordinates
The first one is what you want, that is, you pass integer pixel coordinates like (10, 43) and it will return the contents of the image at that point, with no filtering whatsoever, as if it were a memory buffer. You can use the read_image*() family of functions which take no sampler_t param.
The second one is what most people want from images, you specify normalized image coords between 0 and 1, and the return value is the interpolated image color at the specified point (so if your coordinates specify a point in between pixels, the color is interpolated based on surrounding pixel colors). The interpolation, and the way out-of-bounds coordinates are handled, are defined by the configuration of the sampler_t parameter you pass to the function.
The third one is the same as the second one, except the texture coordinates are not normalized, and the sampler needs to be configured accordingly. In some sense the third way is closer to the first, and the only additional feature it provides is the ability to handle out-of-bounds pixel coordinates (for instance, by wrapping or clamping them) instead of you doing it manually.
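A rough kernel sketch of the three styles side by side (the names are mine, the image is assumed to have float channels, and the sampler-less read requires an OpenCL version that supports it):

__constant sampler_t smp_norm   = CLK_NORMALIZED_COORDS_TRUE  | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
__constant sampler_t smp_unnorm = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void demo(__read_only image2d_t src, __write_only image2d_t dst)
{
    int2 pix = (int2)(get_global_id(0), get_global_id(1));

    // 1) direct pixel access, no sampler
    float4 a = read_imagef(src, pix);

    // 2) sampled, normalized coordinates in [0, 1]
    float2 uv = (convert_float2(pix) + 0.5f) / convert_float2(get_image_dim(src));
    float4 b = read_imagef(src, smp_norm, uv);

    // 3) sampled, unnormalized (pixel) coordinates
    float4 c = read_imagef(src, smp_unnorm, convert_float2(pix));

    // writes always take integer pixel coordinates
    write_imagef(dst, pix, (a + b + c) / 3.0f);
}

The +0.5 centers the normalized coordinate on the pixel, which matters once CLK_FILTER_LINEAR is involved.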
Finally, the different versions of each function, e.g. read_imagef, read_imagei, read_imageui are to be used depending on the pixel format of your image. If it contains floats (in each channel), use read_imagef, if it contains signed integers (in each channel), use read_imagei, etc...
Writing to an image on the other hand is straightforward, there are write_image{f,i,ui}() functions that take an image object, integer pixel coordinates and a pixel color, all very easy.
Note that you cannot read and write to the same image in the same kernel! (I don't know if recent OpenCL versions have changed that). In general I would recommend using a buffer if you are not going to be using images as actual images (i.e. input textures that you sample or output textures that you write to only once at the end of your kernel).
About the image2d_t, sampler_t types, they are OpenCL "pseudo-objects" that you can pass into a kernel from C (they are reserved types). You send your image or your sampler from the C side into clSetKernelArg, and the kernel gets back a sampler_t or an image2d_t in the kernel's parameter list (just like you pass in a buffer object and it gets a pointer). The objects themselves cannot be meaningfully manipulated inside the kernel, they are just handles that you can send into the read_image/write_image functions, along with a few others.
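On the host side that binding is just a couple of calls. A fragment sketch (ctx, kernel, width, height, host_pixels and the error checks are assumed to exist elsewhere):

cl_image_format fmt = { CL_RGBA, CL_FLOAT };
cl_mem img = clCreateImage2D(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                             &fmt, width, height, 0, host_pixels, &err);
cl_sampler smp = clCreateSampler(ctx, CL_FALSE /* unnormalized coords */,
                                 CL_ADDRESS_CLAMP_TO_EDGE, CL_FILTER_NEAREST, &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &img);
clSetKernelArg(kernel, 1, sizeof(cl_sampler), &smp);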
As for the "actual" low-level difference between images and buffers, GPU's often have specially reserved texture memory that is highly optimized for "read often, write once" access patterns, with special texture sampling hardware and texture caches to optimize scatter reads, mipmaps, etc..
On the CPU there is probably no underlying difference between an image and a buffer, and your runtime likely implements both as memory arrays while enforcing image semantics.
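As for question 3, if you go the buffer route you normally flatten the multidimensional array row-major on the host and index it manually in the kernel, along these lines:

__kernel void mat_add(__global const float* A, __global const float* B,
                      __global float* C, const int width)
{
    int x = get_global_id(0);        // column
    int y = get_global_id(1);        // row
    int i = y * width + x;           // row-major flattening of a 2-D array
    C[i] = A[i] + B[i];
}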

GLSL 3D Noise Implementation on ATI Graphics Cards

I have tried so many different strategies to get a usable noise function and none of them work. So, how do you implement perlin noise on an ATI graphics card in GLSL?
Here are the methods I have tried:
I have tried putting the permutation and gradient data into a GL_RGBA 1D texture and calling the texture1D function. However, one call to this noise implementation leads to 12 texture calls and kills the framerate.
I have tried uploading the permutation and gradient data into a uniform vec4 array, but the compiler won't let me get an element in the array unless the index is a constant. For example:
int i = 10;
vec4 a = noise_data[i];
will give a compiler error of this:
ERROR: 0:43: Not supported when use temporary array indirect index.
Meaning I can only retrieve the data like this:
vec4 a = noise_data[10];
I also tried programming the array directly into the shader, but I got the same index issue. I hear NVIDIA graphics cards will actually allow this method, but ATI will not.
I tried making a function that returned a specific hard coded data point depending on the input index, but the function, being called 12 times and having 64 if statements, made the linking time unbearable.
ATI does not support the "built-in" noise functions for GLSL, and I can't just precompute the noise and import it as a texture, because I am dealing with fractals. This means I need the infinite precision of calculating the noise at run time.
So the overarching question is...
How?
For better distribution of random values I suggest these very good articles:
Pseudo Random Number Generator in GLSL
Lumina noise GLSL tutorial
Have random fun !!!
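To give a flavor of what those articles build on, here is a small texture-free hash plus value noise (not true Perlin noise, and the quality of the sin-based hash varies between drivers):

float hash(vec2 p)
{
    // classic sin-based hash; depends on the GPU's sin precision
    return fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453);
}

float vnoise(vec2 p)
{
    vec2 i = floor(p);
    vec2 f = fract(p);
    vec2 u = f * f * (3.0 - 2.0 * f);            // smoothstep fade curve
    float a = hash(i);
    float b = hash(i + vec2(1.0, 0.0));
    float c = hash(i + vec2(0.0, 1.0));
    float d = hash(i + vec2(1.0, 1.0));
    return mix(mix(a, b, u.x), mix(c, d, u.x), u.y);
}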
There is a project on GitHub with GLSL noise functions. It has both the "classic" and newer noise functions in 2, 3, and 4D.
IOS does have the noise function implemented.
noise() is well known for not being implemented...
Roll your own:
int a;     // LCG multiplier
int m;     // LCG modulus
int c;     // LCG increment, seeded from the pixel position
int Xn;    // current state

void srand(int x, int y, int width)   // x, y in pixels
{
    c = x + y * width;
    Xn = c;
}

int rand()
{
    Xn = (a * Xn + c) % m;
    return Xn;
}
For the a and m values, see Wikipedia.
It's not perfect, but often good enough.
This simplex noise stuff might do what you want.
Try adding #version 150 to the top of your shader.