In a fragment shader, the following compiles fine:
uniform isampler2D testTexture;
/* in main() x, y, xoff and yoff are declared as int and assigned here, then... */
int tmp = texelFetchOffset(testTexture, ivec2(x, y), 0, ivec2(xoff, yoff)).r;
However, the following does not compile:
uniform usampler2D testTexture;
/* in main() x, y, xoff and yoff are declared as uint and assigned here, then... */
uint tmp = texelFetchOffset(testTexture, uvec2(x, y), 0, uvec2(xoff, yoff)).r;
The OpenGL 4.2 driver gives the following compiler error message:
error C1115: unable to find compatible overloaded function "texelFetchOffset(usampler2D, uvec2, int, uvec2)
This is with Nvidia's Linux driver 290.* on a Quadro 5010M, but I'm wondering whether I made a (beginner) mistake and was somehow not working to spec here.
The texelFetchOffset function that takes a usampler2D still takes an ivec2 as its texture coordinates and offset. The u only applies to the sampler type and return value; not everything about the function becomes unsigned.
And remember: GLSL never implicitly converts an unsigned integer type to a signed one, so a uvec2 is not accepted where an ivec2 is expected.
I tried getting a uniform vec3 from the fragment shader to my CPU using glGetnUniformfv. According to the documentation this should work perfectly. It also works when only getting a float from the shader. But when used like this,
float f[3] = {0.0f};
glGetnUniformfv(program, glGetUniformLocation(program, name.c_str()), 3, f);
my program crashes. I checked the result of glGetUniformLocation, and it returned a valid location.
The third parameter to the glGetnUniform family of functions is not the number of entries in the array. It is the size in bytes of the buffer pointed to by f. Because f is an array rather than a pointer to one, that size is sizeof(f).
Now, your implementation shouldn't have crashed, so there's probably something else going on there. But this is the problem in the code you've provided.
Unless you're using a context that actually supports OpenGL 4.5+, get the vec3 using "the old way" like this:
float f[3] = {0.0f};
glGetUniformfv(program, glGetUniformLocation(program, name.c_str()), f);
The new desktop-only glGetnUniform entry points exist only for extra safety, similar to strncpy vs strcpy.
Also, if you do use the glGetn variant, pass 12 (that is, sizeof(f)) instead of 3 for bufSize, since it is a byte count.
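To make the byte-count point concrete, here is a minimal sketch of deriving bufSize from the array itself instead of hardcoding 12. The helper name bufSizeInBytes is hypothetical and not part of OpenGL. The commented call assumes a current OpenGL 4.5+ context and a valid program and location.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenGL function): returns the size in bytes
// of a fixed-size float array, which is what bufSize expects.
template <std::size_t N>
constexpr std::size_t bufSizeInBytes(const float (&)[N]) {
    return N * sizeof(float);
}

// Intended use, assuming a current OpenGL 4.5+ context:
//   float f[3] = {0.0f};
//   glGetnUniformfv(program, location,
//                   static_cast<GLsizei>(bufSizeInBytes(f)), f);
```

Taking the array by reference keeps the element count in the type, so the helper cannot be called by accident with a plain pointer, where sizeof would give the pointer size.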
If I have the following code in a GLSL fragment shader:
float r = 0.386;
float a = 26.6;
float xd = r*cos(0.0174532924*(a+0));
float yd = r*sin(0.0174532924*(a+0));
float xe = r*cos(0.0174532924*(a+90));
float ye = r*sin(0.0174532924*(a+90));
is it a sane assumption that the compiler will evaluate those trigonometric functions once, instead of having them evaluated in every fragment execution?
In this case, sadly, you can't know much, since the shader is compiled by the GPU driver. I would say it is implementation dependent, since some compilers optimize better than others.
However, as WearyWanderer said, you can hardcode the values or pass them through uniforms or a UBO.
You mentioned that you could calculate the values and assign them directly, but want to keep the expressions for documentation purposes, so I assume the values are the same in every execution of the shader code.
Uniform Variables are variables that you can calculate once, send to a shader, and are the same for every execution, unless you change the uniform variable at some point. For example:
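float r = 0.386;
float a = 26.6;
// Evaluate the expression once on the CPU.
float xd_val = r*cos(0.0174532924*(a+0));
// glGetUniformLocation returns a GLint (-1 if the uniform is not found).
GLint xd_id = glGetUniformLocation(pShaderProgram, "xd");
// The program must be current (glUseProgram) when glUniform1f is called.
glUniform1f(xd_id, xd_val);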
This calculates the value only once on the CPU and passes it to the shader program as a uniform variable. The shader has access to the value in every execution without recalculating it, and the expression stays in your code for the documentation you wanted.
Uniforms are commonly used for object-wide values, e.g. an alpha value or the scene lights for a Phong shading model.
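As a rough sketch of the CPU side for all four values from the question, something like the following could work. The struct Offsets and the function computeOffsets are made-up names for illustration. Uploading the results with glUniform1f is left out because it needs a live GL context.

```cpp
#include <cmath>

// Hypothetical container for the four values the shader needs.
struct Offsets {
    float xd, yd, xe, ye;
};

// Evaluates the same expressions as the shader code, once, on the CPU.
Offsets computeOffsets(float r, float aDegrees) {
    const float degToRad = 0.0174532924f;  // same constant as in the shader
    const float a0 = degToRad * (aDegrees + 0.0f);
    const float a90 = degToRad * (aDegrees + 90.0f);
    return {r * std::cos(a0), r * std::sin(a0),
            r * std::cos(a90), r * std::sin(a90)};
}
```

Each field would then be sent to its own uniform, or all four packed into a single vec4 uniform to save calls.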
I'm writing an OpenGL/CUDA (6.5) interop application. I get a compile time error trying to write a floating point value to an OpenGL texture through a surface reference in my CUDA kernel.
Here I give a high level description of how I set up the interop, but I am successfully reading from my texture in my CUDA kernel, so I believe this is done correctly. I have an OpenGL texture declared with
glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB32F_ARB, 512, 512, 0, GL_RGB, GL_FLOAT, NULL);
After creating the texture I call cudaGraphicsGLRegisterImage with cudaGraphicsRegisterFlagsSurfaceLoadStore set. Before running my CUDA kernel, I unbind the texture and call cudaGraphicsMapResources on the cudaGraphicsResource pointers obtained from cudaGraphicsGLRegisterImage. Then I get a cudaArray from cudaGraphicsSubResourceGetMappedArray, create an appropriate resource descriptor for the array, and call cudaCreateSurfaceObject to get a pointer to a cudaSurfaceObject_t. I then call cudaMemcpy with cudaMemcpyHostToDevice to copy the cudaSurfaceObject_t to a buffer on the device allocated by cudaMalloc.
In my CUDA kernel I can read from the surface reference with something like this, and I have verified that this works as expected.
__global__ void cudaKernel(cudaSurfaceObject_t tex) {
int x = blockIdx.x*blockDim.x + threadIdx.x;
int y = blockIdx.y*blockDim.y + threadIdx.y;
float4 sample = surf2Dread<float4>(tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
In the kernel I want to modify sample and write it back to the texture. The GPU has compute capability 5.0, so this should be possible. I am trying this
surf2Dwrite<float4>(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
But I get the error:
error: no instance of overloaded function "surf2Dwrite" matches the argument list
argument types are: (float4, cudaSurfaceObject_t, int, int, cudaSurfaceBoundaryMode)
I can see in
cuda-6.5/include/surface_functions.h
that there are only prototypes for integral versions of surf2Dwrite that accept a void * for the second argument. I do see prototypes for surf2Dwrite which accept a float4 with a templated surface object. However, I'm not sure how I could declare a templated surface object with OpenGL interop. I haven't been able to find anything else on how to do this. Any help is appreciated. Thanks.
It turns out the answer was pretty simple, though I don't know why it works. Instead of calling
surf2Dwrite<float4>(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
I needed to call
surf2Dwrite(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
To be honest, I'm not sure I fully understand CUDA's use of templates in C++. Does anyone have an explanation?
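One possible explanation, offered as a guess and not checked against the CUDA 6.5 headers: if the surface-object versions of surf2Dwrite are declared there as plain overloads (one per data type) rather than as a function template, then writing explicit template arguments cannot work, while the unqualified call is resolved by ordinary overload resolution. The plain C++ sketch below shows that language rule only. Float4, SurfaceHandle and surfWrite are stand-ins invented for the example, not CUDA declarations.

```cpp
// Stand-ins for the CUDA types; these are NOT the real CUDA declarations.
struct Float4 {
    float x, y, z, w;
};
using SurfaceHandle = unsigned long long;

// Plain (non-template) overloads, one per data type.
// Each returns the number of bytes it pretends to write.
inline int surfWrite(Float4, SurfaceHandle, int, int) { return 16; }
inline int surfWrite(float, SurfaceHandle, int, int) { return 4; }

inline int demo() {
    Float4 sample{1.0f, 2.0f, 3.0f, 4.0f};
    // surfWrite<Float4>(sample, 0, 0, 0);  // does not compile: surfWrite is not a template
    return surfWrite(sample, 0, 0, 0);      // OK: picks the Float4 overload
}
```

If the headers in a given CUDA version do declare a template, the explicit form should compile there, so the behavior may differ between toolkit versions.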
For a complete example of CUDA writing to a surface that's linked to an OpenGL texture, refer to this project:
https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st
From the CUDA Documentation, here is the definition of surface template functions:
template<class T>
T surf2Dread(cudaSurfaceObject_t surfObj,
int x, int y,
boundaryMode = cudaBoundaryModeTrap);
template<class T>
void surf2Dread(T* data,
cudaSurfaceObject_t surfObj,
int x, int y,
boundaryMode = cudaBoundaryModeTrap);
I am currently working with GLSL 330 and came across some odd behavior of the mod() function.
I'm working under Windows 8 with a Radeon HD 6470M. I cannot recreate this behavior on my desktop PC, which uses Windows 7 and a GeForce GTX 260.
Here is my test code:
float testvalf = -126;
vec2 testval = vec2(-126, -126);
float modtest1 = mod(testvalf, 63.0); //returns 63
float modtest2 = mod(testval.x, 63.0); //returns 63
float modtest3 = mod(-126, 63.0); //returns 0
Edit:
Here are some more test results, done after IceCool's suggestion below.
int y = 63;
int inttestval = -126;
ivec2 intvectest = ivec2(-126, -126);
float floattestval = -125.9;
float modtest4 = mod(inttestval, 63); //returns 63
vec2 modtest5 = mod(intvectest, 63); //returns vec2(63.0, 63.0)
float modtest6 = mod(intvectest.x, 63); //returns 63
float modtest7 = mod(floor(floattestval), 63); //returns 63
float modtest8 = mod(inttestval, y); //returns 63
float modtest9 = mod(-126, y); //returns 63
I updated my drivers and tested again, with the same results. Once again, this is not reproducible on the desktop.
According to the GLSL docs on mod the possible parameter combinations are (GenType, float) and (GenType, GenType) (no double, since we're < 4.0). Also the return type is forced to float but that shouldn't matter for this problem.
I don't know whether you did it intentionally, but -126 is an int, not a float, and the code might not be doing what you expect.
By the way, about the modulo:
Notice that two different functions are called.
The first two lines call:
float mod(float, float);
The last line calls:
int mod(int, float);
If I'm right, mod is calculated like this:
genType mod(genType x, float y){
return x - y*floor(x/y);
}
Now note that if x/y evaluates to exactly -2.0, mod returns 0, but if it evaluates to -2.00000001, then 63.0 is returned. Such a difference is possible between int/float and float/float division.
So the reason might simply be that you are mixing ints and floats.
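To see how sensitive the formula above is to the quotient, here is a small CPU-side sketch in C++ using IEEE single-precision floats. The function names are made up for the example. The second function takes the quotient as a separate argument only so that a slightly-off division result can be simulated. It does not claim to reproduce what any particular GPU does.

```cpp
#include <cmath>

// mod as defined above: x - y * floor(x / y).
float glslMod(float x, float y) {
    return x - y * std::floor(x / y);
}

// Same formula, but the quotient is passed in so that a division result
// that is off by a tiny amount can be simulated.
float glslModWithQuotient(float x, float y, float quotient) {
    return x - y * std::floor(quotient);
}
```

With an exact quotient, glslMod(-126.0f, 63.0f) gives 0. Passing std::nextafter(-2.0f, -3.0f), the closest float below -2, as the quotient makes floor return -3, and the result becomes 63.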
I think I have found the answer.
One thing I got wrong: genType in the GLSL reference pages does not mean a generic type, as in a C++ template.
GenType is shorthand for float, vec2, vec3, and vec4.
By the way, the naming scheme is:
genType - floats
genDType - doubles
genIType - ints
genBType - bools
This means that genType mod(genType, float) implies there is no function like int mod(int, float).
All the code above has been calling float mod(float, float). Thankfully, function parameters are converted implicitly, so mod(int, int) works too, but it is actually mod(float, float) that is called.
Just as a proof:
int x = mod(-126, 63);
Doesn't compile: error C7011: implicit cast from "float" to "int"
It fails only because mod returns a float, so this works:
float x = mod(-126, 63);
Therefore float mod(float, float) is called.
So we are back at the original problem:
float division is inaccurate
int to float cast is inaccurate
It shouldn't be a problem on most GPUs, as floats are considered equal if the difference between them is less than 10^-5 (this may vary with hardware, but it is the case for my GPU). So floor(-2.0000001) is -2. Highp floats are far more accurate than this.
Therefore either you are not using highp floats (precision highp float; should then fix it), or your GPU has a stricter limit for float equality, or some of the functions are returning less accurate values.
If all else fails try:
#extension BlackMagic : enable
Maybe some driver setting is forcing default float precision to be mediump.
If this happens, all the variables you define will be mediump, but literal numbers typed in the code will still be highp.
Consider this code:
precision mediump float;
float x = 0.4121551, y = 0.4121552;
x == y; // true
0.4121551 == 0.4121552; // false, as highp they still differ.
So mod(-126,63.0) could still be precise enough to return the correct value, because it works with high-precision literals. If you pass a variable instead (as in all the other cases), which is only mediump, the function does not have enough precision to calculate the correct value. Looking at your tests, this is what's happening:
All the calls that take at least one variable are not precise enough.
The only call that takes two literal numbers returns the correct value.
In my OpenCL code (which I did not write myself; it is example code from the internet), there is the following statement that calls the clamp function.
return clamp(color,0,1);
However, this seems to cause an error during compilation, so I retrieved the build log by using CL_PROGRAM_BUILD_LOG with clGetProgramBuildInfo.
Error during compilation! (-11)
4483
build log
:211:9: error: call to 'clamp' is ambiguous
return clamp(color,0,1);
^~~~~
<built-in>:3558:26: note: candidate function
float4 __OVERLOADABLE__ clamp(float4 x, float min, float max) ;
^
<built-in>:3577:25: note: candidate function
float4 __OVERLOADABLE__ clamp(float4, float4, float4);
^
<built-in>:3556:26: note: candidate function
float3 __OVERLOADABLE__ clamp(float3 x, float min, float max) ;
^
<built-in>:3575:25: note: candidate function
float3 __OVERLOADABLE__ clamp(float3, float3, float3);
^
:296:52: error: address expression must be an lvalue or a function designator
r.origin = matrixVectorMultiply(viewTransform, &(float3)(0, 0, -1));
^~~~~~~~~~~~~~~~~~
:297:62: error: address expression must be an lvalue or a function designator
r.dir = normalize(matrixVectorMultiply(viewTransform, &(float3)(x, y, 0)) - r.origin);
^~~~~~~~~~~~~~~~~
Is there a keyword that is required for using the clamp function in OpenCL code? BTW, I'm using 64-bit Ubuntu Linux 10.04.
Try the following
return clamp(color,0.0f,1.0f);
This way we know for sure that 2nd and 3rd params are not ambiguous and that you are trying to call the function:
clamp(float4 color, float min, float max);
If this doesn't work, check the type of your color parameter; the 2nd and 3rd parameters should be fine now.
There are several overloaded clamp builtin functions in OpenCL; the compiler needs to select exactly one, based on the types of the arguments. Valid combinations are
T clamp(T,T,T) and T clamp(T,S,S)
where T is one of the OpenCL integral or floating point types, and S is the scalar type of an element of T when T is a vector type.
It would appear that your sample code was illegally mixing float and integer arguments to the call. The constants 1 and 0 are of type int, unlike 0.0f and 1.0f which are of type float.
See the quick reference card for more details.
I am getting the same problems with the same piece of code (http://www.gamedev.net/blog/1241/entry-2254210-realtime-raytracing-with-opencl-ii/). It is poorly written and managed to hang my PC.
The clamp() problem is indeed fixed by making sure that the last two arguments are floats.
The matrixVectorMultiply() problem is fixed by changing the signature of that function. It originally is:
float3 matrixVectorMultiply(__global float* matrix, float3* vector){
float3 result;
result.x = matrix[0]*((*vector).x)+matrix[4]*((*vector).y)+matrix[8]*((*vector).z)+matrix[12];
result.y = matrix[1]*((*vector).x)+matrix[5]*((*vector).y)+matrix[9]*((*vector).z)+matrix[13];
result.z = matrix[2]*((*vector).x)+matrix[6]*((*vector).y)+matrix[10]*((*vector).z)+matrix[14];
return result;
}
However, there is absolutely no reason for vector to be a pointer, so you can remove the * before every occurrence of vector, along with the & in front of the (float3)(...) arguments at the call sites.
Then the code should compile, but the program probably still crashes.
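For illustration, this is roughly what the by-value version looks like, written here as plain C++ so it can be tried on the CPU. Float3 is a stand-in struct for OpenCL's float3, and the __global qualifier is dropped; both are assumptions made only so that the sketch compiles outside of OpenCL.

```cpp
// Stand-in for OpenCL's float3 (an assumption for this C++ sketch).
struct Float3 {
    float x, y, z;
};

// Same arithmetic as the original, but `vector` is passed by value,
// so callers can pass a temporary without taking its address.
Float3 matrixVectorMultiply(const float* matrix, Float3 vector) {
    Float3 result;
    result.x = matrix[0] * vector.x + matrix[4] * vector.y + matrix[8] * vector.z + matrix[12];
    result.y = matrix[1] * vector.x + matrix[5] * vector.y + matrix[9] * vector.z + matrix[13];
    result.z = matrix[2] * vector.x + matrix[6] * vector.y + matrix[10] * vector.z + matrix[14];
    return result;
}
```

In the kernel, the failing call would then become matrixVectorMultiply(viewTransform, (float3)(0, 0, -1)), with no & in front of the vector literal.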
Probably not your issue, but worth noting: Between OpenCL 1.0 and 1.1 clamp changed slightly, so if you are not careful you can have code that compiles in one version and not the other. Specifically, in the OpenCL 1.1 specification, "Appendix F – Changes", "F.1 Summary of changes from OpenCL 1.0" it says "The following features are added to the OpenCL C programming language (section 6):", then "New built-in functions", then "clamp integer function defined in section 6.11.3"
So you're best off fully qualifying your parameters.
Related to this, OpenCL 1.1 added (vector, scalar) variants of the integer functions min and max, so don't use those in 1.0 (cast the scalar parameters to vectors instead).