I have a GLSL shader that works fine until I add the following lines to it:
vec4 coord = texture2D(Coordinates, gl_TexCoord[0].st);
mat4x3 theMatrix;
theMatrix[1][1] = 1.0;
vec3 CampProj = theMatrix * coord;
When I check the error log I am told:
ERROR: 0:2: '=' : cannot convert from '4-component vector of float' to '3-component vector of float'
If I make CampProj a vec4 this compiles fine, but I am very confused as to how a 4-column, 3-row matrix multiplied by a 4-component vector is going to result in a 4-component vector.
Is this a bug, or is it possible that the 4x3 matrix is really just a 4x4 under the hood with a 0,0,0,1 final row? If not, can someone explain to me why the compiler is insisting on returning a vec4?
I'm using C++ via Visual Studio Express 2013, Windows 7, Mobile Intel® 4 Series Express Chipset Family.
UPDATE:
Reto's answer is what I expect to be the case: it is a bug in the compiler, both because that's the only thing that makes sense in a linear algebra context and because the linear algebra definition is what the GLSL documentation references for matrix/matrix and matrix/vector multiplication. However, even after updating my video chipset drivers, the compiler shows the same behavior. Could one other person confirm the behavior Reto describes?
@Reto: if nobody has confirmed by 12-05-14, I'll accept your answer as correct, as it seems the only real logical possibility.
This looks like a bug in your GLSL compiler. This should compile successfully:
mat4x3 mat;
vec3 res = mat * vec4(1.0);
and this should give an error:
mat4x3 mat;
vec4 res = mat * vec4(1.0);
I tested this on 3 configurations, and all of them confirmed this behavior:
Windows 8.1 with Intel HD Graphics 4600.
Windows 8.1 with NVIDIA GeForce GT 740M.
Mac OS Yosemite with Intel Iris Pro.
This also matches my understanding of the specs. In the GLSL 3.30 spec document, mat4x3 is described as:
a floating-point matrix with 4 columns and 3 rows
and multiplication is defined by (emphasis added):
The operator is multiply (*), where both operands are matrices or one operand is a vector and the other a matrix. A right vector operand is treated as a column vector and a left vector operand as a row vector. In all these cases, it is required that the number of columns of the left operand is equal to the number of rows of the right operand. Then, the multiply (*) operation does a linear algebraic multiply, yielding an object that has the same number of rows as the left operand and the same number of columns as the right operand.
In this example, the "number of columns of the left operand" is 4, which means that the vector needs to have 4 "rows", which is a column vector with 4 elements. Then, since the left operand has 3 rows, the resulting vector has 3 "rows", which is a column vector with 3 elements.
It's also the only thing that makes sense based on standard linear algebra.
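To make the dimensions concrete, here is a minimal sketch of what should and should not compile under the rule quoted above (variable names are mine):

mat4x3 m;                    // 4 columns, 3 rows (a 3x4 matrix in linear algebra terms)
vec3 a = m * vec4(1.0);      // column vector on the right: (3x4)*(4x1) = 3x1, so vec3
vec4 b = vec3(1.0) * m;      // row vector on the left:     (1x3)*(3x4) = 1x4, so vec4
// vec4 c = m * vec4(1.0);   // should be rejected: the result only has 3 components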
That's because the component count of both your vec4 and your mat4x3 is four. You cannot expect to receive a vec3 out of that multiplication.
Why don't you just use a homogeneous matrix (mat4) and a vec4? You'll get a vec4 back, and if you really must, you can convert it to a vec3 by doing
yourVec4.xyz
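For illustration, a rough sketch of that workaround using the names from the question (assuming a homogeneous mat4 is acceptable for your data):

mat4 theMatrix;                                        // homogeneous 4x4 instead of mat4x3
theMatrix[1][1] = 1.0;
vec4 coord = texture2D(Coordinates, gl_TexCoord[0].st);
vec3 camProj = (theMatrix * coord).xyz;                // drop the w component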
Hope this helps, my first answer!
This matrix is supposed to convert from linear RGBA to XYZ, preserving the alpha channel as it is:
vec4 M[4]=
{
    vec4{0.4124564f, 0.3575761f, 0.1804375f, 0.0f}
   ,vec4{0.2126729f, 0.7151522f, 0.0721750f, 0.0f}
   ,vec4{0.0193339f, 0.1191920f, 0.9503041f, 0.0f}
   ,vec4{0.0f, 0.0f, 0.0f, 1.0f}
};
Is that correct? Where can I find the values in double precision? I am asking because the second row is very close to the luma formula, which, from what I understand, is associated with non-linear sRGB values:
vec4 weights{0.2126f,0.7152f,0.0722f,0.0f};
auto temp=m_data*weights; //vectorized multiplication
return temp[0] + temp[1] + temp[2] + temp[3]; //Sum it up to compute the dot product (weighted average)
Other questions: Should the weights discussed actually be identical? Should conversion to Y'CbCr use the same weights? Should it be performed in linear or sRGB space?
This matrix converts from an sRGB flavour to CIE XYZ (D65). It is, however, not the official sRGB matrix published in IEC 61966-2-1:1999, which is rounded to 4 digits as follows:
[[ 0.4124 0.3576 0.1805]
[ 0.2126 0.7152 0.0722]
[ 0.0193 0.1192 0.9505]]
Depending on the context in which you are performing your conversions, it might be important to use the official IEC 61966-2-1:1999 matrix so that your results match other third-party datasets, as it is likely that they will be using the canonical matrix.
For reference here is a double precision conversion matrix computed with Colour:
[[0.412390799265960 0.357584339383878 0.180480788401834]
[0.212639005871510 0.715168678767756 0.072192315360734]
[0.019330818715592 0.119194779794626 0.950532152249661]]
And the code used to generate it:
import numpy as np
import colour

# Print with full double precision and use the matrix derived from the
# sRGB primaries and whitepoint rather than the rounded published values.
np.set_printoptions(formatter={'float': '{:0.15f}'.format})
colour.models.sRGB_COLOURSPACE.use_derived_transformation_matrices(True)
print(colour.models.sRGB_COLOURSPACE.RGB_to_XYZ_matrix)
Should the weights discussed actually be identical?
For consistency in your computations you might want to use weights matching your matrix or you will get issues when doing conversions back and forth.
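For instance, if you apply the matrix in a shader, the luminance weights are simply its second row. A minimal GLSL sketch of that idea (GLSL is used only for illustration here; the constant is the rounded IEC matrix from above, laid out for GLSL's column-major constructors):

// Linear sRGB -> XYZ, IEC 61966-2-1:1999 rounded values.
// GLSL matrix constructors are column-major, so each line below is one column.
const mat3 SRGB_TO_XYZ = mat3(
    0.4124, 0.2126, 0.0193,
    0.3576, 0.7152, 0.1192,
    0.1805, 0.0722, 0.9505);

vec3 xyz = SRGB_TO_XYZ * linearRGB;   // linearRGB: linearly encoded sRGB
float Y = xyz.y;                      // same as dot(linearRGB, vec3(0.2126, 0.7152, 0.0722))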
Should conversion to Y'CbCr use the same weights?
Y'CbCr has many variants and nobody will be able to answer properly if you don't know which variant you need.
Should it be performed in linear or sRGB space?
Y'CbCr conversions pretty much always happen on gamma-encoded values, ITU-R BT.2020 YcCbcCrc being a notable exception as it is based on linearly encoded values. It is also important to understand that the sRGB colourspace also has a linear form; as a matter of fact, the matrix central to the discussion here is meant to be applied to linearly encoded sRGB values.
Those last two questions should probably be asked in a separate new question.
Consider the following input data as given:
CP: camera intrinsic parameter (3x3) (more info)
WP: 4x 3D world coordinates (i.e. 4 points in world space)
Now consider the following variables:
CM1: current camera location & rotation (4x4 matrix)
SP1: 4x 2D screen coordinates (i.e. 4 points in screen space)
SP2: again 4x 2D screen coordinates (i.e. 4 points in screen space)
with the constraint that SP1 is derived from looking through the camera at CM1, with camera parameters CP, at WP. For clarity, let's express that relationship as follows (although this might not be the correct formula in matrix maths):
SP1 = CM1 * CP * WP
I am trying to find the new camera location & rotation CM2 such that
SP2 = CM2 * CP * WP
I've been trying to use cvCalibrateCamera() but haven't been successful. I'm glad that I finally figured out how to convert all the data so that the function no longer spits out an error, but now I think the function might not be the right one to use in this case, as I have no idea how to apply its output data. (I have the feeling that the function is meant to compute camera intrinsic parameters from input data, but the documentation is not clear to me. In case you are interested in my humble attempt: http://pastebin.com/WC8CKUhZ)
Is there a way to achieve what I am trying to compute with OpenCV? I couldn't find any function that matches my requirements more closely than cvCalibrateCamera() (guessed from the data I can feed into it).
I have a problem when using gl_SampleMask with a multisample texture.
To simplify the problem, here is an example.
I draw two triangles to a framebuffer with a 32x multisample texture attached.
The vertices of the triangles are (0,0) (100,0) (100,1) and (0,0) (0,1) (100,1).
In the fragment shader, I have code like this:
#extension GL_NV_sample_mask_override_coverage : require
layout(override_coverage) out int gl_SampleMask[];
...
out_color = vec4(1,0,0,1);
coverage_mask = gen_mask( gl_FragCoord.x / 100.0 * 8.0 );
gl_SampleMask[0] = coverage_mask;
The function int gen_mask(int X) generates an integer with X ones in its binary representation.
I would hope to see 100 pixels filled with solid red.
But actually I get alpha-blended output. The pixel at (50,0) shows (1,0.25,0.25), which looks like two layers of (1,0,0,0.5) drawn onto a (1,1,1,1) background.
However, if I instead check gl_SampleID in the fragment shader and write (1,0,0,1) or (0,0,0,0) to the output color depending on whether the corresponding bit of coverage_mask is set,
if ((coverage_mask >> gl_SampleID) & (1 == 1) ) {
out_color = vec4(1,0,0,1);
} else {
out_color = vec4(0,0,0,0);
}
I get 100 red pixels as expected.
I've checked the OpenGL wiki and the documentation but didn't find why the behavior changes here.
I'm using an NVIDIA GTX 980 with driver version 361.43 on Windows 10.
I can put the test code on GitHub later if necessary.
When the texture has 32 samples, NVIDIA's implementation splits one pixel into four small fragments, each with 8 samples. So in each fragment shader invocation only an 8-bit gl_SampleMask is available.
OK, let's assume that's true. How do you suppose NVIDIA implements this?
Well, the OpenGL specification does not allow them to implement this by changing the effective size of gl_SampleMask. It makes it very clear that the size of the sample mask must be large enough to hold the maximum number of samples supported by the implementation. So if GL_MAX_SAMPLES returns 32, then gl_SampleMask must have 32 bits of storage.
So how would they implement it? Well, there's one simple way: the coverage mask. They give each of the 4 fragments a separate 8 bits of coverage mask that they write their outputs to. Which would work perfectly fine...
Until you overrode the coverage mask with override_coverage. This now means all 4 fragment shader invocations can write to the same samples as other FS invocations.
Oops.
I haven't directly tested NVIDIA's implementation to be certain of that, but it is very much consistent with the results you get. Each FS instance in your code will write to, at most, 8 samples. The same 8 samples. 8/32 is 0.25, which is exactly what you get: 0.25 of the color you wrote. Even though 4 FS's may be writing for the same pixel, each one is writing to the same 25% of the coverage mask.
There's no "alpha-blended output"; it's just doing what you asked.
As to why your second code works... well, you fell victim to one of the classic C/C++ (and therefore GLSL) blunders: operator precedence. Allow me to parenthesize your condition to show you what the compiler thinks you wrote:
((coverage_mask >> gl_SampleID) & (1 == 1))
Equality testing has a higher precedence than any bitwise operation. So it gets grouped like this. Now, a conformant GLSL implementation should have failed to compile because of that, since the result of 1 == 1 is a boolean, which cannot be used in a bitwise & operation.
Of course, NVIDIA has always had a tendency to play fast-and-loose with GLSL, so it doesn't surprise me that they allow this nonsense code to compile. Much like C++. I have no idea what this code would actually do; it depends on how a true boolean value gets transformed into an integer. And GLSL doesn't define such an implicit conversion, so it's up to NVIDIA to decide what that means.
The traditional condition for testing a bit is this:
(coverage_mask & (0x1 << gl_SampleID))
It also avoids undefined behavior if coverage_mask isn't an unsigned integer.
Of course, doing the condition correctly should give you... the exact same answer as the first one.
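In context, the corrected branch from the question would look like this (a sketch, reusing the question's variable names):

if ((coverage_mask & (0x1 << gl_SampleID)) != 0) {
    out_color = vec4(1, 0, 0, 1);
} else {
    out_color = vec4(0, 0, 0, 0);
}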
So what I need is simple: each time we run our shader (meaning for each pixel) I need to calculate a random matrix of 1s and 0s with resolution == originalImageResolution. How can I do such a thing?
For now I have created one for Shadertoy. The random matrix resolution is set to 15 by 15 here, because the GPU makes Chrome crash when I try something like 200 by 200, while what I really need is the full image resolution.
#ifdef GL_ES
precision highp float;
#endif
uniform vec2 resolution;
uniform float time;
uniform sampler2D tex0;
float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898,78.233))) * (43758.5453 + time));
}
vec3 getOne(){
    vec2 p = gl_FragCoord.xy / resolution.xy;
    vec3 one = vec3(0.0);   // accumulator (was uninitialized)
    for(int i=0;i<15;i++){
        for(int j=0;j<15;j++){
            if(rand(p)<=0.5)
                one = (one.xyz + texture2D(tex0,vec2(j,i)).xyz)/2.0;
        }
    }
    return one;
}
void main(void)
{
    gl_FragColor = vec4(getOne(),1.0);
}
And one for Adobe pixel bender:
<languageVersion: 1.0;>
kernel random
< namespace : "Random";
vendor : "Kabumbus";
version : 3;
description : "not as random as needed, not as fast as needed"; >
{
input image4 src;
output float4 outputColor;
float rand(float2 co, float2 co2){
    return fract(sin(dot(co.xy, float2(12.9898,78.233))) * (43758.5453 + (co2.x + co2.y)));
}
float4 getOne(){
    float4 one = float4(0.0, 0.0, 0.0, 0.0);   // accumulator (was uninitialized)
    float2 r = outCoord();
    for(int i=0;i<200;i++){
        for(int j=0;j<200;j++){
            if(rand(r, float2(i,j))>=1.0)
                one = (one + sampleLinear(src,float2(j,i)))/2.0;
        }
    }
    return one;
}
void evaluatePixel()
{
    float4 oc = getOne();
    outputColor = oc;
}
}
So my real problem is: my shaders make my GPU driver crash. How can I use GLSL for the same purpose as now, but without failing and, if possible, faster?
Update:
What I want to create is called a single-pixel camera (google Compressive Imaging or Compressive Sensing); I want to create a GPU-based software implementation.
The idea is simple:
we have an image - NxM.
for each pixel in the image we want the GPU to perform the following operations:
generate an NxM matrix of random values - 0s and 1s.
compute the arithmetic mean of all pixels in the original image whose coordinates correspond to the coordinates of 1s in our random NxM matrix.
output the result of that arithmetic mean as the pixel color.
What I tried to implement in my shaders was to simulate that very process.
What is really stupid about trying to do this on the GPU:
compressive sensing does not tell us to compute the full NxM matrix of such arithmetic mean values; it needs just a piece of it (for example 1/3). So I am putting more load on the GPU than I need to. However, testing on more data is not always a bad idea.
Thanks for adding more detail to clarify your question. My comments are getting too long, so I'm turning them into an answer. Moving my comments in here to keep them together:
Sorry to be slow, but I am trying to understand the problem and the goal. In your GLSL sample, I don't see a matrix being generated. I see a single vec3 being generated by summing a random selection (varying over time) of cells from a 15 x 15 texture (matrix). And that vec3 is recomputed for each pixel. Then the vec3 is used as the pixel color.
So I'm not clear whether you really want to create a matrix, or just want to compute a value for every pixel. The latter is in some sense a 'matrix', but computing a simple random value for 200 x 200 pixels would not strain your graphics driver. Also you said you wanted to use the matrix. So I don't think that's what you mean.
I'm trying to understand why you want a matrix - to preserve a consistent random basis for all the pixels? If so, you can either precompute a random texture, or use a consistent pseudorandom function like you have in rand() except not use time. You clearly know about that so I guess I still don't understand the goal. Why are you summing a random selection of cells from the texture, for each pixel?
I believe the reason your shader is crashing is that your main() function is exceeding its time limit - either for a single pixel, or for the whole set of pixels. Calling rand() 40,000 times per pixel (in a 200 * 200 nested loop) could certainly explain that!
If you had 200 x 200 pixels, and are calling sin() 40k times for each one, that's 1.6 billion calls per frame. Poor GPU!
I'm hopeful that if we understand the goal better, we'll be able to recommend a more efficient way to get the effect you want.
Update.
(Deleted this part, since it was mistaken. Even though many cells in the source matrix may each contribute less than a visually detectable amount of color to the result, the total of the many cells can contribute a visually detectable amount of color.)
New update based on updated question.
OK, (thinking "out loud" here so you can check whether I'm understanding correctly...) Since you need each of the random NxM values only once, there is no actual requirement to store them in a matrix; the values can simply be computed on demand and then thrown away. That's why your example code above does not actually generate a matrix.
This means we cannot get away from generating (NxM)^2 random values per frame; that is, NxM random values per pixel, with NxM pixels. So for N = M = 200, that's 1.6 billion random values per frame.
However, we can still optimize some things.
First, since your random values only need to be one bit each (you only need a boolean answer to decide whether to include each cell from the source texture into the mix), you can probably use a cheaper pseudo random number generator. The one you're using outputs much more random data per call than one bit. For example, you could call the same PRNG function as you're using now, but store the value and extract 32 random bits out of it. Or at least several, depending on how many are random enough. In addition, instead of using a sin() function, if you have extension GL_EXT_gpu_shader4 (for bitwise operators), you could use something like this:
int LFSR_Rand_Gen(in int n)
{
// <<, ^ and & require GL_EXT_gpu_shader4.
n = (n << 13) ^ n;
return (n * (n*n*15731+789221) + 1376312589) & 0x7fffffff;
}
Second, you are currently performing one divide operation per included cell (/2.0), which is probably relatively expensive, unless the compiler and GPU are able to optimize it into a bit shift (is that possible for floating point?). This also will not give the arithmetic mean of the input values, as discussed above... it will put much more weight on the later values and very little on the earlier ones. As a solution, keep a count of how many values are being included, and divide by that count once, after the loop is finished.
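A minimal sketch of that accumulation pattern, adapted from the getOne() in your question (the per-cell random test is an assumption on my part; vary it however suits your PRNG):

vec3 getOne(){
    vec2 p = gl_FragCoord.xy / resolution.xy;
    vec3 sum = vec3(0.0);
    float count = 0.0;
    for(int i = 0; i < 15; i++){
        for(int j = 0; j < 15; j++){
            // include this cell or not; offsetting by (j,i) gives each cell its own decision
            if(rand(p + vec2(float(j), float(i))) <= 0.5){
                sum += texture2D(tex0, vec2(j, i)).xyz;
                count += 1.0;
            }
        }
    }
    return count > 0.0 ? sum / count : vec3(0.0);   // divide once, after the loop
}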
Whether these optimizations will be enough to let your GPU driver handle 200x200 random values for each of 200x200 pixels per frame, I don't know. They should definitely enable you to increase your resolution substantially.
Those are the ideas that occur to me off the top of my head. I am far from being a GPU expert though. It would be great if someone more qualified can chime in with suggestions.
P.S. In your comment, you jokingly (?) mentioned the option of precomputing N*M NxM random matrices. Maybe that's not a bad idea?? 40,000 x 40,000 is a big texture (around 200MB even at one bit per cell), but if you store 32 bits of random data per cell, that comes down to 1250 x 40,000 cells. Too bad vanilla GLSL doesn't give you bitwise operators to extract the data, but even if you don't have the GL_EXT_gpu_shader4 extension you can still fake it. (Maybe you would also need a special extension then for non-square textures?)
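To illustrate the "fake it" part: without bitwise operators you can pull individual bits out of a value with floor() and mod(), along these lines (a sketch; keep the packed value small enough to be exactly representable in a float, roughly 23 bits):

// Returns bit b (0 = least significant) of packedValue as 0.0 or 1.0.
float getBit(float packedValue, int b) {
    return mod(floor(packedValue / pow(2.0, float(b))), 2.0);
}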
I have tried so many different strategies to get a usable noise function and none of them work. So, how do you implement Perlin noise on an ATI graphics card in GLSL?
Here are the methods I have tried:
I have tried putting the permutation and gradient data into a GL_RGBA 1D texture and calling the texture1D function. However, one call to this noise implementation leads to 12 texture calls and kills the framerate.
I have tried uploading the permutation and gradient data into a uniform vec4 array, but the compiler won't let me get an element in the array unless the index is a constant. For example:
int i = 10;
vec4 a = noise_data[i];
will give a compiler error of this:
ERROR: 0:43: Not supported when use temporary array indirect index.
Meaning I can only retrieve the data like this:
vec4 a = noise_data[10];
I also tried programming the array directly into the shader, but I got the same index issue. I hear NVIDIA graphics cards will actually allow this method, but ATI will not.
I tried making a function that returned a specific hard coded data point depending on the input index, but the function, being called 12 times and having 64 if statements, made the linking time unbearable.
ATI does not support the "built-in" noise functions for GLSL, and I can't just precompute the noise and import it as a texture, because I am dealing with fractals. This means I need the infinite precision of calculating the noise at run time.
So the overarching question is...
How?
For better distribution of random values I suggest these very good articles:
Pseudo Random Number Generator in GLSL
Lumina noise GLSL tutorial
Have random fun !!!
There is a project on GitHub with GLSL noise functions. It has both the "classic" and newer noise functions in 2, 3, and 4D.
iOS does have the noise function implemented.
noise() is well-known for not being implemented...
Roll your own:
// A simple linear congruential generator, seeded per pixel.
// Requires integer support in GLSL (e.g. #version 130 or GL_EXT_gpu_shader4).
const int a = 75;      // multiplier; small values avoid 32-bit overflow
const int m = 65537;   // modulus
int c;                 // per-pixel increment, set by srand()
int Xn = 1;            // current state

void srand(int x, int y, int width){   // seed from the pixel position
    c = x + y * width;
}

int rand(){
    Xn = (a * Xn + c) % m;
    return Xn;
}
For other choices of a and m, see the Wikipedia article on linear congruential generators.
It's not perfect, but often good enough.
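For example, a possible way to use it in a fragment shader (a sketch; resolution is assumed to be a uniform holding the viewport size):

srand(int(gl_FragCoord.x), int(gl_FragCoord.y), int(resolution.x));
float r = float(rand()) / float(m);   // roughly uniform in [0, 1)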
This SimpleX noise stuff might do what you want.
Try adding #version 150 to the top of your shader.