I've run into a problem using gl_SampleMask with a multisample texture.
To simplify the problem, here is an example.
I draw two triangles into a framebuffer with a 32x multisample texture attached.
The triangle vertices are (0,0) (100,0) (100,1) and (0,0) (0,1) (100,1).
In the fragment shader I have code like this:
#extension GL_NV_sample_mask_override_coverage : require
layout(override_coverage) out int gl_SampleMask[];
...
out_color = vec4(1,0,0,1);
coverage_mask = gen_mask( gl_FragCoord.x / 100.0 * 8.0 );
gl_SampleMask[0] = coverage_mask;
The function int gen_mask(int X) generates an integer with X 1s in its binary representation.
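For reference, a minimal sketch of what such a helper might look like (illustrative only, not the actual test code):

// Illustrative sketch: returns an int whose lowest X bits are set,
// e.g. gen_mask(3) == 0b111. Assumes 0 <= X <= 32.
int gen_mask(int X)
{
    return (X >= 32) ? ~0 : ((1 << X) - 1);
}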
I expected to see 100 pixels filled with solid red.
But instead I get alpha-blended-looking output: the pixel at (50,0) shows (1, 0.25, 0.25), which looks like two (1,0,0,0.5) fragments drawn onto a (1,1,1,1) background.
However, if I instead break coverage_mask apart per sample, checking gl_SampleID in the fragment shader and writing (1,0,0,1) or (0,0,0,0) to the output color according to the corresponding bit of coverage_mask,
if ((coverage_mask >> gl_SampleID) & (1 == 1) ) {
    out_color = vec4(1,0,0,1);
} else {
    out_color = vec4(0,0,0,0);
}
I get 100 red pixels as expected.
I've checked the OpenGL wiki and documentation but couldn't find an explanation for why the behavior changes here.
I'm using an NVIDIA GTX 980 with driver version 361.43 on Windows 10.
I can put the test code on GitHub later if necessary.
When the texture has 32 samples, NVIDIA's implementation splits each pixel into four smaller fragments, each covering 8 samples. So within each fragment shader invocation only 8 bits of gl_SampleMask are available.
OK, let's assume that's true. How do you suppose NVIDIA implements this?
Well, the OpenGL specification does not allow them to implement this by changing the effective size of gl_SampleMask. It makes it very clear that the size of the sample mask must be large enough to hold the maximum number of samples supported by the implementation. So if GL_MAX_SAMPLES returns 32, then gl_SampleMask must have 32 bits of storage.
So how would they implement it? Well, there's one simple way: the coverage mask. They give each of the 4 fragments a separate 8 bits of coverage mask that they write their outputs to. Which would work perfectly fine...
Until you overrode the coverage mask with override_coverage. This now means all 4 fragment shader invocations can write to the same samples as other FS invocations.
Oops.
I haven't directly tested NVIDIA's implementation to be certain of that, but it is very much consistent with the results you get. Each FS instance in your code will write to, at most, 8 samples. The same 8 samples. 8/32 is 0.25, which is exactly what you get: 0.25 of the color you wrote. Even though 4 FS's may be writing for the same pixel, each one is writing to the same 25% of the coverage mask.
There's no "alpha-blended output"; it's just doing what you asked.
As to why your second code works... well, you fell victim to one of the classic C/C++ (and therefore GLSL) blunders: operator precedence. Allow me to parenthesize your condition to show you what the compiler thinks you wrote:
((coverage_mask >> gl_SampleID) & (1 == 1))
Equality testing has a higher precedence than any bitwise operation. So it gets grouped like this. Now, a conformant GLSL implementation should have failed to compile because of that, since the result of 1 == 1 is a boolean, which cannot be used in a bitwise & operation.
Of course, NVIDIA has always had a tendency to play fast-and-loose with GLSL, so it doesn't surprise me that they allow this nonsense code to compile. Much like C++. I have no idea what this code would actually do; it depends on how a true boolean value gets transformed into an integer. And GLSL doesn't define such an implicit conversion, so it's up to NVIDIA to decide what that means.
The traditional condition for testing a bit is this:
(coverage_mask & (0x1 << gl_SampleID))
It also avoids undefined behavior if coverage_mask isn't an unsigned integer.
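Dropped back into your second snippet, the corrected test would read something like this (a sketch, not your exact code):

// Corrected bit test; note GLSL needs an explicit comparison,
// since if() requires a bool.
if ((coverage_mask & (1 << gl_SampleID)) != 0) {
    out_color = vec4(1.0, 0.0, 0.0, 1.0);
} else {
    out_color = vec4(0.0, 0.0, 0.0, 0.0);
}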
Of course, doing the condition correctly should give you... the exact same answer as the first one.
I've been under the assumption that my gamma correction pipeline should be as follows:
Use sRGB format for all textures loaded in (GL_SRGB8_ALPHA8) as all art programs pre-gamma correct their files. When sampling from a GL_SRGB8_ALPHA8 texture in a shader OpenGL will automatically convert to linear space.
Do all lighting calculations, post processing, etc. in linear space.
Convert back to sRGB space when writing final color that will be displayed on the screen.
Note that in my case the final color write involves me writing from a FBO (which is a linear RGB texture) to the back buffer.
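For concreteness, a sketch of what such a final pass could look like (illustrative; sceneTex and uv are placeholder names, and the conversion is the standard piecewise sRGB encode):

// Final fullscreen pass: read the linear FBO texture and write sRGB-encoded
// output to the back buffer (assuming GL_FRAMEBUFFER_SRGB is not used).
uniform sampler2D sceneTex;  // linear RGB scene texture (placeholder name)
in vec2 uv;                  // fullscreen-quad texture coordinate (placeholder name)
out vec4 fragColor;

vec3 linearToSrgb(vec3 c)
{
    // Piecewise sRGB encoding from the sRGB specification.
    return mix(c * 12.92,
               1.055 * pow(c, vec3(1.0 / 2.4)) - 0.055,
               step(vec3(0.0031308), c));
}

void main()
{
    fragColor = vec4(linearToSrgb(texture(sceneTex, uv).rgb), 1.0);
}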
My assumption has been challenged: if I gamma correct in the final stage, my colors are brighter than they should be. I set up a solid color to be drawn by my lights with value {255, 106, 0}, but when I render I get {255, 171, 0} (determined by taking a screenshot and color picking). Instead of orange I get yellow. If I don't gamma correct at the final step, I get exactly the right value of {255, 106, 0}.
According to some resources modern LCD screens mimic CRT gamma. Do they always? If not, how can I tell if I should gamma correct? Am I going wrong somewhere else?
Edit 1
I've now noticed that even though the color I write with the light is correct, places where I use colors from textures are not correct (but rather far darker as I would expect without gamma correction). I don't know where this disparity is coming from.
Edit 2
After trying GL_RGBA8 for my textures instead of GL_SRGB8_ALPHA8, everything looks perfect, even when using the texture values in lighting computations (if I halve the intensity of the light, the output color values are halved).
My code is no longer taking gamma correction into account anywhere, and my output looks correct.
This confuses me even more, is gamma correction no longer needed/used?
Edit 3 - In response to datenwolf's answer
After some more experimenting I'm confused on a couple points here.
1 - Most image formats are stored non-linearly (in sRGB space)
I've loaded a few images (in my case both .png and .bmp images) and examined the raw binary data. It appears to me as though the images are actually in the RGB color space: if I compare the pixel values shown by an image editing program with the byte array I get in my program, they match up perfectly. Since my image editor is giving me RGB values, this would indicate the image is stored in RGB.
I'm using stb_image.h/.c to load my images and followed it all the way through loading a .png and did not see anywhere that it gamma corrected the image while loading. I also examined the .bmps in a hex editor and the values on disk matched up for them.
If these images are actually stored on disk in linear RGB space, how am I supposed to (programmatically) know when to specify that an image is in sRGB space? Is there some way to query for this that a more fully featured image loader might provide? Or is it up to the image creators to save their image as gamma corrected (or not) - meaning establishing a convention and following it for a given project? I've asked a couple of artists and neither of them knew what gamma correction is.
If I specify my images are sRGB, they are too dark unless I gamma correct in the end (which would be understandable if the monitor output using sRGB, but see point #2).
2 - "On most computers the effective scanout LUT is linear! What does this mean though?"
I'm not sure I can find where this thought is finished in your response.
From what I can tell, having experimented, all monitors I've tested on output linear values. If I draw a full screen quad and color it with a hard-coded value in a shader with no gamma correction the monitor displays the correct value that I specified.
What the sentence I quoted above from your answer and my results would lead me to believe is that modern monitors output linear values (i.e. do not emulate CRT gamma).
The target platform for our application is the PC. For this platform (excluding people with CRTs or really old monitors), would it be reasonable to do whatever your response to #1 is, then for #2 to not gamma correct (i.e. not perform the final RGB->sRGB transformation - either manually or using GL_FRAMEBUFFER_SRGB)?
If this is so, what are the platforms that GL_FRAMEBUFFER_SRGB is meant for (or where it would be valid to use it today)? Or are monitors that use linear RGB really that new (given that GL_FRAMEBUFFER_SRGB was introduced in 2008)?
--
I've talked to a few other graphics devs at my school and from the sounds of it, none of them have taken gamma correction into account and they have not noticed anything incorrect (some were not even aware of it). One dev in particular said that he got incorrect results when taking gamma into account so he then decided to not worry about gamma. I'm unsure what to do in my project for my target platform given the conflicting information I'm getting online/seeing with my project.
Edit 4 - In response to datenwolf's updated answer
Yes, indeed. If somewhere in the signal chain a nonlinear transform is applied, but all the pixel values go unmodified from the image to the display, then that nonlinearity has already been pre-applied on the image's pixel values. Which means, that the image is already in a nonlinear color space.
Your response would make sense to me if I was examining the image on my display. To be sure I was clear, when I said I was examining the byte array for the image I mean I was examining the numerical value in memory for the texture, not the image output on the screen (which I did do for point #2). To me the only way I could see what you're saying to be true then is if the image editor was giving me values in sRGB space.
Also note that I did try examining the output on monitor, as well as modifying the texture color (for example, dividing by half or doubling it) and the output appeared correct (measured using the method I describe below).
How did you measure the signal response?
Unfortunately my methods of measurement are far cruder than yours. When I said I experimented on my monitors, what I meant was that I output a solid-color full-screen quad, whose color is hard-coded in a shader, to a plain OpenGL framebuffer (which does not do any color space conversion when written to). When I output white, 75% gray, 50% gray, 25% gray and black, the correct colors are displayed. Now, my interpretation of "correct colors" could most certainly be wrong. I take a screenshot and then use an image editing program to see what the values of the pixels are (as well as doing a visual appraisal to make sure the values make sense). If I understand correctly, if my monitors were non-linear I would need to perform an RGB->sRGB transformation before presenting to the display device for the colors to be correct.
I'm not going to lie, I feel I'm getting a bit out of my depth here. I'm thinking the solution I might pursue for my second point of confusion (the final RGB->sRGB transformation) is a tweakable brightness setting, defaulted to what looks correct on my devices (no gamma correction).
First of all, you must understand that the nonlinear mapping applied to the color channels is often more than just a simple power function. The sRGB nonlinearity can be approximated by roughly x^2.4, but that's not the real curve. Anyway, your primary assumptions are more or less correct.
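For reference, the actual sRGB decode (display value to linear) looks like this in GLSL; a sketch using the constants from the sRGB specification, for comparison with a plain power function:

// sRGB -> linear: a linear segment near black, a scaled power law above it.
vec3 srgbToLinear(vec3 s)
{
    return mix(s / 12.92,
               pow((s + 0.055) / 1.055, vec3(2.4)),
               step(vec3(0.04045), s));
}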
If your textures are stored in the more common image file formats, they will contain the values as they are presented to the graphics scanout. Now there are two common hardware scenarios:
The scanout interface outputs a linear signal and the display device will then internally apply a nonlinear mapping. Old CRT monitors were nonlinear due to their physics: The amplifiers could put only so much current into the electron beam, the phosphor saturating and so on – that's why the whole gamma thing was introduced in the first place, to model the nonlinearities of CRT displays.
Modern LCD and OLED displays either use resistor ladders in their driver amplifiers, or they have gamma ramp lookup tables in their image processors.
Some devices however are linear, and ask the image producing device to supply a proper matching LUT for the desired output color profile on the scanout.
On most computers the effective scanout LUT is linear! What does this mean though? A little detour:
For illustration I quickly hooked up my laptop's analogue display output (VGA connector) to my analogue oscilloscope: blue channel onto scope channel 1, green channel onto scope channel 2, external triggering on the line synchronization signal (HSync). A quick and dirty OpenGL program, deliberately written in immediate mode, was used to generate a linear color ramp:
#include <GL/glut.h>

void display()
{
    GLuint win_width  = glutGet(GLUT_WINDOW_WIDTH);
    GLuint win_height = glutGet(GLUT_WINDOW_HEIGHT);
    glViewport(0, 0, win_width, win_height);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, 1);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    glBegin(GL_QUAD_STRIP);
        glColor3f(0., 0., 0.);
        glVertex2f(0., 0.);
        glVertex2f(0., 1.);
        glColor3f(1., 1., 1.);
        glVertex2f(1., 0.);
        glVertex2f(1., 1.);
    glEnd();

    glutSwapBuffers();
}

int main(int argc, char *argv[])
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutCreateWindow("linear");
    glutFullScreen();
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
The graphics output was configured with the Modeline
"1440x900_60.00" 106.50 1440 1528 1672 1904 900 903 909 934 -HSync +VSync
(because that's the same mode the flat panel runs in, and I was using cloning mode)
I then applied two different scanout LUTs: a gamma=2 LUT on the green channel, and a linear (gamma=1) LUT on the blue channel.
This is what the signals of a single scanout line look like (upper curve: Ch2 = green, lower curve: Ch1 = blue):
You can clearly see the x⟼x² and x⟼x mappings (parabola and linear shapes of the curves).
Now after this little detour we know that the pixel values written to the main framebuffer go to the scanout as they are: the OpenGL linear ramp underwent no further changes, and only where a nonlinear scanout LUT was applied was the signal sent to the display altered.
Either way the values you present to the scanout (which means the on-screen framebuffers) will undergo a nonlinear mapping at some point in the signal chain. And for all standard consumer devices this mapping will be according to the sRGB standard, because it's the smallest common factor (i.e. images represented in the sRGB color space can be reproduced on most output devices).
Since most programs, like web browsers, assume the output will undergo an sRGB-to-display color space mapping, they simply copy the pixel values of the standard image file formats to the on-screen framebuffer as they are, without performing a color space conversion (or they merely convert to sRGB if the image's color profile is not sRGB), thereby implying that the color values within those images are in the sRGB color space. The correct thing to do (if, and only if, the color values written to the framebuffer are scanned out to the display unaltered, assuming the scanout LUT is part of the display) would be a conversion to the color profile the display expects.
But this implies that the on-screen framebuffer itself is in sRGB color space (I don't want to split hairs about how idiotic that is, let's just accept this fact).
How does this fit together with OpenGL? First of all, OpenGL does all its color operations linearly. However, since the scanout is expected to be in some nonlinear color space, the end result of OpenGL's rendering operations must somehow be brought into the on-screen framebuffer's color space.
This is where the ARB_framebuffer_sRGB extension (which went core with OpenGL 3) enters the picture; it introduced new flags for the configuration of window pixel formats:
New Tokens
Accepted by the <attribList> parameter of glXChooseVisual, and by
the <attrib> parameter of glXGetConfig:
GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB 0x20B2
Accepted by the <piAttributes> parameter of
wglGetPixelFormatAttribivEXT, wglGetPixelFormatAttribfvEXT, and
the <piAttribIList> and <pfAttribIList> of wglChoosePixelFormatEXT:
WGL_FRAMEBUFFER_SRGB_CAPABLE_ARB 0x20A9
Accepted by the <cap> parameter of Enable, Disable, and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
and GetDoublev:
FRAMEBUFFER_SRGB 0x8DB9
So if you have a window configured with such an sRGB pixel format and enable sRGB rasterization mode in OpenGL with glEnable(GL_FRAMEBUFFER_SRGB);, the result of the linear-colorspace rendering operations will be transformed into sRGB color space.
Another way would be to render everything into an off-screen FBO and do the color conversion in a postprocessing shader.
But that's only the output side of the rendering signal chain. You also have input signals, in the form of textures, and those are usually images with their pixel values stored nonlinearly. So before they can be used in linear image operations, such images must first be brought into a linear color space. Let's ignore for the time being that mapping nonlinear color spaces into linear color spaces opens several cans of worms of its own - which is why the sRGB color space is so ridiculously small, namely to avoid those problems.
To address this, the EXT_texture_sRGB extension was introduced, which turned out to be so vital that it never even went through an ARB stage, but went straight into the OpenGL specification itself: behold the GL_SRGB… internal texture formats.
A texture loaded with one of these formats undergoes an sRGB-to-linear-RGB colorspace transformation before its samples are sourced. This gives linear pixel values, suitable for linear rendering operations, and the result can then be validly transformed to sRGB when going to the main on-screen framebuffer.
A personal note on the whole issue: Presenting images on the on-screen framebuffer in the target device color space IMHO is a huge design flaw. There's no way to do everything right in such a setup without going insane.
What one really wants is to have the on-screen framebuffer in a linear, contact color space; the natural choice would be CIEXYZ. Rendering operations would naturally take place in the same contact color space. Doing all graphics operations in contact color spaces, avoids the opening of the aforementioned cans-of-worms involved with trying to push a square peg named linear RGB through a nonlinear, round hole named sRGB.
And although I don't like the design of Weston/Wayland very much, at least it offers the opportunity to actually implement such a display system, by having the clients render and the compositor operate in contact color space and apply the output device's color profiles in a last postprocessing step.
The only drawback of contact color spaces is that it's imperative to use deep color (i.e. more than 12 bits per color channel). In fact, 8 bits are completely insufficient, even with nonlinear RGB (the nonlinearity helps a bit to cover up the lack of perceptible resolution).
Update
I've loaded a few images (in my case both .png and .bmp images) and examined the raw binary data. It appears to me as though the images are actually in the RGB color space: if I compare the pixel values shown by an image editing program with the byte array I get in my program, they match up perfectly. Since my image editor is giving me RGB values, this would indicate the image is stored in RGB.
Yes, indeed. If somewhere in the signal chain a nonlinear transform is applied, but all the pixel values go unmodified from the image to the display, then that nonlinearity has already been pre-applied on the image's pixel values. Which means, that the image is already in a nonlinear color space.
2 - "On most computers the effective scanout LUT is linear! What does this mean though?
I'm not sure I can find where this thought is finished in your response.
This thought is elaborated in the section that immediately follows, where I show how the values you put into a plain (OpenGL) framebuffer go directly to the monitor, unmodified. The idea of sRGB is "put the values into the images exactly as they are sent to the monitor and build consumer displays to follow that sRGB color space".
From what I can tell, having experimented, all monitors I've tested on output linear values.
How did you measure the signal response? Did you use a calibrated power meter or similar device to measure the light intensity emitted from the monitor in response to the signal? You can't trust your eyes with that, because like all our senses our eyes have a logarithmic signal response.
Update 2
To me the only way I could see what you're saying to be true then is if the image editor was giving me values in sRGB space.
That's indeed the case. Because color management was added to all the widespread graphics systems as an afterthought, most image editors edit pixel values in their destination color space. Note that one particular design goal of sRGB was to retroactively specify the unmanaged, direct-value-transfer color operations as they were (and mostly still are) done on consumer devices. Since no color management happens at all, the values contained in the images and manipulated in editors must already be in sRGB. This works as long as images are not synthetically created by a linear rendering process; in the latter case the rendering system has to take the destination color space into account.
I take a screenshot and then use an image editing program to see what the values of the pixels are
Which gives you of course only the raw values in the scanout buffer without the gamma LUT and the display nonlinearity applied.
I wanted to give a simple explanation of what went wrong in the initial attempt, because although the accepted answer goes in-depth on colorspace theory, it doesn't really answer that.
The setup of the pipeline was exactly right: use GL_SRGB8_ALPHA8 for textures, GL_FRAMEBUFFER_SRGB (or custom shader code) to convert back to sRGB at the end, and all your intermediate calculations will be using linear light.
The last bit is where you ran into trouble. You wanted a light with a color of (255, 106, 0) - but that's an sRGB color, and you're working with linear light. To get the color you want, you need to convert that color to linear space, the same way GL_SRGB8_ALPHA8 does for your textures. For your case, this would be a vec3 light with intensity (1, .1441, 0) - the value after the sRGB gamma encoding has been removed.
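To check that number, here is a small sketch (illustrative, not the poster's code) that decodes that sRGB colour into linear light using the standard sRGB curve:

// Decode the sRGB light colour (255, 106, 0) into linear light.
vec3 s = vec3(255.0, 106.0, 0.0) / 255.0;
vec3 lightLinear = mix(s / 12.92,
                       pow((s + 0.055) / 1.055, vec3(2.4)),
                       step(vec3(0.04045), s));
// lightLinear is approximately (1.0, 0.1441, 0.0)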
In an attempt to improve display performance of an object which is very large (and fills up GPU RAM), after some reasonably light maths I discovered an opportunity to compress my vertex data from 16-byte vertices down to 4-byte vertices (since the data can conceptually be thought of as merely a transformed height map, implying the x and y location from the vertex ID). The plan is to tightly pack the Z coordinate into, say, 30 bits, leaving 2 bits for a colour palette index. That's the idea anyway. My question isn't about the coordinate packing, it's about the colour packing.
The colour palette will be chosen by the C++ code that loads the model. Since it also loads the shader, I'm currently trying to write the colour lookup code as a switch statement, i.e.:
int colourIndex = (compressedVertex & Mask) >> bitOffset;
switch (colourIndex)
{
case 0: return vec4(....);
case 1: return vec4(....);
case 2: return vec4(....);
case 3: return vec4(....);
}
Where the model has more colours than 4, I'm comfortable sacrificing bits of height precision in order to fit in more bits of colour palette index (up to a point, anyway). My measurements show that using a switch statement for a 4-colour palette is no slower than binding a 4-pixel 1D texture and using a sampler to read from it.
I've scaled this up to 32 colours so far, and it seems at least as fast as using a texture.
When is a good line in the sand to stop using a switch and start using a texture as a lookup table? If it helps, the application I'm developing for has an already enforced minimum requirement of OpenGL 3.3. Once the data is on the card it'll never be changed. Can I crank it up to 256 case statements? 1024? 32768? Where's the limit?
(Pre-emptive response: yes, I could continue experimenting and pick a value that works for me on my single, modern card using trial and error and some interpolation; but I'm interested in a more general idea of what best practice is, and whether anyone else has tried something similar and knows it to work out in the wild.)
I avoid branching as much as possible in shaders. My advice is to use a texture to do the lookup.
You ask:
Can I crank it up to 256 case statements? 1024? 32768? Where's the limit?
and you say:
I've scaled this up to 32 colours so far, and it seems at least as fast as using a texture.
OpenGL thrives at looking up textures; it's designed to do that. It's not designed to run a gigantic switch statement per vertex. And as the commenters say, it won't perform well across the board. A 64x64 pixel texture can give you 4096 lookups, and in the long run, in my opinion, it's going to be faster over a larger number of lookups.
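For comparison, a sketch of the texture-based lookup (illustrative: paletteTex is an assumed uniform holding the palette, uploaded once by the loading code; Mask and bitOffset are the constants from the question):

uniform sampler1D paletteTex;  // assumed uniform: one texel per palette entry

vec4 paletteColour(int compressedVertex)
{
    int colourIndex = (compressedVertex & Mask) >> bitOffset;
    // texelFetch (core since GLSL 1.30) uses integer texel indices,
    // so no filtering or coordinate normalisation is involved.
    return texelFetch(paletteTex, colourIndex, 0);
}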
For reasons detailed here, I need to texture a quad using a bitmap (as in, 1 bit per pixel, not an 8-bit pixmap).
Right now I have a bitmap stored in an on-device buffer, and am mounting it like so:
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, BFR.G[(T+1)%2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, W, H, 0, GL_COLOR_INDEX, GL_BITMAP, 0);
The OpenGL spec has this to say about glTexImage2D:
"If type is GL_BITMAP, the data is considered as a string of unsigned bytes (and format must be GL_COLOR_INDEX). Each data byte is treated as eight 1-bit elements..."
Judging by the spec, each bit in my buffer should correspond to a single pixel. However, the following experiments show that, for whatever reason, it doesn't work as advertised:
1) When I build my texture, I write to the buffer in 32-bit chunks. From the wording of the spec, it is reasonable to assume that writing 0x00000001 for each value would result in a texture with 1-px-wide vertical bars with 31-wide spaces between them. However, it appears blank.
2) Next, I write with 0x000000FF. By my apparently flawed understanding of the bitmap mode, I would expect that this should produce 8-wide bars with 24-wide spaces between them. Instead, it produces a white 1-px-wide bar.
3) 0x55555555 = 0b0101...0101 (alternating bits), therefore writing this value ought to create 1-wide vertical stripes with 1-pixel spacing. However, it creates a solid gray color.
4) Using my original 8-bit pixmap in GL_BITMAP mode produces the correct animation.
I have reached the conclusion that, even in GL_BITMAP mode, the texturer is still interpreting 8-bits as 1 element, despite what the spec seems to suggest. The fact that I can generate a gray color (while I was expecting that I was working in two-tone), as well as the fact that my original 8-bit pixmap generates the correct picture, support this conclusion.
Questions:
1) Am I missing some kind of prerequisite call (perhaps for setting a stride length or unpack alignment or something) that will signal to the texturer to treat each byte as 8 elements, as the spec suggests?
2) Or does it simply not work because modern hardware does not support it? (I have read that GL_BITMAP mode was deprecated in 3.3; I am, however, forcing a 3.0 context.)
3) Am I better off unpacking the bitmap into a pixmap using a shader? This is a far more roundabout solution than I was hoping for, but I suppose there is no such thing as a free lunch.
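Regarding question 3, a rough sketch of the shader-unpack route (this is an assumption-laden illustration: it supposes the packed bits are uploaded as an integer texture that is W/8 texels wide, one byte per texel, packed most-significant-bit first):

uniform usampler2D packedBits;  // assumed: (W/8) x H texture of packed bytes

float bitAt(ivec2 pixel)
{
    uint byteVal = texelFetch(packedBits, ivec2(pixel.x >> 3, pixel.y), 0).r;
    uint bit     = (byteVal >> uint(7 - (pixel.x & 7))) & 1u;  // MSB-first packing
    return float(bit);
}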
What I need is simple: each time we run the shader (i.e. for each pixel) I need to compute a random matrix of 1s and 0s with resolution == originalImageResolution. How can I do such a thing?
For now I have created one for Shadertoy. The random matrix resolution is set to 15 by 15 here, because the GPU makes Chrome crash often when I try something like 200 by 200, while I really need the full image resolution size.
#ifdef GL_ES
precision highp float;
#endif

uniform vec2 resolution;
uniform float time;
uniform sampler2D tex0;

float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * (43758.5453 + time));
}

vec3 getOne(){
    vec2 p = gl_FragCoord.xy / resolution.xy;
    vec3 one;
    for(int i = 0; i < 15; i++){
        for(int j = 0; j < 15; j++){
            if(rand(p) <= 0.5)
                one = (one.xyz + texture2D(tex0, vec2(j, i)).xyz) / 2.0;
        }
    }
    return one;
}

void main(void)
{
    gl_FragColor = vec4(getOne(), 1.0);
}
And one for Adobe pixel bender:
<languageVersion: 1.0;>
kernel random
<   namespace : "Random";
    vendor : "Kabumbus";
    version : 3;
    description : "not as random as needed, not as fast as needed"; >
{
    input image4 src;
    output float4 outputColor;

    float rand(float2 co, float2 co2){
        return fract(sin(dot(co.xy, float2(12.9898, 78.233))) * (43758.5453 + (co2.x + co2.y)));
    }

    float4 getOne(){
        float4 one;
        float2 r = outCoord();
        for(int i = 0; i < 200; i++){
            for(int j = 0; j < 200; j++){
                if(rand(r, float2(i, j)) >= 1.0)
                    one = (one + sampleLinear(src, float2(j, i))) / 2.0;
            }
        }
        return one;
    }

    void evaluatePixel()
    {
        float4 oc = getOne();
        outputColor = oc;
    }
}
So my real problem is: my shaders make my GPU driver crash. How can I use GLSL for the same purpose as now, but without crashing and, if possible, faster?
Update:
What I want to create is called a Single-Pixel Camera (google Compressive Imaging or Compressive Sensing); I want to create a GPU-based software implementation.
The idea is simple:
we have an NxM image.
for each pixel in the image we want the GPU to perform the following operations:
generate an NxM matrix of random values - 0s and 1s.
compute the arithmetic mean of all the pixels in the original image whose coordinates correspond to the coordinates of the 1s in our random NxM matrix.
output the result of that arithmetic mean as the pixel color.
What I tried to implement in my shaders was a simulation of that very process.
What is really stupid about trying to do this on the GPU: Compressive Sensing does not require us to compute the full NxM matrix of such arithmetic mean values, it only needs a piece of it (for example 1/3). So I am putting pressure on the GPU that I do not need to. However, testing on more data is not always a bad idea.
Thanks for adding more detail to clarify your question. My comments are getting too long so I'm going to an answer. Moving comments into here to keep them together:
Sorry to be slow, but I am trying to understand the problem and the goal. In your GLSL sample, I don't see a matrix being generated. I see a single vec3 being generated by summing a random selection (varying over time) of cells from a 15 x 15 texture (matrix). And that vec3 is recomputed for each pixel. Then the vec3 is used as the pixel color.
So I'm not clear whether you really want to create a matrix, or just want to compute a value for every pixel. The latter is in some sense a 'matrix', but computing a simple random value for 200 x 200 pixels would not strain your graphics driver. Also you said you wanted to use the matrix. So I don't think that's what you mean.
I'm trying to understand why you want a matrix - to preserve a consistent random basis for all the pixels? If so, you can either precompute a random texture, or use a consistent pseudorandom function like you have in rand() except not use time. You clearly know about that so I guess I still don't understand the goal. Why are you summing a random selection of cells from the texture, for each pixel?
I believe the reason your shader is crashing is that your main() function is exceeding its time limit - either for a single pixel, or for the whole set of pixels. Calling rand() 40,000 times per pixel (in a 200 * 200 nested loop) could certainly explain that!
If you have 200 x 200 pixels and are calling sin() 40k times for each one, that's 1.6 billion calls per frame. Poor GPU!
I'm hopeful that if we understand the goal better, we'll be able to recommend a more efficient way to get the effect you want.
Update.
(Deleted this part, since it was mistaken. Even though many cells in the source matrix may each contribute less than a visually detectable amount of color to the result, the total of the many cells can contribute a visually detectable amount of color.)
New update based on updated question.
OK, (thinking "out loud" here so you can check whether I'm understanding correctly...) Since you need each of the random NxM values only once, there is no actual requirement to store them in a matrix; the values can simply be computed on demand and then thrown away. That's why your example code above does not actually generate a matrix.
This means we cannot get away from generating (NxM)^2 random values per frame; that is, NxM random values per pixel, with NxM pixels. So for N = M = 200, that's 1.6 billion random values per frame.
However, we can still optimize some things.
First, since your random values only need to be one bit each (you only need a boolean answer to decide whether to include each cell from the source texture into the mix), you can probably use a cheaper pseudo random number generator. The one you're using outputs much more random data per call than one bit. For example, you could call the same PRNG function as you're using now, but store the value and extract 32 random bits out of it. Or at least several, depending on how many are random enough. In addition, instead of using a sin() function, if you have extension GL_EXT_gpu_shader4 (for bitwise operators), you could use something like this:
int LFSR_Rand_Gen(in int n)
{
    // <<, ^ and & require GL_EXT_gpu_shader4.
    n = (n << 13) ^ n;
    return (n * (n*n*15731+789221) + 1376312589) & 0x7fffffff;
}
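For example (a sketch; seed and cellIndex are placeholder names), the bits of a single result could then be reused across several cells instead of calling the generator once per cell:

// One generator call yields up to 31 usable bits (GL_EXT_gpu_shader4 needed
// for the bitwise operators). 'seed' and 'cellIndex' are placeholders.
int bits = LFSR_Rand_Gen(seed);
bool includeCell = ((bits >> cellIndex) & 1) == 1;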
Second, you are currently performing one divide operation per included cell (/2.0), which is probably relatively expensive, unless the compiler and GPU are able to optimize it into a bit shift (is that possible for floating point?). This also will not give the arithmetic mean of the input values, as discussed above... it will put much more weight on the later values and very little on the earlier ones. As a solution, keep a count of how many values are being included, and divide by that count once, after the loop is finished.
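As a sketch of that second point, keeping the structure of the question's shader (p, tex0 and the 15x15 loop come from the question; the rand() argument is varied per cell here purely for illustration):

vec3 sum   = vec3(0.0);
int  count = 0;
for (int i = 0; i < 15; i++) {
    for (int j = 0; j < 15; j++) {
        if (rand(p + vec2(i, j)) <= 0.5) {
            sum += texture2D(tex0, vec2(j, i)).xyz;  // addressing kept as in the question
            count++;
        }
    }
}
vec3 one = (count > 0) ? sum / float(count) : vec3(0.0);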
Whether these optimizations will be enough for your GPU driver to handle 200x200 pixels, each sampling a 200x200 matrix, per frame, I don't know. They should definitely enable you to increase your resolution substantially.
Those are the ideas that occur to me off the top of my head. I am far from being a GPU expert though. It would be great if someone more qualified can chime in with suggestions.
P.S. In your comment, you jokingly (?) mentioned the option of precomputing N*M random NxM matrices. Maybe that's not a bad idea?? 40,000 x 40,000 cells is a big texture (roughly 200 MB even at one bit per cell), but if you store 32 bits of random data per texel, that comes down to 1250 x 40,000 texels. Too bad vanilla GLSL doesn't give you the bitwise operators to extract the data, but even if you don't have the GL_EXT_gpu_shader4 extension you can still fake it. (Maybe you would also need a special extension then for non-square textures?)
I have tried so many different strategies to get a usable noise function, and none of them work. So, how do you implement Perlin noise on an ATI graphics card in GLSL?
Here are the methods I have tried:
I have tried putting the permutation and gradient data into a GL_RGBA 1D texture and calling the texture1D function. However, one call to this noise implementation leads to 12 texture calls and kills the framerate.
I have tried uploading the permutation and gradient data into a uniform vec4 array, but the compiler won't let me get an element in the array unless the index is a constant. For example:
int i = 10;
vec4 a = noise_data[i];
will give a compiler error of this:
ERROR: 0:43: Not supported when use temporary array indirect index.
Meaning I can only retrieve the data like this:
vec4 a = noise_data[10];
I also tried programming the array directly into the shader, but I got the same index issue. I hear NVIDIA graphics cards will actually allow this method, but ATI will not.
I tried making a function that returned a specific hard coded data point depending on the input index, but the function, being called 12 times and having 64 if statements, made the linking time unbearable.
ATI does not support the "built in" noise functions for GLSL, and I can't just precompute the noise and import it as a texture, because I am dealing with fractals. This means I need the infinite precision of calculating the noise at run time.
So the overarching question is...
How?
For better distribution of random values I suggest these very good articles:
Pseudo Random Number Generator in GLSL
Lumina noise GLSL tutorial
Have random fun !!!
There is a project on github with GLSL noise functions. It has both the "classic" and newer noise functions in 2,3, and 4D.
IOS does have the noise function implemented.
noise() is well known for not being implemented...
roll your own:
int a;  // LCG multiplier
int m;  // LCG modulus
int c;  // increment, derived from the pixel position
int Xn; // current state

void srand(int x, int y, int width){ // x, y, width in pixels
    c  = x + y*width;
    Xn = c;  // seed the state as well
}

int rand(){
    Xn = (a*Xn + c) % m;
    return Xn;
}
For the a and m values, see Wikipedia (linear congruential generator).
It's not perfect, but often good enough.
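A usage sketch (illustrative; screenWidth is an assumed uniform, and dividing by m maps the result into roughly [0, 1)):

srand(int(gl_FragCoord.x), int(gl_FragCoord.y), screenWidth);  // screenWidth: assumed uniform
float n = float(rand()) / float(m);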
This SimpleX noise stuff might do what you want.
Try adding #version 150 to the top of your shader.