Iterate over a large amount of data in a fragment shader - GLSL

I'm trying to iterate over a large amount of data in my fragment shader in WebGL. I want to pass a lot of data to it and then iterate over it on every pass of the fragment shader, but I'm having some issues doing that. My ideas were the following:
1. Pass the data in uniforms to the fragment shader, but I can't send very much data that way.
2. Use a buffer to send data, as I do with vertices to the vertex shader, and then use a varying to pass it on to the fragment shader. Unfortunately this seems to involve some issues: (a) varyings interpolate between vertices, and I think that will cause issues with my code (although perhaps this is unavoidable); (b) more importantly, I don't know how to iterate over the data I pass to my fragment shader. I'm already using a buffer for my 3D point coordinates, but how does WebGL handle a second buffer and the data coming through it?
I mean to say: in what order is data fetched from each buffer (my first buffer containing 3D coordinates, and the second buffer I'm trying to add)? And lastly, as stated above, if I want to iterate over all the data passed in on every invocation of the fragment shader, how can I do that?
I've already tried using a uniform array and iterating over that in my fragment shader, but I ran into limitations, I believe, since there is a relatively small size limit for uniforms. I'm currently trying the second method mentioned above.
// pseudo code
vertexCode = `
  attribute vec4 coords3d;  // renamed: GLSL identifiers can't start with a digit
  varying vec4 v_coords3d;
  ??? ??? my_special_data;  // how do I declare and feed this?
  void main() { ... }
`;
fragCode = `
  varying vec4 v_coords3d;
  void main() {
    ...
    // perform a math operation on v_coords3d against every value in
    // my_special_data and store the result in my_results
    if (my_results ...) {
      gl_FragColor += ...;
    }
  }
`;

Textures in WebGL are random-access 2D arrays of data, so you can use them to read lots of data in a fragment shader.
Example:
const width = 256;
const height = 256;
const vs = `
attribute vec4 position;
void main() {
  gl_Position = position;
}
`;
const fs = `
precision highp float;
uniform sampler2D tex;
const int width = ${width};
const int height = ${height};
void main() {
  vec4 sums = vec4(0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      vec2 xy = (vec2(x, y) + 0.5) / vec2(width, height);
      sums += texture2D(tex, xy);
    }
  }
  gl_FragColor = sums;
}
`;
function main() {
  const gl = document.createElement('canvas').getContext('webgl');
  // check if we can make floating point textures
  const ext1 = gl.getExtension('OES_texture_float');
  if (!ext1) {
    return alert('need OES_texture_float');
  }
  // check if we can render to floating point textures
  const ext2 = gl.getExtension('WEBGL_color_buffer_float');
  if (!ext2) {
    return alert('need WEBGL_color_buffer_float');
  }
  // make a 1x1 pixel floating point RGBA texture and attach it to a framebuffer
  const framebufferInfo = twgl.createFramebufferInfo(gl, [
    { type: gl.FLOAT, },
  ], 1, 1);
  // make random 256x256 texture
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; ++i) {
    data[i] = Math.random() * 256;
  }
  const tex = twgl.createTexture(gl, {
    src: data,
    minMag: gl.NEAREST,
    wrap: gl.CLAMP_TO_EDGE,
  });
  // compile shaders, link, look up locations
  const programInfo = twgl.createProgramInfo(gl, [vs, fs]);
  // create a buffer and put a 2 unit
  // clip space quad in it using 2 triangles
  const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
    position: {
      numComponents: 2,
      data: [
        -1, -1,
         1, -1,
        -1,  1,
        -1,  1,
         1, -1,
         1,  1,
      ],
    },
  });
  // render to the 1 pixel texture
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebufferInfo.framebuffer);
  // set the viewport for 1x1 pixels
  gl.viewport(0, 0, 1, 1);
  gl.useProgram(programInfo.program);
  // calls gl.bindBuffer, gl.enableVertexAttribArray, gl.vertexAttribPointer
  twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);
  // calls gl.activeTexture, gl.bindTexture, gl.uniformXXX
  twgl.setUniforms(programInfo, {
    tex,
  });
  const offset = 0;
  const count = 6;
  gl.drawArrays(gl.TRIANGLES, offset, count);
  // read the result
  const pixels = new Float32Array(4);
  gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.FLOAT, pixels);
  console.log('webgl sums:', pixels);
  // compute the same sums in JavaScript to verify
  const sums = new Float32Array(4);
  for (let i = 0; i < data.length; i += 4) {
    for (let j = 0; j < 4; ++j) {
      sums[j] += data[i + j] / 255;
    }
  }
  console.log('js sums:', sums);
}
main();
<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>

Related

How to write an OpenGL fragment shader in two stages?

I am writing a program, which draws the Mandelbrot set. For every pixel, I run a function and it returns an activation number between 0 and 1. Currently, this is done in a fragment shader and activation is my color.
But imagine you zoom in on the fractal and suddenly all the activations you can see on the screen are between .87 and .95. You can't see the difference very well.
I am looking for a way to first calculate all the activations and store them in an array then based on that array choose the colors. Both of those need to run on the GPU for performance reasons.
So you need to find minimum and maximum intensity of a picture you've rendered. This cannot be done in a single draw, since these values are nonlocal. A possible way to do this is to recursively apply a pipeline that downscales an image in half, computing the minimum and maximum values of 2x2 squares and storing them e.g. in an RG texture (kind of mipmap generation, with min/max instead of averaging colours). In the end you have a 1x1 texture which contains the minimal and maximal values of your image in its only pixel. You can sample this texture in the final render that maps activation values to colours.
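As a sketch of one downscale step (the original answer gives no code, so names like u_prev, u_prevTexelSize and v_uv are illustrative): a fragment shader along these lines, rendered into a half-size RG target, reads the 2x2 block of the previous level and writes its minimum to R and its maximum to G. On the first pass both channels would hold the raw activation value.
precision highp float;
uniform sampler2D u_prev;      // previous reduction level (R = min, G = max)
uniform vec2 u_prevTexelSize;  // 1.0 / resolution of u_prev
varying vec2 v_uv;             // interpolated over the smaller output target
void main() {
  // the four texels of the previous level that this output pixel covers
  vec4 a = texture2D(u_prev, v_uv + u_prevTexelSize * vec2(-0.5, -0.5));
  vec4 b = texture2D(u_prev, v_uv + u_prevTexelSize * vec2( 0.5, -0.5));
  vec4 c = texture2D(u_prev, v_uv + u_prevTexelSize * vec2(-0.5,  0.5));
  vec4 d = texture2D(u_prev, v_uv + u_prevTexelSize * vec2( 0.5,  0.5));
  float lo = min(min(a.r, b.r), min(c.r, d.r));
  float hi = max(max(a.g, b.g), max(c.g, d.g));
  gl_FragColor = vec4(lo, hi, 0.0, 1.0);  // carry min/max to the next level
}
Repeating this until the target is 1x1 leaves the global minimum and maximum in that single pixel.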
I solved my issue by creating a new GL program and attaching a compute shader to it.
unsigned int vs = CompileShader(vertShaderStr, GL_VERTEX_SHADER);
unsigned int fs = CompileShader(fragShaderStr, GL_FRAGMENT_SHADER);
unsigned int cs = CompileShader(compShaderStr, GL_COMPUTE_SHADER);
glAttachShader(mainProgram, vs);
glAttachShader(mainProgram, fs);
glAttachShader(computeProgram, cs);
glLinkProgram(computeProgram);
glValidateProgram(computeProgram);
glLinkProgram(mainProgram);
glValidateProgram(mainProgram);
glUseProgram(computeProgram);
Then, in the render loop, I switch programs and run the compute shader.
glUseProgram(computeProgram);
glDispatchCompute(resolutionX, resolutionY, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(mainProgram);
/* Drawing the whole screen using the shader */
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
/* Poll for and process events */
glfwPollEvents();
updateBuffer();
Update();
/* Swap front and back buffers */
glfwSwapBuffers(window);
I pass the data from compute shader to fragment shader via shader storage buffer.
void setupBuffer() {
  glGenBuffers(1, &ssbo);
  glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
  // sizeof(data) only works for statically sized C/C++ arrays
  glNamedBufferStorage(ssbo, sizeof(float) * (resolutionX * resolutionY + SH_EXTRA_FLOATS),
                       &data, GL_MAP_WRITE_BIT | GL_MAP_READ_BIT | GL_DYNAMIC_STORAGE_BIT);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
}
void updateBuffer() {
  float d[] = { data.min, data.max };
  glNamedBufferSubData(ssbo, 0, 2 * sizeof(float), &d);
}
In the compute shader, I can access the buffer like this (shown two equivalent ways: as an unnamed block, whose members are referenced directly, or as a named block referenced through outBuffer):
layout(std430, binding = 1) buffer bufferIn
{
  float min;
  float max;
  float data[];
};
layout(std430, binding = 1) buffer destBuffer
{
  float min;
  float max;
  float data[];
} outBuffer;
// helpers such as adjustCoords(), rotatedPosition(), pow2() and lengthSQ(),
// and the uniforms/constants screenResolution, maxIter, treashold and log2,
// are defined elsewhere in the full source
void main() {
  int index = int(gl_WorkGroupID.x + screenResolution.x * gl_WorkGroupID.y);
  dvec2 coords = adjustCoords();
  dvec4 position = rotatedPosition(coords);
  for (int i = 0; i < maxIter; i++) {
    position = pow2(position);
    double length = lengthSQ(position);
    if (length > treashold) {
      // smooth (fractional) iteration count
      float log_zn = log(float(length)) / 2.0;
      float nu = log(log_zn / log(2.0)) / log2;
      float iterAdj = 1.0 - nu + float(i);
      float scale = iterAdj / float(maxIter);
      if (scale < 0)
        data[index] = -2;
      else
        data[index] = scale;
      if (scale > max) max = scale;
      if (scale < min && scale > 0) min = scale;
      return;
    }
  }
  data[index] = -1; // never escaped
}
And finally, in the fragment shader, I can read the buffer like this (index, color and notEscapedColor are declared/computed elsewhere in the full source; index is derived from the fragment's pixel coordinates):
layout(std430, binding = 1) buffer bufferIn
{
  float min;
  float max;
  float data[];
};
...
if (data[index] == -1) {
  color = notEscapedColor;
  return;
}
// normalize against the min/max found by the compute shader
float value = (data[index] - min) / (max - min);
if (value < 0) value = 0;
if (value > 1) value = 1;
Here is the code in its entirety.

OpenGL 2D Batch Rendering: Textures glitching together when having multiple active textures

This is what happens when I draw switching from the black texture to the lime green one in a simple for loop. It seems to have bits from the previously drawn texture.
Here's a simplified version of how my renderer works
Init(): Create my VAO and attrib pointers and generate the element buffer and indices
Begin(): Bind my vertex buffer and map the buffer pointer
Draw(): Submit a renderable to draw, which puts 4 vertices in the vertex buffer; each gets a position, color, texCoords, and a texture slot
End(): I delete the buffer pointer, bind my VAO, IBO, and textures to their active texture slots, and draw the elements.
I do this every frame (except Init). What I don't understand is that if I draw per texture, with only one texture active, this doesn't happen. It happens when I have multiple textures active and bound.
Here's my renderer
void Renderer2D::Init()
{
  m_Textures.reserve(32);
  m_VertexBuffer.Create(nullptr, VERTEX_BUFFER_SIZE);
  m_Layout.PushFloat(2); // Position
  m_Layout.PushUChar(4); // Color
  m_Layout.PushFloat(2); // TexCoords
  m_Layout.PushFloat(1); // Texture ID
  // VA is bound and VB is unbound
  m_VertexArray.AddBuffer(m_VertexBuffer, m_Layout);
  unsigned int* indices = new unsigned int[INDEX_COUNT];
  int offset = 0;
  for (int i = 0; i < INDEX_COUNT; i += 6)
  {
    indices[i + 0] = offset + 0;
    indices[i + 1] = offset + 1;
    indices[i + 2] = offset + 2;
    indices[i + 3] = offset + 2;
    indices[i + 4] = offset + 3;
    indices[i + 5] = offset + 0;
    offset += 4;
  }
  m_IndexBuffer.Create(indices, INDEX_COUNT);
  m_VertexArray.Unbind();
}
void Renderer2D::Begin()
{
  m_VertexBuffer.Bind();
  m_Buffer = (VertexData*)m_VertexBuffer.GetBufferPointer();
}
void Renderer2D::Draw(Renderable2D& renderable)
{
  const glm::vec2& position = renderable.GetPosition();
  const glm::vec2& size = renderable.GetSize();
  const Color& color = renderable.GetColor();
  const glm::vec4& texCoords = renderable.GetTextureRect();
  const float tid = AddTexture(renderable.GetTexture());
  DT_CORE_ASSERT(tid != 0, "TID IS EQUAL TO ZERO");
  m_Buffer->position = glm::vec2(position.x, position.y);
  m_Buffer->color = color;
  m_Buffer->texCoord = glm::vec2(texCoords.x, texCoords.y);
  m_Buffer->tid = tid;
  m_Buffer++;
  m_Buffer->position = glm::vec2(position.x + size.x, position.y);
  m_Buffer->color = color;
  m_Buffer->texCoord = glm::vec2(texCoords.z, texCoords.y);
  m_Buffer->tid = tid;
  m_Buffer++;
  m_Buffer->position = glm::vec2(position.x + size.x, position.y + size.y);
  m_Buffer->color = color;
  m_Buffer->texCoord = glm::vec2(texCoords.z, texCoords.w);
  m_Buffer->tid = tid;
  m_Buffer++;
  m_Buffer->position = glm::vec2(position.x, position.y + size.y);
  m_Buffer->color = color;
  m_Buffer->texCoord = glm::vec2(texCoords.x, texCoords.w);
  m_Buffer->tid = tid;
  m_Buffer++;
  m_IndexCount += 6;
}
void Renderer2D::End()
{
  Flush();
}
const float Renderer2D::AddTexture(const Texture2D* texture)
{
  for (int i = 0; i < m_Textures.size(); i++) {
    if (texture == m_Textures[i]) // Compares memory addresses
      return i + 1; // Returns the texture id plus one since 0 is the null texture id
  }
  // If the texture count is already at or greater than max textures
  if (m_Textures.size() >= MAX_TEXTURES)
  {
    End();
    Begin();
  }
  m_Textures.push_back((Texture2D*)texture);
  return m_Textures.size();
}
void Renderer2D::Flush()
{
  m_VertexBuffer.DeleteBufferPointer();
  m_VertexArray.Bind();
  m_IndexBuffer.Bind();
  for (int i = 0; i < m_Textures.size(); i++) {
    glActiveTexture(GL_TEXTURE0 + i);
    m_Textures[i]->Bind();
  }
  glDrawElements(GL_TRIANGLES, m_IndexCount, GL_UNSIGNED_INT, NULL);
  m_IndexBuffer.Unbind();
  m_VertexArray.Unbind();
  m_IndexCount = 0;
  m_Textures.clear();
}
Here's my fragment shader
#version 330 core
out vec4 FragColor;
in vec4 ourColor;
in vec2 ourTexCoord;
in float ourTid;
uniform sampler2D textures[32];
void main()
{
  vec4 texColor = ourColor;
  if (ourTid > 0.0)
  {
    int tid = int(ourTid - 0.5);
    texColor = ourColor * texture(textures[tid], ourTexCoord);
  }
  FragColor = texColor;
}
I appreciate any help, let me know if you need to see more code
I don't know if you still need this, but for the record: you have a logic problem in your fragment shader code.
Say ourTid is greater than 0; take 1.0 for example. You subtract 0.5, and casting int(0.5) gives 0 for sure. Now say you need texture number 2 and do the same process: 2 - 0.5 = 1.5, cast to int = 1.
So you will get the previous texture every time.
The solution is easy: you should add 0.5 instead of subtracting it, so that the truncation lands on the intended id, interpolation error is absorbed, and you get the correct texture.
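A minimal sketch of that change against the question's shader (only the rounding differs):
if (ourTid > 0.0)
{
  // round to the nearest id instead of truncating below it, so an
  // interpolated ourTid of e.g. 1.98 or 2.02 both resolve to id 2
  int tid = int(ourTid + 0.5);
  texColor = ourColor * texture(textures[tid], ourTexCoord);
}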

How to establish glBindBufferRange() offset with Shader Storage Buffer and std430?

I want to switch between SSBO data to draw things with different setups. To make that happen, I need to use glBindBufferRange() with a suitable offset.
I've read that the offset needs to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT for a UBO, but things may be different for an SSBO, since it uses std430 instead of std140.
I tried to do this the easiest way:
struct Color
{
  float r, g, b, a;
};
struct V2
{
  float x, y;
};
struct Uniform
{
  Color c1;
  Color c2;
  V2 v2;
  float r;
  float f;
  int t;
};
GLuint ssbo = 0;
std::vector<Uniform> uniform;
int main()
{
  // create window, context etc.
  glCreateBuffers(1, &ssbo);
  glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
  Uniform u;
  u.c1 = { 255, 0, 255, 255 };
  u.c2 = { 255, 0, 255, 255 };
  u.v2 = { 0.0f, 0.0f };
  u.r = 0.0f;
  u.f = 100.0f;
  u.t = 0;
  uniform.push_back(u);
  u.c1 = { 255, 255, 0, 255 };
  u.c2 = { 255, 255, 0, 255 };
  u.v2 = { 0.0f, 0.0f };
  u.r = 100.0f;
  u.f = 100.0f;
  u.t = 1;
  uniform.push_back(u);
  u.c1 = { 255, 0, 0, 255 };
  u.c2 = { 255, 0, 0, 255 };
  u.v2 = { 0.0f, 0.0f };
  u.r = 100.0f;
  u.f = 0.0f;
  u.t = 0;
  uniform.push_back(u);
  glNamedBufferData(ssbo, sizeof(Uniform) * uniform.size(), uniform.data(), GL_STREAM_DRAW);
  for (int i = 0; i < uniform.size(); ++i) {
    glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, ssbo, sizeof(Uniform) * i, sizeof(Uniform));
    glDrawArrays(...);
  }
  // swap buffers etc.
  return 0;
}
#version 460 core
layout(location = 0) out vec4 f_color;
layout(std430, binding = 1) buffer Unif
{
  vec4 c1;
  vec4 c2;
  vec2 v2;
  float r;
  float f;
  int t;
};
void main()
{
  f_color = vec4(t, 0, 0, 1);
}
There are of course a VAO, a VBO, a vertex struct and so on, but they don't affect the SSBO.
I get a GL_INVALID_VALUE error from glBindBufferRange(), though. That must come from the offset, because my next attempt transfers the data, just in the wrong order.
My next attempt was to use GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT and a formula I found on the Internet:
int align = 4;
glGetIntegerv(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT, &align);
int ssboSize = sizeof(Uniform) + align - sizeof(Uniform) % align;
So, just changing glNamedBufferData and glBindBufferRange, it looks like this:
glNamedBufferData(ssbo, ssboSize * uniform.size(), uniform.data(), GL_STREAM_DRAW);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, ssbo, ssboSize * i, sizeof(Uniform));
And that way it almost worked. As you can see, the values of t are
0,
1,
0,
so OpenGL should draw 3 shapes with the colors
vec4(0, 0, 0, 1);
vec4(1, 0, 0, 1);
vec4(0, 0, 0, 1);
but it draws them in the wrong order:
vec4(1, 0, 0, 1);
vec4(0, 0, 0, 1);
vec4(0, 0, 0, 1);
How can I make it transfer the data the proper way?
The OpenGL spec (version 4.6) states the following in section "6.1.1 Binding Buffer Objects to Indexed Target Points" regarding the error conditions for glBindBufferRange:
An INVALID_VALUE error is generated by BindBufferRange if buffer is
non-zero and offset or size do not respectively satisfy the constraints described for those parameters for the specified target, as described in section 6.7.1.
Section 6.7.1 "Indexed Buffer Object Limits and Binding Queries" states for SSBOs:
starting offset: SHADER_STORAGE_BUFFER_START
offset restriction: multiple of value of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT
binding size: SHADER_STORAGE_BUFFER_SIZE
According to Table 23.64 "Implementation Dependent Aggregate Shader Limits":
256 [with the following footnote]: The value of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT is the maximum allowed, not the minimum.
So if your offset is not a multiple of 256 (which it isn't), this code is simply not guaranteed to work at all. You can query the actual restriction of the implementation you are running on and adjust your buffer contents accordingly, but you must be prepared for it to be as high as 256 bytes. For example, with sizeof(Uniform) being 52 bytes here, an implementation reporting an alignment of 256 would need each element padded out to 256 bytes. Note also that once you space your glBindBufferRange offsets by the padded size, the CPU-side array must be laid out with that same stride; uploading the tightly packed std::vector while binding at padded offsets is what produces the wrong-order results you saw.
I ended up using struct alignas(128) Uniform. I guess my next goal is to stop using a hardcoded alignment.

Shaders: How to draw 3D point verts without generating geometry?

I have a 3D WebGL scene. I am using Regl (http://regl.party/), which is WebGL, so I am essentially writing straight GLSL.
This is a game project. I have an array of 3D positions [[x, y, z], ...] which are bullets, or projectiles. I want to draw these bullets as a simple cube, sphere, or particle. No requirement on the appearance.
How can I write shaders and a draw call for this without having to create a duplicate set of geometry for every bullet?
I'd prefer an answer with a vertex and fragment shader example that demonstrates the expected data input and can be reverse-engineered to handle the CPU binding layer.
You create a regl command, which encapsulates a bunch of data. You can then call it with an object.
Each uniform can take an optional function to supply its value. That function is passed a regl context as the first argument and the object you passed as the second argument, so you can call the command multiple times with different objects to draw the same thing (same vertices, same shader) somewhere else.
var regl = createREGL();
const objects = [];
const numObjects = 100;
for (let i = 0; i < numObjects; ++i) {
  objects.push({
    x: rand(-1, 1),
    y: rand(-1, 1),
    speed: rand(.5, 1.5),
    direction: rand(0, Math.PI * 2),
    color: [rand(0, 1), rand(0, 1), rand(0, 1), 1],
  });
}
function rand(min, max) {
  return Math.random() * (max - min) + min;
}
const starPositions = [[0, 0, 0]];
const starElements = [];
const numPoints = 5;
for (let i = 0; i < numPoints; ++i) {
  for (let j = 0; j < 2; ++j) {
    const a = (i * 2 + j) / (numPoints * 2) * Math.PI * 2;
    const r = 0.5 + j * 0.5;
    starPositions.push([
      Math.sin(a) * r,
      Math.cos(a) * r,
      0,
    ]);
  }
  starElements.push([
    0, 1 + i * 2, 1 + i * 2 + 1,
  ]);
}
const drawStar = regl({
  frag: `
  precision mediump float;
  uniform vec4 color;
  void main () {
    gl_FragColor = color;
  }`,
  vert: `
  precision mediump float;
  attribute vec3 position;
  uniform mat4 mat;
  void main() {
    gl_Position = mat * vec4(position, 1);
  }`,
  attributes: {
    position: starPositions,
  },
  elements: starElements,
  uniforms: {
    mat: (ctx, props) => {
      const {viewportWidth, viewportHeight} = ctx;
      const {x, y} = props;
      const aspect = viewportWidth / viewportHeight;
      return [.1 / aspect, 0, 0, 0,
              0, .1, 0, 0,
              0, 0, 0, 0,
              x, y, 0, 1];
    },
    color: (ctx, props) => props.color,
  },
});
regl.frame(function () {
  regl.clear({
    color: [0, 0, 0, 1],
  });
  objects.forEach((o) => {
    o.direction += rand(-0.1, 0.1);
    o.x += Math.cos(o.direction) * o.speed * 0.01;
    o.y += Math.sin(o.direction) * o.speed * 0.01;
    o.x = (o.x + 3) % 2 - 1;
    o.y = (o.y + 3) % 2 - 1;
    drawStar(o);
  });
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/regl/1.3.11/regl.min.js"></script>
You can draw all of the bullets as point sprites, in which case you just need to provide the position and size of each bullet and draw them as GL_POINTS. Each “point” is rasterized to a square based on the output of your vertex shader (which runs once per point). Your fragment shader is called for each fragment in that square, and can color the fragment however it wants—with a flat color, by sampling a texture, or however else you want.
Or you can provide a single model for all bullets, a separate transform for each bullet, and draw them as instanced GL_TRIANGLES or GL_TRIANGLE_STRIP or whatever. Read about instancing on the OpenGL wiki.
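As a sketch of what the per-instance data flow looks like on the shader side (GLSL ES 3.00 syntax; the names instanceOffset, instanceColor and viewProjection are illustrative, and the per-instance attributes would be set up on the CPU side with a vertex attribute divisor of 1):
#version 300 es
// per-vertex: one shared copy of the bullet model
in vec3 position;
// per-instance (divisor = 1): one entry per bullet
in vec3 instanceOffset;
in vec4 instanceColor;
uniform mat4 viewProjection;
out vec4 v_color;
void main() {
  // every instance reuses the same model vertices, shifted per bullet
  gl_Position = viewProjection * vec4(position + instanceOffset, 1.0);
  v_color = instanceColor;
}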
Not a WebGL coder, so read with prejudice...
1. Encode the vertexes in a texture. Beware of clamping: use a texture format that does not clamp to <0.0, +1.0>, like GL_LUMINANCE32F_ARB, or use vertexes in that range only. To check for clamping, see: GLSL debug prints
2. Render a single rectangle covering the whole screen, using the texture from #1 as input. This will ensure that the fragment shader is called exactly once for each pixel of the screen/view.
3. Inside the fragment shader, read the texture and check the distance of the fragment to your vertexes. Based on it, render your stuff or discard() the fragment. Spheres are easy, but boxes and other shapes might be complicated to render based on the distance to a vertex, especially if they can be arbitrarily oriented (which needs additional info in the input texture). To ease this up, you can prerender them into some texture and use the distance as a texture coordinate. This answer of mine uses this technique: raytrace through 3D mesh
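A minimal sketch of step #3, under the assumption that the bullet positions have already been projected to normalized screen space and packed into a small float texture (u_points, u_numPoints, u_radius and v_uv are illustrative names, not from the answer):
precision highp float;
uniform sampler2D u_points;  // Nx1 float texture, xy = projected bullet position
uniform float u_numPoints;   // how many bullets are actually stored
uniform float u_radius;      // bullet radius in the same normalized units
varying vec2 v_uv;           // this fragment's normalized screen position
const float MAX_POINTS = 256.0; // GLSL ES 1.0 loops need a constant bound
void main() {
  for (float i = 0.0; i < MAX_POINTS; ++i) {
    if (i >= u_numPoints) break;
    // fetch bullet i from the data texture
    vec2 p = texture2D(u_points, vec2((i + 0.5) / MAX_POINTS, 0.5)).xy;
    if (distance(v_uv, p) < u_radius) {
      gl_FragColor = vec4(1.0, 0.8, 0.2, 1.0); // covered: shade the bullet
      return;
    }
  }
  discard; // no bullet covers this fragment
}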
You can sometimes get away with using GL_POINTS with a large gl_PointSize and a customized fragment shader.
An example is shown here, using the distance to the point center for the fragment alpha. (You could just as well sample a texture.)
The support for large point sizes might be limited, though, so check that before deciding on this route.
var canvas = document.getElementById('cvs');
gl = canvas.getContext('webgl');
var vertices = [
  -0.5,  0.75, 0.0,
   0.0,  0.5,  0.0,
  -0.75, 0.25, 0.0,
];
var vertex_buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vertex_buffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(vertices), gl.STATIC_DRAW);
gl.bindBuffer(gl.ARRAY_BUFFER, null);
var vertCode =
  `attribute vec3 coord;
  void main(void) {
    gl_Position = vec4(coord, 1.0);
    gl_PointSize = 50.0;
  }`;
var vertShader = gl.createShader(gl.VERTEX_SHADER);
gl.shaderSource(vertShader, vertCode);
gl.compileShader(vertShader);
var fragCode =
  `void main(void) {
    mediump float ds = distance(gl_PointCoord.xy, vec2(0.5, 0.5)) * 2.0;
    mediump vec4 fg_color = vec4(0.0, 0.0, 0.0, 1.0 - ds);
    gl_FragColor = fg_color;
  }`;
var fragShader = gl.createShader(gl.FRAGMENT_SHADER);
gl.shaderSource(fragShader, fragCode);
gl.compileShader(fragShader);
var shaderProgram = gl.createProgram();
gl.attachShader(shaderProgram, vertShader);
gl.attachShader(shaderProgram, fragShader);
gl.linkProgram(shaderProgram);
gl.useProgram(shaderProgram);
gl.bindBuffer(gl.ARRAY_BUFFER, vertex_buffer);
var coord = gl.getAttribLocation(shaderProgram, "coord");
gl.vertexAttribPointer(coord, 3, gl.FLOAT, false, 0, 0);
gl.enableVertexAttribArray(coord);
gl.viewport(0, 0, canvas.width, canvas.height);
gl.drawArrays(gl.POINTS, 0, 3);
<!doctype html>
<html>
<body>
<canvas width="400" height="400" id="cvs"></canvas>
</body>
</html>

OpenGL compute shader results in APPCRASH

Can anybody help check why this OpenGL compute shader results in an APPCRASH on my Nvidia GT 440, Windows 7 64-bit? I want to learn about the memory order defined in GLSL, so I wrote this simple shader. I don't think there is any problem with the shader code.
The sizes of image0, image1 and output_image are all 1024x768. output_image is only used to check whether the result is correct.
The work group size is 16x16, and the number of work groups launched in the X/Y/Z dimensions is 1024/16, 768/16 and 1, i.e. glDispatchCompute(64, 48, 1). If the x index of the invocation ID is even, the invocation sets pixel (x, y) of image0, then of image1. If the x index is odd, it reads image1 and waits until the corresponding pixel (x-1, y) of image1 is set, then reads image0.
#version 450 core
layout (local_size_x = 16, local_size_y = 16) in;
layout (binding = 0, rgba8ui) uniform uimage2D output_image;
layout (binding = 1, r32ui) uniform volatile coherent uimage2D image0;
layout (binding = 2, r32ui) uniform volatile coherent uimage2D image1;
void main(void)
{
  // thread 0 of each work group clears a portion of the output image
  if (gl_LocalInvocationIndex == 0) {
    uvec2 start = gl_WorkGroupSize.xy * gl_WorkGroupID.xy;
    for (uint j = start.y; j < start.y + gl_WorkGroupSize.y; ++j)
      for (uint i = start.x; i < start.x + gl_WorkGroupSize.x; ++i)
        imageStore(output_image, ivec2(i, j), uvec4(0));
  }
  barrier();
  if (gl_GlobalInvocationID.x % 2 == 0) {
    // store to image0, then image1
    imageStore(image0, ivec2(gl_GlobalInvocationID.xy), uvec4(1));
    imageStore(image1, ivec2(gl_GlobalInvocationID.xy), uvec4(1));
  }
  else {
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy) - ivec2(1, 0);
    // wait for image1 to be set
    uint flag;
    do {
      flag = imageLoad(image1, coord).x;
    }
    while (flag != 1);
    // check if image0 is set
    uint color = imageLoad(image0, coord).x * 255;
    // write output image
    imageStore(output_image, coord, uvec4(flag));
    imageStore(output_image, ivec2(gl_GlobalInvocationID.xy), uvec4(color));
  }
}