Mipmapping with compute shader - c++

I have 3D textures of colors, normals and other data of my voxelized scene, and because some of this data can't simply be averaged I need to calculate the mip levels on my own. The 3D texture size is (128+64) x 128 x 128; the additional 64 x 128 x 128 region holds the mip levels.
When I take the first mip level, which is at (0, 0, 0) with a size of 128 x 128 x 128, and just copy voxels to the second level at (128, 0, 0), the data appears there. But as soon as I copy the second level at (128, 0, 0) to the third at (128, 0, 64), the data doesn't appear at the third level.
shader code:
#version 450 core
layout (local_size_x = 1,
local_size_y = 1,
local_size_z = 1) in;
layout (location = 0) uniform unsigned int resolution;
layout (binding = 0, rgba32f) uniform image3D voxel_texture;
void main()
{
ivec3 index = ivec3(gl_WorkGroupID);
ivec3 spread_index = index * 2;
vec4 voxel = imageLoad(voxel_texture, spread_index);
imageStore(voxel_texture, index + ivec3(resolution, 0, 0), voxel);
// This isn't working
voxel = imageLoad(voxel_texture, spread_index +
ivec3(resolution, 0, 0));
imageStore(voxel_texture, index + ivec3(resolution, 0, 64), voxel);
}
The shader program is dispatched with
glUniform1ui(0, OCTREE_RES);
glBindImageTexture(0, voxel_textures[0], 0, GL_TRUE, 0, GL_READ_WRITE,
GL_RGBA32F);
glDispatchCompute(64, 64, 64);
I don't know if I missed something basic; this is my first compute shader. I also tried using memory barriers, but it didn't change anything.

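You can't expect your second imageLoad to read texels that were just written by the first imageStore like that: the texels it reads are written by other invocations, which live in other work groups and may not have run yet.
There is no way to synchronize execution across work groups within a single dispatch; barrier() only covers the invocations of one 'local' work group.
You'll need to do one of the following:
Use multiple dispatches of your kernel, one per mip level, with a glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT) between them.
Rewrite your shader logic so that it always fetches from the 'original' zone.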
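The first option can be illustrated on the CPU. The following is a minimal C++ sketch, not the asker's shader: it assumes the packed layout from the question (level 0 at the origin, level 1 at x = res, level 2 at x = res, z = res / 2) and uses a plain copy in place of a real filter. The names Volume, Offset, copyLevel and buildMips are hypothetical. Each call to copyLevel stands for one dispatch, so every write of a pass is finished before the next pass reads it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the packed 3D texture from the question.
struct Volume {
    int sx, sy, sz;
    std::vector<float> data;
    Volume(int x, int y, int z)
        : sx(x), sy(y), sz(z), data(static_cast<std::size_t>(x) * y * z, 0.0f) {}
    float& at(int x, int y, int z) {
        assert(x >= 0 && x < sx && y >= 0 && y < sy && z >= 0 && z < sz);
        return data[(static_cast<std::size_t>(z) * sy + y) * sx + x];
    }
};

struct Offset { int x, y, z; };

// One pass (one dispatch): reads a source level with edge length 2 * dstSize
// and writes the next level with edge length dstSize.
void copyLevel(Volume& v, Offset src, Offset dst, int dstSize) {
    for (int z = 0; z < dstSize; ++z)
        for (int y = 0; y < dstSize; ++y)
            for (int x = 0; x < dstSize; ++x)
                v.at(dst.x + x, dst.y + y, dst.z + z) =
                    v.at(src.x + 2 * x, src.y + 2 * y, src.z + 2 * z);
}

// Builds levels 1 and 2 in two separate passes.
void buildMips(Volume& v, int res) {
    copyLevel(v, {0, 0, 0}, {res, 0, 0}, res / 2);          // level 0 -> level 1
    copyLevel(v, {res, 0, 0}, {res, 0, res / 2}, res / 4);  // level 1 -> level 2
}
```

With res = 128 the second call writes to (128, 0, 64), the location used in the question. On the GPU, each pass would be its own glDispatchCompute with a smaller group count.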

Related

How to convert RGBA to NV12 using OpenGL?

I need to convert RGBA to NV12 using an OpenGL shader, to use as encoder input.
I already render with two different fragment shaders; both textures come from one camera. I use v4l2 to get the camera image (YUV), then convert YUV to RGB so that OpenGL can render it. As the next step, I need to convert RGB to NV12, because the encoder only accepts the NV12 format.
Use a compute shader to convert RGB to planar YUV, then downsample the UV plane by a factor of two.
Here's the compute shader:
#version 450 core
layout(local_size_x = 32, local_size_y = 32) in;
layout(binding = 0) uniform sampler2D src;
layout(binding = 0) uniform writeonly image2D dst_y;
layout(binding = 1) uniform writeonly image2D dst_uv;
void main() {
ivec2 id = ivec2(gl_GlobalInvocationID.xy);
vec3 yuv = rgb_to_yuv(texelFetch(src, id, 0).rgb); // texelFetch on a sampler2D needs an explicit LOD
imageStore(dst_y, id, vec4(yuv.x,0,0,0));
imageStore(dst_uv, id, vec4(yuv.yz,0,0));
}
There are many different YUV conventions, and I don't know which one your encoder expects, so replace rgb_to_yuv above with the inverse of your YUV -> RGB conversion.
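As an illustration of what rgb_to_yuv could look like, here is a CPU-side C++ sketch. It assumes the full-range BT.601 coefficients (the ones used by JFIF), which may or may not be what your encoder expects. The function mirrors the name used in the shader, but the body is only one possible convention and would still have to be ported to GLSL.

```cpp
#include <array>
#include <cassert>

// Assumed convention: BT.601 full range; inputs are in [0, 1].
// Returns {Y, U, V} with Y in [0, 1] and U, V centred on 0 in [-0.5, 0.5].
std::array<float, 3> rgb_to_yuv(float r, float g, float b) {
    assert(r >= 0.0f && r <= 1.0f);
    assert(g >= 0.0f && g <= 1.0f);
    assert(b >= 0.0f && b <= 1.0f);
    const float y = 0.299f * r + 0.587f * g + 0.114f * b;
    const float u = -0.168736f * r - 0.331264f * g + 0.5f * b;
    const float v = 0.5f * r - 0.418688f * g - 0.081312f * b;
    return {y, u, v};
}
```

If the UV plane uses an unsigned normalized format such as GL_RG8, U and V would need an offset of 0.5 before being stored, since negative values are clamped to 0.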
Then proceed as follows:
GLuint in_rgb = ...; // rgb(a) input texture
int width = ..., height = ...; // the size of in_rgb
GLuint tex[2]; // output textures (Y plane, UV plane)
glCreateTextures(GL_TEXTURE_2D, tex, 2);
glTextureStorage2D(tex[0], 1, GL_R8, width, height); // Y plane
// UV plane -- TWO mipmap levels
glTextureStorage2D(tex[1], 2, GL_RG8, width, height);
// use this instead if you need signed UV planes:
//glTextureStorage2D(tex[1], 2, GL_RG8_SNORM, width, height);
glBindTextures(0, 1, &in_rgb);
glBindImageTextures(0, 2, tex);
glUseProgram(compute); // the above compute shader
int wgs[3];
glGetProgramiv(compute, GL_COMPUTE_WORK_GROUP_SIZE, wgs);
glDispatchCompute(width/wgs[0], height/wgs[1], 1);
glUseProgram(0);
glGenerateTextureMipmap(tex[1]); // downsamples tex[1]
// copy data to the CPU memory:
uint8_t *data = (uint8_t*)malloc(width*height*3/2);
glGetTextureImage(tex[0], 0, GL_RED, GL_UNSIGNED_BYTE, width*height, data);
glGetTextureImage(tex[1], 1, GL_RG, GL_UNSIGNED_BYTE, width*height/2,
data + width*height);
DISCLAIMER:
This code is untested.
It assumes that width and height are divisible by 32.
It might be missing a memory barrier somewhere.
It's not the most efficient way to read data out of the GPU -- you might need to at least read one frame behind while the next one is calculated.
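The buffer size width*height*3/2 used above follows from the NV12 layout: a full-resolution Y plane followed by one half-resolution plane of interleaved U,V pairs. The C++ sketch below spells out that arithmetic and also shows a rounded-up work group count, which would be one way to lift the "divisible by 32" assumption (the shader would then also need a bounds check). Nv12Layout, nv12Layout and groupCount are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Byte layout of an NV12 frame with 8 bits per component.
struct Nv12Layout {
    std::size_t ySize;     // bytes in the Y plane
    std::size_t uvOffset;  // byte offset at which the UV plane starts
    std::size_t uvSize;    // bytes in the interleaved UV plane
    std::size_t total;     // equals width * height * 3 / 2
};

Nv12Layout nv12Layout(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t y = width * height;
    const std::size_t uv = (width / 2) * (height / 2) * 2;  // two bytes per UV texel
    return {y, y, uv, y + uv};
}

// Number of work groups needed to cover `size` items, rounding up.
std::size_t groupCount(std::size_t size, std::size_t localSize) {
    assert(localSize > 0);
    return (size + localSize - 1) / localSize;
}
```

For a 640 x 480 frame this gives a Y plane of 307200 bytes and a UV plane of 153600 bytes, which matches the two glGetTextureImage calls above.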

How to convert large arrays of quad primitives to triangle primitives?

I have an existing system, which provides 3D meshes. The provided data are an array of vertex coordinates with 3 components (x, y, z) and an index list.
The issue is that the index list is a consecutive array of quad primitives.
The system first has to be made runnable with a core profile OpenGL context, and later with OpenGL ES 3.x, too.
I know that all the quads have the same winding order (counter-clockwise), but I have no further information about the quads. I don't know anything about their relations or adjacencies.
Since I want to use a core profile context for rendering, I cannot use the GL_QUADS primitive type. I have to convert the quads to triangles.
Of course the array of quad indices can easily be converted to an array of triangle indices:
std::vector<unsigned int> triangles;
triangles.reserve( no_of_indices * 6 / 4 );
for ( int i = 0; i < no_of_indices; i += 4 )
{
int tri[] = { quad[i], quad[i+1], quad[i+2], quad[i], quad[i+2], quad[i+3] };
triangles.insert(triangles.end(), tri, tri+6 );
}
If that has to be done only once, then that would be the solution. But the mesh data are not static. The data can change dynamically.
The data do not change continuously, but they change unpredictably and at random times.
Another simple solution would be to create a vertex array object that directly refers to an element array buffer with the quads, and to draw them in a loop with the GL_TRIANGLE_FAN primitive type:
for ( int i = 0; i < no_of_indices; i += 4 )
glDrawElements( GL_TRIANGLE_FAN, 4, GL_UNSIGNED_INT, (void*)(sizeof(unsigned int) * i) );
But I hope there is a better solution. I'm searching for a possibility to draw the quads with one single draw call, or to transform the quads to triangles on the GPU.
"If that has to be done only once, then that would be the solution. But the mesh data are not static."
The mesh data may be dynamic, but the topology of that list is the same. Every 4 vertices is a quad, so every 4 vertices represents the triangles (0, 1, 2) and (0, 2, 3).
So you can build an arbitrarily large static index buffer containing an ever-increasing series of these numbers (0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7, etc.). You can even use baseVertex rendering to offset them, so that the same index buffer renders different series of quads.
My suggestion would be to make the index buffer use GLushort as the index type. This way, your index data only takes up 12 bytes per quad (6 indices of 2 bytes each). Using shorts gives you a limit of 16384 quads in a single drawing command (65536 addressable vertices / 4), but you can reuse the same index buffer to draw multiple series of quads with baseVertex rendering:
constexpr GLushort batchSize = 16384;
constexpr unsigned int vertsPerQuad = 6;
void drawQuads(GLuint quadCount)
{
//Assume VAO is set up.
int baseVertex = 0;
while(quadCount > batchSize)
{
glDrawElementsBaseVertex(GL_TRIANGLES, batchSize * vertsPerQuad, GL_UNSIGNED_SHORT, 0, baseVertex * 4);
baseVertex += batchSize;
quadCount -= batchSize;
}
glDrawElementsBaseVertex(GL_TRIANGLES, quadCount * vertsPerQuad, GL_UNSIGNED_SHORT, 0, baseVertex * 4);
}
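The static index buffer described above only has to be generated once. A minimal C++ sketch of that generation step follows; makeQuadTriangleIndices is a hypothetical name, and std::uint16_t stands in for GLushort. Uploading the result with glBufferData is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the repeating pattern 0,1,2, 0,2,3, 4,5,6, 4,6,7, ...
// for `quadCount` quads, using 16-bit indices.
std::vector<std::uint16_t> makeQuadTriangleIndices(std::size_t quadCount) {
    assert(quadCount <= 16384);  // 16384 quads * 4 vertices = 65536 indices
    static const std::uint16_t corner[6] = {0, 1, 2, 0, 2, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;  // first vertex of this quad
        for (std::uint16_t c : corner)
            indices.push_back(static_cast<std::uint16_t>(base + c));
    }
    return indices;
}
```

Building it for the full batch size of 16384 quads yields 98304 indices, with 65535 as the largest value.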
If you want slightly less index data, you can use primitive restart indices. This allows you to designate an index to mean "restart the primitive". This allows you to use a GL_TRIANGLE_STRIP primitive and break the primitive up into pieces while still only having a single draw call. So instead of 6 indices per quad, you have 5, with the 5th being the restart index. So now your GLushort indices only take up 10 bytes per quad. However, the batchSize now must be 16383, since the index 0xFFFF is reserved for restarting. And vertsPerQuad must be 5.
Of course, baseVertex rendering works just fine with primitive restarting, so the above code works too.
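For the primitive restart variant, the index pattern changes to five indices per quad. The C++ sketch below assumes the strip order 0, 1, 3, 2 for a counter-clockwise quad (the same corner order as the inx_map arrays in the shaders elsewhere in this thread) and 0xFFFF as the restart index, which is what GL_PRIMITIVE_RESTART_FIXED_INDEX uses for 16-bit indices. makeQuadStripIndices is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds 0,1,3,2,R, 4,5,7,6,R, ... where R = 0xFFFF restarts the strip.
std::vector<std::uint16_t> makeQuadStripIndices(std::size_t quadCount) {
    assert(quadCount <= 16383);  // keeps every vertex index below 0xFFFF
    static const std::uint16_t order[4] = {0, 1, 3, 2};
    const std::uint16_t restart = 0xFFFF;
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 5);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;  // first vertex of this quad
        for (std::uint16_t c : order)
            indices.push_back(static_cast<std::uint16_t>(base + c));
        indices.push_back(restart);
    }
    return indices;
}
```

The draw call would then use GL_TRIANGLE_STRIP with 5 indices per quad, as described above.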
First, I want to mention that I did not ask this question in order to answer it myself; I just want to share my current solution to this issue.
This means that I'm still looking for "the" solution, one that is perfectly acceptable.
In my solution, I decided to use Tessellation. I draw patches with a size of 4:
glPatchParameteri( GL_PATCH_VERTICES, 4 );
glDrawElements( GL_PATCHES, no_of_indices, GL_UNSIGNED_INT, 0 );
The Tessellation Control Shader has a default behavior. The patch data is passed directly from the Vertex Shader invocations to the tessellation primitive generation. Because of that it can be omitted completely.
The Tessellation Evaluation Shader uses a quadrilateral patch (quads) to create 2 triangles:
#version 450
layout(quads, ccw) in;
in TInOut
{
vec3 pos;
} inData[];
out TInOut
{
vec3 pos;
} outData;
uniform mat4 u_projectionMat44;
void main()
{
const int inx_map[4] = int[4](0, 1, 3, 2);
float i_quad = dot( vec2(1.0, 2.0), gl_TessCoord.xy );
int inx = inx_map[int(round(i_quad))];
outData.pos = inData[inx].pos;
gl_Position = u_projectionMat44 * vec4( outData.pos, 1.0 );
}
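The index arithmetic in the evaluation shader can be checked on the CPU. The C++ sketch below mirrors the two shader lines that compute inx; it assumes tessellation levels of 1 (the default when no control shader is present and the defaults have not been changed with glPatchParameterfv), so that gl_TessCoord only takes the four corner values. patchVertexForCorner is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the shader: i_quad = dot(vec2(1, 2), gl_TessCoord.xy),
// followed by a lookup in inx_map.
int patchVertexForCorner(float u, float v) {
    static const int inx_map[4] = {0, 1, 3, 2};
    const int i = static_cast<int>(std::lround(u + 2.0f * v));
    assert(i >= 0 && i < 4);
    return inx_map[i];
}
```

The corner (0, 1) maps to patch vertex 3 and (1, 1) to patch vertex 2, which walks the quad in its original order.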
An alternative solution would be to use a Geometry Shader. The input primitive type lines_adjacency provides 4 vertices, which can be mapped to 2 triangles (triangle_strip). Of course this seems to be a hack, since a lines adjacency is something completely different than a quad, but it works anyway.
glDrawElements( GL_LINES_ADJACENCY, no_of_indices, GL_UNSIGNED_INT, 0 );
Geometry Shader:
#version 450
layout( lines_adjacency ) in;
layout( triangle_strip, max_vertices = 4 ) out;
in TInOut
{
vec3 pos;
} inData[];
out TInOut
{
vec3 pos;
} outData;
uniform mat4 u_projectionMat44;
void main()
{
const int inx_map[4] = int[4](0, 1, 3, 2);
for ( int i=0; i < 4; ++i )
{
outData.pos = inData[inx_map[i]].pos;
gl_Position = u_projectionMat44 * vec4( outData.pos, 1.0 );
EmitVertex();
}
EndPrimitive();
}
An improvement would be to use Transform Feedback to capture new buffers, containing triangle primitives.

OpenGL compute shader normal map generation poor performance

I have a height cube map and I want to generate a normal cube map texture from it. My height cube map is just a 2048x2048 image that I load at the beginning of the application for each face of the cube, and I can modify in real time a "maximum height" value, which is used as a multiplier when retrieving a pixel from the height map.
Initially I was calculating the normals in the vertex shader, but it gave me bad lighting results, so I decided to move the calculations to the fragment shader.
As the height map does not change every frame (only when I modify the "maximum height" value), I want to generate a normal map texture from it using a compute shader, because I don't need any rasterization. But it gives me very poor performance.
With the fragment shader I ran at 200 FPS, but with the compute shader I run at 40 FPS.
Here is how I bind my images and start the compute work:
_computeShaderProgram.use();
glUniform1f(_computeShaderProgram.getUniformLocation("maxHeight"), maxHeight);
glBindImageTexture(
0,
static_cast<GLuint>(heightMap),
0,
GL_TRUE,
0,
GL_READ_ONLY,
GL_RGBA32F
);
glBindImageTexture(
1,
static_cast<GLuint>(normalMap),
0,
GL_TRUE,
0,
GL_WRITE_ONLY,
GL_RGBA32F
);
// Start compute work
// I only compute for one face of the cube map
glDispatchCompute(normalMap.getWidth() / 16, normalMap.getWidth() / 16, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
And the compute shader:
#version 430 core
#extension GL_ARB_compute_shader : enable
layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in;
layout(rgba32f, binding = 0) readonly uniform imageCube heightMap;
layout(rgba32f, binding = 1) writeonly uniform imageCube normalMap;
uniform float maxHeight;
float getHeight(ivec3 heightMapCoord) {
vec4 heightMapValue = imageLoad(heightMap, heightMapCoord);
return heightMapValue.r * maxHeight;
}
void main() {
ivec3 textCoord = ivec3(gl_GlobalInvocationID);
// Calculate height of neighbors
float leftCubePosHeight = getHeight(textCoord + ivec3(-1, 0, 0));
float rightCubePosHeight = getHeight(textCoord + ivec3(1, 0, 0));
float topCubePosHeight = getHeight(textCoord + ivec3(0, -1, 0));
float bottomCubePosHeight = getHeight(textCoord + ivec3(0, 1, 0));
// Calculate normal using central differences method
vec3 horizontal = vec3(2.0, rightCubePosHeight - leftCubePosHeight, 0.0);
vec3 vertical = vec3(0.0, bottomCubePosHeight - topCubePosHeight, 2.0);
vec3 normal = normalize(cross(vertical, horizontal));
imageStore(normalMap, textCoord, vec4(normal, 1.0));
}
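For reference, the same central differences computation can be written on the CPU, which is handy for checking the shader output on a small test image. This C++ sketch works on a single square face stored row by row; clamping at the borders is an assumption (the shader above does not treat face borders specially), and scaledHeight and centralDifferenceNormal are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Height at (x, y) scaled by maxHeight; coordinates are clamped to the face.
float scaledHeight(const std::vector<float>& h, int size, int x, int y, float maxHeight) {
    x = std::min(std::max(x, 0), size - 1);
    y = std::min(std::max(y, 0), size - 1);
    return h[static_cast<std::size_t>(y) * size + x] * maxHeight;
}

std::array<float, 3> centralDifferenceNormal(const std::vector<float>& h, int size,
                                             int x, int y, float maxHeight) {
    assert(size > 0 && h.size() == static_cast<std::size_t>(size) * size);
    const float dh = scaledHeight(h, size, x + 1, y, maxHeight) -
                     scaledHeight(h, size, x - 1, y, maxHeight);
    const float dv = scaledHeight(h, size, x, y + 1, maxHeight) -
                     scaledHeight(h, size, x, y - 1, maxHeight);
    // cross(vertical, horizontal) with horizontal = (2, dh, 0), vertical = (0, dv, 2)
    const float nx = -2.0f * dh, ny = 4.0f, nz = -2.0f * dv;
    const float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return {nx / len, ny / len, nz / len};
}
```

A flat height map gives the normal (0, 1, 0), and a ramp that rises along x tilts the normal towards negative x.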
I tried different work group counts (width, width / 8, width / 16, width / 32) and local sizes (1, 8, 16, 32), but the performance is always poor: around 40 FPS, or 20 FPS with a work group count of the full width.
I know I can use shared memory for threads in the same work group to avoid fetching the same texture coordinate 4 times, but later the height map will be generated procedurally and will probably be larger than 2048x2048.
What is the difference between the fragment shader and the compute shader that makes it so slow? Am I doing something wrong?
Are there any other solutions to generate this normal map?
EDIT:
The FPS values I gave above are not right, because I was generating only 1/16 of the normal map (when I had 40 FPS). I also used the central differences technique to calculate the normals, which is cheap but does not give good lighting results, so I switched to the Sobel technique, which is a little more expensive.
I made some tests to find out which technique gives the best performance.
Each frame I generate the normal map (this will not be the case later, it's just to test the performance). Here are my tests:
CPU side, single thread: 1.5 FPS
Compute shader with local sizes of 1 and one work group for each image pixel: 4 FPS
Compute shader with local sizes of 16 and one work group for each 16x16 block of image pixels: 11 FPS
Fragment shader using a framebuffer and MRT with 6 color attachments (one for each face of the normal map): 12.5 FPS
This is a little laggy when I modify the max height (which generates the normal map again), but I think it's okay as I won't modify it often.
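The Sobel code itself is not shown in the question. For readers unfamiliar with it, here is a C++ sketch of a textbook 3x3 Sobel normal on one square face; it is not the asker's implementation. It assumes clamped borders and a texel spacing of 1 in the same units as the height, so the y component of 8 comes from the kernel weights (4 on each side, 2 texels apart). sobelNormal is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

std::array<float, 3> sobelNormal(const std::vector<float>& h, int size,
                                 int x, int y, float maxHeight) {
    assert(size > 0 && h.size() == static_cast<std::size_t>(size) * size);
    // Scaled height lookup with coordinates clamped to the face.
    auto at = [&](int px, int py) {
        px = std::min(std::max(px, 0), size - 1);
        py = std::min(std::max(py, 0), size - 1);
        return h[static_cast<std::size_t>(py) * size + px] * maxHeight;
    };
    // Horizontal and vertical Sobel responses.
    const float gx = (at(x + 1, y - 1) + 2.0f * at(x + 1, y) + at(x + 1, y + 1)) -
                     (at(x - 1, y - 1) + 2.0f * at(x - 1, y) + at(x - 1, y + 1));
    const float gy = (at(x - 1, y + 1) + 2.0f * at(x, y + 1) + at(x + 1, y + 1)) -
                     (at(x - 1, y - 1) + 2.0f * at(x, y - 1) + at(x + 1, y - 1));
    // gx / 8 and gy / 8 approximate the slope, so normalize (-gx, 8, -gy).
    const float nx = -gx, ny = 8.0f, nz = -gy;
    const float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return {nx / len, ny / len, nz / len};
}
```

Compared with central differences, each normal reads eight neighbors instead of four, which fits the "a little more expensive" remark above.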

Is there a simple way to get the depth of an object in OpenGL (JOGL)

How can I get the z-coordinate of an object in 3D space when I click on it?
(It's not really an object, more of a graph; I need to know what the user selected.) I use JOGL.
I just finished porting a picking sample from the g-truc ogl-samples.
I will try to give you a quick explanation of the code.
We start by enabling the depth test
private boolean initTest(GL4 gl4) {
gl4.glEnable(GL_DEPTH_TEST);
return true;
}
In the initBuffer we:
generate all the buffers we need with glGenBuffers
bind the element buffer and transfer the content of our indices. Each index refers to the vertex to use. We need to bind the buffer first because glBufferData operates on whatever is bound at the target specified by the first argument, GL_ELEMENT_ARRAY_BUFFER in this case
do the same for the vertices themselves.
get GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (a global parameter) to determine the minimum uniform block size for storing our transform variable. This is necessary if we want to bind it via glBindBufferRange. We will not use that function for our picking buffer, which is why we allocate just the size of a float, Float.BYTES, for it
the last argument of glBufferData is just a hint (it's up to OpenGL and the driver to do what they want with it). As you can see, it is static for the indices and vertices, because we are not going to change them anymore, but dynamic for the uniform buffer, since we will update it every frame.
Code:
private boolean initBuffer(GL4 gl4) {
gl4.glGenBuffers(Buffer.MAX.ordinal(), bufferName, 0);
gl4.glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufferName[Buffer.ELEMENT.ordinal()]);
ShortBuffer elementBuffer = GLBuffers.newDirectShortBuffer(elementData);
gl4.glBufferData(GL_ELEMENT_ARRAY_BUFFER, elementSize, elementBuffer, GL_STATIC_DRAW);
gl4.glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
gl4.glBindBuffer(GL_ARRAY_BUFFER, bufferName[Buffer.VERTEX.ordinal()]);
FloatBuffer vertexBuffer = GLBuffers.newDirectFloatBuffer(vertexData);
gl4.glBufferData(GL_ARRAY_BUFFER, vertexSize, vertexBuffer, GL_STATIC_DRAW);
gl4.glBindBuffer(GL_ARRAY_BUFFER, 0);
int[] uniformBufferOffset = {0};
gl4.glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, uniformBufferOffset, 0);
int uniformBlockSize = Math.max(projection.length * Float.BYTES, uniformBufferOffset[0]);
gl4.glBindBuffer(GL_UNIFORM_BUFFER, bufferName[Buffer.TRANSFORM.ordinal()]);
gl4.glBufferData(GL_UNIFORM_BUFFER, uniformBlockSize, null, GL_DYNAMIC_DRAW);
gl4.glBindBuffer(GL_UNIFORM_BUFFER, 0);
gl4.glBindBuffer(GL_TEXTURE_BUFFER, bufferName[Buffer.PICKING.ordinal()]);
gl4.glBufferData(GL_TEXTURE_BUFFER, Float.BYTES, null, GL_DYNAMIC_READ);
gl4.glBindBuffer(GL_TEXTURE_BUFFER, 0);
return true;
}
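The sample above takes the maximum of the matrix size and the offset alignment, which is enough for a single block. If several blocks were packed into one buffer and bound with glBindBufferRange, each offset would have to be a multiple of the alignment, so sizes are usually rounded up instead. The C++ sketch below shows that rounding; alignUp is a hypothetical helper, and 256 in the usage note is only an example alignment value, not something taken from the sample.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `alignment`.
std::size_t alignUp(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

With an alignment of 256, a 64-byte mat4 block would occupy 256 bytes, and a second block could start at offset 256.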
In the initTexture we initialize our textures, we:
generate both the textures with glGenTextures
set GL_UNPACK_ALIGNMENT to 1 (the default is 4 bytes) to avoid any problems, because otherwise the byte size of each texture row must be a multiple of the alignment.
set the active texture unit to GL_TEXTURE0. There is a limited number of texture units, and you need to select one before working on any texture.
bind the diffuse texture
set the swizzle, that is, which source component each channel will receive
set the levels (mipmap), where 0 is the base (original/biggest)
set the filters
allocate the space, levels included with glTexStorage2D
transfer for each level the corresponding data
reset back the GL_UNPACK_ALIGNMENT
bind to GL_TEXTURE0 our other texture PICKING
allocate a single 32-bit float of storage and associate the PICKING texture with the PICKING buffer with glTexBuffer
Code:
private boolean initTexture(GL4 gl4) {
try {
jgli.Texture2D texture = new Texture2D(jgli.Load.load(TEXTURE_ROOT + "/" + TEXTURE_DIFFUSE));
jgli.Gl.Format format = jgli.Gl.instance.translate(texture.format());
gl4.glGenTextures(Texture.MAX.ordinal(), textureName, 0);
// Diffuse
{
gl4.glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
gl4.glActiveTexture(GL_TEXTURE0);
gl4.glBindTexture(GL_TEXTURE_2D, textureName[Texture.DIFFUSE.ordinal()]);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_R, GL_RED);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_GREEN);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_B, GL_BLUE);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_A, GL_ALPHA);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, texture.levels() - 1);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
gl4.glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
gl4.glTexStorage2D(GL_TEXTURE_2D, texture.levels(), format.internal.value,
texture.dimensions(0)[0], texture.dimensions(0)[1]);
for (int level = 0; level < texture.levels(); ++level) {
gl4.glTexSubImage2D(GL_TEXTURE_2D, level,
0, 0,
texture.dimensions(level)[0], texture.dimensions(level)[1],
format.external.value, format.type.value,
texture.data(0, 0, level));
}
gl4.glPixelStorei(GL_UNPACK_ALIGNMENT, 4);
}
// Picking
{
gl4.glBindTexture(GL_TEXTURE_BUFFER, textureName[Texture.PICKING.ordinal()]);
gl4.glTexBuffer(GL_TEXTURE_BUFFER, GL_R32F, bufferName[Buffer.PICKING.ordinal()]);
gl4.glBindTexture(GL_TEXTURE_BUFFER, 0);
}
} catch (IOException ex) {
Logger.getLogger(Gl_420_picking.class.getName()).log(Level.SEVERE, null, ex);
}
return true;
}
In the initProgram we initialize our program, by:
generating a pipeline (composition of different shaders), glGenProgramPipelines
creating a vertex shader code vertShaderCode, where GL_VERTEX_SHADER is the shader type, SHADERS_ROOT is the place where the shader source is located, SHADERS_SOURCE_UPDATE is the name and "vert" is the extension.
initializing it, similarly for the fragment shader
grabbing the generated index and saving in programName
setting the program as separable (nothing useful here, just pure sport) with glProgramParameteri
adding both shaders to our shaderProgram, then compiling and linking it with link
specifying which program stages our pipelineName uses, glUseProgramStages
Code:
private boolean initProgram(GL4 gl4) {
boolean validated = true;
gl4.glGenProgramPipelines(1, pipelineName, 0);
// Create program
if (validated) {
ShaderProgram shaderProgram = new ShaderProgram();
ShaderCode vertShaderCode = ShaderCode.create(gl4, GL_VERTEX_SHADER,
this.getClass(), SHADERS_ROOT, null, SHADERS_SOURCE_UPDATE, "vert", null, true);
ShaderCode fragShaderCode = ShaderCode.create(gl4, GL_FRAGMENT_SHADER,
this.getClass(), SHADERS_ROOT, null, SHADERS_SOURCE_UPDATE, "frag", null, true);
shaderProgram.init(gl4);
programName = shaderProgram.program();
gl4.glProgramParameteri(programName, GL_PROGRAM_SEPARABLE, GL_TRUE);
shaderProgram.add(vertShaderCode);
shaderProgram.add(fragShaderCode);
shaderProgram.link(gl4, System.out);
}
if (validated) {
gl4.glUseProgramStages(pipelineName[0], GL_VERTEX_SHADER_BIT | GL_FRAGMENT_SHADER_BIT, programName);
}
return validated & checkError(gl4, "initProgram");
}
In the initVertexArray we:
generate a single vertex array, glGenVertexArrays, and bind it, glBindVertexArray
bind the vertex buffer and set the attributes for the position and the texture coordinates, here interleaved. The position is identified by the attribute index Semantic.Attr.POSITION (this will match the one in the vertex shader), component count 2, type GL_FLOAT, normalized false, a stride (the total size of each vertex) of 2 * 2 * Float.BYTES and an offset of 0 within the vertex. Similarly for the texture coordinates, at offset 2 * Float.BYTES.
unbind the vertex buffer, since it is not part of the vertex array state. It must be bound only for the glVertexAttribPointer calls, so that OpenGL knows which buffer those parameters refer to.
enable the corresponding vertex attribute array, glEnableVertexAttribArray
bind the element (indices) array, part of the vertex array
Code:
private boolean initVertexArray(GL4 gl4) {
gl4.glGenVertexArrays(1, vertexArrayName, 0);
gl4.glBindVertexArray(vertexArrayName[0]);
{
gl4.glBindBuffer(GL_ARRAY_BUFFER, bufferName[Buffer.VERTEX.ordinal()]);
gl4.glVertexAttribPointer(Semantic.Attr.POSITION, 2, GL_FLOAT, false, 2 * 2 * Float.BYTES, 0);
gl4.glVertexAttribPointer(Semantic.Attr.TEXCOORD, 2, GL_FLOAT, false, 2 * 2 * Float.BYTES, 2 * Float.BYTES);
gl4.glBindBuffer(GL_ARRAY_BUFFER, 0);
gl4.glEnableVertexAttribArray(Semantic.Attr.POSITION);
gl4.glEnableVertexAttribArray(Semantic.Attr.TEXCOORD);
gl4.glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufferName[Buffer.ELEMENT.ordinal()]);
}
gl4.glBindVertexArray(0);
return true;
}
In the render we:
bind the TRANSFORM buffer that will contain our transformation matrix.
get a byteBuffer pointer out of that.
calculate the projection, view and model matrices and multiply them in the order p * v * m; the result is also called the mvp matrix.
save our mvp matrix in our pointer and rewind the buffer (position set to 0 again).
unmap it to make sure it gets uploaded to the GPU
set the viewport to match our window size
set the clear depthValue to 1 (superfluous, since it is the default value), then clear the depth buffer with depthValue and the color buffer with the color {1.0f, 0.5f, 0.0f, 1.0f}
bind the pipeline
set active texture 0
bind the diffuse texture and the picking image texture
bind the vertex array
bind the transform uniform buffer
render. glDrawElementsInstancedBaseVertexBaseInstance is overkill here; what matters is the primitive type GL_TRIANGLES, the number of indices elementCount and their type GL_UNSIGNED_SHORT
bind the picking texture buffer and retrieve its value
Code:
@Override
protected boolean render(GL gl) {
GL4 gl4 = (GL4) gl;
{
gl4.glBindBuffer(GL_UNIFORM_BUFFER, bufferName[Buffer.TRANSFORM.ordinal()]);
ByteBuffer pointer = gl4.glMapBufferRange(
GL_UNIFORM_BUFFER, 0, projection.length * Float.BYTES,
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
FloatUtil.makePerspective(projection, 0, true, (float) Math.PI * 0.25f,
(float) windowSize.x / windowSize.y, 0.1f, 100.0f);
FloatUtil.makeIdentity(model);
FloatUtil.multMatrix(projection, view());
FloatUtil.multMatrix(projection, model);
for (float f : projection) {
pointer.putFloat(f);
}
pointer.rewind();
// Make sure the uniform buffer is uploaded
gl4.glUnmapBuffer(GL_UNIFORM_BUFFER);
}
gl4.glViewportIndexedf(0, 0, 0, windowSize.x, windowSize.y);
float[] depthValue = {1.0f};
gl4.glClearBufferfv(GL_DEPTH, 0, depthValue, 0);
gl4.glClearBufferfv(GL_COLOR, 0, new float[]{1.0f, 0.5f, 0.0f, 1.0f}, 0);
gl4.glBindProgramPipeline(pipelineName[0]);
gl4.glActiveTexture(GL_TEXTURE0);
gl4.glBindTexture(GL_TEXTURE_2D, textureName[Texture.DIFFUSE.ordinal()]);
gl4.glBindImageTexture(Semantic.Image.PICKING, textureName[Texture.PICKING.ordinal()],
0, false, 0, GL_WRITE_ONLY, GL_R32F);
gl4.glBindVertexArray(vertexArrayName[0]);
gl4.glBindBufferBase(GL_UNIFORM_BUFFER, Semantic.Uniform.TRANSFORM0, bufferName[Buffer.TRANSFORM.ordinal()]);
gl4.glDrawElementsInstancedBaseVertexBaseInstance(GL_TRIANGLES, elementCount, GL_UNSIGNED_SHORT, 0, 5, 0, 0);
gl4.glBindBuffer(GL_ARRAY_BUFFER, bufferName[Buffer.PICKING.ordinal()]);
ByteBuffer pointer = gl4.glMapBufferRange(GL_ARRAY_BUFFER, 0, Float.BYTES, GL_MAP_READ_BIT);
float depth = pointer.getFloat();
gl4.glUnmapBuffer(GL_ARRAY_BUFFER);
System.out.printf("Depth: %2.3f\n", depth);
return true;
}
In our vertex shader, executed for each vertex, we:
define the glsl version and profile
define all the attribute indices, which must coincide with the ones from the Semantic class we used previously
set some memory layout parameters, such as std140 and column_major (unnecessary, since it is the default for matrices)
declare the Transform uniform buffer
declare a vec3 position and vec2 texCoord inputs
declare a (built in, incomplete and useless) gl_PerVertex output
declare a Block block output
save inside our block the incoming texCoord and inside gl_Position our vertex in clip space position. The incoming position vertex is in Model space -> * model matrix = vertex in World space, * view/camera matrix = vertex in Camera/View space, * projection matrix = vertex in Clip space.
Code:
#version 420 core
#define POSITION 0
#define COLOR 3
#define TEXCOORD 4
#define TRANSFORM0 1
precision highp float;
precision highp int;
layout(std140, column_major) uniform;
layout(binding = TRANSFORM0) uniform Transform
{
mat4 mvp;
} transform;
layout(location = POSITION) in vec3 position;
layout(location = TEXCOORD) in vec2 texCoord;
out gl_PerVertex
{
vec4 gl_Position;
};
out Block
{
vec2 texCoord;
} outBlock;
void main()
{
outBlock.texCoord = texCoord;
gl_Position = transform.mvp * vec4(position, 1.0);
}
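The order p * v * m matters because the matrix on the right is applied to the vertex first. The following C++ sketch demonstrates this with column-major 4x4 matrices (the layout GLSL's mat4 uses); a translation and a scale stand in for the real matrices. Mat4, Vec4 and the helper functions are hypothetical and not part of JOGL.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major: element (row, col) is m[col * 4 + row]
using Vec4 = std::array<float, 4>;

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

Mat4 scaling(float s) {
    assert(s != 0.0f);  // a zero scale would collapse the geometry
    Mat4 m = identity();
    m[0] = m[5] = m[10] = s;
    return m;
}

// Returns a * b, so that (a * b) * v == a * (b * v).
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}
```

Transforming (1, 1, 1, 1) by mul(translation(1, 2, 3), scaling(2)) scales first and translates second, giving (3, 4, 5, 1).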
There may be other stages after the vertex shader, such as tessellation control/evaluation and geometry, but they are not mandatory.
The last stage is the fragment shader, executed once per fragment/pixel, that starts similarly, then we:
declare the diffuse texture on binding 0, which matches our glActiveTexture(GL_TEXTURE0) inside render, and the picking imageBuffer (named depth), where we will save our depth, on binding 1, which matches Semantic.Image.PICKING in the glBindImageTexture call inside render
declare the picking coordinates, here hardcoded, but nothing stops you from turning them into a uniform variable and setting it at runtime
declare the incoming Block block holding the texture coordinates
declare the default output color
if the current fragment coordinates gl_FragCoord (a built-in variable) correspond to the picking coordinates pickingCoord, save the current z value gl_FragCoord.z inside the imageBuffer depth and set the output color to vec4(1, 0, 1, 1); otherwise we set it to the diffuse texture color with texture(diffuse, inBlock.texCoord.st). st is part of the stpq swizzle set, which is synonymous with xyzw and rgba.
Code:
#version 420 core
#define FRAG_COLOR 0
precision highp float;
precision highp int;
layout(std140, column_major) uniform;
in vec4 gl_FragCoord;
layout(binding = 0) uniform sampler2D diffuse;
layout(binding = 1, r32f) writeonly uniform imageBuffer depth;
uvec2 pickingCoord = uvec2(320, 240);
in Block
{
vec2 texCoord;
} inBlock;
layout(location = FRAG_COLOR, index = 0) out vec4 color;
void main()
{
if(all(equal(pickingCoord, uvec2(gl_FragCoord.xy))))
{
imageStore(depth, 0, vec4(gl_FragCoord.z, 0, 0, 0));
color = vec4(1, 0, 1, 1);
}
else
color = texture(diffuse, inBlock.texCoord.st);
}
Finally in the end we clean up all our OpenGL resources:
@Override
protected boolean end(GL gl) {
GL4 gl4 = (GL4) gl;
gl4.glDeleteProgramPipelines(1, pipelineName, 0);
gl4.glDeleteProgram(programName);
gl4.glDeleteBuffers(Buffer.MAX.ordinal(), bufferName, 0);
gl4.glDeleteTextures(Texture.MAX.ordinal(), textureName, 0);
gl4.glDeleteVertexArrays(1, vertexArrayName, 0);
return true;
}
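The value printed in render is gl_FragCoord.z, a window-space depth in [0, 1] that is not linear in distance. To get a z-coordinate in eye space it has to be converted back. The C++ sketch below assumes a standard perspective projection and the default depth range of [0, 1]; the near and far values 0.1 and 100 in the usage note are the ones passed to makePerspective above. linearizeDepth is a hypothetical name.

```cpp
#include <cassert>

// Converts a window-space depth in [0, 1] to a positive eye-space distance.
float linearizeDepth(float depth, float zNear, float zFar) {
    assert(depth >= 0.0f && depth <= 1.0f);
    assert(zNear > 0.0f && zFar > zNear);
    const float zNdc = 2.0f * depth - 1.0f;  // back to normalized device coordinates
    return 2.0f * zNear * zFar / (zFar + zNear - zNdc * (zFar - zNear));
}
```

A depth of 0 maps to the near plane and a depth of 1 to the far plane, so linearizeDepth(depth, 0.1f, 100.0f) gives the distance of the picked fragment from the camera along the view direction.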

OpenGL - Provide a set of values in a 1D texture

I want to provide a set of values in a 1D texture. Please consider the following simple example:
gl.glBindTexture(GL4.GL_TEXTURE_1D, myTextureHandle);
FloatBuffer values = Buffers.newDirectFloatBuffer(N);
for (int x = 0; x < N; ++x)
values.put(x);
values.rewind();
gl.glTexImage1D(GL4.GL_TEXTURE_1D, 0, GL4.GL_R32F, N, 0, GL4.GL_RED, GL4.GL_FLOAT, values);
Here, N is the number of values I want to store in the texture. However, calling textureSize(myTexture, 0) in my fragment shader yields 1 (no matter what I set N to). So, what's going wrong here?
EDIT: The code above is executed at initialization. My rendering loop looks like
gl.glClear(GL4.GL_COLOR_BUFFER_BIT | GL4.GL_DEPTH_BUFFER_BIT);
gl.glUseProgram(myProgram);
gl.glActiveTexture(MY_TEXTURE_INDEX);
gl.glBindTexture(GL4.GL_TEXTURE_1D, myTextureHandle);
gl.glUniform1i(uMyTexture, MY_TEXTURE_INDEX);
gl.glDrawArrays(GL4.GL_POINTS, 0, 1);
My vertex shader consists of a main-function which does nothing. I'm using the geometry shader to create a fullscreen quad. The pixel shader code looks like
uniform sampler1D myTexture;
out vec4 color;
void main()
{
if (textureSize(myTexture, 0) == 1)
{
color = vec4(1, 0, 0, 1);
return;
}
color = vec4(1, 1, 0, 1);
}
The result is a red-colored window.
Make sure your texture is complete. Since GL_TEXTURE_MIN_FILTER defaults to GL_NEAREST_MIPMAP_LINEAR you'll have to supply a full set of mipmaps.
Or set GL_TEXTURE_MIN_FILTER to GL_NEAREST/GL_LINEAR.
You also need to pass GL_TEXTURE0 + MY_TEXTURE_INDEX (instead of only MY_TEXTURE_INDEX) to glActiveTexture():
gl.glActiveTexture( GL_TEXTURE0 + MY_TEXTURE_INDEX );
...
gl.glUniform1i( uMyTexture, MY_TEXTURE_INDEX );
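If you go the "full set of mipmaps" route, the number of levels a complete chain needs for a 1D texture of width N is floor(log2(N)) + 1. A small C++ sketch of that count follows; mipLevelCount is a hypothetical helper, not an OpenGL function.

```cpp
#include <cassert>

// Number of mip levels in a complete chain for a texture of the given width:
// the width is halved (rounding down) until it reaches 1.
int mipLevelCount(int size) {
    assert(size > 0);
    int levels = 1;
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}
```

For N = 1024 that is 11 levels, each of which would need its own glTexImage1D call (or a single glGenerateMipmap), which is why switching the min filter to GL_NEAREST or GL_LINEAR is usually the simpler fix for a lookup table like this.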