I'm trying to render a bunch of small axis-aligned 2D quads in Vulkan. Rather than using a non-indexed draw call, I thought I'd try to minimize transfer overhead by using an indexed draw with the following scheme:
#version 450
layout(location = 0) in vec2 inTopleft;
layout(location = 1) in vec2 inExtent;
vec2 positions[6] = vec2[](
vec2(0, 0),
vec2(0, 1),
vec2(1, 0),
vec2(0, 1),
vec2(1, 1),
vec2(1, 0)
);
void main() {
vec2 position = positions[gl_VertexIndex % 6];
gl_Position = vec4(inTopleft + position * inExtent, 0, 1);
}
That way I only need to send one vertex per quad, and then I just put the same vertex six times in the index buffer like:
index_buffer = [0,0,0,0,0,0, 1,1,1,1,1,1, 2,2,2,2,2,2, ... n,n,n,n,n,n]
but this scheme doesn't seem to work, because I suspect gl_VertexIndex is giving the value of the element in the index_buffer, right? I mean, for the first quad gl_VertexIndex is 0 for all six vertices, then it's 1 for all six vertices of the second quad, and so on. It's not actually giving 0,1,2,3,4,5 for the first quad, 6,7,8,9,10,11 for the second quad, and so on.
Is that right? And if so, is there any way to do what I'm trying to do?
So I ended up using an instanced draw call (non-indexed) with one instance per quad, and that roughly doubled performance (200 fps -> 500 fps, rendering about 10k quads), at least on my graphics card (NVIDIA GP106 [GeForce GTX 1060 6GB] (rev a1)).
What I mean is that each quad gets its own instance, so the draw call becomes draw(nvertexes=6, ninstances=nquads), and the shader changes from:
- vec2 position = positions[gl_VertexIndex % 6];
+ vec2 position = positions[gl_VertexIndex];
and of course the vertex buffer is now per instance instead of per vertex.
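For reference, here's a minimal sketch of the host side of that setup; the names (cmd, quadCount) and the exact buffer layout are assumptions, not code from my project. The per-quad data is bound at instance rate, so gl_VertexIndex runs 0..5 within each instance:

// Per-quad attributes (inTopleft, inExtent) advance once per instance.
VkVertexInputBindingDescription binding{};
binding.binding   = 0;
binding.stride    = sizeof(float) * 4;              // vec2 topleft + vec2 extent
binding.inputRate = VK_VERTEX_INPUT_RATE_INSTANCE;

VkVertexInputAttributeDescription attrs[2] = {
    {0, 0, VK_FORMAT_R32G32_SFLOAT, 0},                 // inTopleft
    {1, 0, VK_FORMAT_R32G32_SFLOAT, sizeof(float) * 2}, // inExtent
};

// ... pipeline creation using these descriptions elided ...

// One non-indexed call: 6 vertices per instance, one instance per quad.
vkCmdDraw(cmd, 6, quadCount, 0, 0);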
I'm currently having a problem with my compute shader failing to fetch the correct element at a given index of an input array.
I've inspected the buffers manually using NVIDIA Nsight and the data appears to be uploaded properly; the problem seems to be with the indexing.
It's supposed to be drawing voxels on a grid. Take this case as an example (what should be drawn is highlighted in red, while blue is what I actually get):
And here is the SSBO buffer capture in NSight transposed:
This is the compute shader I'm currently using:
#version 430
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img_output;
layout(std430) buffer;
layout(binding = 0) buffer Input0 {
ivec2 mapSize;
};
layout(binding = 1) buffer Input1 {
bool mapGrid[];
};
void main() {
// base pixel colour for image
vec4 pixel = vec4(1, 1, 1, 1);
// get index in global work group i.e x,y position
ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);
vec2 normalizedPixCoords = vec2(gl_GlobalInvocationID.xy) / gl_NumWorkGroups.xy;
ivec2 voxel = ivec2(int(normalizedPixCoords.x * mapSize.x), int(normalizedPixCoords.y * mapSize.y));
float distanceFromMiddle = length(normalizedPixCoords - vec2(0.5, 0.5));
pixel = vec4(0, 0, mapGrid[voxel.x * mapSize.x + voxel.y], 1); // <--- Where I'm having the problem
// I index the voxels the same exact way on the CPU code and it works fine
// output to a specific pixel in the image
//imageStore(img_output, pixel_coords, pixel * vec4(vignettecolor, 1) * imageLoad(img_output, pixel_coords));
imageStore(img_output, pixel_coords, pixel);
}
NSight doc file: https://ufile.io/wmrcy1l4
I was able to fix the problem by completely ditching SSBOs and using a texture buffer instead. It turns out that under std430, OpenGL gives each bool array element a 4-byte stride, so every index stepped 4 bytes instead of one.
Based on this post: Shader storage buffer object with bytes
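For completeness, here's a rough sketch of the alternative fix that keeps the SSBO: upload the grid as 32-bit ints so the CPU-side stride matches the 4-byte element stride std430 gives a bool array (and declare the shader array as int mapGrid[];). The buffer and variable names here are placeholders, not the asker's code:

#include <vector>

// ssbo, gridWidth, gridHeight, mapGrid are assumed names.
std::vector<GLint> packed(gridWidth * gridHeight);
for (size_t i = 0; i < packed.size(); ++i)
    packed[i] = mapGrid[i] ? 1 : 0;          // one 4-byte GLint per cell

glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER,
             packed.size() * sizeof(GLint), packed.data(), GL_STATIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);  // binding = 1, as above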
I've worked through a couple of tutorials in the Breakout series on learnopengl.com, so I have a very simple 2D renderer. I want to add a subimage feature to it, where I can specify a vec4 as a kind of "source rectangle", so that if the vec4 were (10, 10, 32, 32), it would only render a rectangle at (10, 10) with a width and height of 32, much like how the SDL renderer works.
The way the renderer is set up, there is a quad VAO which all the sprites use, which contains the texture coordinates. Initially, I thought I could use an array of VAOs, one per sprite, each with different texture coordinates, but I'd like to be able to change the source rectangle before the sprite gets drawn, to make things like animation easier. My second idea was to pass a separate uniform vec4 into the fragment shader for the source rectangle, but how do I render only that section, given pixel coordinates?
Use the primitive type GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN to render a quad, and use integral one-dimensional vertex coordinates instead of floating-point ones: the vertex coordinates are simply the indices of the quad corners. For a GL_TRIANGLE_FAN they are:
vertex 1: 0
vertex 2: 1
vertex 3: 2
vertex 4: 3
Set the rectangle definition (10, 10, 32, 32) in the vertex shader using a uniform variable of type vec4. With this information, you can calculate the vertex coordinates in the vertex shader:
in int cornerIndex;
uniform vec4 rectangle;
void main()
{
vec2 vertexArray[4] =
vec2[4](rectangle.xy, rectangle.zy, rectangle.zw, rectangle.xw);
vec2 vertex = vertexArray[cornerIndex];
// [...]
}
The vertex shader provides the built-in input gl_VertexID, which holds the index of the vertex currently being processed. It could be used instead of cornerIndex here; note that the vertex shader then doesn't need any explicit input at all.
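A possible host-side usage of that shader; everything except the shader's own names is assumed here (the quadVAO would hold the cornerIndex values {0,1,2,3} via glVertexAttribIPointer):

glUseProgram(program);
// The shader reads rectangle.xy and rectangle.zw as opposite corners,
// so a 32x32 quad at (10, 10) is passed as (10, 10, 42, 42).
glUniform4f(glGetUniformLocation(program, "rectangle"),
            10.0f, 10.0f, 42.0f, 42.0f);
glBindVertexArray(quadVAO);
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);  // corner indices 0..3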
I ended up doing this in the vertex shader. I passed the vec4 in as a uniform, along with the size of the image, and used the calculation below:
// convert pixel coordinates to vertex coordinates
float widthPixel = 1.0f / u_imageSize.x;
float heightPixel = 1.0f / u_imageSize.y;
float startX = u_sourceRect.x, startY = u_sourceRect.y, width = u_sourceRect.z, height = u_sourceRect.w;
v_texCoords = vec2(widthPixel * startX + width * widthPixel * texPos.x, heightPixel * startY + height * heightPixel * texPos.y);
v_texCoords is a varying that the fragment shader uses to map the texture.
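For illustration, the host side might set those uniforms like this (the uniform names match the snippet above; program and the actual sizes are assumptions):

glUseProgram(program);
glUniform2f(glGetUniformLocation(program, "u_imageSize"), 256.0f, 256.0f);
// source rectangle: x = 10, y = 10, width = 32, height = 32
glUniform4f(glGetUniformLocation(program, "u_sourceRect"),
            10.0f, 10.0f, 32.0f, 32.0f);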
I'm trying to develop a map for a 2D tile-based game. The approach I'm using is to store the map images in a large texture (tileset) and draw only the desired tiles on screen, updating their positions through the vertex shader. However, even a 10x10 map involves 100 glDrawArrays calls; looking at the task manager, this consumes about 5% CPU and 4-5% GPU. Imagine a complete game with dozens more calls. Is there a way to optimize this, such as preparing the whole scene and making just one draw call, drawing everything at once, or some other approach?
void GameMap::draw() {
    m_shader->use();
    m_texture->bind();
    glBindVertexArray(m_quadVAO);
    for (size_t r = 0; r < 10; r++) {
        for (size_t c = 0; c < 10; c++) {
            m_tileCoord->setX(c * m_tileHeight);
            m_tileCoord->setY(r * m_tileHeight);
            m_tileCoord->convert2DToIso();
            drawTile(0);
        }
    }
    glBindVertexArray(0);
}
void GameMap::drawTile(GLint index) {
    glm::mat4 position_coord = glm::mat4(1.0f);
    glm::mat4 texture_coord = glm::mat4(1.0f);
    m_srcX = index * m_tileWidth;
    GLfloat clipX = m_srcX / m_texture->m_width;
    GLfloat clipY = m_srcY / m_texture->m_height;
    texture_coord = glm::translate(texture_coord, glm::vec3(glm::vec2(clipX, clipY), 0.0f));
    position_coord = glm::translate(position_coord, glm::vec3(glm::vec2(m_tileCoord->getX(), m_tileCoord->getY()), 0.0f));
    position_coord = glm::scale(position_coord, glm::vec3(glm::vec2(m_tileWidth, m_tileHeight), 1.0f));
    m_shader->setMatrix4("texture_coord", texture_coord);
    m_shader->setMatrix4("position_coord", position_coord);
    glDrawArrays(GL_TRIANGLES, 0, 6);
}
-- Vertex Shader
#version 330 core
layout (location = 0) in vec4 vertex; // <vec2 position, vec2 texCoords>
out vec4 TexCoords;
uniform mat4 texture_coord;
uniform mat4 position_coord;
uniform mat4 projection;
void main()
{
TexCoords = texture_coord * vec4(vertex.z, vertex.w, 1.0, 1.0);
gl_Position = projection * position_coord * vec4(vertex.xy, 0.0, 1.0);
}
-- Fragment Shader
#version 330 core
out vec4 FragColor;
in vec4 TexCoords;
uniform sampler2D image;
uniform vec4 spriteColor;
void main()
{
FragColor = vec4(spriteColor) * texture(image, vec2(TexCoords.x, TexCoords.y));
}
The Basic Technique
The first thing you want to do is set up your 10x10 grid vertex buffer. Each square in the grid is actually two triangles, and all the triangles need their own vertices: even where adjacent tiles share XY coordinates, their UV coordinates differ. This way each triangle can pull exactly the area of the texture atlas it needs, and that area doesn't have to be contiguous in UV space.
Here's how the vertices of two adjacent quads in the grid will be set up:
1: xy=(0,0) uv=(Left0 ,Top0)
2: xy=(1,0) uv=(Right0,Top0)
3: xy=(1,1) uv=(Right0,Bottom0)
4: xy=(1,1) uv=(Right0,Bottom0)
5: xy=(0,1) uv=(Left0 ,Bottom0)
6: xy=(0,0) uv=(Left0 ,Top0)
7: xy=(1,0) uv=(Left1 ,Top1)
8: xy=(2,0) uv=(Right1,Top1)
9: xy=(2,1) uv=(Right1,Bottom1)
10: xy=(2,1) uv=(Right1,Bottom1)
11: xy=(1,1) uv=(Left1 ,Bottom1)
12: xy=(1,0) uv=(Left1 ,Top1)
These 12 vertices define 4 triangles. The Top, Left, Bottom, Right UV coordinates of the first square can be completely different from those of the second square, so each square can be textured by a different area of the texture atlas.
In your case, the 10x10 grid gives 100 quads, or 200 triangles. At 3 vertices each, that is 600 vertices to define, but it is still a single draw call of 200 triangles. Each vertex has its own x, y, u, v coordinates. To change which tile a quad shows, you update the UV coordinates of its 6 vertices in the vertex buffer.
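A rough C++ sketch of building that buffer; tileUV and tiles are assumed helpers holding/returning the atlas coordinates, and the structured binding needs C++17:

#include <vector>

struct Vertex { float x, y, u, v; };

std::vector<Vertex> verts;
verts.reserve(10 * 10 * 6);
for (int r = 0; r < 10; ++r) {
    for (int c = 0; c < 10; ++c) {
        // tileUV returns {left, top, right, bottom} of this tile's atlas area
        auto [l, t, rgt, b] = tileUV(tiles[r][c]);
        float x0 = (float)c,     y0 = (float)r;
        float x1 = (float)c + 1, y1 = (float)r + 1;
        verts.push_back({x0, y0, l,   t});
        verts.push_back({x1, y0, rgt, t});
        verts.push_back({x1, y1, rgt, b});
        verts.push_back({x1, y1, rgt, b});
        verts.push_back({x0, y1, l,   b});
        verts.push_back({x0, y0, l,   t});
    }
}
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(Vertex),
             verts.data(), GL_STATIC_DRAW);
// later, the whole map in one call:
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)verts.size());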
You will likely find that this is the most convenient and efficient approach.
Advanced Approaches
There are more memory-efficient and more convenient ways to set this up, using multiple vertex streams to reduce duplication and shaders to do the setup work, if you're willing to trade computation time for memory or convenience. Find the balance that is right for you, but grasp the basic technique before trying to optimize.
In the multiple-stream family of approaches, you could:
- Specify all the xy vertices separately from all the uv vertices, to avoid duplication.
- Specify a second set of texture coordinates holding just the top-left corner of the tile in the atlas, let the per-quad uv coordinates run from (0,0) (top left) to (1,1) (bottom right), and have the shader scale and offset them into final texture coordinates.
- Specify a single uv coordinate (the top-left corner of the source area) per primitive and let a geometry shader complete the squares.
- Smarter still, specify only the x,y coordinates (omitting uv entirely) and, in the vertex shader, sample a texture containing the "tile number" of each quad: sample it at coordinates based on the grid x,y, then transform the value you read into atlas uv coordinates. To change a tile in this system, you change one pixel in the tile-map texture.
- Finally, skip generating the primitives entirely: from a single input list, let a geometry shader generate the grid's x,y positions and uv coordinates and send the completed triangles downstream. This is the most memory-efficient, but relies on the GPU to compute the setup at runtime.
With a static six-vertices-per-quad setup, you free up GPU processing at the cost of a little extra memory. Depending on your performance needs, you may find that spending more memory to get higher fps is worth it; vertex buffers are tiny compared to textures anyway.
So, as I said, start with the basic technique first; it's likely also the optimal solution for performance, especially if your map doesn't change very often.
You can upload all parameters to GPU memory and draw everything using only one draw call.
That way you don't have to update vertex shader uniforms between draws, and CPU load should be close to zero.
It's been 3 years since I used OpenGL, so I can only point you in the right direction.
Start with some reading material, for instance:
https://ferransole.wordpress.com/2014/07/09/multidrawindirect/
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glDrawArraysIndirect.xhtml
Also, keep in mind this is GL 4.x functionality; check the GL version support of your target platform (software + hardware).
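To make the linked technique concrete, here's a hedged sketch of the indirect command setup (the struct layout is the one glMultiDrawArraysIndirect expects; buffer names and the per-tile vertex layout are assumptions):

#include <vector>

struct DrawArraysIndirectCommand {
    GLuint count;          // vertices in this draw (6 per tile)
    GLuint instanceCount;  // 1
    GLuint first;          // first vertex of the tile in the VBO
    GLuint baseInstance;   // readable as gl_BaseInstance (GL 4.6 /
                           // ARB_shader_draw_parameters) to index tile data
};

std::vector<DrawArraysIndirectCommand> cmds(tileCount);
for (GLuint i = 0; i < tileCount; ++i)
    cmds[i] = {6, 1, i * 6, i};

glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
glBufferData(GL_DRAW_INDIRECT_BUFFER, cmds.size() * sizeof(cmds[0]),
             cmds.data(), GL_STATIC_DRAW);
// One call issues all tileCount draws.
glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, (GLsizei)cmds.size(), 0);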
I'm making a 2D game with large-pixel graphics. To achieve this effect, I render all images to a framebuffer whose texture is half the size of my window, and then render that texture to the window using a quad ({{-1,-1},{1,-1},{1,1},{-1,1}}).
This works fine, but the coordinate system when rendering to the texture is a bit strange. For example, when I use
glBegin(GL_POINTS);
glVertex2f(-0.75, -0.75);
glEnd();
it renders a 2x2 point. I would expect this point to be at (win_w * 1/8, win_h * 7/8), but it is at (win_w * 1/4, win_h * 3/4).
If I change the framebuffer texture size from ((win_w + 1) / 2, (win_h + 1) / 2) (half my screen size)
to ((win_w + 3) / 4, (win_h + 3) / 4) (a quarter of my screen size), the point becomes 4x4 and sits at (win_w * 1/2, win_h * 1/2) (the center of the window).
I think this is incorrect. AFAIK, the framebuffer coordinate system does not depend on the framebuffer texture size; (1,1) is the top-right corner at any texture size, right?
There are no transformation matrices or anything like that, so OpenGL should not be transforming my coordinates.
I can still render with this strange coordinate system, but I don't understand why it works this way.
So the question is: I want to render vertices at the same place inside the window for any framebuffer texture size. Is that possible? (I don't want to use transformation matrices inside the shaders, because it should work without them; I hope there is another solution.)
Shaders:
// Vertex:
#version 430
in layout(location = 0) vec2 pos;
out vec2 vPos;
void main()
{
vPos = pos;
gl_Position = vec4(pos.x, pos.y, 0, 1);
}
// Fragment:
#version 430
uniform layout(location = 0) sampler2D tex;
in vec2 vPos;
out vec4 color;
void main()
{
color = texture(tex, (vPos + 1) / 2);
}
Problem solved (thanks to @RetoKoradi): the viewport was still set to the window size while I was rendering into the smaller texture, and the viewport is what maps NDC onto pixels, so it has to match the size of the currently bound framebuffer. Now my code looks like this:
glViewport(0, 0, 800, 600);
/// Switch shaders and framebuffer
DrawQuadWithTexture();
glViewport(0, 0, 400, 300);
/// Switch shaders and framebuffer
DrawAllStuff();
I'm trying to render to a texture with OpenGL + GLSL shaders. To start, I'm trying to fill every pixel of a 30x30 texture with white. I'm passing the vertex shader an index from 0 to 899, representing each pixel of the texture. Is this correct?
Vertex shader:
in int index;
void main(void) {
    // do the math in float: with integer division, (index % 30) / 15
    // would collapse to just 0 or 1 and break the mapping
    gl_Position = vec4(float(index % 30) / 15.0 - 1.0,
                       float(index / 30) / 15.0 - 1.0, 0, 1);
}
Fragment shader:
out vec4 color;
void main(void) {
color = vec4(1, 1, 1, 1);
}
You are trying to render 900 vertices, one per pixel? Why are you doing that? What primitive type are you using? It would only make sense with points, and even then you would need a slight modification of the output coordinates to actually hit the fragment centers.
The usual way to do this is to render just a quad (easily represented as a triangle strip with just 4 vertices) that fills the whole framebuffer. To achieve this, set the viewport to the full framebuffer and render a quad from (-1,-1) to (1,1).
Note that in both approaches you don't need vertex attributes at all. You can just use gl_VertexID (directly as a replacement for index in your approach, or as an index into a const array of 4 vertex coordinates for the quad).
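A minimal sketch of that attribute-less quad, assuming a 30x30 render target; the object names (fbo, emptyVAO) are placeholders, not the asker's code:

// Vertex shader: derive the corner from gl_VertexID, no attributes needed.
const char* vs = R"(
    #version 330 core
    void main() {
        // IDs 0..3 -> (-1,-1), (1,-1), (-1,1), (1,1): a triangle-strip quad
        vec2 p = vec2((gl_VertexID & 1) * 2 - 1, (gl_VertexID >> 1) * 2 - 1);
        gl_Position = vec4(p, 0.0, 1.0);
    }
)";
// ... compile/link with the white fragment shader from the question ...
glBindFramebuffer(GL_FRAMEBUFFER, fbo);  // the 30x30 texture is attached here
glViewport(0, 0, 30, 30);                // viewport must match the texture
glBindVertexArray(emptyVAO);             // core profile still needs a bound VAO
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);   // fills every pixel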