I'm a complete beginner to OpenGL programming and am trying to follow the Breakout tutorial at learnopengl.com but would like to draw the ball as an actual circle, instead of using a textured quad like Joey suggests. However, every result that Google throws back at me for "draw circle opengl 3.3" or similar phrases seems to be at least a few years old, and using even-older-than-that versions of the API :-(
The closest thing that I've found is this SO question, but of course the OP just had to use a custom VertexFormat object to abstract some of the details, without sharing his/her implementation of such! Just my luck! :P
There's also this YouTube tutorial that uses a seemingly-older version of the API, but copying the code verbatim (except for the last few lines which is where the code looks old) still got me nowhere.
My version of SpriteRenderer::initRenderData() from the tutorial:
void SpriteRenderer::initRenderData() {
GLuint vbo;
auto attribSize = 0;
GLfloat* vertices = nullptr;
// Determine whether this sprite is a circle or
// quad and setup the vertices array accordingly
if (!this->isCircle) {
attribSize = 4;
vertices = new GLfloat[24] {...} // works for rendering quads
} else {
// This code is adapted from the YouTube tutorial that I linked
// above and is where things go pear-shaped for me...or at least
// **not** circle-shaped :P
attribSize = 3;
GLfloat x = 0.0f;
GLfloat y = 0.0f;
GLfloat z = 0.0f;
GLfloat r = 100.0f;
GLint numSides = 6;
GLint numVertices = numSides + 2;
GLfloat* xCoords = new GLfloat[numVertices];
GLfloat* yCoords = new GLfloat[numVertices];
GLfloat* zCoords = new GLfloat[numVertices];
xCoords[0] = x;
yCoords[0] = y;
zCoords[0] = z;
for (auto i = 1; i < numVertices; i++) {
xCoords[i] = x + (r * cos(i * (M_PI * 2.0f) / numSides));
yCoords[i] = y + (r * sin(i * (M_PI * 2.0f) / numSides));
zCoords[i] = z;
}
vertices = new GLfloat[numVertices * 3];
for (auto i = 0; i < numVertices; i++) {
vertices[i * 3] = xCoords[i];
vertices[i * 3 + 1] = yCoords[i];
vertices[i * 3 + 2] = zCoords[i];
}
}
// This is where I go back to the learnopengl.com code. Once
// again, the following works for quads but not circles!
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, 24 * sizeof(
GLfloat), vertices, GL_STATIC_DRAW);
glBindVertexArray(vao);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, attribSize, GL_FLOAT, GL_FALSE,
attribSize * sizeof(GLfloat), (GLvoid*)0);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
}
And here's the SpriteRenderer::DrawSprite() method (the only difference from the original being lines 24 - 28):
void SpriteRenderer::Draw(vec2 position, vec2 size, GLfloat rotation, vec3 colour) {
// Prepare transformations
shader.Use();
auto model = mat4(1.0f);
model = translate(model, vec3(position, 0.0f));
model = translate(model, vec3(0.5f * size.x, 0.5f * size.y, 0.0f)); // Move origin of rotation to center
model = rotate(model, rotation, vec3(0.0f, 0.0f, 1.0f)); // Rotate quad
model = translate(model, vec3(-0.5f * size.x, -0.5f * size.y, 0.0f)); // Move origin back
model = scale(model, vec3(size, 1.0f)); // Lastly, scale
shader.SetMatrix4("model", model);
// Render textured quad
shader.SetVector3f("spriteColour", colour);
glActiveTexture(GL_TEXTURE0);
texture.Bind();
glBindVertexArray(vao);
if (!isCircular) {
glDrawArrays(GL_TRIANGLES, 0, 6);
} else {
glDrawArrays(GL_TRIANGLE_FAN, 0, 24); // also tried "12" and "8" for the last param, to no avail
}
glBindVertexArray(0);
}
And finally, the shaders (different to the ones used for quads):
// Vertex shader
#version 330 core
layout (location = 0) in vec3 position;
uniform mat4 model;
uniform mat4 projection;
void main() {
gl_Position = projection * model *
vec4(position.xyz, 1.0f);
}
// Fragment shader
#version 330 core
out vec4 colour;
uniform vec3 spriteColour;
void main() {
colour = vec4(spriteColour, 1.0);
}
P.S. I know I could just use a quad but I'm trying to learn how to draw all primitives in OpenGL, not just quads and triangles (thanks anyway Joey)!
P.P.S I just realised that the learnopengl.com site has a whole section devoted to debugging OpenGL apps, so I set that up but to no avail :-( I don't think the error handling is supported by my driver (Intel UHD Graphics 620 latest driver) since the GL_CONTEXT_FLAG_DEBUG_BIT was not set after following the instructions:
Requesting a debug context in GLFW is surprisingly easy as all we have to do is pass a hint to GLFW that we'd like to have a debug output context. We have to do this before we call glfwCreateWindow:
glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, GL_TRUE);
Once we initialize GLFW we should have a debug context if we're using OpenGL version 4.3 or higher, or else we have to take our chances and hope the system is still able to request a debug context. Otherwise we have to request debug output using its OpenGL extension(s).
To check if we successfully initialized a debug context we can query OpenGL:
GLint flags; glGetIntegerv(GL_CONTEXT_FLAGS, &flags);
if (flags & GL_CONTEXT_FLAG_DEBUG_BIT) {
// initialize debug output
}
That if statement is never entered into!
Thanks to @Mykola's answer to this question I have gotten half-way there:
numVertices = 43;
vertices = new GLfloat[numVertices];
auto i = 2;
auto x = 0.0f,
y = x,
z = x,
r = 0.3f;
auto numSides = 21;
auto TWO_PI = 2.0f * M_PI;
auto increment = TWO_PI / numSides;
for (auto angle = 0.0f; angle <= TWO_PI; angle += increment) {
vertices[i++] = r * cos(angle) + x;
vertices[i++] = r * sin(angle) + y;
}
Which gives me this: [image]
Two questions I still have:
Why is there an extra line going from the centre to the right side and how can I fix it?
According to @user1118321's comment on a related SO answer, I should be able to prepend another vertex to the array at (0, 0) and use GL_TRIANGLE_FAN instead of GL_LINE_LOOP to get a coloured circle. But this results in no output for me :-( Why?
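As a rough illustration of the GL_TRIANGLE_FAN idea, here is a minimal sketch of how the vertex data could be laid out, assuming two floats (x, y) per vertex, so the attribute size passed to glVertexAttribPointer would be 2. The helper name `makeCircleFan` is hypothetical and not part of the tutorial. It also hints at a probable cause of the extra line: the snippet above starts writing at index 2, so the first vertex slot is never filled in, and if that memory happens to hold zeros, GL_LINE_LOOP connects a vertex at the centre to the rim.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the tutorial): builds x,y pairs for a
// circle intended to be drawn with GL_TRIANGLE_FAN.
std::vector<float> makeCircleFan(float cx, float cy, float r, int numSides) {
    assert(numSides >= 3);
    const float twoPi = 6.28318530718f;
    std::vector<float> v;
    v.reserve(static_cast<std::size_t>(numSides + 2) * 2);
    // Vertex 0: the centre of the fan, written explicitly.
    v.push_back(cx);
    v.push_back(cy);
    // Rim vertices. An integer counter avoids float drift in the loop bound;
    // i == numSides repeats the first rim vertex so the fan closes.
    for (int i = 0; i <= numSides; ++i) {
        const float angle =
            twoPi * static_cast<float>(i) / static_cast<float>(numSides);
        v.push_back(cx + r * std::cos(angle));
        v.push_back(cy + r * std::sin(angle));
    }
    return v;
}
```

With this layout the upload size would be `v.size() * sizeof(float)` rather than a fixed `24 * sizeof(GLfloat)`, and the draw call would be `glDrawArrays(GL_TRIANGLE_FAN, 0, numSides + 2)`, because the count argument is a number of vertices, not a number of floats.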
I'm working on a particle system and want to use SSBOs to update the velocity and position of my particles with a compute shader. However, on every update call the compute shader seems to read the same position values, even though it does update the positions, because the particles move when they are drawn.
Load particles into SSBOs
// Load Positions
glGenBuffers(1, &m_SSBOpos);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, m_SSBOpos);
// Allocate video memory
glBufferData(GL_SHADER_STORAGE_BUFFER, pb.size() * 4 * sizeof(float), NULL, GL_STATIC_DRAW);
GLint bufMask = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT; // the invalidate makes a big difference when re-writing
float *points = (float *) glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, pb.size() * 4 * sizeof(float), bufMask);
for (int i = 0; i < pb.size(); i++)
{
points[i * 4] = pb.at(i).m_Position.x;
points[i * 4 + 1] = pb.at(i).m_Position.y;
points[i * 4 + 2] = pb.at(i).m_Position.z;
points[i * 4 + 3] = 0;
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
// Load velocities
glGenBuffers(1, &m_SSBOvel);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, m_SSBOvel);
// Allocate video memory
glBufferData(GL_SHADER_STORAGE_BUFFER, pb.size() * 4 * sizeof(float), NULL, GL_STATIC_DRAW);
float *vels = (float *)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, pb.size() * 4 * sizeof(float), bufMask);
for (int i = 0; i < pb.size(); i++)
{
vels[i * 4] = pb.at(i).m_Velocity.x;
vels[i * 4 + 1] = pb.at(i).m_Velocity.y;
vels[i * 4 + 2] = pb.at(i).m_Velocity.z;
vels[i * 4 + 3] = 0;
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
Update
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 4, shaderUtil.getSSBOpos());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 5, shaderUtil.getSSBOvel());
// Update the particles
shaderUtil.UseCompute();
glUniform1i(shaderUtil.getDT(), fDeltaTime);
glDispatchCompute(NUM_PARTICLES / WORK_GROUP_SIZE, 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
shaderUtil.DeleteCompute();
Draw
shaderUtil.Use();
glUniformMatrix4fv(glGetUniformLocation(shaderUtil.getProgramID(), "projection"), 1, GL_FALSE, glm::value_ptr(projection));
glUniformMatrix4fv(glGetUniformLocation(shaderUtil.getProgramID(), "modelview"), 1, GL_FALSE, glm::value_ptr(View * Model));
glPointSize(10);
// Render
glBindBuffer(GL_ARRAY_BUFFER, shaderUtil.getSSBOpos());
glVertexPointer(4, GL_FLOAT, 0, (void *)0);
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_POINTS, 0, NUM_PARTICLES);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
shaderUtil.Delete();
Compute shader
#version 430 compatibility
#extension GL_ARB_compute_shader : enable
#extension GL_ARB_shader_storage_buffer_object : enable
layout(std140, binding = 4) buffer Pos
{
vec4 Positions[]; // array of structures
};
layout(std140, binding = 5) buffer Vel
{
vec4 Velocities[]; // array of structures
};
uniform float dt;
layout(local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
void main()
{
uint numParticule = gl_GlobalInvocationID.x;
vec4 v = Velocities[numParticule];
vec4 p = Positions[numParticule];
vec4 tmp = vec4(0, -9.81, 0,0) + v * (0.001 / (7. / 1000.));
v += tmp ;
Velocities[numParticule] = v;
p += v ;
Positions[numParticule] = p;
}
Do you know why this happens?
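One way to narrow this down might be to read the position buffer back after a dispatch (for example with glGetBufferSubData or by mapping the buffer) and compare it with a CPU reference of the same step. The sketch below mirrors the arithmetic in the compute shader above for a single particle; `Vec4`, `stepParticle` and `closeTo` are hypothetical names that do not appear in the original code.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };

// Mirrors main() of the compute shader for one particle:
//   tmp = (0, -9.81, 0, 0) + v * (0.001 / (7.0 / 1000.0));  v += tmp;  p += v;
void stepParticle(Vec4& p, Vec4& v) {
    const float k = 0.001f / (7.0f / 1000.0f);
    const Vec4 tmp = { v.x * k, -9.81f + v.y * k, v.z * k, v.w * k };
    v.x += tmp.x; v.y += tmp.y; v.z += tmp.z; v.w += tmp.w;
    p.x += v.x;   p.y += v.y;   p.z += v.z;   p.w += v.w;
}

// True when a value read back from the GPU matches the CPU reference
// component-wise within eps.
bool closeTo(const Vec4& a, const Vec4& b, float eps) {
    assert(eps >= 0.0f);
    return std::fabs(a.x - b.x) <= eps && std::fabs(a.y - b.y) <= eps &&
           std::fabs(a.z - b.z) <= eps && std::fabs(a.w - b.w) <= eps;
}
```

If the values read back after two dispatches match two applications of `stepParticle`, the shader is reading the updated positions. Separately, `dt` is declared as a `float` uniform in the shader, so it would normally be set with glUniform1f rather than glUniform1i.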
I have written a class that renders 2D objects, based on Dear ImGui's DrawList, because that design can draw many different kinds of objects thanks to its dynamically sized vertex and index arrays while staying well optimized. Dear ImGui can render 30k unfilled rects at ~36 fps using ~70 MB in debug mode, without antialiasing (on my computer). My very limited version draws 30k unfilled rects at ~3 fps using ~130 MB in debug mode.
class Renderer
{
public:
Renderer();
~Renderer();
void Create();
void DrawRect(float x, float y, float w, float h, GLuint color, float thickness);
void Render(float w, float h);
void Clear();
void ReserveData(int numVertices, int numElements);
void CreatePolygon(const Vector2* vertices, const GLuint verticesCount, GLuint color, float thickness);
GLuint vao, vbo, ebo;
GLShader shader;
Vertex* mappedVertex = nullptr;
GLuint* mappedElement = nullptr,
currentVertexIndex = 0;
std::vector<Vertex> vertexBuffer;
std::vector<GLuint> elementBuffer;
std::vector<Vector2> vertices;
};
const char* vtx =
R"(
#version 460 core
layout(location = 0) in vec3 a_position;
layout(location = 1) in vec4 a_color;
out vec3 v_position;
out vec4 v_color;
uniform mat4 projection;
void main()
{
gl_Position = projection * vec4(a_position, 1.0);
v_color = a_color;
}
)";
const char* frag =
R"(
#version 460 core
layout (location = 0) out vec4 outColor;
in vec4 v_color;
void main()
{
outColor = v_color;
}
)";
void Renderer::Clear()
{
vertexBuffer.resize(0);
elementBuffer.resize(0);
vertices.resize(0);
mappedVertex = nullptr;
mappedElement = nullptr;
currentVertexIndex = 0;
}
void Renderer::Create()
{
glGenBuffers(1, &vbo);
glGenBuffers(1, &ebo);
shader.VtxFromFile(vtx);
shader.FragFromFile(frag);
}
void Renderer::DrawRect(float x, float y, float w, float h, GLuint color, float thickness)
{
// Add vertices
vertices.push_back({ x, y });
vertices.push_back(Vector2(x, y + h));
vertices.push_back(Vector2( x, y ) + Vector2(w, h));
vertices.push_back(Vector2(x + w, y));
// Create rect
CreatePolygon(vertices.data(), vertices.size(), color, thickness);
}
void Renderer::Render(float w, float h)
{
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
shader.UseProgram();
shader.UniformMatrix4fv("projection", glm::ortho(0.0f, w, 0.0f, h));
GLuint elemCount = elementBuffer.size();
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const void*)offsetof(Vertex, position));
glVertexAttribPointer(1, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(Vertex), (const void*)offsetof(Vertex, color));
glBufferData(GL_ARRAY_BUFFER, vertexBuffer.size() * sizeof(Vertex), vertexBuffer.data(), GL_STREAM_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, elementBuffer.size() * sizeof(GLuint), elementBuffer.data(), GL_STREAM_DRAW);
const unsigned short* idxBufferOffset = 0;
glDrawElements(GL_TRIANGLES, elemCount, GL_UNSIGNED_INT, idxBufferOffset);
idxBufferOffset += elemCount;
glDeleteVertexArrays(1, &vao);
glDisable(GL_BLEND);
}
void Renderer::CreatePolygon(const Vector2* vertices, const GLuint verticesCount, GLuint color, float thickness)
{
// To create for example unfilled rect, we have to draw 4 rects with small sizes
// So, unfilled rect is built from 4 rects and each rect contains 4 vertices ( * 4) and 6 indices ( *6)
ReserveData(verticesCount * 4, verticesCount * 6);
for (GLuint i = 0; i < verticesCount; ++i)
{
const int j = (i + 1) == verticesCount ? 0 : i + 1;
const Vector2& position1 = vertices[i];
const Vector2& position2 = vertices[j];
Vector2 difference = position2 - position1;
difference *= difference.Magnitude() > 0 ? 1.0f / difference.Magnitude() : 1.0f;
const float dx = difference.x * (thickness * 0.5f);
const float dy = difference.y * (thickness * 0.5f);
mappedVertex[0].position = Vector2(position1.x + dy, position1.y - dx);
mappedVertex[1].position = Vector2(position2.x + dy, position2.y - dx);
mappedVertex[2].position = Vector2(position2.x - dy, position2.y + dx);
mappedVertex[3].position = Vector2(position1.x - dy, position1.y + dx);
mappedVertex[0].color = color;
mappedVertex[1].color = color;
mappedVertex[2].color = color;
mappedVertex[3].color = color;
mappedVertex += 4;
mappedElement[0] = currentVertexIndex;
mappedElement[1] = currentVertexIndex + 1;
mappedElement[2] = currentVertexIndex + 2;
mappedElement[3] = currentVertexIndex + 2;
mappedElement[4] = currentVertexIndex + 3;
mappedElement[5] = currentVertexIndex;
mappedElement += 6;
currentVertexIndex += 4;
}
this->vertices.clear();
}
void Renderer::ReserveData(int numVertices, int numElements)
{
currentVertexIndex = vertexBuffer.size();
// Map vertex buffer
int oldVertexSize = vertexBuffer.size();
vertexBuffer.resize(oldVertexSize + numVertices);
mappedVertex = vertexBuffer.data() + oldVertexSize;
// Map element buffer
int oldIndexSize = elementBuffer.size();
elementBuffer.resize(oldIndexSize + numElements);
mappedElement = elementBuffer.data() + oldIndexSize;
}
int main()
{
//Create window, init opengl, etc.
Renderer renderer;
renderer.Create();
bool quit=false;
while(!quit) {
//Events
//Clear color bit
renderer.Clear();
for(int i = 0; i < 30000; ++i)
renderer.DrawRect(100.0f, 100.0f, 50.0f, 50.0f, 0xffff0000, 1.5f);
renderer.Render(windowW, windowH);
//swap buffers
}
return 0;
}
Why is it that much slower?
How can I make it faster and less memory-consuming?
The biggest bottleneck in that code looks to be that your allocations are never amortized across frames: you are clearing the buffers' capacity instead of reusing it, which leads to lots of reallocs/copies (probably log2(n) of them if your vector implementation grows by a factor of 2). Try replacing your .clear() call with .resize(0), and perhaps make a lazier, rarer call to .clear() when the buffers go unused.
In debug or in release mode? Vectors are terribly slow in debug due to memory checking. Profiling should always be done in Release.
Profiling should be done both in Release and in Debug/Unoptimized mode if you ever intend to use and work with your application in Debug/Unoptimized mode. The gross "zero-cost abstraction" lie of modern C++ is that it makes it a pain to work with a debugger, because large applications no longer run at a correct frame rate in "Debug" mode. Ideally you should always run all your applications in Debug mode. Do yourself a productivity favour and ALSO do some profiling/optimization for your worst case.
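The point about amortizing allocations across frames can be checked with a small experiment. The sketch below, using a hypothetical helper `countReallocations`, counts how often a std::vector's storage grows when a buffer is refilled every frame, once keeping its capacity and once giving the storage back each frame. With std::vector, both clear() and resize(0) leave the capacity in place; other containers may behave differently, so this is worth verifying for whichever one is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the vector's storage grows while `frames` frames each
// append `perFrame` elements. `releaseEachFrame` switches between keeping
// the capacity (resize(0)) and giving the storage back (swap with an empty
// vector) at the start of every frame.
std::size_t countReallocations(int frames, int perFrame, bool releaseEachFrame) {
    assert(frames >= 0 && perFrame >= 0);
    std::vector<int> buffer;
    std::size_t reallocations = 0;
    for (int f = 0; f < frames; ++f) {
        if (releaseEachFrame) {
            std::vector<int>().swap(buffer);  // drops size and capacity
        } else {
            buffer.resize(0);                 // drops size, keeps capacity
        }
        for (int i = 0; i < perFrame; ++i) {
            const std::size_t before = buffer.capacity();
            buffer.push_back(i);
            if (buffer.capacity() != before) ++reallocations;
        }
    }
    return reallocations;
}
```

When the capacity is kept, growth only happens during the first frame; when the storage is released, the same growth sequence repeats every frame.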
Good luck with your learning quest! :)
Solution
I no longer use std::vector; I use ImVector instead (your own implementation would work as well).
I set the position directly via Vector2.x/.y.
EDIT: I'm thinking the problem might be when I'm loading the vertices and indices. Maybe focus on that section :)
I'm trying to load a heightmap from a bmp file and display it in OpenGL. As with most things I try, everything compiles and runs without errors but nothing is drawn on the screen. I can't seem to isolate the issue, since all the code works on its own, but when it is combined to draw terrain, nothing works.
Terrain class
I have a terrain class. It has 2 VBOs, 1 IBO and 1 VAO. It also stores the vertices, indices, colours of the vertices and the heights. It is loaded from a bmp file.
Loading terrain:
Terrain* Terrain::loadTerrain(const std::string& filename, float height)
{
BitMap* bmp = BitMap::load(filename);
Terrain* t = new Terrain(bmp->width, bmp->length);
for(unsigned y = 0; y < bmp->length; y++)
{
for(unsigned x = 0; x < bmp->width; x++)
{
unsigned char color =
(unsigned char)bmp->pixels[3 * (y * bmp->width + x)];
float h = height * ((color / 255.0f) - 0.5f);
t->setHeight(x, y, h);
}
}
delete bmp;
t->initGL();
return t;
}
Initializing the buffers:
void Terrain::initGL()
{
// load vertices from heights data
vertices = new Vector4f[w * l];
int vertIndex = 0;
for(unsigned y = 0; y < l; y++)
{
for(unsigned x = 0; x < w; x++)
{
vertices[vertIndex++] = Vector4f((float)x, (float)y, heights[y][x], 1.0f);
}
}
// generate indices for indexed drawing
indices = new GLshort[(w - 1) * (l - 1) * 6]; // patch count * 6 (2 triangles per terrain patch)
int indicesIndex = 0;
for(unsigned y = 0; y < (l - 1); ++y)
{
for(unsigned x = 0; x < (w - 1); ++x)
{
int start = y * w + x;
indices[indicesIndex++] = (GLshort)start;
indices[indicesIndex++] = (GLshort)(start + 1);
indices[indicesIndex++] = (GLshort)(start + w);
indices[indicesIndex++] = (GLshort)(start + 1);
indices[indicesIndex++] = (GLshort)(start + 1 + w);
indices[indicesIndex++] = (GLshort)(start + w);
}
}
// generate colours for the vertices
colours = new Vector4f[w * l];
for(unsigned i = 0; i < w * l; i++)
{
colours[i] = Vector4f(0.0f, 1.0f, 0.0f, 1.0f); // let's make the entire terrain green
}
// THIS CODE WORKS FOR CUBES (BEGIN)
// vertex buffer object
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
// index buffer object
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
// colours vertex buffer object
glGenBuffers(1, &colour_vbo);
glBindBuffer(GL_ARRAY_BUFFER, colour_vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(colours), colours, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
// create vertex array object
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, colour_vbo);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBindVertexArray(0);
// THIS CODE WORKS FOR CUBES (END)
}
The part where I create the VBOs, IBO and VAO works fine for cubes, they are drawn nicely.
Rendering terrain:
void Terrain::render()
{
glUseProgram(shaderProgram);
glBindVertexArray(vao);
int indices_length = (w - 1) * (l - 1) * 6;
glDrawElements(GL_TRIANGLES, indices_length, GL_UNSIGNED_SHORT, 0);
}
Shaders
These are the vertex and fragment shaders.
Vertex:
#version 330
layout (location = 0) in vec4 position;
layout (location = 1) in vec4 vertexColour;
out vec4 fragmentColour;
uniform vec3 offset;
uniform mat4 perspectiveMatrix;
void main()
{
vec4 cameraPos = position + vec4(offset.x, offset.y, offset.z, 0.0);
gl_Position = perspectiveMatrix * cameraPos;
fragmentColour = vertexColour;
}
Fragment:
#version 330
in vec4 fragmentColour;
out vec4 outputColour;
void main()
{
outputColour = fragmentColour;
}
Perspective matrix
Here are the settings for the "camera":
struct CameraSettings
{
static const float FRUSTUM_SCALE = 1.0f;
static const float Z_NEAR = 0.5f;
static const float Z_FAR = 3.0f;
static float perspective_matrix[16];
};
float CameraSettings::perspective_matrix[16] = {
FRUSTUM_SCALE,
0, 0, 0, 0,
FRUSTUM_SCALE,
0, 0, 0, 0,
(Z_FAR + Z_NEAR) / (Z_NEAR - Z_FAR),
-1.0f,
0, 0,
(2 * Z_FAR * Z_NEAR) / (Z_NEAR - Z_FAR),
0
};
The uniforms get filled in after initGL() is called:
// get offset uniform
offsetUniform = ShaderManager::getUniformLocation(shaderProgram, "offset");
perspectiveMatrixUniform = ShaderManager::getUniformLocation(shaderProgram, "perspectiveMatrix");
// set standard uniform data
glUseProgram(shaderProgram);
glUniform3f(offsetUniform, xOffset, yOffset, zOffset);
glUniformMatrix4fv(perspectiveMatrixUniform, 1, GL_FALSE, CameraSettings::perspective_matrix);
glUseProgram(0);
Could someone check out my code and give suggestions?
I'm pretty sure that when you say :
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
you actually want to say :
glBufferData(GL_ARRAY_BUFFER, sizeof (Vector4f) * w * l, vertices, GL_STATIC_DRAW);
(the same goes for the colour buffer, the index buffer, etc.)
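The reason is that `vertices`, `indices` and `colours` are pointers returned by new[], so sizeof on them yields the size of a pointer (typically 4 or 8 bytes), not the size of the allocation. A small sketch of the difference, where `Vector4f` is a stand-in struct for the question's type and `bufferBytes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the question's Vector4f type (assumed to hold four floats).
struct Vector4f { float x, y, z, w; };

// Byte count to pass to glBufferData for `count` elements of type T.
// The element count has to be supplied by the caller, because a raw
// pointer carries no length information.
template <typename T>
std::size_t bufferBytes(const T* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    return sizeof(T) * count;
}
```

For a terrain of w by l vertices, `bufferBytes(vertices, w * l)` gives the full size of the vertex data, whereas `sizeof(vertices)` would only upload a few bytes.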
I am enrolled in a shaders course and am interested in computer vision and image processing. How can I combine GLSL shader knowledge with image processing? What do I gain if I implement image processing algorithms in GLSL?
Case study: real time box blur on CPU vs GPU fragment shader
I have implemented a simple box blur (https://en.wikipedia.org/wiki/Box_blur) on the CPU and in a GPU fragment shader to see which was faster:
demo video https://www.youtube.com/watch?v=MRhAljmHq-o
source code: https://github.com/cirosantilli/cpp-cheat/blob/cfd18700827dfcff6080a938793b44b790f0b7e7/opengl/glfw_webcam_image_process.c
My camera refresh rate capped FPS to 30, so I measured how wide the box could be and still keep 30 FPS.
On a Lenovo T430 (2012), NVIDIA NVS5400, Ubuntu 16.04 with image dimensions 960x540, the maximum widths were:
GPU: 23
CPU: 5
Since the computation is quadratic in the box width, the speedup was:
(23 / 5)^2 = 21.16
times faster on the GPU than on the CPU!
Not all algorithms are faster on the GPU. For example, operations that act on single pixels, like swapping RGB channels, reach 30 FPS on the CPU, so adding the complexity of GPU programming for them is pointless.
Like any other CPU vs GPU speedup question, it all comes down to whether you have enough work per byte transferred to the GPU, and benchmarking is the best thing you can do. In general, quadratic algorithms or worse are a good bet for the GPU. See also: What do the terms "CPU bound" and "I/O bound" mean?
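To make the quadratic cost concrete, here is a minimal single-channel CPU box blur sketch in C++ (the linked code is C and works on RGB data). The function name `boxBlur` is hypothetical. Samples outside the image count as black, which is an assumption borrowed from the CPU path of the linked code. Each output pixel reads blurWidth * blurWidth inputs, which is why widening the box gets expensive so quickly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Box blur of a row-major, single-channel image. blurWidth must be odd so
// the box is centred on the pixel. Out-of-bounds samples contribute zero.
std::vector<float> boxBlur(const std::vector<float>& src,
                           int width, int height, int blurWidth) {
    assert(width > 0 && height > 0 && blurWidth > 0 && blurWidth % 2 == 1);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int half = blurWidth / 2;
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            // blurWidth * blurWidth reads per output pixel.
            for (int dy = -half; dy <= half; ++dy) {
                for (int dx = -half; dx <= half; ++dx) {
                    const int sy = y + dy;
                    const int sx = x + dx;
                    if (sy >= 0 && sy < height && sx >= 0 && sx < width) {
                        sum += src[static_cast<std::size_t>(sy) * width + sx];
                    }
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] =
                sum / static_cast<float>(blurWidth * blurWidth);
        }
    }
    return dst;
}
```

In a fragment shader the two outer loops disappear, since the GPU runs the inner part once per output pixel in parallel.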
Main part of the code (just clone from GitHub):
#include "common.h"
#include "../v4l2/common_v4l2.h"
static const GLuint WIDTH = 640;
static const GLuint HEIGHT = 480;
static const GLfloat vertices[] = {
/* xy uv */
-1.0, 1.0, 0.0, 1.0,
0.0, 1.0, 0.0, 0.0,
0.0, -1.0, 1.0, 0.0,
-1.0, -1.0, 1.0, 1.0,
};
static const GLuint indices[] = {
0, 1, 2,
0, 2, 3,
};
static const GLchar *vertex_shader_source =
"#version 330 core\n"
"in vec2 coord2d;\n"
"in vec2 vertexUv;\n"
"out vec2 fragmentUv;\n"
"void main() {\n"
" gl_Position = vec4(coord2d, 0, 1);\n"
" fragmentUv = vertexUv;\n"
"}\n";
static const GLchar *fragment_shader_source =
"#version 330 core\n"
"in vec2 fragmentUv;\n"
"out vec3 color;\n"
"uniform sampler2D myTextureSampler;\n"
"void main() {\n"
" color = texture(myTextureSampler, fragmentUv.yx).rgb;\n"
"}\n";
static const GLchar *vertex_shader_source2 =
"#version 330 core\n"
"in vec2 coord2d;\n"
"in vec2 vertexUv;\n"
"out vec2 fragmentUv;\n"
"void main() {\n"
" gl_Position = vec4(coord2d + vec2(1.0, 0.0), 0, 1);\n"
" fragmentUv = vertexUv;\n"
"}\n";
static const GLchar *fragment_shader_source2 =
"#version 330 core\n"
"in vec2 fragmentUv;\n"
"out vec3 color;\n"
"uniform sampler2D myTextureSampler;\n"
"// pixel Delta. How large a pixel is in 0.0 to 1.0 that textures use.\n"
"uniform vec2 pixD;\n"
"void main() {\n"
/*"// Identity\n"*/
/*" color = texture(myTextureSampler, fragmentUv.yx ).rgb;\n"*/
/*"// Inverter\n"*/
/*" color = 1.0 - texture(myTextureSampler, fragmentUv.yx ).rgb;\n"*/
/*"// Swapper\n"*/
/*" color = texture(myTextureSampler, fragmentUv.yx ).gbr;\n"*/
/*"// Double vision ortho.\n"*/
/*" color = ("*/
/*" texture(myTextureSampler, fragmentUv.yx ).rgb +\n"*/
/*" texture(myTextureSampler, fragmentUv.xy ).rgb\n"*/
/*" ) / 2.0;\n"*/
/*"// Multi-me.\n"*/
/*" color = texture(myTextureSampler, 4.0 * fragmentUv.yx ).rgb;\n"*/
/*"// Horizontal linear blur.\n"*/
/*" int blur_width = 21;\n"*/
/*" int blur_width_half = blur_width / 2;\n"*/
/*" color = vec3(0.0, 0.0, 0.0);\n"*/
/*" for (int i = -blur_width_half; i <= blur_width_half; ++i) {\n"*/
/*" color += texture(myTextureSampler, vec2(fragmentUv.y + i * pixD.x, fragmentUv.x)).rgb;\n"*/
/*" }\n"*/
/*" color /= blur_width;\n"*/
/*"// Square linear blur.\n"*/
" int blur_width = 23;\n"
" int blur_width_half = blur_width / 2;\n"
" color = vec3(0.0, 0.0, 0.0);\n"
" for (int i = -blur_width_half; i <= blur_width_half; ++i) {\n"
" for (int j = -blur_width_half; j <= blur_width_half; ++j) {\n"
" color += texture(\n"
" myTextureSampler, fragmentUv.yx + ivec2(i, j) * pixD\n"
" ).rgb;\n"
" }\n"
" }\n"
" color /= (blur_width * blur_width);\n"
"}\n";
int main(int argc, char **argv) {
CommonV4l2 common_v4l2;
GLFWwindow *window;
GLint
coord2d_location,
myTextureSampler_location,
vertexUv_location,
coord2d_location2,
pixD_location2,
myTextureSampler_location2,
vertexUv_location2
;
GLuint
ebo,
program,
program2,
texture,
vbo,
vao,
vao2
;
unsigned int
cpu,
width,
height
;
uint8_t *image;
float *image2 = NULL;
/*uint8_t *image2 = NULL;*/
if (argc > 1) {
width = strtol(argv[1], NULL, 10);
} else {
width = WIDTH;
}
if (argc > 2) {
height = strtol(argv[2], NULL, 10);
} else {
height = HEIGHT;
}
if (argc > 3) {
cpu = (argv[3][0] == '1');
} else {
cpu = 0;
}
/* Window system. */
glfwInit();
glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);
window = glfwCreateWindow(2 * width, height, __FILE__, NULL, NULL);
glfwMakeContextCurrent(window);
glewInit();
CommonV4l2_init(&common_v4l2, COMMON_V4L2_DEVICE, width, height);
/* Shader setup. */
program = common_get_shader_program(vertex_shader_source, fragment_shader_source);
coord2d_location = glGetAttribLocation(program, "coord2d");
vertexUv_location = glGetAttribLocation(program, "vertexUv");
myTextureSampler_location = glGetUniformLocation(program, "myTextureSampler");
/* Shader setup 2. */
const GLchar *fs;
if (cpu) {
fs = fragment_shader_source;
} else {
fs = fragment_shader_source2;
}
program2 = common_get_shader_program(vertex_shader_source2, fs);
coord2d_location2 = glGetAttribLocation(program2, "coord2d");
vertexUv_location2 = glGetAttribLocation(program2, "vertexUv");
myTextureSampler_location2 = glGetUniformLocation(program2, "myTextureSampler");
pixD_location2 = glGetUniformLocation(program2, "pixD");
/* Create vbo. */
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
/* Create ebo. */
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
/* vao. */
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(coord2d_location, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(vertices[0]), (GLvoid*)0);
glEnableVertexAttribArray(coord2d_location);
glVertexAttribPointer(vertexUv_location, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), (GLvoid*)(2 * sizeof(vertices[0])));
glEnableVertexAttribArray(vertexUv_location);
glBindVertexArray(0);
/* vao2. */
glGenVertexArrays(1, &vao2);
glBindVertexArray(vao2);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(coord2d_location2, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(vertices[0]), (GLvoid*)0);
glEnableVertexAttribArray(coord2d_location2);
glVertexAttribPointer(vertexUv_location2, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), (GLvoid*)(2 * sizeof(vertices[0])));
glEnableVertexAttribArray(vertexUv_location2);
glBindVertexArray(0);
/* Texture buffer. */
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
/* Constant state. */
glViewport(0, 0, 2 * width, height);
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glActiveTexture(GL_TEXTURE0);
/* Main loop. */
common_fps_init();
do {
/* Blocks until an image is available, thus capping FPS to that.
* 30FPS is common in cheap webcams. */
CommonV4l2_updateImage(&common_v4l2);
image = CommonV4l2_getImage(&common_v4l2);
glClear(GL_COLOR_BUFFER_BIT);
/* Original. */
glTexImage2D(
GL_TEXTURE_2D, 0, GL_RGB, width, height,
0, GL_RGB, GL_UNSIGNED_BYTE, image
);
glUseProgram(program);
glUniform1i(myTextureSampler_location, 0);
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
glBindVertexArray(0);
/* Optional CPU modification to compare with GPU shader speed. */
if (cpu) {
image2 = realloc(image2, 3 * width * height * sizeof(image2[0]));
for (unsigned int i = 0; i < height; ++i) {
for (unsigned int j = 0; j < width; ++j) {
size_t index = 3 * (i * width + j);
/* Inverter. */
/*image2[index + 0] = 1.0 - (image[index + 0] / 255.0);*/
/*image2[index + 1] = 1.0 - (image[index + 1] / 255.0);*/
/*image2[index + 2] = 1.0 - (image[index + 2] / 255.0);*/
/* Swapper. */
/*image2[index + 0] = image[index + 1] / 255.0;*/
/*image2[index + 1] = image[index + 2] / 255.0;*/
/*image2[index + 2] = image[index + 0] / 255.0;*/
/* Square linear blur. */
int blur_width = 5;
int blur_width_half = blur_width / 2;
int blur_width2 = (blur_width * blur_width);
image2[index + 0] = 0.0;
image2[index + 1] = 0.0;
image2[index + 2] = 0.0;
for (int k = -blur_width_half; k <= blur_width_half; ++k) {
for (int l = -blur_width_half; l <= blur_width_half; ++l) {
int i2 = i + k;
int j2 = j + l;
// Out of bounds is black. TODO: do module to match shader exactly.
if (i2 > 0 && i2 < (int)height && j2 > 0 && j2 < (int)width) {
unsigned int srcIndex = index + 3 * (k * width + l);
image2[index + 0] += image[srcIndex + 0];
image2[index + 1] += image[srcIndex + 1];
image2[index + 2] += image[srcIndex + 2];
}
}
}
image2[index + 0] /= (blur_width2 * 255.0);
image2[index + 1] /= (blur_width2 * 255.0);
image2[index + 2] /= (blur_width2 * 255.0);
}
}
glTexImage2D(
GL_TEXTURE_2D, 0, GL_RGB, width, height,
0, GL_RGB, GL_FLOAT, image2
);
}
/* Modified. */
glUseProgram(program2);
glUniform1i(myTextureSampler_location2, 0);
glUniform2f(pixD_location2, 1.0 / width, 1.0 / height);
glBindVertexArray(vao2);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
glBindVertexArray(0);
glfwSwapBuffers(window);
glfwPollEvents();
common_fps_print();
} while (!glfwWindowShouldClose(window));
/* Cleanup. */
if (cpu) {
free(image2);
}
CommonV4l2_deinit(&common_v4l2);
glDeleteBuffers(1, &vbo);
glDeleteVertexArrays(1, &vao);
glDeleteTextures(1, &texture);
glDeleteProgram(program);
glfwTerminate();
return EXIT_SUCCESS;
}
The first obvious answer is that you gain parallelism. Now, why use GLSL rather than, say, CUDA, which is more flexible? GLSL doesn't require you to have an NVIDIA graphics card, so it's a much more portable solution (you'd still have the option of OpenCL though).
What can you gain with parallelism? Most of the time, you can treat pixels independently. For instance, increasing the contrast of an image usually requires you to loop over all pixels and apply an affine transform to the pixel values. If each pixel is handled by a separate thread, you don't need this loop anymore: you just rasterize a quad and apply a pixel shader that reads a texture at the current rasterized point and outputs the transformed pixel value to the render target (or the screen).
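For comparison, a CPU version of that contrast adjustment might look like the sketch below. The names `adjustPixel` and `adjustContrast` and the parameters `gain` and `bias` are hypothetical. The per-pixel function has no dependency on neighbouring pixels, so it corresponds to what a fragment shader would compute, while the explicit loop is the part the GPU replaces with parallel invocations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Affine transform of one 8-bit pixel value: out = gain * in + bias,
// clamped to [0, 255] and rounded to the nearest integer.
std::uint8_t adjustPixel(std::uint8_t in, float gain, float bias) {
    const float out = gain * static_cast<float>(in) + bias;
    const float clamped = std::min(255.0f, std::max(0.0f, out));
    return static_cast<std::uint8_t>(clamped + 0.5f);
}

// CPU version: the explicit loop over all pixels.
void adjustContrast(std::vector<std::uint8_t>& pixels, float gain, float bias) {
    assert(gain >= 0.0f);
    for (std::uint8_t& p : pixels) {
        p = adjustPixel(p, gain, bias);
    }
}
```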
The drawback is that your data needs to reside on the GPU: you'll need to transfer all your images to the GPU, which can take some time and can cancel out the speedup gained from parallelization. As such, GPU implementations are usually chosen either when the operations are compute-intensive or when the whole pipeline can remain on the GPU (for instance, if the goal is only to display the modified image on screen, you avoid transferring the image back to the CPU).
OpenGL 4.3 (announced at SIGGRAPH 2012) supports Compute shaders. If you are doing strictly graphics work, and already using OpenGL, it might be easier to use this than OpenCL / OpenGL interop (or CUDA / OpenGL interop).
Here is what Khronos has to say about when to use 4.3 Compute shaders versus OpenCL: Link to PDF; see slide 5.