Measure Render-To-Texture Performance in OpenGL ES 2.0 - c++

Basically, I'm doing some image processing with a screen-sized rectangle made of two triangles and a fragment shader that does all the actual work. The effect is something like an animation, since it depends on a uniform variable called current_frame.
I'm very interested in measuring the performance in terms of "MPix/s". What I do is something like this:
/* Setup all necessary stuff, including: */
/* - getting the location of the `current_frame` uniform */
/* - creating an FBO, adding a color attachment */
/*   and setting it as the current render target */

double current_frame = 0;
double step = 1.0 / NUMBER_OF_ITERATIONS;

tic(); /* Start counting the time */
for (int i = 0; i < NUMBER_OF_ITERATIONS; i++)
{
    glUniform1f(current_frame_handle, current_frame);
    current_frame += step;
    glDrawArrays(GL_TRIANGLES, 0, NUMBER_OF_INDICES);
    glFinish();
}
double elapsed_time = tac(); /* Get elapsed time in seconds */

/* Calculate achieved pixels per second */
double pps = (OUT_WIDTH * OUT_HEIGHT * NUMBER_OF_ITERATIONS) / elapsed_time;

/* Sanity check: read the output into a buffer using glReadPixels */
/* and save that buffer to a file */
As far as theory goes, is there anything wrong with my concept?
Also, I've got the impression that glFinish() on mobile hardware doesn't necessarily wait for previous render calls and may do some optimizations.
Of course, I can always force it by calling glReadPixels() after each draw, but that would be quite slow, so it wouldn't really help.
Could you advise me on whether my testing scenario is sensible and whether there is anything more that can be done?

Concerning speed: glDrawArrays() still duplicates the shared vertices.
glDrawElements() reduces the number of vertices in the array, so less
data has to be transferred to OpenGL.
http://www.songho.ca/opengl/gl_vertexarray.html
Just throwing that in there to help speed up your results. As far as your timing concept goes, it looks fine to me. Are you getting results similar to what you had hoped for?
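For a fullscreen quad this amounts to very little code. A minimal sketch (position_handle is a placeholder for the attribute location of your vertex position, obtained elsewhere with glGetAttribLocation):

static const GLfloat quad_vertices[] = {
    -1.0f, -1.0f,   /* bottom-left  */
     1.0f, -1.0f,   /* bottom-right */
    -1.0f,  1.0f,   /* top-left     */
     1.0f,  1.0f    /* top-right    */
};
static const GLushort quad_indices[] = { 0, 1, 2, 2, 1, 3 };

/* client-side arrays, valid in ES 2.0 when no buffer objects are bound */
glVertexAttribPointer(position_handle, 2, GL_FLOAT, GL_FALSE, 0, quad_vertices);
glEnableVertexAttribArray(position_handle);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, quad_indices);

For two triangles the saving is only two vertices, though, so don't expect it to change the MPix/s figure much; the per-pixel fragment work will dominate.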

I would precalculate all possible frames, and then use glEnableClientState() and glTexCoordPointer() to change which part of the existing texture is drawn in each frame.
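A minimal fixed-function sketch of that idea (note these are desktop GL / ES 1.x client-array calls, not ES 2.0 attributes; the horizontal atlas layout and the names frame and frame_count are assumptions, and a position array is assumed to be bound already):

GLfloat u0 = (GLfloat)frame / frame_count;        /* left edge of this frame in the atlas  */
GLfloat u1 = (GLfloat)(frame + 1) / frame_count;  /* right edge of this frame in the atlas */
GLfloat texcoords[8] = {
    u0, 0.0f,   /* bottom-left  */
    u1, 0.0f,   /* bottom-right */
    u1, 1.0f,   /* top-right    */
    u0, 1.0f    /* top-left     */
};

glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);

This trades per-frame shading work for memory: it only helps if all precalculated frames fit into the texture up front.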


In OpenGL, what is a good target for how many vertices in a VBO while maintaining a good frame rate

I am working on making a 2D game engine from scratch, mostly for fun. Recently I've been really concerned about the performance of the whole engine. I keep reading articles about a good target number of polygons to aim for, and I've seen talk of millions; meanwhile, I've only managed to get 40,000 without horrible frame rate drops.
I've tried using a mapped buffer from the graphics card instead of my own, but that actually gives me worse performance. I've read about techniques like triple-buffered rendering, and while I can see how they may theoretically speed things up, I can't imagine them getting my code into the millions I've read about.
The format I use is 28-byte vertices (three floats for position, two floats for texture coordinates, one for color, and one for which texture to sample from). I've thought about trimming this down, but once again it doesn't seem worth it.
Looking through my code, almost 98% of the time is spent allocating, filling up, and giving the VAO to the graphics card. So that's currently my only bottleneck.
All the sprites are just four-sided polygons, and I'm using GL_QUADS to render the whole object. 40,000 sprites just feels really low. I only have one draw call for them, so from what I've read I was expecting at least ten times that. I've heard that some 3D models alone have nearly 40k polygons!
Here is some relevant code to how I render it all:
//This is the main render loop, currently it's only called once per frame
for (int i = 0; i < l_Layers.size(); i++) {
    glUseProgram(l_Layers[i]->getShader().getShaderProgram());
    GLint loc = glGetUniformLocation(l_Layers[i]->getShader().getShaderProgram(), "MVT");
    glUniformMatrix4fv(loc, 1, GL_FALSE, mat.data);
    l_Layers[i]->getVertexBuffer().Bind();
    glDrawArrays(GL_QUADS, 0, l_Layers[i]->getVertexBuffer().getSize());
    l_Layers[i]->getVertexBuffer().Unbind();
}
//These lines of code take up by far the most compute time
void OP::VertexBuffer::startBuffer(int size)
{
    flush();
    Vertices = new Vertex[size * 4];
}

void OP::VertexBuffer::submit(Vertex vertex)
{
    Vertices[Index] = vertex;
    Index++;
}

void Layer::Render() {
    l_VertexBuffer.startBuffer(l_Sprites.size());
    for (size_t i = 0; i < l_Sprites.size(); i++) {
        Vertex* vert = l_Sprites[i]->getVertexArray();
        l_VertexBuffer.submit(vert[0]);
        l_VertexBuffer.submit(vert[1]);
        l_VertexBuffer.submit(vert[2]);
        l_VertexBuffer.submit(vert[3]);
    }
}
I don't know what I'm doing wrong, but I just don't understand how people are getting orders of magnitude more polygons on the screen, especially when they have far more complex models than my GL_QUADS.
98% of the time is spent allocating, filling up, and giving the VAO to the graphics card. So that's currently my only bottleneck.
Creating the VAO and filling it up should actually only happen once, and therefore should not affect the frame rate; you should only need to bind the VAO before calling render.
Obviously I can't see all of your code so I may have the wrong idea but it looks like you're creating a new vertex array every time Render is called.
It doesn't surprise me that you're spending all of your time in here:
//These lines of code take up by far the most compute time
void OP::VertexBuffer::startBuffer(int size)
{
    flush();
    Vertices = new Vertex[size * 4];
}
Calling new on every render call for a large array is going to considerably impact your performance; you're also spending time assigning to that array every frame.
On top of that you appear to be leaking memory.
Every time you call:
Vertices = new Vertex[size * 4];
You're failing to free the array that you allocated on the previous call to Render. What you're doing is similar to the example below:
foo = new Foo();
foo = new Foo();
Memory is allocated for foo in the first call; the first Foo that was created is never destructed or deallocated, and there is now no way to do so because foo has been reassigned, so the first Foo has leaked.
So I think you have a combination of issues going on here.
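If you do need to rebuild the vertex data every frame (as sprite batchers usually do), a common pattern is to keep one persistent CPU-side container, reuse it, and upload into a VBO that was created once. A rough sketch, not your actual class, assuming your existing Vertex type and a VBO created elsewhere:

#include <cstddef>
#include <vector>

struct VertexBatch {
    std::vector<Vertex> staging;   // reused every frame: no per-frame new[] and no leak
    GLuint vbo = 0;                // created once with glGenBuffers/glBufferData

    void start(std::size_t spriteCount) {
        staging.clear();
        staging.reserve(spriteCount * 4);   // grows once, then keeps its capacity
    }

    void submit(const Vertex& v) { staging.push_back(v); }

    void upload() const {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferSubData(GL_ARRAY_BUFFER, 0,
                        staging.size() * sizeof(Vertex), staging.data());
    }
};

You already tried a mapped buffer; whichever upload path you use, the key point is that no allocation (and no leak) happens per frame.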

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost, the biggest CPU hog with OpenGL is immediate mode... and you're using it (glBegin, glEnd). The problem with IM is that every single vertex requires a couple of OpenGL calls to be made; and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
The next issue is how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync, setting a swap interval of 1, and doing a buffer swap (glutSwapBuffers) once the frame is rendered. Exactly what blocks and where is unfortunately implementation dependent, but you're more or less guaranteed to hit your screen refresh frequency exactly, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be misled by the way Windows accounts for CPU time: time spent in driver context, which includes blocking while waiting for V-Sync, is accounted to the consumed CPU time, while it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach to get more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
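For reference, a minimal GLUT skeleton of the double-buffered approach described above (draw_scene() stands in for your existing drawing code; enabling a swap interval of 1 is platform-specific, e.g. wglSwapIntervalEXT or glXSwapIntervalEXT, and is not shown):

#include <GL/glut.h>

void draw_scene(void) { /* your existing polygon/text drawing goes here */ }

void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    draw_scene();
    glutSwapBuffers();    /* with V-Sync on, this paces the loop to the refresh rate */
    glutPostRedisplay();  /* request the next frame */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);  /* double-buffered framebuffer */
    glutCreateWindow("micromouse");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}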
I found that putting the render thread to sleep helps reduce CPU usage, in my case from about 26% to around 8%:
#include <chrono>
#include <iostream>
#include <thread>

void render_loop(){
    ...
    auto const start_time = std::chrono::steady_clock::now();
    auto const wait_time = std::chrono::milliseconds{ 17 };
    auto next_time = start_time + wait_time;

    while(true){
        ...
        // executes once each time the thread wakes up, every 17 ms,
        // which is roughly 60 frames per second
        auto then = std::chrono::high_resolution_clock::now();
        std::this_thread::sleep_until(next_time);

        ...rendering jobs

        auto elapsed_time =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - then);
        std::cout << "ms: " << elapsed_time.count() << '\n';

        next_time += wait_time;
    }
}
I thought about trying to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.
Inspired by this post

Which one is the proper method of writing this GL code?

I have been doing some experiments with OpenGL and handling textures.
In my experiment I have a 2D array of ints which is randomly generated:
int mapskeleton[300][300];
Then, after that, I have my own OBJ file loader for loading models with textures:
m2d wall, floor; // I initialize and load these files at start
For recording statistics of render times I used:
bool Once = 1;
int secs = 0;
Now to the render code, where I did my experiment.
// Code A: Benchmarked on Radeon 8670D
// Takes 232 (average) millisecs for drawing 300*300 tiles
if(Once)
    secs = glutGet(GLUT_ELAPSED_TIME);

for(int i = 0; i < mapHeight; i++){
    for(int j = 0; j < mapWidth; j++){
        if(mapskeleton[j][i] == skel_Wall){
            glBindTexture(GL_TEXTURE_2D, wall.texture);
            glPushMatrix();
            glTranslatef(j*10, i*10, 0);
            wall.Draw(); //Draws 10 textured triangles
            glPopMatrix();
        }
        if(mapskeleton[j][i] == skel_floor){
            glBindTexture(GL_TEXTURE_2D, floor.texture);
            glPushMatrix();
            glTranslatef(j*10, i*10, 0);
            floor.Draw(); //Draws 2 textured triangles
            glPopMatrix();
        }
    }
}

if(Once){
    secs = glutGet(GLUT_ELAPSED_TIME) - secs;
    printf("time taken for rendering %i msecs\n", secs);
    Once = 0;
}
And the other code is:
// Code B: Benchmarked on Radeon 8670D
// Takes 206 (average) millisecs for drawing 300*300 tiles
if(Once)
    secs = glutGet(GLUT_ELAPSED_TIME);

glBindTexture(GL_TEXTURE_2D, floor.texture);
for(int i = 0; i < mapHeight; i++){
    for(int j = 0; j < mapWidth; j++){
        if(mapskeleton[j][i] == skel_floor){
            glPushMatrix();
            glTranslatef(j*10, i*10, 0);
            floor.Draw();
            glPopMatrix();
        }
    }
}

glBindTexture(GL_TEXTURE_2D, wall.texture);
for(int i = 0; i < mapHeight; i++){
    for(int j = 0; j < mapWidth; j++){
        if(mapskeleton[j][i] == skel_Wall){
            glPushMatrix();
            glTranslatef(j*10, i*10, 0);
            wall.Draw();
            glPopMatrix();
        }
    }
}

if(Once){
    secs = glutGet(GLUT_ELAPSED_TIME) - secs;
    printf("time taken for rendering %i msecs\n", secs);
    Once = 0;
}
To me, code A looks good from a beginner's point of view, but the benchmarks say otherwise.
My GPU seems to like code B. I don't understand why code B takes less time to render.
Changes to OpenGL state can generally be expensive - the driver's and/or GPU's data structures and caches can become invalidated. In your case, the change in question is binding a different texture. In code B, you're doing it twice. In code A, you're easily doing it thousands of times.
When programming OpenGL rendering, you'll generally want to set up the pipeline for settings A, render everything which needs settings A, re-set the pipeline for settings B, render everything which needs settings B, and so on.
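A rough sketch of that pattern for the tile/sprite case (Sprite and its fields are hypothetical stand-ins, not the asker's types): sort by texture once, then bind each texture only when it actually changes.

#include <algorithm>
#include <vector>

void drawSprites(std::vector<Sprite*>& sprites)
{
    // group everything that shares a texture together
    std::sort(sprites.begin(), sprites.end(),
              [](const Sprite* a, const Sprite* b) { return a->texture < b->texture; });

    GLuint bound = 0;                       // 0 = nothing bound yet
    for (Sprite* s : sprites) {
        if (s->texture != bound) {          // pay for the state change only when needed
            glBindTexture(GL_TEXTURE_2D, s->texture);
            bound = s->texture;
        }
        s->draw();
    }
}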
@Angew covered why one option is more efficient than the other. But there is an important point that needs to be stated very clearly. Based on the text of your question, particularly here:
for recording statistics of render times
my gpu seems to like code B
you seem to attempt to measure rendering/GPU performance.
You are NOT AT ALL measuring GPU performance!
You measure the time for setting up the state and making the draw calls. OpenGL lets the GPU operate asynchronously from the code executed on the CPU. The picture you should keep in mind when you make (most) OpenGL calls is that you're submitting work to the GPU for later execution. There's no telling when the GPU completes that work. It most definitely (except for very few calls that you want to avoid in speed critical code) does not happen by the time the call returns.
What you're measuring in your code is purely the CPU overhead for making these calls. This includes what's happening in your own code, and what happens in the driver code for handling the calls and preparing the work for later submission to the GPU.
I'm not saying that the measurement is not useful. Minimizing CPU overhead is very important. You just need to be very aware of what you are in fact measuring, and make sure that you draw the right conclusions.
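If the goal really is to time GPU execution with this setup, the simplest (if crude) fix is to force the GPU to finish before taking the second timestamp. A minimal sketch, where render_map() is a placeholder for either code A or code B:

if(Once)
    secs = glutGet(GLUT_ELAPSED_TIME);

render_map();   /* placeholder: the tile-drawing loops from code A or code B */

glFinish();     /* block until the GPU has actually executed all submitted work */

if(Once){
    secs = glutGet(GLUT_ELAPSED_TIME) - secs;
    printf("CPU submission + GPU execution: %i msecs\n", secs);
    Once = 0;
}

Note that glFinish() itself stalls the pipeline, so this measures one isolated frame rather than steady-state throughput; it is still far closer to the real GPU cost than timing the submission alone.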

What is the best way to detect mouse-location/clicks on object in OpenGL?

I am creating a simple 2D OpenGL game, and I need to know when the player clicks or mouses over an OpenGL primitive. (For example, on one of the GL_QUADS that serves as a tile...) There doesn't seem to be a simple way to do this beyond brute force or opengl.org's suggestion of using a unique color for every one of my primitives, which seems a little hacky. Am I missing something? Thanks...
My advice: don't use OpenGL's selection mode or OpenGL rendering (the brute-force method you are talking about); use a CPU-based ray-picking algorithm if you're in 3D. For 2D, as in your case, it should be straightforward: it's just a test of whether a 2D point lies inside a 2D rectangle.
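A minimal sketch of that 2D test (the Rect type and the mouse-to-world conversion are placeholders for however your game stores its tiles):

typedef struct { float x, y, w, h; } Rect;   /* axis-aligned rectangle */

int point_in_rect(float px, float py, Rect r)
{
    return px >= r.x && px <= r.x + r.w &&
           py >= r.y && py <= r.y + r.h;
}

/* usage: convert the mouse position to world coordinates first, then e.g. */
/* if (point_in_rect(world_x, world_y, tile_bounds)) { ... tile was clicked ... } */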
I would suggest using the hacky method if you want a quick implementation (quick in coding time, I mean), especially if you don't want to implement a quadtree for moving objects. If you are using OpenGL immediate mode, it should be straightforward:
// Rendering part
glClearColor(0, 0, 0, 0);
glClear(GL_COLOR_BUFFER_BIT);
for(unsigned i = 0; i < tileCount; ++i){
    unsigned tileId = i + 1; // we increment the tile ID so that 0 stays the black background
    glColor3ub(tileId & 0xFF, (tileId >> 8) & 0xFF, (tileId >> 16) & 0xFF);
    renderTileWithoutColorNorTextures(i);
}

// Let's retrieve the tile ID
unsigned tileId = 0;
glReadPixels(mouseX, mouseY, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE,
             (unsigned char *)&tileId);
if(tileId != 0){ // if we didn't pick the black background
    tileId--;
    // we picked tile number tileId
}

// We don't want to show this to the user, so we clear the screen
glClearColor(...); // the color you want
glClear(GL_COLOR_BUFFER_BIT);

// Now, render your real scene
// ...

// And we swap
whateverSwapBuffers(); // might be glutSwapBuffers, glX, ...
You can use OpenGL's glRenderMode(GL_SELECT) mode. Here is some code that uses it, and it should be easy to follow (look for the _pick method)
(and here's the same code using GL_SELECT in C)
(There have been cases - in the past - of GL_SELECT being deliberately slowed down on 'non-workstation' cards in order to discourage CAD and modeling users from buying consumer 3D cards; that ought to be a bad habit of the past that ATI and NVidia have grown out of ;) )
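For completeness, a rough sketch of what selection mode looks like (drawTile(i) and applyNormalProjection() are placeholders; keep in mind GL_SELECT is legacy functionality and often falls back to a slow software path):

GLuint selectBuf[512];
GLint viewport[4];

glGetIntegerv(GL_VIEWPORT, viewport);
glSelectBuffer(512, selectBuf);
glRenderMode(GL_SELECT);

glMatrixMode(GL_PROJECTION);
glPushMatrix();
glLoadIdentity();
gluPickMatrix(mouseX, viewport[3] - mouseY, 1.0, 1.0, viewport); /* 1x1 px region under the cursor */
applyNormalProjection();            /* placeholder for your usual projection setup */

glInitNames();
glPushName(0);
for (unsigned i = 0; i < tileCount; ++i) {
    glLoadName(i + 1);              /* 0 is kept for "nothing hit" */
    drawTile(i);                    /* placeholder draw call */
}

glMatrixMode(GL_PROJECTION);
glPopMatrix();

GLint hits = glRenderMode(GL_RENDER);   /* number of hit records written to selectBuf */
/* each record: name-stack depth, min depth, max depth, then the names themselves */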

OpenGL: How to undo scaling?

I'm new to OpenGL. I'm using JOGL.
I have a WorldEntity class that represents a thing that can be rendered. It has attributes like position and size. To render, I've been using this method:
/**
 * Renders the object in the world.
 */
public void render() {
    gl.glTranslatef(getPosition().x, getPosition().y, getPosition().z);
    gl.glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    // gl.glScalef(size, size, size);

    gl.glCallList(drawID);

    // gl.glScalef(1/size, 1/size, 1/size);
    gl.glRotatef(-getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    gl.glTranslatef(-getPosition().x, -getPosition().y, -getPosition().z);
}
The pattern I've been using is applying each attribute of the entity (like position or rotation), then undoing it to avoid corrupting the state for the next entity to get rendered.
Uncommenting the scaling lines makes the app much more sluggish as it renders a modest scene on my modest computer. I'm guessing that the float division is too expensive to do thousands of times per second. (?)
What is the correct way to go about this? Can I find a less computationally intensive way to undo a scaling transformation? Do I need to sort objects by scale and draw them in order to reduce scaling transformations required?
Thanks.
This is where you use matrices (bear with me, I come from an OpenGL/C programming background):
glMatrixMode(GL_MODELVIEW); // set the matrix mode to manipulate models
glPushMatrix();             // push the matrix onto the matrix stack

// apply transformations
glTranslatef(getPosition().x, getPosition().y, getPosition().z);
glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
glScalef(size, size, size);

glCallList(drawID); // drawing here

glPopMatrix(); // get your original matrix back
... at least, that's what I think it is.
It's very unlikely the divisions will cause any perf issue. rfw gave you the usual way of implementing this, but my guess is that your "sluggish" rendering is mostly due to the fact that your GPU is the bottleneck, and using the matrix stacks will not improve perf.
When you increase the size of your drawn objects, more pixels have to be processed, and the GPU has to work significantly harder. What your CPU does at this point (the divisions) is irrelevant.
To prove my point, try to keep the scaling code in, but with sizes around 1.