I want to draw a ring (a circle with a thick border) with the ShapeRenderer.
I tried two different solutions:
Solution 1: draw n circles, each 1 pixel wide and 1 pixel larger in radius than the one before. Problem: it produces a graphical glitch (also with different multisample anti-aliasing values).
Solution 2: draw one big filled circle and then draw a smaller one in the background color on top of it. Problem: I can't draw overlapping rings this way. Everything else works fine.
I can't use a ring texture, because I have to increase/decrease the ring radius dynamically. The border width should always stay the same.
How can I draw smooth rings with the ShapeRenderer?
EDIT:
Increasing the line-width doesn't help:
MeshBuilder has the option to create a ring using the ellipse method. It allows you to specify the inner and outer size of the ring. Normally this would result in a Mesh, which you would need to render yourself. But because of a recent change it is also possible to use it in conjunction with PolygonSpriteBatch (an implementation of Batch that allows more flexible shapes, while SpriteBatch only allows quads). You can use PolygonSpriteBatch wherever you would normally use a SpriteBatch (e.g. for your Stage or Sprite class).
Here is an example of how to use it: https://gist.github.com/xoppa/2978633678fa1c19cc47, but keep in mind that you need the latest nightly (or at least release 1.6.4) for this.
Maybe you can try making a ring some other way, such as using triangles. I'm not familiar with LibGDX, so here's some pseudocode.
// number of sectors in the ring, you may need
// to adapt this value based on the desired size of
// the ring
int sectors=32;
float outer=1.2; // distance to outer edge
float inner=0.8; // distance to inner edge
glBegin(GL_TRIANGLES)
glNormal3f(0,0,1)
for(int i=0;i<sectors;i++){
// define each section of the ring
// (divide as floats; integer division would always give 0 here)
float angle=((float)i/sectors)*Math.PI*2
float nextangle=((float)(i+1)/sectors)*Math.PI*2
float s=Math.sin(angle)
float c=Math.cos(angle)
float sn=Math.sin(nextangle)
float cn=Math.cos(nextangle)
// first triangle of the sector
glVertex3f(inner*c,inner*s,0)
glVertex3f(outer*cn,outer*sn,0)
glVertex3f(outer*c,outer*s,0)
// second triangle of the sector
glVertex3f(inner*c,inner*s,0)
glVertex3f(inner*cn,inner*sn,0)
glVertex3f(outer*cn,outer*sn,0)
}
glEnd()
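To make the triangle layout above easier to check outside of any graphics API, here is a minimal Python sketch that only computes the vertex positions. The function name `ring_triangles` and its parameters are made up for this illustration and are not part of LibGDX or OpenGL; each group of three vertices would still have to be handed to whatever filled-triangle call your renderer provides.

```python
import math


def ring_triangles(inner, outer, sectors=32):
    """Return a flat list of (x, y) vertices, three per triangle, covering a ring."""
    if not 0 < inner < outer:
        raise ValueError("expected 0 < inner < outer")
    vertices = []
    for i in range(sectors):
        # True division keeps the fractional part of i / sectors.
        a0 = 2.0 * math.pi * i / sectors
        a1 = 2.0 * math.pi * (i + 1) / sectors
        c0, s0 = math.cos(a0), math.sin(a0)
        c1, s1 = math.cos(a1), math.sin(a1)
        # Two triangles per sector, same corner order as the pseudocode above.
        vertices += [(inner * c0, inner * s0), (outer * c1, outer * s1), (outer * c0, outer * s0)]
        vertices += [(inner * c0, inner * s0), (inner * c1, inner * s1), (outer * c1, outer * s1)]
    return vertices


verts = ring_triangles(inner=0.8, outer=1.2, sectors=32)
print(len(verts))  # 32 sectors * 2 triangles * 3 vertices = 192
```

Because the border width is simply `outer - inner`, growing or shrinking the ring while keeping the border constant means changing both radii by the same amount.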
Alternatively, divide the ring into four polygons, each of which consists of one quarter of the whole ring. Then use ShapeRenderer to fill each of these polygons.
Here's an illustration of how you would divide the ring:
If I understand your question correctly, glLineWidth() might help you.
Example pseudocode:
size = 5;
Gdx.gl.glLineWidth(size);
mShapeRenderer.begin(....);
..//
mShapeRenderer.end();
I am trying to sort my renderables/actors correctly and noticed that I have some trouble with walls, since they get sorted by their center point. I sort all my actors by their distance to the camera with an insertion sort before I draw them. After that, I try to determine whether a wall should be drawn behind or in front of the game field. To explain: the game takes place inside a cube made of 6 planes. Since I can rotate the camera around that cube, I need a sorting that puts the planes in front or in back depending on the camera position. Here is a picture so you know what I am talking about:
You can clearly see the rendering mistake at the front of the snake-like object.
Okay, here is my current sorting:
// list of Actors; Actor is the abstract class that Wall, Cube and so on extend
void Group::insertionSort(vector<Actor *> &actors)
{
int j;
for (int i = 1; i < actors.size(); i++)
{
Actor *val = actors[i];
j = i - 1;
while (j >= 0 && distanceToCamera(*actors[j]) < distanceToCamera(*val))
{
actors[j + 1] = actors[j];
j = j - 1;
}
actors[j + 1] = val;
}
}
float Group::distanceToCamera(Actor &a)
{
float result = 0;
XMVECTOR posActor = XMLoadFloat3(&a.getPosition()); //here i get the centerpoint of the object
XMVECTOR posCamera = XMLoadFloat3(&m_camera->getPosition());
XMVECTOR length = XMVector3Length(posCamera - posActor);
XMStoreFloat(&result, length);
return result;
}
To determine whether an actor is a Wall I used something like dynamic_cast<Wall*>(val), but I can't get the walls to the front/back of the vector based on that. Remember that the objects return their center point. Can anyone point me in the right direction?
It's difficult to answer your question because it is a complex system which you haven't fully explained here and which you should also reduce to something simpler before posting. Chances are that you would find a fix yourself on the way. Anyway, I'll do some guessing...
Now, the first thing I'd fix is the sorting algorithm. Without analysing it in depth whether it works correctly in all cases or not, I'd throw it out and use std::sort(), which is both efficient and very unlikely to contain errors.
While replacing it, you need to think about the ordering between two rendered objects carefully: The question is when exactly does one object need to be drawn before the other? You are using the distance of the center point to the camera. I'm not sure if you are sorting 2D objects or 3D objects, but in both cases it's easy to come up with examples where this doesn't work! For example, a large square that doesn't directly face the camera could cover up a smaller one, even if the smaller square's center is closer. Another problem is when two objects intersect. Similarly for 3D objects, if they have different sizes or intersect then your algorithm doesn't work. If your objects all have the same size and they can't intersect, you should be fine though.
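As a rough illustration of replacing a hand-written sort with a library sort plus an explicit ordering rule, here is a small Python sketch (the thread itself is C++, where std::sort with a comparison function plays the same role). The dictionary layout with a "position" entry and the function names are assumptions made for this example only. It sorts farthest first, which is the order the insertion sort in the question produces, and it still has the center-point weakness described above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; enough for ordering, no square root needed."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sort_back_to_front(actors, camera_pos):
    # Farthest first, so nearer objects are drawn later and end up on top.
    return sorted(
        actors,
        key=lambda actor: squared_distance(actor["position"], camera_pos),
        reverse=True,
    )


camera = (0.0, 0.0, -10.0)
actors = [
    {"name": "near", "position": (0.0, 0.0, -8.0)},
    {"name": "far", "position": (0.0, 0.0, 5.0)},
    {"name": "middle", "position": (1.0, 0.0, 0.0)},
]
ordered = [a["name"] for a in sort_back_to_front(actors, camera)]
print(ordered)  # ['far', 'middle', 'near']
```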
Still, and here I suspect one problem, it could be that a surface of an object and a surface of the cube grid have exactly the same position. One approach is to shrink the objects slightly or enlarge the outside grid, so that the order is always clear. This would also work around floating point rounding errors. Due to these, two objects that don't have a well-defined order mathematically could end up in different positions depending on the circumstances. This can manifest as them flickering between visible and covered depending on the camera angle.
One last thing: I'm assuming you want to solve this yourself for educational reasons, right? Otherwise, it would be a plain waste of time with existing rendering toolkits in place that would even offload all the computations to the graphics hardware.
I'm trying to get the hang of moving objects (in general) and line strips (in particular) most efficiently in opengl and therefore I'm writing an application where multiple line segments are traveling with a constant speed from right to left. At every time point the left most point will be removed, the entire line will be shifted to the left, and a new point will be added at the very right of the line (this new data point is streamed / received / calculated on the fly, every 10ms or so). To illustrate what I mean, see this image:
Because I want to work with many objects, I decided to use vertex buffer objects in order to minimize the amount of gl* calls. My current code looks something like this:
A) setup initial vertices:
# calculate my_func(x) in range [0, n]
# (could also be random data)
data = my_func(0, n)
# create & bind buffer
vbo_id = GLuint()
glGenBuffers(1, vbo_id);
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
# allocate memory & transfer data to GPU
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW)
B) update vertices:
draw():
    # get new data and update offset
    data = my_func(n+dx, n+2*dx)
    # update offset 'n' which is the current absolute value of x.
    n = n + 2*dx
    # upload data
    glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
    glBufferSubData(GL_ARRAY_BUFFER, n, sizeof(data), data)
    # translate scene so it looks like line strip has moved to the left.
    glTranslatef(-local_shift, 0.0, 0.0)
    # draw all points from offset
    glVertexPointer(2, GL_FLOAT, 0, n)
    glDrawArrays(GL_LINE_STRIP, 0, points_per_vbo)
where my_func would do something like this:
my_func(start_x, end_x):
    # generate the correct x locations.
    x_values = range(start_x, end_x, STEP_SIZE)
    # generate the y values. We could be getting these values from a sensor.
    y_values = []
    for j in x_values:
        y_values.append(random())
    data = []
    for i, j in zip(x_values, y_values):
        data.extend([i, j])
    return data
This works just fine, however if I have let's say 20 of those line strips that span the entire screen, then things slow down considerably.
Therefore my questions:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, what would such a shader look like? (I suspect that a shader is the wrong way to go, since my line strip is NOT a periodic function but rather contains random data).
3) what happens if the window gets resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction. If the window is rescaled in x-direction, then in my current implementation I would have to recalculate the position of the entire line strip (calling my_func to get the new x coordinates) and upload it to the GPU. I guess this could be done more elegantly? How would I do that?
4) I noticed that when I use glTranslatef with a non integral value, the screen starts to flicker if the line strip consists of thousands of points. This is most probably because the fine resolution that I use to calculate the line strip does not match the pixel resolution of the screen and therefore sometimes some points appear in front and sometimes behind other points (this is particularly annoying when you don't render a sine wave but some 'random' data). How can I prevent this from happening (besides the obvious solution of translating by an integer multiple of 1 pixel)? If a window gets resized from let's say originally 800x800 pixels to 100x100 pixels and I still want to visualize a line strip of 20 seconds, then shifting in x direction must work flicker free somehow with sub pixel precision, right?
5) as you can see I always call glTranslatef(-local_shift, 0.0, 0.0) - without ever doing the opposite. Therefore I keep shifting the entire view to the right. And that's why I need to keep track of the absolute x position (in order to place new data at the correct location). This problem will eventually lead to an artifact, where the line is overlapping with the edges of the window. I guess there must be a better way for doing this, right? Like keeping the x values fixed and just moving & updating the y values?
EDIT I've removed the sine wave example and replaced it with a better example. My question is generally about how to move line strips in space most efficiently (while adding new values to them). Therefore any suggestions like "precompute the values for t -> infinity" don't help here (I could also just be drawing the current temperature measured in front of my house).
EDIT2
Consider this toy example where after each time step, the first point is removed and a new one is added to the end:
t = 0
*
* * *
* **** *
1234567890
t = 1
*
* * * *
**** *
2345678901
t = 2
* *
* * *
**** *
3456789012
I don't think I can use a shader here, can I?
EDIT 3: example with two line strips.
EDIT 4: based on Tim's answer I'm using now the following code, which works nicely, but breaks the line into two (since I have two calls of glDrawArrays), see also the following two screenshots.
# calculate the difference
diff_first = x[1] - x[0]
''' first part of the line '''
# push the matrix
glPushMatrix()
move_to = -(diff_first * c)
print 'going to %d ' % (move_to)
glTranslatef(move_to, 0, 0)
# format of glVertexPointer: nbr points per vertex, data type, stride, byte offset
# calculate the offset into the Vertex
offset_bytes = c * BYTES_PER_POINT
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# format of glDrawArrays: mode, Specifies the starting index in the enabled arrays, nbr of points
nbr_points_to_render = (nbr_points - c)
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
''' second part of the line '''
# push the matrix
glPushMatrix()
move_to = (nbr_points - c) * diff_first
print 'moving to %d ' %(move_to)
glTranslatef(move_to, 0, 0)
# select the vertex
offset_bytes = 0
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# draw the line
nbr_points_to_render = c
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
# update counter
c += 1
if c == nbr_points:
    c = 0
EDIT5 the resulting solution must obviously render one line across the screen, not two lines that are missing a connection. The circular buffer solution by Tim shows how to move the plot, but I end up with two lines instead of one.
Here are my thoughts on the revised question:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the
data directly (instead of using glBufferSubData)? Or will this make no
difference performance wise?
I'm not aware of any significant performance difference between the two, though I would probably prefer glBufferSubData.
What I might suggest in your case is to create a VBO with N floats, and then use it similar to a circular buffer. Keep an index locally to where the 'end' of the buffer is, then every update replace the value under 'end' with the new value, and increment the pointer. This way you only have to update a single float each cycle.
Having done that, you can draw this buffer using 2x translates and 2x glDrawArrays/Elements:
Imagine that you've got an array of 10 elements, and the buffer end pointer is at element 4. Your array will contain the following 10 values, where x is a constant value, and f(n-d) is the random sample from d cycles ago:
0: (0, f(n-4) )
1: (1, f(n-3) )
2: (2, f(n-2) )
3: (3, f(n-1) )
4: (4, f(n) ) <-- end of buffer
5: (5, f(n-9) ) <-- start of buffer
6: (6, f(n-8) )
7: (7, f(n-7) )
8: (8, f(n-6) )
9: (9, f(n-5) )
To draw this (pseudo-guess code, might not be exactly correct):
glPushMatrix();
glTranslatef( -(end+1), 0, 0);
glDrawArrays( GL_LINE_STRIP, end+1, 10-(end+1)); //draw elems 5-9 shifted left by 5
glPopMatrix();
glPushMatrix();
glTranslatef( 10-(end+1), 0, 0);
glDrawArrays( GL_LINE_STRIP, 0, end+1); // draw elems 0-4 shifted right by 5
glPopMatrix();
Then in the next cycle, replace the oldest value with the new random value, and shift the circular buffer pointer forward.
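The bookkeeping for such a circular buffer can be sketched in plain Python, without any OpenGL calls. The class `SampleRing` and its method names are invented for this illustration; `draw_calls` only computes which index range and which x translation each of the two draw calls would use, with x taken to be the slot index as in the 10-element example above.

```python
class SampleRing:
    """Fixed-size circular buffer of y samples; x is implied by the slot index."""

    def __init__(self, size):
        self.size = size
        self.samples = [0.0] * size
        self.end = size - 1  # slot that holds the newest sample

    def push(self, value):
        # Overwrite the oldest slot and make it the new end of the buffer.
        self.end = (self.end + 1) % self.size
        self.samples[self.end] = value

    def draw_calls(self):
        """Return (first_index, count, x_shift) tuples, oldest segment first."""
        start = (self.end + 1) % self.size  # slot that holds the oldest sample
        calls = [(start, self.size - start, -start)]
        if start > 0:
            calls.append((0, start, self.size - start))
        return calls


ring = SampleRing(10)
for sample in range(15):
    ring.push(float(sample))

print(ring.end)           # 4
print(ring.draw_calls())  # [(5, 5, -5), (0, 5, 5)]

# Rebuild what would appear on screen, left to right.
on_screen = {}
for first, count, shift in ring.draw_calls():
    for i in range(first, first + count):
        on_screen[i + shift] = ring.samples[i]
print([on_screen[x] for x in range(10)])
```

Only one slot changes per update, so only that one vertex has to be uploaded. Note that the two ranges are drawn as separate strips, so the segment between the last vertex of the first range and the first vertex of the second is not drawn by this scheme on its own.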
2) should I use a shader for moving objects (here line strip) instead
of calling glTranslatef? If so, what would such a shader look like? (I
suspect that a shader is the wrong way to go, since my line strip is
NOT a periodic function but rather contains random data).
Probably optional, if you use the method that I've described in #1. There's not a particular advantage to using one here.
3) what happens if the window gets resized? how do I keep aspect
ratio and scale vertices accordingly? glViewport() only helps scaling
in y direction, not in x direction. If the window is rescaled in
x-direction, then in my current implementation I would have to
recalculate the position of the entire line strip (calling my_func to
get the new x coordinates) and upload it to the GPU. I guess this
could be done more elegantly? How would I do that?
You shouldn't have to recalculate any data. Just define all your data in some fixed coordinate system that makes sense to you, and then use a projection matrix to map this range to the window. Without more specifics it's hard to answer.
4) I noticed that when I use glTranslatef with a non integral value,
the screen starts to flicker if the line strip consists of thousands
of points. This is most probably because the fine resolution that I
use to calculate the line strip does not match the pixel resolution of
the screen and therefore sometimes some points appear in front and
sometimes behind other points (this is particularly annoying when you
don't render a sine wave but some 'random' data). How can I prevent
this from happening (besides the obvious solution of translating by an
integer multiple of 1 pixel)? If a window gets resized from let's say
originally 800x800 pixels to 100x100 pixels and I still want to
visualize a line strip of 20 seconds, then shifting in x direction
must work flicker free somehow with sub pixel precision, right?
Your assumption seems correct. I think the thing to do here would be either to enable some kind of antialiasing (you can read other posts for how to do that), or to make the lines wider.
There are a number of things that could be at work here.
glBindBuffer is one of the slowest OpenGL operations (along with similar calls for shaders, textures, etc.)
glTranslate adjusts the modelview matrix, which the vertex unit multiplies all points by. So, it simply changes what matrix you multiply by. If you were to instead use a vertex shader, then you'd have to translate it for each vertex individually. In short: glTranslate is faster. In practice, this shouldn't matter too much, though.
If you're recalculating the sine function on a lot of points every time you draw, you're going to have performance issues (especially since, by looking at your source, it looks like you might be using Python).
You're updating your VBO every time you draw it, so it's not any faster than a vertex array. Vertex arrays are faster than immediate mode (glVertex, etc.) but nowhere near as fast as display lists or static VBOs.
There could be coding errors or redundant calls somewhere.
My verdict:
You're calculating a sine wave and an offset on the CPU. I strongly suspect that most of your overhead comes from calculating and uploading different data every time you draw it. This is coupled with unnecessary OpenGL calls and possibly unnecessary local calls.
My recommendation:
This is an opportunity for the GPU to shine. Calculating function values on parallel data is (literally) what the GPU does best.
I suggest you make a display list representing your function, but set all the y-coordinates to 0 (so it's a series of points all along the line y=0). Then, draw this exact same display list once for every sine wave you want to draw. Ordinarily, this would just produce a flat graph, but, you write a vertex shader that transforms the points vertically into your sine wave. The shader takes a uniform for the sine wave's offset ("sin(x-offset)"), and just changes each vertex's y.
I estimate this will make your code at least ten times faster. Furthermore, because the vertices' x coordinates are all at integral points (the shader does the "translation" in the function's space by computing "sin(x-offset)"), you won't experience jittering when offsetting with floating point values.
You've got a lot here, so I'll cover what I can. Hopefully this will give you some areas to research.
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
I would expect glBufferSubData to have better performance. If the data is stored on the GPU then mapping it will either
Copy the data back into host memory so you can modify it, and then copy it back when you unmap it.
or, give you a pointer to the GPU's memory directly which the CPU will access over PCI-Express. This isn't anywhere near as slow as it used to be to access GPU memory when we were on AGP or PCI, but it's still slower and not as well cached, etc, as host memory.
glBufferSubData will send the update of the buffer to the GPU and it will modify the buffer. No copying back and forth. All data is transferred in one burst. It should be able to do it as an asynchronous update of the buffer as well.
Once you get into "is this faster than that?" type comparisons you need to start measuring how long things take. A simple frame timer is normally sufficient (but report time per frame, not frames per second - it makes numbers easier to compare). If you go finer-grained than that, just be aware that because of the asynchronous nature of OpenGL, you often see time being consumed away from the call that caused the work. This is because after you give the GPU a load of work, it's only when you have to wait for it to finish something that you notice how long it's taking. That normally only happens when you're waiting for front/back buffers to swap.
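A simple frame timer of the kind mentioned above might look like the following Python sketch. The class name `FrameTimer` is made up for this example, and `time.sleep` merely stands in for the real rendering work and buffer swap.

```python
import time


class FrameTimer:
    """Collects the duration of each frame in milliseconds."""

    def __init__(self):
        self.last = time.perf_counter()
        self.frame_ms = []

    def tick(self):
        # Call once per frame, right after the buffer swap.
        now = time.perf_counter()
        self.frame_ms.append((now - self.last) * 1000.0)
        self.last = now

    def average_ms(self):
        return sum(self.frame_ms) / len(self.frame_ms) if self.frame_ms else 0.0


timer = FrameTimer()
for _ in range(3):
    time.sleep(0.01)  # placeholder for drawing one frame
    timer.tick()

print(f"{timer.average_ms():.2f} ms per frame")
```

Milliseconds per frame add up linearly: a change from 10 ms to 12 ms costs the same 2 ms wherever it happens, whereas the equivalent drop in frames per second depends on the starting frame rate.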
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, what would such a shader look like?
No difference. glTranslate modifies a matrix (normally the Model-View) which is then applied to all vertices. If you have a shader you'd apply a translation matrix to all your vertices. In fact the driver is probably building a small shader for you already.
Be aware that the older APIs like glTranslate() are deprecated from OpenGL 3.0 onwards, and in modern OpenGL everything is done with shaders.
3) what happens if the window gets resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction.
glViewport() sets the size and shape of the screen area that is rendered to. Quite often it's called on window resizing to set the viewport to the size and shape of the window. Doing just this will cause any image rendered by OpenGL to change aspect ratio with the window. To keep things looking the same you also have to control the projection matrix to counteract the effect of changing the viewport.
Something along the lines of:
glViewport(0,0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glScalef(1.0f, (float)width / (float)height, 1.0f); // Keeps X scale the same, but scales Y to compensate for aspect ratio
That's written from memory, and I might not have the maths right, but hopefully you get the idea.
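To check the maths of that y scale factor, here is a short Python calculation. It assumes an otherwise identity projection, so that the x range [-1, 1] covers `width` pixels and the y range [-1, 1] covers `height` pixels; the function name `aspect_scale` is invented for this example.

```python
def aspect_scale(width, height):
    """Return (sx, sy) so that one unit covers the same number of pixels in x and y."""
    if width <= 0 or height <= 0:
        raise ValueError("window size must be positive")
    # Float division on purpose: integer division would truncate the ratio.
    return 1.0, float(width) / float(height)


width, height = 800, 400
sx, sy = aspect_scale(width, height)

# With an identity projection, one unit is half the viewport in each direction.
pixels_per_unit_x = sx * width / 2.0
pixels_per_unit_y = sy * height / 2.0
print(sx, sy)                                # 1.0 2.0
print(pixels_per_unit_x, pixels_per_unit_y)  # 400.0 400.0
```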
4) I noticed that when I use glTranslatef with a non integral value, the screen starts to flicker if the line strip consists of thousands of points.
I think you're seeing a form of aliasing which is due to the lines moving under the sampling grid of the pixels. There are various anti-aliasing techniques you can use to reduce the problem. OpenGL has anti-aliased lines (glEnable(GL_LINE_SMOOTH)), but a lot of consumer cards didn't support it, or only did it in software. You can try it, but you may get no effect or run very slowly.
Alternatively you can look into Multi-sample anti-aliasing (MSAA), or other types that your card may support through extensions.
Another option is rendering to a high resolution texture (via Frame Buffer Objects - FBOs) and then filtering it down when you render it to the screen as a textured quad. This would also allow you to do a trick where you move the rendered texture slightly to the left each time, and render the new strip on the right each frame.
1 1
1 1 1 Frame 1
11
1
1 1 1 Frame 1 is copied left, and a new line segment is added to make frame 2
11 2
1
1 1 3 Frame 2 is copied left, and a new line segment is added to make frame 3
11 2
It's not a simple change, but it might help you out with your problem (5).
Basically, I have an image like this
or one with multiple rectangles within the same image. The rectangles are completely black and white and have "dirty" edges and gouges, but it's pretty easy to tell they're rectangles. To be more precise, they are image masks. The white regions are parts of the image which are to be "left alone", but the black parts are to be made bitonal.
My question is, how do I make a nice and crisp rectangle out of this degraded one? I am a Python person, but I have to use Qt and C++ for this task. It would be preferable if no other libraries are used.
Thanks!
If the bounding box that contains all non-black pixels can do what you want, this should do the trick:
int boundLeft = INT_MAX;
int boundRight = -1;
int boundTop = INT_MAX;
int boundBottom = -1;
for(int y=0;y<imageHeight;++y) {
bool hasNonMask = false;
for(int x=0;x<imageWidth;++x) {
if(isNotMask(x, y)) {
hasNonMask = true;
if(x < boundLeft) boundLeft = x;
if(x > boundRight) boundRight = x;
}
}
if(hasNonMask) {
if(y < boundTop) boundTop = y;
if(y > boundBottom) boundBottom = y;
}
}
If the result has negative size, then there's no non-mask pixel in the image. The code can be more optimized but I haven't had enough coffee yet. :)
Usually you'd do that by repeatedly dilating and eroding the mask. I don't think Qt has premade functions for that, so you will probably have to implement them yourself if you don't want to use libraries. http://ostermiller.org/dilate_and_erode.html has information on how to implement the functions.
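As a minimal sketch of what dilating and eroding mean, here is a plain Python version operating on a list of rows of 0/1 values. It uses a 4-neighbour (cross-shaped) neighbourhood and ignores neighbours outside the image; both choices, like the function names, are assumptions of this example rather than anything the linked page prescribes. The same loops would translate directly to C++.

```python
def dilate(mask):
    """Set a pixel if it or any in-bounds 4-neighbour is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    out[y][x] = 1
                    break
    return out


def erode(mask):
    """Keep a pixel only if it and all in-bounds 4-neighbours are set."""
    inverted = [[1 - v for v in row] for row in mask]
    return [[1 - v for v in row] for row in dilate(inverted)]


# 9x9 mask: a 5x5 white rectangle with a one-pixel gouge in the middle.
mask = [[1 if 2 <= x <= 6 and 2 <= y <= 6 else 0 for x in range(9)] for y in range(9)]
mask[4][4] = 0

closed = erode(dilate(mask))  # dilate first, then erode: fills small holes
print(sum(map(sum, mask)))    # 24
print(sum(map(sum, closed)))  # 25
```

Dilating and then eroding fills small gaps in the white region; doing it the other way round removes small white specks instead. Larger damage needs more passes or a bigger neighbourhood.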
For the moment, we'll assume they're all supposed to come out as rectangles with no rotation. In this case, you should be able to use a pretty simple approach. Starting from each pixel at the edge of the bitmap, start sampling pixels working your way inward until you encounter a transition. Record the distance from the edge for each transition (if there is one). Once you've done that from each edge, you basically "take a vote" -- the distance that occurred most often from that edge is what you treat as that edge of the rectangle. If the rectangle really is aligned, that should constitute a large majority of the distances.
If, instead you see a number of distances with nearly equal frequencies, chances are that the rectangle is rotated (or at least one edge is). In this case, you can divide the side in half (for example) and repeat. Once you've reached a large majority of points in each region agreeing on the distance, you can (attempt to) linearly interpolate between them to give a straight line (and limiting the minimum region size will limit the maximum rotation -- if you get to some size without reaching agreement, you're looking at a gouge, not the rectangle edge). Likewise, if you have a region (or more than one) that doesn't fit cleanly with the rest and won't fit with a line, you should probably ignore it as well -- again, you're probably looking at a gouge, not what's intended as an edge.
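The voting idea for the unrotated case can be sketched like this in Python. The example assumes the mask is given as rows of 0/1 values with 1 marking the rectangle, and only handles the left edge; the function name `vote_left_edge` is made up for this illustration.

```python
from collections import Counter


def vote_left_edge(mask):
    """Scan each row from the left border and return the most common
    distance at which the first rectangle pixel is found (None if no row has one)."""
    distances = []
    for row in mask:
        for x, value in enumerate(row):
            if value:
                distances.append(x)
                break
    if not distances:
        return None
    return Counter(distances).most_common(1)[0][0]


rows = [
    "0011110",
    "0011110",
    "0001110",  # gouge: edge starts one pixel late
    "0011110",
    "0111110",  # dirt: edge starts one pixel early
    "0000000",  # no rectangle in this row, so no vote
]
mask = [[int(ch) for ch in row] for row in rows]
print(vote_left_edge(mask))  # 2
```

The right, top and bottom edges follow the same pattern with the scan direction changed, for example by reversing each row for the right edge. Inspecting the full `Counter` instead of only the winner is one way to notice the "nearly equal frequencies" case that suggests a rotated edge.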
I wrote this function for filling a closed loop. pixvali is declared globally and stores the color value of the pixel where the first click is made (inside the closed loop).
But the problem is that the recursion doesn't terminate when its first fill(..,..) call finishes; instead it says the stack has overflowed...
void fill(int x,int y)
{
GLfloat pixval[3];
glReadPixels(x,y,1,1,GL_RGB,GL_FLOAT,pixval);
if(pixval[0]==pixvali[0] && pixval[1]==pixvali[1] && pixval[2]== pixvali[2])
{
glBegin(GL_POINTS);
glVertex2i(x,y);
glEnd();
glFlush();
fill(x-1,y);
fill(x+1,y);
fill(x,y-1);
fill(x,y+1);
}
}
The stack overflows because you are using recursion, and the depth of the recursion is linear in the number of pixels in the shape you're filling.
It may also be that you are trying to fill the shape in the same color as it already is. That is, the current gl color is the same as pixvali. In that case, you'll get infinite recursion.
It's kind of hard to tell from the question, but my guess would be that you end up going around in a loop of pixels.
For example, think that you have only 4 pixels that you need to color: (0,0), (0,1), (1,0), (1,1).
You begin coloring (0,0). Then your recursion will enter (1,0), since (-1,0) doesn't need coloring, then (0,0) again, since it's the pixel at (x-1, y), and so on.
You need to add some way to mark pixels that have been colored already. But that's just a guess, because I can't really see what's going on outside that function.
Not sure of the implementation details, but if the 12-byte local array is allocated on the stack (3 floats at 4 bytes each), then you have 4 bytes each for the x and y parameters, and probably four bytes for the return address. That gives at least 24 bytes every time you recurse. That means you only need a bit more than 40'000 calls to blow through 1MB of stack space, if there's nothing else on it, which won't be true.
To put that in perspective, 43'690 pixels is only about 10% of an 800x600 display.
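The arithmetic behind those two figures, using the same assumed sizes (4-byte floats, ints and return address, 1 MB of stack), works out as follows:

```python
STACK_BYTES = 1024 * 1024        # 1 MB of stack
FRAME_BYTES = 3 * 4 + 2 * 4 + 4  # pixval[3] + x and y + return address = 24

max_depth = STACK_BYTES // FRAME_BYTES
fraction = max_depth / (800 * 600)

print(FRAME_BYTES)         # 24
print(max_depth)           # 43690
print(round(fraction, 3))  # 0.091
```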
You need to check which pixels you are editing.
E.g. if you have an image from 0,0 to 10,10 and you edit 11,10, you will access memory outside the image.
So you need to check that x,y is within the boundaries of the image.
x>=left&&x<=right&&y>=top&&y<=bottom
Implement your own stack; don't use recursion for flood fill unless you are filling shapes with a relatively small surface area in terms of pixels.
A typical implementation is:
Stack stack;
stack.push(firstPoint);
while(!stack.isEmpty()){
Point currentPoint= stack.pop();
//do whatever you want to do here, namely paint.
//boundary check your surrounding points and push them onto the stack if they are in bounds
}
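Filled in as a runnable Python sketch, the explicit-stack approach could look like this. It works on a plain list of rows rather than on the OpenGL framebuffer, and the function name `flood_fill` and the character grid are assumptions of this example.

```python
def flood_fill(grid, start, new_color):
    """Fill the 4-connected region containing start; grid is modified in place.
    Returns the number of cells that were repainted."""
    height, width = len(grid), len(grid[0])
    sx, sy = start
    old_color = grid[sy][sx]
    if old_color == new_color:
        return 0  # nothing to do, and it would otherwise never terminate
    stack = [(sx, sy)]
    filled = 0
    while stack:
        x, y = stack.pop()
        if not (0 <= x < width and 0 <= y < height):
            continue  # outside the image
        if grid[y][x] != old_color:
            continue  # border, or already painted
        grid[y][x] = new_color
        filled += 1
        stack.extend(((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)))
    return filled


grid = [list(row) for row in ("#####", "#...#", "#.#.#", "#...#", "#####")]
count = flood_fill(grid, (1, 1), "o")
print(count)  # 8
print("\n".join("".join(row) for row in grid))
```

Painting a cell before its neighbours are examined is what marks it as visited, so each cell is repainted at most once and the stack stays bounded by the size of the region.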
At first glance, the algorithm looks good. I'm a bit worried about the "==" because it doesn't work well with float values. I suggest using
fabs(val1 - val2) < limit
instead (where limit is < 1 and > 0. Try 0.0001, for example).
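The difference between an exact and a tolerance-based comparison can be seen in a few lines of Python. The helper name `same_color` is invented for this example, and the default limit is the 0.0001 suggested above.

```python
def same_color(a, b, limit=1e-4):
    """True if every channel differs by less than limit."""
    return all(abs(p - q) < limit for p, q in zip(a, b))


print(0.1 + 0.2 == 0.3)                                      # False
print(same_color((0.1 + 0.2, 0.4, 0.6), (0.3, 0.4, 0.6)))    # True
print(same_color((0.2, 0.4, 0.6), (0.2, 0.4, 0.7)))          # False
```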
To track down the bug, I suggest to add a printf() at the beginning of the function. When you see what the function tries to fill, that will help. Maybe it is stuck somewhere and calls itself again and again with the same coordinates?
Also, the stack may simply be too small for the area you try to fill. Try with a small area first, say a small rectangle only 4 by 3 pixels. Don't try to click it with the mouse but start with a known good point inside (just call fill() in your code).
Also printing the values for the color could help.
Why are you abusing OpenGL for this? What you do there is very unstable. For example the pixel read by glReadPixels will only correspond to the vertex position if a carefully chosen combination of projection and modelview matrix is used. Also every iteration of fill will do a full round trip. Just because you're using OpenGL it doesn't get magically fast.
If you want to flood fill some area in the framebuffer, readout the whole framebuffer, do the floodfill on that and push the result back to OpenGL. Also if some part of the framebuffer is occluded (by a window, or similar), those parts won't be
Now to understand why you end up in an infinite recursion. Consider this:
fill(4, 4) will call fill(5, 4) will call fill(5, 5) will call fill(4, 5) will call fill(4, 4) boom
Now you've got that test there:
if( pixval[0] == pixvali[0] &&
pixval[1] == pixvali[1] &&
pixval[2] == pixvali[2] )
Note that this evaluates true if the to be set pixel already has the target color, again winding up in an endless recursion. You should test for inequality.
Last but not least: A picture may easily consist of millions of pixels. Usual stack sizes allow only for at most a few 1000 function nesting levels, so you'll have to convert your tail recursion into an iteration.
TL;DR: Don't use OpenGL for this, operate on a local buffer, use a proper iteration condition test and use iteration instead of recursion (or use a functional language, then the compiler will take care of that tail recursion).
http://en.wikipedia.org/wiki/Flood_fill