Update: This only seems to be a problem on some computers. The normal, intuitive code works fine on my home computer, but the computer at work has trouble.
Home computer: (no problems)
Windows XP Professional SP3
AMD Athlon 64 X2 3800+ Dual Core 2.0 GHz
NVIDIA GeForce 7800 GT
2 GB RAM
Work computer: (this question applies to this computer)
Windows XP Professional SP3
Intel Pentium 4 2.8 GHz (dual core, I think)
Intel 82945G Express Chipset Family
1 GB RAM
Original post:
I'm trying to apply a very simple texture to a part of the screen using Psychtoolbox in Matlab with the following code:
win = Screen('OpenWindow', 0, 127); % open window and obtain window pointer
tex = Screen('MakeTexture', win, [255 0;0 255]); % get texture pointer
% draw texture. Args: command, window pointer, texture pointer, source
% (i.e. the entire 2x2 matrix), destination (a 100x100 square), rotation
% (none) and filtering (nearest neighbour)
Screen('DrawTexture', win, tex, [0 0 2 2], [100 100 200 200], 0, 0);
Screen('Flip', win); % flip the buffer so the texture is drawn
KbWait; % wait for keystroke
Screen('Close', win); % close screen
Now I would expect to see this (four equally sized squares):
But instead I get this (right and bottom sides are cut off and top left square is too large):
Obviously the destination rectangle is a lot bigger than the source rectangle, so the texture needs to be magnified. I would expect this to happen symmetrically like in the first picture and this is also what I need. Why is this not happening and what can I do about it?
I have also tried using [128 0 1152 1024] as a destination rectangle (as it's the square in the center of my screen). In this case, all sides are 1024, which makes each involved rectangle a power of 2. This does not help.
Increasing the size of the checkerboard results in a similar situation, where the right- and bottommost sides are not shown correctly.
Like I said, I use Psychtoolbox, but I know that it uses OpenGL under the hood. I don't know much about OpenGL either, but maybe someone who does can help without knowing Matlab. I don't know.
Thanks for your time!
While I don't know much (read: any) Matlab, I do know that textures are very picky in OpenGL. Last I checked, OpenGL requires textures to be square, with power-of-two dimensions (e.g. 128 x 128, 256 x 256, 512 x 512).
If they aren't, OpenGL is supposed to pad the file with white pixels where needed to meet this condition, although it can be a crapshoot depending on which system you're running it on.
I suggest making sure that your checkerboard texture fits these requirements.
Also, I can't quite tell from the code you posted, but OpenGL expects you to map the corners of your texture to the corners of the object you intend to texture.
Another bit of advice, maybe try a linear filter instead of nearest neighbor. It's heavier computationally, but results in a better image. This probably won't matter in the end.
While this advice is not Matlab-specific, I hope it's useful.
Without knowing a lot about the Psychtoolbox, but having dealt with graphics and user interfaces a lot in MATLAB, the first thing I would try would be to fiddle with the fourth input to Screen (the "source" input). Try shifting each corner by half-pixel and whole-pixel values. For example, the first thing I would try would be:
Screen('DrawTexture', win, tex, [0 0 2.5 2.5], [100 100 200 200], 0, 0);
And if that didn't seem to do anything, I would next try:
Screen('DrawTexture', win, tex, [0 0 3 3], [100 100 200 200], 0, 0);
My reasoning for this advice: I've noticed sometimes that images or GUI controls in my figures can appear to be off by a pixel, which I can only speculate is some kind of round-off error when scaling or positioning them.
That's the best advice I can give. Hope it helps!
I've been under the assumption that my gamma correction pipeline should be as follows:
Use sRGB format for all textures loaded in (GL_SRGB8_ALPHA8) as all art programs pre-gamma correct their files. When sampling from a GL_SRGB8_ALPHA8 texture in a shader OpenGL will automatically convert to linear space.
Do all lighting calculations, post processing, etc. in linear space.
Convert back to sRGB space when writing final color that will be displayed on the screen.
Note that in my case the final color write involves me writing from a FBO (which is a linear RGB texture) to the back buffer.
My assumption has been challenged: if I gamma correct in the final stage, my colors are brighter than they should be. I set up a solid color to be drawn by my lights with the value { 255, 106, 0 }, but when I render I get { 255, 171, 0 } (as determined by print-screening and color picking). Instead of orange I get yellow. If I don't gamma correct at the final step I get exactly the right value of { 255, 106, 0 }.
According to some resources modern LCD screens mimic CRT gamma. Do they always? If not, how can I tell if I should gamma correct? Am I going wrong somewhere else?
Edit 1
I've now noticed that even though the color I write with the light is correct, the places where I use colors from textures are not correct (they are far darker, as I would expect without gamma correction). I don't know where this disparity is coming from.
Edit 2
After trying GL_RGBA8 for my textures instead of GL_SRGB8_ALPHA8, everything looks perfect, even when using the texture values in lighting computations (if I halve the intensity of the light, the output color values are halved).
My code is no longer taking gamma correction into account anywhere, and my output looks correct.
This confuses me even more, is gamma correction no longer needed/used?
Edit 3 - In response to datenwolf's answer
After some more experimenting I'm confused on a couple points here.
1 - Most image formats are stored non-linearly (in sRGB space)
I've loaded a few images (in my case both .png and .bmp images) and examined the raw binary data. It appears to me as though the images are actually in the RGB color space: if I compare the pixel values shown in an image editing program with the byte array I get in my program, they match up perfectly. Since my image editor is giving me RGB values, this would indicate the image is stored in RGB.
I'm using stb_image.h/.c to load my images and followed it all the way through loading a .png and did not see anywhere that it gamma corrected the image while loading. I also examined the .bmps in a hex editor and the values on disk matched up for them.
If these images are actually stored on disk in linear RGB space, how am I supposed to (programmatically) know when to specify that an image is in sRGB space? Is there some way to query for this that a more fully featured image loader might provide? Or is it up to the image creators to save their image as gamma corrected (or not), meaning establishing a convention and following it for a given project? I've asked a couple of artists and neither of them knew what gamma correction is.
If I specify my images are sRGB, they are too dark unless I gamma correct in the end (which would be understandable if the monitor output using sRGB, but see point #2).
2 - "On most computers the effective scanout LUT is linear! What does this mean though?"
I'm not sure I can find where this thought is finished in your response.
From what I can tell, having experimented, all monitors I've tested on output linear values. If I draw a full screen quad and color it with a hard-coded value in a shader with no gamma correction the monitor displays the correct value that I specified.
What the sentence I quoted above from your answer and my results would lead me to believe is that modern monitors output linear values (i.e. do not emulate CRT gamma).
The target platform for our application is the PC. For this platform (excluding people with CRTs or really old monitors), would it be reasonable to do whatever your response to #1 is, then for #2 to not gamma correct (i.e. not perform the final RGB->sRGB transformation - either manually or using GL_FRAMEBUFFER_SRGB)?
If this is so, what are the platforms that GL_FRAMEBUFFER_SRGB is meant for (or where it would be valid to use it today)? Or are monitors that use linear RGB really that new (given that GL_FRAMEBUFFER_SRGB was introduced in 2008)?
--
I've talked to a few other graphics devs at my school and from the sounds of it, none of them have taken gamma correction into account and they have not noticed anything incorrect (some were not even aware of it). One dev in particular said that he got incorrect results when taking gamma into account so he then decided to not worry about gamma. I'm unsure what to do in my project for my target platform given the conflicting information I'm getting online/seeing with my project.
Edit 4 - In response to datenwolf's updated answer
Yes, indeed. If somewhere in the signal chain a nonlinear transform is applied, but all the pixel values go unmodified from the image to the display, then that nonlinearity has already been pre-applied to the image's pixel values. Which means that the image is already in a nonlinear color space.
Your response would make sense to me if I was examining the image on my display. To be sure I was clear, when I said I was examining the byte array for the image I mean I was examining the numerical value in memory for the texture, not the image output on the screen (which I did do for point #2). To me the only way I could see what you're saying to be true then is if the image editor was giving me values in sRGB space.
Also note that I did try examining the output on the monitor, as well as modifying the texture color (for example, halving or doubling it), and the output appeared correct (measured using the method I describe below).
How did you measure the signal response?
Unfortunately my methods of measurement are far cruder than yours. When I said I experimented on my monitors, what I meant was that I output a solid-color full-screen quad, whose color is hard-coded in a shader, to a plain OpenGL framebuffer (which does not do any color space conversion when written to). When I output white, 75% gray, 50% gray, 25% gray and black, the correct colors are displayed. Now, my interpretation of the correct colors could most certainly be wrong. I take a screenshot and then use an image editing program to see what the values of the pixels are (as well as a visual appraisal to make sure the values make sense). If I understand correctly, if my monitors were non-linear, I would need to perform an RGB->sRGB transformation before presenting to the display device for the colors to be correct.
I'm not going to lie, I feel I'm getting a bit out of my depth here. I'm thinking the solution I might pursue for my second point of confusion (the final RGB->sRGB transformation) will be a tweakable brightness setting, defaulted to what looks correct on my devices (no gamma correction).
First of all you must understand that the nonlinear mapping applied to the color channels is often more than just a simple power function. The sRGB nonlinearity can be approximated by roughly x^2.4, but that's not the real deal: the actual curve is piecewise, linear near black and a power function above a small threshold. Anyway, your primary assumptions are more or less correct.
If your textures are stored in the more common image file formats, they will contain the values as they are presented to the graphics scanout. Now there are two common hardware scenarios:
The scanout interface outputs a linear signal and the display device then internally applies a nonlinear mapping. Old CRT monitors were nonlinear due to their physics: the amplifiers could only put so much current into the electron beam, the phosphor saturated, and so on. That's why the whole gamma thing was introduced in the first place: to model the nonlinearities of CRT displays.
Modern LCD and OLED displays either use resistor ladders in their driver amplifiers, or they have gamma ramp lookup tables in their image processors.
Some devices however are linear, and ask the image producing device to supply a proper matching LUT for the desired output color profile on the scanout.
On most computers the effective scanout LUT is linear! What does this mean though? A little detour:
For illustration I quickly hooked up my laptop's analogue display output (VGA connector) to my analogue oscilloscope: Blue channel onto scope channel 1, green channel to scope channel 2, external triggering on line synchronization signal (HSync). A quick and dirty OpenGL program, deliberately written with immediate mode was used to generate a linear color ramp:
#include <GL/glut.h>

void display()
{
    GLuint win_width  = glutGet(GLUT_WINDOW_WIDTH);
    GLuint win_height = glutGet(GLUT_WINDOW_HEIGHT);
    glViewport(0, 0, win_width, win_height);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, 1);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    /* one quad covering the viewport: black on the left edge, white on the
       right edge, i.e. a linear horizontal ramp */
    glBegin(GL_QUAD_STRIP);
        glColor3f(0., 0., 0.);
        glVertex2f(0., 0.);
        glVertex2f(0., 1.);
        glColor3f(1., 1., 1.);
        glVertex2f(1., 0.);
        glVertex2f(1., 1.);
    glEnd();

    glutSwapBuffers();
}

int main(int argc, char *argv[])
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutCreateWindow("linear");
    glutFullScreen();
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
The graphics output was configured with the Modeline
"1440x900_60.00" 106.50 1440 1528 1672 1904 900 903 909 934 -HSync +VSync
(because that's the same mode the flat panel runs in, and I was using cloning mode)
For the scope experiment the scanout gamma LUTs were set as follows:
a gamma=2 LUT on the green channel
a linear (gamma=1) LUT on the blue channel
This is how the signals of a single scanout line look like (upper curve: Ch2 = green, lower curve: Ch1 = blue):
You can clearly see the x⟼x² and x⟼x mappings (parabola and linear shapes of the curves).
Now, after this little detour, we know that the pixel values that go to the main framebuffer go there as they are: the OpenGL linear ramp underwent no further changes, and only when a nonlinear scanout LUT was applied was the signal sent to the display altered.
Either way, the values you present to the scanout (which means the on-screen framebuffer) will undergo a nonlinear mapping at some point in the signal chain. And for all standard consumer devices this mapping will be according to the sRGB standard, because it's the lowest common denominator (i.e. images represented in the sRGB color space can be reproduced on most output devices).
Since most programs, like web browsers, assume the output will undergo an sRGB-to-display color space mapping, they simply copy the pixel values of the standard image file formats to the on-screen framebuffer as they are, without performing a color space conversion. This implies that the color values within those images are in sRGB color space (or the programs will merely convert to sRGB if the image's color profile is not sRGB). The correct thing to do (if, and only if, the color values written to the framebuffer are scanned out to the display unaltered, assuming the scanout LUT is part of the display) would be a conversion to the color profile the display expects.
But this implies that the on-screen framebuffer itself is in sRGB color space (I don't want to split hairs about how idiotic that is, let's just accept this fact).
How does this tie in with OpenGL? First of all, OpenGL does all its color operations linearly. However, since the scanout is expected to be in some nonlinear color space, this means that the end result of OpenGL's rendering operations must somehow be brought into the on-screen framebuffer's color space.
This is where the ARB_framebuffer_sRGB extension (which went core with OpenGL-3) enters the picture, which introduced new flags used for the configuration of window pixelformats:
New Tokens
Accepted by the <attribList> parameter of glXChooseVisual, and by
the <attrib> parameter of glXGetConfig:
GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB 0x20B2
Accepted by the <piAttributes> parameter of
wglGetPixelFormatAttribivEXT, wglGetPixelFormatAttribfvEXT, and
the <piAttribIList> and <pfAttribIList> of wglChoosePixelFormatEXT:
WGL_FRAMEBUFFER_SRGB_CAPABLE_ARB 0x20A9
Accepted by the <cap> parameter of Enable, Disable, and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
and GetDoublev:
FRAMEBUFFER_SRGB 0x8DB9
So if you have a window configured with such an sRGB pixel format and enable sRGB rasterization in OpenGL with glEnable(GL_FRAMEBUFFER_SRGB);, the results of the linear-colorspace rendering operations will be transformed into sRGB color space.
Another way would be to render everything into an off-screen FBO and do the color conversion in a postprocessing shader.
But that's only the output side of the rendering signal chain. You also have input signals, in the form of textures, and those are usually images with their pixel values stored nonlinearly. So before they can be used in linear image operations, such images must first be brought into a linear color space. Let's ignore for the time being that mapping nonlinear color spaces into linear color spaces opens several cans of worms of its own - which is why the sRGB color space is so ridiculously small, namely to avoid those problems.
So to address this, the EXT_texture_sRGB extension was introduced, which turned out to be so vital that it never went through ARB status but went straight into the OpenGL specification itself: behold the GL_SRGB… internal texture formats.
A texture loaded with such a format undergoes an sRGB-to-linear-RGB colorspace transformation before its samples are used. This gives linear pixel values, suitable for linear rendering operations, and the result can then be validly transformed to sRGB when going to the main on-screen framebuffer.
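Put together, a minimal sketch of both sides could look like this (illustrative only; diffuse_tex, color_tex, w, h and pixels are placeholder names, not anything taken from the question):
/* Input side: store the image with an sRGB internal format, so that
   sampling it in a shader yields linear values. */
glBindTexture(GL_TEXTURE_2D, diffuse_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_SRGB8_ALPHA8, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);

/* Intermediate render target: a plain linear format; lighting and
   post processing happen here. */
glBindTexture(GL_TEXTURE_2D, color_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, w, h, 0, GL_RGBA, GL_FLOAT, NULL);

/* Output side: for the final pass into an sRGB-capable window framebuffer,
   let the hardware apply the linear -> sRGB encoding on write. */
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glEnable(GL_FRAMEBUFFER_SRGB);
/* ... draw a fullscreen quad that samples color_tex ... */
glDisable(GL_FRAMEBUFFER_SRGB);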
A personal note on the whole issue: Presenting images on the on-screen framebuffer in the target device color space IMHO is a huge design flaw. There's no way to do everything right in such a setup without going insane.
What one really wants is to have the on-screen framebuffer in a linear, contact color space; the natural choice would be CIEXYZ. Rendering operations would naturally take place in the same contact color space. Doing all graphics operations in contact color spaces avoids opening the aforementioned cans of worms involved in trying to push a square peg named linear RGB through a nonlinear, round hole named sRGB.
And although I don't like the design of Weston/Wayland very much, at least it offers the opportunity to actually implement such a display system, by having the clients render and the compositor operate in contact color space and apply the output device's color profiles in a last postprocessing step.
The only drawback of contact color spaces is that it's imperative to use deep color (i.e. > 12 bits per color channel). In fact, 8 bits are completely insufficient, even with nonlinear RGB (the nonlinearity helps a bit to cover up the lack of perceptible resolution).
Update
I've loaded a few images (in my case both .png and .bmp images) and examined the raw binary data. It appears to me as though the images are actually in the RGB color space: if I compare the pixel values shown in an image editing program with the byte array I get in my program, they match up perfectly. Since my image editor is giving me RGB values, this would indicate the image is stored in RGB.
Yes, indeed. If somewhere in the signal chain a nonlinear transform is applied, but all the pixel values go unmodified from the image to the display, then that nonlinearity has already been pre-applied to the image's pixel values. Which means that the image is already in a nonlinear color space.
2 - "On most computers the effective scanout LUT is linear! What does this mean though?
I'm not sure I can find where this thought is finished in your response.
This thought is elaborated in the section that immediately follows, where I show how the values you put into a plain (OpenGL) framebuffer go directly to the monitor, unmodified. The idea of sRGB is "put the values into the images exactly as they are sent to the monitor and build consumer displays to follow that sRGB color space".
From what I can tell, having experimented, all monitors I've tested on output linear values.
How did you measure the signal response? Did you use a calibrated power meter or similar device to measure the light intensity emitted from the monitor in response to the signal? You can't trust your eyes with that, because like all our senses our eyes have a logarithmic signal response.
Update 2
To me the only way I could see what you're saying to be true then is if the image editor was giving me values in sRGB space.
That's indeed the case. Because color management was added to all the widespread graphics systems as an afterthought, most image editors edit pixel values in their destination color space. Note that one particular design parameter of sRGB was that it should merely retroactively specify the unmanaged, direct-value-transfer color operations as they were (and mostly still are) done on consumer devices. Since no color management happens at all, the values contained in the images and manipulated in editors must already be in sRGB. This works as long as images are not synthetically created in a linear rendering process; in the case of the latter, the rendering system has to take the destination color space into account.
I take a screenshot and then use an image editing program to see what the values of the pixels are
Which gives you of course only the raw values in the scanout buffer without the gamma LUT and the display nonlinearity applied.
I wanted to give a simple explanation of what went wrong in the initial attempt, because although the accepted answer goes in-depth on colorspace theory, it doesn't really answer that.
The setup of the pipeline was exactly right: use GL_SRGB8_ALPHA8 for textures, GL_FRAMEBUFFER_SRGB (or custom shader code) to convert back to sRGB at the end, and all your intermediate calculations will be using linear light.
The last bit is where you ran into trouble. You wanted a light with a color of (255, 106, 0) - but that's an sRGB color, and you're working with linear light. To get the color you want, you need to convert that color to linear space, the same way GL_SRGB8_ALPHA8 does it for your textures. In your case that works out to a vec3 light intensity of (1, .1441, 0) - the value after undoing the sRGB gamma-compression.
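As a quick sanity check, here is a tiny standalone sketch (my own illustration, using the standard IEC 61966-2-1 sRGB decode formula) that reproduces that number:
#include <math.h>
#include <stdio.h>

/* Standard sRGB -> linear decode. */
static double srgb_to_linear(double c)
{
    return (c <= 0.04045) ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
}

int main(void)
{
    /* The desired light color (255, 106, 0) in sRGB, normalized to [0, 1]. */
    double r = srgb_to_linear(255.0 / 255.0);
    double g = srgb_to_linear(106.0 / 255.0);
    double b = srgb_to_linear(  0.0 / 255.0);
    printf("linear light color: (%.4f, %.4f, %.4f)\n", r, g, b);
    /* prints approximately (1.0000, 0.1441, 0.0000) */
    return 0;
}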
I have lots of text to draw. If I call D3DXFont::DrawText with the first parameter being NULL, I get terrible performance.
I heard that using D3DXFont in conjunction with D3DXSprite makes things much faster.
How does my application draw strings?
It draws every string with a pseudo-shadow, which means I draw each string 4 times in black, offset by:
x + 1, y + 1
x - 1, y + 1
x - 1, y - 1
x + 1, y - 1
and once in the actual color. This makes for very nice-looking, always-readable strings. I even switched to pixel fonts for faster rendering.
Let's call such a string with a shadow a ShadowString.
Every frame I draw up to 256 (worst case) of those ShadowStrings on screen.
I would like to know how to use sprites (or any other technique) to speed up drawing of those strings as much as possible. Right now I'm getting 30 FPS in the app, but my target is 120 minimum. And the problem is ONLY the text drawing.
Surely you must profile your application before any optimizations, but truth be told, D3DXFont/D3DXSprite and "fast" are mutually exclusive concepts. If they don't fit your needs, just don't use them.
Use 3rd party libraries or make your own sprite/font renderer.
Recently I've answered about how to do it here: How to draw line and font without D3DX9 in DirectX 9?
Also, Google for "sprite font", "sprite batching", "texture atlases", "TTF rendering". It is not very difficult if you are familiar with API (notably vertex buffers and texturing), and there are plenty of examples on web. Don't hesitate to look for D3D11 or OpenGL examples, principles are the same.
I'm trying to get the hang of moving objects (in general) and line strips (in particular) most efficiently in OpenGL, and therefore I'm writing an application where multiple line segments travel at a constant speed from right to left. At every time point the leftmost point is removed, the entire line is shifted to the left, and a new point is added at the very right of the line (this new data point is streamed / received / calculated on the fly, every 10 ms or so). To illustrate what I mean, see this image:
Because I want to work with many objects, I decided to use vertex buffer objects in order to minimize the amount of gl* calls. My current code looks something like this:
A) setup initial vertices:
# calculate my_func(x) in range [0, n]
# (could also be random data)
data = my_func(0, n)
# create & bind buffer
vbo_id = GLuint()
glGenBuffers(1, vbo_id)
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
# allocate memory & transfer data to GPU
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW)
B) update vertices:
draw():
# get new data and update offset
data = my_func(n+dx, n+2*dx)
# update offset 'n' which is the current absolute value of x.
n = n + 2*dx
# upload data
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
glBufferSubData(GL_ARRAY_BUFFER, n, sizeof(data), data)
# translate scene so it looks like line strip has moved to the left.
glTranslatef(-local_shift, 0.0, 0.0)
# draw all points from offset
glVertexPointer(2, GL_FLOAT, 0, n)
glDrawArrays(GL_LINE_STRIP, 0, points_per_vbo)
where my_func would do something like this:
my_func(start_x, end_x):
# generate the correct x locations.
x_values = range(start_x, end_x, STEP_SIZE)
# generate the y values. We could be getting these values from a sensor.
y_values = []
for j in x_values:
y_values.append(random())
data = []
for i, j in zip(x_values, y_values):
data.extend([i, j])
return data
This works just fine, however if I have let's say 20 of those line strips that span the entire screen, then things slow down considerably.
Therefore my questions:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, how would such a shader look like? (I suspect that a shader is the wrong way to go, since my line strip is NOT a period function but rather contains random data).
3) what happens if the window gets resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction. If the window is rescaled in x-direction, then in my current implementation I would have to recalculate the position of the entire line strip (calling my_func to get the new x coordinates) and upload it to the GPU. I guess this could be done more elegantly? How would I do that?
4) I noticed that when I use glTranslatef with a non-integral value, the screen starts to flicker if the line strip consists of thousands of points. This is most probably because the fine resolution that I use to calculate the line strip does not match the pixel resolution of the screen and therefore sometimes some points appear in front and sometimes behind other points (this is particularly annoying when you don't render a sine wave but some 'random' data). How can I prevent this from happening (besides the obvious solution of translating by an integer multiple of 1 pixel)? If a window gets resized from let's say originally 800x800 pixels to 100x100 pixels and I still want to visualize a line strip of 20 seconds, then shifting in x direction must work flicker free somehow with sub pixel precision, right?
5) as you can see I always call glTranslatef(-local_shift, 0.0, 0.0) - without ever doing the opposite. Therefore I keep shifting the entire view to the right. And that's why I need to keep track of the absolute x position (in order to place new data at the correct location). This problem will eventually lead to an artifact, where the line is overlapping with the edges of the window. I guess there must be a better way for doing this, right? Like keeping the x values fixed and just moving & updating the y values?
EDIT I've removed the sine wave example and replaced it with a better example. My question is generally about how to move line strips in space most efficiently (while adding new values to them). Therefore any suggestions like "precompute the values for t -> infinity" don't help here (I could also just be drawing the current temperature measured in front of my house).
EDIT2
Consider this toy example where after each time step, the first point is removed and a new one is added to the end:
t = 0
*
* * *
* **** *
1234567890
t = 1
*
* * * *
**** *
2345678901
t = 2
* *
* * *
**** *
3456789012
I don't think I can use a shader here, can I?
EDIT 3: example with two line strips.
EDIT 4: based on Tim's answer I'm now using the following code, which works nicely but breaks the line into two (since I have two calls to glDrawArrays); see also the following two screenshots.
# calculate the difference
diff_first = x[1] - x[0]
''' first part of the line '''
# push the matrix
glPushMatrix()
move_to = -(diff_first * c)
print 'going to %d ' % (move_to)
glTranslatef(move_to, 0, 0)
# format of glVertexPointer: nbr points per vertex, data type, stride, byte offset
# calculate the offset into the Vertex
offset_bytes = c * BYTES_PER_POINT
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# format of glDrawArrays: mode, Specifies the starting index in the enabled arrays, nbr of points
nbr_points_to_render = (nbr_points - c)
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
''' second part of the line '''
# push the matrix
glPushMatrix()
move_to = (nbr_points - c) * diff_first
print 'moving to %d ' %(move_to)
glTranslatef(move_to, 0, 0)
# select the vertex
offset_bytes = 0
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)
# draw the line
nbr_points_to_render = c
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
# pop the matrix
glPopMatrix()
# update counter
c += 1
if c == nbr_points:
    c = 0
EDIT 5: the resulting solution must obviously render one line across the screen - not two lines that are missing a connection. The circular buffer solution by Tim shows how to move the plot, but I end up with two lines instead of one.
Here are my thoughts on the revised question:
1) should I use glMapBuffer to bind the buffer on the GPU and fill the
data directly (instead of using glBufferSubData)? Or will this make no
difference performance wise?
I'm not aware of any significant performance difference between the two, though I would probably prefer glBufferSubData.
What I might suggest in your case is to create a VBO with N floats, and then use it similar to a circular buffer. Keep an index locally to where the 'end' of the buffer is, then every update replace the value under 'end' with the new value, and increment the pointer. This way you only have to update a single float each cycle.
Having done that, you can draw this buffer using 2x translates and 2x glDrawArrays/Elements:
Imagine that you've got an array of 10 elements, and the buffer end pointer is at element 4. Your array will contain the following 10 values, where x is a constant value, and f(n-d) is the random sample from d cycles ago:
0: (0, f(n-4) )
1: (1, f(n-3) )
2: (2, f(n-2) )
3: (3, f(n-1) )
4: (4, f(n) ) <-- end of buffer
5: (5, f(n-9) ) <-- start of buffer
6: (6, f(n-8) )
7: (7, f(n-7) )
8: (8, f(n-6) )
9: (9, f(n-5) )
To draw this (pseudo-guess code, might not be exactly correct):
glPushMatrix();
glTranslatef( -(end+1), 0, 0);
glDrawArrays( GL_LINE_STRIP, end+1, 10-(end+1)); // draw elems 5-9, shifted left by 5
glPopMatrix();
glPushMatrix();
glTranslatef( 10-(end+1), 0, 0);
glDrawArrays( GL_LINE_STRIP, 0, end+1); // draw elems 0-4, shifted right by 5
glPopMatrix();
Then in the next cycle, replace the oldest value with the new random value, and shift the circular buffer pointer forward.
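For illustration, that per-cycle update might look roughly like this (a sketch in C, assuming the two-floats-per-vertex layout above; vbo_id, N, end and get_new_sample() are placeholder names):
/* Advance the circular 'end' index and overwrite only the y value of that
   slot; the x values never change, so one small glBufferSubData suffices. */
end = (end + 1) % N;                  /* this slot now holds the newest sample */
float new_y = get_new_sample();       /* e.g. the sensor value for this frame */
GLintptr byte_offset = (GLintptr)(end * 2 + 1) * sizeof(GLfloat); /* skip the x */
glBindBuffer(GL_ARRAY_BUFFER, vbo_id);
glBufferSubData(GL_ARRAY_BUFFER, byte_offset, sizeof(GLfloat), &new_y);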
2) should I use a shader for moving objects (here line strip) instead
of calling glTranslatef? If so, how would such a shader look like? (I
suspect that a shader is the wrong way to go, since my line strip is
NOT a period function but rather contains random data).
Probably optional, if you use the method that I've described in #1. There's not a particular advantage to using one here.
3) what happens if the window get's resized? how do I keep aspect
ratio and scale vertices accordingly? glViewport() only helps scaling
in y direction, not in x direction. If the window is rescaled in
x-direction, then in my current implementation I would have to
recalculate the position of the entire line strip (calling my_func to
get the new x coordinates) and upload it to the GPU. I guess this
could be done more elegantly? How would I do that?
You shouldn't have to recalculate any data. Just define all your data in some fixed coordinate system that makes sense to you, and then use the projection matrix to map this range to the window. Without more specifics it's hard to answer.
4) I noticed that when I use glTranslatef with a non integral value,
the screen starts to flicker if the line strip consists of thousands
of points. This is most probably because the fine resolution that I
use to calculate the line strip does not match the pixel resolution of
the screen and therefore sometimes some points appear in front and
sometimes behind other points (this is particularly annoying when you
don't render a sine wave but some 'random' data). How can I prevent
this from happening (besides the obvious solution of translating by a
integer multiple of 1 pixel)? If a window get re-sized from let's say
originally 800x800 pixels to 100x100 pixels and I still want to
visualize a line strip of 20 seconds, then shifting in x direction
must work flicker free somehow with sub pixel precision, right?
Your assumption seems correct. I think the thing to do here would either be to enable some kind of antialiasing (you can read other posts for how to do that), or to make the lines wider.
There are a number of things that could be at work here.
glBindBuffer is one of the slowest OpenGL operations (along with similar calls for shaders, textures, etc.)
glTranslate adjusts the modelview matrix, which the vertex unit multiplies all points by. So, it simply changes what matrix you multiply by. If you were to instead use a vertex shader, then you'd have to translate it for each vertex individually. In short: glTranslate is faster. In practice, this shouldn't matter too much, though.
If you're recalculating the sine function on a lot of points every time you draw, you're going to have performance issues (especially since, by looking at your source, it looks like you might be using Python).
You're updating your VBO every time you draw it, so it's not any faster than a vertex array. Vertex arrays are faster than intermediate mode (glVertex, etc.) but nowhere near as fast as display lists or static VBOs.
There could be coding errors or redundant calls somewhere.
My verdict:
You're calculating a sine wave and an offset on the CPU. I strongly suspect that most of your overhead comes from calculating and uploading different data every time you draw it. This is coupled with unnecessary OpenGL calls and possibly unnecessary local calls.
My recommendation:
This is an opportunity for the GPU to shine. Calculating function values on parallel data is (literally) what the GPU does best.
I suggest you make a display list representing your function, but set all the y-coordinates to 0 (so it's a series of points all along the line y=0). Then, draw this exact same display list once for every sine wave you want to draw. Ordinarily, this would just produce a flat graph, but, you write a vertex shader that transforms the points vertically into your sine wave. The shader takes a uniform for the sine wave's offset ("sin(x-offset)"), and just changes each vertex's y.
I estimate this will make your code at least ten times faster. Furthermore, because the vertices' x coordinates are all at integral points (the shader does the "translation" in the function's space by computing "sin(x-offset)"), you won't experience jittering when offsetting with floating point values.
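If you go that route, the vertex shader itself can be tiny. A sketch (GLSL 1.20 with the old compatibility built-ins, embedded as a C string; illustrative and untested against your setup):
/* Lifts vertices laid out flat along y = 0 into a sine wave on the GPU;
   "offset" is a uniform updated once per frame with glUniform1f. */
static const char *sine_vertex_shader =
    "#version 120\n"
    "uniform float offset;\n"
    "void main() {\n"
    "    vec4 p = gl_Vertex;\n"
    "    p.y = sin(p.x - offset); /* the shift happens in function space */\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * p;\n"
    "}\n";
You would compile and attach it with the usual glCreateShader / glShaderSource / glCompileShader / glAttachShader calls.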
You've got a lot here, so I'll cover what I can. Hopefully this will give you some areas to research.
1) should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance wise?
I would expect glBufferSubData to have better performance. If the data is stored on the GPU then mapping it will either
Copy the data back into host memory so you can modify it, and then copy it back when you unmap it.
or, give you a pointer to the GPU's memory directly which the CPU will access over PCI-Express. This isn't anywhere near as slow as it used to be to access GPU memory when we were on AGP or PCI, but it's still slower and not as well cached, etc, as host memory.
glBufferSubData will send the update to the GPU, which will modify the buffer there. No copying back and forth; all data is transferred in one burst. It should be able to do the buffer update asynchronously as well.
Once you get into "is this faster than that?" type comparisons you need to start measuring how long things take. A simple frame timer is normally sufficient (but report time per frame, not frames per second - it makes numbers easier to compare). If you go finer-grained than that, just be aware that because of the asynchronous nature of OpenGL, you often see time being consumed away from the call that caused the work. This is because after you give the GPU a load of work, it's only when you have to wait for it to finish something that you notice how long it's taking. That normally only happens when you're waiting for front/back buffers to swap.
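For example, a crude per-frame timer can be as simple as this (a sketch; it reuses GLUT's millisecond clock only because GLUT already appears elsewhere on this page):
#include <stdio.h>
#include <GL/glut.h>

void display(void)
{
    static int last_ms = 0;
    int now_ms = glutGet(GLUT_ELAPSED_TIME);   /* milliseconds since glutInit() */
    printf("frame time: %d ms\n", now_ms - last_ms);
    last_ms = now_ms;

    /* ... rendering ... */
    glutSwapBuffers();
}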
2) should I use a shader for moving objects (here line strip) instead of calling glTranslatef? If so, how would such a shader look like?
No difference. glTranslate modifies a matrix (normally the Model-View) which is then applied to all vertices. If you have a shader you'd apply a translation matrix to all your vertices. In fact the driver is probably building a small shader for you already.
Be aware that the older APIs like glTranslate() are deprecated from OpenGL 3.0 onwards, and in modern OpenGL everything is done with shaders.
3) what happens if the window get's resized? how do I keep aspect ratio and scale vertices accordingly? glViewport() only helps scaling in y direction, not in x direction.
glViewport() sets the size and shape of the screen area that is rendered to. Quite often it's called on window resizing to set the viewport to the size and shape of the window. Doing just this will cause any image rendered by OpenGL to change aspect ratio with the window. To keep things looking the same you also have to control the projection matrix to counteract the effect of changing the viewport.
Something along the lines of:
glViewport(0, 0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glScalef(1.0f, (float)width / (float)height, 1.0f); // Keeps X scale the same, but scales Y to compensate for aspect ratio
That's written from memory, and I might not have the maths right, but hopefully you get the idea.
4) I noticed that when I use glTranslatef with a non integral value, the screen starts to flicker if the line strip consists of thousands of points.
I think you're seeing a form of aliasing which is due to the lines moving under the sampling grid of the pixels. There are various anti-aliasing techniques you can use to reduce the problem. OpenGL has anti-aliased lines (glEnable(GL_LINE_SMOOTH)), but a lot of consumer cards didn't support it, or only did it in software. You can try it, but you may get no effect or run very slowly.
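If you do want to try it, the usual setup is something like this (a sketch; smoothed lines only take effect when blending is enabled):
/* Old-style anti-aliased lines; may be slow or ignored on some hardware. */
glEnable(GL_LINE_SMOOTH);
glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);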
Alternatively you can look into Multi-sample anti-aliasing (MSAA), or other types that your card may support through extensions.
Another option is rendering to a high-resolution texture (via Frame Buffer Objects - FBOs) and then filtering it down when you render it to the screen as a textured quad. This would also allow you to do a trick where you move the rendered texture slightly to the left each time, and render the new strip on the right each frame.
1 1
1 1 1 Frame 1
11
1
1 1 1 Frame 1 is copied left, and a new line segment is added to make frame 2
11 2
1
1 1 3 Frame 2 is copied left, and a new line segment is added to make frame 3
11 2
It's not a simple change, but it might help you out with your problem (5).
Is there any way to clamp out of range texture addresses to a certain value? In my case, I want them to be set to a simple zero, but the address mode I need doesn't seem to exist.
Thanks.
Edit: Any idea what the cudaAddressModeBorder setting does?
I don't think there's a way to specify the clamp value, but you can do the obvious thing: add a 1-pixel black (zero) border around the edge and offset your addressing by 1. It shouldn't be much more data, and it'll get you the clamping for free.
If you have a maximum size 2D texture (for CUDA 2.x it is 64k x 64k) with 16 bytes per pixel (worst case) then you're looking at only 4 MB of extra data for the 1 pixel border which for a PCIe x16 card will take about 500 microseconds to copy to the card--hardly anything even in the worst case.
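For instance, the padding itself can be done host-side before the copy, along these lines (a plain C sketch assuming a row-major float image; pad_with_border is just an illustrative helper, not a CUDA API):
#include <stdlib.h>
#include <string.h>

/* Copy a w x h float image into a zero-filled (w+2) x (h+2) buffer, leaving
   a 1-pixel black border; afterwards, offset texture addressing by +1. */
float *pad_with_border(const float *src, int w, int h)
{
    int pw = w + 2, ph = h + 2;
    float *dst = calloc((size_t)pw * ph, sizeof(float)); /* zeroed, so the border is black */
    if (!dst) return NULL;
    for (int y = 0; y < h; ++y)
        memcpy(&dst[(y + 1) * pw + 1], &src[y * w], w * sizeof(float));
    return dst;
}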
You can set the boundary mode to return zero when accessing textures through surface functions. I cannot test it right now, as you need a device of compute capability 2.0+, but you can check the reference in the NVIDIA CUDA C Programming Guide (version 3.2), Section B.9, p. 114.
You can also clamp at the boundary, or trap (make the kernel fail), which is the default when using surface memory.
Regards!
I'm somewhat new to OpenGL though I'm fairly sure my problem lies in the pixel format being used, or how my texture is being generated...
I'm drawing a texture onto a flat 2D quad using a 16-bit RGB5_A1 pixel format, though I don't make use of any alpha at this stage. The problem I'm having is that each pair of horizontal pixel values has been swapped.
That is... if the pixels positions should be in this order (assume 8x2 image)
0 1 2 3
4 5 6 7
they are instead drawn as
1 0 3 2
5 4 7 6
Or, more clearly from this image (below).
Left is what I get... Right is what I should get.
The question is... how have I ended up with this? Is there something wrong with the pixel format? Unlikely, since the colours all appear correct, and I would expect all kinds of nastiness if it were down to endianness. Suggestions greatly appreciated.
Update: Turns out the problem was in my source renderer. Interestingly, I've avoided the problem entirely by using 32-bit textures (haven't tried 24-bit at this point).
This may be unrelated, and you have found a workaround, but it could be related to the OpenGL unpack alignment. Have you tried the following call? It sets the alignment of every image row to 1 byte (the default is 4).
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);