Draw multiple meshes at different locations (DirectX 12) - C++

I have a problem with DirectX 12. I have made a small 3D renderer. Models are translated into 3D space in the vertex shader with basic World-View-Projection matrices stored in a constant buffer.
To change the constant buffer's data I'm currently using memcpy(pMappedConstantBuffer + alignedSize * frame, newConstantBufferData, alignedSize), which replaces the constant buffer's contents immediately.
This is where the problem comes in: drawing is recorded into a command list that is only later sent to the GPU for execution.
Example:
/* Change the constant buffer so that the next draw call's position is (0, 1, 0) */
memcpy(/*Parameters*/);
/* Record a draw call into the command list */
DrawInstanced(/*Parameters*/);
/* Now I want to draw another mesh at another position, so I have to change the
   constant buffer. After this memcpy() the draw position will be (0, -1, 0) */
memcpy(/*Parameters*/);
/* Record the second draw call into the list */
DrawInstanced(/*Parameters*/);
After this I send the command list to the GPU for execution, but guess what: all the meshes end up in the same position, because all the memcpys execute before the command list is even sent to the GPU. The last memcpy simply overwrites the previous ones.
So basically the question is: how do I draw meshes at different positions, or how do I replace the constant buffer's data from within the command list so that the constant buffer changes between each draw call on the GPU?
Thanks
No need for help anymore, I solved it myself: I created a constant buffer for each mesh.

About execution order, you are totally right: your memcpy calls will update the buffers immediately, but the commands will not be processed until you push your command list into the queue (and you will not know exactly when that happens).
In Direct3D 11, when you use Map on a buffer, this is handled for you (some space will be allocated behind the scenes to avoid this problem if required).
So in Direct3D 12 you have several choices. I'll assume you want to draw N objects and store one matrix per object in your cbuffer.
The first is to create one buffer per object and set the data independently. If you have only a few, this is easy to maintain (and the extra memory footprint due to the resource allocations will be OK).
Another option is to create a large buffer (which can contain N matrices) and create N constant buffer views that point to the memory location of each object's matrix. (Please note that you also have to respect the 256-byte alignment in that case; see CreateConstantBufferView.)
You can also use a StructuredBuffer and copy all the data into it (in that case you do not need the alignment), and use an index in the vertex shader to look up the correct matrix. (It is possible to set a uint value in your shader and use SetGraphicsRoot32BitConstant to apply it directly.)
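For illustration, here is a minimal sketch of the large-buffer approach, using a root CBV (SetGraphicsRootConstantBufferView) instead of descriptor-heap views to keep the example short. The variable names (constantBuffer, worldViewProj, numObjects, vertexCount) and the assumption that root parameter 0 is a CBV are mine, not from the question:

// One upload-heap buffer holding N matrices at 256-byte strides; each draw
// binds its own slot, so no memcpy can overwrite another object's data.
constexpr UINT kAlignedSize = 256; // constant buffer data must be 256-byte aligned

UINT8* mapped = nullptr;
D3D12_RANGE noRead{ 0, 0 };        // we will not read from the mapping
constantBuffer->Map(0, &noRead, reinterpret_cast<void**>(&mapped));

for (UINT i = 0; i < numObjects; ++i)
{
    // Write this object's matrix into its private 256-byte slot.
    memcpy(mapped + i * kAlignedSize, &worldViewProj[i], sizeof(worldViewProj[i]));

    // Point root parameter 0 at that slot, then record the draw.
    commandList->SetGraphicsRootConstantBufferView(
        0, constantBuffer->GetGPUVirtualAddress() + i * kAlignedSize);
    commandList->DrawInstanced(vertexCount, 1, 0, 0);
}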

Related

How to have a public class member that can be accessed but not changed by other classes (C++)

This has probably been asked before, but I can't find anything on it other than the general private/public/const solutions. Basically, I need to load fonts into an array when an instance of my Text class is created. The Text class is declared in my text.h file and defined in text.cpp. Both of these files also include a Fonts class, so I want my Fonts class to have my selection of fonts preloaded in an array, ready to be accessed by my Text class AFTER the first instance is created. I want these fonts to be accessible to my Text class, but not changeable by it. I can't create a TTF_Font *Get_Font() method in the Fonts class, because each time a font is created it allocates memory that needs to be manually freed, so I couldn't close it after it goes out of the method's scope. Rather, when creating a character for example, I would want to call something like TTF_RenderText_Blended(Fonts::arialFonts[10], "123", black); which would select the font type Arial in size 11, for example.
I'm not sure what type of application you are using your fonts in, but I've done some 3D graphics programming using OpenGL and Vulkan. This may or may not help you, but should give you some kind of context on the structure of the framework that is commonly used in Game Engine development.
Typically we will have a Texture class or struct that has a vector of color triplets or quads that represent the color data for each pixel in the image. Other classes will contain UV coordinates that the texture will be applied to... We also usually have functions to load in textures or graphics files such as PNG, JPG, TGA, etc. You can write your own fairly trivially, or you can use one of the many open-source loader libraries that are out there if you're doing graphics-type programming. The texture class will contain other properties such as mipmaps, whether it is repeated or mirrored, its quality, etc... So we will typically load a texture, assign it an ID value, and store it in a hash table. If the same texture is requested again from a file, the code will recognize that it already exists and exit that function call; otherwise it will store the new texture in the hash table.
Since most rendered text is 2D, instead of creating a 3D model and sending it to the renderer to be processed by the model-view-projection matrix, we create what is commonly called a Sprite. This is another class. It has vertices to make up its polygonal edges. Typically a sprite will have 4 vertices since it is a quad. It will also have texture coordinates associated with it. We don't apply the textures directly to the sprite because we want to instance a single sprite, having only a single copy in memory. What we will typically do here is send a reference to it to the renderer, along with a reference to the texture by ID and a transformation matrix to change its shape, size, and world position. When this is processed by the GPU through the shaders, since it is a 2D object, we use an orthographic projection. This saves a matrix multiplication within the vertex shader for every vertex; basically, it will be processed by the view-projection matrix alone. Our sprites are stored similarly to our textures, in a hash table with an associated ID.
Now we have a sprite that is basically a Graphics Image that is drawn to the screen, a simple quad that can be resized and placed anywhere easily. We only need a single quad in memory but can draw hundreds even thousands because they are instanced via a reference count.
How does this help with Text or Fonts? You want your Text class to be separate from your Font class. The text class will contain where you want to draw it, which font to use, the size of the font, the color to be applied, and the text itself... The Font class would typically inherit from the basic Sprite Class.
We will normally create a separate tool or mini console program that will take any of the known TrueType or Windows fonts by name and generate 2 files for you. You pass flags into the program's command-line arguments along with other options, such as -all for all characters or -"abc123" for just the specific characters you want, and -12 for the font size, and it will generate the needed files. The first is a single texture image that we call a Font Atlas, which is basically a specific form of sprite sheet; the other file is a CSV text file with generated values for the texture positioning of each character's quad.
Back in the main project, the Font class loads these two files. The first is simple, as it's just a texture image, which we've already handled before; the latter is the CSV file with the information needed to generate all of the appropriate quads. The Font class does, however, have quite a bit of complex calculation to perform. Again, when a font is loaded into memory, we do the same as before: we check whether it is already loaded, by either filename or ID, and if it isn't, we store it in a hash table with a generated associated ID.
Now when we go to use the Text class to render our text, the code might look something like this:
void Engine::loadAssets() {
    // Textures
    assetManager_.loadTexture( "assets/textures/shells.png" );
    assetManager_.loadTexture( "assets/textures/blue_sky.jpg" );
    // Sprites
    assetManager_.loadSprite( "assets/sprites/jumping_jim.spr" );
    assetManager_.loadSprite( "assets/sprites/exploading_bomb.spr" );
    assetManager_.loadFont( "assets/fonts/arial.png", 12 );
    // Same font as above, but the code structure requires a different font id
    // for each size that is used.
    assetManager_.loadFont( "assets/fonts/arial.png", 16 );
    assetManager_.loadFont( "assets/fonts/helvetica.png" );
}
Now, these are all stored as single instances in our AssetManager class; this class contains several hash tables, one for each of the different asset types. Its role is to manage their memory and lifetime. There is only ever a single instance of each asset, but we can reference it 1,000 times... Now somewhere else in the code, we may have a file with a bunch of standalone enumerations...
enum class FontType {
    ARIAL,
    HELVETICA,
};
Then in our render call or loadScene function....
void Engine::loadScene() {
    fontManager_.drawText( FontType::ARIAL, 18, glm::vec3(-0.5, 1.0, 0.5), glm::vec4(245, 169, 108, 128), "Hello World!" );
    fontManager_.drawText( FontType::ARIAL, 12, glm::vec3(128, 128, 0), glm::vec4(128, 255, 244, 80), "Good Bye!" );
}
The drawText function is the one that takes the ARIAL id and gets the reference into the hash table for that stored font. The renderer uses the position and color values, the font size, and the string message... The ID and size are used to retrieve the appropriate Font Atlas or sprite sheet. Then each character in the message string is matched, and the appropriate texture coordinates are used to apply it to a quad sized according to the font size you specified.
All of the file handling (opening, reading, closing) was already done in the loadAssets function. All of the required information is already stored in a set of hash tables that we reference into via instancing. There is no need to worry about memory management or heap access at this point; the assets are served from the cached tables. When we draw the text to the screen, we are only manipulating the pixels through the shaders by matrix transformations.
There is another major component to the engine that hasn't been mentioned yet: we typically use a batch process and a BatchManager class which handles all of the processing for sending the vertices, UV coordinates, color or texture data, etc. to the video card. CPU-to-GPU transfers across the bus and/or the PCI-Express lanes are considered slow; we don't want to be sending 10,000, 100,000, or even 1 million individual render calls every single frame! So we will typically create a set of batches with priority-queue functionality, and when all buckets are full, either the bucket with the highest priority value or the fullest bucket will be sent to the GPU and then emptied. A single bucket can hold 10,000 - 100,000 primitives, where a single primitive could be a point, a line, a triangle list, a triangle fan, etc. This makes the code much more efficient. The heap is seldom used. The BatchManager, AssetManager, TextureManager, AudioManager, FontManager classes, etc. are the ones that live on the heap, but all of their stored assets are used by reference, and because of that we can instance a single object a million times! I hope this explanation helps.
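To make the load-once, reference-many pattern above concrete, here is a minimal C++ sketch of the cached loader; the names (AssetManager, Texture, loadTexture) are illustrative placeholders, not code from a specific engine:

#include <memory>
#include <string>
#include <unordered_map>

struct Texture { /* pixel data, mipmap flags, wrap mode, etc. */ };

class AssetManager {
public:
    // Returns a shared handle; the file is decoded only on the first request,
    // after that every caller gets a reference to the same cached instance.
    std::shared_ptr<Texture> loadTexture(const std::string& path) {
        auto it = textures_.find(path);
        if (it != textures_.end())
            return it->second;                   // already loaded: reuse it
        auto tex = std::make_shared<Texture>();  // ...decode PNG/JPG/TGA here...
        textures_.emplace(path, tex);
        return tex;
    }
private:
    std::unordered_map<std::string, std::shared_ptr<Texture>> textures_;
};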

Memory Management of update Method in Texture Pixel Manipulation

How is the array of pixels that is passed to the update method of the Texture class (SFML) managed memory-wise? These are some of my guesses:
A weak pointer is saved inside the texture instance, which means that it is necessary to keep your own pointer to the array of pixels and manage it yourself.
The array is copied and managed by the texture (which also means that every time the update method is called again, the previous one is deallocated).
The second guess would justify this for updating a texture multiple times:
auto newPixels = new sf::Uint8[WIDTH * HEIGHT * 4];
... //do stuff to pixels
texture.update(newPixels);
Where the pixels are reallocated every time the texture is updated. Otherwise (if the pixels are just stored as a weak pointer and not managed/deallocated/allocated) a different approach would be necessary, where the pixels are managed by the user...
Thanks in advance for any answers :)
SFML is open source. You don't need to guess or ask here; you can just read it for yourself:
https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/Texture.cpp#L390
Specifically, the pointer is passed to the glTexSubImage2D OpenGL function. OpenGL copies the pixel data during that call, so SFML keeps no pointer to your array: you own it and remain responsible for freeing it.
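Because the copy happens inside the call, you don't even need manual new/delete; a vector works fine. A minimal sketch (the dimensions are arbitrary):

#include <SFML/Graphics.hpp>
#include <vector>

int main()
{
    const unsigned width = 256, height = 256;
    sf::Texture texture;
    texture.create(width, height);

    // RGBA pixel buffer owned by us; sf::Texture never stores this pointer.
    std::vector<sf::Uint8> pixels(width * height * 4, 255);

    // ...modify pixels each frame...
    texture.update(pixels.data()); // data is copied immediately, so the
                                   // vector can be reused or destroyed freely
}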

Manual depth rendering: Random results despite using atomic operations

I'm rendering single-pixel points into a uint32 texture with a compute shader. The texture is a 3D texture: x and y are viewport coordinates, and z holds depth information at coordinate 0 and additional attributes at coordinate 1. So, two manually built render targets, if you will. The code looks like this:
layout (r32ui, binding = 0) coherent volatile uniform uimage3D renderBuffer;
layout (rgba32f, binding = 1) restrict readonly uniform imageBuffer pointBuffer;

for (int j = 0; j < numPoints / gl_WorkGroupSize.x + 1; j++)
{
    vec4 point = imageLoad(pointBuffer, ...);
    // ... transform point ...
    uint originalDepth = imageAtomicMin(renderBuffer, ivec3(imageCoords, 0), point.depth);
    if (originalDepth >= point.depth)
    {
        // write happened, store the attributes
        imageStore(renderBuffer, ivec3(imageCoords, 1), point.attributes);
    }
}
While the depth values are correct, I have a few pixels where the attributes flicker between two values.
The order of points in the pointBuffer is random (but I've verified that the set of all points is always the same), so my first thought was that two equal depth values might change the output depending on which one comes first. So I changed it so that, if originalDepth == point.depth, it uses imageAtomicMax to always have the same one of the two alternative attributes written, but that changed nothing.
I scattered barrier() and memoryBarrier() all over the place, but that changed nothing. I also removed all diverging control flow for this; nothing changed.
Reducing the local work size to 32 removes 90% of the flickering, but some still remains.
Any ideas would be greatly appreciated.
Edit: Before you ask why I do this manually instead of using normal rasterization and fragment shaders, the reason is performance. The rasterizer does not help since I'm rendering single-pixel points, shared memory sped things up greatly, and I render each point multiple times, which would have required a geometry shader, which was slow.
The problem is this: you have a race condition on writing to renderBuffer. If two different CS invocations map to the same pixel, and both of them decide to write the value, then there is a race on your imageStore call. One may overwrite the other, it may be a partial overwrite, or something else entirely. But in any case, it's not guaranteed to work.
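To see why the min and the store don't compose into one atomic operation, here is a CPU analogue sketched in C++ with std::atomic (illustrative only; the GLSL situation has the same shape):

#include <atomic>
#include <cstdint>

std::atomic<uint32_t> depth{UINT32_MAX}; // analogue of renderBuffer layer 0
uint32_t attribute = 0;                  // analogue of layer 1 (plain store!)

// CAS-loop atomic min that returns the previous value, like imageAtomicMin.
uint32_t atomicMin(std::atomic<uint32_t>& a, uint32_t v)
{
    uint32_t old = a.load();
    while (v < old && !a.compare_exchange_weak(old, v)) {}
    return old;
}

void writePoint(uint32_t pointDepth, uint32_t pointAttr)
{
    if (atomicMin(depth, pointDepth) >= pointDepth)
    {
        // Another thread can execute its own atomicMin + store right here,
        // between our min and our store; whichever store lands last wins,
        // which is exactly the flickering attribute described above.
        attribute = pointAttr;
    }
}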
This would be best solved by doing what rasterizers do: break the process down into two separate phases. The first phase does the ... transform point ... part, writing that data out to a buffer. The second phase then goes through the points and writes them to the final image.
In phase 2, each CS invocation performs all of the processing for a particular output pixel. That way, there are no race conditions. Of course, that requires that phase 1 produces data in a way that can be ordered per-pixel.
There are several ways to go about the latter. You could use a linked list, with a list per pixel. Or you could use a list per workgroup, where a workgroup represents some X/Y region of pixel space. In that case, you would use local shared memory as your local depth buffer, with all CS invocations reading from/writing to that region. After they all get done processing pixels, you write it out to real memory. Basically, you'd be implementing tile-based rendering manually.
Indeed, if you have a lot of these points, a tile-based solution would allow you to incorporate pipelining, so that you don't have to wait until all of phase 1 is done before starting on some of phase 2. You could break phase 1 down into chunks. You start a couple of phase 1 chunks, then a phase 2 chunk that reads from the first phase 1, then another phase 1, and so forth.
Vulkan, with its event system, has better tools than OpenGL for building such an efficient dependency chain.

How do images work in an OpenCL kernel?

I'm trying to find ways to copy multidimensional arrays from host to device in OpenCL, and I thought an approach was to use an image, which can be a 1-, 2-, or 3-dimensional object. However, I'm confused, because when reading a pixel from an image, the functions use vector datatypes. Normally I would think double pointer, but it doesn't sound like that is what is meant by vector datatypes. Anyway, here are my questions:
1) What is actually meant by vector datatype, and why wouldn't we just specify 2 or 3 indices when denoting pixel coordinates? It looks like a single value such as float2 is being used to denote coordinates, but that makes no sense to me. I'm looking at the functions read_imageui and read_image.
2) Can the input image just be a subset of the entire image, and the sampler a subset of the input image? I don't understand how the coordinates are actually specified here either, since read_image() only seems to take a single value for the input and a single value for the sampler.
3) If doing linear algebra, should I just bite the bullet and translate 1-D array data from the buffer into multi-dimensional arrays in OpenCL?
4) I'm still interested in images, so even if what I want to do is not best done with images, could you still explain questions 1 and 2?
Thanks!
EDIT
I wanted to refine my question and ask: in the following Khronos documentation they define...
int4 read_imagei(image2d_t image,
                 sampler_t sampler,
                 int2 coord)
But nowhere can I find what image2d_t's definition or structure is supposed to be. The same thing goes for sampler_t and int2 coord. They seem like structs to me, or pointers to structs, since OpenCL is supposed to be based on ANSI C, but what are the fields of these structs, and how do I write the coord with what looks like a scalar?! I've seen the notation (int2)(x, y), but that's not ANSI C; that looks like Scala, haha. Things seem conflicting to me. Thanks again!
In general you can read from images in three different ways:
direct pixel access, no sampling
sampling, normalized coordinates
sampling, integer coordinates
The first one is what you want, that is, you pass integer pixel coordinates like (10, 43) and it will return the contents of the image at that point, with no filtering whatsoever, as if it were a memory buffer. You can use the read_image*() family of functions which take no sampler_t param.
The second one is what most people want from images, you specify normalized image coords between 0 and 1, and the return value is the interpolated image color at the specified point (so if your coordinates specify a point in between pixels, the color is interpolated based on surrounding pixel colors). The interpolation, and the way out-of-bounds coordinates are handled, are defined by the configuration of the sampler_t parameter you pass to the function.
The third one is the same as the second one, except the texture coordinates are not normalized, and the sampler needs to be configured accordingly. In some sense the third way is closer to the first, and the only additional feature it provides is the ability to handle out-of-bounds pixel coordinates (for instance, by wrapping or clamping them) instead of you doing it manually.
Finally, the different versions of each function, e.g. read_imagef, read_imagei, read_imageui are to be used depending on the pixel format of your image. If it contains floats (in each channel), use read_imagef, if it contains signed integers (in each channel), use read_imagei, etc...
Writing to an image on the other hand is straightforward, there are write_image{f,i,ui}() functions that take an image object, integer pixel coordinates and a pixel color, all very easy.
Note that you cannot read and write to the same image in the same kernel! (I don't know if recent OpenCL versions have changed that). In general I would recommend using a buffer if you are not going to be using images as actual images (i.e. input textures that you sample or output textures that you write to only once at the end of your kernel).
About the image2d_t, sampler_t types, they are OpenCL "pseudo-objects" that you can pass into a kernel from C (they are reserved types). You send your image or your sampler from the C side into clSetKernelArg, and the kernel gets back a sampler_t or an image2d_t in the kernel's parameter list (just like you pass in a buffer object and it gets a pointer). The objects themselves cannot be meaningfully manipulated inside the kernel, they are just handles that you can send into the read_image/write_image functions, along with a few others.
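On the host side, setting those arguments might look like the following sketch (error handling and the setup of context/kernel are elided; the sizes and flags are arbitrary examples):

// Create an image object and a sampler, then pass both to the kernel like
// any other argument; inside the kernel they arrive as image2d_t/sampler_t.
cl_image_format format = { CL_RGBA, CL_UNORM_INT8 };
cl_image_desc desc = {};
desc.image_type   = CL_MEM_OBJECT_TYPE_IMAGE2D;
desc.image_width  = 512;
desc.image_height = 512;

cl_int err = 0;
cl_mem image = clCreateImage(context, CL_MEM_READ_ONLY, &format, &desc,
                             nullptr, &err);
cl_sampler sampler = clCreateSampler(context, CL_FALSE /* unnormalized */,
                                     CL_ADDRESS_CLAMP_TO_EDGE,
                                     CL_FILTER_NEAREST, &err);

clSetKernelArg(kernel, 0, sizeof(cl_mem), &image);
clSetKernelArg(kernel, 1, sizeof(cl_sampler), &sampler);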
As for the "actual" low-level difference between images and buffers: GPUs often have specially reserved texture memory that is highly optimized for "read often, write once" access patterns, with special texture-sampling hardware and texture caches to optimize scattered reads, mipmaps, etc.
On the CPU there is probably no underlying difference between an image and a buffer, and your runtime likely implements both as memory arrays while enforcing image semantics.
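And on the device side, a minimal kernel using direct, unnormalized integer coordinates could look like this (shown as a C++ string literal so it can live in the host source; note that (int2)(x, y) is an OpenCL C vector literal, not ANSI C):

const char* kernelSrc = R"CLC(
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                           CLK_ADDRESS_CLAMP_TO_EDGE   |
                           CLK_FILTER_NEAREST;

__kernel void copyPixel(read_only image2d_t src, write_only image2d_t dst)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    float4 px = read_imagef(src, smp, (int2)(x, y)); // no filtering
    write_imagef(dst, (int2)(x, y), px);
}
)CLC";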

What is the most efficient way of moving multiple objects (stored in VBOs) in space? Should I use glTranslatef or a shader?

I'm trying to get the hang of moving objects (in general) and line strips (in particular) most efficiently in OpenGL, so I'm writing an application where multiple line segments travel with a constant speed from right to left. At every time point the leftmost point is removed, the entire line is shifted to the left, and a new point is added at the very right of the line (this new data point is streamed/received/calculated on the fly, every 10 ms or so). To illustrate what I mean, see this image:
Because I want to work with many objects, I decided to use vertex buffer objects in order to minimize the amount of gl* calls. My current code looks something like this:
A) setup initial vertices:
# calculate my_func(x) in range [0, n]
# (could also be random data)
data = my_func(0, n)

# create & bind buffer
vbo_id = GLuint()
glGenBuffers(1, vbo_id)
glBindBuffer(GL_ARRAY_BUFFER, vbo_id)

# allocate memory & transfer data to GPU
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW)
B) update vertices:
def draw():
    # get new data and update offset
    data = my_func(n + dx, n + 2 * dx)
    # update offset 'n', which is the current absolute value of x
    n = n + 2 * dx
    # upload data
    glBindBuffer(GL_ARRAY_BUFFER, vbo_id)
    glBufferSubData(GL_ARRAY_BUFFER, n, sizeof(data), data)
    # translate scene so it looks like the line strip has moved to the left
    glTranslatef(-local_shift, 0.0, 0.0)
    # draw all points from offset
    glVertexPointer(2, GL_FLOAT, 0, n)
    glDrawArrays(GL_LINE_STRIP, 0, points_per_vbo)
where my_func would do something like this:
def my_func(start_x, end_x):
    # generate the correct x locations
    x_values = range(start_x, end_x, STEP_SIZE)
    # generate the y values; we could be getting these from a sensor
    y_values = []
    for j in x_values:
        y_values.append(random())
    data = []
    for i, j in zip(x_values, y_values):
        data.extend([i, j])
    return data
This works just fine; however, if I have, let's say, 20 of those line strips spanning the entire screen, things slow down considerably.
Therefore my questions:
1) Should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance-wise?
2) Should I use a shader for moving objects (here the line strip) instead of calling glTranslatef? If so, what would such a shader look like? (I suspect that a shader is the wrong way to go, since my line strip is NOT a periodic function but rather contains random data.)
3) What happens if the window gets resized? How do I keep the aspect ratio and scale the vertices accordingly? glViewport() only helps scaling in the y direction, not the x direction. If the window is rescaled in the x direction, then in my current implementation I would have to recalculate the position of the entire line strip (calling my_func to get the new x coordinates) and upload it to the GPU. I guess this could be done more elegantly? How would I do that?
4) I noticed that when I use glTranslatef with a non-integral value, the screen starts to flicker if the line strip consists of thousands of points. This is most probably because the fine resolution that I use to calculate the line strip does not match the pixel resolution of the screen, and therefore sometimes some points appear in front of and sometimes behind other points (this is particularly annoying when you don't render a sine wave but some 'random' data). How can I prevent this from happening (besides the obvious solution of translating by an integer multiple of 1 pixel)? If a window gets resized from, say, originally 800x800 pixels to 100x100 pixels and I still want to visualize a line strip of 20 seconds, then shifting in the x direction must work flicker-free somehow with sub-pixel precision, right?
5) As you can see, I always call glTranslatef(-local_shift, 0.0, 0.0) without ever doing the opposite, so I keep shifting the entire view to the right. That's why I need to keep track of the absolute x position (in order to place new data at the correct location). This will eventually lead to an artifact where the line overlaps the edges of the window. I guess there must be a better way of doing this, right? Like keeping the x values fixed and just moving and updating the y values?
EDIT: I've removed the sine wave example and replaced it with a better one. My question is generally about how to move line strips in space most efficiently (while adding new values to them). Therefore, suggestions like "precompute the values for t -> infinity" don't help here (I could just as well be drawing the current temperature measured in front of my house).
EDIT2
Consider this toy example where after each time step, the first point is removed and a new one is added to the end:
t = 0
*
* * *
* **** *
1234567890
t = 1
*
* * * *
**** *
2345678901
t = 2
* *
* * *
**** *
3456789012
I don't think I can use a shader here, can I?
EDIT 3: example with two line strips.
EDIT 4: Based on Tim's answer I'm now using the following code, which works nicely but breaks the line in two (since I have two calls to glDrawArrays); see also the following two screenshots.
# calculate the difference
diff_first = x[1] - x[0]

''' first part of the line '''
glPushMatrix()
move_to = -(diff_first * c)
print 'going to %d' % (move_to)
glTranslatef(move_to, 0, 0)

# glVertexPointer arguments: points per vertex, data type, stride, byte offset
offset_bytes = c * BYTES_PER_POINT
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)

# glDrawArrays arguments: mode, starting index in the enabled arrays, number of points
nbr_points_to_render = (nbr_points - c)
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
glPopMatrix()

''' second part of the line '''
glPushMatrix()
move_to = (nbr_points - c) * diff_first
print 'moving to %d' % (move_to)
glTranslatef(move_to, 0, 0)

# select the vertex
offset_bytes = 0
stride = 0
glVertexPointer(2, GL_FLOAT, stride, offset_bytes)

# draw the line
nbr_points_to_render = c
starting_point_in_above_selected_Vertex = 0
glDrawArrays(GL_POINTS, starting_point_in_above_selected_Vertex, nbr_points_to_render)
glPopMatrix()

# update counter
c += 1
if c == nbr_points:
    c = 0
EDIT 5: The resulting solution must obviously render one continuous line across the screen, not two lines missing a connection. The circular buffer solution by Tim explains how to move the plot, but I end up with two lines instead of one.
Here are my thoughts on the revised question:
1) Should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance-wise?
I'm not aware of any significant performance difference between the two, though I would probably prefer glBufferSubData.
What I might suggest in your case is to create a VBO with N floats and use it like a circular buffer. Keep a local index to where the 'end' of the buffer is; then every update, replace the value under 'end' with the new value and increment the pointer. This way you only have to update a single float each cycle.
Having done that, you can draw this buffer using 2x translates and 2x glDrawArrays/Elements:
Imagine that you've got an array of 10 elements, and the buffer end pointer is at element 4. Your array will contain the following 10 values, where x is a constant value, and f(n-d) is the random sample from d cycles ago:
0: (0, f(n-4) )
1: (1, f(n-3) )
2: (2, f(n-2) )
3: (3, f(n-1) )
4: (4, f(n) ) <-- end of buffer
5: (5, f(n-9) ) <-- start of buffer
6: (6, f(n-8) )
7: (7, f(n-7) )
8: (8, f(n-6) )
9: (9, f(n-5) )
To draw this (pseudo-guess code, might not be exactly correct):
glPushMatrix();
glTranslatef( -(end+1), 0, 0);
glDrawArrays( GL_LINE_STRIP, end+1, 10-(end+1)); // draw elems 5-9 shifted left by 5
glPopMatrix();
glPushMatrix();
glTranslatef( end+1, 0, 0);
glDrawArrays( GL_LINE_STRIP, 0, end+1);          // draw elems 0-4 shifted right by 5
glPopMatrix();
Then in the next cycle, replace the oldest value with the new random value, and shift the circular buffer pointer forward.
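A sketch of that single-sample update (the names vboId and N, and the two-float (x, y) vertex layout, are assumptions matching the table above):

// Replace only the oldest vertex each cycle with one tiny glBufferSubData.
void pushSample(GLuint vboId, int& end, int N, float newY)
{
    end = (end + 1) % N;                    // advance the circular end pointer
    float vertex[2] = { (float)end, newY }; // x stays the slot index, y is new
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    glBufferSubData(GL_ARRAY_BUFFER,
                    end * 2 * sizeof(float), // byte offset of that vertex
                    sizeof(vertex), vertex);
}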
2) Should I use a shader for moving objects (here the line strip) instead of calling glTranslatef? If so, what would such a shader look like? (I suspect that a shader is the wrong way to go, since my line strip is NOT a periodic function but rather contains random data.)
Probably optional, if you use the method that I've described in #1. There's no particular advantage to using one here.
3) What happens if the window gets resized? How do I keep the aspect ratio and scale the vertices accordingly? glViewport() only helps scaling in the y direction, not the x direction. If the window is rescaled in the x direction, then in my current implementation I would have to recalculate the position of the entire line strip (calling my_func to get the new x coordinates) and upload it to the GPU. I guess this could be done more elegantly? How would I do that?
You shouldn't have to recalculate any data. Just define all your data in some fixed coordinate system that makes sense to you, and then use the projection matrix to map this range to the window. Without more specifics it's hard to answer.
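For example, with the fixed-function pipeline a resize handler could look like this sketch (xMin/xMax/yMin/yMax stand for whatever fixed data range you choose; they are placeholders):

// Remap a fixed data-space rectangle to the window on every resize, so the
// vertex data itself never needs recalculating.
void onResize(int width, int height,
              double xMin, double xMax, double yMin, double yMax)
{
    glViewport(0, 0, width, height);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(xMin, xMax, yMin, yMax, -1.0, 1.0); // data range -> whole viewport
    glMatrixMode(GL_MODELVIEW);
}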
4) I noticed that when I use glTranslatef with a non-integral value, the screen starts to flicker if the line strip consists of thousands of points. This is most probably because the fine resolution that I use to calculate the line strip does not match the pixel resolution of the screen, and therefore sometimes some points appear in front of and sometimes behind other points (this is particularly annoying when you don't render a sine wave but some 'random' data). How can I prevent this from happening (besides the obvious solution of translating by an integer multiple of 1 pixel)? If a window gets resized from, say, originally 800x800 pixels to 100x100 pixels and I still want to visualize a line strip of 20 seconds, then shifting in the x direction must work flicker-free somehow with sub-pixel precision, right?
Your assumption seems correct. I think the thing to do here would be either to enable some kind of antialiasing (you can read other posts for how to do that) or to make the lines wider.
There are a number of things that could be at work here.
glBindBuffer is one of the slowest OpenGL operations (along with similar call for shaders, textures, etc.)
glTranslate adjusts the modelview matrix, which the vertex unit multiplies all points by. So, it simply changes what matrix you multiply by. If you were to instead use a vertex shader, then you'd have to translate it for each vertex individually. In short: glTranslate is faster. In practice, this shouldn't matter too much, though.
If you're recalculating the sine function on a lot of points every time you draw, you're going to have performance issues (especially since, by looking at your source, it looks like you might be using Python).
You're updating your VBO every time you draw it, so it's not any faster than a vertex array. Vertex arrays are faster than intermediate mode (glVertex, etc.) but nowhere near as fast as display lists or static VBOs.
There could be coding errors or redundant calls somewhere.
My verdict:
You're calculating a sine wave and an offset on the CPU. I strongly suspect that most of your overhead comes from calculating and uploading different data every time you draw it. This is coupled with unnecessary OpenGL calls and possibly unnecessary local calls.
My recommendation:
This is an opportunity for the GPU to shine. Calculating function values on parallel data is (literally) what the GPU does best.
I suggest you make a display list representing your function, but set all the y coordinates to 0 (so it's a series of points all along the line y = 0). Then draw this exact same display list once for every sine wave you want to draw. Ordinarily, this would just produce a flat graph, but you write a vertex shader that transforms the points vertically into your sine wave. The shader takes a uniform for the sine wave's offset ("sin(x - offset)") and just changes each vertex's y.
I estimate this will make your code at least ten times faster. Furthermore, because the vertices' x coordinates are all at integral points (the shader does the "translation" in the function's space by computing sin(x - offset)), you won't experience jittering when offsetting by floating-point values.
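A sketch of what that shader could look like, in old-style GLSL 1.20 so it works alongside display lists and the fixed-function matrix stack ('offset' being the uniform described above):

const char* vsSource = R"(
#version 120
uniform float offset;
void main()
{
    vec4 p = gl_Vertex;
    p.y = sin(p.x - offset);  // lift the flat y = 0 points into the wave
    gl_Position = gl_ModelViewProjectionMatrix * p;
}
)";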
You've got a lot here, so I'll cover what I can. Hopefully this will give you some areas to research.
1) Should I use glMapBuffer to bind the buffer on the GPU and fill the data directly (instead of using glBufferSubData)? Or will this make no difference performance-wise?
I would expect glBufferSubData to have the better performance. If the data is stored on the GPU, then mapping it will either:
copy the data back into host memory so you can modify it, and then copy it back when you unmap it;
or give you a pointer to the GPU's memory directly, which the CPU will access over PCI-Express. This isn't anywhere near as slow as accessing GPU memory used to be when we were on AGP or PCI, but it's still slower and not as well cached, etc., as host memory.
glBufferSubData will send the update of the buffer to the GPU, and the GPU will modify the buffer. No copying back and forth. All data is transferred in one burst, and it should be able to do it as an asynchronous update of the buffer as well.
Once you get into "is this faster than that?" type comparisons you need to start measuring how long things take. A simple frame timer is normally sufficient (but report time per frame, not frames per second - it makes numbers easier to compare). If you go finer-grained than that, just be aware that because of the asynchronous nature of OpenGL, you often see time being consumed away from the call that caused the work. This is because after you give the GPU a load of work, it's only when you have to wait for it to finish something that you notice how long it's taking. That normally only happens when you're waiting for front/back buffers to swap.
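A minimal version of such a frame timer might look like this sketch (reporting milliseconds per frame, as suggested):

#include <chrono>
#include <cstdio>

// Call once at the end of every frame.
void endOfFrame()
{
    static auto last = std::chrono::steady_clock::now();
    auto now = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(now - last).count();
    last = now;
    std::printf("frame: %.2f ms\n", ms); // time per frame, not FPS
}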
2) Should I use a shader for moving objects (here the line strip) instead of calling glTranslatef? If so, what would such a shader look like?
No difference. glTranslate modifies a matrix (normally the Model-View) which is then applied to all vertices. If you have a shader you'd apply a translation matrix to all your vertices. In fact the driver is probably building a small shader for you already.
Be aware that the older APIs like glTranslate() are deprecated from OpenGL 3.0 onwards, and in modern OpenGL everything is done with shaders.
3) What happens if the window gets resized? How do I keep the aspect ratio and scale the vertices accordingly? glViewport() only helps scaling in the y direction, not the x direction.
glViewport() sets the size and shape of the screen area that is rendered to. Quite often it's called on window resizing to set the viewport to the size and shape of the window. Doing just this will cause any image rendered by OpenGL to change aspect ratio with the window. To keep things looking the same you also have to control the projection matrix to counteract the effect of changing the viewport.
Something along the lines of:
glViewport(0, 0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glScalef(1.0f, (float)width / (float)height, 1.0f); // keeps X scale the same, but scales Y to compensate for aspect ratio
That's written from memory, and I might not have the maths right, but hopefully you get the idea.
4) I noticed that when I use glTranslatef with a non-integral value, the screen starts to flicker if the line strip consists of thousands of points.
I think you're seeing a form of aliasing that is due to the lines moving under the sampling grid of the pixels. There are various anti-aliasing techniques you can use to reduce the problem. OpenGL has anti-aliased lines (glEnable(GL_LINE_SMOOTH)), but a lot of consumer cards didn't support it, or only did it in software. You can try it, but you may get no effect or run very slowly.
Alternatively you can look into Multi-sample anti-aliasing (MSAA), or other types that your card may support through extensions.
Another option is rendering to a high-resolution texture (via Frame Buffer Objects - FBOs) and then filtering it down when you render it to the screen as a textured quad. This would also allow you to do a trick where you move the rendered texture slightly to the left each time and render the new strip on the right each frame.
1 1
1 1 1 Frame 1
11
1
1 1 1 Frame 1 is copied left, and a new line segment is added to make frame 2
11 2
1
1 1 3 Frame 2 is copied left, and a new line segment is added to make frame 3
11 2
It's not a simple change, but it might help you out with your problem (5).