OpenGL Precompute Vertices/Matrices for Particle System / Optimization - c++

I have a particle system which I want to make as fast as possible without affecting the main display function. I basically placed all particle calculations on a separate, infinitely looping thread which I keep synchronized with WaitForEvent() (Windows), DataLock flags, etc.
I use glColorPointer, glNormalPointer, glVertexPointer, etc. to point to the buffered data on the GPU (glGenBuffers, glBufferData) and then glDrawElements to render the particles.
At the moment I don't have the code with me, so I hope that won't be a problem, but I'll try my best to describe the infrastructure (a rough sketch of the queue handoff follows the outline):
Main [Init]
1) Create a pre-calc queue 30% the size of N particles and fill it with sequential calculations (Thread 1, step 2)
Thread 1
1) Wait for the Calculate event signal, or continue immediately if the pre-calc queue is not full
2) Loop through the N particles, update position/velocity, and store the result in pUpdate
3) If the pre-calc queue is not full, add pUpdate to it
Main [Render]
1) glActiveTexture(GL_TEXTURE0)
2) glColorPointer / glNormalPointer / glTexCoordPointer / glVertexPointer
3) If the pre-calc queue is empty, use the most recent pUpdate
4) Otherwise, take one entry from the pre-calc queue and delete it
5) Store the item in the buffer using glBufferSubData()
6) glDrawElements() to draw the particles
7) SwapBuffers
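Very roughly, and purely as an illustration of the handoff described above (not the actual code), the queue could look like the sketch below. It uses std::mutex / std::condition_variable instead of the Windows event and DataLock flags mentioned earlier, and ParticleUpdate / computeNextUpdate() are placeholders for whatever pUpdate really contains and however step 2 is computed:

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct ParticleUpdate { std::vector<float> positions, velocities; };   // placeholder for pUpdate

ParticleUpdate computeNextUpdate();   // placeholder for Thread 1, step 2

std::mutex                 queueMutex;
std::condition_variable    queueHasSpace;
std::deque<ParticleUpdate> preCalcQueue;
const std::size_t kQueueCapacity = 32;   // whatever ~30% of N works out to

// Thread 1: keep the pre-calc queue topped up.
void calcThread() {
    for (;;) {
        ParticleUpdate update = computeNextUpdate();
        std::unique_lock<std::mutex> lock(queueMutex);
        queueHasSpace.wait(lock, [] { return preCalcQueue.size() < kQueueCapacity; });
        preCalcQueue.push_back(std::move(update));        // Thread 1, step 3
    }
}

// Main [Render], steps 3-4: take a pre-calculated update if one is available.
bool popUpdate(ParticleUpdate& out) {
    std::lock_guard<std::mutex> lock(queueMutex);
    if (preCalcQueue.empty()) return false;               // caller reuses the most recent pUpdate
    out = std::move(preCalcQueue.front());
    preCalcQueue.pop_front();
    queueHasSpace.notify_one();                           // wake the calc thread
    return true;
}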
The problem is that the Render function consumes about 50 pre-calculated updates per second (which speeds up rendering while there are enough of them left) before even one new update can be added. In short order the pre-calc queue is empty, so everything slows down and the program falls back to Main [Render] step 3.
Any ideas?

Draw multiple meshes to different locations (DirectX 12)

I have a problem with DirectX 12. I have made a small 3D renderer. Models are transformed into 3D space in the vertex shader with basic World/View/Projection matrices that live in a constant buffer.
To change the constant buffer's data I'm currently using memcpy(pMappedConstantBuffer + alignedSize * frame, newConstantBufferData, alignedSize); this call replaces the constant buffer's data immediately.
So here is the problem: drawing is recorded into a command list that will later be sent to the GPU for execution.
Example:
/* Now I want to change the constant buffer to change the next draw call's position to (0, 1, 0) */
memcpy(/*Parameters*/);
/* Now I want to record a draw call to the command list */
DrawInstanced(/*Parameters*/);
/* But now I want to draw another mesh at another position, so I have to change the constant buffer. After this memcpy() the draw position will be (0, -1, 0) */
memcpy(/*Parameters*/);
/* Now I want to record a new draw call to the list */
DrawInstanced(/*Parameters*/);
After this I send the command list to the GPU for execution, but guess what: all the meshes end up in the same position, because all the memcpy calls are executed before the command list is even sent to the GPU. So basically the last memcpy overwrites the previous ones.
So the question is: how do I draw meshes at different positions, or how do I replace the constant buffer's data from the command list so that the constant buffer changes between each draw call on the GPU?
Thanks
No need for help anymore, I solved it by myself. I created a constant buffer for each mesh.
About execution order, you are totally right: your memcpy calls will update the buffers immediately, but the commands will not be processed until you push your command list into the queue (and you will not know exactly when that will happen).
In Direct3D 11, when you use Map on a buffer, this is handled for you (some extra space is allocated behind the scenes to avoid the problem, if required).
So in Direct3D 12 you have several choices. I'll assume that you want to draw N objects and store one matrix per object in your cbuffer.
The first is to create one buffer per object and set its data independently. If you have only a few objects, this is easy to maintain (and the extra memory footprint due to the resource allocations will be OK).
Another option is to create one large buffer (which can contain N matrices) and create N constant buffer views that point to the memory location of each object. (Note that you also have to respect the 256-byte alignment in that case; see CreateConstantBufferView.)
You can also use a StructuredBuffer and copy all the data into it (in that case you do not need the alignment), then use an index in the vertex shader to look up the correct matrix. (It is possible to declare a uint value in your shader and use SetGraphicsRoot32BitConstant to set it directly.)
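For concreteness, here is a rough sketch of the "one large buffer" idea, in a variant that binds a root-level CBV per draw with SetGraphicsRootConstantBufferView instead of creating N descriptor-heap views, purely to keep the example short. It assumes that root parameter 0 of your root signature is a CBV, that device, cmdList, worldViewProj[] and vertexCounts[] already exist, and that the d3dx12.h helper header is available:

#include <d3d12.h>
#include <wrl/client.h>
#include "d3dx12.h"

const UINT kSlotSize   = 256;   // required constant-buffer alignment
const UINT kNumObjects = 16;    // however many meshes you draw

// One upload-heap buffer large enough for one 256-byte slot per object.
Microsoft::WRL::ComPtr<ID3D12Resource> cbuffer;
CD3DX12_HEAP_PROPERTIES heapProps(D3D12_HEAP_TYPE_UPLOAD);
CD3DX12_RESOURCE_DESC bufferDesc = CD3DX12_RESOURCE_DESC::Buffer(kSlotSize * kNumObjects);
device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &bufferDesc,
                                D3D12_RESOURCE_STATE_GENERIC_READ, nullptr,
                                IID_PPV_ARGS(&cbuffer));

// Upload-heap buffers can stay persistently mapped.
UINT8* mapped = nullptr;
CD3DX12_RANGE readRange(0, 0);   // we never read this buffer back on the CPU
cbuffer->Map(0, &readRange, reinterpret_cast<void**>(&mapped));

// Each draw reads from its own slot, so a later memcpy no longer overwrites
// data that an earlier recorded draw still needs.
D3D12_GPU_VIRTUAL_ADDRESS baseAddress = cbuffer->GetGPUVirtualAddress();
for (UINT i = 0; i < kNumObjects; ++i) {
    memcpy(mapped + i * kSlotSize, &worldViewProj[i], sizeof(worldViewProj[i]));
    cmdList->SetGraphicsRootConstantBufferView(0, baseAddress + i * kSlotSize);
    cmdList->DrawInstanced(vertexCounts[i], 1, 0, 0);
}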

C++ Maya - Getting mesh vertices from frame and subframe

I'm writing a mesh deformer plugin that gets info about the mesh from past frames to perform some calculations. In the past, to get past-frame mesh info, I did the following:
MStatus MyClass::deform(MDataBlock& dataBlock, MItGeometry& itGeo,
                        const MMatrix& localToWorldMatrix, unsigned int index)
{
    MFnPointArrayData fnPoints;
    // ... other init code
    MPlug meshPlug = nodeFn.findPlug(MString("inputMesh"));
    // gets the mesh connection from the previous frame
    MPlug meshPositionPlug = meshPlug.elementByLogicalIndex(0);
    MObject objOldMesh;
    meshPositionPlug.getValue(objOldMesh);
    fnPoints.setObject(objOldMesh);
    // previous frame's vertices
    MPointArray oldMeshPositionVertices = fnPoints.array();
    // ... calculations
    return MS::kSuccess;
}
If I needed more than one frame, I'd run for-loops over the logical indices and repeat the process. Since creating this, however, I've found that my plugin needs not just past frames but also future frames, as well as subframes (between integer frames). Because my current code relies on elementByLogicalIndex() to get past-frame info, and that only takes unsigned integers with the 0th index referring to the previous frame, I can't get subframe information. I haven't tried getting future-frame info yet, but I don't think that's possible either.
How do I query mesh vertex positions in an array for past/future/sub-frames? Is my current method inflexible and, if so, how else could I do this?
So, the "intended" way to accomplish this is with an MDGContext, either with an MDGContextGuard, or with the versions of MPlug.asMObject that explicitly take a context (though these are deprecated).
Having said that - in the past when I've tried to use MDGContexts to query values at other times, I've found them either VERY slow, unstable, or both. So use with caution. It's possible that things will work better if, as you say, you're dealing purely with objects coming straight from an alembic mesh. However, if that's the case, you may have better luck reading the cache path from the node, and querying through the alembic API directly yourself.
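As a rough illustration only (untested, and assuming a recent Maya version that ships MDGContextGuard, plus a plug that actually carries point-array data as in the question), evaluating the plug at an arbitrary time could look like this; a fractional frame value gives you a subframe, and a time past the current frame gives you a future frame:

#include <maya/MDGContext.h>
#include <maya/MDGContextGuard.h>
#include <maya/MFnDependencyNode.h>
#include <maya/MFnPointArrayData.h>
#include <maya/MPlug.h>
#include <maya/MPointArray.h>
#include <maya/MString.h>
#include <maya/MTime.h>

MPointArray meshVerticesAtTime(MFnDependencyNode& nodeFn, double frame)
{
    MTime evalTime(frame, MTime::uiUnit());   // e.g. 12.5 for a subframe
    MDGContext ctx(evalTime);
    MDGContextGuard guard(ctx);               // evaluations below use this time

    MPlug meshPlug = nodeFn.findPlug(MString("inputMesh"));
    MObject meshData = meshPlug.elementByLogicalIndex(0).asMObject();

    MFnPointArrayData fnPoints(meshData);
    return fnPoints.array();                  // vertices at the requested time
}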

cocos2d/cocos2d-x stopping particle system gracefully

Is there a way to stop a particle system gracefully, i.e. I call stop, no new particles are generated, and the existing particles dissipate naturally?
I use ParticleSystemQuad. So, to stop particles emitting, I set
particle->stopSystem();
particle->setAutoRemoveOnFinish(true);
It stops the particle emission and then, after the last particle disappears, automatically removes the particle system.
You can also set it invisible, or remove it from its parent:
ParticleSystemQuad *m_emitter = ParticleSystemQuad::create(ch);
m_emitter->setVisible(true);
this->addChild(m_emitter, 50);
m_emitter->setPosition(100, 100);
m_emitter->setVisible(false);
Or
m_emitter->runAction(Sequence::create(DelayTime::create(3.0), RemoveSelf::create(), NULL));

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost: the biggest CPU hog with OpenGL is immediate mode… and you're using it (glBegin, glEnd). The problem with immediate mode is that every single vertex requires a couple of OpenGL calls to be made; and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
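For example (a minimal sketch that stays within the legacy API used in the question, reusing polygon and getOpenGlCoordinates from the code above), the glBegin/glEnd block can be replaced with a vertex array so the whole polygon is submitted with a single draw call:

std::vector<GLfloat> vertices;
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    vertices.push_back(coordinates.first);
    vertices.push_back(coordinates.second);
}

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, vertices.data());                      // 2 floats per vertex
glDrawArrays(GL_POLYGON, 0, static_cast<GLsizei>(vertices.size() / 2)); // one call for the whole polygon
glDisableClientState(GL_VERTEX_ARRAY);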
The next issue is with how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync with a swap interval of 1, and doing a buffer swap (glutSwapBuffers) once the frame is rendered. The exact timing of what blocks where is implementation dependent (unfortunately), but you're more or less guaranteed to hit your screen refresh frequency exactly, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be misled by the way Windows accounts CPU time: time spent in driver context (which includes blocking while waiting for V-Sync) is accounted as consumed CPU time, while it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach for more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
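Putting the double-buffering suggestion together, a sketch of the GLUT setup might look like the following. drawShapesAndText() stands in for the existing drawing code, and the commented-out wglSwapIntervalEXT(1) line is the Windows/WGL way to request a swap interval of 1 (other platforms use glXSwapIntervalEXT or similar), so treat the whole thing as an outline rather than drop-in code:

#include <GL/glut.h>

void drawShapesAndText();             // placeholder for the drawing code from the question

void display() {
    glClear(GL_COLOR_BUFFER_BIT);
    drawShapesAndText();
    glutSwapBuffers();                // with V-Sync on, this paces the loop to the refresh rate
    glutPostRedisplay();              // schedule the next frame
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);   // double-buffered window
    glutCreateWindow("micromouse");
    // wglSwapIntervalEXT(1);         // V-Sync: one buffer swap per screen refresh (WGL extension)
    glutDisplayFunc(display);
    glutMainLoop();
}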
I found that putting the render thread to sleep helps reduce CPU usage, in my case from 26% to around 8%:
#include <chrono>
#include <iostream>
#include <thread>

void render_loop(){
    // ...
    auto const start_time = std::chrono::steady_clock::now();
    auto const wait_time = std::chrono::milliseconds{ 17 };
    auto next_time = start_time + wait_time;
    while (true) {
        // ...
        // execute once after the thread wakes up, every 17 ms, which is
        // theoretically 60 frames per second
        auto then = std::chrono::high_resolution_clock::now();
        std::this_thread::sleep_until(next_time);
        // ... rendering jobs
        auto elapsed_time =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - then);
        std::cout << "ms: " << elapsed_time.count() << '\n';
        next_time += wait_time;
    }
}
I thought about attempting to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.
Inspired by this post

Is this possible to wait for glRender() and glSwapBuffers() finished?

I noticed that when a key is pressed, and the redraw is initiated from the keyboard event function, the previous draw sometimes is not completely finished. The result is a sloppy "animation". I am basically scrolling the contents of the window. When I measure my draw() function I can see it takes 5 ms, which is more than fast enough for smooth scrolling. But my guess is that the actual drawing is done asynchronously by the OpenGL driver somewhere under the hood. So the question:
Can I get notified when the actual rendering and screen update is finished?
function draw(ev) {
    var gl = GLX.renderPipeline();
    gl.Viewport(0, 0, width, height);
    gl.MatrixMode(GL_PROJECTION);
    gl.LoadIdentity();
    gl.Ortho(0, width, height, 0, -1, 1);
    gl.MatrixMode(GL_MODELVIEW);
    gl.LoadIdentity();
    gl.ClearColor(0.3, 0.3, 0.8, 0.0);
    gl.Clear(0x00004000 | 0x00000100);   // GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT
    gl.Color4f(0, 0, 0.9, 0.5);
    rect(gl, 100, 100, 400, 300);
    var s = 'ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER!';
    s = s + s;
    s = s.split('');
    for (var i = 0; i < s.length; i++) s[i] = s[i].charCodeAt(0) + charListBegin;
    for (var y = 0; y < 60; y++) {
        gl.LoadIdentity();
        gl.Translatef(charsX, charsY + y * fontSize * 8, 0);
        gl.Color4f(colors[y][0], colors[y][1], colors[y][2], 0.5);
        gl.CallLists(s);
    }
    gl.Render(ctx);
    GLX.SwapBuffers(ctx, win);
}
In general, your commands are placed into a queue. The command queue is flushed at distinct points, for example when you call SwapBuffers or glFlush (but also on some other occasions, e.g. when the queue is full) and the commands are worked off asynchronously. Most commands simply post a command and return immediately, unless they have some lengthy work to do that cannot be postponed, like glBufferData performing a copy of a few hundred kilobytes into a buffer object (this is something that has to happen immediately too, because OpenGL cannot know if the data is still valid at a later time). The time it takes to post commands is what you measure, but it's not what you are interested in.
If your GL version is at least 3.2, you can be "kind of notified" by calling glFenceSync, which inserts a fence object, and then blocking until the fence has been signaled using glClientWaitSync. When glClientWaitSync returns, all commands up to the fence have completed.
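A minimal sketch of that fence approach (assuming a 3.2+ context, with the fence placed right after the draw commands you care about):

GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Flush pending commands and wait up to one second for the GPU to pass the fence.
GLenum result = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 1000000000ull /* timeout in nanoseconds */);
// result is GL_ALREADY_SIGNALED or GL_CONDITION_SATISFIED on success,
// GL_TIMEOUT_EXPIRED if the timeout elapsed first.
glDeleteSync(fence);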
If you have at least version 3.3, you can measure the time your OpenGL commands take to render by inserting a query of type GL_TIME_ELAPSED. This works without blocking and is therefore by far the preferable approach. This is the actual time it takes to draw your stuff.
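A sketch of such a timer query might look like this (to keep it non-blocking, read the result back only once GL_QUERY_RESULT_AVAILABLE says it is ready, ideally a frame later):

GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... issue the draw calls to be measured ...
glEndQuery(GL_TIME_ELAPSED);

// Later (ideally the next frame), check availability before reading so the
// read does not stall waiting on the GPU.
GLint available = 0;
glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
if (available) {
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
    // elapsedNs is the GPU time spent on the measured commands, in nanoseconds
}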
SwapBuffers, like most commands, does mostly nothing. It will call glFlush, insert the equivalent of a fence, and mark the framebuffer as "locked, and ready to be swapped".
Eventually, when all draw commands have finished and when the driver or window manager can be bothered (think of vertical sync and compositing window managers!), the driver will unlock and swap buffers. This is when your stuff actually gets visible.
If you perform any other command in the meantime that would alter the locked framebuffer, that command blocks. This is what gives SwapBuffers the illusion of blocking.
You don't have much control over that (other than modifying the swap interval, if the implementation lets you) and you can't make it any faster -- but by playing with things like glFlush or glFinish you can make it slower.
Usually you queue a redraw, instead of calling draw directly. I.e. when using Qt, you'd call QWidget::update().
As far as waiting until all commands in the pipeline have been processed, you can call glFinish(), see https://www.opengl.org/sdk/docs/man4/xhtml/glFinish.xml .