Ok I am trying to switch my Game Engine to multithreading. I have done the research on how to make it work to use OpenGL in multithreaded application. I have no problem with rendering or switching contexts. Let my piece of code explain the problem :) :
for (it = (*mIt).Renderables.begin(); it != (*mIt).Renderables.end(); it++)
{
//Set State Modeling matrix
CORE_RENDERER->state.ModelMatrix = (*it).matrix;
CORE_RENDERER->state.ActiveSubmesh = (*it).submesh;
//Internal Model Uniforms
THREAD_POOL->service.post([&]
{
for (unsigned int i = 0; i < CORE_RENDERER->state.ActiveShaderProgram->InternalModelUniforms.size(); i++)
{
CORE_RENDERER->state.ActiveShaderProgram->InternalModelUniforms[i]->Set( CORE_RENDERER->state.ModelMatrix);
}
CORE_RENDERER->state.ActiveSubmesh->_Render();
});
//Sleep(10);
}
I'll quickly explain what are the elements in the code to make my problem more clear. Renderables is a simple std::vector of elements with _Render() function which works perfectly. CORE_RENDERER->state is a struct holding information about current render state such as current material properties as well as current submesh ModelMatrix. So Matrix and Submesh are stored to state struct (I KNOW THIS IS SLOW, I'll probably change that in time :) ) The next piece of code is sent to THREAD_POOL->service which is actually boost::asio::io_service and has only one thread so it acts like a queue of rendering commands. The idea is that the main thread provides information about what to render and do frustum culling and other tests while an auxilary thread does actual rendering. This works fine, except there is a slight problem:
The code that is sent to thread pool starts to execute, but before all InternalModelUniforms are set and submesh is rendered the next iteration of Renderables is executed and both ModelMatrix and ActiveSubmesh are changed. The program doesn't crash but both informations change and some meshes are rendered some matrices are right others not which results in flickering image. Objects apear on frame and the next frame they are gone. The problem is only fixed if I enable that Sleep(10) function which makes sure that the code is executed before next iteration which obviously kills the idea of gaining preformance. What is the best possible solution for this? How can I send commands to the queue each with unique built in data? Maybe I need to implement my own queue for commands and a single thread without io_service?
I will continue my research as I know there is a way. The idea is right cause I get preformance boost as not a single if/else statement is processed by the rendering thread :) Any help or tips will really help!
Thanks!
Update:
After struggling for few nights I have created a very primitive model of communication between main thread and an Aux Thread. I created a class that represents a base command to be executed by aux thread:
class _ThreadCommand
{
public:
_ThreadCommand() {}
~_ThreadCommand() {}
virtual void _Execute() = 0;
virtual _ThreadCommand* Clone() = 0;
};
These commands that are childs of this class have _Execute() function to do whatever operation needs to be done. The main thread upon rendering fills a boost::ptr_vector of these commands While aux thread keeps on checking if there are any commands to process. When commands are found it copies entire vector to it's own vector inside _AuxThread and clears the original one. Commands are then executed by calling _Execute functions on each:
void _AuxThread()
{
//List of Thread commands
boost::ptr_vector<_ThreadCommand> _cmd;
//Infinitive loop
while(CORE_ENGINE->isRunning())
{
boost::lock_guard<boost::mutex> _lock(_auxMutex);
if (CORE_ENGINE->_ThreadCommands.size() > 0)
{
boost::lock_guard<boost::mutex> _auxLock(_cmdMutex);
for (unsigned int i = 0; i < CORE_ENGINE->_ThreadCommands.size(); i++)
{
_cmd.push_back(CORE_ENGINE->_ThreadCommands[i].Clone());
}
//Clear commands
CORE_ENGINE->_ThreadCommands.clear();
//Execute Commands
for (unsigned int i = 0; i < _cmd.size(); i++)
{
//Execute
_cmd[i]._Execute();
}
//Empty _cmd
_cmd.clear();
}
}
//Notify main thread that we have finished
CORE_ENGINE->_ShutdownCondition->notify_one();
}
I know that this is a really bad way to do it. Preformance is quite slower which I'm quite sure is because of all the copying and mutex locks. But at least the renderer works. You can get the idea of what I want to achieve but as I said I am very new to multithreading. What is the best solution for this scenario? Should I return back to ThreadPool system with asio::io_service? How can I feed commands to AuxThread with all values that must be sent to renderer to preform tasks in correct way?
First, a warning. Your "slight problem" is not slight at all. It is race condition, which is undefined behavior in C++, which, in turn, implies that anything could happen, including:
Everything renders fine
Image flickers
Nothing renders at all
It crashes on the last Saturday of every month. Or working fine on your computer and crashing on everyone's else.
Seriously, do not ever rely on UB, especially when writing library/framework/game engine.
Now about your question.
Lets leave aside any practical benefits of your approach and fix it first.
Actually, OpenGL implementation uses something very similar under the hood. Commands are executed asynchronously by the driver thread. I recommend you to read about their implementation to get some ideas on how to improve your design.
What you need to do, is to somehow "capture" the state at the time you post a rendering command. Simplest possible thing - copy the CORE_RENDERER->state into closure and use this copy to do the rendering. If state is large enough, it can be costly, though.
Alternative solution (and OpenGL goes that way) is to make every change in the state a command also, so
CORE_RENDERER->state.ModelMatrix = (*it).matrix;
CORE_RENDERER->state.ActiveSubmesh = (*it).submesh;
translates into
Matrix matrix = (*it).matrix;
Submesh submesh = (*it).submesh;
THREAD_POOL->service.post([&,matrix,submesh]{
CORE_RENDERER->state.ModelMatrix = matrix;
CORE_RENDERER->state.ActiveSubmesh = submesh;
});
Notice, however, that now you can't simply read CORE_RENDERER->state.ModelMatrix from your main thread, as it is changing in a different thread. You must first ensure that command queue is empty.
Related
I'm trying to write a multithreaded graphics manipulation program using Borland's C++ Builder 6 on WinXP SP3, but have run into (I think) a synchronisation issue, and can't figure out why.
Main Form (Form1) has a TPicture loaded from file. A copy of this is acquired by the thread via a Synchronize() call, and works fine. The thread does some work on the image, and in theory, it periodically updates the main Form image. The main Form also controls a machine, and is a 'First Resort' emergency stop, so blocking isn't an option. Everything is fine until the main Form gets hold of the working copy, or a copy of the working copy (sorry, but it's got to that) at which point the program hangs, and is only responsive to a 'program reset' from the IDE. A poor solution is to copy the working image to the Clipboard, and then, from the main Form, copy from the Clipboard to the main Form's image.
//Synchronization routines:
//----------------------------------------------------------------
`void __fastcall ImageRout::update()
{
Form1->Image9->Picture->Bitmap->Assign(Imgcopy);
//never returns
}
//----------------------------------------------------------------
void __fastcall ImageRout::getimage()
{
Imgcopy->Assign(Form1->Image9->Picture);
}
//----------------------------------------------------------------
//do the initialisation things... Then,
//(data is a struct, loaded with image data via a Synchronize() call)
Imgcopy=new Graphics::TBitmap;
Imgcopy->Width=data.width;
Imgcopy->Height=data.height; //size the bitmap
while(Imgcopy->Canvas->LockCount!=1)
{
Imgcopy->Canvas->TryLock();
} //have to Lock() the image or it gets lost... Somewhere
Synchronize(getimage); //works fine
//do some work on Imgcopy
//"By the book"- attempt 1
//(rate (=15) is a 'brake' to stop every alteration being displayed)
update_count++;
if(update_count>rate) //after a few iterations, update
{ //user interface
Synchronize(update); //fails: never returns from Synchronize call
update_count=0;
}
After a lot of failed attempts, I came up with this.
//in the thread...
update_count++;
if(update_count>rate)
{
EnterCriticalSection(&Form1->mylock1);
Form1->tempimage->Assign(Imgcopy); //tempimage is another bitmap,
InterlockedExchange(&Form1->imageready,1);//declared in the main Form
LeaveCriticalSection(&Form1->mylock1); //and is only ever accessed
update_count=0; //inside a critical section
}
//...and in the main Form....
if(imageready==1)
{
EnterCriticalSection(&mylock1);
Image9->Picture->Bitmap->Assign(tempimage); //Fails here
InterlockedExchange(&gotimage,1);
InterlockedExchange(&imageready,0);
LeaveCriticalSection(&mylock1);
}
So, in desperation.
//in the thread...
update_count++;
if(update_count>rate)
{
Synchronize(update);
EnterCriticalSection(&Form1->mylock1);
Form1->tempimage->Assign(Imgcopy);
Clipboard()->Assign(Imgcopy);
InterlockedExchange(&Form1->imageready,1);
LeaveCriticalSection(&Form1->mylock1); */
update_count=0;
}
//and in the main Form...
if(imageready==1)
{
EnterCriticalSection(&mylock1);
if (Clipboard()->HasFormat(CF_BITMAP))
{
Image9->Picture->Bitmap->Assign(Clipboard());
}
InterlockedExchange(&gotimage,1);
InterlockedExchange(&imageready,0);
LeaveCriticalSection(&mylock1);
}
This last attempt works, albeit relatively slowly, because of the Clipboard overhead, and it's a poor crutch, at best. I suspect the Clipboard is enforcing an otherwise failed synchronisation effort, but, as I said earlier, I can't fathom why. What can be the issue?
Thanks for your comments, Remy. They shook me out of a "tizzy" I'd got myself into whilst trying to solve the problem. I'd forgotten that Windows needs to move memory blocks around, and can't do this if locked them.
The initial problem of the Synchronize(update) call (code block 1 above) was caused by my still having the working copy (Imgcopy) locked (from inside the thread) during the call, preventing the main Form from subsequently accessing it. I suspect (but haven't investigated- that code has gone) the same root cause was at work in code block 2.
Locking every bitmap just prior to access, and unlocking immediately afterwards has solved this problem.
Peter O, thanks for your edit- I didn't realise there was so much overhead in my initial post.
I currently have a project using DirectX11, that generates random terrain based on the Hill algorithm. I have a set up where by you can change the inputs on the terrain (seed, number of hills etc) and then reinitialize and watch the terrain get generated hill by hill. The issue with this is that (as you would expect) there is a large FPS loss, but also things like the camera will stutter when attempting to move it. Essentially, what I want to do is create a thread for the terrain hill step, so that the generation doesn't interfere with the frame time and therefor the camera (so the camera can still move seamlessly). I've looked at a few resources but I'm still not understanding threads properly.
Checking when to reinitialize the terrain, during the update method:
void CThrive::Update(float frameTime)
{
CKeyboardState kb = CKeyboardState::GetKeyboardState(mWindow->GetHWND());
mCamera->Update(kb, frameTime);
if (mGui->Reint())
{
SafeDelete(mTerrain);
mTerrain = new CTerrain(mGui->GetTerrSize(), mGui->GetTerrMin(), mGui->GetTerrMax(), mGui->GetTerrNumHills(), mGui->GetTerrSeed());
mNewTerrain = true;
mGui->SetReint(false);
}
Calling method to generate new hills, during the render method:
void CThrive::Render()
{
if (mNewTerrain)
{
reintTerrain();
}
MainPass();
}
Method used to add to the terrain:
void CThrive::reintTerrain()
{
if (!mTerrain->GenerationComplete())
{
mTerrain->GenerateStep(mGraphicsDevice->GetDevice());
}
else
{
mNewTerrain = false;
}
}
I assume I'd create a thread for reintTerrain, but I'm not entirely sure how to properly make this work within the class, as I require it to stop adding hills when it's finished.
Thank you for your help
Use std::thread for thread creation. Pass pointer to thread's entry point to its constructor as a lambda of member function pointer. Instances of std::thread may reside in the private section of your class. Accesses to shared object's fields used by multiple threads should be protected by fences (std::atomic<>, std::atomic_thread_fence) in order to avoid cache coherency problems.
I wrote in C++ a solver for the 8-puzzle game, and now I'm trying to use Qt to give it a GUI.
Basically I have an underlying object of type "Board" which represents the board of the puzzle, and I have organized the GUI as a grid of QPushButton. Then I have a method updateUI which associates to every button the correct text, based on the Board. Something like
for(int i=0; i<Board::MATRIX_DIM * Board::MATRIX_DIM; i++)
{
m_buttons[i]->setText(m_values[i]);
}
In another method (solveGUI) I have
void MainWindow::solveGUI()
{
m_game->solve();
int solutionDepth = m_game->getSolutionDepth();
Move *solutionMoves = m_game->getSolutionMoves();
for(int i=0; i<solutionDepth; i++)
{
Move m = solutionMoves[i];
m_board.performMove(m); /* perform the move on the Board object */
updateUI(); /* should update the GUI so that it represents the Board */
Sleep(1000);
}
}
where the first line (m_game->solve) takes some time. Then I obtain a list of the moves performed, in solutionMoves, and what I would like to do is showing this moves on the board, with some delay between a move and the next one. This method is called by my main, which looks like this:
QApplication app(argc, argv);
MainWindow w;
w.show();
w.solveGUI();
return app.exec();
The result is that the GUI hangs and, after some time, it displays only the solution, completely skipping the moves.
What am I missing? Thank you!
P.S. I don't think I need a different Thread for the solver because I want the solver to run before the solution is displayed. Is it right?
It's app.exec() that actually runs the main loop which handles all events, including displaying GUI. If you want to call solve() before that, it's OK, but if you want to actually display and update GUI before exec(), it's wrong. I'm not sure if it's totally impossible, but it's definitely not the right way to do it.
There are two ways around it. The more canonical way is to redesign a program using a QTimer. Then everything will be smooth and responsive. But that can be tedious sometimes. In your case it should be quite easy, though. Just save the results somewhere, and call a slot using a QTimer object every 1000 seconds - it will have the same effect as your Sleep(), but will keep everything responsive.
The other solution is to call your solveGUI() method after exec() starts its job. It can be done, for example, using QTimer::singleShot():
QTimer::singleShot(0, &w, SLOT(showGUI()));
return app.exec();
Then, before each Sleep(), you should call QApplication::processEvents(), which basically allows you to temporary yield control, processing all pending events, including GUI updates. This approach is somewhat easier, but it's inferior since the GUI still freezes at each Sleep(). For example, if the user wants to exit the application, or if the window is needed to be repainted, it will cause uncomfortable GUI lags.
You're stalling the main thread (which also does the event processing) and rendering it uncapable of responding to keyboard/mouse/window messages.
You should use an asynchronous timer operation instead of the sleep function: use a QTimer to delay showing the next solution and avoid messages being left unanswered for too long.
There is a nice article of methods to keep the GUI responsive during processing loops. if it's not a complicated case I think, just insert QCoreApplication::processEvents(); inside the long processing loops.
try the following:
void MainWindow::Wait(int interval ) {
QTime timer = new QTime;
timer.restart();
while(timer.elapsed() < interval) {
QApplication::processEvents();
}
}
...
for(...) {
//wait 1 second (1000 milliseconds) between each loop run at first
Wait(1000);
...
}
...
not tested yet - but should work (maybe there is some cpu load)!
I have two threads (the applications main thread and another one). I am using OpenGL to draw some stuff and I am using the OpenGL keyboard and mouse callbacks. OpenGL blocks when I call glutMainLoop() and since I have to do some calculations in the background, I created another thread. Now, the OpenGL callbacks shall send some data (e.g. x, y position of the mouse/key which has been pressed) to the other thread which has a critical section. While the critical section is running no messages should be accepted, but rather than dropping these messages, I want to process them after the critical section. The class of the non-OpenGL looks something like this:
void run()
{
for (;;) {
int currentTime = now();
if (now() - previousTime > WAIT_INTERVAL) {
previousTime = currentTime;
tick();
}
}
}
void tick() {
// critical section begins
processor->step()
// critical section ends
}
void receiveMessage(void *data) {
processor->changeSomeData(data);
}
So, if receiveMessage() is called from the OpenGL thread and processor->step() is running, the call to changeSomeData() should be postponed because it would mess up the whole calculation.
I want to use the following classes to synchronize the threads:
Mutex.h:
#ifndef MUTEX_H
#define MUTEX_H
#include <Windows.h>
class Mutex;
#include "Lock.h"
class Mutex
{
public:
Mutex();
~Mutex();
private:
void acquire();
void release();
CRITICAL_SECTION criticalSection;
friend class Lock;
};
#endif
Mutex.cpp:
#include "Mutex.h"
Mutex::Mutex()
{
InitializeCriticalSection(&this->criticalSection);
}
Mutex::~Mutex()
{
DeleteCriticalSection(&this->criticalSection);
}
void Mutex::acquire()
{
EnterCriticalSection(&this->criticalSection);
}
void Mutex::release()
{
LeaveCriticalSection(&this->criticalSection);
}
Lock.h:
#ifndef LOCK_H
#define LOCK_H
class Lock;
#include "Mutex.h"
class Lock
{
public:
Lock(Mutex& mutex);
~Lock();
private:
Mutex &mutex;
};
#endif
Lock.cpp
#include "Lock.h"
Lock::Lock(Mutex& mutex) : mutex(mutex)
{
this->mutex.acquire();
}
Lock::~Lock ()
{
this->mutex.release();
}
EDIT:
Here is the whole project: http://upload.visusnet.de/uploads/BlobbyWarriors-rev30.zip (~180 MB)
EDIT 2:
And here is the SVN repo: https://projects.fse.uni-due.de/svn/alexander-mueller-bloby-warriors/trunk/
Oh... No, no, no. Threads are NOT what you should use here. Seriously. Thread are NOT your solution in this particular case. Let's roll back a bit...
You're using GLUT at the moment and you say you need threads to "avoid locking on glutMainLoop(). And you don't want locking because you want to do some calculations in the meantime.
Stop now and ask yourself - are you sure that those operations need to be done asynchronically (as a whole) from OpenGL rendering? If so, you may stop reading this post and look at the other ones, but I sincerely believe that it may not be the case for a +- typical real-time OpenGL application.
So... A typical OpenGL app looks like this:
handle events
tick calculations
redraw screen
Most GL window libraries let you implement that as your own main loop, GLUT kind of obfuscates that with its "callbacks", but the idea is the same.
You can still introduce parallelism in your application, but it should start and stop at step 2, so it's still sequential on main-loop level: "calculate a frame of calculations, THEN render this frame". This approach is likely to save you a lot of trouble.
Protip: Change your library. GLUT is outdated and not maintained anymore. Switching to GLFW (or SDL) for window creation wouldn't take much effort in terms of code and - contrary to GLUT - you define your main loop yourself, which seems to be what you want to achieve here. (Plus they tend to be more convenient for input & window event handling, etc.)
Some typical pseudocode with constant-timestep real-time physics without interfering with rendering (assuming that you want to run physics more often than rendering, in general):
var accum = 0
const PHYSICS_TIMESTEP = 20
while (runMainLoop) {
var dt = getTimeFromLastFrame
accum += dt
while (accum > PHYSICS_TIMESTEP) {
accum -= PHYSICS_TIMESTEP
tickPhysicsSimulation(PHYSICS_TIMESTEP)
}
tickAnyOtherLogic(dt)
render()
}
A possible extension from that is to use the value of accum as an additional "extrapolation" value only for rendering, which would allow for visually smooth the graphical representation while simulating physics more seldomly (with bigger DT), possibly more seldomly than once per rendering frame.
In the main thread: lock a mutex, add a struct/object containing the necessary info to a FIFO data structure of some sort, unlock the mutex, then (optionally) wake up the background thread (via a signal or a condition variable or writing a byte to a socket or however)
In the background thread: (optionally) block until awoken by the main thread, then lock the mutex, pop the first item from the head of the FIFO, unlock the mutex, process the item, repeat.
Critical sections and mutexes are bad. They should only be used by library designers, and usually not even then (because for reusable code, it's often worth the extra effort to gain the extra scalability of lock-free).
Instead, you should use a threadsafe queue. Windows offers lots:
thread message queue (PostMessage)
mailslots
message-mode pipes
datagram sockets
SList API
are just a few of your options.
All of these are highly optimized and much easier to use than designing your own queue.
I wouldn't recommend using GLUT anymore - it's terribly outdated and very restrictive. But if you're set on using it, you might want to look into glutIdleFunc. GLUT will continuously invoke this callback when it's idle - you can use this to perform background processing in the main thread.
I have the following action which is executed when a certain
button is pressed in a Qt application:
#include <shape.h>
void computeOperations()
{
polynomial_t p1("x^2-x*y+1"),p2("x^2+2*y-1");
BoundingBox bx(-4.01, 4.01,-6.01,6.01,-6.01,6.01);
Topology3d g(bx);
AlgebraicCurve* cv= new AlgebraicCurve(p1,p2);
g.push_back(cv);
g.run();
//Other operations on g.
}
Topology3d(...), AlgebraicCurve(..), BoundingBox(...),
polynomial_t(...) are user defined types defined in the
corresponding header file .
Now for some values of p1 and p2, the method g.run() works perfectly.
Thus for some other values of p1 and p2, g.run() it is not
working anymore as the method gets blocked somehow and the
message "Application Not Responding" appears and I have to
kill the Application.
I would want to have the following behavior: whenever
g.run() is taking too long, gets blocked for some particular
values of p1, p2, I would want to display an warning box
using QMessageBox::Warning.
I try to do this with try{...} and catch{...}:
#include <shape.h>
class topologyException : public std::runtime_error
{
public:
topologyException::topologyException(): std::runtime_error( "topology fails" ) {}
};
void computeOperations()
{
try
{
polynomial_t p1("x^2-x*y+1"),p2("x^2+2*y-1");
BoundingBox bx(-4.01, 4.01,-6.01,6.01,-6.01,6.01);
Topology3d g(bx);
AlgebraicCurve* cv= new AlgebraicCurve(p1,p2);
g.push_back(cv);
g.run();
//other operations on g
throw topologyException();
}
catch(topologyException& topException)
{
QMessageBox errorBox;
errorBox.setIcon(QMessageBox::Warning);
errorBox.setText("The parameters are incorrect.");
errorBox.setInformativeText("Please insert another polynomial.");
errorBox.exec();
}
}
This code compiles, but when it runs it does not really
implement the required behavior.
For the polynomials for which g.run() gets blocked the error
message box code is never reached, plus for the polynomials
for which g.run() is working well, the error message box
code still is reached somehow and the box appears in the
application.
I am new to handling exceptions, so any help is more than
welcomed.
I think the program gets blocked somewhere inside g.run() so
it does not reach the exception, still I do not understand
what really happens.
Still I would want to throw this exception without going
into the code of g.run(), this function is implemented as
part of a bigger library, which I just use in my code.
Can I have this behavior in my program without putting any
try{...} catch{...} block statement in the g.run() function?
You cannot achieve what you want with the use of try-catch. if g.run() takes too much time or goes into an infinite loop, that doesn't mean an exception will be thrown.
What you can do is, you can move the operations that take a lot of time into another thread. Start that thread in your event handler and wait for it to finish in your main thread for a fixed amount of time. If it does not finish, kill that thread & show your messagebox.
For further reference, read QThread, Qt Thread Support
Thanks for the suggestions.
So I see how I should create the thread, something like:
class myopThread : public QThread
{
public:
void run();
};
Then I am rewriting the run() function and put all the operations that take a lot of time in it:
void myopThread::run()
{
polynomial_t p1("x^2-x*y+1"),p2("x^2+2*y-1");
BoundingBox bx(-4.01, 4.01,-6.01,6.01,-6.01,6.01);
Topology3d g(bx);
AlgebraicCurve* cv= new AlgebraicCurve(p1,p2);
g.push_back(cv);
g.run();
//other operations on g
exec();
}
Okay everything is clear so far, still I do not see how to "Start that thread in your event handler and wait for it to finish in your main thread for a fixed amount of time. If it does not finish, kill that thread & show your messagebox."
I mean start the thread in the event handler refers somehow at using the connect (..Signal, Slot..) still I do not see how exactly this is done. I have never used QThread before so it is more then new.
Thank you very much for your help,
madalina
The most elegant way to solve this that I know of is with a future value. If you haven't run across these before they can be quite handy in situations like this. Say you have a value that you'll need later on, but you can begin calculating concurrently. The code might look something like this:
SomeValue getValue() {
... calculate the value ...
}
void foo() {
Future<SomeValue> future_value(getValue);
... other code that takes a long time ...
SomeValue v = future_value.get();
}
Upon calling the .get() method of course, the value computed is returned, either by calling the function then and there or by retrieving the cache value calculated in another thread started when the Future<T> was created. One nice thing is that, at least for a few libraries, you can pass in a timeout parameter into the .get() method. This way if your value is taking too long to compute you can always unblock. Such elegant isn't usually achieved.
For a real life library, you might try looking into the library documented here. As I recall it wasn't accepted as the official boost futures library, but it certainly had promise. Good luck!