I have a method called Run in which I update and render the game objects.
void Run(olc::PixelGameEngine* pge) noexcept
{
    Update(pge);
    Render(pge);
}
The frame rate was then fluctuating between 300~400 fps in release mode and 200~300 fps in debug mode. I have yet to add loads of game logic, so I thought I would do the rendering in a separate thread, and after a quick tutorial I changed it to this.
void Run(olc::PixelGameEngine* pge) noexcept
{
    Update(pge);
    std::thread renderer(&GameManager::Render, this, pge);
    renderer.join();
}
Now the frame rate is around 100~150 fps in release mode and 60~100 fps in debug mode.
std::thread creates a thread.
renderer.join() waits until the thread has finished.
Basically the same logic as your first example, but now you create and destroy a thread in your 'loop'. That is much more work than before, so it is not surprising that the framerate goes down.
What you can do:
define two functions, one for the update and one for the render thread
introduce a global atomic flag
set it in the update function (indicates the scene has been updated)
clear it in the render function (indicates the changes have been presented)
create a thread for each runnable (one for update, one for render)
if the scene gets updated, set the flag accordingly
the renderer can decide, based on the flag, to wait until the scene has been updated or to render anyway (clearing the flag either way)
the updater can decide, based on the flag, to wait until the scene has been rendered or to update it anyway
Example (C++11):
#include <atomic>
#include <thread>

std::atomic_bool active{true};
std::atomic_bool scene_flag{false};

void render(olc::PixelGameEngine* pge) noexcept
{
    while (active) {
        //while (!scene_flag) ; // optionally busy-wait for a scene update
        renderScene(pge);
        scene_flag = false; // clear flag: changes have been presented
    }
}

void update(olc::PixelGameEngine* pge) noexcept
{
    while (active) {
        //while (scene_flag) ; // optionally busy-wait until the scene has been rendered
        updateScene(pge);
        scene_flag = true; // set flag: scene has been updated
    }
}

int main()
{
    std::thread u(update, nullptr);
    std::thread r(render, nullptr);
    /*while (some_condition) ...*/
    active = false;
    u.join();
    r.join();
    return 0;
}
Note: The above code can still update the scene while it is being rendered, which can lead to several problems. Proper synchronization is recommended.
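One possible way to add that synchronization (not part of the original example, just a sketch using the same placeholder functions) is to replace the atomic flag with a std::mutex and a std::condition_variable, so the two loops strictly alternate and never touch the scene at the same time:

#include <condition_variable>
#include <mutex>

std::mutex scene_mutex;
std::condition_variable scene_cv;
bool scene_updated = false; // protected by scene_mutex
bool running = true;        // protected by scene_mutex

void render_loop(olc::PixelGameEngine* pge)
{
    std::unique_lock<std::mutex> lock(scene_mutex);
    while (running) {
        // sleep until the updater has produced a new scene (or we are shutting down)
        scene_cv.wait(lock, [] { return scene_updated || !running; });
        if (!running) break;
        renderScene(pge);        // the scene cannot change while we hold the lock
        scene_updated = false;   // changes have been presented
        scene_cv.notify_one();   // wake the updater
    }
}

void update_loop(olc::PixelGameEngine* pge)
{
    std::unique_lock<std::mutex> lock(scene_mutex);
    while (running) {
        // sleep until the previous frame has been presented (or we are shutting down)
        scene_cv.wait(lock, [] { return !scene_updated || !running; });
        if (!running) break;
        updateScene(pge);
        scene_updated = true;    // scene has been updated
        scene_cv.notify_one();   // wake the renderer
    }
}

Shutting down then means setting running to false while holding scene_mutex and calling scene_cv.notify_all(). Note that this fully serializes the two loops; to actually overlap update and render you would need something like double buffering of the scene.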
Related
I'll do my best to replicate this issue, since it's quite complicated. As an overview, I have a program that displays graphs using OpenGL. The OpenGL context is thread specific, so I know the rendering is only done on one thread. My other thread queries a database and stores the data in a copy vector. Once that thread is finished, the data is swapped with the data the OpenGL thread is using (after joining the thread with the main one). In theory there is nothing about this that should make the program run so slowly?
The extremely odd part is how it eventually "warms up" and runs much faster after a while (it varies quite a bit, sometimes almost instantaneously, sometimes after 30 s of runtime). To give some numbers for comparison: the program starts out running at about 30-60 fps while querying the data (that is, constantly loading it, swapping it and joining the threads), but once it has warmed up it runs at 1000 fps.
I have tested a few things, beginning with making the query take a LONG time to run. During this, the fps is at the maximum it would otherwise be (3000+). It is only when the data is constantly being changed (swapping vectors) that it starts to run very slowly. It doesn't make sense that this alone is causing the performance hit, since it runs very well after it has "warmed up".
Edit:
I've managed to make a reasonable minimal reproducible example, and I've found some interesting results.
Here is the code:
#include <iostream>
#include <string>
#include <thread>
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include "ImGui/imgui.h"
#include "ImGui/imgui_impl_glfw.h"
#include "ImGui/imgui_impl_opengl3.h"
bool querying = false;
std::thread thread;
int m_Window_Width = 1280;
int m_Window_Height = 720;
static void LoadData()
{
    querying = true;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    querying = false;
}

int main()
{
    glfwInit();
    const char* m_GLSL_Version = "#version 460";
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 6);
    GLFWwindow* m_Window = glfwCreateWindow(m_Window_Width, m_Window_Height, "Program", NULL, NULL);
    glfwMakeContextCurrent(m_Window);
    glfwSwapInterval(0); // vsync off
    glewInit();
    IMGUI_CHECKVERSION();
    ImGui::CreateContext();
    ImGui::StyleColorsClassic();
    // Setup Platform/Renderer backends
    ImGui_ImplGlfw_InitForOpenGL(m_Window, true);
    ImGui_ImplOpenGL3_Init(m_GLSL_Version);
    thread = std::thread(LoadData);
    while (!glfwWindowShouldClose(m_Window))
    {
        glfwPollEvents();
        ImGui_ImplOpenGL3_NewFrame();
        ImGui_ImplGlfw_NewFrame();
        ImGui::NewFrame();
        char fps[32];
        sprintf_s(fps, "%f", ImGui::GetIO().Framerate);
        glfwSetWindowTitle(m_Window, fps);
        //Load the data
        if (thread.joinable() == false && querying == false) {
            thread = std::thread(LoadData);
        }
        //Swap the data after thread is finished
        if (thread.joinable() == true && querying == false) {
            thread.join();
        }
        // Rendering
        ImGui::Render();
        glfwGetFramebufferSize(m_Window, &m_Window_Width, &m_Window_Height);
        glViewport(0, 0, m_Window_Width, m_Window_Height);
        glClearColor(0.45f, 0.55f, 0.60f, 1.00f);
        glClear(GL_COLOR_BUFFER_BIT);
        ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
        glfwSwapBuffers(m_Window);
    }
    ImGui_ImplOpenGL3_Shutdown();
    ImGui_ImplGlfw_Shutdown();
    ImGui::DestroyContext();
    glfwDestroyWindow(m_Window);
    glfwTerminate();
    return 0;
}
Now the interesting thing here is playing around with std::this_thread::sleep_for(). I implemented this so I can simulate the time the query actually takes against the main database. What is interesting is that it actually causes the main thread to stop running and freezes it. These threads should be separate and not impact one another, however that is not the case here. Is there any explanation for this? This seems to be the root issue of my main program, boiled down to a minimal case.
Edit 2
To use the libraries (in Visual Studio), download the GLFW binaries from https://www.glfw.org/download.html, GLEW from http://glew.sourceforge.net/, and lastly ImGui from https://github.com/ocornut/imgui
Preprocessor: GLEW_STATIC; WIN32;
Linker: glfw3.lib;glew32s.lib;opengl32.lib;Gdi32.lib;Shell32.lib;user32.lib;Gdi32.lib
This may or may not be your issue, but here:
//Load the data
if (thread.joinable() == false && querying == false) {
    thread = std::thread(LoadData);
}
//Swap the data after thread is finished
if (thread.joinable() == true && querying == false) {
    thread.join();
}
it is possible that you start the thread in the first if block, then reach the second one before LoadData has modified that bool, causing the main thread to wait for that thread to finish.
I would set querying = true; in the main thread, right after you create the LoadData thread. Also, I would use some kind of synchronization, for example declare querying as std::atomic<bool>.
EDIT:
It appears that you do not need to check joinable() - you know when the thread is joinable: when you enter the loop, and after you re-start that thread. This looks cleaner:
std::atomic<bool> querying{true};

void LoadData()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    querying = false;
}
and later in your loop:
//Swap the data after thread is finished
if (!querying) {
    thread.join();
    querying = true;
    thread = std::thread(LoadData);
}
I think the GTK specifics of this code are not needed to understand what's happening. The glDraw() function does OpenGL rendering from a Frame object, frame, which is retrieved from decodedFramesFifo, a thread-safe deque.
The .h file
class OpenGLArea2 : public Gtk::Window
{
public:
    OpenGLArea2();
    ~OpenGLArea2() override;
public:
    Gtk::Box m_VBox{Gtk::ORIENTATION_VERTICAL, false};
    Gtk::GLArea glArea;
    virtual bool render(const Glib::RefPtr<Gdk::GLContext> &context) { return false; }
};
Then the cpp file:
OpenGLArea2::OpenGLArea2()
{
    set_default_size(640, 360);
    add(m_VBox);
    glArea.set_hexpand(true);
    glArea.set_vexpand(true);
    glArea.set_auto_render(true);
    m_VBox.add(glArea);
    glArea.signal_render().connect(sigc::mem_fun(*this, &OpenGLArea2::render), false);
    glArea.show();
    m_VBox.show();
}
bool OpenGLArea2::render(const Glib::RefPtr<Gdk::GLContext> &context)
{
    try
    {
        glArea.throw_if_error();
        glDraw();
        glFlush();
    }
    catch (const Gdk::GLError &gle)
    {
        std::cerr << "An error occurred in the render callback of the GLArea" << std::endl;
        return false;
    }
    return true;
}
void OpenGLArea2::run()
{
    while (true)
    {
        //Important: if decodedFramesFifo does not have any data, it blocks until it has
        Frame frame = decodedFramesFifo->pop_front();
        this->frame = std::move(frame);
        if (!firstFrameReceived)
            firstFrameReceived = true;
        queue_draw();
    }
}
Here's a sketch of what glDraw() does:
void OpenGLArea2::glDraw()
{
    //Creates shader programs
    //Generate buffers and pixel buffer object to render from
    glBufferData(GL_PIXEL_UNPACK_BUFFER, textureSize, frame.buffer(j), GL_STREAM_DRAW);
    //calls glTexSubImage2D to do rendering
}
The problem is that I'm sometimes getting segmentation faults. I tried debugging with gdb and valgrind, but gdb won't show the call stack, just the place where the error occurred (somewhere inside memmove), and valgrind slows the application down to 1 fps, so it simply never hits the segmentation fault, because I think it then has plenty of time to render the data before new data arrives.
I suspect that queue_draw() isn't blocking: it just marks the window for rendering and returns. The window then calls render(). If render() is fast enough to finish before a new frame arrives in the while loop, then no data race occurs. But if render() takes a little more time, a new frame arrives and is written over the old frame, which was in the middle of being rendered.
So the questions are: how do I render in a blocking way? That is, instead of calling queue_draw(), can I call glDraw() directly and wait for it to return? And can I trust that glBufferData() and glTexSubImage2D() both consume the frame data in a blocking way, and do not simply mark it to be sent to the GPU at a later time?
PS: I found void Gtk::GLArea::queue_render and void Gtk::GLArea::set_auto_render(bool auto_render=true), but I think queue_render() also returns immediately.
UPDATE:
Some people said I should use glFinish. The problem is that, in the renderer loop, queue_draw() returns immediately, therefore the thread is not blocked at all. How to render without queue_draw()?
UPDATE:
I added, at the beginning of the while loop:
std::unique_lock<std::mutex> lock{mutex};
and at the end of the while loop:
conditionVariable.wait(lock);
And now my render function is like this:
glArea.throw_if_error();
glDraw();
glFinish();
conditionVariable.notify_one();
The condition variable makes the while loop wait until the rendering finishes, so it can safely delete the received frame (because it goes out of scope). But I'm still getting segfaults. I added logging to some lines and found out that the segfault occurs while waiting. What could be the reason?
I think this is the most problematic part:
Frame frame = decodedFramesFifo->pop_front();
this->frame = std::move(frame);
pop_front blocks and waits for a frame. If you get a second frame before the first one has been rendered, the first frame is going to be destroyed by:
this->frame = std::move(frame); // this->frame now contains second frame, first frame is destroyed
You should lock access to this->frame
void ...::run()
{
    while (true)
    {
        Frame frame = decodedFramesFifo->pop_front();
        std::unique_lock<std::mutex> lk{mutex};
        this->frame = std::move(frame);
        lk.unlock();
        if (!firstFrameReceived)
            firstFrameReceived = true;
        queue_draw();
    }
}
void ...::render()
{
    std::unique_lock<std::mutex> lk{mutex};
    // draw 'this->frame'
}
Optionally, if you can std::move the frame out and you render at most once per frame, you can:
void ...::render()
{
    std::unique_lock<std::mutex> lk{mutex};
    Frame frame = std::move(this->frame);
    lk.unlock();
    // draw 'frame'
}
I'm new to SFML and have been trying to build a multi-threaded game system (all of the game logic on the main thread, and the rendering in a dedicated thread using sf::Thread; mainly for practicing with threads) as explained in this page ("Drawing with threads" section):
Unfortunately my program has a long processing time during its update(), which makes the rendering process completely out of control, showing some frames painted and others completely empty. If it isn't obvious: my rendering thread is trying to paint something that hasn't even been calculated yet, producing this flickering effect.
What I'm looking for is to allow the thread to render only when the main logic has been calculated. Here's what I got so far:
void renderThread()
{
    while (window->isOpen())
    {
        //some other gl stuff
        //window clear
        //window draw
        //window display
    }
}

void update()
{
    while (window->isOpen() && isRunning)
    {
        while (window->pollEvent(event))
        {
            if (event.type == sf::Event::Closed || sf::Keyboard::isKeyPressed(sf::Keyboard::Escape))
            {
                isRunning = false;
            }
            else if (event.type == sf::Event::Resized)
            {
                glViewport(0, 0, event.size.width, event.size.height);
            }
        }
        // really resource intensive process here
        time = clock.getElapsedTime();
        clock.restart().asSeconds();
    }
}
Thanks in advance.
I guess the errors happen because you manipulate elements that are being rendered at the same time in the other thread. You need to look into mutexes.
A mutex locks the element you want to manipulate (or draw in the other thread) for as long as the manipulation takes, and frees it afterwards.
While the element is locked it cannot be accessed by another thread.
Example in pseudo-code:
updateThread(){
    renderMutex.lock();
    globalEntity.manipulate();
    renderMutex.unlock();
}

renderThread(){
    renderMutex.lock();
    window.draw(globalEntity);
    renderMutex.unlock();
}
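In actual C++/SFML code that pattern could look like the sketch below (my own sketch, not tested against your code; globalEntity and the window pointer are placeholders, and std::lock_guard is used instead of manual lock()/unlock() calls):

#include <mutex>
#include <SFML/Graphics.hpp>

std::mutex renderMutex;
sf::CircleShape globalEntity(50.f); // placeholder for whatever you actually draw
sf::RenderWindow* window = nullptr; // assumed to be created elsewhere

void update()
{
    // ... heavy game logic that does not touch shared state ...
    {
        std::lock_guard<std::mutex> lock(renderMutex); // lock only while touching shared state
        globalEntity.move(1.f, 0.f);                   // example manipulation
    }
}

void renderThread()
{
    // the window must have been deactivated in the main thread (window->setActive(false))
    // before this thread starts drawing with it
    while (window->isOpen())
    {
        window->clear();
        {
            std::lock_guard<std::mutex> lock(renderMutex); // block updates while drawing
            window->draw(globalEntity);
        }
        window->display();
    }
}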
I have developed an OpenGL ES 2.0 Win32 application that works fine in a single thread. But I also understand that the UI thread and the rendering thread should be separate.
Currently my game loop looks something like this:
done = 0;
while(!done)
{
    msg = GetMessage(..); // getting messages from OS
    if(msg == QUIT) // the window has been closed
    {
        done = 1;
    }
    DispatchMessage(msg,..); //Calling KeyDown, KeyUp events to handle user input;
    DrawCall(...); //Render a frame
    Update(..); // Update
}
Please view it as pseudo-code, because I don't want to bother you with details at this point.
So my next step was to turn done into a std::atomic_int and create a function
RenderThreadMain()
{
    while(!done.load())
    {
        Draw(...);
    }
}
and create a std::unique_ptr<std::thread> m_renderThread variable. As you can guess, nothing has worked for me so far, so I made my code as stupid and simple as possible in order to make sure I don't break anything with the order I call methods in. So right now my game loop works like this.
done.store(0);
bool created = false;
while(!done)
{
    msg = GetMessage(..); // getting messages from OS
    if(msg == QUIT) // the window has been closed
    {
        done.store(1);
    }
    DispatchMessage(msg,..); //Calling KeyDown, KeyUp events to handle user input;
    // to make sure that my problem is not related to the fact that I'm rendering too early
    if(!created)
    {
        m_renderThread = std::make_unique<std::thread>(RenderThreadMain, ...);
        created = true;
    }
    Update(..); // Update
}
But this doesn't work. On every draw call, when I try to access or use my buffers / textures or anything else, I get the GL_INVALID_OPERATION error code.
So my guess would be that the problem is in me calling glGenBuffers(mk_bufferNumber, m_bufferIds); in the main thread during initialization and then calling glBindBuffer(GL_ARRAY_BUFFER, m_bufferIds[0]); in the render thread during the draw call (the same applies to every OpenGL object I have).
But I don't know if I'm right or wrong.
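For what it's worth, an OpenGL (ES) context can only be current on one thread at a time, so if it stays current on the UI thread, GL calls made from the render thread will fail regardless of where the objects were created. A rough sketch of handing the context over, assuming EGL is being used (the question does not show how the context was created, so display, surface and context are placeholders):

// UI thread, after initialization and before starting the render thread:
// release the context so another thread is allowed to bind it
eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);

// Render thread, once at the top of RenderThreadMain():
// bind the context to this thread; all GL calls from now on happen here
if (eglMakeCurrent(display, surface, surface, context) == EGL_FALSE)
{
    // handle the error (see eglGetError())
}

while (!done.load())
{
    Draw(...);                        // buffers/textures created before the handover stay
                                      // valid: GL objects belong to the context, not a thread
    eglSwapBuffers(display, surface); // present the frame from the render thread
}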
I have threaded the iterative generation of some geometries. I use VTK for rendering. After each iteration I would like to display (render) the current progress. My approach works as expected until the last 2 threads are left hanging, waiting on the QWaitCondition. They are blocked, even though their status in the QWaitCondition's queue is wokenUp (inspected through the debugger). I suspect that this number of 2 threads is somehow connected with my processor's 4 cores.
Simplified code is below. What am I doing wrong and how do I fix it?
class Logic
{
    QMutex threadLock, renderLock;
    //SOLUTION: renderLock should be per thread, not global like this!
    QWaitCondition wc;
    bool render;
    ...
};

Logic::Logic()
{
    ...
    renderLock.lock(); //needs to be locked for QWaitCondition
}

void Logic::timerProc()
{
    static int count=0;
    if (render||count>10) //render wanted or not rendered in a while
    {
        threadLock.lock();
        vtkRenderWindow->Render();
        render=false;
        count=0;
        wc.wakeAll();
        threadLock.unlock();
    }
    else
        count++;
}

double Logic::makeMesh(int meshIndex)
{
    while (notFinished)
    {
        ...(calculate g)
        threadLock.lock(); //lock scene
        mesh[meshIndex]->setGeometry(g);
        render=true;
        threadLock.unlock();
        wc.wait(&renderLock); //wait until rendered
    }
    return g.size;
}

void Logic::makeAllMeshes()
{
    vector<QFuture<double>> r;
    for (int i=0; i<meshes.size(); i++)
    {
        QFuture<double> future = QtConcurrent::run<double>(this, &Logic::makeMesh, i);
        r.push_back(future);
    }
    while(any r is not finished)
        QApplication::processEvents(); //give timer a chance
}
There is at least one defect in your code. count and render belong to the critical section, which means they need to be protected from concurrent access.
Assume there are several threads waiting on wc.wait(&renderLock);. Someone somewhere executes wc.wakeAll();. ALL the threads are woken up. Assume at least one thread sees notFinished as true (if any of your code makes sense, this must be possible) and goes back to execute:
threadLock.lock(); //lock scene
mesh[meshIndex]->setGeometry(g);
render=true;
threadLock.unlock();
wc.wait(&renderLock) <----OOPS...
The second time the thread comes back, it doesn't hold the renderLock lock. So Kamil Klimek is right: you call wait on a mutex you don't hold.
You should remove the lock in the constructor, and lock before calling wait on the condition. Wherever you lock renderLock, the thread should not hold threadLock.
The catch was that I needed one QMutex per thread, and not just one global QMutex. The corrected code is below. Thanks for the help, UmNyobe!
class Logic
{
    QMutex threadLock;
    QWaitCondition wc;
    bool render;
    ...
};

//nothing in constructor related to threading

void Logic::timerProc()
{
    //count was a debugging workaround and is not needed
    if (render)
    {
        threadLock.lock();
        vtkRenderWindow->Render();
        render=false;
        wc.wakeAll();
        threadLock.unlock();
    }
}

double Logic::makeMesh(int meshIndex)
{
    QMutex renderLock; //fix
    renderLock.lock(); //fix
    while (notFinished)
    {
        ...(calculate g)
        threadLock.lock(); //lock scene
        mesh[meshIndex]->setGeometry(g);
        render=true;
        threadLock.unlock();
        wc.wait(&renderLock); //wait until rendered
    }
    return g.size;
}

void Logic::makeAllMeshes()
{
    vector<QFuture<double>> r;
    for (int i=0; i<meshes.size(); i++)
    {
        QFuture<double> future = QtConcurrent::run<double>(this, &Logic::makeMesh, i);
        r.push_back(future);
    }
    while(any r is not finished)
        QApplication::processEvents(); //give timer a chance
}