I am creating a cloth simulation for the first time. I have created a class for point masses called 'Particle' that stores all of the important information, such as its current and previous positions, the forces acting on it, its normal, its mass, and so on.
I want to draw these particles, for which I only need currentPosition, the first of many members in the class (see below). From my meagre experience, it seems the way to draw these particles would be to send the entire array of Particle objects down to the GPU and then give it the stride using glVertexAttribPointer(). However, doing this sends a considerable amount of unnecessary information to the GPU. I have considered creating a new array and filling it with only the position data on every iteration of the draw call, but this method seems like a large overhead in both memory and CPU time.
How does the penalty of sending unneeded data to the GPU compare to creating a vertex-only array? Granted, my Particle class isn't huge, but I figure cloths require a large number of particles, so the redundant data could add up fast. Is there an alternative to both of these methods that I am unaware of?
#ifndef PARTICLE_HPP
#define PARTICLE_HPP
#pragma once
#include <atlas/utils/Geometry.hpp>
class Particle {
public:
    Particle(glm::vec3 pos);
    ~Particle();
    void addForce(glm::vec3 force);
    void updateGeometry(atlas::utils::Time const& t, GLfloat DAMPING);
    void makeMoveable();
    void makeStationary();
    void resetAcceleration();
    void setPosition(glm::vec3 newPos);
    void offSetPosition(glm::vec3 newPos);
    void addToSumOfSpringForces(glm::vec3 force);
    void setWind(glm::vec3 wind_);
    void resetNormal();
    void addToNormal(glm::vec3 V);
    glm::vec3 getCurrentPosition();
private:
    glm::vec3 currentPosition, previousPosition, acceleration,
              totalSpringForces, gravity, wind, velocity, mNormal;
    GLfloat mass, dragCoefficient;
    bool stationary;
};
#endif // !PARTICLE_HPP
At some point, you need to allocate server (GPU) memory to store the particle data, whether that's just the positions or the positions along with all the other data. Whatever vertex buffering method you use (client buffers, VBOs, VAOs, etc.), your data at some point needs to be copied from client memory into server memory (CPU to GPU). Thus, if all your shader needs for drawing the particles is the currentPosition data, allocating the additional memory for the other fields and copying them across is wasteful and will never perform as well.
Alternatively, you could do your particle simulation on the GPU and skip the transfer completely, which would likely be much more efficient, assuming you have sufficient hardware and don't need CPU access to the particle data.
For a school project we were tasked with writing a ray tracer. I chose to use C++ since it's the language I'm most comfortable with, but I've been getting some weird artifacts.
Please keep in mind that we are still in the first few lessons of the class, so right now we are limited to checking whether or not a ray hits a certain object.
When my raytracer finishes quickly (less than 1 second spent on actual ray tracing) I've noticed that not all hits get registered in my "framebuffer".
To illustrate, here are two examples:
[Image 1: render showing horizontal line artifacts]
[Image 2: render showing a vertical line artifact]
In the first image, you can clearly see that there are horizontal artifacts.
The second image contains a vertical artifact.
I was wondering if anyone could help me to figure out why this is happening?
I should mention that my application is multi-threaded; the multithreaded portion of the code looks like this:
Stats RayTracer::runParallel(const std::vector<Math::ivec2>& pixelList, const Math::vec3& eyePos, const Math::vec3& screenCenter, long numThreads) noexcept
{
    //...
    for (int i = 0; i < threads.size(); i++)
    {
        threads[i] = std::thread(&RayTracer::run, this, splitPixels[i], eyePos, screenCenter);
    }
    for (std::thread& thread : threads)
    {
        thread.join();
    }
    //...
}
The RayTracer::run method accesses the framebuffer as follows:
Stats RayTracer::run(const std::vector<Math::ivec2>& pixelList, const Math::vec3& eyePos, const Math::vec3& screenCenter) noexcept
{
    this->frameBuffer.clear(RayTracer::CLEAR_COLOUR);
    // ...
    for (const Math::ivec2& pixel : pixelList)
    {
        // ...
        for (const std::shared_ptr<Objects::Object>& object : this->objects)
        {
            std::optional<Objects::Hit> hit = object->hit(ray, pixelPos);
            if (hit)
            {
                // ...
                if (dist < minDist)
                {
                    std::lock_guard lock (this->frameBufferMutex);
                    // ...
                    this->frameBuffer(pixel.y, pixel.x) = hit->getColor();
                }
            }
        }
    }
    // ...
}
This is the operator() for the framebuffer class
class FrameBuffer
{
private:
    PixelBuffer buffer;
public:
    // ...
    Color& operator()(int row, int col) noexcept
    {
        return this->buffer(row, col);
    }
    // ...
};
which makes use of the PixelBuffer's operator():
class PixelBuffer
{
private:
    int mRows;
    int mCols;
    Color* mBuffer;
public:
    // ...
    Color& operator()(int row, int col) noexcept
    {
        return this->mBuffer[this->flattenIndex(row, col)];
    }
    // ...
};
I didn't bother to use any synchronization primitives because each thread gets assigned a certain subset of pixels from the complete image. The thread casts a ray for each of its assigned pixels and writes the resultant color back to the color buffer in that pixel's slot. This means that, while all my threads are concurrently accessing (and writing to) the same object, they don't write to the same memory locations.
After some initial testing, using a std::lock_guard to protect the shared framebuffer seems to help, but it's not a perfect solution: artifacts still occur, although much less commonly.
It should be noted that the way I divide pixels between threads determines the direction of the artifacts. If I give each thread a set of rows the artifacts will be horizontal lines, if I give each thread a set of columns, the artifacts will be vertical lines.
Another interesting observation is that when I trace more complex objects (these take anywhere between 30 seconds and 2 minutes) the artifacts are extremely rare (I've seen one once in my hundreds to thousands of traces so far).
I can't help but feel like this is a problem related to multithreading, but I don't really understand why std::lock_guard wouldn't completely solve the problem.
Edit: After suggestions by Jeremy Friesner I ran the raytracer about 10 times on a single thread, without any issues, so the problem does indeed appear to be a race condition.
I solved the problem thanks to Jeremy Friesner.
As you can see in the code, every thread calls frameBuffer.clear() separately (without locking the mutex!). This means that thread A, which was started first, might have already hit 5-10 pixels by the time thread B clears the framebuffer, which would erase thread A's already-written pixels.
By moving the framebuffer.clear() call to the beginning of the runParallel() method I was able to solve the issue.
I'm currently working on a particle system that uses one thread in which the particles are first updated, then drawn. The particles are stored in a std::vector. I would like to move the update function to a separate thread to improve the system's performance. However, this means I run into problems when the update thread and the draw thread access the std::vector at the same time. My update function changes the position and colour of all particles, and almost always resizes the std::vector as well.
Single thread approach:
std::vector<Particle> particles;
void tick() //tick would be called from main update loop
{
    //slow as must wait for update to draw
    updateParticles();
    drawParticles();
}
Multithreaded:
std::vector<Particle> particles;
//quicker as no longer need to wait to draw and update
//crashes when both threads access the same data, or update resizes vector
void updateThread()
{
    updateParticles();
}
void drawThread()
{
    drawParticles();
}
To fix this problem I have investigated using std::mutex; however, in practice, with a large number of particles, the constant locking of threads meant that performance didn't increase. I have also investigated std::atomic; however, neither Particle nor std::vector is trivially copyable, so I can't use that either.
Multithreaded using mutex:
NOTE: I am using SDL mutex, as far as I am aware, the principles are the same.
SDL_mutex* mutex = SDL_CreateMutex();
SDL_cond* canDraw = SDL_CreateCond();
SDL_cond* canUpdate = SDL_CreateCond();
bool drawTurn = false; //whose turn it is; guarded by the mutex
std::vector<Particle> particles;
//locking the threads leads to the same problems as before,
//now each thread must wait for the other one
void updateThread()
{
    SDL_LockMutex(mutex);
    while(drawTurn)
    {
        SDL_CondWait(canUpdate, mutex);
    }
    updateParticles();
    drawTurn = true;
    SDL_UnlockMutex(mutex);
    SDL_CondSignal(canDraw);
}
void drawThread()
{
    SDL_LockMutex(mutex);
    while(!drawTurn)
    {
        SDL_CondWait(canDraw, mutex);
    }
    drawParticles();
    drawTurn = false;
    SDL_UnlockMutex(mutex);
    SDL_CondSignal(canUpdate);
}
I am wondering if there are any other ways to implement the multi threaded approach? Essentially preventing the same data from being accessed by both threads at the same time, without having to make each thread wait for the other. I have thought about making a local copy of the vector to draw from, but this seems like it would be inefficient, and may run into the same problems if the update thread changes the vector while it's being copied?
I would use a more granular locking strategy. Instead of storing a particle object in your vector, I would store a pointer to a different object.
struct lockedParticle {
    particle* containedParticle;
    SDL_mutex* lockingObject;   // SDL_CreateMutex() returns a pointer
};
In updateParticles() I would attempt to obtain the individual locking objects using SDL_TryLockMutex() - if I fail to obtain control of the mutex I would add the pointer to this particular lockedParticle instance to another vector, and retry later to update them.
I would follow a similar strategy inside the drawParticles(). This relies on the fact that draw order does not matter for particles, which is often the case.
If data consistency is not a concern, you can avoid blocking the whole vector by encapsulating it in a custom class and taking a mutex only around single read/write operations, something like:
struct SharedVector
{
    // ...
    SDL_mutex* lock = SDL_CreateMutex();
    std::vector<Particle> vec;
    void push( const Particle& particle )
    {
        SDL_LockMutex(lock);
        vec.push_back(particle);
        SDL_UnlockMutex(lock);
    }
};
//...
SharedVector particles;
Then, of course, you need to amend updateParticles() and drawParticles() to use the new type instead of std::vector.
EDIT:
You can avoid creating new structure by using mutexes in updateParticles() and drawParticles() methods, e.g.
void updateParticles()
{
    //... get Particle particle object
    SDL_LockMutex(lock);
    particles.push_back(particle);
    SDL_UnlockMutex(lock);
}
The same should be done for drawParticles() as well.
If the vector is changing all the time, you can use two vectors: drawParticles would have its own copy, and updateParticles would write to another one. Once both functions are done, swap, copy, or move the vector used by updateParticles to the one used by drawParticles. (updateParticles can read from the same vector used by drawParticles to get the current particle positions, so you shouldn't need to create a complete new copy.) No locking necessary while each thread works on its own vector.
I'm writing an OpenGL C++ wrapper. The wrapper aims to reduce complex and error-prone usage.
For example, I currently want the user to pay only minimal attention to the OpenGL context. To that end, I wrote a class gl_texture_2d. As is well known, an OpenGL texture basically has the following operations:
Set its u/v parameter to repeat/mirror and so on
Set its min/mag filter to linear
...
Based on this, we have:
class gl_texture_2d
{
public:
void mirror_u(); // set u parameter as mirror model
void mirror_v(); // set v parameter as mirror model
void linear_min_filter(); // ...
void linear_mag_filter(); // ...
};
Well, we know that we can perform these operations only if the handle of the OpenGL texture object is currently bound to the OpenGL context.
Suppose we have a function do this:
void bind(GLuint htex); // actually an alias of related GL function
Ok, we can now design our gl_texture_2d usage as:
gl_texture_2d tex;
bind(tex.handle());
tex.mirror_u();
tex.linear_min_filter();
unbind(tex.handle());
It conforms to GL's logic, but it loses the wrapper's significance, right? As a user, I would like to write:
gl_texture_2d tex;
tex.mirror_u();
tex.linear_min_filter();
To achieve this, we must implement the function along the lines of:
void gl_texture_2d::mirror_u()
{
    glBindTexture(GL_TEXTURE_2D, handle());
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_MIRRORED_REPEAT);
    glBindTexture(GL_TEXTURE_2D, 0);
}
Always doing the binding operation internally makes sure that the operation is valid, but the cost is high!
The codes:
tex.mirror_u();
tex.mirror_v();
will expand to a pair of meaningless binding/unbinding operations.
So is there any mechanism so that the compiler can know:
If bind(b) immediately follows bind(a), the bind(a) can be removed;
If bind(a) occurs twice in a block, the last has no effect.
If you're working with pre-DSA OpenGL, and you absolutely must wrap OpenGL calls this directly with your own API, then the user is probably going to have to know about the whole bind-to-edit thing. After all, if they've bound a texture for rendering purposes and then try to modify another one, it could disturb the current binding.
As such, you should build the bind-to-edit notion directly into the API.
That is, a texture object (which, BTW, should not be limited to just 2D textures) shouldn't actually have functions for modifying it, since you cannot modify an OpenGL texture without binding it (or without DSA, which you really ought to learn). It shouldn't have mirror_u and so forth; those functions should be part of a binder object:
bound_texture bind(some_texture, tex_unit);
bind.mirror_u();
...
The constructor of bound_texture binds some_texture to tex_unit. Its member functions will modify that texture (note: they need to call glActiveTexture to make sure that nobody has changed the active texture unit).
The destructor of bound_texture should automatically unbind the texture. But you should have a release member function that manually unbinds it.
You're not going to be able to do this at a compilation level. Instead, if you're really worried about the time costs of these kinds of mistakes, a manager object might be the way to go:
class state_manager {
    GLuint current_texture;
    /*Maybe other stuff?*/
public:
    void bind_texture(gl_texture_2d const& tex) {
        if(tex.handle() != current_texture) {
            current_texture = tex.handle();
            glBindTexture(/*...*/, current_texture);
        }
    }
};
int main() {
    state_manager manager;
    /*...*/
    gl_texture_2d tex;
    manager.bind_texture(tex);
    manager.bind_texture(tex); //Won't execute the bind twice in a row!
    /*Do stuff with tex bound*/
}
Basically, in my code I hook the glDeleteTextures and glBufferData functions. I store a list of textures and a list of buffers. The buffer list holds checksums and pointers to the buffer. The below code intercepts the data before it reaches the graphics card.
void Hook_glDeleteTextures(GLsizei n, const GLuint* textures)
{
    for (int i = 0; i < n; ++i)
    {
        if (ListOfTextures[i] == textures[i]) //??? Not sure if correct..
        {
            //Erase them from list of textures..
        }
    }
    (*original_glDeleteTextures)(n, textures);
}
And I do the same thing for my buffers. I save the buffers and textures to a list like below:
void Hook_glBufferData(GLenum target, GLsizeiptr size, const GLvoid* data, GLenum usage)
{
    Buffer.size = size;
    Buffer.target = target;
    Buffer.data = data;
    Buffer.usage = usage;
    ListOfBuffers.push_back(Buffer);
    (*original_glBufferData)(target, size, data, usage);
}
Now I need to delete whenever the client deletes. How can I do this? I used a debugger and it seems to know exactly which textures and buffers are being deleted.
Am I doing it wrong? Should I be iterating the pointers passed and deleting the textures?
You do realize that you should do it the other way round: keep a list of texture-info objects, and when you delete one of them, call OpenGL to delete the texture. BTW: OpenGL calls don't go to the graphics card; they go to the driver, and textures may not be stored in GPU memory at all but be swapped out to system memory.
Am I doing it wrong? Should I be iterating the pointers passed and deleting the textures?
Yes. You should not intercept OpenGL calls to trigger data management in your program. For one, you'd have to track the active OpenGL context as well. But more importantly, it's your program that makes the OpenGL calls in the first place, so it is easier to track the data first and manage OpenGL objects accordingly. Also, the usual approach is to keep texture image data in a cache and delete the OpenGL textures of those images if you don't need them right now but may need them again in the near future.
Your approach is basically inside-out, you're putting the cart before the horse.
So I just started trying out some multithreaded programming for the first time, and I've run into this heap corruption problem. Basically the program will run for some random length of time (as short as 2 seconds, as long as 200) before crashing and spitting out a heap corruption error. Everything I've read on the subject says its very hard thing to diagnose, since the what triggers the error often has little to do with what actually causes it. As such, I remain stumped.
I haven't been formally taught multithreading however, so I was mostly programming off of what I understood of the concept, and my code may be completely wrong. So here's a basic rundown of what I'm trying to do and how the program currently tries to handle it:
I'm writing code for a simple game that involves drawing several parallaxing layers of background. The levels are very large (e.g. 20000x5000 pixels), so obviously trying to load 3 layers of images of that size is not feasible (if not impossible). So currently the images are split up into 500x500 tiles, and the code keeps in memory only the images it immediately needs to display. Any loaded images it no longer needs are removed from memory. However, in a single thread, this causes the program to hang significantly while waiting for an image to load before continuing.
This is where multithreading seemed logical to me. I wanted the program to do the loading it needed to do, without affecting the smoothness of the game, as long as the image was loaded by the time it was actually needed. So here is how I have it organized:
1.) All the data for where the images should go and any data associated with them is all stored in one multidimensional array, but initially no image data is loaded. Each frame, the code checks each position on the array, and tests if the spot where the image should go is within some radius of the player.
2.) If it is, it flags this spot as needing to be loaded. A pointer to where the image should be loaded into is push_back()'d onto a vector.
3.) The second thread is started once the level begins. This thread is initially passed a pointer to the aforementioned vector.
4.) This thread is put into an infinite while loop (which by itself sounds wrong) that only terminates when the thread is terminated. This loop continuously checks whether there are any elements in the vector. If there are, it grabs the 0th element, loads the image data into that pointer, then .erase()'s the element from the vector.
That's pretty much a rundown of how it works. My uneducated assumption is that the 2 threads collide at some point trying to write and delete in the same space at once or something. Given that I'm new to this I'm certain this method is terrible to some embarrassing degree, so I'm eager to hear what I should improve upon.
EDIT: Adding source code upon request:
class ImageLoadQueue
{
private:
    ImageHandle* image;
    std::string path;
    int frameWidth, frameHeight, numOfFrames;
public:
    ImageLoadQueue();
    ImageLoadQueue(ImageHandle* a, std::string b, int c, int d, int e=1) { setData(a,b,c,d,e); }
    void setData(ImageHandle* a, std::string b, int c, int d, int e=1)
    {
        image = a;
        path = b;
        frameWidth = c;
        frameHeight = d;
        numOfFrames = e;
    }
    void loadThisImage() { image->loadImage(path, frameWidth, frameHeight, numOfFrames, numOfFrames); }
};
class ImageLoadThread : public sf::Thread
{
private:
    std::vector<ImageLoadQueue*>* images;
public:
    ImageLoadThread() { };
    ImageLoadThread(std::vector<ImageLoadQueue*>* a) { linkVector(a); }
    void linkVector(std::vector<ImageLoadQueue*>* a) { images = a; }
    virtual void Run()
    {
        while (1==1)
        {
            if (!images->empty())
            {
                (*images)[0]->loadThisImage();
                images->erase(images->begin());
            }
        }
    }
};
class LevelArt
{
private:
    int levelWidth, levelHeight, startX, startY, numOfLayers;
    float widthScale, heightScale, widthOfSegs, heightOfSegs;
    float* parallaxFactor;
    ImageHandle** levelImages;
    int** frame;
    int** numOfFrames;
    bool* tileLayer;
    bool** isLoaded;
    Animation** animData;
    std::string** imagePath;
    std::vector<ImageLoadQueue*> imageQueue;
    ImageLoadThread imageThread;
public:
    LevelArt(void);
    LevelArt(std::string);
    ~LevelArt(void);
    void loadData(std::string);
    void drawLevel(sf::RenderWindow*, float, float);
    void scaleLevel(float, float);
    void forceDraw(sf::RenderWindow*);
    void wipeLevel();
    void initialLoad();
    int getLevelWidth() { return levelWidth; }
    int getLevelHeight() { return levelHeight; }
    int getTotalWidth() { return widthOfSegs*levelWidth; }
    int getTotalHeight() { return heightOfSegs*levelHeight; }
    int getStartX() { return startX; }
    int getStartY() { return startY; }
};
That's most of the relevant threading code in this header. Within the levelArt.cpp file there are 3 nested for loops that iterate through all the stored LevelArt data, testing whether each piece is close enough to the player to be displayed, in which case it calls:
imageQueue.push_back(new ImageLoadQueue(&levelImages[i][(j*levelWidth)+k], imagePath[i][(j*levelWidth)+k], widthOfSegs, heightOfSegs, numOfFrames[i][(j*levelWidth)+k]));
i,j,k being the for loop iterators.
This seems like a reasonable use of multithreading. The key idea (in other words, the main place you'll have problems if you do it wrong) is that you have to be careful about data that is used by more than one thread.
You have two places where you have such data:
The vector (which, by the way, should probably be a queue)
The array where you return the data
One way to arrange things - by no means the only one - would be to wrap each of these into its own class (e.g., a class that has a member variable of the vector). Don't allow any direct access to the vector, only through methods on the class. Then synchronize the methods, for example using a mutex or whatever the appropriate synchronization object is. Note that you're synchronizing access to the object, not just the individual methods. So it's not enough to put a mutex in the "read from queue" method; you need a common mutex in the "read from queue" and "write to queue" methods so that no one is doing one while the other occurs. (Also note I'm using the term mutex; that may be a very wrong thing to use depending on your platform and the exact situation. I would likely use a semaphore and a critical section on Windows.)
Synchronization will make the program thread-safe. That's different than making the program efficient. To do that, you probably want a semaphore that represents the number of items in the queue, and have your "load data thread" wait on that semaphore, rather than doing a while loop.