I have a loop in C++ that I would like to run for a few seconds. Although the amount of work on every iteration is different, from a few microseconds to seconds, it is ok to stop between iterations. It is high-performance code so I would like to avoid calculating time difference on each iteration:
while (!status.has_value())
{
    // do something

    // this adds extra delays that I would like to avoid
    if (duration_cast<seconds>(system_clock::now() - started).count() >= limit)
        status = CompletedBy::duration;
}
What I'm thinking is maybe there is a way to schedule signal and then stop the loop when it happens instead of checking the time difference on every iteration.
BTW, the loop may exit before the signal.
I have done something similar, but in Java. The general idea is to use a separate thread to manage a sentinel value, making your loop look like...
// okayToLoop should be a volatile field (or an AtomicBoolean) so that the
// change made by the timer thread is visible to the loop
okayToLoop = true;
// code to launch thread that will wait N milliseconds, and then negate okayToLoop
while (!status.hasValue() && okayToLoop) {
    // loop code
}
The "cautionary note" is that many sleep() functions for threads employ "sleep at least" semantics, so if it is really important to only sleep N milliseconds, you'll need to address that in your thread implementation. But, this avoids constantly checking the duration for each iteration of the loop.
Note that this will also allow the current iteration of the loop to finish, before the sentinel value is checked. I have also implemented this approach where the "control thread" actually interrupts the thread on which the loop is executing, interrupting the iteration. When I've done this, I've actually put the loop into a worker thread.
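In C++, the same sentinel idea can be sketched with a std::atomic<bool> flag and a watchdog thread. This is a minimal sketch: the function name is illustrative and the loop body is a placeholder.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Run the loop until `limit` elapses: a watchdog thread flips the sentinel,
// so the loop itself never has to query the clock.
long run_until(std::chrono::milliseconds limit) {
    std::atomic<bool> okay_to_loop{true};

    std::thread watchdog([&] {
        std::this_thread::sleep_for(limit);  // note: "sleep at least" semantics
        okay_to_loop.store(false, std::memory_order_relaxed);
    });

    long iterations = 0;
    while (okay_to_loop.load(std::memory_order_relaxed)) {
        ++iterations;  // loop body goes here
    }
    watchdog.join();
    return iterations;
}
```

A relaxed atomic load per iteration is close to the cost of a plain read on mainstream hardware, which keeps the per-iteration overhead small.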
Any form of inter-thread communication is going to be way slower than a simple query of a high performance clock.
Even steady_clock::now() might be too slow to call on every iteration of the loop.
Using OS specific APIs, bind your thread to have ridiculous priority and affinity for a specific CPU. Or use rdtsc, after taking into account everything that can go wrong. Calculate what value you'd expect to get if (a) something went wrong, or (b) you have passed the time threshold.
When that happens, check steady_clock::now(), see if you are close enough to being done, and if so finish. If not, calculate a new high performance clock target and loop again.
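A portable approximation of this approach, using an iteration stride instead of rdtsc as the cheap counter (steady_clock, the stride of 1024, and the empty loop body are assumptions for illustration):

```cpp
#include <cassert>
#include <chrono>

// Amortize the clock query: only check steady_clock once every `stride`
// iterations, analogous to spinning on rdtsc until a precomputed target.
long spin_for(std::chrono::milliseconds limit, long stride = 1024) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + limit;
    long iterations = 0;
    for (;;) {
        ++iterations;  // loop body goes here
        if (iterations % stride == 0 && clock::now() >= deadline)
            return iterations;
    }
}
```

The stride trades timing precision for speed: with highly variable iteration times, a smaller stride keeps the overshoot bounded.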
What does it mean for a function call to "spin?" Is it the same as blocking?
This is the context I'm referring to:
/*
* Obtain a free channel
* This call spins till a free channel is obtained
*/
chNum = _getFreeChannel(&tccNum);
Thanks.
To loop, basically. A spinning function busy-waits: it repeatedly re-checks a condition in a loop instead of blocking.
A spin is a concurrency technique. Essentially the function loops until the desired condition is met.
It can be computationally expensive if the spin time is large, but can be preferable to wait and notification idioms (for which there is a set-up overhead) if the expected spin time is very small with low variance.
When a function spins, it is typically checking some condition (such as a variable) over and over in a tight loop until that condition reaches some interesting value. It then continues running once the condition is met.
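For instance, a minimal C++ spin on an atomic flag (names are illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Busy-wait until another thread sets the flag; no blocking, just re-checking.
void spin_until(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire)) {
        // spinning: burn CPU re-checking the condition
    }
}

int demo() {
    std::atomic<bool> ready{false};
    std::thread setter([&] { ready.store(true, std::memory_order_release); });
    spin_until(ready);  // returns only after the setter thread has stored true
    setter.join();
    return 1;
}
```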
I would like to have several threads that change the value of certain elements periodically. Say I have some kind of run method that changes the value and then sleeps for a certain number of milliseconds. I need to be able to change the interval (a) right after the change and (b) after the sleep. I also need the possibility to switch between single execution and repeated execution at timed intervals.
The problem with using a Timer is that I do not have the possibilities I have when using threads directly, such as naming them or using conditions.
Can anybody please give me a hint in the right direction?
Is there a way to know how much time passed from when I called the WaitForSingleObject function until it returned?
Or is using some kind of timer the only way of knowing?
Just store current time before calling WaitForSingleObject(). Then compare to time when it returns.
From http://msdn.microsoft.com/en-us/library/windows/desktop/ms725473(v=vs.85).aspx - select a function to retrieve time.
You will need to use a timer if you want to track that. The other alternative (though less exact; a timer would be better) is to call WaitForSingleObject with a small timeout value and check the return code: if it is WAIT_TIMEOUT, increment your time count by the timeout value.
GetTickCount() before and after the call. Watch out for DWORD wrap-around.
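A portable sketch of the store-before/compare-after idea, using std::chrono and a timed condition-variable wait in place of the Win32 calls (the timeout value is an arbitrary example):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Measure how long a timed wait actually blocked: store the time before,
// compare after. The predicate always returns false, so the wait runs the
// full timeout even across spurious wakeups.
std::chrono::milliseconds measure_wait(std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(m);

    const auto before = std::chrono::steady_clock::now();
    cv.wait_for(lock, timeout, [] { return false; });  // nobody notifies
    const auto after = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(after - before);
}
```

The Win32 version is the same shape: GetTickCount() or QueryPerformanceCounter() before and after WaitForSingleObject(), then subtract.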
I have a classic physics-thread vs. graphics-thread problem:
Say I'm running one thread for the physics update and one thread for rendering.
In the physics thread (pseudo-code):
while (true)
{
    foreach object in simulation
        SomeComplicatedPhysicsIntegration(&object->modelviewmatrix);
        // modelviewmatrix is a vector of 16 floats (i.e. a 4x4 matrix)
}
and in the graphics thread:
while (true)
{
    foreach object in simulation
        RenderObject(object->modelviewmatrix);
}
Now in theory this would not require locks, as one thread is only writing to the matrices and another is only reading, and I don't care about stale data so much.
The problem is that updating the matrix is not an atomic operation, so sometimes the graphics thread reads a partially updated matrix (i.e. not all 16 floats have been copied yet). Part of the matrix is then from one physics frame and part from the previous frame, which means the matrix is no longer affine (i.e. it's basically corrupted).
Is there any good method of preventing this without using locks? I read about a possible implementation using double buffering, but I cannot imagine a way that would work without syncing the threads.
Edit: I guess what I'd really like to use is some sort of triple buffering like they use on graphics displays... does anyone know of a good presentation of the triple-buffering algorithm?
Edit 2: Indeed, using non-synced triple buffering is not a good idea (as suggested in the answers below). The physics thread can run multiple cycles, eating a lot of CPU and stalling the graphics thread, computing frames that never even get rendered in the end.
I have opted for a simple double-buffered algorithm with a single lock, where the physics thread only computes as much as 1 frame in advance of the graphics thread before swapping buffers. Something like this:
Physics:
while (true)
{
    foreach physicstimestep
        foreach object in simulation
            SomeComplicatedPhysicsIntegration(&object->modelviewmatrix.WriteBuffer);
    LockSemaphore()
    SwapBuffers()
    UnlockSemaphore()
}
Graphics:
while (true)
{
    LockSemaphore()
    foreach object in simulation
        RenderObject(object->modelviewmatrix.ReadBuffer);
    UnlockSemaphore()
}
How does that sound?
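A concrete sketch of the double-buffered matrix described above, with a std::mutex standing in for the semaphore (field and method names are illustrative):

```cpp
#include <array>
#include <cassert>
#include <mutex>

// Two copies of the matrix: the physics thread writes into the write buffer,
// then swaps the index under the lock; the graphics thread copies out the
// read buffer under the same lock, so it never sees a half-written matrix.
struct DoubleBufferedMatrix {
    std::array<float, 16> buffers[2] = {};
    int read_index = 0;  // graphics reads buffers[read_index]
    std::mutex swap_mutex;

    // Physics thread only: the buffer graphics is NOT reading.
    std::array<float, 16>& WriteBuffer() { return buffers[1 - read_index]; }

    void SwapBuffers() {
        std::lock_guard<std::mutex> lock(swap_mutex);
        read_index = 1 - read_index;
    }

    // Graphics thread: snapshot the current read buffer.
    std::array<float, 16> ReadSnapshot() {
        std::lock_guard<std::mutex> lock(swap_mutex);
        return buffers[read_index];
    }
};
```

This works because only the physics thread ever changes read_index, so its unlocked read of it in WriteBuffer() is safe.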
You could maintain a shared queue between the two threads, and implement the physics thread such that it only adds a matrix to the queue after it has fully populated all of the values in that matrix. This assumes that the physics thread allocates a new matrix on each iteration (or more specifically that the matrices are treated as read-only once they are placed in the queue).
So any time your graphics thread pulls a matrix out of the queue, it is guaranteed to be fully populated and a valid representation of the simulation state at the time at which the matrix was generated.
Note that the graphics thread will need to handle cases in which the queue is empty for one or more iterations. It would also probably be a good idea to world-timestamp each queue entry, so that you have a mechanism for keeping the two threads reasonably in sync without using any formal synchronization techniques (for instance, by not allowing the graphics thread to consume any matrices whose timestamp is in the future, and by allowing it to skip ahead in the queue if the next matrix is from too far in the past). Also note that whatever queue you use must be implemented such that it will not explode if the physics thread tries to add something at the same time that the graphics thread is removing something.
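A minimal version of such a queue, assuming C++17 and a plain mutex (a real engine might prefer a bounded or lock-free ring buffer):

```cpp
#include <array>
#include <cassert>
#include <mutex>
#include <optional>
#include <queue>

using Matrix = std::array<float, 16>;

// The physics thread pushes a matrix only after it is fully populated; the
// graphics thread pops, and must tolerate an empty queue.
class MatrixQueue {
public:
    void push(Matrix m) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(m));
    }
    std::optional<Matrix> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return std::nullopt;  // graphics: reuse last matrix
        Matrix m = std::move(queue_.front());
        queue_.pop();
        return m;
    }
private:
    std::mutex mutex_;
    std::queue<Matrix> queue_;
};
```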
but I cannot imagine a way that would work without syncing the threads.
No matter what kind of scheme you are using, synchronizing the threads is an absolute essential here. Without synchronization you run the risk that your physics thread will race far ahead of the graphics thread, or vice versa. Your program, typically a master thread that advances time, needs to be in control of thread operations, not the threading mechanism.
Double buffering is one scheme that lets your physics and graphics threads run in parallel (for example, you have a multi-CPU or multi-core machine). The physics thread operates on one buffer while the graphics thread operates on the other. Note that this induces a lag in the graphics, which may or may not be an issue.
The basic gist behind double buffering is that you duplicate your data to be rendered on screen.
If you run with some sort of locking, then your simulation thread will always be rendering exactly one frame ahead of the display thread. Every piece of data that gets simulated gets rendered. (The synchronization doesn't have to be very heavy: a simple condition variable can frequently be updated and wake the rendering thread pretty cheaply.)
If you run without synchronization, your simulation thread might simulate events that never get rendered, if the rendering thread cannot keep up. If you include a monotonically increasing generation number in your data (update it after each complete simulation cycle), then your rendering thread can simply busy-wait on the two generation numbers (one for each buffer of data).
Once one (or both) of the generation numbers is greater than the most-recently-rendered generation, copy the newest buffer into the rendering thread, update the most-recently-rendered counter, and start rendering. When it's done, return to busy waiting.
If your rendering thread is too fast, you may chew through a lot of processor in that busy wait. So this only makes sense if you expect to periodically skip rendering some data and almost never need to wait for more simulation.
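A sketch of one generation-numbered buffer. Here the renderer re-checks the generation after copying (seqlock-style) so a torn copy is detected and retried; strictly speaking the concurrent plain read/write is a data race under the C++ memory model, so treat this as an illustration of the idea rather than a guaranteed-correct implementation:

```cpp
#include <array>
#include <atomic>
#include <cassert>

using Matrix = std::array<float, 16>;

struct GenBuffer {
    Matrix data{};
    std::atomic<long> generation{0};
};

// Simulation side: write everything, then bump the generation (publish).
void publish(GenBuffer& buf, const Matrix& m) {
    buf.data = m;
    buf.generation.fetch_add(1, std::memory_order_release);
}

// Renderer side: copy the buffer, then re-check the generation; if it moved
// during the copy, the copy may be torn, so retry.
bool try_take(GenBuffer& buf, long& last_rendered, Matrix& out) {
    for (;;) {
        long before = buf.generation.load(std::memory_order_acquire);
        if (before <= last_rendered) return false;  // nothing new: keep busy-waiting
        out = buf.data;
        long after = buf.generation.load(std::memory_order_acquire);
        if (before == after) {
            last_rendered = before;
            return true;
        }
    }
}
```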
Don't update the matrix in the physics thread?
Take a chunk (perhaps a row you have just rendered) and queue its position/size/whatever to the physics thread. Invert/transpose/whateverCleverMatrixStuff that row of modelviewmatrices into another, new row. Post it back to the render thread, which copies the new row in at some suitable time during rendering. Perhaps you do not even need to copy it in; maybe you can just swap the 'old' vector out for the new one and free the old one?
Is this possible, or is the structure/manipulation of your matrices/whatever too complex for this?
All kinda depends on the structure of your data, so this solution may well be inappropriate/impossible.
Rgds,
Martin
Now in theory this would not require locks, as one thread is only writing to the matrices and another is only reading, and I don't care about stale data so much.
Beware: without proper synchronization, there is no guarantee that the reading thread will ever observe any changes by the writing thread. This aspect is known as visibility, and sadly, it is often overlooked.
Assuming lockless or near-lockless updates really are what would solve your problem best, it sounds like you want the physics thread to calculate a new matrix and then update all the values at once, instantaneously, so that it doesn't matter which version of a matrix the graphics thread gets, as long as (a) it gets a complete one eventually and (b) it never gets half of the new one and half of the old one.
In which case, it sounds like you want a physics thread something like:
/* pseudocode */
while (true) foreach (object in simulation) {
    auto new_object = object;
    SomeComplicatedPhysicsIntegrationInPlace(new_object);
    // Ignore the return value, since nothing else changes the value of
    // object; in real code, assert on it.
    atomic_swap(object, new_object);
}
Alternatively, you can calculate a new state of the whole simulation and then swap the values. An easy way of implementing this would be:
/* Pseudocode */
while (true) {
    int inactive_idx = 1 - global_active_idx;
    simulation[inactive_idx] = simulation[global_active_idx];
    foreach (object in simulation[inactive_idx]) {
        SomeComplicatedPhysicsIntegrationInPlace(object);
    }
    global_active_idx = inactive_idx; // implicitly assumes this store is atomic
}
The graphics thread should constantly render simulation[global_active_idx].
In fact, that assumption doesn't hold. It will work in many situations, because writing a 1 to a memory location containing 0 is in fact atomic on most processors, but it's not guaranteed to work. Specifically, the other thread may never re-read that value. Many people bodge this by declaring the variable volatile, which works on many compilers but is not guaranteed to work.
However, to make either example work, all you need is an atomic write instruction, which C++ doesn't provide until C++0x (now C++11, via std::atomic), but which is fairly easy for compilers to implement, since most "write an int" instructions ARE atomic; the compiler just has to ensure it uses one.
So, you can write your code using an atomic_swap function at the end of the physics loop, and implement that in terms of (a) a lock, write, unlock sequence, which shouldn't block the graphics thread significantly because it only blocks for the length of one memory write, possibly just once per whole frame, or (b) a compiler's built-in atomic support, e.g. http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
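With C++11 and later, std::atomic can express the atomic publish directly, e.g. by swapping in a pointer to a fully written matrix. A sketch (function names are illustrative; reclamation of the old matrix is deliberately left out and noted in the comment):

```cpp
#include <array>
#include <atomic>
#include <cassert>

using Matrix = std::array<float, 16>;

std::atomic<const Matrix*> current{nullptr};

// Physics thread: build a complete matrix, then publish it with one atomic
// pointer store. NOTE: the previous matrix is leaked here; a real program
// must reclaim it once no reader can still be using it (e.g. hazard
// pointers or per-frame arenas).
void publish_matrix(const Matrix& next) {
    current.store(new Matrix(next), std::memory_order_release);
}

// Graphics thread: always sees either the old or the new matrix, never a mix.
const Matrix* read_matrix() {
    return current.load(std::memory_order_acquire);
}
```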
There are similar solutions: e.g. the physics thread updates a semaphore, which the graphics thread treats simply as a variable with value 0 or 1; or the physics thread posts finished calculations to a queue (implemented internally in a similar way to the above), and the graphics thread constantly renders the top of the queue, repeating the last entry if the queue underflows.
However, I'm not sure I'm understanding your problem. Why is there any point updating the graphics if the physics hasn't changed? And why update the physics faster than the graphics can render it; can't it extrapolate further in each chunk? Does locking the update actually make any difference?
You can make two matrices fit in one cache-line-sized block (128 bytes) and align it to 128 bytes. That way each matrix is loaded in a cache line, and is therefore written to and read from memory in one block. This is not for the faint of heart, though, and will require much more work. (It only fixes the problem of reading a matrix while it is being updated and getting a torn, non-affine result.)