Delay function makes openFrameworks window freeze until specified time passes - c++

I have network of vertices and I want to change their color every second. I tried using Sleep() function and currently using different delay function but both give same result - lets say I want to color 10 vertices red with 1 second pause. When Im starting project it seems like the window freezes for 10 seconds and then shows every vertice already with red color.
This is my update function.
void ofApp::update(){
for (int i = 0; i < vertices.size(); i++) {
ofColor red(255, 0, 0);
vertices[i].setColor(red);
delay(1);
}
}
Here is draw function
void ofApp::draw(){
for (int i = 0; i < vertices.size(); i++) {
for (int j = 0; j < G[i].size(); j++) {
ofDrawLine(vertices[i].getX(), vertices[i].getY(), vertices[G[i][j]].getX(), vertices[G[i][j]].getY());
}
vertices[i].drawBFS();
}
}
void vertice::drawBFS() {
ofNoFill();
ofSetColor(_color);
ofDrawCircle(_x, _y, 20);
ofDrawBitmapString(_id, _x - 3, _y + 3);
ofColor black(0, 0, 0);
ofSetColor(black);
}
This is my delay() function
void ofApp::delay(int number_of_seconds) {
// Converting time into milli_seconds
int milli_seconds = 1000 * number_of_seconds;
// Stroing start time
clock_t start_time = clock();
// looping till required time is not acheived
while (clock() < start_time + milli_seconds)
;
}

There is no bug in your code, just a misconception about waiting. So this is no definitive answer, just a hint in the right direction.
You have probably just one thread. One thread can do exactly one thing at a time. When you call delay all this one thread is doing is checking the time over and over again until some time has passed.
During this time the thread can do nothing else (it can not magically skip instructions or detect your intentions). So it can not issue drawing commands or swap some buffers to display vectors on screen. That's the reason why you application seems to freeze - 99.9% of the time it's checking if the interval has passed. This also places a heavy load on the cpu.
The solution can be a bit tricky and requires threading. Usually you have a UI-Thread that regularly draws stuff, refreshes the display, maybe takes inputs and so on. This thread should never do heavy calculations to keep the UI responsive. A second thread will then manage heavier calculations or updating data.
If you want to run a task in an interval you don't just loop until the time is over, but essentially "tell the OS" that the second thread should be inactive for a certain period. The OS will manage this way more efficient than implementing active waiting.
But that's quite a large topic, so I suggest you read on about Multithreading. C++ has a small thread library since C++11. May be worth a look.

// .h
float nextEventSeconds = 0;
// .cpp
void ofApp::draw(){
float now = ofGetElapsedTimef();
if(now > nextEventSeconds) {
// do something here that should only happen
// every 3 seconds
nextEventSeconds = now + 3;
}
}
Following this example I managed to solve my problem.

Related

this_thread::sleep_for / SDL Rendering Skips instructions

I'm trying to make a sorting visualizer with SDL2, everything works except one thing, the wait time.
The sorting visualizer has a delay, I can change it to whatever i want, but when I set it to around 1ms it skips some instructions.
Here is 10ms vs 1ms:
10ms delay
1ms delay
The video shows how the 1ms delay doesn't actually finish sorting:
Picture of 1ms delay algorithm completion.
I suspect the problem being the wait function I use, I'm trying to make this program multi-platform so there are little to no options.
Here's a snippet of the code:
Selection Sort Code (Shown in videos):
void selectionSort(void)
{
int minimum;
// One by one move boundary of unsorted subarray
for (int i = 0; i < totalValue-1; i++)
{
// Find the minimum element in unsorted array
minimum = i;
for (int j = i+1; j < totalValue; j++){
if (randArray[j] < randArray[minimum]){
minimum = j;
lineColoration[j] = 2;
render();
}
}
lineColoration[i] = 1;
// Swap the found minimum element with the first element
swap(randArray[minimum], randArray[i]);
this_thread::sleep_for(waitTime);
render();
}
}
Some variables need explanation:
totalValue is the amount of values to be sorted (user input)
randArray is a vector that stores all the values
waitTime is the amount of milliseconds the computer will wait each time (user input)
I've cut the code down, and removed other algorithms to make a reproducible example, not rendering and using cout seems to work, but I still cant pin down if the issue is the render or the wait function:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <thread>
#include <vector>
#include <math.h>
SDL_Window* window;
SDL_Renderer* renderer;
using namespace std;
vector<int> randArray;
int totalValue= 100;
auto waitTime= 1ms;
vector<int> lineColoration;
int lineSize;
int lineHeight;
Uint32 ticks= 0;
void OrganizeVariables()
{
randArray.clear();
for(int i= 0; i < totalValue; i++)
randArray.push_back(i + 1);
auto rng= default_random_engine{};
shuffle(begin(randArray), end(randArray), rng);
lineColoration.assign(totalValue,0);
}
int create_window(void)
{
window= SDL_CreateWindow("Sorting Visualizer", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, 1800, 900, SDL_WINDOW_SHOWN);
return window != NULL;
}
int create_renderer(void)
{
renderer= SDL_CreateRenderer(
window, -1, SDL_RENDERER_PRESENTVSYNC); // Change SDL_RENDERER_PRESENTVSYNC to SDL_RENDERER_ACCELERATED
return renderer != NULL;
}
int init(void)
{
if(SDL_Init(SDL_INIT_VIDEO) != 0)
goto bad_exit;
if(create_window() == 0)
goto quit_sdl;
if(create_renderer() == 0)
goto destroy_window;
cout << "All safety checks passed succesfully" << endl;
return 1;
destroy_window:
SDL_DestroyRenderer(renderer);
SDL_DestroyWindow(window);
quit_sdl:
SDL_Quit();
bad_exit:
return 0;
}
void cleanup(void)
{
SDL_DestroyWindow(window);
SDL_Quit();
}
void render(void)
{
SDL_SetRenderDrawColor(renderer, 0, 0, 0, 255);
SDL_RenderClear(renderer);
//This is used to only render when 16ms hits (60fps), if true, will set the ticks variable to GetTicks() + 16
if(SDL_GetTicks() > ticks) {
for(int i= 0; i < totalValue - 1; i++) {
// SDL_Rect image_pos = {i*4, 100, 3, randArray[i]*2};
SDL_Rect fill_pos= {i * (1 + lineSize), 100, lineSize,randArray[i] * lineHeight};
switch(lineColoration[i]) {
case 0:
SDL_SetRenderDrawColor(renderer,255,255,255,255);
break;
case 1:
SDL_SetRenderDrawColor(renderer,255,0,0,255);
break;
case 2:
SDL_SetRenderDrawColor(renderer,0,255,255,255);
break;
default:
cout << "Error, drawing color not defined, exting...";
cout << "Unkown Color ID: " << lineColoration[i];
cleanup();
abort();
break;
}
SDL_RenderFillRect(renderer, &fill_pos);
}
SDL_RenderPresent(renderer);
lineColoration.assign(totalValue,0);
ticks= SDL_GetTicks() + 16;
}
}
void selectionSort(void)
{
int minimum;
// One by one move boundary of unsorted subarray
for (int i = 0; i < totalValue-1; i++) {
// Find the minimum element in unsorted array
minimum = i;
for (int j = i+1; j < totalValue; j++) {
if (randArray[j] < randArray[minimum]) {
minimum = j;
lineColoration[j] = 2;
render();
}
}
lineColoration[i] = 1;
// Swap the found minimum element with the first element
swap(randArray[minimum], randArray[i]);
this_thread::sleep_for(waitTime);
render();
}
}
int main(int argc, char** argv)
{
//Rough estimate of screen size
lineSize= 1100 / totalValue;
lineHeight= 700 / totalValue;
create_window();
create_renderer();
OrganizeVariables();
selectionSort();
this_thread::sleep_for(5000ms);
cleanup();
}
The problem is the ticks= SDL_GetTicks() + 16; as those are too many ticks for a millisecond wait and the if(SDL_GetTicks() > ticks) condition is false most of the time.
If you put 1ms wait and ticks= SDL_GetTicks() + 5 it will work.
In the selectionSort loop, if in the last, say, eight iterations, the if(SDL_GetTicks() > ticks) skips the drawing, the loop may well finish and let some pending drawings.
It is not the algorithm not completing, it is it finish before ticks reaches a number high enough to allow the drawing.
The main problem is that you are dropping updates to the screen by making all rendering dependant on an if condition:
if(SDL_GetTicks() > ticks)
My tests have shown that only about every 70th call to the function render actually gets rendered. All other calls are filtered by this if condition.
This extremely high number is because you are calling the function render not only in your outer loop, but also in the inner loop. I see no reason why it should also be called in the inner loop. In my opinion, it should only be called in the outer loop.
If you only call it in the outer loop, then about every 16th call to the function is actually rendered.
However, this still means that the last call to the render function only has a 1 in 16 chance of being rendered. Therefore, it is not surprising that the last render of your program does not represent the last sorting step.
If you want to ensure that the last sorting step gets rendered, you could simply execute the rendering code once unconditionally, after the sorting has finished. However, this may not be the ideal solution, because I believe you should first make a more fundamental decision on how your program should behave:
In your question, you are using delays of 1ms between calls to render. This means that your program is designed to render 1000 frames per second. However, your monitor can probably only display about 60 frames per second (some gaming monitors can display more). In that case, every displayed frame lasts for at least 16.7 milliseconds.
Therefore, you must decide how you want your program to behave with regard to the monitor. You could make your program
sort faster than your monitor can display individual sorting steps, so that not all of the sorting steps are rendered, or
sort slower than your monitor can display individual sorting steps, so that all sorting steps are displayed by the monitor for at least one frame, possibly several frames, or
sort at exactly the same speed as your monitor can display, so that one sorting step is displaying for exactly one frame by the monitor.
Implementing #3 is the easiest of all. Because you have enabled VSYNC in the function call to SDL_CreateRenderer, SDL will automatically limit the number of renders to the display rate of your monitor. Therefore, you don't have to perform any additional waiting in your code and can remove the line
this_thread::sleep_for(waitTime);
from the function selectionSort. Also, since SDL knows better than you whether your monitor is ready for the next frame to be drawn, it does not seem appropriate that you try to limit the number of frames yourself. So you can remove the line
if(SDL_GetTicks() > ticks) {
and the corresponding closing brace from the function render.
On the other hand, it may be better to keep the if statement to prevent the massively high frame rates in case SDL doesn't limit them properly. In that case, the frame rate limiter should probably be set well above 60 fps, though (maybe 100-200 fps), to ensure that the frames are passed fast enough to SDL.
Implementing #1 is harder, as it actually requires you to select which sorting steps to render and which ones not to render. Therefore, in order to implement #1, you will probably need to keep the if statement mentioned above, so that rendering only occurs conditionally.
However, it does not seem meaningful to make the if statement dependant on elapsed time since the last render, because while wating, the sorting will continue at full speed and it is therefore possible that all of the sorting will be completed with only one frame of rendering. You are currently preventing this from happending by slowing down the sort by using the line
this_thread::sleep_for(waitTime);
in the function selectionSort. But this does not seem like an ideal solution, but rather a stopgap measure.
Instead of making the if condition dependant on time, it would be easier to make it dependant on the number of sorting steps since the last render. That way, you could, for example, program it in such a way that every 5th sorting step gets rendered. In that case, there would be no need anymore to additionally slow down the actual sorting and your code would be simpler.
As already described above, when implementing #1, you will also have to ensure that you do not drop the last rendering step, or that you at least render the last frame after the sorting is finished. Otherwise, the last frame will likely not display the completed sort, but rather an intermediate sorting step.
Implementing #2 is similar to implementing #1, except that you will have to use SDL_Delay (which is equivalent this_thread::sleep_for) or SDL_AddTimer to determine when it is time to render the next sorting step.
Using SDL_AddTimer would require you to handle SDL Events. However, I would recommend that you do this anyway, because that way, you will also be able to handle SDL_QUIT events, so that you can close your program by closing the window. This would also make the line
this_thread::sleep_for( 5000ms );
at the end of your program unnecessary, because you could instead wait for the user to close the window, like this:
for (;;)
{
SDL_Event event;
SDL_WaitEvent( &event );
if ( event.type == SDL_QUIT ) break;
}
However, it would probably be better if you restructured your entire program, so that you only have one message loop, which responds to both SDL Timer and SDL_QUIT events.

Realtime audio application, improving performance

I am currently writing a C++ real time audio application which roughly contains:
reading frames from a buffer
interpolating frames with the hermit interpolation here
filtering ever frame with two biquad filters (and updating their coefficients every frame)
a 3 band crossover containing 18 biquad calculations
a FreeVerb algorithm from the STK libary here
I think this should be handable for my PC but I get some buffer underflows every so often so I would like to improve the performance of my application. I have a bunch of question I hope you can answer me. :)
1) Operator Overloading
Instead of working directly with my flaot samples and doing calculations for every sample,
I pack my floats in a Frame class which contains the left and the right Sample. The class overloads some operators for addition, subtraction and multiplication with float.
The filters (biquad mostly) and the reverb works with floats and doesn't use this class but the hermite interpolator and every multiplication and addition for volume controll and mixing uses the class.
Does this has an impact on the performance and would it be better to work with left and right sample directly?
2) std::function
The callback function from the audio IO libary PortAudio calls a std::function. I use this to encapsulation everything related to PortAudio. So the "user" sets his own callback function with std::bind
std::bind( &AudioController::processAudio,
&(*this),
std::placeholders::_1,
std::placeholders::_2));
Since for every callback, the right function has to be found from the CPU (however this works...), does this have an impact and would it be better to define a class the user has to inherit from?
3) virtual functions
I use a class called AudioProcessor which declares a virtual function:
virtual void tick(Frame *buffer, int frameCout) = 0;
This function always processes a number of frames at once. Depending on the drive, 200 frames up to 1000 frames per call.
Within the signal processing path, I call this function 6 time from multiple derivated classes. I remember that this is done with lookup tables so the CPU knows exactly which function it has to call. So does the process of calling a "virtual" (derivated) function has an impact on the performance?
The nice thing about this is the structure in the source code but only using inlines maybe would have an performance improvement.
These are all questions for now. I have some more about Qt's event loop because I think that my GUI uses quite a bit of CPU time as well. But this is another topic I guess. :)
Thanks in advance!
These are all relevant function calls within the signal processing. Some of them are from the STK libary.
The biquad functions are from STK and should perform fine. This goes for the freeverb algorithm as well.
// ################################ AudioController Function ############################
void AudioController::processAudio(int frameCount, float *output) {
// CALCULATE LEFT TRACK
Frame * leftFrameBuffer = (Frame*) output;
if(leftLoaded) { // the left processor is loaded
leftProcessor->tick(leftFrameBuffer, frameCount); //(TrackProcessor::tick()
} else {
for(int i = 0; i < frameCount; i++) {
leftFrameBuffer[i].leftSample = 0.0f;
leftFrameBuffer[i].rightSample = 0.0f;
}
}
// CALCULATE RIGHT TRACk
if(rightLoaded) { // the right processor is loaded
// the rightFrameBuffer is allocated once and ensured to have enough space for frameCount Frames
rightProcessor->tick(rightFrameBuffer, frameCount); //(TrackProcessor::tick()
} else {
for(int i = 0; i < frameCount; i++) {
rightFrameBuffer[i].leftSample = 0.0f;
rightFrameBuffer[i].rightSample = 0.0f;
}
}
// MIX
for(int i = 0; i < frameCount; i++ ) {
leftFrameBuffer[i] = volume * (leftRightMix * leftFrameBuffer[i] + (1.0 - leftRightMix) * rightFrameBuffer[i]);
}
}
// ################################ AudioController Function ############################
void TrackProcessor::tick(Frame *frames, int frameNum) {
if(bufferLoaded && playback) {
for(int i = 0; i < frameNum; i++) {
// read from buffer
frames[i] = bufferPlayer->tick();
// filter coeffs
caltulateFilterCoeffs(lowCutoffFilter->tick(), highCutoffFilter->tick());
// filter
frames[i].leftSample = lpFilterL->tick(hpFilterL->tick(frames[i].leftSample));
frames[i].rightSample = lpFilterR->tick(hpFilterR->tick(frames[i].rightSample));
}
} else {
for(int i = 0; i < frameNum; i++) {
frames[i] = Frame(0,0);
}
}
// Effect 1, Equalizer
if(effsActive[0]) {
insEffProcessors[0]->tick(frames, frameNum);
}
// Effect 2, Reverb
if(effsActive[1]) {
insEffProcessors[1]->tick(frames, frameNum);
}
// Volume
for(int i = 0; i < frameNum; i++) {
frames[i].leftSample *= volume;
frames[i].rightSample *= volume;
}
}
// ################################ Equalizer ############################
void EqualizerProcessor::tick(Frame *frames, int frameNum) {
if(active) {
Frame lowCross;
Frame highCross;
for(int f = 0; f < frameNum; f++) {
lowAmp = lowAmpFilter->tick();
midAmp = midAmpFilter->tick();
highAmp = highAmpFilter->tick();
lowCross = highLPF->tick(frames[f]);
highCross = highHPF->tick(frames[f]);
frames[f] = lowAmp * lowLPF->tick(lowCross)
+ midAmp * lowHPF->tick(lowCross)
+ highAmp * lowAPF->tick(highCross);
}
}
}
// ################################ Reverb ############################
// This function just calls the stk::FreeVerb tick function for every frame
// The FreeVerb implementation can't realy be optimised so I will take it as it is.
void ReverbProcessor::tick(Frame *frames, int frameNum) {
if(active) {
for(int i = 0; i < frameNum; i++) {
frames[i].leftSample = reverb->tick(frames[i].leftSample, frames[i].rightSample);
frames[i].rightSample = reverb->lastOut(1);
}
}
}
// ################################ Buffer Playback (BufferPlayer) ############################
Frame BufferPlayer::tick() {
// adjust read position based on loop status
if(inLoop) {
while(readPos > loopEndPos) {
readPos = loopStartPos + (readPos - loopEndPos);
}
}
int x1 = readPos;
float t = readPos - x1;
Frame f = interpolate(buffer->frameAt(x1-1),
buffer->frameAt(x1),
buffer->frameAt(x1+1),
buffer->frameAt(x1+2),
t);
readPos += stepSize;;
return f;
}
// interpolation:
Frame BufferPlayer::interpolate(Frame x0, Frame x1, Frame x2, Frame x3, float t) {
Frame c0 = x1;
Frame c1 = 0.5f * (x2 - x0);
Frame c2 = x0 - (2.5f * x1) + (2.0f * x2) - (0.5f * x3);
Frame c3 = (0.5f * (x3 - x0)) + (1.5f * (x1 - x2));
return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
inline Frame BufferPlayer::frameAt(int pos) {
if(pos < 0) {
pos = 0;
} else if (pos >= frames) {
pos = frames -1;
}
// get chunk and relative Sample
int chunk = pos/ChunkSize;
int chunkSample = pos%ChunkSize;
return Frame(leftChunks[chunk][chunkSample], rightChunks[chunk][chunkSample]);
}
Some suggestions on performance improvement:
Optimize Data Cache Usage
Review your functions that operate on a lot of data (e.g. arrays). The functions should load data into cache, operate on the data, then store back into memory.
The data should be organized to best fit into the data cache. Break up the data into smaller blocks if it doesn't fit. Search the web for "data driven design" and "cache optimizations".
In one project, performing data smoothing, I changed the layout of data and gained 70% performance.
Use Multiple Threads
In the big picture, you may be able to use at least three dedicated threads: input, processing and output. The input thread obtains the data and stores it in buffer(s); search the Web for "double buffering". The second thread gets data from the input buffer, processes it, then writes to an output buffer. The third thread writes data from the output buffer to the file.
You may also benefit from using threads for left and right samples. For example, while one thread is processing the left sample, another thread could be processing the right sample. If you could put the threads on different cores, you may see even more performance benefit.
Use the GPU processing
A lot of modern Graphics Processing Units (GPU) have a lot of cores that can process floating point values. Maybe you could delegate some of the filtering or analysis functions to the cores in the GPU. Be aware that this requires overhead and to gain the benefit, the processing part should be more computative than the overhead.
Reducing the Branching
Processors prefer to manipulate data over branching. Branching stalls the execution as the processor has to figure out where to get and process the next instruction. Some have large instruction caches that can contain small loops; but there is still a penalty for branching to the top of the loop again. See "Loop Unrolling". Also check your compiler optimizations and optimize high for performance. Many compilers will switch to loop unrolling for you, if the circumstances are correct.
Reduce the Amount of Processing
Do you need to process the entire sample or portions of it? For example, in video processing, much of the frame doesn't change only small portions. So the entire frame doesn't need to be processed. Can the audio channels be isolated so only a few channels are processed rather than the entire spectrum?
Coding to Help the Compiler Optimize
You can help the compiler with optimizations by using the const modifier. The compiler may be able to use different algorithms for variables that don't change versus ones that do. For example, a const value can be placed in the executable code, but a non-const value must be placed in memory.
Using static and const can help too. The static usually implies only one instance. The const implies something that doesn't change. So if there is only one instance of the variable that doesn't change, the compiler can place it into the executable or read-only memory and perform a higher optimization of the code.
Loading multiple variables at the same time can help too. The processor can place the data into the cache. The compiler may be able to use specialized assembly instructions for fetching sequential data.

Insert a delay in Processing sketch

I am attempting to insert a delay in Processing sketch. I tried Thread.sleep() but I guess it will not work because, as in Java, it prevents rendering of the drawings.
Basically, I have to draw a triangle with delays in drawing three sides.
How do I do that?
Processing programs can read the value of computer’s clock. The current second is read with the second() function, which returns values from 0 to 59. The current minute is read with the minute() function, which also returns values from 0 to 59. - Processing: A Programming Handbook
Other clock related functions : millis(), day(), month(), year().
Those numbers can be used to trigger events and calculate the passage of time, as in the following Processing sketch quoted from the aforementioned book:
// Uses millis() to start a line in motion three seconds
// after the program starts
int x = 0;
void setup() {
size(100, 100);
}
void draw() {
if (millis() > 3000) {
x++;
line(x, 0, x, 100);
}
}
Here's an example of a triangle whose sides are drawn each one after 3 seconds (the triangle is reset every minute):
int i = second();
void draw () {
background(255);
beginShape();
if (second()-i>=3) {
vertex(50,0);
vertex(99,99);
}
if (second()-i>=6) vertex(0,99);
if (second()-i>=9) vertex(50,0);
endShape();
}
As #user2468700 suggests, use a time keeping function. I like millis().
If you have a value to keep track of the time at certain intervals and the current time (continuously updated) you can check if one timer(manually updated one) falls behind the other(continuous one) based on a delay/wait value. If it does, update your data (number of points to draw in this case) and finally the local stop-watch like value.
Here's a basic commented example.
Rendering is separated from data updates to make it easier to understand.
//render related
PVector[] points = new PVector[]{new PVector(10,10),//a list of points
new PVector(90,10),
new PVector(90,90)};
int pointsToDraw = 0;//the number of points to draw on the screen
//time keeping related
int now;//keeps track of time only when we update, not continuously
int wait = 1000;//a delay value to check against
void setup(){
now = millis();//update the 'stop-watch'
}
void draw(){
//update
if(millis()-now >= wait){//if the difference between the last 'stop-watch' update and the current time in millis is greater than the wait time
if(pointsToDraw < points.length) pointsToDraw++;//if there are points to render, increment that
now = millis();//update the 'stop-watch'
}
//render
background(255);
beginShape();
for(int i = 0 ; i < pointsToDraw; i++) {
vertex(points[i].x,points[i].y);
}
endShape(CLOSE);
}

Help with code optimization

I've written a little particle system for my 2d-application. Here is raining code:
// HPP -----------------------------------
struct Data
{
float x, y, x_speed, y_speed;
int timeout;
Data();
};
std::vector<Data> mData;
bool mFirstTime;
void processDrops(float windPower, int i);
// CPP -----------------------------------
Data::Data()
: x(rand()%ScreenResolutionX), y(0)
, x_speed(0), y_speed(0), timeout(rand()%130)
{ }
void Rain::processDrops(float windPower, int i)
{
int posX = rand() % mWindowWidth;
mData[i].x = posX;
mData[i].x_speed = WindPower*0.1; // WindPower is float
mData[i].y_speed = Gravity*0.1; // Gravity is 9.8 * 19.2
// If that is first time, process drops randomly with window height
if (mFirstTime)
{
mData[i].timeout = 0;
mData[i].y = rand() % mWindowHeight;
}
else
{
mData[i].timeout = rand() % 130;
mData[i].y = 0;
}
}
void update(float windPower, float elapsed)
{
// If this is first time - create array with new Data structure objects
if (mFirstTime)
{
for (int i=0; i < mMaxObjects; ++i)
{
mData.push_back(Data());
processDrops(windPower, i);
}
mFirstTime = false;
}
for (int i=0; i < mMaxObjects; i++)
{
// Sleep until uptime > 0 (To make drops fall with randomly timeout)
if (mData[i].timeout > 0)
{
mData[i].timeout--;
}
else
{
// Find new x/y positions
mData[i].x += mData[i].x_speed * elapsed;
mData[i].y += mData[i].y_speed * elapsed;
// Find new speeds
mData[i].x_speed += windPower * elapsed;
mData[i].y_speed += Gravity * elapsed;
// Drawing here ...
// If drop has been falled out of the screen
if (mData[i].y > mWindowHeight) processDrops(windPower, i);
}
}
}
So the main idea is: I have some structure which consist of drop position, speed. I have a function for processing drops at some index in the vector-array. Now if that's first time of running I'm making array with max size and process it in cycle.
But this code works slower that all another I have. Please, help me to optimize it.
I tried to replace all int with uint16_t but I think it doesn't matter.
Replacing int with uint16_t shouldn't do any difference (it'll take less memory, but shouldn't affect running time on most machines).
The shown code already seems pretty fast (it's doing only what it's needed to do, and there are no particular mistakes), I don't see how you could optimize it further (at most you could remove the check on mFirstTime, but that should make no difference).
If it's slow it's because of something else. Maybe you've got too many drops, or the rest of your code is so slow that update gets called little times per second.
I'd suggest you to profile your program and see where most time is spent.
EDIT:
one thing that could speed up such algorithm, especially if your system hasn't got an FPU (! That's not the case of a personal computer...), would be to replace your floating point values with integers.
Just multiply the elapsed variable (and your constants, like those 0.1) by 1000 so that they will represent milliseconds, and use only integers everywhere.
Few points:
Physics is incorrect: wind power should be changed as speed makes closed to wind speed, also for simplicity I would assume that initial value of x_speed is the speed of the wind.
You don't take care the fraction with the wind at all, so drops getting faster and faster. but that depends on your want to model.
I would simply assume that drop fails in constant speed in constant direction because this is really what happens very fast.
Also you can optimize all this very simply as you don't need to solve motion equation using integration as it can be solved quite simply directly as:
x(t):= x_0 + wind_speed * t
y(t):= y_0 - fall_speed * t
This is the case of stable fall when the gravity force is equal to friction.
x(t):= x_0 + wind_speed * t;
y(t):= y_0 - 0.5 * g * t^2;
If you want to model drops that fall faster and faster.
Few things to consider:
In your processDrops function, you pass in windPower but use some sort of class member or global called WindPower, is that a typo? If the value of Gravity does not change, then save the calculation (i.e. mult by 0.1) and use that directly.
In your update function, rather than calculating windPower * elapsed and Gravity * elapsed for every iteration, calculate and save that before the loop, then add. Also, re-organise the loop, there is no need to do the speed calculation and render if the drop is out of the screen, do the check first, and if the drop is still in the screen, then update the speed and render!
Interestingly, you never check to see if the drop is out of the screen interms of it's x co-ordinate, you check the height, but not the width, you could save yourself some calculations and rendering time if you did this check as well!
In loop introduce reference Data& current = mData[i] and use it instead of mData[i]. And use this reference instead of index also in procesDrops.
BTW I think that consulting mFirstTime in processDrops serves no purpose because it will never be true. Hmm, I missed processDrops in initialization loop. Never mind this.
This looks pretty fast to me already.
You could get some tiny speedup by removing the "firsttime" code and putting it in it's own functions to call once rather that testing every calls.
You are doing the same calculation on lots of similar data so maybe you could look into using SSE intrinsics to process several items at once. You'l likely have to rearrange your data structure for that though to be a structure of vectors rather than a vector od structures like now. I doubt it would help too much though. How many items are in your vector anyway?
It looks like maybe all your time goes into ... Drawing Here.
It's easy enough to find out for sure where the time is going.

Simulated time in a game loop using c++

I am building a 3d game from scratch in C++ using OpenGL and SDL on linux as a hobby and to learn more about this area of programming.
Wondering about the best way to simulate time while the game is running. Obviously I have a loop that looks something like:
void main_loop()
{
while(!quit)
{
handle_events();
DrawScene();
...
SDL_Delay(time_left());
}
}
I am using the SDL_Delay and time_left() to maintain a framerate of about 33 fps.
I had thought that I just need a few global variables like
int current_hour = 0;
int current_min = 0;
int num_days = 0;
Uint32 prev_ticks = 0;
Then a function like :
void handle_time()
{
Uint32 current_ticks;
Uint32 dticks;
current_ticks = SDL_GetTicks();
dticks = current_ticks - prev_ticks; // get difference since last time
// if difference is greater than 30000 (half minute) increment game mins
if(dticks >= 30000) {
prev_ticks = current_ticks;
current_mins++;
if(current_mins >= 60) {
current_mins = 0;
current_hour++;
}
if(current_hour > 23) {
current_hour = 0;
num_days++;
}
}
}
and then call the handle_time() function in the main loop.
It compiles and runs (using printf to write the time to the console at the moment) but I am wondering if this is the best way to do it. Is there easier ways or more efficient ways?
I've mentioned this before in other game related threads. As always, follow the suggestions by Glenn Fiedler in his Game Physics series
What you want to do is to use a constant timestep which you get by accumulating time deltas. If you want 33 updates per second, then your constant timestep should be 1/33. You could also call this the update frequency. You should also decouple the game logic from the rendering as they don't belong together. You want to be able to use a low update frequency while rendering as fast as the machine allows. Here is some sample code:
running = true;
unsigned int t_accum=0,lt=0,ct=0;
while(running){
while(SDL_PollEvent(&event)){
switch(event.type){
...
}
}
ct = SDL_GetTicks();
t_accum += ct - lt;
lt = ct;
while(t_accum >= timestep){
t += timestep; /* this is our actual time, in milliseconds. */
t_accum -= timestep;
for(std::vector<Entity>::iterator en = entities.begin(); en != entities.end(); ++en){
integrate(en, (float)t * 0.001f, timestep);
}
}
/* This should really be in a separate thread, synchronized with a mutex */
std::vector<Entity> tmpEntities(entities.size());
for(int i=0; i<entities.size(); ++i){
float alpha = (float)t_accum / (float)timestep;
tmpEntities[i] = interpolateState(entities[i].lastState, alpha, entities[i].currentState, 1.0f - alpha);
}
Render(tmpEntities);
}
This handles undersampling as well as oversampling. If you use integer arithmetic like done here, your game physics should be close to 100% deterministic, no matter how slow or fast the machine is. This is the advantage of increasing the time in fixed time intervals. The state used for rendering is calculated by interpolating between the previous and current states, where the leftover value inside the time accumulator is used as the interpolation factor. This ensures that the rendering is is smooth, no matter how large the timestep is.
Other than the issues already pointed out (you should use a structure for the times and pass it to handle_time() and your minute will get incremented every half minute) your solution is fine for keeping track of time running in the game.
However, for most game events that need to happen every so often you should probably base them off of the main game loop instead of an actual time so they will happen in the same proportions with a different fps.
One of Glenn's posts you will really want to read is Fix Your Timestep!. After looking up this link I noticed that Mads directed you to the same general place in his answer.
I am not a Linux developer, but you might want to have a look at using Timers instead of polling for the ticks.
http://linux.die.net/man/2/timer_create
EDIT:
SDL Seem to support Timers: SDL_SetTimer