GL_DYNAMIC_DRAW performance spikes - c++

I'm working on my own game engine in C++ with OpenGL, and I'm currently working on the UI. Because I wanted the UI to scale nicely across different screens, every time the window is resized I sub new vertex data into the buffer so the widths stay consistent.
void UICanvas::windowResize(int width, int height) {
    for (std::vector<int>::iterator itr = wIndices.begin(); itr != wIndices.end(); itr++) {
        UIWindow& window = uiWindows.at(*itr);
        float newX = window.size.x / (100.0f * (*aspectRatio));
        float newY = window.size.y / 100.0f;
        float newPositions[] = {
            newX, 0.0f,
            newX, -newY
        };
        glBindBuffer(GL_ARRAY_BUFFER, window.positions_vbo);
        std::chrono::high_resolution_clock::time_point start, end;
        for (int i = 0; i < 1000; i++) { // For benchmarking purposes
            start = std::chrono::high_resolution_clock::now();
            glBufferSubData(GL_ARRAY_BUFFER, 16, 16, &newPositions[0]);
            end = std::chrono::high_resolution_clock::now();
            // The duration of glBufferSubData is all I care about
            printEvent("windowResize", start, end); // Outputs to JSON for profiling
        }
    }
}
Knowing it would be updated occasionally, I knew I should probably use GL_DYNAMIC_DRAW. However, instead of just listening to people, I wanted to benchmark the difference myself. With GL_DYNAMIC_DRAW, most of the resize calls took 3-5 microseconds, with steady occasional spikes of 180-200 microseconds, and even sparser spikes up to 4 milliseconds, 1000x the normal value. With GL_STATIC_DRAW, the average was 7-13 microseconds, and it only spiked up to 100-150 microseconds, and only rarely. I know there are thousands of factors that can affect benchmarking, like the CPU caching values or optimizing certain things, but it would almost make more sense if STATIC_DRAW followed the same giant spike pattern. I think I read somewhere that dynamic buffers are stored in VRAM and static ones in regular RAM, but would that be the reason for this behavior? If I use DYNAMIC_DRAW, will it have occasional spikes, or is this due to the repeated calls?
EDIT: Apologies if this was confusing, but the for loop in the code is my testing method: I am looping the exact same function over and over and timing only the glBufferSubData call, since that is where I'm looking for the difference between STATIC_DRAW and DYNAMIC_DRAW.
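For reference, a self-contained sketch of this kind of isolated measurement is shown below. It is not the engine code above: it assumes a GL context is already current and a VBO already exists, and the glFinish() calls are an addition so that the CPU-side timer also covers work the driver may have queued rather than executed immediately.
#include <chrono>
#include <cstdio>
#include <GL/glew.h> // or whichever GL loader/header the engine already uses

// Hypothetical helper: times 1000 glBufferSubData calls on an existing VBO.
void benchmarkSubData(GLuint vbo) {
    const float newPositions[4] = { 0.5f, 0.0f, 0.5f, -0.5f };
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    for (int i = 0; i < 1000; ++i) {
        glFinish(); // drain previously queued GL work so it is not attributed to this call
        auto start = std::chrono::high_resolution_clock::now();
        glBufferSubData(GL_ARRAY_BUFFER, 16, sizeof(newPositions), newPositions);
        glFinish(); // block until the driver/GPU has also finished the update
        auto end = std::chrono::high_resolution_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
        std::printf("glBufferSubData: %lld us\n", static_cast<long long>(us));
    }
}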

Related

Delay function makes openFrameworks window freeze until specified time passes

I have a network of vertices and I want to change their color every second. I tried using the Sleep() function and am currently using a different delay function, but both give the same result. Let's say I want to color 10 vertices red with a 1-second pause between each: when I start the project, the window seems to freeze for 10 seconds and then shows every vertex already colored red.
This is my update function.
void ofApp::update(){
    for (int i = 0; i < vertices.size(); i++) {
        ofColor red(255, 0, 0);
        vertices[i].setColor(red);
        delay(1);
    }
}
Here is the draw function:
void ofApp::draw(){
    for (int i = 0; i < vertices.size(); i++) {
        for (int j = 0; j < G[i].size(); j++) {
            ofDrawLine(vertices[i].getX(), vertices[i].getY(), vertices[G[i][j]].getX(), vertices[G[i][j]].getY());
        }
        vertices[i].drawBFS();
    }
}
void vertice::drawBFS() {
    ofNoFill();
    ofSetColor(_color);
    ofDrawCircle(_x, _y, 20);
    ofDrawBitmapString(_id, _x - 3, _y + 3);
    ofColor black(0, 0, 0);
    ofSetColor(black);
}
This is my delay() function:
void ofApp::delay(int number_of_seconds) {
    // Converting time into milliseconds
    // (note: clock() counts in CLOCKS_PER_SEC units, so this comparison
    // assumes CLOCKS_PER_SEC == 1000)
    int milli_seconds = 1000 * number_of_seconds;
    // Storing the start time
    clock_t start_time = clock();
    // Looping until the required time has passed
    while (clock() < start_time + milli_seconds)
        ;
}
There is no bug in your code, just a misconception about waiting. So this is no definitive answer, just a hint in the right direction.
You probably have just one thread. One thread can do exactly one thing at a time. When you call delay, all this one thread does is check the time over and over again until the requested time has passed.
During this time the thread can do nothing else (it cannot magically skip instructions or detect your intentions). So it cannot issue drawing commands or swap buffers to display the vertices on screen. That's the reason your application seems to freeze: 99.9% of the time it's checking whether the interval has passed. This also places a heavy load on the CPU.
The solution can be a bit tricky and requires threading. Usually you have a UI thread that regularly draws stuff, refreshes the display, maybe takes input and so on. This thread should never do heavy calculations, to keep the UI responsive. A second thread then manages heavier calculations or data updates.
If you want to run a task at an interval, you don't just loop until the time is over; you essentially "tell the OS" that the second thread should be inactive for a certain period. The OS will manage this far more efficiently than active waiting.
But that's quite a large topic, so I suggest you read up on multithreading. C++ has had a standard thread library since C++11. It may be worth a look.
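As a rough illustration of that idea (this is not openFrameworks-specific; the color container and mutex are placeholders), a worker thread that sleeps between updates might look like this:
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> colors;   // stand-in for the per-vertex colors
std::mutex colorsMutex;    // protects `colors` against the drawing thread

void colorWorker() {
    for (int i = 0; i < 10; ++i) {
        {
            std::lock_guard<std::mutex> lock(colorsMutex);
            colors.push_back(0xFF0000); // "color one more vertex red"
        }
        // Let the OS put this thread to sleep instead of burning CPU time.
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

int main() {
    std::thread worker(colorWorker); // the UI/draw loop would keep running elsewhere
    worker.join();                   // in a real app, join on shutdown instead
}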
// .h
float nextEventSeconds = 0;

// .cpp
void ofApp::draw(){
    float now = ofGetElapsedTimef();
    if(now > nextEventSeconds) {
        // do something here that should only happen
        // every 3 seconds
        nextEventSeconds = now + 3;
    }
}
Following this example I managed to solve my problem.

C++ plotting Mandelbrot set, bad performance

I'm not sure if there is an actual performance increase to achieve, or if my computer is just old and slow, but I'll ask anyway.
So I've tried making a program to plot the Mandelbrot set using the cairo library.
The loop that draws the pixels looks as follows:
vector<point_t*>::iterator it;
for(unsigned int i = 0; i < iterations; i++){
    it = points->begin();
    //cout << points->size() << endl;
    double r,g,b;
    r = (double)i+1 / (double)iterations;
    g = 0;
    b = 0;
    while(it != points->end()){
        point_t *p = *it;
        p->Z = (p->Z * p->Z) + p->C;
        if(abs(p->Z) > 2.0){
            cairo_set_source_rgba(cr, r, g, b, 1);
            cairo_rectangle (cr, p->x, p->y, 1, 1);
            cairo_fill (cr);
            it = points->erase(it);
        } else {
            it++;
        }
    }
}
The idea is to color all points that just escaped the set, and then remove them from list to avoid evaluating them again.
It does render the set correctly, but it seems that the rendering takes a lot longer than needed.
Can someone spot any performance issues with the loop? or is it as good as it gets?
Thanks in advance :)
SOLUTION
Very nice answers, thanks :) - I ended up with a kind of hybrid of the answers. Thinking about what was suggested, I realized that calculating each point, putting them all in a vector and then extracting them again was a huge waste of CPU time and memory. So instead, the program now just calculates the Z value of each point without even using point_t or the vector. It now runs A LOT faster!
Edit: I think the suggestion in the answer of kuroi neko is also a very good idea if you do not care about "incremental" computation, but have a fixed number of iterations.
You should use vector<point_t> instead of vector<point_t*>.
A vector<point_t*> is a list of pointers to point_t. Each point is stored at some random location in memory. If you iterate over the points, the pattern in which memory is accessed looks completely random. You will get a lot of cache misses.
A vector<point_t>, by contrast, uses contiguous memory to store the points, so the next point is stored directly after the current one. This allows efficient caching.
You should not call erase(it); in your inner loop.
Each call to erase has to move all elements after the one you remove. This has O(n) runtime. For example, you could add a flag to point_t to indicate that it should not be processed any longer. It may be even faster to remove all the "inactive" points after each iteration.
It is probably not a good idea to draw individual pixels using cairo_rectangle. I would suggest you create an image and store the color for each pixel. Then draw the whole image with one draw call.
Your code could look like this:
for(unsigned int i = 0; i < iterations; i++){
    double r,g,b;
    r = (double)(i+1) / (double)iterations;
    g = 0;
    b = 0;
    for(vector<point_t>::iterator it=points->begin(); it!=points->end(); ++it) {
        point_t& p = *it;
        if(!p.active) {
            continue;
        }
        p.Z = (p.Z * p.Z) + p.C;
        if(abs(p.Z) > 2.0) {
            cairo_set_source_rgba(cr, r, g, b, 1);
            cairo_rectangle (cr, p.x, p.y, 1, 1);
            cairo_fill (cr);
            p.active = false;
        }
    }
    // perhaps remove all points where p.active == false
}
If you can not change point_t, you can use an additional vector<char> to store if a point has become "inactive".
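A rough sketch of the earlier suggestion to draw into an image and paint it once, instead of issuing one cairo_rectangle per pixel, is shown below. The pixel-filling loop is a placeholder; the cairo calls are the standard image-surface API (CAIRO_FORMAT_RGB24 stores one 32-bit word per pixel):
#include <cairo.h>
#include <cstdint>

void plotToImage(cairo_t* cr, int width, int height) {
    cairo_surface_t* img = cairo_image_surface_create(CAIRO_FORMAT_RGB24, width, height);
    unsigned char* data = cairo_image_surface_get_data(img);
    int stride = cairo_image_surface_get_stride(img);
    for (int y = 0; y < height; ++y) {
        uint32_t* row = reinterpret_cast<uint32_t*>(data + y * stride);
        for (int x = 0; x < width; ++x) {
            uint32_t r = 0;     // placeholder: map the escape iteration of (x, y) to 0..255
            row[x] = (r << 16); // pixel layout is 0x00RRGGBB
        }
    }
    cairo_surface_mark_dirty(img);          // the pixels were written directly
    cairo_set_source_surface(cr, img, 0, 0);
    cairo_paint(cr);                        // one draw call for the whole set
    cairo_surface_destroy(img);
}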
The Zn divergence computation is what makes the algorithm slow (depending on the area you're working on, of course). In comparison, pixel drawing is mere background noise.
Your loop is flawed because it makes the Zn computation slow.
The way to go is to compute divergence for each point in a tight, optimized loop, and then take care of the display.
Besides, it's useless and wasteful to store Z permanently.
You just need C as an input and the number of iterations as an output.
Assuming your points array only holds C values (basically you don't need all this vector crap, but it won't hurt performance either), you could do something like this:
for(vector<point_t>::iterator it = points->begin(); it != points->end(); ++it)
{
    point_t Z = 0;
    point_t C = *it;
    unsigned int i;
    for(i = 0; i < iterations; i++) // <-- this is the CPU burner
    {
        Z = Z * Z + C;
        if(abs(Z) > 2.0) break;
    }
    cairo_set_source_rgba(cr, (double)(i+1) / (double)iterations, 0, 0, 1);
    cairo_rectangle (cr, it->x, it->y, 1, 1);
    cairo_fill (cr);
}
Try to run this with and without the cairo thing and you should see no noticeable difference in execution time (unless you're looking at an empty spot of the set).
Now if you want to go faster, try breaking the Z = Z * Z + C computation down into real and imaginary parts and optimizing it. You could even use MMX/SSE or whatever to do parallel computations.
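For example, a sketch of that real/imaginary split (plain doubles, no std::complex; the function name is arbitrary):
// Returns the number of iterations before |Z| exceeds 2 (or maxIter).
inline unsigned mandelIterations(double cr, double ci, unsigned maxIter) {
    double zr = 0.0, zi = 0.0;
    unsigned i = 0;
    for (; i < maxIter; ++i) {
        double zr2 = zr * zr;
        double zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)      // |Z| > 2  <=>  |Z|^2 > 4, avoids a sqrt
            break;
        zi = 2.0 * zr * zi + ci;  // imaginary part of Z*Z + C
        zr = zr2 - zi2 + cr;      // real part of Z*Z + C
    }
    return i;
}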
And of course the way to gain another significant speed factor is to parallelize your algorithm over the available CPU cores (i.e. split your display area into subsets and have different worker threads compute these parts in parallel).
This is not as obvious as it might seem, though, since each sub-picture will have a different computation time (black areas are very slow to compute while white areas are computed almost instantly).
One way to do it is to split the area into a large number of rectangles, and have all worker threads pick a random rectangle from a common pool until all rectangles have been processed.
This simple load-balancing scheme makes sure no CPU core will be left idle while its buddies are busy on other parts of the display.
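A rough sketch of such a tile pool, using std::thread and an atomic counter instead of random picking (the Tile struct and the renderTile callback are placeholders):
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

struct Tile { int x0, y0, x1, y1; };

void renderTiles(const std::vector<Tile>& tiles, void (*renderTile)(const Tile&)) {
    std::atomic<std::size_t> next(0);
    auto worker = [&]() {
        // Each thread keeps grabbing the next unprocessed tile until none are left,
        // so slow (black) and fast (escaping) tiles spread naturally over the workers.
        for (std::size_t i = next++; i < tiles.size(); i = next++)
            renderTile(tiles[i]);
    };
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n; ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
}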
The first step to optimizing performance is to find out what is slow. Your code mixes three tasks: iterating to calculate whether a point escapes, manipulating a vector of points to test, and plotting the point.
Separate these three operations and measure their contribution. You can optimise the escape calculation by parallelising it using SIMD operations. You can optimise the vector operations by not erasing from the vector: instead of removing a point, add the points you want to keep to another vector (erase is O(n), addition is O(1)), as sketched after this answer; and improve locality by having a vector of points rather than pointers to points. If the plotting is slow, use an off-screen bitmap and set points by manipulating the backing memory rather than using cairo functions.
(I was going to post this but #Werner Henze already made the same point in a comment, hence community wiki)
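To make the erase-versus-append point concrete, here is a sketch of that partition pattern (point_t stands for the question's struct; std::abs on its Z member assumes a std::complex-like type):
#include <complex>
#include <vector>

template <typename PointT>
void dropEscapedPoints(std::vector<PointT>& points) {
    std::vector<PointT> stillIterating;
    stillIterating.reserve(points.size());
    for (const PointT& p : points) {
        if (std::abs(p.Z) <= 2.0)        // keep only the points that have not escaped
            stillIterating.push_back(p); // O(1) append instead of O(n) erase
    }
    points.swap(stillIterating);         // one cheap swap per outer iteration
}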

is it possible to speed-up matlab plotting by calling c / c++ code in matlab?

It is generally very easy to call mex files (written in c/c++) in Matlab to speed up certain calculations. In my experience however, the true bottleneck in Matlab is data plotting. Creating handles is extremely expensive and even if you only update handle data (e.g., XData, YData, ZData), this might take ages. Even worse, since Matlab is a single threaded program, it is impossible to update multiple plots at the same time.
Therefore my question: Is it possible to write a Matlab GUI and call C++ (or some other parallelizable code) which would take care of the plotting / visualization? I'm looking for a cross-platform solution that will work on Windows, Mac and Linux, but any solution that gets me started on either OS is greatly appreciated!
I found a C++ library that seems to use Matlab's plot() syntax but I'm not sure whether this would speed things up, since I'm afraid that if I plot into Matlab's figure() window, things might get slowed down again.
I would appreciate any comments and feedback from people who have dealt with this kind of situation before!
EDIT: obviously, I've already profiled my code and the bottleneck is the plotting (dozens of panels with lots of data).
EDIT2: for you to get the bounty, I need a real life, minimal working example on how to do this - suggestive answers won't help me.
EDIT3: regarding the data to plot: in a most simplistic case, think about 20 line plots, that need to be updated each second with something like 1000000 data points.
EDIT4: I know that this is a huge amount of points to plot, but I never said that the problem was easy. I cannot just leave out certain data points, because there's no way of assessing which points are important before actually plotting them (the data is sampled at sub-ms time resolution). As a matter of fact, my data is acquired using a commercial data acquisition system which comes with a data viewer (written in C++). This program has no problem visualizing up to 60 line plots with even more than 1000000 data points.
EDIT5: I don't like where the current discussion is going. I'm aware that sub-sampling my data might speed things up - however, this is not the question. The question here is how to get a C / C++ / Python / Java interface to work with Matlab in order to hopefully speed up plotting by talking directly to the hardware (or using any other trick / way).
Did you try the trivial solution of changing the render method to OpenGL ?
opengl hardware;
set(gcf,'Renderer','OpenGL');
Warning!
There will be some things that disappear in this mode, and it will look a bit different, but generally plots will run much faster, especially if you have a hardware accelerator.
By the way, are you sure that you will actually gain a performance increase?
For example, in my experience, WPF graphics in C# are considerably slower than Matlab's, especially scatter plots and circles.
Edit: I thought about the fact that the number of points actually drawn to the screen can't be that large. Basically it means that you need to interpolate at the places where there is a pixel on the screen. Check out this object:
classdef InterpolatedPlot < handle
    properties(Access=private)
        hPlot;
    end

    methods(Access=public)
        function this = InterpolatedPlot(x,y,varargin)
            this.hPlot = plot(0,0,varargin{:});
            this.setXY(x,y);
        end
    end

    methods
        function setXY(this,x,y)
            parent = get(this.hPlot,'Parent');
            set(parent,'Units','Pixels')
            sz = get(parent,'Position');
            width = sz(3); % Actual width in pixels
            subSampleX = linspace(min(x(:)),max(x(:)),width);
            subSampleY = interp1(x,y,subSampleX);
            set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
        end
    end
end
And here is an example how to use it:
function TestALotOfPoints()
    x = rand(10000,1);
    y = rand(10000,1);
    ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end
Another possible improvement:
Also, if your x data is sorted, you can use interp1q instead of interp1, which will be much faster.
classdef InterpolatedPlot < handle
    properties(Access=private)
        hPlot;
    end

    % properties(Access=public)
    %     XData;
    %     YData;
    % end

    methods(Access=public)
        function this = InterpolatedPlot(x,y,varargin)
            this.hPlot = plot(0,0,varargin{:});
            this.setXY(x,y);
            % this.XData = x;
            % this.YData = y;
        end
    end

    methods
        function setXY(this,x,y)
            parent = get(this.hPlot,'Parent');
            set(parent,'Units','Pixels')
            sz = get(parent,'Position');
            width = sz(3); % Actual width in pixels
            subSampleX = linspace(min(x(:)),max(x(:)),width);
            subSampleY = interp1q(x,y,transpose(subSampleX));
            set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
        end
    end
end
And the use case:
function TestALotOfPoints()
    x = rand(10000,1);
    y = rand(10000,1);
    x = sort(x);
    ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end
Since you want maximum performance you should consider writing a minimal OpenGL viewer. Dump all the points to a file and launch the viewer using the "system"-command in MATLAB. The viewer can be really simple. Here is one implemented using GLUT, compiled for Mac OS X. The code is cross platform so you should be able to compile it for all the platforms you mention. It should be easy to tweak this viewer for your needs.
If you are able to integrate this viewer more closely with MATLAB you might be able to get away with not having to write to and read from a file (= much faster updates). However, I'm not experienced in the matter. Perhaps you can put this code in a mex-file?
EDIT: I've updated the code to draw a line strip from a CPU memory pointer.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
// The file "input" is assumed to contain a line for each point:
// 0.1 1.0
// 5.2 3.0
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <GLUT/glut.h>

using namespace std;

struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };

static vector<float2> points;
static float2 minPoint, maxPoint;
typedef vector<float2>::iterator point_iter;

static void render() {
    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(minPoint.x, maxPoint.x, minPoint.y, maxPoint.y, -1.0f, 1.0f);
    glColor3f(0.0f, 0.0f, 0.0f);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(points[0]), &points[0].x);
    glDrawArrays(GL_LINE_STRIP, 0, points.size());
    glDisableClientState(GL_VERTEX_ARRAY);
    glutSwapBuffers();
}

int main(int argc, char* argv[]) {
    ifstream file("input");
    string line;
    while (getline(file, line)) {
        istringstream ss(line);
        float2 p;
        ss >> p.x;
        ss >> p.y;
        if (ss)
            points.push_back(p);
    }
    if (!points.size())
        return 1;

    minPoint = maxPoint = points[0];
    for (point_iter i = points.begin(); i != points.end(); ++i) {
        float2 p = *i;
        minPoint = float2(minPoint.x < p.x ? minPoint.x : p.x, minPoint.y < p.y ? minPoint.y : p.y);
        maxPoint = float2(maxPoint.x > p.x ? maxPoint.x : p.x, maxPoint.y > p.y ? maxPoint.y : p.y);
    }
    float dx = maxPoint.x - minPoint.x;
    float dy = maxPoint.y - minPoint.y;
    maxPoint.x += dx*0.1f; minPoint.x -= dx*0.1f;
    maxPoint.y += dy*0.1f; minPoint.y -= dy*0.1f;

    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(512, 512);
    glutCreateWindow("glview");
    glutDisplayFunc(render);
    glutMainLoop();
    return 0;
}
EDIT: Here is new code based on the discussion below. It renders a sin function consisting of 20 vbos, each containing 100k points. 10k new points are added each rendered frame. This makes a total of 2M points. The performance is real-time on my laptop.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <cmath>
#include <GLUT/glut.h>

using namespace std;

struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };

struct Vbo {
    GLuint i;
    Vbo(int size) { glGenBuffersARB(1, &i); glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferDataARB(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW); } // could try GL_STATIC_DRAW
    void set(const void* data, size_t size, size_t offset) { glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferSubData(GL_ARRAY_BUFFER, offset, size, data); }
    ~Vbo() { glDeleteBuffers(1, &i); }
};

static const int vboCount = 20;
static const int vboSize = 100000;
static const int pointCount = vboCount*vboSize;
static float endTime = 0.0f;
static const float deltaTime = 1e-3f;
static std::vector<Vbo*> vbos;
static int vboStart = 0;

static void addPoints(float2* points, int pointCount) {
    while (pointCount) {
        if (vboStart == vboSize || vbos.empty()) {
            if (vbos.size() >= vboCount+2) { // remove and reuse vbo
                Vbo* first = *vbos.begin();
                vbos.erase(vbos.begin());
                vbos.push_back(first);
            }
            else { // create new vbo
                vbos.push_back(new Vbo(sizeof(float2)*vboSize));
            }
            vboStart = 0;
        }
        int pointsAdded = pointCount;
        if (pointsAdded + vboStart > vboSize)
            pointsAdded = vboSize - vboStart;
        Vbo* vbo = *vbos.rbegin();
        vbo->set(points, pointsAdded*sizeof(float2), vboStart*sizeof(float2));
        pointCount -= pointsAdded;
        points += pointsAdded;
        vboStart += pointsAdded;
    }
}

static void render() {
    // generate and add 10000 points
    const int count = 10000;
    float2 points[count];
    for (int i = 0; i < count; ++i) {
        float2 p(endTime, std::sin(endTime*1e-2f));
        endTime += deltaTime;
        points[i] = p;
    }
    addPoints(points, count);

    // render
    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(endTime-deltaTime*pointCount, endTime, -1.0f, 1.0f, -1.0f, 1.0f);
    glColor3f(0.0f, 0.0f, 0.0f);
    glEnableClientState(GL_VERTEX_ARRAY);
    for (size_t i = 0; i < vbos.size(); ++i) {
        glBindBufferARB(GL_ARRAY_BUFFER, vbos[i]->i);
        glVertexPointer(2, GL_FLOAT, sizeof(float2), 0);
        if (i == vbos.size()-1)
            glDrawArrays(GL_LINE_STRIP, 0, vboStart);
        else
            glDrawArrays(GL_LINE_STRIP, 0, vboSize);
    }
    glDisableClientState(GL_VERTEX_ARRAY);
    glutSwapBuffers();
    glutPostRedisplay();
}

int main(int argc, char* argv[]) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(512, 512);
    glutCreateWindow("glview");
    glutDisplayFunc(render);
    glutMainLoop();
    return 0;
}
As a number of people have mentioned in their answers, you do not need to plot that many points. I think it is important to repeat Andrey's comment:
that is a HUGE amount of points! There aren't enough pixels on the screen to plot that amount.
Rewriting plotting routines in different languages is a waste of your time. A huge number of hours have gone into writing MATLAB; what makes you think you can write a significantly faster plotting routine (in a reasonable amount of time)? While your routine may be less general, and therefore could skip some of the checks that the MATLAB code performs, your "bottleneck" is that you are trying to plot so much data.
I strongly recommend one of two courses of action:
Sample your data: You do not need 20 x 1000000 points on a figure - the human eye won't be able to distinguish between all the points, so it is a waste of time. Try binning your data for example.
If you maintain that you need all those points on the screen, I would suggest using a different tool. VisIt or ParaView are two examples that come to mind. They are parallel visualisation programs designed to handle extremely large datasets (I have seen VisIt handle datasets that contained petabytes of data).
There is no way you can fit 1000000 data points on a small plot. How about you choose one in every 10000 points and plot those?
You can consider calling imresize on the large vector to shrink it, but manually building a vector by omitting 99% of the points may be faster.
#memyself The sampling operations are already occurring. Matlab is choosing what data to include in the graph. Why do you trust Matlab? It looks to me that the graph you showed significantly misrepresents the data. The dense regions should indicate that the signal is at a constant value, but in your graph they could mean either that the signal was at that value half the time, or that it was at that value at least once during the interval corresponding to that pixel.
Would it be possible to use an alternative architecture? For example, use MATLAB to generate the data and use a fast library or application (GNUplot?) to handle the plotting.
It might even be possible to have MATLAB write the data to a stream as the plotter consumes it. Then the plot would be updated as MATLAB generates the data.
This approach would avoid MATLAB's ridiculously slow plotting and divide the work between two separate processes. The OS/CPU would probably assign the processes to different cores as a matter of course.
I think it's possible, but likely to require writing the plotting code (at least the parts you use) from scratch, since anything you could reuse is exactly what's slowing you down.
To test feasibility, I'd start with testing that any Win32 GUI works from MEX (call MessageBox), then proceed to creating your own window, test that window messages arrive to your WndProc. Once all that's going, you can bind an OpenGL context to it (or just use GDI), and start plotting.
However, the savings are likely to come from simpler plotting code and the use of newer OpenGL features such as VBOs, rather than from threading. Everything is already parallel on the GPU, and more threads don't help transfer commands/data to the GPU any faster.
I did a very similar thing many many years ago (2004?). I needed an oscilloscope-like display for kilohertz sampled biological signals displayed in real time. Not quite as many points as the original question has, but still too many for MATLAB to handle on its own. IIRC I ended up writing a Java component to display the graph.
As other people have suggested, I also ended up down-sampling the data. For each pixel on the x-axis, I calculated the minimum and maximum values taken by the data, then drew a short vertical line between those values. The entire graph consisted of a sequence of short vertical lines, each immediately adjacent to the next.
Actually, I think that the implementation ended up writing the graph to a bitmap that scrolled continuously using bitblt, with only new points being drawn ... or maybe the bitmap was static and the viewport scrolled along it ... anyway it was a long time ago and I might not be remembering it right.
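A sketch of that min/max-per-pixel-column reduction in C++ (the sample vector and pixel width are assumptions; each resulting pair becomes one short vertical line):
#include <algorithm>
#include <utility>
#include <vector>

// Reduce a long signal to one (min, max) pair per pixel column of the plot.
std::vector<std::pair<float, float>> minMaxColumns(const std::vector<float>& samples,
                                                   std::size_t width) {
    std::vector<std::pair<float, float>> columns(width, std::make_pair(0.0f, 0.0f));
    if (samples.empty() || width == 0) return columns;
    const std::size_t perColumn = std::max<std::size_t>(1, samples.size() / width);
    for (std::size_t c = 0; c < width; ++c) {
        std::size_t begin = c * perColumn;
        if (begin >= samples.size()) break;
        std::size_t end = std::min(begin + perColumn, samples.size());
        auto mm = std::minmax_element(samples.begin() + begin, samples.begin() + end);
        columns[c] = std::make_pair(*mm.first, *mm.second); // vertical line from min to max
    }
    return columns;
}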
Quoting the question's EDIT4: "I know that this is a huge amount of points to plot but I never said that the problem was easy. I can not just leave out certain data points, because there's no way of assessing what points are important, before actually plotting them."
This is incorrect. There is a way to know which points to leave out. Matlab is already doing it. Something is going to have to do it at some point no matter how you solve this. I think you need to redirect your problem to be "how do I determine which points I should plot?".
Based on the screenshot, the data looks like a waveform. You might want to look at the code of Audacity. It is an open-source audio editing program. It displays plots that represent the waveform in real time, and they look identical in style to the one in your lowest screenshot. You could borrow some sampling techniques from them.
What you are looking for is the creation of a MEX file.
Rather than me explaining it, you would probably benefit more from reading this: Creating C/C++ and Fortran Programs to be Callable from MATLAB (MEX-Files) (a documentation article from MathWorks).
Hope this helps.

Help with code optimization

I've written a little particle system for my 2D application. Here is the rain code:
// HPP -----------------------------------
struct Data
{
    float x, y, x_speed, y_speed;
    int timeout;
    Data();
};

std::vector<Data> mData;
bool mFirstTime;
void processDrops(float windPower, int i);

// CPP -----------------------------------
Data::Data()
    : x(rand()%ScreenResolutionX), y(0)
    , x_speed(0), y_speed(0), timeout(rand()%130)
{ }

void Rain::processDrops(float windPower, int i)
{
    int posX = rand() % mWindowWidth;
    mData[i].x = posX;
    mData[i].x_speed = WindPower*0.1; // WindPower is float
    mData[i].y_speed = Gravity*0.1;   // Gravity is 9.8 * 19.2
    // If this is the first time, spread drops randomly over the window height
    if (mFirstTime)
    {
        mData[i].timeout = 0;
        mData[i].y = rand() % mWindowHeight;
    }
    else
    {
        mData[i].timeout = rand() % 130;
        mData[i].y = 0;
    }
}

void update(float windPower, float elapsed)
{
    // If this is the first time - create the array with new Data structure objects
    if (mFirstTime)
    {
        for (int i=0; i < mMaxObjects; ++i)
        {
            mData.push_back(Data());
            processDrops(windPower, i);
        }
        mFirstTime = false;
    }

    for (int i=0; i < mMaxObjects; i++)
    {
        // Sleep until timeout reaches 0 (to make drops fall with a random timeout)
        if (mData[i].timeout > 0)
        {
            mData[i].timeout--;
        }
        else
        {
            // Find new x/y positions
            mData[i].x += mData[i].x_speed * elapsed;
            mData[i].y += mData[i].y_speed * elapsed;

            // Find new speeds
            mData[i].x_speed += windPower * elapsed;
            mData[i].y_speed += Gravity * elapsed;

            // Drawing here ...

            // If the drop has fallen out of the screen
            if (mData[i].y > mWindowHeight) processDrops(windPower, i);
        }
    }
}
So the main idea is: I have a structure which consists of a drop's position and speed, and a function for processing the drop at some index in the vector array. The first time it runs, I create the array at its maximum size and process it in a loop.
But this code runs slower than everything else I have. Please help me optimize it.
I tried replacing all int with uint16_t, but I don't think it matters.
Replacing int with uint16_t shouldn't do any difference (it'll take less memory, but shouldn't affect running time on most machines).
The code shown already seems pretty fast (it's doing only what it needs to do, and there are no particular mistakes); I don't see how you could optimize it further (at most you could remove the check on mFirstTime, but that should make no difference).
If it's slow, it's because of something else. Maybe you've got too many drops, or the rest of your code is so slow that update gets called only a few times per second.
I'd suggest you to profile your program and see where most time is spent.
EDIT:
one thing that could speed up such an algorithm, especially if your system hasn't got an FPU (which is not the case on a personal computer...), would be to replace your floating point values with integers.
Just multiply the elapsed variable (and your constants, like those 0.1) by 1000 so that they will represent milliseconds, and use only integers everywhere.
A few points:
The physics is incorrect: the wind force should diminish as the drop's speed approaches the wind speed; also, for simplicity, I would assume that the initial value of x_speed is the speed of the wind.
You don't account for friction with the air at all, so the drops keep getting faster and faster, but that depends on what you want to model.
I would simply assume that a drop falls at constant speed in a constant direction, because that is what actually happens very quickly.
You can also optimize all of this very simply, as you don't need to solve the equation of motion by integration; it can be solved directly:
x(t):= x_0 + wind_speed * t
y(t):= y_0 - fall_speed * t
This is the case of stable fall, when the force of gravity is balanced by friction.
x(t):= x_0 + wind_speed * t;
y(t):= y_0 - 0.5 * g * t^2;
Use these if you want to model drops that fall faster and faster.
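A tiny sketch of evaluating those closed forms directly instead of integrating every frame (the names here are placeholders, not the question's members):
struct DropStart { float x0, y0; };

// x(t) = x0 + wind_speed * t  (constant horizontal drift)
inline float dropX(const DropStart& d, float windSpeed, float t) {
    return d.x0 + windSpeed * t;
}

// y(t) = y0 - fall_speed * t  (stable fall); use y0 - 0.5 * g * t^2 for accelerating drops
inline float dropY(const DropStart& d, float fallSpeed, float t) {
    return d.y0 - fallSpeed * t;
}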
A few things to consider:
In your processDrops function, you pass in windPower but use some sort of class member or global called WindPower - is that a typo? If the value of Gravity does not change, precompute the result (i.e. the multiplication by 0.1) and use that directly.
In your update function, rather than calculating windPower * elapsed and Gravity * elapsed on every iteration, calculate and save them before the loop, then add. Also, reorganise the loop: there is no need to do the speed calculation and render if the drop is out of the screen - do the check first, and only if the drop is still on screen update the speed and render (see the sketch after these points).
Interestingly, you never check whether the drop is out of the screen in terms of its x coordinate; you check the height but not the width. You could save yourself some calculations and rendering time if you did this check as well.
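Putting those suggestions together, the update loop from the question might be reorganised roughly as follows. This is a sketch rather than drop-in code: it reuses the question's members (mData, mMaxObjects, mWindowWidth, mWindowHeight, Gravity, processDrops) and omits the first-time initialisation.
void update(float windPower, float elapsed)
{
    const float windDelta = windPower * elapsed;   // computed once per frame
    const float gravityDelta = Gravity * elapsed;  // computed once per frame

    for (int i = 0; i < mMaxObjects; ++i)
    {
        Data& drop = mData[i];                     // avoid repeated indexing
        if (drop.timeout > 0) { drop.timeout--; continue; }

        // Respawn before doing any further work if the drop has already left
        // the screen (the x bound is the extra check suggested above).
        if (drop.y > mWindowHeight || drop.x < 0 || drop.x > mWindowWidth) {
            processDrops(windPower, i);
            continue;
        }

        drop.x += drop.x_speed * elapsed;
        drop.y += drop.y_speed * elapsed;
        drop.x_speed += windDelta;
        drop.y_speed += gravityDelta;

        // Drawing here ...
    }
}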
In the loop, introduce a reference Data& current = mData[i] and use it instead of mData[i]. Use a reference instead of an index in processDrops as well.
BTW, I think that consulting mFirstTime in processDrops serves no purpose because it will never be true. Hmm, I missed processDrops in the initialization loop. Never mind this.
This looks pretty fast to me already.
You could get some tiny speedup by removing the "first time" code and putting it in its own function to call once, rather than testing on every call.
You are doing the same calculation on lots of similar data, so maybe you could look into using SSE intrinsics to process several items at once. You'll likely have to rearrange your data structure for that, though, into a structure of vectors rather than a vector of structures like now. I doubt it would help too much though. How many items are in your vector anyway?
It looks like maybe all your time goes into ... Drawing Here.
It's easy enough to find out for sure where the time is going.

Simulated time in a game loop using c++

I am building a 3d game from scratch in C++ using OpenGL and SDL on linux as a hobby and to learn more about this area of programming.
I am wondering about the best way to simulate time while the game is running. Obviously I have a loop that looks something like:
void main_loop()
{
    while(!quit)
    {
        handle_events();
        DrawScene();
        ...
        SDL_Delay(time_left());
    }
}
I am using the SDL_Delay and time_left() to maintain a framerate of about 33 fps.
I had thought that I just need a few global variables like
int current_hour = 0;
int current_mins = 0;
int num_days = 0;
Uint32 prev_ticks = 0;
Then a function like :
void handle_time()
{
    Uint32 current_ticks;
    Uint32 dticks;

    current_ticks = SDL_GetTicks();
    dticks = current_ticks - prev_ticks; // get difference since last time

    // if difference is greater than 30000 (half minute) increment game mins
    if(dticks >= 30000) {
        prev_ticks = current_ticks;
        current_mins++;
        if(current_mins >= 60) {
            current_mins = 0;
            current_hour++;
        }
        if(current_hour > 23) {
            current_hour = 0;
            num_days++;
        }
    }
}
and then call the handle_time() function in the main loop.
It compiles and runs (using printf to write the time to the console at the moment), but I am wondering if this is the best way to do it. Are there easier or more efficient ways?
I've mentioned this before in other game related threads. As always, follow the suggestions by Glenn Fiedler in his Game Physics series
What you want to do is to use a constant timestep which you get by accumulating time deltas. If you want 33 updates per second, then your constant timestep should be 1/33. You could also call this the update frequency. You should also decouple the game logic from the rendering as they don't belong together. You want to be able to use a low update frequency while rendering as fast as the machine allows. Here is some sample code:
running = true;
unsigned int t=0, t_accum=0, lt=0, ct=0;
while(running){
    while(SDL_PollEvent(&event)){
        switch(event.type){
        ...
        }
    }
    ct = SDL_GetTicks();
    t_accum += ct - lt;
    lt = ct;
    while(t_accum >= timestep){
        t += timestep; /* this is our actual time, in milliseconds. */
        t_accum -= timestep;
        for(std::vector<Entity>::iterator en = entities.begin(); en != entities.end(); ++en){
            integrate(*en, (float)t * 0.001f, timestep);
        }
    }
    /* This should really be in a separate thread, synchronized with a mutex */
    std::vector<Entity> tmpEntities(entities.size());
    for(int i=0; i<entities.size(); ++i){
        float alpha = (float)t_accum / (float)timestep;
        tmpEntities[i] = interpolateState(entities[i].lastState, alpha, entities[i].currentState, 1.0f - alpha);
    }
    Render(tmpEntities);
}
This handles undersampling as well as oversampling. If you use integer arithmetic as done here, your game physics should be close to 100% deterministic, no matter how slow or fast the machine is. This is the advantage of increasing the time in fixed intervals. The state used for rendering is calculated by interpolating between the previous and current states, where the leftover value inside the time accumulator is used as the interpolation factor. This ensures that the rendering is smooth no matter how large the timestep is.
Other than the issues already pointed out (you should use a structure for the times and pass it to handle_time() and your minute will get incremented every half minute) your solution is fine for keeping track of time running in the game.
However, for most game events that need to happen every so often you should probably base them off of the main game loop instead of an actual time so they will happen in the same proportions with a different fps.
One of Glenn's posts you will really want to read is Fix Your Timestep!. After looking up this link I noticed that Mads directed you to the same general place in his answer.
I am not a Linux developer, but you might want to have a look at using Timers instead of polling for the ticks.
http://linux.die.net/man/2/timer_create
EDIT:
SDL seems to support timers: SDL_SetTimer
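For illustration, a minimal sketch using SDL_AddTimer (the callback-based timer available in SDL 1.2 and SDL 2.0). The half-minute interval mirrors the question; the callback only pushes an event because it runs on a separate thread, so the game minutes would be advanced in the main event loop:
#include <SDL.h>

Uint32 minuteTick(Uint32 interval, void* /*param*/) {
    SDL_Event ev;
    ev.type = SDL_USEREVENT;   // handled later in the main event loop
    SDL_PushEvent(&ev);
    return interval;           // returning the interval keeps the timer running
}

int main(int, char**) {
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER);
    SDL_TimerID id = SDL_AddTimer(30000, minuteTick, NULL); // every half game-minute
    // ... main loop: poll events, bump current_mins on SDL_USEREVENT ...
    SDL_RemoveTimer(id);
    SDL_Quit();
    return 0;
}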