Benchmarking GLSL shaders to compare speed of alternative implementations - opengl

I want to plot a two-dimensional function z = f(x,y) using OpenGL and GLSL shaders. I'd like to map the value of the function to a color using a colormap, but some colormaps are expressed in the HSL or HSV colorspace (for example hue maps).
You can find (here and in other places) different alternative implementations of hsv2rgb() in GLSL.
How can I benchmark those shaders (those functions) to find out which one is fastest?

Implement all alternatives you want to try and apply the usual benchmark suggestions:
Repeat the individual benchmark enough times to get a runtime in seconds (less is going to be subject to too much noise)
Repeat the benchmarks in the environments you want to run them in.
Try to have a setup as close to reality as possible (same background processes, etc).
Repeat the benchmark runs several times and disregard outliers.
Randomize the order of algorithms/tests between runs.
Make sure you disable caching for the section that you are testing (if it's applicable).
Since you include OpenGL solutions, you should consider whether you want to count data transfers as well. Make sure you flush the pipeline around the timed section (OpenGL defers some calls; glFinish() blocks until all issued commands have actually completed); see the sketch after this list.
If the run-times are too close you can either say they are about the same or increase data size/repetitions to make the difference more prominent.
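A minimal sketch of applying those rules with plain CPU-side timing (assuming a GL header/loader is already included; drawWithCandidateShader() and repetitions are placeholders, not code from the question):
#include <chrono>

// Time one candidate hsv2rgb() implementation; glFinish() blocks until the GPU
// has actually executed the issued commands, so the measured span is complete.
double benchmarkOnce(int repetitions)
{
    glFinish();                                       // drain any pending GL work first
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; i++)
        drawWithCandidateShader();                    // placeholder draw call using one variant
    glFinish();                                       // wait until all issued commands finished
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count(); // seconds
}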

For implementing color maps, I'd just recommend using a texture.
Something like a 256x1 texture (1D textures are not supported in ES, if that matters to you); then just use the float result of f(x,y) as the texture coordinate.
If you have a lot of points to plot, that's going to be faster than evaluating it in GLSL each time, and GPUs are good at texturing :)
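A minimal sketch of that approach, assuming a valid GL context and old-style GLSL; makeColormapTexture, the colormap uniform and the normalization of f(x,y) to [0,1] are my assumptions, not part of the answer:
// C++ side: upload 256 RGBA colormap entries as a 256x1 texture.
GLuint makeColormapTexture(const unsigned char rgba[256 * 4])
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 1, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    return tex;
}

// Fragment shader side: sample the colormap with the function value as x coordinate.
const char* fragmentSrc =
    "uniform sampler2D colormap;\n"
    "varying float f;                // f(x,y), already normalized to [0,1]\n"
    "void main()\n"
    "{\n"
    "    gl_FragColor = texture2D(colormap, vec2(f, 0.5));\n"
    "}\n";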

You need to be able to measure the time of GL rendering first. I do it in C++ like this:
//------------------------------------------------------------------------------
// GPU time measurement via GL_TIMESTAMP queries
// (needs GL 3.3 / ARB_timer_query; Sleep() comes from <windows.h>)
class OpenGLtime
{
public:
    unsigned int id[2];         // query objects for start/end timestamps
    OpenGLtime();
    OpenGLtime(OpenGLtime& a);
    ~OpenGLtime();
    OpenGLtime* operator = (const OpenGLtime *a);
//  OpenGLtime* operator = (const OpenGLtime &a);
    void _init();
    void tbeg();                // mark start time for measure
    void tend();                // mark end time for measure
    double time();              // wait for measure and return time [s]
};
//------------------------------------------------------------------------------
OpenGLtime::OpenGLtime()
{
    id[0]=0;
    id[1]=0;
}
//------------------------------------------------------------------------------
OpenGLtime::OpenGLtime(OpenGLtime& a)
{
    // do not copy query ids; each instance creates its own
    id[0]=0;
    id[1]=0;
}
//------------------------------------------------------------------------------
OpenGLtime::~OpenGLtime()
{
}
//------------------------------------------------------------------------------
OpenGLtime* OpenGLtime::operator = (const OpenGLtime *a)
{
    *this=*a;
    return this;
}
//------------------------------------------------------------------------------
void OpenGLtime::_init()
{
    // generate two queries
    glGenQueries(2,id);
}
//------------------------------------------------------------------------------
void OpenGLtime::tbeg()
{
    if (!id[0]) _init();
    // issue the query
    glQueryCounter(id[0],GL_TIMESTAMP);
}
//------------------------------------------------------------------------------
void OpenGLtime::tend()
{
    if (!id[0]) _init();
    // issue the query
    glQueryCounter(id[1],GL_TIMESTAMP);
}
//------------------------------------------------------------------------------
double OpenGLtime::time()
{
    if (!id[0]) { _init(); return 0.0; }
    double dt;
    GLuint64 t0,t1;
    int _stop;
    // wait until the results are available
    for (_stop=0;!_stop;Sleep(1)) glGetQueryObjectiv(id[0],GL_QUERY_RESULT_AVAILABLE,&_stop);
    for (_stop=0;!_stop;Sleep(1)) glGetQueryObjectiv(id[1],GL_QUERY_RESULT_AVAILABLE,&_stop);
    // get query results
    glGetQueryObjectui64v(id[0],GL_QUERY_RESULT,&t0);
    glGetQueryObjectui64v(id[1],GL_QUERY_RESULT,&t1);
    // timestamps are in nanoseconds
    dt=double(t1)-double(t0); dt*=0.000000001;
    return dt;
}
//------------------------------------------------------------------------------
Now you just use it like this:
// few variables
OpenGLtime tim;
double draw_time;
// measurement
tim.tbeg();
// here render your stuff using the bound shader
tim.tend();
draw_time=tim.time(); // time the render took in [s]; just output it somewhere so you can see it
Now you should create your rendering stuff, and you can compare the runtimes directly.
As you can see, you will measure the time of the whole rendering pass/call and not of a part of the GLSL code, so you have to take that into account. I do not know of any way to measure part of the GLSL code directly. Instead you can measure the time with and without the part in question ... and subtract the times, but compiler optimizations could mess that up.
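For example, a sketch of comparing two hsv2rgb() variants with this class (shaderA, shaderB and drawScene() are placeholders for your compiled programs and render code):
OpenGLtime tim;
double tA=0.0,tB=0.0;
for (int i=0;i<100;i++)                 // repeat to average out noise
{
    glUseProgram(shaderA);              // program compiled with hsv2rgb() variant A
    tim.tbeg(); drawScene(); tim.tend();
    tA+=tim.time();
    glUseProgram(shaderB);              // program compiled with hsv2rgb() variant B
    tim.tbeg(); drawScene(); tim.tend();
    tB+=tim.time();
}
// the smaller accumulated time is the faster variant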

Related

Surprising performance degradation with std::vector and std::string

I am processing a really large text file in the following way:
class Loader{
    template<class READER>
    bool loadFile(READER &reader){
        /* for each line of the input file */ {
            processLine_(line);
        }
    }

    bool processLine_(std::string_view line){
        std::vector<std::string> set; // <-- here
        std::string buffer;           // <-- here
        // I can not do set.reserve(),
        // because I have no idea how many items I will put in.
        // do something...
    }

    void printResult(){
        // print aggregated result
    }
};
The processing of 143,000,000 records takes around 68 minutes.
So I decided to do some very tricky optimizations with several std::array buffers. The result was about 62 minutes.
However, the code became very unreadable, so I decided not to use them in production.
Then I decided to do a partial optimization, e.g.
class Loader{
    template<class READER>
    bool loadFile(READER &reader);

    std::vector<std::string> set; // <-- here
    std::string buffer;           // <-- here

    bool processLine_(std::string_view line){
        set.clear();
        // do something...
    }

    void printResult();
};
I was hoping this would reduce the malloc / free (new[] / delete[]) operations from buffer and from the set vector. I realize the strings inside the set vector still allocate memory dynamically.
However, the result went up to 83 minutes.
Note I did not change anything except moving set and buffer to "class" level. I use them only inside the processLine_ method.
Why is that?
Locality of reference?
The only explanation I can think of is that some strings are small enough to fit in SSO, but this sounds unlikely.
Using clang with -O3
I did profile it and found that most of the time is spent in a third-party C library.
I supposed this library to be very fast, but this was not the case.
I am still puzzled by the slowdown, but even if I optimize it, it won't make such a big difference.

Two variants of a function with a simple if statement in the middle

This question is kind of a design one. Basically, I often tend to end up with a function that performs heavy computation but has an if statement somewhere in the middle of it, which has a big impact on the performance of the whole program.
Consider this example:
void f(bool visualization)
{
    while(...)
    {
        // Many lines of computation
        if (visualization)
        {
            // do the visualization of the algorithm
        }
        // More lines of computation
    }
}
The problem in this example is that if the bool visualization is set to false, I guess the program will still check whether it's true on each iteration of the loop.
One solution is to just make two separate functions, with and without the visualization:
void f()
{
    while(...)
    {
        // Many lines of computation
        // More lines of computation
    }
}

void f_with_visualization()
{
    while(...)
    {
        // Many lines of computation
        // do the visualization of the algorithm
        // More lines of computation
    }
}
So now I don't have the if checks, but it creates another problem: a mess in my code, and it's a violation of DRY.
My question here is: is there a way to do this better, without copying the code? Or maybe the C++ compiler optimizer would check which version of a function I want to execute (with bool = true or bool = false) and then create dummy functions without this if check itself (like the ones I created myself)?
You can template the function on the bool parameter and use if constexpr. Like this:
template<bool visualization>
void f_impl()
{
    while(...)
    {
        // Many lines of computation
        if constexpr (visualization)
        {
            // do the visualization of the algorithm
        }
        // More lines of computation
    }
}

void f(bool visualization)
{
    if (visualization)
        f_impl<true>();
    else
        f_impl<false>();
}
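Note that if constexpr requires C++17. On earlier standards, the same dispatch with a plain if inside the template usually ends up just as fast, because the condition is a compile-time constant and the optimizer can eliminate the dead branch.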

Get a mock Cairo::Context to test conditions on the path

This is a follow-up on this post, in which I asked about checking some condition on the border of a shape drawn using Cairomm in a Gtk::DrawingArea derived widget. In my case, I have a void drawBorder(const Cairo::RefPtr<Cairo::Context>& p_context) method which is virtual and is overridden to specify the shape's border. For example, if I wanted a circle, I could provide the following implementation:
void drawBorder(const Cairo::RefPtr<Cairo::Context>& p_context)
{
    const Gtk::Allocation allocation{get_allocation()};
    const int width{allocation.get_width()};
    const int height{allocation.get_height()};
    const int smallestDimension{std::min(width, height)};
    const int xCenter{width / 2};
    const int yCenter{height / 2};

    p_context->arc(xCenter,
                   yCenter,
                   smallestDimension / 2.5,
                   0.0,
                   2.0 * M_PI);
}
I would like to use this method to check my condition on the border curve, as suggested in the answer:
So, you would somehow get a cairo context (cairo_t in C), create your shape there (with line_to, curve_to, arc etc). Then you do not call fill or stroke, but instead cairo_copy_path_flat.
So far, I am unable to get a usable Cairo::Context mock to perform the check. I don't need to draw anything to perform my check, I only need to get the underlying path and work on it.
So far, I have tried:
passing nullptr as the Cairo::Surface (which of course failed);
getting a surface equivalent to my widget's.
But that failed as well. This: gdk_window_create_similar_surface looked promising, but I have not found an equivalent for widgets.
How could one go about getting a minimal mock context to perform such checks? This would also help me very much in my unit testing, later on.
So far I got this code:
bool isTheBorderASimpleAndClosedCurve()
{
    const Gtk::Allocation allocation{get_allocation()};
    Glib::RefPtr<Gdk::Window> widgetWindow{get_window()};
    Cairo::RefPtr<Cairo::Surface> widgetSurface{widgetWindow->create_similar_surface(Cairo::Content::CONTENT_COLOR_ALPHA,
                                                                                     allocation.get_width(), allocation.get_height())};
    Cairo::Context nakedContext{cairo_create(widgetSurface->cobj())};
    const Cairo::RefPtr<Cairo::Context> context{&nakedContext};
    drawBorder(context);
    // Would like to get the path and test my condition here...!
}
It compiles and links, but at runtime I get a segfault with this message and a bunch of garbage:
double free or corruption (out): 0x00007ffc0401c740
Just create a cairo image surface with size 0x0 and create a context for that.
Cairo::RefPtr<Cairo::Surface> surface = Cairo::ImageSurface::create(
        Cairo::Format::FORMAT_ARGB32, 0, 0);
Cairo::RefPtr<Cairo::Context> context = Cairo::Context::create(surface);
Since the surface is not used for anything, it does not matter which size it has.
(Side note: According to the API docs that Google gave me, the constructor of Context wants a cairo_t* as argument, not a Cairo::Context*; this might explain the crash that you are seeing)
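Putting the pieces together, a sketch of the check, under the assumption that cairomm exposes copy_path_flat() and Path::cobj() the same way the C API does; the actual condition on the path is left as a placeholder:
bool isTheBorderASimpleAndClosedCurve()
{
    // Dummy 0x0 surface: nothing is rendered, it only backs the context.
    Cairo::RefPtr<Cairo::Surface> surface = Cairo::ImageSurface::create(
            Cairo::Format::FORMAT_ARGB32, 0, 0);
    Cairo::RefPtr<Cairo::Context> context = Cairo::Context::create(surface);

    drawBorder(context);                      // build the border path only, no fill/stroke

    // Flattened copy of the current path; the caller owns the returned object.
    Cairo::Path* path = context->copy_path_flat();
    const cairo_path_t* cPath = path->cobj();

    bool result = (cPath->num_data > 0);      // placeholder: inspect cPath->data here
    delete path;
    return result;
}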

QCustomPlot in real time in an ECG style

I want to make a real-time graph to plot the data from my Arduino, and I want to use the following function from QCustomPlot to plot the graph in an ECG style (starting again after a few seconds and replacing the previous data):
void QCPGraph::addData(const QVector<double> &keys, const QVector<double> &values)
with keys=time and values=data from serial port.
I already have the serial data and a graph that is continuous, but I don't know how to adapt this to the function above and build the time vector.
Can you give me an example of how to call that function?
If I get it right, you have a graph whose xAxis range is constant. Let's say it is defined as MAX_RANGE seconds, and you want it to clear the graph and start over again from 0 sec once it passes MAX_RANGE seconds.
If all this is right, then I guess you already have a function that you are calling once every T seconds in order to update the plot. If not, take a look at this example.
Let's assume that you already have a function that you are calling every T seconds:
void MyPlot::updatePlot(int yValue)
Then simply add a timeCounter as a class member that is updated on every call. Then add an if statement that checks whether it passed MAX_RANGE. If it did, clear the graph using clearData(), add the new value and reset timeCounter. If it didn't, just add the new value. Simple example (just adapt it to your case):
void MyPlot::updatePlot(int yValue){
    this->timeCounter += T;
    if (this->timeCounter >= MAX_RANGE) {
        ui->customPlot->graph(0)->clearData();
        ui->customPlot->graph(0)->addData(0, yValue);
        this->timeCounter = 0;
    }
    else {
        ui->customPlot->graph(0)->addData(this->timeCounter, yValue);
    }
}
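If you specifically want the QVector overload from your question, here is a rough sketch of building the time (key) vector for a batch of samples; appendBatch, startTime and sampleInterval are made-up names:
#include <QVector>

// Build a key (time) vector for a batch of samples and feed both vectors
// to QCPGraph::addData(const QVector<double>&, const QVector<double>&).
void MyPlot::appendBatch(const QVector<double> &samples, double startTime, double sampleInterval)
{
    QVector<double> keys;
    keys.reserve(samples.size());
    for (int i = 0; i < samples.size(); ++i)
        keys.append(startTime + i * sampleInterval);   // time of each sample in seconds

    ui->customPlot->graph(0)->addData(keys, samples);
    ui->customPlot->replot();
}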

Dynamic Constant Buffer or Dynamic Vertex Buffer in C++ and DX11

I have a question related to memory usage when using a
Dynamic Constant Buffer vs a Constant Buffer updated frequently (using the Default usage type) vs a Dynamic Vertex Buffer.
I have always defined the constant buffer usage as default and updated the changes at runtime like so:
Eg 1
D3D11_BUFFER_DESC desc;
desc.Usage = D3D11_USAGE_DEFAULT;
// irrelevant code omitted

void Render()
{
    WORLD = XMMatrixTranslation(x,y,z); // x,y,z are changed dynamically
    ConstantBuffer cb;
    cb.world = WORLD;
    devcon->UpdateSubresource(constantBuffer,0,0,&cb,0,0);
    // set the buffer with VSSetConstantBuffers / PSSetConstantBuffers
}
But recently I came across a tutorial from Rastertek that used
devcon->Map() and devcon->Unmap() to update them, and he had defined the usage as dynamic.
Eg 2
void CreateBuffer()
{
    D3D11_BUFFER_DESC desc;
    desc.Usage = D3D11_USAGE_DYNAMIC;
    // irrelevant code omitted
}

void Render()
{
    WORLD = XMMatrixTranslation(x,y,z); // x,y,z are changed dynamically
    D3D11_MAPPED_SUBRESOURCE mappedRes;
    ConstantBuffer *cbPtr;
    devcon->Map(constantBuffer,0,D3D11_MAP_WRITE_DISCARD,0,&mappedRes);
    cbPtr = (ConstantBuffer*)mappedRes.pData;
    cbPtr->World = WORLD;
    devcon->Unmap(constantBuffer,0);
}
So the question is:
Is there any performance gain or hit from using a dynamic constant buffer (eg 2) over the default constant buffer updated at runtime (eg 1)?
Please do help me clear this doubt.
Thanks
The answer here like most performance advice is "It depends". Both are valid and it really depends on your content and rendering pattern.
The classic performance reference here is Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games from Gamefest 2007.
If you are dealing with lots of constants, Map of a DYNAMIC constant buffer is better if your data is scattered about and is collected as part of the update cycle. If all your constants are already laid out correctly in system memory, then UpdateSubresource is probably better. If you are reusing the same CB many times a frame and Map/Locking it, then you might run into 'rename' limits with Map/Lock that are less problematic with UpdateSubresource, so "it depends" is really the answer here.
And of course, all of this goes out the window with DirectX 12 which has an entirely different mechanism for handling the equivalent of dynamic updates.