VTK Toolkit - vtkCutter Performance - C++

I use the VTK Toolkit to load an OBJ file and a vtkCutter to cut through the data set with a plane and then draw the outline of the cut. For large objects this can become quite slow, as another user pointed out in the VTK Users Forum.
Is there a way to make the cutter use a hierarchical data structure to gain better performance?
This is the code:
#include <vtkSmartPointer.h>
#include <vtkCubeSource.h>
#include <vtkPolyDataMapper.h>
#include <vtkPlane.h>
#include <vtkCutter.h>
#include <vtkProperty.h>
#include <vtkActor.h>
#include <vtkRenderer.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkOBJReader.h>
int main(int argc, char *argv[])
{
// Parse command line arguments
if (argc != 2) {
std::cout << "Usage: " << argv[0] << " Filename(.obj)" << std::endl;
return EXIT_FAILURE;
}
std::string filename = argv[1];
vtkSmartPointer<vtkOBJReader> obj = vtkSmartPointer<vtkOBJReader>::New();
obj->SetFileName(filename.c_str());
obj->Update();
vtkSmartPointer<vtkPolyDataMapper> mapper = vtkSmartPointer<vtkPolyDataMapper>::New();
mapper->SetInputConnection(obj->GetOutputPort());
// Create a plane to cut with. A normal of (1,0,0) cuts parallel to the YZ plane; use (0,0,1) for XY and (0,1,0) for XZ.
vtkSmartPointer<vtkPlane> plane = vtkSmartPointer<vtkPlane>::New();
plane->SetOrigin(0, 0, 0);
plane->SetNormal(1, 0, 0);
// Create cutter
vtkSmartPointer<vtkCutter> cutter = vtkSmartPointer<vtkCutter>::New();
cutter->SetCutFunction(plane);
cutter->SetInputConnection(obj->GetOutputPort());
cutter->Update();
vtkSmartPointer<vtkPolyDataMapper> cutterMapper = vtkSmartPointer<vtkPolyDataMapper>::New();
cutterMapper->SetInputConnection(cutter->GetOutputPort());
// Create plane actor
vtkSmartPointer<vtkActor> planeActor = vtkSmartPointer<vtkActor>::New();
planeActor->GetProperty()->SetColor(1.0, 1, 0);
planeActor->GetProperty()->SetLineWidth(2);
planeActor->SetMapper(cutterMapper);
// Create cube actor
vtkSmartPointer<vtkActor> cubeActor = vtkSmartPointer<vtkActor>::New();
cubeActor->GetProperty()->SetColor(0.5, 1, 0.5);
cubeActor->GetProperty()->SetOpacity(0.5);
cubeActor->SetMapper(mapper);
// Create renderers and add actors of plane and cube
vtkSmartPointer<vtkRenderer> renderer = vtkSmartPointer<vtkRenderer>::New();
renderer->AddActor(planeActor); //display the rectangle resulting from the cut
renderer->AddActor(cubeActor); //display the cube
// Add renderer to renderwindow and render
vtkSmartPointer<vtkRenderWindow> renderWindow = vtkSmartPointer<vtkRenderWindow>::New();
renderWindow->AddRenderer(renderer);
renderWindow->SetSize(600, 600);
vtkSmartPointer<vtkRenderWindowInteractor> interactor = vtkSmartPointer<vtkRenderWindowInteractor>::New();
interactor->SetRenderWindow(renderWindow);
renderer->SetBackground(0, 0, 0);
renderWindow->Render();
interactor->Start();
return EXIT_SUCCESS;
}

vtkCutter slices meshes using an arbitrary implicit function func(x,y,z); here it is given a simple plane to describe that function, which is a common and well-covered special case, as the cut contour lies in a single plane and is therefore a simple (flat) polygon.
Such generic implementations usually cost a lot of CPU time, because vtkCutter has to be prepared for every special case of polygon cutting.
There is also a slowdown from calling virtual functions throughout VTK's vast class hierarchy. Without special hacks, it is entirely up to the compiler to hoist the virtual function pointer lookup out of a loop, while VTK calls virtual functions (the filter function, for example) many times in one or more nested loops.
See this related discussion: the cost of virtual functions.
VTK uses doubles almost everywhere, even where floats would do. The conversions and the higher precision add quite a bit of computation and memory overhead.
VTK (5.8) does not explicitly use SIMD operations like SSE, as far as I know.
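If moving to a newer VTK is an option (note the discussion above refers to 5.8), releases from 8.2 onward ship vtkPlaneCutter, which is specialized for exactly this case: it builds a sphere tree over the cells once and reuses it while only the plane changes, processing cells in parallel. A minimal sketch under that assumption, as a drop-in for the vtkCutter part of the code above:
#include <vtkSmartPointer.h>
#include <vtkPlane.h>
#include <vtkPlaneCutter.h>
#include <vtkCompositePolyDataMapper.h>

// ... obj (the vtkOBJReader) and plane are set up exactly as in the question ...
vtkSmartPointer<vtkPlaneCutter> cutter = vtkSmartPointer<vtkPlaneCutter>::New();
cutter->SetPlane(plane); // restricted to planes, unlike vtkCutter's generic cut function
cutter->SetInputConnection(obj->GetOutputPort());
// Depending on the VTK version the output may be a composite dataset,
// so a composite-aware mapper is the safer choice:
vtkSmartPointer<vtkCompositePolyDataMapper> cutterMapper =
    vtkSmartPointer<vtkCompositePolyDataMapper>::New();
cutterMapper->SetInputConnection(cutter->GetOutputPort());
Interactively dragging the plane then only re-traverses the prebuilt tree instead of testing every cell against the implicit function.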
...
Search for topics like these:
Algorithm or software for slicing a mesh
Generate 2D cross-section polygon from 3D mesh.
Instead of doing this on the CPU, one could also use an OpenGL geometry shader in a transform feedback pass to extract the cut contour determined by a cut plane. Doing this in OpenCL is also possible; however, if no GPU-based compute device is available, it might end up slower than a C or C++ implementation.
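Whatever backend does the work, the inner loop of a plane cut over a triangle mesh is the same small test. A minimal CPU sketch (the Vec3 type and the cutTriangle helper are mine, not from VTK), which a geometry shader would mirror per input triangle:
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

// Signed distance of p to the plane n.x*x + n.y*y + n.z*z = d; n need not
// be normalized, since only the sign and the interpolation ratio are used.
static float signedDistance(const Vec3& n, float d, const Vec3& p)
{
    return n.x * p.x + n.y * p.y + n.z * p.z - d;
}

static Vec3 lerp(const Vec3& a, const Vec3& b, float t)
{
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

// Appends the segment (two points) where the plane cuts the triangle, if any.
// Degenerate cases (a vertex exactly on the plane) are glossed over here.
static void cutTriangle(const std::array<Vec3, 3>& tri, const Vec3& n, float d,
                        std::vector<Vec3>& segments)
{
    Vec3 pts[2];
    int count = 0;
    for (int i = 0; i < 3 && count < 2; ++i)
    {
        const Vec3& a = tri[i];
        const Vec3& b = tri[(i + 1) % 3];
        float da = signedDistance(n, d, a);
        float db = signedDistance(n, d, b);
        if ((da < 0.0f) != (db < 0.0f))              // this edge crosses the plane
            pts[count++] = lerp(a, b, da / (da - db));
    }
    if (count == 2)
    {
        segments.push_back(pts[0]);
        segments.push_back(pts[1]);
    }
}
Stitching the collected segments into closed polylines is then just endpoint matching; a geometry shader version runs the same test per input triangle and emits the two points through transform feedback.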
To render the meshes, one could use any OpenGL 3+ capable Renderer:
Ogre3D
Unity3D
Irrlicht
OSG
a simple, self made OpenGL 3 renderer.
...
more: What is the best way to have realtime 3D rendering in an engineering application?

Related

How is SFML so fast?

I need to draw some graphics in c++, pixel by pixel on a window. In order to do this I create a SFML window, sprite and texture. I draw my desired graphics to a uint8_t array and then update the texture and sprite with it. This process takes about 2500 us. Drawing two triangles which fill the entire window takes only 10 us. How is this massive difference possible? I've tried multithreading the pixel-by-pixel drawing, but the difference of two orders of magnitude remains. I've also tried drawing the pixels using a point-map, with no improvement. I understand that SFML uses some GPU-acceleration in the background, but simply looping and assigning the values to the pixel array already takes hundreds of microseconds.
Does anyone know of a more effective way to assign the values of pixels in a window?
Here is an example of the code I'm using to compare the speed of triangle and pixel-by-pixel drawing:
#include <SFML/Graphics.hpp>
#include <chrono>
using namespace std::chrono;
#include <iostream>
#include<cmath>
uint8_t* pixels;
int main(int, char const**)
{
const unsigned int width=1200;
const unsigned int height=1200;
sf::RenderWindow window(sf::VideoMode(width, height), "MA: Rasterization Test");
pixels = new uint8_t[width*height*4];
sf::Texture pixels_texture;
pixels_texture.create(width, height);
sf::Sprite pixels_sprite(pixels_texture);
sf::Clock clock;
sf::VertexArray triangle(sf::Triangles, 3);
triangle[0].position = sf::Vector2f(0, height);
triangle[1].position = sf::Vector2f(width, height);
triangle[2].position = sf::Vector2f(width/2, height-std::sqrt(std::pow(width,2)-std::pow(width/2,2)));
triangle[0].color = sf::Color::Red;
triangle[1].color = sf::Color::Blue;
triangle[2].color = sf::Color::Green;
while (window.isOpen()){
sf::Event event;
while (window.pollEvent(event)) {
if (event.type == sf::Event::Closed) {
window.close();
}
if (event.type == sf::Event::KeyPressed && event.key.code == sf::Keyboard::Escape) {
window.close();
}
}
window.clear(sf::Color(255,255,255,255));
// Pixel-by-pixel
long long us = duration_cast< microseconds >(system_clock::now().time_since_epoch()).count(); // 64-bit: microseconds since the epoch overflow an int
for(int i=0;i!=width*height*4;++i){
pixels[i]=255;
}
pixels_texture.update(pixels);
window.draw(pixels_sprite);
long long duration = duration_cast< microseconds >(system_clock::now().time_since_epoch()).count()-us;
std::cout<<"Background: "<<duration<<" us\n";
// Triangle
us = duration_cast< microseconds >(system_clock::now().time_since_epoch()).count();
window.draw(triangle);
duration=duration_cast< microseconds >(system_clock::now().time_since_epoch()).count()-us;
std::cout<<"Triangle: "<<duration<<" us\n";
window.display();
}
delete[] pixels;
return EXIT_SUCCESS;
}
Modern devices draw graphics with the graphics card, and the drawing speed depends on how much data you send to graphics memory. That's why just drawing two triangles is fast.
As for multithreading: if you are using OpenGL (I don't remember what SFML uses, but it should be similar), what you think of as drawing is basically sending commands and data to the graphics card, so multithreading on the CPU is not very useful here; the graphics card has its own pipeline for this work.
If you are curious about how graphics cards work, this tutorial is the book you should read.
P.S. Regarding your edit: I guess the 2500 us vs. 10 us difference is because your for loop creates a whole texture (even if it is a pure white background), and sending a texture to the graphics card takes time, while drawing the triangle only sends a few points. (You probably also need to start counting after the fill loop.) Still, I suggest reading the tutorial; creating a texture pixel by pixel suggests a misunderstanding of how the GPU works.
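To see where those microseconds actually go, it helps to time the three stages of the pixel path separately; a minimal sketch assuming SFML 2.x, reusing pixels, pixels_texture and pixels_sprite from the question:
#include <chrono>
#include <cstring>
#include <iostream>

// Times one stage and prints the result. steady_clock sidesteps the
// epoch arithmetic (and the int overflow) in the question's timing code.
template <typename F>
long long timeUs(const char* label, F&& stage)
{
    auto t0 = std::chrono::steady_clock::now();
    stage();
    auto t1 = std::chrono::steady_clock::now();
    long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::cout << label << ": " << us << " us\n";
    return us;
}

// Inside the frame loop, with the objects from the question:
// timeUs("fill  ", [&]{ std::memset(pixels, 255, width * height * 4); });  // CPU work
// timeUs("upload", [&]{ pixels_texture.update(pixels); });                 // CPU -> GPU copy
// timeUs("draw  ", [&]{ window.draw(pixels_sprite); });                    // just queues a quad
In my understanding the fill and the upload dominate; the draw call itself is as cheap as the triangle, because both merely append a few vertices to the GPU command stream.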

VertexArray of circles

I am wondering if it is possible to create a VertexArray of circles in SFML. I have looked for answers but didn't find anything that could help. Moreover, I don't understand the part of the SFML documentation that says I can create my own entities; I think that may in fact be what I want to do.
EDIT : I want to do that because I have to draw a lot of circles.
Thanks for helping me
While #nvoigt's answer is correct, I found it useful in my implementations to work with vectors (see http://en.cppreference.com/w/cpp/container/vector for more details, and look up "c++ containers": there are several container types with different read/write performance).
You probably do not need it for the use case described above, but you might need it in future implementations, and consider it good coding practice.
#include <SFML/Graphics.hpp>
#include <vector>
int main()
{
// create the window
sf::RenderWindow window(sf::VideoMode(800, 600), "My window");
// run the program as long as the window is open
while (window.isOpen())
{
// check all the window's events that were triggered since the last iteration of the loop
sf::Event event;
while (window.pollEvent(event))
{
// "close requested" event: we close the window
if (event.type == sf::Event::Closed)
window.close();
}
// clear the window with black color
window.clear(sf::Color::Black);
// initialize myvector
std::vector<sf::CircleShape> myvector;
// add 10 circles
for (int i = 0; i < 10; i++)
{
sf::CircleShape shape(50);
// draw a circle every 100 pixels
shape.setPosition(i * 100, 25);
shape.setFillColor(sf::Color(100, 250, 50));
// copy shape to vector
myvector.push_back(shape);
}
// iterate through vector
for (std::vector<sf::CircleShape>::iterator it = myvector.begin() ; it != myvector.end(); ++it)
{
// draw all circles
window.draw(*it);
}
window.display();
}
return 0;
}
sf::CircleShape already uses a vertex array internally (it inherits from sf::Shape). There is nothing extra you need to do.
If you have a lot of circles, try using sf::CircleShape first and only optimize when you have a real use-case that you can measure your solution against.
In addition to the two previous answers, I will try to explain why there is no default VertexArray of circles.
In computer graphics (and in SFML in our case), a vertex is the smallest drawing primitive with the least necessary functionality. Classical primitives built from vertices are points, lines, triangles, quads, and polygons. The first four are really simple for your video card to store and draw. A polygon can be any geometric figure, but it is heavier to process, which is why, e.g., in 3D graphics, polygons are triangles.
A circle is a bit more complicated. For example, the video card does not know how many points it needs to draw your circle smoothly enough. So, as #nvoigt answered, there is a sf::CircleShape that is built from more primitive vertices.
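If profiling ever does show the per-shape draw calls to be the bottleneck, the circles can be batched by hand into a single sf::VertexArray of triangles (one fan per circle, unrolled into independent triangles). A sketch, where appendCircle is my own helper and the segment count of 30 is an arbitrary smoothness choice:
#include <SFML/Graphics.hpp>
#include <cmath>

// Appends one circle, tessellated into 'segments' triangles, to 'va'.
// The vertex array must use sf::Triangles.
void appendCircle(sf::VertexArray& va, sf::Vector2f center,
                  float radius, sf::Color color, int segments = 30)
{
    const float pi = 3.14159265f;
    for (int i = 0; i < segments; ++i)
    {
        float a0 = 2 * pi * i / segments;
        float a1 = 2 * pi * (i + 1) / segments;
        va.append(sf::Vertex(center, color));
        va.append(sf::Vertex(center + radius * sf::Vector2f(std::cos(a0), std::sin(a0)), color));
        va.append(sf::Vertex(center + radius * sf::Vector2f(std::cos(a1), std::sin(a1)), color));
    }
}

// Usage, matching the ten circles above but with a single draw call:
// sf::VertexArray circles(sf::Triangles);
// for (int i = 0; i < 10; i++)
//     appendCircle(circles, sf::Vector2f(i * 100.f + 50.f, 75.f), 50.f, sf::Color(100, 250, 50));
// window.draw(circles);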

VTK pipeline update

I use VTK-6.2, C++ (gcc-4.7.2) on Linux and I have the following VTK pipeline setup (please ignore implementation, details and focus on the pipeline: cone->filter->mapper->actor):
// cone/initialize
vtkConeSource cone;
// add cone(s) to filter
vtkAppendFilter filter;
filter.AddInputData(cone.GetOutput());
// add filter to mapper
vtkDataSetMapper mapper;
mapper.SetInputData(filter.GetOutput());
// actor
vtkActor actor;
actor.SetMapper(mapper);
The scene renders fine.
The Problem
I want to update the original data (i.e. the cones) and the actor to be rendered correctly.
How do I access the original cone data if I have just the actors? Does this guarantee that the actors will be updated too? Because when I decided to keep track of the original data (via pointers: the whole implementation is with vtkSmartPointers) and then change some of their attributes, the pipeline did not update. Shouldn't it update automatically?
(When I change the actor (e.g. their visibility), the scene renders fine)
Forgive me, I am not a VTK expert and the pipelines are confusing. Maybe one approach would be to simplify my pipeline.
Thanks
[update]
According to this answer to a similar post, the original data (from the vtkConeSource) are transformed (to a vtkUnstructuredGrid when added to the vtkAppendFilter), so even if I keep track of the original data, changing them is useless.
VTK pipelines are demand-driven pipelines. They do not update automatically even if one of the elements of the pipeline is modified. We need to explicitly call the Update() function on the last vtkAlgorithm (or derived class object) of the pipeline to update the entire pipeline. The correct way to set up a pipeline when connecting two objects derived from vtkAlgorithm is to use
currAlgoObj->SetInputConnection( prevAlgoObj->GetOutputPort() )
instead of
currAlgoObj->SetInputData( prevAlgo->GetOutput() )
Then we can update the pipeline through the actor object by calling actor->GetMapper()->Update(), as shown in the example below.
In this example, we will create a cone from a cone source, pass it through a vtkAppendFilter, then change the height of the original cone source and render it in another window to see the updated cone. (You will have to close the first render window to see the updated cone in the second window.)
#include <vtkConeSource.h>
#include <vtkDataSetMapper.h>
#include <vtkActor.h>
#include <vtkRenderer.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkSmartPointer.h>
#include <vtkAppendFilter.h>
int main(int, char *[])
{
// Set up the data pipeline
auto cone = vtkSmartPointer<vtkConeSource>::New();
cone->SetHeight( 1.0 );
auto appf = vtkSmartPointer<vtkAppendFilter>::New();
appf->SetInputConnection( cone->GetOutputPort() );
auto coneMapper = vtkSmartPointer<vtkDataSetMapper>::New();
coneMapper->SetInputConnection( appf->GetOutputPort() );
auto coneActor = vtkSmartPointer<vtkActor>::New();
coneActor->SetMapper( coneMapper );
// We need to update the pipeline otherwise nothing will be rendered
coneActor->GetMapper()->Update();
// Connect to the rendering portion of the pipeline
auto renderer = vtkSmartPointer<vtkRenderer>::New();
renderer->AddActor( coneActor );
renderer->SetBackground( 0.1, 0.2, 0.4 );
auto renderWindow = vtkSmartPointer<vtkRenderWindow>::New();
renderWindow->SetSize( 200, 200 );
renderWindow->AddRenderer(renderer);
auto renderWindowInteractor =
vtkSmartPointer<vtkRenderWindowInteractor>::New();
renderWindowInteractor->SetRenderWindow(renderWindow);
renderWindowInteractor->Start();
// Change cone property
cone->SetHeight( 10.0 );
//Update the pipeline using the actor object
coneActor->GetMapper()->Update();
auto renderer2 = vtkSmartPointer<vtkRenderer>::New();
renderer2->AddActor( coneActor );
renderer2->SetBackground( 0.1, 0.2, 0.4 );
auto renderWindow2 = vtkSmartPointer<vtkRenderWindow>::New();
renderWindow2->SetSize( 200, 200 );
renderWindow2->AddRenderer(renderer2);
auto renderWindowInteractor2 =
vtkSmartPointer<vtkRenderWindowInteractor>::New();
renderWindowInteractor2->SetRenderWindow(renderWindow2);
renderWindowInteractor2->Start();
return EXIT_SUCCESS;
}

Add GLSL shader to a VTKActor (VTK 6.1)

I'm trying to add a shader to a vtkActor in my application. I have different vtkActors, and each must have its own shader.
I tried using vtkShader2, vtkShaderProgram2, and vtkOpenGLProperty to attach the program loaded with the shader to the actor, but it didn't work (VTK told me in a warning window that the actor has 4 shaders: the default shaders and mine).
Does someone know the right way to do it?
The solution finally appeared on the VTK mailing list after months of waiting! I didn't test it myself, but user #carlinhos says it works. He summarizes the steps:
Create a shader file with the function propFuncFS (fragment shader) or propFuncVS (vertex shader).
Load the shader from disk.
Create a vtkShader2 and set the source code.
Create a vtkShaderProgram2 and initialize it (DO NOT BUILD THE PROGRAM).
Add the shader to the program.
Obtain the actor's vtkOpenGLProperty and set the program.
Turn shading on.
EDIT: Are you #carlinhos? Am I feeding you your own answer? :)
After running into this myself, I'd like to add a little more to mpcarlos87 / carlinhos / Nil's answer...
The code below is the smallest informative working sample I could make. Key points are:
vtkSmartPointer use means less need for ptrClass->Delete()-style clean-up
smart pointers also cast automatically: vtkRenderWindow* to vtkOpenGLRenderWindow* for SetContext(), which is nice
an inline frag shader definition is good for fast testing, but bad for every other reason (use with care!)
the inline frag shader is very sensitive to newlines (\n) for things like #version
#include "vtkConeSource.h"
#include "vtkPolyDataMapper.h"
#include "vtkRenderWindow.h"
#include "vtkCamera.h"
#include "vtkActor.h"
#include "vtkRenderer.h"
#include "vtkShader2.h"
#include "vtkShaderProgram2.h"
#include "vtkShader2Collection.h"
#include "vtkSmartPointer.h"
#include "vtkOpenGLRenderWindow.h"
#include "vtkOpenGLProperty.h"
int main()
{
vtkSmartPointer<vtkConeSource> cone = vtkSmartPointer<vtkConeSource>::New();
vtkSmartPointer<vtkPolyDataMapper> coneMapper = vtkSmartPointer<vtkPolyDataMapper>::New();
coneMapper->SetInputConnection( cone->GetOutputPort() );
vtkSmartPointer<vtkActor> coneActor = vtkSmartPointer<vtkActor>::New();
coneActor->SetMapper( coneMapper );
vtkSmartPointer<vtkRenderer> ren = vtkSmartPointer<vtkRenderer>::New();
ren->AddActor( coneActor );
vtkSmartPointer<vtkRenderWindow> renWin = vtkSmartPointer<vtkRenderWindow>::New();
renWin->AddRenderer( ren );
const char* frag = "void propFuncFS(void){ gl_FragColor = vec4(1.0,0.0,0.0,1.0);}"; // color components are in [0,1]
vtkSmartPointer<vtkShaderProgram2> pgm = vtkSmartPointer<vtkShaderProgram2>::New();
pgm->SetContext(renWin);
vtkSmartPointer<vtkShader2> shader = vtkSmartPointer<vtkShader2>::New();
shader->SetType(VTK_SHADER_TYPE_FRAGMENT);
shader->SetSourceCode(frag);
shader->SetContext(pgm->GetContext());
pgm->GetShaders()->AddItem(shader);
vtkSmartPointer<vtkOpenGLProperty> openGLproperty =
static_cast<vtkOpenGLProperty*>(coneActor->GetProperty());
openGLproperty->SetPropProgram(pgm);
openGLproperty->ShadingOn();
for (int i = 0; i < 360; ++i)
{
renWin->Render();
ren->GetActiveCamera()->Azimuth( 1 );
}
return 0;
}
Took a bit of trial and error to get the above working - hope that it helps!
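Since the inline string is only good for quick tests (the third key point above), the shader source can instead be loaded from disk, which is also step 2 of the recipe further up. A minimal sketch; loadShaderSource is my own helper:
#include <fstream>
#include <sstream>
#include <string>

// Reads an entire GLSL source file; returns an empty string on failure.
std::string loadShaderSource(const std::string& path)
{
    std::ifstream file(path.c_str());
    if (!file)
        return std::string();
    std::ostringstream ss;
    ss << file.rdbuf();
    return ss.str();
}

// Usage, replacing the inline 'frag' string in the sample above:
// std::string frag = loadShaderSource("cone.frag");
// shader->SetSourceCode(frag.c_str());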

is it possible to speed-up matlab plotting by calling c / c++ code in matlab?

It is generally very easy to call MEX files (written in C/C++) from MATLAB to speed up certain calculations. In my experience, however, the true bottleneck in MATLAB is data plotting. Creating handles is extremely expensive, and even if you only update handle data (e.g., XData, YData, ZData), this can take ages. Even worse, since MATLAB is a single-threaded program, it is impossible to update multiple plots at the same time.
Therefore my question: Is it possible to write a MATLAB GUI and call C++ (or some other parallelizable code) to take care of the plotting / visualization? I'm looking for a cross-platform solution that will work on Windows, Mac and Linux, but any solution that gets me started on either OS is greatly appreciated!
I found a C++ library that seems to use MATLAB's plot() syntax, but I'm not sure whether this would speed things up, since I'm afraid that if I plot into MATLAB's figure() window, things might get slowed down again.
I would appreciate any comments and feedback from people who have dealt with this kind of situation before!
EDIT: obviously, I've already profiled my code and the bottleneck is the plotting (dozens of panels with lots of data).
EDIT2: for you to get the bounty, I need a real life, minimal working example on how to do this - suggestive answers won't help me.
EDIT3: regarding the data to plot: in the most simplistic case, think of 20 line plots that need to be updated each second with something like 1000000 data points each.
EDIT4: I know that this is a huge amount of points to plot, but I never said that the problem was easy. I cannot just leave out certain data points, because there's no way of assessing which points are important before actually plotting them (the data is sampled at sub-ms time resolution). As a matter of fact, my data is acquired using a commercial data acquisition system which comes with a data viewer (written in C++). That program has no problem visualizing up to 60 line plots with even more than 1000000 data points.
EDIT5: I don't like where the current discussion is going. I'm aware that sub-sampling my data might speed things up; however, this is not the question. The question is how to get a C / C++ / Python / Java interface to work with MATLAB in order to (hopefully) speed up plotting by talking directly to the hardware (or using any other trick / way).
Did you try the trivial solution of changing the renderer to OpenGL?
opengl hardware;
set(gcf,'Renderer','OpenGL');
Warning!
Some things will disappear in this mode, and it will look a bit different, but generally plots will run much faster, especially if you have a hardware accelerator.
By the way, are you sure that you will actually gain a performance increase? For example, in my experience, WPF graphics in C# are considerably slower than MATLAB's, especially scatter plots and circles.
Edit: I thought about the fact that the number of points actually drawn to the screen can't be that large. Basically, you only need to interpolate at the places where there is a pixel on the screen. Check out this object:
classdef InterpolatedPlot < handle
properties(Access=private)
hPlot;
end
methods(Access=public)
function this = InterpolatedPlot(x,y,varargin)
this.hPlot = plot(0,0,varargin{:});
this.setXY(x,y);
end
end
methods
function setXY(this,x,y)
parent = get(this.hPlot,'Parent');
set(parent,'Units','Pixels')
sz = get(parent,'Position');
width = sz(3); %Actual width in pixels
subSampleX = linspace(min(x(:)),max(x(:)),width);
subSampleY = interp1(x,y,subSampleX);
set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
end
end
end
And here is an example how to use it:
function TestALotOfPoints()
x = rand(10000,1);
y = rand(10000,1);
ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end
Another possible improvement:
Also, if your x data is sorted, you can use interp1q instead of interp1, which will be much faster.
classdef InterpolatedPlot < handle
properties(Access=private)
hPlot;
end
% properties(Access=public)
% XData;
% YData;
% end
methods(Access=public)
function this = InterpolatedPlot(x,y,varargin)
this.hPlot = plot(0,0,varargin{:});
this.setXY(x,y);
% this.XData = x;
% this.YData = y;
end
end
methods
function setXY(this,x,y)
parent = get(this.hPlot,'Parent');
set(parent,'Units','Pixels')
sz = get(parent,'Position');
width = sz(3); %Actual width in pixels
subSampleX = linspace(min(x(:)),max(x(:)),width);
subSampleY = interp1q(x,y,transpose(subSampleX));
set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
end
end
end
And the use case:
function TestALotOfPoints()
x = rand(10000,1);
y = rand(10000,1);
x = sort(x);
ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end
Since you want maximum performance, you should consider writing a minimal OpenGL viewer. Dump all the points to a file and launch the viewer using the "system" command in MATLAB. The viewer can be really simple. Here is one implemented using GLUT, compiled for Mac OS X. The code is cross-platform, so you should be able to compile it for all the platforms you mention. It should be easy to tweak this viewer for your needs.
If you were able to integrate this viewer more closely with MATLAB, you might be able to get away with not having to write to and read from a file (= much faster updates). However, I'm not experienced in the matter. Perhaps you can put this code in a MEX file?
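Regarding the MEX route: the MATLAB-side hand-off is mostly boilerplate. A minimal gateway sketch (plotmex is a hypothetical name, and the actual hand-off to the render loop is left out):
#include "mex.h"

// Expects one m-by-2 double matrix of (x, y) points:  plotmex(points)
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    (void)nlhs; (void)plhs;
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxGetN(prhs[0]) != 2)
        mexErrMsgTxt("Expected a single m-by-2 double matrix.");

    size_t m = mxGetM(prhs[0]);
    const double* data = mxGetPr(prhs[0]);  // column-major: x = data[0..m-1], y = data[m..2*m-1]

    // Hand the points to the viewer here, e.g. copy them into the VBO ring
    // of the GLUT code below, instead of writing them to a file.
    (void)m; (void)data;
}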
EDIT: I've updated the code to draw a line strip from a CPU memory pointer.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
// The file "input" is assumed to contain a line for each point:
// 0.1 1.0
// 5.2 3.0
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <GLUT/glut.h>
using namespace std;
struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };
static vector<float2> points;
static float2 minPoint, maxPoint;
typedef vector<float2>::iterator point_iter;
static void render() {
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(minPoint.x, maxPoint.x, minPoint.y, maxPoint.y, -1.0f, 1.0f);
glColor3f(0.0f, 0.0f, 0.0f);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, sizeof(points[0]), &points[0].x);
glDrawArrays(GL_LINE_STRIP, 0, points.size());
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
}
int main(int argc, char* argv[]) {
ifstream file("input");
string line;
while (getline(file, line)) {
istringstream ss(line);
float2 p;
ss >> p.x;
ss >> p.y;
if (ss)
points.push_back(p);
}
if (!points.size())
return 1;
minPoint = maxPoint = points[0];
for (point_iter i = points.begin(); i != points.end(); ++i) {
float2 p = *i;
minPoint = float2(minPoint.x < p.x ? minPoint.x : p.x, minPoint.y < p.y ? minPoint.y : p.y);
maxPoint = float2(maxPoint.x > p.x ? maxPoint.x : p.x, maxPoint.y > p.y ? maxPoint.y : p.y);
}
float dx = maxPoint.x - minPoint.x;
float dy = maxPoint.y - minPoint.y;
maxPoint.x += dx*0.1f; minPoint.x -= dx*0.1f;
maxPoint.y += dy*0.1f; minPoint.y -= dy*0.1f;
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowSize(512, 512);
glutCreateWindow("glview");
glutDisplayFunc(render);
glutMainLoop();
return 0;
}
EDIT: Here is new code based on the discussion below. It renders a sine function consisting of 20 VBOs, each containing 100k points. 10k new points are added per rendered frame, giving a total of 2M points. Performance is real-time on my laptop.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <cmath>
#include <iostream>
#include <GLUT/glut.h>
using namespace std;
struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };
struct Vbo {
GLuint i;
Vbo(int size) { glGenBuffersARB(1, &i); glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferDataARB(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW); } // could try GL_STATIC_DRAW
void set(const void* data, size_t size, size_t offset) { glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferSubData(GL_ARRAY_BUFFER, offset, size, data); }
~Vbo() { glDeleteBuffers(1, &i); }
};
static const int vboCount = 20;
static const int vboSize = 100000;
static const int pointCount = vboCount*vboSize;
static float endTime = 0.0f;
static const float deltaTime = 1e-3f;
static std::vector<Vbo*> vbos;
static int vboStart = 0;
static void addPoints(float2* points, int pointCount) {
while (pointCount) {
if (vboStart == vboSize || vbos.empty()) {
if (vbos.size() >= vboCount+2) { // remove and reuse vbo
Vbo* first = *vbos.begin();
vbos.erase(vbos.begin());
vbos.push_back(first);
}
else { // create new vbo
vbos.push_back(new Vbo(sizeof(float2)*vboSize));
}
vboStart = 0;
}
int pointsAdded = pointCount;
if (pointsAdded + vboStart > vboSize)
pointsAdded = vboSize - vboStart;
Vbo* vbo = *vbos.rbegin();
vbo->set(points, pointsAdded*sizeof(float2), vboStart*sizeof(float2));
pointCount -= pointsAdded;
points += pointsAdded;
vboStart += pointsAdded;
}
}
static void render() {
// generate and add 10000 points
const int count = 10000;
float2 points[count];
for (int i = 0; i < count; ++i) {
float2 p(endTime, std::sin(endTime*1e-2f));
endTime += deltaTime;
points[i] = p;
}
addPoints(points, count);
// render
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(endTime-deltaTime*pointCount, endTime, -1.0f, 1.0f, -1.0f, 1.0f);
glColor3f(0.0f, 0.0f, 0.0f);
glEnableClientState(GL_VERTEX_ARRAY);
for (size_t i = 0; i < vbos.size(); ++i) {
glBindBufferARB(GL_ARRAY_BUFFER, vbos[i]->i);
glVertexPointer(2, GL_FLOAT, sizeof(float2), 0);
if (i == vbos.size()-1)
glDrawArrays(GL_LINE_STRIP, 0, vboStart);
else
glDrawArrays(GL_LINE_STRIP, 0, vboSize);
}
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
int main(int argc, char* argv[]) {
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowSize(512, 512);
glutCreateWindow("glview");
glutDisplayFunc(render);
glutMainLoop();
return 0;
}
As a number of people have mentioned in their answers, you do not need to plot that many points. I think it is important to repeat Andrey's comment:
that is a HUGE amount of points! There isn't enough pixels on the screen to plot that amount.
Rewriting plotting routines in different languages is a waste of your time. A huge number of hours have gone into writing MATLAB; what makes you think you can write a significantly faster plotting routine (in a reasonable amount of time)? While your routine may be less general, and would therefore skip some of the checks that the MATLAB code performs, your "bottleneck" is that you are trying to plot so much data.
I strongly recommend one of two courses of action:
Sample your data: You do not need 20 x 1000000 points on a figure; the human eye won't be able to distinguish between all the points, so it is a waste of time. Try binning your data, for example.
If you maintain that you need all those points on the screen, I would suggest using a different tool. VisIt or ParaView are two examples that come to mind. They are parallel visualisation programs designed to handle extremely large datasets (I have seen VisIt handle datasets that contained petabytes of data).
There is no way you can fit 1000000 data points on a small plot. How about choosing one in every 10000 points and plotting those?
You can consider calling imresize on the large vector to shrink it, but manually building a vector by omitting 99% of the points may be faster.
#memyself The sampling operations are already occurring. MATLAB is choosing what data to include in the graph. Why do you trust MATLAB? It looks to me like the graph you showed significantly misrepresents the data. The dense regions should indicate that the signal is at a constant value, but in your graph they could mean that the signal is at that value half the time, or that it was at that value at least once during the interval corresponding to that pixel.
Would it be possible to use an alternate architecture? For example, use MATLAB to generate the data and a fast library or application (gnuplot?) to handle the plotting?
It might even be possible to have MATLAB write the data to a stream as the plotter consumes the data. Then the plot would be updated as MATLAB generates the data.
This approach would avoid MATLAB's ridiculously slow plotting and divide the work between two separate processes. The OS/CPU would probably assign the processes to different cores as a matter of course.
I think it's possible, but likely to require writing the plotting code (at least the parts you use) from scratch, since anything you could reuse is exactly what's slowing you down.
To test feasibility, I'd start by verifying that any Win32 GUI works from MEX (call MessageBox), then proceed to creating your own window and checking that window messages arrive at your WndProc. Once all that's going, you can bind an OpenGL context to it (or just use GDI) and start plotting.
However, the savings are likely to come from simpler plotting code and the use of newer OpenGL features such as VBOs, rather than from threading. Everything is already parallel on the GPU, and more threads don't help transfer commands/data to the GPU any faster.
I did a very similar thing many many years ago (2004?). I needed an oscilloscope-like display for kilohertz sampled biological signals displayed in real time. Not quite as many points as the original question has, but still too many for MATLAB to handle on its own. IIRC I ended up writing a Java component to display the graph.
As other people have suggested, I also ended up down-sampling the data. For each pixel on the x-axis, I calculated the minimum and maximum values taken by the data, then drew a short vertical line between those values. The entire graph consisted of a sequence of short vertical lines, each immediately adjacent to the next.
Actually, I think the implementation ended up writing the graph to a bitmap that scrolled continuously using bitblt, with only new points being drawn... or maybe the bitmap was static and the viewport scrolled along it. Anyway, it was a long time ago and I might not be remembering it right.
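For reference, the min/max reduction described above is only a few lines; a sketch, with minMaxDecimate being my own name for it, assuming non-empty input uniformly spaced in x:
#include <algorithm>
#include <utility>
#include <vector>

// Reduces 'samples' to one (min, max) pair per horizontal pixel, which is
// all a line plot can show anyway.
std::vector<std::pair<float, float> > minMaxDecimate(const std::vector<float>& samples,
                                                     int pixelWidth)
{
    std::vector<std::pair<float, float> > out(pixelWidth);
    const size_t n = samples.size();
    for (int px = 0; px < pixelWidth; ++px)
    {
        size_t begin = n * px / pixelWidth;                      // first sample of this column
        size_t end = std::max(begin + 1, n * (px + 1) / pixelWidth);
        auto mm = std::minmax_element(samples.begin() + begin, samples.begin() + end);
        out[px] = std::make_pair(*mm.first, *mm.second);
    }
    return out;
}
Each (min, max) pair then becomes one vertical line in the plot, so a million samples collapse to at most one line per pixel column without hiding extremes the way naive take-every-Nth subsampling does.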
EDIT4: I know that this is a huge amount of points to plot, but I never said that the problem was easy. I cannot just leave out certain data points, because there's no way of assessing which points are important before actually plotting them
This is incorrect. There is a way to know which points to leave out: MATLAB is already doing it. Something is going to have to do it at some point no matter how you solve this. I think you need to redirect your problem to "how do I determine which points I should plot?".
Based on the screenshot, the data looks like a waveform. You might want to look at the code of Audacity, an open-source audio editing program. It displays plots representing the waveform in real time, and they look identical in style to the one in your lowest screenshot. You could borrow some sampling techniques from them.
What you are looking for is the creation of a MEX file.
Rather than me explaining it, you would probably benefit more from reading this: Creating C/C++ and Fortran Programs to be Callable from MATLAB (MEX-Files) (a documentation article from MathWorks).
Hope this helps.