DirectX 9 not rendering after adding transforms

DirectX 9 not rendering after adding transforms - c++

so far I got a cube rendered without any transforms (thus it was rendered in an orthographic perspective), and I am working on the previous code to get it into a perspective view, with all the matrices involved. I changed the Flexible Vertex Format so as not to have RHW (thus only having XYZ coordinates and color, tried ARGB and XRGB but I don't think it matters), and I added a function that sets all the matrices.
Debugging
showed that matrices are being created correctly, functions return correctly (as far as I could see), no crashes (DirectX will never complain if something goes wrong, it just doesn't render) and in general, step-by-step debugging shows no paranormal activity.
Existing project (which I modify and eventually prevent from working):
As I advance I also write tutorials of sorts so I can go back and see what I did last time to get it to work, and this time I've kept versions so you can get the code here along with the VS2010 solution, all the DirectX work is done in 3Dheader.h and D3DLoader.h
Changes:
- the custom vertex format FVF_CUSTOMVERTEX has been changed so as not to include RHW, as I understand it has to be removed so as to be computed through the transformations
- In Render() I add a call to the function setMatrices() which does all the matrix and transform work, and is as follows:
void setMatrices()
{
//--------------transformation code----------------//
D3DXMATRIX objectM, translationM, rotationM, projectionM, lookAtM, finalM;
HRESULT hr;
D3DXMatrixIdentity(&objectM);
D3DXMatrixRotationY(&rotationM, D3DX_PI/4);
//D3DXMatrixMultiply(&finalM, &objectM, &rotationM);
D3DXMatrixPerspectiveFovLH(&projectionM,D3DX_PI/4,(float)yRes/xRes, 1, 100);
D3DXVECTOR3 camera;
camera.x = -10;
camera.y = 0;
camera.z = 0;
D3DXVECTOR3 cameraTarget;
cameraTarget.x = 0;
cameraTarget.y = 0;
cameraTarget.z = 0;
D3DXVECTOR3 cameraUp;
cameraUp.x = 0;
cameraUp.y = 1;
cameraUp.z = 0;
D3DXMatrixLookAtLH(&lookAtM,&camera,&cameraTarget, &cameraUp);
hr = pd3dDevice->SetTransform(D3DTS_WORLD, &objectM);
hr = pd3dDevice->SetTransform(D3DTS_PROJECTION, &projectionM);
hr = pd3dDevice->SetTransform(D3DTS_VIEW, &lookAtM);
D3DVIEWPORT9 view_port;
view_port.X=0;
view_port.Y=0;
view_port.Width=xRes;
view_port.Height=yRes;
view_port.MinZ=0.0f;
view_port.MaxZ=1.0f;
pd3dDevice->SetViewport(&view_port);
}
Note of course that some elements may not be needed, placed there just in case during my attempts, this is the code I have currently so we have a common reference.
Thanks in advance for any answers and/or attempts to answer.

In your code (downloaded), xRes and yRes are ints. Due to integer division, yRes/xRes will be zero, because xRes > yRes. You are passing this into the D3DXMatrixPerspextiveFovLH function as the aspect ratio, which will produce an invalid matrix. Instead, cast them to floats first, before doing the division, and pass the result in.

Related

Rendering point cloud data with draw instancing from OSG Cookbook not working

I am rendering a point cloud using OSG. I followed the example in the OSG cookbook titled "Rendering point cloud data with draw instancing" that shows how to make one point with many instances and then transfer the point locations to the graphics card via a texture. It then uses a shader to pull the points out of the texture and move each instance to the right location. There appear to be two problems with what is getting rendered.
First, the points aren't in the right location compared to a more straight forward, working approach to rendering. It looks like they are roughly scaled from zero wrong, some kind of multiplicative factor on position.
Second, the imagery is blurry. Points tend to be generally in the right place; there are many points in the place where a large object should be. However, I can't tell what the object. Data rendered with my working (but slower) rendering method looks sharp.
I have verified that I have the same input data going into the texture and draw list in both methods so it seems it has to be something with the rendering.
Here is the code to set up the Geometry which is nearly directly copied from the text book.
osg::Geometry* geo = new osg::Geometry;
osg::ref_ptr<osg::Image> img = new osg::Image;
img->allocateImage(w,h, 1, GL_RGBA, GL_FLOAT);
osg::BoundingBox box;
float* data = (float*)img->data();
for (unsigned long int k=0; k<NPoints; k++)
{
*(data++) = cloud->x[k];
*(data++) = cloud->y[k];
*(data++) = cloud->z[k];
*(data++) = cloud->meta[0][k];
box.expandBy(cloud->x[k],cloud->y[k],cloud->z[k]);
}
geo->setUseDisplayList(false);
geo->setUseVertexBufferObjects(true);
geo->setVertexArray( new osg::Vec3Array(1));
geo->addPrimitiveSet( new osg::DrawArrays(GL_POINTS, 0, 1, stop) );
geo->setInitialBound(box);
osg::ref_ptr<osg::Texture2D> tex = new osg::Texture2D;
tex->setImage( img);
tex->setInternalFormat( GL_RGBA32F_ARB );
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
And here is the shader code.
void main () {
float row;
row = float(gl_InstanceID) / float(width);
vec2 uv = vec2( fract(row), floor(row) / float(height) );
vec4 texValue = texture2D(defaultTex,uv);
vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
gl_Position = gl_ModelViewProjectionMatrix * pos;
}

After a bunch of experimenting, I found that the example code from the OSG Cookbook has some problems.
The scale issue (the first problem) is in the shader.
vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
Should be
vec4 pos = gl_Vertex + vec4(texValue.xyz, 0.0);
This is because the gl_Vertex is a 3-vector with an extra 1 element to aide with matrix transformation. That element should always be 1. The example created another 3+1 vector and added it to gl_Vertex making it a 2. Replace the 1 with a zero and the scale problem goes away.
The blurriness (the second problem) was caused by texture interpolation.
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
needs to be
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::NEAREST);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::NEAREST);
so that the interpolator will just take the values from the texture instead of interpolating them from neighboring texture pixels which may be points on the other side of the point cloud. After fixing these two issues, the example works as advertised and seems to be a bit faster in my limited testing.

Can't get texture.Sample to work, although I can get texture.Load to work fine in Direct 3d 11 shader

In my HLSL for Direct3d 11 app, I'm having a problem where the texture.Sample intrinsic always return 0. I know my data and parameters are correct because if I use texture.Load instead of Sample the value returned is correct.
Here are my declarations:
extern Texture2D<float> texMask;
SamplerState TextureSampler : register (s2);
Here is the code in my pixel shader that works-- this confirms that my texture is available correctly to the shader and my texcoord values are correct:
float maskColor = texMask.Load(int3(8192*texcoord.x, 4096*texcoord.y, 0));
If I substitute for this the following line, maskColor is always 0, and I can't figure out why.
float maskColor = texMask.Sample(TextureSampler, texcoord);
TextureSampler has the default state values; texMask is defined with 1 mip level.
I've also tried:
float maskColor = texMask.SampleLevel(TextureSampler, texcoord, 0);
and that also always returns 0.
C++ code for setting up sampler:
D3D11_SAMPLER_DESC sd;
ZeroMemory(&sd, sizeof(D3D11_SAMPLER_DESC));
sd.Filter = D3D11_FILTER_MIN_MAG_MIP_LINEAR;
sd.AddressU = D3D11_TEXTURE_ADDRESS_WRAP;
sd.AddressV = D3D11_TEXTURE_ADDRESS_WRAP;
ID3D11SamplerState* pSampler;
dev->CreateSamplerState(&sd, &pSampler);
devcon->PSSetSamplers(2, 1, &pSampler);

Forgive me for reviving such an old post, but I figured it important to add another possible cause for this sort of issue for others, and this post is the most relevant place I could find to post in.
I, too, had an issue where the HLSL Sample function would always return 0, but only on specific textures, and not on others. I checked, ensured the texture was properly bound, and that the color values should not have been 0, and still was left wondering why I was always getting 0 back for this one specific texture, but not others used in the same shader pass. The Load function worked fine, but then I lost the nice features that samplers give us.
As it turns out, in my case, I had accidentally created this texture's description as:
D3D11_TEXTURE2D_DESC desc;
desc.Width = _width;
desc.Height = _height;
desc.MipLevels = 0; // <- Bad!
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET;
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
This worked and created a texture that was visible and renderable, however what happens when defining MipLevels to 0, is that DirectX generates an entire mip chain for that texture. Me being me, however, I forgot this while working on my project further, and while DirectX may generate the textures for the mip chain, drawing to the texture does not cascade through all the levels of the chain (which does make sense, I suppose).
Now, I suppose it's important to note that I'm still new to the whole graphics programming thing, if that wasn't already obvious enough. I have absolutely no idea what mip level, or combination of mip levels, the regular Sample function uses. But I can say that in my case, it didn't happen to be level 0. Maybe it will for a smaller mip chain, but this texture in particular had 12 levels in total, with which only level 0 had any actual color information drawn to it. Using the Load function, or SampleLevel to explicitly access mip level 0 worked fine. As I do not need, nor want, the texture I'm trying to sample to have a mip chain, I simply changed it's description to fix it.

I found my problem -- I needed to specify a register for the texture as well as the sampler in my HLSL. I can't find any documentation anywhere that describes why this is necessary, but it did fix my problem.

DirectX using multiple Render Targets as input to each other

I have a fairly simple DirectX 11 framework setup that I want to use for various 2D simulations. I am currently trying to implement the 2D Wave Equation on the GPU. It requires I keep the grid state of the simulation at 2 previous timesteps in order to compute the new one.
How I went about it was this - I have a class called FrameBuffer, which has the following public methods:
bool Initialize(D3DGraphicsObject* graphicsObject, int width, int height);
void BeginRender(float clearRed, float clearGreen, float clearBlue, float clearAlpha) const;
void EndRender() const;
// Return a pointer to the underlying texture resource
const ID3D11ShaderResourceView* GetTextureResource() const;
In my main draw loop I have an array of 3 of these buffers. Every loop I use the textures from the previous 2 buffers as inputs to the next frame buffer and I also draw any user input to change the simulation state. I then draw the result.
int nextStep = simStep+1;
if (nextStep > 2)
nextStep = 0;
mFrameArray[nextStep]->BeginRender(0.0f,0.0f,0.0f,1.0f);
{
mGraphicsObj->SetZBufferState(false);
mQuad->GetRenderer()->RenderBuffers(d3dGraphicsObj->GetDeviceContext());
ID3D11ShaderResourceView* texArray[2] = { mFrameArray[simStep]->GetTextureResource(),
mFrameArray[prevStep]->GetTextureResource() };
result = mWaveShader->Render(d3dGraphicsObj, mQuad->GetRenderer()->GetIndexCount(), texArray);
if (!result)
return false;
// perform any extra input
I_InputSystem *inputSystem = ServiceProvider::Instance().GetInputSystem();
if (inputSystem->IsMouseLeftDown()) {
int x,y;
inputSystem->GetMousePos(x,y);
int width,height;
mGraphicsObj->GetScreenDimensions(width,height);
float xPos = MapValue((float)x,0.0f,(float)width,-1.0f,1.0f);
float yPos = MapValue((float)y,0.0f,(float)height,-1.0f,1.0f);
mColorQuad->mTransform.position = Vector3f(xPos,-yPos,0);
result = mColorQuad->Render(&viewMatrix,&orthoMatrix);
if (!result)
return false;
}
mGraphicsObj->SetZBufferState(true);
}
mFrameArray[nextStep]->EndRender();
prevStep = simStep;
simStep = nextStep;
ID3D11ShaderResourceView* currTexture = mFrameArray[nextStep]->GetTextureResource();
// Render texture to screen
mGraphicsObj->SetZBufferState(false);
mQuad->SetTexture(currTexture);
result = mQuad->Render(&viewMatrix,&orthoMatrix);
if (!result)
return false;
mGraphicsObj->SetZBufferState(true);
The problem is nothing is happening. Whatever I draw appears on the screen(I draw using a small quad) but no part of the simulation is actually ran. I can provide the shader code if required, but I am certain it works since I've implemented this before on the CPU using the same algorithm. I'm just not certain how well D3D render targets work and if I'm just drawing wrong every frame.
EDIT 1:
Here is the code for the begin and end render functions of the frame buffers:
void D3DFrameBuffer::BeginRender(float clearRed, float clearGreen, float clearBlue, float clearAlpha) const {
ID3D11DeviceContext *context = pD3dGraphicsObject->GetDeviceContext();
context->OMSetRenderTargets(1, &(mRenderTargetView._Myptr), pD3dGraphicsObject->GetDepthStencilView());
float color[4];
// Setup the color to clear the buffer to.
color[0] = clearRed;
color[1] = clearGreen;
color[2] = clearBlue;
color[3] = clearAlpha;
// Clear the back buffer.
context->ClearRenderTargetView(mRenderTargetView.get(), color);
// Clear the depth buffer.
context->ClearDepthStencilView(pD3dGraphicsObject->GetDepthStencilView(), D3D11_CLEAR_DEPTH, 1.0f, 0);
void D3DFrameBuffer::EndRender() const {
pD3dGraphicsObject->SetBackBufferRenderTarget();
}
Edit 2 Ok, I after I set up the DirectX debug layer I saw that I was using an SRV as a render target while it was still bound to the Pixel stage in out of the shaders. I fixed that by setting shader resources to NULL after I render with the wave shader, but the problem still persists - nothing actually gets ran or updated. I took the render target code from here and slightly modified it, if its any help: http://rastertek.com/dx11tut22.html

Okay, as I understand correct you need a multipass-rendering to texture.
Basiacally you do it like I've described here: link
You creating SRVs with both D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_RENDER_TARGET bind flags.
You ctreating render targets from textures
You set first texture as input (*SetShaderResources()) and second texture as output (OMSetRenderTargets())
You Draw()*
then you bind second texture as input, and third as output
Draw()*
etc.
Additional advices:
If your target GPU capable to write to UAVs from non-compute shaders, you can use it. It is much more simple and less error prone.
If your target GPU suitable, consider using compute shader. It is a pleasure.
Don't forget to enable DirectX debug layer. Sometimes we make obvious errors and debug output can point to them.
Use graphics debugger to review your textures after each draw call.
Edit 1:
As I see, you call BeginRender and OMSetRenderTargets only once, so, all rendering goes into mRenderTargetView. But what you need is to interleave:
SetSRV(texture1);
SetRT(texture2);
Draw();
SetSRV(texture2);
SetRT(texture3);
Draw();
SetSRV(texture3);
SetRT(backBuffer);
Draw();
Also, we don't know what is mRenderTargetView yet.
so, before
result = mColorQuad->Render(&viewMatrix,&orthoMatrix);
somewhere must be OMSetRenderTargets .
Probably, it s better to review your Begin()/End() design, to make resource binding more clearly visible.
Happy coding! =)

Text rendering terribly slow

I'm using FTGL library to render text in my C++, OpenGL application, but I find it terribly slow, even though it is said to be fast and efficient library for this.
Even for small amounts of text, performance drop is visible, but when I try to render few lines of text, FPS drops from 350~ to 30~:
Yes, I already know that FPS isn't a good way to check efficiency, yet in this case there shouldn't be so big difference.
I found a function which allows me to make FTGL use display lists internally in order to increase speed, but it appears to be turned on by default. Anyway I tried using it, but it gave me nothing. So I thought that maybe it's somehow corrupted, or I don't understand it quite well, so I decided to put rendering text into my own display lists, but difference is either so slight that I can't even see it, or there's no difference.
bool TFontManager::renderWrappedText(font_ptr font, int lineLength, const TPoint& position, const std::string& text) {
if(font == nullptr) {
return false;
}
string key = sizeToString(font->FaceSize()); // key to look for it in map
key.append(TUtil::intToString(lineLength));
key.append(text);
GLuint displayListId = getDisplayListId(key); // get display list id from internal map
if(displayListId != 0) { // if display list id was found in map, i can call it
glCallList(displayListId);
return true;
}
// if id was not found, i'm creating new display list
FTSimpleLayout simpleLayout;
simpleLayout.SetLineLength((float)lineLength);
simpleLayout.SetFont(font.get());
displayListId = glGenLists(1);
glNewList(displayListId, GL_COMPILE);
glPushMatrix();
glTranslatef(position.x, position.y, 0.0f);
simpleLayout.Render(TUtil::stringToWString(text).c_str(), -1, FTPoint(), FTGL::RENDER_FRONT | FTGL::RENDER_BACK); // according to visual studio's profiler, bottleneck is inside this function. more exactly in drawing textured quads when i looked into FTGL code.
glPopMatrix();
glEndList();
m_textDisplayLists[key] = displayListId;
glCallList(displayListId);
return true;
}
I checked with breakpoints in debug mode - it creates display list only once, later it only calls previously created one.
What might be the reason for such slow rendering? How may I speed it up?
Edit:
I'm using FTTextureFont (which uses one texture per glyph). According to this FTGL tutorial, I should rather use FTBufferFont, because it uses only one texture per line. Buffer font should be faster, but after I tried it it's uglier and even slower (6 fps whereas texture font gave me 30 fps).
Edit2:
This is how I create my fonts:
font_ptr TFontManager::getFont(const std::string& filename, int size) {
string fontKey = filename;
fontKey.append(sizeToString(size));
FontIter result = fonts.find(fontKey);
if(result != fonts.end()) {
return result->second; // Found font in list
}
// If font wasn't found, create a new one and store it in list of fonts
font_ptr font(new FTTextureFont(filename.c_str()));
font->UseDisplayList(true);
if(font->Error()) {
string message = "Failed to open font";
message.append(filename);
TError::showMessage(message);
return nullptr;
}
if(!font->FaceSize(size)) {
string message = "Failed to set font size";
TError::showMessage(message);
return nullptr;
}
fonts[fontKey] = font;
return font;
}
Edit3:
This is function taken from FTGL library source code which renders glyph in FTTextureFont. It uses the same texture for separate glyphs, just with other coordinates, so this shouldn't be a problem.
const FTPoint& FTTextureGlyphImpl::RenderImpl(const FTPoint& pen,
int renderMode)
{
float dx, dy;
if(activeTextureID != glTextureID)
{
glBindTexture(GL_TEXTURE_2D, (GLuint)glTextureID);
activeTextureID = glTextureID;
}
dx = floor(pen.Xf() + corner.Xf());
dy = floor(pen.Yf() + corner.Yf());
glBegin(GL_QUADS);
glTexCoord2f(uv[0].Xf(), uv[0].Yf());
glVertex2f(dx, dy);
glTexCoord2f(uv[0].Xf(), uv[1].Yf());
glVertex2f(dx, dy - destHeight);
glTexCoord2f(uv[1].Xf(), uv[1].Yf());
glVertex2f(dx + destWidth, dy - destHeight);
glTexCoord2f(uv[1].Xf(), uv[0].Yf());
glVertex2f(dx + destWidth, dy);
glEnd();
return advance;
}

Rendering typography from normal typeface files is a pretty computationally intensive operation. The font glyphs are read as a set of splines that are used to generate character boundaries which are tessellated and fed into the graphics pipeline. I'm not highly familiar with FreeType2 but I have used FTGL. You should be using a FontAtlas to render type. A FontAtlas is a regular texture atlas (much like a sprite sheet) that is rendered once for each font size and then stored for future glyph renders.
Check out this link for more information on the process:
http://antongerdelan.net/opengl4/freetypefonts.html
This should greatly improve performance. Although you may lose out on some font-rendering flexibility.

is it possible to speed-up matlab plotting by calling c / c++ code in matlab?

It is generally very easy to call mex files (written in c/c++) in Matlab to speed up certain calculations. In my experience however, the true bottleneck in Matlab is data plotting. Creating handles is extremely expensive and even if you only update handle data (e.g., XData, YData, ZData), this might take ages. Even worse, since Matlab is a single threaded program, it is impossible to update multiple plots at the same time.
Therefore my question: Is it possible to write a Matlab GUI and call C++ (or some other parallelizable code) which would take care of the plotting / visualization? I'm looking for a cross-platform solution that will work on Windows, Mac and Linux, but any solution that get's me started on either OS is greatly appreciated!
I found a C++ library that seems to use Matlab's plot() syntax but I'm not sure whether this would speed things up, since I'm afraid that if I plot into Matlab's figure() window, things might get slowed down again.
I would appreciate any comments and feedback from people who have dealt with this kind of situation before!
EDIT: obviously, I've already profiled my code and the bottleneck is the plotting (dozen of panels with lots of data).
EDIT2: for you to get the bounty, I need a real life, minimal working example on how to do this - suggestive answers won't help me.
EDIT3: regarding the data to plot: in a most simplistic case, think about 20 line plots, that need to be updated each second with something like 1000000 data points.
EDIT4: I know that this is a huge amount of points to plot but I never said that the problem was easy. I can not just leave out certain data points, because there's no way of assessing what points are important, before actually plotting them (data is sampled a sub-ms time resolution). As a matter of fact, my data is acquired using a commercial data acquisition system which comes with a data viewer (written in c++). This program has no problem visualizing up to 60 line plots with even more than 1000000 data points.
EDIT5: I don't like where the current discussion is going. I'm aware that sub-sampling my data might speeds up things - however, this is not the question. The question here is how to get a c / c++ / python / java interface to work with matlab in order hopefully speed up plotting by talking directly to the hardware (or using any other trick / way)

Did you try the trivial solution of changing the render method to OpenGL ?
opengl hardware;
set(gcf,'Renderer','OpenGL');
Warning!
There will be some things that disappear in this mode, and it will look a bit different, but generally plots will runs much faster, especially if you have a hardware accelerator.
By the way, are you sure that you will actually gain a performance increase?
For example, in my experience, WPF graphics in C# are considerably slower than Matlabs, especially scatter plot and circles.
Edit: I thought about the fact that the number of points that is actually drawn to the screen can't be that much. Basically it means that you need to interpolate at the places where there is a pixel in the screen. Check out this object:
classdef InterpolatedPlot < handle
properties(Access=private)
hPlot;
end
methods(Access=public)
function this = InterpolatedPlot(x,y,varargin)
this.hPlot = plot(0,0,varargin{:});
this.setXY(x,y);
end
end
methods
function setXY(this,x,y)
parent = get(this.hPlot,'Parent');
set(parent,'Units','Pixels')
sz = get(parent,'Position');
width = sz(3); %Actual width in pixels
subSampleX = linspace(min(x(:)),max(x(:)),width);
subSampleY = interp1(x,y,subSampleX);
set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
end
end
end
And here is an example how to use it:
function TestALotOfPoints()
x = rand(10000,1);
y = rand(10000,1);
ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end
Another possible improvement:
Also, if your x data is sorted, you can use interp1q instead of interp, which will be much faster.
classdef InterpolatedPlot < handle
properties(Access=private)
hPlot;
end
% properties(Access=public)
% XData;
% YData;
% end
methods(Access=public)
function this = InterpolatedPlot(x,y,varargin)
this.hPlot = plot(0,0,varargin{:});
this.setXY(x,y);
% this.XData = x;
% this.YData = y;
end
end
methods
function setXY(this,x,y)
parent = get(this.hPlot,'Parent');
set(parent,'Units','Pixels')
sz = get(parent,'Position');
width = sz(3); %Actual width in pixels
subSampleX = linspace(min(x(:)),max(x(:)),width);
subSampleY = interp1q(x,y,transpose(subSampleX));
set(this.hPlot,'XData',subSampleX,'YData',subSampleY);
end
end
end
And the use case:
function TestALotOfPoints()
x = rand(10000,1);
y = rand(10000,1);
x = sort(x);
ip = InterpolatedPlot(x,y,'color','r','LineWidth',2);
end

Since you want maximum performance you should consider writing a minimal OpenGL viewer. Dump all the points to a file and launch the viewer using the "system"-command in MATLAB. The viewer can be really simple. Here is one implemented using GLUT, compiled for Mac OS X. The code is cross platform so you should be able to compile it for all the platforms you mention. It should be easy to tweak this viewer for your needs.
If you are able to integrate this viewer more closely with MATLAB you might be able to get away with not having to write to and read from a file (= much faster updates). However, I'm not experienced in the matter. Perhaps you can put this code in a mex-file?
EDIT: I've updated the code to draw a line strip from a CPU memory pointer.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
// The file "input" is assumed to contain a line for each point:
// 0.1 1.0
// 5.2 3.0
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <GLUT/glut.h>
using namespace std;
struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };
static vector<float2> points;
static float2 minPoint, maxPoint;
typedef vector<float2>::iterator point_iter;
static void render() {
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(minPoint.x, maxPoint.x, minPoint.y, maxPoint.y, -1.0f, 1.0f);
glColor3f(0.0f, 0.0f, 0.0f);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, sizeof(points[0]), &points[0].x);
glDrawArrays(GL_LINE_STRIP, 0, points.size());
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
}
int main(int argc, char* argv[]) {
ifstream file("input");
string line;
while (getline(file, line)) {
istringstream ss(line);
float2 p;
ss >> p.x;
ss >> p.y;
if (ss)
points.push_back(p);
}
if (!points.size())
return 1;
minPoint = maxPoint = points[0];
for (point_iter i = points.begin(); i != points.end(); ++i) {
float2 p = *i;
minPoint = float2(minPoint.x < p.x ? minPoint.x : p.x, minPoint.y < p.y ? minPoint.y : p.y);
maxPoint = float2(maxPoint.x > p.x ? maxPoint.x : p.x, maxPoint.y > p.y ? maxPoint.y : p.y);
}
float dx = maxPoint.x - minPoint.x;
float dy = maxPoint.y - minPoint.y;
maxPoint.x += dx*0.1f; minPoint.x -= dx*0.1f;
maxPoint.y += dy*0.1f; minPoint.y -= dy*0.1f;
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowSize(512, 512);
glutCreateWindow("glview");
glutDisplayFunc(render);
glutMainLoop();
return 0;
}
EDIT: Here is new code based on the discussion below. It renders a sin function consisting of 20 vbos, each containing 100k points. 10k new points are added each rendered frame. This makes a total of 2M points. The performance is real-time on my laptop.
// On Mac OS X, compile using: g++ -O3 -framework GLUT -framework OpenGL glview.cpp
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <cmath>
#include <iostream>
#include <GLUT/glut.h>
using namespace std;
struct float2 { float2() {} float2(float x, float y) : x(x), y(y) {} float x, y; };
struct Vbo {
GLuint i;
Vbo(int size) { glGenBuffersARB(1, &i); glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferDataARB(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW); } // could try GL_STATIC_DRAW
void set(const void* data, size_t size, size_t offset) { glBindBufferARB(GL_ARRAY_BUFFER, i); glBufferSubData(GL_ARRAY_BUFFER, offset, size, data); }
~Vbo() { glDeleteBuffers(1, &i); }
};
static const int vboCount = 20;
static const int vboSize = 100000;
static const int pointCount = vboCount*vboSize;
static float endTime = 0.0f;
static const float deltaTime = 1e-3f;
static std::vector<Vbo*> vbos;
static int vboStart = 0;
static void addPoints(float2* points, int pointCount) {
while (pointCount) {
if (vboStart == vboSize || vbos.empty()) {
if (vbos.size() >= vboCount+2) { // remove and reuse vbo
Vbo* first = *vbos.begin();
vbos.erase(vbos.begin());
vbos.push_back(first);
}
else { // create new vbo
vbos.push_back(new Vbo(sizeof(float2)*vboSize));
}
vboStart = 0;
}
int pointsAdded = pointCount;
if (pointsAdded + vboStart > vboSize)
pointsAdded = vboSize - vboStart;
Vbo* vbo = *vbos.rbegin();
vbo->set(points, pointsAdded*sizeof(float2), vboStart*sizeof(float2));
pointCount -= pointsAdded;
points += pointsAdded;
vboStart += pointsAdded;
}
}
static void render() {
// generate and add 10000 points
const int count = 10000;
float2 points[count];
for (int i = 0; i < count; ++i) {
float2 p(endTime, std::sin(endTime*1e-2f));
endTime += deltaTime;
points[i] = p;
}
addPoints(points, count);
// render
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(endTime-deltaTime*pointCount, endTime, -1.0f, 1.0f, -1.0f, 1.0f);
glColor3f(0.0f, 0.0f, 0.0f);
glEnableClientState(GL_VERTEX_ARRAY);
for (size_t i = 0; i < vbos.size(); ++i) {
glBindBufferARB(GL_ARRAY_BUFFER, vbos[i]->i);
glVertexPointer(2, GL_FLOAT, sizeof(float2), 0);
if (i == vbos.size()-1)
glDrawArrays(GL_LINE_STRIP, 0, vboStart);
else
glDrawArrays(GL_LINE_STRIP, 0, vboSize);
}
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
int main(int argc, char* argv[]) {
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowSize(512, 512);
glutCreateWindow("glview");
glutDisplayFunc(render);
glutMainLoop();
return 0;
}

As a number of people have mentioned in their answers, you do not need to plot that many points. I think it is important to rpeat Andrey's comment:
that is a HUGE amount of points! There isn't enough pixels on the screen to plot that amount.
Rewriting plotting routines in different languages is a waste of your time. A huge number of hours have gone into writing MATLAB, whay makes you think you can write a significantly faster plotting routine (in a reasonable amount of time)? Whilst your routine may be less general, and therefore would remove some of the checks that the MATLAB code will perform, your "bottleneck" is that you are trying to plot so much data.
I strongly recommend one of two courses of action:
Sample your data: You do not need 20 x 1000000 points on a figure - the human eye won't be able to distinguish between all the points, so it is a waste of time. Try binning your data for example.
If you maintain that you need all those points on the screen, I would suggest using a different tool. VisIt or ParaView are two examples that come to mind. They are parallel visualisation programs designed to handle extremenly large datasets (I have seen VisIt handle datasets that contained PetaBytes of data).

There is no way you can fit 1000000 data points on a small plot. How about you choose one in every 10000 points and plot those?
You can consider calling imresize on the large vector to shrink it, but manually building a vector by omitting 99% of the points may be faster.
#memyself The sampling operations are already occurring. Matlab is choosing what data to include in the graph. Why do you trust matlab? It looks to me that the graph you showed significantly misrepresents the data. The dense regions should indicate that the signal is at a constant value, but in your graph it could mean that the signal is at that value half the time - or was at that value at least once during the interval corresponding to that pixel?

Would it be possible to use an alternate architectue? For example, use MATLAB to generate the data and use a fast library or application (GNUplot?) to handle the plotting?
It might even be possible to have MATLAB write the data to a stream as the plotter consumes the data. Then the plot would be updated as MATLAB generates the data.
This approach would avoid MATLAB's ridiculously slow plotting and divide the work up between two separate processes. The OS/CPU would probably assign the process to different cores as a matter of course.

I think it's possible, but likely to require writing the plotting code (at least the parts you use) from scratch, since anything you could reuse is exactly what's slowing you down.
To test feasibility, I'd start with testing that any Win32 GUI works from MEX (call MessageBox), then proceed to creating your own window, test that window messages arrive to your WndProc. Once all that's going, you can bind an OpenGL context to it (or just use GDI), and start plotting.
However, the savings is likely to come from simpler plotting code and use of newer OpenGL features such as VBOs, rather than threading. Everything is already parallel on the GPU, and more threads don't help transfer of commands/data to the GPU any faster.

I did a very similar thing many many years ago (2004?). I needed an oscilloscope-like display for kilohertz sampled biological signals displayed in real time. Not quite as many points as the original question has, but still too many for MATLAB to handle on its own. IIRC I ended up writing a Java component to display the graph.
As other people have suggested, I also ended up down-sampling the data. For each pixel on the x-axis, I calculated the minimum and maximum values taken by the data, then drew a short vertical line between those values. The entire graph consisted of a sequence of short vertical lines, each immediately adjacent to the next.
Actually, I think that the implementation ended up writing the graph to a bitmap that scrolled continuously using bitblt, with only new points being drawn ... or maybe the bitmap was static and the viewport scrolled along it ... anyway it was a long time ago and I might not be remembering it right.

Blockquote
EDIT4: I know that this is a huge amount of points to plot but I never said that the problem was easy. I can not just leave out certain data points, because there's no way of assessing what points are important, before actually plotting them
Blockquote
This is incorrect. There is a way to to know which points to leave out. Matlab is already doing it. Something is going to have to do it at some point no matter how you solve this. I think you need to redirect your problem to be "how do I determine which points I should plot?".
Based on the screenshot, the data looks like a waveform. You might want to look at the code of audacity. It is an open source audio editing program. It displays plots to represent the waveform in real time, and they look identical in style to the one in your lowest screen shot. You could borrow some sampling techniques from them.

What you are looking for is the creation of a MEX file.
Rather than me explaining it, you would probably benefit more from reading this: Creating C/C++ and Fortran Programs to be Callable from MATLAB (MEX-Files) (a documentation article from MathWorks).
Hope this helps.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js