DirectX - GetSurfaceLevel performance issue - C++

I'm implementing deferred shading in a DirectX 9 application. My method of deferred shading requires three render targets (color, position, and normal). Each frame it is necessary to (a minimal sketch of this sequence follows the list):
1. set the render targets on the device at the beginning of the 'render' function,
2. draw the data to them in the 'RT pass',
3. remove the render targets from the device (so as not to draw over them during subsequent passes), and
4. set the render targets as textures for subsequent passes so that the effect can read back the data 'drawn' to the RTs in the 'RT pass'.
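For reference, here is a minimal sketch of that per-frame sequence. The GetPositionRT() and GetNormalRT() accessors are placeholders mirroring the GetColorRT() accessor shown in the snippets below; the real project may name them differently.
// Minimal sketch of the per-frame sequence above (D3D9).
IDirect3DDevice9 *pDev = CEffectManager::GetDevice();
IDirect3DSurface9 *pBackBuffer = NULL;
pDev->GetRenderTarget( 0, &pBackBuffer );               // remember the original target
IDirect3DSurface9 *pColor = NULL, *pPos = NULL, *pNormal = NULL;
CEffectManager::GetColorRT()->GetSurfaceLevel( 0, &pColor );
CEffectManager::GetPositionRT()->GetSurfaceLevel( 0, &pPos );    // assumed accessor
CEffectManager::GetNormalRT()->GetSurfaceLevel( 0, &pNormal );   // assumed accessor
// 1) set the render targets
pDev->SetRenderTarget( 0, pColor );
pDev->SetRenderTarget( 1, pPos );
pDev->SetRenderTarget( 2, pNormal );
// 2) 'RT pass': draw the scene into the three targets here...
// 3) remove the render targets again
pDev->SetRenderTarget( 0, pBackBuffer );
pDev->SetRenderTarget( 1, NULL );
pDev->SetRenderTarget( 2, NULL );
// 4) bind them as textures for the subsequent passes
pDev->SetTexture( 0, CEffectManager::GetColorRT() );
pDev->SetTexture( 1, CEffectManager::GetPositionRT() );
pDev->SetTexture( 2, CEffectManager::GetNormalRT() );
pColor->Release(); pPos->Release(); pNormal->Release(); pBackBuffer->Release();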
This method works fine; however, I am experiencing performance issues. I've narrowed them down to two function calls:
IDirect3DTexture9::GetSurfaceLevel()
IDirect3DDevice9::SetRenderTarget()
Here is code to set render target:
IDirect3DDevice9 *pd3dDevice = CEffectManager::GetDevice();
IDirect3DTexture9 *pRT = CEffectManager::GetColorRT();
IDirect3DSurface9 *pSrf = NULL;
pRT->GetSurfaceLevel( 0, &pSrf );
pd3dDevice->SetRenderTarget( 0, pSrf );
PIX indicates that the duration (in cycles) of the call to GetSurfaceLevel() is very high, roughly 0.5 ms per call (Duration / Total Duration * 1 / FrameRate). Because it is necessary to get three surfaces, the combined duration is too high: it's more than four times greater than the combined draw calls.
I tried to eliminate the call to GetSurfaceLevel() by storing a pointer to the surface during render-target creation. Oddly enough, SetRenderTarget() then took on the same duration (where before its duration was negligible). Here is the altered code:
IDirect3DDevice9 *pd3dDevice = CEffectManager::GetDevice();
IDirect3DSurface9 *pSrf = CEffectManager::GetColorSurface();
pd3dDevice->SetRenderTarget( 0, pSrf );
Is there a way around this performance issue? Why does the second method take as long as the first? It seems as though the process within IDirect3DDevice9::SetRenderTarget() simply takes time...is there a device state that I can set to help performance?
Update:
I've implemented the following code in order to better test performance:
IDirect3DDevice9 *pd3dDevice = CEffectManager::GetDevice();
IDirect3DTexture9 *pRT = CEffectManager::GetColorRT();
IDirect3DSurface9 *pSRF = NULL;
IDirect3DQuery9 *pEvent = NULL;
LARGE_INTEGER lnStart, lnStop, lnFreq;
// create query
pd3dDevice->CreateQuery( D3DQUERYTYPE_EVENT, &pEvent );
// insert 'end' marker
pEvent->Issue( D3DISSUE_END );
// flush command buffer
while( S_FALSE == pEvent->GetData( NULL, 0, D3DGETDATA_FLUSH ) );
// get start time
QueryPerformanceCounter( &lnStart );
// api call
pRT->GetSurfaceLevel( 0, &pSRF );
// insert 'end' marker
pEvent->Issue( D3DISSUE_END );
// flush the command buffer
while( S_FALSE == pEvent->GetData( NULL, 0, D3DGETDATA_FLUSH ) );
QueryPerformanceCounter( &lnStop );
QueryPerformanceFrequency( &lnFreq );
lnStop.QuadPart -= lnStart.QuadPart;
float fElapsedTime = ( float )lnStop.QuadPart / ( float )lnFreq.QuadPart;
fElapsedTime on average measured 10 - 50 microseconds
I performed the same test on IDirect3DDevice9::SetRenderTarget() and the results on average measured 5 - 30 microseconds...
This data is much better than what I got from PIX. It suggests that the delay is not as large as I thought; however, the frame rate is still drastically reduced when using deferred shading, and this seemed to be the most likely source of the performance loss. Did I effectively query the device?
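An alternative, sketched here only as an assumption (it relies on the device supporting D3DQUERYTYPE_TIMESTAMP / D3DQUERYTYPE_TIMESTAMPFREQ queries and is not code from my application), would be to time the call on the GPU itself:
// Sketch: timing SetRenderTarget with GPU timestamp queries (D3D9).
IDirect3DQuery9 *pTsStart = NULL, *pTsStop = NULL, *pTsFreq = NULL;
pd3dDevice->CreateQuery( D3DQUERYTYPE_TIMESTAMP, &pTsStart );
pd3dDevice->CreateQuery( D3DQUERYTYPE_TIMESTAMP, &pTsStop );
pd3dDevice->CreateQuery( D3DQUERYTYPE_TIMESTAMPFREQ, &pTsFreq );
pTsFreq->Issue( D3DISSUE_END );
pTsStart->Issue( D3DISSUE_END );            // GPU timestamp before the call
pd3dDevice->SetRenderTarget( 0, pSrf );     // the call being measured
pTsStop->Issue( D3DISSUE_END );             // GPU timestamp after the call
UINT64 t0 = 0, t1 = 0, freq = 0;
while( S_FALSE == pTsStop->GetData( &t1, sizeof( t1 ), D3DGETDATA_FLUSH ) );
pTsStart->GetData( &t0, sizeof( t0 ), D3DGETDATA_FLUSH );
pTsFreq->GetData( &freq, sizeof( freq ), D3DGETDATA_FLUSH );
float fGpuTime = ( float )( t1 - t0 ) / ( float )freq;  // seconds spent on the GPU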

Related

Nv Path Rendering fonts optimal implementation

I'm using NV Path Rendering, having read Getting Started with NV Path Rendering by Mark Kilgard.
My implementation is based on the render_font example in the Tiger3DES project in NVidia Graphics Samples.
This implementation seems slower than a normal texture-based font solution, so I'm wondering whether it is flawed. NVidia states that NV Path Rendering is faster than the alternatives, but I am hitting a performance limit far sooner than I expected.
I have a scene with 1000 'messages'. My FPS is incredibly poor on a Quadro K4200. If I combine the text into a single message there is no performance issue, but formatting the messages separately is then impossible. If I reduce the number of messages to 100 I get a decent framerate (200+ unlocked).
Are calls to stencil, coverstroke and coverfill expensive?
Here's a code snippet...
Init FontFace:
/* Create a range of path objects corresponding to Latin-1 character codes. */
m_glyphBase = glGenPathsNV(numChars);
glPathGlyphRangeNV(m_glyphBase,
target,
name.c_str(),
style,
0,
numChars,
GL_USE_MISSING_GLYPH_NV,
pathParamTemplate,
GLfloat(emScale)
);
/* Load base character set for unsupported glyphs. */
glPathGlyphRangeNV(m_glyphBase,
GL_STANDARD_FONT_NAME_NV,
"Sans",
style,
0,
numChars,
GL_USE_MISSING_GLYPH_NV,
pathParamTemplate,
GLfloat(emScale)
);
/* Query font and glyph metrics. */
GLfloat fontData[4];
glGetPathMetricRangeNV(GL_FONT_Y_MIN_BOUNDS_BIT_NV | GL_FONT_Y_MAX_BOUNDS_BIT_NV |
GL_FONT_UNDERLINE_POSITION_BIT_NV | GL_FONT_UNDERLINE_THICKNESS_BIT_NV,
m_glyphBase + ' ',
/*count*/1,
4 * sizeof(GLfloat),
fontData
);
m_yMin = fontData[0];
m_yMax = fontData[1];
m_underlinePosition = fontData[2];
m_underlineThickness = fontData[3];
glGetPathMetricRangeNV(GL_GLYPH_HORIZONTAL_BEARING_ADVANCE_BIT_NV,
m_glyphBase,
numChars,
0, /* stride of zero means sizeof(GLfloat) since 1 bit in mask */
&m_horizontalAdvance[0]
);
Init Message:
glGetPathSpacingNV(GL_ACCUM_ADJACENT_PAIRS_NV,
(GLsizei)message.size(),
GL_UNSIGNED_BYTE,
message.c_str(),
m_font->glyphBase(),
1.0, 1.0,
GL_TRANSLATE_X_NV,
&m_xtranslate[1]
);
/* Total advance is accumulated spacing plus horizontal advance of
the last glyph */
m_totalAdvance = m_xtranslate[m_messageLength - 1] +
m_font->horizontalAdvance(uint32(message[m_messageLength - 1]));
Draw Message:
glStencilStrokePathInstancedNV((GLsizei)m_messageLength,
GL_UNSIGNED_BYTE,
message().c_str(),
font()->glyphBase(),
1, ~0U, /* Use all stencil bits */
GL_TRANSLATE_X_NV,
&m_xtranslate[0]
);
glColor3f(m_colour.r, m_colour.g, m_colour.b);
glCoverStrokePathInstancedNV((GLsizei)m_messageLength,
GL_UNSIGNED_BYTE,
message().c_str(),
font()->glyphBase(),
GL_BOUNDING_BOX_OF_BOUNDING_BOXES_NV,
GL_TRANSLATE_X_NV,
&m_xtranslate[0]
);
glStencilFillPathInstancedNV((GLsizei)m_messageLength,
GL_UNSIGNED_BYTE,
message().c_str(),
font()->glyphBase(),
GL_PATH_FILL_MODE_NV,
~0U, /* Use all stencil bits */
GL_TRANSLATE_X_NV,
&m_xtranslate[0]
);
glCoverFillPathInstancedNV((GLsizei)m_messageLength,
GL_UNSIGNED_BYTE,
message().c_str(),
font()->glyphBase(),
GL_BOUNDING_BOX_OF_BOUNDING_BOXES_NV,
GL_TRANSLATE_X_NV,
&m_xtranslate[0]
);
I located the cause of the slowness, and it wasn't related to the functions referenced above; they perform very well once the offending code is removed. Full disclosure: I was using std::stack for the matrices used in the scene, and the calls to push and pop on that stack were expensive. So, in answer to the question: NVidia path rendering for text is blisteringly fast, and stencil, cover-stroke and cover-fill are inexpensive.
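For illustration only, a preallocated matrix stack along these lines avoids the per-frame heap traffic; this is a minimal sketch under my own assumptions (Matrix4 stands in for whatever 4x4 type the scene actually uses), not my exact replacement code.
// Sketch: fixed-capacity matrix stack with no per-frame allocations.
struct Matrix4 { float m[16]; };     // stand-in for the scene's 4x4 matrix type
struct MatrixStack
{
    Matrix4 data[64];                // assumed maximum nesting depth
    int     top = 0;
    void push( const Matrix4& mat ) { data[top++] = mat; }  // no bounds check in this sketch
    void pop()                      { --top; }
    const Matrix4& current() const  { return data[top - 1]; }
};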

Movement becomes slow after a period of time

Hello everybody. I have a problem with player movement.
I have a player character in an empty scene that moves from point A to B and returns to A at a constant speed. For the first hour of the game running it was fine; after three hours the movement became slower and slower. Thanks in advance.
Game.h
D3DXMATRIX Player_Matrix ; //main player matrix .
D3DXVECTOR3 PlayerPos; //main player position .
D3DXVECTOR3 PlayerLook; //main player Look at position .
Game.cpp
//Initialize()
D3DXMatrixIdentity(&Player_Matrix);
PlayerPos = D3DXVECTOR3(10.0f,0.0f,10.0f);
PlayerLook = D3DXVECTOR3(0.0f,0.0f,1.0f);
//MovePlayer()
//declarations
static float angle = D3DXToRadian(0);
float Speed = 70.0f ;
PlayerPos += ( PlayerLook * ( Speed * (m_timeDelta)) );
if(PlayerPos.x >= 320) // 320:(B)
{
angle = D3DXToRadian(180);
}
if(PlayerPos.x <= 0) // 0:(A)
{
angle = D3DXToRadian(180);
}
//Setting up player matrixes
D3DXMATRIX TransMat, RotMat, TempMat;
D3DXMatrixIdentity(&TempMat);
D3DXMatrixIdentity(&RotMat);
D3DXMatrixIdentity(&TransMat);
//Setup Rotation matrix .
D3DXMatrixRotationY(&RotMat,angle);
angle = 0.0f ;
//Attach PlayerLook Vector to rotation matrix
D3DXVec3TransformCoord(&PlayerLook,&PlayerLook,&RotMat);
//gathering rotation matrix with player matrix
D3DXMatrixMultiply(&Player_Matrix,&Player_Matrix,&RotMat);
//transmat is an empty matrix to collect new player position
D3DXMatrixTranslation(&TransMat, PlayerPos.x,PlayerPos.y, PlayerPos.z);
//multiply new position matrix with main player matrix
D3DXMatrixMultiply(&TempMat,&Player_Matrix,&TransMat);
d3ddev->SetTransform(D3DTS_WORLD,&TempMat);
Main_Player->Render();
I would look into how you're calculating m_timeDelta. Perhaps your methodology is allowing floating point error to build up.
Here's an article on the subject: https://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/
I'm with Tom Forsyth and think that a 64-bit integer is the best storage type for absolute times (http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[A%20matter%20of%20precision]]), but double will work fine too.
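A minimal sketch of that approach, assuming the same QueryPerformanceCounter timing that the question already uses (variable names are mine):
// Sketch: keep absolute time as __int64 counts; only the per-frame delta becomes a float.
#include <windows.h>
__int64 prevCounts = 0, freqCounts = 0;
QueryPerformanceFrequency( ( LARGE_INTEGER* )&freqCounts );
QueryPerformanceCounter( ( LARGE_INTEGER* )&prevCounts );
// ... each frame ...
__int64 curCounts = 0;
QueryPerformanceCounter( ( LARGE_INTEGER* )&curCounts );
float deltaSeconds = ( float )( curCounts - prevCounts ) / ( float )freqCounts;
prevCounts = curCounts;   // the absolute time never lives in a float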
PlayerLook seems suspicious.
It is being iteratively rotated every frame; with floating-point errors it's possible that it is gradually shrinking, probably only on the frames where the rotation changes.
You could confirm this by looking at its value in the debugger after several hours of running, or you could eliminate it as a possibility by renormalizing it every frame and seeing whether the slowdown disappears.
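A minimal sketch of that renormalization, reusing the variable names from the question:
// Sketch: renormalize PlayerLook every frame so repeated rotations cannot shrink it.
D3DXVec3TransformCoord( &PlayerLook, &PlayerLook, &RotMat );
D3DXVec3Normalize( &PlayerLook, &PlayerLook );   // keep the look vector unit length
// PlayerPos now advances by exactly Speed * m_timeDelta per frame
PlayerPos += PlayerLook * ( Speed * m_timeDelta );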
I've just found both the problem and its solution:
As I explained before, the whole app runs fine; the problem is that the movement (just x++, y++, z++) gets slower and slower after a period of time. At first I thought it was a memory leak, but the animation and delta time work fine, so I kept looking for the real cause.
I ran a release build of the app on another PC; after a period of time the app still ran fine, but I noticed the FPS would not reach 60 frames per second. Searching the MS DirectX SDK, inside DXUT I found a struct which controls the FPS, and a doc about the GPU and acceleration which advises controlling and limiting the frame rate. Here is the code:
//-----------------------------------------------------------------------------
// Name: LockFrameRate()
// Desc: Limit The frame Rate to specified
//-----------------------------------------------------------------------------
bool LockFrameRate(int frame_rate , float SecPerCnt )
{
static __int64 StartTime = 0 ;
__int64 CurTime = 0 ;
QueryPerformanceCounter((LARGE_INTEGER*)&CurTime);
float CurrentSecond = (float)((CurTime - StartTime )* SecPerCnt ) ;
// Get the elapsed time by subtracting the current time from the last time
// If the desired frame rate amount of seconds has passed -- return true (ie Blit())
if( CurrentSecond > (1.0f / frame_rate) )
{
// Reset the last time
StartTime = CurTime;
return true;
}
return false;
}
// int WINAPI WinMain(....)
//***************************
// Initialize Timing Preformance . *
//***************************
//Store Counts per second
__int64 CountPerSec = 0 ;
//Gets how many counts does the CPU do per second
QueryPerformanceFrequency((LARGE_INTEGER*)&CountPerSec);
//Gets second per count to preform it with different typs of CPUs
float SecondPerCount = 1.0f / CountPerSec ;
//Initial Previous Time
__int64 PrevTime = 0 ;
QueryPerformanceCounter((LARGE_INTEGER*)&PrevTime);
while(msg.message != WM_QUIT) // while not quit message , go on
{
if(PeekMessage(&msg,NULL,0U,0U,PM_REMOVE))
{
TranslateMessage(&msg);
DispatchMessage(&msg);
}
else if(LockFrameRate(60 , SecondPerCount)) // If it is time to draw, do so , I selected 60 fps to limit
{
//Capture Current Time
__int64 CurTime = 0 ;
QueryPerformanceCounter((LARGE_INTEGER*)&CurTime);
//Calculate Delta Time
float DeltaTime =(float) ((CurTime - PrevTime) * SecondPerCount) ;
//Engine loop
Engine->Engine_Run(DeltaTime , SecondPerCount );
//After Frame Ends set Pervious Time to Current Time
PrevTime = CurTime ;
}
//else
//Sleep(1); // Give the OS a little bit of time to process other things
}
I commented out Sleep(1) because it is just an extra burden on the CPU, but I left it in the code for reference; on modern computers the OS already has an idle process that shares time between this app and the others.
If you do enable it, you may notice some unwanted screen stutters.
Thank you Stack Overflow, and thank you guys.

SDL image disappears after 15 seconds

I'm learning SDL and I have a frustrating problem. The code is below.
Even though there is a loop that keeps the program alive, when I load an image and change the x value of the source rect to animate it, the loaded image disappears after exactly 15 seconds. This does not happen with static images, only with animations. I'm sure there is something simple I'm missing, but I can't see it.
void update(){
rect1.x = 62 * int ( (SDL_GetTicks() / 100) % 12);
/* 62 is the width of a frame, 12 is the number of frames */
}
void shark(){
surface = IMG_Load("s1.png");
if (surface != 0){
texture = SDL_CreateTextureFromSurface(renderer,surface);
SDL_FreeSurface(surface);
}
rect1.y = 0;
rect1.h = 90;
rect1.w = 60;
rect2.x = 0;
rect2.y = 0;
rect2.h = rect1.h+30; // enlarging the image
rect2.w = rect1.w+30;
SDL_RenderCopy(renderer,texture,&rect1,&rect2);
}
void render(){
SDL_SetRenderDrawColor(renderer, 0, 0, 100, 150);
SDL_RenderPresent(renderer);
SDL_RenderClear(renderer);
}
and in main
update();
shark();
render();
The SDL_image header is included, the library is linked, and the DLL exists. Could the DLL be broken?
I left out rest of the program to keep it simple. If this is not enough, I can post the whole thing.
Every time you call the shark function, it loads another copy of the texture. With that in a loop like you have it, you will run out of video memory quickly (unless you are calling SDL_DestroyTexture after every frame, which you have not indicated). At which point, you will no longer be able to load textures. Apparently this takes about fifteen seconds for you.
If you're going to use the same image over and over, then just load it once, before your main loop.
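A minimal sketch of that restructuring, assuming the renderer, texture, rect1 and rect2 globals from your code (event handling is omitted):
// Sketch: load the texture once before the loop, then only copy it each frame.
SDL_Surface* surface = IMG_Load("s1.png");
if (surface != 0) {
    texture = SDL_CreateTextureFromSurface(renderer, surface);
    SDL_FreeSurface(surface);
}
bool running = true;
while (running) {
    /* ... handle events, call update() to advance rect1.x ... */
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, &rect1, &rect2);   // reuse the one texture
    SDL_RenderPresent(renderer);
}
SDL_DestroyTexture(texture);   // clean up once, after the loop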
This line: int( (SDL_GetTicks() / 100) % 12 )
SDL_GetTicks() returns the number of milliseconds that have elapsed since the library initialized (https://wiki.libsdl.org/SDL_GetTicks). So you're updating with the TOTAL AMOUNT OF TIME since your application started, not the time since the last frame.
You're supposed to keep count of the last time and update the application with how much time has passed since the last update.
Uint32 currentTime=SDL_GetTicks();
int deltaTime = (int)( currentTime-lastTime );
lastTime=currentTime; //declared previously
update( deltaTime );
shark();
render();
Edit: Benjamin is right, the update line works fine.
Still, using deltaTime is good advice. In a game, for instance, you won't use the total time since the beginning of the application; you'll probably need to keep your own counter of how much time has passed (since you started an animation).
But there's nothing wrong with that line for your program anyhow.

MFT Frame Extraction in C++

I have a working solution on GitHub that extracts frames from a video in C++, but it is very slow. What I am doing is using a timer, playing the video, and whenever a frame is ready I convert it into a bitmap, save it, and seek to the next position. I don't think this is the right approach; there must be another way of pulling out frames. Please go through the GitHub project and suggest any changes.
The following is my timer function:
if (m_spMediaEngine != nullptr)
{
LONGLONG pts;
if (m_spMediaEngine->OnVideoStreamTick(&pts) == S_OK)
{
// new frame available at the media engine so get it
ComPtr<ID3D11Texture2D> spTextureDst;
MEDIA::ThrowIfFailed(
m_d3dDevice->CreateTexture2D(
&CD3D11_TEXTURE2D_DESC(
DXGI_FORMAT_B8G8R8A8_UNORM,
m_rcTarget.right, // Width
m_rcTarget.bottom, // Height
1, // MipLevels
1, // ArraySize
D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET
),
nullptr,
&spTextureDst
)
);
if (FAILED(
m_spMediaEngine->TransferVideoFrame(spTextureDst.Get(), nullptr, &m_rcTarget, &m_bkgColor)
))
{
return;
}
Position = Position + interval;
SetPlaybackPosition(Position);
ComPtr<IDXGISurface2> surface;
MEDIA::ThrowIfFailed(
spTextureDst.Get()->QueryInterface(
__uuidof(IDXGISurface2), &surface)
);
D2D1_BITMAP_PROPERTIES1 bitmapProperties =
D2D1::BitmapProperties1(
D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED),
96,
96
);
m_d2dContext->CreateBitmapFromDxgiSurface(surface.Get(), &bitmapProperties, &bitmap);
SaveBitmapToFile();
}
}
My Question is : Is this the right and only way of extracting frames ?
I would do something along the lines of converting each frame to a hash and storing that, which should speed things up. For example, you could create a class to hold a specific instance of a hash (or create a linked list of those hashes), and then extract a frame and hash it using dHash: http://www.hackerfactor.com/blog/index.php?/archives/2013/01/21.html
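Purely as an illustration of the dHash idea (my own sketch, not from the linked article; it assumes you have already downscaled a frame to a 9x8 grayscale buffer elsewhere):
// Sketch: 64-bit dHash over a 9x8 grayscale buffer (row-major, one byte per pixel).
#include <cstdint>
uint64_t DHash9x8(const uint8_t* gray /* 9 * 8 = 72 bytes */)
{
    uint64_t hash = 0;
    for (int row = 0; row < 8; ++row)
        for (int col = 0; col < 8; ++col)
        {
            // set a bit when a pixel is brighter than its right-hand neighbour
            bool brighter = gray[row * 9 + col] > gray[row * 9 + col + 1];
            hash = (hash << 1) | (brighter ? 1u : 0u);
        }
    return hash;
}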

Display different images per monitor in DirectX 10

I am fairly new to DirectX 10 programming, and I have been trying to do the following with my limited skills (though I have a strong background in OpenGL).
I am trying to display two different textured quads, one per monitor. To do so, I understood that I need a single D3D10 device, multiple (2) swap chains, and a single vertex buffer.
While I think I'm able to create all of those, I'm still pretty unsure how to handle all of them. Do I need multiple ID3D10RenderTargetView(s)? How and where should I use OMSetRenderTargets(...)?
Other than MSDN, documentation and explanation of these concepts are rather limited, so any help would be very welcome. Here is some code I have:
Here's the rendering code
for(int i = 0; i < screenNumber; i++){
//clear scene
pD3DDevice->ClearRenderTargetView( pRenderTargetView, D3DXCOLOR(0,1,0,0) );
//fill vertex buffer with vertices
UINT numVertices = 4;
vertex* v = NULL;
//lock vertex buffer for CPU use
pVertexBuffer->Map(D3D10_MAP_WRITE_DISCARD, 0, (void**) &v );
v[0] = vertex( D3DXVECTOR3(-1,-1,0), D3DXVECTOR4(1,0,0,1), D3DXVECTOR2(0.0f, 1.0f) );
v[1] = vertex( D3DXVECTOR3(-1,1,0), D3DXVECTOR4(0,1,0,1), D3DXVECTOR2(0.0f, 0.0f) );
v[2] = vertex( D3DXVECTOR3(1,-1,0), D3DXVECTOR4(0,0,1,1), D3DXVECTOR2(1.0f, 1.0f) );
v[3] = vertex( D3DXVECTOR3(1,1,0), D3DXVECTOR4(1,1,0,1), D3DXVECTOR2(1.0f, 0.0f) );
pVertexBuffer->Unmap();
// Set primitive topology
pD3DDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP );
//set texture
pTextureSR->SetResource( textureSRV[textureIndex] );
//get technique desc
D3D10_TECHNIQUE_DESC techDesc;
pBasicTechnique->GetDesc( &techDesc );
// This is where you actually use the shader code
for( UINT p = 0; p < techDesc.Passes; ++p )
{
//apply technique
pBasicTechnique->GetPassByIndex( p )->Apply( 0 );
//draw
pD3DDevice->Draw( numVertices, 0 );
}
//flip buffers
pSwapChain[i]->Present(0,0);
}
And here's the code for creating rendering targets, which I am not sure is good
for(int i = 0; i < screenNumber; ++i){
//try to get the back buffer
ID3D10Texture2D* pBackBuffer;
if ( FAILED( pSwapChain[i]->GetBuffer(0, __uuidof(ID3D10Texture2D), (LPVOID*) &pBackBuffer) ) ) return fatalError("Could not get back buffer");
//try to create render target view
if ( FAILED( pD3DDevice->CreateRenderTargetView(pBackBuffer, NULL, &pRenderTargetView) ) ) return fatalError("Could not create render target view");
pBackBuffer->Release();
pD3DDevice->OMSetRenderTargets(1, &pRenderTargetView, NULL);
}
return true;
}
I hope I got the gist of what you wish to do - render different content on two different monitors while using a single graphics card (graphics adapter) which maps its output to those monitors. For that, you're going to need one device (for the single graphics card/adapter) and enumerate just how many outputs there are at the user's machine.
So, in total - that means one device, two outputs, two windows and therefore - two swap chains.
A little introduction
With DirectX 10+, this falls into the DXGI (DirectX Graphics Infrastructure) which manages the common low-level logistics involved with DirectX 10+ development which, as you probably know, dumped the old requirement of enumerating feature sets and the like - requiring every DX10+ capable card to share in on all of the features defined by the API. The only thing that varies is the extent and capability of the card (in other words, lousy performance is preferable to the app crashing and burning). This was all within DirectX 9 in the past, but people at Microsoft decided to pull it out and call it DXGI. Now, we can use DXGI functionality to set up our multi monitor environment.
Do I need multiple ID3D10RenderTargetView(s) ?
Yes, you do need multiple render target views; the count depends (like the swap chains and windows) on the number of monitors you have. But rather than spewing more words, let's write it out as simply as possible, with additional information where it's needed:
Enumerate all adapters available on the system.
For each adapter, enumerate all outputs available (and active) and create a device to accompany it.
With the enumerated data stored in a suitable structure (think arrays which can quickly relinquish size information), use it to create n windows, swap chains, render target views, depth/stencil textures and their respective views where n is equal to the number of outputs.
With everything created, for each window you are rendering into, you can define special routines which will use the available geometry (and other) data to output your results - which resolves to what each monitor gets in fullscreen (don't forget to adjust the viewport for every window accordingly).
Present your data by iterating over every swap chain which is linked to its respective window and swap buffers with Present()
Now, while this is rich in word count, some code is worth a lot more. This is designed to give you a coarse idea of what goes into implementing a simple multi-monitor application. So, the assumptions are that there is only one adapter (a rather bold statement nowadays) and multiple outputs, with no failsafes. I'll leave the fun part to you. The answer to the second question is further down...
Do note there's no memory management involved. We assume everything magically gets cleaned up when it is not needed for illustration purposes. Be a good memory citizen.
Getting the adapter
IDXGIAdapter* adapter = NULL;
void GetAdapter() // applicable for multiple ones with little effort
{
// remember, we assume there's only one adapter (example purposes)
for( int i = 0; DXGI_ERROR_NOT_FOUND != factory->EnumAdapters( i, &adapter ); ++i )
{
// get the description of the adapter, assuming no failure
DXGI_ADAPTER_DESC adapterDesc;
HRESULT hr = adapter->GetDesc( &adapterDesc );
// Getting the outputs active on our adapter
EnumOutputsOnAdapter();
}
}
Acquiring the outputs on our adapter
std::vector<IDXGIOutput*> outputArray; // contains outputs per adapter
void EnumOutputsOnAdapter()
{
IDXGIOutput* output = NULL;
for(int i = 0; DXGI_ERROR_NOT_FOUND != adapter->EnumOutputs(i, &output); ++i)
{
// get the description
DXGI_OUTPUT_DESC outputDesc;
HRESULT hr = output->GetDesc( &outputDesc );
outputArray.push_back( output );
}
}
Now, I must assume that you're at least aware of the Win32 API considerations, creating window classes, registering with the system, creating windows, etc... Therefore, I will not qualify its creation, only elaborate how it pertains to multiple windows. Also, I will only consider the fullscreen case here, but creating it in windowed mode is more than possible and rather trivial.
Creating the actual windows for our outputs
Since we assume existence of just one adapter, we only consider the enumerated outputs linked to that particular adapter. It would be preferable to organize all window data in neat little structures, but for the purposes of this answer, we'll just shove them into a simple struct and then into yet another std::vector object, and by them I mean handles to respective windows (HWND) and their size (although for our case it's constant).
But still, we have to address the fact that we have one swap chain, one render target view, one depth/stencil view per window. So, why not feed all of that in that little struct which describes each of our windows? Makes sense, right?
struct WindowDataContainer
{
//Direct3D 10 stuff per window data
IDXGISwapChain* swapChain;
ID3D10RenderTargetView* renderTargetView;
ID3D10DepthStencilView* depthStencilView;
// window goodies
HWND hWnd;
int width;
int height;
};
Nice. Well, not really. But still... Moving on! Now to create the windows for outputs:
std::vector<WindowDataContainer*> windowsArray;
void CreateWindowsForOutputs()
{
for( int i = 0; i < outputArray.size(); ++i )
{
IDXGIOutput* output = outputArray.at(i);
DXGI_OUTPUT_DESC outputDesc;
output->GetDesc( &outputDesc );
int x = outputDesc.DesktopCoordinates.left;
int y = outputDesc.DesktopCoordinates.top;
int width = outputDesc.DesktopCoordinates.right - x;
int height = outputDesc.DesktopCoordinates.bottom - y;
// Don't forget to clean this up. And all D3D COM objects.
WindowDataContainer* window = new WindowDataContainer;
window->hWnd = CreateWindow( windowClassName,
windowName,
WS_POPUP,
x,
y,
width,
height,
NULL,
0,
instance,
NULL );
// show the window
ShowWindow( window->hWnd, SW_SHOWDEFAULT );
// set width and height
window->width = width;
window->height = height;
// shove it in the std::vector
windowsArray.push_back( window );
//if first window, associate it with DXGI so it can jump in
// when there is something of interest in the message queue
// think fullscreen mode switches etc. MSDN for more info.
if(i == 0)
factory->MakeWindowAssociation( window->hWnd, 0 );
}
}
Cute, now that's done. Since we only have one adapter and therefore only one device to accompany it, create it as usual. In my case, it's simply a global interface pointer which can be accessed all over the place. We are not going for code of the year here, so why the hell not, eh?
Creating the swap chains, views and the depth/stencil 2D texture
Now, our friendly swap chains... You might be used to creating them by invoking the "naked" function D3D10CreateDeviceAndSwapChain(...), but as you know, we've already made our device. We only want one. And multiple swap chains. Well, that's a pickle. Luckily, our DXGIFactory interface has swap chains on its production line, which we can receive for free with complementary kegs of rum. Onto the swap chains then; create one for every window:
void CreateSwapChainsAndViews()
{
for( int i = 0; i < windowsArray.size(); i++ )
{
WindowDataContainer* window = windowsArray.at(i);
// get the dxgi device
IDXGIDevice* DXGIDevice = NULL;
device->QueryInterface( IID_IDXGIDevice, ( void** )&DXGIDevice ); // COM stuff, hopefully you are familiar
// create a swap chain
DXGI_SWAP_CHAIN_DESC swapChainDesc;
// fill it in
HRESULT hr = factory->CreateSwapChain( DXGIDevice, &swapChainDesc, &window->swapChain );
DXGIDevice->Release();
DXGIDevice = NULL;
// get the backbuffer
ID3D10Texture2D* backBuffer = NULL;
hr = window->swapChain->GetBuffer( 0, IID_ID3D10Texture2D, ( void** )&backBuffer );
// get the backbuffer desc
D3D10_TEXTURE2D_DESC backBufferDesc;
backBuffer->GetDesc( &backBufferDesc );
// create the render target view
D3D10_RENDER_TARGET_VIEW_DESC RTVDesc;
// fill it in
device->CreateRenderTargetView( backBuffer, &RTVDesc, &window->renderTargetView );
backBuffer->Release();
backBuffer = NULL;
// Create depth stencil texture
ID3D10Texture2D* depthStencil = NULL;
D3D10_TEXTURE2D_DESC descDepth;
// fill it in
device->CreateTexture2D( &descDepth, NULL, &depthStencil );
// Create the depth stencil view
D3D10_DEPTH_STENCIL_VIEW_DESC descDSV;
// fill it in
device->CreateDepthStencilView( depthStencil, &descDSV, &window->depthStencilView );
}
}
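The "fill it in" parts above are deliberately left to you. Purely as an assumption-labeled example (format, buffer count and refresh rate here are my guesses, not requirements), one plausible per-window swap chain description could look like this:
// Illustrative only: one way to fill DXGI_SWAP_CHAIN_DESC for a given window.
DXGI_SWAP_CHAIN_DESC swapChainDesc;
ZeroMemory( &swapChainDesc, sizeof( swapChainDesc ) );
swapChainDesc.BufferCount = 1;
swapChainDesc.BufferDesc.Width = window->width;
swapChainDesc.BufferDesc.Height = window->height;
swapChainDesc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swapChainDesc.BufferDesc.RefreshRate.Numerator = 60;      // assumption
swapChainDesc.BufferDesc.RefreshRate.Denominator = 1;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.OutputWindow = window->hWnd;
swapChainDesc.SampleDesc.Count = 1;                       // no MSAA in this sketch
swapChainDesc.SampleDesc.Quality = 0;
swapChainDesc.Windowed = TRUE;                            // go fullscreen later if you prefer
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;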
We now have everything we need. All that you need to do is define a function which iterates over all windows and draws different stuff appropriately.
How and where should I Use OMSetRenderTargets(...) ?
In the just mentioned function which iterates over all windows and uses the appropriate render target (courtesy of our per-window data container):
void MultiRender( )
{
// Clear them all
for( int i = 0; i < windowsArray.size(); i++ )
{
WindowDataContainer* window = windowsArray.at(i);
// There is the answer to your second question:
device->OMSetRenderTargets( 1, &window->renderTargetView, window->depthStencilView );
// Don't forget to adjust the viewport, in fullscreen it's not important...
D3D10_VIEWPORT Viewport;
Viewport.TopLeftX = 0;
Viewport.TopLeftY = 0;
Viewport.Width = window->width;
Viewport.Height = window->height;
Viewport.MinDepth = 0.0f;
Viewport.MaxDepth = 1.0f;
device->RSSetViewports( 1, &Viewport );
// TO DO: AMAZING STUFF PER WINDOW
}
}
Of course, don't forget to run through all the swap chains and swap buffers per window basis. The code here is just for the purposes of this answer, it requires a bit more work, error checking (failsafes) and contemplation to get it working just the way you like it - in other words - it should give you a simplified overview, not a production solution.
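For completeness, a minimal sketch of that per-window present pass, again assuming the windowsArray container from above:
// Sketch: present every window's swap chain after MultiRender() has drawn into it.
void PresentAll()
{
    for( int i = 0; i < windowsArray.size(); i++ )
    {
        WindowDataContainer* window = windowsArray.at( i );
        window->swapChain->Present( 0, 0 );   // no vsync, no flags in this sketch
    }
}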
Good luck and happy coding! Sheesh, this is huge.