parallel_for function causes memory leaks (sometimes) - c++

I'm making a simple native MFC application and I'm using th Concurrency namespace to perform a simple parallel program (drawing a Mandelbrot set). The program is very very basic so far, clicking one button draws in parallel, the other draws in serial. The serial execution function is very basic and draws the right picture. As does the parallel execution function, however when running the debug build and exiting the program, the output tells me there is a memory leak.
Here's the code:
void CMandelbrotView::DrawSetParallel() {
// Get client area dimension which will be the image size
RECT rect;
GetClientRect(&rect);
//GetClientRect(pDC, &rect);
int imageHeight(rect.bottom);
int imageWidth(rect.right);
const double realMin(-2.1); // Minimum real value
double imaginaryMin(-1.3); // Minimum imaginary value
double imaginaryMax(+1.3); // Maximum imaginary value
// Set maximum imaginary so axes are the same scale
double realMax(realMin+(imaginaryMax-imaginaryMin)*imageWidth/imageHeight);
// Get scale factors to convert pixel coordinates
double realScale((realMax-realMin)/(imageWidth-1));
double imaginaryScale((imaginaryMax-imaginaryMin)/(imageHeight-1));
CClientDC hdc(this); // DC is for this view
OnPrepareDC(&hdc); // Get origin adjusted
critical_section cs; // Mutex for BitBlt() operation
parallel_for(0, imageHeight, [&](int y) // Iterate parallel over image rows
{
cs.lock(); // Lock for access to client DC
// Create bitmap for one row of pixels in image
HDC memDC = CreateCompatibleDC(hdc); // Get device context to draw pixels
HBITMAP bmp = CreateCompatibleBitmap(hdc, imageWidth, 1);
cs.unlock(); // We are done with hdc here so unlock
HGDIOBJ oldBmp = SelectObject(memDC, bmp); // Select bitmap into DC
double cReal(0.0), cImaginary(0.0); // Stores c components
double zReal(0.0), zImaginary(0.0); // Stores z components
zImaginary = cImaginary = imaginaryMax - y*imaginaryScale;
for(int x = 0; x < imageWidth; x++) // Iterate over pixels in a row
{
zReal = cReal = realMin + x*realScale;
// Set current pixel color based on n
SetPixel(memDC, x, 0, Color(IteratePoint(zReal, zImaginary, cReal, cImaginary)));
}
cs.lock(); // Lock to write to hdc
// Transfer pixel row to client area device context
BitBlt(hdc, 0, y, imageWidth, 1, memDC, 0, 0, SRCCOPY);
cs.unlock(); // Release the lock
SelectObject(memDC, oldBmp);
DeleteObject(bmp); // Delete bmp
DeleteDC(memDC); // and our working DC
});}
The code for parallel execution is different from the serial execution code that it creates separate rows of the Mandelbrot image in parallel, and uses a critical section lock to make sure the threads don't fight over the same device context handle.
Now the reason I said there is memory leak reported sometimes is because running the release build does not cause a memory leak to be reported. Also, when running parallel execution function multiple times I don't really notice more memory being being used up, I have 6GB of RAM in case anyone is wondering.
As far as the performance goes, my quad-core machine does actually present a roughly 4x increase in calculation+drawing speed from serial execution.
I've also seen similar questions asks on the msdn website, but not of much use, because this may be a VS bug. Anyway, I'd like a parallel programmer's opinion.

This problem is documented in the fix-list for VS2010 SP1. The feedback article is here. Beware that SP1 is still in beta right now so avoid installing it on important production machines. Download is here.

What I would say is, write a quick deleter function for unique_ptr and use that for your resources. The only way I could see this code leaking is by throwing, but I don't see any places that throw. Once you remove that vulnerability, I don't see any leaks at all.

You have mentioned that the leak occurs 'some times' and not always. However, there is no branching in your code (specifically: branching with memory allocations) so either it should always cause the leak or never cause the leak. Moreover, as already mentioned in my comment, Memory is being allocated only at two place for memDc and bmp and you are properly cleaning up the resources for both the variables.
In simple words, I want to say that there is no leak in the above code :)
If there is actually a leak in your application, that would be in some other code path probably.

Related

Win32 C++ effective storage or creation of bitmaps used for colour picker background

I am writing an HSV colour picker in plain win32 c++.
I have a Sat/Val box and a Hue slider exactly like the image on the left here:
Up until now I was just generating the background of the Sat-Val box whenever I needed it.
But now that I have a simple prototype and I am circling around to refactor and clean up I have realized that it actually takes a sizeable amount of time to generate the background bitmap for the sat-val box.
Since scrolling the hue slider should update the sat-val box with the appropriate hue, and it should be responsive and fast, I guess I cannot generate the background on the fly because it's too costly.
I have been using a very simple function like this:
HBITMAP ColorPicker::genSVBackground(uint32_t hue)
{
uint32_t width = 256;
uint32_t height = 256;
HDC hDC = GetDC(hwnd);
HDC memDC = CreateCompatibleDC(hDC);
HBITMAP bitmap = CreateCompatibleBitmap(hDC, width, height);
HGDIOBJ oldObj = SelectObject(memDC, bitmap);
for (uint32_t y = 0; y < height; ++y) {
for (uint32_t x = 0; x < width; ++x) {
RGBColor rgbCol = hsv_to_rgb(HSVColor(hue, x, 255 - y));
COLORREF col = (rgbCol.blue << 16) | (rgbCol.green << 8) | (rgbCol.red);
SetPixel(memDC, x, y, col);
}
}
SelectObject(memDC, oldObj);
DeleteDC(memDC);
return bitmap;
}
So the first question is:
Can I make this faster? Fast enough that I can still generate it on the fly? Should I?
And if I cannot make it faster, or if there's really no point and I might as well just use an external resource instead.
What is the best approach to go about storing this in an external resource?
Should I create one giant array that describes a 'cube' of hue x sat x val (my hue, sat and val system is 0-255 each) so I can just load the entire thing into memory and index certain positions to read out an entire background slice?
I know how to do the specifics of the resource storage/loading I'm just not sure if I'm approaching this problem the right way.
Should I store each slice as a separate resource? 256 of them?
Is there a standard way to solving this kind of thing?
I think the source of your slowness is writing one pixel at a time to the memory DC.
Instead of calling SetPixel 256x256 times in a loop, blast an entire matrix of pixels to the DC at once. At the very least, that's 64K function invocations.
I used to do this kind of buffering with GDI+ all the time. I'd create a Bitmap object and then call the LockBits method on it. Do my rendering directly on the returned pointer, and then UnlockBits. It's been a while since I've done the pure Win32 variations of this, but I'm sure it's possible.

Trying to render web browser control using IViewObject::Draw() into HDC fails with IE8 but succeeds with IE11

I have an MFC dialog window where I added a WebBrowser control (that encapsulates the Internet Explorer engine.)
The goal of the following code is to render the contents of the said web browser control into a device context that I can later use for printing:
//MFC code, error checks are omitted for brevity
//'m_browser' = is a web browser control of type `CExplorer1`
IDispatch* pHtmlDoc = m_browser.get_Document();
CComPtr<IHTMLDocument2> pHtmlDocument2;
pHtmlDoc->QueryInterface(IID_IHTMLDocument2, (void**)&pHtmlDocument2));
//Get IViewObject2 for the entire document that we will use to render into a DC
CComPtr<IViewObject2> pViewObject;
pHtmlDocument2->QueryInterface(IID_IViewObject2, (void **)&pViewObject));
CComPtr<IHTMLElement> pBody;
pHtmlDocument2->get_body(&pBody));
CComPtr<IHTMLElement2> pBody2;
pBody->QueryInterface(IID_IHTMLElement2, (void **)&pBody2));
//Get default printer DC
CPrintDialog pd(TRUE, PD_ALLPAGES | PD_USEDEVMODECOPIES | PD_NOPAGENUMS | PD_HIDEPRINTTOFILE | PD_NOSELECTION);
pd.m_pd.Flags |= PD_RETURNDC | PD_RETURNDEFAULT;
pd.DoModal(); //corrected later
HDC hPrintDC = pd.CreatePrinterDC();
//Calc bitmap width based on printer DC specs
//Note that this width will be larger than
//the width of the WebControl window itself due to
//printer's much higher DPI setting...
int n_bitmapWidth = ::GetDeviceCaps(hPrintDC, HORZRES); //Use entire printable area
//Get full size of the document
long n_scrollWidth;
long n_scrollHeight;
pBody2->get_scrollWidth(&n_scrollWidth);
pBody2->get_scrollHeight(&n_scrollHeight);
//Calc proportional size of the bitmap in the DC to render
int nWidth = n_bitmapWidth;
int nHeight = n_bitmapWidth * n_scrollHeight / n_scrollWidth;
//Create memory DC to render into
HDC hDc = ::GetDC(hWnd);
HDC hCompDc = ::CreateCompatibleDC(hDC);
//I'm using a raw DIB section here as I'll need to access
//its bitmap bits directly later in my code...
BITMAPINFOHEADER infoHeader = {0};
infoHeader.biSize = sizeof(infoHeader);
infoHeader.biWidth = nWidth;
infoHeader.biHeight = -nHeight;
infoHeader.biPlanes = 1;
infoHeader.biBitCount = 24;
infoHeader.biCompression = BI_RGB;
BITMAPINFO info;
info.bmiHeader = infoHeader;
//Create a bitmap as DIB section of size `nWidth` by `nHeight` pixels
BYTE* pMemory = 0;
HBITMAP hBitmap = ::CreateDIBSection(hDc, &info, DIB_RGB_COLORS, (void**)&pMemory, 0, 0);
HBITMAP hOldBmp = (HBITMAP)::SelectObject(hCompDc, hBitmap);
RECT rcAll = {0, 0, nWidth, nHeight};
::FillRect(hCompDc, &rcAll, (HBRUSH)::GetStockObject(WHITE_BRUSH));
RECTL rectPrnt = {0, 0, nWidth, nHeight};
//Do the upscaling & render -- note that IE8 fails to render it here!!!!
pViewObject->Draw(DVASPECT_CONTENT, //DVASPECT_DOCPRINT
-1, NULL, NULL, NULL, hCompDc,
&rectPrnt,
NULL,
NULL, 0));
::SelectObject(hCompDc, hOldBmp);
//Now the bitmap in `hCompDc` contains the resulting pixels
//For debugging purposes, save it as .bmp file
BITMAPFILEHEADER fileHeader = {0};
fileHeader.bfType = 0x4d42;
fileHeader.bfSize = 0;
fileHeader.bfReserved1 = 0;
fileHeader.bfReserved2 = 0;
fileHeader.bfOffBits = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER);
CFile file(
L"path-to\\test.bmp",
CFile::modeCreate | CFile::modeReadWrite | CFile::shareDenyNone);
file.Write((char*)&fileHeader, sizeof(fileHeader));
file.Write((char*)&infoHeader, sizeof(infoHeader));
int bytes = (((24 * nWidth + 31) & (~31)) / 8) * nHeight;
file.Write(pMemory, bytes);
//Clean up
::DeleteObject(hBitmap);
::DeleteDC(hCompDc);
::ReleaseDC(hWnd, hDc);
::DeleteDC(hPrintDC);
This code works fine if I have the latest IE11 installed on my development machine. But if, for instance, someone has IE8 installed on their Windows 7, the IViewObject::Draw method will render only a small part of the document (equal to the size of the web browser control itself.)
The best way to describe it is to illustrate it with the examples:
Normally rendered test page with IE11 installed:
and here's what happens with IE8 installed:
Does anyone have any idea what am I doing wrong here that IE8 doesn't like?
EDIT1: Did some more digging into the IViewObject::Draw function with WinDbg and then found the source code for it. Here's CServer::Draw() that is IViewObject::Draw, and then CDoc::Draw() that is called internally from CServer::Draw().
First, thanks for the interesting question. While not so practical - not a lot of people use IE8 today - it was not so trivial to solve. I'll describe what is the problem and provide a simplistic but working solution that you can improve.
Before I go into IE8 solution, a couple of points:
The solution with window resize to fit scroll size is not stable if you can encounter large documents. If you don't know what to anticipate, you need to refactor the solution for later explorers as well, to avoid relying on resizing window to scroll size.
Why carry giant bitmap? Metafiles etc. may fit better. Any large enough page at this resolution is going to blow memory on PC with the naive DIB creation. Google page in provided sample renders to 100Mb bitmap file while emf, from which rasterization is done, takes less than 1Mb.
While I don't know exact requirements and limitations of your project, I'm 99% sure that drawing into gigantic DIB is not the best solution. Even EMF, while better, is not the best either. If you need, for example, add a signature and then print, there are better ways to handle this. This, of course, just a side note, not related to the question itself.
IE8 rendering problem
In IE8 renderer there is a bug. Draw() will be clipped at pixel dimensions of actual display area (the visible rectangle you see is the original display area in the scale of rendering context).
So, if you have a scaled target that is larger than the actual size, while scaled, it will be clipped to the size in scaled pixels anyway (so it has much less content than original rectangle).
If it doesn't clip for somebody on genuine IE8, then there are remainders of later IE in the system or there is other non-scratch setup, system update or alike.
Workaround possibilities
Good news it is possible to workaround, bad news workarounds are a bit nasty.
First, it is still possible to workaround with IViewObject. But, because there is arbitrary scaling involved and accessible source rectangle is very small, this solution has some complexities that I think are beyong an SO answer. So I would not dive into this path.
Instead, we can render through another, now outdated API: IHTMLElementRender. It allows to render the page with DrawToDC into arbitrary context. Unfortunately, it is not so simple as it may appear and goes beyond just providing device context.
First, there is similar bug of clipping. It can be handled easier because clipping occurs at large values beyond screen dimensions. Second, when using device context transformations it either will not work or will mess the rendered html, so you can't actually rely on scale or translate. Both problems require relatively non-trivial handling and complicate one another.
The solution
I'll describe and provide sample code for non-optimal but working on most simple pages solution. In general, it is possible to achieve a perfect and more efficient solution, but, again, this goes beyond the scope of an answer. Obviously, it is IE8 only, so you'll need to check browser version and execute different handlers for IE8 vs. IE9 or higher, but you can take some ideas to improve other browsers rendering too.
There are two interrelated workarounds here.
Up-scaling
First, how do we upscale the vector content to the printer quality if we can't transform? The workaround here is to render to a context compatible with printer dc. What will happen is that content will be rendered at printer DPI. Note it will not fit exactly printer width, it will scale to printerDPI/screenDPI.
Later, on rasterization, we downscale to fit the printer width. We render initially to EMF, so there is no much of a quality loss (that will occur on the printer itself anyway). If you need higher quality (I doubt it), there are two possibilities - modify the solution to render for target width (this is not trivial) or work with resulting emf instead of bitmap and let printer to make the downscale fit.
Another note is that you currently use just printer width, but there may be non-printable margins that you need to query from printer and account for them. So it may re-scaled by printer even if you provide bitmap in exact printer dimensions. But again, I doubt this resolution disparity will make any difference for your project.
Clipping
Second to overcome is the clipping. To overcome this limitation, we render content in small chunks so they are not clipped by the renderer. After rendering a chunk, we change the scroll position of the document and render next chunk to the appropriate position in the target DC. This can be optimized to use larger chunks e.g. nearest DPI multiple to 1024 (using window resize), but I didn't implement that (it is just a speed optimization). If you don't make this optimization, ensure that minimum browser window size is not too small.
Note, doing this scroll on an arbitrary fractional scale will be an approximation and is not so simple to implement in generic case. But with regular printer and screen we can make integer chunk steps in multiplies of DPI, e.g. if screen is 96 DPI and printer is 600DPI, we make steps in the same multiple of 96 and 600 on each context and everything is much simpler. However, the remainder from top or bottom after processing all whole chunks will not be in DPI multiplies so we can't scroll as easily there.
In general, we can approximate scroll position in printer space and hope there will be no misfit between final chunks. What I did instead is appending an absolutely positioned div with chunk size at the right bottom of the page.
Note, this can interfere with some pages and change the layout (probably not the case with simple reports). If that is a problem, you'll need to add remainder handling after loops instead of adding an element. Most simple solution in that case is still to pad with div but not with the full chunk size but just to make content width multiple of screen DPI.
Even simpler idea, as I've realized later, is just to resize window to the nearest multiple of DPI and take this window size as a chunk size. You can try that instead of div, this will simplify the code and fix pages that may interfere with the injected div.
The code
This is just a sample.
No error handling. You need to add checks for every COM and API call, etc.
No code style, just quick and dirty.
Not sure all acquired resources are released as needed, do your checks
You must disable page borders on browser control for this sample to work (if you need borders around browser, just render them separately, built-in are not consistent anyway). On IE8 this is not so trivial but there are many answers here or on the web. Anyway, I will include this patch in sample solution project. You can render with borders as well and exclude them, but this will be unnecessary complication for a problem that has simple solution.
The full solution project can be found at this link, I'll post only the relevant code here.
The code below renders the page and saves in c:\temp\test.emf + c:\temp\test.bmp
void convertEmfToBitmap(const RECT& fitRect, HDC hTargetDC, HENHMETAFILE hMetafile, LPCTSTR fileName);
CComPtr<IHTMLDOMNode> appendPadElement(IHTMLDocument2* pDoc, IHTMLElement* pBody, long left, long top, long width, long height);
void removeElement(IHTMLElement* pParent, IHTMLDOMNode* pChild);
void CMFCApplication1Dlg::OnBnClickedButton2()
{
COleVariant varNull;
COleVariant varUrl = L"http://www.google.com/search?q=ie+8+must+die";
m_browser.Navigate2(varUrl, varNull, varNull, varNull, varNull);
}
void CMFCApplication1Dlg::OnBnClickedButton1()
{
//get html interfaces
IDispatch* pHtmlDoc = m_browser.get_Document();
CComPtr<IHTMLDocument2> pHtmlDocument2;
pHtmlDoc->QueryInterface(IID_IHTMLDocument2, (void**)&pHtmlDocument2);
CComPtr<IHTMLElement> pBody;
pHtmlDocument2->get_body(&pBody);
CComPtr<IHTMLElement2> pBody2;
pBody->QueryInterface(IID_IHTMLElement2, (void**)&pBody2);
CComPtr<IHTMLBodyElement> pBodyElement;
pBody->QueryInterface(IID_IHTMLBodyElement, (void**)&pBodyElement);
CComPtr<IHTMLElement> pHtml;
pBody->get_parentElement(&pHtml);
CComPtr<IHTMLElement2> pHtml2;
pHtml->QueryInterface(IID_IHTMLElement2, (void**)&pHtml2);
CComPtr<IHTMLStyle> pHtmlStyle;
pHtml->get_style(&pHtmlStyle);
CComPtr<IHTMLStyle> pBodyStyle;
pBody->get_style(&pBodyStyle);
//get screen info
HDC hWndDc = ::GetDC(m_hWnd);
const int wndLogPx = GetDeviceCaps(hWndDc, LOGPIXELSX);
const int wndLogPy = GetDeviceCaps(hWndDc, LOGPIXELSY);
//keep current values
SIZE keptBrowserSize = { m_browser.get_Width(), m_browser.get_Height() };
SIZE keptScrollPos;
//set reasonable viewport size
//m_browser.put_Width(docSize.cx);
//m_browser.put_Height(docSize.cy*2);
pHtml2->get_scrollLeft(&keptScrollPos.cx);
pHtml2->get_scrollTop(&keptScrollPos.cy);
COleVariant keptOverflow;
pBodyStyle->get_overflow(&keptOverflow.bstrVal);
//setup style and hide scroll bars
pHtmlStyle->put_border(L"0px;");
pHtmlStyle->put_overflow(L"hidden");
pBodyStyle->put_border(L"0px;");
pBodyStyle->put_overflow(L"hidden");
//get document size and visible area in screen pixels
SIZE docSize;
pBody2->get_scrollWidth(&docSize.cx);
pBody2->get_scrollHeight(&docSize.cy);
RECT clientRect = { 0 };
pHtml2->get_clientWidth(&clientRect.right);
pHtml2->get_clientHeight(&clientRect.bottom);
//derive chunk size
const SIZE clientChunkSize = {
clientRect.right - clientRect.right % wndLogPx,
clientRect.bottom - clientRect.bottom % wndLogPy };
//pad with absolutely positioned element to have enough scroll area for all chunks
//alternatively, browser can be resized to chunk multiplies (simplest), to DPI multiplies (more work).
//This pad also can be made smaller, to modulus DPI, but then need more work in the loops below
CComPtr<IHTMLDOMNode> pPadNode =
appendPadElement(pHtmlDocument2, pBody, docSize.cx, docSize.cy, clientChunkSize.cx, clientChunkSize.cy);
//get printer info
CPrintDialog pd(TRUE, PD_ALLPAGES | PD_USEDEVMODECOPIES | PD_NOPAGENUMS | PD_HIDEPRINTTOFILE | PD_NOSELECTION);
pd.m_pd.Flags |= PD_RETURNDC | PD_RETURNDEFAULT;
pd.DoModal();
HDC hPrintDC = pd.CreatePrinterDC();
const int printLogPx = GetDeviceCaps(hPrintDC, LOGPIXELSX);
const int printLogPy = GetDeviceCaps(hPrintDC, LOGPIXELSY);
const int printHorRes = ::GetDeviceCaps(hPrintDC, HORZRES);
const SIZE printChunkSize = { printLogPx * clientChunkSize.cx / wndLogPx, printLogPy * clientChunkSize.cy / wndLogPy };
//browser total unscaled print area in printer pixel space
const RECT printRectPx = { 0, 0, docSize.cx* printLogPx / wndLogPx, docSize.cy*printLogPy / wndLogPy };
//unscaled target EMF size in 0.01 mm with printer resolution
const RECT outRect001Mm = { 0, 0, 2540 * docSize.cx / wndLogPx, 2540 * docSize.cy / wndLogPy };
HDC hMetaDC = CreateEnhMetaFile(hPrintDC, L"c:\\temp\\test.emf", &outRect001Mm, NULL);
::FillRect(hMetaDC, &printRectPx, (HBRUSH)::GetStockObject(BLACK_BRUSH));
//unscaled chunk EMF size in pixels with printer resolution
const RECT chunkRectPx = { 0, 0, printChunkSize.cx, printChunkSize.cy };
//unscaled chunk EMF size in 0.01 mm with printer resolution
const RECT chunkRect001Mm = { 0, 0, 2540 * clientChunkSize.cx / wndLogPx, 2540 * clientChunkSize.cy / wndLogPy };
////////
//render page content to metafile by small chunks
//get renderer interface
CComPtr<IHTMLElementRender> pRender;
pHtml->QueryInterface(IID_IHTMLElementRender, (void**)&pRender);
COleVariant printName = L"EMF";
pRender->SetDocumentPrinter(printName.bstrVal, hMetaDC);
//current positions and target area
RECT chunkDestRectPx = { 0, 0, printChunkSize.cx, printChunkSize.cy };
POINT clientPos = { 0, 0 };
POINT printPos = { 0, 0 };
//loop over chunks left to right top to bottom until scroll area is completely covered
const SIZE lastScroll = { docSize.cx, docSize.cy};
while (clientPos.y < lastScroll.cy)
{
while (clientPos.x < lastScroll.cx)
{
//update horizontal scroll position and set target area
pHtml2->put_scrollLeft(clientPos.x);
chunkDestRectPx.left = printPos.x;
chunkDestRectPx.right = printPos.x + printChunkSize.cx;
//render to new emf, can be optimized to avoid recreation
HDC hChunkDC = CreateEnhMetaFile(hPrintDC, NULL, &chunkRect001Mm, NULL);
::FillRect(hChunkDC, &chunkRectPx, (HBRUSH)::GetStockObject(WHITE_BRUSH));
pRender->DrawToDC(hChunkDC);
HENHMETAFILE hChunkMetafile = CloseEnhMetaFile(hChunkDC);
//copy chunk to the main metafile
PlayEnhMetaFile(hMetaDC, hChunkMetafile, &chunkDestRectPx);
DeleteEnhMetaFile(hChunkMetafile);
//update horizontal positions
clientPos.x += clientChunkSize.cx;
printPos.x += printChunkSize.cx;
}
//reset horizontal positions
clientPos.x = 0;
printPos.x = 0;
//update vertical positions
clientPos.y += clientChunkSize.cy;
printPos.y += printChunkSize.cy;
pHtml2->put_scrollTop(clientPos.y);
chunkDestRectPx.top = printPos.y;
chunkDestRectPx.bottom = printPos.y + printChunkSize.cy;
}
//restore changed values on browser
//if for large pages on slow PC you get content scrolling during rendering and it is a problem,
//you can either hide the browser and show "working" or place on top first chunk content
pBodyStyle->put_overflow(keptOverflow.bstrVal);
pHtml2->put_scrollLeft(keptScrollPos.cx);
pHtml2->put_scrollTop(keptScrollPos.cy);
m_browser.put_Width(keptBrowserSize.cx);
m_browser.put_Height(keptBrowserSize.cy);
removeElement(pBody, pPadNode);
//draw to bitmap and close metafile
HENHMETAFILE hMetafile = CloseEnhMetaFile(hMetaDC);
RECT fitRect = { 0, 0, printHorRes, docSize.cy * printHorRes / docSize.cx };
convertEmfToBitmap(fitRect, hWndDc, hMetafile, L"c:\\temp\\test.bmp");
DeleteEnhMetaFile(hMetafile);
//cleanup - probably more here
::ReleaseDC(m_hWnd, hWndDc);
::DeleteDC(hPrintDC);
//{
// std::stringstream ss;
// ss << "====" << docSize.cx << "x" << docSize.cy << " -> " << fitRect.right << "x" << fitRect.bottom << "" << "\n";
// OutputDebugStringA(ss.str().c_str());
//}
}
///////////////
////some util
void convertEmfToBitmap(const RECT& fitRect, HDC hTargetDC, HENHMETAFILE hMetafile, LPCTSTR fileName)
{
//Create memory DC to render into
HDC hCompDc = ::CreateCompatibleDC(hTargetDC);
//NOTE this
BITMAPINFOHEADER infoHeader = { 0 };
infoHeader.biSize = sizeof(infoHeader);
infoHeader.biWidth = fitRect.right;
infoHeader.biHeight = -fitRect.bottom;
infoHeader.biPlanes = 1;
infoHeader.biBitCount = 24;
infoHeader.biCompression = BI_RGB;
BITMAPINFO info;
info.bmiHeader = infoHeader;
//create bitmap
BYTE* pMemory = 0;
HBITMAP hBitmap = ::CreateDIBSection(hCompDc, &info, DIB_RGB_COLORS, (void**)&pMemory, 0, 0);
HBITMAP hOldBmp = (HBITMAP)::SelectObject(hCompDc, hBitmap);
PlayEnhMetaFile(hCompDc, hMetafile, &fitRect);
BITMAPFILEHEADER fileHeader = { 0 };
fileHeader.bfType = 0x4d42;
fileHeader.bfSize = 0;
fileHeader.bfReserved1 = 0;
fileHeader.bfReserved2 = 0;
fileHeader.bfOffBits = sizeof(BITMAPFILEHEADER)+sizeof(BITMAPINFOHEADER);
CFile file(
fileName,
CFile::modeCreate | CFile::modeReadWrite | CFile::shareDenyNone);
file.Write((char*)&fileHeader, sizeof(fileHeader));
file.Write((char*)&infoHeader, sizeof(infoHeader));
int bytes = (((24 * infoHeader.biWidth + 31) & (~31)) / 8) * abs(infoHeader.biHeight);
file.Write(pMemory, bytes);
::SelectObject(hCompDc, hOldBmp);
//Clean up
if (hBitmap)
::DeleteObject(hBitmap);
::DeleteDC(hCompDc);
}
CComPtr<IHTMLDOMNode> appendPadElement(IHTMLDocument2* pDoc, IHTMLElement* pBody, long left, long top, long width, long height)
{
CComPtr<IHTMLElement> pPadElement;
pDoc->createElement(L"DIV", &pPadElement);
CComPtr<IHTMLStyle> pPadStyle;
pPadElement->get_style(&pPadStyle);
CComPtr<IHTMLStyle2> pPadStyle2;
pPadStyle->QueryInterface(IID_IHTMLStyle2, (void**)&pPadStyle2);
pPadStyle2->put_position(L"absolute");
CComVariant value = width;
pPadStyle->put_width(value);
value = height;
pPadStyle->put_height(value);
pPadStyle->put_posLeft((float)left);
pPadStyle->put_posTop((float)top);
CComPtr<IHTMLDOMNode> pPadNode;
pPadElement->QueryInterface(IID_IHTMLDOMNode, (void**)&pPadNode);
CComPtr<IHTMLDOMNode> pBodyNode;
pBody->QueryInterface(IID_IHTMLDOMNode, (void **)&pBodyNode);
pBodyNode->appendChild(pPadNode, NULL);
return pPadNode;
}
void removeElement(IHTMLElement* pParent, IHTMLDOMNode* pChild)
{
CComPtr<IHTMLDOMNode> pNode;
pParent->QueryInterface(IID_IHTMLDOMNode, (void **)&pNode);
pNode->removeChild(pChild, NULL);
}
Sample page output (4958x7656)
I have taken your code and run it on IE11 when the WebBrowser control is smaller then the page size. It rendered a portion of the page equal to control's size. Not sure why you say IE8 and IE11 are any different.
It seems that common approach to taking full page screenshots is adjusting WebBrowser size before taking screenshot, like this:
const long oldH = m_browser.get_Height();
const long oldW = m_browser.get_Width();
m_browser.put_Height(n_scrollHeight);
m_browser.put_Width(n_scrollWidth);
//Do the upscaling & render -- note that IE8 fails to render it here!!!!
pViewObject->Draw(DVASPECT_CONTENT, //DVASPECT_DOCPRINT
-1, NULL, NULL, NULL, hCompDc,
&rectPrnt,
NULL,
NULL, 0);
m_browser.put_Height(oldH);
m_browser.put_Width(oldW);
This seems to work well, even on large pages such as the one you're currently reading (I have taken screenshot 1920x8477). It works both on my IE11 and on IE8 virtual machine
It has a side effect of resetting scrollbars, but that can be solved, for example by using an invisible copy of WebBrowser for screenshots.
PS: You could have done a better job by providing example code that can be compiled, at least ;)

C++ faster execution [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I am using a while loop to execute some algorithms and what not, but it's not very fast, how would I allow my program to use more RAM? (Which I assumes is what limits it) It's currently sitting steadily at 504kB.
I am using
C::B 13.12
Windows 7 64bit
mingw32-g++.exe (I don't think I need the 64bit version unless I want to go over 4GB ram right?)
I apologize if this question has been asked and answered before, but I can't seem to find it if it has.
Edit: So this will scan 100 pixels, what is causing this to take 2.2 seconds?
#include <windows.h>
#include <iostream>
#include <wingdi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
using namespace std;
void scan(HDC dc, int x, int y, int &r, int &g, int &b) {
COLORREF color = GetPixel(dc, x, y);
r = GetRValue(color);
g = GetGValue(color);
b = GetBValue(color);
}
int main() {
HDC dc = GetDC(NULL);
int r,g,b;
for(int i = 0; i < 100; i++) {
scan(dc,100,100 + i,r,g,b);
}
ReleaseDC(NULL, dc);
return 0;
}
2:nd edit: Is it possible to reduce this time without editing the code? I mean, it has to be limited by either my RAM or my CPU right?
Your program isn't limited to that small amount of memory. Since it's most likely compiled as a 32 bit application, it will be able to allocate up to 2 GB of ram by default.
So, no, memory is most likely not your issue, unless you constantly request and free it again (but even then it really depends on your code).
If your program is too slow, you might be able to use parallelization to get faster processing, but once again this really depends on your actual code.
You might also be able to use templates and instanziation to optimize your algorithm(s) at compile time, but yet again without knowing the code... no.
Since the edit:
The bottleneck is - as mentioned already - the repeated calls to GetPixel() which in itself is rather expensive, since there's work to be done that isn't cached etc.
Instead, you should copy the window contents to your own memory area and read the pixels (as bytes) directly.
You can follow this MSDN example. They're writing the bitmap contents/pixels to a file, but you'll essentially want to do the same, just read that data directly. Just look for the use of the variable lpbitmap to find the related lines.
In short you'll want to create a screenshot with BitBlt() to a bitmap and then copy those pixels utilizing GetDIBits():
// memory dc for the window
hdcMemDC = CreateCompatibleDC(hdcWindow);
// bitmap for the screenshot
hbmScreen = CreateCompatibleBitmap(hdcWindow, rcClient.right-rcClient.left, rcClient.bottom-rcClient.top);
// connect both
SelectObject(hdcMemDC,hbmScreen);
// copy the window contents
BitBlt(hdcMemDC, 0,0, rcClient.right-rcClient.left, rcClient.bottom-rcClient.top, hdcWindow, 0, 0, SRCCOPY);
// get the bitmap object
GetObject(hbmScreen, sizeof(BITMAP), &bmpScreen);
// access the bitmap
HANDLE hDIB = GlobalAlloc(GHND,dwBmpSize);
// lock the bitmap
char *lpbitmap = (char *)GlobalLock(hDIB);
// copy the pixel data
GetDIBits(hdcWindow, hbmScreen, 0, (UINT)bmpScreen.bmHeight, lpbitmap, (BITMAPINFO *)&bi, DIB_RGB_COLORS);
// now access lpbitmap inside your loop and later on clean up everything
The problem is with GetPixel. It is an extremely slow API. An alternate approach is to copy the screen to a memory buffer and then access the memory buffer.

How to efficiently render a small sprite in Direct3D / C++ on a large Window (DWM)?

I'm implementing a custom cursor in DirectX/C++ that is drawn on a transparent window on top of the desktop.
I have stripped it down to a basic example. The magic of executing Direct3D on the DWM is based on this article on Code Project
The problem is that when using a very big window (e.g. 2560x1440) as a base for the DirectX rendering, it will give up to 40% GPU Load according to GPU-Z. Even if the only thing I am displaying is a static 128x128 sprite, or nothing at all. If I use an area like 256x256, the GPU Load is around 1-3%.
Basically this loop would make the GPU go crazy on a big window while it's smooth sailing on a small window:
while(true) {
g_pD3DDevice->PresentEx(NULL, NULL, NULL, NULL, NULL);
Sleep(10);
}
So it seems like it re-renders the whole screen whether anything changes or not, am I right? Can I tell Direct3D to only re-render specific parts that needs to be updated?
EDIT:
I have found a way to tell Direct3D to render a specific part by providing RGNDATA Dirty region information to PresentEx. It is now 1% GPU Load instead of 20-40%.
std::vector<RECT> dirtyRects;
//Fill dirtyRects with previous and new cursor boundaries
DWORD size = dirtyRects.size() * sizeof(RECT)+sizeof(RGNDATAHEADER);
RGNDATA *rgndata = NULL;
rgndata = (RGNDATA *)HeapAlloc(GetProcessHeap(), 0, size);
RECT* pRectInitial = (RECT*)rgndata->Buffer;
RECT rectBounding = dirtyRects[0];
for (int i = 0; i < dirtyRects.size(); i++)
{
RECT rectCurrent = dirtyRects[i];
rectBounding.left = min(rectBounding.left, rectCurrent.left);
rectBounding.right = max(rectBounding.right, rectCurrent.right);
rectBounding.top = min(rectBounding.top, rectCurrent.top);
rectBounding.bottom = max(rectBounding.bottom, rectCurrent.bottom);
*pRectInitial = dirtyRects[i];
pRectInitial++;
}
//preparing rgndata header
RGNDATAHEADER header;
header.dwSize = sizeof(RGNDATAHEADER);
header.iType = RDH_RECTANGLES;
header.nCount = dirtyRects.size();
header.nRgnSize = dirtyRects.size() * sizeof(RECT);
header.rcBound.left = rectBounding.left;
header.rcBound.top = rectBounding.top;
header.rcBound.right = rectBounding.right;
header.rcBound.bottom = rectBounding.bottom;
rgndata->rdh = header;
// Update display
g_pD3DDevice->PresentEx(NULL, NULL, NULL, rgndata, 0);
But it's something I do not understand. It will only give 1% GPU Load if I add the following
SetLayeredWindowAttributes(hWnd, 0, 180, LWA_ALPHA);
I want it transparent anyway so it's good, but instead I get some weird tearing effects after a while. It is more noticeable the faster I move the cursor. What does that come from? It looks like image provided. I am sure I have set the dirty rects perfectly accurate.
The above tearing seem to differ from computer to computer.

GDI Acceleration In Windows 7 / Drawing To Memory Bitmap

My GDI program runs fine on Windows XP but on Windows Vista and 7 it looks pretty terrible due to the lack of GDI hardware acceleration. I recall reading an article a few years back saying that Windows 7 added hardware acceleration to some GDI functions, including BitBlt() function. Supposedly, if you if you draw to a memory bitmap and then use BitBlt() to copy the image to your main window it runs about the same speed as XP. Is that true?
If it is true, how do you do it? I'm terrible at programming and am having a bit of trouble. I created the below class to to try and get it working:
class CMemBmpTest
{
private:
CDC m_dcDeviceContext;
CBitmap m_bmpDrawSurface;
public:
CMemBmpTest();
~CMemBmpTest();
void Init();
void Draw();
};
CMemBmpTest::CMemBmpTest()
{
}
CMemBmpTest::~CMemBmpTest()
{
m_bmpDrawSurface.DeleteObject();
m_dcDeviceContext.DeleteDC();
}
void CMemBmpTest::Init()
{
m_dcDeviceContext.CreateCompatibleDC(NULL);
m_bmpDrawSurface.CreateCompatibleBitmap(&m_dcDeviceContext, 100, 100);
}
void CMemBmpTest::Draw()
{
m_dcDeviceContext.SelectObject(I.m_brshRedBrush);
m_dcDeviceContext.PatBlt(0, 0, 100, 100, BLACKNESS);
}
In the OnPaint() function of the window I added the line:
pDC->BitBlt(2, 2, 100, 100, &m_MemBmp, 0, 0, SRCCOPY);
I was hoping to see a 100x100 black box in the corner of the window but it didn't work. I'm probably doing everything horrifically wrong, so would be grateful if somebody could advise me as to how to do this correctly.
Thanks for any advice you can offer.
AFAIK you get hardware acceleration on GDI functions on all versions of Windows (I'm happy to stand corrected on this if someone can explain it in more detail). But either way, you're correct that double buffering (which is what you're talking about) provides a massive performance boost (and more importantly no flickering) relative to drawing direct to the screen.
I've done quite a lot of this and have come up with a method to allow you to use GDI and GDI+ at the same time in your drawing functions, but benefit from the hardware acceleration of the BitBlt in drawing to screen. GDI+ isn't hardware accelerated AFAIK but can be very useful in many more complex drawing techniques so it can be useful to have the option of.
So, my basic view class will have the following members :
Graphics *m_gr;
CDC *m_pMemDC;
CBitmap *m_pbmpMemBitmap;
Then the class itself will have code something like this
/*======================================================================================*/
CBaseControlPanel::CBaseControlPanel()
/*======================================================================================*/
{
m_pMemDC = NULL;
m_gr = NULL;
m_pbmpMemBitmap = NULL;
}
/*======================================================================================*/
CBaseControlPanel::~CBaseControlPanel()
/*======================================================================================*/
{
// Clean up all the GDI and GDI+ objects we've used
if(m_pMemDC)
{ delete m_pMemDC; m_pMemDC = NULL; }
if(m_pbmpMemBitmap)
{ delete m_pbmpMemBitmap; m_pbmpMemBitmap = NULL; }
if(m_gr)
{ delete m_gr; m_gr = NULL; }
}
/*======================================================================================*/
void CBaseControlPanel::OnPaint()
/*======================================================================================*/
{
pDC->BitBlt(rcUpdate.left, rcUpdate.top, rcUpdate.Width(), rcUpdate.Height(),
m_pMemDC, rcUpdate.left, rcUpdate.top, SRCCOPY);
}
/*======================================================================================*/
void CBaseControlPanel::vCreateScreenBuffer(const CSize szPanel, CDC *pDesktopDC)
// In :
// szPanel = The size that we want the double buffer bitmap to be
// Out : None
/*======================================================================================*/
{
// Delete anything we're already using first
if(m_pMemDC)
{
delete m_gr;
m_gr = NULL;
delete m_pMemDC;
m_pMemDC = NULL;
delete m_pbmpMemBitmap;
m_pbmpMemBitmap = NULL;
}
// Make a compatible DC
m_pMemDC = new CDC;
m_pMemDC->CreateCompatibleDC(pDesktopDC);
// Create a new bitmap
m_pbmpMemBitmap = new CBitmap;
// Create the new bitmap
m_pbmpMemBitmap->CreateCompatibleBitmap(pDesktopDC, szPanel.cx, szPanel.cy);
m_pbmpMemBitmap->SetBitmapDimension(szPanel.cx, szPanel.cy);
// Select the new bitmap into the memory DC
m_pMemDC->SelectObject(m_pbmpMemBitmap);
// Then create a GDI+ Graphics object
m_gr = Graphics::FromHDC(m_pMemDC->m_hDC);
// And update the bitmap
rcUpdateBitmap(rcNewSize, true);
}
/*======================================================================================*/
CRect CBaseControlPanel::rcUpdateBitmap(const CRect &rcInvalid, const bool bInvalidate, const bool bDrawBackground /*=true*/)
// Redraws an area of the double buffered bitmap
// In :
// rcInvalid - The rect to redraw
// bInvalidate - Whether to refresh to the screen when we're done
// bDrawBackground - Whether to draw the background first (can give speed benefits if we don't need to)
// Out : None
/*======================================================================================*/
{
// The memory bitmap is actually updated here
// Then make the screen update
if(bInvalidate)
{ InvalidateRect(rcInvalid); }
}
So, you can then either just draw direct to the memory DC and call InvalidateRect() or put all your drawing code in rcUpdateBitmap() which was more convenient for the way I was using it. You'll need to call vCreateScreenBuffer() in OnSize().
Hopefully that gives you some ideas anyway. Double buffering is definitely the way to go for speed and non-flickering UI. It can take a little bit of effort to get going but it's definitely worth it.
CMemDC:
http://www.codeproject.com/Articles/33/Flicker-Free-Drawing-In-MFC
http://msdn.microsoft.com/en-us/library/cc308997(v=vs.90).aspx