I use Juce C++ 4.0.2 to build an audio plugin with a relatively heavy GUI. It takes 5s to load the GUI on a DAW like reaper on OsX, but it takes 10 times more on Windows using the same DAW.
I eventually figure out that is it due to the Typeface::createSystemTypefaceFor function that takes 100ms on Windows. It was an issue on my side because I used it many times.
Does anyone face the same issue?
Typeface::createSystemTypefaceFor is not designed to be called frequently; you should call it once for each typeface ideally and cache the results. Calling it frequently will cause a performance hit which varies depending on platform, as you are experiencing.
I have just recently installed Windows 8, and I tried to compile and build a simple c++ game project in VS 2010, but when I did, it was running at 5 fps. On windows 7, it runs at a solid 60 fps. Nothing has been changed in the code, but there is just horrible slow down.
I have updated my video drivers, but there is still horrible lag. I thought the problem was to do with compatibility issues with windows 8 and OpenGL, but I can't find anything to confirm this. I was wondering if anyone else has had this problem, and if you have solved it.
I would recommend you test your graphics card / drivers first. All sorts of driver issues could arise when you upgrade operating systems. One of the best tests would be to download Cinebench and see how it performs. Cinebench will evaluate your OpenGL performance. If you get poor results, then you know it's a hardware / driver issue and not an issue with your application.
If the Cinebench results are good, then you can move on to the recommendations made by #Robert Rouhani (comments).
http://www.maxon.net/products/cinebench/overview.html
What sort of video card do you have in the Win8 machine?
If it's a laptop you might be battling against nVidia Optimus (or an equivalent technology?). Basically programs have to tell the OS in advance that they want to use the video card or they get defaulted to using the low power GPU embedded in the CPU (note: over-simplification).
If this is the case, there's some options in the nVidia control panel to let you create a profile telling the OS to run your app with the discrete GPU, rather than the embedded one.
I have created 16 Direct3D devices with size approximately 320x200 pixels. I invoke IDirect3DDevice9::Present for each device in a separate thread every 40 ms. On laptops with Windows XP and integrated Intel GMA945 graphics part of devices is not updated if system tooltip or Start menu are shown. IDirect3DDevice9::Present doesn't return any error codes at that moment, in program everything looks fine, but user can see that move on several of devices freezes. What could be a reason for that?
This works fine on Windows 7 with the same hardware and on Windows XP with different hardware, so the problem only with this combination. I should support this since my customers are use this combination of the hardware and OS. MSDN says nothing about that I should create only one D3D device (at least I can't find it) so problem should be elsewhere.
What I'm trying to find is that possibly there's some combination of flags that could solve my problem. At the moment I use the following:
D3DPRESENT_PARAMETERS param = {};
param.Windowed = TRUE;
param.SwapEffect = D3DSWAPEFFECT_DISCARD;
param.hDeviceWindow = GetSafeHwnd();
param.BackBufferCount = 1;
param.BackBufferFormat = D3DFMT_UNKNOWN;
param.BackBufferWidth = m_szDevice.Width;
param.BackBufferHeight = m_szDevice.Height;
param.MultiSampleType = D3DMULTISAMPLE_NONMASKABLE;
param.Flags = D3DPRESENTFLAG_VIDEO;
param.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;
param.MultiSampleType = D3DMULTISAMPLE_NONE;
param.MultiSampleQuality = 0;
Don't do that. The device is supposed to map basically 1-to-1 to a GPU. Create one device, and use it to draw to 16 different windows, in whichever way works for you. (Multiple swap chains is the usual approach, afaik)
Creating 16 devices and trying to get them to render in parallel is just asking for trouble.
D3D is designed around the assumption that only one device will be doing serious rendering at any time.
In theory, the difference should only be a matter of performance, but in your case, trying to run 16 devices in parallel on a crappy Intel GPU, it wouldn't surprise me if it causes rendering errors such as you'er seeing.
I've distributed DirectX software for a couple of years and along the way learnt that Intel graphics chipsets have incredibly crap drivers. Once I even saw a driver revision that couldn't render a quad properly. So when you have a problem with an Intel chipset, if you're on the latest driver version, you pretty much have to accept your solution is going to be "start shotgun hacking things until it works".
Sorry to give you a lame answer, but Intel chipsets are not well engineered at all. They're solely there to get something - anything - on the screen, probably for office worker type use. Beyond "does it do aero glass" Intel probably don't give a hoot what it does or how well it works. An alternative "solution" is to distribute your application anyway, state that Intel chipsets are not supported due to glitches in the hardware/driver support, and contact Intel and see if you can get a fix from them.
And people say OpenGL has bad drivers...
First of all, when you say "doesn't return any error codes at that moment", are you running the D3D9 debug version at max debugging level?
Second, everytime you create a new device and it gains focus, the surfaces of all the existing devices are lost. Are you calling reset on all of them after creation?
Other than that, it's like the other answers state: don't create many devices from a single application. Device creation might start throwing errors after 9 or 10 devices, you are really pushing it with 16. Use a single device with multiple swap chains in stead, see for instance this DirectX 8 tutorial.
Intel graphics chips, particularly the integrated GMA ones, have never been known for their capabilities. They can report caps they don't have and later fail, with or without errors codes (had bug reports of this, supposedly supported shader models later failed to compile). It is possible that you're running into a similar problem with their chips/drivers. Does it work on other hardware or with different drivers?
I assume, from having multiple devices, they are windowed? Have you checked the window handles, or tried explicitly passing the handle/viewport when presenting? Do any of the devices get reset?
It is possible the display drivers are not properly repainting the window after the tooltip or start menu are shown (more likely if its a window under the tooltip/menu). Have you checked the window for focus, made sure it gets painted, etc?
I'm considering writing a new Windows GUI app, where one of the requirements is that the app must be very responsive, quick to load, and have a light memory footprint.
I've used WTL for previous apps I've built with this type of requirement, but as I use .NET all the time in my day job WTL is getting more and more painful to go back to. I'm not interested in using .NET for this app, as I still find the performance of larger .NET UIs lacking, but I am interested in using a better C++ framework for the UI - like Qt.
What I want to be sure of before starting is that I'm not going to regret this on the performance front.
So: Is Qt fast?
I'll try and qualify the question by examples of what I'd like to come close to matching: My current WTL app is Programmer's Notepad. The current version I'm working on weighs in at about 4mb of code for a 32-bit, release compiled version with a single language translation. On a modern fast PC it takes 1-3 seconds to load, which is important as people fire it up often to avoid IDEs etc. The memory footprint is usually 12-20 mb on 64-bit Win7 once you've been editing for a while. You can run the app non-stop, leave it minimized, whatever and it always jumps to attention instantly when you switch to it.
For the sake of argument let's say I want to port my WTL app to Qt for potential future cross-platform support and/or the much easier UI framework. I want to come close to if not match this level of performance with Qt.
Just chiming in with my experience in case you still haven't solved it or anyone else is looking for more experience. I've recently developed a pretty heavy (regular QGraphicsView, OpenGL QGraphicsView, QtSQL database access, ...) application with Qt 4.7 AND I'm also a stickler for performance. That includes startup performance of course, I like my applications to show up nearly instantly, so I spend quite a bit of time on that.
Speed: Fantastic, I have no complaints. My heavy app that needs to instantiate at least 100 widgets on startup alone (granted, a lot of those are QLabels) starts up in a split second (I don't notice any delay between doubleclicking and the window appearing).
Memory: This is the bad part, Qt with many subsystems in my experience does use a noticeable amount of memory. Then again this does count for the many subsystems usage, QtXML, QtOpenGL, QtSQL, QtSVG, you name it, I use it. My current application at startup manages to use about 50 MB but it starts up lightning fast and responds swiftly as well
Ease of programming / API: Qt is an absolute joy to use, from its containers to its widget classes to its modules. All the while making memory management easy (QObject) system and mantaining super performance. I've always written pure win32 before this and I wil never go back. For example, with the QtConcurrent classes I was able to change a method invocation from myMethod(arguments) to QtConcurrent::run(this, MyClass::myMethod, arguments)and with one single line a non-GUI heavy processing method was threaded. With a QFuture and QFutureWatcher I could monitor when the thread had ended (either with signals or just method checking). What ease of use! Very elegant design all around.
So in retrospect: very good performance (including app startup), quite high memory usage if many submodules are used, fantastic API and possibilities, cross-platform
Going native API is the most performant choice by definition - anything other than that is a wrapper around native API.
What exactly do you expect to be the performance bottleneck? Any strict numbers? Honestly, vague ,,very responsive, quick to load, and have a light memory footprint'' sounds like a requirement gathering bug to me. Performance is often overspecified.
To the point:
Qt's signal-slot mechanism is really fast. It's statically typed and translates with MOC to quite simple slot method calls.
Qt offers nice multithreading support, so that you can have responsive GUI in one thread and whatever else in other threads without much hassle. That might work.
Programmer's Notepad is an text editor which uses Scintilla as the text editing core component and WTL as UI library.
JuffEd is a text editor which uses QScintilla as the text editing core component and Qt as UI library.
I have installed the latest versions of Programmer's Notepad and JuffEd and studied the memory footprint of both editors by using Process Explorer.
Empty file:
- juffed.exe Private Bytes: 4,532K Virtual Size: 56,288K
- pn.exe Private Bytes: 6,316K Virtual Size: 57,268K
"wtl\Include\atlctrls.h" (264K, ~10.000 lines, scrolled from beginning to end a few times):
- juffed.exe Private Bytes: 7,964K Virtual Size: 62,640K
- pn.exe Private Bytes: 7,480K Virtual Size: 63,180K
after a select all (Ctrl-A), cut (Ctrl-X) and paste (Ctrl-V)
- juffed.exe Private Bytes: 8,488K Virtual Size: 66,700K
- pn.exe Private Bytes: 8,580K Virtual Size: 63,712K
Note that while scrolling (Pg Down / Pg Up pressed) JuffEd seemed to eat more CPU than Programmer's Notepad.
Combined exe and dll sizes:
- juffed.exe QtXml4.dll QtGui4.dll QtCore4.dll qscintilla2.dll mingwm10.dll libjuff.dll 14Mb
- pn.exe SciLexer.dll msvcr80.dll msvcp80.dll msvcm80.dll libexpat.dll ctagsnavigator.dll pnse.dll 4.77 Mb
The above comparison is not fair because JuffEd was not compiled with Visual Studio 2005, which should generate smaller binaries.
We have been using Qt for multiple years now, developing a good size UI application with various elements in the UI, including a 3D window. Whenever we hit a major slowdown in app performance it is usually our fault (we do a lot of database access) and not the UIs.
They have done a lot of work over the last years to speed up drawing (this is where most of the time is spent). In general unless you really do implement a kind of editor usually there is not a lot of time spent executing code inside the UI. It mostly waits on input from the user.
Qt is a very nice framework, but there is a performance penalty. This has mostly to do with painting. Qt uses its own renderer for painting everything - text, rectangles, you name it... To the underlying window system every Qt application looks like a single window with a big bitmap inside. No nested windows, no nothing. This is good for flicker-free rendering and maximum control over the painting, but this comes at the price of completely forgoing any possibility for hardware acceleration. Hardware acceleration is still noticeable nowadays, e.g. when filling large rectangles in a single color, as is often the case in windowing systems.
That said, Qt is "fast enough" in almost all cases.
I mostly notice slowness when running on a Macbook whose CPU fan is very sensitive and will come to life after only a few seconds of moderate CPU activity. Using the mouse to scroll around in a Qt application loads the CPU a lot more than scrolling around in a native application. The same goes for resizing windows.
As I said, Qt is fast enough but if increased battery draining matters to you, or if you care about very smooth window resizing, then you don't have much choice besides going native.
Since you seem to consider a 3 second application startup "fast", it doesn't sound like you would care at all about Qt's performance, though. I would consider 3 second startup dog-slow, but opinions on that vary naturally.
The overall program performance will of course be up to you, but I don't think that you have to worry about the UI. Thanks to the graphics scene and OpenGL support you can do fast 2D/3D graphics too.
Last but not least, an example from my own experience:
Using Qt on Linux/Embedded XP machine with 128 MB of Ram. Windows uses MFC, Linux uses Qt. Custom user GUI with lots of painting, and a regular admin GUI with controls/widgets. From a user's point of view, Qt is as fast as MFC. Note: it was a full screen program that could not be minimized.
Edited after you have added more info:
you can expect a larger executable size (especially with Qt MinGW) and more memory usage. In your case, try playing with one of the IDEs (e.g. Qt Creator) or text editors written in Qt and see what you think.
I personally would choose Qt as I've never seen any performance hit for using it. That said, you can get a little closer to native with wxWidgets and still have a cross-platform app. You'll never be quite as fast as straight Win32 or MFC (and family) but you gain a multi-platform audience. So the question for you is, is this worth a small trade-off?
My experience is mostly with MFC, and more recently with C#. MFC is pretty close to the bare metal so unless you define a ton of data structure, it should be pretty quick.
For graphics painting, I always find it useful to render to a memory bitmap, and then blt that to the screen. It looks faster, and it may even be faster, because it's not worrying about clipping.
There usually is some kind of performance problem that creeps in, in spite of my trying to avoid it. I use a very simple way to find these problems: just wait until it's being subjectively slow, pause it, and examine the call stack. I do this a number of times - 10 is usually more than enough. It's a poor man's profiler but works well, no fuss, no bother. The problem is always something no one could have guessed, and usually easy to fix. This is why it works.
If there are dialogs of any complexity, I use my own technique, Dynamic Dialogs, because I'm spoiled. They are not for the faint-of-heart, but are very flexible and perform nicely.
I once made an app to determine the "primeness" of a number (whether it was prime or composite).
I first attempted a Qt GUI, and it took 5 hours to return the answer for 1,299,827 on a computer with 8GB of RAM and an AMD 1090T # 4GHz running no other foreground processes under Linux.
My second attempt used a QProcess of a console application that used the exact same code. On a laptop with 1.3GB of RAM and a 1.4GHz CPU, the response came with no perceivable delay.
I will not deny, though, that it is far easier than GTK+ or Win32, and it handles things quite nicely, but separate intensive processing ENTIRELY from the GUI if you use it.
I know this is probably general, please bear with me!
We've got a program that uses a web camera and, based on what the camera is seeing, runs certain functions. The program runs excellently on MacOS and Linux, and it compiles and it does run on Windows, but a couple of the functions, (including one that iterates pixel by pixel, 640x480) drop the FPS to 1 or less. Occasionally dropping it to freeze for a nunber of seconds.
Like I said, I know this is very general... I was just (desperately) hoping for anybody else's input on possible explanations? These same functions work fine on other platforms. I'm curious if possibly the camera's running in it's own thread, which gets bogged down? Maybe we just aren't looking in the right places to optimize? And is there possibly a resource on what to optimze when porting code to windows?
Thanks so much, and any input is very much appreciated!
<<< EDIT >>>
As for the video source code, I'm using ewclib and
const char * m_buffer;
EWC_Open(MEDIASUBTYPE_RGB24, 640, 480, FPS, true);
m_buffer = new unsigned char[EWC_GetBufferSize(0)];
EWC_GetImage(0, m_buffer);
What do you use to compile the program on Windows? Visual Studio? Cygwin? Are you sure you are not compiling a debug version? Have you turned on compiler optimization? You may also want to check your data types. You may be assuming int to be 64 bits, while you may be using 32-bit Windows, where it is 32 bits.
The hypothesis by rmeador that it's because Windows is slow is ridiculous: Aside from grabbing the picture, all actions are in userspace, no syscalls necessary. Therefore, I'd suggest removing all your recognition/function code and seeing whether the problem persists.
If this is the case, check your image grabbing mechanism. Maybe you are acquiring and releasing a handle to the camera everytime you take a picture.
Otherwise, use a normal profiler to find the weak spots. If you suspect pixel manipulation might be at fault, ensure that you do that in userspace. I'm not familiar with Windows programming but I can imagine the problem could be that you are operating on a Windows resource for the manipulation/reading and calling for every pixel.
Do you call EWC_Open for every frame, or only once at the start? If the library is implemented in DirectShow and EWC_Open starts the graph, it will be quite slow.