I have a problem with Qt Creator, or one of its components.
I have a program which needs a lot of memory (about 4 GBytes) and I use calloc to allocate it. If I compile the C code with mingw/gcc (without the Qt framework) it works, but if I compile it within Qt Creator (with the C code embedded in the Qt framework using C++), using the mingw/gcc toolchain, calloc returns a null pointer.
I already searched and found the qt-pro option QMAKE_LFLAGS += -Wl,--large-address-aware, which worked for some cases (around 3.5GBytes), but if I go above 4GBytes, it only works with the C code compiled with gcc, not with Qt.
How can I allocate the needed amount of memory using calloc when compiling with Qt Creator?
So your standalone gcc tool chain builds 64-bit applications for you. The amount of memory a 64-bit application can address is 2^64 bytes, which far exceeds 4 GB. But Qt Creator (if you installed it from the QtSDK and have not reconfigured it manually) uses Qt's own tool chain, which builds 32-bit applications. In theory a 32-bit application can address 4 GB of memory, but do not forget that all loaded libraries also occupy that address space. In practice you will be able to allocate about 3 GB, and not as one contiguous chunk.
You have 3 ways to solve your problem:
Reconsider your algorithm. Do not allocate 4 GB of RAM; use smarter data structures, a disk cache, etc. I believe that if your problem actually required more than 4 GB of memory to solve, you wouldn't be asking this question.
Separate your Qt code from your C program. Then you can still use a 64-bit-target compiler for the C program and a 32-bit-target compiler for the Qt/C++ part. You can communicate with the C program through any interprocess communication mechanism; standard input/output streams are often enough (see the sketch after this list).
Move to 64 bit, i.e. use a 64-bit-target compiler for both the C and the C++ code. But it is not as simple as one might think: you will need to rebuild Qt in 64-bit mode. It is possible with some modules turned off and some code fixups (I have tried it once), but 64-bit Windows is not officially supported.
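A minimal sketch of the second option, assuming the memory-hungry C part is built as a separate 64-bit console helper (called bigalloc_worker.exe here; the name and the text protocol are illustrative) and driven from the 32-bit Qt application via QProcess:

#include <QCoreApplication>
#include <QProcess>
#include <QStringList>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QProcess worker;
    // Launch the helper built with the 64-bit MinGW tool chain.
    worker.start("bigalloc_worker.exe", QStringList());
    if (!worker.waitForStarted())
        return 1;

    // Hypothetical request: ask the helper to crunch ~4 GB of data.
    worker.write("compute 4096\n");
    worker.closeWriteChannel();

    if (!worker.waitForFinished(-1))
        return 1;

    // The helper reports its result on standard output.
    qDebug() << "worker replied:" << worker.readAllStandardOutput();
    return 0;
}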
Related
I am running Windows 10 64 bit. My compiler is Visual Studio 2015.
What I want is:
unsigned char prime[UINT_MAX];
(and larger arrays).
That example gives compiler error C2148 because the application is a "Win32 console application". Likewise I can't use new to create the array; same problem. I am building it as an "x64 Release", but I guess the WIN32 console part is winning!
I want to unleash the power of my 64-bit operating system and break free of this tiresome INT_MAX limitation on array indexes, i.e. proper 64-bit operation. My application is a simple C/C++ thing which neither needs nor wants anything other than a command-line interface.
I did install the (free) Visual Studio 2017 application, but it didn't give me the simple console apps that I like (so I uninstalled it).
Is there some other application type I can build in Visual Studio that gives access to more than 4GB of memory? Is there some other (free) compiler I can use to get full 64-bit access under Windows?
Let me first answer the question properly so the information is readily available to others. All three actions were necessary. Any one or two alone would not work.
Change project type from “Win32 Console” to “C++/CLR console”
Change the array definition as kindly indicated by WhozCraig
Change the project properties, Linker | System | EnableLargeAddresses YES (/LARGEADDRESSAWARE)
Now let’s mention some of the comments:
“Compile the program for x64 architecture, not x32”
I explicitly stated that it was compiled as x64 release, and that the Win32 aspect was probably winning.
It won’t work if allocated on the stack.
It was allocated on the heap as a global variable, but I also said I tried allocating it with new which would also allocate to the heap.
How much memory does your machine have?
Really. My 8GB RAM is a bit weak for the application, but won’t give a compiler error, and is enough to run the program with 4GB allocated to it.
Possible duplicate of …
No, there are some very old questions which are not very relevant.
Memory mapped files (Thomas Matthews)
A very good idea. Thank you.
As for 6 down votes on the question, seriously. Most commenters don’t even seem to have understood the problem, let alone the solution. Standard C arrays seem to be indexed by signed ints (32 bit) regardless of the /LARGEADDRESSAWARE switch and the x64 compilation.
Thanks again to WhozCraig and Thomas Matthews for helping me to solve the problem.
#include <climits>   // for INT_MAX
#include <vector>

typedef unsigned long long U64;
const U64 MAX_SIZE = 3 * ((U64)INT_MAX);   // roughly 6 GB of elements

std::vector<unsigned char> prime(MAX_SIZE);
// The prime vector is then accessed in the usual way, prime[bigAddress]
I also turned off Unicode support in the project settings as that might have made the chars 2-bytes long.
The program is now running on a Xeon workstation with 32GB ECC RAM.
6GB is allocated to the process according to the task manager.
I'm assuming this is a Windows console program built in x64 mode with the linker option Enable Large Addresses set to Yes (/LARGEADDRESSAWARE). Don't declare an array that large; instead allocate it using malloc(), and later use free() to deallocate it. Using the C++ new operator will result in a compiler error "array too large", but malloc() doesn't have this issue. I was able to allocate an 8GB array on a 16GB laptop using malloc() and use it without problems. You can use size_t, int64_t or uint64_t as the index type.
I've tested this with VS2015 and VS2019.
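A minimal sketch of that approach (the 8 GB size and the page-touching loop are just illustrative):

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
    const uint64_t count = 8ULL * 1024 * 1024 * 1024;    // 8 GB, illustrative
    unsigned char *prime = static_cast<unsigned char *>(std::malloc(count));
    if (prime == nullptr) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (uint64_t i = 0; i < count; i += 4096)            // touch one byte per page
        prime[i] = 1;
    std::free(prime);
    return 0;
}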
For years I have been testing a piece of software that I am developing, compiled only for 64-bit systems with Qt-MinGW64, without experiencing any kind of issue with video encoding (which is one of the features of the application). Recently, I have been attempting to build the corresponding x86 version of my software by compiling it with Qt-MinGW32.
However, after building the same ffmpeg and x264 library versions as 32-bit and successfully linking them to my project, the application keeps crashing with a segmentation fault after a few frames are encoded. This is very strange because, as I said, it works flawlessly when compiled for 64-bit systems.
I have also lost a considerable number of hours trying to combine many different versions of both the ffmpeg and x264 libraries, but with no luck either. Nor does it work when threads are disabled for both x264 and ffmpeg, so it does not seem to be a win32 threading issue. Therefore, I have concluded that the error is most likely in my code, which by some chance beyond my comprehension tells ffmpeg to allocate the right amount of memory in the x64 version, but not in the x86 version.
It must also be pointed out that, before the avcodec_encode_video2 call, I make the following calls, among others, in order to allocate the memory associated with the corresponding items (AVFrame, AVCodec, etc.):
avcodec_open2( my_codec_context, my_codec, &opt );
my_av_frame = av_frame_alloc();
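For illustration, a checked version of those calls, with the frame buffers allocated explicitly via av_frame_get_buffer (a sketch, not the exact code from my project), would be:

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/frame.h>
}

// Sketch: open the codec and allocate a frame, checking every return value,
// so that a failed allocation in the 32-bit build is reported here instead
// of crashing later inside avcodec_encode_video2.
static int open_codec_and_frame(AVCodecContext *my_codec_context,
                                AVCodec *my_codec,
                                AVDictionary **opt,
                                AVFrame **out_frame)
{
    if (avcodec_open2(my_codec_context, my_codec, opt) < 0)
        return -1;

    AVFrame *frame = av_frame_alloc();
    if (!frame)
        return -1;

    frame->format = my_codec_context->pix_fmt;
    frame->width  = my_codec_context->width;
    frame->height = my_codec_context->height;

    if (av_frame_get_buffer(frame, 32) < 0) {   // allocates the actual pixel planes
        av_frame_free(&frame);
        return -1;
    }

    *out_frame = frame;
    return 0;
}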
More precisely, the details involving the code structure that I am using can be found here.
Therefore, the error appears to be more subtle than just issues regarding uninitialized memory.
Many thanks in advance.
UPDATE:
I have discovered the source of the problem. For some reason, the FFmpeg/x264 libraries behave abnormally in Win32 GUI applications compiled with Qt-MinGW32, while they run correctly in Win32 console applications, also compiled with Qt-MinGW32. I have verified this with two dummy tests, in which the exact same piece of code is run from a console application and from a GUI application, succeeding in the first case and failing in the second. The code for those tests can be found below, along with the x264 and FFmpeg libraries used in my projects, together with instructions to build them in msys2 with MinGW32:
https://takeafile.com/?f=hisusinivi
I have no idea whether it can be solved by simply tweaking the code or whether it is a serious incompatibility issue. If it is the latter, should it be reported to the Qt/FFmpeg/x264 developers as a major bug?
Looks like you are running out of memory (the virtual address space available to a 32-bit app); at least that is what happens with your Qt GUI test app. Your settings for encoding YUV 4:4:4 Full HD video need around 1.3 GB of memory, and that should be available to a 32-bit app on a 64-bit OS by default (and it is for your console test). But for some reason your Qt GUI test starts to fail after allocating only 1 GB of memory. I don't know whether this 1 GB limit comes from Qt or from Windows for GUI apps. If you change your video resolution to 960x540 instead of 1920x1080 it should work (as it needs less than 1 GB of memory). Otherwise, set the LARGEADDRESSAWARE flag in the PE header by passing -Wl,--large-address-aware to the linker; then 4 GB of address space should be available to the 32-bit app on a 64-bit OS.
UPDATE
Looks like the Qt GUI test has less free address space than the console test because it also links against Qt5Guid.dll and Qt5Widgetsd.dll, which take up an additional 450 MB of address space on top of the libraries the console app links as well, so only about 1 GB of the 2 GB of available address space remains for the heap.
I am trying to allocate 1 GiB of memory using malloc() on Windows and it fails. I know malloc is not guaranteed to succeed. What is the best way to allocate 1 GiB of memory?
If you are building a 32-bit (x86) application, you are unlikely to be able to allocate a 1 GB contiguous chunk of memory (and certainly can't allocate 2 GB). As to why this happens, see the venerable presentation "Why Your Windows Game Won't Run In 2,147,352,576 Bytes" (Gamefest 2007) attached to this blog post.
You should build your application as a native x64 application instead.
You could enable /LARGEADDRESSAWARE and stick with a 32-bit application on Windows x64, but it has a number of quirks and may limit what kinds of 3rd party support libraries you can use. A better solution is to use x64 native if possible.
Use the /LARGEADDRESSAWARE flag to tell Windows that you're not doing funny things with addresses. This unlocks an extra 2GB of address space on Win64.
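For reference, a minimal check of whether the process can hand out a contiguous 1 GiB block (build it as x64, or as x86 with /LARGEADDRESSAWARE, to see the difference):

#include <cstdio>
#include <cstdlib>

int main()
{
    const size_t oneGiB = 1024u * 1024u * 1024u;   // 1 GiB
    void *block = std::malloc(oneGiB);
    if (block == nullptr) {
        std::fprintf(stderr, "could not allocate a contiguous 1 GiB block\n");
        return 1;
    }
    std::puts("1 GiB allocated");
    std::free(block);
    return 0;
}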
How can I find details of the Windows C++ memory allocator that I am using?
Debugging my C++ application is showing the following in the call stack:
ntdll.dll!RtlEnterCriticalSection() - 0x4b75 bytes
ntdll.dll!RtlpAllocateHeap() - 0x2f860 bytes
ntdll.dll!RtlAllocateHeap() + 0x178 bytes
ntdll.dll!RtlpAllocateUserBlock() + 0x56c2 bytes
ntdll.dll!RtlpLowFragHeapAllocFromContext() - 0x2ec64 bytes
ntdll.dll!RtlAllocateHeap() + 0xe8 bytes
msvcr100.dll!malloc() + 0x5b bytes
msvcr100.dll!operator new() + 0x1f bytes
My multithreaded code is scaling very poorly, and profiling through random sampling indicates that malloc is currently a bottleneck in my multithreading code. The stack seems to indicate some locking going on during memory allocation. How can I find details of this particular malloc implementation?
I've read that Windows 7 system allocator performance is now competitive with allocators like tcmalloc and jemalloc. I am running on Windows 7 and I'm building with Visual Studio 2010. Is msvcr100.dll the fast/scalable "Windows 7 system allocator" often referenced as "State of the Art"?
On Linux, I've seen dramatic performance gains in multithreaded code by changing the allocator, but I've never experimented with this on Windows -- thanks.
I am simply asking what malloc implementation I am using, with maybe a link to some details about my particular version of this implementation.
The call stack you are seeing indicates that the MSVCRT (more exactly, its default operator new => malloc) is calling into the Win32 Heap functions. (I do not know whether malloc routes all requests directly to the CRT's Win32 heap, or whether it does some additional caching; but if you have VS, you should have the CRT source code too, so you should be able to check that.) (The Windows Internals book also talks about the Heap.)
General advice I can give is that in my experience (VS 2005, but judging from Hans' answer on the other question VS2010 may be similar) the multithreaded performance of the CRT heap can cause noticeable problems, even if you're not doing insane amounts of allocations.
That RtlEnterCriticalSection is just that, a Win32 critical section: cheap to lock under low contention, but with higher contention you will see suboptimal runtime behaviour. (Bah! Ever tried to profile or optimize code that chokes on synchronization performance? It's a mess.)
One solution is to split the heaps: using different heaps has given us significant improvements, even though each heap is still MT-enabled (no HEAP_NO_SERIALIZE).
Since you're "coming in" via operator new, you might be able to use different allocators for some of the different classes that are allocated often. Or maybe some of your containers could benefit from custom allocators (that then use a separate heap).
One case we had was that we were using libxml2 for XML parsing, and while building up the DOM tree it simply swamped the system with malloc calls. Luckily, it uses its own set of memory allocation routines that can easily be replaced by a thin wrapper over the Win32 Heap functions. This gave us huge improvements, as XML parsing no longer interfered with the rest of the system's allocations.
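A minimal sketch of such a thin wrapper over a private Win32 heap (the heap handle and function names are illustrative; libxml2 can be pointed at routines like these via xmlMemSetup):

#include <windows.h>
#include <cstddef>

// Private heap dedicated to one subsystem (e.g. the XML parser), so its
// allocations do not contend with the rest of the process on the CRT heap.
static HANDLE g_xmlHeap = HeapCreate(0, 0, 0);   // serialized, growable

static void *xmlHeapMalloc(size_t size)
{
    return HeapAlloc(g_xmlHeap, 0, size);
}

static void *xmlHeapRealloc(void *ptr, size_t size)
{
    return ptr ? HeapReAlloc(g_xmlHeap, 0, ptr, size)
               : HeapAlloc(g_xmlHeap, 0, size);
}

static void xmlHeapFree(void *ptr)
{
    if (ptr)
        HeapFree(g_xmlHeap, 0, ptr);
}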
I have a C++ program which takes a really long time to run under Cygwin compared with a quick turnaround on a Linux machine. I thought it could be a memory issue and tried to print the memory used, and this is what I see:
Linux
virtual memory: 5072 KB, Resident set size (RSS) : 1064 KB
Cygwin
virtual memory: 7672 KB, Resident set size (RSS) : 108928 KB
Can anyone help me understand what causes this difference? Cygwin is running on a laptop with 64-bit Windows and 3 GB of memory. There is some old C code in the program which uses malloc. Would converting those calls to standard C++ containers help?
Cygwin provides a POSIX compatibility layer on top of Windows. That is bound to be slower than code built against the native OS CRT.
If your code is Standard C or C++, recompile it with MSVC or MinGW/GCC and then compare it.
On another note, malloc vs new is a non-issue. Heap allocation is expensive.
What might be important is that Windows heap allocation is in general more expensive than the Linux implementation. The effect of this difference depends on your code.
As rubenvb says, you can't really tell without seeing the code, but:
The amount of memory is probably irrelevant: it may be that either the Cygwin launcher or the OS decides to allocate a lot of memory to the Cygwin job because that memory isn't otherwise being used, so future memory allocations by the Cygwin app will be quicker. There is also an issue with how Linux reports memory use: it does optimistic allocation, so if you allocate, say, 1 GB of memory, that memory isn't actually committed to the process until it is used, and the task won't show as using 1 GB.
Some tasks that are very cheap on a Unix system are very slow on Windows. The most notorious is fork(), which is very common in Unix apps but is a bad idea on Windows.