I have a C++ program which takes a really long time to run under Cygwin versus a quick turnaround on a Linux machine. I thought it could be a memory issue, so I tried to print the memory used, and this is what I see:
Linux
virtual memory: 5072 KB, Resident set size (RSS) : 1064 KB
Cygwin
virtual memory: 7672 KB, Resident set size (RSS) : 108928 KB
Can anyone help me understand what causes this difference? Cygwin is running on a laptop with 64-bit Windows and 3 GB of memory. There is some old C code in the program which uses malloc. Would converting these allocations to standard C++ containers help?
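For reference, a minimal sketch of one way such numbers can be printed, assuming /proc/self/statm is available (it is on Linux, and Cygwin emulates /proc as well); this is not the code from the question:

    #include <fstream>
    #include <iostream>
    #include <unistd.h>

    int main() {
        // /proc/self/statm reports sizes in pages:
        // field 1 = total program (virtual) size, field 2 = resident set size.
        unsigned long vmPages = 0, rssPages = 0;
        std::ifstream statm("/proc/self/statm");
        statm >> vmPages >> rssPages;

        const long pageKiB = sysconf(_SC_PAGESIZE) / 1024;
        std::cout << "virtual memory: " << vmPages * pageKiB << " KB, "
                  << "Resident set size (RSS): " << rssPages * pageKiB << " KB\n";
        return 0;
    }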
Cygwin provides a POSIX compatibility layer on top of Windows. That is bound to be slower than code built against the native OS CRT.
If your code is Standard C or C++, recompile it with MSVC or MinGW/GCC and then compare it.
On another note, malloc vs. new is a non-issue here: heap allocation is expensive either way.
What might be important is that Windows heap allocation is in general more expensive than Linux's implementation. The effect of this difference depends on your code.
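Purely as an illustration (not code from the question), the practical takeaway is to keep the number of heap allocations down, e.g. one reserve() up front instead of many incremental reallocations:

    #include <cstddef>
    #include <vector>

    std::vector<int> build(std::size_t n) {
        std::vector<int> v;
        v.reserve(n);                         // one allocation up front
        for (std::size_t i = 0; i < n; ++i)
            v.push_back(static_cast<int>(i)); // no further reallocations
        return v;
    }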
As rubenvb says, you can't really tell without seeing the code, but:
The amount of memory is irrelevant; it may be that either the Cygwin launcher or the OS decides to allocate a lot of memory to the Cygwin job because that memory isn't otherwise being used, so future memory allocations by the Cygwin app will be quicker. There is also an issue with how Linux reports memory use: it does optimistic allocation, so if you allocate, say, 1 GB of memory, that memory isn't actually committed to the process until it is used, and the task won't show as using 1 GB.
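A small self-contained demo of that last point (my sketch, not code from the question): on Linux the resident set barely moves after the malloc and only grows once the pages are actually touched.

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
        const std::size_t oneGiB = 1024ul * 1024ul * 1024ul;
        char* p = static_cast<char*>(std::malloc(oneGiB));
        if (!p) return 1;

        std::puts("allocated 1 GiB; RSS is still small (check /proc/self/status)");
        std::getchar();                 // pause: inspect the process now

        std::memset(p, 1, oneGiB);      // touching the pages commits them
        std::puts("touched 1 GiB; RSS is now around 1 GiB");
        std::getchar();

        std::free(p);
        return 0;
    }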
There are some tasks which are very cheap on a Unix system but very slow on the Windows architecture. The most notorious is fork(), which is very common in Unix apps but is a bad idea on Windows.
Related
I made a 32-bit C++ program which is always run on x64 machines. A client is saying that running 5 instances of this process causes all of their 24 GB of RAM to be used.
Immediately I would think there was a memory leak, but I am unable to reproduce this memory issue.
Doing a bit more research into memory allocations, I found Memory Limits for Windows. This tells me that a 32-bit process will not be allowed more than 2 GB of memory by the OS.
Is it at all possible for a 32-bit application on 64-bit Windows to leak more than 2 GB of memory?
P.S. Killing the process results in the memory being restored to normal operating levels (about 2 GB).
[EDIT] I have now seen that most of the memory being used is Kernel Memory: Nonpaged. Does this mean that it is some system resource which is being used and not a memory leak?
[UPDATE] The problem is not a driver or a memory leak. It seems to be a process handle leak: something is continuously opening new handles to a file. This was found using perfmon to monitor the process. As a rule of thumb, if a process has more than 2000 to 3000 handles you should investigate, especially if that number is increasing every few seconds.
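For what it's worth, the handle count can also be watched from inside the process. This is a rough sketch of my own using the Win32 GetProcessHandleCount call, not the perfmon approach described above:

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Query the number of open handles in the current process.
        DWORD handleCount = 0;
        if (GetProcessHandleCount(GetCurrentProcess(), &handleCount)) {
            // Rule of thumb from the post above: a steadily growing count
            // in the thousands is worth investigating.
            std::printf("open handles: %lu\n",
                        static_cast<unsigned long>(handleCount));
        }
        return 0;
    }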
As stated in Memory Limits for Windows, the limit for a 32-bit process on a 64-bit system is 4 GB when IMAGE_FILE_LARGE_ADDRESS_AWARE is set, so your 5 processes could consume 20 GB of memory in total. The flag can be set through the /LARGEADDRESSAWARE linker option, which expands the virtual address space.
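If you want to verify whether a particular build actually carries the flag, dumpbin /headers shows "Application can handle large (>2GB) addresses", or the image can inspect itself; the following is only a sketch using the documented PE header structures:

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Walk from the DOS header to the NT headers of this executable image
        // and test the IMAGE_FILE_LARGE_ADDRESS_AWARE characteristic.
        HMODULE self = GetModuleHandle(NULL);
        PIMAGE_DOS_HEADER dos = reinterpret_cast<PIMAGE_DOS_HEADER>(self);
        PIMAGE_NT_HEADERS nt  = reinterpret_cast<PIMAGE_NT_HEADERS>(
                                    reinterpret_cast<BYTE*>(self) + dos->e_lfanew);
        const bool laa =
            (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
        std::printf("IMAGE_FILE_LARGE_ADDRESS_AWARE: %s\n", laa ? "set" : "not set");
        return 0;
    }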
It is obviously possible, as the client is experiencing it.
(Maybe you expected some ideas as to how? You don't provide much info or code, so in a very general way I would suggest that the memory allocation may not be in the app itself directly. Maybe the app itself takes only ~1-2 GiB, but it also prompts the OS to do something wasteful, like memory-mapping a file of 4+ GiB, or locking some other device where the device driver misbehaves, etc.)
You should profile the memory usage on the target system to get an idea of how much your code actually uses. Then you can try to track down the rest of it.
In general, using the /LARGEADDRESSAWARE:ON linker switch can allow a 32-bit application to use more than 2 GB. Using the Address Windowing Extensions can also allow access to more memory. But if you aren't using any of these techniques in your application, then it should be limited to the 2 GB range. However, since the upper 2 GB range is used for system resources, maybe you are leaking system resources?
I am trying to allocate 1 GiB of memory using malloc() on Windows and it fails. I know malloc is not guaranteed to succeed. What is the best way to allocate 1 GiB of memory?
If you are building a 32-bit (x86) application, you are unlikely to be able to allocate a 1 GB contiguous chunk of memory (and certainly can't allocate 2 GB). As to why this happens, see the venerable presentation "Why Your Windows Game Won't Run In 2,147,352,576 Bytes" (Gamefest 2007) attached to this blog post.
You should build your application as a native x64 application instead.
You could enable /LARGEADDRESSAWARE and stick with a 32-bit application on Windows x64, but it has a number of quirks and may limit what kinds of 3rd party support libraries you can use. A better solution is to use x64 native if possible.
Use the /LARGEADDRESSAWARE flag to tell Windows that you're not doing funny things with addresses. This unlocks an extra 2GB of address space on Win64.
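To see the fragmentation point in practice, the sketch below (mine, not from the answers above) walks the process address space with VirtualQuery and reports the largest free region; if that is under 1 GiB, a single malloc of 1 GiB cannot succeed no matter how much RAM is installed.

    #include <windows.h>
    #include <cstdio>

    int main() {
        MEMORY_BASIC_INFORMATION mbi;
        SIZE_T largestFree = 0;
        BYTE* addr = NULL;

        // VirtualQuery fails once we pass the end of user address space,
        // which terminates the loop.
        while (VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
            if (mbi.State == MEM_FREE && mbi.RegionSize > largestFree)
                largestFree = mbi.RegionSize;
            addr = static_cast<BYTE*>(mbi.BaseAddress) + mbi.RegionSize;
        }

        std::printf("largest free region: %lu MiB\n",
                    static_cast<unsigned long>(largestFree / (1024 * 1024)));
        return 0;
    }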
I'm compiling some code that makes heavy use of templates (it's based on the boost::msm framework). When compiled with g++ 4.7.1, the cc1plus process reaches about 2.4 GB of RAM and then fails with a "virtual memory exhausted: Cannot allocate memory" error.
I'm using a 32-bit compiler (switching to 64-bit is not an option at the moment). The machine itself is a 64-bit Ubuntu box with 16 GB of RAM, and the compilation is performed under a 64-bit chroot of the Debian wheezy distribution. At the time of compilation there is plenty of RAM available, so if the compilation were to fail because of a lack of physically available RAM, it should reach 4 GB first. I tried playing with "ulimit -m": setting it to smaller values causes the compiler to fail earlier, but when left at "unlimited" it fails at the above-mentioned 2+ GB.
So I guess something else must be limiting me. Maybe someone has encountered a similar issue and knows a way to change this limit?
In a 32-bit application (compilers included), you typically get somewhere between 2 and 3 GB of virtual address space available for user mode. This limit comes from a combination of address space being reserved, address-space fragmentation (there is virtual memory available, just not a big enough contiguous chunk to hold whatever block size new or malloc is requesting), and "memory reservation", where the process has reserved a fairly large chunk of memory but isn't actually using all of it, so it's not "populated".
Any particular reason you can't use a 64-bit GCC to generate 32-bit code using -m32? That would be my solution.
I have a problem with Qt Creator, or one of its components.
I have a program which needs a lot of memory (about 4 GB) and I use calloc to allocate it. If I compile the C code with MinGW/gcc (without using the Qt framework) it works, but if I compile it within Qt Creator (with the C code embedded in the Qt framework using C++), using the MinGW/gcc toolchain, calloc returns a null pointer.
I already searched and found the .pro file option QMAKE_LFLAGS += -Wl,--large-address-aware, which worked for some cases (around 3.5 GB), but if I go above 4 GB, it only works with the C code compiled with gcc, not with Qt.
How can I allocate the needed amount of memory using calloc when compiling with Qt Creator?
So your Cygwin toolchain builds 64-bit applications for you. The amount of memory that can be addressed by a 64-bit application is 2^64 bytes, which far exceeds 4 GB. But Qt Creator (if you installed it from the Qt SDK and have not reconfigured it manually) uses Qt's toolchain, which builds 32-bit applications. Theoretically you can allocate 4 GB of memory in a 32-bit application, but do not forget that all libraries will also be loaded into this address space. In practice, you can allocate about 3 GB of memory, and not in one contiguous chunk.
You have 3 ways to solve your problem:
Reconsider your algorithm. Do not allocate 4 GB of RAM; use smarter data structures, a disk cache, etc. I believe that if your problem actually required more than 4 GB of memory to solve, you wouldn't be asking this question.
Separate your Qt code from your C program. Then you can still use a 64-bit-target compiler for the C program and a 32-bit-target compiler for the Qt/C++ part. You can communicate with your C program through any interprocess communication mechanism (standard input/output streams are often enough); see the sketch after this list.
Move to 64-bit. I mean, use a 64-bit-target compiler for both the C and C++ code. But it is not as simple as one might think: you'll need to rebuild Qt in 64-bit mode. It is possible with some modules turned off and some code fixups (I've tried it once), but 64-bit Windows is not officially supported.
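A minimal sketch of option 2, assuming the memory-hungry C code is built as a separate 64-bit helper executable; the name bigalloc_helper and the one-line stdin/stdout protocol are made up for illustration:

    #include <QCoreApplication>
    #include <QProcess>
    #include <QByteArray>
    #include <QDebug>

    int main(int argc, char *argv[]) {
        QCoreApplication app(argc, argv);

        // Launch the 64-bit helper that owns the large calloc'd buffer.
        QProcess helper;
        helper.start("bigalloc_helper");
        if (!helper.waitForStarted())
            return 1;

        helper.write("compute\n");        // request sent over stdin
        helper.closeWriteChannel();
        helper.waitForFinished(-1);       // wait without timeout

        QByteArray result = helper.readAllStandardOutput();
        qDebug() << "helper replied:" << result;
        return 0;
    }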
I have the following problem:
A program run on a Windows machine (32-bit, 3.1 GB memory, both VC++ 2008 and MinGW compiled code) fails with a bad_alloc exception (after allocating around 1.2 GB; the exception is thrown when trying to allocate a vector of 9 million doubles, i.e. around 75 MB) with plenty of RAM still available (at least according to Task Manager).
The same program run on Linux machines (32-bit, 4 GB memory; 32-bit, 2 GB memory) runs fine with peak memory usage of around 1.6 GB. Interestingly, the Win32 code generated by MinGW, run on the 4 GB Linux machine under Wine, also fails with a bad_alloc, albeit at a different (later) place than when run under Windows...
What are the possible problems?
Heap fragmentation? (How would I know? How can this be solved?)
Heap corruption? (I have run the code with pageheap.exe enabled with no errors reported; I implemented vector access with bounds checking, again with no errors; the code is essentially free of pointers, only std::vectors and std::lists are used. Running the program under Valgrind (memcheck) consumes too much memory and ends prematurely, but does not find any errors.)
Out of memory??? (There should be enough memory)
Moreover, what could be the reason that the Windows version fails while the Linux version works (even on machines with less memory)? (Also note that the /LARGEADDRESSAWARE linker flag is used with VC++ 2008, if that can have any effect.)
Any ideas would be much appreciated; I am at my wit's end with this... :-(
It has nothing to do with how much RAM is in your system. You are running out of virtual address space. A 32-bit Windows process gets a 4 GB virtual address space (irrespective of how much RAM you have), of which 2 GB is for user mode (3 GB in the case of LARGEADDRESSAWARE) and 2 GB for the kernel. When you try to allocate memory using new, the OS will try to find a contiguous block of virtual memory which is large enough to satisfy the allocation request. If your virtual address space is badly fragmented, or you are asking for a huge block of memory, the allocation will fail and throw a bad_alloc exception. Check how much virtual memory your process is using.
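One way to check that last point from inside the process is GlobalMemoryStatusEx; this is a rough sketch (mine, not from the answer above), and note that it reports the process's virtual address space rather than physical RAM:

    #include <windows.h>
    #include <cstdio>

    int main() {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        if (GlobalMemoryStatusEx(&ms)) {
            // ullTotalVirtual / ullAvailVirtual describe the calling process's
            // user-mode virtual address space, not installed memory.
            std::printf("virtual address space: %lu MB total, %lu MB free\n",
                        static_cast<unsigned long>(ms.ullTotalVirtual / (1024 * 1024)),
                        static_cast<unsigned long>(ms.ullAvailVirtual / (1024 * 1024)));
        }
        return 0;
    }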
With Windows XP x86 and the default settings, 1.2 GB is about all the address space you have left for your heap after system libraries, your code, the stack and other stuff get their share. Note that on x86, LARGEADDRESSAWARE also requires booting with the /3GB flag to give your process up to 3 GB. The /3GB flag causes instability on a lot of XP systems, which is why it's not enabled by default.
Server variants of Windows x86 give you more address space, both by using the 3GB/1GB split and by using PAE to allow the use of your full 4GB of RAM.
Linux x86 uses a 3GB/1GB split by default.
A 64-bit OS would give you more address space, even for a 32-bit process.
Are you compiling in Debug mode? If so, the allocation will generate a huge amount of debugging data, which might produce the error you have seen as a genuine out-of-memory condition. Try a Release build to see if that solves the problem.
I have only experienced this with VC, not MinGW, but then I haven't checked either; this could still explain the problem.
To elaborate more about the virtual memory:
Your application fails when it tries to allocate a single 90 MB array and there is no contiguous span of virtual memory left where it can fit. You might be able to get a little farther if you switched to data structures that use less memory, perhaps some class that approximates a huge array by using a tree where all data is kept in 1 MB (or so) leaf nodes. Also, in C++, when doing a huge number of allocations it really helps if all the big allocations are the same size; this helps with reusing memory and keeps fragmentation much lower.
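A minimal sketch of that tree/chunked-leaf idea, assuming a made-up class name and a 1 MB block size (this is not code from the question):

    #include <cstddef>
    #include <vector>

    // A "huge array" built from ~1 MB leaf blocks, so no single contiguous
    // virtual-memory region larger than one block is ever requested.
    class ChunkedDoubleArray {
        enum { kChunk = 131072 };   // 131072 doubles * 8 bytes = 1 MB per block
        std::vector< std::vector<double> > chunks_;
        std::size_t size_;
    public:
        explicit ChunkedDoubleArray(std::size_t n)
            : chunks_((n + kChunk - 1) / kChunk), size_(n) {
            // Every block is full-size, so all big allocations are identical,
            // which (as noted above) also helps the allocator reuse memory.
            for (std::size_t i = 0; i < chunks_.size(); ++i)
                chunks_[i].resize(kChunk);
        }
        double& operator[](std::size_t i)       { return chunks_[i / kChunk][i % kChunk]; }
        double  operator[](std::size_t i) const { return chunks_[i / kChunk][i % kChunk]; }
        std::size_t size() const { return size_; }
    };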
However, the correct thing to do in the long run is simply to switch to a 64-bit system.