Dynamic allocation in uClinux - c++

I'm new to embedded development, and the big differences I see between traditional Linux and uClinux is that uClinux lacks the MMU.
From this article:
Without VM, each process must be located at a place in memory where it can be run. In the simplest case, this area of memory must be contiguous. Generally, it cannot be expanded as there may be other processes above and below it. This means that a process in uClinux cannot increase the size of its available memory at runtime as a traditional Linux process would.
To me, this sounds like all data must reside on the stack, and that heap allocation is impossible, meaning malloc() and/or "new" are out of the question... is that accurate? Perhaps there are techniques/libraries which allow for managing a "static heap" (i.e. a stack based area from which "dynamic" allocations can be requested)?
Or am I over thinking it? Or over simplifying it?

Under regular Linux, the programmer does not need to deal with physical resources. The kernel takes care of this, and a user space process sees only its own address space. As the stack grows, or malloc-type requests are made, the kernel will map free memory into the process's virtual address space.
In uClinux, the programmer must be more concerned with physical memory. The MMU and VM are not available, and all address space is shared with the kernel. When a user space program is loaded, the process is allocated physical memory pages for the text, stack, and variables. The process's program counter, stack pointer, and data/bss table pointers are set to physical memory addresses. Heap allocations (via malloc-type calls) are made from the same pool.
You will not have to get rid of heap allocation in programs. You will need to be concerned with some new issues. Since the stack cannot grow via virtual memory, you must size it correctly during linking to prevent stack overflows. Memory fragmentation becomes an issue because there's no MMU to consolidate smaller free pages. Errant pointers become more dangerous because they can now cause unintended writes to anywhere in physical memory.

It's been a while since I've worked with uCLinux (it was before it was integrated into the main tree), but I thought malloc was still available as part of the c library. There was a lot higher chance of doing Very Bad Things (tm) in memory since the heap wasn't isolated, but it was possible.

yes you can use malloc in user space applications on uclinux ,but then you have to increase the size of stack of user space application(before running the program cause stack size would be static),so that when malloc runs it will get the space it needs.
for e.g. uclinux on arm-cortex
arm toolchain provides command to find and change size of stack used by binary of user application then you can tranfer it to your embedded system and run
----- > arm-uclinuxeabi-flthdr

Related

What part of the process virtual memory does Windows Task Manager display

My question is a bit naive. I'm willing to have an overview as simple as possible and couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly is the memory displayed in the "memory" column by default in Windows Task Manager:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also I'm focussed on the big picture and mainly care for things at GB level.
As far as I know, the memory reserved by the process called "virtual memory", is partly stored in the main memory (RAM), partly on the disk. The system decides what goes where. The system basically keeps in RAM the parts of the virtual memory that is accessed sufficiently frequently by the process. A process can reserve more virtual memory than RAM available in the computer.
From a developer point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by the task manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is it correct? And is there a simple way to know the total virtual memory reserved by the process?
Memory on windows is... extremely complicated and asking 'how much memory does my process use' is effectively a nonsensical question. TO answer your questions lets get a little background first.
Memory on windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources but not in the way you'd think. When you call commit windows does a few sums and basically works out (total physical ram + total swap - current commit) and lets you allocate memory if there's enough free. BUT the windows memory manager doesn't actually give you physical ram until you actually use it.
Later, however, if windows is tight for physical RAM it'll swap some of your RAM out to disk (it may compress it and also throw away unused pages, throw away anything directly mapped from a file and other optimisations). This means your total commit and total physical ram usage for your program may be wildly different. Both numbers are useful depending on what you're measuring.
There's one last large caveat - memory that is shared. When you load DLLs the code, the read-only memory [and even maybe the read/write section but this is COW'd] can be shared with other programs. This means that your app requires that memory but you cannot count that memory against just your app - after all it can be shared and so doesn't take up as much physical memory as a naive count would think.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses and you see none of this - you ask for and use memory. And its just as optimal as possible.
You can see this by going to the details tab and looking at the various options - commit-size and working-set are really useful. If you just look at the main window in task-manager and it has a single value I'd hope you understand now that a single value for memory used has to be some kind of compromise as its not a question that makes sense.
Now to answer your questions
Firstly the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is if the heap implementation you (or more likely the CRT) are using has kept some freed memory about which it hasn't released back to the operation system. Heaps often do this as an optimisation - asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages) and so most of them keep some around.
Second question: Dont use that value, go to details and use the values there as only you know what you actually want to ask.
EDIT:
For your comment, yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally directly defers the allocation to the operating system and so freeing individual ones will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides that to give out in allocations. And so if you then free every other one of those you'll be left with holes - and the heap cannot give those holes back to the OS as the OS generally only works in whole pages. So anything you see in task manager will show that all the memory is still used. Remember this memory isn't lost or leaked, its just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the crt heap statistics famliy of functions to keep an eye on those - specifically _CrtMemDumpStatistics

Contiguous Memory Allocation

In an application i have to allocate two buffers of 480 MB each. Memory allocation is done using HeapAlloc method. The application works fine in the systems where not many applications are running. But in system where other applications are also running memory is not allocated because of non availability of contiguous memory. Even though the memory space(non contiguous) is available but it is not allocated.
Need help to allocate two buffers of 480 MB even if non contiguous memory is available.
The situation you describe is not possible in a full featured OS which gives each process its own address space. It doesn't matter how many other applications are running, they won't affect contiguity of the free address space in your process. And virtual memory can map discontiguous physical memory addresses to a contiguous range in virtual address space.
Only in an embedded system without a memory management unit could the existence of other tasks cause your program to suffer memory fragmentation.
HeapAlloc() suggests Windows, which does give a separate address space to each process. The most likely explanation there is that your private address space is fragmented by libraries (DLLs) loading in scattered locations. You can rebase the libraries you use to avoid this and provide larger contiguous blocks of address space.
You can use VirtualAlloc with fAllocation specified as MEM_LARGE_PAGES. This enables large page support, note that you must check GetLargePageMinimum to ensure that the system supports lage pages.
Also note that this is likely to be slow as this page details.
Large-page memory regions may be difficult to obtain after the system has been running for a long time because the physical space for each large page must be contiguous, but the memory may have become fragmented. Allocating large pages under these conditions can significantly affect system performance. Therefore, applications should avoid making repeated large-page allocations and instead allocate all large pages one time, at startup.
Use VirtualAlloc. The underlying memory that backs the virtual pages need not be contiguous and you will always have your full virtual address space (2GB on a 32 bit system, I think 8 or 16 TB on Windows x64, I can't remember.) HeapAlloc can become fragmented (through your process's use, not others.) Your address space can also become fragmented, so try allocating it early in your application. I actually don't recommend HeapAlloc for anything, you can just use new and delete (which call malloc and free) For large blocks like yours malloc will call VirtualAlloc on Windows.

Memory stability of a C++ application in Linux

I want to verify the memory stability of a C++ application I wrote and compiled for Linux.
It is a network application that responds to remote clients connectings in a rate of 10-20 connections per second.
On long run, memory was rising to 50MB, eventhough the app was making calls to delete...
Investigation shows that Linux does not immediately free memory. So here are my questions :
How can force Linux to free memory I actually freed? At least I want to do this once to verify memory stability.
Otherwise, is there any reliable memory indicator that can report memory my app is actually holding?
What you are seeing is most likely not a memory leak at all. Operating systems and malloc/new heaps both do very complex accounting of memory these days. This is, in general, a very good thing. Chances are any attempt on your part to force the OS to free the memory will only hurt both your application performance and overall system performance.
To illustrate:
The Heap reserves several areas of virtual memory for use. None of it is actually committed (backed by physical memory) until malloc'd.
You allocate memory. The Heap grows accordingly. You see this in task manager.
You allocate more memory on the Heap. It grows more.
You free memory allocated in Step 2. The Heap cannot shrink, however, because the memory in #3 is still allocated, and Heaps are unable to compact memory (it would invalidate your pointers).
You malloc/new more stuff. This may get tacked on after memory allocated in step #3, because it cannot fit in the area left open by free'ing #2, or because it would be inefficient for the Heap manager to scour the heap for the block left open by #2. (depends on the Heap implementation and the chunk size of memory being allocated/free'd)
So is that memory at step #2 now dead to the world? Not necessarily. For one thing, it will probably get reused eventually, once it becomes efficient to do so. In cases where it isn't reused, the Operating System itself may be able to use the CPU's Virtual Memory features (the TLB) to "remap" the unused memory right out from under your application, and assign it to another application -- on the fly. The Heap is aware of this and usually manages things in a way to help improve the OS's ability to remap pages.
These are valuable memory management techniques that have the unmitigated side effect of rendering fine-grained memory-leak detection via Process Explorer mostly useless. If you want to detect small memory leaks in the heap, then you'll need to use runtime heap leak-detection tools. Since you mentioned that you're able to build on Windows as well, I will note that Microsoft's CRT has adequate leak-checking tools built-in. Instructions for use found here:
http://msdn.microsoft.com/en-us/library/974tc9t1(v=vs.100).aspx
There are also open-source replacements for malloc available for use with GCC/Clang toolchains, though I have no direct experience with them. I think on Linux Valgrind is the preferred and more reliable method for leak-detection anyway. (and in my experience easier to use than MSVCRT Debug).
I would suggest using valgrind with memcheck tool or any other profiling tool for memory leaks
from Valgrind's page:
Memcheck
detects memory-management problems, and is aimed primarily at
C and C++ programs. When a program is run under Memcheck's
supervision, all reads and writes of memory are checked, and calls to
malloc/new/free/delete are intercepted. As a result, Memcheck can
detect if your program:
Accesses memory it shouldn't (areas not yet allocated, areas that have been freed, areas past the end of heap blocks, inaccessible areas
of the stack).
Uses uninitialised values in dangerous ways.
Leaks memory.
Does bad frees of heap blocks (double frees, mismatched frees).
Passes overlapping source and destination memory blocks to memcpy() and related functions.
Memcheck reports these errors as soon as they occur, giving the source
line number at which it occurred, and also a stack trace of the
functions called to reach that line. Memcheck tracks addressability at
the byte-level, and initialisation of values at the bit-level. As a
result, it can detect the use of single uninitialised bits, and does
not report spurious errors on bitfield operations. Memcheck runs
programs about 10--30x slower than normal. Cachegrind
Massif
Massif is a heap profiler. It performs detailed heap profiling by
taking regular snapshots of a program's heap. It produces a graph
showing heap usage over time, including information about which parts
of the program are responsible for the most memory allocations. The
graph is supplemented by a text or HTML file that includes more
information for determining where the most memory is being allocated.
Massif runs programs about 20x slower than normal.
Using valgrind is as simple as running application with desired switches and give it as an input of valgrind:
valgrind --tool=memcheck ./myapplication -f foo -b bar
I very much doubt that anything beyond wrapping malloc and free [or new and delete ] with another function can actually get you anything other than very rough estimates.
One of the problems is that the memory that is freed can only be released if there is a long contiguous chunk of memory. What typically happens is that there are "little bits" of memory that are used all over the heap, and you can't find a large chunk that can be freed.
It's highly unlikely that you will be able to fix this in any simple way.
And by the way, your application is probably going to need those 50MB later on when you have more load again, so it's just wasted effort to free it.
(If the memory that you are not using is needed for something else, it will get swapped out, and pages that aren't touched for a long time are prime candidates, so if the system runs low on memory for some other tasks, it will still reuse the RAM in your machine for that space, so it's not sitting there wasted - it's just you can't use 'ps' or some such to figure out how much ram your program uses!)
As suggested in a comment: You can also write your own memory allocator, using mmap() to create a "chunk" to dole out portions from. If you have a section of code that does a lot of memory allocations, and then ALL of those will definitely be freed later, to allocate all those from a separate lump of memory, and when it's all been freed, you can put the mmap'd region back into a "free mmap list", and when the list is sufficiently large, free up some of the mmap allocations [this is in an attempt to avoid calling mmap LOTS of times, and then munmap again a few millisconds later]. However, if you EVER let one of those memory allocations "escape" out of your fenced in area, your application will probably crash (or worse, not crash, but use memory belonging to some other part of the application, and you get a very strange result somewhere, such as one user gets to see the network content supposed to be for another user!)
Use valgrind to find memory leaks : valgrind ./your_application
It will list where you allocated memory and did not free it.
I don't think it's a linux problem, but in your application. If you monitor the memory usage with « top » you won't get very precise usages. Try using massif (a tool of valgrind) : valgrind --tool=massif ./your_application to know the real memory usage.
As a more general rule to avoid leaks in C++ : use smart pointers instead of normal pointers.
Also in many situations, you can use RAII (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) instead of allocating memory with "new".
It is not typical for an OS to release memory when you call free or delete. This memory goes back to the heap manager in the runtime library.
If you want to actually release memory, you can use brk. But that opens up a very large can of memory-management worms. If you directly call brk, you had better not call malloc. For C++, you can override new to use brk directly.
Not an easy task.
The latest dlmalloc() has a concept called an mspace (others call it a region). You can call malloc() and free() against an mspace. Or you can delete the mspace to free all memory allocated from the mspace at once. Deleting an mspace will free memory from the process.
If you create an mspace with a connection, allocate all memory for the connection from that mspace, and delete the mspace when the connection closes, you would have no process growth.
If you have a pointer in one mspace pointing to memory in another mspace, and you delete the second mspace, then as the language lawyers say "the results are undefined".

Is it true that when memory is allocated into a process in Windows, it always triggers a page fault?

I'm trying to wrap my head around the internals of Windows memory management at the OS level.
Is it true that when memory is allocated, it always triggers a page fault behind the scenes? Does this imply that the only way to stop soft page faults is to stop allocating new memory within the process?
Definitions
I define "memory allocation" as any form of malloc, i.e. new, LocalAlloc, VirtualAlloc, HeapAlloc, etc.
I define a "page fault" as the process of mapping memory from the OS pool into the process Working Set, an operation which takes a constant 250us on a high end Xeon.
You need to be very clear about the different things that are going on here. There are two independent parts to the process, committing the memory and paging the memory into the process. Neither of these are related to calling malloc, HeapAlloc or LocalAlloc.
I've tried to break the process down for you below, but the summary is that if you use HeapAlloc or another equivalent function then you'll trigger very few page-faults (at least once your application has initialized and the heap has grown to a stable size) and so shouldn't worry about it too much.
Allocating memory
When you call malloc, HeapAlloc or LocalAlloc the memory allocator will try to find a piece of memory in the heap that's available and large enough. In the majority of cases it will succeed and return the memory to you.
If it can not find sufficient memory it will allocate more by calling VirtualAlloc (on Linux this would be sbrk or mmap). This commits the memory. It will return a small fragment of the new memory to you.
Memory commitment
When you, or the allocator, call VirtualAlloc this will mark a new region of your virtual memory as accessible. This does not trigger a page-fault, nor does it actually assign physical memory to those pages. From the MSDN docs for VirtualAlloc:
Allocates memory charges (from the overall size of memory and the paging files on disk)
for the specified reserved memory pages. The function also guarantees that when the caller
later initially accesses the memory, the contents will be zero. Actual physical pages are
not allocated unless/until the virtual addresses are actually accessed.
Paging memory in
When you access a page of memory that VirtualAlloc has returned to you for the first time this triggers a soft page-fault. The operating system will find a single page of free physical memory, zero it out and assign it to the virtual page you accessed. This is transparent to you and takes very little time (a single-digit number of microseconds). It's plausible that the operating system may swap this memory out to disk if you stop using it, if it does then a subsequent access will trigger a hard page-fault.
Well, yes, a page fault will bring the CPU back into kernel mode, and the kernel gets a chance to learn that the userspace process wanted a certain memory page. The kernel can then consult its internal memory management bookkeeping data, make a suitably large area of physical memory available, and adjust the processor's page table so that the requested virtual address is mapped. Once that's done, execution passes back to the userspace process, which resumes with a successful allocation.
This is not to be confused with an unexpected page fault, by which the userspace process refers to an address that is neither mapped in the page tables nor known to the kernel's memory manager to belong to the process. In that case, the kernel will kill the rogue process.

Deallocation doesn't free memory in Windows/C++ Application

My Windows/C++ application allocates ~1Gb of data in memory with the operator new and processes this data. After processing the data is deleted.
I noticed that if I run the processing again without exiting the application, the second call to the operatornew to allocate ~1Gb of data fails.
I would expect Windows to deliver the memory back. Could this be managed in a better way with some other Win32 calls etc.?
I don't think this is a Windows problem. Check if you used delete or delete[] correctly. Perhaps it would help if you post the code that is allocating/freeing the memory.
In most runtime environments memory allocated to an application from the operating system remains in the application, and is seldom returned back to the operating system. Freeing a memory block allows you to reuse the block from within the application, but does not free it to the operating system to make it available to other applications.
Microsoft's C runtime library tries to return memory back to the operating system by having _heapmin_region call _heap_free_region or _free_partial_region which call VirtualFree to release data to the operating system. However, if whole pages in the corresponding region are not empty, then they will not be freed. A common cause of this is the bookkeeping information and storage caching of C++ containers.
This could be due to memory fragmentation (in reality, address space fragmentation), where various factors have contributed to your program address space not having a 1gb contiguous hole available. In reality, I suspect a bug in your memory management (sorry) - have you run your code through leak detection?
Since you are using very large memory blocks, you should consider using VirtualAlloc() and VirtualFree(), as they allow you to allocate and free pages directly, without the overhead (in memory and time) of interacting with a heap manager.
Since you are using C++, it is worth noting that you can construct C++ objects in the memory you allocate this way by using placement new.
This problem is almost certainly memory fragmentation. On 32 bit Windows the largest contiguous region you can allocate is about 1.1GB (because various DLLs in your EXE preventa larger contiguous range being available). If after deallocating a memory allocation (or a DLL load, or a memory mapped file) ends up in the middle of your previous 1GB region then there will no longer be a 1GB region available for your next call to new to allocate 1GB. Thus it will fail.
You can visualize this process using VM Validator.