Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I did some googling about the heap and the stack, but most answers only describe the concepts and their differences. I am curious about other things:
As the title says: where are the heap and the stack in physical memory?
How big are they? For example, my desktop PC has 12 gigabytes of memory; how much of that is heap, and how much is stack?
Who came up with these two different concepts?
Can I control how the heap and stack are allocated? If they each take 50% of memory (in my case, 6 gigabytes for the heap and 6 gigabytes for the stack), can I resize them?
As the title says: where are the heap and the stack in physical memory?
Ever since CPUs have had MMUs to add a layer of indirection between virtual memory and physical memory, heap and stack have been anywhere in physical memory. Ever since modern Operating Systems have implemented ASLR, heap and stack have been anywhere in virtual memory, too.
How big are they? For example, my desktop PC has 12 gigabytes of memory; how much of that is heap, and how much is stack?
Both start small and grow on demand. On Unix, the maximum stack size is set by ulimit -s and the maximum heap (data segment) size by ulimit -d. You can see which limits your Unix OS sets by default with ulimit -a.
Who came up with these two different concepts?
I would bet this goes back to at least the 1960s. Wikipedia has a reference from 1960.
Can I control how the heap and stack are allocated? If they each take 50% of memory (in my case, 6 gigabytes for the heap and 6 gigabytes for the stack), can I resize them?
As already said, they resize themselves, or more accurately, they grow on demand, within limits set by the OS and the user. See the help for ulimit if you are using Unix and bash.
1. It can be anywhere. It can even be outside physical memory, because from the application's point of view there is no such thing: everything in user land uses virtual memory, which can be mapped to RAM or to a swap area on disk. No firm assumptions can be made here, sorry.
2. They both grow dynamically; the differences lie in speed and size limits:
The heap is usually considered slower. It is allocated on demand, depending on application requirements, and can be as large as the amount of RAM or even larger (with paging).
The stack is much faster, because "allocation" is just a move of the stack pointer. It usually has a size limit. For example, for C++ programs the limit comes from the OS or the linker (ulimit -s for GCC-built programs on Unix; /STACK:reserve or /STACK:reserve,commit on MSVC).
The stack is usually much smaller and can easily be overflowed (that's where the name "stack overflow" comes from). For example, in C++ you most likely won't be able to do this:
int main()
{
    int large_array[1000000];  // ~4 MB on the stack
    return 0;
}
Because the array alone needs about 4 MB, which exceeds the typical default stack size of 1-8 MB.
While this is perfectly fine:
int main()
{
    int* large_array = new int[1000000]; // allocated from heap
    // ... use large_array ...
    delete[] large_array;                // and don't forget to free it
    return 0;
}
3. Some really smart people.
4. Read carefully points 1-3 and you will know the answer.
Closed 6 months ago.
I have a program compiled with gcc 11.2, which first allocates some RAM memory (8 GB) on the heap (using new), and later fills it with data read out in real time from an oscilloscope.
uint32_t* buffer = new uint32_t[0x80000000];
for(uint64_t i = 0; i < 0x80000000; ++i) buffer[i] = GetValueFromOscilloscope();
The problem I am facing is that the optimizer skips the allocation on the first line and does it on the fly as I traverse the loop. This slows down each iteration of the loop. Because it is important to be as efficient as possible inside the loop, I have found a way to force the compiler to allocate the memory before entering the for loop, namely by setting all the reserved values to zero:
uint32_t* buffer = new uint32_t[0x80000000]();
My question is: is there a less intrusive way of achieving the same effect without forcing the data to be zero in the first place (apart from switching off the optimization flags)? I just want to force the compiler to reserve the memory at the moment of allocation; I do not care whether the reserved values are zero or not.
Thanks in advance!
EDIT1: The evidence that the allocation is being delayed is that 'gnome-system-monitor' shows slowly growing RAM usage as I traverse the loop, and only after the loop finishes does it reach 8 GiB. Whereas if I initialize all the values to zero, gnome-system-monitor shows a quick growth up to 8 GiB, and then the loop starts.
EDIT2: I am using Ubuntu 22.04.1 LTS
It has very little to do with the optimizer. Nothing spectacular happens here. Your program doesn't skip any lines, and it does exactly what you ask it to do.
The problem is that, when you allocate memory, you are interfacing with both the allocator and the operating system's paging system. Most likely, your operating system did not make all of those pages resident in memory; instead it merely marked the pages as allocated to your program, and will only make the memory actually exist when you first use it. This is how most operating systems work.
To fix the problem, you need to interface with your system's virtual memory subsystem to make the pages resident. On Linux, huge pages may also help you. On Windows, there's the VirtualAlloc API, but I haven't dug deep into that platform.
You seem to be misinterpreting the situation. Virtual memory within a user-space process (heap space in this case) does get allocated “immediately” (possibly after a few system calls that negotiate a larger heap).
However, each page-aligned page-sized chunk of virtual memory that you haven’t touched yet will initially lack a physical page backing. Virtual pages are mapped to physical pages lazily, (only) when the need arises.
That said, the “allocation” you are observing (as part of the first access to the big heap space) is happening a few layers of abstraction below what GCC can directly influence and is handled by your operating system’s paging mechanism.
Side note: Another consequence would be, for example, that allocating a 1 TB chunk of virtual memory on a machine with, say, 128 GB of RAM will appear to work perfectly fine, as long as you never access most of that huge (lazily) allocated space. (There are configuration options that can limit such memory overcommitment if need be.)
When you touch your newly allocated virtual memory pages for the first time, each of them causes a page fault and your CPU ends up in a handler in the kernel because of that. The kernel evaluates the situation and establishes that the access was in fact legit. So it “materializes” the virtual memory page, i.e. picks a physical page to back the virtual page and updates both its bookkeeping data structures and (equally importantly) the hardware page mapping mechanism(s) (e.g. page tables or TLB, depending on architecture). Then the kernel switches back to your userspace process, which will have no clue that all of this just happened. Repeat for each page.
Presumably, the description above is hugely oversimplified. (For example, there can be multiple page sizes to strike a balance between mapping maintenance efficiency and granularity / fragmentation etc.)
A simple and ugly way to ensure that the memory buffer gets its hardware backing would be to find the smallest possible page size on your architecture (4 KiB on x86_64, for example, i.e. 1024 of those uint32_t values in most cases) and then touch each (possible) page of that memory beforehand, as in: for (size_t i = 0; i < 0x80000000; i += 1024) buffer[i] = 1;.
There are (of course) more reasonable solutions than that↑; this is just an example to illustrate what’s happening and why.
Closed 2 years ago.
I searched for hours and found thousands of discussions, but got confused. I'm a beginner and I want to ask my questions simply.
Where is the location of heap memory and stack memory in RAM?
Is it that when a certain part of memory is used as a stack data structure, it is called stack memory, and when the same memory is used as a heap, it is called heap memory? In other words, do they use the same memory locations at different times?
Or are there separate spaces for the stack and the heap in memory, i.e. is RAM split into portions for stack and heap operations?
Heap memory and stack memory, if that distinction is made by the compiler, are just two arbitrary chunks of memory. Heap is expected to be expandable, the stack is typically fixed-size, but this is not a hard requirement and cannot be depended on to be the case. These are just conventions, not rules.
Where are they located in memory? With virtual memory and Address Space Layout Randomization (ASLR) the answer is: Nobody knows and it doesn't matter, as the addresses are fake and have no relationship to physical memory. They're just numbers.
Is there a separate space for stack and heap? Maybe! Probably. Possibly not. On very tiny embedded platforms it's all one chunk of memory where the stack is stuck at one end and grows down, while the heap is at another and grows up. On a modern operating system they're allocated independently and can be extended or shrunk as required.
Closed 2 years ago.
I mainly program in C++. I have seen in many places that I should put large objects (like an array of 10k elements) on the heap (using new) rather than on the stack (as a plain array type).
I don't really understand why. The reason might be that abuse of the stack may result in a stack overflow at runtime. But why should the OS set a limit on the stack size of a process (or, more precisely, a thread) when virtual memory can grow as large as needed (or 4 GB in practice on 32-bit systems)?
Can anyone help me? I really have no clue.
Tradition, and threads.
Stacks are per thread and, for efficiency, must be contiguous memory. Non-contiguous stacks would make every function call more expensive.
Heaps are usually shared between threads; when they aren't, they don't have to be contiguous.
In the 32-bit days, having 1000 threads wasn't impossible. At 1 MB per thread, that is 1 GB of address space. And 1 MB isn't that big.
In comparison, 2 GB of heap serves all threads.
On 64-bit systems, the addressable memory is often far less than 64 bits' worth. At 40 bits, if you gave half of the address space to stacks and you had 10,000 threads, that is a mere 50 MB per stack.
48 bits is more common, but that still leaves you with mere gigabytes of address space per stack.
In comparison, the heap has tebibytes.
What's more, a large object on the stack doesn't help much with cache locality: no CPU cache can hold both the front and the back of it. Having to follow a single pointer is trivial if you are working with the data heavily, and a small stack frame can even help the stack stay in cache.
So (a) stack size cost scales with threads (b) address space can be limited (c) the benefits of mega stacks are small.
Finally, infinite recursion is a common bug. You want your stack to blow and trap (the binary loader often surrounds the stack with trap pages) before your user's shell crashes from resource exhaustion. A modest stack size makes that more likely.
It's entirely possible to use a stack array of 10k ints. It's usually better to use new because, when you're dealing with big chunks of data, you can't guarantee they'll always fit the size of your array. Dynamic allocation lets you use only what you need, which is more efficient.
If you're curious about how the OS chooses stack sizes or "pages", read this:
https://en.m.wikipedia.org/wiki/Page_(computer_memory)
Also, note that the "heap" memory region is unrelated to the tree-based heap data structure; they merely share a name. The stack and the heap are both regions in the layout of a process's data in RAM.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm a little bit confused. In the OS course we were told that all OSes take care of memory fragmentation by paging or segmentation, and that there is no contiguous physical memory allocation at all: the OS uses different levels of addressing (logical/physical) to avoid contiguous memory allocation. Yet here there are so many discussions about it. My question is:
Is this problem real in C++ programming on OSes that support logical addressing (does any process crash just because of memory fragmentation)? If yes, why does each OS try to avoid contiguous addressing in the first place?
There are two layers: fragmentation in the virtual process address space and fragmentation in physical memory.
If you look at any modern application, you can see how its memory usage grows over time as memory is not released to the OS. You can say this is caused by other things, but memory fragmentation (i.e. the non-contiguous location of allocated memory chunks) is the core reason: memory allocators refuse to release fragmented memory back to the OS.
If you are interested in fragmentation in physical memory: even with memory organized in pages, there is still sometimes a need for physically contiguous chunks. For example, if you need to avoid virtual memory overhead, you might want to use large pages ("huge pages" in Linux terms). x86_64 supports 4 KiB, 2 MiB and 1 GiB pages; if there is no contiguous physical memory of the required size, you won't be able to use the larger ones.
If by OS you mean the kernel, then it cannot help you with fragmentation that happens in the process address space (heap fragmentation). The C library should try to avoid fragmentation; unfortunately, it is not always able to do so. See the linked question.
A memory allocator is usually unable to release a large chunk of memory if at least something in it is still allocated. A partial solution takes advantage of the page organization of virtual memory: the so-called "lazy free" mechanism, represented by MADV_FREE on Linux and the BSDs and DiscardVirtualMemory on Windows. When you have a huge chunk of memory that is only partially used, you can notify the kernel that part of it is not needed any more, and that it can take it back under memory pressure. This is done lazily and only under memory pressure because memory deallocation is expensive, and many memory allocators still do not use it, for performance reasons.
So the answer to your question is: it depends on how much you care about the efficiency of your program. Most programs do not care, as the standard allocator just does the job for them. Some programs might suffer when the standard allocator is not able to do its job efficiently.
The OS is not avoiding contiguous memory allocation. At the top level you have hardware and software. Hardware has limited resources, physical memory in this case. To share the resource, and to spare user programs from managing that sharing, the virtual addressing layer was invented. It simply maps a contiguous virtual address space onto sparse physical regions. In other words, virtual address 0x10000 can point to physical address 0x80000 in one process and to 0xf0000 in another.
Paging and swapping mean writing some pages, or the whole application's memory, to disk and bringing them back at some point. The memory will most likely have a different physical page mapping afterwards.
So your program always sees a contiguous virtual address space, which is in fact fragmented in physical memory. By the way, this is done with constant block sizes, so there are no wasted or unusable memory holes.
Now, the second level of fragmentation is caused by the new/malloc functions, and it relates to the fact that you allocate and free different sizes of memory. This fragments your heap in virtual space. The allocator tries to keep the waste as small as possible.
So, in your generic C++ (or any other language) programming you do not need to care about memory fragmentation. All chunks you allocate are guaranteed to be contiguous in virtual space (though not necessarily in physical space).
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Is
WCHAR bitmapPathBuffer[512];
OK for stack allocation, or is it better to use the heap at this size? What is a reasonable indicative size at which it becomes better to move from stack to heap? Everyone says "it depends", but our brains need some limits to orient by.
You might want to check your system's default stack size, and consider whatever use your application makes of recursion, to arrive at some reasonable threshold.
Anyway, for typical desktop PCs I'd say ~100 KB is reasonable to put on the stack for a function that won't be invoked recursively, absent any unusual considerations (I had to revise that downwards after seeing how restrictive Windows is, below). You may be able to go an order of magnitude more or less on specific systems, but it's around that point that you'd start to care about checking your system's limits.
If you find you're doing that in many functions, you'd better think carefully about whether those functions could be called from each other, or just allocate dynamically (preferably implicitly via use of vector, string etc.) and not worry about it.
The 100 KB guideline is based on these default stack size numbers ripped from the 'net:
platform         default size    # bits     # digits
====================================================================
SunOS/Solaris    8172K bytes     <=39875    <=12003   (Shared Version)
Linux            8172K bytes     <=62407    <=18786
Windows          1024K bytes     <=10581    <=3185    (Release Version)
cygwin           2048K bytes     <=3630     <=1092
As others have said, the answer to this question is dependent on the system on which you are running. In order to come to a sensible answer, you need to know:
The default stack size. This might be different for threads other than the main thread(!), or if you're using closures or a third-party threading or coroutine library.
Whether the system stack is dynamically resized. On some systems, the stack can grow automatically up to a point.
On some platforms (e.g. ARM or PowerPC-based systems) there is a “red zone”. If you are in a leaf function (one that calls no other functions), if your stack usage is less than the size of the red zone, the compiler can generate more efficient code.
As a general rule I'd agree with the other respondents that on a desktop system, 16–64KB or so is a reasonable limit, but even that depends on things like recursion depth. Certainly, large stack frames are a code smell and should be investigated to make sure they're necessary.
In particular, it's well worth contemplating the lengths of any buffers allocated on the stack… are they really large enough for any conceivable input? And are you checking that at runtime to avoid overrun? e.g. In your example, are you sure that bitmapPathBuffer is never longer than 512 WCHARs in length? If you don't know the maximum length for certain, the heap may be better. Even then, if it's an adversarial environment, you may care to put a large upper bound on it to avoid attacks involving memory exhaustion.
Answer is really "it depends".
If you have many such variables defined, or if your function and the functions it calls make relatively large stack allocations, then it is possible that you will have a stack overflow.
The typical default stack size for a Win32 executable is 1 MB. If you allocate more than that, you are in trouble and should move the largest allocations to the heap.
I would follow a simple rule: if your allocations are more than, say, 16-64 KB, allocate on the heap. Otherwise, it should be fine to allocate on the stack.
Modern toolchains under normal circumstances use a stack size of about 1 megabyte, so 1 KB is not a problem for a simple program.
If the program is very complex, other functions in the call chain also use large portions of the stack, your current function is very deep in the call stack, etc., then you better avoid large automatic variables.
If you use recursion, then you should carefully consider how deep it can be.
If you write a function that will be used in other projects or by other people, then you never know whether it can be called in a recursive function or deep in the stack. So it's usually a good idea to avoid large automatic variables in this case.
There's no hard limit, but you might want to consider what happens if the allocation fails. If allocation of a local variable fails, your program crashes; if allocation of a dynamic variable fails, you get (or should get) an exception. For this reason, I tend to use dynamic allocation (in the form of std::vector) for anything over about 1K. The fact that std::vector does bounds checking (at least with the implementations I use) when compiling without optimization is also a plus.