C program with minimum RAM - c++

I want to understand the memory management in C and C++ programming for Application Development. Application will run on the PC.
If I want to make a program which uses RAM as less as possible while running, what are the points I need to consider while programming?
Here are two points according to what I understand, but I am not sure:
(1) Use minimum local variables in main() and other functions.
As local variables are saved on the stack, which is in RAM?
(2) Instead of local variables, use global variables at the top.
As global variables are saved in the uninitialized and initialized data areas, which I believe are in ROM?
Thanks.

1) Generally the alternative to allocating on the stack is allocating on the heap (e.g., with malloc), which actually has greater overhead due to bookkeeping, and the stack already has memory reserved for it, so allocating on the stack where possible is often preferable. On the other hand, there is less space on the stack, while the heap can be close to “unlimited” on modern systems with virtual memory and a 64-bit address space.
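As a minimal sketch of the two options (the sizes and names are illustrative):

#include <cstdlib>

void stack_version(void) {
    int scratch[256];          // reserved by adjusting the stack pointer: near-zero cost
    scratch[0] = 42;
}                              // released automatically on return

void heap_version(void) {
    int *scratch = (int *)malloc(256 * sizeof *scratch);  // bookkeeping overhead, and may fail
    if (scratch == NULL)
        return;                // malloc can fail; check before use
    scratch[0] = 42;
    free(scratch);             // must be released explicitly
}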
2) On PCs and other non-embedded systems, everything in your program goes in RAM, i.e., it is not flashed to a ROM-like memory, so global versus local does not help in that regard. Also, globals† tend to “live” as long as the application is running, while locals can be allocated and freed (either on the stack or heap) as required, and are thus preferable.
 
† More accurately, there can also be local variables with static duration, and variables with global scope that are pointers to dynamically allocated memory, so the terms local and global are used quite loosely here.
In general, modern desktop/laptop and even mobile operating systems are quite good at managing memory, so you probably shouldn't be trying to micro-optimize everything as you may actually do more harm than good.
If you really do need to bring down the memory footprint of your program, you must realize that everything in the program is stored in RAM, and so you need to work on reducing the number and size of the things you have, rather than trying to juggle their location. The other place where you can store things locally on a PC is the hard drive, so store large resources there and only load them as required (preferably only exactly the parts required). But remember that disk access is orders of magnitude slower than memory access, and that the operating system can also swap things out to disk if its memory gets full.
The program code itself is also stored in RAM, so have your compiler optimize for size (-Os or /Os option in many common compilers). Also remember that if you save a bit of space in variables by writing more complex code, the effort may be undone by the increased code size; save your optimizations for big wins (e.g., compressing large resources will require the added decompression code, but may still yield a large net win). Use of dynamically linked libraries (and other resources) also helps the overall memory footprint of the system if the same library is used by multiple programs running at the same time.
(Note that some of the above does not apply in embedded development, e.g., code and static constants may indeed be stored in flash instead of RAM, etc.)

This is difficult because on your PC, the program will be running out of RAM unless you can somehow execute it out of a ROM or Flash.
Here are the points to consider:
Reduce your code size.
Code takes up RAM.
Reduce variable quantity and size.
Variables need to live somewhere and that somewhere is in RAM.
Reduce string literals.
They, too, take up space.
Reduce function call nesting.
A function may require parameters, which are placed in RAM.
A function that calls other functions needs a return path; the path is stored in RAM.
Use RAM from other devices.
Other devices, such as the graphics processor and your hard drive adapter card, may have RAM you can use. If you use this RAM, you're not using the primary RAM.
Page memory to external device.
The OS is capable of virtual memory and can page memory out to an external device, such as a hard drive.
Edit 1 - Dynamic libraries
To reduce the RAM footprint of your program, you could reserve an area into which you swap library functions. This is similar to the DLL concept. When a function is needed, you load it from the hard drive into the reserved area.
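On a desktop OS, the closest standard mechanism to this is the dynamic loader itself. A minimal POSIX sketch of loading a function only when it is needed (the library libbig.so and the symbol do_work are hypothetical; link with -ldl on Linux):

#include <dlfcn.h>
#include <cstdio>

int main(void) {
    void *handle = dlopen("./libbig.so", RTLD_LAZY);   // load only when needed
    if (handle == NULL) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    void (*do_work)(void) = (void (*)(void))dlsym(handle, "do_work");
    if (do_work != NULL)
        do_work();
    dlclose(handle);           // unload; the pages can be reclaimed by the OS
    return 0;
}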

Typically, a certain amount of space will be allocated for the stack; such space will be unavailable for other purposes whether or not it is used. If the space turns out to be inadequate, the program will die a gruesome death.
Local variables will be stored using some combination of registers and stack space. Some compilers will use the same registers or stack space for variables which are "live" at different times in a program's execution; others will not. Further, function arguments are typically pushed on the stack before calling a function and removed at the caller's convenience. In evaluating the code sequence:
function1(1,2,3,4,5);
function2(6,7,8,9,10);
the arguments for the first function will be pushed on the stack and that function will be called. At that point the compiler could remove those five values off the stack, but since a single instruction can remove any number of pushed values, many compilers will push the arguments for the second function (leaving the arguments of the first on the stack), call the second function, and then use one instruction to eliminate all ten. Normally this would be a non-issue, but in some deeply-nested recursive scenarios it could potentially be a problem.
Unless the "PC" you're developing for is tiny by today's standards, I wouldn't worry too much about trying to micro-optimize RAM usage. I've developed code for microcontrollers with only 25 bytes of RAM, and even written full-fledged games for use on a microprocessor-based console with a whopping 128 bytes (not KBytes!) of RAM, and on such system sit makes sense to worry about each individual byte. For PC applications, though, the only time it makes sense to worry about individual bytes is when they're part of a data structure which will get replicated many thousands of times in RAM.

You might want to get a book on "embedded" programming. Such a book will likely discuss ways to keep the memory footprint down, as embedded systems are more constrained than modern desktop or server systems.
When you use "local" variables, they are saved on the stack. As long as you don't use too much stack, this is basically free memory, as when the function exits the memory is returned. How much is "too much" varies... recently I had to work on a system where there is a limit of 8 KB of data for the stack per process.
When you use "global" variables or other static variables, the memory you use is tied up for the duration of the program. Thus you should minimize your use of globals, and/or find ways to share the same memory across multiple functions in your program.
I wrote a fairly elaborate "object manager" for a project I wrote a few years ago. A function can use the "get" operation to borrow an object, and then use the "release" operation when it is done borrowing the object. This means that all the functions in the system were able to share a relatively small amount of data space by taking turns using the shared objects. It's up to you to decide whether it is worth your time to build an "object manager" sort of thing or if you have enough memory to just use simple variables.
You can get much of the benefit of an "object manager" by simply calling malloc() and free() a lot. Then the heap allocator manages the shared resource, the heap memory, for you. The reason I wrote my own "object manager" was a need for speed. My system keeps using identical data objects, and it is way faster to just keep re-using the same ones than to keep freeing them and malloc-ing them again. Also, my system can be run on a DSP chip, and malloc() can be a surprisingly slow function on some DSP architectures.
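For illustration, here is a minimal sketch of such a get/release scheme (all names and sizes are illustrative, not the actual object manager described above):

#include <cstddef>

struct Object { char data[256]; };       // the shared data object (illustrative)

static Object pool[8];                   // small, fixed data space shared by all users
static bool in_use[8];

Object *obj_get(void) {                  // "borrow" an object from the pool
    for (int i = 0; i < 8; ++i)
        if (!in_use[i]) { in_use[i] = true; return &pool[i]; }
    return NULL;                         // pool exhausted: defined failure
}

void obj_release(Object *o) {            // return the object when done with it
    in_use[o - pool] = false;
}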
Having multiple functions using the same global variables can lead you to tricky bugs, if one function tries to hold on to a global buffer while another one is overwriting the data. So your program will likely be more robust if you use malloc() and free() as long as each function only writes into data it allocated for itself. (But malloc() and free() can introduce bugs of their own: memory leaks, double-free errors, continuing to use a pointer after the data to which it points has been freed... if you use malloc() and free() be sure to use a tool such as Valgrind to check your code.)

Any variables, by definition, must be stored in read/write memory (or RAM). If you are talking about an embedded system with the code initially in ROM, then the startup code will copy the initialization image for the global variables from ROM into RAM so that their values can change at runtime.
Only items marked unchangeable (const) may be kept in the ROM during runtime.
Further, you need to reduce the depth of the calling structure of the program as each function call requires stack space (also in RAM) to record the return address and other values.
To minimise the use of memory, you can try to flag local variables with the register keyword, but this hint may not be honoured by your compiler.
Another common technique is to generate large data on the fly whenever it is required, instead of holding it in buffers; buffers usually take up much more space than simple variables.
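A small sketch of the idea (the sine table is just an illustration):

#include <cmath>

// Buffer approach: a 512-entry table would hold 4 KB of RAM for the program's lifetime:
//     static double sine_table[512];
// On-the-fly approach: no buffer; each value is computed when it is actually needed.
double sine_at(int i) {
    return std::sin(i * (2.0 * 3.141592653589793 / 512.0));
}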

If this is a PC, then by default you will be given a stack of a certain size (you can make it bigger or smaller). Using this stack is more efficient, RAM-wise, than using global variables, because your RAM usage will be the fixed stack size + globals + other stuff (program, heap, etc.). The stack acts as a reusable piece of memory.
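You can observe this reuse directly: with most compilers (and no inlining), two functions called one after the other get their locals at the same addresses, though the exact behaviour is implementation-dependent:

#include <cstdio>

void f(void) { int a = 1; printf("f's local lives at %p\n", (void *)&a); }
void g(void) { int b = 2; printf("g's local lives at %p\n", (void *)&b); }

int main(void) {
    f();    // f's frame is created and torn down...
    g();    // ...then g's frame typically reuses the very same stack memory
    return 0;
}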

Related

Why do we need to create an object in heap?

Why can't we use the stack for all our needs?
NOTE: it would be very good if you could give an example while explaining, because it is easier to understand with examples.
Sorry for my bad English.
In practice the call stack is limited and small. The typical limit is a few megabytes. In contrast, you could often allocate gigabytes in heap memory.
(on some systems, you might configure the system to have a larger stack; but you need to tell your users if you need that)
Also, and most importantly, the call stack is a stack, so has a LIFO (last in first out) discipline. In many cases, you want to release objects in an order unrelated to their allocation or just in a "first allocated, first destroyed" order (and that is impossible on a stack).
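As a concrete example (as requested in the question): an object that must outlive the function that creates it cannot live in that function's stack frame, so it has to come from the heap. A minimal sketch:

#include <string>

std::string *make_bad(void) {
    std::string s = "hello";
    return &s;                          // WRONG: s is destroyed on return; dangling pointer
}

std::string *make_good(void) {
    return new std::string("hello");    // heap object survives the return...
}                                       // ...but the caller must delete it later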
Consider reading something about garbage collection, e.g. the GC handbook. It teaches you useful concepts and terminology about dynamic memory allocation (even for C programs with manual memory management). Read also about the virtual address space of your process (see also this answer, at least for Linux).
Another advantage of dynamic memory allocation is that the same executable can run on various computers (with various resources, in particular different amounts of RAM); it just won't be able to process the same amount of data on each. If you had to allocate all the memory statically, this would not be the case (e.g. a C program with 50 gigabytes of static data could not even start on my laptop).

Allocating on heap vs allocating on stack in recursive functions

When I'm defining a recursive function, is it better/safer to allocate local variables on heap, and then clean them up before the function returns, than to allocate them on stack. The stack size on embedded systems is very limited, and there is a danger of stack overflow, when the recursion runs too deep.
The answer depends on your application domain and the specifics of your platform. "Embedded system" is a fuzzy term. A big oscilloscope running MS Windows or Linux is on one end of the spectrum. Most of it can be programmed like a normal PC. They should not fail, but if they do, one simply reboots them.
On the other end of the spectrum are controllers in safety-critical fields. Consider these Siemens switches which must react under all circumstances within milliseconds. Failure is not an option. They probably do not run Windows. Available resources are limited. Programming rules and procedures are much different here.
Now let's examine the options you have, recursion (or not!) with dynamic or automatic memory allocation.
Performance-wise the stack is much faster, so it is preferred. Dynamic allocation also involves some memory overhead for bookkeeping, which is significant if the data units are small. And there can be issues with memory fragmentation, which cannot occur with automatic memory (although the scenarios which lead to fragmentation -- different lifetimes of objects -- are probably not directly solvable without dynamic allocation).
But it is true that on some systems the stack size is (much) smaller than the heap size; you must read about the memory layout on your system. The oscilloscope will have lots of memory and big stacks; the power switch will not.
If you are concerned about running out of memory, I would follow Christian's advice to avoid recursion altogether and instead use a loop. Iteration may just keep the memory usage flat. Besides, recursion always uses stack space, e.g. for the return address and return value. The idea of allocating "local" variables dynamically would only make sense for larger data structures, because you would still have to hold pointers to the data as automatic variables, which use up space on the stack too (and would increase the overall memory footprint).
In general, on systems with limited resources it is important to limit the maximum resource use by your program. Time is also a resource; dynamic allocation makes real-time guarantees nearly impossible.
The application domain dictates the safety requirements. In safety-critical fields (pacemaker!) the program must not fail at all. That ideal is impossible to achieve unless the program is trivial, but great efforts are made to come close. In other fields a program may fail in a defined way under circumstances it cannot handle, but it must not fail silently or in undefined ways (for example, by overwriting data). For example, instead of allocating unknown amounts of data dynamically one would just have a pre-defined array of fixed size for data and use the elements in the array instead, with bound checks.
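A minimal sketch of that pattern (the capacity and names are illustrative):

enum { MAX_SAMPLES = 64 };              // capacity fixed at compile time

static int samples[MAX_SAMPLES];
static int sample_count = 0;

// Fails in a defined way (returns false) instead of allocating or overrunning.
bool add_sample(int value) {
    if (sample_count >= MAX_SAMPLES)
        return false;                   // defined failure, no dynamic allocation
    samples[sample_count++] = value;
    return true;
}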
When I'm defining a recursive function, is it better/safer to allocate local variables on heap, and then clean them up before the function returns, than to allocate them on stack.
You have both the C and C++ tags. It's a valid question for both, but I can only comment on C++.
In C++, it's better to use the heap even though it's slightly less efficient. That's because new can fail if you run out of heap memory. In the case of failure, new will throw an exception. However, running out of stack space does not cause an exception, and it's one of the reasons alloca is frowned upon in C++.
I think generally speaking you should avoid recursion in embedded systems altogether, as every time the function is called the return address is pushed to the stack. This could cause an unexpected overflow. Try switching to a loop.
Back to your question though: mallocing will be slower but safer. If there is no heap space left, malloc will return NULL and you can safely clean up the memory. This comes at a large cost in speed, as malloc can be quite slow.
If you know how many iterations you expect ahead of time, you have the option of mallocing an array of the variables that you need. That way you will only malloc once, which saves time, and you won't risk unexpectedly filling up the heap or stack. Also, you will only be left with one variable to free.
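A sketch of that approach, assuming the iteration count n is known up front (names are illustrative):

#include <cstddef>
#include <cstdlib>

void process(size_t n) {                // n iterations known ahead of time
    double *work = (double *)malloc(n * sizeof *work);  // one malloc for everything
    if (work == NULL)
        return;                         // defined failure path, unlike a stack overflow
    for (size_t i = 0; i < n; ++i)
        work[i] = (double)i * 0.5;      // iterate instead of recursing
    /* ... use work ... */
    free(work);                         // and only one variable to free
}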

Is it possible to partially free dynamically-allocated memory on a POSIX system?

I have a C++ application where I sometimes require a large buffer of POD types (e.g. an array of 25 billion floats) to be held in memory at once in a contiguous block. This particular memory organization is driven by the fact that the application makes use of some C APIs that operate on the data. Therefore, a different arrangement (such as a list of smaller chunks of memory like std::deque uses) isn't feasible.
The application has an algorithm that is run on the array in a streaming fashion; think something like this:
std::vector<float> buf(<very_large_size>);
for (size_t i = 0; i < buf.size(); ++i) do_algorithm(buf[i]);
This particular algorithm is the conclusion of a pipeline of earlier processing steps that have been applied to the dataset. Therefore, once my algorithm has passed over the i-th element in the array, the application no longer needs it.
In theory, therefore, I could free that memory in order to reduce my application's memory footprint as it chews through the data. However, doing something akin to a realloc() (or a std::vector<T>::shrink_to_fit()) would be inefficient because my application would have to spend its time copying the unconsumed data to the new spot at reallocation time.
My application runs on POSIX-compliant operating systems (e.g. Linux, OS X). Is there any interface by which I could ask the operating system to free only a specified region from the front of the block of memory? This would seem to be the most efficient approach, as I could just notify the memory manager that, for example, the first 2 GB of the memory block can be reclaimed once I'm done with it.
If your entire buffer has to be in memory at once, then you probably will not gain much from freeing it partially later.
The main point of this post is basically NOT to tell you to do what you want to do, because the OS will not unnecessarily keep your application's memory in RAM if it's not actually needed. This is the difference between "resident memory usage" and "virtual memory usage": "resident" is what is currently used and in RAM, "virtual" is the total memory usage of your application. And as long as your swap partition is large enough, "virtual" memory is pretty much a non-issue. [I'm assuming here that your system will not run out of virtual memory space, which is true in a 64-bit application, as long as you are not using hundreds of terabytes of virtual space!]
If you still want to do that, and want to have some reasonable portability, I would suggest building a "wrapper" that behaves kind of like std::vector and allocates lumps of some megabytes (or perhaps a couple of gigabytes) of memory at a time, and then something like:
for (size_t i = 0; i < buf.size(); ++i) {
do_algorithm(buf[i]);
buf.done(i);
}
The done method will simply check if the value of i is (one element) past the end of the current buffer, and free it. [This should inline nicely, and produce very little overhead on the average loop - assuming elements are actually used in linear order, of course].
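Such a wrapper might look roughly like this (entirely hypothetical; error handling is omitted, and elements must be consumed in linear order):

#include <cstdlib>
#include <vector>

class ChunkedBuffer {
    static const size_t CHUNK = 1u << 20;          // elements per chunk (illustrative)
    std::vector<float *> chunks;
    size_t count;
public:
    explicit ChunkedBuffer(size_t n) : count(n) {
        for (size_t i = 0; i * CHUNK < n; ++i)
            chunks.push_back((float *)calloc(CHUNK, sizeof(float)));
    }
    ~ChunkedBuffer() {
        for (size_t i = 0; i < chunks.size(); ++i)
            free(chunks[i]);                       // free(NULL) is harmless
    }
    size_t size() const { return count; }
    float &operator[](size_t i) { return chunks[i / CHUNK][i % CHUNK]; }
    void done(size_t i) {                          // element i has been consumed
        if ((i + 1) % CHUNK == 0) {                // just finished a whole chunk
            free(chunks[i / CHUNK]);               // hand it back to the allocator
            chunks[i / CHUNK] = NULL;              // never touched again in a linear scan
        }
    }
};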
I'd be very surprised if this gains you anything, unless do_algorithm(buf[i]) takes quite some time (certainly many seconds, probably many minutes or even hours). And of course, it's only going to help if you actually have something else useful to do with that memory. And even then, the OS will reclaim memory that isn't actively used by swapping it out to disk, if the system is short of memory.
In other words, if you allocate 100 GB, fill it, and leave it sitting without touching it, it will eventually ALL be on the hard disk rather than in RAM.
Further, it is not at all unusual that the heap in the application retains freed memory, and that the OS does not get the memory back until the application exits - and certainly, if only parts of a larger allocation are freed, the runtime will not release it until the whole block has been freed. So, as stated at the beginning, I'm not sure how much this will actually help your application.
As with everything regarding "tuning" and "performance improvements", you need to measure and compare a benchmark, and see how much it helps.
Is it possible to partially free dynamically-allocated memory on a POSIX system?
You can not do it using malloc()/realloc()/free().
However, you can do it in a semi-portable way using mmap() and munmap(). The key point is that if you munmap() some page, malloc() can later use that page:
create an anonymous mapping using mmap();
subsequently call munmap() for regions that you don't need anymore.
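A minimal sketch of those two steps (assuming MAP_ANONYMOUS is available and the region being released is a page-aligned size):

#include <sys/mman.h>
#include <cstddef>

int main(void) {
    size_t total = (size_t)6 << 30;     // a 6 GB working buffer (illustrative)
    size_t front = (size_t)2 << 30;     // its first 2 GB, a page-aligned amount

    void *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    float *buf = (float *)mem;

    /* ... stream through the first 2 GB worth of elements of buf ... */

    munmap(buf, front);                 // release just the front; the rest stays mapped
    /* ... keep working on the remaining 4 GB ... */
    munmap((char *)buf + front, total - front);
    return 0;
}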
The portability issues are:
POSIX doesn't specify anonymous mappings. Some systems provide MAP_ANONYMOUS or MAP_ANON flag. Other systems provide special device file that can be mapped for this purpose. Linux provides both.
I don't think that POSIX guarantees that when you munmap() a page, malloc() will be able to use it. But I think it'll work on all systems that have mmap()/munmap().
Update
If your memory region is so large that most pages will surely be written to swap, you will not lose anything by using file mappings instead of anonymous mappings. File mappings are specified in POSIX.
If you can do without the convenience of std::vector (which won't give you much in this case anyway, because you'll never want to copy / return / move that beast), you can do your own memory handling. Ask the operating system for entire pages of memory (via mmap) and return them as appropriate (using munmap). You can tell mmap via its first argument and the optional MAP_FIXED flag to map the page at a particular address (which you must ensure to be not otherwise occupied, of course) so you can build up an area of contiguous memory. If you allocate the entire memory upfront, then this is not an issue and you can do it with a single mmap and let the operating system choose a convenient place to map it. In the end, this is what malloc does internally. For platforms that don't have sys/mman.h, it's not difficult to fall back to using malloc if you can live with the fact that on those platforms, you won't return memory early.
I'm suspecting that if your allocation sizes are always multiples of the page size, realloc will be smart enough not to copy any data. You'd have to try this out and see if it works (or consult your malloc's documentation) on your particular target platform, though.
mremap is probably what you need. As long as you're shifting whole pages, you can do a super fast realloc (actually the kernel would do it for you).

Is program runtime affected by where the objects reside in the memory?

My program allocates all of its resources, slightly below 1 MB in total, at startup and no more, except primitive local variables. The allocation took place originally with malloc, so on the heap, but I wondered whether there would be any difference from putting them on the stack.
In various tests, with program runtimes from 3 seconds to 3 minutes, accessing the stack consistently appears to be faster, by up to 10%. All I changed was whether to malloc the structs or to declare them as automatic variables.
Another interesting fact I found is that when I declare the objects as static, the program runs 20~30% slower. I have no idea why. I double-checked whether I made a mistake, but the only difference really is whether static is there or not. Do static variables go somewhere else in memory than automatic variables?
Earlier I had quite the opposite experience: in a C++ class, when I changed a const member array from non-static to static, the program ran faster. The memory consumption was the same because there was only one instance of that object.
Is program runtime affected by where the objects reside in the memory? Even if so, can't the compiler manage to place the objects in the right place for maximum efficiency?
Well, yeah, program performance is affected by where objects reside in memory.
The problem is, unless you have intimate knowledge of how your compiler works and how it uses features of your particular host system (operating system services, hardware, processor cache, etc), and how those things are configured, you will not be able to consistently exploit it. Even if you do succeed, small changes (e.g. upgrading a compiler, changing optimisation settings, a change of process quotas, changing the amount of physical memory, changing a hard drive [e.g. one that is used for swap space]) can affect performance, and it won't always be easy to predict whether a change will improve or degrade performance. Performance is sensitive to all these things - and to the interaction between them, in ways that are not always obvious without close analysis.
Is program runtime affected by where the objects reside in the memory?
Yes, program performance will be affected by the location of an object in memory, among other factors.
Whenever an object in the heap is accessed, it's done by dereferencing a pointer. Dereferencing pointers requires additional computation to find the next memory address, and every additional pointer that's between you and your data (e.g. pointer -> pointer -> actual data) makes it slightly worse. In addition to this, it increases the cache miss rate for the CPU, because the data that it expects to access next is not really in contiguous memory blocks. In other words, the assumption the CPU made to try and optimize its pipeline turns out to be false, and it pays the penalty for an incorrect prediction.
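A small sketch of the difference (the sizes are arbitrary): the first loop walks one contiguous block, the second chases a pointer per element, so each access may land on a different cache line:

enum { N = 1000000 };

double contiguous_sum(const double *a) {
    double s = 0;
    for (int i = 0; i < N; ++i)
        s += a[i];                      // sequential access: prefetcher-friendly
    return s;
}

double indirect_sum(double *const *p) {
    double s = 0;
    for (int i = 0; i < N; ++i)
        s += *p[i];                     // one extra dereference per element
    return s;
}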
Even if so, can't the compiler manage to place the objects in the right place for maximum efficiency?
The C/C++ compilers will place the objects at whatever location you tell them to. If you're using malloc, you shouldn't expect the compiler to put things on the stack, and vice versa.
The compiler wouldn't be able to do this even in principle because malloc dynamically allocates memory at runtime, long after the compiler's job has ended. What is known at compile time is the size for the stack, but how the contents of this memory are organized depends entirely on you.
You might be able to get some benefits from compiler optimization settings, but most of your benefits in optimization efforts will be better spent improving data structures and/or algorithms being used.

C++ Some Stack questions

Let me start by saying that I have read this tutorial and have read this question. My questions are:
1. How big can the stack get? Is it processor/architecture/compiler dependent?
2. Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used, in order to avoid overflows?
3. Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB RAM), do I need to worry about stack overflows or is it a thing of the past?
4. Is the actual stack memory physically in RAM or in CPU cache(s)?
5. How much faster is stack memory access and read compared to heap access and read? I realize that times are PC-specific, so a ratio is enough.
6. I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1 MB for a thread in win32. How about a thread in Linux amd64?
I apologize if those questions have been asked and answered already, any link is welcome !
Yes, the limit on the stack size varies, but if you care you're probably doing something wrong.
Generally, no, you can't get information about how much memory is available to your program. Even if you could obtain such information, it would usually be stale before you could use it.
If you share access to data across threads, then yes you normally need to serialize access unless they're strictly read-only.
You can pass the address of a stack-allocated object to another thread, in which case you (again) have to serialize unless the access is strictly read-only.
You can certainly overflow the stack even on a modern machine with lots of memory. The stack is often limited to only a fairly small fraction of overall memory (e.g., 4 MB).
The stack is allocated as system memory, but is usually used frequently enough that at least the top page or two will typically be in the cache at any given time.
Being part of the stack vs. heap makes no direct difference to access speed -- the two typically reside in identical memory chips, and often even at different addresses in the same memory chip. The main difference is that the stack is normally contiguous and heavily used, so the top few pages will almost always be in the cache. Heap-based memory is typically fragmented, so there's a much greater chance of needing data that's not in the cache.
Little has changed with respect to the maximum size of object you should allocate on the stack. Even if the stack can be larger, there's little reason to allocate huge objects there.
The primary way to avoid memory leaks in C++ is RAII (AKA SBRM, Stack-based resource management).
Smart pointers are a large subject in themselves, and Boost provides several kinds. In my experience, collections make a bigger difference, but the basic idea is largely the same either way: relieve the programmer of keeping track of every circumstance when a particular object can be used or should be freed.
1. How big can the stack get? Is it processor/architecture/compiler dependent?
The size of the stack is limited by the amount of memory on the platform and the amount of memory allocated to the process by the operating system.
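On POSIX systems you can at least query the configured stack limit with getrlimit (a sketch; note this reports the limit, not how much is currently free):

#include <sys/resource.h>
#include <cstdio>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("stack soft limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);   // RLIM_INFINITY means no limit set
    return 0;
}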
2. Is there a way to know exactly how much memory is available to my function/class stack and how much is currently being used in order to avoid overflows?
There is no C or C++ facility for determining the amount of available memory. There may be platform specific functions for this. In general, most programs try to allocate memory, then come up with a solution for when the allocation fails.
3. Using modern compilers (say gcc 4.5) on a modern computer (say 6 GB RAM), do I need to worry about stack overflows or is it a thing of the past?
Stack overflows can happen depending on the design of the program. Recursion is a good example of depleting the stack, regardless of the amount of system memory.
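For example, this sketch runs out of stack on a machine with gigabytes of free RAM (build without optimization, since a compiler may otherwise transform the recursion):

// WARNING: deliberately overflows the stack when run.
unsigned long calls = 0;

void recurse(void) {
    volatile char frame[1024];          // ~1 KB per call: an 8 MB stack dies after ~8000 calls
    frame[0] = 1;
    ++calls;
    recurse();                          // no base case: the stack limit, not RAM size, is what matters
}

int main(void) {
    recurse();
    return 0;
}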
4. Is the actual stack memory physically in RAM or in CPU cache(s)?
Platform dependent. Some CPUs can load up their cache with local variables on the stack. There is a wide variety of scenarios on this topic. Not defined in the language specification.
5. How much faster is stack memory access and read compared to heap access and read? I realize that times are PC-specific, so a ratio is enough.
Usually there is no difference in speed. It depends on how the platform organizes its memory (physically) and how the executable's memory is laid out. The heap or stack could reside in a serial-access memory chip (a slow method) or even on a Flash memory chip. Not specified in the language specification.
6. I've read that it is not advisable to allocate big vars/objects on the stack. How much is too big? This question here is given an answer of 1 MB for a thread in win32. How about a thread in Linux amd64?
The best advice is to allocate small local variables as needed (a.k.a. via the stack). Huge items are either allocated from dynamic memory (a.k.a. the heap), or made some kind of global (static local to a function, local to a translation unit, or even a global variable). If the size is known at compile time, use the global type of allocation. Use dynamic memory when the size may change during run-time.
The stack also contains function return addresses. This is one major reason not to allocate a lot of objects locally. Some compilers have smaller limits for stacks than for heap or global variables. The premise is that nested function calls require less memory than large data arrays or buffers.
Remember that when switching threads or tasks, the OS needs to save the state somewhere. The OS may have different rules for saving stack memory versus other types.
1-2 : On some embedded CPUs the stack may be limited to a few kbytes; on some machines it may expand to gigabytes. There's no platform-independent way to know how big the stack can get, in some measure because some platforms are capable of expanding the stack when they reach the limit; the success of such an operation cannot always be predicted in advance.
3 : The effects of nearly-simultaneous writes, or of writes in one thread that occur nearly simultaneously with reads in another, are largely unpredictable in the absence of locks, mutexes, or other such devices. Certain things can be assumed (for example, if one thread reads a heap-stored 'int' while another thread changes it from 4 to 5, the first thread may see 4 or it may see 5; on most platforms, it would be guaranteed not to see 27).
4 : Some platforms share stack address space among threads; others do not. Passing pointers to things on the stack is usually a bad idea, though, since the foreign thread receiving the pointer has no way of ensuring that the target is still in scope and won't go out of scope.
5 : Generally one does not need to worry about stack space in any routine which is written to limit recursion to a reasonable level. One does, however, need to worry about the possibility of defective data structures causing infinite recursion, which would wipe out any stack no matter how large it might be. One should also be mindful of the possibility of nasty input which would cause a much greater stack depth than expected. For example, a compiler using a recursive-descent parser might choke if fed a file containing a billion repetitions of the sequence "1+(". Even if the machine has a gig of stack space, a billion nested sub-expressions at 64 bytes of stack each would need some 64 gigs, so that three-gig input file could kill it.
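One common mitigation is an explicit depth guard in the recursive routine, sketched below (the limit is arbitrary and would be tuned to the platform's stack size):

enum { MAX_DEPTH = 10000 };

// Fails in a defined way on hostile input such as a billion repetitions of "1+(".
bool parse_expr(const char **s, int depth) {
    if (depth > MAX_DEPTH)
        return false;                   // refuse, rather than overflow the stack
    /* ... on '(' recurse with parse_expr(s, depth + 1) ... */
    (void)s;
    return true;
}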
6 : Stack is stored generally in RAM and/or cache; the most-recently-accessed parts will generally be in cache, while the less-recently-accessed parts will be in main memory. The same is generally true of code, heap, and static storage areas as well.
7 : That is very system dependent; generally, "finding" something on the heap will take as much time as accessing a few things on the stack, but in many cases making multiple accesses to different parts of the same heap object can be as fast as accessing a stack object.