struct A { int i; };
...
A *p = (A*) (8); // or A *p = 0;
p->i = 5; // Undefined behavior according to the C/C++ standard
In practice, however, most systems would crash (segmentation fault) on such code.
Does that mean all such architectures/systems have a hidden check on pointer indirection (i.e. p->) to verify that it isn't accessing a wrong memory location?
If so, it implies that even in perfectly working code we are paying the price for that extra check, correct?
There are generally no extra hidden checks, this is just an effect of using virtual memory.
Some of the potential virtual addresses are just not mapped to physical memory, so translating things like 8 will likely fail.
Yes, you are paying the price for that extra check. It's not just for pointer indirection, but any memory access (other than, say, DMA). However, the cost of the check is very small.
While your process is running, the page table does not change very often. Parts of the page table are cached in the translation lookaside buffer (TLB); accessing pages with entries in the buffer incurs no additional penalty.
If your process accesses a page without a TLB entry, then the CPU must make an additional memory access to fetch the page table entry for that page. It will then be cached.
You can see the effect of this in action by writing a test program. Give your test program a big chunk of memory and start randomly reading and writing locations in memory. Use a command line parameter to change the size.
Above the L1 cache size, performance will drop due to L2 cache latency.
Above the L2 cache size, performance will drop to RAM latency.
Above the size of the memory addressed by the TLB, performance will drop due to TLB misses. (This might happen before or after you run out of L2 cache space, depending on a number of factors.)
Above the size of available RAM, performance will drop due to swapping.
Above the size of available swap space and RAM, the application will be terminated by the OS.
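Here is a minimal sketch of such a test program (all names, sizes, and the timing approach are mine, not the answer's): it takes the working-set size in bytes from the command line and does random read-modify-writes, so you can plot nanoseconds per access against size and watch it step up as it crosses the cache, TLB, and RAM thresholds.

#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

int main(int argc, char** argv)
{
    // Working-set size in bytes, taken from the command line (default 64 MiB).
    std::size_t size = (argc > 1) ? std::strtoull(argv[1], nullptr, 10)
                                  : (std::size_t{64} << 20);
    std::vector<std::uint8_t> buf(size, 0);

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<std::size_t> pick(0, size - 1);

    const std::size_t accesses = 50000000;
    std::uint64_t sum = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < accesses; ++i) {
        std::size_t idx = pick(rng);
        buf[idx] += 1;                       // read-modify-write a random byte
        sum += buf[idx];
    }
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::cout << "working set " << size << " bytes: "
              << ns / accesses << " ns per access (checksum " << sum << ")\n";
    return 0;
}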
If your operating system allows "big pages", the TLB might be able to cover a very large address space indeed. Perhaps you can sabotage the OS by allocating 4k chunks from mmap, in which case the TLB misses might be felt with only a few megs of working set, depending on your processor.
However: The small performance drop must be weighed against the benefits of virtual memory, which are too numerous to list here.
No, not correct. Those exact same checks are absolutely needed on valid memory accesses for two reasons:
1) Otherwise, how would the system know what physical memory you were accessing and whether the page was already resident?
2) Otherwise, how would the operating system know which pages of physical memory to page out if physical memory became tight?
It's integrated into the entire virtual memory system and part of what makes modern computers perform so amazingly well. It's not any kind of separate check, it's part of the process that determines which page of physical memory the operation is accessing. It's part of what makes copy-on-write work. (The very same check detects when a copy is needed.)
A segmentation fault is an attempt to access memory that the CPU cannot physically address. It occurs when the hardware notifies the operating system about a memory access violation. So I think there is no extra check as such: if an attempt to access a memory location fails, the hardware notifies the OS, which then sends a signal to the process that caused the exception. By default, the process receiving the signal dumps core and terminates.
First of all, you need to read and understand this: http://en.wikipedia.org/wiki/Virtual_memory#Page_tables
So what typically happens is, when a process attempts to dereference an invalid virtual memory location, the OS catches the page fault exception raised by the MMU (see link above) for the invalid virtual address (0x0, 0x8, whatever). The OS then looks up the address in its page table, doesn't find it, and issues a SIGSEGV signal (or similar) to the process which causes the process to crash.
The difference between a valid and invalid address is whether the OS has allocated a page for that address range. Most OSes are designed to never allocate the first page (the one starting at 0x0) so that NULL dereferences will always crash.
So what you're calling an "extra check" is really the same check that occurs for every single page fault, valid address or not -- it's just a matter of whether the page table lookup succeeds.
Related
I'm currently working on a project using the Zynq-7000 SoC. We have a custom DMA IP in PL to provide faster transactions between peripherals and main memory. The peripherals are generally serial devices such as UART. The data received by the serial device is transferred immediately to the main memory by DMA.
What I try to do is to reach the data stored at a pre-determined location of the memory. Before reading the data, I invalidate the related cache lines using a function provided by xil_cache.h library as below.
Xil_DCacheInvalidateRange(INTPTR adr, u32 len);
The problem here is that this function flushes the related cache lines before invalidating them. Due to the flush, the stored data is overwritten, so I fetch corrupted bytes every time. The process is explained in the library documentation as below.
If the address to be invalidated is not cache-line aligned, the following choices are available:

Invalidate the cache line when required and do not bother much about the side effects. Though it sounds good, it can result in hard-to-debug issues. The problem is, if some other variables are allocated in the same cache line and had been recently updated (in cache), the invalidation would result in loss of data.

Flush the cache line first. This will ensure that any other variables present in the same cache line and updated recently are flushed out to memory. Then it can safely be invalidated. Again it sounds good, but this can result in issues. For example, when the invalidation happens in a typical ISR (after a DMA transfer has updated the memory), then flushing the cache line means losing data that were updated recently before the ISR got invoked.
As you can guess, I cannot always allocate a memory region with a cache-line-aligned address. Therefore, I take a different approach: I calculate the cache-line-aligned address located in memory right before my buffer and call the invalidation method with that address. Note that the Zynq's L2 cache is an 8-way set-associative 512KB cache with a fixed 32-byte line size. This is why I mask the last 5 bits of the given memory address. (Check section 3.4: L2 Cache in Zynq's documentation.)
INTPTR invalidationStartAddress = INTPTR(uint32_t(dev2memBuffer) - (uint32_t(dev2memBuffer) & 0x1F));
Xil_DCacheInvalidateRange(invalidationStartAddress, BUFFER_LENGTH);
This way I can solve the problem, but I'm not sure whether I'm corrupting any of the resources placed before the buffer allocated for DMA. (I should add that the buffer in question is allocated on the heap with operator new.) Is there a way to overcome this issue, or am I overthinking it? I believe this problem could be solved better if there were a function to invalidate the related cache lines without flushing them.
EDIT: Invalidating memory that does not reside inside the allocated area breaks the reliability of variables placed close to the buffer, so the first solution is not applicable. My second idea was to allocate a buffer 32 bytes bigger than required and crop its unaligned part. But this can cause the same problem, since its last part (parts = 32-byte blocks) is not guaranteed to span a full 32 bytes; hence it might corrupt the resources placed next to it. The library documentation states that:
Whenever possible, the addresses must be cache-line aligned. Please note that not just the start address, even the end address must be cache-line aligned. If that is taken care of, this will always work.
SOLUTION: As I stated in the last edit, the only way to overcome the problem was to allocate a memory region with a cache-aligned address and length. Since I cannot control the start address of the allocated area, I decided to allocate a space two cache blocks bigger than requested and crop the unaligned parts; the misalignment can occur at the first or the last block. To keep deallocation correct, I carefully saved the originally allocated address and used the cache-aligned one in all other operations.
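For reference, a minimal sketch of that over-allocate-and-crop approach (the helper names are mine; it assumes the 32-byte line size mentioned above, and that BUFFER_LENGTH is itself a multiple of the line size so the end address is aligned too, as the documentation requires):

#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE = 32;   // Zynq L2 line size (see section 3.4)

struct AlignedBuffer {
    std::uint8_t* raw;       // pointer actually returned by new[] -- keep it for delete[]
    std::uint8_t* aligned;   // cache-line-aligned start, used for DMA and cache maintenance
};

AlignedBuffer allocateAligned(std::size_t length)
{
    // Over-allocate by two cache lines so the usable region can start (and,
    // if length is a multiple of CACHE_LINE, end) on a line boundary.
    std::uint8_t* raw = new std::uint8_t[length + 2 * CACHE_LINE];
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    std::uintptr_t alignedAddr = (addr + CACHE_LINE - 1) & ~std::uintptr_t(CACHE_LINE - 1);
    return { raw, reinterpret_cast<std::uint8_t*>(alignedAddr) };
}

// Usage sketch:
//   AlignedBuffer buf = allocateAligned(BUFFER_LENGTH);
//   Xil_DCacheInvalidateRange(reinterpret_cast<INTPTR>(buf.aligned), BUFFER_LENGTH);
//   ... read the DMA data through buf.aligned ...
//   delete[] buf.raw;   // always free the original pointer

If the toolchain supports it, C++17's aligned operator new (std::align_val_t) or posix_memalign would achieve the same alignment more directly.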
I believe that there are better solutions to the problem and I keep the question open.
Your solution is correct. There is no way to flush a subset of a cache line.
Normally this behavior is transparent to programs but it becomes visible in multithreaded code and when sharing memory with hardware accelerators.
Processes in an OS have their own virtual address spaces. Say I allocate some dynamic memory using a malloc() call in a C program and subtract some positive value (say 1000) from the address returned by it. Now I try to read what is written at that location, which should be fine, but what about writing to that location?
The virtual address space also has some read-only chunks of memory. How does it protect those?
TL;DR No, it's not allowed.
In your case, when you got a valid non-NULL pointer to a memory address returned by malloc(), only the requested size of memory is allocated to your process and you're allowed to use (read and / or write) into that much space only.
In general, any allocated memory (compile-time or run-time) has an associated size with it. Either overrunning or underrunning the allocated memory area is considered invalid memory access, which invokes undefined behavior.
Even if the memory is accessible and inside the process address space, there's nothing stopping the OS / memory manager from later returning a pointer to that particular address, so at best either your previous write will be overwritten or you will be overwriting some other value. The worst case is, as mentioned earlier, UB.
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
What addresses you can read/write/execute from are based on a processes current memory map, which is set up by the operating system.
On my linux box, if I run pmap on my current shell, I see something like this:
evaitl#bb /proc/13151 $ pmap 13151
13151: bash
0000000000400000 976K r-x-- bash
00000000006f3000 4K r---- bash
00000000006f4000 36K rw--- bash
00000000006fd000 24K rw--- [ anon ]
0000000001f25000 1840K rw--- [ anon ]
00007ff7cce36000 44K r-x-- libnss_files-2.23.so
00007ff7cce41000 2044K ----- libnss_files-2.23.so
00007ff7cd040000 4K r---- libnss_files-2.23.so
00007ff7cd041000 4K rw--- libnss_files-2.23.so
00007ff7cd042000 24K rw--- [ anon ]
...
[many more lines here...]
Each line has a base address, a size, and the permissions; these are considered memory segments. The last field says what is being mapped in: bash is my shell; anon means this is allocated memory, perhaps for bss, maybe heap from malloc, or it could be a stack.
Shared libraries are also mapped in; that is where the libnss_files lines come from.
When you malloc some memory, it will come from an anonymous program segment. If there isn't enough space in the current anon segment being used for the heap, the OS will increase its size. The permissions in those segments will almost certainly be rw.
If you try to read/write outside of space you allocated, behavior is undefined. In this case that means that you may get lucky and nothing happens, or you may trip over an unmapped address and get a SIGSEGV signal.
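If you want to see this from inside a program rather than with pmap, here is a small Linux-only sketch (my own, not part of the answer) that prints /proc/self/maps before and after a large malloc, so the new anonymous segment becomes visible:

#include <cstdio>
#include <cstdlib>

static void dump_maps(const char* label)
{
    std::printf("---- %s ----\n", label);
    std::FILE* f = std::fopen("/proc/self/maps", "r");
    if (!f) return;
    char line[512];
    while (std::fgets(line, sizeof line, f))
        std::fputs(line, stdout);
    std::fclose(f);
}

int main()
{
    dump_maps("before malloc");
    // 64 MiB is large enough that glibc will typically give it its own
    // anonymous mmap'd segment rather than extending the existing heap.
    void* p = std::malloc(64 * 1024 * 1024);
    dump_maps("after malloc");
    std::free(p);
    return 0;
}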
Now, I try to read what is written on that location which should be fine
It is not fine. According to the C++ standard, reading uninitialized memory has undefined behaviour.
but what about writing to that location?
Not fine either. Reading or writing unallocated memory also has undefined behaviour.
Sure, the memory address that you ended up at might be allocated - it's possible. But even if it happens to be, the pointer arithmetic outside the bounds of the allocation is already UB.
virtual address space also has some read only chunk of memory. How does it protect that?
This one is out of scope of C++ (and C) since it does not define virtual memory at all. This may differ across operating systems, but at least one approach is that when the process requests memory from the OS, it sends flags that specify the desired protection type. See prot argument in the man page of mmap as an example. The OS in turn sets up the virtual page table accordingly.
Once the protection type is known, the OS can raise an appropriate signal if the protection has been violated, and possibly terminate the process. Just like it does when a process tries to access unmapped memory. The violations are typically detected by the memory management unit of the CPU.
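As a rough POSIX/Linux illustration of the prot idea mentioned above (this sketch is mine, not from the answer), you can ask mmap for a read-only anonymous page and observe that only writing to it faults:

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main()
{
    const std::size_t len = 4096;                       // one typical page
    void* p = mmap(nullptr, len, PROT_READ,             // ask for read-only protection
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

    volatile char c = static_cast<char*>(p)[0];          // reading the page is allowed
    std::printf("read byte %d from the read-only page\n", c);

    // static_cast<char*>(p)[0] = 'x';                   // uncommenting this write raises SIGSEGV

    munmap(p, len);
    return 0;
}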
Processes in OS have their own virtual address spaces. Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value (say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
No, it should not be fine, since only the memory region allocated by malloc() is guaranteed to be accessible. There is no guarantee that the virtual address space is contiguous, and thus no guarantee that the memory addresses before and after your region are accessible (i.e. mapped into the virtual address space).
Of course, no one is stopping you from doing so, but the behaviour is undefined. If you access a non-mapped memory address, it will generate a page fault, which is a hardware CPU exception. When the operating system handles it, it will send a SIGSEGV signal or an access-violation exception to your application (depending on the OS).
virtual address space also has some read only chunk of memory. How does it protect that?
First it's important to note that virtual memory mapping is realized partly by an external hardware component called a memory management unit (MMU). It might or might not be integrated into the CPU chip. In addition to mapping various virtual memory addresses to physical ones, it also supports marking those addresses with different flags, one of which enables and disables write protection.
When the CPU tries to write to a virtual address that is marked read-only, and thus write-protected (for example with a MOV instruction), the MMU raises a page fault exception on the CPU.
The same goes for trying to access non-present virtual memory pages.
In the C language, doing arithmetic on a pointer to produce another pointer that does not point into (or one-past-the-end) the same object or array of objects is undefined behavior: from 6.5.6 Additive Operators:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
(for the purposes of this clause, a non-array object is treated as an array of length 1)
You could get unlucky: the compiler could still produce a pointer you're allowed to do things with, and doing things with it will do something, but precisely what is anybody's guess; it will be unreliable and often difficult to debug.
If you're lucky, the compiler produces a pointer into memory that "does not belong to you" and you get a segmentation fault to alert you to the problem as soon as you try to read or write through it.
How the system behaves when you read or write an unmapped memory address depends on your operating system implementation; operating systems behave differently when you try to access an unmapped virtual address. When you access an address that is unmapped (or mapped to something other than plain memory, for example a file mapped into memory), the operating system takes control by means of a trap, and what happens next is completely operating-system dependent.

Suppose you have mapped the video framebuffer somewhere in your virtual address space: writing there makes the screen change. Suppose you have mapped a file: reading or writing that memory means reading or writing the file. Suppose you (the running process) try to access a swapped-out page (because, for lack of physical memory, your process has been partially swapped out): your process is stopped, work begins to bring that memory back from secondary storage, and then the instruction is restarted. Linux, for example, generates a SIGSEGV signal when you try to access memory that has not been allocated. But you can install a signal handler to be called upon receiving this signal, and then trying to access unallocated memory means jumping into a piece of code in your own program that deals with that situation.

But bear in mind that trying to access memory that has not been properly acquired, especially on a modern operating system, normally means your program is behaving incorrectly; normally it will crash, the system takes control, and the process is killed.
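As a rough sketch of the "install a signal handler" idea mentioned above (POSIX-only; the handler name and messages are mine, not from the answer), using sigaction:

#include <signal.h>
#include <cstdio>
#include <cstdlib>

void on_segv(int)
{
    // Strictly speaking only async-signal-safe functions belong here; this
    // sketch keeps it short and simply reports and exits.
    std::fputs("caught SIGSEGV\n", stderr);
    std::_Exit(1);
}

int main()
{
    struct sigaction sa = {};
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, nullptr);

    volatile int* p = nullptr;
    *p = 42;             // invalid write: the handler runs instead of the default core dump
    return 0;
}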
NOTE
malloc(3) is not a system call but a library function that manages a variable-size allocation area in your RAM, so accessing even the first address before the returned one, or anything past the last allocated memory cell, is undefined behaviour. It does not necessarily mean you have accessed unmapped memory: you may well be reading a perfectly mapped piece of your code or your data (or the stack) without knowing it. malloc(3) tends to ask the operating system for large contiguous amounts of memory and manage them across many malloc calls, to avoid costly requests to the operating system for more memory. See the sbrk(2) or mmap(2) system call manpages for more on this.
For example, both Linux and BSD Unix reserve page 0 of each process's virtual address space (the one containing the NULL address) and make it inaccessible, so that dereferencing a null pointer is an invalid access; if you try to read or write this address (or any address in that page) you'll get a signal (or your process gets killed). Try this:
int main()
{
    char *p = 0;   /* p is pointing to the null address */
    p[0] = '\n';   /* a '\n' is being written to address 0x0000 */
    p[1] = '\0';   /* a '\0' is being written to address 0x0001 */
}
This program should fail at runtime on all modern operating systems (compile it without optimization so the compiler doesn't eliminate the code in main, as it effectively does nothing) because you are trying to access a page that has already been reserved for a specific purpose: catching null-pointer accesses.
The program on my system (mac OS X, a derivative from BSD unix) just does the following:
$ a.out
Segmentation fault: 11
NOTE 2
Many modern operating systems (mostly Unix-derived) implement a type of memory access called copy-on-write. This means that you can read that memory and modify it as you like, but the first time you access it for writing, a page fault is generated (normally this is implemented by giving you a read-only page, letting the fault happen, and then making a private copy of that individual page to store your modifications). This is very effective for fork(2), which is normally followed by an exec(2) syscall: only the pages the program actually modifies are copied before the process discards them all, saving a lot of work.
Another case is stack growth. The stack grows automatically as you enter and leave stack frames in your program, so the operating system has to deal with the page faults that happen when you push something on the stack and that push crosses a page boundary into unmapped territory. When this happens, the OS automatically allocates a page and turns that region (the page) into valid, normally read-write, memory.
Technically, a process has a logical address space. However, that often gets conflated with a virtual address space.
The number of virtual addresses that can be mapped into that logical address space can be limited by:
Hardware
System resources (notably page file space)
System Parameters (e.g., limiting page table size)
Process quotas
Your logical address space consists of an array of pages that are mapped to physical page frames. Not every page needs to have such a mapping (or even is likely to).
The logical address space is usually divided into two (or more) areas: system (common to all processes) and user (created for each process).
Theoretically, there is nothing in the user space to begin a process with; only the system address space exists.
If the system does not use up its entire range of logical addresses (which is normal), unused addresses cannot be accessed at all.
Now your program starts running. The O/S has mapped some pages into your logical address space. Very little of that address space is likely to be mapped. Your application can map more pages into the unmapped portions of the logical address space.
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
The processor uses a page table to map logical pages to physical page frames. If you do what you describe, a number of things can happen:
There is no page table entry for the address => Access violation. Your system may not set up a page table that can span the entire logical address space.
There is a page table entry for the address but it is marked invalid => Access Violation.
You are attempting to access a page that is not accessible in your current processor mode (e.g., user mode access to a page that only allows kernel mode access) => Access Violation.
virtual address space also has some read only chunk of memory. How does it protect that?
You are attempting to access a page in a manner not permitted for that page (e.g., a write to a read-only page, or execution of a no-execute page) => Access Violation. The access allowed to a page is defined in the page table.
[Ignoring page faults]
If you make it through those tests, you can access the random memory address.
It does not. It's actually your duty as a programmer to handle this.
This was a question I asked myself when I was a student, but failing to get a satisfying answer, I gradually put it out of my mind... till today.
I know I can deal with a memory allocation error either by checking if the returned pointer is NULL or by handling the bad_alloc exception.
OK, but I wonder: how and why can a call to new fail? To my knowledge, a memory allocation can fail if there is not enough space in the free store. But does this situation really occur nowadays, with several GB of RAM (at least on a regular computer; I am not talking about embedded systems)? Are there other situations in which a memory allocation failure may occur?
Although you've gotten a number of answers about why/how memory could fail, most of them are sort of ignoring reality.
In reality, on real systems, most of these arguments don't describe how things really work. Although they're right from the viewpoint that these are reasons an attempted memory allocation could fail, they're mostly wrong from the viewpoint of describing how things are typically going to work in reality.
Just for example, in Linux, if you try to allocate more memory than the system has available, your allocation will not fail (i.e., you won't get a null pointer or a std::bad_alloc exception). Instead, the system will "over commit", so you get what appears to be a valid pointer -- but when/if you attempt to use all that memory, you'll get an exception, and/or the OOM Killer will run, trying to free memory by killing processes that use a lot of memory. Unfortunately, this may just as easily kill the program making the request as other programs (in fact, many of the examples given that attempt to cause allocation failure by just repeatedly allocating big chunks of memory should probably be among the first to be killed).
Windows works a little closer to how the C and C++ standards envision things (but only a little). Windows is typically configured to expand the swap file if necessary to meet a memory allocation request. This means that as you allocate more memory, the system will go semi-crazy with swapping memory around, creating bigger and bigger swap files to meet your request.
That will eventually fail, but on a system with lots of drive space, it might run for hours (most of it madly shuffling data around on the disk) before that happens. At least on a typical client machine where the user is actually...well, using the computer, he'll notice that everything has dragged to a grinding halt, and do something to stop it well before the allocation fails.
So, to get a memory allocation that truly fails, you're typically looking for something other than a typical desktop machine. A few examples include a server that runs unattended for weeks at a time, and is so lightly loaded that nobody notices that it's thrashing the disk for, say, 12 hours straight, or a machine running MS-DOS or some RTOS that doesn't supply virtual memory.
Bottom line: you're basically right, and they're basically wrong. While it's certainly true that if you allocate more memory than the machine supports, that something's got to give, it's generally not true that the failure will necessarily happen in the way prescribed by the C++ standard -- and, in fact, for typical desktop machines that's more the exception (pardon the pun) than the rule.
Apart from the obvious "out of memory", memory fragmentation can also cause this. Imagine a program that does the following:
until main memory is almost full:
allocate 1020 bytes
allocate 4 bytes
free all the 1020 byte blocks
If the memory manager puts all of these sequentially in memory in the order they are allocated, we now have plenty of free memory, but any allocation larger than 1020 bytes will not be able to find a contiguous space to put it in, and will fail.
Usually on modern machines it will fail due to scarcity of virtual address space; if you have a 32-bit process that tries to allocate more than 2-3 GB of memory¹, then even if there were enough physical RAM (or paging file) to satisfy the allocation, there simply won't be space in the virtual address space to map the newly allocated memory.
Another (similar) situation happens when the virtual address space is heavily fragmented, and thus the allocation fails because there's not enough contiguous addresses for it.
Also, running out of memory can happen, and in fact I got in such a situation last week; but several operating systems (notably Linux) in this case don't return NULL: Linux will happily give you a pointer to an area of memory that isn't already committed, and actually allocate it when the program tries to write in it; if at that moment there's not enough memory, the kernel will try to kill some memory-hogging processes to free memory (an exception to this behavior seems to be when you try to allocate more than the whole capacity of the RAM and of the swap partition - in such a case you get a NULL upfront).
Another cause of getting NULL from malloc is limits enforced by the OS on the process; for example, try running this code
#include <cstdlib>
#include <iostream>
#include <limits>
void mallocbsearch(std::size_t lower, std::size_t upper)
{
    std::cout<<"["<<lower<<", "<<upper<<"]\n";
    if(upper-lower<=1)
    {
        std::cout<<"Found! "<<lower<<"\n";
        return;
    }
    std::size_t mid=lower+(upper-lower)/2;
    void *ptr=std::malloc(mid);
    if(ptr)
    {
        free(ptr);
        mallocbsearch(mid, upper);
    }
    else
        mallocbsearch(lower, mid);
}

int main()
{
    mallocbsearch(0, std::numeric_limits<std::size_t>::max());
    return 0;
}
on Ideone you find that the maximum allocation size is about 530 MB, which is probably a limit enforced by setrlimit (similar mechanisms exist on Windows).
¹ This varies between OSes and can often be configured; the total virtual address space of a 32-bit process is 4 GB, but on all the current mainstream OSes a big chunk of it (the upper 2 GB for 32-bit Windows with default settings) is reserved for kernel data.
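Since the answer mentions setrlimit, here is a rough POSIX-only illustration (the 256 MiB cap is an arbitrary value of mine) of a process imposing such a limit on itself and watching malloc return a null pointer:

#include <sys/resource.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    // Cap this process's address space at 256 MiB (value chosen arbitrarily).
    rlimit lim{};
    getrlimit(RLIMIT_AS, &lim);
    lim.rlim_cur = 256u * 1024 * 1024;
    setrlimit(RLIMIT_AS, &lim);

    void* p = std::malloc(512u * 1024 * 1024);          // bigger than the cap
    std::printf("malloc(512 MiB) returned %p\n", p);    // expected to print a null pointer
    std::free(p);
    return 0;
}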
The amount of memory available to the given process is finite. If the process exhausts its memory, and tries to allocate more, the allocation would fail.
There are other reasons why an allocation could fail. For example, the heap could get fragmented and not have a single free block large enough to satisfy the allocation request.
#include <iostream>
using namespace std;

int main(void)
{
    int *ptr = new int;
    cout << "Memory address of ptr:" << ptr << endl;
    cin.get();
    delete ptr;
    return 0;
}
Every time I run this program, I get the same memory address for ptr. Why?
[Note: my answer assumes you're working with a modern OS that uses a virtual memory system.]
Due to virtual memory, each process operates in its own unique address space, which is independent of and unaffected by any other process. The address you get from new is a virtual address, and is generated by whatever your compiler's implementation of new chooses to do.* There's no reason this couldn't be deterministic.
On the other hand, the physical address associated with your virtual memory address will most likely be different every time, and will be affected by all sorts of things. This mapping is controlled by the OS.
* new is probably implemented in terms of malloc.
I'd say it's mostly coincidence, as the memory allocator/OS can give you whatever address it wants.
The addresses you get are obviously not uniformly random (and are highly dependent on other OS factors), so it's common to get the same (virtual) address several times in a row.
So for example, on my machine (Windows 7, compiled with VS2010), I get different addresses across runs:
00134C40
00124C40
00214C40
00034C40
00144C40
001B4C40
This is an artifact of your environment. The cin.get() suggests to me that you are compiling and executing in Visual Studio, which provides an unusually predictable runtime environment. When I compiled and ran that code on my Linux box, two executions gave two different addresses.
ETA:
In comments you expressed an expectation that different processes could obtain the same memory address and that this address would be inaccessible to your program. In any modern operating system this is not the case, because the operating system is providing each process with virtual memory address spaces.
Only the operating system sees the true hardware addresses, and maintains virtual memory maps for each program, redirecting virtual addresses to physical addresses. Therefore, an arbitrary number of different processes can hold data in the same virtual address, while the operating system maps that address to a separate physical address for each process.
This guarantees that process A cannot read or write to memory in use by process B without a special provision enabling such access (such as by instructing the OS to map certain virtual memory in certain processes to the same physical memory). It allows the operating system to make different kinds of memory hardware transparent to programs.
It also allows the OS to move a program's data around behind its back to optimize system performance.
Classical example: Moving data that hasn't been used for some time to a special file on the hard disk. This is sometimes called the page file.
Memory maps are typically broken up into pages: Blocks of contiguous memory of a certain size (the page size). Data held within a page of virtual address space is usually also contiguous in physical memory, but if data runs over a page boundary, information that appears contiguous in virtual memory could easily be separated. If a C/C++ program enters undefined behavior, it may attempt to access memory in a page that the OS has not mapped to physical memory. This will cause the OS to generate an error.
This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); // shows an increase in mem usage of 1GB
But we have to write to all of the memory, simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal? Has the memory actually been fully allocated? I'm not sure whether the tools we are using (top and free -m) are displaying incorrect values, or whether something clever is going on in the compiler, the runtime, and/or the kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that new[] allocated the memory immediately. Does the C++ runtime delay the actual allocation until the memory is accessed later on? In that case, can an out-of-memory exception be deferred until well after the allocation itself, when the memory is finally accessed?
As it is, this is not a problem for us, but it would be nice to know why it is happening the way it is!
Cheers!
Edit:
I don't want to know how we should be using vectors, or that this isn't OO / C++ / the current way of doing things, etc. I just want to know why this is happening the way it is, rather than get suggestions for alternative ways of trying it.
When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
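As a rough, Linux-only illustration of this (the helper below reads /proc/self/statm, and all names and sizes are mine, not the answer's), you can watch the process's resident set stay small after new[] and grow only once the pages are actually written:

#include <cstddef>
#include <cstdio>
#include <cstring>

static long resident_kb()
{
    // /proc/self/statm: total and resident size, in pages.
    long pages_total = 0, pages_resident = 0;
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (f) { std::fscanf(f, "%ld %ld", &pages_total, &pages_resident); std::fclose(f); }
    return pages_resident * 4;                 // assume 4 KiB pages for this sketch
}

int main()
{
    const std::size_t N = 1000000000;          // ~1 GB, as in the question
    std::printf("RSS before new[]: %ld kB\n", resident_kb());

    char* p = new char[N];
    std::printf("RSS after  new[]: %ld kB\n", resident_kb());

    std::memset(p, 'a', N);                    // touching every page makes it resident
    std::printf("RSS after memset: %ld kB\n", resident_kb());

    delete[] p;
    return 0;
}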
Please look up overcommit. By default, Linux doesn't reserve memory until it is accessed, and if you end up needing more memory than is available, you don't get an error; instead a random process is killed. You can control this behavior via /proc/sys/vm/*.
IMO, overcommit should be a per process setting, not a global one. And the default should be no overcommit.
About the second half of your question:
The language standard doesn't allow any delays in throwing a bad_alloc. That must happen as an alternative to new[] returning a pointer. It cannot happen later!
Some OSs might try to overcommit memory allocations, and fail later. That is not conforming to the C++ language standard.