#include <iostream>
using namespace std;

int main(void)
{
    int *ptr = new int;
    cout << "Memory address of ptr:" << ptr << endl;
    cin.get();
    delete ptr;
    return 0;
}
Every time I run this program, I get the same memory address for ptr. Why?
[Note: my answer assumes you're working with a modern OS that uses a virtual memory system.]
Due to virtual memory, each process operates in its own unique address space, which is independent of and unaffected by any other process. The address you get from new is a virtual address, and is generated by whatever your compiler's implementation of new chooses to do.* There's no reason this couldn't be deterministic.
On the other hand, the physical address associated with your virtual memory address will most likely be different every time, and will be affected by all sorts of things. This mapping is controlled by the OS.
* new is probably implemented in terms of malloc.
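To illustrate that footnote, here is a minimal sketch (not any particular vendor's implementation) of how a replacement global operator new can simply forward to malloc. The C++ standard allows replacing the global allocation functions like this; real library implementations do more (alignment handling, the new-handler loop, and so on).
#include <cstdlib>
#include <new>

// Minimal sketch: forward global operator new/delete to malloc/free.
void* operator new(std::size_t size)
{
    if (void* p = std::malloc(size ? size : 1))   // malloc(0) may return null
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept
{
    std::free(p);
}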
I'd say it's mostly coincidence, since the memory allocator/OS can give you whatever address it wants.
The addresses you get are obviously not uniformly random (they depend heavily on other OS factors), so it's quite common to get the same (virtual) address several times in a row.
So for example, on my machine (Windows 7, compiled with VS2010) I get different addresses on different runs:
00134C40
00124C40
00214C40
00034C40
00144C40
001B4C40
This is an artifact of your environment. The cin.get() suggests to me that you are compiling and executing in Visual Studio, which provides an unusually predictable runtime environment. When I compile and run that code on my Linux machine, two executions gave two different addresses.
ETA:
In comments you expressed an expectation that different processes could obtain the same memory address and that this address would be inaccessible to your program. In any modern operating system this is not the case, because the operating system is providing each process with virtual memory address spaces.
Only the operating system sees the true hardware addresses, and maintains virtual memory maps for each program, redirecting virtual addresses to physical addresses. Therefore, an arbitrary number of different processes can hold data in the same virtual address, while the operating system maps that address to a separate physical address for each process.
This guarantees that process A cannot read or write to memory in use by process B without a special provision enabling such access (such as by instructing the OS to map certain virtual memory in certain processes to the same physical memory). It allows the operating system to make different kinds of memory hardware transparent to programs.
It also allows the OS to move a program's data around behind its back to optimize system performance.
Classical example: Moving data that hasn't been used for some time to a special file on the hard disk. This is sometimes called the page file.
Memory maps are typically broken up into pages: Blocks of contiguous memory of a certain size (the page size). Data held within a page of virtual address space is usually also contiguous in physical memory, but if data runs over a page boundary, information that appears contiguous in virtual memory could easily be separated. If a C/C++ program enters undefined behavior, it may attempt to access memory in a page that the OS has not mapped to physical memory. This will cause the OS to generate an error.
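If you want to see the page size your own system uses, here is a quick sketch, assuming a POSIX system (on Windows, GetSystemInfo reports it in SYSTEM_INFO::dwPageSize instead):
#include <iostream>
#include <unistd.h>   // sysconf (POSIX only)

int main()
{
    // Query the virtual memory page size at run time.
    long page_size = sysconf(_SC_PAGESIZE);
    std::cout << "Page size: " << page_size << " bytes\n";
}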
Related
I have a source file:
#include <cstdlib>
#include <iostream>
int main() {
    void *p = std::malloc(8192);
    std::cout << std::hex << (size_t)p << std::endl;
    std::free(p);
}
I compiled it on two platforms: (a) on macOS, with clang++ 4.2.1, and (b) on Linux, with g++ 7.3.0.
On macOS, the printout is 7fa432001000, and on Linux it is 257ec20.
The macOS printout is not what I expected. I thought malloc() should allocate memory on the heap, and if it allocates memory using mmap() under the hood, that's fine too. But 7fa432001000 looks like an address in the stack region, because the upper limit of the user virtual address space on x86_64 is just below 7fffffffffff (at least that's the case on current Linux - maybe I'm wrong).
My question is: why (or how) does malloc() on macOS return such a high address? Is this because of the way Clang's libc++ is implemented?
This is due to modern systems mapping virtual addresses to actual physical addresses.
According to Wikipedia:
The computer's operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.
As to why it's different on the two platforms: different computer hardware handles memory in different ways; BIOS and other firmware may use different algorithms or techniques; different OSes or OS versions do things differently (Linux has different options when you build your kernel); and the C++ runtime may be implemented differently.
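As a rough illustration (the exact values are meaningless and will vary between runs and platforms, especially with address space layout randomization), you can print the addresses of objects with different storage durations and see them land in different regions of the virtual address space:
#include <cstdlib>
#include <iostream>

int global_var;                            // static storage (data/bss segment)

int main()
{
    int local_var = 0;                     // stack
    void* heap_ptr = std::malloc(16);      // heap (or an mmap'd region)

    std::cout << std::hex
              << "static: " << static_cast<void*>(&global_var) << '\n'
              << "stack:  " << static_cast<void*>(&local_var)  << '\n'
              << "heap:   " << heap_ptr                        << '\n';

    std::free(heap_ptr);
}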
In my program I declare an initialized global variable (an array).
But it only affects the size of the executable file; the memory usage of the running program is not affected.
My program looks like this:
char arr[1014*1024*100] = {1};

int _tmain(int argc, _TCHAR* argv[])
{
    while (true)
    {
    }
    return 0;
}
The size of the executable file is 118 MB, but the memory usage while the program runs is only about 0.3 MB.
Can anyone explain this to me?
Most operating systems use demand-paged virtual memory.
This means that when you load a program, its executable file isn't all loaded into memory immediately. Instead, virtual memory pages are set up to map the file into memory. When (and if) you actually refer to an address, that causes a page fault, which the OS handles by reading the appropriate part of the file into physical memory and then letting the instruction re-execute.
In your case, you don't refer to arr, so the OS never pulls that data into memory.
If you were to look at the virtual address space used by your program (rather than the physical memory you're apparently now looking at), you'd probably see address space allocated for all of arr. The virtual address space isn't often very interesting or useful to examine though, so most things that tell you about memory usage will tell you only about the physical RAM being used to store actual data, not the virtual address space that's allocated but never used.
Even if you do refer to the data, the OS can be fairly clever: depending on how often you refer to the data (and whether you modify it), only part of the data may ever be loaded into RAM at any given time. If it's been modified, the modified portions can be written to the paging file to make room in RAM for data that's being used more often. If it's not modified, it can be discarded (because the original data can be re-loaded from the original file on disk whenever it's needed).
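To watch demand paging in action, here is a small sketch: a large zero-initialized static array whose pages only become resident once they are touched. It assumes a 4096-byte page size; watch the process's resident memory (top, Task Manager, etc.) at each pause.
#include <cstddef>
#include <cstdio>

// Roughly 100 MB of zero-initialized static data: the executable stays small
// (bss), and almost none of it is resident until it is touched.
static char arr[100 * 1024 * 1024];

int main()
{
    std::puts("Before touching arr - check resident memory, then press Enter");
    std::getchar();

    // Write one byte per (assumed) 4096-byte page so every page is faulted in.
    for (std::size_t i = 0; i < sizeof(arr); i += 4096)
        arr[i] = 1;

    std::puts("After touching arr - resident memory should now be ~100 MB");
    std::getchar();
}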
The reason the memory in use while your executable is running is significantly smaller than the space required on your hard drive (or solid-state drive) to store the executable is that you never pull the array itself into memory.
In your program, you never read from or write to the array, let alone bring it into memory all at once. Because of that, the memory needed to run your executable is tiny compared to the size of the executable (which has to store your massively large array).
I hope that makes sense. The difference between the two is that one is executing and one is stored on your computer's disk; something only needs physical memory once it is actually brought into execution.
Processes in OS have their own virtual address spaces. Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
virtual address space also has some read only chunk of memory. How does it protect that?
TL;DR No, it's not allowed.
In your case, when you get a valid non-NULL pointer back from malloc(), only the requested size of memory is allocated to your process, and you're allowed to use (read and/or write) only that much space.
In general, any allocated memory (compile-time or run-time) has an associated size. Either overrunning or underrunning the allocated memory area is considered invalid memory access, which invokes undefined behavior.
Even if the memory is accessible and inside the process address space, there's nothing stopping the OS / memory manager from handing that particular address out to a later allocation, so at best your earlier write will be overwritten, or you will be overwriting some other value. The worst case, as mentioned earlier, is undefined behavior.
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
What addresses you can read/write/execute from are based on a processes current memory map, which is set up by the operating system.
On my linux box, if I run pmap on my current shell, I see something like this:
evaitl#bb /proc/13151 $ pmap 13151
13151: bash
0000000000400000 976K r-x-- bash
00000000006f3000 4K r---- bash
00000000006f4000 36K rw--- bash
00000000006fd000 24K rw--- [ anon ]
0000000001f25000 1840K rw--- [ anon ]
00007ff7cce36000 44K r-x-- libnss_files-2.23.so
00007ff7cce41000 2044K ----- libnss_files-2.23.so
00007ff7cd040000 4K r---- libnss_files-2.23.so
00007ff7cd041000 4K rw--- libnss_files-2.23.so
00007ff7cd042000 24K rw--- [ anon ]
...
[many more lines here...]
Each line has a base address, a size, and the permissions. These are considered memory segments. The last field says what is mapped in: bash is my shell; anon means the memory was allocated at run time, perhaps for bss, maybe heap from malloc, or it could be a stack.
Shared libraries are also mapped in; that is where the libnss_files lines come from.
When you malloc some memory, it will come from an anonymous program segment. If there isn't enough space in the current anon segment being used for the heap, the OS will increase its size. The permissions in those segments will almost certainly be rw.
If you try to read/write outside of the space you allocated, the behavior is undefined. In this case that means you may get lucky and nothing happens, or you may trip over an unmapped address and get a SIGSEGV signal.
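If you'd rather see the map from inside the program, here is a Linux-specific sketch that dumps /proc/self/maps (essentially the same data pmap shows) after a malloc call:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    void* p = std::malloc(1024);
    std::cout << "malloc returned " << p << "\n\n";

    // /proc/self/maps is Linux-specific; pmap shows essentially the same data.
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line))
        std::cout << line << '\n';

    std::free(p);
}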
Now, I try to read what is written on that location which should be fine
It is not fine. According to the C++ standard, reading uninitialized memory has undefined behaviour.
but what about writing to that location?
Not fine either. Reading or writing unallocated memory also has undefined behaviour.
Sure, the memory address that you end up at might happen to be allocated - it's possible. But even if it is, the pointer arithmetic outside the bounds of the allocation is already UB.
virtual address space also has some read only chunk of memory. How does it protect that?
This one is out of the scope of C++ (and C), since those languages do not define virtual memory at all. The details differ across operating systems, but at least one approach is that when the process requests memory from the OS, it passes flags that specify the desired protection type. See the prot argument in the mmap man page as an example. The OS in turn sets up the virtual page table accordingly.
Once the protection type is known, the OS can raise an appropriate signal if the protection is violated, and possibly terminate the process, just as it does when a process tries to access unmapped memory. The violations are typically detected by the memory management unit of the CPU.
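As a concrete POSIX-specific sketch of requesting a protection type, here is an anonymous read-only mapping obtained with mmap; writing through it would typically raise SIGSEGV. (Some BSD-derived systems spell the flag MAP_ANON rather than MAP_ANONYMOUS.)
#include <cstdio>
#include <sys/mman.h>   // mmap, munmap (POSIX)

int main()
{
    // Ask the OS for one page of anonymous memory that may only be read.
    void* p = mmap(nullptr, 4096, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }

    volatile char c = *static_cast<char*>(p);   // reading is allowed
    (void)c;

    // *static_cast<char*>(p) = 1;   // writing would violate PROT_READ
                                     // and typically raises SIGSEGV

    munmap(p, 4096);
}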
Processes in OS have their own virtual address spaces. Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
No, it should not be fine, since only the memory region allocated by malloc() is guaranteed to be accessible. There is no guarantee that the virtual address space is contiguous, and thus no guarantee that the addresses before and after your region are accessible (i.e. mapped into the virtual address space).
Of course, no one is stopping you from doing so, but the behaviour is truly undefined. If you access a non-mapped memory address, it generates a page fault, which is a hardware CPU exception. When the operating system handles it, it sends a SIGSEGV signal or an access-violation exception to your application (depending on the OS).
virtual address space also has some read only chunk of memory. How does it protect that?
First it's important to note that virtual memory mapping is realized partly by a hardware component called a memory management unit, which may or may not be integrated into the CPU chip. In addition to mapping virtual memory addresses to physical ones, it also supports marking those addresses with various flags, one of which enables or disables write protection.
When the CPU tries to write to a virtual address marked as read-only, and thus write-protected (for example with a MOV instruction), the MMU raises a page fault exception on the CPU.
The same happens when trying to access a virtual memory page that is not present.
In the C language, doing arithmetic on a pointer to produce another pointer that does not point into (or one past the end of) the same object or array of objects is undefined behavior; from 6.5.6 Additive Operators:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
(For the purposes of this clause, a non-array object is treated as an array of length 1.)
You could get unlucky: the compiler could still produce a pointer you're able to do things with, and doing things with it will do something, but precisely what is anybody's guess, and it will be unreliable and often difficult to debug.
If you're lucky, the compiler produces a pointer into memory that "does not belong to you" and you get a segmentation fault to alert you to the problem as soon as you try to read or write through it.
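For completeness, here is the question's scenario written out in code. It is a sketch of what not to do: merely computing the out-of-bounds pointer is already undefined behaviour, before any read or write through it.
#include <cstdlib>

int main()
{
    char* p = static_cast<char*>(std::malloc(100));
    if (!p) return 1;

    // Undefined behaviour: the result does not point into the allocated
    // object (or one past its end). Computing q is already UB.
    char* q = p - 1000;
    (void)q;

    std::free(p);
}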
How the system behaves when you read/write an unmapped memory address depends on your operating system implementation. When you try to access an address that is unmapped (or mapped to something other than plain memory, for example a memory-mapped file), the operating system takes control by means of a trap, and what happens then is completely operating system dependent.
Suppose you have mapped the video framebuffer somewhere in your virtual address space: writing there makes the screen change. Suppose you have mapped a file: reading or writing that memory means reading or writing the file. Suppose the memory you touch has been swapped out (because of a shortage of physical memory your process has been partially swapped): your process is stopped, the OS starts bringing that memory back from secondary storage, and then the instruction is restarted.
Linux, for example, generates a SIGSEGV signal when you try to access memory you have not allocated. But you can install a signal handler to be called upon receiving this signal, so that trying to access unallocated memory jumps into a piece of code in your own program that deals with the situation (a small sketch of this appears below).
But keep in mind that trying to access memory that has not been correctly acquired, especially on a modern operating system, normally means your program is misbehaving; normally it will crash, the system takes control, and the process is killed.
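Here is a sketch of the signal-handler idea mentioned above, using POSIX sigaction. Note that only async-signal-safe calls are allowed inside the handler, and that doing real work in, or returning normally from, a SIGSEGV handler is generally unsafe; this one just reports the fault and exits.
#include <signal.h>
#include <unistd.h>

extern "C" void on_segv(int)
{
    // Only async-signal-safe calls are allowed in a signal handler.
    const char msg[] = "caught SIGSEGV, exiting\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main()
{
    struct sigaction sa = {};
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, nullptr);

    volatile int* bad = reinterpret_cast<int*>(0x10);
    *bad = 42;   // very likely faults and invokes the handler
}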
NOTE
malloc(3) is not a system call but a library function that manages a variable-size allocation arena in your process. Accessing even the first address before the returned pointer, or anything past the last allocated byte, is undefined behaviour. It does not necessarily mean you have accessed unallocated memory: you may well be reading a perfectly mapped piece of your code or data (or the stack) without knowing it. malloc(3) tends to ask the operating system for large contiguous chunks of memory up front and then parcel them out across many malloc calls, because asking the operating system for more memory is costly. See the sbrk(2) or mmap(2) system call man pages for more on this.
For example, both Linux and BSD Unix reserve page 0 of each process's virtual address space (the page containing the NULL address) and leave it unmapped, so that null pointer accesses are invalid; if you try to read or write that address (or anything else in that page) you'll get a signal (or your process killed). Try this:
int main()
{
    char *p = 0;    /* p is pointing to the null address */
    p[0] = '\n';    /* a '\n' is being written to address 0x0000 */
    p[1] = '\0';    /* a '\0' is being written to address 0x0001 */
}
This program should fail at runtime on all modern operating systems (compile it without optimization so the compiler doesn't eliminate the code in main, since it effectively does nothing), because you are trying to access a page of memory that has been deliberately reserved for this purpose and left unmapped.
The program on my system (macOS, a derivative of BSD Unix) just does the following:
$ a.out
Segmentation fault: 11
NOTE 2
Many modern operating systems (mostly Unix-derived) implement copy-on-write memory. This means you can access such memory and modify it as you like, but the first time you write to it a page fault is generated (typically this is implemented by giving you a read-only page, letting the fault happen, and then making a private copy of just that page to hold your modifications). This is very effective for fork(2), which is normally followed by an exec(2) syscall: only the pages the child actually modifies are copied before the process throws them all away, saving a lot of work.
Another case is stack growth. The stack grows automatically as you enter and leave stack frames in your program, so the operating system has to deal with the page faults that happen when you push something onto the stack and that push crosses a page boundary into unmapped territory. When this happens, the OS automatically allocates a page and turns that region (the page) into valid, normally read-write, memory.
Technically, a process has a logical address space. However, that often gets conflated with a virtual address space.
The number of virtual addresses that can be mapped into that logical address space can be limited by:
Hardware
System resources (notably page file space)
System Parameters (e.g., limiting page table size)
Process quotas
Your logical address space consists of an array of pages that are mapped to physical page frames. Not every page needs to have such a mapping (or even is likely to).
The logical address space is usually divided into two (or more) areas: system (common to all processes) and user (created for each process).
Theoretically, there is nothing in the user space to begin a process with; only the system address space exists.
If the system does not use up its entire range of logical addresses (which is normal), unused addresses cannot be accessed at all.
Now your program starts running. The O/S has mapped some pages into your logical address space. Very little of that address space is likely to be mapped. Your application can map more pages into the unmapped parts of the logical address space.
Say, I allocate some dynamic memory using malloc() function call in a c program and subtract some positive value(say 1000) from the address returned by it. Now, I try to read what is written on that location which should be fine but what about writing to that location?
The processor uses a page table to map logical pages to physical page frames. If you do what you describe, a number of things can happen:
There is no page table entry for the address => Access violation. Your system may not set up a page table that can span the entire logical address space.
There is a page table entry for the address but it is marked invalid => Access Violation.
You are attempting to access a page that is not accessible in your current processor mode (e.g., user mode access to a page that only allows kernel mode access) => Access Violation.
virtual address space also has some read only chunk of memory. How does it protect that?
You are attempting to access a page in a manner not permitted for that page (e.g., write to a read-only page, execute a no-execute page) => Access Violation. The access allowed to a page is defined in the page table (see the mprotect sketch below).
[Ignoring page faults]
If you make it through those tests, you can access the random memory address.
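Here is a small POSIX sketch of that last kind of check: the protection recorded in the page table can be changed with mprotect. The 4096-byte page size is an assumption for the sketch; mmap returns page-aligned memory, which mprotect requires.
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>   // mmap, mprotect, munmap (POSIX)

int main()
{
    const std::size_t page = 4096;   // assumed page size for this sketch

    void* mem = mmap(nullptr, page, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
    char* p = static_cast<char*>(mem);

    // p[0] = 'x';               // write to a read-only page => access violation

    // Ask the OS to update the page's protection in the page tables.
    if (mprotect(p, page, PROT_READ | PROT_WRITE) != 0) {
        std::perror("mprotect");
        return 1;
    }
    p[0] = 'x';                  // now permitted

    munmap(p, page);
}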
It does not. It's actually your duty as a programmer to handle this.
If 2 programs are running, and one program stores a number at a memory address, and if I know that memory address, and hard code it into the 2nd program and print out the value at the address, would it actually get that info? Does C++ allow programs to access any data stored in RAM no matter if it is part of the program or not?
On a system with no virtual memory management and no address space protection this would work. It would be undefined behavior from the point of view of the C standard, but it would produce the behavior that you expect.
Bad news: most computer systems in use these days have both virtual memory management and address space protection. This means that a memory address, the number your program sees, is not unique in the system. Every process in the system may see the same address, but it is mapped to a different physical address on your computer at any given moment. The operating system and the hardware create the illusion for each process that it controls that memory address, while in fact the address spaces of the processes do not overlap.
Good news is that modern operating systems support some form of shared memory access, meaning that one process can share a segment of memory with other processes, and exchange data by reading and writing the data into that shared segment.
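A small POSIX sketch of that idea using fork(): the parent and the child print the same virtual address for the same variable, yet see different contents, because once the child writes to it the OS maps that address to a different physical page for each process.
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int value = 1;

int main()
{
    pid_t pid = fork();
    if (pid < 0) return 1;

    if (pid == 0) {                       // child
        value = 2;                        // copy-on-write: child gets its own page
        std::printf("child:  &value=%p value=%d\n", (void*)&value, value);
        return 0;
    }

    wait(nullptr);                        // parent
    std::printf("parent: &value=%p value=%d\n", (void*)&value, value);
    // Same virtual address in both processes, but different contents:
    // the address is mapped to different physical memory in each process.
}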
No, you'd get a Segmentation Fault
If I try to run this code:
int main(int argc, char *argv[]) {
    int *ptr = (int*) 0x1234;
    *ptr = 10;
}
I'd get a segmentation fault (unless 0x1234 has been allocated by the process for some reason), which is the operating system's way of telling you that you're not allowed to do that. Usually they'll happen when you're doing tricky things with pointers, but they can also happen elsewhere.
By default, they'll terminate your program immediately unless you're running in a debugger or have registered a signal handler to continue your program.
Edit: If you really want, there are ways to get the operating system to let you do that, used by debuggers and such.
I'm trying to make a program that reads the value at a certain address.
I have this:
#include <iostream>
#include <tchar.h>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    int *address;
    address = (int*)0x00000021;
    cout << *address;
    return 0;
}
But this gives a read violation error. What am I doing wrong?
Thanks
That reads the value at that address within the process's own space. You'll need to use other methods if you want to read another process's space, or physical memory.
It's open to some question exactly what OllyDbg is showing you. 32-bit (and 64-bit) Windows uses virtual memory, which means the address you use in your program is not the same as the address actually sent over the bus to the memory chips. Instead, Windows (and I should add that other OSes such as Linux, macOS, *BSD, etc., do roughly the same) sets up tables that say, in essence, "when the program uses an address in this range, use that range of physical addresses".
This mapping is done on a page-by-page basis (where each page is normally 4K bytes, though other sizes are possible). In that table, it can also mark a page as "not present" -- this is what supports paging memory to disk. When you try to read a page that's marked as not present, the CPU generates an exception. The OS then handles that exception by reading the data from the disk into a block of memory, and updating the table to say the data is present at physical address X. Along with not-present, the tables support a few other values, such as read-only, so you can read but not write some addresses.
Windows (again, like the other OSes) sets up the tables for the first part of the address space, but does NOT associate any memory with them. From the viewpoint of a user program, those addresses simply should never be used.
That gets us back to my uncertainty about what OllyDbg is giving you when you ask it to read from address 0x21. That address simply doesn't refer to any real data -- never has and never will.
What others have said is true as well: a debugger will usually use some OS functions (E.g. ReadProcessMemory and WriteProcessMemory, among others under Windows) to get access to things that you can't read or write directly. These will let you read and write memory in another process, which isn't directly accessible by a normal pointer. Neither of those would help in trying to read from address 0x21 though -- that address doesn't refer to any real memory in any process.
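For reference, here is a hedged Windows sketch of reading another process's memory with OpenProcess and ReadProcessMemory, which the answer above mentions. The PID and address are hypothetical placeholders; real code needs sufficient privileges and more error handling.
#include <windows.h>
#include <iostream>

int main()
{
    DWORD pid = 1234;                                // hypothetical target process ID
    LPCVOID remote_addr = (LPCVOID)0x7ff6a0001000;   // hypothetical address in that process

    HANDLE h = OpenProcess(PROCESS_VM_READ, FALSE, pid);
    if (!h) { std::cerr << "OpenProcess failed\n"; return 1; }

    int value = 0;
    SIZE_T bytes_read = 0;
    if (ReadProcessMemory(h, remote_addr, &value, sizeof value, &bytes_read))
        std::cout << "read " << bytes_read << " bytes, value = " << value << '\n';
    else
        std::cerr << "ReadProcessMemory failed\n";

    CloseHandle(h);
}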
You can only use a pointer that points to an actual object.
If you don't have an object at address 0x00000021, this won't work.
If you want to create an object on the free store (the heap), you need to do so using new:
int* address = new int;
*address = 42;
cout << *address;
delete address;
When your program runs on an operating system that provides virtual memory (Windows, *nix, macOS), not all addresses are backed by memory. CPUs that support virtual memory use page tables to control which addresses refer to memory. The size of an individual page is usually 4096 bytes, but that varies and is likely to be larger in the future.
The APIs you use to query the page tables aren't part of the standard C/C++ runtime, so you will need operating-system-specific functions to find out which addresses are OK to read from and which will cause a fault. On Windows you would use VirtualQuery to find out whether a given address can be read, written, executed, or any/none of the above.
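Here is a minimal sketch of that VirtualQuery check, using the address from the question. Deciding whether a region is safely readable involves more cases (guard pages, partial regions) than this sketch covers.
#include <windows.h>
#include <iostream>

int main()
{
    const void* address = (const void*)0x00000021;   // the address from the question

    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery(address, &mbi, sizeof mbi) == 0) {
        std::cerr << "VirtualQuery failed\n";
        return 1;
    }

    // A page is only worth reading if it is committed and not
    // marked no-access or guard.
    bool committed = (mbi.State == MEM_COMMIT);
    bool readable  = committed &&
                     !(mbi.Protect & PAGE_NOACCESS) &&
                     !(mbi.Protect & PAGE_GUARD);

    std::cout << "State:   0x" << std::hex << mbi.State   << '\n'
              << "Protect: 0x" << std::hex << mbi.Protect << '\n'
              << (readable ? "looks readable\n" : "not safe to read\n");
}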
You can't just read data from an arbitrary address in memory.