When accessing a C++ variable how is its content resolved? - c++

When accessing a variable in C++, how is its content resolved?
Is it possible for the OS to remap the variable to a different address without affecting its logical address? Is it possible to have 2 variables pointing to the same logical address in 2 different processes?

Yes, it's absolutely possible for the OS to move variables around in memory. Virtually all modern computers use virtual memory, in which each process believes that it has access to the machine's full address space. Whenever a memory read or write occurs, though, the address is translated from the virtual address in the process's address space to some physical address in the computer's real address space. The operating system can change these mappings as it sees fit, possibly by moving the blocks of memory around, or by temporarily writing them out to disk, etc. This allows multiple processes to each use more memory than is available on the system, since the OS can move blocks of memory in and out of RAM transparently without the process being able to detect this.
One advantage of using virtual memory is that two processes can each use the same virtual address without conflicting with one another. For example, two processes might each use address 0xCAFEBABE, and each sees its own copy. However, when the processes read or write this value, the address will get translated to different physical addresses, and so each can have its own copy. Many OSes actually provide functionality to allow processes to share memory if they want, or for many processes with similar pieces of data (say, a shared library) to have different virtual addresses that map to the same physical address.
Because C++ directly accesses the machine's underlying memory, any time a variable is read or written in C++, the OS might trap the instruction, page in the physical memory into which the read or write occurs, and then restore control to the program. This isn't really a feature of C++ as much as the hardware's memory system.
In short - programs work with virtual addresses, which the OS maps to physical addresses in a way that ensures that each process thinks it has total ownership of the memory system. C++ programs use this system by default because they're using the underlying hardware.

You seem to be mixing C++ and OS-specific concepts here. As far as C++ is concerned, there is only one process running in the system and all variables belong to that process. However, most modern OSes use a virtual memory system so that each process gets its own address space, and there are usually OS-specific functions to share memory between processes. One common way of doing this is to use memory-mapped files so that multiple processes can map the same file to their own address spaces and access the same content.

Related

Heap Corruption's effect on a Windows 2012 r2 server

If each process running on a Windows 2012 r2 server has it's own heap, is it not then possible to have one process corrupt another process'es heap? I think it would be possible since all heaps are stored in a "global" address space, but an AV occurs when one process attempts to write from or read from memory outside it's address space...so that would prevent heap corruption for the process that owned the address space?
You need to read how virtual memory works. There is no all-encompassing address space.
There are physical RAM addresses, and physical disk addresses, which don't vary by process. But user-mode code never uses these directly.
Rather, the memory management unit provides a mapping from virtual addresses to physical addresses. Because the page tables are process specific, this mapping is unique to each process. Only ring 0 (kernel-mode) code can bypass the mapping step, this is enforced in hardware. For user processes, if there is no mapping leading to a particular physical address, it simply cannot be accessed from that context, because there's no way to name that physical location using virtual addresses. And there are no mappings leading to the page tables themselves.
This is the difference between a memory management unit and its lesser brother, the memory protection unit. Architectures that use a memory protection unit do have a single global addressing scheme, with hardware enforced permission bits that again can only be modified by privileged code.
The thing you asked about
one process attempts to write from [sic] or read from memory outside it's [sic] address space
just doesn't exist. It's like asking what the telephone number of my car is. My car is identified by a VIN and a license plate, but neither of those will allow you to talk to it through the phone system.
Access violations (sometimes also called segmentation faults) occur when a process attempts to write to, read from, or execute from unmapped portions of its own address space, or pages that have been explicitly set to trap access attempts (for stack expansion perhaps, or copy-on-write). All memory accesses by a process are by definition interpreted inside its address space.

Is shared virtual memory used when multiple processes read a file using file pointer in Linux?

I wrote a C++ program which read a file using file pointer. And I need to run multiple process at the same time. Since the size of file can be huge (100MB~), to reduce memory usage in multiple processes, I think I need use shared memory. (For example IPC library like boost::interprocess::shared_memory_object)
But does it really need? Because I think if multiple processes read same file, then virtual memory of each processes mapped to same physical memory of file thru page table.
I read a Linux doc and they said,
Shared Virtual Memory
Although virtual memory allows processes to have separate (virtual)
address spaces, there are times when you need processes to share
memory. For example there could be several processes in the system
running the bash command shell. Rather than have several copies of
bash, one in each processes virtual address space, it is better to
have only one copy in physical memory and all of the processes running
bash share it. Dynamic libraries are another common example of
executing code shared between several processes. Shared memory can
also be used as an Inter Process Communication (IPC) mechanism, with
two or more processes exchanging information via memory common to all
of them. Linux supports the Unix TM System V shared memory IPC.
Also, wiki said,
In computer software, shared memory is either
a method of inter-process communication (IPC), i.e. a way of exchanging data between programs running at the same time. One process
will create an area in RAM which other processes can access, or
a method of conserving memory space by directing accesses to what would ordinarily be copies of a piece of data to a single instance
instead, by using virtual memory mappings or with explicit support of
the program in question. This is most often used for shared libraries
and for XIP.
Therefore, what I really curious is that does shared virtual memory supported by OS level or not?
Thanks in advance.
Regarding your first question - if you want your data to be accessible by multiple processes without duplication you'll definitely need some kind of a shared storage.
In C++ I'd surely use boost's shared_memory_object. That's a valid option to share (large) data among processes and it has good documentation with examples (http://www.boost.org/doc/libs/1_55_0/doc/html/interprocess/sharedmemorybetweenprocesses.html).
Using mmap() is a more low-level approach usually used in C. To use it as an IPC you'll have to make the mapped region shared. From http://man7.org/linux/man-pages/man2/mmap.2.html:
MAP_SHARED
Share this mapping. Updates to the mapping are visible to
other processes that map this file, and are carried
through to the underlying file. The file may not actually
be updated until msync(2) or munmap() is called.
Also on that page there's an example of mapping a file to shared memory.
In either case there are at least two things to remember:
You need synchronization if there are multiple processes that modify the shared data.
You can't use pointers, only offsets from the beginning of the mapped region.
Here's an explanation from the boost docs:
If several processes map the same file/shared memory, the mapping address will be surely different in each process. Since each process might have used its address space in a different way (allocation of more or less dynamic memory, for example), there is no guarantee that the file/shared memory is going to be mapped in the same address.
If two processes map the same object in different addresses, this invalidates the use of pointers in that memory, since the pointer (which is an absolute address) would only make sense for the process that wrote it. The solution for this is to use offsets (distance) between objects instead of pointers: If two objects are placed in the same shared memory segment by one process, the address of each object will be different in another process but the distance between them (in bytes) will be the same.
Regarding the OS support - yes, shred memory is an OS specific feature.
In Linux mmap() is actually implemented in kernel and modules and can be used to transfer data between user and kernel-space.
Windows also has it's specifics:
Windows shared memory creation is a bit different from portable shared memory creation: the size of the segment must be specified when creating the object and can't be specified through truncate like with the shared memory object. Take in care that when the last process attached to a shared memory is destroyed the shared memory is destroyed so there is no persistency with native windows shared memory.
Your question doesn't make sense.
I think I need use shared memory. (For example IPC library like boost::interprocess::shared_memory_object).
If you use shared memroy, the memory is shared.
I think if multiple processes read same file, then virtual memory of each processes mapped to same physical memory of file thru page table.
Now you're talking about memory-mapped I/O. It isn't the same thing. However more probably it is what you need in this situation.

How do pointers allow Hardware access?

Pointers in C are very powerful and seem efficient. But how can using a pointer can give you access to hardware?
My idea of this would be setting a pointer's value equal to a hardware's associated object and than manipulating it through the pointer. But if you already have enough access to the hardware's objects and properties to use a pointer on it where does the pointer come into play? Perhaps im visualizing something wrong?
I'm running on windows 7.
A basic example along with an explanation of why the pointer is needed to manipulate that hardware property would be great.
The pointer holds a memory address. And not all of the memory addressing range points to RAM areas alone. Memory addresses have ranges and some ranges map to hardware registers. And by writing to these registers, we can access the hardware. Of course, this also depends on which operating system and which hardware. Here is an example.
In a free standing environment (like a microcontroller), a hardware platform that does not
have a Memory Management Unit (some ARM microprocessors), or an operating system that
does not support hardware protection (like DOS) pointers give you raw access to hardware
through the magic of memory mapped I/O. Pointers in program running on an operating
system like Windows or Linux (or just about any modern operating system) are pointers in
a virtual address space. These pointers will not allow you to directly access
hardware.
The way that memory mapped I/O works is that certain physical memory addresses are
reserved for communication with devices in the system. When an address that belongs to a
device is accessed the data is routed to the appropriate register of the device. On x86
platforms this translation is done by the north bridge.
Most hardware is memory mapped. What this means is that it exposes a range of hardware registers (or other hardware entities) as memory areas. These memory locations can be accessed like any other memory. You can read and write to it by using memory addresses - and these reads and writes make things happen in the hardware. Just as an example, a write to a hardware register (a memory address) may cause a LED to turn on, or a robot motor to start turning. All hardware operations are exposed via such memory mapped registers etc.
Now pointers are language entities that let you access a memory location. You stick an address into a pointer and dereference it to read (or write) from (or to) that address. So, basically the way you operate the hardware is by accessing its address space via pointers.

What type of address returned on applying ampersand to a variable or a data type in C/C++ or in any other such language?

This is a very basic question boggling mind since the day I heard about the concept of virtual and physical memory concept in my OS class. Now I know that at load time and compile time , virtual address and logical adress binding scheme is same but at execution time they differ.
First of all why is it beneficial to generate virtual address at compile and load time and and what is returned when we apply the ampersand operator to get the address of a variable, naive datatypes , user-defined type and function definition addresses?
And how does OS maps exactly from virtual to physical address when it does so? These questions are hust out from curiosity and I would love some good and deep insights considering modern day OS' , How was it in early days OS' .I am only C/C++ specific since I don't know much about other languages.
Physical addresses occur in hardware, not software. A possible/occasional exception is in the operating system kernel. Physical means it's the address that the system bus and the RAM chips see.
Not only are physical addresses useless to software, but it could be a security issue. Being able to access any physical memory without address translation, and knowing the addresses of other processes, would allow unfettered access to the machine.
That said, smaller or embedded machines might have no virtual memory, and some older operating systems did allow shared libraries to specify their final physical memory location. Such policies hurt security and are obsolete.
At the application level (e.g. Linux application process), only virtual addresses exist. Local variables are on the stack (or in registers). The stack is organized in call frames. The compiler generates the offset of a local variable within the current call frame, usually an offset relative to the stack pointer or frame pointer register (so the address of a local variable, e.g. in a recursive function, is known only at runtime).
Try to step by step a recursive function in your gdb debugger and display the address of some local variable to understand more. Try also the bt command of gdb.
Type
cat /proc/self/maps
to understand the address space (and virtual memory mapping) of the process executing that cat command.
Within the kernel, the mapping from virtual addresses to physical RAM is done by code implementing paging and driving the MMU. Some system calls (notably mmap(2) and others) can change the address space of your process.
Some early computers (e.g. those from the 1950-s or early 1960-s like CAB 500 or IBM 1130 or IBM 1620) did not have any MMU, even the original Intel 8086 didn't have any memory protection. At that time (1960-s), C did not exist. On processors without MMU you don't have virtual addresses (only physical ones, including in your embedded C code for a washing-machine manufacturer). Some machines could protect writing into some memory banks thru physical switches. Today, some low end cheap processors (those in washing machines) don't have any MMU. Most cheap microcontrollers don't have any MMU. Often (but not always), the program is in some ROM so cannot be overwritten by buggy code.

C++: Can I get out of the bounds of my app's memory with a pointer?

If I have some stupid code like this:
int nBlah = 123;
int* pnBlah = &nBlah;
pnBlah += 80000;
*pnBlah = 65;
Can I change another app's memory?
You have explained me this is evil, I know. But I was just interested.
And this isn't something to simply try. I don't know what would happen.
Thanks
In C++ terms, this is undefined behavior. What will actually happen depends on many factors, but most importantly it depends on the operating system (OS) you are using. On modern memory-managed OS's, your application will be terminated with a "segmentation fault" (the actual term is OS-dependent) for attempting to access memory outside of your process address space. Some OS's however don't have this protection, and you can willy-nilly poke and destroy things that belong to other programs. This is also usually the case if your code is inside kernel space, e.g. in a device driver.
Nope, it's not that simple. :)
Modern operating systems use virtual memory.
Every process is provided with a full virtual address space.
Every process is given its own "view" of all addresses (from 0x00000000 to 0xffffffff on a 32-bit system). Processes A and B can both write to the same address, without affecting each others, because they're not accessing physical memory addresses, but virtual addresses. When a process tries to access a virtual address, the OS translates that into some other physical address to avoid collisions.
Essentially, the OS keeps track of a table of allocate memory pages for every process. It tracks which address ranges have been allocated to a process, and which physical addresses they're mapped to. If a process tries to access an address not allocated to it, you get an access violation/segmentation fault. And if you try to access an address that is allocated to your process, you get your own data. So there is no way to read other processes data just by typing in the "wrong" address.
Under modern operating systems you don't get access to the real memory, but rather a virtual memory space of 4gb (under 32bit). Bottom 2gb for you to use, and top 2gb reserved for the operating system.
This does not reflect to actual memory bytes in the RAM.
Every app get's the same virtual address space, so there is no straight forward way of accessing another process's memory space.
I think this would raise 0x00000005, access violation on windows
Modern operating systems have various means of protecting against these kinds of exploits that write into the memory space of other programs. Your code wouldn't work either way, I don't think.
For more information, read up on Buffer Overflow exploits and how they gave Microsoft hell prior to the release of Windows XP SP2.