What type of address is returned when applying the ampersand operator to a variable or a data type in C/C++ (or any other such language)? - c++

This is a very basic question that has boggled my mind since the day I heard about virtual and physical memory in my OS class. I know that at compile time and load time the virtual and logical address binding schemes are the same, but at execution time they differ.
First of all, why is it beneficial to generate virtual addresses at compile and load time, and what is returned when we apply the ampersand operator to get the address of a variable, a built-in type, a user-defined type, or a function?
And how exactly does the OS map from virtual to physical addresses when it does so? These questions are just out of curiosity and I would love some good, deep insights on modern-day OSes, and on how it was in early OSes. I am asking about C/C++ specifically since I don't know much about other languages.

Physical addresses occur in hardware, not software. A possible/occasional exception is in the operating system kernel. Physical means it's the address that the system bus and the RAM chips see.
Not only are physical addresses useless to application software, they would also be a security issue: being able to access any physical memory without address translation, and knowing the addresses of other processes, would allow unfettered access to the machine.
That said, smaller or embedded machines might have no virtual memory, and some older operating systems did allow shared libraries to specify their final physical memory location. Such policies hurt security and are obsolete.

At the application level (e.g. Linux application process), only virtual addresses exist. Local variables are on the stack (or in registers). The stack is organized in call frames. The compiler generates the offset of a local variable within the current call frame, usually an offset relative to the stack pointer or frame pointer register (so the address of a local variable, e.g. in a recursive function, is known only at runtime).
Try single-stepping through a recursive function in your gdb debugger and displaying the address of some local variable to understand more. Try also gdb's bt command.
Type
cat /proc/self/maps
to understand the address space (and virtual memory mapping) of the process executing that cat command.
Within the kernel, the mapping from virtual addresses to physical RAM is done by code implementing paging and driving the MMU. Some system calls (notably mmap(2) and others) can change the address space of your process.
Some early computers (e.g. those from the 1950s or early 1960s, like the CAB 500, the IBM 1130, or the IBM 1620) did not have any MMU; even the original Intel 8086 didn't have memory protection. At that time (the 1960s), C did not exist. On processors without an MMU you don't have virtual addresses, only physical ones, including in your embedded C code for a washing-machine manufacturer. Some machines could protect writes into certain memory banks through physical switches. Today, some low-end cheap processors (those in washing machines) don't have any MMU, and most cheap microcontrollers don't either. Often (but not always) the program is in some ROM, so it cannot be overwritten by buggy code.

Related

(C++) What determines a global variable's address? Why is it fixed? [duplicate]

Consider the following code snippet
#include <iostream>
int i = 10;
int main()
{
    std::cout << &i;
}
Once an exe is generated for the program, will the output be the same for different runs of the program? Assume that the OS supports virtual memory.
Edit: The question is specific to global variables, which are stored in the data segment. Since this is the first global variable, should the address come out the same or different?
You always get the same addresses if ASLR is disabled. You get unpredictable addresses if ASLR is enabled.
The virtual address will be whatever the linker decided. The physical address will vary with each load.
Simple answer: It depends :-)
If your OS always starts a program in the same environment, with a virtual memory layout that always looks the same, the output should always be the same.
But if you run the same OS on different hardware (maybe with a different amount of RAM available) it could result in a different address, though normally the address is the same, independent of the hardware.
But you should NEVER expect the result to be the same! In short: don't think about the virtual or physical address of data in your program. That is under the control of the compiler, the OS, and maybe some libraries as well. So simply ignore it!
Short answer: for a user-mode program running on an x86-64 machine: no, you shouldn't assume that, ever.
Long answer: the address might happen to be the same, but that is absolutely not guaranteed (at least for a program running on an x86-64 OS and machine).
I have read some confusion about virtual/physical memory and how an address can be "random", so let me explain at a high level:
Targeting an x86-64 architecture and OS (let's say Windows), you can't even assume that the operating system itself will load all of its components into the same physical locations (with some exceptions for old bootloader conventions at 0000:7C00h; I have no idea how that works in a UEFI environment).
After segmentation is set up (whether it is really used depends on the OS; Windows usually just sets some flat segments for user mode and kernel mode), once you switch to protected mode (or long mode) you again have no control over how the OS manages the virtual memory mechanism, which hides layers of complexity and MMU-related operations to give your process an address space of its own.
Plus there are security measures in place: the linker might decide the base address for your executable, but when ASLR is active the OS can move its modules and your executable around as it pleases, for security purposes.
Conclusion: unless you're dealing with very low-level stuff (e.g. physical addresses, or directly writing to memory areas on an external device), you should absolutely not rely on the address of a variable being the same across different runs. There's no guarantee of that.

Is address of global variables the same for different runs of the program?


How does a language talk to hardware? [duplicate]

This question already has answers here:
How does Software/Code actually communicate with Hardware?
(14 answers)
Closed 9 years ago.
Ok so I'm very confused about how a piece of hardware can understand code.
I read somewhere it has to do with voltages, but how exactly does the hardware know what an instruction in software means? I know drivers are the bridge between software and hardware, but a driver is still software :S
For example, in C++ we have pointers and they can point to some address in memory. Can we have a pointer that points to some hardware address, write to that address, and have it affect the hardware? Or does hardware not have addresses?
I guess what I'm really asking is: how does the OS or BIOS know where a piece of hardware is and how to talk to it?
For example, in C++ we have pointers and they can point to some address in memory. Can we have a pointer that points to some hardware address and then write to that address and it would affect the hardware? Or does hardware not have addresses?
Some hardware has addresses like pointers, and some doesn't (in which case it most likely uses something called I/O ports, which require special IN and OUT instructions instead of regular memory operations). But much modern hardware has a memory address somewhere, and if you write the correct value to the correct address, the hardware will do what you ask it to do. This ranges from the really simple approach, say a serial port where you write a byte to an "output register" and the byte is sent along the serial line (while another address holds the input data being received on the serial port), to graphics cards that have a machine language of their own and can run hundreds or thousands of threads.
And normally, it's the OS's responsibility, via drivers, to access the hardware.
This is very simplified, and the whole subject of programming, OS and hardware is enough to write a fairly thick book about (and that's just in general terms, if you want to actually know about specific hardware, it's easily a few dozen pages for a serial port, and hundreds or thousands of pages for a graphics chip).
There are whole books on this topic. But briefly:
SW talks to hardware in a variety of ways. A given piece of hardware may respond to values written to very specific addresses ("memory mapped") or via I/O ports and instructions supported by the CPU (e.g., the x86 in and out instructions). When accessing a memory-mapped port (address), the HW is designed to recognize the specific address or small range of addresses and route the signals to the peripheral hardware rather than to memory. In the case of I/O instructions, the CPU has a separate set of signals used specifically for that purpose.
The OS (at the lowest level - board support package) and BIOS have "knowledge" built in to them about the hardware address and/or the I/O ports needed to execute the various hardware functions available. That is, at some level, they have coded in exactly what addresses are needed for the different features.
You should read The Soul of a New Machine by Tracy Kidder. It won the Pulitzer Prize, and it goes to great lengths to explain in layman's terms how a computer works and how humans must think to create one. Besides, it's a true story and one of the few to convey the thrill of hardware and software.
All in all, a nice introduction to the subject.
The hardware engineers know where the memory and peripherals live in the processors address space. So it is something that is known because those addresses were chosen by someone and documented so that others could write drivers.
The processor does not know peripherals from ram. The instructions are simply using addresses ultimately determined by the programmers that wrote the software that the processor is running. So that implies, correctly, that the peripherals and ram (and rom) are all just addresses. If you were writing a video driver and were changing the resolution of the screen, there would be a handful of addresses that you would need to write to. At some point between the processor core and the peripheral (the video card) there would be hardware that examines the address and basically routes it to the right place. This is how the hardware was designed, it examines addresses, some address ranges are ram and sent to the memory to be handled and some are peripherals and sent there to be handled. Sometimes the memory ranges are programmable themselves so that you can organize your memory space for whatever reason. Similar to if you move from where you are living now to somewhere else, it is still you and your stuff at the new house, but it has a different address and the postal folks who deliver the mail know how to find your new address. And then there are MMU's that add a layer of protection and other features. The MMU (memory management unit) can also virtualize an address, so the processor may be programmed to write to address 0x100000 but the mmu translates that to 0x2300000 before it goes out on the normal bus to be sorted as memory or peripheral eventually finding its destination. Why would you do such a thing, well two major reasons. One is so that for example when you compile an application to run in your operating system, all programs for that OS can be compiled to run at the same address lets say address 0x8000. 
But there is only one physical address 0x8000 out there (lets assume) what happens is the operating system has configured the mmu for your program such that your program things it is running at that address, also the operating system can, if it chooses and the mmu has the feature, to add protections such that if your program tries to access something outside its allocated memory space then a fault occurs and your program is prevented from doing that. Prevented from hacking into or crashing other programs memory space. Likewise if the operating system supports it could also choose to use that fault to swap out some data from ram to disk and then give you more ram, virtual memory, allowing the programs to think there is more memory than there really is. An mmu is not the only way to do all of this but it is the popular way. So when you have that pointer in C++ running on some operating system it is most likely that that is a virtual address not the physical address, the mmu converts that address that has been given to your program into the real memory address. When the os chooses to switch out your program for another it is relatively easy to tell the mmu to let the other task think that that low numbered address space 0x8000 for example now belongs to the other program. And your program is put to sleep (not executed) for a while.

When accessing a C++ variable how is its content resolved?

When accessing a variable in C++, how is its content resolved?
Is it possible for the OS to remap the variable to a different address without affecting its logical address? Is it possible to have 2 variables pointing to the same logical address in 2 different processes?
Yes, it's absolutely possible for the OS to move variables around in memory. Virtually all modern computers use virtual memory, in which each process believes that it has access to the machine's full address space. Whenever a memory read or write occurs, though, the address is translated from the virtual address in the process's address space to some physical address in the computer's real address space. The operating system can change these mappings as it sees fit, possibly by moving the blocks of memory around, or by temporarily writing them out to disk, etc. This allows multiple processes to each use more memory than is available on the system, since the OS can move blocks of memory in and out of RAM transparently without the process being able to detect this.
One advantage of using virtual memory is that two processes can each use the same virtual address without conflicting with one another. For example, two processes might each use address 0xCAFEBABE, and each sees its own copy. However, when the processes read or write this value, the address will get translated to different physical addresses, and so each can have its own copy. Many OSes actually provide functionality to allow processes to share memory if they want, or for many processes with similar pieces of data (say, a shared library) to have different virtual addresses that map to the same physical address.
Because C++ directly accesses the machine's underlying memory, any time a variable is read or written in C++, the OS might trap the instruction, page in the physical memory into which the read or write occurs, and then restore control to the program. This isn't really a feature of C++ as much as the hardware's memory system.
In short - programs work with virtual addresses, which the OS maps to physical addresses in a way that ensures that each process thinks it has total ownership of the memory system. C++ programs use this system by default because they're using the underlying hardware.
You seem to be mixing C++ and OS-specific concepts here. As far as C++ is concerned, there is only one process running in the system and all variables belong to that process. However, most modern OSes use a virtual memory system so that each process gets its own address space, and there are usually OS-specific functions to share memory between processes. One common way of doing this is to use memory-mapped files so that multiple processes can map the same file to their own address spaces and access the same content.

What real platforms map hardware ports to memory addresses?

I sometimes see statements that on some platforms the following C or C++ code:
int* ptr;
*ptr = 0;
can result in writing to a hardware input-output port if ptr happens to hold the address to which that port is mapped. Usually such platforms are called "embedded platforms".
What are real examples of such platforms?
Most systems in my experience use memory-mapped I/O. The x86 platform has a separate, non-memory-mapped I/O address space (that uses the in/out family of processor op-codes), but the PC architecture also extensively uses the standard memory address space for device I/O, which has a larger address space, faster access (generally), and easier programming (generally).
I think that the separate I/O address space was used initially because the memory address space of processors was sometimes quite limited and it made little sense to use a portion of it for device access. Once the memory address space was opened up to megabytes or more, that reason to separate I/O addresses from memory addresses became less important.
I'm not sure how many processors provide a separate I/O address space like the x86 does. As an indication of how the separate I/O address space has fallen out of favor: when the x86 architecture moved into the 32-bit realm, nothing was done to increase the I/O address space beyond 64KB (though the ability to move 32-bit chunks of data in one instruction was added). When x86 moved into the 64-bit realm, the I/O address space remained at 64KB, and the ability to move data in 64-bit units was never added.
Also note that modern desktop and server platforms (or other systems that use virtual memory) generally don't permit an application to access I/O ports, whether they're memory-mapped or not. That access is restricted to device drivers, and even device drivers will have some OS interface to deal with virtual memory mappings of the physical address and/or to set up DMA access.
On smaller systems, like embedded systems, I/O addresses are often accessed directly by the application. For systems that use memory-mapped addresses, that will usually be done by simply setting a pointer with the physical address of the device's I/O port and using that pointer like any other. However, to ensure that the access occurs and occurs in the right order, the pointer must be declared as pointing to a volatile object.
To access a device that uses something other than a memory-mapped I/O port (like the x86's I/O address space), a compiler will generally provide an extension that allows you to read or write to that address space. In the absence of such an extension, you'd need to call an assembly language function to perform the I/O.
This is called Memory-mapped I/O, and a good place to start is the Wikipedia article.
Modern operating systems usually protect you from this unless you're writing drivers, but this technique is relevant even on PC architectures. Remember the DOS 640KB limit? That's because memory addresses from 640KB to 1MB were allocated for I/O.
PlayStation. That was how we got some direct optimized access to low-level graphics (and other) features of the system.
An NDIS driver on Windows is an example. This is called memory-mapped I/O, and the benefit of it is performance.
See embedded systems for examples of devices that use memory-mapped I/O, e.g. routers, ADSL modems, microcontrollers, etc.
It is mostly used when writing drivers, since most peripheral devices communicate with the main CPU through memory mapped registers.
Motorola 68k series and PowerPC are the big ones.
You can do this in modern Windows (and I'm pretty sure Linux offers it too). It's called memory-mapped files. You can map a file into memory on Windows and then read or alter it just by manipulating pointers.