How can I see a page-table maintained by each process in Virtual Memory - Linux?


In the virtual memory concept, each process maintains its own page table. This page table maps the virtual address to the kernel virtual address. This kernel virtual address is then translated to an address in physical RAM. I understand that there is a kernel virtual address, the vm_area_struct. This vm_area_struct finally maps the address to the physical address.
When I do cat /proc/<pid>/maps, I see what looks like a direct mapping of virtual addresses to physical addresses, because it maps each address range to a file, with its inode. So it looks like it is the address on the hard disk: file descriptor, major-minor number. A few addresses are in RAM. So I can't see the table where the virtual address is mapped to the kernel virtual address. I want to see that table. How can I see it? It should not be in kernel space, because when the process accesses, let's say, memory at 0x1681010, this should be translated to a kernel virtual memory address. Finally, this address should be translated to a physical memory address.

No, the Linux kernel maintains the processes' page tables (the processes themselves do not). Processes only see virtual memory, through their address space. Processes use some syscalls, e.g. mmap(2) or execve(2), to change their address space.
Physical addresses and page tables and dealing with and managing the MMU is the business of the kernel, which actually provides some "abstract machine" (with the virtual address spaces, the syscalls as atomic elementary operations, etc...) to user applications. The applications don't see the raw (x86) hardware, but only the user mode as given by the kernel. Some hardware resources and instructions are not available to them (they only run in user space).
The page tables are managed by the kernel, and indeed various processes may use different (or sometimes the same) page tables, so context switches managed by the kernel may need to reconfigure the MMU. You don't need to care: user processes don't see page tables, and the kernel manages them.
And no, /proc/self/maps does not show anything about physical addresses, only about virtual ones. The kernel is permitted to move processes from one core to another, to move pages from one physical (not virtual) address to another, etc., at any time; applications usually don't see this (they might query it with mincore(2), getcpu(2) and through proc(5)).
Applications should not care about physical memory or interrupts such as page faults (only the kernel cares about these, sometimes reacting by sending signals).
The virtual to physical address translation happens in the MMU. Usually, it is successful (perhaps transparently accessing page tables), and the processor sends on the bus to the RAM the translated physical address (corresponding to some virtual address handled by the user-mode machine instruction). When the MMU cannot handle it, a page fault occurs, which is processed by the kernel (which could swap-in some page, send a SIGSEGV, do a context switch, etc...)
See also the processor architecture, instruction set, page table, paging, translation lookaside buffer, cache, x86 and x86-64 wikipages (and follow all the links I gave you).
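For readers who still want to peek at the result of the translation from user space: Linux exposes /proc/<pid>/pagemap, which holds one 64-bit entry per virtual page. The sketch below is only an illustration, not a way to see the page tables themselves. It assumes the entry layout from the kernel's pagemap documentation (bit 63 = present, bit 62 = swapped, bits 0-54 = page frame number), and the names PagemapEntry, pagemap_offset, decode_pagemap_entry and read_own_pagemap are made up for this example. On recent kernels the frame number reads as zero unless the process has CAP_SYS_ADMIN.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>

// One decoded /proc/<pid>/pagemap entry (hypothetical helper type).
struct PagemapEntry {
    bool present;       // bit 63: the page is currently in RAM
    bool swapped;       // bit 62: the page is in swap
    std::uint64_t pfn;  // bits 0-54: page frame number (meaningful only if present)
};

// Byte offset of the entry describing `vaddr`: one 8-byte entry per virtual page.
std::uint64_t pagemap_offset(std::uint64_t vaddr, std::uint64_t page_size) {
    assert(page_size > 0);
    return (vaddr / page_size) * sizeof(std::uint64_t);
}

// Splits a raw 64-bit pagemap word into its fields.
PagemapEntry decode_pagemap_entry(std::uint64_t raw) {
    PagemapEntry e;
    e.present = ((raw >> 63) & 1) != 0;
    e.swapped = ((raw >> 62) & 1) != 0;
    e.pfn = raw & ((std::uint64_t{1} << 55) - 1);
    return e;
}

// Reads the entry for `vaddr` in the calling process; returns false on any I/O failure.
bool read_own_pagemap(std::uint64_t vaddr, std::uint64_t page_size, PagemapEntry& out) {
    std::ifstream f("/proc/self/pagemap", std::ios::binary);
    if (!f) return false;
    f.seekg(static_cast<std::streamoff>(pagemap_offset(vaddr, page_size)));
    std::uint64_t raw = 0;
    if (!f.read(reinterpret_cast<char*>(&raw), sizeof raw)) return false;
    out = decode_pagemap_entry(raw);
    return true;
}
```

If an entry is present and the frame number is visible, the physical address would be pfn * page_size + vaddr % page_size. The kernel may still migrate the page at any time, so the value is only a snapshot.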

Related

Heap Corruption's effect on a Windows 2012 r2 server

If each process running on a Windows 2012 r2 server has its own heap, is it not then possible to have one process corrupt another process's heap? I think it would be possible since all heaps are stored in a "global" address space, but an AV occurs when one process attempts to write from or read from memory outside it's address space...so that would prevent heap corruption for the process that owned the address space?

You need to read how virtual memory works. There is no all-encompassing address space.
There are physical RAM addresses, and physical disk addresses, which don't vary by process. But user-mode code never uses these directly.
Rather, the memory management unit provides a mapping from virtual addresses to physical addresses. Because the page tables are process specific, this mapping is unique to each process. Only ring 0 (kernel-mode) code can change or bypass the mapping step; this is enforced in hardware. For user processes, if there is no mapping leading to a particular physical address, it simply cannot be accessed from that context, because there's no way to name that physical location using virtual addresses. And there are no mappings leading to the page tables themselves.
This is the difference between a memory management unit and its lesser brother, the memory protection unit. Architectures that use a memory protection unit do have a single global addressing scheme, with hardware enforced permission bits that again can only be modified by privileged code.
The thing you asked about
one process attempts to write from [sic] or read from memory outside it's [sic] address space
just doesn't exist. It's like asking what the telephone number of my car is. My car is identified by a VIN and a license plate, but neither of those will allow you to talk to it through the phone system.
Access violations (sometimes also called segmentation faults) occur when a process attempts to write to, read from, or execute from unmapped portions of its own address space, or pages that have been explicitly set to trap access attempts (for stack expansion perhaps, or copy-on-write). All memory accesses by a process are by definition interpreted inside its address space.

How can I get address in physical addressing area by pointer in virtual addressing area?

If I have an address (pointer) in virtual addressing area of current process to the pinned (page-locked) memory, then how can I get an address (pointer) in physical addressing area, of this memory region, by using POSIX?
CPU: x86
OS: Linux 2.6 and Windows 7/8(Server 2008R2)

You cannot access physical addresses in user space. Everything you do goes through the MMU and the page tables. Even if you pin a page, the kernel may still move it around in physical memory.
Even if you got the address, what would it do for you? A userspace process cannot access memory directly by physical address. Only kernel mode can.
If you really need the functionality (although I still can't imagine any way of using the information), you have to write a kernel-mode driver.

What type of address is returned on applying ampersand to a variable or a data type in C/C++ or in any other such language?

This is a very basic question that has been boggling my mind since the day I heard about the concept of virtual and physical memory in my OS class. Now I know that at load time and compile time the virtual address and logical address binding schemes are the same, but at execution time they differ.
First of all, why is it beneficial to generate virtual addresses at compile and load time, and what is returned when we apply the ampersand operator to get the address of a variable, native data types, user-defined types and function definitions?
And how exactly does the OS map from virtual to physical addresses when it does so? These questions are just out of curiosity, and I would love some good and deep insights considering modern-day OSes, and how it was in early OSes. I am only asking about C/C++ since I don't know much about other languages.

Physical addresses occur in hardware, not software. A possible/occasional exception is in the operating system kernel. Physical means it's the address that the system bus and the RAM chips see.
Not only are physical addresses useless to software, but it could be a security issue. Being able to access any physical memory without address translation, and knowing the addresses of other processes, would allow unfettered access to the machine.
That said, smaller or embedded machines might have no virtual memory, and some older operating systems did allow shared libraries to specify their final physical memory location. Such policies hurt security and are obsolete.

At the application level (e.g. Linux application process), only virtual addresses exist. Local variables are on the stack (or in registers). The stack is organized in call frames. The compiler generates the offset of a local variable within the current call frame, usually an offset relative to the stack pointer or frame pointer register (so the address of a local variable, e.g. in a recursive function, is known only at runtime).
Try stepping through a recursive function in the gdb debugger and display the address of some local variable to understand more. Also try gdb's bt command.
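If you would rather see this without a debugger, here is a minimal sketch of the same experiment. It records the address of one local variable in every live frame of a recursive function. The names record_frames and all_distinct are invented for this example, and the only thing it relies on is that objects whose lifetimes overlap must have distinct addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Records the address of one local variable per live call frame.
void record_frames(int depth, std::vector<std::uintptr_t>& out) {
    assert(depth >= 0);
    volatile int local = depth;  // volatile so the variable really lives in memory
    out.push_back(reinterpret_cast<std::uintptr_t>(&local));
    if (depth > 0) record_frames(depth - 1, out);
    local = local + 1;  // touched after the call, so this frame stays live meanwhile
}

// True if every recorded address is different from all the others.
bool all_distinct(const std::vector<std::uintptr_t>& addrs) {
    return std::set<std::uintptr_t>(addrs.begin(), addrs.end()).size() == addrs.size();
}
```

Calling record_frames(4, v) on an empty vector leaves five addresses in v, one per frame. All of them are virtual addresses, and their exact values (and whether they grow up or down) depend on the platform and on the run.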
Type
cat /proc/self/maps
to understand the address space (and virtual memory mapping) of the process executing that cat command.
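To make the columns of that output concrete, here is a rough sketch of a parser for a single maps line, assuming the usual layout described in proc(5): start-end, permissions, offset, device, inode, and an optional pathname. MapsRegion and parse_maps_line are hypothetical names for this example. Every address in the file is a virtual address.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// One line of /proc/<pid>/maps, split into its fields.
struct MapsRegion {
    std::uint64_t start = 0;   // first virtual address of the region
    std::uint64_t end = 0;     // one past the last virtual address
    std::string perms;         // e.g. "r-xp"
    std::uint64_t offset = 0;  // offset into the backing file
    std::string dev;           // major:minor of the backing device
    std::uint64_t inode = 0;   // 0 for anonymous mappings
    std::string path;          // empty for anonymous mappings
};

// Parses one line; returns false if the mandatory fields are malformed.
bool parse_maps_line(const std::string& line, MapsRegion& out) {
    std::istringstream in(line);
    char dash = 0;
    if (!(in >> std::hex >> out.start >> dash >> out.end) || dash != '-') return false;
    if (!(in >> out.perms >> out.offset >> out.dev >> std::dec >> out.inode)) return false;
    out.path.clear();
    std::getline(in >> std::ws, out.path);  // the pathname is optional
    return true;
}
```

Feeding it each line read from /proc/self/maps gives you the list of regions that make up the process's address space.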
Within the kernel, the mapping from virtual addresses to physical RAM is done by code implementing paging and driving the MMU. Some system calls (notably mmap(2) and others) can change the address space of your process.
Some early computers (e.g. those from the 1950s or early 1960s like the CAB 500, IBM 1130 or IBM 1620) did not have any MMU; even the original Intel 8086 didn't have any memory protection. At that time (the 1960s), C did not exist. On processors without an MMU you don't have virtual addresses (only physical ones, including in your embedded C code for a washing-machine manufacturer). Some machines could protect against writes into some memory banks through physical switches. Today, some low-end cheap processors (those in washing machines) don't have any MMU, and neither do most cheap microcontrollers. Often (but not always), the program is in some ROM, so it cannot be overwritten by buggy code.

When accessing a C++ variable how is its content resolved?

When accessing a variable in C++, how is its content resolved?
Is it possible for the OS to remap the variable to a different address without affecting its logical address? Is it possible to have 2 variables pointing to the same logical address in 2 different processes?

Yes, it's absolutely possible for the OS to move variables around in memory. Virtually all modern computers use virtual memory, in which each process believes that it has access to the machine's full address space. Whenever a memory read or write occurs, though, the address is translated from the virtual address in the process's address space to some physical address in the computer's real address space. The operating system can change these mappings as it sees fit, possibly by moving the blocks of memory around, or by temporarily writing them out to disk, etc. This allows multiple processes to each use more memory than is available on the system, since the OS can move blocks of memory in and out of RAM transparently without the process being able to detect this.
One advantage of using virtual memory is that two processes can each use the same virtual address without conflicting with one another. For example, two processes might each use address 0xCAFEBABE, and each sees its own copy. However, when the processes read or write this value, the address will get translated to different physical addresses, and so each can have its own copy. Many OSes actually provide functionality to allow processes to share memory if they want, or for many processes with similar pieces of data (say, a shared library) to have different virtual addresses that map to the same physical address.
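A small POSIX sketch of that first point, assuming a Linux-like system where fork() gives the child a copy-on-write copy of the parent's address space. The child overwrites a global and reports the variable's address and value through a pipe, and the parent checks that the address is the same while its own value is untouched. The names Report, g_counter and same_address_private_copy are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct Report {
    std::uintptr_t address;  // virtual address of g_counter as seen by the child
    int value;               // value of g_counter in the child
};

int g_counter = 1;

// Returns true if the child saw g_counter at the same virtual address as the
// parent, changed it to 42, and the parent's copy still holds 1.
bool same_address_private_copy() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {  // child: modify its own copy and report back
        close(fds[0]);
        g_counter = 42;
        Report r{reinterpret_cast<std::uintptr_t>(&g_counter), g_counter};
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == static_cast<ssize_t>(sizeof r) ? 0 : 1);
    }
    close(fds[1]);
    Report child{};
    ssize_t n = read(fds[0], &child, sizeof child);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (n != static_cast<ssize_t>(sizeof child)) return false;
    return child.address == reinterpret_cast<std::uintptr_t>(&g_counter)
        && child.value == 42
        && g_counter == 1;
}
```

Both processes name the variable by the same virtual address, yet after the child's write the two copies live in different physical pages.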
Because C++ directly accesses the machine's underlying memory, any time a variable is read or written in C++, the OS might trap the instruction, page in the physical memory into which the read or write occurs, and then restore control to the program. This isn't really a feature of C++ as much as the hardware's memory system.
In short - programs work with virtual addresses, which the OS maps to physical addresses in a way that ensures that each process thinks it has total ownership of the memory system. C++ programs use this system by default because they're using the underlying hardware.

You seem to be mixing C++ and OS-specific concepts here. As far as C++ is concerned, there is only one process running in the system and all variables belong to that process. However, most modern OSes use a virtual memory system so that each process gets its own address space, and there are usually OS-specific functions to share memory between processes. One common way of doing this is to use memory-mapped files so that multiple processes can map the same file to their own address spaces and access the same content.
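As a rough illustration of the sharing side, here is a sketch that uses an anonymous shared mapping (mmap with MAP_SHARED | MAP_ANONYMOUS) rather than a named file, which keeps the example short. It is a related mechanism, not the memory-mapped-file approach itself, and it assumes a Linux-like POSIX system. The function name child_write_visible_to_parent is invented.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a write made by the child through a shared mapping
// is visible to the parent afterwards.
bool child_write_visible_to_parent() {
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    volatile int* shared = static_cast<volatile int*>(mem);
    *shared = 1;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return false;
    }
    if (pid == 0) {  // child: write through the shared page and exit
        *shared = 42;
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);  // wait so the child's write has happened
    bool ok = (*shared == 42);
    munmap(mem, sizeof(int));
    return ok;
}
```

Here both processes' page tables point at the same physical page, so the write is seen on both sides. With ordinary (private) memory the parent would still read 1.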

How to translate a virtual memory address to a physical address?

In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (i.e. using VirtualAllocEx(), MapUserPhysicalPages() etc).
In the context of my process, I can get the VIRTUAL memory address of that block,
but I need to find out the PHYSICAL memory address of it in order to pass it to some external device.
1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?
2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...? Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?
3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?
4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?
Any answers, tips and code excerpts will be much appreciated!!
Thanks

In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (i.e. using VirtualAllocEx(), MapUserPhysicalPages() etc).
No, you can't really ensure that it stays locked. What if your process crashes, or exits early? What if the user kills it? That memory will be reused for something else, and if your device is still doing DMA, that will eventually result in data loss/corruption or a bugcheck (BSOD).
Also, MapUserPhysicalPages is part of Windows AWE (Address Windowing Extensions), which is for handling more than 4 GB of RAM on 32-bit versions of Windows Server. I don't think it was intended to be used to hack up user-mode DMA.
1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?
There are drivers that let you do this, but you cannot program DMA from user mode on Windows and still have a stable and secure system. Letting a process that runs as a limited user account read/write physical memory allows that process to own the system. If this is for a one-off system or a prototype, this is probably acceptable, but if you expect other people (particularly paying customers) to use your software and your device, you should write a driver.
2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...?
That is the recommended way to approach this problem.
Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?
You can use an MDL (Memory Descriptor List) to lock down arbitrary memory, including memory buffers owned by a user-mode process, and translate its virtual addresses into physical addresses. You can also have Windows temporarily create an MDL for the buffer passed into a call to DeviceIoControl by using METHOD_IN_DIRECT or METHOD_OUT_DIRECT.
Note that contiguous pages in the virtual address space are almost never contiguous in the physical address space. Hopefully your device is designed to handle that.
3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?
There's a lot more to writing a driver than just calling a few APIs. If you're going to write a driver, I would recommend reading as much relevant material as you can from MSDN and OSR. Also, look at the examples in the Windows Driver Kit.
4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?
Drivers are not processes. A driver can run in the context of any process, as well as various elevated contexts (interrupt handlers and DPCs).

You have a virtually contiguous buffer in your application. That range of virtual memory is, as you noted, only available in the context of your application, and some of it may be paged out at any time. So, in order to access the memory from a device (which is to say, do DMA), you need to both lock it down and get a description that can be passed to a device.
You can get a description of the buffer called an MDL, or Memory Descriptor List, by sending an IOCTL (via the DeviceIoControl function) to your driver using METHOD_IN_DIRECT or METHOD_OUT_DIRECT. See the following page for a discussion of defining IOCTLs.
http://msdn.microsoft.com/en-us/library/ms795909.aspx
Now that you have a description of the buffer in a driver for your device, you can lock it down so that the buffer remains in memory for the entire period that your device may act on it. Look up MmProbeAndLockPages on MSDN.
Your device may or may not be able to read or write all of the memory in the buffer. The device may only support 32-bit DMA and the machine may have more than 4GB of RAM. Or you may be dealing with a machine that has an IOMMU, a GART or some other address translation technology. To accommodate this, use the various DMA APIs to get a set of logical addresses that are good for use by your device. In many cases, these logical addresses will be equivalent to the physical addresses that your question originally asked about, but not always.
Which DMA API you use depends on whether your device can handle scatter/gather lists and such. Your driver, in its setup code, will call IoGetDmaAdapter and use some of the functions returned by it.
Typically, you'll be interested in GetScatterGatherList and PutScatterGatherList. You supply a function (ExecutionRoutine) which actually programs your hardware to do the transfer.
There are a lot of details involved. Good luck.

You cannot access the page tables from user space; they are mapped only in the kernel.
If you are in the kernel, you can simply inspect the value of CR3 to locate the base page table address and then begin your resolution.
This blog series has a wonderful explanation of how to do this. You do not need any OS facility/API to resolve virtual<->physical addresses.
Virtual Address: f9a10054
1: kd> .formats 0xf9a10054
Binary: 11111001 10100001 00000000 01010100
Page Directory Pointer Index (PDPI): 11 (index into the 1st table, the Page Directory Pointer Table)
Page Directory Index (PDI): 111001 101 (index into the 2nd table, the Page Directory Table)
Page Table Index (PTI): 00001 0000 (index into the 3rd table, the Page Table)
Byte Index: 0000 01010100 (0x054, the offset into the physical memory page)
The example uses WinDbg; !dq is a physical memory read.
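The bit slicing in that breakdown can be written down as a few shifts and masks. This is a sketch for the 32-bit PAE layout that the example appears to use (2 + 9 + 9 + 12 bits); PaeIndices and split_pae are names made up here, and a 64-bit system would need a different split.

```cpp
#include <cassert>
#include <cstdint>

// The four fields of a 32-bit virtual address under PAE paging.
struct PaeIndices {
    std::uint32_t pdpi;    // bits 31-30: index into the Page Directory Pointer Table
    std::uint32_t pdi;     // bits 29-21: index into the Page Directory
    std::uint32_t pti;     // bits 20-12: index into the Page Table
    std::uint32_t offset;  // bits 11-0: byte offset within the 4 KB page
};

constexpr PaeIndices split_pae(std::uint32_t va) {
    return PaeIndices{va >> 30, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF, va & 0xFFF};
}

// The address from the breakdown above: 11 | 111001101 | 000010000 | 000001010100
static_assert(split_pae(0xf9a10054u).pdpi == 3, "PDPI");
static_assert(split_pae(0xf9a10054u).pdi == 0x1CD, "PDI");
static_assert(split_pae(0xf9a10054u).pti == 0x10, "PTI");
static_assert(split_pae(0xf9a10054u).offset == 0x054, "byte offset");
```

Each index selects an entry in the corresponding table, starting from the table that CR3 points to. The entry read at each level holds the physical base of the next table, and the final offset is added to the physical page base.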

1) No
2) Yes, you have to write a driver. Best would be either a virtual driver, or change the driver for the special-external device.
3) This gets very confusing here. MmGetPhysicalAddress should be the method you are looking for, but I really don't know how the physical address is mapped to the bank/chip/etc. on the physical memory.
4) You cannot use paged memory, because that gets relocated. You can lock paged memory with MmProbeAndLockPages on an MDL you can build on memory passed in from the user mode calling context. But it is better to allocate non-paged memory and hand that to your user mode application.
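// BUFFER_SIZE and POOL_TAG are placeholders that your driver defines.
PVOID p = ExAllocatePoolWithTag( NonPagedPool, BUFFER_SIZE, POOL_TAG );
if ( p != NULL ) {
    PHYSICAL_ADDRESS realAddr = MmGetPhysicalAddress( p );
    // use realAddr
}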

You really shouldn't be doing stuff like this in usermode; as Christopher says, you need to lock the pages so that mm doesn't decide to page out your backing memory while a device is using it, which would end up corrupting random memory pages.
But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the mmGetPhysicalAddress routine is called
Drivers don't have context like user-mode apps do; if you're calling into a driver via an IOCTL or something, you are usually (but not guaranteed to be!) in the calling user thread's context. But really, this doesn't matter for what you're asking, because kernel-mode memory (anything above 0x80000000 on 32-bit Windows with the default address-space split) is the same mapping no matter where you are, and you'd end up allocating memory on the kernel side. But again, write a proper driver. Use WDF (http://www.microsoft.com/whdc/driver/wdf/default.mspx); it will make writing a correct driver much easier (though still pretty tricky; Windows driver writing is not easy).
EDIT: Just thought I'd throw out a few book references to help you out, you should definitely (even if you don't pursue writing the driver) read Windows Internals by Russinovich and Solomon (http://www.amazon.com/Microsoft-Windows-Internals-4th-Server/dp/0735619174/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1229284688&sr=8-2); Programming the Microsoft Windows Driver Model is good too (http://www.amazon.com/Programming-Microsoft-Windows-Driver-Second/dp/0735618038/ref=sr_1_1?ie=UTF8&s=books&qid=1229284726&sr=1-1)

Wait, there is more. For the privilege of running on your customer's 64-bit Vista, you get to expend more time and money to get your kernel-mode driver signed by Microsoft.
