In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (e.g. using VirtualAllocEx(), MapUserPhysicalPages(), etc.).
In the context of my process, I can get the VIRTUAL memory address of that block,
but I need to find out the PHYSICAL memory address of it in order to pass it to some external device.
1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?
2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...? Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?
3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?
4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?
Any answers, tips and code excerpts will be much appreciated!!
Thanks
In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (e.g. using VirtualAllocEx(), MapUserPhysicalPages(), etc.).
No, you can't really ensure that it stays locked. What if your process crashes, or exits early? What if the user kills it? That memory will be reused for something else, and if your device is still doing DMA, that will eventually result in data loss/corruption or a bugcheck (BSOD).
Also, MapUserPhysicalPages is part of Windows AWE (Address Windowing Extensions), which is for handling more than 4 GB of RAM on 32-bit versions of Windows Server. I don't think it was intended to be used to hack up user-mode DMA.
1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?
There are drivers that let you do this, but you cannot program DMA from user mode on Windows and still have a stable and secure system. Letting a process that runs as a limited user account read/write physical memory allows that process to own the system. If this is for a one-off system or a prototype, this is probably acceptable, but if you expect other people (particularly paying customers) to use your software and your device, you should write a driver.
2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...?
That is the recommended way to approach this problem.
Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?
You can use an MDL (Memory Descriptor List) to lock down arbitrary memory, including memory buffers owned by a user-mode process, and translate its virtual addresses into physical addresses. You can also have Windows temporarily create an MDL for the buffer passed into a call to DeviceIoControl by using METHOD_IN_DIRECT or METHOD_OUT_DIRECT.
Note that contiguous pages in the virtual address space are almost never contiguous in the physical address space. Hopefully your device is designed to handle that.
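To make the discontiguity point concrete, here is a minimal sketch of what a device (or the DMA APIs) has to work with: given the physical address of each page backing a virtually contiguous buffer (the addresses in the example are invented), it builds (address, length) runs, merging pages only when they happen to be physically adjacent. `SgElement` and `build_sg_list` are hypothetical names, not WDK types; in a real driver the DMA routines produce this list for you, possibly with logical rather than physical addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scatter/gather element: one physically contiguous run.
struct SgElement {
    std::uint64_t address;  // physical start address of the run
    std::uint64_t length;   // length of the run in bytes
};

// pagePhys[i] is the page-aligned physical address backing virtual page i of
// the buffer; firstOffset is the buffer's offset within its first page and
// totalBytes is the buffer length. Physically adjacent pages are merged.
std::vector<SgElement> build_sg_list(const std::vector<std::uint64_t>& pagePhys,
                                     std::uint64_t firstOffset,
                                     std::uint64_t totalBytes,
                                     std::uint64_t pageSize = 4096) {
    std::vector<SgElement> sg;
    std::uint64_t remaining = totalBytes;
    for (std::size_t i = 0; i < pagePhys.size() && remaining > 0; ++i) {
        std::uint64_t offset = (i == 0) ? firstOffset : 0;
        std::uint64_t chunk = pageSize - offset;
        if (chunk > remaining) chunk = remaining;
        std::uint64_t addr = pagePhys[i] + offset;
        if (!sg.empty() && sg.back().address + sg.back().length == addr) {
            sg.back().length += chunk;    // physically adjacent: extend the run
        } else {
            sg.push_back({addr, chunk});  // discontiguous: start a new run
        }
        remaining -= chunk;
    }
    return sg;
}
```

For pages at 0x10000, 0x11000 and 0x5000, the first two collapse into one run and the third becomes a separate element, so a device that only accepts a single base address and length could not DMA this buffer in one go.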
3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?
There's a lot more to writing a driver than just calling a few APIs. If you're going to write a driver, I would recommend reading as much relevant material as you can from MSDN and OSR. Also, look at the examples in the Windows Driver Kit.
4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?
Drivers are not processes. A driver can run in the context of any process, as well as various elevated contexts (interrupt handlers and DPCs).
You have a virtually contiguous buffer in your application. That range of virtual memory is, as you noted, only available in the context of your application, and some of it may be paged out at any time. So, in order to access the memory from a device (which is to say, do DMA), you need to both lock it down and get a description that can be passed to the device.
You can get a description of the buffer called an MDL, or Memory Descriptor List, by sending an IOCTL (via the DeviceIoControl function) to your driver using METHOD_IN_DIRECT or METHOD_OUT_DIRECT. See the following page for a discussion of defining IOCTLs.
http://msdn.microsoft.com/en-us/library/ms795909.aspx
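As a rough illustration of what "defining an IOCTL" amounts to, the sketch below restates the bit layout of the `CTL_CODE` macro from the Windows headers as a portable `constexpr` function, so it compiles anywhere. The two IOCTL names and function codes are hypothetical; in real driver or application code you would include the Windows headers and use `CTL_CODE` and the `METHOD_*` constants directly (defining them yourself, as here, would clash with those macros).

```cpp
#include <cstdint>

// Portable restatement of CTL_CODE: bits 31-16 device type, bits 15-14
// required access, bits 13-2 function code, bits 1-0 transfer method.
constexpr std::uint32_t ctl_code(std::uint32_t deviceType, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Values as defined in the Windows headers.
constexpr std::uint32_t FILE_DEVICE_UNKNOWN = 0x22;
constexpr std::uint32_t METHOD_BUFFERED     = 0;
constexpr std::uint32_t METHOD_IN_DIRECT    = 1;
constexpr std::uint32_t METHOD_OUT_DIRECT   = 2;
constexpr std::uint32_t METHOD_NEITHER      = 3;
constexpr std::uint32_t FILE_ANY_ACCESS     = 0;

// Hypothetical IOCTLs for the device; function codes below 0x800 are
// reserved for Microsoft, so vendor codes start at 0x800.
constexpr std::uint32_t IOCTL_MYDEV_WRITE_DMA =
    ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_IN_DIRECT, FILE_ANY_ACCESS);
constexpr std::uint32_t IOCTL_MYDEV_READ_DMA =
    ctl_code(FILE_DEVICE_UNKNOWN, 0x801, METHOD_OUT_DIRECT, FILE_ANY_ACCESS);
```

The resulting code is what the application passes as the `dwIoControlCode` argument of `DeviceIoControl`; the low two bits tell the I/O manager which buffering method (here, direct I/O with an MDL) to use.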
Now that you have a description of the buffer in a driver for your device, you can lock it down so that the buffer remains in memory for the entire period that your device may act on it. Look up MmProbeAndLockPages on MSDN.
Your device may or may not be able to read or write all of the memory in the buffer. The device may only support 32-bit DMA while the machine has more than 4GB of RAM. Or you may be dealing with a machine that has an IOMMU, a GART or some other address translation technology. To accommodate this, use the various DMA APIs to get a set of logical addresses that are good for use by your device. In many cases, these logical addresses will be equivalent to the physical addresses that your question originally asked about, but not always.
Which DMA API you use depends on whether your device can handle scatter/gather lists and such. Your driver, in its setup code, will call IoGetDmaAdapter and use some of the functions returned by it.
Typically, you'll be interested in GetScatterGatherList and PutScatterGatherList. You supply a function (ExecutionRoutine) which actually programs your hardware to do the transfer.
There are a lot of details involved. Good luck.
You cannot access the page tables from user space; they are mapped only in the kernel.
If you are in the kernel, you can simply inspect the value of CR3 to locate the base page table address and then begin your resolution.
This blog series has a wonderful explanation of how to do this. You do not need any OS facility/API to resolve virtual<->physical addresses.
Virtual Address: f9a10054
1: kd> .formats 0xf9a10054
Binary: 11111001 10100001 00000000 01010100
Page Directory Pointer Index (PDPI): 11 (index into the 1st table, the Page Directory Pointer Table)
Page Directory Index (PDI): 111001 101 (index into the 2nd table, the Page Directory Table)
Page Table Index (PTI): 00001 0000 (index into the 3rd table, the Page Table)
Byte Index: 0000 01010100 (0x054, the offset into the physical memory page)
In his example, he uses WinDbg; !dq is a physical-memory read.
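The breakdown above is the 32-bit PAE layout (three levels, 4 KB pages). A minimal sketch of the same split with shifts and masks, assuming PAE paging; `PaeIndices` and `split_pae_address` are hypothetical names. Each index selects an 8-byte entry in its table, starting from the Page Directory Pointer Table whose base is held in CR3.

```cpp
#include <cstdint>

// Indices used by a 32-bit PAE page walk with 4 KB pages.
struct PaeIndices {
    std::uint32_t pdpi;        // bits 31-30: index into the Page Directory Pointer Table
    std::uint32_t pdi;         // bits 29-21: index into the Page Directory
    std::uint32_t pti;         // bits 20-12: index into the Page Table
    std::uint32_t byteOffset;  // bits 11-0:  offset into the physical page
};

constexpr PaeIndices split_pae_address(std::uint32_t va) {
    return PaeIndices{
        (va >> 30) & 0x3,
        (va >> 21) & 0x1FF,
        (va >> 12) & 0x1FF,
        va & 0xFFF,
    };
}
```

For 0xf9a10054 this yields PDPI 3 (binary 11), PDI 0x1CD (111001101), PTI 0x10 (000010000) and byte offset 0x54, matching the binary fields above.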
1) No
2) Yes, you have to write a driver. The best option would be either a virtual driver or a change to the driver for the special external device.
3) This gets very confusing here. MmGetPhysicalAddress should be the method you are looking for, but I really don't know how the physical address is mapped to the bank/chip/etc. on the physical memory.
4) You cannot use paged memory, because that gets relocated. You can lock paged memory with MmProbeAndLockPages on an MDL you can build on memory passed in from the user mode calling context. But it is better to allocate non-paged memory and hand that to your user mode application.
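PVOID p = ExAllocatePoolWithTag( NonPagedPool, size, POOL_TAG ); // size (in bytes) and POOL_TAG are placeholders
if ( p != NULL ) {
    // Valid for the page containing p; a multi-page pool block is not guaranteed to be physically contiguous
    PHYSICAL_ADDRESS realAddr = MmGetPhysicalAddress( p );
    // use realAddr
}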
You really shouldn't be doing stuff like this in user mode; as Christopher says, you need to lock the pages so that the memory manager doesn't decide to page out your backing memory while a device is using it, which would end up corrupting random memory pages.
But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called
Drivers don't have context like user-mode apps do; if you're calling into a driver via an IOCTL or something similar, you are usually (but not guaranteed to be!) in the calling user thread's context. But really, this doesn't matter for what you're asking, because kernel-mode memory (on 32-bit Windows with the default split, anything above 0x80000000) has the same mapping no matter where you are, and you'd end up allocating memory on the kernel side. But again, write a proper driver. Use WDF (http://www.microsoft.com/whdc/driver/wdf/default.mspx), and it will make writing a correct driver much easier (though it is still pretty tricky; Windows driver writing is not easy).
EDIT: Just thought I'd throw out a few book references to help you out, you should definitely (even if you don't pursue writing the driver) read Windows Internals by Russinovich and Solomon (http://www.amazon.com/Microsoft-Windows-Internals-4th-Server/dp/0735619174/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1229284688&sr=8-2); Programming the Microsoft Windows Driver Model is good too (http://www.amazon.com/Programming-Microsoft-Windows-Driver-Second/dp/0735618038/ref=sr_1_1?ie=UTF8&s=books&qid=1229284726&sr=1-1)
Wait, there is more. For the privilege of running on your customer's 64-bit Vista, you also get to spend more time and money getting your kernel-mode driver signed by Microsoft.
I was wondering if, for example, Windows completely locks all the available RAM so that some really bored person with too much time on their hands cannot start deleting memory from another process (somehow).
The question originated from what was happening when using the delete operator in C++ (was C++ telling the OS that it can now release the memory for overwriting, or was C++ telling the hardware to unlock the memory?), and then led me to think of specific hardware created to interface with the RAM at the hardware level and start deleting memory chunks for the fun of it, i.e., perhaps a hacker.
My thoughts were: the Windows memory management program is told memory is free to be written to again, right? But does that also mean that the memory address is still set to locked at a hardware level, so that the memory can only be taken control of by Windows rather than another OS? Or is it like the wild west down at the hardware level... if Windows isn't locking memory, anything else can use the part that is now free.
I guess the real question is, is there a hardware level lock on memory addresses that operating systems can trigger... so that the memory has locked itself down and cannot be re-assigned then?
I was wondering if Windows completely lock all the available ram
Windows, like any other operating system, uses all the available RAM.
so that some really bored person with too much time on their hands cannot start deleting memory from another process (somehow).
Non sequitur. It does it because that's what it's supposed to do. It's an operating system, and it is supposed to control all the hardware resources.
The question originated from what was happening when you mark memory for deletion in C++.
I don't know what 'mark memory for deletion in C++' means, but if you refer to the delete operator, or the free() function, they in general do not release memory to the operating system.
My thoughts were: The Windows memory management program is told memory is free to be written to again, right?
Wrong, see above.
But does that also mean that the memory address is still set to locked at a hardware level so that the memory can only be taken control of by windows rather than another OS.
What other OS? Unless you're in a virtual environment, there is no other OS, and even if you are, the virtual environment hands control over all the designated RAM to the guest operating system.
Or is it like the wild west down at hardware level... If Windows isn't locking memory, anything else can use the part that is now free.
Anything else such as?
I guess the real question is, is there a hardware level lock on memory addresses that operating systems can trigger?
In general, yes: there are hardware definitions of what privilege level is required to access each memory segment. For example, the operating system's own memory is immune to application processes, and application processes are immune to each other; but this is all highly hardware-dependent.
Your question doesn't really make much sense.
The concept you're looking for is mapping, not locking.
The memory is just there. The OS does nothing special about that.
What it does is map chunks of it into individual processes. Each process can see only the memory that is mapped into its address space. Trying to access any other address just leads to an access violation (or segmentation fault on Unixes). There's just nothing at those addresses. Not "locked memory", just nothing.
And when the OS decides to (or when the process requests it), a page of memory can be unmapped from a given process again.
It's not locking though. The memory isn't "owned" by the process it is mapped to. And the same memory can be mapped into the address spaces of multiple processes at the same time. That's one way to exchange data between processes.
So the OS doesn't "lock" or control ownership of memory. It just controls whether a given chunk of memory is visible to any particular process.
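The point that the same memory can be mapped into several processes at once can be seen in a few lines. The thread is about Windows, where this is done with a file mapping; the sketch below instead uses the POSIX `mmap`/`fork` analogue, chosen only because it is short and runnable on Linux or macOS. `value_written_by_child` is a hypothetical helper name.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Maps one anonymous shared page, forks, lets the child write into it and
// returns what the parent then reads back. Because the mapping is MAP_SHARED,
// parent and child see the same physical memory through their own mappings.
// Returns -1 on failure.
int value_written_by_child(int value) {
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {            // child: write through its own mapping
        *shared = value;
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);  // parent: wait for the child, then read the page
    int result = *shared;
    munmap(mem, sizeof(int));
    return result;
}
```

With a private (`MAP_PRIVATE`) mapping instead, the child's write would land in its own copy of the page and the parent would still read 0.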
It is not as simple as that; also, Windows is not open source, so exactly what it does may not be published. However, all addresses in user-space code are virtual and MMU-protected: address X in one process does not refer to the same physical memory as address X in another, and one process cannot access the memory of another. An attempt to access memory outside the address space of a process will cause an MMU exception.
I believe that when Windows starts a process, it has an initial heap allocation, from which dynamic memory allocation occurs. Deleting a dynamically allocated block simply returns it to the process's existing heap (not to the OS). If the current heap has insufficient memory, additional memory is requested from the OS to expand it.
Memory can be shared between processes in a controlled manner; in Windows this is done via a memory-mapped file, which uses the same virtual-memory mechanisms the swap file uses to emulate more memory than is physically available.
I think rather than asking a question on SO for this you'd do better to first do a little basic research, start at About Memory Management on MSDN for example.
With respect to external hardware accessing the memory, it is possible to implement shared memory between processors (it is not that uncommon; see here for an example), but it is not a matter of "wild west": the mechanisms for doing so are implemented via the OS.
Even on conventional PC architectures, many devices access memory directly via DMA as a method of performing I/O without CPU overhead. Again this is controlled by the OS and not at all "wild west", but an errant device driver could bring down your system - which is why Microsoft have a testing and approvals process for drivers.
No, there isn't.
RAM is managed by software, RAM can't lock itself.
You asked the wrong question. There is no such thing as a hardware lock on memory. The trick is virtual memory management which makes only controlled amounts of memory available to a process. The OS controls all available memory, that's its job, and processes only ever see the memory that the OS has given to them.
I am new to using mmap and mapping HW registers, so maybe the questions are simple. My problem is that we have some custom HW with 32-bit registers. One requirement is that I must use mmap to ensure fast I/O operations.
I see in examples that people use /dev/mem as a general file. Is this a good idea, or should I create my own /dev/custom and put the mapped memory there? Are there any benefits to doing that?
Secondly, are there any tools that let me create a mapped file like /dev/custom, or how does one go about doing that?
Thirdly, how do I ensure that the offset is always a multiple of the page size? In my case that is 4096 bytes.
I am using C++ and Linux.
It depends on your hardware platform. On an Intel PC you may use either port I/O or memory-mapped I/O; on ARM you use memory-mapped I/O.
Then you should determine which bus and configuration you support: for example, can the device be enumerated via PCI or USB, or is it just hard-coded at special memory addresses (the SoC way)?
The last thing to worry about is how to actually map the device's (physical) memory into your application's or driver's address space. On Linux you call mmap with the offset set to your hardware's BAR (page-aligned); you can then access the memory-mapped I/O through the returned virtual address pointer. (You may need to further adjust the cache flags.)
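A minimal sketch of the /dev/mem route, which also answers the page-size question: round the physical address down to a page boundary for the mmap offset, then add the remainder back onto the returned pointer. It assumes the register block's physical address (for example the BAR) is already known; `page_align_down` and `map_registers` are hypothetical helper names. Opening /dev/mem needs root (or CAP_SYS_RAWIO) and a kernel configured to allow access to that range.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Rounds an address down to the start of its page; mmap requires the offset
// argument to be a multiple of the page size (pageSize must be a power of two).
std::uint64_t page_align_down(std::uint64_t addr, std::uint64_t pageSize) {
    return addr & ~(pageSize - 1);
}

// Maps `length` bytes of physical address space starting at `physAddr` via
// /dev/mem and returns a pointer to the first register (not to the page start).
// *mappingBase and *mappingLen receive what must later be passed to munmap.
// Returns nullptr on failure.
volatile std::uint32_t* map_registers(std::uint64_t physAddr, std::size_t length,
                                      void** mappingBase, std::size_t* mappingLen) {
    const std::uint64_t pageSize = static_cast<std::uint64_t>(sysconf(_SC_PAGESIZE));
    const std::uint64_t base = page_align_down(physAddr, pageSize);
    const std::size_t delta = static_cast<std::size_t>(physAddr - base);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, length + delta, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, static_cast<off_t>(base));
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;

    *mappingBase = p;
    *mappingLen = length + delta;
    return reinterpret_cast<volatile std::uint32_t*>(static_cast<char*>(p) + delta);
}
```

Registers are then read and written through the returned `volatile` pointer (`regs[0]`, `regs[1]`, ... for consecutive 32-bit registers), and the mapping is released with `munmap(*mappingBase, *mappingLen)`.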
I'm working on a "free RAM" tool that has to force Windows to send a 'LOW_MEMORY' signal to all applications (which asks all applications to free their unused data; SQL Server and file caches get cleared, so you end up with lots of extra free space).
What would be the best approach to do this in C++? The most "natural" solution for me would be to allocate a big amount of memory, but is that a "good" and "stable" way? Maybe there is a native Windows function for it in the WinAPI or somewhere else?
P.S.
The concept of that tool came from the following (and I know that the better way is to... buy some RAM, but I have to write such a tool now):
https://superuser.com/questions/214526/how-does-a-free-up-ram-utility-free-up-ram
Another possibility could be to iterate through the active process list and ask each process to trim its working set via SetProcessWorkingSetSize( hProcess, (SIZE_T)-1, (SIZE_T)-1), as described here on MSDN, potentially skipping applications if your intent is to improve the performance of some particular application (benchmarking is absolutely your friend here).
This causes the OS to flush virtual pages to disk, freeing up physical memory for other applications. I'm not sure whether this will cause, e.g., SQL Server to relax its memory demands, but it is certainly worth a try.
There are a few links which may be of use to you at MSDN:
freeing user physical pages
global free function
local free function
heap free
Hopefully these can give you a start. The other way you could free up RAM is to signal Windows to page every process's RAM allocation out to the swap file, which will free up physical RAM. Then, as the user uses a particular application, it will be moved back into physical RAM by the OS; that way the management is still handled, for the most part, by the OS.
OK, so I'm very, very confused about how a piece of hardware can understand code.
I read somewhere it has to do with voltages, but how exactly does the piece of hardware know what an instruction in software means? I know drivers are the bridge between software and hardware, but a driver is still software :S.
For example, in C++ we have pointers, and they can point to some address in memory. Can we have a pointer that points to some hardware address, then write to that address, and it would affect the hardware? Or does hardware not have addresses?
I guess what I'm really asking is how does the OS or BIOS know where a piece of hardware is and how to talk to it?
For example, in C++ we have pointers and they can point to some address in memory. Can we have a pointer that points to some hardware address and then write to that address and it would affect the hardware? Or does hardware not have addresses?
Some hardware has addresses like pointers, and some doesn't (in which case it most likely uses something called I/O ports, which require special IN and OUT instructions instead of the regular memory operations). But much modern hardware has a memory address somewhere, and if you write the correct value to the correct address, the hardware will do what you ask it to do. This ranges from the really simple approach, say a serial port where you write a byte to an "output register" and the byte is sent along the serial line while another address holds the input data being received on the serial port, to graphics cards that have a machine language of their own and can run hundreds or thousands of threads.
And normally, it's the OS's responsibility, via drivers, to access the hardware.
This is very simplified, and the whole subject of programming, OS and hardware is enough to write a fairly thick book about (and that's just in general terms, if you want to actually know about specific hardware, it's easily a few dozen pages for a serial port, and hundreds or thousands of pages for a graphics chip).
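To make the memory-mapped case concrete, here is a minimal sketch of driving a very simple memory-mapped serial port from C++. The register layout (`UartRegs`, a `TX_READY` status bit) and `uart_putc` are invented for illustration; a real device's layout comes from its datasheet, and under a protected-mode OS only the kernel (or a process that has had the registers mapped for it) could dereference the real address. Port I/O via IN/OUT is not shown.

```cpp
#include <cstdint>

// Hypothetical register layout of a simple memory-mapped serial port.
struct UartRegs {
    volatile std::uint32_t data;    // offset 0x0: writing sends a byte
    volatile std::uint32_t status;  // offset 0x4: bit 0 = transmitter ready
};

constexpr std::uint32_t TX_READY = 1u << 0;

// Busy-waits until the transmitter is ready, then writes one byte.
// `volatile` stops the compiler from caching or eliding the register accesses.
void uart_putc(UartRegs* uart, char c) {
    while ((uart->status & TX_READY) == 0) {
        // spin until the hardware reports it can accept another byte
    }
    uart->data = static_cast<unsigned char>(c);
}
```

On bare metal the pointer would simply be a fixed address taken from the documentation, e.g. `reinterpret_cast<UartRegs*>(0x10000000)` (a made-up address); the address decoding hardware then routes the store to the device instead of RAM.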
There are whole books on this topic. But briefly:
SW talks to hardware in a variety of ways. A given piece of hardware may respond to values written to very specific addresses ("memory mapped") or via I/O ports and instructions supported by the CPU (e.g., the x86 in and out instructions). When accessing a memory-mapped port (address), the HW is designed to recognize the specific address or small range of addresses and route the signals to the peripheral hardware rather than to memory. In the case of I/O instructions, the CPU has a separate set of signals used specifically for that purpose.
The OS (at the lowest level - board support package) and BIOS have "knowledge" built in to them about the hardware address and/or the I/O ports needed to execute the various hardware functions available. That is, at some level, they have coded in exactly what addresses are needed for the different features.
You should read The Soul of a New Machine, by Tracy Kidder. Published in 1981, it won the 1982 Pulitzer Prize, and it goes to great lengths to explain in layman's terms how a computer works and how humans must think to create it. Besides, it's a true story and one of the few to convey the thrill of hardware and software.
All in all, a nice introduction to the subject.
The hardware engineers know where the memory and peripherals live in the processor's address space. So it is something that is known because those addresses were chosen by someone and documented so that others could write drivers.
The processor does not distinguish peripherals from RAM. The instructions simply use addresses ultimately determined by the programmers who wrote the software the processor is running. That implies, correctly, that the peripherals and RAM (and ROM) are all just addresses. If you were writing a video driver and changing the resolution of the screen, there would be a handful of addresses you would need to write to. Somewhere between the processor core and the peripheral (the video card), there is hardware that examines the address and routes it to the right place. This is how the hardware was designed: it examines addresses; some address ranges are RAM and are sent to the memory to be handled, and some are peripherals and are sent there to be handled. Sometimes the address ranges are themselves programmable, so that you can organize your memory space however you need. It is similar to moving house: it is still you and your stuff at the new house, but it has a different address, and the postal workers who deliver the mail know how to find your new address. Then there are MMUs, which add a layer of protection and other features. The MMU (memory management unit) can also virtualize an address: the processor may be programmed to write to address 0x100000, but the MMU translates that to 0x2300000 before it goes out on the normal bus to be sorted as memory or peripheral and eventually reach its destination. Why would you do such a thing? There are two major reasons. One is that, for example, when you compile an application to run on your operating system, all programs for that OS can be compiled to run at the same address, say 0x8000.
But there is only one physical address 0x8000 (let's assume), so the operating system configures the MMU for your program such that your program thinks it is running at that address. The operating system can also, if it chooses and the MMU has the feature, add protections so that if your program tries to access something outside its allocated memory space, a fault occurs and your program is prevented from doing that, which keeps it from hacking into or crashing other programs' memory spaces. Likewise, if the operating system supports it, it can use that fault to swap some data out from RAM to disk and then give you more RAM (virtual memory), allowing programs to think there is more memory than there really is. An MMU is not the only way to do all of this, but it is the popular way. So when you have a pointer in C++ running on some operating system, it is most likely a virtual address, not the physical address; the MMU converts the address given to your program into the real memory address. When the OS chooses to switch out your program for another, it is relatively easy to tell the MMU to let the other task think that the low-numbered address space (0x8000, for example) now belongs to it. Your program is put to sleep (not executed) for a while.
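A toy model of the idea, not of any real MMU: each process gets its own table mapping virtual page numbers to physical page numbers, so two processes can both use address 0x8000 while landing on different physical memory, and an unmapped address "faults". `ToyAddressSpace` is a hypothetical name; real MMUs walk multi-level tables in hardware and cache the results in a TLB. The test reuses the 0x100000 to 0x2300000 example above.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

constexpr std::uint32_t kPageSize = 4096;

// One process's view of memory: virtual page number -> physical page number.
class ToyAddressSpace {
public:
    void map(std::uint32_t virtPage, std::uint32_t physPage) { table_[virtPage] = physPage; }

    // Returns the physical address, or std::nullopt to model a page fault
    // (an access outside the memory the OS gave this process).
    std::optional<std::uint32_t> translate(std::uint32_t virtAddr) const {
        auto it = table_.find(virtAddr / kPageSize);
        if (it == table_.end()) return std::nullopt;
        return it->second * kPageSize + virtAddr % kPageSize;
    }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> table_;
};
```

Switching tasks then amounts to switching which table is consulted, which is roughly what the OS does when it reloads the MMU's page-table base for the next process.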
My C++ application occasionally runs out of memory due to large amounts of data being retrieved from a database. It has to run on 32bit WinXP machines.
Is it possible to transparently (for most of the existing code) swap out the data objects to disk and read them into memory only on demand, so I'm not limited to the 2GB that 32bit Windows gives to the process?
I've looked at VirtualAlloc and Address Window Extensions but I'm not sure it's what I want.
I also found this SO question where the questioner creates a file mapping and wants to create objects in there. One answer suggests using placement new which sounds like it would be pretty transparent to the rest of the code.
Will this prevent my application from running out of physical memory? I'm not entirely sure, because after all there is still the 32-bit address space limit. Or is this a different kind of problem that will occur when trying to create a lot of objects?
So long as you are using a 32-bit operating system, there is nothing you can do about this. There is no way to have more than 3GB (2GB in the case of Windows) of data in your process's virtual memory at once, whether or not it's actually swapped out to disk.
Historically, databases have always handled this problem by using read, write and seek. Rather than accessing data directly from memory, they use a fake (64-bit) pointer. Data is split into blocks (normally around 4 KB), and a number of these blocks are allocated in memory. When they want to access data at a fake pointer address, they check whether the block is loaded into memory, and if it is, they access it from there. If it is not, they find an empty slot, copy the block in, and return the address. If there are no free slots, a block is written back out to disk (if it has been modified) and its slot is reused.
The real beauty of this is that if your system has enough RAM, the operating system will cache much more than 2GB of this data in RAM at any point in time, and when you think you are actually reading from and writing to disk, the operating system will probably just be copying data around in memory. This, of course, requires a 32-bit operating system that supports more than 3GB of physical memory, such as Linux or Windows Server with PAE.
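A minimal sketch of that block-cache scheme, assuming least-recently-used eviction (one common choice; the description above does not prescribe a policy). `BlockCache` is a hypothetical class, and the `disk_` map is an in-memory stand-in for a file accessed with seek/read/write, so the example runs without touching the file system.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// A 64-bit "fake pointer" is split into a block number and an offset; at most
// `capacity` blocks are held in memory, and the least recently used block is
// evicted (written back first if dirty) when a slot is needed.
class BlockCache {
public:
    static constexpr std::size_t kBlockSize = 4096;

    explicit BlockCache(std::size_t capacity) : capacity_(capacity ? capacity : 1) {}

    // Returns a pointer into the cached copy of the block containing fakePtr.
    // The pointer is only valid until another block is loaded.
    char* access(std::uint64_t fakePtr, bool forWrite) {
        std::uint64_t block = fakePtr / kBlockSize;
        auto it = slots_.find(block);
        if (it == slots_.end()) {
            if (slots_.size() >= capacity_) evict();
            Slot s;
            auto d = disk_.find(block);  // "read" the block from the backing store
            s.data = (d != disk_.end()) ? d->second : std::vector<char>(kBlockSize, 0);
            lru_.push_front(block);
            s.lruPos = lru_.begin();
            it = slots_.emplace(block, std::move(s)).first;
        } else {
            lru_.splice(lru_.begin(), lru_, it->second.lruPos);  // mark most recently used
        }
        if (forWrite) it->second.dirty = true;
        return it->second.data.data() + fakePtr % kBlockSize;
    }

    std::size_t blocksOnDisk() const { return disk_.size(); }

private:
    struct Slot {
        std::vector<char> data;
        bool dirty = false;
        std::list<std::uint64_t>::iterator lruPos;
    };

    void evict() {
        std::uint64_t victim = lru_.back();
        lru_.pop_back();
        auto it = slots_.find(victim);
        if (it->second.dirty) disk_[victim] = std::move(it->second.data);  // write back
        slots_.erase(it);
    }

    std::size_t capacity_;
    std::unordered_map<std::uint64_t, Slot> slots_;
    std::list<std::uint64_t> lru_;
    std::unordered_map<std::uint64_t, std::vector<char>> disk_;  // stand-in for the file
};
```

Because callers only ever hold a 64-bit fake pointer, the amount of data is bounded by the file size rather than by the 32-bit address space; only `capacity` blocks occupy virtual memory at any time.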
SQLite has a nice self-contained implementation of this, which you could probably make use of with little effort.
If you do not wish to do this then your only alternatives are to either use a 64-bit operating system or to work with less data at any given point in time.