Which registers change when we move from user mode to kernel mode, and what is the reason to move to kernel mode?

Which registers change when we move from user mode to kernel mode, and what is the reason to move to kernel mode?
Why don't the following cause a move to kernel mode:
creating a new admin as root (superuser or admin)
a TLB miss
writing the "page modified" bit in the page tables

From your questions I gather that you are quite weak in operating system concepts.
OK, let me explain (I am assuming you are using Linux, not Windows).
"Which registers change when we move from user mode to kernel mode?"
To answer this question properly you need to learn about process management.
But put simply, Linux uses the system call interface to switch from user space to kernel space. The system call interface uses some registers (which ones depends on your processor) to pass the system call number and the arguments for the system call.

In general, the move to kernel mode happens when:
you make an explicit request to the kernel (a system call);
you make an implicit request to the kernel (accessing memory that isn't mapped into your address space, whether the access is valid or not);
the kernel decides it needs to do something more important than executing your code (normally as the result of a hardware interrupt).
All registers will be preserved, as it would be rather difficult to write code if your registers could change at random, but how that happens is very CPU specific.
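The first case, an explicit request, can be sketched in C++ as follows. This is only a minimal sketch that assumes Linux with glibc: syscall() and SYS_getpid come from <unistd.h> and <sys/syscall.h>, and raw_getpid is a name made up for this illustration. The library wrapper places the system call number and arguments in whatever registers the platform's convention prescribes and then executes the trap instruction.

```cpp
#include <sys/syscall.h>  // SYS_getpid (Linux-specific system call number)
#include <unistd.h>       // syscall(), getpid()

// Hypothetical helper: ask the kernel for our process ID through the generic
// syscall() entry point instead of the usual getpid() wrapper.
long raw_getpid() {
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(); the only difference is that the request to the kernel is spelled out explicitly.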

"Which registers change when we move from user mode to kernel mode?"
On a typical 32-bit x86 system running the Linux kernel, this is what happens:
A program triggers interrupt 0x80 with the instruction: int $0x80
The CPU changes the program counter and the code segment selector to refer to
the place in memory where the Linux system call handler lives (Linux uses
virtual memory).
The registers affected so far are CS, EIP, and EFLAGS. The CPU also changes the
stack segment selector (SS) and the stack pointer (ESP) to refer to the top of the kernel stack.
Finally, the kernel changes the data segment selectors (DS and ES) to
select a kernel-mode data segment.
The kernel pushes the program context onto the kernel stack, and the general-purpose
registers (such as the accumulator) change as the kernel code executes.
So, as you can see, it all depends on the operating system and the architecture.
"and what is the reason to move to kernel mode ?"
The CPU starts out in its most privileged (kernel) mode, so your question should really be "what is the need for user mode?". User mode is necessary because it withholds some permissions from the running software. You can run your browser, file manager, or shell in user mode without any worries. If application software were given full permissions, it could access kernel data and damage it, and it could also access the hardware and, for example, destroy the data stored on your hard disk.
The kernel, of course, must work in kernel mode (at least the core of the kernel). Application software might, for example, need to write data to a file on the disk. Application software doesn't have access to the disk (because it is running in user mode). The only way to achieve this is to call the kernel (which is running in kernel mode) to do the job. That's why you need to move from user mode to kernel mode and back.

Related

How does OS detect a crash of a process

How does OS know that EIP is no longer a valid/legal instruction and that application has crashed? How does it know when to generate the crash dump data?
On an x86-compatible processor, when EIP points to a page which does not have read permission, a page that is not mapped, or an invalid instruction, or when a valid instruction tries to access a memory page without permission or a page that is not mapped, or a divide instruction sees that the denominator is zero, or an INT instruction is executed, or a bunch of other things happen, the CPU raises an exception. When an exception occurs in protected mode while the current privilege level (CPL) is > 0, the CPU does the following:
Loads the values for SS and ESP from a memory section called the task state segment.
Pushes the values of SS, ESP, EFLAGS, CS and EIP onto the stack. The SS and ESP values are the previous ones, not the new ones from the TSS.
Some exceptions also push an error code onto the stack.
Gets the values for CS and EIP from the interrupt descriptor table and puts these values in CS and EIP.
Note that the kernel has set up these tables and segments in advance.
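The four steps above can be mimicked in a few lines of C++. This is a toy model of the sequence, not real hardware behaviour in full detail: the struct names (CpuState, Tss, IdtGate) and the function enter_exception are invented for this sketch, and the "stack" is just a vector that records what would be pushed.

```cpp
#include <cstdint>
#include <vector>

// Toy register file: only the registers mentioned in the steps above.
struct CpuState { uint16_t cs; uint16_t ss; uint32_t eip; uint32_t esp; uint32_t eflags; };
// Toy task state segment: the kernel stack to switch to.
struct Tss { uint16_t ss0; uint32_t esp0; };
// Toy interrupt descriptor table entry: where the handler lives.
struct IdtGate { uint16_t selector; uint32_t offset; };

// Models exception entry from CPL > 0 and returns the values pushed onto the
// kernel stack, in push order.
std::vector<uint32_t> enter_exception(CpuState& cpu, const Tss& tss, const IdtGate& gate,
                                      bool has_error_code, uint32_t error_code) {
    const CpuState user = cpu;          // remember the interrupted state

    cpu.ss = tss.ss0;                   // step 1: load SS and ESP from the TSS
    cpu.esp = tss.esp0;

    std::vector<uint32_t> frame;
    auto push = [&](uint32_t value) { cpu.esp -= 4; frame.push_back(value); };

    push(user.ss);                      // step 2: push the *previous* SS, ESP,
    push(user.esp);                     //         EFLAGS, CS and EIP
    push(user.eflags);
    push(user.cs);
    push(user.eip);
    if (has_error_code) push(error_code);  // step 3: some exceptions add an error code

    cpu.cs = gate.selector;             // step 4: CS and EIP come from the IDT
    cpu.eip = gate.offset;
    return frame;
}
```

After the call, cpu describes where the kernel handler starts running, and the returned frame holds everything needed to resume the interrupted program later.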
Then:
The kernel decides what to do with the exception. This depends on the specific kernel. Usually, it decides to kill your program. On Linux, you can override this default using signal handling and on Windows you can override it using Structured Exception Handling.
(This is not an exhaustive reference to x86 exception handling. This is a brief overview of the most common case.)
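To illustrate the Linux side of "you can override this default using signal handling", here is a hedged C++ sketch. It assumes Linux/POSIX (sigaction, sigsetjmp, siglongjmp, mmap); the names on_segv, write_faults and protected_page_write_faults are made up for the example. It deliberately writes to a page mapped with no access rights and recovers instead of being killed.

```cpp
#include <setjmp.h>    // sigjmp_buf, sigsetjmp, siglongjmp (POSIX)
#include <signal.h>    // sigaction, SIGSEGV
#include <sys/mman.h>  // mmap, munmap, PROT_NONE
#include <unistd.h>    // sysconf

static sigjmp_buf g_recover;  // where to resume after a fault

// Signal handler: instead of letting the kernel kill us, jump back.
extern "C" void on_segv(int) { siglongjmp(g_recover, 1); }

// Returns true if writing one byte to p raised SIGSEGV.
bool write_faults(volatile char* p) {
    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);      // override the default action

    bool faulted;
    if (sigsetjmp(g_recover, 1) == 0) {
        *p = 1;                         // may trigger a CPU exception
        faulted = false;
    } else {
        faulted = true;                 // we arrived here from the handler
    }
    sigaction(SIGSEGV, &old, nullptr);  // restore the previous action
    return faulted;
}

// Maps one page without any access rights and tries to write to it.
bool protected_page_write_faults() {
    const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* page = mmap(nullptr, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return false;
    const bool faulted = write_faults(static_cast<volatile char*>(page));
    munmap(page, len);
    return faulted;
}
```

Jumping out of a signal handler like this is fine for a demonstration, but real programs should treat it with care: the interrupted code may have left data in an inconsistent state.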
The detailed answer https://stackoverflow.com/a/59075911/15304 from user253751 is there for you to learn everything you may want to know.
A word of context might help, though: the processor usually proceeds to the next instruction once each instruction is over, but there are cases where it will suddenly start executing a completely unrelated instruction. This is called an interrupt, and it is widely used to support device operations or to get some code called at periodic intervals.
In an interrupt handler, we have to save the full processor state so that the interrupted code can be safely resumed after we're done with the device-specific code.
The hardware exception mechanism, which is used to detect that a process is trying to do something impossible or invalid given the current configuration, borrows extensively from the interrupt mechanism, but it also has to take care of a context switch between (presumably) user-level code for the "faulty" process and the kernel-level code that will handle the fault. That context switch is the reason why we see stack pointers reloaded and the task state segment involved in the description of hardware exceptions, which have much simpler definitions (e.g. execute the instruction at address 0xfffff000) on other architectures.
Note that a hardware exception doesn't necessarily mean that the process crashed. The exception handler in the kernel will usually have to compare some information (what address we tried to access, what object is mapped at this address, etc.) and then either does a useful job (such as bringing one more page of a mapped file into memory) and resumes the process, or calls it an invalid access.

What prevents a user-space program from switching to higher levels? [duplicate]

Context:
According to this description, user-space programs cannot perform all of the operations that the processor provides. The description in the link above says that there are different operation levels inside the CPU.
Question:
How is user-space code prevented from being executed at privileged levels by the CPU? Couldn't it be possible to switch to a higher level by using assembly language, without using system calls?
I am pretty sure it is not, but I do not understand why. Could anyone please explain this or point to some resources which deal with this topic?
When the cpu reaches an instruction which, due to the identity of the instruction to be executed, the memory address to be accessed, or some other condition, is not permitted at the current privilege level, a cpu exception is raised. This essentially saves the current cpu state (register contents, etc.) and transfers execution to a preset kernel address running at kernel privilege level, which can inspect the operation that was to be performed and decide how to proceed. In practice, it will generally end with the kernel killing the process if the operation to be performed is not permitted.
The CPU executes code stored in RAM.
Memory is organized in a special layout, and the structures that describe it carry protection flags. There are so-called descriptor tables and page tables, which are used to translate virtual addresses into physical ones. First comes the descriptor (segment) test, where the GDT is read. Each GDT entry contains a value called the descriptor privilege level (DPL). It holds the ring level that the calling code must meet. If it does not, no access is granted.
Then comes the page directory test. Each page directory entry has a user/supervisor bit, which must also meet certain conditions. If it is zero, only privileged (supervisor-mode) code may access the page table referenced by that page directory entry.
If the value is one, all processes may access the pages under the page directory entry being checked.
The last test is the page test. Its checks are like the previous ones.
If an access passes all checks successfully, access to the memory page is granted. The CPU register CR3, which holds the physical address of the page directory, should be of interest here.
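The chain of checks can be summarized in a small C++ sketch. It is a simplified model under stated assumptions: the bit positions are those of classic 32-bit x86 paging entries (bit 0 = present, bit 2 = user/supervisor), the function names are invented here, and write protection and newer features such as SMEP/SMAP are ignored.

```cpp
#include <cstdint>

constexpr uint32_t kPresent = 1u << 0;  // entry is valid
constexpr uint32_t kUser    = 1u << 2;  // 1 = user-mode access allowed

// Segment test for a data segment: the less privileged of CPL and the
// selector's RPL must still be at least as privileged as the descriptor's DPL.
bool segment_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl) {
    const unsigned effective = cpl > rpl ? cpl : rpl;
    return effective <= dpl;
}

// Page directory test followed by page test.
bool page_access_allowed(bool user_mode, uint32_t pde, uint32_t pte) {
    if (!(pde & kPresent) || !(pte & kPresent)) return false;  // not mapped
    if (!user_mode) return true;        // supervisor code passes (simplified)
    return (pde & kUser) && (pte & kUser);  // both levels must allow user access
}
```

For example, ring 3 code touching a page whose directory entry has the user bit clear fails the second check, and the CPU raises a page fault instead of performing the access.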

Send interrupt to cpu as keyboard do?

Is it possible to simulate hardware interrupts somehow from user program?
I've seen this question posted many times, but always not answered.
I want to know about low-level interrupts (for example, simulating the situation where a key is pressed on the keyboard, so that the keyboard driver would handle the interrupt).
High level events and APIs are outside scope, and question is rather theoretical than practical (to prevent "why" discussions :)
Yes and no.
On an x86 CPU (for one example) there's an int instruction that generates an interrupt. Once the interrupt is generated, the CPU won't necessarily[1] distinguish between an interrupt generated by hardware and one generated by software. For one example, in the original PC BIOS, IBM chose an interrupt that would cause the print-screen command to execute. The interrupt they chose (interrupt 5) was one that wasn't then in use, but which Intel had said was reserved for future use. Intel eventually did put that interrupt to use -- starting with the 80186 (and so also in the 286) they added a bound instruction that checks that a value is within bounds, and generates an interrupt if it's not. The bound instruction is essentially never used though, because it generates interrupt 5 if a value is out of bounds. This means (if you're running something like MS-DOS that allows it) executing the bound instruction with a value that's out of bounds will print the screen.
On a modern OS, however, this won't generally be allowed. All generation and handling of interrupts happens in the kernel. The hardware has 4 levels of protection ("rings") and support for specifying the ring at which the int instruction can be executed. If you try to execute it from code running at ring 3, it won't execute directly -- instead, execution will switch to the OS kernel, which can treat it as it chooses.
This allows (for example) Windows to emulate MS-DOS, so MS-DOS programs (which do use the int instruction) can execute in a virtual machine, with virtualized input and output, so even though they "think" they're working directly with the keyboard and screen hardware, they're actually using emulations of them provided by software.
For "native" programs, however, using most int instructions (i.e. any but a tiny number of interrupts intended for communication with the kernel) will simply result in the program being shut down.
So, bottom line: yes, the hardware supports it -- but the hardware also supports prohibiting it, and nearly every modern OS does exactly that, at least for most code outside the OS kernel itself.
[1] Though, with typical hardware, the interrupt handler can read data from the programmable interrupt controller (PIC) chip that will tell it whether the interrupt came through the PIC (i.e., hardware interrupt) or not (software interrupt). Most hardware also supports at least a few interrupts that can be generated only by hardware, such as NMI on the x86. These are usually reserved for fairly narrow uses though (e.g., NMI on a PC is normally used for things like memory parity errors).
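The "support for specifying the ring at which the int instruction can be executed" boils down to one comparison, sketched below in C++. This is a toy model: Gate and deliver_software_int are names invented here, and the only hardware facts assumed are that each x86 interrupt gate carries a descriptor privilege level (DPL) and that a software int from a less privileged ring raises a general protection fault (vector 13) instead.

```cpp
#include <cstdint>

constexpr uint8_t kGeneralProtection = 13;  // #GP vector on x86

// Toy interrupt gate: only the privilege level required to invoke it with int.
struct Gate { uint8_t dpl; };

// Returns the vector that is actually delivered when code running at ring
// `cpl` executes `int vector` through the given gate.
uint8_t deliver_software_int(unsigned cpl, const Gate& gate, uint8_t vector) {
    return cpl <= gate.dpl ? vector : kGeneralProtection;
}
```

With a gate whose DPL is 3 (as 32-bit Linux sets up for vector 0x80), ring 3 code reaches the handler directly; with a DPL of 0, the same instruction ends up in the kernel's general protection handler, which is where the OS gets to decide what happens to the program.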

Privileged instructions, adding register values?

I finished homework for a graduate course in operating systems. I got a great score and I only missed one tiny point of a question. It asked which were privileged instructions and which were not. I answered all correctly except one: Adding one register value to another
I answered it was privileged but apparently it's not! How can this be?
I figured the user interacts with registers/memory by using system calls, which in a sense switch from user-mode system calls to kernel-mode routines. Therefore the adding of one register value to another could be requested by a non-privileged user, but in the end the kernel is doing the work and is in kernel, privileged mode. Therefore it's privileged? A user can't do it by themselves. Am I wrong? Why?!
Thanks!
I'm not sure why you would think that changing a register would require kernel intervention. Some special registers may be privileged (those controlling things like descriptor tables or protection levels, with which user-mode code could bypass system-mode protections) but general purpose registers can be changed freely without a kernel getting involved.
When your code is running, the vast majority of instructions would be things like:
inc %eax
movl $7,%ebx
addl %eax,%ebx
As an aside, I'm just imagining how slow my code would run if it required a system call to the kernel every time I incremented a counter or called a function :-)
The only thing I can think of would be if you thought your execution thread wasn't allowed to change registers arbitrarily since that may affect those registers for other threads. But the kernel would take care of that when switching threads - all your registers would be packed away somewhere for later and the ones for the next thread would be loaded in.
Based on your comments, you seem to think that the time of adding is when the CPU protection mechanism should step in. In fact, it can't at that point because it has no idea what you're going to use the register for. You may just be using it as a counter.
However, if you do use it as an address to access memory, and that memory is invalid somehow (outside of your address space, or swapped to disk), the kernel will step in at that point to rectify the situation (toss your application out on its ear, or bring in the swapped-out memory).
However, even that is not a privileged instruction, it's just the CPU handling page faults.
A privileged instruction is something that you're not allowed to do at all, like change the interrupt descriptor table location registers or deactivate interrupts.

How to translate a virtual memory address to a physical address?

In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (i.e. using VirtualAllocEx(), MapUserPhysicalPages() etc).
In the context of my process, I can get the VIRTUAL memory address of that block,
but I need to find out the PHYSICAL memory address of it in order to pass it to some external device.
1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?
2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...? Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?
3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?
4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?
Any answers, tips and code excerpts will be much appreciated!!
Thanks
"In my C++ program (on Windows), I'm allocating a block of memory and can make sure it stays locked (unswapped and contiguous) in physical memory (i.e. using VirtualAllocEx(), MapUserPhysicalPages() etc)."
No, you can't really ensure that it stays locked. What if your process crashes, or exits early? What if the user kills it? That memory will be reused for something else, and if your device is still doing DMA, that will eventually result in data loss/corruption or a bugcheck (BSOD).
Also, MapUserPhysicalPages is part of Windows AWE (Address Windowing Extensions), which is for handling more than 4 GB of RAM on 32-bit versions of Windows Server. I don't think it was intended to be used to hack up user-mode DMA.
"1. Is there any way I can translate the virtual address to the physical one within my program, in USER mode?"
There are drivers that let you do this, but you cannot program DMA from user mode on Windows and still have a stable and secure system. Letting a process that runs as a limited user account read/write physical memory allows that process to own the system. If this is for a one-off system or a prototype, this is probably acceptable, but if you expect other people (particularly paying customers) to use your software and your device, you should write a driver.
"2. If not, I can find out this virtual to physical mapping only in KERNEL mode. I guess it means I have to write a driver to do it...?"
That is the recommended way to approach this problem.
"Do you know of any readily available driver/DLL/API which I can use, that my application (program) will interface with to do the translation?"
You can use an MDL (Memory Descriptor List) to lock down arbitrary memory, including memory buffers owned by a user-mode process, and translate its virtual addresses into physical addresses. You can also have Windows temporarily create an MDL for the buffer passed into a call to DeviceIoControl by using METHOD_IN_DIRECT or METHOD_OUT_DIRECT.
Note that contiguous pages in the virtual address space are almost never contiguous in the physical address space. Hopefully your device is designed to handle that.
"3. In case I'll have to write the driver myself, how do I do this translation? Which functions do I use? Is it MmGetPhysicalAddress()? How do I use it?"
There's a lot more to writing a driver than just calling a few APIs. If you're going to write a driver, I would recommend reading as much relevant material as you can from MSDN and OSR. Also, look at the examples in the Windows Driver Kit.
"4. Also, if I understand correctly, MmGetPhysicalAddress() returns the physical address of a virtual base address that is in the context of the calling process. But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called... so how do I translate the virtual address in the application (user-mode) memory space, not the driver?"
Drivers are not processes. A driver can run in the context of any process, as well as various elevated contexts (interrupt handlers and DPCs).
You have a virtually contiguous buffer in your application. That range of virtual memory is, as you noted, only available in the context of your application, and some of it may be paged out at any time. So, in order to access the memory from a device (which is to say, do DMA) you need to both lock it down and get a description that can be passed to a device.
You can get a description of the buffer called an MDL, or Memory Descriptor List, by sending an IOCTL (via the DeviceIoControl function) to your driver using METHOD_IN_DIRECT or METHOD_OUT_DIRECT. See the following page for a discussion of defining IOCTLs.
http://msdn.microsoft.com/en-us/library/ms795909.aspx
Now that you have a description of the buffer in a driver for your device, you can lock it down so that the buffer remains in memory for the entire period that your device may act on it. Look up MmProbeAndLockPages on MSDN.
Your device may or may not be able to read or write all of the memory in the buffer. The device may only support 32-bit DMA and the machine may have more than 4GB of RAM. Or you may be dealing with a machine that has an IOMMU, a GART or some other address translation technology. To accommodate this, use the various DMA APIs to get a set of logical addresses that are good for use by your device. In many cases, these logical addresses will be equivalent to the physical addresses that your question originally asked about, but not always.
Which DMA API you use depends on whether your device can handle scatter/gather lists and such. Your driver, in its setup code, will call IoGetDmaAdapter and use some of the functions returned by it.
Typically, you'll be interested in GetScatterGatherList and PutScatterGatherList. You supply a function (ExecutionRoutine) which actually programs your hardware to do the transfer.
There are a lot of details involved. Good luck.
You cannot access the page tables from user space; they are mapped only in the kernel.
If you are in the kernel, you can simply inspect the value of CR3 to locate the base page table address and then begin your resolution.
This blog series has a wonderful explanation of how to do this. You do not need any OS facility/API to resolve virtual<->physical addresses.
Virtual Address: f9a10054
1: kd> .formats 0xf9a10054
Binary: 11111001 10100001 00000000 01010100
Page Directory Pointer Index (PDPI): 11 -- index into the 1st table (Page Directory Pointer Table)
Page Directory Index (PDI): 111001 101 -- index into the 2nd table (Page Directory Table)
Page Table Index (PTI): 00001 0000 -- index into the 3rd table (Page Table)
Byte Index: 0000 01010100 -- 0x054, the offset into the physical memory page
The example uses WinDbg; there, !dq is a physical memory read.
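The bit-slicing in that breakdown is easy to reproduce in code. The sketch below assumes the same layout as the example (32-bit PAE paging with 4 KB pages: 2 + 9 + 9 + 12 bits); PaeIndices and split_pae are names invented for this illustration. It only computes the indices; actually walking the tables still requires kernel-mode access to CR3 and physical memory.

```cpp
#include <cstdint>

// The four fields of a 32-bit virtual address under PAE paging (4 KB pages).
struct PaeIndices {
    uint32_t pdpi;    // bits 31-30: index into the Page Directory Pointer Table
    uint32_t pdi;     // bits 29-21: index into the Page Directory Table
    uint32_t pti;     // bits 20-12: index into the Page Table
    uint32_t offset;  // bits 11-0:  byte offset into the physical page
};

PaeIndices split_pae(uint32_t va) {
    PaeIndices r;
    r.pdpi   = (va >> 30) & 0x3;
    r.pdi    = (va >> 21) & 0x1FF;
    r.pti    = (va >> 12) & 0x1FF;
    r.offset = va & 0xFFF;
    return r;
}
```

For the address above, split_pae(0xf9a10054) gives pdpi = 3 (binary 11), pdi = 0x1CD (binary 111001101), pti = 0x10 (binary 000010000) and offset = 0x054, matching the WinDbg output.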
1) No
2) Yes, you have to write a driver. Best would be either a virtual driver, or change the driver for the special-external device.
3) This gets very confusing here. MmGetPhysicalAddress should be the method you are looking for, but I really don't know how the physical address is mapped to the bank/chip/etc. on the physical memory.
4) You cannot use paged memory, because that gets relocated. You can lock paged memory with MmProbeAndLockPages on an MDL you can build on memory passed in from the user mode calling context. But it is better to allocate non-paged memory and hand that to your user mode application.
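// BUFFER_SIZE and POOL_TAG are placeholders that your driver must define.
PVOID p = ExAllocatePoolWithTag( NonPagedPool, BUFFER_SIZE, POOL_TAG );
if ( p == NULL ) return STATUS_INSUFFICIENT_RESOURCES; // assumes an NTSTATUS-returning routine
PHYSICAL_ADDRESS realAddr = MmGetPhysicalAddress( p );
// use realAddr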
You really shouldn't be doing stuff like this in usermode; as Christopher says, you need to lock the pages so that mm doesn't decide to page out your backing memory while a device is using it, which would end up corrupting random memory pages.
"But if the calling process is the driver, and I'm using my application to call the driver for that function, I'm changing contexts and I am no longer in the context of the app when the MmGetPhysicalAddress routine is called"
Drivers don't have a context like user-mode apps do; if you're calling into a driver via an IOCTL or something, you are usually (but not guaranteed to be!) in the calling user thread's context. But really, this doesn't matter for what you're asking, because kernel-mode memory (anything above 0x80000000 on 32-bit Windows with the default address-space split) has the same mapping no matter where you are, and you'd end up allocating memory on the kernel side. But again, write a proper driver. Use WDF (http://www.microsoft.com/whdc/driver/wdf/default.mspx), and it will make writing a correct driver much easier (though still pretty tricky; Windows driver writing is not easy).
EDIT: Just thought I'd throw out a few book references to help you out, you should definitely (even if you don't pursue writing the driver) read Windows Internals by Russinovich and Solomon (http://www.amazon.com/Microsoft-Windows-Internals-4th-Server/dp/0735619174/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1229284688&sr=8-2); Programming the Microsoft Windows Driver Model is good too (http://www.amazon.com/Programming-Microsoft-Windows-Driver-Second/dp/0735618038/ref=sr_1_1?ie=UTF8&s=books&qid=1229284726&sr=1-1)
Wait, there is more. For the privilege of running on your customer's 64-bit Vista, you get to spend more time and money to get your kernel-mode driver signed by Microsoft,