Context:
According to this description, user-space programs cannot perform all of the operations that the processor provides. The description in the link above says that there are different operation levels inside the CPU.
Question:
How does the CPU prevent user-space code from being executed at privileged levels? Couldn't it be possible to switch to a higher level by using assembly language, without using system calls?
I am pretty sure it is not, but I do not understand why. Could anyone please explain this or point to some resources that deal with this topic?
When the cpu reaches an instruction which, due to the identity of the instruction to be executed, the memory address to be accessed, or some other condition, is not permitted at the current privilege level, a cpu exception is raised. This essentially saves the current cpu state (register contents, etc.) and transfers execution to a preset kernel address running at kernel privilege level, which can inspect the operation that was to be performed and decide how to proceed. In practice, it will generally end with the kernel killing the process if the operation to be performed is not permitted.
The CPU executes code stored in RAM.
Memory is organized in a special layout, and the structures that describe it carry protection flags. There are so-called descriptor tables, which are used when translating virtual addresses into physical ones. First comes the descriptor test, or segment test, in which the GDT is read. Each GDT entry contains a value called the descriptor privilege level (DPL). It holds the ring level that the calling process must meet. If it does not, no access is granted.
Then comes the page directory test, which has a user/supervisor bit. This must also meet certain conditions. If it is zero, only privileged processes may access the pages covered by this page directory entry.
If the value is one, all processes may access the pages covered by the page directory entry being checked.
The last test is the page table test. Its checks are like the previous ones.
If a process passes all checks successfully, access to the memory page is granted. The CPU register CR3, which holds the address of the page directory, should be of interest here.
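The page directory and page table checks described above can be modelled in a few lines. This is only a sketch in Python, not how the hardware is implemented: the function name access_allowed is made up for illustration, the bit positions (P = bit 0, R/W = bit 1, U/S = bit 2) are the ones x86 uses in both kinds of entry, and supervisor-mode write protection (CR0.WP) is deliberately ignored.

```python
# Bit positions shared by x86 page-directory and page-table entries.
PRESENT = 1 << 0   # P: the entry maps something
WRITABLE = 1 << 1  # R/W: writes are allowed
USER = 1 << 2      # U/S: 1 = user mode may access, 0 = supervisor only


def access_allowed(pde, pte, cpl, write=False):
    """Return True if this simplified model allows the access.

    pde and pte are the raw entry values, cpl is the current privilege
    level (3 = user mode). False means the real CPU would page-fault.
    """
    for entry in (pde, pte):
        if not entry & PRESENT:
            return False          # nothing mapped here
        if cpl == 3 and not entry & USER:
            return False          # supervisor-only entry, user-mode access
        if cpl == 3 and write and not entry & WRITABLE:
            return False          # read-only entry, user-mode write
    return True


# A kernel page: the directory entry is open to user mode, the page entry is not.
kernel_pde = PRESENT | WRITABLE | USER
kernel_pte = PRESENT | WRITABLE
print(access_allowed(kernel_pde, kernel_pte, cpl=3))  # user mode is refused
print(access_allowed(kernel_pde, kernel_pte, cpl=0))  # kernel mode is allowed
```

Note that both levels have to agree: a single cleared U/S bit anywhere along the walk is enough to keep user-mode code out.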
Windows 10, x64 , x86
My current knowledge
Let's say it is a quad-core CPU: there will be 4 individual program counters, which point to 4 different locations in the code for parallel execution.
Each of these program counters indicates where its core is in its program sequence.
The address it points to changes after a context switch, when another thread's saved program counter is loaded into the program counter to execute.
What I want to do:
I'm in kernel mode, my thread is running on core 1, and I want to read the current instruction pointer of core 2.
Expected Results:
0x203123 is the address of the instruction pointer and this address belongs to this thread and this thread belongs to this process... etc.
Anyone knows how to do it or can give me good book references, links etc...
Although I don't believe it's officially documented, there is a ZwGetContextThread exported from ntdll.dll. Being undocumented, things can change (and I haven't tried it in quite a while) but at least when I last tried it, you called it with a thread handle and a pointer to a CONTEXT structure, and it would return that thread's context.
I'm not certain exactly how up-to-date that is though. It's never mattered to me, so I haven't checked, but my guess would be that the IP in the CONTEXT you get is whatever was saved the last time the thread was suspended. So, if you want something (reasonably) current, you'd use ZwSuspendThread, get the context, then ZwResumeThread to start it running again.
Here I suppose I'm probably supposed to give the standard lines about undocumented functions being subject to change, using them being a bad idea, and that you should generally leave all of this alone. Ah well, I've been disappointing teachers and other authority figures for years, and I guess I'm not changing right now.
On the other hand, there may be a practical problem here. If you really need data that's really current, this probably isn't going to work very well for you. What it gives you will be kind of current at best. On the other hand, really current is almost a meaningless concept with information that goes out of date every clock cycle.
"Anyone knows how to do it or can give me good book references, links etc..."
For 80x86 hardware (regardless of operating system); there are only 3 ways to do this (that I know of):
a) send an inter-processor interrupt to the other CPU, and have an interrupt handler that stores the "return EIP" (from its stack) at a known address in memory so that your CPU can read "value of EIP immediately before interrupt" (with synchronization so that your CPU doesn't read before the value is written, etc).
b) put the other CPU into some kind of "debug mode" (single-stepping, last branch recording, ...) so that (either code in a debug exception handler or the CPU's hardware itself) is constantly writing EIP values to memory that you can read.
Of course both of these options will ruin performance, and the value you get will probably be useless (because EIP would've changed after you obtain it but before you can use the obtained value). To ensure the value is still useful; you'd need the other CPU to wait until after you've consumed the obtained value (and are ready for the next value); and to do that you'd have to resort to single-step debugging facilities (with the waiting in the debug exception handler), where you'll be lucky if you can get performance better than a thousand times slower (and can probably improve performance by simply disabling other CPUs completely).
Also note that they still won't accurately tell you EIP in all cases (e.g. if the CPU is in SMM/System Management Mode and is beyond the control of the OS); and I doubt Windows kernel supports any of it (e.g. kernel should support single-stepping of user-space processes/threads to allow debuggers to work, but won't support single-stepping of kernel and will probably lock up the computer due to various "waiting for lock to be released for 6 days" problems).
The last of the 3 options is:
c) Run the OS inside an emulator/simulator instead of running it on real hardware. In that case you can probably modify the emulator/simulator's code to inject EIP values somewhere (maybe some kind of virtual "EIP reporting device"?). This will ruin performance of the emulator/simulator, but you may be able to hide that (e.g. "virtual time inside the emulator passes at a rate of one second per 1000 seconds of real time outside the emulator").
How does OS know that EIP is no longer a valid/legal instruction and that application has crashed? How does it know when to generate the crash dump data?
On an x86-compatible processor, when EIP points to a page which does not have read permission, a page that is not mapped, an invalid instruction, or when a valid instruction tries to access a memory page without permission, or a page that is not mapped, or a divide instruction sees that the denominator is zero, or an INT instruction is executed, or a bunch of other things, it raises an exception. When an exception occurs in protected mode while the current privilege level (CPL) is > 0, the CPU does the following:
Loads the values for SS and ESP from a memory section called the task state segment.
Pushes the values of SS, ESP, EFLAGS, CS and EIP onto the stack. The SS and ESP values are the previous ones, not the new ones from the TSS.
Some exceptions also push an error code onto the stack.
Gets the values for CS and EIP from the interrupt descriptor table and puts these values in CS and EIP.
Note that the kernel has set up these tables and segments in advance.
Then:
The kernel decides what to do with the exception. This depends on the specific kernel. Usually, it decides to kill your program. On Linux, you can override this default using signal handling and on Windows you can override it using Structured Exception Handling.
(This is not an exhaustive reference to x86 exception handling. This is a brief overview of the most common case.)
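To make the push order concrete, here is a small Python model of the stack frame the CPU builds in the steps above. It is a sketch under assumptions: the function name push_exception_frame is invented, all selector, address and error-code values are hypothetical, and only the 32-bit case with a privilege change is modelled.

```python
def push_exception_frame(tss_esp0, old, error_code=None):
    """Model the 32-bit pushes for an exception taken from user mode.

    Returns (new_esp, stack), where stack maps address -> pushed value.
    """
    esp = tss_esp0                      # kernel stack top loaded from the TSS
    stack = {}
    values = [old["ss"], old["esp"], old["eflags"], old["cs"], old["eip"]]
    if error_code is not None:          # only some exceptions push one
        values.append(error_code)
    for value in values:
        esp -= 4                        # the stack grows downwards, 4 bytes per push
        stack[esp] = value
    return esp, stack


# Hypothetical user-mode state at the time of a page fault.
user_state = {"ss": 0x2B, "esp": 0xBFFFF000, "eflags": 0x202,
              "cs": 0x23, "eip": 0x08048123}
new_esp, frame = push_exception_frame(0xC0010000, user_state, error_code=0x4)

for address in sorted(frame, reverse=True):
    print(hex(address), hex(frame[address]))
```

The saved EIP ends up just above the error code, which is why a handler for such an exception has to discard the error code before it can return to the interrupted code.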
The detailed answer https://stackoverflow.com/a/59075911/15304 from @user253751 covers everything you may want to know.
A word of context might help, though: the processor usually proceeds to the next instruction after each instruction is over, but there are cases where it will suddenly start a completely unrelated instruction. This is called an interrupt, and it is widely used to support device operations or to get some code called at periodic intervals.
In an interrupt handler, we have to save the full processor state so that the interrupted code can be safely resumed after we're done with device-specific code.
The hardware exception mechanism, which is used to detect that a process is trying to do something impossible or invalid given the current configuration, borrows extensively from the interrupt mechanism, but it also has to take care of a context switch between the (presumably) user-level code of the "faulty" process and the kernel-level code that will handle the fault. That context switch is the reason why we see stack pointers reloaded and the task state segment involved in the description of hardware exceptions, which have much simpler definitions (e.g. execute the instruction at address 0xfffff000) on other architectures.
Note that a hardware exception doesn't necessarily mean that the process crashed. The exception handler in the kernel will usually have to compare some information (what address we tried to access, what object is mapped at this address, etc.) and then either do a useful job (bring one more page of a mapped file into memory) and resume the process, or call it an invalid access.
In Windows (both 32 and 64 bit), is it possible to determine programmatically (in C++) whether a certain memory location has changed? I am trying to extrapolate the concept we see in Visual Studio, where we can set a data breakpoint.
Use case: I understand it's a dirty hack, but it is the fastest to implement, and it will be re-implemented later.
I am sharing data across a process boundary (read: between a 32-bit client and a 64-bit server). The client allocates memory (beyond our control) and passes the address to the server. The server allocates storage to shadow the client memory and can update that shadowed memory location via various code paths. Instead of identifying and trapping each of these locations, I was trying to find an easier path: raise an event on change and eventually write the data back to the client process through WriteProcessMemory.
Whilst it's probably possible to find a solution using a combination of VirtualProtect and the Windows debug interface, I'm not sure it's a particularly good solution for this scenario. One of the problems is that you introduce a delay on every new write, and you are looking at a transfer to another process that is monitoring the program as a "debugger". That process will then have to "unprotect" that page, mark it as "updated" for the other Server (or Client, depending on which direction you are going), and "continue" the application making the update. This is quite time consuming. And of course, there is no trivial way to know when the writing process has completed a sequence of updates. You also need to know exactly where to "continue" when there is a SEH "__except" call, and it's not always entirely trivial to do that, especially if the code is in the middle of a memcpy or something like that.
When I worked with graphics, I know that both our driver and some competitors' drivers would do this: first write-protect the memory, and then, by hooking into Windows' own page-fault handler, look up the page fault, see if it's in the special region(s), and if so, mark that page as updated and reset it to writeable. This allowed the driver to copy only the updated regions. But in this case, there is a distinct "I want to draw this stuff" after all the updates have been made.
If you want to badly enough, you can use the debug API to set a breakpoint on your own data, which will be triggered on a write, just like the VS debugger does.
The basics of this are to start a thread to do the "debugging". It will:
Temporarily stop the primary thread.
Get that thread's registers with GetThreadContext.
Set the address to break on in one of DR0 through DR3.
Set the size of the data in question in the matching LEN field of DR7.
Set the type of breakpoint (data write, in this case) in the matching R/W field of DR7, and set the enable bit for that breakpoint there as well. (DR6 is the status register that reports which breakpoint fired.)
Use SetThreadContext to tell the primary thread to use the modified registers.
Restart execution of the primary thread.
That's going from memory, so although I believe it's pretty close to the basic idea, I may have gotten a detail or two wrong, or (more likely) left out a few steps.
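The fiddly part of those steps is the control value itself. On x86, the length and type of each hardware breakpoint are packed into DR7 together with its enable bit, and the sketch below computes that value in Python. The helper name dr7_enable is made up for illustration; the result is what you would store in the Dr7 field of the CONTEXT structure before calling SetThreadContext, alongside the watched address in Dr0 through Dr3.

```python
# R/W field encodings (2 bits per breakpoint, starting at bit 16 of DR7).
RW_EXECUTE, RW_WRITE, RW_IO, RW_READWRITE = 0b00, 0b01, 0b10, 0b11

# LEN field encodings (2 bits per breakpoint, right above the R/W field).
LEN_BITS = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_enable(dr7, index, rw, size):
    """Return dr7 with hardware breakpoint `index` (0..3) configured and enabled."""
    if index not in range(4):
        raise ValueError("x86 has four breakpoint address registers, DR0..DR3")
    shift = 16 + 4 * index
    dr7 &= ~(0b1111 << shift)                        # clear the old R/W and LEN bits
    dr7 |= (rw | (LEN_BITS[size] << 2)) << shift     # type in the low 2 bits, length above
    dr7 |= 1 << (2 * index)                          # L<index>: local enable bit
    return dr7


# Break on writes to a 4-byte value whose address is in DR0.
print(hex(dr7_enable(0, index=0, rw=RW_WRITE, size=4)))  # 0xd0001
```

The watched address has to be aligned to the chosen size, so a 4-byte breakpoint needs a 4-byte-aligned address.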
In most cases, it's going to be easier to do something at the source level, where you overload operator= for the target in question, so you get some code executed during assignment to the data. Then in that operator you can (for example) set an Event that code in another thread waits on (and reacts appropriately).
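The same source-level idea can be sketched outside C++. In the Python model below, a property setter plays the role of the overloaded operator=, and a threading.Event stands in for the Windows Event object; the class name Watched is made up for illustration.

```python
import threading


class Watched:
    """Holds a value and signals an event every time the value is assigned."""

    def __init__(self, value):
        self._value = value
        self.changed = threading.Event()

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # This runs on every assignment, like an overloaded operator=.
        self._value = new_value
        self.changed.set()


shadow = Watched(0)
shadow.value = 42                 # the assignment fires the event
print(shadow.changed.is_set())    # True
```

A second thread would block in shadow.changed.wait(), copy the data out, and then call shadow.changed.clear() before waiting again.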
Another possibility (especially if you want to break on any access to a whole range of addresses) is to use VirtualProtect to force an exception on any access to that block of memory. Like a debug exception, this will be triggered synchronously, so if you want asynchronous execution you'll have to accomplish it by setting an Event (or whatever) and having another thread waiting on that so it'll execute when the Event is set.
Which registers are changed when we move from user mode to kernel mode? And what are the reasons to move to kernel mode?
Why don't the following cause a move to kernel mode:
Creating a new admin as root (superuser or admin)
A TLB miss
Writing to the "page modified" bit in the page tables
From your questions, I gather that you are very weak on operating system concepts.
OK, let me explain (I am assuming you are using Linux, not Windows).
"Which registers are changed when we move from user mode to kernel mode?"
To know the answer to this question, you need to learn about process management.
But I can simply say that Linux uses the system call interface to switch from user space to kernel space. The system call interface uses some registers (depending on your processor) to pass the system call number and the arguments for the system call.
In general, the move to kernel mode happens when
you make an explicit request to the kernel (a system call)
you make an implicit request to the kernel (accessing memory that isn't mapped into your space, whether valid or not)
the kernel decides it needs to do something more important than executing your code (normally as the result of a hardware interrupt).
All registers will be preserved, as it would be rather difficult to write code if your registers could change at random, but how that happens is very CPU specific.
"Which registers are changed when we move from user mode to kernel mode?"
In a typical x86-based architecture running the Linux kernel, this is what happens:
A software program triggers interrupt 0x80 with the instruction: int $0x80
The CPU changes the program counter register and the code segment selector to refer to the place where the Linux system call handler resides in memory (Linux uses virtual memory).
So far, the registers affected are CS, EIP, and EFLAGS. The CPU also changes the stack segment selector (SS) and the stack pointer (ESP) to refer to the top of the kernel stack.
Finally, the kernel changes the data segment selectors (DS and ES) to select a kernel-mode data segment.
The kernel pushes the program context onto the kernel's stack, and the general-purpose registers (like the accumulator) will change as the kernel code executes.
So as you can see, it all depends on the operating system and the architecture.
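For the int $0x80 path on 32-bit x86 Linux specifically, the register convention is fixed: EAX carries the system call number and EBX, ECX, EDX, ESI, EDI and EBP carry up to six arguments. The Python sketch below only builds the register contents a program would set up before the instruction; the helper name int80_registers and the buffer address are hypothetical, and the table lists just three of the i386 system call numbers.

```python
# Argument registers for the i386 int $0x80 convention, in order.
I386_ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

# A few i386 system call numbers.
I386_SYSCALL_NUMBERS = {"exit": 1, "read": 3, "write": 4}


def int80_registers(name, *args):
    """Return the register values to load before executing int $0x80."""
    if len(args) > len(I386_ARG_REGISTERS):
        raise ValueError("int $0x80 passes at most six arguments in registers")
    registers = {"eax": I386_SYSCALL_NUMBERS[name]}
    registers.update(zip(I386_ARG_REGISTERS, args))
    return registers


# write(1, buf, 5) with a hypothetical buffer address.
print(int80_registers("write", 1, 0x08049000, 5))
```

On return, the kernel leaves the result (or a negative error number) in EAX and restores the other registers.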
"And what is the reason to move to kernel mode?"
The CPU by default works in kernel mode; your question should be "what is the need for user mode?". User mode is necessary because it does not give full permissions to the running software. You can run your browser/file manager/shell in user mode without any worries. If application software were given full permissions, it could access kernel data and damage it, and it might also access the hardware and, for example, destroy the data stored on your hard disk.
The kernel, of course, must work in kernel mode (at least the core of the kernel). Application software might, for example, need to write data to a file on the disk. Application software doesn't have access to the disk (because it is running in user mode). The only way to achieve this is to call the kernel (which is running in kernel mode) to do the job. That's why you need to move from user mode to kernel mode and vice versa.
I finished homework for a graduate course in operating systems. I got a great score and I only missed one tiny point of a question. It asked which were privileged instructions and which were not. I answered all correctly except one: Adding one register value to another
I answered it was privileged but apparently it's not! How can this be?
I figured the user interacts with registers/memory by using system calls, which in a sense change from user-mode system calls to kernel-mode routines. Therefore adding one register value to another could be requested by a non-privileged user, but in the end the kernel does the work, and it runs in privileged kernel mode. Therefore it's privileged? A user can't do it by themselves. Am I wrong? Why?!
Thanks!
I'm not sure why you would think that changing a register would require kernel intervention. Some special registers may be privileged (those controlling things like descriptor tables or protection levels, with which user-mode code could bypass system-mode protections) but general purpose registers can be changed freely without a kernel getting involved.
When your code is running, the vast majority of instructions would be things like:
inc %eax
movl $7,%ebx
addl %eax,%ebx
As an aside, I'm just imagining how slow my code would run if it required a system call to the kernel every time I incremented a counter or called a function :-)
The only thing I can think of would be if you thought your execution thread wasn't allowed to change registers arbitrarily since that may affect those registers for other threads. But the kernel would take care of that when switching threads - all your registers would be packed away somewhere for later and the ones for the next thread would be loaded in.
Based on your comments, you seem to think that the time of adding is when the CPU protection mechanism should step in. In fact, it can't at that point because it has no idea what you're going to use the register for. You may just be using it as a counter.
However, if you do use it as an address to access memory, and that memory is invalid somehow (outside of your address space, or swapped to disk), the kernel will step in at that point to rectify the situation (toss your application out on its ear, or bring in the swapped-out memory).
However, even that is not a privileged instruction, it's just the CPU handling page faults.
A privileged instruction is something that you're not allowed to do at all, like change the interrupt descriptor table location registers or deactivate interrupts.
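The distinction can be written down as a toy model: the check depends only on which instruction it is and on the current privilege level, never on what the register values will later be used for. This Python sketch is an illustration only; the mnemonic set is a small sample of instructions that x86 restricts to ring 0, and the names execute and GeneralProtectionFault are made up.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the exception the CPU raises on a privilege violation."""


# A small sample of instructions that x86 only allows at privilege level 0.
PRIVILEGED = {"lidt", "lgdt", "hlt", "wrmsr", "mov_to_cr3"}


def execute(mnemonic, cpl):
    """Model the privilege check for one instruction (cpl 3 = user mode)."""
    if mnemonic in PRIVILEGED and cpl != 0:
        raise GeneralProtectionFault(f"{mnemonic} is not allowed at CPL {cpl}")
    return "executed"


print(execute("addl", cpl=3))      # plain arithmetic runs in user mode
print(execute("lidt", cpl=0))      # the kernel may reload the IDT register
try:
    execute("lidt", cpl=3)         # user mode may not
except GeneralProtectionFault as fault:
    print("fault:", fault)
```

Disabling interrupts with cli is handled slightly differently on real hardware (it is checked against the I/O privilege level rather than strictly ring 0), but the effect for ordinary user-mode code is the same fault.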