How do I detect if an address will cause an access violation? - c++

I'm creating a class for a Lua binding which holds a pointer and can be changed by the scripter. It will include a few functions such as :ReadString and :ReadBool, and I don't want the application to crash, so I'd like to be able to tell whether the address the scripter supplied will cause an access violation.
Is there a good way to detect whether an address is outside of readable/writable memory? Thanks!
A family of functions that may be useful is the "Virtual" API, for example VirtualQuery (see the sketch below these comments).
I'm not really looking for a foolproof design, I just want to catch the obvious cases (null pointers, pointers far outside any plausible memory range).
I understand how unsafe this library is, and I'm not looking for safety, just sanity.
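A minimal sketch of the VirtualQuery idea from the comment above, assuming Windows; LooksReadable is an invented name, and whatever it reports is only valid at the instant of the call:
#include <windows.h>

bool LooksReadable(const void* address)
{
    MEMORY_BASIC_INFORMATION info = {};
    if (VirtualQuery(address, &info, sizeof(info)) == 0)
        return false;                              // not part of the address space at all
    if (info.State != MEM_COMMIT)
        return false;                              // reserved or free, nothing backing it
    if (info.Protect & (PAGE_NOACCESS | PAGE_GUARD))
        return false;                              // committed but not readable right now
    return true;
}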

There are ways, but they do not serve the purpose you intend. That is; yes, you can determine whether an address appears to be valid at the current moment in time. But; no, you cannot determine whether that address will be valid a few clock cycles from now. Another thread could change the virtual memory map and a formerly valid address would become invalid.
The only way to properly handle the possibility of accessing suspect pointers is to use whatever native exception handling is available on your platform. This may involve handling the signal SIGSEGV (or SIGBUS), or it may involve using the proprietary __try and __except extensions.
The idiom to use is the one wherein you attempt the access, and explicitly handle the resulting exception, if any does happen to occur.
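For example, a minimal sketch of that idiom on Windows with the __try/__except extensions mentioned above; SafeReadByte is an invented name, not an existing API:
#include <windows.h>

bool SafeReadByte(const void* address, unsigned char* out)
{
    __try {
        *out = *static_cast<const volatile unsigned char*>(address);
        return true;
    }
    __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        return false;    // the read faulted; report failure instead of crashing
    }
}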
You also have the problem of ensuring that the pointers you return point into your memory and not into some other memory. For that, you could build your own data structure, a tree springs to mind, which stores the valid address ranges your "pointers" are allowed to reach. Otherwise, the script code can hand you an arbitrary absolute address and you will end up modifying memory structures maintained by the operating system for the process environment.
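A rough sketch of such a range check, using std::map (a red-black tree) instead of a hand-rolled tree; all of the names here are illustrative:
#include <cstdint>
#include <map>

class ValidRanges {
public:
    void add(const void* start, std::size_t length) {
        ranges_[reinterpret_cast<std::uintptr_t>(start)] = length;
    }
    bool contains(const void* p, std::size_t n) const {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        auto it = ranges_.upper_bound(addr);   // first range starting after p
        if (it == ranges_.begin())
            return false;                      // nothing starts at or before p
        --it;                                  // last range starting at or before p
        return addr + n <= it->first + it->second;
    }
private:
    std::map<std::uintptr_t, std::size_t> ranges_;   // range start -> range length
};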
The application you write about is highly suspect and you should probably start over with a less explosive design. But I thought I would tell you how to play with fire if you really want to.

Check out Raymond Chen's blog post, which goes more deeply into why this practice is bad. Most interestingly, he points out that, once a page is tested by IsBadReadPtr, further accesses to that page will not raise exceptions!

There is no way, and that's why you should never do things like this.

Perhaps try using segvcatch, which can convert segfaults into C++ exceptions.

Using STL containers without exception handling, in low memory situation

I've been trying to deal with low memory situation in my VC++ code.
I've used std::nothrow and checked the return value of the new operator for NULL. The application works fine.
But the problem is that when system memory is very low, it crashes abruptly almost anywhere, especially inside STL container calls (map, vector, queue, etc.), with the error "Exception bad_alloc". Obviously these containers cannot allocate the required memory, so they simply throw bad_alloc.
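To illustrate what I mean (a minimal example, not my actual code): the nothrow form of new reports failure by returning NULL, but the containers allocate through std::allocator, which throws std::bad_alloc instead:
#include <new>
#include <vector>

void example()
{
    int* p = new (std::nothrow) int[1000];   // returns NULL on failure, does not throw
    if (p == NULL) {
        // handle the failure
    }
    delete[] p;

    std::vector<int> v;
    v.push_back(42);                         // throws std::bad_alloc if allocation fails
}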
Now, since I've used these containers liberally in my code, I don't want to wrap each and every function in a "try...catch" block; it would clutter the code. (Moreover, the code uses an event-based library, so many of the functions are callbacks. Hence, there isn't one parent caller function, or a few, that I could put in a try/catch block to solve this problem.)
Without using try/catch, how can this problem be addressed?
At the very least, can someone please tell me which of these containers and methods throw bad_alloc? (So that I can try putting only that particular code in a try/catch block.)
If you're not using dynamic_cast or any of the other features RTTI gives you, you can turn it off - that might save you a bit, but probably not enough.
The only other option I can offer is to profile your memory usage, and optimize your code so that you're freeing things you no longer need earlier.
You ask: "how can this problem be addressed?"
Well, what is the problem?
"I don't have enough memory to run my program" — procure more
"My program uses too much memory" — use less
You can't magically work around it any other way.

is it possible to use function pointers this way?

This is something that recently crossed my mind, quoting from wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address, but what if I overwrite the memory at the address of the function with a piece of data the same size as before and then invoke it via the pointer? If such data corresponds to an actual function and the two functions have matching signatures, the latter should be invoked instead of the first.
Is it theoretically possible?
I apologize if this is impossible due to some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
The issue you may run into is Data Execution Prevention. It tries to keep you from executing data as code, or from writing to code as if it were data. You can turn it off on Windows. Some compilers/OSes may also place code into const-like sections of memory that the OS/hardware protect. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that involves jumping to that location. It's all dependent on your hardware and your OS.
While the standard does not provide any guarantees as to what happens if you make a function pointer that does not refer to a function, in real life, on your particular implementation and knowing the platform, you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and had it execute by doing careful casting of pointers. So in practice, and in a non-portable way, you can achieve that behavior.
It is possible, with caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
Allocate one or more memory pages from the system.
Write your custom machine code into them.
Mark the pages as non-writable and executable.
Run the code (a sketch of all four steps follows this list); there are two ways of doing it:
Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
Execute the code in another thread, passing the pointer to the code directly to a system API or framework function that starts the thread.
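A minimal sketch of those four steps on a Linux/POSIX-style system, x86-64 assumed; the byte sequence is the encoding of "mov eax, 42; ret", and this is meant to show the shape of the technique rather than production code:
#include <cstring>
#include <sys/mman.h>

int main()
{
    const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00,   // mov eax, 42
                                   0xC3 };                          // ret

    // 1. Allocate a page from the system, writable for now.
    void* page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    // 2. Write the custom machine code into it.
    std::memcpy(page, code, sizeof(code));

    // 3. Mark the page non-writable and executable.
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0)
        return 1;

    // 4. Cast the page address to a function pointer and call it
    //    (conditionally-supported, but works on these platforms).
    int (*fn)() = reinterpret_cast<int (*)()>(page);
    int result = fn();                       // should be 42

    munmap(page, 4096);
    return result == 42 ? 0 : 1;
}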
Your question is confusingly worded.
You can reassign function pointers and you can assign them to null; the same goes for member pointers. Unless you declare them const, you can reassign them, and yes, the new function will be called instead. The signatures must match exactly. Consider using std::function instead.
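A tiny illustration of that, with made-up function names; whichever function the pointer currently refers to is the one that gets called:
#include <iostream>

void hello()   { std::cout << "hello\n"; }
void goodbye() { std::cout << "goodbye\n"; }

int main()
{
    void (*fn)() = &hello;   // the signatures must match exactly
    fn();                    // prints "hello"
    fn = &goodbye;           // reassign; the new function is called instead
    fn();                    // prints "goodbye"
    fn = nullptr;            // legal, but calling fn now would be undefined behaviour
}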
You cannot "overwrite the memory at the address of a function". You probably can indeed do it some way, but just do not. You're writing into your program code and are likely to screw it up badly.

Is there a way to mark a chunk of allocated memory readonly?

If I allocate some memory using malloc(), is there a way to mark it read-only, so that memcpy() fails if someone attempts to write to it?
This is connected to a faulty API design where users are misusing a const pointer returned by a method GetValue(), which is part of a large memory structure. Since we want to avoid copying a large chunk of memory, we return a live pointer into structured memory which has a specific format. Now the problem is that some users have found a hack to get their stuff working by writing to this memory directly, bypassing the SetValue() call that does the allocation and properly handles the in-memory binary format we have developed. Although their hack sometimes works, sometimes it causes a memory access violation due to incorrect interpretation of control flags that have been overwritten by the user.
Educating users is one task, but let's say for now we want their code to fail.
I am just wondering if we can simply protect against this case.
For an analogy, assume someone gets a blob column from a sqlite statement and then writes back to it. In the case of sqlite that would not make sense, but something like this is happening in our case.
On most hardware architectures you can only change protection attributes on entire memory pages; you can't mark a fragment of a page read-only.
The relevant APIs are:
mprotect() on Unix;
VirtualProtect() on Windows.
You'll need to ensure that the memory page doesn't contain anything that you don't want to make read-only. To do this, you'll either have to overallocate with malloc(), or use a different allocation API, such as mmap(), posix_memalign() or VirtualAlloc().
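A minimal sketch of that on a POSIX system, using mmap() so the block occupies whole pages of its own; make_readonly_copy is an invented name:
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

void* make_readonly_copy(const void* data, std::size_t size)
{
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len  = ((size + page - 1) / page) * page;   // round up to whole pages

    void* mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    std::memcpy(mem, data, size);               // fill it while it is still writable
    if (mprotect(mem, len, PROT_READ) != 0) {   // now make the pages read-only
        munmap(mem, len);
        return NULL;
    }
    return mem;   // any memcpy() into this block now faults; release with munmap(mem, len)
}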
Depends on the platform. On Linux, you could use mprotect() (http://linux.die.net/man/2/mprotect).
On Windows you might try VirtualProtect() (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898(v=vs.85).aspx). I've never used it though.
Edit:
This is not a duplicate of NPE's answer. NPE originally had a different answer; it was edited later and mprotect() and VirtualProtect() were added.
a faulty API design where users are misusing a const pointer returned by a method GetValue() which is part of a large memory structure. Since we want to avoid copying a large chunk of memory, we return a live pointer into structured memory which has a specific format
That is not clearly a faulty API design. An API is a contract: you promise your class will behave in a particular way, clients of the class promise to use the API in the proper manner. Dirty tricks like const_cast are improper (and in some, but not all cases, have undefined behaviour).
It would be faulty API design if using const_cast led to a security issue. In that case you must copy the chunk of memory, or redesign the API. This is the norm in Java, which does not have the equivalent of const (despite const being a reserved word in Java).
Obfuscate the pointer, i.e. return to the client the pointer plus an offset; now they can't use the pointer directly.
Whenever the pointer is passed to your code via the official API, subtract the offset and use the pointer as usual.
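A rough sketch of the trick, with invented names; the offset value is arbitrary:
#include <cstdint>

const std::uintptr_t kOffset = 0x5A5A5A5AU;    // arbitrary fixed offset

std::uintptr_t MakeHandle(const void* p)       // what GetValue() would hand back
{
    return reinterpret_cast<std::uintptr_t>(p) + kOffset;
}

const void* ResolveHandle(std::uintptr_t h)    // done inside the official API
{
    return reinterpret_cast<const void*>(h - kOffset);
}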

In either C or C++, should I check pointer parameters against NULL/nullptr?

This question was inspired by this answer.
I've always been of the philosophy that the callee is never responsible when the caller does something stupid, like passing invalid parameters. I have arrived at this conclusion for several reasons, but perhaps the most important one comes from this article:
Everything not defined is undefined.
If a function doesn't say in its docs that it's valid to pass nullptr, then you damn well better not be passing nullptr to that function. I don't think it's the responsibility of the callee to deal with such things.
However, I know there are going to be some who disagree with me. I'm curious whether or not I should be checking for these things, and why.
If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do it with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. I question the value of the assert except as documentation however; a segfault from dereferencing the NULL pointer is just as effective for debugging.
If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error!
In C++, if you don't want to accept NULL pointers, then don't take the chance: accept a reference instead.
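A tiny illustration of both suggestions, with made-up function names:
#include <cassert>
#include <cstddef>
#include <string>

std::size_t length_via_pointer(const std::string* s)
{
    assert(s != nullptr && "caller violated the contract");   // checked in debug builds, compiled out with NDEBUG
    return s->size();
}

std::size_t length_via_reference(const std::string& s)        // cannot be null in well-formed code
{
    return s.size();
}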
While in general I don't see the value in detecting NULL (why NULL and not some other invalid address?), for a public API I'd probably still do it, simply because many C and C++ programmers expect such behavior.
The Defense in Depth principle says yes. If this is an external API, it's absolutely essential. Otherwise, at least use an assert to assist in debugging misuse of your API.
You can document the contract until you are blue in the face, but you cannot in callee code prevent ill-advised or malicious misuse of your function. The decision you have to make is what's the likely cost of misuse.
In my view, it's not a question of responsibility. It's a question of robustness.
Unless I have full control over the caller and must optimize for even the most minute speed improvement, I always check for NULL.
I lean heavily toward 'don't trust your user's input not to blow up your system' and toward defensive programming in general. Having made APIs in a past life, I have seen users of those libraries pass in null pointers, with application crashes as the result.
If it is truly an internal library and I'm the only person (or only a select few) have the ability to use it, then I might ease up on null pointer checks as long as everyone agrees to abide by general contracts. I can't trust the user base at large to adhere to that.
The answer is going to be different for C and C++.
C++ has references. The only difference between passing a pointer and passing a reference is that the pointer can be null. So, if the writer of the called function expects a pointer argument and forgets to do something sane when it's null, he's silly, crazy or writing C-with-classes.
Either way, this is not a matter of who wears the responsibility hat. In order to write good software, the two programmers must cooperate, and it is the responsibility of all programmers to (1) avoid special cases that would require this kind of decision and (2) when that fails, write code that blows up in an unambiguous and documented way in order to help with debugging.
So, sure, you can point and laugh at the caller because he messed up, since "everything not defined is undefined", and he had to spend an hour debugging a simple null pointer bug, but your team wasted some precious time on that.
My philosophy is: Your users should be allowed to make mistakes, your programming team should not.
What this means is that the only place you should check for invalid parameters, including NULL, is in the top-level user interface. Everywhere the user can provide input to your code, you should check for errors and handle them as gracefully as possible.
Everywhere else, you should use ASSERTS to ensure the programmers are using the functions correctly.
If you are writing an API, then only the top-level functions should catch and handle bad input. It is pointless to keep checking for a NULL pointer three or four levels deep into your call stack.
I am pro defensive programming.
Unless you can show by profiling that these nullptr checks happen in a bottleneck of your application (in such cases it is conceivable that you should not do those pointer tests at those points), all in all, comparing a pointer against null is a really cheap operation.
I think it is a shame to leave in potential crash bugs rather than spend so little CPU.
So: test your pointers against NULL!
I think that you should strive to write code that is robust for every conceivable situation. Passing a NULL pointer to a function is very common; therefore, your code should check for it and deal with it, usually by returning an error value. Library functions should NOT crash an application.
For C++, if your function doesn't accept null pointers, then use a reference argument. In general.
There are some exceptions. For example, many people, including myself, think it's better with a pointer argument when the actual argument will most naturally be a pointer, especially when the function stores away a copy of the pointer. Even when the function doesn't support a null pointer argument.
How much to defend against invalid argument depends, including that it depends on subjective opinion and gut-feeling.
Cheers & hth.,
One thing you have to consider is what happens if some caller DOES misuse your API. In the case of passing NULL pointers, the result is an obvious crash, so it's OK not to check. Any misuse will be readily apparent to the calling code's developer.
The infamous glibc debacle is another thing entirely. The misuse resulted in actually useful behavior for the caller, and the API stayed that way for decades. Then they changed it.
In this case, the API developers should have checked values with an assert or some similar mechanism. But you can't go back in time to correct an error. The wailing and gnashing of teeth were inevitable. Read all about it here.
If you don't want a NULL then don't make the parameter a pointer.
By using a reference you guarantee that the object will not be NULL.
He who performs invalid operations on invalid or nonexistent data only deserves his system state to become invalid.
I consider it complete nonsense that functions which expect input should check for NULL. Or for whatever other value, for that matter. The sole job of a function is to do a task based on its input or scope-state, nothing else. If you have no valid input, or no input at all, then don't even call the function. Besides, a NULL check doesn't detect the other millions and millions of possible invalid values. You know beforehand that you would be passing NULL, so why would you still pass it, waste valuable cycles on yet another function call with parameter passing and an in-function comparison of some pointer, and then check the function output again for success or not? Sure, I might have done so when I was 6 years old back in 1982, but those days have long since gone.
There is of course the argument to be made for public APIs. Like some DLL offering idiot-proof checking. You know, those arguments: "If the user supplies NULL you don't want your application to crash." What a non-argument. It is the user who passes bogus data in the first place; it's an explicit choice and nothing else than that. If one feels that is quality, well... I prefer solid logic and performance over such things. Besides, a programmer is supposed to know what he's doing. If he's operating on invalid data for the particular scope, then he has no business calling himself a programmer. I see no reason to downgrade the performance, increase the power consumption, and increase the binary size, which in turn affects instruction caching and branch prediction, of my products in order to support such users.
I don't think it's the responsibility of the callee to deal with such things
If it doesn't take this responsibility, it might create bad results, like dereferencing NULL pointers. The problem is that it always implicitly takes this responsibility. That's why I prefer graceful handling.
In my opinion, it's the callee's responsibility to enforce its contract.
If the callee shouldn't accept NULL, then it should assert that.
Otherwise, the callee should be well behaved when it's handed a NULL. That is, either it should functionally be a no-op, return an error code, or allocate its own memory, depending on the contract that you specified for it. It should do whatever seems to be the most sensible from the caller's perspective.
As the user of the API, I want to be able to continue using it without having the program crash; I want to be able to recover at the least or shut down gracefully at worst.
One side effect of that approach is that when your library crashes in response to being passed an invalid argument, you will tend to get the blame.
There is no better example of this than the Windows operating system. Initially, Microsoft's approach was to eliminate many tests for bogus arguments. The result was an operating system that was more efficient.
However, the reality is that invalid arguments are passed all the time, from programmers that aren't up to snuff, or from code just using values returned by other functions that weren't expected to be NULL. Now, Windows performs more validation and is less efficient as a result.
If you want to allow your routines to crash, then don't test for invalid parameters.
Yes, you should check for null pointers. You don't want to crash an application because the developer messed something up.
Overhead of development time + runtime performance has a trade-off with the robustness of the API you are designing.
If the API you are publishing has to run inside the process of the calling routine, you SHOULD NOT check for NULL or invalid arguments. In this scenario, if you crash, the client program crashes and the developer using your API should mend his ways.
However, if you are providing a runtime/framework which will run the client program inside it (e.g., you are writing a virtual machine or a middleware which can host the code, or an operating system), you should definitely check the correctness of the arguments passed. You don't want your program to be blamed for the mistakes of a plugin.
There is a distinction between what I would call legal and moral responsibility in this case. As an analogy, suppose you see a man with poor eyesight walking towards a cliff edge, blithely unaware of its existence. As far as your legal responsibility goes, it would in general not be possible to successfully prosecute you if you fail to warn him and he carries on walking, falls off the cliff and dies. On the other hand, you had an opportunity to warn him -- you were in a position to save his life, and you deliberately chose not to do so. The average person tends to regard such behaviour with contempt, judging that you had a moral responsibility to do the right thing.
How does this apply to the question at hand? Simple -- the callee is not "legally" responsible for the actions of the caller, stupid or otherwise, such as passing in invalid input. On the other hand, when things go belly up and it is observed that a simple check within your function could have saved the caller from his own stupidity, you will end up sharing some of the moral responsibility for what has happened.
There is of course a trade-off going on here, dependent on how much the check actually costs you. Returning to the analogy, suppose that you found out that the same stranger was inching slowly towards a cliff on the other side of the world, and that by spending your life savings to fly there and warn him, you could save him. Very few people would judge you entirely harshly if, in this particular situation, you neglected to do so (let's assume that the telephone has not been invented, for the purposes of this analogy). In coding terms, however, if the check is as simple as checking for NULL, you are remiss if you fail to do so, even if the "real" blame in the situation lies with the caller.

Can you force a crash if a write occurs to a given memory location with finer than page granularity?

I'm writing a program that for performance reasons uses shared memory (sockets and pipes as alternatives have been evaluated, and they are not fast enough for my task, generally speaking any IPC method that involves copies is too slow). In the shared memory region I am writing many structs of a fixed size. There is one program responsible for writing the structs into shared memory, and many clients that read from it. However, there is one member of each struct that clients need to write to (a reference count, which they will update atomically). All of the other members should be read only to the clients.
Because clients need to change that one member, they can't map the shared memory region as read only. But they shouldn't be tinkering with the other members either, and since these programs are written in C++, memory corruption is possible. Ideally, it should be as difficult as possible for one client to crash another. I'm only worried about buggy clients, not malicious ones, so imperfect solutions are allowed.
I can try to stop clients from overwriting by declaring the members as const in the header they use, but that won't prevent memory corruption (buffer overflows, bad casts, etc.) from overwriting them anyway. I can insert canaries, but then I have to constantly pay the cost of checking them.
Instead of storing the reference count member directly, I could store a pointer to the actual data in a separately mapped writable page, while keeping the structs in read-only mapped pages. This will work: the OS will force my application to crash if I try to write to the pointed-to data. But indirect storage can be undesirable when trying to write lock-free algorithms, because needing to follow another level of indirection can change whether something can be done atomically.
Is there any way to mark smaller areas of memory such that writing to them will cause your app to blow up? Some platforms have hardware watchpoints, and maybe I could activate one of those with inline assembly, but I'd be limited to only 4 at a time on 32-bit x86, and each one could only cover part of the struct because they're limited to 4 bytes. It'd also make my program painful to debug ;)
Edit: I found this rather eye popping paper, but unfortunately it requires using ECC memory and a modified Linux kernel.
I don't think it's possible to make a few bits read-only like that at the OS level.
One thing that occurred to me just now is that you could put the reference counts in a different page, as you suggested. If the structs are a common size and are all in sequential memory locations, you could use pointer arithmetic to locate a reference count from the structure's pointer, rather than having a pointer within the structure. This might be better than having a pointer for your use case.
long *refCountersBase; // the start address of the ref counters page
MyStruct *structsBase; // the start address of your structures page

// get the address of the reference counter for a given struct
long *getRefCounter(MyStruct *myStruct)
{
    size_t n = myStruct - structsBase;   // index of the struct within its page
    long *ref = refCountersBase + n;     // counters are laid out at the same index
    return ref;
}
You would need to add a signal handler for SIGSEGV which recovers from the exception, but only for certain addresses. A starting point might be http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html and the corresponding documentation for your OS.
Edit: I believe what you want is to perform the write and return if the write address is actually OK, and tail-call the previous exception handler (the pointer you get when you install your exception handler) if you want to propagate the exception. I'm not experienced in these things though.
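A sketch of one common (though formally non-portable) way to recover: install a SIGSEGV handler and jump back out of the faulting write with sigsetjmp/siglongjmp. The name try_write is invented, and real code would also inspect the faulting address before deciding to recover:
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_recover;

static void on_segv(int)
{
    siglongjmp(g_recover, 1);               // jump back out of the faulting instruction
}

bool try_write(volatile char* p, char value)
{
    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old);

    bool ok = false;
    if (sigsetjmp(g_recover, 1) == 0) {     // save the signal mask so it gets restored
        *p = value;                         // may fault if p is in a protected page
        ok = true;
    }
    sigaction(SIGSEGV, &old, NULL);         // put the previous handler back
    return ok;
}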
I have never heard of enforcing read-only at less than a page granularity, so you might be out of luck in that direction unless you can put each struct on two pages. If you can afford two pages per struct you can put the ref count on one of the pages and make the other read-only.
You could write an API rather than just use headers. Forcing clients to use the API would remove most corruption issues.
Keeping the data with the reference count rather than on a different page will help with locality of data and so improve cache performance.
You need to consider that a reader may have a problem and fail to properly update its ref count. Also that the writer may fail to complete an update. Coping with these things requires extra checks. You can combine such checks with the API. It may be worth experimenting to measure the performance implications of some kind of integrity checking. It may be fast enough to keep a checksum, something as simple as adler32.
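For the checksum idea, a hedged sketch using zlib's adler32(); the struct layout and field names are invented for illustration:
#include <zlib.h>

struct Record {
    unsigned char payload[120];   // fixed-size data written by the producer (layout made up here)
    uLong checksum;               // adler32 of payload, written last
};

bool record_intact(const Record& r)
{
    uLong sum = adler32(0L, Z_NULL, 0);                  // seed value
    sum = adler32(sum, r.payload, sizeof(r.payload));
    return sum == r.checksum;
}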