Can you force a crash if a write occurs to a given memory location with finer than page granularity? - c++

I'm writing a program that for performance reasons uses shared memory (sockets and pipes as alternatives have been evaluated, and they are not fast enough for my task, generally speaking any IPC method that involves copies is too slow). In the shared memory region I am writing many structs of a fixed size. There is one program responsible for writing the structs into shared memory, and many clients that read from it. However, there is one member of each struct that clients need to write to (a reference count, which they will update atomically). All of the other members should be read only to the clients.
Because clients need to change that one member, they can't map the shared memory region as read only. But they shouldn't be tinkering with the other members either, and since these programs are written in C++, memory corruption is possible. Ideally, it should be as difficult as possible for one client to crash another. I'm only worried about buggy clients, not malicious ones, so imperfect solutions are allowed.
I can try to stop clients from overwriting by declaring those members const in the header they use, but that won't prevent memory corruption (buffer overflows, bad casts, etc.) from clobbering them. I can insert canaries, but then I have to constantly pay the cost of checking them.
Instead of storing the reference count member directly, I could store a pointer to the actual count, which would live in a separately mapped writable page, while keeping the structs themselves in read-only mapped pages. This would work: the OS will force a client to crash if it writes into the read-only pages. But indirect storage can be undesirable when writing lock-free algorithms, because needing to follow another level of indirection can change whether something can be done atomically.
Is there any way to mark smaller areas of memory such that writing to them will cause your app to blow up? Some platforms have hardware watchpoints, and maybe I could activate one of those with inline assembly, but I'd be limited to only 4 at a time on 32-bit x86, and each one could only cover part of the struct because they're limited to 4 bytes. It'd also make my program painful to debug ;)
Edit: I found this rather eye-popping paper, but unfortunately it requires ECC memory and a modified Linux kernel.

I don't think it's possible to make a few bits read-only like that at the OS level.
One thing that occurred to me just now is that you could put the reference counts in a different page, as you suggested. If the structs are a fixed size and are all at sequential memory locations, you could use pointer arithmetic to locate a reference count from the struct's pointer, rather than storing a pointer within the structure. This might be better for your use case than an embedded pointer.
long *refCountersBase; // the start address of the ref-counters page
MyStruct *structsBase; // the start address of your structures page

// Get the address of the reference counter for a given struct.
long *getRefCounter(MyStruct *myStruct)
{
    size_t n = myStruct - structsBase; // index of the struct within its region
    long *ref = refCountersBase + n;
    return ref;
}
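If the counters live in their own region like this, the two regions can also be mapped with different protections, so that a stray store into a struct faults immediately. A minimal POSIX sketch, assuming a shared-memory object laid out as a page-aligned block of structs followed by a page of counters (the name "/my_region", the sizes and the 4 KiB page size are placeholders):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>

struct MyStruct { int payload[16]; };    // stand-in for the real struct

const size_t kStructBytes  = 4096 * 16;  // page-aligned region of structs
const size_t kCounterBytes = 4096;       // one page of reference counters

int fd = shm_open("/my_region", O_RDWR, 0600);

// Clients map the structs WITHOUT PROT_WRITE, so a stray write faults...
MyStruct *structsBase = static_cast<MyStruct *>(
    mmap(nullptr, kStructBytes, PROT_READ, MAP_SHARED, fd, 0));

// ...and map only the counter page writable (clients update it atomically).
long *refCountersBase = static_cast<long *>(
    mmap(nullptr, kCounterBytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
         static_cast<off_t>(kStructBytes)));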

You would need to add a signal handler for SIGSEGV which recovers from the exception, but only for certain addresses. A starting point might be http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html and the corresponding documentation for your OS.
Edit: I believe what you want is to perform the write and return if the faulting address is actually OK, and to tail-call the previous handler (the pointer you get back when you install your own) if you want to propagate the exception. I'm not experienced in these things, though.
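As a rough illustration of the dispatching part only (a minimal POSIX sketch; error handling and async-signal-safety are glossed over, and the region variables are assumed to be filled in after mapping): the handler inspects si_addr and decides whether the fault belongs to the protected region or should be passed on. Actually emulating the faulting write from inside the handler is considerably more involved.

#include <signal.h>
#include <cstdio>
#include <cstdlib>

static void *gRegionStart;            // start of the read-only struct region
static std::size_t gRegionSize;       // size of that region, filled in after mmap
static struct sigaction gPrevAction;  // whatever handler was installed before ours

static void segvHandler(int sig, siginfo_t *info, void *context)
{
    char *addr = static_cast<char *>(info->si_addr);
    char *base = static_cast<char *>(gRegionStart);
    if (addr >= base && addr < base + gRegionSize) {
        // A buggy client touched the read-only structs: fail loudly right here.
        std::fprintf(stderr, "illegal write to protected struct at %p\n", (void *)addr);
        std::abort();
    }
    // Not our region: hand the fault to whoever was registered before us.
    if (gPrevAction.sa_flags & SA_SIGINFO)
        gPrevAction.sa_sigaction(sig, info, context);
    else if (gPrevAction.sa_handler != SIG_DFL && gPrevAction.sa_handler != SIG_IGN)
        gPrevAction.sa_handler(sig);
    else {
        signal(SIGSEGV, SIG_DFL);     // restore default behaviour and re-raise
        raise(SIGSEGV);
    }
}

void installSegvHandler()
{
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segvHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, &gPrevAction);
}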

I have never heard of enforcing read-only at less than a page granularity, so you might be out of luck in that direction unless you can put each struct on two pages. If you can afford two pages per struct you can put the ref count on one of the pages and make the other read-only.
You could write an API rather than just use headers. Forcing clients to use the API would remove most corruption issues.
Keeping the data with the reference count rather than on a different page will help with locality of data and so improve cache performance.
You need to consider that a reader may have a problem and fail to properly update its ref count. Also that the writer may fail to complete an update. Coping with these things requires extra checks. You can combine such checks with the API. It may be worth experimenting to measure the performance implications of some kind of integrity checking. It may be fast enough to keep a checksum, something as simple as adler32.
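For the checksum idea, zlib's adler32() is cheap to compute. A rough sketch of how the writer could stamp each record and a reader could verify it before trusting the data; the struct and field names here are invented for illustration:

#include <zlib.h>

struct Record {
    char payload[120];       // read-only data filled in by the producer
    unsigned long checksum;  // adler32 of payload, also set by the producer
    long refCount;           // the only field clients are supposed to touch
};

unsigned long recordChecksum(const Record &r)
{
    uLong sum = adler32(0L, Z_NULL, 0);   // zlib's documented starting value
    return adler32(sum, reinterpret_cast<const Bytef *>(r.payload),
                   static_cast<uInt>(sizeof r.payload));
}

bool recordLooksIntact(const Record &r)
{
    return recordChecksum(r) == r.checksum;
}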

Related

C++ memory protect pointers

Is it possible to "memory protect" pointers so that it's actually impossible to change them from within the code, with any attempt to change them resulting in a bus error? I am not referring to const, but to some type of deeper immutability assurance at the OS level. The question applies to any OS.
(Carmack mentions something like that here: https://youtu.be/Uooh0Y9fC_M?t=1h41m)
Since your question is quite general, here are some general ideas.
From within your application, the language gives you a number of constructs to protect your data. For example, data that is not meant to be exposed should be flagged as private. This does not prevent clients from reverse engineering your class though, for example if you have
class C {
public:
    int a;
private:
    int b;
};
then usually you will be able to access b through int* pB = &(c.a) + 1 (given an instance c of C). This is not standard by any definition, but on most systems it will probably work. In general, because C++ allows very low-level memory management, you can basically access any part of your application's memory from anywhere within the application, though making sensible (ab)use of this requires some reverse engineering.
When you expose public data, you can return const references and pointers, but of course it is very easy to still change this memory:
const T* myImmutable = new T();
const_cast<T*>(myImmutable)->change(); // Oops
Again, this is not safe, standard-conforming C++: compilers may rely on the const qualifier to perform optimizations, and things may break when you circumvent it (modifying an object that was actually defined const is undefined behaviour), but the language does not stop you from writing this.
If you want to protect your memory from external changes, things become a bit trickier. In general, the OS makes sure that the memory assigned to different processes is separated and you cannot just go and write in the memory space of other processes. However, there are some functions in the Windows API (Read/WriteProcessMemory) that one may use. Of course, this requires heavy reverse engineering to determine exactly where in memory the pointer to be changed is located, but it is possible. Some ways of protecting against this are VirtualProtect, as mentioned in Dani's answer. There are increasingly complex things that you can do, like keeping checksums of important data or
[writing a] driver that monitors the SSDT and then catches when WriteProcessMemory or ReadProcessMemory is executed and you can squash those calls if they're pointed at the game
but as the first answer on the page where I found that quote correctly points out:
Safety doesn't exist. The only thing you can do is make it as hard as possible to crack
which is a lesson that you should never forget!

Passing the results of `std::string::c_str()` to `mkdtemp()` using `const_cast<char*>()`

OK, so: we all know that generally the use of const_cast<>() anywhere is so bad it’s practically a programming war crime. So this is a hypothetical question about how bad it might be, exactly, in a specific case.
To wit: I ran across some code that did something like this:
std::string temporary = "/tmp/directory-XXXXXX";
const char* dtemp = ::mkdtemp(const_cast<char*>(temporary.c_str()));
/// `temporary` is unused hereafter
… now, I have run across numerous descriptions about how to get writeable access to the underlying buffer of a std::string instance (q.v. https://stackoverflow.com/a/15863513/298171 for example) – all of them have the caveat that yes, these methods aren’t guaranteed to work by any C++ standard, but in practice they all do.
With this in mind, I am just curious on how using const_cast<char*>(string.c_str()) compares to other known methods (e.g. the aforementioned &string[0], &c)… I ask because the code in which I found this method in use seems to work fine in practice, and I thought I’d see what the experts thought before I attempt the inevitable const_cast<>()-free rewrite.
const cannot be enforced at the hardware level because, in practice (in a non-hypothetical environment), the read-only attribute can only be set on a full 4K memory page; and huge pages, which drastically reduce the CPU's TLB misses, are becoming common, which makes the granularity even coarser.
const doesn't affect code generation the way restrict from C99 (__restrict as a compiler extension) does. In fact const, roughly speaking, means "poison all write attempts to this data; I'd like to protect my invariants here".
Since std::string is a mutable string, its underlying buffer cannot be allocated in read-only memory, so the const_cast<> shouldn't cause a program crash here, unless you change bytes outside the underlying buffer's bounds or try to delete, free() or realloc() something. However, altering chars in the buffer may be classified as an invariant violation. Because you don't use the std::string instance afterwards and simply throw it away, this shouldn't provoke a crash either, unless some particular std::string implementation decides to check its invariants' integrity before destruction and force a crash if any of them are broken. Since such a check couldn't be done in less than O(N) time, and std::string is a performance-critical class, it is unlikely that anyone does it.
Another issue may come from the Copy-on-Write strategy: by modifying the buffer directly you could break some other std::string instance that shares the buffer with your string. But a few years ago the majority of C++ experts came to the conclusion that COW is too fragile and too slow, especially in multi-threaded environments (and the C++11 requirements on std::string effectively rule it out), so modern C++ libraries shouldn't use it, and instead adhere to move construction where possible and avoid heap traffic for short strings where applicable.
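For the const_cast-free rewrite, one option is simply to copy the template into a writable buffer first (or, since C++17, to use the non-const std::string::data()). A sketch using std::vector<char>, with the usual caveat that mkdtemp() is POSIX-only:

#include <stdexcept>
#include <string>
#include <vector>
#include <stdlib.h>   // mkdtemp() is POSIX, declared here

std::string makeTempDir()
{
    std::string templ = "/tmp/directory-XXXXXX";   // mkdtemp needs six trailing X's
    std::vector<char> buf(templ.begin(), templ.end());
    buf.push_back('\0');                           // mkdtemp wants a writable C string
    if (::mkdtemp(buf.data()) == nullptr)
        throw std::runtime_error("mkdtemp failed");
    return std::string(buf.data());                // the X's have been replaced in place
}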

Pointers in C++ : How large should an object be to need use of a pointer?

Oftentimes I read literature explaining that one of the use cases of C++ pointers is when one has big objects to deal with, but how large does an object have to be before it needs to be manipulated through a pointer? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of the said byte would be more efficient in terms of memory usage.
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency.
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing for memory efficiency was often paid in code complexity, for surprisingly little gain.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be dimensioned at the start, if the application is expected to handle considerable amounts of data. In a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object from different places. In fact you can use references for that too, but pointers give you the added advantage of being able to refer to different objects over time, while a reference keeps referring to the same object.
On second thought, maybe you are referring to objects created on the free store using new etc. and then referred to through pointers. There is no definitive rule for that, but in general you can do so when:
the object being created is too large to be accommodated on the stack, or
you want to extend the lifetime of the object beyond the enclosing scope (see the sketch below).
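As a small illustration of the second point, an object created on the free store outlives the scope that created it, and several places can then refer to it through copies of the pointer without duplicating the object itself (the Enemy type here is invented for the example):

#include <memory>
#include <vector>

struct Enemy { int health = 100; };   // invented example type

std::shared_ptr<Enemy> spawnEnemy()
{
    // Allocated on the free store, so the object outlives this function's scope.
    return std::make_shared<Enemy>();
}

// Different parts of the program can share the same Enemy via the pointer.
std::vector<std::shared_ptr<Enemy>> world = { spawnEnemy() };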
There is no such limitation or guideline. You will have to decide it.
Assume the class definition below; its size is 100 ints = 400 bytes.
class test
{
private:
    int m_nVar[100];
};
When you use the following function declaration (pass by value), the copy constructor will get called (even if you don't provide one), so a copy of 100 ints is made, which obviously takes some time to finish:
void passing_to_function(test a);
When you change the function to take a reference or pointer, no such copying happens; only a test* (pointer-sized value) is transferred:
void passing_to_function(test& a);
So for large objects you clearly have an advantage passing by reference or by pointer rather than by value!

Is there a way to mark a chunk of allocated memory readonly?

If I allocate some memory using malloc(), is there a way to mark it read-only, so that memcpy() fails if someone attempts to write to it?
This is connected to a faulty API design where users are misusing a const pointer returned by a method GetValue() which is part of a large memory structure. Since we want to avoid copying a large chunk of memory, we return a live pointer into structured memory which is of a specific format. The problem is that some users found a hack to get their stuff working: they write to this memory directly and avoid the SetValue() call that does the allocation and proper handling of the in-memory binary format we have developed. Although their hack sometimes works, it sometimes causes memory access violations due to incorrect interpretation of control flags that they have overwritten.
Educating users is one task, but let's say for now we want their code to fail.
I am just wondering if we can simply protect against this case.
As an analogy, assume someone gets a blob column from an sqlite statement and then writes back to it. In the case of sqlite that would not make much sense, but something similar is happening in our case.
On most hardware architectures you can only change protection attributes on entire memory pages; you can't mark a fragment of a page read-only.
The relevant APIs are:
mprotect() on Unix;
VirtualProtect() on Windows.
You'll need to ensure that the memory page doesn't contain anything that you don't want to make read-only. To do this, you'll either have to overallocate with malloc(), or use a different allocation API, such as mmap(), posix_memalign() or VirtualAlloc().
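A rough sketch of the over-allocation approach on a POSIX system (it assumes a Linux/BSD-style MAP_ANONYMOUS): give the data its own page-aligned block and flip it to read-only once it has been filled in, so that any later write triggers SIGSEGV:

#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

// Copy `len` bytes into freshly mapped pages and make them read-only.
void *makeReadOnlyCopy(const void *src, std::size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    std::size_t bytes = ((len + page - 1) / page) * page;   // round up to whole pages

    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;

    std::memcpy(p, src, len);
    if (mprotect(p, bytes, PROT_READ) != 0) {               // writes now fault
        munmap(p, bytes);
        return nullptr;
    }
    return p;
}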
Depends on the platform. On Linux, you could use mprotect() (http://linux.die.net/man/2/mprotect).
On Windows you might try VirtualProtect() (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898(v=vs.85).aspx). I've never used it though.
Edit:
This is not a duplicate of NPE's answer. NPE originally had a different answer; it was edited later and mprotect() and VirtualProtect() were added.
a faulty API design where users are misusing a const pointer returned by a method GetValue() which is part of a large memory structure. Since we want to avoid copying a large chunk of memory, we return a live pointer into structured memory which is of a specific format
That is not clearly a faulty API design. An API is a contract: you promise your class will behave in a particular way, clients of the class promise to use the API in the proper manner. Dirty tricks like const_cast are improper (and in some, but not all cases, have undefined behaviour).
It would be a faulty API design if using const_cast led to a security issue. In that case you must copy the chunk of memory, or redesign the API. This is the norm in Java, which does not have an equivalent of const (despite const being a reserved word in Java).
Obfuscate the pointer, i.e. return to the client the pointer plus an offset; now they can't use the pointer directly.
Whenever the pointer is passed back to your code via the official API, subtract the offset and use the pointer as usual.
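A tiny sketch of that idea; the offset value and the names are purely illustrative, and this only discourages casual poking rather than providing real protection:

#include <cstdint>

// An offset known only to the library (it could be randomized per process).
static const std::uintptr_t kOffset = 0x5A5A5A5A;

using Handle = std::uintptr_t;   // what clients receive instead of a raw pointer

Handle toHandle(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) + kOffset;
}

const void *fromHandle(Handle h)  // used internally whenever the official API is called
{
    return reinterpret_cast<const void *>(h - kOffset);
}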

Are classes guaranteed to have the same organization in memory between program runs?

I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk and load it back, everything fits back nicely.
Save
fout.write((char*) &game, sizeof(Game));
Load
fin.read((char*) &game, sizeof(Game));
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment on 32 vs 64 will vary. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach in extremely limited situations -- that typically applies only to components of a system, rather than the entire state of the game.
Since this is tagged C++, Boost.Serialization would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
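A sketch of what such explicit conversion functions might look like, using fixed-width little-endian fields so the file layout no longer depends on the compiler or platform; the Game fields here are invented for illustration:

#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>

struct Game { float x = 0, y = 0; std::int32_t score = 0; };   // illustrative fields

// Write/read 32-bit values byte by byte, little-endian, regardless of host layout.
void put32(std::ostream &out, std::uint32_t v)
{
    char b[4] = { char(v), char(v >> 8), char(v >> 16), char(v >> 24) };
    out.write(b, 4);
}

std::uint32_t get32(std::istream &in)
{
    unsigned char b[4] = {};
    in.read(reinterpret_cast<char *>(b), 4);
    return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
           (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
}

void saveGame(std::ostream &out, const Game &g)
{
    static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");
    std::uint32_t xs, ys;
    std::memcpy(&xs, &g.x, 4);
    std::memcpy(&ys, &g.y, 4);
    put32(out, 1);                       // format version, so old files stay readable
    put32(out, xs);
    put32(out, ys);
    put32(out, std::uint32_t(g.score));
}

bool loadGame(std::istream &in, Game &g)
{
    if (get32(in) != 1) return false;    // unknown version
    std::uint32_t xs = get32(in), ys = get32(in);
    std::memcpy(&g.x, &xs, 4);
    std::memcpy(&g.y, &ys, 4);
    g.score = std::int32_t(get32(in));
    return static_cast<bool>(in);
}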
Yes, classes and structures will have the same layout in memory every time your program runs, although I can't say whether the standard enforces this. The machine code generated by C++ compilers uses "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler, or change compiler options.
As long as the type is POD and has no pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the above-mentioned concerns, this approach is quite inflexible with regard to versioning and interoperability.
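If you do go the raw-dump route anyway, a compile-time guard can at least catch the day the type silently stops being safe to dump (Game stands in for the save-state class):

#include <type_traits>

// Refuse to compile the raw-dump code if Game ever stops being trivially
// copyable (e.g. someone adds a std::string or a virtual function).
static_assert(std::is_trivially_copyable<Game>::value,
              "Game can no longer be dumped to disk byte-for-byte");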
[edit]
To respond to your own edit: do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will not exist anymore in memory, or will be located elsewhere, when you reload your file.
You might want to take a look at this.
Classes are not guaranteed to have the same structure in memory as pointers can point to different locations in memory each time a class is created.
However, without posting code it is difficult to say with certainty where the problem is.