Secure deallocation of boost::asio::const_buffer - c++

For the purpose of PA:DSS I need to be sure that boost::asio::const_buffer (e.g. in boost::asio::async_write) will be zeroed when coming out of scope.
With a STL containers I can substitute a allocator/deallocator like this:
void deallocate(volatile pointer p, size_type n) {
std::memset(p, 0, n * sizeof(T));
::operator delete(p);
}
However I have no idea how to achieve the same with boost::asio::const_buffer, at least not in a way which would still let boost::asio::async_write to consume it. Also I don't want to reinvent the wheel (if there is any).

Short answer: Asio buffers don't own their memory, so they should not be responsible for disposing of it either.
First off, you should not use
std::memset(p, 0, n * sizeof(T));
Use a function like SecureZeroMemory instead: How-to ensure that compiler optimizations don't introduce a security risk?
I realize you had volatile there for this reason, but it might not always be honoured like you expect:
Your secure_memset function might not be sufficient. According to http://open-std.org/jtc1/sc22/wg14/www/docs/n1381.pdf there are optimizing compilers that will only zero the first byte – Daniel Trebbien Nov 9 '12 at 12:50
Background reading:
https://cryptocoding.net/index.php/Coding_rules#Clean_memory_of_secret_data
http://blog.quarkslab.com/a-glance-at-compiler-internals-keep-my-memset.html
On to ASIO
Make sure you fully realize that Boost Asio buffers have no ownership semantics. They only ever reference data owned by another object.
More importantly than the question posed, you might want to check that you keep around the buffer data long enough. A common pitfall is to pass a local as a buffer:
std::string response = "OK\r\n\r\n";
asio::async_write(sock_, asio::buffer(response), ...); // OOOPS!!!
This leads to Undefined Behaviour immediately.
IOW const_buffer is a concept. There are a gazillion ways to construct it on top of (your own) objects:
documentation
A buffer object represents a contiguous region of memory as a 2-tuple consisting of a pointer and size in bytes. A tuple of the form {void*, size_t} specifies a mutable (modifiable) region of memory. Similarly, a tuple of the form {const void*, size_t} specifies a const (non-modifiable) region of memory. These two forms correspond to the classes mutable_buffer and const_buffer, respectively
So, let's assume you have your buffer type
struct SecureBuffer
{
~SecureBuffer() { shred(); }
size_t size() const { return length_; }
char const* data() const { return data_; }
// ...
private:
void shred(); // uses SecureZeroMemory etc.
std::array<char, 1024> data_ = {0};
size_t length_ = 0u;
};
Then you can simply pass it where you want to use it:
SecureBuffer secret; // member variable (lifetime exceeds async operation)
// ... set data
boost::asio::async_write(sock_,
boost::asio::buffer(secret.data(), secret.size()),
/*...*/
);

Related

Guaranteed allocation size in std::string initialization.\

There are a variety of questions around this topic already on this site:
Guarantees on C++ std::string heap memory allocation?
std::string allocation policy
My question is different from these in that I am interested in how std::string determines its original capacity and how I can rightsize a string assuming that I know how many bytes exactly I will need. Calling reserve(n) can result in the string allocating more memory and I need it to be 24 bytes (right above the sso threshold, can't fit under). Overallocation would be quite dramatic as I potentially hold millions of these in memory, so if it, e.g., aligns at 32 bytes the 33% overhead really hurts. Naturally I would also like to avoid the potential reallocation from shrink_to_fit.
My understanding is that you can get an exact allocation size by initializing as std::string(rightsized_constant, size) via the const char*, size_t ctor, but of course nothing guarantees this.
Is there a reasonably clean way to get this using std::string?
Solution I used after some thinking, discussion with a coworker and brief experiments with clang++/g++ on Linux.
template<typename T>
class Reader {
void operator()(T key, std::string* append_to);
};
...
Reader a;
Reader b;
std::string s;
std::vector<KeyType> keys = ...;
std::vector<std::string> out;
out.reserve(keys.size());
for (const auto& key : keys) {
a(key, &s);
b(key, &s);
out.emplace_back(s); // relies on assumption that copy will rightsize
s.clear(); // relies on the implicit guarantee that this doesn't release memory
} // s rightsized after first iteration
Still curious if anything actually gives a guarantee.

How to fill buffers with mixed types conveniently in standard conformant way?

There are problems, where we need to fill buffers with mixed types. Two examples:
programming OpenGL/DirectX, we need to fill vertex buffers, which can have mixed types (which is basically an array of struct, but the struct maybe described by a run-time data)
creating a memory allocator: putting header/trailer information to the buffer (size, flags, next/prev pointer, sentinels, etc.)
The problem can be described like this:
there is an allocation function, which gives back some memory (new, malloc, OS dependent allocation function, like mmap or VirtualAlloc)
there is a need to put mixed types into an allocated buffer, at various offsets
A solution can be this, for example writing an int to an offset:
void *buffer = <allocate>;
int offset = <some_offset>;
char *ptr = static_cast<char*>(buffer);
*reinterpret_cast<int*>(ptr+offset) = int_value;
However, this is inconvenient, and has UB at least two places:
ptr+offset is UB, as there is no char array at ptr
writing to the result of reinterpret_cast is UB, as there is no int there
To solve the inconvenience problem, this solution is often used:
union Pointer {
void *asVoid;
bool *asBool;
byte *asByte;
char *asChar;
short *asShort;
int *asInt;
Pointer(void *p) : asVoid(p) { }
};
So, with this union, we can do this:
Pointer p = <allocate>;
p.asChar += offset;
*p.asInt++ = int_value; // write an int to offset
*p.asShort++ = short_value; // then a short afterwards
// other writes here
This solution is convenient for filling buffers, but has further UB, as the solution uses non-active union members.
So, my question is: how can one solve this problem in a strictly standard conformant, and most convenient way? I mean, I'd like to have the functionality which the union solution gives me, but in a standard conformant way.
(Note: suppose, that we have no alignment issues here, alignment is taken care of by using proper offsets)
A simple (and conformant) way to handle these things is leveraging std::memcpy to move whatever values you need into the correct offsets in your storage area, e.g.
std::int32_t value;
char *ptr;
int offset;
// ...
std::memcpy(ptr+offset, &value, sizeof(value));
Do not worry about performance, since your compiler will not actually perform std::memcpy calls in many cases (e.g. small values). Of course, check the assembly output (and profile!), but it should be fine in general.

Allocator with dense and sparse pointers - what's going on?

i'm trying to write a handle allocator in C++. this allocator would "handle" (hue hue hue) the allocation of handles for referencing assets (such as textures, uniforms, etc) in a game engine. for instance, inside a function for creating a texture, the handle allocator would be called to create a TextureHandle. when the texture was destroyed, the handle allocator would free the TextureHandle.
i'm reading through the source of BX, a library that includes a handle allocator just for this purpose - it's the base library of the popular library BGFX, a cross-platform abstraction over different rendering APIs.
before i start explaining what's baffling me, let me first outline what this class essentially looks like:
class HandleAllocator {
public:
constructor, destructor
getters: getNumHandles, getMaxHandles
u16 alloc();
void free(u16 handle);
bool isValid(u16 handle) const;
void reset();
private:
u16* getDensePointer() const;
u16* getSparsePointer() const;
u16 _numHandles;
u16 _maxHandles;
}
here's what getDensePointer() looks like:
u8* ptr = (u8*)reinterpret_cast<const u8*>(this);
return (u16*)&ptr[sizeof(HandleAlloc)];
as far as i understand it, this function is returning a pointer to the end of the class in memory, although i don't understand why the this pointer is first cast to a uint8_t* before being dereferenced and used with the array-index operator on the next line.
here's what's weird to me. the constructor calls the reset() function, which looks like this.
_numHandles = 0;
u16* dense = getDensePointer();
for(u16 ii=0, num = _maxHandles; ii < num; ++ii) {
dense[ii] = ii;
}
if getDensePointer returns a pointer to the end of the class in memory, how is it safe to be writing to memory beyond the end of the class in this for loop? how do i know this isn't stomping on something stored adjacent to it?
i'm a total noob, i realize the answer to this is probably obvious and betrays a total lack of knowledge on my part, but go easy on me..
To answer the first question, ask yourself why pointers have a type. In the end, they are just variables that are meant to store memory addresses. Any variable with a range large enough to store all possible memory addresses could do. They what is the difference between, let's say, int* and u8*?
The difference is the way operations are performed on them. Besides dereferencing, which is another story, pointer arithmetic is also involved. Let's take the following declarations: int *p; u8 *u;. Now, p+2, in order to have sense, will return the address at p+8 (the address of the second integer, if you'd like) while u+2 would return the address of u+2 (since u8 has a size of 1).
Now, sizeof gives you the size of the type in bytes. You want to move sizeof(x) bytes, so you need to index the array (or do pointer arithmetic, they are equivalent here) on a byte-sized data type. And that's why you cast it to u8.
Now, for the second question,
how do i know this isn't stomping on something stored adjacent to it?
simply by making sure nothing is there. This is done during the creation of the handler. For example, if you have:
HandleAllocator *h = new HandleAllocator[3]
you can freely call reset on h[0] and have 2 handlers worth of memory to play with. Without more details, it's hard to tell the exact way this excess memory is allocated and what's its purpose.

Struct hack equivalent in C++

The struct hack where you have an array of length 0 as the last member of a struct from C90 and C99 is well known, and with the introduction of flexible array members in C99, we even got a standardized way of using it with []. Unfortunately, C++ provides no such construct, and (at least with Clang 3.4), compiling a struct with either [0] or [] will yield a compilation warning with --std=c++11 -pedantic:
$ cat test.cpp
struct hack {
char filler;
int things[0];
};
$ clang++ --std=c++11 -pedantic test.cpp
\test.cpp:3:14: warning: zero size arrays are an extension [-Wzero-length-array]
int things[0];
and similarly
$ cat test.cpp
struct fam {
char filler;
int things[];
};
$ clang++ --std=c++11 -pedantic test.cpp
\test.cpp:3:7: warning: flexible array members are a C99 feature [-Wc99-extensions]
int things[];
My question then is this; say that I want to have a struct that contains an array of variable size as the last item in C++. What is the right thing to do given a compiler that supports both? Should I go with the struct hack [0] (which is a compiler extension), or the FAM [] (which is a C99 feature)? As far as I understand it, either will work, but I am trying to figure out which is the lesser evil?
Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead, that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector also falls into the same category. If you wonder why I don't want to use a pointer instead, the R.'s answer to another question gives a good overview.
There have been some similar questions elsewhere, but none give an answer to this particular question:
Are flexible array members valid in C++?: Very similar, but the question there is whether FAM is valid in C++ (no). I am looking for a good reason to pick one or the other.
Conforming variant of the old “struct hack”: Proposes an alternative, but it's neither pretty, nor always correct (what if padding is added to the struct?). Accessing the elements later is also not as clean as doing e.things[42].
You can get more or less the same effect using a member
function and a reinterpret_cast:
int* buffer() { return reinterpret_cast<int*>(this + 1); }
This has one major defect: it doesn't guarantee correct
alignment. For example, something like:
struct Hack
{
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
is likely to return a mis-aligned pointer. You can work around
this by putting the data in the struct in a union with the type
whose pointer you are returning. If you have C++11, you can
declare:
struct alignas(alignof(int)) Hack
{
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
(I think. I've never actually tried this, and I could have some
details of the syntax wrong.)
This idiom has a second important defect: it does nothing to
ensure that the size field corresponds to the actual size of the
buffer, and worse, there is no real way of using new here. To
correct this, somewhat, you can define a class specific
operator new and operator delete:
struct alignas(alignof(int)) Hack
{
void* operator new( size_t, size_t n );
void operator delete( void* );
Hack( size_t n );
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
The client code will then have to use placement new to allocate:
Hack* hack = new (20) Hack(20);
The client still has to repeat the size, but he cannot ignore
it.
There are also techniques which can be used to prevent creating
instances which aren't allocated dynamically, etc., to end up
with something like:
struct alignas(alignof(int)) Hack
{
private:
void operator delete( void* p )
{
::operator delete( p );
}
// ban all but dynamic lifetime (and also inheritance, member, etc.)
~Hack() = default;
// ban arrays
void* operator new[]( size_t ) = delete;
void operator delete[]( void* p ) = delete;
public:
Hack( size_t n );
void* operator new( size_t, size_t n )
{
return ::operator new( sizeof(Hack) + n * sizeof(int) );
}
char size;
// Since dtor is private, we need this.
void deleteMe() { delete this; }
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
Given the fundamental dangers of such a class, it is debatable
if so many protective measures are necessary. Even with them,
it's really only usable by someone who fully understands all of
the constraints, and is carefully paying attention. In all but
extreme cases, in very low level code, you'd just make the
buffer a std::vector<int> and be done with it. In all but the
lowest level code, the difference in performance would not be
worth the risk and effort.
EDIT:
As a point of example, g++'s implementation of
std::basic_string uses something very similar to the above,
with a struct containing a reference count, the current size
and the current capacity (three size_t), followed directly by
the character buffer. And since it was written long before
C++11 and alignas/alignof, something like
std::basic_string<double> will crash on some systems (e.g.
a Sparc). (While technically a bug, most people do not consider
this a critical problem.)
This is C++, so templates are available:
template <int N>
struct hack {
int filler;
int thing [N];
};
Casting between different pointers to different instantiations will be the difficult issue, then.
The first thing that comes to mind is DON't, don't write C in C++. In 99.99% of the cases this hack is not needed, won't make any noticeable improvement in performance over just holding a std::vector and will complicate your life and that of the other maintainers of the project in which you deploy this.
If you want a standard compliant approach, provide a wrapper type that dynamically allocates a chunk of memory large enough to contain the hack (minus the array) plus N*sizeof(int) for the equivalent of the array (don't forget to ensure proper alighnment). The class would have accessors that map the members and the array elements to the correct location in memory.
Ignoring alignment and boiler plate code to make the interface nice and the implementation safe:
template <typename T>
class DataWithDynamicArray {
void *ptr;
int* array() {
return static_cast<int*>(static_cast<char*>(ptr)+sizeof(T)); // align!
}
public:
DataWithDynamicArray(int size) : ptr() {
ptr = malloc(sizeof(T) + sizeof(int)*size); // force correct alignment
new (ptr) T();
}
~DataWithDynamicArray() {
static_cast<T*>(ptr)->~T();
free(ptr);
}
// copy, assignment...
int& operator[](int pos) {
return array()[pos];
}
T& data() {
return *static_cast<T*>(ptr);
}
};
struct JustSize { int size; };
DataWithDynamicArray<JustSize> x(10);
x.data().size = 10
for (int i = 0; i < 10; ++i) {
x[i] = i;
}
Now I would really not implement it that way (I would avoid implementing it at all!!), as for example the size should be a part of the state of DataWithDynamicArray...
This answer is provided only as an exercise, to explain that the same thing can be done without extensions, but beware this is just a toy example that has many issues including but not limited to exception safety or alignment (and yet is better than forcing the user to do the malloc with the correct size). The fact that you can does not mean that you should, and the real question is whether you need this feature and whether what you are trying to do is a good design at all or not.
If you really you feel the need to use a hack, why not just use
struct hack {
char filler;
int things[1];
};
followed by
hack_p = malloc(sizeof(struct hack)+(N-1)*sizeof int));
Or don't even bother about the -1 and live with a little extra space.
C++ does not have the concept of "flexible arrays". The only way to have a flexible array in C++ is to use a dynamic array - which leads you to use int* things. You will need a size parameter if you are attempting to read this data from a file so that you can create the appropriate sized array (or use a std::vector and just keep reading until you reach the end of the stream).
The "flexible array" hack keeps the spatial locality (that is has the allocated memory in a contiguous block to the rest of the structure), which you lose when you are forced to use dynamic memory. There isn't really an elegant way around that (e.g. you could allocate a large buffer, but you would have to make it sufficiently large enough to hold any number of elements you wanted - and if the actual data being read in was smaller than the buffer, there would be wasted space allocated).
Also, before people start suggesting keeping an int* to a separately
allocated piece of memory in the struct instead, that is not a
satisfactory answer. I want to allocate a single piece of memory to
hold both my struct and the array elements. Using a std::vector also
falls into the same category.
A non-standard extension is not going to work when you move to a compiler that does not support it. If you keep to the standard (e.g. avoid using compiler-specific hacks), you are less likely to run into these types of issues.
There is at least one advantage for flexible array members over zero length arrays when the compiler is clang.
struct Strukt1 {
int fam[];
int size;
};
struct Strukt2 {
int fam[0];
int size;
};
Here clang will error if it sees Strukt1 but won't error if it instead sees Strukt2. gcc and icc accept either without errors and msvc errors in either case. gcc does error if the code is compiled as C.
The same applies for this similar but less obvious example:
struct Strukt3 {
int size;
int fam[];
};
strukt Strukt4 {
Strukt3 s3;
int i;
};

How to fix heap corruption

I've tried to build a very minimalistic memory read library to read some unsigned ints out of it. However, I run into a "HEAP CORRUPTION DETECTED" error message when the ReadUnsignedInt method wants to return.
HEAP CORRUPTION DETECTED. CRT detected that the application wrote to memory after end of buffer.
As I have read, this may be the cause when trying to double delete something. This may be caused by some incorrect usage of the std::tr1::shared_ptr but I cannot determine what I am doing wrong with them. Code is as follows (error handling omitted):
unsigned int Memory::ReadUnsignedInt (unsigned int address) const {
std::tr1::shared_ptr<byte> bytes =
this->ReadBytes(address, sizeof(unsigned int));
return *((int*)bytes.get());
// correct value (how to improve this ugly piece of code?)
}
std::tr1::shared_ptr<byte> Memory::ReadBytes (
unsigned int address, int numberOfBytes) const
{
std::tr1::shared_ptr<byte> pBuffer(new byte(numberOfBytes));
ReadProcessMemory(m_hProcess.get(), (LPCVOID)address,
pBuffer.get(), numberOfBytes * sizeof(byte), NULL))
return pBuffer;
}
Michael and Naveen have both found the same major flaw in your code, but not the only flaw.
shared_ptr will delete the pointed-at object when its reference count goes to zero.
This means you can only give it objects allocated by new -- not new[].
You may wish to use shared_ptr<vector<byte> > or boost::shared_array<byte> instead.
The issue is:
new byte(numberOfBytes)
This allocates a single byte with a value of numberOfBytes.
You want to do:
new byte[numberOfBytes]
Which allocates an array of bytes numberOfBytes long.
But since you know you are only reading 4 bytes, why bother allocating memory at all? Just pass the address of a unsigned int on the stack.
The basic problems with your code have been pointed out already. Looking at it, I'm left wondering why you'd use a shared_ptr here at all, though. If I were doing it, I'd probably use something like this:
unsigned Memory::ReadUnsignedInt(unsigned address) {
unsigned ret;
ReadProcessMemory(m_hProcess.get(), (void *)address, &ret, sizeof(ret), NULL);
return ret;
}
std::vector<char> Memory::ReadBytes(unsigned address, int num) {
std::vector<char> ret(num);
ReadProcessMemory(m_hProcess.get(), (void *)address, &ret[0], num, NULL);
return ret;
}
Then again, instead of ReadUnsignedInt, I'd be tempted to use a template:
template <class T>
T Memory::Read(unsigned address) {
T ret;
ReadProcessMemory(m_hProcess.get(), (void*)address, &ret, sizeof(ret), NULL);
return ret;
}
Since you don't pass a parameter from which it can deduce the type for the template parameter, you'd always have to specify explicitly:
int x = Read<int>(wherever);
char a = Read<char>(wherever);
The alternative would be to pass the destination as a parameter:
template <class T>
Memory::Read(unsigned address, T &t) {
ReadProcessMemory(my_hProcess.get(), (void *)address, &t, sizeof(t), NULL);
};
which you'd use like:
Read(wherever, some_int);
Read(somewhere, some_long);
and so on.
If you're worried about the inefficiency of returning a vector of char, you probably shouldn't -- VC++ (like most other reasonably current compilers) has what's called a "named return value optimization", which means in a situation like this, it passes a hidden reference to the vector you assign the result to, and ReadBytes will use that to deposit the data directly where it's going to end up anyway. For that matter, with any reasonable optimization turned on at all, ReadBytes will almost certainly end up as an inline function, so nothing involved really gets "passed" or "returned" at all.
On the other hand, this code won't work particularly well with older compilers -- and with old enough compilers, the versions using member template functions probably won't even compile. As long as you use a reasonably current compiler, however, life should be good.
I believe new byte(numberOfBytes) should be new byte[numberOfBytes]. Otherwise you would have allocated only one byte. Just to complete the answer, as #ephemient indicated you can not use shared_ptr here as it will do a delete where as you should do delete[]. If not done like this, the behavior will be undefined.