I am using a C library in my C++ code. The library wants me to allocate some amount of memory and pass the pointer to the library. Unfortunately, the exact required memory size is not known in advance, so the library also requires me to provide a C callback with the following signature:
void* callback_realloc(void* ptr, size_t new_size);
where ptr is the previously passed memory and new_size is the required size; the callback must return a pointer to the newly allocated memory. There is no direct way to store the allocator state, so instead I need to rely on pointer arithmetic, somewhat like the following:
template<class T>
struct o_s {
    std::aligned_storage_t<sizeof(T), alignof(T)> data;
};

template<class Alloc>
struct o_i : private Alloc {
    std::size_t allocated_size;
    const Alloc& get_allocator() const { return *this; }
};

template<class T, class Alloc>
struct o : public o_i<Alloc>, public o_s<T> {
    void* ptr() {
        return &this->data;
    }
    // Additionally, override class operator new and operator delete...
};

void* my_callback(void* ptr, size_t new_size) {
    auto meta = static_cast<o<char, Alloc>*>(reinterpret_cast<o_s<char>*>(ptr));
    // access to the allocator state ...
}
Then sizeof(o_i<Alloc>) + initial_size bytes are allocated and ptr() is passed to the C library.
At this point, I understand that I am not the first person in the world who needs this pattern. Unfortunately (and surprisingly) I have not found anything suitable for this in Boost or the STL. I would like to use a ready-made implementation to avoid possible hidden pitfalls.
The simplest solution is to allocate the memory using std::malloc and use std::realloc as the callback. C APIs such as this one are exactly the case where using those functions makes sense in C++.
You don't necessarily have to use std::malloc however. You can implement a custom allocator if you want to. Using some of the allocated storage is one way of storing allocation metadata, and it's an efficient way. That's not necessary either though, since you can also store the metadata separately in a map-like structure.
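For illustration, a minimal sketch of that route; the library registration call in the comments is hypothetical and only shows where the pieces go:
#include <cstdlib>
// Matches the signature the library expects.
void* callback_realloc(void* ptr, size_t new_size)
{
    // std::realloc behaves like std::malloc when ptr is null and
    // preserves the existing contents on success.
    return std::realloc(ptr, new_size);
}
// Hypothetical usage:
// void* initial = std::malloc(initial_size);
// some_library_init(initial, initial_size, &callback_realloc);
// ...
// std::free(final_ptr); // whatever pointer the library hands back at the end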
I don't think you'd find an idiomatic way to do this in C++, as this is a C idiom.
Conceptually, you have two ways of going about this:
Store the metadata alongside the buffer you've allocated (as you seem to be doing). I feel the dual inheritance etc. is a bit overkill here, when you could just allocate a char buffer of sizeof(size_t) + allocation_size and use the first part for your metadata (a rough sketch follows at the end of this answer).
Allocate and return to the API a raw buffer as needed, and use a separate static data structure to manage this with a map of ptr->allocation size. I suppose this is what malloc/realloc is doing behind the scenes anyway.
Both are valid, and the one you choose depends on the specific details of your application. When dealing with memory it's OK to actually deal with memory, be it pointer arithmetic or whatnot.
The important thing is probably to keep the ugly bit to a single location, and provide a C++ style API to this on the C++ side of things, so that the client code isn't exposed to implementation details.
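As a rough illustration of the first option, here is a minimal sketch; it assumes the library only ever hands the callback pointers produced this way, and the names are made up:
#include <cstdlib>
struct header {
    std::size_t allocated_size;
    // any other allocator state you need to carry around
};
// The user region starts at offset sizeof(header); pad the header if the
// library needs stricter alignment than that offset provides.
void* my_alloc(std::size_t size)
{
    auto* h = static_cast<header*>(std::malloc(sizeof(header) + size));
    if (!h) return nullptr;
    h->allocated_size = size;
    return h + 1;                                 // pointer handed to the C library
}
void* my_callback(void* ptr, std::size_t new_size)
{
    header* old = static_cast<header*>(ptr) - 1;  // step back to the metadata
    auto* h = static_cast<header*>(std::realloc(old, sizeof(header) + new_size));
    if (!h) return nullptr;
    h->allocated_size = new_size;
    return h + 1;
}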
Related
I'm implementing a super simple container for long-term memory management, and the container will hold an array internally.
I was wondering, what are the actual implications of those two approaches below?
template<class T, size_t C>
class Container
{
public:
T objects[C];
};
And:
template<class T>
class Container
{
public:
Container(size_t cap)
{
this->objects = new T[cap];
}
~Container()
{
delete[] this->objects;
}
T* objects;
};
Keep in mind that those are minimal examples and I'm not taking into account things like storing the capacity, the virtual size, etc.
If the size of the container is known at compile time, as in the first example, you are better off using std::array. For instance:
template<class T, size_t C>
class Container
{
public:
std::array<T, C> objects;
};
This has important advantages:
You can get access to its element via std::get, which automatically checks that the access is within bounds, at compile time.
You have iterators for Container::objects, so you can use all the routines of the algorithm library.
The second example has some important drawbacks:
You cannot enforce bounds checking when accessing the elements: this can potentially lead to bugs.
What happens if new in the constructor throws? You have to manage this case properly.
You need a suitable copy constructor and assignment operators.
You need a virtual destructor unless you are sure that nobody derives from the class, see here.
You can avoid all these problems by using a std::vector.
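For completeness, a minimal sketch of the dynamically sized variant built on std::vector:
#include <cstddef>
#include <vector>
template<class T>
class Container
{
public:
    explicit Container(std::size_t cap) : objects(cap) {}
    std::vector<T> objects;   // allocation, copying, and cleanup are handled for you
};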
In addition to @francesco's answer:
First example
In your first example, your Container holds a C-style array. If an instance of the Container is created on the stack, the array will be on the stack as well. You might want to read heap vs stack (or similar). So, allocating on the stack can have advantages, but you have to be careful with the size you give to the array (size_t C) in order to avoid a stack overflow.
You should consider using std::array<T,C>.
Second example
Here you hold a pointer to T which points to a C-style array allocated on the heap (it doesn't matter whether you allocate the Container instance itself on the stack or on the heap). In this case, you don't need to know the size at compile time, which has obvious advantages in many situations. Also, you can use much larger sizes, since the array no longer has to fit on the stack.
You should consider using std::vector<T>.
Further research
For further research, read on stack vs heap allocation/performance, std::vector and std::array.
In my code I use buffers currently allocated this way:
char* buf1 = (char*)malloc(size);
However at some points in the code I want to reassign the pointer to some place else in memory. The problem is that there are other places in the code that still need to be able to access the pointer buf1.
What's the best way to do this in C++? Right now I am considering writing a struct with a single char* in it, then allocating an object of this struct type, passing it to the places that need it, and referring to the wrapped pointer to get the current value of buf1.
However it seems that this is similar to what unique_ptr does. If I use unique_ptr how can I wrap a char* with it? I had some trouble with testing this and I'm not sure it's supported.
To clarify: these buffers are bytes of varying sizes.
In general, this question cannot be answered. There are simply way too many things you might want to do with an array of char. Without knowing what it actually is that you want to do, it's impossible to say what may be good abstractions to use…
If you want to do stuff with strings, just use std::string. If you want a dynamically-sized buffer that can grow and shrink, use std::vector.
If you just need a byte buffer the size of which is determined at runtime or which you'd just generally want to live in dynamic storage, I'd go with std::unique_ptr. While std::unique_ptr<T> is just for single objects, the partial specialization std::unique_ptr<T[]> can be used for dealing with dynamically allocated arrays. For example:
auto buffer = std::unique_ptr<char[]> { new char[size] };
Typically, the recommended way to create an object via new and get an std::unique_ptr to it would be to use std::make_unique. And if you want your buffer zero-initialized, you should indeed use std::make_unique<char[]>(size): std::make_unique<T[]>() will value-initialize the elements of the array it creates, which in the case of a char array effectively means the array will be zero-initialized. In my experience, compilers are, unfortunately, unable to optimize away the zero-initialization, even if the entire buffer would be overwritten first thing right after being created. So if you want an uninitialized buffer for the sake of avoiding the overhead of initialization, you can't use std::make_unique. Ideally, you'd just define your own function to create a default-initialized array via new and get an std::unique_ptr to it, for example:
template <typename T>
inline std::enable_if_t<std::is_array_v<T> && (std::extent_v<T> == 0), std::unique_ptr<T>> make_unique_default(std::size_t size)
{
return std::unique_ptr<T> { new std::remove_extent_t<T>[size] };
}
and then
auto buffer = make_unique_default<char[]>(size);
C++20 includes this functionality in the form of std::make_unique_for_overwrite (earlier drafts called it std::make_unique_default_init), so that would be the preferred method there.
Note that, if you're dealing with a plain std::unique_ptr, you will still have to pass around the size of the buffer separately. You may want to bundle up an std::unique_ptr and an std::size_t if you're planning to pass around the buffer:
template <typename T>
struct buffer_t
{
std::unique_ptr<T[]> data;
std::size_t size;
};
Note that a struct like the above represents ownership of the buffer. So you'd want to use this, e.g., when returning a new buffer from a factory function:
buffer_t<char> makeMeABuffer();
or handing off ownership of the buffer to someone else, e.g.,
DataSink(buffer_t&& buffer)
You would not want to use it just to point some function at the buffer data and size so it can do some processing without transferring ownership. For that, you'd just pass a pointer and size, or, e.g., use a span (starting, again, with C++20; also available as part of the GSL)…
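Putting it together, a small usage sketch; the factory body and the processing function here are made up for illustration:
#include <cstddef>
#include <memory>
template <typename T>
struct buffer_t
{
    std::unique_ptr<T[]> data;
    std::size_t size;
};
buffer_t<char> makeMeABuffer(std::size_t size)
{
    return { std::unique_ptr<char[]>{ new char[size] }, size };
}
// Non-owning access: just take pointer + size.
void processBuffer(char* data, std::size_t size)
{
    // ... fill or read the buffer ...
    (void)data; (void)size;
}
int main()
{
    auto buffer = makeMeABuffer(4096);
    processBuffer(buffer.data.get(), buffer.size);  // ownership stays with 'buffer'
}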
I'm working on a program that stores a vital data structure as an unstructured string with program-defined delimiters (so we need to walk the string and extract the information we need as we go) and we'd like to convert it to a more structured data type.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself. The length of the string will always be known at allocation time. We've determined through testing that doubling the number of allocations required for each of these data types is an unacceptable cost. Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation? If we were using C strings I'd just have a char * in the struct and point it to the end of the struct after allocating a block big enough for the struct and string, but we'd prefer std::string if possible.
Most of my experience is with C, so please forgive any C++ ignorance displayed here.
If you have such rigorous memory needs, then you're going to have to abandon std::string.
The best alternative is to find or write an implementation of basic_string_ref (a proposal for the next C++ standard library at the time; it eventually became std::string_view), which is really just a char* coupled with a size. But it has all of the (non-mutating) functions of std::basic_string. Then you use a factory function to allocate the memory you need (your struct size + string data), and then use placement new to construct your object, giving it a basic_string_ref that refers to the string data.
Of course, you'll also need a custom deletion function, since you can't just pass the pointer to "delete".
Given the previously linked implementation of basic_string_ref (and its associated typedef, string_ref), here's a factory constructor/destructor for some type T that needs a string stored alongside it:
template<typename T> T *Create(..., const char *theString, size_t lenstr)
{
    char *memory = new char[sizeof(T) + lenstr + 1];
    memcpy(memory + sizeof(T), theString, lenstr);
    memory[sizeof(T) + lenstr] = '\0';   // the +1 above is for this terminator

    try
    {
        return new(memory) T(..., string_ref(memory + sizeof(T), lenstr));
    }
    catch(...)
    {
        delete[] memory;
        throw;
    }
}
template<typename T> T *Create(..., const std::string & theString)
{
return Create(..., theString.c_str(), theString.length());
}
template<typename T> T *Create(..., const string_ref &theString)
{
return Create(..., theString.data(), theString.length());
}
template<typename T> void Destroy(T *pValue)
{
pValue->~T();
char *memory = reinterpret_cast<char*>(pValue);
delete[] memory;
}
Obviously, you'll need to fill in the other constructor parameters yourself. And your type's constructor will need to take a string_ref that refers to the string.
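For instance, a hypothetical type used with those helpers might look like this (the names are invented for illustration):
enum class Kind { Command, Payload };
struct Record
{
    Kind kind;
    string_ref text;   // refers to the characters placed right after the Record
    Record(Kind k, string_ref s) : kind(k), text(s) {}
};
// Record* r = Create<Record>(Kind::Payload, "hello world", 11);
// ... use r->kind and r->text ...
// Destroy(r);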
If you are using std::string, you can't really do one allocation for both the structure and the string, and you also can't make the allocations for both into one large block. If you are using old C-style strings it's possible, though.
If I understand you correctly, you are saying that through profiling you have determined that having to allocate the string separately from your data structure imposes an unacceptable cost on your application.
If that's indeed the case I can think of a couple solutions.
You could pre-allocate all of these structures up front, before your program starts. Keep them in some kind of fixed collection so they aren't copy-constructed, and reserve enough capacity in your strings to hold your data.
Controversial as it may seem, you could use old C-style char arrays. It seems like you are forgoing much of the reason to use strings in the first place, which is the memory management. However, in your case, since you know the needed buffer sizes at start-up, you could handle this yourself. If you like the other facilities that string provides, bear in mind that much of that is still available in <algorithm>.
Take a look at Variable Sized Struct C++ - the short answer is that there's no way to do it in vanilla C++.
Do you really need to allocate the container structs on the heap? It might be more efficient to have those on the stack, so they don't need to be allocated at all.
Indeed two allocations can seem too high. There are two ways to cut them down though:
Do a single allocation
Do a single dynamic allocation
It might not seem so different, so let me explain.
1. You can use the struct hack in C++
Yes this is not typical C++
Yes this requires special care
Technically it requires:
disabling the copy constructor and assignment operator
making the constructor and destructor private and providing factory methods for allocating and deallocating the object
Honestly, this is the hard way.
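A rough sketch of what that could look like, glossing over alignment and exception safety, just to make the shape of the hard way concrete:
#include <cstddef>
#include <cstring>
#include <new>
enum class Kind { A, B };
struct M
{
    Kind kind;
    std::size_t length;
    char data[1];                       // struct hack: real length chosen at allocation

    static M* create(Kind k, const char* s, std::size_t len)
    {
        void* raw = ::operator new(sizeof(M) + len);  // one block for header + characters
        M* m = new (raw) M();                         // create() may use the private ctor
        m->kind = k;
        m->length = len;
        std::memcpy(m->data, s, len);
        m->data[len] = '\0';
        return m;
    }
    static void destroy(M* m)
    {
        m->~M();
        ::operator delete(m);
    }
private:
    M() = default;
    M(const M&) = delete;
    M& operator=(const M&) = delete;
};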
2. You can avoid allocating the outer struct dynamically
Simple enough:
struct M {
Kind _kind;
std::string _data;
};
and then pass instances of M on the stack. Move operations should guarantee that the std::string is not copied (you can always disable copy to make sure of it).
This solution is much simpler. The only (slight) drawback is in memory locality... but on the other hand the top of the stack is already in the CPU cache anyway.
C-style strings can always be converted to std::string as needed. In fact, there's a good chance that your observations from profiling are due to fragmentation of your data rather than simply the number of allocations, and creating an std::string on demand will be efficient. Of course, not knowing your actual application this is just a guess, and really one can't know until it's tested anyway. I imagine a class like:
class my_class {
public:
    std::string data() const { return std::string(_data); }
    const char* data_as_c_str() const // In case you really need it!
    { return _data; }
private:
    int _type;
    char _data[1];
};
Note I used a standard clever C trick for data layout: _data is as long as you want it to be, so long as your factory function allocates the extra space for it. IIRC, C99 even gave a special syntax for it:
struct my_struct {
int type;
char data[];
};
which has good odds of working with your C++ compiler. (Is this in the C++11 standard?)
Of course, if you do do this, you really need to make all of the constructors private and friend your factory function, to ensure that the factory function is the only way to actually instantiate my_class -- it would be broken without the extra memory for the array. You'll definitely need to make operator= private too, or otherwise implement it carefully.
Rethinking your data types is probably a good idea.
For example, one thing you can do is, rather than trying to put your char arrays into a structured data type, use a smart reference instead. A class that looks like
class structured_data_reference {
public:
structured_data_reference(const char *data):_data(data) {}
std::string get_first_field() const {
// Do something interesting with _data to get the first field
}
private:
const char *_data;
};
You'll want to do the right thing with the other constructors and assignment operator too (probably disable assignment, and implement something reasonable for move and copy). And you may want reference counted pointers (e.g. std::shared_ptr) throughout your code rather than bare pointers.
Another hack that's possible is to just use std::string, but store the type information in the first entry (or first several). This requires accounting for that whenever you access the data, of course.
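A tiny sketch of that last hack, with made-up helper names:
#include <string>
enum Kind : char { KindA = 1, KindB = 2 };
std::string make_record(Kind k, const std::string& payload)
{
    std::string s;
    s.reserve(1 + payload.size());
    s.push_back(static_cast<char>(k));   // first byte carries the type tag
    s.append(payload);
    return s;                            // one std::string holds tag + data
}
Kind record_kind(const std::string& s) { return static_cast<Kind>(s[0]); }
const char* record_payload(const std::string& s) { return s.c_str() + 1; }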
I'm not sure if this exactly addresses your problem. One way you can optimize memory allocation in C++ is by using a pre-allocated buffer and then constructing objects in it with the 'placement new' operator.
I tried to solve your problem as I understood it.
#include <new>
#include <string>

unsigned char *myPool = new unsigned char[10000];

struct myStruct
{
    myStruct(const char* aSource1, const char* aSource2)
    {
        // construct the strings inside the pre-allocated pool, right after the struct
        // (the offsets happen to be suitably aligned here; in general take care of that)
        original = new (myPool + sizeof(myStruct)) std::string(aSource1);
        data = new (myPool + sizeof(myStruct) + sizeof(std::string)) std::string(aSource2);
    }
    ~myStruct()
    {
        original->~basic_string(); // placement new still requires explicit destructor calls
        data->~basic_string();     // (the strings may own heap memory of their own)
    }
    std::string* original;
    std::string* data;
};

int main()
{
    myStruct* aStruct = new (myPool) myStruct("h1", "h2");
    // Use the struct
    aStruct->~myStruct(); // destroy explicitly; the pool itself is freed below
    delete [] myPool;
    return 0;
}
[Edit] After the comment from NicolBolas, the problem is a bit clearer. I decided to write one more answer, even though in reality it is not much more advantageous than using a raw character array. But I still believe that this is well within the stated constraints.
The idea would be to provide a custom allocator for the string class, as specified in this SO question.
In the implementation of the allocate method, use placement new as follows:
pointer allocate(size_type n, void * = 0)
{
// fail if we try to allocate too much
if ((n * sizeof(T)) > max_size()) { throw std::bad_alloc(); }
//T* t = static_cast<T *>(::operator new(n * sizeof(T)));
T* t = new (/* provide the address of the original character buffer*/) T[n];
return t;
}
The constraint is that, for the placement new to work, the original character buffer's address must be known to the allocator at run time. This can be achieved by setting it explicitly from outside before the new string member is created. However, this is not so elegant.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself.
I have a feeling that maybe you are not exploiting C++'s type system to its maximum potential here. It looks and feels very C-ish (that is not a proper word, I know). I don't have concrete examples to post here since I don't have any idea about the problem you are trying to solve.
Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?
I believe that you are worrying about the structure allocation followed by a copy of the string into the structure member? This ideally shouldn't happen (but of course, this depends on how and when you are initializing the members). C++11 supports move construction. This should take care of any extra string copies that you are worried about.
You should really, really post some code to make this discussion worthwhile :)
a vital data structure as an unstructured string with program-defined delimiters
One question: Is this string mutable? If not, you can use a slightly different data-structure. Don't store copies of parts of this vital data structure but rather indices/iterators to this string which point to the delimiters.
// assume that !, [, ], $, % etc. are your program defined delims
const std::string vital = "!id[thisisdata]$[moredata]%[controlblock]%";
// define a special struct
enum Type { ... };
struct Info {
size_t start, end;
Type type;
// define appropriate ctors
};
// parse the string and return Info objects
std::vector<Info> parse(const std::string& str) {
std::vector<Info> v;
// loop through the string looking for delims
for (size_t b = 0, e = str.size(); b < e; ++b) {
// on hitting one such delim create an Info
switch( str[ b ] ) {
case '%':
...
case '$':
// initializing the start and then move until
// you get the appropriate end delim
}
// use push_back/emplace_back to insert this newly
// created Info object back in the vector
v.push_back( Info( start, end, kind ) );
}
return v;
}
I am writing a performance-critical application in which I am creating a large number of objects of similar type to place orders. I am using boost::singleton_pool for allocating memory. Finally, my class looks like this.
class MyOrder{
std::vector<int> v1_;
std::vector<double> v2_;
std::string s1_;
std::string s2_;
public:
MyOrder(const std::string &s1, const std::string &s2): s1_(s1), s2_(s2) {}
~MyOrder(){}
static void * operator new(size_t size);
static void operator delete(void * rawMemory) throw();
static void operator delete(void * rawMemory, std::size_t size) throw();
};
struct MyOrderTag{};
typedef boost::singleton_pool<MyOrderTag, sizeof(MyOrder)> MyOrderPool;
void* MyOrder:: operator new(size_t size)
{
if (size != sizeof(MyOrder))
return ::operator new(size);
while(true){
void * ptr = MyOrderPool::malloc();
if (ptr != NULL) return ptr;
std::new_handler globalNewHandler = std::set_new_handler(0);
std::set_new_handler(globalNewHandler);
if(globalNewHandler) globalNewHandler();
else throw std::bad_alloc();
}
}
void MyOrder::operator delete(void * rawMemory) throw()
{
if(rawMemory == 0) return;
MyOrderPool::free(rawMemory);
}
void MyOrder::operator delete(void * rawMemory, std::size_t size) throw()
{
if(rawMemory == 0) return;
if(size != sizeof(MyOrder)) {
::operator delete(rawMemory);
}
MyOrderPool::free(rawMemory);
}
I recently posted a question about the performance benefit of using boost::singleton_pool. When I compared the performance of boost::singleton_pool and the default allocator, I did not see any performance benefit. When someone pointed out that my class had members of type std::string, whose allocation was not being governed by my custom allocator, I removed the std::string variables and reran the tests. This time I noticed a considerable performance boost.
Now, in my actual application, I cannot get rid of member variables of type std::string and std::vector. Should I be using boost::pool_allocator with my std::string and std::vector member variables?
boost::pool_allocator allocates memory from an underlying boost::singleton_pool. Will it matter if different member variables use the same memory pool? (I have more than one std::string/std::vector member in my MyOrder class, and I am also employing pools for classes other than MyOrder which contain std::string/std::vector members.) If it does, how do I make sure that they do, one way or the other?
Now, in my actual application, I cannot get rid of member variables of type std::string and std::vector. Should I be using boost::pool_allocator with my std::string and std::vector member variables?
I have never looked into that part of boost, but if you want to change where strings allocate their memory, you need to pass a different allocator to std::basic_string<> at compile time. There is no other way. However, you need to be aware of the downsides of that: For example, such strings will not be assignable to std::string anymore. (Although employing c_str() would work, it might impose a small performance penalty.)
boost::pool_allocator allocates memory from an underlying boost::singleton_pool. Will it matter if different member variables use the same memory pool? (I have more than one std::string/std::vector member in my MyOrder class, and I am also employing pools for classes other than MyOrder which contain std::string/std::vector members.) If it does, how do I make sure that they do, one way or the other?
The whole point of a pool is to put more than one object into it. If it was just one, you wouldn't need a pool. So, yes, you can put several objects into it, including the dynamic memory of several std::string objects.
Whether this gets you any performance gains, however, remains to be seen. You use a pool because you have reason to assume that it is faster than the general-purpose allocator (rather than using it to, e.g., allocate memory from a specific area, like shared memory). Usually such a pool is faster because it can make assumptions about the size of the objects allocated within it. That's certainly true for your MyOrder class: objects of it always have the same size, and anything else (larger derived classes) is not allocated from the pool anyway.
That's different for std::string. The whole point of using a dynamically allocating string class is that it adapts to any string length. The memory chunks needed for that are of differing sizes (otherwise you could just use char arrays instead). I see little room for a pool allocator to improve over the general-purpose allocator there.
On a side note: Your overloaded operator new() returns the result of invoking the global one, but your operator delete just passes anything coming its way to that pool's free(). That seems very suspicious to me.
Using a custom allocator for the std::string/std::vector in your class would work (assuming the allocator is correct) - but only performance testing will see if you really see any benefits from it.
Alternatively, if you know that the std::string/std::vector will have upper limits, you could implement a thin wrapper around a std::array (or a normal array if you don't have C++11) that makes it a drop-in replacement.
Even if the size is unbounded, if there is some size that most values would be smaller than, you could extend the std::array-based implementation above to be expandable, allocating with your pooled allocator only when it fills up.
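If you do go that route, usage looks roughly like this; this is a sketch based on my reading of the Boost.Pool documentation, so measure before trusting it:
#include <boost/pool/pool_alloc.hpp>
#include <string>
#include <vector>
// vector elements come from a singleton pool behind boost::pool_allocator
using pooled_int_vector = std::vector<int, boost::pool_allocator<int>>;
// the allocator is baked into the string's type; note this is a distinct type
// from std::string and is not assignable to it
using pooled_string = std::basic_string<char, std::char_traits<char>,
                                        boost::pool_allocator<char>>;
class MyOrder {
    pooled_int_vector v1_;
    pooled_string s1_;
    // ...
};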
I encounter a confused question when I go through the source code of firebreath (src/ScriptingCore/Variant.h)
// function pointer table
struct fxn_ptr_table {
const std::type_info& (*get_type)();
void (*static_delete)(void**);
void (*clone)(void* const*, void**);
void (*move)(void* const*,void**);
bool (*less)(void* const*, void* const*);
};
// static functions for small value-types
template<bool is_small>
struct fxns
{
template<typename T>
struct type {
static const std::type_info& get_type() {
return typeid(T);
}
static void static_delete(void** x) {
reinterpret_cast<T*>(x)->~T();
}
static void clone(void* const* src, void** dest) {
new(dest) T(*reinterpret_cast<T const*>(src));
}
static void move(void* const* src, void** dest) {
reinterpret_cast<T*>(dest)->~T();
*reinterpret_cast<T*>(dest) = *reinterpret_cast<T const*>(src);
}
static bool lessthan(void* const* left, void* const* right) {
T l(*reinterpret_cast<T const*>(left));
T r(*reinterpret_cast<T const*>(right));
return l < r;
}
};
};
// static functions for big value-types (bigger than a void*)
template<>
struct fxns<false>
{
template<typename T>
struct type {
static const std::type_info& get_type() {
return typeid(T);
}
static void static_delete(void** x) {
delete(*reinterpret_cast<T**>(x));
}
static void clone(void* const* src, void** dest) {
*dest = new T(**reinterpret_cast<T* const*>(src));
}
static void move(void* const* src, void** dest) {
(*reinterpret_cast<T**>(dest))->~T();
**reinterpret_cast<T**>(dest) = **reinterpret_cast<T* const*>(src);
}
static bool lessthan(void* const* left, void* const* right) {
return **reinterpret_cast<T* const*>(left) < **reinterpret_cast<T* const*>(right);
}
};
};
template<typename T>
struct get_table
{
static const bool is_small = sizeof(T) <= sizeof(void*);
static fxn_ptr_table* get()
{
static fxn_ptr_table static_table = {
fxns<is_small>::template type<T>::get_type
, fxns<is_small>::template type<T>::static_delete
, fxns<is_small>::template type<T>::clone
, fxns<is_small>::template type<T>::move
, fxns<is_small>::template type<T>::lessthan
};
return &static_table;
}
};
The question is why the implementation of the static functions for big value-types (bigger than a void*) is different from the one for small value-types.
For example, static_delete for a small value-type just invokes the destructor on the T instance, while for a big value-type it uses delete.
Is there some trick? Thanks in advance.
It looks like Firebreath uses a dedicated memory pool for its small objects, while large objects are allocated normally in the heap. Hence the different behaviour. Notice the placement new in clone() for small objects, for instance: this creates the new object in a specified memory location without allocating it. When you create an object using placement new, you must explicitly call the destructor on it before deallocating memory, and this is what static_delete() does.
Memory is not actually deallocated because, as I say, it looks like a dedicated memory pool is in use. Memory management must be performed somewhere else. This kind of memory pool is a common optimisation for small objects.
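To make the distinction concrete, here is a rough illustration of the two storage strategies; this is just the idea, not FireBreath's actual code:
#include <new>
#include <string>
void* storage;   // the single pointer-sized slot a variant/any-style class keeps
// "Small" type (fits in a void*): the value itself lives inside the slot.
void store_small(int v)
{
    new (&storage) int(v);   // placement new into the slot; nothing heap-allocated
    // later: run the destructor (a no-op for int); there is nothing to free
}
// "Large" type: the slot holds a pointer to a heap-allocated object.
void store_large(const std::string& v)
{
    storage = new std::string(v);   // ordinary new
    // later: delete static_cast<std::string*>(storage);  // destructor + deallocation
}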
What does the internal documentation say? If the author hasn't documented it, he probably doesn't know himself.
Judging from the code, the interface to small objects is different from that to large objects; the pointer you pass for a small object is a pointer to the object itself, whereas the one you pass for a large object is a pointer to a pointer to the object.
The author, however, doesn't seem to know C++ very well (and I would avoid using any code like this). For example, in move, he explicitly destructs the object, then assigns to it: this is guaranteed undefined behavior, and probably won't work reliably for anything but the simplest built-in types. Also, the distinction small vs. large objects is largely irrelevant; some “small” objects can be quite expensive to copy. And of course, given that everything here is a template anyway, there's absolutely no reason to use void* for anything.
I have edited your question to include a link to the original source file, since obviously most of those answering here have not read it to see what is actually going on. I admit that this is probably one of the most confusing pieces of code in FireBreath; at the time, I was trying to avoid using boost and this has worked really well.
Since then I've considered switching to boost::any (for those itching to suggest it, no, boost::variant wouldn't work and I'm not going to explain why here; ask another question if you really care) but we have customized this class a fair amount to make it exactly what we need and boost::any would be difficult to customize in a similar manner. More than anything, we've been following the old axiom: if it ain't broke, don't fix it!
First of all, you should know that several C++ experts have gone over this code; yes, it uses some practices that many consider dubious, but they are very carefully considered and they are consistent and reliable on the compilers supported by FireBreath. We have done extensive testing with valgrind, visual leak detector, LeakFinder, and Rational Purify and have never found any leaks in this code. It is more than a bit confusing; it's amazing to me that people who don't understand code assume the author doesn't know C++. In this case, Christopher Diggins (who wrote the code you quoted and the original cdiggins::any class that this is taken from) seems to know C++ extremely well as evidenced by the fact that he was able to write this code. The code is used internally and is highly optimized -- perhaps more than FireBreath needs, in fact. However, it has served us well.
I will try to explain the answer to your question as best I remember; keep in mind that I don't have a lot of time and it's been awhile since I really dug in deep with this. The main reason for "small" types using a different static class is that "small" types are pretty much built-in types; an int, a char, a long, etc. Anything bigger than void* is assumed to be an object of some sort. This is an optimization to allow it to reuse memory whenever possible rather than deleting and reallocating it.
If you look at the code side-by-side it's a lot clearer. If you look at delete and clone you'll see that on "large" objects it's dynamically allocating the memory; it calls "delete" in delete and in clone it uses a regular "new". In the "small" version it just stores the memory internally and reuses it; it never "delete"s the memory, it just calls the destructor or the constructor of the correct type on the memory that it has internally. Again, this is just done for the sake of efficiency. In move on both types it calls the destructor of the old object and then assigns the new object data.
The object itself is stored as a void* because we don't actually know what type the object will be; to get the object back out you have to specify the type, in fact. This is part of what allows the container to hold absolutely any type of data. That is the reason there are so many reinterpret_cast calls there -- many people see that and say "oh, no! The author must be clueless!" However, when you have a void* that you need to dereference, that's exactly the operator that you would use.
Anyway, all of that said, cdiggins has actually put out a new version of his any class this year; I'll need to take a look at it and probably will try to pull it in to replace the current one. The trick is that I have customized the current one (primarily to add a comparison operator so it can be put in a STL container and to add convert_cast) so I need to make sure I understand the new version well enough to do that safely.
Hope that helps; the article I got it from is here: http://www.codeproject.com/KB/cpp/dynamic_typing.aspx
Note that the article has been updated and it doesn't seem to be possible to get to the old one with the original anymore.
EDIT
Since I wrote this we have confirmed some issues with the old variant class and it has been updated and replaced with one that utilizes boost::any. Thanks to dougma for most of the work on this. FireBreath 1.7 (current master branch as of the time of this writing) contains that fix.