I know this is strange but I'm just having fun.
I am attempting to transmit a std::map (instantiated using placement new in a fixed region of memory) between two processes via a socket between two machines: Master and Slave. The map I'm using has this typedef:
// A vector of Page objects
typedef
std::vector<Page*,
PageTableAllocator<Page*> >
PageVectorType;
// A mapping of binary 'ip address' to a PageVector
typedef
std::map<uint32_t,
PageVectorType*,
std::less<uint32_t>,
PageTableAllocator<std::pair<uint32_t, PageVectorType*> > >
PageTableType;
The PageTableAllocator<T> class is responsible for allocating whatever memory the STL containers may want/need into a fixed location in memory. E.g., all Page objects and STL internal structures are instantiated in this fixed memory region. This ensures that both the std::map object and the allocator are placed in a fixed region of memory. I've used GDB to make sure the map and allocator behave correctly (all memory used is in the fixed region, nothing ever goes on the application's normal heap).
Assuming Master starts up and initializes all of its STL structures and the special memory region, the following happens. Slave starts, prints out its version of the page table, then looks for a Master. Slave finds a Master, deletes its version of the page table, copies Master's version of the page table (and the special memory region), and successfully prints out Master's version of the page table. From what prodding I've done in GDB, I can perform many read-only operations.
When trying to add to the newly copied PageTableType object, Slave faults in the allocator's void construct (pointer p, const T& value) method. The value passed in as p points to an already allocated area of memory (as per Master's version of the std::map).
I don't know anything about C++ object structure, but I'm guessing that object state from Slave's version of the PageTableType must be hanging around even after I replace all of the memory that the PageTableType and its allocator used. My question is whether this is a valid concern. Does C++ maintain some sort of object state outside of the area of memory that object was instantiated in?
All of the objects used in the map are non-POD. Same is true for the allocator.
To answer your specific question:
Does C++ maintain some sort of object state outside of the area of memory that object was instantiated in?
The answer is no. There are no other data structures set up to "track" objects or anything of the sort. C++ uses an explicit memory allocation model, so if you choose to be responsible for allocation and deallocation, then you have complete control.
I suspect there's something wrong in your code somewhere, but since you believe the code is correct you're inventing some other reason why your code might be failing, and following that path instead. I would pull back, and carefully examine everything about the way your code is working right now, and see if you can determine the problem. Although the STL classes are complex (especially std::map), they're ultimately just code and there is no hidden magic in there.
I'm developing a game server for a video game called Tibia.
Basically, there can be up to millions of objects, of which there can be up to thousands of deletes and re-creations as players interact with the game world.
The thing is, the original creators used a Slot Map / Object Pool on which pointers are re-used when an object is removed. This is a huge performance boost since there's no need to do much memory reallocation unless needed.
And of course, I'm trying to accomplish that myself, but I've come into one huge problem with my Slot Map:
Here's a brief explanation of how a Slot Map works, according to a source I found online:
The Object class is the base class for every game object; my Slot Map / Object Pool uses this Object class to store every allocated object.
Example:
struct TObjectBlock
{
Object Object[36768];
};
The way the slot map works is that the server first allocates, say, 36768 objects in a list of TObjectBlock and gives each Object a unique ID, ObjectID, which can be put on a free object list and re-used when the server needs to create a new object.
Example:
Object 1 (ID: 555) is deleted, its ID 555 is put in a free object ID
list, an Item creation is requested, ID 555 is reused since it's on
the free object list, and there is no need to reallocate another
TObjectBlock in the array for further objects.
My problem: How can I use "Player" "Creature" "Item" "Tile" to support this Slot Map? I can't seem to come up with a solution to this logic problem.
I am using a base class to manage all objects:
struct Object
{
uint32_t ObjectID;
int32_t posx;
int32_t posy;
int32_t posz;
};
Then, I'd create the objects themselves:
struct Creature : Object
{
char Name[31];
};
struct Player : Creature
{
};
struct Item : Object
{
uint16_t Attack;
};
struct Tile : Object
{
};
But now if I was to make use of the slot map, I'd have to do something like this:
Object allocatedObject;
allocatedObject.ObjectID = CreateObject(); // Get a free object ID to use
if (allocatedObject.ObjectID != INVALIDOBJECT.ObjectID)
{
Creature* monster = new Creature();
// This doesn't make much sense, since I'd have this creature pointer floating around!
monster->ObjectID = allocatedObject.ObjectID;
}
It doesn't make much sense to allocate a whole new object just to give it the unique ID of an already allocated object.
What are my options with this logic?
I believe you have a lot of tangled concepts here, and you need to detangle them to make this work.
First, you are actually defeating the primary purpose of this model. What you showed smells badly of cargo cult programming. You should not be newing objects, at least without overloading, if you are serious about this. You should allocate a single large block of memory for a given object type and draw from that on "allocation" - be it from an overloaded new or creation via a memory manager class. That means you need separate blocks of memory for each object type, not a single "objects" block.
The whole idea is that if you want to avoid allocation-deallocation of actual memory, you need to reuse the memory. To construct an object, you need enough memory to fit it, and your types are not the same length. Only Tile in your example is the same size as Object, so only that could share the same memory (but it shouldn't). None of the other types can be placed in the objects memory because they are longer. You need separate pools for each type.
Second, there should be no bearing of the object ID on how things are stored. There cannot be, once you take the first point into consideration, if the IDs are shared and the memory is not. But it must be pointed out explicitly - the position in a memory block is largely arbitrary and the IDs are not.
Why? Let's say you take object 40, "delete" it, then create a new object 40. Now let's say some buggy part of the program referenced the original ID 40. It goes looking for the original 40, which should error, but instead finds the new 40. You just created an entirely untrackable error. While this can happen with pointers, it is far more likely to happen with IDs, because few systems impose checks on ID usage. A main reason for indirecting access with IDs is to make access safer by making it easy to catch bad usage, so by making IDs reusable, you make them just as unsafe as storing pointers.
The actual model for handling this should look like how the operating system does similar operations (see below the divide for more on that...). That is to say, follow a model like this:
Create some sort of array (like a vector) of the type you want to store - the actual type, not pointers to it. Not Object, which is a generic base, but something like Player.
Size that to the size you expect to need.
Create a stack of size_t (for indexes) and push into it every index in the array. If you created 10 objects, you push 0 1 2 3 4 5 6 7 8 9.
Every time you need an object, pop an index from the stack and use the memory in that cell of the array.
If you run out of indexes, increase the size of the vector and push the newly created indexes.
When you use objects, indirect via the index that was popped.
Essentially, you need a class to manage the memory.
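The steps above can be sketched roughly as follows. This is a minimal illustration, not the original server's design: the names IndexPool, acquire, release and at are hypothetical, and the index is only a storage slot, not the game's ObjectID (which, per the second point, should not dictate where things are stored).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool for one concrete type: objects live in one vector,
// free cells are tracked by a stack of indexes, and callers hold indexes.
template <typename T>
class IndexPool {
public:
    explicit IndexPool(std::size_t initialSize) { grow(initialSize); }

    // Pop a free index; enlarge the storage if none are left.
    std::size_t acquire() {
        if (freeIndexes_.empty()) {
            grow(cells_.empty() ? 1 : cells_.size());  // double the storage
        }
        std::size_t index = freeIndexes_.back();
        freeIndexes_.pop_back();
        cells_[index] = T{};  // reset the reused cell to a default state
        return index;
    }

    // Hand a cell back so that a later acquire() can reuse its memory.
    void release(std::size_t index) {
        assert(index < cells_.size());
        freeIndexes_.push_back(index);
    }

    // Always indirect through the index: references can dangle after growth.
    T& at(std::size_t index) { return cells_.at(index); }

    std::size_t capacity() const { return cells_.size(); }

private:
    void grow(std::size_t extra) {
        std::size_t oldSize = cells_.size();
        cells_.resize(oldSize + extra);
        // Push in reverse so that the lowest new index is handed out first.
        for (std::size_t i = oldSize + extra; i > oldSize; --i) {
            freeIndexes_.push_back(i - 1);
        }
    }

    std::vector<T> cells_;                  // the actual objects, not pointers
    std::vector<std::size_t> freeIndexes_;  // used as a stack
};

// Example concrete type; one pool per type, as described above.
struct Player {
    int posx = 0;
    int posy = 0;
};
```

A caller would create something like IndexPool<Player> players(1024), keep the index returned by acquire(), and reach the object through players.at(index) each time rather than caching a pointer.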
An alternative model would be to directly push pointers into a stack with matching pointer type. There are benefits to that, but it is also harder to debug. The primary benefit to that system is that it can easily be integrated into existing systems; however, most compilers do similar already...
That said, I suggest against this. It seems like a good idea on paper, and on very limited systems it is, but modern operating systems are not "limited systems" by that definition. Virtual memory already resolves the biggest reason to do this, memory fragmentation (which you did not mention). Many compiler allocators will attempt to more or less do what you are trying to do here in the standard library containers by drawing from memory pools, and those are far more manageable to use.
I once implemented a system just like this, but for many good reasons have ditched it in favor of a collection of unordered maps of pointers. I have plans to replace allocators if I discover performance or memory problems associated with this model. This lets me offset the concern of managing memory until testing/optimization, and doesn't require quirky system design at every level to handle abstraction.
When I say "quirky", believe me when I say that there are many more annoyances with the indirection-pool-stack design than I have listed.
I have two classes, similar to this:
class B; // forward declaration so that A can hold a B*
class A
{
public:
B* ptr1;
};
class B
{
public:
std::vector<A*> list;
};
In the main implementation, I'm doing something like this:
int main() {
// there are a lot more A objects than B objects, i.e. listOfA.size() >>> listOfB.size()
std::vector<A> listOfA;
std::vector<B> listOfB;
while (/* some loop condition */)
{
listOfB[jj].list.push_back( &(listOfA[ii]) );
listOfA[ii].ptr1 = &( listOfB[jj] );
}
} // int main end
Basically something like this. A lot of A objects are assigned to one B object, and these A objects are stored in that pointer vector as pointers. Additionally, each of these A objects gets a pointer to the B object it belongs to. For context, I'm basically doing a connected components algorithm with run-length encoding (for image segmentation), where class A are the line segments and class B are the final objects in the image.
So, the pointers of the vector in class B all point to objects which are stored in a regular vector. These objects should be deleted when the regular vector goes out of scope, right? I've read that a vector of pointers like the one in class B usually requires writing a manual destructor, but this shouldn't be the case here, I think...
The reason why I'm asking is, of course, because my code keeps crashing. I'm using an Asus Xtion Pro camera to get the images and then am performing the algorithm on every image. Weird thing is, the program crashes whenever I shake the camera a bit harder. When the camera is stationary or moved only a little or slowly, nothing happens. Also, when I use a different algorithm (also connected components, but without run-length-encoding and also doesn't use pointers), nothing crashes, no matter how much I shake the camera. Also, in Debug mode (which ran much slower than the Release mode), nothing crashed either.
I tried making a destructor for the pointer vector in class B, but it resulted in a "block is valid" error, so I guess it deleted something twice.
I also tried replacing every pointer with a C++11 std::shared_ptr, but that only produced very irregular behaviour and the code still crashed when I shook the camera.
I basically just want to know if in terms of memory leaking and pointer handling, the code shown above seems fine or if there are mistakes in the code which could lead to crashes.
EDIT (SOLVED): The solution (see the accepted answer) was to ensure that the vector 'listOfB' doesn't get resized during run-time, for example by using 'reserve()' to reserve enough space for it. After doing this, everything worked fine! Apparently it worked because, if the vector 'listOfB' gets resized (by push_back()), the internal memory addresses of the B instances in it also change, causing the pointers in the A instances (which point to B instances) to point to the wrong addresses, which led to the crash.
About the camera shaking, apparently, shaking the camera resulted in very blurry pictures with lot of elements to segment, thus increasing the number of objects (i.e., resulting in higher size required for listOfB). So, mystery solved! Thanks a lot! :-)
I think the design is broken. listOfB will grow (you do push_backs) and re-allocate its internal data array, invalidating all addresses stored in the ptr1 members of the A instances. A typical implementation grows the capacity by a constant factor (often 1.5 or 2), which may explain why you are fine for a while if not too much data arrives. Also, as long as the memory of the old data is still in the address space of the program (especially if it is on the same memory page, for example because the new data fits in it as well), the program may not crash when accessing it and may just retrieve old data.
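A small sketch of the effect, under the assumption that the vector uses the default allocator; the helper name firstElementStaysPut is made up for this illustration. It records the address of the first element as an integer, appends more elements, and reports whether the element is still at the same address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct B {
    int id;
};

// Hypothetical helper: returns true if the first element is still at the
// same address after `extra` more elements have been appended.
bool firstElementStaysPut(std::vector<B>& v, std::size_t extra) {
    assert(v.empty());
    v.push_back(B{0});
    // Keep the address as an integer so no dangling pointer is touched later.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(&v[0]);
    for (std::size_t i = 0; i < extra; ++i) {
        v.push_back(B{0});
    }
    return before == reinterpret_cast<std::uintptr_t>(&v[0]);
}
```

With a prior call to reserve() that covers all the elements, the function returns true, because no reallocation happens. Without it, a reallocation moves the elements to a new block, so any B* stored elsewhere would be left pointing at freed memory.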
On a more constructive note: Your solution will work if you know the maximum number of elements in advance, which may be hard (think you get a 4k camera next year ;-)). In that case you can, by the way, just take a simple static array anyway.
Perhaps you could also use a std::map to store A objects instead of a simple vector listOfA. Each A object would need a unique ID of some sort (static counter in A in the easiest case) to use as a key into the map. Bs would store keys, not addresses of As.
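As a rough illustration of storing keys rather than addresses, here is a variation that uses plain vector indexes as the keys instead of a std::map. The Segmentation wrapper and attach function are hypothetical names, and the member names differ from the question's ptr1 and list because they now hold indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct A {
    std::size_t owner = 0;  // index of the owning B in listOfB, not a B*
};

struct B {
    std::vector<std::size_t> members;  // indexes into listOfA, not A*
};

// Hypothetical wrapper that keeps both vectors together.
struct Segmentation {
    std::vector<A> listOfA;
    std::vector<B> listOfB;

    // Link segment `a` to object `b` using indexes only.
    void attach(std::size_t a, std::size_t b) {
        assert(a < listOfA.size() && b < listOfB.size());
        listOfB[b].members.push_back(a);
        listOfA[a].owner = b;
    }
};
```

Because an index stays meaningful after either vector reallocates, the links survive growth; the cost is one extra lookup (listOfB[a.owner]) on each access.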
Assuming you have not made a mistake in how you build your network, you should be fine. You would have to post more code to assess that. Also, you cannot keep using the stored pointers after you change the size of either vector, because if a vector reallocates its elements, all pointers to them are invalidated. But using raw pointers to managed objects is the correct way to build networks.
By managed objects I mean objects whose lifetime is guaranteed to last longer than the network and whose memory will be automatically released. So they should be elements of a container or objects managed by some kind of smart pointer.
However it sounds like you have a hardware problem.
I am trying to solve the following issue: having a custom data container that manages a generic type, I need to allow for other application components to retrieve the container's internal pointer and use it as if it were a simple T* array region (without treating it as a more intelligent array holder). The problem is that this memory is, in a very special case, moved somewhere else and erased. So there are a plethora of components that are aware of the old data pointer and will use that one to access their required information.
The setup looks, pseudo-codeish, something like this:
container<T>
{
T* ptr;
public:
ContainerInterfaceCode..
}
Hypothesis:
T* ptr is a pseudo-address (may I call it "virtual"?) which is mapped in a physical space A.
When an event is raised, T* ptr's mapping will be set to another physical space, B.
Any component that uses T* ptr is then oblivious of the change of physical location, "thinking" its data is stored at that virtual address.
Conclusion:
I would therefore like to know whether there is a mechanism involving memory mapping (virtual to physical) that would let me change the mapping of T* ptr while leaving other application components untouched. Simply put, T* ptr should point to a memory region that is mapped to a certain location, and, upon request, that same pointer should be remapped to another location (to which the underlying data is copied for consistency). This must allow seamless transitions.
Note: I can't use wrappers, smart pointers, handles, etc. for the simple fact that it means modifying a huge codebase just for one, rather minor, modification.
As I haven't found enough resources dealing with this scenario, can anyone, perhaps, present a short webography with some relevant reading material on the subject?
On Linux you can use shared memory. Shared memory is a mechanism that allows two processes to access the same area of memory; it is a kind of IPC method. You can find more details here: http://en.wikipedia.org/wiki/Shared_memory.
Of course I would like to know some magic fix to this but I am open to restructuring.
So I have a class DeviceDependent, with the following constructor
DeviceDependent(Device& device);
which stores a reference to the device. The device can change state, which will necessitate a change in all DeviceDependent instances dependent on that device. (You guessed it: this is my paltry attempt to ride the DirectX beast.)
To handle this I have the functions DeviceDependent::createDeviceResources(), DeviceDependent::onDeviceLost().
I planned to register each DeviceDependent instance with the device specified in the DeviceDependent constructor. The Device would keep a std::vector<DeviceDependent*> of all DeviceDependent instances so registered. It would then iterate through that vector and call the above functions when appropriate.
This seemed simple enough, but what I especially liked about it was that I could have a std::vector<DeviceDependent (or child)> somewhere else in the code and iterate over them quickly. For instance, I have a class Renderable which, as the name suggests, represents a renderable object. I need to iterate over these at least once a frame, and because of this I did not want the objects to be scattered throughout memory.
Down to business, here is the problem:
When I create the solid objects I rely on move semantics. This was purely by instinct; I did not consider copying large objects like these to add them to the std::vector<DeviceDependent (or child)> collection (and I still abhor the idea).
However, with move semantics (and I have tested this for those who don't believe it) the address of the object changes. What's more, it changes after the default constructor is called. That means my code inside the constructor of DeviceDependent calling device.registerDeviceDependent(this) compiles and runs fine, but the device accumulates a list of pointers which are invalidated as soon as the object is moved into the vector.
I want to know if there is some way I can stick to this plan and make it work.
Things I thought of:
Making the 'real' vector a collection of shared pointers, no issue copying. The object presumably will not change address. I don't like this plan because I am afraid that leaving things out on the heap will harm iteration performance.
Calling register after the object has been moved, it's what I'm doing provisionally but I don't like it because I feel the constructor is the proper place to do this. There should not exist an instance of DeviceDependent that is not on some device's manifest.
Writing my own move constructor or move assignment functions. This way I could remove the old address from the device and change it to the new one. I don't want to do this because I don't want to keep updating it as the class evolves.
This has nothing to do with move constructors. The issue is std::vector. When you add a new item to that vector, it may reallocate its memory, and that will cause all the DeviceDependent objects to be transferred to a new memory block internal to the vector. New versions of each item will then be constructed, and the old ones destroyed. Whether the construction is copy-construction or move-construction is irrelevant; the objects effectively change their address either way.
To make your code correct, DeviceDependent objects need to unregister themselves in their destructor, and register themselves in both copy- and move-constructors. You should do this regardless of what else you decide about storage, if you have not deleted those constructors. Otherwise those constructors, if called, will do the wrong thing.
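A minimal sketch of that registration discipline follows. The Device class here is a hypothetical stand-in that models only the registry, and the member function names are assumptions, not the asker's real interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class DeviceDependent;

// Hypothetical stand-in for the real Device: only the registry is modelled.
class Device {
public:
    void registerDependent(DeviceDependent* d) {
        assert(d != nullptr);
        dependents_.push_back(d);
    }
    void unregisterDependent(DeviceDependent* d) {
        dependents_.erase(std::remove(dependents_.begin(), dependents_.end(), d),
                          dependents_.end());
    }
    bool isRegistered(const DeviceDependent* d) const {
        return std::find(dependents_.begin(), dependents_.end(), d) != dependents_.end();
    }
    std::size_t count() const { return dependents_.size(); }

private:
    std::vector<DeviceDependent*> dependents_;
};

class DeviceDependent {
public:
    explicit DeviceDependent(Device& device) : device_(&device) {
        device_->registerDependent(this);
    }
    // Every new object, however it is created, registers its own address.
    DeviceDependent(const DeviceDependent& other) : device_(other.device_) {
        device_->registerDependent(this);
    }
    DeviceDependent(DeviceDependent&& other) : device_(other.device_) {
        device_->registerDependent(this);
    }
    // Assignment keeps the address, so re-register only if the device changes.
    DeviceDependent& operator=(const DeviceDependent& other) {
        if (device_ != other.device_) {
            device_->unregisterDependent(this);
            device_ = other.device_;
            device_->registerDependent(this);
        }
        return *this;
    }
    // Every object removes its own address when it dies.
    ~DeviceDependent() { device_->unregisterDependent(this); }

private:
    Device* device_;
};
```

With this in place, a std::vector<DeviceDependent> can reallocate freely: each transferred element registers its new address, and each old element unregisters as it is destroyed. The Device must outlive its dependents, and the linear-time unregister is a simplification that a real registry might replace.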
One approach not on your list would be to prevent the vector reallocating by calling reserve() with the maximum number of items you will store. This is only practical if you know a reasonable upper bound on the number of DeviceDependent objects. However, you may find that reserving an estimate, while not eliminating the vector reallocations entirely, makes them rare enough that the cost of un-registering and re-registering becomes insignificant.
It sounds like your goal is getting good cache locality for the DeviceDependent objects. You might find that using a std::deque as main storage avoids the re-allocations while still giving enough locality. Or you could gain locality by writing a custom allocator or operator new().
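To illustrate the std::deque suggestion: appending at either end of a deque does not move the existing elements, so addresses taken at insertion time stay valid. The sketch below uses a placeholder Renderable type and a made-up helper name.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Placeholder type standing in for a real renderable object.
struct Renderable {
    int id;
};

// Hypothetical helper: appends `count` elements and records the address
// each element had at the moment it was inserted.
std::vector<const Renderable*> fillAndRecord(std::deque<Renderable>& storage, int count) {
    assert(count >= 0);
    std::vector<const Renderable*> addresses;
    for (int i = 0; i < count; ++i) {
        storage.push_back(Renderable{i});
        addresses.push_back(&storage.back());
    }
    return addresses;
}
```

After any number of push_back calls, addresses[i] still equals &storage[i]. Note that inserting or erasing in the middle of a deque does invalidate references, so this only helps for containers that grow at the ends.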
As an aside, it sounds like your design is being driven by performance costs that you are only guessing at. If you actually measure it, you might find that using std::vector<std::unique_ptr<DeviceDependent>> is fine, and doesn't significantly increase the time it takes to iterate over them. (Note you don't need shared pointers here, since the vector is the only owner, so you can avoid the overheads of reference-counting.)
In the application we have something about 30 types of objects that are created repeatedly.
Some of them have long life (hours) some have short (milliseconds).
Objects could be created in one thread and destroyed in another.
Does anybody have any clue what could be a good pooling technique in the sense of minimal creation/destruction latency, low lock contention and reasonable memory utilization?
Append 1.
1.1. Object pool/memory allocations for one type usually is not related to another type (see 1.3 for an exception)
1.2. Memory allocation is performed for only one type (class) at a time, usually for several objects at a time.
1.3. If a type aggregates another type using a pointer (for some reason), these types are allocated together in one contiguous piece of memory.
Append 2.
2.1. Using a collection with access serialization per type is known to be worse than new/delete.
2.2. Application is used on different platforms/compilers and cannot use compiler/platform specific tricks.
Append 3.
It becomes obvious that the fastest (lowest-latency) implementation should organize object pooling as a star-like network of factories, where the central factory is global and the other factories are thread-specific. Regular object provision/recycling is more effective in a thread-specific factory, while the central factory could be used for balancing objects between threads.
3.1. What is the most effective way to organize communications between the central factory and thread specific factories?
I assume you have profiled and measured your code after doing all that creation, and verified that create/destroy is actually causing an issue. If not, that is what you should do first.
If you still want to do the object pooling, as a first step you should ensure your objects are stateless, because that is the prerequisite for reusing an object. Similarly, you should ensure that the members of the object, and the object itself, have no issue with being used from a thread other than the one which created it (COM STA objects, window handles, etc.).
If you use Windows and COM, one way to use system-provided pooling would be to write free-threaded objects and enable object pooling, which will make the COM+ runtime (earlier known as MTS) do this for you. If you use some other platform like Java, perhaps you could use application servers that define interfaces that your objects should implement, and the server could do the pooling for you.
Or you could roll your own code. But you should try to find out whether there is an existing pattern for this and, if so, use that instead of what follows below.
If you need to roll your own code, create a dynamically growable collection which tracks the objects already created. Preferably use a vector for the collection, since you would only be adding to the collection and it would be easy to traverse it searching for a free object (assuming you do not delete objects in the pool). Change the collection type according to your delete policies (for example, a vector of pointers or references to objects if you are using C++, so that you can delete an object and recreate it at the same location).
Each object should be tracked using a flag which can be read in a volatile manner and changed using an interlock function to mark it as being used/ not used.
If all objects are used, you need to create a new object and add it to the collection. Before adding, you can acquire a lock (critical section), mark the new object as being used and exit the lock.
Measure and proceed - probably if you implemented the above collection as a class you could easily create different collections for different object types so as to reduce lock contention from threads that do different work.
Finally you could implement an overloaded class factory interface that can create all kinds of pooled objects and knows which collection holds which class
You could then optimize on this design from there.
Hope that helps.
To minimize construct/destruct latency, you need fully constructed objects at hand, so you will eliminate the new/ctor/dtor/delete time. These "free" objects can be kept in a list so you just pop/push the element at the end.
You may lock the object pools (one for each type) one by one. It is a bit more efficient than a system-wide lock, but does not have the overhead of a by-object locking.
If you haven't looked at tcmalloc, you might want to take a look. Basing your implementation off of its concepts might be a good start. Key points:
Determine a set of size classes. (Each allocation will be fulfilled by using an entry from an equal or greater sized allocation.)
Use one size-class per page. (All instances in a page are the same size.)
Use per-thread freelists to avoid atomic operations on every alloc/dealloc
When a per-thread freelist is too large, move some of the instances back to the central freelist. Try to move back allocations from the same page.
When a per-thread freelist is empty, take some from the central freelist. Try to take contiguous entries.
Important: You probably know this, but make sure your design will minimize false sharing.
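The per-thread and central freelist points above could be sketched as follows. This is only an illustration of the idea for a single size class, not how tcmalloc itself is implemented; CentralFreeList, ThreadCache, and the batch parameter are hypothetical names, and each thread is assumed to own one ThreadCache object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Shared list of free blocks for one size class; every access takes the lock.
class CentralFreeList {
public:
    explicit CentralFreeList(std::size_t blockSize) : blockSize_(blockSize) {}

    // Move `count` free blocks into `out`, allocating new ones when needed.
    void takeBatch(std::vector<void*>& out, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < count; ++i) {
            if (free_.empty()) {
                owned_.push_back(std::unique_ptr<char[]>(new char[blockSize_]));
                free_.push_back(owned_.back().get());
            }
            out.push_back(free_.back());
            free_.pop_back();
        }
    }

    // Move up to `count` blocks from `from` back into the central list.
    void giveBack(std::vector<void*>& from, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < count && !from.empty(); ++i) {
            free_.push_back(from.back());
            from.pop_back();
        }
    }

    std::size_t totalBlocks() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return owned_.size();
    }
    std::size_t freeBlocks() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    const std::size_t blockSize_;
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<char[]>> owned_;  // owns every block ever made
    std::vector<void*> free_;
};

// One instance per thread: the common path touches no lock or atomic.
class ThreadCache {
public:
    ThreadCache(CentralFreeList& central, std::size_t batch)
        : central_(central), batch_(batch) { assert(batch > 0); }
    ThreadCache(const ThreadCache&) = delete;
    ThreadCache& operator=(const ThreadCache&) = delete;
    ~ThreadCache() { central_.giveBack(local_, local_.size()); }

    void* allocate() {
        if (local_.empty()) central_.takeBatch(local_, batch_);  // refill
        void* block = local_.back();
        local_.pop_back();
        return block;
    }

    void deallocate(void* block) {
        local_.push_back(block);
        // Too many cached blocks: return one batch to the central list.
        if (local_.size() > 2 * batch_) central_.giveBack(local_, batch_);
    }

private:
    CentralFreeList& central_;
    std::size_t batch_;
    std::vector<void*> local_;
};
```

A block may be freed through a different thread's cache than the one that allocated it, which matches the requirement that objects can be destroyed on another thread. The sketch ignores page grouping, alignment beyond what new char[] provides, and false sharing between adjacent blocks.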
Additional things you can do that tcmalloc can't:
Try to enable locality of reference by using finer-grained allocation pools. For example, if a few thousand objects will be accessed together, then it is best if they are close together in memory (to minimize cache misses and TLB faults). If you allocate these instances from their own threadcache, then they should have fairly good locality.
If you know in advance which instances will be long-lived and which will not, then allocate them from separate thread caches. If you do not know, then periodically copy the old instances using a threadcache for allocation and update old references to the new instances.
If you have some guess of the preferred size of the pool, you can create a fixed-size pool using a stack built on an array (the fastest possible solution). Then you need to implement four phases of the object lifetime: hard initialization (and memory allocation), soft initialization, soft cleanup, and hard cleanup (and memory release). Now in pseudo code:
Object* ObjectPool::AcquireObject()
{
Object* object = 0;
lock( _stackLock );
if( _stackIndex )
object = _stack[ --_stackIndex ];
unlock( _stackLock );
if( !object )
object = HardInit();
SoftInit( object );
return object;
}
void ObjectPool::ReleaseObject(Object* object)
{
SoftCleanup( object );
lock( _stackLock );
if( _stackIndex < _maxSize )
{
_stack[ _stackIndex++ ] = object; // store the object back on the stack
unlock( _stackLock );
}
else
{
unlock( _stackLock );
HardCleanup( object );
}
}
The HardInit/HardCleanup methods perform full object initialization and destruction, and they are executed only if the pool is empty or if the freed object cannot fit in the pool because it is full. SoftInit performs soft initialization of objects: it initializes only those aspects of an object that can have changed since it was released. The SoftCleanup method frees resources used by the object which should be freed as fast as possible, or those resources which can become invalid while the object resides in the pool. As you can see, locking is minimal: only two lines of code (or only a few instructions).
These four methods can be implemented in separate (template) classes so you can implement fine tuned operations per object type or usage. Also you may consider using smart pointers to automatically return object to its pool when it is no longer needed.
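The smart-pointer idea can be sketched like this, assuming a bounded stack of idle objects and a std::unique_ptr with a custom deleter that hands the object back. All names are hypothetical; the soft init and soft cleanup phases are left out for brevity, and hard init is simply default construction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

template <typename T>
class ObjectPool {
public:
    // Deleter that returns the object to its pool instead of deleting it.
    struct Returner {
        ObjectPool* pool;
        void operator()(T* object) const { pool->release(object); }
    };
    using Handle = std::unique_ptr<T, Returner>;

    explicit ObjectPool(std::size_t maxSize) : maxSize_(maxSize) {
        assert(maxSize > 0);
        stack_.reserve(maxSize);  // release() then never has to allocate
    }

    // The pool must outlive every Handle it has given out.
    Handle acquire() {
        std::unique_ptr<T> object;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!stack_.empty()) {
                object = std::move(stack_.back());
                stack_.pop_back();
            }
        }
        if (!object) object.reset(new T());  // "hard init" outside the lock
        return Handle(object.release(), Returner{this});
    }

    std::size_t idleCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return stack_.size();
    }

private:
    void release(T* raw) {
        std::unique_ptr<T> object(raw);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stack_.size() < maxSize_) {
                stack_.push_back(std::move(object));
                return;
            }
        }
        // Pool is full: `object` is destroyed here ("hard cleanup"),
        // after the lock has been released.
    }

    const std::size_t maxSize_;
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<T>> stack_;  // idle, fully constructed objects
};

// Example pooled type, purely for illustration.
struct Connection {
    int uses = 0;
};
```

A caller writes auto c = pool.acquire(); and uses c like any pointer; when c goes out of scope the object returns to the pool, or is destroyed if the pool is already full.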
Have you tried the hoard allocator? It provides better performance than the default allocator on many systems.
Why do you have multiple threads destroying objects they did not create? It's a simple way to handle object lifetime, but the costs can vary widely depending on use.
Anyways, if you haven't started implementing this yet, at the very least you can put the create/destroy functionality behind an interface so that you can test/change/optimize this at a later date when you have more information about what your system actually does.