I have managed to use this bullet physics struct, by it seems that the data it takes must be kept in memory.
Is there any reason for that? Am I using it right?
Bullet Physics library prefer to passes pointers to objects back and forth. When you created your BT structure, and pass it to some other BT function, then it will not "copy" the data locally to itself. It will just grab a pointer to it, then act on it, then that's it.
This is due to performance reasons: Copying such large data structures to local variables would be a waste of memory, and cpu cycles.
To answer the your last two questions:
Yes, I think you use it right. Keep your instantiated BT structures around until they are not needed anymore (e.g.: rigid body is removed from world, or application exit).
Related
One of our projects deals with tons of data. It selects data from an database and serializes the results into JSON/XML.
Sometimes the amount of selected rows can reach the 50 million mark easily.
However though, the runtime of the program was to bad in the beginning.
So we have refactored the program with one major adjustment:
The working objects for serialization wouldn't be recreated for every single row, instead the object will be cleared and reinitialized.
For example:
Before:
For every single database row we create an object of DatabaseRowSerializer and call the specific serialize function.
// Loop with all dbRows
{
DatabaseRowSerializer serializer(dbRow);
result.add(serializer.toXml());
}
After:
The constructor of DatabaseRowSerializer doesn't sets the dbRow. Instead this will be done by the initDbRow()-function.
The main thing here is, that only one object will be used for the whole runtime. After the serialization of an dbRow, the clear()-function
will be called to reset the object.
DatabaseRowSerializer serializer;
// Loop with all dbRows
{
serializier.initDbRow(dbRow);
result.add(serializer.toXml());
serializier.clear();
}
So my question:
Is this really a good way to handle the problem?
In my opinion init()-functions aren't really smart. And normally a constructor should be used to initialize the possible parameters.
Which way do you generally prefer? Before or after?
On the one hand, this is subjective. On the other, opinion widely agrees that in C++ you should avoid this "init function" idiom because:
It is worse code
You have to remember to "initialise" your object and, if you don't, what state is it in? Your object should never be in a "dead" state. (Don't get me started on "moved-from" objects…) This is why C++ introduced constructors and destructors, because the old C approach was kind of minging and resulting programs are harder to prove correct.
It is unnecessary
There is essentially no overhead in creating a DatabaseRowSerializer every time, unless its constructor does more than your initDbRow function, in which case your two examples are not equivalent anyway.
Even if your compiler doesn't optimise away the unnecessary "allocation", there isn't really an allocation anyway because the object just takes up space on the stack and it has to do that regardless.
So if this change really solved your performance problem, something else was probably going on.
Use your constructors and destructors. Freely and proudly!
That's the common advice when writing C++.
A possible third approach if you did want to make the serializer re-usable for whatever reason, is to move all of its state into the actual operational function call:
DatabaseRowSerializer serializer;
// loop with all dbRows
{
result.add(serializer.toXml(dbRow));
}
You might do this if the serialiser has some desire to cache information, or re-use dynamically-allocated buffers, to aid in performance. That of course adds some state into the serialiser.
If you do this and still don't have any state, then the whole thing can just be a static call:
// loop with all dbRows
{
result.add(DatabaseRowSerializer::toXml(dbRow));
}
…but then it may as well just be a function.
Ultimately we can't know exactly what's best for you, but there are plenty of options and considerations.
Generally I agree with the points raised by LRiO in the other answer.
Just moving the constructor out of the loop isn't a good idea.
However, for this style of loop body:
feed object some data
transform data within object
return transformed data from object
It is, IMHO, often the case that the transforming object will allocate some buffers (on the heap) that potentially can be reused when the second form with the init function is used. In naive implementations, this reuse may not even be deliberate, just a side effect of the implementation.
So, IFF you're seeing a speed up by your refactoring (hoisting the object constructor out of the loop), it may be because the object is now able to re-use some buffers and avoid repeated "redundant" heap allocations for these buffers.
So, in summary:
You do not want the constructor to be hoisted out of the loop for its own sake. But you want all buffers that can be preserved to be preserved across the loop iterations.
First off, let me get of my chest the fact that I'm a greenhorn trying to do things the right way which means I get into a contradiction about what is the right way every now and then.
I am modifying a driver for a peripheral which contains a function - lets call it Send(). In the function I have a timestamp variable so the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per milisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
It automatically yields the creation of data-section as part of the executable image of any application that links to your driver (which is something that the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization; a change to a small part of your code with the intent to make it more efficient. If the question to the above question is "no" as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, with is usually position independent, global variables are accessed indirectly with GOT table unless IP-relative operations is available (i.e. x86_64, ARM, etc)
In case of GOT, you can think it as an extra indirect pointer.
However, even with an extra pointer it won't make any observable difference if it's "only" called in mill-second frequency.
Often times I read in literature explaining that one of the use case of C++ pointers is when one has big objects to deal with, but how large should an object be to need a pointer when being manipulated? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of the said byte would be more efficient in terms of memory usage.
EDIT: to answer jogojapan remarks, here is my opinion on memory efficiency
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be dimensionned at the start, if the application is expected to handle considerable amounts of data. In a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
As I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC where memory requirements are most of the time neglectible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object at different places. In fact you can even use references for the same but pointers give you the added advantage of being able to refer different objects while references keep referring the same object.
On a second thought maybe you are referring to objects created on freestore using new etc and then referring them through pointers. There is no definitive rule for that but in general you can do so when:
Object being created is too large to be accommodated on stack or
You want to increase the lifetime of the object beyond the scope etc.
There is no such limitation or guideline. You will have to decide it.
Assume class definition below. Size is 100 ints = 400 bytes.
class test
{
private:
int m_nVar[100];
};
When you use following function definition(passed by value), copy constructor will get called (even if you don't provide one). So copying of 100 ints will happen which will obviously take some time to finish
void passing_to_function(test a);
When you change definition of function to reference or pointer, there is no such copying will happen. Just transfer of test* (only pointer size)
void passing_to_function(test& a);
So you obviously have advantage by passing by ref or passing by ptr than passing by value!
My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo>> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data.
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. this way you will have full control over this rather than using the shared_ptr's one which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.
I'm writing a program that for performance reasons uses shared memory (sockets and pipes as alternatives have been evaluated, and they are not fast enough for my task, generally speaking any IPC method that involves copies is too slow). In the shared memory region I am writing many structs of a fixed size. There is one program responsible for writing the structs into shared memory, and many clients that read from it. However, there is one member of each struct that clients need to write to (a reference count, which they will update atomically). All of the other members should be read only to the clients.
Because clients need to change that one member, they can't map the shared memory region as read only. But they shouldn't be tinkering with the other members either, and since these programs are written in C++, memory corruption is possible. Ideally, it should be as difficult as possible for one client to crash another. I'm only worried about buggy clients, not malicious ones, so imperfect solutions are allowed.
I can try to stop clients from overwriting by declaring the members in the header they use as const, but that won't prevent memory corruption (buffer overflows, bad casts, etc.) from overwriting. I can insert canaries, but then I have to constantly pay the cost of checking them.
Instead of storing the reference count member directly, I could store a pointer to the actual data in a separate mapped write only page, while keeping the structs in read only mapped pages. This will work, the OS will force my application to crash if I try to write to the pointed to data, but indirect storage can be undesirable when trying to write lock free algorithms, because needing to follow another level of indirection can change whether something can be done atomically.
Is there any way to mark smaller areas of memory such that writing them will cause your app to blow up? Some platforms have hardware watchpoints, and maybe I could activate one of those with inline assembly, but I'd be limited to only 4 at a time on 32-bit x86 and each one could only cover part of the struct because they're limited to 4 bytes. It'd also make my program painful to debug ;)
Edit: I found this rather eye popping paper, but unfortunately it requires using ECC memory and a modified Linux kernel.
I don't think its possible to make a few bits read only like that at the OS level.
One thing that occurred to me just now is that you could put the reference counts in a different page like you suggested. If the structs are a common size, and are all in sequential memory locations you could use pointer arithmetic to locate a reference count from the structures pointer, rather than having a pointer within the structure. This might be better than having a pointer for your use case.
long *refCountersBase;//The start address of the ref counters page
MyStruct *structsBase;//The start address of your structures page
//get address to reference counter
long *getRefCounter(MyStruct *myStruct )
{
size_t n = myStruct - structsBase;
long *ref = refCountersBase + n;
return ref;
}
You would need to add a signal handler for SIGSEGV which recovers from the exception, but only for certain addresses. A starting point might be http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html and the corresponding documentation for your OS.
Edit: I believe what you want is to perform the write and return if the write address is actually OK, and tail-call the previous exception handler (the pointer you get when you install your exception handler) if you want to propagate the exception. I'm not experienced in these things though.
I have never heard of enforcing read-only at less than a page granularity, so you might be out of luck in that direction unless you can put each struct on two pages. If you can afford two pages per struct you can put the ref count on one of the pages and make the other read-only.
You could write an API rather than just use headers. Forcing clients to use the API would remove most corruption issues.
Keeping the data with the reference count rather than on a different page will help with locality of data and so improve cache performance.
You need to consider that a reader may have a problem and fail to properly update its ref count. Also that the writer may fail to complete an update. Coping with these things requires extra checks. You can combine such checks with the API. It may be worth experimenting to measure the performance implications of some kind of integrity checking. It may be fast enough to keep a checksum, something as simple as adler32.