Why this performance deterioration? - c++

I need to reduce the memory used by my native Windows C++ application without compromising its performance.
My main data structure consists of several thousand dynamically allocated instances of the following Line class:
struct Properties
{
// sizeof(Properties) == 28
};
// Version 1
class Line
{
virtual void parse(xml_node* node, const Data& data)
{
parse_internal(node, data);
create();
}
virtual void parse_internal(xml_node*, const Data&);
void create();
Properties p;
};
But then I noticed that I could get rid of the class member p, because I only need it within the parse method, so I changed the Line implementation:
// Version 2
class Line
{
virtual void parse(xml_node* node, const Data& data)
{
Properties p;
parse_internal(node, data, &p);
create(&p);
}
virtual void parse_internal(xml_node*, const Data&, Properties*);
void create(Properties*);
};
This reduced the allocated memory by several megabytes, but it increased the elapsed time by more than 50 milliseconds.
I wonder how this is possible, considering that the application was compiled as a release build with speed optimization fully on. Is it due to the argument passing? Is it due to the stack allocation of my struct Properties?
Update:
The method Line::parse is called just once for each instance. The data structure is a std::vector of Lines. Several threads each manage a different subset of this vector.

You write that parse_internal is recursive. That means it gets 3 arguments in the changed variant, instead of 2 in the original, and is called recursively a few times.
You also have to access the members through a pointer instead of directly (and possibly verify that the Properties pointer is non-null). To eliminate the pointer issues, you can pass the Properties to parse_internal by reference.
Is there a reason to have parse_internal as a virtual member function, or could you change it to be static (in the modified variant)?
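A minimal sketch of the variant suggested above, with a reference argument and a static, non-virtual parse_internal. The contents of Properties, the stub types xml_node and Data, the accessor created_from and all function bodies are assumptions made for illustration, since the question does not show them. Whether this recovers the lost time can only be settled by measuring.

```cpp
// Hypothetical stand-ins for the types used in the question.
struct xml_node {};
struct Data {};
struct Properties { int values[7]; };  // 7 ints, 28 bytes where int is 4 bytes

class Line
{
public:
    Line() : m_first(0) {}
    virtual ~Line() {}
    virtual void parse(xml_node* node, const Data& data)
    {
        Properties p = Properties();    // exists only for the duration of parse()
        parse_internal(node, data, p);  // passed by reference, so no null check
        create(p);
    }
    int created_from() const { return m_first; }  // hypothetical accessor
private:
    // Static and non-virtual: no vtable lookup and no hidden this argument.
    static void parse_internal(xml_node*, const Data&, Properties& p)
    {
        p.values[0] = 42;               // placeholder for the real parsing
    }
    void create(const Properties& p) { m_first = p.values[0]; }
    int m_first;
};
```

Making parse_internal static only works if no derived class needs to override it, which is the question raised above.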

Related

SystemC/TLM (C++) sharing memory pool; static members, static methods, Singleton or?

Context:
I am writing a specific communication protocol to be used between TLM models (HW blocks described with SystemC and thus C++).
TLM notion is not important, just note that this communication is mimicked by allocating objects, the generic payloads (gps), that are passed between these C++ models of HW blocks.
Aim:
Together with the protocol, I want to provide a memory manager that should be able to efficiently handle the gps; this is quite important since in one simulation lots of gps are constructed, used and destroyed and this can slow down things a lot.
My goal is also to create something simple that others can use without effort.
Issues:
The first issue I had was in creating a single shared pool for all the blocks communicating with that protocol. I thought about creating a static member in the mm class, but then I realized that:
Static members require a definition in the cpp. This makes the mm class less intuitive to use (with different people using this, some will forget to do so) and I would prefer to avoid that.
Depending on where (and in which cpp file?) the static variable definition is placed, the pool might not yet have the parameters it needs to be initialized (i.e., the number of mm instances created).
The second issue is similar to the first one. I want to count the number of instances and thus instead of a pool I need to create a shared counter to be used then by the pool to initialize itself. Again, I wanted to avoid static variable definitions in a cpp file and to guarantee the order of initialization.
I have considered mainly:
static members (discarded for the reasons above)
Singletons (discarded because I don't need to create a whole class for the pool to make it visible by others and single-instanced)
static methods (the approach I finally picked, which is not far from a complete Singleton)
This is the code I produced (only relevant part included):
/**
* Helper class to count another class' number of instances.
*/
class counter {
public:
// Constructor
counter() : count(0) {}
//Destructor
virtual ~counter() {}
private:
unsigned int count;
public:
unsigned int get_count() {return count;}
void incr_count() {count++;}
void decr_count() {count--;}
};
template <unsigned int MAX = 1>
class mm: public tlm::tlm_mm_interface {
//////////////////////////////TYPEDEFS AND ENUMS/////////////////////////////
public:
typedef tlm::tlm_generic_payload gp_t;
///////////////////////////CLASS (CON/DE)STRUCTOR////////////////////////////
public:
// Constructor
mm() {inst_count().incr_count();}
// Copy constructor
mm(const mm&) {inst_count().incr_count();}
// Destructor
virtual ~mm() {} // no need to decrease instance count in our case
////////////////////////////////CLASS METHODS////////////////////////////////
public:
// Counter for number of instances.
static counter& inst_count() {
static counter cnt;
return cnt;
}
/* This pattern makes sure that:
-- 1. The pool is created only when the first alloc appears
-- 2. All instances of mm have been already created (known instance sequence)
-- 3. Only one pool exists */
static boost::object_pool<gp_t>& get_pool() {
static boost::object_pool<gp_t> p(
mm<MAX>::inst_count().get_count() * MAX / 2, // creation size
mm<MAX>::inst_count().get_count() * MAX // max size used
);
return p;
}
// Allocate
virtual gp_t* allocate() {
//...
return gp;
}
// Free the generic payload and data_ptr
virtual void free(gp_t* gp) {
//...
get_pool().destroy(gp);
}
};
Now, the initiator block class header should have a member:
mm<> m_mm; // empty angle brackets select the default MAX
And the initiator block class cpp should use this like:
tlm_generic_payload* gp;
gp = m_mm.allocate();
//...
m_mm.free(gp); // In truth this is called by gp->release()...
// ...not important here
Having an electronic HW background, I am mainly trying to improve coding style, learn new approaches and optimize speed/memory allocation.
Is there a better way to achieve this? In particular considering my doubts:
It seems to me a suboptimal workaround to encapsulate the counter in a class, put it locally (but static) in a static method and then do the same for the pool.
Even though the SystemC "simulation kernel" is single-threaded, I need to consider a multithreaded case. I am not sure that the relationship between those two static methods is safe, even though independently they should be safe: with C++03, g++ adds code to guarantee it, and with C++11:
§6.7 [stmt.dcl] p4 If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
Thanks in advance.
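The ordering concern between the two static methods can be illustrated with a small sketch. FakePool and Manager are hypothetical stand-ins for boost::object_pool and mm, and the factor of 4 slots per instance is arbitrary. The sketch only shows that the pool reads the counter once, at the first call to get_pool(), so instances created afterwards do not affect its size.

```cpp
#include <cstddef>

// Hypothetical stand-in for boost::object_pool: it only remembers its capacity.
struct FakePool {
    explicit FakePool(std::size_t cap) : capacity(cap) {}
    std::size_t capacity;
};

class Manager {
public:
    Manager() { ++inst_count(); }
    Manager(const Manager&) { ++inst_count(); }

    // Shared counter, created on first use.
    static std::size_t& inst_count() {
        static std::size_t count = 0;
        return count;
    }
    // Shared pool, created on the first call; the counter is read only then.
    static FakePool& get_pool() {
        static FakePool pool(inst_count() * 4);  // 4 slots per manager (arbitrary)
        return pool;
    }
};
```

If every manager is constructed before the first allocation, the pool sees the full count; a manager constructed later is counted but no longer changes the pool size.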

What is the proper way to handle a large number of interface implementations?

For one of my current projects I have an interface defined for which I have a large number of implementations. You could think of it as a plugin interface with many plugins.
These "plugins" each handle a different message type in a network protocol.
So when I get a new message, I loop through a list of my plugins, see who can handle it, and call into them via the interface.
The issue I am struggling with is how to allocate, initialize, and "load" all the implementations into my array/vector/whatever.
Currently I am declaring all of the "plugins" in main(), then calling "plugin_manager.add_plugin(&plugin);" for each one. This seems less than ideal.
So, the actual questions:
1. Is there a standardized approach to this sort of thing?
2. Is there any way to define an array (global?) pre-loaded with the plugins?
3. Am I going about this the wrong way entirely? Are there other (better?) architecture options for this sort of problem?
Thanks.
EDIT:
This compiles (please excuse the ugly code)... but it kind of seems like a hack.
On the other hand, it solves the issue of allocation, and cleans up main()... Is this a valid solution?
class intf
{
public:
virtual void t() = 0;
};
class test : public intf
{
public:
test(){}
static test* inst(){ if(!_inst) _inst = new test; return _inst; }
static test* _inst;
void t(){}
};
test* test::_inst = NULL;
intf* ints[] =
{
test::inst(),
NULL
};
Store some form of smart pointer in a container. Dynamically allocate the plugins and register them in the container so that they can be used later.
One possible approach for your solution would be, if you have some form of message id that the plugin can decode, to use a map from that id to the plugin that handles that. This approach allows you to have fast lookup of the plugin given the input message.
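A minimal sketch of the two suggestions above, assuming each message type has an integer id. MessageHandler, PingHandler, PluginManager and dispatch are hypothetical names, not part of the question's code; add_plugin is borrowed from the question but given a different signature here.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical plugin interface: each handler processes one message type.
class MessageHandler {
public:
    virtual ~MessageHandler() {}
    virtual std::string handle(const std::string& payload) = 0;
};

class PingHandler : public MessageHandler {
public:
    std::string handle(const std::string& payload) override {
        return "pong:" + payload;
    }
};

class PluginManager {
public:
    // Takes ownership of the handler and registers it under a message id.
    void add_plugin(int message_id, std::unique_ptr<MessageHandler> handler) {
        handlers_[message_id] = std::move(handler);
    }
    // Finds the handler for the id; returns an empty string if none is registered.
    std::string dispatch(int message_id, const std::string& payload) {
        auto it = handlers_.find(message_id);
        if (it == handlers_.end()) return std::string();
        return it->second->handle(payload);
    }
private:
    std::map<int, std::unique_ptr<MessageHandler>> handlers_;
};
```

The map owns the plugins, so they are destroyed together with the manager, and a lookup replaces the loop over all plugins.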
One way of writing less code would be to use templates for the instantiation function. Then you only need to write one and put it in the interface, instead of having one function per implementation class.
class intf
{
public:
virtual void t() = 0;
template<class T>
static T* inst()
{
static T instance;
return &instance;
}
};
class test : public intf { ... };
intf* ints[] =
{
intf::inst<test>(),
NULL
};
The above code also works around two bugs you have in your code. One is a memory leak: in your old inst() function you allocate but never free. The other is that the constructor sets the static member to NULL.
Another tip is to read more about the "singleton" pattern, which is what you have. It can be useful in some situations, but is generally advised against.

Organizing static data in C++

I'm working on some embedded software where there is some static information about "products". Since the information for a certain product never changes during execution I would like to initialize these data structures at compile time to save some space on the stack/heap.
I made a Product class for the data, intending to make a huge array of all the products in the system and then do lookups in this structure, but I haven't quite figured out how to get it working. The arrays are giving me loads of trouble. Some pseudo code:
class Product {
int m_price;
int m_availability[]; // invalid, need to set a size
... etc
// Constructor grabbing values for all members
Product(int p, int a[], ...);
}
static const Product products[] =
{
Product(99, {52,30,63, 49}, ...), // invalid syntax
...
}
Is there a way to make something like this work? The only thing I can think of would be to organize by attribute and skip the whole Product object. I feel that would make the whole thing harder to understand and maintain though.
Does anyone have any suggestions on how I might best organize this kind of data?
Thank you.
An old-school C-style static array of structs sounds like a perfect match for your requirements. It is initialized at compile time, with zero runtime overhead and no use of stack or heap. It's no coincidence that C is still a major player in the embedded world.
So, here is one recipe (with plenty of scope to change the details):
// in .h file
class Product {
public: // putting this first means the class is really a struct
int m_price;
int m_availability[4];
//.... (more)
};
extern const Product product_array[];
extern const int product_array_nbr;
// in .cpp file
const Product product_array[] =
{
{
23,
{56,1,2,4},
//....(more)
},
{
24,
{65,1,2,4},
//....(more)
},
//....(more)
};
const int product_array_nbr = sizeof(product_array)/sizeof(product_array[0]);
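A lookup over such a table can then be a plain loop. The following sketch repeats a reduced version of the table so that it stands alone; find_by_price is a hypothetical helper and the data values are made up.

```cpp
#include <cstddef>

struct Product {
    int m_price;
    int m_availability[4];
};

// Example data only.
static const Product product_array[] = {
    { 23, {56, 1, 2, 4} },
    { 24, {65, 1, 2, 4} },
};
static const std::size_t product_array_nbr =
    sizeof(product_array) / sizeof(product_array[0]);

// Returns the first product with the given price, or a null pointer if none matches.
const Product* find_by_price(int price) {
    for (std::size_t i = 0; i < product_array_nbr; ++i) {
        if (product_array[i].m_price == price) return &product_array[i];
    }
    return 0;
}
```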
A couple of years ago when I was working in embedded we needed to explicitly control the memory allocation of our structures.
Imagine this type of struct :
.h file
template<class T,uint16 u16Entries>
class CMemoryStruct
{
public:
/**
*Default c'tor needed for every template
*/
CMemoryStruct(){};
/**
*Default d'tor
*/
~CMemoryStruct(){};
/**
*Array which hold u16Entries of T objects. It is defined by the two template parameters, T can be of any type
*/
static T aoMemBlock[u16Entries];
/**
*Starting address of the above specified array used for fast freeing of allocated memory
*/
static const void* pvStartAddress;
/**
*Ending address of the above specified array used for fast freeing of allocated memory
*/
static const void* pvEndAddress;
/**
*Size of one T object in bytes used for determining the array to which the necessary method will be invoked
*/
static const size_t sizeOfEntry;
/**
*Bitset of u16Entries which has the same size as the Array of the class and it is used to specify whether
*a particular entry of the templated array is occupied or not
*/
static std::bitset<u16Entries> oVacancy;
};
/**
*Define an array of Type[u16Entries]
*/
template<class Type,uint16 u16Entries> Type CMemoryStruct<Type,u16Entries>::aoMemBlock[u16Entries];
/**
*Define a const variable of a template class
*/
template<class Type,uint16 u16Entries> const void* CMemoryStruct<Type,u16Entries>::pvStartAddress=&CMemoryStruct<Type,u16Entries>::aoMemBlock[0];
template<class Type,uint16 u16Entries> const void* CMemoryStruct<Type,u16Entries>::pvEndAddress=&CMemoryStruct<Type,u16Entries>::aoMemBlock[u16Entries-1];
template<class Type,uint16 u16Entries> const size_t CMemoryStruct<Type,u16Entries>::sizeOfEntry=sizeof(Type);
/**
*Define a bitset inside a template class...
*/
template<class Type,uint16 u16Entries> std::bitset<u16Entries> CMemoryStruct<Type,u16Entries>::oVacancy;
Depending on your compiler and environment, you can control the memory area where the static allocation takes place. In our case we moved it to ROM, of which we had plenty. Also note that, depending on your compiler (e.g. Green Hills compilers), you may need to use the export keyword and define your static members in the .cpp file.
You can use the start and end pointers to navigate through the data. If your compiler supports the full STL, you may want to use std::vector with custom allocators and overloaded new operators, which would place your data somewhere other than the stack. In our case the new operators were overloaded in such a way that all memory allocation was done on predefined memory structures.
Hope I gave you an idea.
In C++98/03, you cannot initialize arrays in a constructor initializer.
In C++11, this has been fixed with uniform initialization:
class Product
{
int m_availability[4];
public:
Product() : m_availability{52,30,63, 49} { }
};
If you need the data to be provided in the constructor, use a vector instead:
class Product
{
const std::vector<int> m_availability;
public:
Product(std::initializer_list<int> il) : m_availability(il) { }
};
Usage:
extern const Product p1({1,2,3});
Memory for the static variables is still reserved when the code is actually executing -- you won't be saving space on the stack. You might want to consider use of vectors instead of arrays -- they're easier to pass and process.

Most effective method of executing functions in an unknown order

Let's say I have a large, between 50 and 200, pool of individual functions whose job it is to operate on a single object and modify it. The pool of functions is selectively put into a single array and arranged in an arbitrary order.
The functions themselves take no arguments outside of the values present within the object it is modifying, and in this way the object's behavior is determined only by which functions are executed and in what order.
The approach I have tentatively used so far is this, which might explain better what my goal is:
class Object; // forward declaration, needed by Behavior::act
class Behavior{
public:
virtual void act(Object * obj) = 0;
};
class SpecificBehavior : public Behavior{
// many classes like this exist
public:
void act(Object * obj){ /* do something specific with obj*/ }
};
class Object{
public:
std::list<Behavior*> behavior;
void behave(){
std::list<Behavior*>::iterator iter = behavior.begin(); // begin(), not front()
while(iter != behavior.end()){
(*iter)->act(this); // iter refers to a Behavior*, so dereference it first
++iter;
}
}
};
My question is: what is the most efficient way in C++ of organizing such a pool of functions, in terms of performance and maintainability? This is for some A.I. research I am doing, and this methodology is what most closely matches what I am trying to achieve.
Edit: The array itself can be changed at any time by any other part of the code not listed here, but it's guaranteed never to change during the call to behave(). The array it is stored in needs to be able to change and expand to any size.
If the behaviour functions have no state and only take one Object argument, then I'd go with a container of function objects:
#include <functional>
#include <vector>
typedef std::function<void(Object &)> BehaveFun;
typedef std::vector<BehaveFun> BehaviourCollection;
class Object {
BehaviourCollection b;
void behave() {
for (auto it = b.cbegin(); it != b.cend(); ++it) (*it)(*this); // parentheses needed: call the dereferenced function
}
};
Now you just need to load all your functions into the collection.
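Loading the collection could look like the following sketch. The value member and the two free functions add_one and double_it are hypothetical, chosen only so that the effect of the execution order is visible.

```cpp
#include <functional>
#include <vector>

struct Object;  // forward declaration so the function type can refer to it
typedef std::function<void(Object&)> BehaveFun;

struct Object {
    int value = 0;                      // hypothetical state that behaviours modify
    std::vector<BehaveFun> behaviours;
    void behave() {
        for (const BehaveFun& f : behaviours) f(*this);  // run in stored order
    }
};

// Hypothetical behaviours.
void add_one(Object& o) { o.value += 1; }
void double_it(Object& o) { o.value *= 2; }
```

Free functions and lambdas can be mixed in the same vector, for example `o.behaviours.push_back(add_one);` followed by `o.behaviours.push_back([](Object& x) { x.value -= 3; });`.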
If the main thing you will be doing with this collection is iterating over it, you'll probably want to use a vector, as dereferencing and incrementing its iterators amounts to simple pointer arithmetic.
If you want to use all your cores, and your operations do not share any state, you might want to have a look at a library like Intel's TBB (see the parallel_for example)
I'd keep it exactly as you have it.
Performance should be OK (there may be an extra indirection due to the vtable lookup, but that shouldn't matter).
My reasons for keeping it as is are:
You might be able to lift common sub-behaviour into an intermediate class between Behavior and your implementation classes. This is not as easy using function pointers.
struct AlsoWaveArmsBase : public Behavior
{
void act( Object * obj )
{
start_waving_arms(obj); // Concrete call
do_other_actions(obj); // Abstract call
end_waving_arms(obj); // Concrete call
}
void start_waving_arms(Object*obj);
void end_waving_arms(Object*obj);
virtual void do_other_actions(Object * obj)=0;
};
struct WaveAndWalk : public AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { walk(obj); }
};
struct WaveAndDance : public AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { dance(obj); }
};
You might want to use state in your behaviour
struct Count : public Behavior
{
Count() : i(0) {} // the constructor must be named after the struct
int i;
void act(Object * obj)
{
count(obj,i);
++i;
}
};
You might want to add helper functions e.g. you might want to add a can_act like this:
void Object::behave(){
std::list<Behavior*>::iterator iter = behavior.begin();
while(iter != behavior.end()){
if( (*iter)->can_act(this) ){
(*iter)->act(this);
}
++iter;
}
}
IMO, these flexibilities outweigh the benefits of moving to a pure function approach.
For maintainability, your current approach (virtual functions) is the best. You might get a tiny gain from using free function pointers, but I doubt it's measurable, and even if it is, I don't think it is worth the trouble. The current OO approach is fast enough and maintainable. The small gain I'm talking about comes from the fact that you are dereferencing a pointer to an object and then (behind the scenes) dereferencing a pointer to a function, which is how a virtual function call is implemented.
I wouldn't use std::function, because it's not very performant (though that might differ between implementations). See this and this. Function pointers are as fast as it gets when you need this kind of dynamism at runtime.
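For comparison, a sketch of the plain function pointer variant mentioned above. Agent is a hypothetical stand-in for Object, and increment, triple and run_all are illustrative names; no claim about relative speed is made here.

```cpp
#include <vector>

struct Agent { int value; };  // hypothetical stand-in for Object

typedef void (*BehaviorFn)(Agent&);

void increment(Agent& a) { a.value += 1; }
void triple(Agent& a) { a.value *= 3; }

// Runs every function on the same agent, in the order stored in the vector.
void run_all(const std::vector<BehaviorFn>& fns, Agent& a) {
    for (std::vector<BehaviorFn>::const_iterator it = fns.begin(); it != fns.end(); ++it)
        (*it)(a);
}
```

Unlike the class-based version, these functions cannot carry state of their own between calls.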
If you need to improve the performance, I suggest to look into improving the algorithm, not this implementation.

Extending a class and maintaining binary backward compatibility

I'm trying to add new functionality to an existing library. I need to add new data to a class hierarchy so that the root class has accessors for it. Anyone should be able to get this data, but only sub-classes should be able to set it (i.e. public getter and protected setter).
To maintain backward compatibility, I know I must not do any of the following (list only includes actions relevant to my problem):
Add or remove virtual functions
Add or remove member variables
Change type of existing member variable
Change signature of existing function
I can think of two ways to add this data to the hierarchy: adding a new member variable to the root class or adding pure virtual accessor functions (so that the data could be stored in sub-classes). However, to maintain backward compatibility I cannot do either of these.
The library uses the pimpl idiom extensively, but unfortunately the root class I have to modify does not. The sub-classes, however, do.
The only solution I can think of now is simulating a member variable with a static hash map. I could create a static hash map, store the new member in it, and implement static accessors for it. Something like this (in pseudo C++):
class NewData {...};
class BaseClass
{
public:
// public getter, so that outside code can read the data
static NewData* getNewData(BaseClass* instance)
{
return m_mapNewData[instance];
}
protected:
// protected setter, callable only from sub-classes
static void setNewData(BaseClass* instance, NewData* data)
{
m_mapNewData[instance] = data;
}
private:
static HashMap<BaseClass*, NewData*> m_mapNewData;
};
class DerivedClass : public BaseClass
{
void doSomething()
{
BaseClass::setNewData(this, new NewData());
}
};
class Outside
{
void doActions(BaseClass* action)
{
NewData* data = BaseClass::getNewData(action);
...
}
};
Now, while this solution might work, I find it very ugly (of course I could also add non-static accessor functions but this wouldn't remove the ugliness).
Are there any other solutions?
Thank you.
You could use the decorator pattern. The decorator could expose the new data-elements, and no change to the existing classes would be needed. This works best if clients obtain their objects through factories, because then you can transparently add the decorators.
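A minimal sketch of what such a decorator could look like. The virtual name() function, the value field of NewData and the class DecoratedBase are assumptions made for illustration; the real BaseClass interface is not shown in the question.

```cpp
#include <string>

// Stand-in for the existing library class; it is left unchanged.
class BaseClass {
public:
    virtual ~BaseClass() {}
    virtual std::string name() const { return "base"; }  // hypothetical member
};

struct NewData { int value; };

// Decorator: forwards the old interface and carries the new data itself.
class DecoratedBase : public BaseClass {
public:
    DecoratedBase(BaseClass& wrapped, const NewData& data)
        : wrapped_(wrapped), data_(data) {}
    std::string name() const override { return wrapped_.name(); }
    const NewData& newData() const { return data_; }   // public getter
protected:
    void setNewData(const NewData& d) { data_ = d; }   // protected setter
private:
    BaseClass& wrapped_;
    NewData data_;
};
```

Code that only knows BaseClass keeps working unchanged, while new code can ask for the decorated type to reach the extra data.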
Finally, check binary compatibility using automated tools like abi-compliance-checker.
You can add exported functions (declspec import/export) without affecting binary compatibility (provided you do not remove any existing functions and you add your new functions at the end), but you cannot increase the size of the class by adding new data members.
The reason you cannot increase the size of the class is that, for a client that was compiled against the old size but uses the newly extended class, the data member stored after your class in their object (and more, if you add more than one word) would get trashed by the end of the new class.
e.g.
Old:
class CounterEngine {
public:
__declspec(dllexport) int getTotal();
private:
int iTotal; //4 bytes
};
New:
class CounterEngine {
public:
__declspec(dllexport) int getTotal();
__declspec(dllexport) int getMean();
private:
int iTotal; //4 bytes
int iMean; //4 bytes
};
A client then may have:
class ClientOfCounter {
public:
...
private:
CounterEngine iCounter;
int iBlah;
};
In memory, ClientOfCounter in the old framework will look something like this:
ClientOfCounter: iCounter[offset 0],
iBlah[offset 4 bytes]
That same code (not recompiled, but using your new version) would still look like this:
ClientOfCounter: iCounter[offset 0],
iBlah[offset 4 bytes]
i.e. it doesn't know that iCounter is now 8 bytes rather than 4 bytes, so iBlah is actually trashed by the last 4 bytes of iCounter.
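The size mismatch can be made concrete with sizeof and offsetof. In this sketch the two versions of the class are renamed OldCounterEngine and NewCounterEngine so that they can coexist in one file, and the member functions are left out because they do not affect the layout.

```cpp
#include <cstddef>

// Data layouts of the two versions from the example above.
struct OldCounterEngine { int iTotal; };
struct NewCounterEngine { int iTotal; int iMean; };

// What a client compiled against the old header believes about its own layout.
struct ClientOfCounter {
    OldCounterEngine iCounter;
    int iBlah;
};

// Offset at which the old client placed iBlah.
const std::size_t kBlahOffset = offsetof(ClientOfCounter, iBlah);
```

The new engine is larger than the offset at which the old client stored iBlah, so library code that writes iMean writes into the client's iBlah.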
If you have a spare private data member, you can add a Body class to store any future data members.
class CounterEngine {
public:
__declspec(dllexport) int getTotal();
private:
int iTotal; //4 bytes
void* iSpare; //future
};
class CounterEngineBody {
friend class CounterEngine; // lets CounterEngine::getMean() read iMean
private:
int iMean; //4 bytes
void* iSpare[4]; //save space for future
};
class CounterEngine {
public:
__declspec(dllexport) int getTotal();
__declspec(dllexport) int getMean() { return iBody->iMean; }
private:
int iTotal; //4 bytes
CounterEngineBody* iBody; //now used to extend class with 'body' object
};
If your library is open-source then you can request to add it to the upstream-tracker. It will automatically check all library releases for backward compatibility. So you can easily maintain your API.
EDIT: reports for qt4 library are here.
It is hard to maintain binary compatibility - it is much easier to maintain only interface compatibility.
I think that the only reasonable solution is to stop supporting the current library and redesign it to export only pure virtual interfaces for classes.
Those interfaces can never be modified in the future, but you can add new interfaces.
In those interfaces you can use only primitive types such as pointers and fixed-size integers or floats. You should not have interfaces that use, for example, std::string or other non-primitive types.
When returning pointers to data allocated in DLL, you need to provide a virtual method for deallocation, so that the application deallocates the data using DLL's delete.
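A minimal sketch of such an interface, including the virtual deallocation method. ICounter, CounterImpl, add, release and createCounter are hypothetical names; in a real DLL the factory function would be the exported entry point.

```cpp
// Interface exported by the library: only pure virtual functions and primitive types.
class ICounter {
public:
    virtual int getTotal() const = 0;
    virtual void add(int amount) = 0;
    virtual void release() = 0;  // deallocation happens inside the library
protected:
    ~ICounter() {}               // clients cannot call delete on the interface
};

namespace {
// Implementation detail, hidden from clients and free to change between releases.
class CounterImpl final : public ICounter {
public:
    CounterImpl() : total_(0) {}
    int getTotal() const override { return total_; }
    void add(int amount) override { total_ += amount; }
    void release() override { delete this; }  // uses the library's own delete
private:
    int total_;
};
}

// Hypothetical factory function.
ICounter* createCounter() { return new CounterImpl; }
```

Clients only ever hold an ICounter pointer, so new data members can be added to CounterImpl without changing anything the client was compiled against.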
Adding data members to the root will break binary compatibility (and force a rebuild, if that is your concern), but it won't break backward compatibility and neither will adding member functions (virtual or not). Adding new member functions is the obvious way to go.