I have a very performance-sensitive algorithm that walks a graph and makes decisions. As part of it I have to build a backtrace for the solution I find to be the best. At every step there are multiple candidate solutions; they are evaluated again and again until, at the end, only one remains. At every new step, the current backtraces can become subsolutions of zero or more new candidate solutions.
The complexity of the algorithm does not allow tricks, so what I have so far is smart pointers for the nodes onto which the backtraces are dynamically hooked.
struct Node;
typedef boost::intrusive_ptr<Node> NodeSPtr;

struct Node
{
    explicit Node(NodePool& pool)
        : mCount(0)
    {
    }

    size_t ref_count() const
    {
        return mCount;
    }

    NodeSPtr mPreviousNode;
    size_t   mCount; // reference count used by intrusive_ptr
                     // (intrusive_ptr_add_ref / intrusive_ptr_release omitted)
    // ... some data
};
The problem with this is that it generates a big number of small allocations, and this slows the algorithm down.
The question is: what other options could you suggest, given that both performance and memory are sensitive matters?
I have a class that is a bit complex to initialize. It is basically a tree structure, and to create an instance the current constructor takes the root node. Nevertheless, some instances will be used more often than others, and I would like to make it easier for the user to instantiate these common ones quickly. I was debating what the best option would be.
First option: using an enum to choose between different options in the constructor.
enum CommonPatterns { TRIANGLE, DIAMOND };

struct PatternNode {
    int id;
    vector<PatternNode*> child;
};

class Pattern {
private:
    PatternNode root;
public:
    // Constructor that takes the root of the tree
    Pattern(PatternNode root) { this->root = root; }

    // Constructor that takes an enum to create some common instances
    Pattern(CommonPatterns pattern)
    {
        PatternNode predefined_root;
        if (pattern == CommonPatterns::TRIANGLE)
        {
            // Build tree structure for the triangle
        }
        else if (pattern == CommonPatterns::DIAMOND)
        {
            // Build tree structure for the diamond
        }
        // Note: a bare "Pattern(predefined_root);" here would only create and
        // discard a temporary; assign to the member instead.
        this->root = predefined_root;
    }
};
Second option: predefining some static instances.
Pattern.h
enum CommonPatterns { TRIANGLE, DIAMOND };

struct PatternNode {
    int id;
    vector<PatternNode*> child;
};

class Pattern {
private:
    PatternNode root;
    static Pattern createTriangle();
    static Pattern createDiamond();
public:
    // Constructor that takes the root of the tree
    Pattern(PatternNode root) { this->root = root; }

    // Predefined common instances of patterns
    static const Pattern TRIANGLE;
    static const Pattern DIAMOND;
};
Pattern.cc
Pattern Pattern::createTriangle()
{
    PatternNode root;
    // Create the tree for the triangle
    return Pattern(root);
}

Pattern Pattern::createDiamond()
{
    PatternNode root;
    // Create the tree for the diamond
    return Pattern(root);
}

const Pattern Pattern::TRIANGLE = Pattern::createTriangle();
const Pattern Pattern::DIAMOND = Pattern::createDiamond();
I don't understand the performance implications of using statics that well, so I would appreciate some suggestions.
As usual when people ask about performance benefits, the first rule of code optimization applies: if you think you have a performance problem, measure the performance.
So my (and many other people's) opinion is that you should approach this problem with other things in mind, e.g. what is clearer to the user and/or the reader of the code (which is often yourself, so be extra nice to them!) or which code structure makes it easier to test.
Unfortunately those are somewhat matters of opinion, so now I will share mine:
Having separate functions for these seems cleaner to me.
It means that for testing purposes you have more but smaller tests, which makes it easier to spot the exact problem, when a test fails.
Related: The constructor is smaller and hence less error prone.
For the user it is extremely specific: they get a function in the class namespace whose name says what it does.
If you go that route, remember to document these static functions in a way that a user will stumble upon them, e.g. mention them in the class documentation and/or the constructor documentation.
Although the same holds for documentation of the enum.
Lastly, let me hazard a guess regarding performance:
Although I don't expect any noticeable performance issues either way, the static-function version has the advantage that the compiler may optimize it more easily, as it (seems to) depend only on compile-time data.
Again to really find out about performance, you would have to
measure the performance differences or --even better--
disassemble the code and see what the compiler actually did with your code.
I would like to simulate a population over time and keep a genealogy of the individuals that are still alive (I don't need to keep data about dead lineages). Generations are discrete and non-overlapping. For simplicity, let's assume that reproduction is asexual and each individual has exactly one parent. Here is a class Individual
class Individual
{
public:
    size_t nbChildren;
    Individual* parent; // non-const, since pruning decrements the parent's child count
    Individual(const Individual& parent);
};
In my Population class, I would have one vector for the current offspring and one for the current parents (the current parents being the offspring of the previous generation).
class Population
{
private:
    std::vector<Individual*> currentOffsprings;
    std::vector<Individual*> currentParents;
public:
    void addIndividual(const Individual& parent) // Called from some other module
    {
        Individual* offspring = new Individual(parent);
        currentOffsprings.push_back(offspring);
    }
    void pruneDeadLineages() // At the end of each generation, get rid of ancestors that did not leave any offspring today
    {
        // Collect the current parents that have no children in the current
        // generation of offspring
        std::queue<Individual*> individualsWithoutChildren; // FIFO structure
        for (auto& currentParent : currentParents)
        {
            if (currentParent->nbChildren == 0) // nbChildren is a member, not a function
            {
                individualsWithoutChildren.push(currentParent);
            }
        }
        // Walk the FIFO to get rid of every individual in the tree that has
        // no offspring in this generation
        while (!individualsWithoutChildren.empty())
        {
            Individual* ind = individualsWithoutChildren.front(); // std::queue has no pop_front()
            individualsWithoutChildren.pop();
            if (ind->nbChildren == 0)
            {
                ind->parent->nbChildren--; // parent is a pointer, so use ->
                if (ind->parent->nbChildren == 0)
                {
                    individualsWithoutChildren.push(ind->parent);
                }
                delete ind;
            }
        }
    }
    void newGeneration() // Announce the beginning of a new generation from some other module
    {
        currentParents.swap(currentOffsprings); // Set offspring as parents
        currentOffsprings.resize(0);            // Drop pointers to the parents (now grandparents)
    }

    void doStuff() // Some time-consuming function that runs each generation
    {
        for (auto ind : currentOffsprings) // typo fixed: was "currentOffspings"
        {
            foo(ind);
        }
    }
};
Assuming that the slow part of my code will be looping through the individuals in the doStuff method, I would like to keep individuals contiguous in memory, and hence
std::vector<Individual*> currentOffsprings;
std::vector<Individual*> currentParents;
would become
std::vector<Individual> currentOffsprings;
std::vector<Individual> currentParents;
Now the problem is that I don't want to consume memory for ancestors that did not leave any offspring in the current generation. In other words, I don't want to keep, for each past generation, a whole vector whose length is the number of individuals per generation. I thought I could give Individual a destructor that does nothing, so that the individuals of the grandparent generation are not destroyed by the line currentOffsprings.resize(0); in Population::newGeneration(). Then, in Population::pruneDeadLineages(), I would explicitly destroy individuals with a method Individual::destructor() instead of using delete or Individual::~Individual().
Is this silly? Would it be memory-safe (or lead to segmentation faults or memory leaks)? What other options do I have to 1) keep the current generation's individuals contiguous in memory, and 2) free, within that contiguous stretch of memory, the ancestors that did not leave any offspring?
I don't really understand why you need your Individuals stored contiguously in memory.
Since you'll have to remove some of them and add others each generation, you'll have to reallocate the whole bunch of Individuals to keep them contiguous.
But anyway, I will not question what you want to do.
I think the easiest way is to let std::vector do the work for you. No need for pointers.
At each new generation, you move the offspring from currentOffsprings to currentParents and clear() currentOffsprings.
Then, for each parent that does not have any children in the current generation, you can just use erase() from std::vector to remove them, and let the std::vector take care of keeping its elements contiguous.
Better than 100 words, something like:
void Population::newGeneration()
{
    currentParents.swap(currentOffsprings);
    currentOffsprings.clear();
}

void Population::pruneDeadLineages()
{
    currentParents.erase(
        std::remove_if(currentParents.begin(), currentParents.end(),
                       [](const Individual& ind) { return ind.nbChildren == 0; }),
        currentParents.end());
}
Of course it assumes that the parents and offsprings are defined in Population as:
std::vector<Individual> currentParents;
std::vector<Individual> currentOffsprings;
Note: std::remove_if shifts the elements to keep to the front of the container (the leftover tail is then chopped off by erase()), so the kept elements stay contiguous and no reallocation happens during the erasing.
This way your two requirements (keeping Individuals contiguous in memory and getting rid of dead lineages) are fulfilled, without doing weird things with destructors.
Because you have two std::vectors, you are assured that currentOffsprings is stored contiguously in memory, and the same holds for currentParents.
But the two std::vectors are absolutely not guaranteed to be contiguous with each other (I think you are already aware of this and that it is not what you want).
Let me know if I have misunderstood your actual problem.
I've been scouring the net looking for a container that best handles this scenario:
Linear memory (no gaps like an object pool or allocator would have)
Some way to hand out a reference to an object in the container that remains persistent across adds/removals, or a way to search quickly to find the original objects
Decently fast adds to the end and removals from the middle (but no inserts required)
So far the only solution I've been able to find is to use a std::vector and, when a removal takes place, update all reference indices above the index being removed. That just seems bad; I'm looking for any more efficient solution.
Here is a horrible idea. I haven't tried it at all, so there are probably more than a few bugs.
template <typename T>
class InsaneContainer {
public:
    class MemberPointer {
        friend class InsaneContainer;
        size_t idx_;
        InsaneContainer* parent_;
    public:
        MemberPointer(InsaneContainer* parent, size_t idx)
            : idx_(idx), parent_(parent) {}
        T& operator*() {
            return std::get<0>(parent_->members_[idx_]);
        }
    };
    friend class MemberPointer;

    using Handle = std::shared_ptr<MemberPointer>;

    Handle insert(const T& t) {
        members_.push_back(std::make_tuple(
            T{t}, Handle{new MemberPointer{this, members_.size()}}));
        return std::get<1>(members_.back());
    }
    Handle GetHandle(size_t idx) {
        return std::get<1>(members_[idx]);
    }
    void erase(size_t idx) { // renamed: "delete" is a keyword
        // swap with the end, patch the moved element's handle, pop
        std::swap(members_[idx], members_.back());
        std::get<1>(members_[idx])->idx_ = idx;
        members_.pop_back();
    }
private:
    std::vector<std::tuple<T, Handle>> members_;
};
The idea is that, at insertion time, you receive a handle that always gives O(1) find and delete. While it is otherwise O(n) to find an object, once you find it you can get the handle, which will stay up to date.
The usage of such a structure is... limited, to say the least, so I suspect an X-vs-Y problem here.
Through lots of performance testing, I found the fastest method for the general case to be:
1) Use a pool allocator that stores free memory regions.
2) Use the free-region list to copy occupied data linearly into temporary memory at every "gap" in the pool.
This works best for me due to the nature of adds/removes in my program (resulting in low fragmentation).
I'm trying to share an instance of a class between two programs. This is a glorified producer-consumer problem; however, for abstraction purposes, I have put a mutex in the class.
I've seen instances of sharing structs between processes, but this generally involved a fork. I want to keep the processes separate, because they will be doing two different things, so half of the program/code segment would be wasted in each process.
It might be easier to show than to try to explain.
class my_class
{
private:
    sem_t mutex;
    data_type *my_data; // just some linked list
    fstream my_file;    // renamed from some_file: the methods use my_file
public:
    my_class();
    data_type* retrieve();
    void add(string add);
};
my_class::my_class()
{
    my_data = new data_type();
    sem_init(&mutex, 0, 1); // note: pshared == 0 makes this semaphore
                            // thread-only; it must be 1 for cross-process use
    my_file.open("log", ios::out);
}
data_type* my_class::retrieve()
{
    data_type *temp = NULL;
    sem_wait(&mutex);
    if(my_data->next != NULL)
    {
        temp = my_data;
        my_data = my_data->next;
    }
    sem_post(&mutex);
    return temp; // was "return my_data", which returned the wrong node
}
void my_class::add(string data)
{
    data_type *temp = new data_type();
    temp->data = data;
    data_type *top;
    sem_wait(&mutex);
    top = my_data;
    while(top->next->next) // walks to the end; the end's next is set to NULL
    {
        top = top->next;
    }
    top->next = temp;
    my_file << data << "\n"; // was "name", which is undeclared
    sem_post(&mutex);
}
What I'm really looking for is a way to share an instance of this class as a pointer, so that I can have threads accessing this instance. I think, because of how much sharing I want to do, it needs to go on the heap and not the stack.
I wouldn't consider making this its own program and then using network I/O to interact with it, given how simple it is. Needless to say, this is not exactly what I'm doing; however, I think I've made a simplified/generic enough example that if this can be solved, I can easily apply it to my solution, and it might help others.
Again, I'm looking for a way to share one instance of this code between two separate processes.
I don't know if this can be done, because the class has a linked list in it, let alone a file. If it can, then whose heap and file table does it fill (both?).
EDIT:
Thanks for the help so far; however, it's worth pointing out that both processes may not be running at the same time: one acts as a daemon, and the other appears intermittently. Both programs already have threads, which is why I want to do it on the heap.
You cannot share memory that is on the heap between processes. With mmap() and MAP_ANON | MAP_SHARED you can share whole pages, but not heap memory. With shm_open() etc. you can share other objects, but again not the heap.
Perhaps what you want is threads. These will allow you to share items on the heap.
I think your understanding of fork() is garbled. fork() results in a copy-on-write image of your program in memory. Since your code segment won't be written to, if you don't exec(), only one copy of it exists in physical memory. If you exec() a different program (e.g. if the producer exec()s the consumer), it's likely to be less memory-efficient than having it all in one place and fork()ing. And in either case you are going to have the overhead of some sort of IPC. Threads seem a far better solution here.
I have a queue implementation, something like template <typename T> queue<T> with a struct QueueItem { T data; }, and a separate library that times the passage of data between different places (including from a producer thread to a consumer thread via this queue). To do this, I inserted code from that timing library into the queue's push and pop functions, so that when they assign QueueItem.data they also assign an extra void* member I added, pointing to timing metadata from that library. I.e. what used to be something like:
void push(T t)
{
    QueueItem i;
    i.data = t;
    // insert i into queue
}
became
void push(T t)
{
    QueueItem i;
    i.data = t;
    void* fox = timinglib.getMetadata();
    i.timingInfo = fox;
    // insert i into queue
}
with QueueItem going from
struct QueueItem
{
    T data;
};
to
struct QueueItem
{
    T data;
    void* timingInfo;
};
What I would like to achieve, however, is the ability to swap out the latter struct in favor of the lighter-weight one whenever the timing library is not activated. Something like:
if timingLib.isInactive()
    ; // use the smaller struct QueueItem
else
    ; // use the larger struct QueueItem
as cheaply as possible. What would be a good way to do this?
You can't have a struct that is big and small at the same time, obviously, so you're going to have to look at some form of inheritance, a pointer/reference, or a union.
A union would be ideal for you if there's "spare" data in T that could be occupied by your timingInfo. If not, then it's going to be as 'heavy' as the original.
Using inheritance is also likely to be as big as the original, as it'll add a vtable in there which will pad it out too much.
So the next option is to store only a pointer, and have it point to the data you want: either the data alone or the data plus timing. This kind of pattern is known as 'flyweight': common data is stored separately from the object being manipulated. It might be what you're looking for (depending on what the timing metadata is).
The other, more complex, alternative is to keep two queues in sync: one stores the data, and the other stores the associated timing info, if enabled. If timing is not enabled, you ignore the second queue. The trouble is keeping the two in sync, but that's an organisational problem rather than a technical challenge. Maybe create a new Queue class that contains the two real queues internally.
I'll start by confirming my assumption that this needs to be a runtime choice, and that you can't just build two different binaries with timing enabled/disabled. That approach would eliminate as much overhead as possible.
So now let's assume we want different runtime behavior. There will need to be runtime decisions, so there are a couple of options. If you can get away with the (relatively small) cost of polymorphism, you could make your queue polymorphic, create the appropriate instance once at startup, and then its push, for example, either will or won't add the extra data.
However, if that's not an option, I believe you can use templates to accomplish your goal, although there will likely be some up-front work, and it will probably increase the size of your binary with the extra code.
You start with a template to add timing to a class:
template <typename Timee>
struct Timed : public Timee
{
    void* timingInfo;
};
Then a timed QueueItem would look like:
Timed<QueueItem> timed_item;
To anything that doesn't care about the timing, this class looks exactly like a QueueItem: it will automatically upcast or slice to the parent as appropriate. If a method needs the timing information, you either create an overload that knows what to do with a Timed<T>, or do a runtime check (of the "is timing enabled" flag) and downcast to the correct type.
Next, you'll need to change your Queue instantiation to know whether it's using the base QueueItem or the Timed version. For example, a very very rough sketch of a possible mechanism:
template <typename Element>
void run()
{
    Queue<Element> queue;
    queue.setup();
    queue.process();
}
int main()
{
    if(do_timing)
    {
        run<Timed<QueueItem> >();
    }
    else
    {
        run<QueueItem>();
    }
    return 0;
}
You would "likely" need a specialization of Queue for Timed items, unless gathering the metadata is stateless, in which case the Timed constructor can gather the info and populate itself on creation. Then Queue stays the same and relies on which instantiation you're using.