I have a method where performance is really important. (I know premature optimization is the root of all evil. I know I should profile my code, and I did. In this application every tenth of a second I save is a big win.) This method uses different heuristics to generate and return elements. The heuristics are used sequentially: the first heuristic is used until it can no longer return elements, then the second heuristic is used until it can no longer return elements, and so on until all heuristics have been used. On each call of the method I use a switch to jump to the right heuristic. This is ugly, but it works well. Here is some pseudo code:
class MyClass
{
private:
    unsigned int m_step;
public:
    MyClass() : m_step(0) {};

    Elem GetElem()
    {
        // This switch statement will be optimized as a jump table by the compiler.
        // Note that there are no break statements between the cases: each case
        // deliberately falls through to the next one.
        switch (m_step)
        {
        case 0:
            if (UseHeuristic1())
            {
                m_step = 1; // Heuristic 1 is special: it never provides more than one element.
                return theElem;
            }
            m_step = 1;
        case 1:
            DoSomeOneTimeInitialisationForHeuristic2();
            m_step = 2;
        case 2:
            if (UseHeuristic2())
            {
                return theElem;
            }
            m_step = 3;
        case 3:
            if (UseHeuristic3())
            {
                return theElem;
            }
            m_step = 4; // The method should not be called again after this.
        }
        return someErrorCode;
    };
};
As I said, this works and it's efficient, since at each call the execution jumps right to where it should. If a heuristic can't provide an element, m_step is incremented (so the next time we don't try this heuristic again), and because there is no break statement, the next heuristic is tried. Also note that some steps (like step 1) never return an element; they are one-time initializations for the next heuristic.
The reason initializations are not all done upfront is that they might never be needed. It is always possible (and common) for GetElem to not get called again after it returned an element, even if there are still elements it could return.
While this is an efficient implementation, I find it really ugly. The case statement is a hack; using it without break is also hackish; the method gets really long, even if each heuristic is encapsulated in its own method.
How should I refactor this code so it's more readable and elegant while keeping it as efficient as possible?
Wrap each heuristic in an iterator. Initialize it completely on the first call to hasNext(). Then collect all iterators in a list and use a super-iterator to iterate through all of them:
boolean hasNext () {
    while (!list.isEmpty()) {
        if (list.get(0).hasNext()) return true;
        list.remove(0); // this iterator is exhausted; never try it again
    }
    return false;
}

Object next () {
    return list.get(0).next();
}
Note: In this case, a linked list might be a tiny bit faster than an ArrayList, but you should still measure this for yourself.
[EDIT] Changed "turn each" into "wrap each" to make my intentions more clear.
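Since the question is C++, here is a rough C++ analog of the same idea. This is only a sketch: HeuristicIter and SuperIter are illustrative names, and it assumes the Elem type from the question plus one wrapper class per heuristic.

#include <deque>
#include <memory>

// Each heuristic is wrapped in an object with the hasNext()/next() protocol;
// any one-time initialization can run on the first hasNext() call.
struct HeuristicIter {
    virtual ~HeuristicIter() {}
    virtual bool hasNext() = 0;
    virtual Elem next() = 0;
};

// The "super-iterator": drains each heuristic in turn and drops it for good
// once it is exhausted, so no heuristic is ever retried.
class SuperIter {
    std::deque<std::unique_ptr<HeuristicIter> > iters;
public:
    bool hasNext() {
        while (!iters.empty()) {
            if (iters.front()->hasNext()) return true;
            iters.pop_front();
        }
        return false;
    }
    Elem next() { return iters.front()->next(); }
};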
I don't think your code is so bad, but if you're doing this kind of thing a lot, and you want to hide the mechanisms so that the logic is clearer, you could look at Simon Tatham's coroutine macros. They're intended for C (using static variables) rather than C++ (using member variables), but it's trivial to change that.
The result should look something like this:
Elem GetElem()
{
    crBegin;
    if (UseHeuristic1())
    {
        crReturn(theElem);
    }
    DoSomeOneTimeInitialisationForHeuristic2();
    while (UseHeuristic2())
    {
        crReturn(theElem);
    }
    while (UseHeuristic3())
    {
        crReturn(theElem);
    }
    crFinish;
    return someErrorCode;
}
To my mind, if you do not need to modify this code much (e.g. to add new heuristics), then document it well and don't touch it.
However, if heuristics are regularly added and removed and you think that this is an error-prone process, then you should consider refactoring it. The obvious choice would be to introduce the State design pattern. This replaces your switch statement with polymorphism, which might slow things down, but you would have to profile both to be sure.
It looks like there really isn't much to optimize in this code - probably most of the optimization can be done in the UseHeuristic functions. What's in them?
You can turn the control flow inside-out.
template <class Callback> // a callback that returns true when it's done
void Walk(Callback fn)
{
    if (UseHeuristic1()) {
        if (fn(theElem))
            return;
    }
    DoSomeOneTimeInitialisationForHeuristic2();
    while (UseHeuristic2()) {
        if (fn(theElem))
            return;
    }
    while (UseHeuristic3()) {
        if (fn(theElem))
            return;
    }
}
This might earn you a few nanoseconds if the switch dispatch and the return statements are throwing the CPU off its stride, and if the callback is inlinable.
Of course, this kind of optimization is futile if the heuristics themselves are nontrivial. And much depends on what the caller looks like.
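For illustration, a hypothetical caller could look like this (a sketch assuming Walk is a member of MyClass and Elem is copyable; Consume is an invented name):

#include <vector>

void Consume(MyClass &source)
{
    // Collect elements until ten have been produced; returning true stops the walk.
    std::vector<Elem> collected;
    source.Walk([&](const Elem &e) {
        collected.push_back(e);
        return collected.size() >= 10;
    });
}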
That's a micro-optimization, but there is no need to set m_step when you are not returning from GetElem. See the code below.
Larger optimizations definitely need a simpler control flow (fewer jumps, fewer returns, fewer tests, fewer function calls), because every mispredicted jump stalls the processor (branch prediction helps, but it's no silver bullet). You can give a try to the solutions proposed by Aaron or Jason, and there are others (for instance you can implement several GetElem functions and call them through a function pointer, but I'm quite sure it'll be slower).
If the problem allows it, it can also be efficient to compute several elements at once in the heuristics and keep them in a cache, or to make it truly parallel with some thread computing elements while this one is merely a consumer waiting for results... no way to say more without some details on the context.
class MyClass
{
private:
    unsigned int m_step;
public:
    MyClass() : m_step(0) {};

    Elem GetElem()
    {
        // This switch statement will be optimized as a jump table by the compiler.
        // Note that there are no break statements between the cases.
        switch (m_step)
        {
        case 0:
            if (UseHeuristic1())
            {
                m_step = 1; // Heuristic 1 is special: it never provides more than one element.
                return theElem;
            }
        case 1:
            DoSomeOneTimeInitialisationForHeuristic2();
            m_step = 2;
        case 2:
            if (UseHeuristic2())
            {
                return theElem;
            }
        case 3:
            m_step = 4;
        case 4:
            if (UseHeuristic3())
            {
                return theElem;
            }
            m_step = 5; // But the method should not be called again.
        }
        return someErrorCode;
    };
};
What you really can do here is replace the conditional with the State pattern:
http://en.wikipedia.org/wiki/State_pattern
Maybe it would be less performant because of the virtual method call, maybe it would perform better because there is less state-maintaining code, but the code would definitely be much clearer and more maintainable, as always with patterns.
What could improve performance is the elimination of DoSomeOneTimeInitialisationForHeuristic2() by giving it a separate state between 1 and 2.
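A minimal sketch of how that could look, assuming the names Elem, theElem and someErrorCode from the question; HeuristicState, SetState and the wiring are illustrative, not a definitive implementation:

class MyClass;

// One object per heuristic; an exhausted state switches the machine to the
// next state and delegates, so the caller only ever pays one virtual call.
struct HeuristicState {
    virtual ~HeuristicState() {}
    virtual bool GetElem(MyClass &ctx, Elem &out) = 0;
};

class MyClass {
    HeuristicState *m_state; // current heuristic (e.g. statically allocated states)
public:
    void SetState(HeuristicState *s) { m_state = s; }
    Elem GetElem() {
        Elem e;
        return m_state->GetElem(*this, e) ? e : someErrorCode;
    }
};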
Since each heuristic is represented by a function with an identical signature, you can make a table of function pointers and walk through it.
class MyClass
{
private:
    typedef bool heuristic_function();
    typedef heuristic_function * heuristic_function_ptr;
    static heuristic_function_ptr heuristic_table[4];
    unsigned int m_step;
public:
    MyClass() : m_step(0) {};

    Elem GetElem()
    {
        while (m_step < sizeof(heuristic_table)/sizeof(heuristic_table[0]))
        {
            if (heuristic_table[m_step]())
            {
                return theElem;
            }
            ++m_step;
        }
        return someErrorCode;
    };
};

// Note: this assumes the heuristics are free functions (or static members), and
// that DoSomeOneTimeInitialisationForHeuristic2 is wrapped in a bool function
// that returns false, so the loop advances past it after the initialization runs.
MyClass::heuristic_function_ptr MyClass::heuristic_table[4] = { UseHeuristic1, DoSomeOneTimeInitialisationForHeuristic2, UseHeuristic2, UseHeuristic3 };
If the element code you are processing can be converted to an integral value, then you can construct a table of function pointers and index based on the element. The table would have one entry for each 'handled' element, and one for each known but unhandled element. For unknown elements, do a quick check before indexing the function pointer table.
Calling the element-processing function is fast.
Here's working sample code:
#include <cstdlib>
#include <iostream>

using namespace std;

typedef void (*ElementHandlerFn)(void);

void ProcessElement0()
{
    cout << "Element 0" << endl;
}

void ProcessElement1()
{
    cout << "Element 1" << endl;
}

void ProcessElement2()
{
    cout << "Element 2" << endl;
}

void ProcessElement3()
{
    cout << "Element 3" << endl;
}

void ProcessElement7()
{
    cout << "Element 7" << endl;
}

void ProcessUnhandledElement()
{
    cout << "> Unhandled Element <" << endl;
}

int main()
{
    // construct a table of function pointers, one for each possible element (even unhandled elements)
    // note: i am assuming that there are 10 possible elements -- 0, 1, 2 ... 9 --
    // and that 5 of them (0, 1, 2, 3, 7) are 'handled'.
    static const int MaxElement = 9;
    ElementHandlerFn handlers[] =
    {
        ProcessElement0,
        ProcessElement1,
        ProcessElement2,
        ProcessElement3,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessElement7,
        ProcessUnhandledElement,
        ProcessUnhandledElement
    };

    // mock up some elements to simulate input, including 'invalid' elements like 12
    int testElements[] = {0, 1, 2, 3, 7, 4, 9, 12, 3, 3, 2, 7, 8};
    size_t numTestElements = sizeof(testElements)/sizeof(testElements[0]);

    // process each test element
    for( size_t ix = 0; ix < numTestElements; ++ix )
    {
        // for some robustness... (also rejects negative inputs)
        if( testElements[ix] < 0 || testElements[ix] > MaxElement )
            cout << "Invalid Input!" << endl;
        // otherwise process normally
        else
            handlers[testElements[ix]]();
    }

    return 0;
}
If it ain't broke don't fix it.
It looks pretty efficient as is. It doesn't look hard to understand either. Adding iterators etc. is probably going to make it harder to understand.
You are probably better off doing:
1. Performance analysis. Is time really spent in this procedure at all, or is most of it in the functions that it calls? I can't see any significant time being spent here.
2. More unit tests, to prevent someone from breaking it if they have to modify it.
3. Additional comments in the code.
Related
In some cases I need to declare a variable without knowing its value first, like:
int a;
if (c1) {
    a = 1;
} else if (c2) {
    a = 2;
} else if (c3) {
    a = -3;
}
do_something_with(a);
Is it standard professional practice to assign some clearly wrong value like -1000 anyway (making potential bugs more reproducible), or is it preferred not to add code that does nothing useful as long as there are no bugs? On one hand, it looks reasonable to remove randomness; on the other hand, magical and even "clearly wrong" numbers somehow do not look attractive.
In many cases it is possible to declare the variable when its value is first known, or to use a ternary operator, but here it would need to be nested, which is also rather clumsy.
Declaring inside the block would move the variable out of the scope prematurely.
Or would this case justify the usage of std::optional<int> a and assert(a) later, making sure we have the value?
EDIT: The bugs I am talking about would occur if suddenly all 3 conditions are false that should "absolutely never happen".
As far as I know, the most popular and safest way is an immediately invoked lambda. Note that the if should be exhaustive (I added SOME_DEFAULT_VALUE as a placeholder). If you don't know what to put in the final else block, you should consider a few options:
* using optional and returning none in the else,
* throwing an exception that describes the problem,
* putting an assert there if this situation logically should never happen.
const int a = [&] {
    if (c1) {
        return 1;
    } else if (c2) {
        return 2;
    } else if (c3) {
        return -3;
    } else {
        return SOME_DEFAULT_VALUE;
    }
}();
do_something_with(a);
If the initialization logic is duplicated somewhere, you can simply extract the lambda into a named function, as other answers suggest.
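For completeness, the same immediately invoked lambda can also produce the std::optional the question mentions (C++17); a sketch assuming c1/c2/c3 are in scope:

#include <optional>

const std::optional<int> a = [&]() -> std::optional<int> {
    if (c1) return 1;
    if (c2) return 2;
    if (c3) return -3;
    return std::nullopt; // "absolutely never happens" becomes an inspectable state
}();

if (a) do_something_with(*a);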
In my opinion this is the safest option: if you don't want some made-up default value (it's just useless), don't invent one, because a wrong fallback may lead to a really subtle bug which may be hard to find. Therefore I would throw an exception when none of the conditions is met:
#include <stdexcept>

int get_init_value(bool c1, bool c2, bool c3) {
    if (c1) { return 1; }
    else if (c2) { return 2; }
    else if (c3) { return -3; }
    throw std::logic_error("none of the conditions to define the value was met");
}
That way we avoid getting some weird values that don't actually match our code but would compile anyway (debugging that may take a lot of time). I consider it way better than just assigning some clearly wrong value.
Opinion based answer!
I know the example is a simplification of a real, more complex example, but IMHO this kind of design issue emerges more often nowadays, and people sometimes tend to over-complicate it.
Isn't it the whole purpose of a variable to hold some value? Thus isn't having a default value for this variable also a feasible thing?
So what exactly is wrong with:
int a = -1000; // or some other value meant to be used for undefined
if (c1) {
    a = 1;
} else if (c2) {
    a = 2;
} else if (c3) {
    a = -3;
}
do_something_with(a);
It is simple and readable... No lambdas, exceptions and other stuff making the code unnecessarily complicated...
Or like:
int a;
if (c1) {
    a = 1;
} else if (c2) {
    a = 2;
} else if (c3) {
    a = -3;
} else {
    a = -1000; // default for unknown state
}
do_something_with(a);
You could introduce a constant const int undefined = -1000; and use the constant.
Or an enum if c1, c2, c3 are states of some sort (which they most likely are)...
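A sketch of the enum idea, with illustrative names:

// The conditions map to named states instead of magic numbers (C++11).
enum class InitState { from_c1, from_c2, from_c3, undefined };

InitState s = c1 ? InitState::from_c1
            : c2 ? InitState::from_c2
            : c3 ? InitState::from_c3
            :      InitState::undefined;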
You could rearrange the code to eliminate the variable if it is not needed elsewhere.
if (c1) {
    do_something_with(1);
} else if (c2) {
    do_something_with(2);
} else if (c3) {
    do_something_with(-3);
}
I would introduce a default value. I usually use the MAX value of the type for this.
The shortest way to do this is with the ternary operator, like this:
#include <climits>
int a = c1 ? 1 : c2 ? 2 : c3 ? -3 : INT_MAX;
do_something_with(a);
I understand your real code is much more complicated than the outline presented, but IMHO the main problem here is
should we do_something_with(a) at all if a is undefined,
rather than
what the initial value should be.
And the solution might be to explicitly add some status flag like a_is_defined alongside the actual parameter a, instead of using magic constants.
int a = 0;
bool a_is_defined = false;
When you set them both according to the c... conditions and pass them to do_something(), you'll be able to make a clear distinction between a specific if (a_is_defined) {...} path and a default (error handling?) else {...} path. Or even provide separate routines to explicitly handle both paths one level earlier: if (a_is_defined) do_something_with(a); else do_something_else();.
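A minimal sketch of that flag-passing idea (handle_undefined is a hypothetical error path):

int a = 0;
bool a_is_defined = false;

if      (c1) { a = 1;  a_is_defined = true; }
else if (c2) { a = 2;  a_is_defined = true; }
else if (c3) { a = -3; a_is_defined = true; }

if (a_is_defined) do_something_with(a);
else              handle_undefined(); // error handling / default path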
As shown in the following code, one of several atomic routines is called in the function messagePassing.
Which one to use is determined before diving into the nested loops.
In the current implementation, several while loops are used for the sake of runtime performance.
I want to avoid repeating myself (repeating the shared operations in the nested loops) for the sake of readability and maintainability, and achieve something like messagePassingCleanButSlower.
Is there an approach which does not sacrifice runtime performance?
I need to deal with two scenarios.
In the first one, the atomic routines are small, involving only 3 plus/minus operations, so I guess they will be inlined.
In the second one, the atomic routines are big (about 200 lines) and hence unlikely to be inlined.
#include <vector>

template<typename Uint, typename Real>
class Graph {
public:
    void messagePassing(Uint nit, Uint type);
    void messagePassingCleanButSlower(Uint nit, Uint type);
private:
    struct Vertex {}; // Details are hidden since they are distracting.
    std::vector< Vertex > vertices;
    void atomicMessagePassingType1(Vertex &v);
    void atomicMessagePassingType2(Vertex &v);
    void atomicMessagePassingType3(Vertex &v);
    // ...
    // may have other types
};
template<typename Uint, typename Real>
void
Graph<Uint, Real>::
messagePassing(Uint nit, Uint type)
{
    Uint count = 0; // round counter
    if (type == 1) {
        while (count < nit) {
            ++count;
            // many operations
            for (auto &v : vertices) {
                // many other operations
                atomicMessagePassingType1(v);
            }
        }
    }
    else if (type == 2) {
        while (count < nit) {
            ++count;
            // many operations
            for (auto &v : vertices) {
                // many other operations
                atomicMessagePassingType2(v);
            }
        }
    }
    else {
        while (count < nit) {
            ++count;
            // many operations
            for (auto &v : vertices) {
                // many other operations
                atomicMessagePassingType3(v);
            }
        }
    }
}
template<typename Uint, typename Real>
void
Graph<Uint, Real>::
messagePassingCleanButSlower(Uint nit, Uint type)
{
    Uint count = 0; // round counter
    while (count < nit) {
        ++count;
        // many operations
        for (auto &v : vertices) {
            // many other operations
            if (type == 1) {
                atomicMessagePassingType1(v);
            }
            else if (type == 2) {
                atomicMessagePassingType2(v);
            }
            else {
                atomicMessagePassingType3(v);
            }
        }
    }
}
See the benchmarks here:
http://quick-bench.com/rMsSb0Fg4I0WNFX8QbKugCe3hkc
For 1., I have set up a test scenario where the operations in atomicMessagePassingTypeX are really short (only an optimization barrier). I chose roughly 100 elements for vertices and 100 iterations of the outer while. These conditions will be different for your actual code, so whether my benchmark results apply to your case is something you must verify by benchmarking your own code.
The four test cases are: your two variants, the one with a function pointer mentioned in the other answers, and one where the function pointer is called through a dispatching lambda, like this:
template<typename Uint, typename Real>
void
Graph<Uint, Real>::
messagePassingLambda(Uint nit, Uint type)
{
    using ftype = decltype(&Graph::atomicMessagePassingType1);
    auto lambda = [&](ftype what_to_call) {
        Uint count = 0; // round counter
        while (count < nit) {
            ++count;
            // many operations
            for (auto &v : vertices) {
                // many other operations
                (this->*what_to_call)(v);
            }
        }
    };
    if (type == 1) lambda(&Graph::atomicMessagePassingType1);
    else if (type == 2) lambda(&Graph::atomicMessagePassingType2);
    else lambda(&Graph::atomicMessagePassingType3);
}
Try all combinations of GCC 9.1/Clang 8.0 and O2/O3. You will see that at O3 both compilers give roughly the same performance for your "slow" variant; in the case of GCC, it is actually the best. The compiler does hoist the if/else statements out of at least the inner loops, and then, for some reason that is not completely clear to me, GCC reorders the instructions in the inner loop differently than it does for the first variant, making it even slightly faster.
The function pointer variant is consistently the slowest.
The lambda variant is effectively equal to your first variant in performance. I guess it is clear why they are essentially the same if the lambda is inlined.
If it is not inlined, then there might be a significant performance penalty due to the indirect call of what_to_call. This can be avoided by forcing a different type with an appropriate direct call at each call site of lambda.
With C++14 or later you can make a generic lambda:
auto lambda = [&](auto what_to_call) {
adjust the call from (this->*what_to_call)(v); to what_to_call(v); and call it with another lambda:
lambda([&](Vertex &v) { atomicMessagePassingType1(v); });
which will force the compiler to instantiate one function per dispatch, and that should remove any potential indirect calls.
With C++11 you cannot make a generic lambda or a variable template, so you would need to write an actual function template taking the secondary lambda as an argument.
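Putting those pieces together, the C++14 generic-lambda version might look like this sketch (messagePassingGenericLambda is an illustrative name, not from the question):

template<typename Uint, typename Real>
void
Graph<Uint, Real>::
messagePassingGenericLambda(Uint nit, Uint type)
{
    // The loop skeleton is written once; the dispatch below instantiates it
    // three times, each with a direct (inlineable) call.
    auto lambda = [&](auto what_to_call) {
        Uint count = 0; // round counter
        while (count < nit) {
            ++count;
            for (auto &v : vertices) {
                what_to_call(v);
            }
        }
    };
    if (type == 1)      lambda([&](Vertex &v) { atomicMessagePassingType1(v); });
    else if (type == 2) lambda([&](Vertex &v) { atomicMessagePassingType2(v); });
    else                lambda([&](Vertex &v) { atomicMessagePassingType3(v); });
}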
You can use a function pointer to make the decision before entering the loop, like so:
template<typename Uint, typename Real>
void
Graph<Uint, Real>::
messagePassingV2(Uint nit, bool isType1)
{
    void (Graph::* aMPT_Ptr)(Vertex &); // Thanks to @uneven_mark for the correct
    if (isType1)
        aMPT_Ptr = &Graph<Uint, Real>::atomicMessagePassingType1; // syntax here
    else
        aMPT_Ptr = &Graph<Uint, Real>::atomicMessagePassingType2;

    Uint count = 0; // round counter
    while (count < nit) {
        ++count;
        for (auto& v : vertices) {
            (this->*aMPT_Ptr)(v); // Again, thanks to @uneven_mark for the syntax!
        }
    }
}
The one thing that remains as a potential issue is what happens if either of the functions 'assigned' to the pointer is inlined. I'm thinking that, as there is code taking the address of these functions, the compiler will probably not inline them.
There are a couple of ways.
1) Bool param. This really just moves the if/else into the function... but that's a good thing when you use the function[s] in multiple places, and a bad thing if you're trying to move the test out of the loop. OTOH, speculative execution should mitigate that.
2) Member function pointers. Nasty syntax in the raw, but auto can bury all that for us.
#include <functional>
#include <iostream>

class Foo
{
public:
    void bar() { std::cout << "bar\n"; }
    void baz() { std::cout << "baz\n"; }
};

void callOneABunch(Foo& foo, bool callBar)
{
    auto whichToCall = callBar ? &Foo::bar : &Foo::baz;
    // without the auto, this would be "void(Foo::*)()"
    // typedef void(Foo::*TypedefNameGoesHereWeirdRight)();

    for (int i = 0; i < 4; ++i)
    {
        std::invoke(whichToCall, foo); // C++17

        (foo.*whichToCall)(); // ugly, several have recommended wrapping it in a macro

        Foo* foop = &foo;
        (foop->*whichToCall)(); // yep, still ugly
    }
}

int main() {
    Foo myFoo;
    callOneABunch(myFoo, true);
}
You can also take a swing at this with std::function or std::bind, but after arguing with std::function for a bit, I fell back on the bare syntax.
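For comparison, the std::function spelling mentioned above might look like this sketch (callOneABunchStdFunction is an illustrative name):

#include <functional>

void callOneABunchStdFunction(Foo& foo, bool callBar)
{
    // std::function can hold a pointer to member directly; invoking it with
    // the object as the first argument applies the member call for us.
    std::function<void(Foo&)> whichToCall =
        callBar ? std::function<void(Foo&)>(&Foo::bar)
                : std::function<void(Foo&)>(&Foo::baz);

    for (int i = 0; i < 4; ++i)
        whichToCall(foo);
}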
One error that I often see is a container being cleared whilst iterating through it. I have attempted to put together a small example program demonstrating this happening. One thing to note is that this can often happen many function calls deep, so it is quite hard to detect.
Note: This example deliberately shows some poorly designed code. I am trying to find a solution to detect the errors caused by writing code such as this without having to meticulously examine an entire codebase (~500 C++ units)
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

class Bomb;

std::vector<Bomb> bombs;

class Bomb
{
    std::string name;
public:
    Bomb(std::string name)
    {
        this->name = name;
    }

    void touch()
    {
        if(rand() % 100 > 30)
        {
            /* Simulate everything being exploded! */
            bombs.clear();

            /* An error: "this" is no longer valid */
            std::cout << "Crickey! The bomb was set off by " << name << std::endl;
        }
    }
};

int main()
{
    bombs.push_back(Bomb("Freddy"));
    bombs.push_back(Bomb("Charlie"));
    bombs.push_back(Bomb("Teddy"));
    bombs.push_back(Bomb("Trudy"));

    for(size_t i = 0; i < bombs.size(); i++)
    {
        bombs.at(i).touch();
    }

    return 0;
}
Can anyone suggest a way of guaranteeing this cannot happen?
The only way I can currently detect this kind of thing is by replacing the global new and delete with mmap / mprotect and detecting use-after-free memory accesses. This, and Valgrind, however, sometimes fail to pick it up if the vector does not need to reallocate (i.e. only some elements are removed, or the new size has not yet reached the reserved capacity). Ideally I don't want to have to clone much of the STL to make a version of std::vector that always reallocates on every insertion/deletion during debug/testing.
One way that almost works: if the std::vector instead contains std::weak_ptr, the use of .lock() to create a temporary reference prevents deletion whilst execution is within the class's method. However, this does not work with std::shared_ptr, because you do not need lock() there, and the same goes for plain objects. Creating a container of weak pointers just for this would be wasteful.
Can anyone else think of a way to protect ourselves from this?
The easiest way is to run your unit tests with Clang's MemorySanitizer linked in.
Let some continuous-integration Linux box do it automatically on each push
to the repo.
MemorySanitizer has use-after-destruction detection (flag -fsanitize-memory-use-after-dtor plus environment variable MSAN_OPTIONS=poison_in_dtor=1), so it will blow up the test that executes the code and turn your continuous integration red.
If you have neither unit tests nor continuous integration in place, then you can also just manually debug your code with MemorySanitizer, but that is the hard way compared with the easy one. So better to start using continuous integration and write unit tests.
Note that there may be legitimate reasons for memory reads and writes after a destructor has run but the memory has not yet been freed. For example, std::variant<std::string,double>: it lets us assign it a std::string, then a double, and so its implementation might destroy the string and reuse the same storage for the double. Filtering such cases out is unfortunately manual work at the moment, but tools evolve.
In your particular example the misery boils down to no less than two design flaws:
Your vector is a global variable. Limit the scope of all of your objects as much as possible and issues like this are less likely to occur.
Having the single responsibility principle in mind, I can hardly imagine how one could come up with a class that needs a method that either directly or indirectly (maybe through 100 layers of call stack) deletes objects that could happen to be this.
I am aware that your example is artificial and intentionally bad, so please don't get me wrong here: I'm sure that in your actual case it is not so obvious how sticking to some basic design rules can prevent you from doing this. But as I said, I strongly believe that good design will reduce the likelihood of such bugs coming up. And in fact, I cannot remember ever having faced such an issue, but maybe I am just not experienced enough :)
However, if this really keeps being an issue despite sticking to some design rules, then I have this idea for detecting it:
* Create a member int recursionDepth in your class and initialize it with 0.
* At the beginning of each non-private method, increment it.
* Use RAII to make sure that at the end of each method it is decremented again.
* In the destructor, check that it is 0; otherwise it means that the destructor is directly or indirectly called by some method of this.
You may want to #ifdef all of this and enable it only in debug builds. This would essentially make it a debug assertion; some people like them :)
Note that this does not work in a multithreaded environment.
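A sketch of that debug assertion, with illustrative names (recursionDepth, DepthGuard); real code would add this to each class it needs to protect and wrap it in the #ifdef mentioned above:

#include <cassert>

class Bomb {
    mutable int recursionDepth = 0;

    // RAII guard: increments on method entry, decrements on every exit path.
    struct DepthGuard {
        int &depth;
        explicit DepthGuard(int &d) : depth(d) { ++depth; }
        ~DepthGuard() { --depth; }
    };

public:
    void touch() {
        DepthGuard guard(recursionDepth);
        // ... method body; may indirectly destroy *this ...
    }

    ~Bomb() {
        // Fires if the destructor runs while any method is still executing.
        assert(recursionDepth == 0);
    }
};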
In the end I went with a custom iterator: if the owning std::vector is modified whilst an iterator is still in scope, it will log an error or abort (giving me a stack trace of the program). This example is a bit convoluted, but I have tried to simplify it as much as possible and removed unused functionality from the iterator.
This system has flagged about 50 errors of this nature. Some may be repeats. However, Valgrind and ElectricFence came up clean at this point, which is disappointing (in total they flagged around 10, which I have already fixed since the start of the code cleanup).
In this example I use clear(), which Valgrind does flag as an error. However, in the actual codebase it is random-access erases (i.e. vec.erase(vec.begin() + 9)) which I need to check, and Valgrind unfortunately misses quite a few of them.
main.cpp
#include "sstd_vector.h"
#include <iostream>
#include <string>
#include <memory>
class Bomb;
sstd::vector<std::shared_ptr<Bomb> > bombs;
class Bomb
{
std::string name;
public:
Bomb(std::string name)
{
this->name = name;
}
void touch()
{
if(rand() % 100 > 30)
{
/* Simulate everything being exploded! */
bombs.clear(); // Causes an ABORT
std::cout << "Crickey! The bomb was set off by " << name << std::endl;
}
}
};
int main()
{
bombs.push_back(std::make_shared<Bomb>("Freddy"));
bombs.push_back(std::make_shared<Bomb>("Charlie"));
bombs.push_back(std::make_shared<Bomb>("Teddy"));
bombs.push_back(std::make_shared<Bomb>("Trudy"));
/* The key part is the lifetime of the iterator. If the vector
* changes during the lifetime of the iterator, even if it did
* not reallocate, an error will be logged */
for(sstd::vector<std::shared_ptr<Bomb> >::iterator it = bombs.begin(); it != bombs.end(); it++)
{
it->get()->touch();
}
return 0;
}
sstd_vector.h
#include <vector>
#include <stdlib.h>

namespace sstd
{
    template <typename T>
    class vector
    {
        std::vector<T> data;
        size_t refs;

        void check_valid()
        {
            if(refs > 0)
            {
                /* Report an error or abort */
                abort();
            }
        }

    public:
        vector() : refs(0) { }

        ~vector()
        {
            check_valid();
        }

        vector& operator=(vector const& other)
        {
            check_valid();
            data = other.data;
            return *this;
        }

        void push_back(T val)
        {
            check_valid();
            data.push_back(val);
        }

        void clear()
        {
            check_valid();
            data.clear();
        }

        class iterator
        {
            friend class vector;

            typename std::vector<T>::iterator it;
            vector<T>* parent;

            iterator() { }
            iterator& operator=(iterator const&) { abort(); }

        public:
            iterator(iterator const& other)
            {
                it = other.it;
                parent = other.parent;
                parent->refs++;
            }

            ~iterator()
            {
                parent->refs--;
            }

            bool operator !=(iterator const& other)
            {
                if(it != other.it) return true;
                if(parent != other.parent) return true;
                return false;
            }

            iterator operator ++(int val)
            {
                iterator rtn = *this;
                it++;
                return rtn;
            }

            T* operator ->()
            {
                return &(*it);
            }

            T& operator *()
            {
                return *it;
            }
        };

        iterator begin()
        {
            iterator rtn;
            rtn.it = data.begin();
            rtn.parent = this;
            refs++;
            return rtn;
        }

        iterator end()
        {
            iterator rtn;
            rtn.it = data.end();
            rtn.parent = this;
            refs++;
            return rtn;
        }
    };
}
The disadvantage of this system is that I must use an iterator rather than .at(idx) or [idx]. I personally don't mind this one so much; I can still use .begin() + idx if random access is needed.
It is a little bit slower (though nothing compared to Valgrind). When I am done, I can do a search/replace of sstd::vector with std::vector and there should be no performance drop.
With the code below, the question is:
If you use the returnIntVector() function, is the vector copied from the local to the "outer" (global) scope? In other words, is it a more time- and memory-consuming variant compared to the getIntVector() function (which provides the same functionality)?
#include <iostream>
#include <vector>

using namespace std;

vector<int> returnIntVector()
{
    vector<int> vecInts(10);

    for(unsigned int ui = 0; ui < vecInts.size(); ui++)
        vecInts[ui] = ui;

    return vecInts;
}

void getIntVector(vector<int> &vecInts)
{
    for(unsigned int ui = 0; ui < vecInts.size(); ui++)
        vecInts[ui] = ui;
}

int main()
{
    vector<int> vecInts = returnIntVector();

    for(unsigned int ui = 0; ui < vecInts.size(); ui++)
        cout << vecInts[ui] << endl;

    cout << endl;

    vector<int> vecInts2(10);
    getIntVector(vecInts2);

    for(unsigned int ui = 0; ui < vecInts2.size(); ui++)
        cout << vecInts2[ui] << endl;

    return 0;
}
In theory, yes, it's copied. In reality, no: most modern compilers take advantage of return value optimization.
So write the code that is semantically correct. If you want a function that modifies or inspects a value, take the value in by reference. Your code does not do that; it creates a new value not dependent upon anything else, so return by value.
Use the first form, the one which returns the vector, and a good compiler will most likely optimize it. The optimization is popularly known as return value optimization, or RVO for short.
Others have already pointed out that with a decent (not great, merely decent) compiler, the two will normally end up producing identical code, so the two give equivalent performance.
I think it's worth mentioning one or two other points though. First, returning the object does officially copy the object; even if the compiler optimizes the code so that the copy never takes place, it still won't (or at least shouldn't) work if the copy ctor for that class isn't accessible. std::vector certainly supports copying, but it's entirely possible to create a class that you'd be able to modify as in getIntVector, but not return as in returnIntVector.
Second, and substantially more importantly, I'd generally advise against using either of these. Instead of passing or returning a (reference to) a vector, you should normally work with an iterator (or two). In this case, you have a couple of perfectly reasonable choices -- you could use either a special iterator, or create a small algorithm. The iterator version would look something like this:
#ifndef GEN_SEQ_INCLUDED_
#define GEN_SEQ_INCLUDED_

#include <iterator>

template <class T>
class sequence : public std::iterator<std::forward_iterator_tag, T>
{
    T val;
public:
    sequence(T init) : val(init) {}
    T operator *() { return val; }
    sequence &operator++() { ++val; return *this; }
    bool operator!=(sequence const &other) { return val != other.val; }
};

template <class T>
sequence<T> gen_seq(T const &val) {
    return sequence<T>(val);
}

#endif
You'd use it something like this:
#include "gen_seq"
std::vector<int> vecInts(gen_seq(0), gen_seq(10));
Although it's open to argument that this (sort of) abuses the concept of iterators a bit, I still find it preferable on practical grounds -- it lets you create an initialized vector instead of creating an empty vector and then filling it later.
The algorithm alternative would look something like this:
template <class T, class OutIt>
void fill_seq_n(OutIt result, T num, T start = 0) {
    for (T i = start; i != start + num; ++i) {
        *result = i;
        ++result;
    }
}
...and you'd use it something like this:
std::vector<int> vecInts;
fill_seq_n(std::back_inserter(vecInts), 10);
You can also use a function object with std::generate_n, but at least IMO, this generally ends up more trouble than it's worth.
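For completeness, that std::generate_n version might look like this (counter and make_ints are illustrative names):

#include <algorithm>
#include <iterator>
#include <vector>

// A function object that counts up from zero, in the C++03 style of the answer.
struct counter {
    int val;
    counter() : val(0) { }
    int operator()() { return val++; }
};

std::vector<int> make_ints() {
    std::vector<int> v;
    std::generate_n(std::back_inserter(v), 10, counter());
    return v;
}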
As long as we're talking about things like that, I'd also replace this:
for(unsigned int ui = 0; ui < vecInts2.size(); ui++)
cout << vecInts2[ui] << endl;
...with something like this:
std::copy(vecInts2.begin(), vecInts2.end(),
std::ostream_iterator<int>(std::cout, "\n"));
In the C++03 days, getIntVector() was recommended for most cases. With returnIntVector(), some unnecessary temporaries might be created.
But by using return value optimization and swaptimization, most of them can be avoided. In the era of C++11, the latter can be meaningful thanks to move semantics.
In theory, the returnIntVector function returns the vector by value, so a copy will be made and it will be more time-consuming than the function which just populates an existing vector. More memory will also be used to store the copy, but only temporarily; since vecInts is locally scoped it will be stack-allocated and will be freed as soon as the returnIntVector returns. However, as others have pointed out, a modern compiler will optimize away these inefficiencies.
returnIntVector is more time-consuming because it returns a copy of the vector, unless the vector implementation is realized with a single pointer, in which case the performance is the same.
In general, you should not rely on the implementation, and should use getIntVector instead.
This is my first time using this site, so sorry for any bad formatting or weird formulations; I'll try my best to conform to the rules on this site, but I might make some mistakes in the beginning.
I'm currently working on an implementation of some different bin packing algorithms in C++ using the STL containers. The current code still has some logical faults that need to be fixed, but this question is more about the structure of the program. I would like a second opinion on how to structure the program to minimize the number of logical faults and make it as easy to read as possible. In its current state I just feel that this isn't the best way to do it, but I don't really see any other way to write my code right now.
The problem is a dynamic online bin packing problem. It is dynamic in the sense that items have an arbitrary time before they will leave the bin they've been assigned to.
In short my questions are:
How would the structure of a bin packing algorithm look in C++?
Are STL containers a good tool for making the implementation able to handle inputs of arbitrary length?
How should I handle the containers in a good, easy-to-read and easy-to-implement way?
Some thoughts about my own code:
* Using classes to make a clear distinction between handling the list of bins and the list of items in those bins.
* Getting the implementation as effective as possible.
* Being easy to run with a lot of different data lengths and files for benchmarking.
#include <iostream>
#include <fstream>
#include <list>
#include <queue>
#include <string>
#include <vector>

using namespace std;

struct type_item {
    int size;
    int life;
    bool operator < (const type_item& input)
    {
        return size < input.size;
    }
};

class Class_bin {
    double load;
    list<type_item> contents;
    list<type_item>::iterator i;
public:
    Class_bin ();
    bool operator < (Class_bin);
    bool full (type_item);
    void push_bin (type_item);
    double check_load ();
    void check_dead ();
    void print_bin ();
};

Class_bin::Class_bin () {
    load = 0.0;
}

bool Class_bin::operator < (Class_bin input) {
    return load < input.load;
}

bool Class_bin::full (type_item input) {
    if (load + (1.0/(double) input.size) > 1) {
        return false;
    }
    else {
        return true;
    }
}

void Class_bin::push_bin (type_item input) {
    int sum = 0;
    contents.push_back(input);
    for (i = contents.begin(); i != contents.end(); ++i) {
        sum += i->size;
    }
    load += 1.0/(double) sum;
}

double Class_bin::check_load () {
    return load;
}

void Class_bin::check_dead () {
    for (i = contents.begin(); i != contents.end();) {
        i->life--;
        if (i->life == 0) {
            i = contents.erase(i); // erase invalidates the iterator; use its return value
        }
        else {
            ++i;
        }
    }
}

void Class_bin::print_bin () {
    for (i = contents.begin(); i != contents.end(); ++i) {
        cout << i->size << " ";
    }
}

class Class_list_of_bins {
    list<Class_bin> list_of_bins;
    list<Class_bin>::iterator i;
public:
    void push_list (type_item);
    void sort_list ();
    void check_dead ();
    void print_list ();
private:
    Class_bin new_bin (type_item);
    bool comparator (type_item, type_item);
};

Class_bin Class_list_of_bins::new_bin (type_item input) {
    Class_bin temp;
    temp.push_bin (input);
    return temp;
}

void Class_list_of_bins::push_list (type_item input) {
    if (list_of_bins.empty ()) {
        list_of_bins.push_front (new_bin(input));
        return;
    }
    for (i = list_of_bins.begin (); i != list_of_bins.end (); ++i) {
        if (!i->full (input)) {
            i->push_bin (input);
            return;
        }
    }
    list_of_bins.push_front (new_bin(input));
}

void Class_list_of_bins::sort_list () {
    list_of_bins.sort();
}

void Class_list_of_bins::check_dead () {
    for (i = list_of_bins.begin (); i != list_of_bins.end (); ++i) {
        i->check_dead ();
    }
}

void Class_list_of_bins::print_list () {
    for (i = list_of_bins.begin (); i != list_of_bins.end (); ++i) {
        i->print_bin ();
        cout << "\n";
    }
}

int main () {
    int i, number_of_items;
    type_item buffer;
    Class_list_of_bins bins;
    queue<type_item> input;
    string filename;
    fstream file;

    cout << "Input file name: ";
    cin >> filename;
    cout << endl;

    file.open (filename.c_str(), ios::in);
    file >> number_of_items;
    for (i = 0; i < number_of_items; ++i) {
        file >> buffer.size;
        file >> buffer.life;
        input.push (buffer);
    }
    file.close ();

    while (!input.empty ()) {
        buffer = input.front ();
        input.pop ();
        bins.push_list (buffer);
    }

    bins.print_list ();
    return 0;
}
Note that this is just a snapshot of my code and it is not yet running properly.
I don't want to clutter this with unrelated chatter; I just want to thank the people who contributed. I will review my code and hopefully be able to structure my programming a bit better.
How would the structure of a bin packing algorithm look in C++?
Well, ideally you would have several bin-packing algorithms, separated into different functions, which differ only by the logic of the algorithm. That algorithm should be largely independent from the representation of your data, so you can change your algorithm with only a single function call.
You can look at what the STL Algorithms have in common. Mainly, they operate on iterators instead of containers, but as I detail below, I wouldn't suggest this for you initially. You should get a feel for what algorithms are available and leverage them in your implementation.
Are STL containers a good tool for making the implementation able to handle inputs of arbitrary length?
It usually works like this: create a container, fill the container, apply an algorithm to the container.
Judging from the description of your requirements, that is how you'll use this, so I think it'll be fine. There's one important difference between your bin packing algorithm and most STL algorithms.
The STL algorithms are either non-modifying or insert elements into a destination. Bin packing, on the other hand, is "here's a list of bins; use them or add a new bin". It's not impossible to do this with iterators, but probably not worth the effort. I'd start by operating on the container, get a working program, back it up, then see if you can make it work with only iterators.
How should I handle the containers in a good, easy-to-read and easy-to-implement way?
I'd take this approach. First, characterize your inputs and outputs:
Input: Collection of items, arbitrary length, arbitrary order.
Output: Collection of bins determined by algorithm. Each bin contains a collection of items.
Then I'd worry about "what does my algorithm need to do?"
Constantly check bins for "does this item fit?"
Your Class_bin is a good encapsulation of what is needed.
Avoid cluttering your code with unrelated stuff like print() - use non-member helper functions.
type_item
struct type_item {
    int size;
    int life;
    bool operator < (const type_item& input)
    {
        return size < input.size;
    }
};
It's unclear what life (or death) is used for. I can't imagine that concept being relevant to implementing a bin-packing algorithm. Maybe it should be left out?
This is personal preference, but I don't like giving operator< to my objects. Objects are usually non-trivial and have many meanings of less-than. For example, one algorithm might want all the alive items sorted before the dead items. I typically wrap that in another struct for clarity:
struct type_item {
    int size;
    int life;

    struct SizeIsLess {
        // Note this becomes a function object, which makes it easy to use with
        // STL algorithms.
        bool operator() (const type_item& lhs, const type_item& rhs) const
        {
            return lhs.size < rhs.size;
        }
    };
};

vector<type_item> items;
std::sort(items.begin(), items.end(), type_item::SizeIsLess());
Class_bin
class Class_bin {
    double load;
    list<type_item> contents;
    list<type_item>::iterator i;
public:
    Class_bin ();
    bool operator < (Class_bin);
    bool full (type_item);
    void push_bin (type_item);
    double check_load ();
    void check_dead ();
    void print_bin ();
};
I would skip the Class_ prefix on all your types - it's just a bit excessive, and it should be clear from the code. (This is a variant of Hungarian notation. Programmers tend to be hostile towards it.)
You should not have a class member i (the iterator). It's not part of the class state. If you need it in all the members, that's OK; just redeclare it there. If it's too long to type, use a typedef.
It's difficult to quantify "bin1 is less than bin2", so I'd suggest removing the operator<.
bool full(type_item) is a little misleading. I'd probably use bool can_hold(type_item). To me, bool full() would return true if there is zero space remaining.
check_load() would seem more clearly named load().
Again, it's unclear what check_dead() is supposed to accomplish.
I think you can remove print_bin and write that as a non-member function, to keep your objects cleaner.
Some people on StackOverflow would shoot me, but I'd consider just making this a struct, leaving load and the item list public. It doesn't seem like you care much about encapsulation here (you only need this object so you don't have to recalculate load each time).
Class_list_of_bins
class Class_list_of_bins {
    list<Class_bin> list_of_bins;
    list<Class_bin>::iterator i;
public:
    void push_list (type_item);
    void sort_list ();
    void check_dead ();
    void print_list ();
private:
    Class_bin new_bin (type_item);
    bool comparator (type_item, type_item);
};
I think you can do without this class entirely.
Conceptually, it represents a container, so just use an STL container. You can implement the methods as non-member functions. Note that sort_list can be replaced with std::sort (if you stay with std::list, use its member sort instead, since std::sort needs random-access iterators).
comparator is too generic a name, it gives no indication of what it compares or why, so consider being more clear.
Overall Comments
Overall, I think the classes you've picked adequately model the space you're trying to represent, so you'll be fine.
I might structure my project like this:
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Assumed capacity per bin; this detail was left open in the sketch.
const double BIN_CAPACITY = 1.0;

struct bin {
    double load;       // sum of item sizes.
    double free_space; // space remaining.
    std::list<type_item> items;
    bin() : load(0), free_space(BIN_CAPACITY) { }
};

// Returns true if the bin can fit the item passed to the constructor.
struct bin_can_fit {
    bin_can_fit(const type_item &item) : item_(item) { }
    bool operator()(const bin &b) const {
        return item_.size < b.free_space;
    }
private:
    type_item item_;
};

// ItemIter is an iterator over the items.
// BinOutputIter is an output iterator we can use to put bins.
template <class ItemIter, class BinOutputIter>
void bin_pack_first_fit(ItemIter curr, ItemIter end, BinOutputIter output_bins) {
    std::vector<bin> bins; // Create a local bin container, to simplify life.
    for (; curr != end; ++curr) {
        // Use a helper predicate to check whether the bin can fit this item.
        // This is untested, but just for an idea.
        std::vector<bin>::iterator bin_it =
            std::find_if(bins.begin(), bins.end(), bin_can_fit(*curr));
        if (bin_it == bins.end()) {
            // Did not find a bin with enough space, add a new bin.
            bins.push_back(bin());
            // push_back invalidates iterators, so reassign bin_it to the last item.
            bin_it = bins.end() - 1;
        }
        // bin_it now points to the bin to put the item in.
        bin_it->items.push_back(*curr);
        bin_it->load += curr->size;
        bin_it->free_space -= curr->size;
    }
    std::copy(bins.begin(), bins.end(), output_bins); // Apply our bins to the destination.
}

int main(int argc, char** argv) {
    std::vector<type_item> items;
    // ... fill items
    std::vector<bin> bins;
    bin_pack_first_fit(items.begin(), items.end(), std::back_inserter(bins));
}
Some thoughts:
* Your names are kinda messed up in places.
* You have a lot of parameters named input; that's just meaningless.
* I'd expect full() to check whether it is full, not whether it can fit something else.
* I don't think push_bin pushes a bin.
* check_dead modifies the object (I'd expect something named check_* to just tell me something about the object).
* Don't put things like Class and type in the names of classes and types.
* Class_list_of_bins seems to describe what's inside rather than what the object is.
* push_list doesn't push a list.
* Don't append stuff like _list to every method in a list class; if it's a list object, we already know it's a list method.
I'm confused, given the parameters of life and load, as to what you are doing. The bin packing problem I'm familiar with just has sizes. I'm guessing that over time some of the objects are taken out of bins and thus go away?
Some further thoughts on your classes:
Class_list_of_bins is exposing too much of itself to the outside world. Why would the outside world want to check_dead or sort_list? That's nobody's business but the object's own. The public methods on that class really should be something like:
* Add an item to the collection of bins
* Print solution
* Step one timestep into the future
list<Class_bin>::iterator i;
Bad, bad, bad! Don't put member variables on your classes unless they are actually member state. You should define that iterator where it is used. If you want to save some typing, add this: typedef list<Class_bin>::iterator bin_iterator; and then use bin_iterator as the type instead.
EXPANDED ANSWER
Here is my pseudocode:
class Item
{
    Item(Istream & input)
    {
        read input description of item
    }
    double size_needed() { return actual size required (out of 1) for this item }
    bool alive() { return true if object is still alive }
    void do_timestep() { decrement life }
    void print() { print something }
}

class Bin
{
    vector of Items
    double remaining_space

    bool can_add(Item item) { return true if we have enough space }
    void add(Item item) { add item to vector of items, update remaining space }
    void do_timestep() { call do_timestep() on all Items, remove all items which indicate they are dead, updating remaining_space as you go }
    void print() { print all the contents }
}

class BinCollection
{
    void do_timestep() { call do_timestep on all of the bins }
    void add(Item item) { find the first bin for which can_add returns true, then add it, creating a new bin if necessary }
    void print() { print all the bins }
}
Some quick notes:
* In your code, you converted the int size to a float repeatedly; that's not a good idea. In my design, that conversion is localized to one place.
* You'll note that the logic relating to a single item is now contained inside the item itself. Other objects can only see what's important to them: size_required and whether the object is still alive.
* I've not included anything about sorting stuff, because I'm not clear what that is for in a first-fit algorithm.
This interview gives some great insight into the rationale behind the STL. It may give you some inspiration on how to implement your algorithms the STL way.