Optimizing iterative deepening depth first search in C++ - c++

I am using an implementation of the IDDFS graph algorithm in a project, and I noticed that for each node the function takes around 8 to 10 seconds to compute. With a very large dataset of over 7000 nodes, this is proving to be the biggest bottleneck. I am trying to optimize my code so that it runs faster, but all I have found so far is to change my function's by-value parameters to by-reference parameters, which has made a decent difference.
Are there any other ways I could speed up my code? I am compiling with C++11.
// Utility DFS function -- returns DFS Path if it exists, -1 if not exists
vector<int> dfs_util(vector<int> path, int target, vector<vector<int>> &adj_list, int depth)
{
int curr_node = path.back();
if (curr_node == target)
return path;
if (depth <= 0)
{
vector<int> tmp;
tmp.push_back(CUST_NULL);
return tmp;
}
for (auto child : adj_list[curr_node])
{
vector<int> new_path = path;
new_path.push_back(child);
vector<int> result = dfs_util(new_path, target, adj_list, depth - 1);
if (result.back() != CUST_NULL)
{
return result;
}
}
vector<int> tmp;
tmp.push_back(CUST_NULL);
return tmp;
}
// IDDFS Function -- returns IDDFS Path if it exists, -1 if not
vector<int> iddfs(vector<vector<int>> &adj_list, int src, int target, int max_depth = 2)
{
vector<int> result;
max_depth++;
for (int depth = 0; depth < max_depth; depth++)
{
vector<int> path;
path.push_back(src);
result = dfs_util(path, target, adj_list, depth);
if (result.back() == CUST_NULL || result.size() == 0)
continue;
int final_index = 0;
int idx_count = 0;
for (auto item : result)
{
if (item == src)
final_index = max(final_index, idx_count);
idx_count++;
}
result = vector<int>(result.begin() + final_index, result.end());
}
return result;
}

Here are a few points to optimize your code:
The parameter vector<int> path is copied for no apparent reason. You can pass it by reference.
vector<int> new_path = path; creates a new copy of path. This is not needed: you can mutate path (passed by reference), adding an element before the recursive call and removing it afterwards. You can also reserve some space initially to avoid reallocations.
Returning a vector<int> from a recursive function is generally not a good idea in terms of performance. This is especially true here, since the function sometimes returns a vector of size 1 containing CUST_NULL as a special code. This is very inefficient because std::vector has to allocate memory for it, and memory allocations are expensive. You can instead return the result through a reference parameter, or use std::optional (which requires C++17, so it is not available when compiling as C++11) to signal failure without creating a new non-empty vector. Note that the mutation-based solution is probably faster.
Be aware that push_back checks whether the target vector needs to be reallocated. While this is simple and flexible, it is not actually needed for path, since its maximum size is bounded by a known threshold (depth +/- 1). You can fix its size to this threshold and fill it using an index passed to the recursive function.
Using vector<vector<int>> is not efficient, since the items of this data structure (each of type vector<int>) are likely stored non-contiguously (they may end up contiguous with some luck, depending on how the low-level allocator works and how your code fills them). It is more efficient to use one big flat vector<int> virtually split into parts. The starting offset of each part can be stored in another vector<size_t> (with n+1 items). This makes the code a bit more complex, though.
You can unroll/inline the function code for the case where depth is 1, to avoid many function calls (and, in your current code, copies) in that case. This strategy is not guaranteed to be faster (though it often is): it may not help if the compiled code becomes too large to fit in the CPU instruction cache, or if this case rarely happens in practice.
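As a rough illustration of the first few points, here is a minimal sketch that extends a single path in place and reports success with a bool instead of a sentinel vector. The names dfs_limited and iddfs_inplace are made up for this example, and it is not a drop-in replacement: it returns an empty vector when nothing is found (instead of CUST_NULL) and stops at the first depth where the target is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Depth-limited DFS that extends `path` in place instead of copying it.
// Returns true if `target` was reached; `path` then holds the route.
bool dfs_limited(std::vector<int>& path, int target,
                 const std::vector<std::vector<int>>& adj_list, int depth)
{
    const int curr_node = path.back();
    if (curr_node == target)
        return true;
    if (depth <= 0)
        return false;
    for (int child : adj_list[curr_node])
    {
        path.push_back(child);  // extend the current path
        if (dfs_limited(path, target, adj_list, depth - 1))
            return true;        // keep the path as the result
        path.pop_back();        // backtrack and try the next child
    }
    return false;
}

// Returns a path from src to target, or an empty vector if none is found
// within max_depth edges (assumption: empty replaces the CUST_NULL code).
std::vector<int> iddfs_inplace(const std::vector<std::vector<int>>& adj_list,
                               int src, int target, int max_depth = 2)
{
    assert(src >= 0 && static_cast<std::size_t>(src) < adj_list.size());
    std::vector<int> path;
    path.reserve(static_cast<std::size_t>(max_depth) + 1);  // one allocation
    for (int depth = 0; depth <= max_depth; ++depth)
    {
        path.assign(1, src);  // restart from the source, reusing the buffer
        if (dfs_limited(path, target, adj_list, depth))
            return path;
    }
    return {};
}
```

With this shape, the only allocation per search is the initial reserve, and the adjacency list is never copied.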

Related

Rotate elements in a vector and how to return a vector

c++ newbie here. So for an assignment I have to rotate all the elements in a vector to the left one. So, for instance, the elements {1,2,3} should rotate to {2,3,1}.
I'm researching how to do it, and I saw the rotate() function, but I don't think that will work given my code. And then I saw a for loop that could do it, but I'm not sure how to translate that into a return statement. (i tried to adjust it and failed)
This is what I have so far, but it is very wrong (i haven't gotten a single result that hasn't ended in an error yet)
Edit: The vector size I have to deal with is just three, so it doesn't need to account for any sized vector
#include <vector>
using namespace std;
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result;
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
v.at(i) = v.at(i + 1);
result.at(i) = v.at(i);
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
All my teacher does is upload textbook pages that explain what certain parts of code are supposed to do, but the textbook pages offer NO help in figuring out how to actually apply this stuff.
So could someone please give me a few pointers?
Since you know exactly how many elements you have, and it's the smallest number that makes sense to rotate, you don't need to do anything fancy - just place the items in the order that you need, and return the result:
vector<int> rotate3(const vector<int>& x) {
return vector<int> { x[1], x[2], x[0] };
}
Note that if your collection always has three elements, you could use std::array instead:
std::array<int,3>
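A minimal sketch of that std::array variant, assuming the function signature is free to change (the assignment above fixes it to vector<int>, so this is only for illustration; rotate3_array is a made-up name):

```cpp
#include <array>
#include <cassert>

// Fixed-size left rotation: the size is part of the type, so no bounds
// checks or heap allocations are involved.
std::array<int, 3> rotate3_array(const std::array<int, 3>& x)
{
    return {{x[1], x[2], x[0]}};  // double braces keep C++11 compilers happy
}
```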
First, note that you have passed v as a const reference (const vector<int>&), so you are forbidden from modifying v, as in v.at(i) = v.at(i + 1);
Although Sergey has already given a straightforward solution, you could correct your code like this:
#include <vector>
using namespace std;
vector<int> left_rotate(const vector<int>& v)
{
vector<int> result;
int size = v.size(); // this way you are able to rotate vectors of any size
for (auto i = 1; i < size; ++i)
result.push_back(v.at(i));
// adding first element of v at end of result
result.push_back(v.front());
return result;
}
Use Sergey's answer. This answer deals with why what the asker attempted did not work. They're damn close, so it's worth going through it, explaining the problems, and showing how to fix it.
In
v.at(i) = v.at(i + 1);
v is constant. You can't write to it. The naïve solution (which won't work) is to cut out the middle-man and write directly to the result vector because it is NOT const
result.at(i) = v.at(i + 1);
This doesn't work because
vector<int> result;
defines an empty vector. There is no at(i) to write to, so at throws an exception that terminates the program.
As an aside, the [] operator does not check bounds like at does and will not throw an exception. This can lead you to think the program worked when instead it was writing to memory the vector did not own. This would probably crash the program, but it doesn't have to [1].
The quick fix here is to ensure usable storage with
vector<int> result(v.size());
The resulting code
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size()); // change here to size the vector
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
result.at(i) = v.at(i + 1); // change here to directly assign to result
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
almost works. But when we run it on {1, 2, 3} result holds {2, 3, 0} at the end. We lost the 1. That's because v.at(i + 1) never touches the first element of v. We could increase the number of for loop iterations and use the modulo operator
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size());
int size = 3;
for (auto i = 0; i < size; ++i) // change here to iterate size times
{
result.at(i) = v.at((i + 1) % size); // change here to make i + 1 wrap
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
and now the output is {2, 3, 1}. But it's just as easy, and probably a bit faster, to just do what we were doing and tack on the missing element after the loop.
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size());
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
result.at(i) = v.at(i + 1);
}
result.at(size - 1) = v.at(0); // change here to store first element
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
Taking this a step further, the size of three is an unnecessary limitation for this function that I would get rid of and since we're guaranteeing that we never go out of bounds in our for loop, we don't need the extra testing in at
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
if (v.empty()) // nothing to rotate.
{
return vector<int>{}; // return empty result
}
vector<int> result(v.size());
for (size_t i = 0; i < v.size() - 1; ++i) // Explicitly using size_t because
// 0 is an int, and v.size() is an
// unsigned integer of implementation-
// defined size but cannot be larger
// than size_t
// note v.size() - 1 is only safe because
// we made sure v is not empty above
// otherwise 0 - 1 in unsigned math
// Becomes a very, very large positive
// number
{
result[i] = v[i + 1];
}
result.back() = v.front(); // using direct calls to front and back because it's
// a little easier on my mind than math and []
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
We can go further still and use iterators and range-based for loops, but I think this is enough for now. Besides, at the end of the day, you can throw the function out completely and use std::rotate from the <algorithm> library.
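For the curious, the iterator-based variant might look like the following sketch. The name rotate_left_iter is made up here, and this is just one way to write it, not the form the assignment expects.

```cpp
#include <cassert>
#include <vector>

// Left-rotate by one using iterator ranges instead of index arithmetic.
std::vector<int> rotate_left_iter(const std::vector<int>& v)
{
    std::vector<int> result;
    if (v.empty())
        return result;                 // nothing to rotate
    result.reserve(v.size());          // single allocation up front
    result.insert(result.end(), v.begin() + 1, v.end());  // elements 1..n-1
    result.push_back(v.front());       // old first element goes last
    return result;
}
```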
[1] This is called Undefined Behaviour (UB), and one of the most fearsome things about UB is that anything could happen, including giving you the expected result. We put up with UB because it makes for very fast, versatile programs. Validity checks are not made where you don't need them (nor where you did), unless the compiler and library writers decide to make those checks and give guaranteed behaviour such as an error message and a crash. Microsoft, for example, does exactly this in the vector implementation used when you make a debug build. The release version of Microsoft's vector makes no checks and assumes you wrote the code correctly and would prefer the executable to be as fast as possible.
"I saw the rotate() function, but I don't think that will work given my code."
Yes it will work.
When learning there is gain in "reinventing the wheel" (e.g. implementing rotate yourself) and there is also gain in learning how to use the existing pieces (e.g. use standard library algorithm functions).
Here is how you would use std::rotate from the standard library:
std::vector<int> rotate_1(const std::vector<int>& v)
{
std::vector<int> result = v;
std::rotate(result.begin(), result.begin() + 1, result.end());
return result;
}

Minimizing variable copies in recursive function

I'm looking for efficient memory allocation when dealing with a recursive function. As far as I understand, variables I use in the function remain allocated in memory until the recursion is finished. Is there a way to avoid this? I believe it makes my code below slow, since the state variable is copied every time the function is called (correct me if I'm wrong, as I'm new to C++).
#include <fstream>
#include <vector>
using namespace std;
int N = 30;
double MIN_COST = 1000000;
vector<int> MIN_CUT = {};
void minCut(vector<int> state, int index, int nodeValue) {
double currentCost;
if (index >= 0) {
currentCost = getCurrentCost(state); // some magic evaluating state cost
state.push_back(nodeValue);
if (currentCost >= MIN_COST) { // kill branch if incomplete solution is already worse than best achieved solution
return;
}
}
if (index == N - 1) { // check if leaf node
if (currentCost < MIN_COST) {
MIN_COST = currentCost;
MIN_CUT = state;
}
return;
}
minCut(state, index + 1, 1); // left subtree - adding 1 to vector
minCut(state, index + 1, 0); // right subtree - adding 0 to vector
return;
}
int main() {
vector<int> state = {};
minCut(state, -1, NULL);
cout << MIN_COST << "\n";
return 0;
}
Your algorithm is effectively building a tree of paths, but you're using a vector to hold the nodes for each path.
      A
     / \
    /   \
   B     C
  / \   / \
 D   E F   G
This is the tree you're traversing.
But you're creating new vectors at every node, which contain the whole path up to that node. So as you're visiting node G, in your stack you have 3 vectors:
vector { A, C, G }
vector { A, C }
vector { A }
It should be clear how this is less efficient as you have noticed, but maybe seeing it this way hints at the correct efficient implementation.
The call stack itself holds the path to the root node. The stack when visiting G would be something like
minCut < visiting G >
minCut < visiting C >
minCut < visiting A >
In order to efficiently exploit this fact, make minCut pass the minimum amount of information. In this case we're talking about something linked-list like.
You have then two options that jump out:
1. Use vector, but pass it by reference. You must then maintain it across calls, pushing and popping nodes to keep it synchronized with the actual state.
2. Use an actual linked list. It should be easy to construct the vector by traversing pointers-to-parent-nodes.
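The linked-list option can be sketched as follows. Everything here is hypothetical (PathNode, to_vector, and enumerate_paths are names invented for the example, and the traversal just enumerates 0/1 sequences rather than evaluating any cost). The point is that each call keeps one small node on its own stack frame, and a vector is only built when a complete path is actually needed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One link of the path; each recursive call owns one on its stack frame.
struct PathNode
{
    int value;
    const PathNode* parent;  // nullptr at the root
};

// Rebuilds the root-to-leaf path by following parent pointers.
std::vector<int> to_vector(const PathNode* leaf)
{
    std::vector<int> out;
    for (const PathNode* p = leaf; p != nullptr; p = p->parent)
        out.push_back(p->value);
    std::reverse(out.begin(), out.end());  // collected leaf-to-root
    return out;
}

// Hypothetical traversal: visits every 0/1 sequence of the given length
// and stores each complete one in `leaves`.
void enumerate_paths(const PathNode* parent, int remaining,
                     std::vector<std::vector<int>>& leaves)
{
    if (remaining == 0)
    {
        leaves.push_back(to_vector(parent));  // the only place a vector is built
        return;
    }
    const PathNode with_one{1, parent};   // left subtree: append 1
    enumerate_paths(&with_one, remaining - 1, leaves);
    const PathNode with_zero{0, parent};  // right subtree: append 0
    enumerate_paths(&with_zero, remaining - 1, leaves);
}
```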
Yes, there is a more efficient way to pass state through each function call. This is called passing by reference and can be achieved like so:
void minCut(vector<int>& state, int index, int nodeValue) { ...
This will result in the original state being referenced instead of copied each time the function is called.
For this to work correctly in the code you posted, you will have to make some modifications; this is just the general concept.
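One possible shape of those modifications is sketched below, under stated assumptions: toy_cost is a made-up stand-in for getCurrentCost (it only needs to never decrease as the state grows, so pruning stays valid), and the globals are gathered into a hypothetical Search struct. The key change is that every push_back is paired with a pop_back, so a single vector is shared by the whole recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cost: 1 per one, 2 per zero that directly follows a zero.
double toy_cost(const std::vector<int>& state)
{
    double cost = 0.0;
    for (std::size_t i = 0; i < state.size(); ++i)
    {
        if (state[i] == 1)
            cost += 1.0;
        else if (i > 0 && state[i - 1] == 0)
            cost += 2.0;
    }
    return cost;
}

// Search parameters and best result so far (replaces the globals).
struct Search
{
    int n;
    double best_cost;
    std::vector<int> best_state;
};

void min_cut(std::vector<int>& state, Search& s)
{
    const double cost = toy_cost(state);
    if (cost >= s.best_cost)
        return;  // prune: this branch cannot beat the best solution
    if (static_cast<int>(state.size()) == s.n)
    {
        s.best_cost = cost;
        s.best_state = state;  // the only full copy, made on improvement
        return;
    }
    for (int value = 1; value >= 0; --value)
    {
        state.push_back(value);  // descend
        min_cut(state, s);
        state.pop_back();        // restore state for the sibling branch
    }
}
```

After the top-level call returns, state is empty again, which is a cheap sanity check that pushes and pops stayed balanced.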

Fast algorithm to remove odd elements from vector

Given a vector of integers, I want to write a fast (not the obvious O(n^2)) algorithm to remove all odd elements from it.
My idea is: iterate through the vector until the first odd element, then copy everything before it to the end of the vector (by calling push_back), and so on until we have looked through all the original elements (except the copied ones). Then remove all of the originals, so that only the vector's tail survives.
I wrote the following code to implement it:
void RemoveOdd(std::vector<int> *data) {
size_t i = 0, j, start, end;
uint l = (*data).size();
start = 0;
for (i = 0; i < l; ++i)
{
if ((*data)[i] % 2 != 0)
{
end = i;
for (j = start, j < end, ++j)
{
(*data).push_back((*data)[j]);
}
start = i + 1;
}
}
(*data).erase((*data).begin(), i);
}
but it gives me lots of errors, which I can't fix. I'm very new to programming, so I expect that all of them are elementary and stupid.
Please help me with error corrections or another algorithm implementation. Any suggestions and explanations will be much appreciated. It is also better not to use the algorithm library.
You can use the remove-erase idiom.
data.erase(std::remove_if(data.begin(), data.end(),
[](int item) { return item % 2 != 0; }), data.end());
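Wrapped into a complete function, the idiom could look like this sketch (remove_odd is a name chosen for the example). std::remove_if moves the kept elements to the front and returns an iterator to the new logical end; erase then trims the leftover tail.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Removes all odd elements in a single linear pass.
void remove_odd(std::vector<int>& data)
{
    data.erase(std::remove_if(data.begin(), data.end(),
                              [](int item) { return item % 2 != 0; }),
               data.end());
}
```

Testing with `item % 2 != 0` rather than `item % 2 == 1` also catches negative odd numbers, for which the remainder is -1.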
You don't really need to push_back anything (or erase elements at the front, which requires repositioning all that follows) to remove elements according to a predicate... Try to understand the "classic" inplace removal algorithm (which ultimately is how std::remove_if is generally implemented):
void RemoveOdd(std::vector<int> & data) {
int rp = 0, wp = 0, sz = data.size();
for(; rp<sz; ++rp) {
if(data[rp] % 2 == 0) {
// if the element is a keeper, write it in the "write pointer" position
data[wp] = data[rp];
// increment so that next good element won't overwrite this
wp++;
}
}
// shrink to include only the good elements
data.resize(wp);
}
rp is the "read" pointer - it's the index to the current element; wp is the "write" pointer - it always points to the location where we'll write the next "good" element, which is also the "current length" of the "new" vector. Every time we have a good element we copy it in the write position and increment the write pointer. Given that wp <= rp always (as rp is incremented once at each iteration, and wp at most once per iteration), you are always overwriting either an element with itself (so no harm is done), or an element that has already been examined and either has been moved to its correct final position, or had to be discarded anyway.
This version is written for a specific type (vector<int>) and a specific predicate, using indexes and "regular" (non-move) assignment, but it can easily be generalized to any container with forward iterators (as it's done in std::remove_if) and erase.
Even if the generic standard library algorithm works well in most cases, this is still an important algorithm to keep in mind. There are often cases where the generic library version isn't sufficient, and knowing the underlying idea is useful for implementing your own version.
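A sketch of that generalization, assuming forward iterators and a move-assignable element type. The name remove_if_sketch is invented here, and real standard library implementations may differ in detail.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Generic read/write-pointer compaction in the spirit of std::remove_if.
// Returns an iterator to the new logical end of the kept elements.
template <typename ForwardIt, typename Pred>
ForwardIt remove_if_sketch(ForwardIt first, ForwardIt last, Pred should_remove)
{
    ForwardIt write = first;
    for (ForwardIt read = first; read != last; ++read)
    {
        if (!should_remove(*read))
        {
            if (write != read)
                *write = std::move(*read);  // skip pointless self-moves
            ++write;
        }
    }
    return write;
}
```

As with the standard algorithm, the caller still has to erase from the returned iterator to the end of the container.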
For a pure algorithm implementation, you don't need to push back elements. In the worst-case scenario (all odd data), you will do more than n^2 copies.
Keep two indices: one for iterating (i), and one for placing (placed). Iterate over the whole vector (i++), and if (*data)[i] is even, write it to (*data)[placed] and increment placed. At the end, reduce the length to placed; all elements after that are unnecessary.
remove_if does this for you ;)
void DeleteOdd(std::vector<int> & m_vec) {
int i= 0;
for(i= 0; i< m_vec.size(); ++i) {
if(m_vec[i] & 0x01)
{
m_vec.erase(m_vec.begin()+i);
i--;
}
}
m_vec.resize(i);
}

vector size remaining static after pushback() calls for powerset function

I wrote the following function, as an implementation of this algorithm/approach, to generate the power-set (set of all subsets) of a given string:
vector<string> getAllSubsets(string a, vector<string> allSubsets)
{
if(a.length() == 1)
{
// Base case,
allSubsets.push_back("");
allSubsets.push_back(a);
}
else {
vector<string> temp = getAllSubsets(a.substr(0,a.length()-1),allSubsets);
vector<string> with_n = temp;
vector<string> without_n = temp;
for(int i = 0;i < temp.size()-1;i++)
{
allSubsets.push_back(with_n[i] + a[a.length()-1]);
allSubsets.push_back(without_n[i]);
}
}
return allSubsets;
}
However, something appears to be going wrong: the size of temp and allSubsets remains static from recursive call to recursive call, when they should be increasing due to the push_back() calls. Is there any reason why this would take place?
It's because you have an off-by-one error. Because this occurs in your next-to-base case, you are never inserting any entries.
Since the first invalid index is temp.size(), i < temp.size() means that you will always have a valid index. Subtracting 1 means that you are missing the last element of the vector.
It's worth noting that passing allSubsets in as a parameter is kinda silly because it's always empty. This kind of algorithm simply doesn't require a second parameter. And secondly, you could be more efficient using hash sets that can perform deduplication for you simply and quickly.
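Putting both remarks together (loop over every element, and drop the second parameter), a single-parameter version might look like this sketch. The name all_subsets and the choice of the empty string as base case are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Power set of the characters of `a`, each subset represented as a string.
std::vector<std::string> all_subsets(const std::string& a)
{
    if (a.empty())
        return {""};  // the only subset of nothing is the empty subset

    // Subsets of everything except the last character.
    const std::vector<std::string> without_last =
        all_subsets(a.substr(0, a.size() - 1));

    std::vector<std::string> result;
    result.reserve(without_last.size() * 2);
    for (const std::string& s : without_last)  // visits every element
    {
        result.push_back(s);             // subset without the last character
        result.push_back(s + a.back());  // same subset with it appended
    }
    return result;
}
```

For an input of length n with distinct characters this produces 2^n strings; repeated characters would yield duplicates, which is where the suggested set-based deduplication would come in.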

Vector performance suffering

I've been working on state space exploration and was originally using a map to store the assignment of the world states like map<Variable *, int>, where variables are objects in the world with a domain from 0 to n where n is finite. The implementation was extremely quick for performance, but I noticed that it does not scale well with the size of the state space. I changed the states to use vector<int> instead, where I use the id of a variable to find its index in the vector. Memory usage improved greatly, but the efficiency of the solver has tanked (gone from <30 seconds to 400+). The only code that I modified was generating the states and validating if the state is the goal. I can't figure out why using a vector has degraded performance, especially since the vector operations should only take linear time at worst.
Originally, this was how I generated nodes:
State * SuccessorGen::generate_successor(const Operator &op, map<Variable *, int> &var_assignment){
map<Variable *, int> values;
values.insert(var_assignment.begin(), var_assignment.end());
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
values[eff_it->var] = eff_it->after;
}
return new State(values);
}
And in my new implementation:
State* SuccessorGen::generate_successor(const Operator &op, const vector<int> &assignment){
vector<int> child;
child = assignment;
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
Variable *v = eff_it->var;
int id = v->get_id();
child[id] = eff_it->after;
}
return new State(child);
}
(The goal checking is similar, just looping over the goal assignment instead of operator effects.)
Are these vector operations really that much slower than using a map? Is there an equally efficient STL container I can use that has a lower overhead? The number of variables is relatively small (<50) and the vector never needs to be resized or modified after the for loop.
Edit:
I tried timing one loop through all the operators to compare. With the effect list and assignment, the vector version runs one loop in 0.3 seconds, while the map version takes a little over 0.4 seconds. When I comment that section out, the map version stays about the same, yet the vector version jumps up to closer to 0.5 seconds. I added child.reserve(assignment.size()), but that did not make any difference.
Edit 2:
From user63710's answer, I've also been digging through the rest of the code and noticed something really strange going on in the heuristic calculation. The vector version works fine, but for the map I use this line Node *n = new Node(i, transition.value, label_cost); open_list.push(n);, but once the loop finishes filling the queue the node gets totally screwed up. Nodes are a simple struct as:
struct Node{
// Source Value, Destination Value
int from;
int to;
int distance;
Node(int &f, int &t, int &d) : from(f), to(t), distance(d){}
};
Instead of holding from, to, and distance, the node ends up with from and to replaced by some random number, so that search does not do what it should and returns much faster than it should. When I tweak the map version to convert the map to a vector and run this:
Node n(i, transition.value, label_cost); open_list.push(n);
the performance is about equal to that of the vector. So that fixes my main issue, but it leaves me wondering why using Node *n gets this behaviour as opposed to Node n()?
If as you say, the sizes of these structures are fairly small (~50 elements), I have to think that the issue is somewhere else. At least, I don't think it involves the memory accesses or allocation of the vector/map.
Some example code I made to test: Map version:
unique_ptr<map<int, int>> make_successor_map(const vector<int> &ids,
const map<int, int> &input)
{
auto new_map = make_unique<map<int, int>>(input.begin(), input.end());
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_map)[ids[i]], (*new_map)[i]);
return new_map;
}
int main()
{
auto a_map = make_unique<map<int, int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_map->insert({i, rand()});
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_map = make_successor_map(ids, *a_map);
swap(temp_map, a_map);
}
cout << a_map->begin()->second << endl;
}
Vector version:
unique_ptr<vector<int>> make_successor_vec(const vector<int> &ids,
const vector<int> &input)
{
auto new_vec = make_unique<vector<int>>(input);
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_vec)[ids[i]], (*new_vec)[i]);
return new_vec;
}
int main()
{
auto a_vec = make_unique<vector<int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_vec->push_back(rand());
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_vec = make_successor_vec(ids, *a_vec);
swap(temp_vec, a_vec);
}
cout << *a_vec->begin() << endl;
}
The map version takes around 15 seconds to run on my old Core 2 Duo T9600, and the vector version takes 0.406 seconds. Both were compiled with G++ 4.9.2 using g++ -O3 --std=c++1y. So if your code takes 0.4s per iteration (note that it took my example code 0.4s for 1 million calls), then I'm really thinking your problem is somewhere else.
That's not to say you aren't having performance decreases due to switching from map->vector, but that the code you posted doesn't show much reason for that to happen.
The problem is that you create vectors without reserving space. Vectors store elements contiguously. That ensures constant-time access to elements.
So whenever you add an item to the vector (for example via your inserter) and its capacity is exhausted, the vector has to allocate more space and move all the existing elements to the newly allocated memory location. This causes slowdown and can cause considerable heap fragmentation.
The solution to this is to reserve() elements if you know in advance how many elements you'll have. If you don't know, reserve() larger chunks and compare size() and capacity() to check whether it's time to reserve more.
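To see the effect of reserve(), one can count how often capacity() changes while appending. This is a small sketch with a made-up helper, count_reallocations. The exact growth pattern without reserve() is implementation-defined, so only the reserved case has a guaranteed count of zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times capacity() changes while n elements are appended.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);  // one allocation up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity)  // storage was reallocated
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Note that copy-assigning a whole vector, as in child = assignment, already sizes the destination in one step, so reserve() mainly pays off when elements are appended one at a time.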