I want to split a vector into small vectors, process each of them separately on a thread, then merge them. I want to use std::async for creating threads and my code looks something like this
void func(std::vector<int>& vec)
{
//do some stuff
}
// Calling part
std::vector<std::future<void>> futures;
std::vector<std::vector<int>> temps;
for (int i = 1; i <= threadCount; ++i)
{
auto curBegin = m_vec.begin() + (i - 1) * size / threadCount;
auto curEnd = m_vec.begin() + i * size / threadCount;
std::vector<int> tmp(curBegin, curEnd);
temps.push_back(std::move(tmp));
futures.push_back(std::async(std::launch::async, &func, std::ref(temps.back())));
}
for (auto& f : futures)
{
f.wait();
}
std::vector<int> finalVector;
for (int i = 0; i < temps.size() - 1; ++i)
{
std::merge(temps[i].begin(), temps[i].end(), temps[i + 1].begin(), temps[i + 1].end(), std::back_inserter(finalVector));
}
Here m_vec is the main vector, which is being split into small vectors.
The problem is that when I pass a vector to func(), inside the function it becomes invalid: either its size is 0 or it contains invalid elements. But when I call the function without std::async, everything works fine.
So what's the problem with std::async and is there anything special that I should do?
Thank you for your time!
If a reallocation happens while you're iteratively expanding the temps vector, then the std::ref(temps.back()) that the thread operates on is very likely referencing an already invalidated memory area. You can avoid reallocations by reserving the memory prior to the consecutive push_backs:
temps.reserve(threadCount);
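For reference, here is a minimal sketch of the calling part with the reservation in place (m_vec, size, threadCount and func are the names from the question):
std::vector<std::future<void>> futures;
std::vector<std::vector<int>> temps;
temps.reserve(threadCount); // capacity fixed up front, so no reallocation in the loop below
for (int i = 1; i <= threadCount; ++i)
{
    auto curBegin = m_vec.begin() + (i - 1) * size / threadCount;
    auto curEnd = m_vec.begin() + i * size / threadCount;
    temps.emplace_back(curBegin, curEnd);
    // temps.back() now keeps a stable address, so the reference passed to the task stays valid
    futures.push_back(std::async(std::launch::async, &func, std::ref(temps.back())));
}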
Piotr S. has already given a correct answer with a solution; I will just add some explanation.
So what's the problem with std::async and is there anything special that I should do?
The problem is not with async.
You would get exactly the same effect if you did:
std::vector<std::function<void()>> futures;
// ...
for (int i = 1; i <= threadCount; ++i)
{
// ...
futures.push_back(std::bind(&func, std::ref(temps.back())));
}
for (auto f : futures)
f();
In this version I don't use async: I create several function objects, then run them all one by one. You will see the same problem with this code, namely that the function objects (or, in your case, the tasks being run by async) hold references to vector elements that get destroyed when inserting into temps causes it to reallocate.
To solve the problem you need to ensure the elements in temps are stable, i.e. do not get destroyed and recreated at a different location, as Piotr S shows in his answer.
Related
This is a continuation of my previous question: Nested vector<float> and reference manipulation.
I got the loops and all working, but I'm trying to add new instances of arrays to a total vector.
Here's one example of what I mean:
array<float, 3> monster1 = { 10.5, 8.5, 1.0 };
// ...
vector<array<float, 3>*> pinkys = { &monster1};
// ...
void duplicateGhosts() {
int count = 0;
int i = pinkys.size(); // this line and previous avoid overflow
array<float, 3>& temp = monster1; // this gets the same data, but right now it's just a reference
for (auto monster : pinkys) { // for each array of floats in the pinkys vector,
if (count >= i) // if in this instance of duplicateGhosts they've all been pushed back,
break;
pinkys.push_back(&temp); // this is where I want to push_back a new instance of an array
count++;
}
}
With the current code, instead of creating a new monster, it is adding a reference to the original monster1 and therefore affecting its behavior.
As mentioned in a comment, you cannot insert elements into a container that you are iterating over with a range-based for loop. That is because the range-based for loop stops when it reaches pinkys.end(), but that iterator gets invalidated once you call pinkys.push_back(). It is not clear why you are iterating over pinkys in the first place: you aren't using monster (a copy of an element of the vector) in the loop body.
The whole purpose of the loop seems to be to get as many iterations as there are already elements in the container. For that you need not iterate over the elements of pinkys; you can simply do:
auto old_size = pinkys.size();
for (size_t i=0; i < old_size; ++i) {
// add elements
}
Further, it is not clear why you are using a vector of pointers. Somebody has to own the monsters in the vector. If it isn't anybody else, it is the vector itself, and in that case you should use a std::vector<monster>. For shared ownership you should use std::shared_ptr. Never use owning raw pointers!
Don't use a plain array for something that you can give a better name:
struct monster {
float hitpoints; // or whatever it actually is.
float attack; // notice how this is much clearer
float defense; // than using an array?
};
With those modifications the method could look like this:
void duplicateGhosts() {
auto old_size = pinkys.size();
for (size_t i=0; i < old_size; ++i) {
pinkys.push_back( pinkys[i] );
}
}
From the name of the method I assumed you want to duplicate the vector's elements. If instead you just want to add that many default-constructed monsters, that would be:
void duplicateGhosts() {
auto old_size = pinkys.size();
for (size_t i=0; i < old_size; ++i) {
pinkys.push_back( monster{} );
}
}
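Putting it together, a minimal self-contained sketch of that approach (the field names and starting values are placeholders standing in for the original code; the point is that the vector owns the monsters directly):
#include <cstddef>
#include <vector>

struct monster {
    float hitpoints;
    float attack;
    float defense;
};

std::vector<monster> pinkys = { {10.5f, 8.5f, 1.0f} }; // the vector owns monster1

void duplicateGhosts() {
    std::size_t old_size = pinkys.size();
    for (std::size_t i = 0; i < old_size; ++i)
        pinkys.push_back(pinkys[i]); // push_back copies, so each new ghost is independent
}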
C++ newbie here. For an assignment I have to rotate all the elements in a vector to the left by one. So, for instance, the elements {1,2,3} should rotate to {2,3,1}.
I'm researching how to do it, and I saw the rotate() function, but I don't think that will work given my code. Then I saw a for loop that could do it, but I'm not sure how to translate that into a return statement (I tried to adjust it and failed).
This is what I have so far, but it is very wrong (I haven't gotten a single result that hasn't ended in an error yet).
Edit: The vector size I have to deal with is just three, so it doesn't need to account for vectors of any other size.
#include <vector>
using namespace std;
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result;
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
v.at(i) = v.at(i + 1);
result.at(i) = v.at(i);
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
All my teacher does is upload textbook pages that explain what certain parts of code are supposed to do, but the textbook pages offer NO help in trying to figure out how to actually apply this stuff.
So could someone please give me a few pointers?
Since you know exactly how many elements you have, and it's the smallest number that makes sense to rotate, you don't need to do anything fancy - just place the items in the order that you need, and return the result:
vector<int> rotate3(const vector<int>& x) {
return vector<int> { x[1], x[2], x[0] };
}
Note that if your collection always has three elements, you could use std::array instead:
std::array<int,3>
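A sketch of what that might look like (my illustration, not part of the original answer):
#include <array>

std::array<int, 3> rotate3(const std::array<int, 3>& x) {
    return { x[1], x[2], x[0] };
}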
First, pay attention that you have passed v as a const reference (const vector<int>&), so you are forbidden to modify the state of v in v.at(i) = v.at(i + 1);
Although Sergey has already given a straightforward solution, you could also correct your own code like this:
#include <vector>
using namespace std;
vector<int> left_rotate(const vector<int>& v)
{
vector<int> result;
int size = v.size(); // this way you are able to rotate vectors of any size
for (auto i = 1; i < size; ++i)
result.push_back(v.at(i));
// adding first element of v at end of result
result.push_back(v.front());
return result;
}
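And a quick usage sketch building on the snippet above (hypothetical driver code, just to show the call; it relies on the include and using-directive already present above):
#include <iostream>

int main() {
    vector<int> v{ 1, 2, 3 };
    for (int x : left_rotate(v))
        cout << x << ' '; // prints: 2 3 1
}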
Use Sergey's answer. This answer deals with why what the asker attempted did not work. They're damn close, so it's worth going through it, explaining the problems, and showing how to fix it.
In
v.at(i) = v.at(i + 1);
v is constant. You can't write to it. The naïve solution (which won't work) is to cut out the middle-man and write directly to the result vector because it is NOT const
result.at(i) = v.at(i + 1);
This doesn't work because
vector<int> result;
defines an empty vector. There is no at(i) to write to, so at throws an exception that terminates the program.
As an aside, the [] operator does not check bounds like at does and will not throw an exception. This can lead you to think the program worked when it was instead writing to memory the vector did not own. That would probably crash the program, but it doesn't have to1.
The quick fix here is to ensure usable storage with
vector<int> result(v.size());
The resulting code
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size()); // change here to size the vector
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
result.at(i) = v.at(i + 1); // change here to directly assign to result
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
almost works. But when we run it on {1, 2, 3} result holds {2, 3, 0} at the end. We lost the 1. That's because v.at(i + 1) never touches the first element of v. We could increase the number of for loop iterations and use the modulo operator
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size());
int size = 3;
for (auto i = 0; i < size; ++i) // change here to iterate size times
{
result.at(i) = v.at((i + 1) % size); // change here to make i + 1 wrap
}
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
and now the output is {2, 3, 1}. But it's just as easy, and probably a bit faster, to just do what we were doing and tack on the missing element after the loop.
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
vector<int> result(v.size());
int size = 3;
for (auto i = 0; i < size - 1; ++i)
{
result.at(i) = v.at(i + 1);
}
result.at(size - 1) = v.at(0); // change here to store first element
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
Taking this a step further, the size of three is an unnecessary limitation for this function that I would get rid of and since we're guaranteeing that we never go out of bounds in our for loop, we don't need the extra testing in at
vector<int> rotate(const vector<int>& v)
{
// PUT CODE BELOW THIS LINE. DON'T CHANGE ANYTHING ABOVE.
if (v.empty()) // nothing to rotate.
{
return vector<int>{}; // return empty result
}
vector<int> result(v.size());
for (size_t i = 0; i < v.size() - 1; ++i) // Explicitly using size_t because
// 0 is an int, and v.size() is an
// unsigned integer of implementation-
// defined size but cannot be larger
// than size_t
// note v.size() - 1 is only safe because
// we made sure v is not empty above
// otherwise 0 - 1 in unsigned math
// Becomes a very, very large positive
// number
{
result[i] = v[i + 1];
}
result.back() = v.front(); // using direct calls to front and back because it's
// a little easier on my mind than math and []
return result;
// PUT CODE ABOVE THIS LINE. DON'T CHANGE ANYTHING BELOW.
}
We can go further still and use iterators and range-based for loops, but I think this is enough for now. Besides, at the end of the day, you'd throw this function out completely and use std::rotate from the <algorithm> library.
1 This is called Undefined Behaviour (UB), and one of the most fearsome things about UB is that anything could happen, including giving you the expected result. We put up with UB because it makes for very fast, versatile programs: validity checks are not made where you don't need them (along with where you did), unless the compiler and library writers decide to make those checks and give guaranteed behaviour like an error message and a crash. Microsoft, for example, does exactly this in the vector implementation used when you make a debug build. The release version of Microsoft's vector makes no checks and assumes you wrote the code correctly, preferring that the executable be as fast as possible.
I saw the rotate() function, but I don't think that will work given my code.
Yes it will work.
When learning there is gain in "reinventing the wheel" (e.g. implementing rotate yourself) and there is also gain in learning how to use the existing pieces (e.g. use standard library algorithm functions).
Here is how you would use std::rotate from the standard library:
std::vector<int> rotate_1(const std::vector<int>& v)
{
std::vector<int> result = v;
std::rotate(result.begin(), result.begin() + 1, result.end());
return result;
}
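If you don't need to keep the original, std::rotate can also be applied in place (a side note of mine, assuming the vector is non-empty):
#include <algorithm>
#include <vector>

void rotate_in_place(std::vector<int>& v)
{
    // rotate left by one: {1, 2, 3} becomes {2, 3, 1}
    std::rotate(v.begin(), v.begin() + 1, v.end());
}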
I'm working with a huge amount of data stored in an array, and I am trying to optimize the amount of time it takes to access and modify it. I'm using Windows, C++ and VS2015 (Release mode).
I ran some tests and don't really understand the results I'm getting, so I would love some help optimizing my code.
First, let's say I have the following class:
class foo
{
public:
int x;
foo()
{
x = 0;
}
void inc()
{
x++;
}
int X()
{
return x;
}
void addX(int &_x)
{
_x++;
}
};
I start by initializing 10 million pointers to instances of that class into a std::vector of the same size.
#include <vector>
int count = 10000000;
std::vector<foo*> fooArr;
fooArr.resize(count);
for (int i = 0; i < count; i++)
{
fooArr[i] = new foo();
}
When I run the following code, and profile the amount of time it takes to complete, it takes approximately 350ms (which, for my purposes, is far too slow):
for (int i = 0; i < count; i++)
{
fooArr[i]->inc(); //increment all elements
}
To test how long it takes to increment an integer that many times, I tried:
int x = 0;
for (int i = 0; i < count; i++)
{
x++;
}
Which returns in <1ms.
I thought maybe the number of integers being changed was the problem, but the following code still takes 250ms, so I don't think it's that:
for (int i = 0; i < count; i++)
{
fooArr[0]->inc(); //only increment first element
}
I thought maybe the array index access itself was the bottleneck, but the following code takes <1ms to complete:
int x;
for (int i = 0; i < count; i++)
{
x = fooArr[i]->X(); //set x
}
I thought maybe the compiler was doing some hidden optimizations on the loop itself for the last example (since the value of x will be the same during each iteration of the loop, so maybe the compiler skips unnecessary iterations?). So I tried the following, and it takes 350ms to complete:
int x;
for (int i = 0; i < count; i++)
{
fooArr[i]->addX(x); //increment x inside foo function
}
So that one was slow again, but maybe only because I'm incrementing an integer with a pointer again.
I tried the following too, and it returns in 350ms as well:
for (int i = 0; i < count; i++)
{
fooArr[i]->x++;
}
So am I stuck here? Is ~350ms the absolute fastest that I can increment an integer, inside of 10million pointers in a vector? Or am I missing some obvious thing? I experimented with multithreading (giving each thread a different chunk of the array to increment) and that actually took longer once I started using enough threads. Maybe that was due to some other obvious thing I'm missing, so for now I'd like to stay away from multithreading to keep things simple.
I'm open to trying containers other than a vector too, if it speeds things up, but whatever container I end up using, I need to be able to easily resize it, remove elements, etc.
I'm fairly new to c++ so any help would be appreciated!
Let's look from the CPU point of view.
Incrementing an integer means I have it in a CPU register and just increment it. This is the fastest option.
I'm given an address (vector -> member) and I must copy it to a register, increment it, and copy the result back to that address. Worse: my CPU cache is filled with the pointers stored in the vector, not with the members they point to. Too few hits, too much cache "refueling".
If I could manage to have all those members just in a vector, CPU cache hits would be much more frequent.
Try the following:
int count = 10000000;
std::vector<foo> fooArr;
fooArr.resize(count, foo());
for (auto it= fooArr.begin(); it != fooArr.end(); ++it) {
it->inc();
}
The new is killing you, and you actually don't need it, because resize inserts elements at the end if the new size is greater (check the docs: std::vector::resize).
The other thing is the use of pointers, which IMHO should be avoided until the last moment and is unnecessary in this case. The performance should be a bit faster this way since you get better locality of reference (see cache locality). If the objects were polymorphic or something more complicated it might be different.
I am using 3 threads to chunk a for loop, and 'data' is a global array, so I want to lock that part in the 'calculateAll' function:
std::vector< int > calculateAll(int ***data,std::vector<LineIndex> indexList)
{
std::vector<int> v_a=std::vector<int>();
for(int a=0;a<indexList.size();a++)
{
mylock.lock();
v_b.push_back(/*something related with data*/);
mylock.unlock();
v_a.push_back(a);
}
return v_a;
}
for(int i=0;i<3;i++)
{
int s =firstone+i*chunk;
int e = ((s+chunk)<indexList.size())? (s+chunk) : indexList.size();
t[i]=std::thread(calculateAll,data,indexList,s,e);
}
for (int i = 0; i < 3; ++i)
{
t[i].join();
}
My question is: how can I get the return value, which is a vector, from each thread and then combine them together? The reason I want to do that is that if I declare 'v_a' as a global vector, then when each thread tries to push_back its values into 'v_a' there will be some crash (or not?). So I am thinking of declaring a vector for each thread, then combining them into a new vector for further use (like I do without threads).
Or is there some better method to deal with that concurrency problem? The order of 'v_a' does not matter.
I appreciate any suggestion.
First of all, rather than the explicit lock() and unlock() seen in your code, always use std::lock_guard where possible. Secondly, it would be better to use std::future for this: launch each thread with std::async, then get the results in another loop, aggregating them as you go. Like this:
using Vector = std::vector<int>;
using Future = std::future<Vector>;
std::vector<Future> futures;
for(int i=0;i<3;i++)
{
int s =firstone+i*chunk;
int e = ((s+chunk)<indexList.size())? (s+chunk) : indexList.size();
auto fut = std::async(std::launch::async, calculateAll, data, indexList, s, e);
futures.push_back( std::move(fut) );
}
//Combine the results
std::vector<int> result;
for(auto& fut : futures){ //Iterate through in the order the future was created
auto vec = fut.get(); //Get the result of the future
result.insert(result.end(), vec.begin(), vec.end()); //append it to results in order
}
Here is a minimal, complete and working example based on your code - which demonstrates what I mean: Live On Coliru
Create a structure with a vector and a lock. Pass an instance of that structure, with the lock pre-locked, to each thread as it starts. Wait for all locks to become unlocked. Each thread then does its work and unlocks its lock when done.
I've been working on state space exploration and was originally using a map to store the assignment of the world states like map<Variable *, int>, where variables are objects in the world with a domain from 0 to n where n is finite. The implementation was extremely quick for performance, but I noticed that it does not scale well with the size of the state space. I changed the states to use vector<int> instead, where I use the id of a variable to find its index in the vector. Memory usage improved greatly, but the efficiency of the solver has tanked (gone from <30 seconds to 400+). The only code that I modified was generating the states and validating if the state is the goal. I can't figure out why using a vector has degraded performance, especially since the vector operations should only take linear time at worst.
Originally this is was how I generated nodes:
State * SuccessorGen::generate_successor(const Operator &op, map<Variable *, int> &var_assignment){
map<Variable *, int> values;
values.insert(var_assignment.begin(), var_assignment.end());
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
values[eff_it->var] = eff_it->after;
}
return new State(values);
}
And in my new implementation:
State* SuccessorGen::generate_successor(const Operator &op, const vector<int> &assignment){
vector<int> child;
child = assignment;
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
Variable *v = eff_it->var;
int id = v->get_id();
child[id] = eff_it->after;
}
return new State(child);
}
(The goal checking is similar, just looping over the goal assignment instead of operator effects.)
Are these vector operations really that much slower than using a map? Is there an equally efficient STL container I can use that has a lower overhead? The number of variables is relatively small (<50) and the vector never needs to be resized or modified after the for loop.
Edit:
I tried timing one loop through all the operators to compare timings: with the effect list and assignment, the vector version runs one loop in 0.3 seconds, while the map version is a little over 0.4 seconds. When I comment that section out, the map version stays about the same, yet the vector version jumps up closer to 0.5 seconds. I added child.reserve(assignment.size()) but that did not make any change.
Edit 2:
From user63710's answer, I've also been digging through the rest of the code and noticed something really strange going on in the heuristic calculation. The vector version works fine, but for the map I use this line Node *n = new Node(i, transition.value, label_cost); open_list.push(n);, but once the loop finishes filling the queue the node gets totally screwed up. Nodes are a simple struct as:
struct Node{
// Source Value, Destination Value
int from;
int to;
int distance;
Node(int &f, int &t, int &d) : from(f), to(t), distance(d){}
};
Instead of holding from, to, and distance, the node ends up with from and to set to some random numbers, and that search does not do what it should and returns much faster than it should. When I tweak the map version to convert the map to a vector and run this:
Node n(i, transition.value, label_cost); open_list.push(n);
the performance is about equal to that of the vector version. So that fixes my main issue, but it leaves me wondering why using Node *n gets this behaviour as opposed to Node n()?
If, as you say, the sizes of these structures are fairly small (~50 elements), I have to think that the issue is somewhere else. At least, I don't think it involves the memory accesses or allocation of the vector/map.
Some example code I made to test: Map version:
unique_ptr<map<int, int>> make_successor_map(const vector<int> &ids,
const map<int, int> &input)
{
auto new_map = make_unique<map<int, int>>(input.begin(), input.end());
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_map)[ids[i]], (*new_map)[i]);
return new_map;
}
int main()
{
auto a_map = make_unique<map<int, int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_map->insert({i, rand()});
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_map = make_successor_map(ids, *a_map);
swap(temp_map, a_map);
}
cout << a_map->begin()->second << endl;
}
Vector version:
unique_ptr<vector<int>> make_successor_vec(const vector<int> &ids,
const vector<int> &input)
{
auto new_vec = make_unique<vector<int>>(input);
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_vec)[ids[i]], (*new_vec)[i]);
return new_vec;
}
int main()
{
auto a_vec = make_unique<vector<int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_vec->push_back(rand());
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_vec = make_successor_vec(ids, *a_vec);
swap(temp_vec, a_vec);
}
cout << *a_vec->begin() << endl;
}
The map version takes around 15 seconds to run on my old Core 2 Duo T9600, and the vector version takes 0.406 seconds. Both were compiled with G++ 4.9.2 using g++ -O3 --std=c++1y. So if your code takes 0.4s per iteration (note that it took my example code 0.4s for 1 million calls), then I really think your problem is somewhere else.
That's not to say you aren't having performance decreases due to switching from map to vector, but the code you posted doesn't show much reason for that to happen.
The problem is that you create vectors without reserving space. Vectors store elements contiguously; that is what ensures constant-time access to elements.
So every time you add an item to the vector (for example via your inserter), the vector may have to allocate more space and eventually move all the existing elements to the newly allocated memory location. This causes slowdowns and considerable heap fragmentation.
The solution is to reserve() elements if you know in advance how many elements you'll have. Or, if you don't, reserve() larger chunks and compare size() and capacity() to check when it's time to reserve more.
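For illustration, a minimal sketch of the reserve() idea (my example, assuming you know roughly how many elements will be appended):
#include <cstddef>
#include <vector>

std::vector<int> build(std::size_t expected_count)
{
    std::vector<int> result;
    result.reserve(expected_count); // single allocation up front
    for (std::size_t i = 0; i < expected_count; ++i)
        result.push_back(static_cast<int>(i)); // no reallocations, no element moves
    return result;
}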