Let Action be a class with an is_finished method and a numeric tag property.
Let this->vactions be a std::vector<Action>.
The intent is to iterate the vector, identify the Actions that are finished,
store their tags in a std::vector<unsigned int>, and delete those actions.
I played around with lambdas and came up with a little piece of
code that read nicely but caused memory corruption. The "extended" version,
on the other hand, works as expected.
I suspect foul play in the remove_if part, but for the life of me I can't figure
out what's wrong.
Here's the example code.
This causes memory corruption:
std::vector<unsigned int> tags;
auto is_finished = [](Action &action) -> bool { return action.is_finished(); };
//This is supposed to put the finished actions at the end of the vector and return
//an iterator to the first element that is finished.
std::vector<Action>::iterator nend=remove_if(this->vactions.begin(), this->vactions.end(), is_finished);
auto store_tag = [&tags](Action &action)
{
    if(action.has_tag())
    {
        tags.push_back(action.get_tag());
    }
};
//Store the tags...
for_each(nend, this->vactions.end(), store_tag);
//Erase the finished ones, they're supposed to be at the end.
this->vactions.erase(nend, this->vactions.end());
if(tags.size())
{
auto do_something=[this](unsigned int tag){this->do_something_with_tag(tag);};
for_each(tags.begin(), tags.end(), do_something);
}
This, on the other hand, works as expected:
std::vector<Action>::iterator ini=this->vactions.begin(),
end=this->vactions.end();
std::vector<unsigned int> tags;
while(ini < end)
{
if( (*ini).is_finished())
{
if((*ini).has_tag())
{
tags.push_back((*ini).get_tag());
}
ini=this->vactions.erase(ini);
end=this->vactions.end();
}
else
{
++ini;
}
}
if(tags.size())
{
auto do_something=[this](unsigned int tag){this->do_something_with_tag(tag);};
for_each(tags.begin(), tags.end(), do_something);
}
I am sure there's some rookie mistake here. Can you help me spot it?
I thought the for_each might be updating my nend iterator, but I found no
information about that. What if it did? Could the vector try to erase beyond the "end" point?
std::remove_if does not preserve the values of the elements that are to be removed (see cppreference). Either collect the tag values before calling remove_if, as you do in the second case, or use std::partition instead.
I am relatively new to modern C++ and working with a foreign code base. There is a function that takes a std::unordered_map and checks whether a key is present in the map. The code is roughly as follows:
uint32_t getId(std::unordered_map<uint32_t, uint32_t> &myMap, uint32_t id)
{
if(myMap.contains(id))
{
return myMap.at(id);
}
else
{
std::cerr << "\n\n\nOut of Range error for map: "<< id << "\t not found" << std::flush;
exit(74);
}
}
It seems like calling contains() followed by at() is inefficient since it requires a double lookup. So my question is: what is the most efficient way to accomplish this? I also have a follow-up question: assuming the map is fairly large (~60k elements) and this method is called frequently, how problematic is the above approach?
After some searching, it seems like the following paradigms are more efficient than the above, but I am not sure which would be best.
Calling myMap.at() inside of a try-catch construct
Pros: at automatically throws an error if the key does not exist
Cons: try-catch is apparently fairly costly and also constrains what the optimizer can do with the code
Use find
Pros: One call, no try-catch overhead
Cons: Involves using an iterator; more overhead than just returning the value
auto findit = myMap.find(id);
if(findit == myMap.end())
{
//error message;
exit(74);
}
else
{
return findit->first;
}
You can do
// stuff before
{
auto findit = myMap.find(id);
if ( findit != myMap.end() ) {
return findit->first;
} else {
exit(74);
}
}
// stuff after
or with the new C++17 init statement syntax
// stuff before
if ( auto findit = myMap.find(id); findit != myMap.end() ) {
return findit->first;
} else {
exit(74);
}
// stuff after
Both confine the iterator to a local scope. As the iterator use is almost certainly optimized away, I would go with that. Doing a second hash calculation will almost surely be slower.
Also note that findit->first returns the key, not the value. I was not sure what you expect the code to do, but one of the code snippets in the question returns the value, while the other one returns the key.
In case you don't get enough speedup from removing the extra lookup alone, and if there are millions of calls to getId in a multi-threaded program, you can use an N-way map to parallelize the id checks:
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

template<int N>
class NwayMap
{
public:
    NwayMap(uint32_t hintMaxSize = 60000)
    {
        // hint about max size to optimize initial allocations
        for(int i=0;i<N;i++)
            shard[i].reserve(hintMaxSize/N);
    }

    void addIdValuePairThreadSafe(const uint32_t id, const uint32_t val)
    {
        // select shard
        const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N
        std::lock_guard<std::mutex> lg(mut[selected]);
        auto it = shard[selected].find(id);
        if(it==shard[selected].end())
        {
            shard[selected].emplace(id,val);
        }
        else
        {
            // already added, update?
        }
    }

    uint32_t getIdMultiThreadSafe(const uint32_t id)
    {
        // select shard
        const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N
        // lock only the selected shard, others can work in parallel
        std::lock_guard<std::mutex> lg(mut[selected]);
        auto it = shard[selected].find(id);
        // we expect it to be found, so take the likely branch first
        if(it!=shard[selected].end())
        {
            return it->second;
        }
        else
        {
            exit(74);
        }
    }

private:
    std::unordered_map<uint32_t, uint32_t> shard[N];
    std::mutex mut[N];
};
Pros:
if you serve each shard's getId from its own CPU thread, you benefit from N times the L1 cache size.
even in a single-threaded use case, you can still interleave multiple id-check operations and benefit from instruction-level parallelism, because checking id 0 takes a different, independent code path from checking id 1, and the CPU can execute them out of order (if the pipeline is long enough)
Cons:
if many checks from different threads collide, their operations are serialized and the locking mechanism adds extra latency
when id values are mostly strided, the parallelization is not efficient due to unbalanced placement across shards
Calling myMap.at() inside of a try-catch construct
Pros: at automatically throws an error if the key does not exist
Cons: try-catch is apparently fairly costly and also constrains what the optimizer can do with the code
Your implementation of getId terminates the application, so who cares about exception overhead?
Please note that most compilers (AFAIK all) implement C++ exceptions to have zero cost when no exception is thrown. The cost appears when the stack is unwound and a matching handler is looked up after a throw. I read somewhere that the penalty for a thrown exception is around 40x compared to unwinding the stack with plain returns (with possible error codes).
Since you want to just terminate the application, this overhead is negligible.
I've discovered a strange behaviour and I don't know why it happens. Look at this code:
std::map<size_t, std::vector<Resource*>> bla;
for(int i = 0; i<100000; i++) {
std::vector<Resource*> blup;
blup.push_back(new Resource());
bla[i] = blup;
}
for (auto& resources : bla) {
for (auto resource : resources.second) {
delete resource; // <---- this delete here
}
resources.second.clear();
}
bla.clear(); // <---- this clear here
If I run this program under the Eclipse debugger, the clear() on the last line takes several seconds (too long, in my opinion). For bigger maps (>10M elements) it needs up to several minutes(!).
But if I comment out the delete statement in the inner loop, the clear() call becomes very fast (as fast as I expect it to be).
Without the debugger the code is fast in both cases (with and without delete).
The class Resource is a small container class containing two uint64_t's and a std::string (and of course some methods).
Why is clear() slow? And why does it speed up if I don't delete the pointers? I don't understand it, because I even clear the vectors in the map, so the map should not see any pointers; it is a map between size_t and std::vector.
I am using MinGW-w64 for Windows (msys64, msys2-x86_64-20210419), and Eclipse.
Now I have switched to a map-only solution without vectors, and all is fine: clear() is super fast, even if I delete the pointers. But without vectors I have to rethink my algorithm a little.
CppCheck suggests replacing one of my loops with an STL algorithm. I'm not against that, but I don't know how to do the replacement. I'm fairly sure this is a bad suggestion (there is a warning about experimental functionality in CppCheck).
Here is the code:
/* Beginning of the function cut ... */
for ( const auto & program : m_programs )
{
if ( program->compare(vertexShader, tesselationControlShader, tesselationEvaluationShader, geometryShader, fragmentShader) )
{
TraceInfo(Classname, "A program has been found matching every shaders.");
return program;
}
}
return nullptr;
} /* End of the function */
And near the if condition I got: "Consider using std::find_if algorithm instead of a raw loop."
I tried to use it, but I can't get the return working anymore... Should I ignore this suggestion?
I suppose you may need that finding function more than once. So, according to DRY, you should extract the std::find_if invocation into a distinct wrapper function.
{
// ... function beginning
auto found = std::find_if(m_programs.cbegin(), m_programs.cend(),
[&](const auto& prog)
{
bool b = prog->compare(...);
if (b)
TraceInfo(...);
return b;
});
if (found == m_programs.cend())
return nullptr;
return *found;
}
The suggestion is good. An STL algorithm might be able to choose an appropriate approach based on your container type.
Furthermore, I suggest using a self-balancing container like std::set.
// I don't know what kind of pointer you use.
using pProgType = std::shared_ptr<ProgType>;

struct CompareProgs {
    bool operator()(const pProgType &a, const pProgType &b) const {
        return std::less<ProgType>{}(*a, *b);
    }
};

std::set<pProgType, CompareProgs> progs;
This is a sorted container, so you will spend less time searching for a program by value, provided you implement a comparison operator (which is invoked by std::less).
If you can use an STL function, use it. That way you will not have to remember what you invented yourself, because the STL is properly documented and safe to use.
I am iterating over a map and need to add elements to that map when a condition holds (here, that an element is not found, but it could be any other condition).
My main problem is that with a large volume of updates to be added, the application takes up the whole CPU and all the memory.
State Class:
class State {
int id;
int timeStamp;
int state;
}
Method in State:
void State::updateStateIfTimeStampIsHigher(const State& state) {
if (this->id == state.getId() && state.getTimeStamp() > this->getTimeStamp()) {
this->timeStamp = state.getTimeStamp();
this->state = state.getState();
}
}
Loop Code:
std::map<int, State> data;
const std::map<int, State>& update;  // reference bound elsewhere; shown here as a sketch
for (auto const& updatePos : update) {
if (updatePos.first != this->toNodeId) {
std::map<int, State>::iterator message = data.find(updatePos.first);
if (message != data.end() && message->first) {
message->second.updateStateIfTimeStampIsHigher(updatePos.second);
} else {
data.insert(std::make_pair(updatePos.first, updatePos.second));
}
}
}
Looking at KCacheGrind data, the data.insert() line takes the most time/memory. I am new to KCacheGrind, but this line seemed to account for around 72% of the cost.
Do you have any suggestions on how to improve this?
Your question is quite general, but I see two things that could make it run faster:
Use hinted insertion/emplacement. When you add a new element, its iterator is returned. Assuming both maps are ordered the same way, you can tell where the last one was inserted, so the lookup should be faster (some benchmarking would help here).
Use emplace_hint for faster insertion.
Sample code here:
std::map<int, long> data;
std::map<int, long> update;  // assume populated elsewhere
auto recent = data.begin();
for (auto const& updatePos : update) {
    if (data.find(updatePos.first) == data.end()) {  // element not present yet
        recent = data.emplace_hint(recent, updatePos);
    }
}
Also, if you want to trade memory for CPU time, you could use unordered_map (see: Is there any advantage of using map over unordered_map in case of trivial keys?), but then the hinted-insertion point would no longer apply.
I found a satisfactory answer thanks to research prompted by the comments to the question. Changing from map to unordered_map helped a little, but I still got unsatisfying results.
I ended up using Google's sparsehash, which provides better resource usage despite some drawbacks when erasing entries (which I do).
The code solution is as follows. First I include the required library:
#include <sparsehash/sparse_hash_map>
Then, my new data definition looks like:
struct eqint {
bool operator()(int i1, int i2) const {
return i1 == i2;
}
};
google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint> data;
Since I have to use "erase", I have to do this after constructing the sparse_hash_map:
data.clear_deleted_key();
data.set_deleted_key(-1);
Finally my loop code changes very little:
for (auto const& updatePos : update) {
if (updatePos.first != this->toNodeId) {
google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint>::iterator msgIt = data.find(updatePos.first);
if (msgIt != data.end() && msgIt->first) {
msgIt->second.updateStateIfTimeStampIsHigher(updatePos.second);
} else {
data[updatePos.first] = updatePos.second;
}
}
}
The time before making the changes for a whole application run under specific parameters was:
real 0m28,592s
user 0m27,912s
sys 0m0,676s
And the time after making the changes for the whole application run under the same specific parameters is:
real 0m37,464s
user 0m37,032s
sys 0m0,428s
I ran it with other cases and the results were similar (from a qualitative point of view). The system time and resource usage (CPU and memory) decrease while the user time increases.
Overall I am satisfied with the tradeoff, since I was more concerned about resource usage than execution time (the application is a simulator; it was not able to finish and produce results under really heavy load, and now it does).
I am writing an event system for a game engine, and I need a way to 'disconnect' functions from an event (erase them from a std::vector), but I also require that, to disconnect, the developer must give a valid reference to the function they wish to remove.
At the moment I have a class like this:
template<typename ... Args>
class event{
public:
using delegate_type = std::function<void(Args...)>;
void operator()(Args ... args){
for(auto &&f : m_funcs)
f(args...);
}
template<typename FunctorType>
delegate_type &connect(FunctorType &&f){
m_funcs.emplace_back(f);
return m_funcs.back();
}
bool disconnect(delegate_type &f){
for(auto iter = begin(m_funcs); iter != end(m_funcs); ++iter)
if(&(*iter) == &f){ // if the dev passed a valid function
m_funcs.erase(iter);
return true;
}
return false;
}
private:
std::vector<delegate_type> m_funcs;
};
But this, of course, suffers from reference invalidation whenever the underlying vector reallocates after a connect or disconnect operation.
I tried switching to a std::list instead of std::vector, but the difference in iteration speed is detrimental enough that I cannot make that switch in a release build.
Is there some way I can avoid the invalidation using a helper class or identifier instead of a straight reference?
I suggest taking a look at Boost.Signals2. When connecting, it returns a boost::signals2::connection object. You can keep it client-side and use it for disconnecting, or use a scoped_connection so the slot is disconnected automatically when it is destroyed. It does not suffer from the reference invalidation problem.
Store weak_ptr<delegate_type> in the event. Remove expired ones on each invocation with remove_if in operator(). As an aside, guard against recursive invocation somehow.
Return shared_ptr<delegate_type> as your token. The external user simply .reset()s it to unregister. No need for disconnect!
Allocate with make_shared<delegate_type> within connect.
Make connect a template that perfect-forwards into the make_shared.
Use a node-based container, like the std::list you were already trying.
But you'll need a smarter use of the container so you can eliminate the overhead of looping over it.
The best solution is, of course, to remove the need for the loop entirely.
The most efficient way is to return an iterator to the container's element instead of a reference to the function itself.
A removal like this avoids the search, because you already have the iterator location.
In node-based containers such as std::list, iterators to the other elements are not invalidated by insertions or erasures (but beware, this is not thread-safe, of course).
template<typename FunctorType>
typename std::list<delegate_type>::iterator connect(FunctorType &&f){
    m_funcs.emplace_back(std::forward<FunctorType>(f));
    // return the iterator to the newly added last item
    return std::prev(m_funcs.end());
}

bool disconnect(typename std::list<delegate_type>::iterator it){
    // direct removal, no search needed
    m_funcs.erase(it);
    return true;
}