Is Visual Studio recursive_directory_iterator.pop() broken? - c++

Given an example directory tree for testing:
Root
    A
        A1
        A2
    B
        B1
        B2
I wish to recursively enumerate the directories, but skip the processing of directory A completely.
According to the MSDN documentation, code like the following should do the job:
void TestRecursion1()
{
    path directory_path("Root");
    recursive_directory_iterator it(directory_path);
    while (it != recursive_directory_iterator())
    {
        if (it->path().filename() == "A")
        {
            it.pop();
        }
        else
        {
            ++it;
        }
    }
}
...it does not. MSDN for recursive_directory_iterator.pop() states
If depth() == 0 the object becomes an end-of-sequence iterator.
Otherwise, the member function terminates scanning of the current
(deepest) directory and resumes at the next lower depth.
What actually happens is that, due to a short-circuit test in pop(), nothing happens at all if depth == 0: the iterator is neither incremented nor turned into the end-of-sequence iterator, and the program enters an infinite loop.
The issue seems to be that, semantically, pop() is intended to move processing of the tree to the level above the current one, whereas in this example I wish to skip processing of A and continue processing at B. The first problem is that both these directories (A and B) exist at the same level in the tree. The second problem is that this level is also the top level of the tree, so there is no higher level at which to resume processing. All that said, it still seems like a bug that pop() fails to set the iterator to the end-of-sequence iterator, thus causing an infinite loop.
After this testing I reasoned that if I can't pop() A directly, I should at least be able to pop() from any child of A and achieve a similar result. I tested this with the following code:
template<class TContainer>
bool begins_with(const TContainer& input, const TContainer& match)
{
return input.size() >= match.size()
&& equal(match.begin(), match.end(), input.begin());
}
void TestRecursion2()
{
path base_path("C:\\_Home\\Development\\Workspaces\\Scratch \\TestDirectoryRecursion\\bin\\Debug\\Root");
recursive_directory_iterator it(base_path);
while (it != recursive_directory_iterator())
{
string relative_path = it->path().parent_path().string().substr(base_path.string().size());
cout << relative_path << "\n";
if (begins_with(relative_path, string("\\A")))
{
it.pop();
}
else
{
cout << it->path().filename() << " depth:" << it.depth() << "\n";
++it;
}
}
}
Here I test every item being processed to determine whether its parent is Root\A, and if it is, I call pop(). Even this doesn't work. The test correctly identifies whether a node in the tree is a child of A and calls pop() accordingly, but even at this deeper level pop() still fails to advance the iterator, again causing an infinite loop. What's more, even if this did work it would still be very undesirable: there is no guarantee of the order in which sub-nodes are enumerated, and the test only catches direct children of A, so a good deal of A's subtree could still end up being processed anyway.
I think my next course of action is to abandon recursive_directory_iterator and drive the recursion manually using a standard directory_iterator. It seems as if I should be able to achieve what I need more simply with recursive_directory_iterator, but I'm getting blocked at every turn. So my questions are:
Is the recursive_directory_iterator.pop() method broken?
If not how do I use it to skip the processing of a directory?

Isn't the code you want more like the following, using disable_recursion_pending()?
while (it != recursive_directory_iterator())
{
    if (it->path().filename() == "A")
    {
        it.disable_recursion_pending();
    }
    ++it;
}
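For reference, here is a minimal self-contained sketch of the same idea, assuming a C++17 compiler with std::filesystem rather than the older Visual Studio header the question was written against. The function name list_skipping and its skip_name parameter are hypothetical, chosen only for illustration. In this sketch the skipped directory itself is still visited by the loop, but it is not recorded and its contents are never entered.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collects the names of all entries below `root`, without descending into
// (or recording) any directory whose name equals `skip_name`.
std::vector<std::string> list_skipping(const fs::path& root,
                                       const std::string& skip_name)
{
    assert(fs::is_directory(root)); // precondition: root must exist

    std::vector<std::string> visited;
    for (fs::recursive_directory_iterator it(root), end; it != end; ++it)
    {
        if (it->is_directory() &&
            it->path().filename().string() == skip_name)
        {
            // Tell the iterator not to enter this directory on the next ++it.
            it.disable_recursion_pending();
            continue;
        }
        visited.push_back(it->path().filename().string());
    }

    // Enumeration order is unspecified, so sort for a stable result.
    std::sort(visited.begin(), visited.end());
    return visited;
}
```

Called on the example tree as list_skipping("Root", "A"), this should visit only B, B1 and B2. Note that the increment always runs, so the loop cannot stall the way the pop() version does.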

Related

Most efficient paradigm for checking if a key exists in a c++ std::unordered_map?

I am relatively new to modern c++ and working with a foreign code base. There is a function that takes a std::unordered_map and checks to see if a key is present in the map. The code is roughly as follows
uint32_t getId(std::unordered_map<uint32_t, uint32_t> &myMap, uint32_t id)
{
if(myMap.contains(id))
{
return myMap.at(id);
}
else
{
std::cerr << "\n\n\nOut of Range error for map: "<< id << "\t not found" << std::flush;
exit(74);
}
}
It seems like calling contains() followed by at() is inefficient since it requires a double lookup. So, my question is, what is the most efficient way to accomplish this? I also have a followup question: assuming the map is fairly large (~60k elements) and this method gets called frequently how problematic is the above approach?
After some searching, it seems like the following paradigms are more efficient than the above, but I am not sure which would be best.
Calling myMap.at() inside of a try-catch construct
Pros: at automatically throws an error if the key does not exist
Cons: try-catch is apparently fairly costly and also constrains what the optimizer can do with the code
Use find
Pros: One call, no try-catch overhead
Cons: Involves using an iterator; more overhead than just returning the value
auto findit = myMap.find(id);
if(findit == myMap.end())
{
//error message;
exit(74);
}
else
{
return findit->first;
}
You can do
// stuff before
{
auto findit = myMap.find(id);
if ( findit != myMap.end() ) {
return findit->first;
} else {
exit(74);
}
}
// stuff after
or with the new C++17 init statement syntax
// stuff before
if ( auto findit = myMap.find(id); findit != myMap.end() ) {
return findit->first;
} else {
exit(74);
}
// stuff after
Both define the iterator only in local scope. As the iterator use is almost certainly optimized away, I would go with it. Doing a second hash calculation will almost surely be slower.
Also note that findit->first returns the key, not the value. I was not sure what you expect the code to do, but one of the code snippets in the question returns the value, while the other one returns the key.
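If the mapped value is what is wanted, a single-lookup version might look like the sketch below. It is only an illustration: the name tryGetId and the choice to return std::optional instead of calling exit(74) are assumptions made here so that the caller decides how to handle a missing key.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

using IdMap = std::unordered_map<std::uint32_t, std::uint32_t>;

// One hash computation per call; returns the mapped value (it->second),
// or std::nullopt when the key is absent.
std::optional<std::uint32_t> tryGetId(const IdMap& myMap, std::uint32_t id)
{
    if (auto it = myMap.find(id); it != myMap.end())
        return it->second;
    return std::nullopt;
}

// Usage sketch: a present key yields its value, a missing key yields nullopt.
inline void tryGetIdUsage()
{
    const IdMap m{{1, 10}, {2, 20}};
    assert(tryGetId(m, 2).value_or(0) == 20);
    assert(!tryGetId(m, 3).has_value());
}
```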
If removing the extra lookup alone does not give enough speedup, and there are millions of calls to getId in a multi-threaded program, you can use an N-way map to parallelize the id checks:
template<int N>
class NwayMap
{
public:
NwayMap(uint32_t hintMaxSize = 60000)
{
// hint about max size to optimize initial allocations
for(int i=0;i<N;i++)
shard[i].reserve(hintMaxSize/N);
}
void addIdValuePairThreadSafe(const uint32_t id, const uint32_t val)
{
// select shard
const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N value
std::lock_guard<std::mutex> lg(mut[selected]);
auto it = shard[selected].find(id);
if(it==shard[selected].end())
{
shard[selected].emplace(id,val);
}
else
{
// already added, update?
}
}
uint32_t getIdMultiThreadSafe(const uint32_t id)
{
// select shard
const uint32_t selected = id%N; // can do id&(N-1) for power-of-2 N value
// lock only the selected shard, others can work in parallel
std::lock_guard<std::mutex> lg(mut[selected]);
auto it = shard[selected].find(id);
// we expect it to be found, so get it quicker
// without going "else"
if(it!=shard[selected].end())
{
return it->second;
}
else
{
exit(74);
}
}
private:
std::unordered_map<uint32_t, uint32_t> shard[N];
std::mutex mut[N];
};
Pros:
If you serve each shard's getId from its own CPU thread, you benefit from N times the L1 cache size.
Even in a single-threaded use case, you can still interleave multiple id-check operations and benefit from instruction-level parallelism, because checking id 0 would follow a different, independent code path than checking id 1, and the CPU could execute them out of order (if the pipeline is long enough).
Cons:
If a lot of checks from different threads collide, their operations are serialized and the locking mechanism causes extra latency.
When id values are mostly strided, the parallelization is not efficient because the shards are filled unevenly.
Calling myMap.at() inside of a try-catch construct
Pros: at automatically throws an error if the key does not exist
Cons: try-catch is apparently fairly costly and also constrains what the optimizer can do with the code
Your implementation of getId terminates the application, so who cares about exception overheads?
Please note that most compilers (AFAIK all) implement C++ exceptions so that they have zero cost when no exception is thrown. The cost appears when an exception is actually thrown: the stack has to be unwound and a matching exception handler found. I read somewhere that the penalty when an exception is thrown is about 40x compared to the case where the stack is unwound by simple returns (possibly with error codes).
Since you just want to terminate the application, this overhead is negligible.
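As an illustration of the at() option, here is a minimal sketch. The name getIdOr and its fallback parameter are hypothetical: the original getId calls exit(74) on a missing key, whereas this version returns a caller-supplied value so that both paths can be exercised.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <unordered_map>

using IdMap = std::unordered_map<std::uint32_t, std::uint32_t>;

// Single lookup through at(); the handler only runs when the key is missing.
std::uint32_t getIdOr(const IdMap& myMap, std::uint32_t id,
                      std::uint32_t fallback)
{
    try
    {
        return myMap.at(id); // throws std::out_of_range if id is absent
    }
    catch (const std::out_of_range&)
    {
        std::cerr << "Out of Range error for map: " << id << " not found\n";
        return fallback;
    }
}

// Usage sketch: the second call takes the exceptional path.
inline void getIdOrUsage()
{
    const IdMap m{{1, 10}};
    assert(getIdOr(m, 1, 0) == 10);
    assert(getIdOr(m, 2, 99) == 99);
}
```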

How to conditionally remove an element from a list using an iterator?

Problem:
I am writing a simple file manager application. In this program I have a "Directory" class:
class Directory
{
public:
Directory(string address, string directoryname)
{
this->path = address;
this->name = directoryname;
}
string GetFullPath(){ return path == "/" ? path + name : path + "/" + name; }
string path;
string name;
string user;
};
and a linked-list of directory objects:
list<Directory*> DirectoryList;
I want to implement the "rm -r directorypath" shell command from Linux, so I need to browse through the list and remove the "directorypath" directory and all of its sub-directories. The problem is that I don't know how to browse through the linked list and remove all directories whose parent directory is "directorypath". I have tried these two methods:
method 1:
This method encounters a runtime error, because it cannot access the list anymore after the first deletion.
for (auto address : DirectoryList)
if (address->GetFullPath() == directorypath)
{
for (auto subdirectory : DirectoryList)
if (subdirectory ->path == address->GetFullPath())
DirectoryList.remove(subdirectory );
}
method 2:
for (auto address : DirectoryList)
if (address->GetFullPath() == directorypath)
{
for (auto it = DirectoryList.begin(); it != DirectoryList.end();)
it = DirectoryList.erase(it);
return true;
}
This method can access all the elements even after deletion, but I don't know how to check this if condition using the iterator it:
if (subdirectory ->path == address->GetFullPath())
Your method 1 is failing because std::list::remove(val) removes all elements in your list that compare equal to val. You call it once and you're done. The for() loop shouldn't be there; that's not how it's intended to be used. A good example is here.
Note that this method modifies your container and its size. You need to be careful and make sure that your iterators are still valid after the removal. My gut feeling is that the iterators do get invalidated, and that's why you're getting errors.
Your method 2 looks almost fine. First of all, follow niceguy's advice to check the condition (the list holds pointers, so the element has to be dereferenced with ->):
if ((*it)->path == address->GetFullPath())
Now, bear in mind that erase() invalidates the iterator you pass in and returns an iterator to the element after the one you removed. Assigning that result to it already counts as one update to your iterator. If it is then incremented again by the loop, that's two updates per iteration, which means you're skipping some elements. You could try something like this instead:
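auto it = DirectoryList.begin();
while (it != DirectoryList.end())
{
    if ((*it)->path == address->GetFullPath())
        it = DirectoryList.erase(it); // erase() returns the next valid iterator
    else
        ++it; // advance only when nothing was erased
}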
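An alternative that avoids manual iterator handling altogether is std::list::remove_if. The sketch below uses a cut-down Directory struct and a hypothetical helper named removeChildren. Like the condition in the question, it only matches direct children of the given path, and it assumes the list does not own the pointed-to objects, so nothing is deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Cut-down version of the Directory class from the question.
struct Directory
{
    std::string path; // parent directory
    std::string name;
};

// Removes every entry whose parent path equals `parent` and returns how many
// entries were removed. The Directory objects themselves are not destroyed.
std::size_t removeChildren(std::list<Directory*>& dirs,
                           const std::string& parent)
{
    const std::size_t before = dirs.size();
    dirs.remove_if([&parent](const Directory* d) {
        assert(d != nullptr); // the list is expected to hold valid pointers
        return d->path == parent;
    });
    return before - dirs.size();
}
```

The element count is taken from size() before and after the call because the return type of remove_if differs between language versions.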

C++ pointer to class in a kind

This is a snippet of open source code. The full source code is available at https://github.com/gec/dnp3/blob/master/src/opendnp3/DNP3/ResponseContext.h
ObjectWriteIterator owi = arAPDU.WriteContiguous(apObject, start,stop);
for(size_t i = start; i <= stop; ++i) {
if(owi.IsEnd()) { // out of space in the fragment
this->mStaticWriteMap[arKey] =
boost::bind(&ResponseContext::WriteStaticObjects<T>, this, apObject,
arStart, arStop, arKey, _1);
return false;
}
apObject->Write(*owi, arStart->mValue);
++arStart; //increment the iterators
++owi;
}
ObjectWriteIterator::ObjectWriteIterator() :
mpPos(NULL),
mIndex(1),
mStart(0),
mStop(0),
mObjectSize(0)
{}
My question is: I don't understand what *owi refers to in this context.
owi is an iterator, which is a 'standard' C++ interface for iterating over some collection.
The interface gives iterators pointer semantics, so the * operator 'dereferences' the iterator and returns a reference to the value it currently 'points' to, and incrementing it via ++ moves it to the next item in the collection being iterated over.
In this case, it looks like a collection of ObjectWrite objects inside the collection specified by apObject between start and stop (start and stop are also typically defined by iterators set to some location in the collection).
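To make the pointer-like interface concrete, here is a small sketch of a hand-written iterator. SlotIterator and fillSlots are hypothetical stand-ins, not the real ObjectWriteIterator: the class walks a byte buffer in fixed-size steps, operator* hands out a pointer to the current slot, and operator++ moves to the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an iterator such as ObjectWriteIterator.
class SlotIterator
{
public:
    SlotIterator(std::uint8_t* start, std::size_t slotSize, std::size_t count)
        : pos_(start), slotSize_(slotSize), remaining_(count) {}

    bool IsEnd() const { return remaining_ == 0; }

    // Dereference: the position where the current object should be written.
    std::uint8_t* operator*() const
    {
        assert(!IsEnd());
        return pos_;
    }

    // Pre-increment: advance to the next slot.
    SlotIterator& operator++()
    {
        assert(!IsEnd());
        pos_ += slotSize_;
        --remaining_;
        return *this;
    }

private:
    std::uint8_t* pos_;
    std::size_t slotSize_;
    std::size_t remaining_;
};

// Writes `value` into the first byte of every slot and returns the number of
// slots written, in the same shape as the loop in the question.
inline std::size_t fillSlots(SlotIterator it, std::uint8_t value)
{
    std::size_t written = 0;
    for (; !it.IsEnd(); ++it)
    {
        **it = value; // *it yields the slot pointer, the second * the byte
        ++written;
    }
    return written;
}
```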
Sorry, earlier I was not aware that one can build a self-contained 'mock-up' iterator class. The dereference operator was hidden away; it is defined as
inline boost::uint8_t* ObjectWriteIterator::operator*() const
{
if(this->IsEnd()) throw InvalidStateException(LOCATION, "End of iteration");
return mpPos;
}
in the header file. Sorry for the wild goose chase. Thanks for the prompt reply; I learned something new about the core implementation of the iterator as well.

Is there a way to keep data from previous recursion in C++ (specific example)?

I'm working on an AVL Tree Project (almost finished after lots of hours of programming) and I wonder if it's possible to keep data from the calling recursion. This is the code:
node* previous;
//Visits the nodes by level recursively (post-order traversal), so that it can calculate the balance of each node (updates heights when deleting a node with two children)
void AVLTree::updateTreeHeights(node *ptr)
{
if(ptr==root)
previous=root;
if(ptr==NULL)
return;
updateTreeHeights(ptr->leftChild);
updateTreeHeights(ptr->rightChild);
if(ptr->leftChild==NULL && ptr->rightChild==NULL)
{
ptr->heightL=ptr->heightR=0;
}
else if(ptr->leftChild==NULL)
{
ptr->heightR=max(ptr->rightChild->heightL,ptr->rightChild->heightR)+1;
ptr->heightL=0;
}
else if(ptr->rightChild==NULL)
{
ptr->heightL=max(ptr->leftChild->heightL,ptr->leftChild->heightR)+1;
ptr->heightR=0;
}
else
{
ptr->heightL=max(ptr->leftChild->heightL,ptr->leftChild->heightR)+1;
ptr->heightR=max(ptr->rightChild->heightL,ptr->rightChild->heightR)+1;
}
ptr->balance=ptr->heightR-ptr->heightL;
if(ptr->balance>1)
balanceTree(ptr,previous,ptr->rightChild);
else if(ptr->balance<-1)
balanceTree(ptr,previous,ptr->leftChild);
}
Here's what I want! I want to keep the ptr value from the calling recursion and save it to the global variable named previous (it doesn't have to be global, but I figured that must be the only way). For example, if ptr points at number 20 and then we call the recursive function for ptr's leftChild (e.g. updateTreeHeights(ptr->leftChild);), I want to keep number 20 (previous=ptr;). Is it possible somehow? I'm not really good with recursion! (Don't tell! :P )
I don't see why not. You can make a global variable and then just copy ptr into it from updateTreeHeights. Just make sure the copy happens only once. Also, after previous=ptr, previous will be pointing to the entire node, so you might have to dig a little deeper into the node to get the data you want.
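Another option, swapped in here as an alternative to the global variable, is to pass the calling node down as an extra parameter. The sketch below is deliberately simplified and is not the AVL code from the question: Node, its parent and height fields, and updateHeights are hypothetical names, and a single height per node replaces heightL/heightR.

```cpp
#include <algorithm>
#include <cassert>

// Simplified node: one height field instead of heightL/heightR.
struct Node
{
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr; // filled in during the traversal
    int height = 0;
};

// Post-order traversal. `previous` is the node of the calling recursion
// (nullptr for the root). Returns the height of the subtree rooted at ptr,
// counting an empty subtree as 0 and a leaf as 1.
int updateHeights(Node* ptr, Node* previous)
{
    if (ptr == nullptr)
        return 0;

    // The caller must really be the parent of ptr.
    assert(previous == nullptr ||
           previous->left == ptr || previous->right == ptr);

    const int hl = updateHeights(ptr->left, ptr);  // ptr becomes "previous"
    const int hr = updateHeights(ptr->right, ptr);

    ptr->parent = previous; // the value from the calling recursion is kept
    ptr->height = std::max(hl, hr) + 1;
    return ptr->height;
}
```

Because previous lives in each call's own stack frame, it is still intact after the recursive calls on the children return, which a single global cannot guarantee.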

c++:return statement behaving weirdly

Here is an outline of the code containing the relevant part of my code.
Inside the empprint function I call a bfsprint function, which calls itself recursively until it's done printing everything that needs to be printed, after which it's supposed to return me to the empprint function. But the return statement in bfsprint doesn't take me back to empprint.
One possible reason I can think of is that bfsprint calls itself recursively, so it will only return to the last bfsprint call that called it instead of to the empprint function, but knowing that doesn't seem to solve my problem. I'm stuck with code whose execution doesn't terminate.
void node::empprint(node* myroot)
{
//do something
bfsprint(c);
cout<<"pt 5"; //this cout is not reached
return;
}
void node::bfsprint(Linklist<node*> noddy)
{
// lot of code to implement breadth-first search. No issue
if(c.getHead()==NULL) cout<<"1"; //this does print 1 to output
if(c.getHead()==NULL) return; //I think this should send me back to empprint
// and print "pt 5" on output but program hangs.
// instead of this happening
bfsprint(c);
}
If anybody thinks this might be influenced by other code in the method, I will add it, but I don't think that's the case.
If your call stack looks like:
node::empprint
node::bfsprint
node::bfsprint
then returning from the final call will result in
node::empprint
node::bfsprint
So you're still N calls deep away from getting back to node::empprint.
You could set a bool in the class to return back out, but that's a bit hacky.
void node::bfsprint(Linklist<node*> noddy)
{
if ( something ) { m_unwindstack = true; }
// setting the bool to force returning early/stop recursion once m_unwindstack is true to get back to empprint
if ( m_unwindstack ) { return; }
}
Edit: By the way, if you're doing anything with Linklist you'll never see the changes, since you're passing a copy of the data. You should pass a reference, Linklist&.
Also, Linklist seems to be your own class? If you don't use a reference, be sure it's copyable, otherwise bad things will happen.
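Both points can be seen in a small sketch. The drain function below is a hypothetical stand-in for bfsprint that uses std::queue instead of Linklist. The work list is passed by reference, so each call sees the pops made by earlier calls, and each return goes back only one level until the original caller resumes.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Processes one item per call and recurses until the queue is empty.
// Returns the number of calls that processed an item.
int drain(std::queue<int>& pending, std::vector<int>& out)
{
    if (pending.empty())
        return 0; // base case: returns to the call one level up

    const std::size_t before = pending.size();
    out.push_back(pending.front());
    pending.pop();

    // Every call must shrink the shared queue; that is what guarantees
    // that the recursion terminates.
    assert(pending.size() == before - 1);

    // Each level returns in turn until the first caller gets control back.
    return 1 + drain(pending, out);
}
```

After drain returns to its original caller, the queue is empty because all calls worked on the same object rather than on copies.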