I have a program that needs to detect cycles (and nodes that are members of that cycle) in directed graphs. To do this, I use LLVM's strongly-connected components algorithm. It's pretty easy to use and it does pretty much what it should:
vector<vector<PreAstBasicBlock*>> stronglyConnectedComponents;
for (auto iter = scc_begin(&function); iter != scc_end(&function); ++iter)
{
if (iter.hasLoop())
{
stronglyConnectedComponents.push_back(*iter);
}
}
This correctly identifies simple SCCs.
This is great, but I'd also love to know when there are strongly-connected components nested within a larger strongly-connected component. For instance, a four-node graph A, B, C, D that forms one big cycle through a D→A back-edge, and that also contains a B⇄C cycle, is identified as a single SCC.
That's absolutely correct, since every node in that graph is reachable starting from any other node. However, B⇄C has the additional property that it is independent of the D→A back-edge. It is an SCC by itself and it has a single entry node and a single exiting node: I could replace it with one single node and I wouldn't have edges poking in or out of its conceptual middle.
How can I find these smaller strongly-connected components in the strongly-connected components?
So I was trying to craft a nicer response covering more things you can do with additional helper functions, but I was working on it offline and my computer crashed :(.
I've recreated the core of what I was trying to say (it only addresses your immediate problem/example) in pseudo-code, using a nonexistent GetConnectedComponents(...) helper function whose idealized behavior you can hopefully infer from context:
bool HasConnectedSubgraph(Graph& entire_graph) {
    for (const auto& connected_subgraph : GetConnectedComponents(entire_graph)) {
        for (const auto& connected_node : connected_subgraph) {
            // Remove one node from a copy of the component and see whether the
            // remainder still contains a (smaller) connected component.
            auto local_copy = connected_subgraph;
            local_copy.erase(std::remove_if(local_copy.begin(), local_copy.end(),
                                 [&](const Node& n) { return n == connected_node; }),
                             local_copy.end());
            if (!GetConnectedComponents(local_copy).empty()) {
                return true;
            }
        }
    }
    return false;
}
This certainly isn't efficient or pretty, but should be enough to springboard your thoughts on the problem.
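To make that concrete, below is a self-contained sketch of what a GetConnectedComponents(...) helper could look like, using Tarjan's SCC algorithm over a toy adjacency-list graph. The Node/Graph aliases and the example graph in main() are my own assumptions for illustration; this is not the asker's PreAstBasicBlock graph or LLVM's scc_iterator.
#include <algorithm>
#include <functional>
#include <map>
#include <vector>

using Node = int;
using Graph = std::map<Node, std::vector<Node>>;   // node -> successors

// Returns the non-trivial SCCs (size > 1) of g. Edges pointing at nodes that
// are not keys of g are ignored, so erasing a key effectively removes a node.
std::vector<std::vector<Node>> GetConnectedComponents(const Graph& g) {
    std::vector<std::vector<Node>> result;
    std::map<Node, int> index, lowlink;
    std::map<Node, bool> onStack;
    std::vector<Node> stack;
    int nextIndex = 0;

    std::function<void(Node)> visit = [&](Node v) {
        index[v] = lowlink[v] = nextIndex++;
        stack.push_back(v);
        onStack[v] = true;
        for (Node w : g.at(v)) {
            if (!g.count(w))
                continue;                           // edge into a removed node
            if (!index.count(w)) {
                visit(w);
                lowlink[v] = std::min(lowlink[v], lowlink[w]);
            } else if (onStack[w]) {
                lowlink[v] = std::min(lowlink[v], index[w]);
            }
        }
        if (lowlink[v] == index[v]) {               // v is the root of an SCC
            std::vector<Node> scc;
            Node w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                scc.push_back(w);
            } while (w != v);
            if (scc.size() > 1)
                result.push_back(scc);
        }
    };

    for (const auto& entry : g)
        if (!index.count(entry.first))
            visit(entry.first);
    return result;
}

int main() {
    // Roughly the question's example (A=0, B=1, C=2, D=3): one big cycle
    // through the D->A back-edge, plus a B<->C cycle inside it.
    Graph g = {{0, {1}}, {1, {2}}, {2, {1, 3}}, {3, {0}}};
    auto whole = GetConnectedComponents(g);          // one SCC: {A, B, C, D}

    Graph withoutD = g;
    withoutD.erase(3);                               // drop D and its edges
    auto nested = GetConnectedComponents(withoutD);  // one SCC: {B, C}
    (void)whole;
    (void)nested;
}
Erasing a node from the map removes it (and, because unknown edge targets are skipped, its incident edges) from the subgraph, which is the same operation the pseudo-code above performs with remove_if.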
I managed to figure out the answer to this question as I was writing it. I'm keeping it posted to hopefully aid future devs who run into a similar issue.
Here's a quick rundown:
I have a directed, weighted graph class.
I keep track of all my nodes in set<Node> nodes;
I keep track of each node's adjacent nodes in map<Node, map<Node, int>> connections;
When I traverse the graph and reach a node that has no adjacent nodes, the program crashes because the map throws an out_of_range exception.
After looking online, I saw that someone solved this by adding the line (void) connections[node]; when adding nodes. If I remove this line, I get the out_of_range exception from the map's .at() function; with it, the exception is somehow avoided.
My Question: What is the line of code doing that avoids the exception from being thrown?
My best guess right now is that the line of code is somehow creating an empty adjacency list, so my for-each loop never triggers the exception.
set<Node> nodes; // {n1, n2...nn}
map<Node, map<Node, int>> connections; //Connections between the nodes and their weights
//Add node to graph
void add(Node node) {
nodes.insert(node); //add to node list
(void) connections[node]; //This is the magic line!!
}
bool DFS(Node start, Node target) {
for (Node node : nodes) {
//This for-each loop crashes when the node passed to .at() doesn't exist in the connections map
for (const auto& connectedNode : connections.at(node)) {
if (target == connectedNode.first) {
return true;
}
}
}
return false;
}
As I was writing the question I was able to answer it myself. Love a good Rubber Ducky moment. Hopefully this question can aid future devs who also miss the very basic answer.
In std::map, the [] operator returns a reference to the mapped value for the given key if it exists; if the key doesn't exist, it inserts a default-constructed value for it.
So in the add(Node node) function the (void)connections[node] was actually creating the node in the adjacency map.
The (void) in front of the expression tells the compiler not to warn about the result of the expression being discarded unused. Read more about the meaning of (void) here.
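For illustration, here is a minimal standalone sketch (not the asker's graph class) showing the difference between operator[] and at() on a map of adjacency lists:
#include <iostream>
#include <map>
#include <stdexcept>

int main() {
    std::map<int, std::map<int, int>> connections;

    (void) connections[1];                  // key 1 now maps to an empty inner map
    for (const auto& edge : connections.at(1))
        (void)edge;                         // fine: iterates over zero elements

    try {
        connections.at(2);                  // key 2 was never created
    } catch (const std::out_of_range&) {
        std::cout << "at(2) throws: no adjacency list was ever created\n";
    }
}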
CppCheck suggests replacing part of my code with an STL algorithm. I'm not against it, but I don't know how to do the replacement. I'm pretty sure this is a bad suggestion (there is a warning about experimental functionality in CppCheck).
Here is the code :
/* Cut beginning of the function ... */
for ( const auto & program : m_programs )
{
if ( program->compare(vertexShader, tesselationControlShader, tesselationEvaluationShader, geometryShader, fragmentShader) )
{
TraceInfo(Classname, "A program has been found matching every shaders.");
return program;
}
}
return nullptr;
} /* End of the function */
And near the if condition I got : "Consider using std::find_if algorithm instead of a raw loop."
I tried to use it, but I can't get the return working anymore... Should I ignore this suggestion?
I suppose you may need that search more than once, so, following DRY, you should extract the std::find_if invocation into a distinct wrapper function (a sketch of such a wrapper follows the snippet below).
{
// ... function beginning
auto found = std::find_if(m_programs.cbegin(), m_programs.cend(),
[&](const auto& prog)
{
bool b = prog->compare(...);
if (b)
TraceInfo(...);
return b;
});
if (found == m_programs.cend())
return nullptr;
return *found;
}
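Here is one way such a wrapper might look. The Shader and Program stub types, the ProgramPtr alias and the findProgram name are made up so the sketch compiles on its own; the compare(...) arguments and the TraceInfo call mirror the question's loop.
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Stub types standing in for the asker's real ones.
struct Shader {};
struct Program {
    bool compare(const Shader&, const Shader&, const Shader&,
                 const Shader&, const Shader&) const { return true; }
};
using ProgramPtr = std::shared_ptr<Program>;

void TraceInfo(const std::string&, const std::string&) { /* logging stub */ }
static const std::string Classname = "ProgramManager";

// Single place that owns the search, reusable wherever it is needed.
ProgramPtr findProgram(const std::vector<ProgramPtr>& programs,
                       const Shader& vertexShader,
                       const Shader& tesselationControlShader,
                       const Shader& tesselationEvaluationShader,
                       const Shader& geometryShader,
                       const Shader& fragmentShader)
{
    const auto found = std::find_if(programs.cbegin(), programs.cend(),
        [&](const ProgramPtr& program) {
            return program->compare(vertexShader, tesselationControlShader,
                                    tesselationEvaluationShader,
                                    geometryShader, fragmentShader);
        });
    if (found == programs.cend())
        return nullptr;
    TraceInfo(Classname, "A program has been found matching every shaders.");
    return *found;
}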
The suggestion is good: an STL algorithm might be able to choose an appropriate approach based on your container type.
Furthermore, I suggest using a self-balancing sorted container such as std::set.
// I don't know what kind of a pointer you use; shared_ptr is an example.
using pProgType = std::shared_ptr<ProgType>;

struct CompareProgs {
    bool operator()(const pProgType& a, const pProgType& b) const {
        return *a < *b;   // requires ProgType to provide a comparison operator
    }
};

std::set<pProgType, CompareProgs> progs;
This is a sorted container, so you will spend less time searching for a program by value, provided you implement a comparison operator for the program type (it is what the set's comparator invokes).
If you can use an STL facility, use it: that way you won't have to remember what you invented yourself, and the STL is properly documented and safe to use.
I am iterating over a map of updates and need to add elements to another map when a condition holds; here the condition is that the element is not found, but it could be any other condition.
My main problem is that with a large number of updates to apply, the application consumes all of the CPU and memory.
State Class:
class State {
int id;
int timeStamp;
int state;
};
Method in State:
void State::updateStateIfTimeStampIsHigher(const State& state) {
if (this->id == state.getId() && state.getTimeStamp() > this->getTimeStamp()) {
this->timeStamp = state.getTimeStamp();
this->state = state.getState();
}
}
Loop Code:
std::map<int, State> data;
const std::map<int, State>& update;
for (auto const& updatePos : update) {
if (updatePos.first != this->toNodeId) {
std::map<int, State>::iterator message = data.find(updatePos.first);
if (message != data.end() && message->first) {
message->second.updateStateIfTimeStampIsHigher(updatePos.second);
} else {
data.insert(std::make_pair(updatePos.first, updatePos.second));
}
}
}
Looking at the KCacheGrind data, the data.insert() line appears to take most of the time/memory. I am new to KCacheGrind, but this line seemed to account for around 72% of the cost.
Do you have any suggestions on how to improve this?
Your question is quite general, but I see two things that could make it run faster:
Use hinted insertion/emplacement. When you add a new element, its iterator is returned. Assuming both maps are ordered in the same fashion, you can tell where the last element was inserted, so the next lookup should be faster (some benchmarking would help here).
Use emplace_hint for faster insertion
Sample code here:
std::map<int, long> data;
const std::map<int, long> update;   // stands in for the incoming updates
auto recent = data.begin();
for (auto const& updatePos : update) {
    if (updateElemNotFound) {       // placeholder for "key not already in data"
        recent = data.emplace_hint(recent, updatePos);
    }
}
Also, if you are willing to trade memory for CPU time you could use std::unordered_map (see: Is there any advantage of using map over unordered_map in case of trivial keys?), but then the first point would no longer apply, since there is no ordering to exploit.
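If you do try unordered_map, a minimal sketch of the question's loop might look like the following. It assumes the State class from the question; the applyUpdate name and passing toNodeId as a parameter are my own framing. Note that insert() already reports whether the key was present, which removes the separate find():
#include <map>
#include <unordered_map>

void applyUpdate(std::unordered_map<int, State>& data,
                 const std::map<int, State>& update,
                 int toNodeId)
{
    data.reserve(data.size() + update.size());   // avoid rehashing mid-loop
    for (const auto& updatePos : update) {
        if (updatePos.first == toNodeId)
            continue;
        // insert() is a no-op when the key already exists and tells us so,
        // replacing the separate find() + insert() pair.
        auto result = data.insert(updatePos);
        if (!result.second)
            result.first->second.updateStateIfTimeStampIsHigher(updatePos.second);
    }
}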
I found a satisfying answer thanks to the research prompted by the comments on the question. Changing from map to unordered_map helped a little, but I still got unsatisfying results.
I ended up using Google's sparsehash, which provides better resource usage despite some drawbacks when erasing entries (which I do).
The code solution is as follows. First I include the required library:
#include <sparsehash/sparse_hash_map>
Then, my new data definition looks like:
struct eqint {
bool operator()(int i1, int i2) const {
return i1 == i2;
}
};
google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint> data;
Since I have to use erase, I have to do this after the sparse_hash_map construction:
data.clear_deleted_key();
data.set_deleted_key(-1);
Finally my loop code changes very little:
for (auto const& updatePos : update) {
if (updatePos.first != this->toNodeId) {
google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint>::iterator msgIt = data.find(updatePos.first);
if (msgIt != data.end() && msgIt->first) {
msgIt->second.updateStateIfTimeStampIsHigher(updatePos.second);
} else {
data[updatePos.first] = updatePos.second;
}
}
}
The time before making the changes for a whole application run under specific parameters was:
real 0m28,592s
user 0m27,912s
sys 0m0,676s
And the time after making the changes for the whole application run under the same specific parameters is:
real 0m37,464s
user 0m37,032s
sys 0m0,428s
I ran it with other cases and the results were similar from a qualitative point of view: the system time and resource usage (CPU and memory) decrease while the user time increases.
Overall I am satisfied with the tradeoff since I was more concerned about resource usage than execution time (the application is a simulator and it was not able to finish and get results under really heavy load and now it does).
Given an example directory tree for testing:
Root
    A
        A1
        A2
    B
        B1
        B2
I wish to recursively enumerate the directories, but skip the processing of directory A completely.
According to the MSDN documentation, code something like the following should do the job:
void TestRecursion1()
{
path directory_path("Root");
recursive_directory_iterator it(directory_path);
while (it != recursive_directory_iterator())
{
if (it->path().filename() == "A")
{
it.pop();
}
else
{
++it;
}
}
}
...it does not. The MSDN documentation for recursive_directory_iterator::pop() states:
If depth() == 0 the object becomes an end-of-sequence iterator. Otherwise, the member function terminates scanning of the current (deepest) directory and resumes at the next lower depth.
What actually happens is that, due to a short-circuit test in pop() when depth() == 0, nothing happens at all: the iterator is neither incremented nor turned into the end-of-sequence iterator, and the program enters an infinite loop.
The issue seems to be that, semantically, pop() is intended to shunt processing to the next level up from the current one, whereas in this example I wish to skip processing of A and continue processing at B. The first problem is that both of these directories (A and B) exist at the same level of the tree; the second is that this level is also the top level, so there is no higher level at which to resume processing. All that said, it still seems like a bug that pop() fails to set the iterator to the end-of-sequence iterator, thus causing an infinite loop.
After this testing I reasoned that if I can't pop() A directly, I should at least be able to pop() from any child of A and achieve a similar result. I tested this with the following code:
template<class TContainer>
bool begins_with(const TContainer& input, const TContainer& match)
{
return input.size() >= match.size()
&& equal(match.begin(), match.end(), input.begin());
}
void TestRecursion2()
{
path base_path("C:\\_Home\\Development\\Workspaces\\Scratch\\TestDirectoryRecursion\\bin\\Debug\\Root");
recursive_directory_iterator it(base_path);
while (it != recursive_directory_iterator())
{
string relative_path = it->path().parent_path().string().substr(base_path.string().size());
cout << relative_path << "\n";
if (begins_with(relative_path, string("\\A")))
{
it.pop();
}
else
{
cout << it->path().filename() << " depth:" << it.depth() << "\n";
++it;
}
}
}
Here I test every item being processed to determine whether its parent is Root\A, and if so I call pop(). Even this doesn't work: the test correctly identifies whether a node in the tree is a child of A and calls pop() accordingly, but even at this deeper level pop() still fails to increment the iterator, again causing an infinite loop. What's more, even if this did work it would still be undesirable: there is no guarantee of the order in which sub-nodes are enumerated, and since the nodes matching the test might be indirect children of A, you could still end up processing a good amount of A anyway.
I think my next course of action is to abandon recursive_directory_iterator and drive the recursion manually using a plain directory_iterator, but it seems as if I should be able to achieve what I need more simply with recursive_directory_iterator, yet I'm getting blocked at every turn. So my questions are:
Is the recursive_directory_iterator.pop() method broken?
If not how do I use it to skip the processing of a directory?
Isn't the code you want more like the following, using disable_recursion_pending()?
while (it != recursive_directory_iterator())
{
if (it->path().filename() == "A")
{
it.disable_recursion_pending();
}
++it;
}
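For reference, here is a self-contained sketch of the whole traversal with the skip, assuming C++17 std::filesystem (the iterator the question uses exposes the same disable_recursion_pending() member); the enumerateSkippingA name is mine:
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Enumerates everything under root except the contents of any directory
// named "A"; the "A" entry itself is still visited, just not descended into.
void enumerateSkippingA(const fs::path& root)
{
    for (auto it = fs::recursive_directory_iterator(root);
         it != fs::recursive_directory_iterator(); ++it)
    {
        if (it->is_directory() && it->path().filename() == "A")
            it.disable_recursion_pending();   // do not recurse into A on ++it
        else
            std::cout << it->path().string() << " depth:" << it.depth() << '\n';
    }
}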
I am processing large files consisting of many redundant values (using YAML's anchors and references). The processing I do on each structure is expensive, and I would like to detect whether I'm looking at a reference to an anchor I've already processed. In Python (with python-yaml), I did this by simply building a dictionary keyed by id(node). Since yaml-cpp uses Node as a reference type, however, this does not seem to work here. Any suggestions?
This is similar to Retrieve anchor & alias string in yaml-cpp from document, but although that feature would be sufficient to solve my problem, it is not necessary; if I could somehow get a hash based on the internal address of the node, for example, that would be fine.
The expensive thing I'm doing is computing a hash of each node including itself and its children.
Here is a patch that seems to do what I need. Proceed with caution.
diff -nr include/yaml-cpp/node/detail/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/detail/node.h
a13 1
#include <boost/functional/hash.hpp>
a24 1
std::size_t identity_hash() const { return boost::hash<node_ref*>()(m_pRef.get()); }
diff -nr /include/yaml-cpp/node/impl.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/impl.h
a175 5
inline std::size_t Node::identity_hash() const
{
return m_pNode->identity_hash();
}
diff -nr include/yaml-cpp/node/node.h new/yaml-cpp-0.5.1/include/yaml-cpp/node/node.h
a55 2
std::size_t identity_hash() const;
I can then use the following to make an unordered_map that uses YAML::Node as the key.
namespace std {
template <>
struct hash<YAML::Node> {
size_t operator()(const YAML::Node& ss) const {
return ss.identity_hash();
}
};
}
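With that specialization in place, a small cache keyed on node identity could look like the sketch below. computeStructuralHash is a hypothetical stand-in for the expensive per-node work, and the default key equality relies on YAML::Node's operator== behaving as an identity check (see the note on operator == / Node::is below):
#include <unordered_map>
#include "yaml-cpp/yaml.h"

// Hypothetical stand-in for the expensive per-node computation.
std::size_t computeStructuralHash(const YAML::Node& node);

std::unordered_map<YAML::Node, std::size_t> hashCache;

std::size_t hashOnce(const YAML::Node& node)
{
    auto it = hashCache.find(node);
    if (it != hashCache.end())
        return it->second;                   // an alias of an already-processed node
    const std::size_t h = computeStructuralHash(node);
    hashCache.emplace(node, h);
    return h;
}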
You can check node identity by operator == or Node::is, e.g.:
Node a = ...;
process(a);
Node b = ...;
if (!a.is(b)) {
process(b);
}
I suppose this isn't perfect - if you're trying to do this on a large list of nodes, the checking will have to be O(n).
If you want more than this, please file an issue on the project page.
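For what it's worth, the O(n) identity check described above might be sketched like this; process() stands in for the question's expensive per-node work:
#include <vector>
#include "yaml-cpp/yaml.h"

// Returns true if node refers to the same underlying node as one already seen.
bool alreadySeen(const YAML::Node& node, const std::vector<YAML::Node>& seen)
{
    for (const auto& other : seen)
        if (node.is(other))                  // identity, not structural equality
            return true;
    return false;
}

void processOnce(const YAML::Node& node, std::vector<YAML::Node>& seen)
{
    if (alreadySeen(node, seen))
        return;
    seen.push_back(node);                    // Node copies share the underlying data
    // process(node);                        // the expensive work from the question
}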