A* Performance at large maps - c++

I would like some help with my A* search, which in my view takes far too long. My map is 500 * 400 coordinates large (the tile graph is actually a bit smaller, since I didn't put the walls into the TileGraph), but I would still expect a result within a few seconds. The world looks like this (the task itself is not mine):
I want to search from the marked coordinates "Start" (120|180) to "Ziel" (320|220), which currently takes 48 minutes. Apologies to everyone who doesn't speak German; the text in the picture isn't important.
First I want to show you what I've programmed for A*. In general I followed the pseudocode at https://en.wikipedia.org/wiki/A*_search_algorithm .
bool AStarPath::Processing(Node* Start, Node* End)
m_Start = Start;
m_End = End;
for (Node* n : m_SearchRoom->GetAllNodes())
{
DistanceToStart[n] = std::numeric_limits<float>::infinity();
CameFrom[n] = nullptr;
}
DistanceToStart[m_Start] = 0;
NotEvaluatedNodes.AddElement(0, m_Start);
while (NotEvaluatedNodes.IsEmpty() == false)
{
Node* currentNode = NotEvaluatedNodes.GetElement();
NotEvaluatedNodes.DeleteElement();
if (currentNode == m_End)
{
ReconstructPath();
return true;
}
EvaluatedNodes.insert(currentNode);
ExamineNeighbours(currentNode);
}
return false;
//End Processing
void AStarPath::ExamineNeighbours(Node* current)
for (Node* neighbour : m_SearchRoom->GetNeighbours(current))
{
if (std::find(EvaluatedNodes.begin(), EvaluatedNodes.end(), neighbour) != EvaluatedNodes.end())
{
continue;
}
bool InOpenSet = NotEvaluatedNodes.ContainsElement(neighbour);
float tentative_g_score = DistanceToStart[current] + DistanceBetween(current, neighbour);
if (InOpenSet == true && tentative_g_score >= DistanceToStart[neighbour])
{
continue;
}
CameFrom[neighbour] = current;
DistanceToStart[neighbour] = tentative_g_score;
float Valuation = tentative_g_score + DistanceBetween(neighbour, m_End);
if (InOpenSet == false)
{
NotEvaluatedNodes.AddElement(Valuation, neighbour);
}
else
{
NotEvaluatedNodes.UpdatePriority(neighbour, Valuation);
}
}
//END ExamineNeighbours
double AStarPath::DistanceBetween(Node* a, Node* b)
return sqrt(pow(m_SearchRoom->GetNodeX(a) - m_SearchRoom->GetNodeX(b), 2)
+ pow(m_SearchRoom->GetNodeY(a) - m_SearchRoom->GetNodeY(b), 2));
//END DistanceBetween
I'm sorry for the bad formatting, but I don't really know how to work with the code blocks here.
class AStarPath
private:
std::unordered_set<Node*> EvaluatedNodes;
Binary_Heap NotEvaluatedNodes;
std::unordered_map<Node*, float> DistanceToStart;
std::unordered_map<Node*, Node*> CameFrom;
std::vector<Node*> m_path;
TileGraph* m_SearchRoom;
//END Class AStarPath
Anyway, I have already thought about my problem and changed a few things.
Firstly, I implemented a binary heap instead of using std::priority_queue. I used a page at policyalmanac for it, but I'm not permitted to add another link, so I can't give you the address. It improved the performance, but the search still takes very long, as I said at the beginning.
Secondly, I used unordered containers wherever there were two options, so that the containers don't have to be re-sorted after changes. For EvaluatedNodes I took std::unordered_set, since as far as I know it is the fastest for std::find, which I use for containment checks.
I use std::unordered_map because I need separate keys and values.
Thirdly, I thought about splitting my map into nodes that each represent multiple coordinates (instead of one node per coordinate, as now), but I'm not sure how to choose them. My idea was to place base points at positions that the algorithm chooses based on the length and width of the map, and to add neighbouring coordinates to a base point if they are less than a specific distance away from it and can be reached only through previously added coordinates. To check whether a big node can be walked through, I would use the regular A*, restricted to the coordinates (converted to A* nodes) inside that big node. However, I'm unsure which coordinates I should take as the start and end of this inner pathfinding. This would probably reduce the number of nodes that are checked, since the fine-grained search would only use coordinates that belong to the big nodes found at the upper level.
I'm sorry for my English, but I hope everything is understandable. I'm looking forward to your answers, to learning new techniques and ways to handle problems, and to hearing about all the hundreds of stupid mistakes I made.
If any important aspect is unclear, or if I should add more code or information, feel free to ask.
EDIT: Binary_Heap
class Binary_Heap
private:
std::vector<int> Index;
std::vector<int> m_Valuation;
std::vector<Node*> elements;
int NodesChecked;
int m_NumberOfHeapItems;
void TryToMoveElementUp(int i_pos);
void TryToMoveElementDown(int i_pos);
public:
Binary_Heap(int i_numberOfElements);
void AddElement(int Valuation, Node* element);
void DeleteElement();
Node* GetElement();
bool IsEmpty();
bool ContainsElement(Node* i_node);
void UpdatePriority(Node* i_node, float newValuation);
Binary_Heap::Binary_Heap(int i_numberOfElements)
Index.resize(i_numberOfElements);
elements.resize(i_numberOfElements);
m_Valuation.resize(i_numberOfElements);
NodesChecked = 0;
m_NumberOfHeapItems = 0;
void Binary_Heap::AddElement(int valuation, Node* element)
++NodesChecked;
++m_NumberOfHeapItems;
Index[m_NumberOfHeapItems] = NodesChecked;
m_Valuation[NodesChecked] = valuation;
elements[NodesChecked] = element;
TryToMoveElementUp(m_NumberOfHeapItems);
void Binary_Heap::DeleteElement()
elements[Index[1]] = nullptr;
m_Valuation[Index[1]] = 0;
Index[1] = Index[m_NumberOfHeapItems];
--m_NumberOfHeapItems;
TryToMoveElementDown(1);
bool Binary_Heap::IsEmpty()
return m_NumberOfHeapItems == 0;
Node* Binary_Heap::GetElement()
return elements[Index[1]];
bool Binary_Heap::ContainsElement(Node* i_element)
return std::find(elements.begin(), elements.end(), i_element) != elements.end();
void Binary_Heap::UpdatePriority(Node* i_node, float newValuation)
if (ContainsElement(i_node) == false)
{
AddElement(newValuation, i_node);
}
else
{
int treePosition;
for (int i = 1; i < Index.size(); i++)
{
if (elements[Index[i]] == i_node)
{
treePosition = i;
break;
}
}
//Won't influence each other, since only one of them will change the position
TryToMoveElementUp(treePosition);
TryToMoveElementDown(treePosition);
}
void Binary_Heap::TryToMoveElementDown(int i_pos)
int nextPosition = i_pos;
while (true)
{
int currentPosition = nextPosition;
if (2 * currentPosition + 1 <= m_NumberOfHeapItems)
{
if (m_Valuation[Index[currentPosition]] >= m_Valuation[Index[2 * currentPosition]])
{
nextPosition = 2 * currentPosition;
}
if (m_Valuation[Index[currentPosition]] >= m_Valuation[Index[2 * currentPosition + 1]])
{
nextPosition = 2 * currentPosition + 1;
}
}
else
{
if (2 * currentPosition <= m_NumberOfHeapItems)
{
if (m_Valuation[Index[currentPosition]] >= m_Valuation[Index[2 * currentPosition]])
{
nextPosition = 2 * currentPosition;
}
}
}
if (currentPosition != nextPosition)
{
int tmp = Index[currentPosition];
Index[currentPosition] = Index[nextPosition];
Index[nextPosition] = tmp;
}
else
{
break;
}
}
void Binary_Heap::TryToMoveElementUp(int i_pos)
int treePosition = i_pos;
while (treePosition != 1)
{
if (m_Valuation[Index[treePosition]] <= m_Valuation[Index[treePosition / 2]])
{
int tmp = Index[treePosition / 2];
Index[treePosition / 2] = Index[treePosition];
Index[treePosition] = tmp;
treePosition = treePosition / 2;
}
else
{
break;
}
}

This line introduces a major inefficiency, because it has to iterate over all the nodes in the queue for every neighbour that is examined:
bool InOpenSet = NotEvaluatedNodes.ContainsElement(neighbour);
Try using a more efficient data structure, e.g. the unordered_set you use for EvaluatedNodes. Whenever you push a node onto the heap or pop one from it, modify the set accordingly, so that it always contains exactly the nodes in the heap.
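A minimal sketch of that idea, under some assumptions: OpenSet, Push, PopLowest and Contains are hypothetical names, std::priority_queue stands in for the question's Binary_Heap, and Node is a stand-in struct. Each node is assumed to be pushed at most once; priority updates are not covered here.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

struct Node { int x; int y; }; // stand-in for the question's Node type

// Hypothetical wrapper: a min-heap plus a hash set that mirrors its contents,
// so membership tests are O(1) on average instead of a linear scan.
class OpenSet {
    using Entry = std::pair<float, Node*>;
    struct ByPriority {
        // priority_queue keeps the "largest" on top, so invert for a min-heap
        bool operator()(const Entry& a, const Entry& b) const {
            return a.first > b.first;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, ByPriority> heap_;
    std::unordered_set<Node*> members_;

public:
    void Push(float priority, Node* node) {
        heap_.emplace(priority, node);
        members_.insert(node); // keep the set in sync with the heap
    }

    Node* PopLowest() {
        assert(!heap_.empty());
        Node* top = heap_.top().second;
        heap_.pop();
        members_.erase(top); // keep the set in sync with the heap
        return top;
    }

    bool Contains(Node* node) const { return members_.count(node) != 0; }
    bool IsEmpty() const { return heap_.empty(); }
};
```

The same consideration applies to the closed set in the question: std::find(EvaluatedNodes.begin(), EvaluatedNodes.end(), neighbour) walks the unordered_set element by element, whereas the member functions EvaluatedNodes.count(neighbour) or EvaluatedNodes.find(neighbour) use the hash lookup.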

Related

Correctly managing pointers in C++ Quadtree implementation

I'm working on a C++ quadtree implementation for collision detection. I tried to adapt this Java implementation to C++ by using pointers; namely, storing the child nodes of each node as Node pointers (code at the end). However, since my understanding of pointers is still rather lacking, I am struggling to understand why my Quadtree class produces the following two issues:
1. When splitting a Node in 4, the debugger tells me that all my childNodes entries are identical to the first one, i.e., same address and bounds.
2. Even if 1. is ignored, I get an Access violation reading location 0xFFFFFFFFFFFFFFFF, which I found out is a consequence of the childNode pointees being deleted after the first split, resulting in undefined behaviour.
My question is: what improvements should I make to my Quadtree.hpp so that each Node can contain 4 distinct child node pointers and have those references last until the quadtree is cleared?
What I have tried so far:
1. Modifying getChildNode according to this guide and using temporary variables in split() to avoid all 4 entries of childNodes to point to the same Node:
void split() {
for (int i = 0; i < 4; i++) {
Node temp = getChildNode(level, bounds, i + 1);
childNodes[i] = &(temp);
}
}
but this does not solve the problem.
2. This one is particularly confusing. My initial idea was to just store childNodes as Nodes themselves, but turns out that cannot be done while we're defining the Node class itself. Hence, it looks like the only way to store Nodes is by first creating them and then storing pointers to them as I tried to do in split(), yet it seems that those will not "last" until we've inserted all the objects since the pointees get deleted (run out of scope) and we get the aforementioned undefined behaviour. I also thought of using smart pointers, but that seems to only overcomplicate things.
The code:
Quadtree.hpp
#pragma once
#include <vector>
#include <algorithm>
#include "Box.hpp"
namespace quadtree {
class Node {
public:
Node(int p_level, quadtree::Box<float> p_bounds)
:level(p_level), bounds(p_bounds)
{
parentWorld = NULL;
}
// NOTE: mandatory upon Quadtree initialization
void setParentWorld(World* p_world_ptr) {
parentWorld = p_world_ptr;
}
/*
Clears the quadtree
*/
void clear() {
objects.clear();
for (int i = 0; i < 4; i++) {
if (childNodes[i] != nullptr) {
(*(childNodes[i])).clear();
childNodes[i] = nullptr;
}
}
}
/*
Splits the node into 4 subnodes
*/
void split() {
for (int i = 0; i < 4; i++) {
childNodes[i] = &getChildNode(level, bounds, i + 1);;
}
}
/*
Determine which node the object belongs to. -1 means
object cannot completely fit within a child node and is part
of the parent node
*/
int getIndex(Entity* p_ptr_entity) {
quadtree::Box<float> nodeBounds;
quadtree::Box<float> entityHitbox;
for (int i = 0; i < 4; i++) {
nodeBounds = childNodes[i]->bounds;
ComponentHandle<Hitbox> hitbox;
parentWorld->unpack(*p_ptr_entity, hitbox);
entityHitbox = hitbox->box;
if (nodeBounds.contains(entityHitbox)) {
return i;
}
}
return -1; // if no childNode completely contains Entity Hitbox
}
/*
Insert the object into the quadtree. If the node
exceeds the capacity, it will split and add all
objects to their corresponding nodes.
*/
void insertObject(Entity* p_ptr_entity) {
if (childNodes[0] != nullptr) {
int index = getIndex(p_ptr_entity);
if (index != -1) {
(*childNodes[index]).insertObject(p_ptr_entity); // insert in child node
return;
}
}
objects.push_back(p_ptr_entity); // add to parent node
if (objects.size() > MAX_OBJECTS && level < MAX_DEPTH) {
if (childNodes[0] == nullptr) {
split();
}
int i = 0;
while (i < objects.size()) {
int index = getIndex(objects[i]);
if (index != -1)
{
Entity* temp_entity = objects[i];
{
// remove i-th element of the vector
using std::swap;
swap(objects[i], objects.back());
objects.pop_back();
}
(*childNodes[index]).insertObject(temp_entity);
}
else
{
i++;
}
}
}
}
/*
Return all objects that could collide with the given object
*/
std::vector<Entity*> retrieve(Entity* p_ptr_entity, std::vector<Entity*> returnObjects) {
int index = getIndex(p_ptr_entity);
if (index != -1 && childNodes[0] == nullptr) {
(*childNodes[index]).retrieve(p_ptr_entity, returnObjects);
}
returnObjects.insert(returnObjects.end(), objects.begin(), objects.end());
return returnObjects;
}
World* getParentWorld() {
return parentWorld;
}
private:
int MAX_OBJECTS = 10;
int MAX_DEPTH = 5;
World* parentWorld; // used to unpack entities
int level; // depth of the node
quadtree::Box<float> bounds; // boundary of nodes in the game's map
std::vector<Entity*> objects; // list of objects contained in the node: pointers to Entitites in the game
Node* childNodes[4];
quadtree::Box<float> getQuadrantBounds(quadtree::Box<float> p_parentBounds, int p_quadrant_id) {
quadtree::Box<float> quadrantBounds;
quadrantBounds.width = p_parentBounds.width / 2;
quadrantBounds.height = p_parentBounds.height / 2;
switch (p_quadrant_id) {
case 1: // NE
quadrantBounds.top = p_parentBounds.top;
quadrantBounds.left = p_parentBounds.width / 2;
break;
case 2: // NW
quadrantBounds.top = p_parentBounds.top;
quadrantBounds.left = p_parentBounds.left;
break;
case 3: // SW
quadrantBounds.top = p_parentBounds.height / 2;
quadrantBounds.left = p_parentBounds.left;
break;
case 4: // SE
quadrantBounds.top = p_parentBounds.height / 2;
quadrantBounds.left = p_parentBounds.width / 2;
break;
}
return quadrantBounds;
}
Node& getChildNode(int parentLevel, Box<float> parentBounds, int quadrant) {
static Node temp = Node(parentLevel + 1, getQuadrantBounds(parentBounds, quadrant));
return temp;
}
};
}
Where Box is just a helper class that contains some helper methods for rectangular shapes and collision detection. Any help would be greatly appreciated!
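For illustration only, here is one possible direction for the lifetime problem described above, sketched under assumptions rather than taken from the original code. In the posted code, getChildNode returns a reference to a single static Node, so every call yields the same address, and the raw Node* childNodes[4] array is never initialised in the constructor. The sketch lets each node own its children through std::unique_ptr, so they live until clear() or until the parent is destroyed. Box here is a stand-in struct for quadtree::Box<float>, and computing the quadrant offsets relative to the parent's left/top is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <memory>

struct Box { float left, top, width, height; }; // stand-in for quadtree::Box<float>

class Node {
public:
    Node(int level, Box bounds) : level_(level), bounds_(bounds) {}

    // Creates four distinct, heap-allocated children owned by this node.
    void split() {
        const float w = bounds_.width / 2;
        const float h = bounds_.height / 2;
        const Box quads[4] = {
            {bounds_.left + w, bounds_.top,     w, h}, // NE
            {bounds_.left,     bounds_.top,     w, h}, // NW
            {bounds_.left,     bounds_.top + h, w, h}, // SW
            {bounds_.left + w, bounds_.top + h, w, h}, // SE
        };
        for (int i = 0; i < 4; ++i) {
            children_[i] = std::make_unique<Node>(level_ + 1, quads[i]);
        }
    }

    // Destroys the whole subtree; unique_ptr frees grandchildren recursively.
    void clear() {
        for (auto& c : children_) c.reset();
    }

    const Node* child(int i) const { return children_[i].get(); }
    const Box& bounds() const { return bounds_; }
    int level() const { return level_; }

private:
    int level_;
    Box bounds_;
    std::array<std::unique_ptr<Node>, 4> children_; // default: all nullptr
};
```

With this layout the nullptr checks from the question (childNodes[0] != nullptr) keep working, because a default-constructed std::unique_ptr compares equal to nullptr.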

A star algorithm finding shortest path but not computing it correctly

Using the following A star visualization as a way to compare path accuracy, I found a large variation between my implementation and this one.
https://qiao.github.io/PathFinding.js/visual/
Path I'm comparing to:
(source: imgsafe.org)
My test paths:
(source: imgsafe.org)
There are times when it seems like the algorithm is checking too few nodes (i.e Test#6). Is this to be expected, or is it not correct?
Important variables in algorithm:
TileMap* m_tileMap;
vector<Tile*> m_openList;
vector<Tile*> m_path;
// Direct mapping of 2D tile map.
// Stores the list type for the same-indexed tile
vector<vector<int>> m_listMap;
Comparator for sorting open list:
struct CompareNodes
{
// sorts lowest F cost to end of vector
bool operator() (Tile* lhs, Tile* rhs)
{
return lhs->getFCost() > rhs->getFCost();
}
};
High level implementation:
vector<Tile*> PathGenerator::generatePath(Tile* startNode, Tile* endNode)
{
setUpListMap();
startNode->setGCost(0);
startNode->setHCost(calculateHCost(startNode, endNode)); // Manhattan (no diagonal). { abs(y2 - y1) + abs(x2 - x1) }
startNode->calculateFCost(); // calculates G+H internally
m_openList.push_back(startNode);
Vector2D startNodePos = startNode->getMapPos();
m_listMap[startNodePos.x][startNodePos.y] = LIST_TYPES::OPEN;
Tile* currNode;
while (m_openList.empty() == false)
{
currNode = m_openList[m_openList.size() - 1];
m_openList.pop_back();
Vector2D currNodePos = currNode->getMapPos();
m_listMap[currNodePos.x][currNodePos.y] = LIST_TYPES::CLOSED;
if (currNode != endNode)
{
vector<Tile*> neighbours = findNeighbours(currNode);
removeUnnecessaryNodes(&neighbours); // remove walls and closed nodes
computeCosts(&neighbours, currNode, endNode);
addUniqueNodesToOpenList(&neighbours); // ignores duplicates and then sorts open list
}
else
{
m_path = getPath(currNode);
resetLists(); // erases all vectors
}
}
return m_path;
}
void PathGenerator::computeCosts(vector<Tile*>* nodes, Tile* current, Tile* end)
{
int newGCost = current->getGCost() + 1;
for (int i = 0; i < nodes->size(); i++)
{
Tile* node = nodes->at(i);
unsigned int nodeGCost = node->getGCost(); // G cost defaults to max int limit
if (newGCost < nodeGCost)
{
// set up node costs like above
node->setParentNode(current);
}
}
}
I've added the most important code. If the high level functions don't help to find the source of the issue, let me know and I'll add the implementation for them also.
Help appreciated.
The sorting part seems correct, but since it's a vector this should be very easy for you to verify.
Instead, try using a for-loop as a test case to make sure you really get the lowest f-cost node:
Tile* currNode = m_openList[0];
for (int i = 0; i < m_openList.size(); i++)
{
if (m_openList[i]->getFCost() < currNode->getFCost())
currNode = m_openList[i];
}
See if that fixes it. If it does, there's an issue in your sort, but I'm not sure what the issue would be.
Also, in your computeCosts function, you do:
for (int i = 0; i < nodes->size(); i++)
{
Tile* node = nodes->at(i);
//.. other code
}
Since you're using an std::vector, why not make use of its functionality, and use iterators or a range based loop:
// Iterators
for (auto it = nodes->begin(); it != nodes->end(); it++)
{
Tile* node = *it;
//.. other code
}
// Range based loop
for (auto node : *nodes)
{
//.. other code
}

Implementing min function

Good day, I found this priority queue implementation and I am trying to get a min version of it (instead of max). I have no idea where to start. I tried swapping the signs in the functions (a naive attempt), but it didn't get me far. Any help on how to implement it, and a few words explaining it, are very welcome. The source is below:
Note: I have left its comments in.
#include <iostream>
#include <vector>
#include <assert.h>
using namespace std;
class PriorityQueue
{
vector<int> pq_keys;
void shiftRight(int low, int high);
void shiftLeft(int low, int high);
void buildHeap();
public:
PriorityQueue(){}
PriorityQueue(vector<int>& items)
{
pq_keys = items;
buildHeap();
}
/*Insert a new item into the priority queue*/
void enqueue(int item);
/*Get the maximum element from the priority queue*/
int dequeue();
/*Just for testing*/
void print();
};
void PriorityQueue::enqueue(int item)
{
pq_keys.push_back(item);
shiftLeft(0, pq_keys.size() - 1);
return;
}
int PriorityQueue::dequeue()
{
assert(pq_keys.size() != 0);
int last = pq_keys.size() - 1;
int tmp = pq_keys[0];
pq_keys[0] = pq_keys[last];
pq_keys[last] = tmp;
pq_keys.pop_back();
shiftRight(0, last-1);
return tmp;
}
void PriorityQueue::print()
{
int size = pq_keys.size();
for (int i = 0; i < size; ++i)
cout << pq_keys[i] << " ";
cout << endl;
}
void PriorityQueue::shiftLeft(int low, int high)
{
int childIdx = high;
while (childIdx > low)
{
int parentIdx = (childIdx-1)/2;
/*if child is bigger than parent we need to swap*/
if (pq_keys[childIdx] > pq_keys[parentIdx])
{
int tmp = pq_keys[childIdx];
pq_keys[childIdx] = pq_keys[parentIdx];
pq_keys[parentIdx] = tmp;
/*Make parent index the child and shift towards left*/
childIdx = parentIdx;
}
else
{
break;
}
}
return;
}
void PriorityQueue::shiftRight(int low, int high)
{
int root = low;
while ((root*2)+1 <= high)
{
int leftChild = (root * 2) + 1;
int rightChild = leftChild + 1;
int swapIdx = root;
/*Check if root is less than left child*/
if (pq_keys[swapIdx] < pq_keys[leftChild])
{
swapIdx = leftChild;
}
/*If right child exists check if it is less than current root*/
if ((rightChild <= high) && (pq_keys[swapIdx] < pq_keys[rightChild]))
{
swapIdx = rightChild;
}
/*Make the biggest element of root, left and right child the root*/
if (swapIdx != root)
{
int tmp = pq_keys[root];
pq_keys[root] = pq_keys[swapIdx];
pq_keys[swapIdx] = tmp;
/*Keep shifting right and ensure that swapIdx satisfies
heap property aka left and right child of it is smaller than
itself*/
root = swapIdx;
}
else
{
break;
}
}
return;
}
void PriorityQueue::buildHeap()
{
/*Start with middle element. Middle element is chosen in
such a way that the last element of array is either its
left child or right child*/
int size = pq_keys.size();
int midIdx = (size -2)/2;
while (midIdx >= 0)
{
shiftRight(midIdx, size-1);
--midIdx;
}
return;
}
int main()
{
//example usage
PriorityQueue asd;
asd.enqueue(2);
asd.enqueue(3);
asd.enqueue(4);
asd.enqueue(7);
asd.enqueue(5);
asd.print();
cout<< asd.dequeue() << endl;
asd.print();
return 0;
}
Well, generally in such problems, i.e. algorithms based on comparison of elements, you can redefine what (a < b) means. (That is how things in the standard library work, by the way: you can define your own comparator.)
So if you change its meaning to the opposite, you will reverse the ordering.
You need to identify every comparison of elements and switch it. So for every piece of code like this
/*if child is bigger than parent we need to swap*/
if (pq_keys[childIdx] > pq_keys[parentIdx])
invert its meaning/logic.
Simple negation should do the trick:
/*if child is NOT bigger than parent we need to swap*/
if (!(pq_keys[childIdx] > pq_keys[parentIdx]))
You do not even need to understand the algorithm. Just invert the meaning of what the lesser element is.
Edit:
Additional note: you could actually refactor it into some kind of bool compare(T a, T b) and use this function wherever a comparison is made. Then, whenever you want to change the behaviour, you only need to change one place and it will be consistent. But that is mostly to avoid the work of looking for every such occurrence, and the stupid bugs when you miss one.
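As a rough sketch of that refactoring (not the original poster's class): the comparison is pulled out into a template parameter, so min and max versions differ in one place only. Heap, MinQueue, MaxQueue, siftUp and siftDown are hypothetical names; the convention chosen here is that Compare(a, b) returns true when a should sit closer to the root than b, which is the opposite of the convention std::priority_queue uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Compare(a, b) == true means "a belongs closer to the root than b".
template <typename Compare>
class Heap {
    std::vector<int> keys_;
    Compare higher_;

    void siftUp(std::size_t child) {
        while (child > 0) {
            std::size_t parent = (child - 1) / 2;
            if (!higher_(keys_[child], keys_[parent])) break;
            std::swap(keys_[child], keys_[parent]);
            child = parent;
        }
    }

    void siftDown(std::size_t root) {
        const std::size_t n = keys_.size();
        while (2 * root + 1 < n) {
            std::size_t left = 2 * root + 1;
            std::size_t right = left + 1;
            std::size_t best = root;
            if (higher_(keys_[left], keys_[best])) best = left;
            if (right < n && higher_(keys_[right], keys_[best])) best = right;
            if (best == root) break;
            std::swap(keys_[root], keys_[best]);
            root = best;
        }
    }

public:
    void enqueue(int item) {
        keys_.push_back(item);
        siftUp(keys_.size() - 1);
    }

    int dequeue() {
        assert(!keys_.empty());
        int top = keys_.front();
        keys_.front() = keys_.back();
        keys_.pop_back();
        if (!keys_.empty()) siftDown(0);
        return top;
    }

    bool empty() const { return keys_.empty(); }
};

using MinQueue = Heap<std::less<int>>;    // smallest key at the root
using MaxQueue = Heap<std::greater<int>>; // largest key at the root
```

Switching between the two behaviours then means changing only the type alias, not hunting for every comparison in the sift functions.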
Easier:
std::priority_queue<int, std::vector<int>, std::greater<int>> my_queue;
If this is part of an exercise, then I suggest following the standard library's design principles: split the problem up:
1. data storage (e.g. std::vector)
2. sorting or "heapifying" algorithm (cf. std::make_heap etc.)
3. ordering criteria (to be used by 2. above)
Your class should give you some leeway to change any of these independently. With that in place, you can trivially change the "less-than" ordering for a "greater than" one.
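A small sketch of that three-way split, assuming a hypothetical class name StdHeapQueue: a std::vector is the storage, std::push_heap and std::pop_heap are the heap algorithms, and the comparator is a template parameter. As with std::priority_queue, std::less gives a max-heap and std::greater gives a min-heap.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

template <typename Compare = std::less<int>>
class StdHeapQueue {
    std::vector<int> data_; // 1. data storage
    Compare comp_;          // 3. ordering criterion

public:
    void enqueue(int item) {
        data_.push_back(item);
        // 2. heap algorithm: restore the heap property after appending
        std::push_heap(data_.begin(), data_.end(), comp_);
    }

    int dequeue() {
        assert(!data_.empty());
        // Moves the top element to the back, re-heapifies the rest.
        std::pop_heap(data_.begin(), data_.end(), comp_);
        int top = data_.back();
        data_.pop_back();
        return top;
    }

    bool empty() const { return data_.empty(); }
};
```

Here StdHeapQueue<> behaves like the max queue from the question, while StdHeapQueue<std::greater<int>> returns the minimum first.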

Finding cycle in Aho-Corasick automaton

I'm facing a problem which should be solved using an Aho-Corasick automaton. I'm given a set of words (composed of '0' and '1'), the patterns, and I must decide whether it is possible to create an infinite text that doesn't contain any of the given patterns. I think the solution is to build the Aho-Corasick automaton and search for a cycle that avoids matching states, but I'm not able to come up with a good way to do that. I thought of searching the state graph using DFS, but I'm not sure if it will work, and I have an implementation problem. Let's assume that we are in a state which has a '1' edge, but the state pointed to by that edge is marked as matching, so we cannot use that edge. We can try the fail link (the current state doesn't have a '0' edge), but we must also remember that we cannot take the '1' edge from the state pointed to by the fail link of the current one.
Could anyone correct me and show me how to do that? I've written Aho-Corasick in C++ and I'm sure it works; I also understand the entire algorithm.
Here is the base code:
class AhoCorasick
{
static const int ALPHABET_SIZE = 2;
struct State
{
State* edge[ALPHABET_SIZE];
State* fail;
State* longestMatchingSuffix;
//Vector used to remember which pattern matches in this state.
vector< int > matching;
short color;
State()
{
for(int i = 0; i < ALPHABET_SIZE; ++i)
edge[i] = 0;
color = 0;
}
~State()
{
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
delete edge[i];
}
}
};
private:
State root;
vector< int > lenOfPattern;
bool isFailComputed;
//Helper function used to traverse state graph.
State* move(State* curr, char letter)
{
while(curr != &root && curr->edge[letter] == 0)
{
curr = curr->fail;
}
if(curr->edge[letter] != 0)
curr = curr->edge[letter];
return curr;
}
//Function which computes fail links and longestMatchingSuffix.
void computeFailLink()
{
queue< State* > Q;
root.fail = root.longestMatchingSuffix = 0;
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
if(root.edge[i] != 0)
{
Q.push(root.edge[i]);
root.edge[i]->fail = &root;
}
}
while(!Q.empty())
{
State* curr = Q.front();
Q.pop();
if(!curr->fail->matching.empty())
{
curr->longestMatchingSuffix = curr->fail;
}
else
{
curr->longestMatchingSuffix = curr->fail->longestMatchingSuffix;
}
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
if(curr->edge[i] != 0)
{
Q.push(curr->edge[i]);
State* state = curr->fail;
state = move(state, i);
curr->edge[i]->fail = state;
}
}
}
isFailComputed = true;
}
public:
AhoCorasick()
{
isFailComputed = false;
}
//Add pattern to automaton.
//pattern - pointer to pattern, which will be added
//fun - function which will be used to transform character to 0-based index.
void addPattern(const char* const pattern, int (*fun) (const char *))
{
isFailComputed = false;
int len = strlen(pattern);
State* curr = &root;
const char* pat = pattern;
for(; *pat; ++pat)
{
char tmpPat = fun(pat);
if(curr->edge[tmpPat] == 0)
{
curr = curr->edge[tmpPat] = new State;
}
else
{
curr = curr->edge[tmpPat];
}
}
lenOfPattern.push_back(len);
curr->matching.push_back(lenOfPattern.size() - 1);
}
};
int alphabet01(const char * c)
{
return *c - '0';
}
I didn't look through your code, but I know a very simple and efficient implementation.
First of all, let's add dictionary suffix links to the tree (you can find their description on Wikipedia). Then you have to go through the whole tree and mark matching nodes, and nodes that have dictionary suffix links, as bad nodes. The reason is straightforward: the text may never pass through a matching node, or through a node that has a matching suffix.
Now you have an Aho-Corasick tree without any matching nodes. If you just run a DFS on what remains, looking for a cycle reachable from the root, you will get what you want.
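The approach above could be sketched roughly as follows. This is a stand-alone illustration, not an extension of the AhoCorasick class from the question: Automaton, build, infiniteTextExists and canAvoidForever are hypothetical names, states are stored as indices, and the fail links are folded into a complete transition table so the DFS never has to follow fail links by hand. A state is marked bad if it matches a pattern or if its fail state is bad, which covers the "matching suffix" case.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct Automaton {
    std::vector<std::array<int, 2>> next; // transition table, -1 = missing
    std::vector<int> fail;
    std::vector<bool> bad; // matches a pattern or has a matching suffix

    Automaton() { addState(); } // state 0 is the root

    int addState() {
        next.push_back({-1, -1});
        fail.push_back(0);
        bad.push_back(false);
        return static_cast<int>(next.size()) - 1;
    }

    void addPattern(const std::string& p) {
        int cur = 0;
        for (char c : p) {
            int b = c - '0';
            if (next[cur][b] == -1) {
                int s = addState(); // may reallocate, so index afterwards
                next[cur][b] = s;
            }
            cur = next[cur][b];
        }
        bad[cur] = true;
    }

    // BFS: compute fail links, fill missing edges, propagate "bad".
    void build() {
        std::queue<int> q;
        for (int b = 0; b < 2; ++b) {
            if (next[0][b] == -1) next[0][b] = 0;
            else q.push(next[0][b]); // fail of depth-1 states is the root
        }
        while (!q.empty()) {
            int s = q.front();
            q.pop();
            bad[s] = bad[s] || bad[fail[s]];
            for (int b = 0; b < 2; ++b) {
                int t = next[s][b];
                if (t == -1) {
                    next[s][b] = next[fail[s]][b];
                } else {
                    fail[t] = next[fail[s]][b];
                    q.push(t);
                }
            }
        }
    }

    // color: 0 = unvisited, 1 = on the current DFS path, 2 = finished
    bool dfs(int s, std::vector<int>& color) const {
        color[s] = 1;
        for (int b = 0; b < 2; ++b) {
            int t = next[s][b];
            if (bad[t]) continue;               // never enter a bad state
            if (color[t] == 1) return true;     // back edge: safe cycle found
            if (color[t] == 0 && dfs(t, color)) return true;
        }
        color[s] = 2;
        return false;
    }

    bool infiniteTextExists() const {
        if (bad[0]) return false;
        std::vector<int> color(next.size(), 0);
        return dfs(0, color);
    }
};

bool canAvoidForever(const std::vector<std::string>& patterns) {
    Automaton a;
    for (const auto& p : patterns) a.addPattern(p);
    a.build();
    return a.infiniteTextExists();
}
```

For example, with the patterns 00 and 11 the text 0101... avoids both, so a safe cycle exists; with the patterns 0 and 1 both edges out of the root lead to bad states, so none does.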

Exponential tree implementation

I was trying to implement an exponential tree from the documentation, but there is one place in the code where it is not clear to me how to implement it:
#include<iostream>
using namespace std;
struct node
{
int level;
int count;
node **child;
int data[];
};
int binary_search(node *ptr,int element)
{
if(element>ptr->data[ptr->count-1]) return ptr->count;
int start=0;
int end=ptr->count-1;
int mid=start+(end-start)/2;
while(start<end)
{
if(element>ptr->data[mid]) { start=mid+1;}
else
{
end=mid;
}
mid=start+(end-start)/2;
}
return mid;
}
void insert(node *root,int element)
{
node *ptr=root,*parent=NULL;
int i=0;
while(ptr!=NULL)
{
int level=ptr->level,count=ptr->count;
i=binary_search(ptr,element);
if(count<level){
for(int j=count;j<=i-1;j--)
ptr->data[j]=ptr->data[j-1];
}
ptr->data[i]=element;
ptr->count=count+1;
return ;
}
parent=ptr,ptr=ptr->child[i];
//Create a new Exponential Node at ith child of parent and
//insert element in that
return ;
}
int main()
{
return 0;
}
Here is a link for the paper I'm referring to:
http://www.ijcaonline.org/volume24/number3/pxc3873876.pdf
The place in question is marked by the comment: how can I create a new exponential node at level i? Like this?
parent->child[i]=new node;
insert(parent,element);
The presence of the empty array at the end of the structure indicates this is C style code rather than C++ (it's a C Hack for flexible arrays). I'll continue with C style code as idiomatic C++ code would prefer use of standard containers for the child and data members.
Some notes and comments on the following code:
There were a number of issues with the pseudo-code in the linked paper to a point where it is better to ignore it and develop the code from scratch. The indentation levels are unclear where loops end, all the loop indexes are not correct, the check for finding an insertion point is incorrect, etc....
I didn't include any code for deleting the allocated memory so the code will leak as is.
Zero-sized arrays may not be supported by all compilers (flexible array members are a C99 feature; in C++ they are only available as a compiler extension). For example VS2010 gives me warning C4200 saying it will not generate the default copy/assignment methods.
I added the createNode() function which gives the answer to your original question of how to allocate a node at a given level.
A very basic test was added and appears to work but more thorough tests are needed before I would be comfortable with the code.
Besides the incorrect pseudo-code the paper has a number of other errors or at least questionable content. For example, concerning Figure 2 it says "which clearly depicts that the slope of graph is linear" where as the graph is clearly not linear. Even if the author meant "approaching linear" it is at least stretching the truth. I would also be interested in the set of integers they used for testing which doesn't appear to be mentioned at all. I assumed they used a random set but I would like to see at least several sets of random numbers used as well as several predefined sets such as an already sorted or inversely sorted set.
#include <cstdio>  // printf
#include <cstdlib> // malloc, rand
#include <cstring> // memset
int binary_search(node *ptr, int element)
{
if (ptr->count == 0) return 0;
if (element > ptr->data[ptr->count-1]) return ptr->count;
int start = 0;
int end = ptr->count - 1;
int mid = start + (end - start)/2;
while (start < end)
{
if (element > ptr->data[mid])
start = mid + 1;
else
end = mid;
mid = start + (end - start)/2;
}
return mid;
}
node* createNode (const int level)
{
if (level <= 0) return NULL;
/* Allocate node with 2**(level-1) integers */
node* pNewNode = (node *) malloc(sizeof(node) + sizeof(int)*(1 << (level - 1)));
memset(pNewNode->data, 0, sizeof(int) * (1 << (level - 1 )));
/* Allocate 2**level child node pointers */
pNewNode->child = (node **) malloc(sizeof(node *)* (1 << level));
memset(pNewNode->child, 0, sizeof(node *) * (1 << level));
pNewNode->count = 0;
pNewNode->level = level;
return pNewNode;
}
void insert(node *root, int element)
{
node *ptr = root;
node *parent = NULL;
int i = 0;
while (ptr != NULL)
{
int level = ptr->level;
int count = ptr->count;
i = binary_search(ptr, element);
if (count < (1 << (level-1)))
{
for(int j = count; j >= i+1; --j)
ptr->data[j] = ptr->data[j-1];
ptr->data[i] = element;
++ptr->count;
return;
}
parent = ptr;
ptr = ptr->child[i];
}
parent->child[i] = createNode(parent->level + 1);
insert(parent->child[i], element);
}
void InOrderTrace(node *root)
{
if (root == NULL) return;
for (int i = 0; i < root->count; ++i)
{
if (root->child[i]) InOrderTrace(root->child[i]);
printf ("%d\n", root->data[i]);
}
if (root->child[root->count]) InOrderTrace(root->child[root->count]);
}
void testdata (void)
{
node* pRoot = createNode(1);
for (int i = 0; i < 10000; ++i)
{
insert(pRoot, rand());
}
InOrderTrace(pRoot);
}
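The answer notes that idiomatic C++ would prefer standard containers for the child and data members. A minimal sketch of what that might look like follows, assuming the same capacity rule as the code above (a node at level L holds up to 2^(L-1) keys). ExpNode, capacity and inOrder are hypothetical names; std::unique_ptr takes care of freeing the nodes, so the leak mentioned above does not arise in this variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A node at level L holds up to 2^(L-1) sorted keys and one child slot
// per gap between keys (2^(L-1) + 1 slots).
struct ExpNode {
    int level;
    std::vector<int> data;
    std::vector<std::unique_ptr<ExpNode>> child;

    explicit ExpNode(int lvl)
        : level(lvl), child((std::size_t{1} << (lvl - 1)) + 1) {}

    std::size_t capacity() const { return std::size_t{1} << (level - 1); }
};

void insert(ExpNode& root, int element) {
    ExpNode* node = &root;
    while (true) {
        // Position of the first key that is not less than element.
        auto pos = std::lower_bound(node->data.begin(), node->data.end(), element);
        std::size_t i = static_cast<std::size_t>(pos - node->data.begin());
        if (node->data.size() < node->capacity()) {
            node->data.insert(pos, element); // room left: keep keys sorted
            return;
        }
        // Node is full: descend into (or create) the i-th child.
        if (!node->child[i]) {
            node->child[i] = std::make_unique<ExpNode>(node->level + 1);
        }
        node = node->child[i].get();
    }
}

// Appends all keys of the subtree to out in ascending order.
void inOrder(const ExpNode& node, std::vector<int>& out) {
    for (std::size_t i = 0; i < node.data.size(); ++i) {
        if (node.child[i]) inOrder(*node.child[i], out);
        out.push_back(node.data[i]);
    }
    const auto& last = node.child[node.data.size()];
    if (last) inOrder(*last, out);
}
```

Children are only created once a node is full, and a full node's keys never move again, so the child slots stay aligned with the gaps between keys.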