Recursive function and performance in C++ - c++

I am new to C++. I would appreciate it if anyone could help me validate the function below, or help me improve it.
void RecursiveKnn(std::set<PointId> &refSet, const PointId &id, const KD3Index &kid, int const lvl)
{
    if (refSet.find(id) == refSet.end())
        refSet.insert(id);
    if (lvl < m_knn)
    {
        auto ids = kid.neighbors(id, 7);
        for (auto x : ids)
        {
            if (x == id)
                continue;
            RecursiveKnn(refSet, x, kid, lvl+1);
        }
    }
}
I have written a recursive function to generate a set of hierarchical objects: basically, start with one object, get the nearby objects, and so on for the next level. Along the way, I want to avoid duplicates as well. The levels are limited to 3-4 and I do not expect to go any further.
This function is called millions of times and is taking forever to run. I would really appreciate it if anyone could suggest any improvement. Off the top of my head I am sure std::set is not the correct data structure to use, but I don't know what to use instead.
EDIT: The reason I think this function has the performance problem: at first I had a single function with 3 nested for loops, which worked within a reasonable time. When I changed it to a recursive function, the process did not complete in more than an hour.

Without more information, my guess would be that multiple PointIds can have the same PointIds as neighbors, even though there is a parent/child relation that holds.
In other words, this code is performing a depth-first search on a directed acyclic graph but is not checking for revisits the way a typical depth-first search does. This means you will be exploring the same nodes many times which is extremely inefficient.
Change
if (refSet.find(id) == refSet.end())
    refSet.insert(id);
to
if (refSet.find(id) != refSet.end())
    return;
refSet.insert(id);
Also you should use an std::unordered_set instead of an std::set but that is a lesser concern if the above is true.

This is wasteful:
    if (refSet.find(id) == refSet.end())
        refSet.insert(id);
There's no need to check whether something is a member of a set before inserting it; just insert it anyway:
    refSet.insert(id);
That said, this is not an order-of-magnitude improvement.
This might also help
const auto& ids = kid.neighbors(id, 7);
It depends on what neighbors returns, but it looks like you're copying some collection or other.

Related

In a low-latency application, Is unordered_map ever a better solution over vector?

Is it advisable to use unordered_map in place of vector while developing a low-latency application?
I recently appeared for an interview with a financial company which worked on low-latency trading applications. I was asked a question which I answered using an unordered_map, which seemed pretty good efficiency-wise (O(n)) compared to if I had used a vector (O(n*n)). However, I know that it is advisable to use vector as much as possible and avoid unordered_map in order to utilize the benefits of cache coherence. I just wanted to see if there is a better solution possible for this problem. The problem I was asked was to check if two strings are permutations of each other.
bool isPermutation(const std::string& first, const std::string& second) {
    std::unordered_map<char, int> charDict;
    if(first.length() != second.length())
        return false;
    for(auto it: first) {
        charDict[it]++;
    }
    for(auto it: second) {
        if(--charDict[it] < 0) {
            return false;
        }
    }
    return true;
}
You can assume that both strings are of equal length, and the function should return true only if each character occurs exactly as many times in the second string as in the first.
Sure, but it really depends on the problem you are trying to solve. If the domain of your key space is unknown, it would be difficult to come up with a generic solution that is faster than unordered_map.
In this case, the domain of your key space is known: it is limited to ASCII characters. This is convenient because you can instantly convert from item (char) to vector index (std::size_t). So you could just use the value of each character as an index into a vector rather than hashing it for every lookup.
But in general, don't optimize prematurely. If unordered_map is the most natural solution, I would start there, then profile, and if you find that performance does not meet your requirements, look at reworking your solution. (This isn't always the best advice; if you know you are working on a highly critical piece of code, there are certain design decisions you will want to take into consideration from the beginning. Coming back and refactoring later may be much more difficult if you start with an incompatible design.)
Since there are only 256 possible keys, you can use a stack-allocated array of 256 counts, which will be faster than a vector or an unordered_map. If first.size() + second.size() < 128, then only initialize the counts to 0 for keys that actually occur. Otherwise, memset the whole array.

optimize for loop using for_each standard or boost tools

I am trying to optimize the for loop below. It loops over a std::vector of a struct type, and checks whether any member's type equals INSIDE_WITH_MORE_ONE_INTER. If it finds such a member, it applies a function to it and, based on the result, either keeps the type or changes it to NOT_DEFINED.
for(pnt_vec_iter pnt_iter = newpnTs.begin(); pnt_iter != newpnTs.end(); pnt_iter++)
{
    if(pnt_iter->_type == INSIDE_WITH_MORE_ONE_INTER)
    {
        if(!DoublePointsOnEdgeCheck(*this, pnt_iter->_face, pnt_iter))
        {
            pnt_iter->_type = NOT_DEFINED;
        }
    }
}
I am wondering if it is possible to optimize the above code without defining a separate function to do so.
This depends way too much on context, like
what's the locality of reference on data dependencies from DoublePointsOnEdgeCheck?
what's the relative frequency of points with _type == INSIDE_WITH_MORE_ONE_INTER? How are the points ordered?
What is the memory layout of a point? etc.
Just profile it, and /imagine/ what would make the algorithm zip quicker through the set.
E.g.
"What if I didn't have to loop through all the points?"
"What if points were already indexed/sorted by _type?" (you'd use a simple equal_range(INSIDE_WITH_MORE_ONE_INTER) call to reduce the work)
"What if I do not do the work at all, but instead lazily re-evaluate the value for _type in an accessor type()?" (Or does this break constness too much? Threading?)
Etc.

More efficient data structure

I'm developing a project and I need to do a lot of comparisons between objects and insertions in lists.
Basically I have an object of type Board and I do the following:
if(!(seenStates.contains(children[i])))
{
    statesToExpand.addToListOrderly(children[i]);
    seenStates.insertHead(children[i]);
}
where statesToExpand and seenStates are two lists that I defined this way:
typedef struct t_Node
{
    Board *board;
    int distanceToGoal;
    t_Node *next;
} m_Node;
typedef m_Node* m_List;
class ListOfStates {
...
Everything works fine, but I did some profiling and discovered that almost 99% of the time is spent operating on these lists, since I have to expand, compare, insert, etc. almost 20000 states.
My question is: is there a more efficient data structure that I could use in order to reduce the execution time of that portion of code?
Update
So I tried using std::vector and it is a bit worse (15 seconds instead of 13 with my old list). Probably I'm doing something wrong... With some more profiling I discovered that approximately 13.5 seconds are spent searching for an element in a vector. This is the code I am using:
bool Game::vectorContains(Board &b)
{
    clock_t stop;
    clock_t start = clock();
    if(seenStates.size() == 0)
    {
        stop = clock();
        clock_counter += (stop - start);
        return false;
    }
    for(vector<m_Node>::iterator it = seenStates.begin(); it != seenStates.end(); it++)
    {
        if( /* condition */ )
        {
            stop = clock();
            clock_counter += (stop - start);
            return true;
        }
    }
    stop = clock();
    clock_counter += (stop - start);
    return false;
}
Can I do something better here or should I move on to another data structure (maybe an unordered_set as suggested below)?
One more update
I tried the exact same code in release mode and the whole algorithm executes in just 1.2 seconds.
I didn't know there could be such a big difference between Debug and Release. I know that Release does some optimization but this is some difference!
This part:
if(!(seenStates.contains(children[i])))
for a linked list is going to be very slow. While the algorithmic time is O(n), same as it would be for a std::vector<Node>, the memory that you're walking over is going to be all over the place... so you're going to incur lots of cache misses as your container gets larger. After a while, your time is just going to be dominated by those cache misses. So std::vector will likely perform much better.
That said, if you're doing a lot of find()-type operations, you should consider using a container that is set up to do find very quickly... maybe a std::unordered_set?
Using a list ends up with O(n) time to search for elements. You could consider data structures with more efficient lookup, e.g. std::map, std::unordered_map, a sorted vector, or other tree structures. There are many data structures; which one is best depends on your algorithm design.
Indeed you don't want to use a linked list in your case. Looking for a specific value (ie contains()) is very slow in a linked list, O(n).
Thus using an array list kept sorted (for example std::vector) or a binary search tree would be smarter; the complexity of contains() would then become on average O(log n).
However, if you are worried about expanding your array list very often, you might reserve a lot of space when you create it (for example 20 000 elements).
Don't forget to consider using two different data structures for your two lists.
If I understand it correctly, your data structure resembles a singly linked list. So, instead of using your own implementation, you can try to work with a
    std::forward_list<Board*>
or probably better with a
    std::forward_list<std::unique_ptr<Board>>
If you also need the reference to the previous element, then use a standard std::list. Both will give you constant-time insertion, but only linear lookup (at least if you don't know where to search).
Alternatively, you can consider using a std::set<std::unique_ptr<Board>> (with a comparator that compares the boards themselves), which will give you logarithmic insertion and lookup, but without further effort you lose the information on the successor.
EDIT: std::vector seems no good choice for your kind of requirements. As far as I understood, you need fast search and fast ordered insertion, and both are O(n) for a vector. Use a std::map instead, where both are O(log n). [But note that using the latter doesn't mean you will directly get faster execution times, as that depends on the number of elements]

Something wrong with BFS maze solving algorithm in OCaml

http://ideone.com/QXyVzR
The above link contains a program I wrote to solve mazes using a BFS algorithm. The maze is represented as a 2D array, initially passed in as numbers, (0's represent an empty block which can be visited, any other number represent a "wall" block), and then converted into a record type which I defined, which keeps track of various data:
type mazeBlock = {
  walkable : bool;
  isFinish : bool;
  visited : bool;
  prevCoordinate : int * int
}
The output is a list of ordered pairs (coordinates/indices) which trace a shortest path through the maze from the start to the finish, the coordinates of which are both passed in as parameters.
It works fine for smaller mazes with a low branching factor, but when I test it on larger mazes (say 16 x 16 or larger), especially on ones with no walls (high branching factor), it takes up a LOT of time and memory. I am wondering if this is inherent to the algorithm or related to the way I implemented it. Can any OCaml hackers out there offer me their expertise?
Also, I have very little experience with OCaml so any advice on how to improve the code stylistically would be greatly appreciated. Thanks!
EDIT:
http://ideone.com/W0leMv
Here is a cleaned-up, edited version of the program. I fixed some stylistic issues, but I didn't change the semantics. As usual, the second test still takes up a huge amount of resources and cannot seem to finish at all. Still seeking help on this issue...
EDIT2:
SOLVED. Thanks so much to both answerers. Here is the final code:
http://ideone.com/3qAWnx
In your critical section, that is mazeSolverLoop, you should only visit elements that have not been visited before. When you take an element from the queue, you should first check whether it has been visited, and in that case do nothing but recurse to get the next element. This is precisely what gives the algorithm its good time complexity (you never visit a place twice).
Otherwise, yes, your OCaml style could be improved. Some remarks:
the convention in OCaml-land is rather to write_like_this instead of writeLikeThis. I recommend that you follow it, but admittedly that is a matter of taste and not an objective criterion.
there is no point in returning a datastructure if it is a mutable structure that was updated; why do you make a point to always return a (grid, pair) queue, when it is exactly the same as the input? You could just have those functions return unit and have code that is simpler and easier to read.
the abstraction level allowed by pairs is good and you should preserve it; you currently don't. There is no point in writing, for example, let (foo, bar) = dimension grid in if in_bounds pos (foo, bar). Just name the dimension dim instead of (foo, bar); it makes no sense to split it into two components if you don't need them separately. Note that for the neighbor you do use neighborX and neighborY for array access for now, but that is a style mistake: you should have auxiliary functions to get and set values in an array, taking a pair as input, so that you don't have to destructure the pair in the main function. Try to keep all the code inside a single function at the same level of abstraction: all working on separate coordinates, or all working on pairs (named as such instead of being constructed/deconstructed all the time).
If I understand you right, for an N x N grid with no walls you have a graph with N^2 nodes and roughly 4*N^2 edges. These don't seem like big numbers for N = 16.
I'd say the only trick is to make sure you track visited nodes properly. I skimmed your code and don't see anything obviously wrong in the way you're doing it.
Here is a good OCaml idiom. Your code says:
let isFinish1 = mazeGrid.(currentX).(currentY).isFinish in
let prevCoordinate1 = mazeGrid.(currentX).(currentY).prevCoordinate in
mazeGrid.(currentX).(currentY) <-
  { walkable = true;
    isFinish = isFinish1;
    visited = true;
    prevCoordinate = prevCoordinate1 }
You can say this a little more economically as follows:
mazeGrid.(currentX).(currentY) <-
  { mazeGrid.(currentX).(currentY) with visited = true }

Searching fast through a sorted list of strings in C++

I have a list of about a hundred unique strings in C++, and I need to check if a value exists in this list, preferably lightning fast.
I am currenly using a hash_set with std::strings (since I could not get it to work with const char*) like so:
stdext::hash_set<const std::string> _items;
_items.insert("LONG_NAME_A_WITH_SOMETHING");
_items.insert("LONG_NAME_A_WITH_SOMETHING_ELSE");
_items.insert("SHORTER_NAME");
_items.insert("SHORTER_NAME_SPECIAL");
stdext::hash_set<const std::string>::const_iterator it = _items.find( "SHORTER_NAME" ) );
if( it != _items.end() ) {
std::cout << "item exists" << std::endl;
}
Does anybody else have a good idea for a faster search method without building a complete hashtable myself?
The list is a fixed list of strings which will not change. It contains a list of names of elements which are affected by a certain bug and should be repaired on-the-fly when opened with a newer version.
I've built hashtables before using Aho-Corasick but I'm not really willing to add too much complexity.
I was amazed by the number of answers. I ended up testing a few methods for their performance and settled on a combination of kirkus's and Rob K.'s answers. I had tried a binary search before, but I guess I had a small bug implementing it (how hard can it be...).
The results were shocking... I thought I had a fast implementation using a hash_set... well, it turns out I did not. Here are some statistics (and the eventual code):
Random lookup of 5 existing keys and 1 non-existent key, 50,000 times:
My original algorithm took on average 18.62 seconds.
A linear search took on average 2.49 seconds.
A binary search took on average 0.92 seconds.
A search using a perfect hashtable generated by gperf took on average 0.51 seconds.
Here's the code I use now:
bool searchWithBinaryLookup(const std::string& strKey) {
    static const char arrItems[NUM_ITEMS][MAX_ITEM_LEN] = { /* list of items */ };

    /* Binary lookup */
    int low, mid, high;
    low = 0;
    high = NUM_ITEMS;
    while( low < high ) {
        mid = (low + high) / 2;
        if(arrItems[mid] > strKey) {
            high = mid;
        }
        else if(arrItems[mid] < strKey) {
            low = mid + 1;
        }
        else {
            return true;
        }
    }
    return false;
}
NOTE: This is Microsoft VC++ so I'm not using the std::hash_set from SGI.
I did some tests this morning using gperf as VardhanDotNet suggested and this is quite a bit faster indeed.
If your list of strings are fixed at compile time, use gperf
http://www.gnu.org/software/gperf/
QUOTE:
gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
The output of gperf is not governed by gpl or lgpl, afaik.
You could try a PATRICIA Trie if none of the standard containers meet your needs.
Worst-case lookup is bounded by the length of the string you're looking up. Also, strings share common prefixes, so it is really easy on memory. So if you have lots of relatively short strings, this could be beneficial.
Check it out here.
Note: PATRICIA = Practical Algorithm to Retrieve Information Coded in Alphanumeric
If it's a fixed list, sort the list and do a binary search? I can't imagine, with only a hundred or so strings on a modern CPU, you're really going to see any appreciable difference between algorithms, unless your application is doing nothing but searching said list 100% of the time.
What's wrong with std::vector? Load it, sort(v.begin(), v.end()) once and then use lower_bound() to see if the string is in the vector. lower_bound is guaranteed to be O(log2 N) on a sorted random access iterator. I can't understand the need for a hash if the values are fixed. A vector takes less room in memory than a hash and makes fewer allocations.
I doubt you'd come up with a better hashtable; if the list varies from time to time you've probably got the best way.
The fastest way would be to construct a finite state machine to scan the input. I'm not sure what the best modern tools are (it's been over ten years since I did anything like this in practice), but Lex/Flex was the standard Unix constructor.
A FSM has a table of states, and a list of accepting states. It starts in the beginning state, and does a character-by-character scan of the input. Each state has an entry for each possible input character. The entry could either be to go into another state, or to abort because the string isn't in the list. If the FSM gets to the end of the input string without aborting, it checks the final state it's in, which is either an accepting state (in which case you've matched the string) or it isn't (in which case you haven't).
Any book on compilers should have more detail, or you can doubtless find more information on the web.
If the set of strings to check numbers in the hundreds, as you say, and this is done while doing I/O (loading a file, which I assume comes from disk), then I'd say: profile what you've got before looking for more exotic/complex solutions.
Of course, it could be that your "documents" contain hundreds of millions of these strings, in which case I guess it really starts to take time... Without more detail, it's hard to say for sure.
What I'm saying boils down to "consider the use-case and typical scenarios, before (over)optimizing", which I guess is just a specialization of that old thing about roots of evil ... :)
100 unique strings? If this isn't called frequently, and the list doesn't change dynamically, I'd probably use a straight forward const char array with a linear search. Unless you search it a lot, something that small just isn't worth the extra code. Something like this:
const char _items[][MAX_ITEM_LEN] = { ... };

int i = 0;
for (; i < NUM_ITEMS && strcmp( a, _items[i] ) < 0; ++i);
bool found = i < NUM_ITEMS && strcmp( a, _items[i] ) == 0;
For a list that small, I think your implementation and maintenance costs with anything more complex would probably outweigh the run-time costs, and you're not really going to get cheaper space costs than this. To gain a little more speed, you could do a hash table of first char -> list index to set the initial value of i.
For a list this small, you probably won't get much faster.
You're using binary search, which is O(log(n)). You should look at interpolation search, which is not as good in the worst case, but its average case is better: O(log(log(n))).
I don't know which kind of hashing function MS uses for strings, but maybe you could come up with something simpler (= faster) that works in your special case. The container should allow you to use a custom hashing class.
If it's an implementation issue of the container, you can also try whether Boost's or std::tr1::unordered_set gives better results.
A hash table is a good solution, and by using a pre-existing implementation you are likely to get good performance. An alternative, though, is what I believe is called "indexing".
Keep some pointers around to convenient locations. E.g. if it's using letters for the sorting, keep a pointer to everything starting aa, ab, ac... ba, bc, bd... This is a few hundred pointers, but it means that you can skip to a part of the list which is quite near the result before continuing. E.g. if an entry is "afunctionname" then you can binary search between the pointers for af and ag, much faster than searching the whole lot... If you have a million records in total, you will likely only have to binary search a list of a few thousand.
I re-invented this particular wheel, but there may be plenty of implementations out there already, which will save you the headache of implementing and are likely faster than any code I could paste in here. :)