I'm developing a project and I need to do a lot of comparisons between objects and insertions in lists.
Basically I have an object of type Board and I do the following:
if (!(seenStates.contains(children[i])))
{
    statesToExpand.addToListOrderly(children[i]);
    seenStates.insertHead(children[i]);
}
where statesToExpand and seenStates are two lists that I defined this way:
typedef struct t_Node
{
    Board *board;
    int distanceToGoal;
    t_Node *next;
} m_Node;
typedef m_Node* m_List;
class ListOfStates {
...
Everything works fine, but I did some profiling and discovered that almost 99% of the time is spent operating on these lists, since I have to expand, compare, and insert almost 20,000 states.
My question is: is there a more efficient data structure that I could use in order to reduce the execution time of that portion of code?
Update
So I tried using std::vector and it is a bit worse (15 seconds instead of 13 with my old list). Probably I'm doing something wrong... With some more profiling I discovered that approximately 13.5 seconds are spent searching for an element in a vector. This is the code I am using:
bool Game::vectorContains(Board &b)
{
    clock_t stop;
    clock_t start = clock();
    if (seenStates.size() == 0)
    {
        stop = clock();
        clock_counter += (stop - start);
        return false;
    }
    for (vector<m_Node>::iterator it = seenStates.begin(); it != seenStates.end(); it++)
    {
        if ( /* condition */ )
        {
            stop = clock();
            clock_counter += (stop - start);
            return true;
        }
    }
    stop = clock();
    clock_counter += (stop - start);
    return false;
}
Can I do something better here or should I move on to another data structure (maybe an unordered_set as suggested below)?
One more update
I tried the exact same code in release mode and the whole algorithm executes in just 1.2 seconds.
I didn't know there could be such a big difference between Debug and Release. I know that Release does some optimization but this is some difference!
This part:
if(!(seenStates.contains(children[i])))
for a linked list is going to be very slow. While the algorithmic time is O(n), same as it would be for a std::vector<Node>, the memory that you're walking over is going to be all over the place... so you're going to incur lots of cache misses as your container gets larger. After a while, your time is just going to be dominated by those cache misses. So std::vector will likely perform much better.
That said, if you're doing a lot of find()-type operations, you should consider using a container that is setup to do find very quickly... maybe a std::unordered_set?
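For instance, here is a minimal sketch of what that could look like, assuming Board can expose its state as something hashable (the key() method below is purely hypothetical; substitute whatever uniquely identifies a board position):

#include <string>
#include <unordered_set>

// Hypothetical: assumes Board (the asker's class) can serialize its state via key().
struct BoardHash {
    std::size_t operator()(const Board &b) const {
        return std::hash<std::string>()(b.key());
    }
};
struct BoardEqual {
    bool operator()(const Board &a, const Board &b) const {
        return a.key() == b.key();
    }
};

std::unordered_set<Board, BoardHash, BoardEqual> seenStates;

// In the expansion loop, insert() reports whether the element was actually new,
// so the separate contains() call disappears:
//
//     if (seenStates.insert(*children[i]).second) {
//         statesToExpand.addToListOrderly(children[i]);
//     }

With a reasonable hash this gives average O(1) membership tests instead of walking the whole list.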
Using a list ends up with O(n) time to search for elements. You could consider data structures with more efficient lookup, e.g. std::map, std::unordered_map, a sorted vector, or other tree structures. There are many data structures; which one is best depends on your algorithm design.
Indeed you don't want to use a linked list in your case. Looking for a specific value (ie contains()) is very slow in a linked list, O(n).
Thus using a sorted array list (for example a sorted std::vector searched with binary search) or a binary search tree would be smarter; the complexity of contains() then becomes O(log n) on average.
However, if you are worried about your array list reallocating too often, you can reserve plenty of capacity when you create it (for example 20,000 elements), as sketched below.
Don't forget to consider using two different data structures for your two lists.
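To make the reserve-up-front and sorted-lookup ideas concrete, here is a minimal sketch, assuming Board provides an operator< (that is an assumption, not something from the question):

#include <algorithm>
#include <vector>

// Sketch only: a pre-sized, always-sorted vector of the asker's Board type.
struct SeenStates {
    std::vector<Board> v;

    SeenStates() { v.reserve(20000); }            // avoid repeated reallocations

    void add(const Board &b) {                    // keep the vector sorted on insert
        v.insert(std::lower_bound(v.begin(), v.end(), b), b);
    }

    bool contains(const Board &b) const {         // O(log n) instead of a linear scan
        return std::binary_search(v.begin(), v.end(), b);
    }
};

Note that each sorted insert still shifts elements (O(n)), so this mostly pays off when lookups far outnumber insertions.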
If I understand it correctly, your data structure resembles a singly linked list. So, instead of using your own implementation, you can try to work with a
std::forward_list<Board*>
or probably better with a
std::forward_list<std::unique_ptr<Board> >
If you also need a reference to the previous element, then use a standard std::list. Both will give you constant-time insertion, but only linear lookup (at least if you don't know where to search).
Alternatively, you can consider using a std::set<std::unique_ptr<Board>> (with a comparator that compares the pointed-to boards), which will give you logarithmic insertion and lookup, but without further effort you lose the information on the successor.
EDIT: std::vector seems no good choice for your kind of requirements. As far as I understood, you need fast search and fast insertion. Searching an unsorted vector is O(n), and keeping it sorted makes insertion O(n). Use a std::set (or std::map) instead, where both are O(log n); a minimal sketch follows. [But note that using the latter doesn't mean you will directly get faster execution times, as that depends on the number of elements]
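A minimal sketch of that ordered alternative, assuming Board defines an operator< over its state (again an assumption):

#include <set>

// Sketch: ordered container of boards; O(log n) insert and lookup.
std::set<Board> seen;

bool record(const Board &b) {
    return seen.insert(b).second;   // false if an equal board was already stored
}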
Related
I completed two version of a leetcode algorithm and am wondering if my complexity analysis is correct, even though the online submission time in ms does not show it accurately. The goal is to take a vector of numbers as a reference and return true if it contains duplicate values and false if it does not.
The two most intuitive approaches are:
1.) Sort the vector and do one sweep to the second to last, and see if any neighboring elements are identical and return true if so.
2.) Use a hashtable and insert the values and if a key already exists in the table, return true.
I completed the first version first, and it was quick, but seeing as how the sort routine would take O(nlog(n)) and the hash table inserts & map.count()s would make the second version O(log(n) + N) = O(N) I would think the hashing version would be faster with very large data sets.
In the online judging I was proven wrong; however, I assumed they weren't using large enough data sets to offset the std::map overhead. So I ran a lot of tests, repeatedly filling vectors up to a size between 0 and 10000, incrementing by 2 and adding random values between 0 and 20000. I piped the output to a csv file and plotted it on Linux, and here's the image I got.
Is the provided image truly showing me the difference here, between an O(N) and an O(nlog(n)) algorithm? I just want to make sure my complexity analysis is correct on these?
Here are the algorithms run:
bool containsDuplicate(vector<int>& nums) {
    if (nums.size() < 2) return false;
    sort(nums.begin(), nums.end());
    for (int i = 0; i < nums.size() - 1; ++i) {
        if (nums[i] == nums[i + 1]) return true;
    }
    return false;
}
// Slightly slower in small cases because of data structure overhead I presume
bool containsDuplicateWithHashing(vector<int>& nums) {
    map<int, int> map;
    for (int i = 0; i < nums.size(); ++i) {
        if (map.count(nums[i])) return true;
        map.insert({nums[i], i});
    }
    return false;
}
std::map is sorted, and involves O(log n) cost for each insertion and lookup, so the total cost in the "no duplicates" case (or in the "first duplicate near the end of the vector" case) would have similar big-O to sorting and scanning: O(n log n); it's typically fragmented in memory, so overhead could easily be higher than that of an optimized std::sort.
It would appear much faster if duplicates were common though; if you usually find a duplicate in the first 10 elements, it doesn't matter if the input has 10,000 elements, because the map doesn't have time to grow before you hit a duplicate and duck out. It's just that a test that only works well when it succeeds is not a very good test for general usage (if duplicates are that common, the test seems a bit silly); you want good performance in both the contains duplicate and doesn't contain duplicate cases.
If you're looking to compare approaches with meaningfully different algorithmic complexity, try using std::unordered_set to replace your map-based solution (insert returns whether the key already existed as well, so you reduce work from one lookup followed by one insert to just one combined insert and lookup on each loop), which has average case O(1) insertion and lookup, for O(n) duplicate checking complexity.
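For reference, a sketch of that unordered_set variant (not from the original post, just illustrating the point above):

#include <unordered_set>
#include <vector>
using namespace std;

// Average O(n): one combined lookup-and-insert per element.
bool containsDuplicateWithHashSet(vector<int>& nums) {
    unordered_set<int> seen;
    seen.reserve(nums.size());          // avoid rehashing as the set grows
    for (int v : nums) {
        if (!seen.insert(v).second)     // insert() reports whether v was new
            return true;
    }
    return false;
}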
FYI, another approach that would be O(n log n) but use a sort-like strategy that shortcuts when a duplicate is found early, would be to make a heap with std::make_heap (O(n) work), then repeatedly pop_heap (O(log n) per pop) from the heap and compare to the heap's .front(); if the value you just popped and the front are the same, you've got a duplicate and can exit immediately. You could also use the priority_queue adapter to simplify this into a single container, instead of manually using the utility functions on a std::vector or the like.
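And a sketch of that heap-based early-exit idea (illustrative only):

#include <algorithm>
#include <vector>
using namespace std;

// O(n log n) overall, but exits as soon as a duplicate reaches the top of the heap.
bool containsDuplicateWithHeap(vector<int> nums) {    // by value: the heap reorders it
    if (nums.size() < 2) return false;
    make_heap(nums.begin(), nums.end());              // O(n)
    while (nums.size() > 1) {
        pop_heap(nums.begin(), nums.end());           // largest element moves to the back
        int popped = nums.back();
        nums.pop_back();
        if (popped == nums.front()) return true;      // next-largest equals the popped one
    }
    return false;
}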
This is a follow-up question to this MIC (Boost.MultiIndex container) question. When adding items to the vector of reference wrappers I spend about 80% of the time inside operator++, whichever iterating approach I choose.
The query works as follows:
VersionView getVersionData(int subdeliveryGroupId, int retargetingId,
                           const std::wstring &flightName) const {
    VersionView versions;
    for (auto i = 0; i < 3; ++i) {
        for (auto j = 0; j < 3; ++j) {
            versions.insert(m_data.get<mvKey>().equal_range(
                boost::make_tuple(subdeliveryGroupId + i, retargetingId + j, flightName)));
        }
    }
    return versions;
}
I've tried the following ways to fill the reference wrapper:
template <typename InputRange> void insert(const InputRange &rng) {
    // 1) base::insert(end(), rng.first, rng.second); // 12ms
    // 2) std::copy(rng.first, rng.second, std::back_inserter(*this)); // 6ms
    /* 3) size_t start = size(); // 12ms
       auto tmp = std::reference_wrapper<const VersionData>(VersionData(0, 0, L""));
       resize(start + boost::size(rng), tmp);
       auto beg = rng.first;
       for (; beg != rng.second; ++beg, ++start)
       {
           this->operator[](start) = std::reference_wrapper<const VersionData>(*beg);
       }
    */
    std::copy(rng.first, rng.second, std::back_inserter(*this));
}
Whatever I do, I pay for operator++ or for the size() method, which just increments the iterator - meaning I'm still stuck in ++. So the question is whether there is a way to iterate the result ranges faster. If there is no such way, is it worth trying to go down into the implementation of equal_range and add a new argument which holds a reference to the container of reference_wrappers, to be filled with results instead of creating a range?
EDIT 1: sample code
http://coliru.stacked-crooked.com/a/8b82857d302e4a06/
Due to this bug it will not compile on Coliru
EDIT 2: Call tree, with time spent in operator ++
EDIT 3: Some concrete numbers. First of all, I didn't start this thread just because operator++ takes a large share of the overall execution time and I dislike that "on principle"; at this very moment it is the major bottleneck in our performance tests. Each request is usually processed in hundreds of microseconds; requests similar to this one (they are somewhat more complex) are processed in ~1000-1500 microseconds, which is still acceptable. The original problem was that once the number of items in the data structure grows to hundreds of thousands, performance deteriorates to something like 20 milliseconds. Now, after switching to MIC (which drastically improved the code's readability, maintainability and overall elegance), I can reach something like 13 milliseconds per request, of which 80%-90% is spent in operator++. So the question is whether this can be improved somehow, or should I look for some tar and feathers for myself? :)
The fact that 80% of getVersionData's execution time is spent in operator++ is not indicative of any performance problem per se --at most, it tells you that equal_range and std::reference_wrapper insertion are faster in comparison. Put another way, when you profile some piece of code you will typically find locations where the most time is spent, but whether this is a problem or not depends on the required overall performance.
#kreuzerkrieg, your sample code does not exercise any kind of insertion into a vector of std::reference_wrappers! Instead, you're projecting the result of equal_range into a boost::any_range, which is expected to be fairly slow at iteration --basically, increment ops resolve to virtual calls.
So, unless I'm seriously missing something here, the sample code performance or lack thereof does not have anything to do with whatever your problem is in real code (assuming VersionView, of which you don't show the code, is not using boost::any_range).
That said, if you can afford replacing your ordered indices with equivalent hashed indices, iteration will probably be faster, but this is an utter shot in the dark given you're not showing the real stuff.
I think that you're measuring the wrong things entirely. When I scale up from 3x3x11111 to 10x10x111111 (so 111x as many items in the index), it still runs in 290ms.
And populating the stuff takes orders of magnitude more time. Even deallocating the container appears to take more time.
What Doesn't Matter?
I've contributed a version with some trade-offs, which mainly shows that there's no sense in tweaking things: View On Coliru
there's a switch to avoid the any_range (it doesn't make sense using that if you care for performance)
there's a switch to tweak the flyweight:
#define USE_FLYWEIGHT 0 // 0: none 1: full 2: no tracking 3: no tracking no locking
again, it merely shows you could easily do without, and should consider doing so unless you need the memory optimization for the string (?). If so, consider using the OPTIMIZE_ATOMS approach:
the OPTIMIZE_ATOMS switch basically does flyweight for wstring there. Since all the strings are repeated here it will be mighty storage efficient (although the implementation is quick and dirty and should be improved). The idea is much better applied here: How to improve performance of boost interval_map lookups
Here are some rudimentary timings:
As you can see, basically nothing actually matters for query/iteration performance
Any Iterators: Do They Matter?
It might be the culprit on your compiler. On my compiler (gcc 4.8.2) it wasn't anything big, but see the disassembly of the accumulate loop without the any iterator:
As you can see from the sections I've highlighted, there doesn't seem to be much fat from the algorithm, the lambda, or the iterator traversal. Now with the any_iterator the situation is much less clear, and if your compiler optimizes less well, I can imagine it failing to inline elementary operations, making iteration slow. (Just guessing a little now.)
OK, so the solution I applied is as follows:
In addition to the ordered_non_unique index (the 'byKey' one) I've added a random_access index. When the data is loaded I rearrange the random index with m_data.get.begin(). Then, when the MIC is queried for data, I just do boost::equal_range on the random index with a custom predicate which emulates the same logic that was applied in the ordering of the 'byKey' index. That's it: it gave me a fast size() (O(1), as I understand it) and fast traversal.
Now I'm ready for your rotten tomatoes :)
EDIT 1:
of course I've changed the any_range from bidirectional traversal tag to the random access one
I am implementing the Fast Marching algorithm, which is some kind of continuous Dijkstra. As I read in many papers, the Fibonacci heap is the most adequate heap for this purpose.
However, when profiling with callgrind my code I see that the following function is taking 58% of the execution time:
int popMinIdx () {
    const int idx = heap_.top()->getIndex();
    heap_.pop();
    return idx;
}
Concretely, the pop() is taking 57.67% of the whole execution time.
heap_ is defined as follows:
boost::heap::fibonacci_heap<const FMCell *, boost::heap::compare<compare_cells>> heap_;
Is it normal that it takes "that much" time or is there something I can do to improve performance?
Sorry if not enough information is given. I tried to be as brief as possible. I will add more info if needed.
Thank you!
The other answers aren't mentioning the big part: of course pop() takes the majority of your time: it's the only function that performs any real work!
As you may have read, the bounds on the operations of a Fibonacci Heap are amortized bounds. This means that if you perform enough operations in a good sequence, the bounds will average out to that. However, the actual costs are completely hidden.
Every time you insert an element, nothing happens. It is just thrown into the root list. Boom, O(1) time. Every time you merge two trees, its root is just linked into the root list. Boom, O(1) time. But hold on, your structure is not a valid Fibonacci Heap! That's where pop() (or extract-root) comes in: every time this operation is called, the entire Heap is restructured back into a correct shape. The Root is removed, its children are cut to the root list, and then we start merging trees in the root list so that no two trees with the same degree (number of children) exist in the root list.
So all of the work of Insert(e) and Merge(t) is actually delayed until Pop() is called, which then does all the work. What about the other operations?
Delete(e) is beautiful. We perform Decrease-Key(e, -inf) to make the element e become the root. And now we perform Pop()! Again, the work is done by Pop().
Decrease-Key(e, v) does its work by itself: it cuts e to the root list and starts a cutting cascade to put its children into the root list as well (which can cut their childlists too). So Decrease-Key puts a whole lot of elements into the root list. Can you guess which function has to fix that?
TL;DR: Pop() is the work horse of the Fibonacci Heap. All other operations are done efficiently because they create work for the Pop() operation. Pop() gathers the work and performs it in one go (which can take up to O(n)). This is actually really efficient because the "grouped" work can be done faster than each operation separately.
So yes, it is natural that Pop() takes up the majority of your time!
The Fibonacci heap's pop() has an amortized runtime of O(log n) and a worst case of O(n). If your heap is large, it could easily be consuming the majority of the CPU time in your algorithm, especially since most of the other operations you're likely using have O(1) runtimes (insert, top, etc.).
One thing I'd recommend is to try callgrind with your preferred optimization level (such as -O3) together with debug info (-g), because templatized data structures/containers such as fibonacci_heap rely heavily on inlined functions. It could be that most of the CPU cycles you're measuring don't even exist in your optimized executable.
I have a (large) set of integers S, and I want to run the following pseudocode:
set result = {};
while (S isn't empty)
{
    int i = S.getArbitraryElement();
    result.insert(i);
    set T = elementsToDelete(i);
    S = S \ T; // set difference
}
The function elementsToDelete is efficient (sublinear in the initial size of S) and the size of T is small (assume it's constant). T may contain integers no longer in S.
Is there a way of implementing the above that is faster than O(|S|^2)? I suspect I should be able to get O(|S| k), where k is the time complexity of elementsToDelete. I can of course implement the above in a straightforward way using std::set_difference but my understanding is that set_difference is O(|S|).
Using a std::set<int> S, you can do:
for (auto k : elementsToDelete(i)) {
    S.erase(k);
}
Of course the lookup for erase is O(log(S.size())), not the O(1) you're asking for. That can be achieved with std::unordered_set, assuming not too many collisions (which is a big assumption in general but very often true in particular).
Despite the name, the std::set_difference algorithm doesn't have much to do with std::set. It works on anything you can iterate in order. Anyway it's not for in-place modification of a container. Since T.size() is small in this case, you really don't want to create a new container each time you remove a batch of elements. In another example where the result set is small enough, it would be more efficient than repeated erase.
The set_difference in the C++ library has time complexity O(|S|), hence it is not good for your purposes, so I advise you to use S.erase() to delete set elements from S in O(log N) each (std::set is implemented as a balanced BST). Your overall time complexity then reduces to O(N log N).
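Putting it together, a sketch of the whole loop from the question using std::unordered_set might look like this (elementsToDelete() is the question's own function, assumed here to return some iterable container of ints):

#include <unordered_set>

// Sketch only: S as an unordered_set so each erase averages O(1),
// giving roughly O(|S| k) overall for small batches of size k.
std::unordered_set<int> process(std::unordered_set<int> S) {
    std::unordered_set<int> result;
    while (!S.empty()) {
        int i = *S.begin();                  // an arbitrary element
        result.insert(i);
        for (int t : elementsToDelete(i)) {  // erase the batch in place,
            S.erase(t);                      // no temporary set_difference copy
        }
        // As in the pseudocode, termination relies on elementsToDelete(i)
        // always removing i (directly or indirectly).
    }
    return result;
}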
I have a list of about a hundred unique strings in C++, and I need to check if a value exists in this list, preferably lightning fast.
I am currently using a hash_set with std::strings (since I could not get it to work with const char*), like so:
stdext::hash_set<const std::string> _items;
_items.insert("LONG_NAME_A_WITH_SOMETHING");
_items.insert("LONG_NAME_A_WITH_SOMETHING_ELSE");
_items.insert("SHORTER_NAME");
_items.insert("SHORTER_NAME_SPECIAL");
stdext::hash_set<const std::string>::const_iterator it = _items.find("SHORTER_NAME");
if (it != _items.end()) {
    std::cout << "item exists" << std::endl;
}
Does anybody else have a good idea for a faster search method without building a complete hashtable myself?
The list is a fixed list of strings which will not change. It contains a list of names of elements which are affected by a certain bug and should be repaired on-the-fly when opened with a newer version.
I've built hashtables before using Aho-Corasick but I'm not really willing to add too much complexity.
I was amazed by the number of answers. I tested a few methods for their performance and ended up using a combination of kirkus's and Rob K.'s answers. I had tried a binary search before, but I guess I had a small bug implementing it (how hard can it be...).
The results were shocking... I thought I had a fast implementation using a hash_set... well, turns out I did not. Here are some statistics (and the eventual code):
Random lookup of 5 existing keys and 1 non-existent key, 50,000 times:
My original algorithm took on average 18.62 seconds.
A linear search took on average 2.49 seconds.
A binary search took on average 0.92 seconds.
A search using a perfect hashtable generated by gperf took on average 0.51 seconds.
Here's the code I use now:
bool searchWithBinaryLookup(const std::string& strKey) {
    static const char arrItems[][MAX_ITEM_LEN] = { /* list of items, sorted */ };

    /* Binary lookup */
    int low, mid, high;
    low = 0;
    high = NUM_ITEMS;
    while (low < high) {
        mid = (low + high) / 2;
        if (arrItems[mid] > strKey) {
            high = mid;
        }
        else if (arrItems[mid] < strKey) {
            low = mid + 1;
        }
        else {
            return true;
        }
    }
    return false;
}
NOTE: This is Microsoft VC++ so I'm not using the std::hash_set from SGI.
I did some tests this morning using gperf as VardhanDotNet suggested and this is quite a bit faster indeed.
If your list of strings are fixed at compile time, use gperf
http://www.gnu.org/software/gperf/
QUOTE:
gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
The output of gperf is not governed by the GPL or LGPL, AFAIK.
You could try a PATRICIA Trie if none of the standard containers meet your needs.
Worst-case lookup is bounded by the length of the string you're looking up. Also, strings share common prefixes, so it is really easy on memory. So if you have lots of relatively short strings this could be beneficial.
Check it out here.
Note: PATRICIA = Practical Algorithm to Retrieve Information Coded in Alphanumeric
If it's a fixed list, sort the list and do a binary search? I can't imagine, with only a hundred or so strings on a modern CPU, you're really going to see any appreciable difference between algorithms, unless your application is doing nothing but searching said list 100% of the time.
What's wrong with std::vector? Load it, sort(v.begin(), v.end()) once and then use lower_bound() to see if the string is in the vector. lower_bound is guaranteed to be O(log2 N) on a sorted random access iterator. I can't understand the need for a hash if the values are fixed. A vector takes less room in memory than a hash and makes fewer allocations.
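A sketch of what that could look like, reusing the item names from the question (illustrative only):

#include <algorithm>
#include <string>
#include <vector>

// Sort once at startup, then every lookup is a binary search.
std::vector<std::string> loadItems() {
    std::vector<std::string> v = { "LONG_NAME_A_WITH_SOMETHING",
                                   "LONG_NAME_A_WITH_SOMETHING_ELSE",
                                   "SHORTER_NAME",
                                   "SHORTER_NAME_SPECIAL" };
    std::sort(v.begin(), v.end());
    return v;
}

bool itemExists(const std::vector<std::string> &items, const std::string &key) {
    auto it = std::lower_bound(items.begin(), items.end(), key);
    return it != items.end() && *it == key;
}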
I doubt you'd come up with a better hashtable; if the list varies from time to time you've probably got the best way.
The fastest way would be to construct a finite state machine to scan the input. I'm not sure what the best modern tools are (it's been over ten years since I did anything like this in practice), but Lex/Flex was the standard Unix tool for constructing them.
A FSM has a table of states, and a list of accepting states. It starts in the beginning state, and does a character-by-character scan of the input. Each state has an entry for each possible input character. The entry could either be to go into another state, or to abort because the string isn't in the list. If the FSM gets to the end of the input string without aborting, it checks the final state it's in, which is either an accepting state (in which case you've matched the string) or it isn't (in which case you haven't).
Any book on compilers should have more detail, or you can doubtless find more information on the web.
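If you don't want to bring in Lex/Flex, a hand-rolled trie captures the same state-machine idea: each node is a state, each character a transition, and a flag marks accepting states. This is only a sketch, not a generated table-driven matcher:

#include <memory>
#include <string>

struct TrieNode {
    std::unique_ptr<TrieNode> next[256];   // one transition per possible input byte
    bool accepting = false;                // true if a stored string ends here
};

void insert(TrieNode &root, const std::string &s) {
    TrieNode *node = &root;
    for (unsigned char c : s) {
        if (!node->next[c]) node->next[c] = std::make_unique<TrieNode>();
        node = node->next[c].get();
    }
    node->accepting = true;
}

bool contains(const TrieNode &root, const std::string &s) {
    const TrieNode *node = &root;
    for (unsigned char c : s) {
        node = node->next[c].get();
        if (!node) return false;           // no transition: the string isn't in the set
    }
    return node->accepting;                // accept only if we stopped in a final state
}

Lookup cost is bounded by the length of the string being checked, independently of how many strings are stored.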
If the set of strings to check against numbers in the hundreds, as you say, and this check happens while doing I/O (loading a file, which I assume comes from a disk, commonly), then I'd say: profile what you've got before looking for more exotic/complex solutions.
Of course, it could be that your "documents" contain hundreds of millions of these strings, in which case I guess it really starts to take time... Without more detail, it's hard to say for sure.
What I'm saying boils down to "consider the use-case and typical scenarios, before (over)optimizing", which I guess is just a specialization of that old thing about roots of evil ... :)
100 unique strings? If this isn't called frequently, and the list doesn't change dynamically, I'd probably use a straightforward const char array with a linear search. Unless you search it a lot, something that small just isn't worth the extra code. Something like this:
const char _items[][MAX_ITEM_LEN] = { ... };   // sorted ascending

int i = 0;
for (; i < NUM_ITEMS && strcmp( a, _items[i] ) > 0; ++i );
bool found = i < NUM_ITEMS && strcmp( a, _items[i] ) == 0;
For a list that small, I think your implementation and maintenance costs with anything more complex would probably outweigh the run time costs, and you're not really going to get cheaper space costs than this. To gain a little more speed, you could do a hash table of first char -> list index to set the initial value of i;
For a list this small, you probably won't get much faster.
You're using binary search, which is O(log(n)). You should look at interpolation search, which is not as good in the worst case, but its average case is better: O(log(log(n))).
I don't know which kind of hashing function MS uses for strings, but maybe you could come up with something simpler (= faster) that works in your special case. The container should allow you to use a custom hashing class.
If it's an implementation issue of the container, you can also try whether Boost's std::tr1::unordered_set gives better results.
A hash table is a good solution, and by using a pre-existing implementation you are likely to get good performance. An alternative, though, is what I believe is called "indexing".
Keep some pointers around to convenient locations. E.g. if it's using letters for the sorting, keep a pointer to everything starting aa, ab, ac... ba, bb, bc... This is a few hundred pointers, but it means that you can skip to a part of the list which is quite near to the result before continuing. E.g. if an entry is "afunctionname", you can binary search between the pointers for af and ag, which is much faster than searching the whole lot... If you have a million records in total, you will likely only have to binary search a list of a few thousand.
I re-invented this particular wheel, but there may be plenty of implementations out there already, which will save you the headache of implementing it and are likely faster than any code I could paste in here. :)
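For illustration, a rough sketch of that indexing idea over a sorted std::vector of strings (the layout and names here are made up, and it assumes plain ASCII keys):

#include <algorithm>
#include <string>
#include <vector>

// bucketBegin[c] .. bucketBegin[c+1] is the range of items whose first character is c,
// so each lookup only binary-searches a small slice of the full sorted list.
struct IndexedList {
    std::vector<std::string> items;     // kept sorted
    std::size_t bucketBegin[257] = {};

    explicit IndexedList(std::vector<std::string> v) : items(std::move(v)) {
        std::sort(items.begin(), items.end());
        std::size_t i = 0;
        for (int c = 0; c < 256; ++c) {
            bucketBegin[c] = i;
            while (i < items.size() && (unsigned char)items[i][0] == c) ++i;
        }
        bucketBegin[256] = items.size();
    }

    bool contains(const std::string &key) const {
        if (key.empty()) return false;
        unsigned char c = key[0];
        return std::binary_search(items.begin() + bucketBegin[c],
                                  items.begin() + bucketBegin[c + 1], key);
    }
};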