I'm trying to make my program run more efficiently, and I believe fixing this linear search would do a great deal of help in terms of speed, but am curious as to how I'd go about changing this to something like binary search, as I believe the list isn't necessarily ordered. Is there some way of ordering the list based on it's first argument key?
What I'm working with currently:
int* key_sequences::data(int key){
for(it=myList.begin(); it!=myList.end(); ++it){
if(it->first==key){
return &(it->second[0]);
}
}
return nullptr;
};
Searching in a simply linked list is O(n), where n is the size of the list.
However, in some implementations, people use three list pointers, one at the start, one at the middle and one at the end, so that when searching comes up - they can speedup the process, so you can search online for this.
BUT, if I were you and was so interesting in speeding up the search, I would try another data structure (hash? like std::unordered_map) or sort the list.
Related
I want to find string has a specific short string array or linked list. I make a small program that search conference or workshop like http://dblp.uni-trier.de/ using c++. What I wonder is how to fast search string in an array or linked list. When use string.find() function, I think this function's performance have O(n) time complexity if array's length is n. Can I improve performance lower than O(n)?? Help me, please
For an array, unless it is sorted, the best you can do is O(n) average/worst case because you have to look linearly until you find the desired string. If it is sorted (which would take O(nlog(n)) to do the sorting), you can make it O(log(n)) searching using a binary search. For linked lists, the best you can do, regardless of sorted-ness, is O(n).
If you really want to complicate your code, at each insert to list store pointer to node in some balanced tree.Where nodes will be inserted based on string from that node comparisions. Then you can get string in O(logn) time.
If you want to get fast retrievals use hash map it will give you O(1) Time.
I have simple method in C++ which searchs for string in linked list. That works well but I need to make it faster. Is it possible? Maybe I need to insert items into list in alphabetical order? But I dont think it could help in serching list anymore. In list there is about 300 000 items (words).
int GetItemPosition(const char* stringToFind)
{
int i = 0;
MyList* Tmp = FistListItem;
while (Tmp){
if (!strcmp(Tmp->Value, stringToFind))
{
return i;
}
Tmp = Tmp->NextItem;
i++;
}
return -1;
}
Method returns the position number if item found, otherwise returns -1.
Any sugesstion will be helpfull.
Thanks for answers, I can change structure. I have only one constraint. Code must implement the following interface:
int Count(void);
int AddItem(const char* StringValue, int WordOccurrence);
int GetItemPosition(const char* StringValue);
char* GetString(int Index);
int GetOccurrenceNum(int Index);
void SetInteger(int Index, int WordOccurrence);
So which structure will be the in your opinion the most suitable?
Searching a linked list is linear so you need to iterate from beginning one by one so it is O(n). Linked lists are not the best if you will use it for searching, you can utilize more suitable data structures such as binary trees.
Ordering elements does not help much because still you need to iterate each element anyway.
Wikipedia article says:
In an unordered list, one simple heuristic for decreasing average search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more
efficient external data structure. For example, one can build a
red-black tree or hash table whose elements are references to the
linked list nodes. Multiple such indexes can be built on a single
list. The disadvantage is that these indexes may need to be updated
each time a node is added or removed (or at least, before that index
is used again).
So in the first case you can slightly improve (by statistical assumptions) your search performance by moving items found previously closer to the beginning of the list. This assumes that previously found elements will be searched more frequently.
Second method requires to use other data structures.
If using linked lists is not a hard requirement, consider using hash tables, sorted arrays (random access) or balanced trees.
Consider using array or std::vector as a storage instead of linked list, and use binary search to find particular string, or even better, std::set, if you don't need a numerical index. If for some reasons it is not possible to use other containers, there is not much possible to do - you may want to speed up the process of comparison by storing hash of the string along with it in node.
I suggest hashing.
Since you've already got a linked list of your own), you can try chaining with linked lists for collision resolution.
Rather than using a linear linked list, you may want to use a binary search tree, or a red/black tree. These trees are designed on minimizing the traversals to find an item.
You could also store "short cut links". For example, if the list is of strings, you could have an array of links of where to start searching based on the first letter.
For example, shortcut['B'] would return a pointer to the first link to start searching for strings starting with 'B'.
The answer is no, you cannot improve the search without changing your data-structure.
As it stands, sorting the list will not give you a faster search for any random item.
It will only allow you to quickly decide if the given item is in the list by testing against the first item (which will be either the smallest or the largest entry) and this improvement is not likely to make a big difference.
So can you please edit your question and explain to us your constraints?
Can you use a completely different data structure, like an array or a tree? (as others have suggested)
If not, can you modify the way your linked list is linked?
If not, we will be unlikely to help you...
The best option is to use faster data structure for storing strings:
std::map - red-black tree behind the scenes. Has O(logn) for search/insert/delete operations. Suitable if you want to store additional values with strings (for example - positions).
std::set - basically the same tree but without values. Best for case when you need only contains operation.
std::unordered_map - hash table. O(1) access.
std::unordered_set - hash set. Also O(1) access.
Note. But in all of these cases there is a catch. Complexity is calculated only based on n (count of strings). In reality string comparison is not free. So, O(1) becomes O(m), O(logn) becomes O(mlogn) (where m is maximal length of string). This does not matter in case of relatively short strings. But if this is not true consider using Trie. In practice trie can be even faster than hash table - each character of query string is accessed only once instead of multiple times. For hash table/set it's at least once for hash calculation and at least once for actual string comparison (depending on collision resolution strategy - not sure how it is implemented in C++).
I'm developing a project and I need to do a lot of comparisons between objects and insertions in lists.
Basically I have a object of type Board and I do the following:
if(!(seenStates.contains(children[i])))
{
statesToExpand.addToListOrderly(children[i]);
seenStates.insertHead(children[i]);
}
where statesToExpand and seenStates are two lists that I defined this way:
typedef struct t_Node
{
Board *board;
int distanceToGoal;
t_Node *next;
} m_Node;
typedef m_Node* m_List;
class ListOfStates {
...
Everything works fine but I did some profiling and discovered that almost 99% of the time is spent in operating on these lists, since I have to expand, compare, insert, etc. almost 20000 states.
My question is: is there a more efficient data structure that I could use in order to reduce the execution time of that portion of code?
Update
So I tried using std::vector and it is a bit worse (15 seconds instead of 13 with my old list). Probably I'm doing something wrong... With some more profiling I discovered that approximately 13.5 seconds are spent searching for an element in a vector. This is the code I am using:
bool Game::vectorContains(Board &b)
{
clock_t stop;
clock_t start = clock();
if(seenStates.size() == 0)
{
stop = clock();
clock_counter += (stop-start);
return false;
}
for(vector<m__Node>::iterator it = seenStates.begin(); it != seenStates.end(); it++)
{
if( /* condition */ )
{
stop = clock();
clock_counter += (stop - start);
return true;
}
}
stop = clock();
clock_counter += (stop - start);
return false;
}
Can I do something better here or should I move on to another data structure (maybe an unordered_set as suggested below)?
One more update
I tried the exact same code in release mode and the whole algorithm executes in just 1.2 seconds.
I didn't know there could be such a big difference between Debug and Release. I know that Release does some optimization but this is some difference!
This part:
if(!(seenStates.contains(children[i])))
for a linked list is going to be very slow. While the algorithmic time is O(n), same as it would be for a std::vector<Node>, the memory that you're walking over is going to be all over the place... so you're going to incur lots of cache misses as your container gets larger. After a while, your time is just going to be dominated by those cache misses. So std::vector will likely perform much better.
That said, if you're doing a lot of find()-type operations, you should consider using a container that is setup to do find very quickly... maybe a std::unordered_set?
Using a list ends up with O(n) time to search for elements. You could consider data-structures with more effiecient lookßup, e.g. std::map, std::unordered_map, a sorted vector, other tree-structures. There many data-structures. Which one is best depends on your algorithm design.
Indeed you don't want to use a linked list in your case. Looking for a specific value (ie contains()) is very slow in a linked list, O(n).
Thus using an array list (for example std::vector) or a binary search tree would be smarter, complexity of contains() would become on average O(log n).
However if you are worried about expanding your array list very often, you might make it take a lot of space when you create it (for example 20 000 elements).
Don't forget to consider using two different data structures for your two lists.
If I understand it correctly, your data structure resembles a singly linked list. So, instead of usong your own implementation, you can try to work with a
std::slist<Board*>
or probably better with a
std::slist<std::unique_ptr<Board> >
If you further also need the reference to the previous element, then use a standard std::list. Both will give you constant insertion, but only linear lookup (at least if you don't know where to search).
Alternatively, you can consider using a std::map<std::unique_ptr<Board> > which will give you logarithmic insertion and lookup, but without further effort you lose the information on the successor.
EDIT: std::vector seems no good choise for your kind of requirements. As far as I understood, you need fast search and fast insertion. Both are O(n) for a vector. Use a std::map instead, where both are O(log n). [But note that using the latter doesn't mean you will directly get faster execution times, as that depends on the number of elements]
For certain reasons, I have a linked list of objects, with the Object containing a string.
I might be required to search for a particular string, and in doing so, retrieve the object, based on that string.
The starting header of the list is the only input I have for the list.
Though the number of objects I have, is capped at 3000, and that's not so much, I still wondered if there was an efficient way to do this, instead of searching the objects one by one for a matching string.
The Objects in the list are not sorted in any way, and I cannot expect them to be sorted, and the entry point to the linked list is the only input I have.
So, could anyone tell me if there's an efficient way ( search algorithm perhaps ) to achieve this?
Separately for this kind of search, if required, what would be the sort of data structure recommended, assuming that this search be the most data intensive function of the object?
Thanks..
Use a std::map<std::string, YourObjectType>. You can still iterate all objects. But they are sorted by the string now.
If you might have multiple objects with the same string, use a multimap instead.
If you can't switch to any different structure/container, then there is no way to do this better than linear to size of list.
Having 3000 you would like to use a unordered map instead of a linked list, which will give you average O(1) lookup, insertion, and deletion time.
I'm trying to work out the best method to search a vector of type "Tracklet" (a class I have built myself) to find the first and last occurrence of a given value for one of its variables. For example, I have the following classes (simplified for this example):
class Tracklet {
TimePoint *start;
TimePoint *end;
int angle;
public:
Tracklet(CvPoint*, CvPoint*, int, int);
}
class TimePoint {
int x, y, t;
public:
TimePoint(int, int, int);
TimePoint(CvPoint*, int);
// Relevant getters and setters exist here
};
I have a vector "vector<Tracklet> tracklets" and I need to search for any tracklets with a given value of "t" for the end timepoint. The vector is ordered in terms of end time (i.e. tracklet.end->t).
I'm happy to code up a search algorithm, but am unsure of which route to take with it. I'm not sure binary search would be suitable, as I seem to remember it won't necessarily find the first. I was thinking of a method where I use binary search to find an index of an element with the correct time, then iterate back to find the first and forward to find the last. I'm sure there's a better way than that, since it wastes binary searches O(log n) by iterating.
Hopefully that makes sense: I struggled to explain it a bit!
Cheers!
If the vector is sorted and contains the value, std::lower_bound will give you an iterator to the first element with a given value and std::upper_bound will give you an iterator to one element past the last one containing the value. Compare the value with the returned element to see if it existed in the vector. Both these functions use binary search, so time is O(logN).
To compare on tracklet.end->t, use:
bool compareTracklets(const Tracklet &tr1, const Tracklet &tr2) {
return (tr1.end->t < tr2.end->t);
}
and pass compareTracklets as the fourth argument to lower_bound or upper_bound
I'd just use find and find_end, and then do something more complicated only if testing showed it to be too slow.
If you're really concerned about lookup performance, you might consider a different data structure, like a map with timestamp as the key and a vector or list of elements as the value.
A binary search seems like your best option here, as long as your vector remains sorted. It's essentially identical, performance-wise, to performing a lookup in a binary tree-structure.
dirkgently referred to a sweet optimization comparative. But I would in fact not use a std::vector for this.
Usually, when deciding to use a STL container, I don't really consider the performance aspect, but I do consider its interface regarding the type of operation I wish to use.
std::set<T>::find
std::set<T>::lower_bound
std::set<T>::upper_bound
std::set<T>::equal_range
Really, if you want an ordered sequence, outside of a key/value setup, std::set is just easier to use than any other.
You don't have to worry about inserting at a 'bad' position
You don't have problems of iterators invalidation when adding / removing an element
You have built-in methods for searching
Of course, you also want your Comparison Predicate to really shine (hopes the compiler inlines the operator() implementation), in every case.
But really, if you are not convinced, try a build with a std::vector and manual insertion / searching (using the <algorithm> header) and try another build using std::set.
Compare the size of the implementations (number of lines of code), compare the number of bugs, compare the speed, and then decide.
Most often, the 'optimization' you aim for is actually a pessimization, and in those rares times it's not, it's just so complicated that it's not worth it.
Optimization:
Don't
Expert only: Don't, we mean it
The vector is ordered in terms of time
The start time or the end time?
What is wrong with a naive O(n) search? Remember you are only searching and not sorting. You could use a sorted container as well (if that doesn't go against the basic design).