How to improve linked list searching. C++ - c++

I have simple method in C++ which searchs for string in linked list. That works well but I need to make it faster. Is it possible? Maybe I need to insert items into list in alphabetical order? But I dont think it could help in serching list anymore. In list there is about 300 000 items (words).
int GetItemPosition(const char* stringToFind)
{
int i = 0;
MyList* Tmp = FistListItem;
while (Tmp){
if (!strcmp(Tmp->Value, stringToFind))
{
return i;
}
Tmp = Tmp->NextItem;
i++;
}
return -1;
}
Method returns the position number if item found, otherwise returns -1.
Any sugesstion will be helpfull.
Thanks for answers, I can change structure. I have only one constraint. Code must implement the following interface:
int Count(void);
int AddItem(const char* StringValue, int WordOccurrence);
int GetItemPosition(const char* StringValue);
char* GetString(int Index);
int GetOccurrenceNum(int Index);
void SetInteger(int Index, int WordOccurrence);
So which structure will be the in your opinion the most suitable?

Searching a linked list is linear so you need to iterate from beginning one by one so it is O(n). Linked lists are not the best if you will use it for searching, you can utilize more suitable data structures such as binary trees.
Ordering elements does not help much because still you need to iterate each element anyway.
Wikipedia article says:
In an unordered list, one simple heuristic for decreasing average search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more
efficient external data structure. For example, one can build a
red-black tree or hash table whose elements are references to the
linked list nodes. Multiple such indexes can be built on a single
list. The disadvantage is that these indexes may need to be updated
each time a node is added or removed (or at least, before that index
is used again).
So in the first case you can slightly improve (by statistical assumptions) your search performance by moving items found previously closer to the beginning of the list. This assumes that previously found elements will be searched more frequently.
Second method requires to use other data structures.
If using linked lists is not a hard requirement, consider using hash tables, sorted arrays (random access) or balanced trees.

Consider using array or std::vector as a storage instead of linked list, and use binary search to find particular string, or even better, std::set, if you don't need a numerical index. If for some reasons it is not possible to use other containers, there is not much possible to do - you may want to speed up the process of comparison by storing hash of the string along with it in node.

I suggest hashing.
Since you've already got a linked list of your own), you can try chaining with linked lists for collision resolution.

Rather than using a linear linked list, you may want to use a binary search tree, or a red/black tree. These trees are designed on minimizing the traversals to find an item.
You could also store "short cut links". For example, if the list is of strings, you could have an array of links of where to start searching based on the first letter.
For example, shortcut['B'] would return a pointer to the first link to start searching for strings starting with 'B'.

The answer is no, you cannot improve the search without changing your data-structure.
As it stands, sorting the list will not give you a faster search for any random item.
It will only allow you to quickly decide if the given item is in the list by testing against the first item (which will be either the smallest or the largest entry) and this improvement is not likely to make a big difference.
So can you please edit your question and explain to us your constraints?
Can you use a completely different data structure, like an array or a tree? (as others have suggested)
If not, can you modify the way your linked list is linked?
If not, we will be unlikely to help you...

The best option is to use faster data structure for storing strings:
std::map - red-black tree behind the scenes. Has O(logn) for search/insert/delete operations. Suitable if you want to store additional values with strings (for example - positions).
std::set - basically the same tree but without values. Best for case when you need only contains operation.
std::unordered_map - hash table. O(1) access.
std::unordered_set - hash set. Also O(1) access.
Note. But in all of these cases there is a catch. Complexity is calculated only based on n (count of strings). In reality string comparison is not free. So, O(1) becomes O(m), O(logn) becomes O(mlogn) (where m is maximal length of string). This does not matter in case of relatively short strings. But if this is not true consider using Trie. In practice trie can be even faster than hash table - each character of query string is accessed only once instead of multiple times. For hash table/set it's at least once for hash calculation and at least once for actual string comparison (depending on collision resolution strategy - not sure how it is implemented in C++).

Related

How to find specific string in array or linkedlist

I want to find string has a specific short string array or linked list. I make a small program that search conference or workshop like http://dblp.uni-trier.de/ using c++. What I wonder is how to fast search string in an array or linked list. When use string.find() function, I think this function's performance have O(n) time complexity if array's length is n. Can I improve performance lower than O(n)?? Help me, please
For an array, unless it is sorted, the best you can do is O(n) average/worst case because you have to look linearly until you find the desired string. If it is sorted (which would take O(nlog(n)) to do the sorting), you can make it O(log(n)) searching using a binary search. For linked lists, the best you can do, regardless of sorted-ness, is O(n).
If you really want to complicate your code, at each insert to list store pointer to node in some balanced tree.Where nodes will be inserted based on string from that node comparisions. Then you can get string in O(logn) time.
If you want to get fast retrievals use hash map it will give you O(1) Time.

Insertion into a skip list

A skip list is a data structure in which the elements are stored in sorted order and each node of the list may contain more than 1 pointer, and is used to reduce the time required for a search operation from O(n) in a singly linked list to O(lg n) for the average case. It looks like this:
Reference: "Skip list" by Wojciech Muła - Own work. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Skip_list.svg#mediaviewer/File:Skip_list.svg
It can be seen as an analogy to a ruler:
In a skip list, searching an element and deleting one is fine, but when it comes to insertion, it becomes difficult, because according to Data Structures and Algorithms in C++: 2nd edition by Adam Drozdek:
To insert a new element, all nodes following the node just inserted have to be restructured; the number of pointers and the value of pointers have to be changed.
I can construe from this that although choosing a node with random number of pointers based on the likelihood of nodes to insert a new element, doesn't create a perfect skip list, it gets really close to it when large number of elements (~9 million for example) are considered.
My question is: Why can't we insert the new element in a new node, determine its number of pointers based on the previous node, attach it to the end of the list, and then use efficient sorting algorithms to sort just the data present in the nodes, thereby maintaining the perfect structure of the skip list and also achieving the O(lg n) insert complexity?
Edit: I haven't tried any code yet, I'm just presenting a view. Simply because implementing a skip list is somewhat difficult. Sorry.
There is no need to modify any following nodes when you insert a node. See the original paper, Skip Lists: A Probabilistic Alternative to Balanced Trees, for details.
I've implemented a skip list from that reference, and I can assure you that my insertion and deletion routines do not modify any nodes forward of the insertion point.
I haven't read the book you're referring to, but out of context the passage you highlight is just flat wrong.
You have a problem on this point and then use efficient sorting algorithms to sort just the data present in the nodes. Sorting the data will have complexity O(n*lg(n)) and thus it will increase the complexity of insertion. In theory you can choose "perfect" number of links for each node being inserted, but even if you do that, when you perform remove operations, the perfectness will be "broken". Using the randomized approach is close enough to perfect structure to perform well.
You need to have function / method that search for location.
It need to do following:
if you insert unique keys, it need to locate the node. then you keep everything, just change the data (baggage). e.g. node->data = data.
if you allow duplicates, or if key is not found, then this function / method need to give you previous node on each height (lane). Then you determine height of new node and insert it after the found nodes.
Here is my C realisation:
https://github.com/nmmmnu/HM2/blob/master/hm_skiplist.c
You need to check following function:
static const hm_skiplist_node_t *_hm_skiplist_locate(const hm_skiplist_t *l, const char *key, int complete_evaluation);
it stores the position inside hm_skiplist_t struct.
complete_evaluation is used to save time in case you need the data and you are not intend to insert / delete.

Efficiently searching a Linked List of Objects for a String

For certain reasons, I have a linked list of objects, with the Object containing a string.
I might be required to search for a particular string, and in doing so, retrieve the object, based on that string.
The starting header of the list is the only input I have for the list.
Though the number of objects I have, is capped at 3000, and that's not so much, I still wondered if there was an efficient way to do this, instead of searching the objects one by one for a matching string.
The Objects in the list are not sorted in any way, and I cannot expect them to be sorted, and the entry point to the linked list is the only input I have.
So, could anyone tell me if there's an efficient way ( search algorithm perhaps ) to achieve this?
Separately for this kind of search, if required, what would be the sort of data structure recommended, assuming that this search be the most data intensive function of the object?
Thanks..
Use a std::map<std::string, YourObjectType>. You can still iterate all objects. But they are sorted by the string now.
If you might have multiple objects with the same string, use a multimap instead.
If you can't switch to any different structure/container, then there is no way to do this better than linear to size of list.
Having 3000 you would like to use a unordered map instead of a linked list, which will give you average O(1) lookup, insertion, and deletion time.

Fast, sorted data structure with random access?

I want to add several objects in a structure which allows this:
Insertion of objects, immediately ordering the entire structure on add so I have a descending ordering of an int;
Being able to change the int by which the objects are ordered (I mean: say that object number 2, now has a int of 5, so it reorders the structure);
Fast structure, because it will be completely iterated 60 times a second;
Being able to directly access the objects by position;
Only needs to be iterated from top to bottom: higher INT to lower INT
No deletion required, but could become useful later on.
Some indications on how to use the structure would be great, since I don't know much about the C++ standard libraries.
All of the operations that you've listed (except for lookup by index) can be supported by a standard binary search tree, keyed by integer values. This gives you the ability to iterate over the elements in sorted order and to keep the objects sorted during any insertion. As #njr mentioned, you can also update priorities by removing objects from the binary search tree, changing their priority, then reinserting them into the binary search tree.
To support random access by index, you should consider looking into order statistic trees, a variant on binary search trees that in addition to all other operations supports very fast (O(log n)) lookup of an element by its index. That is, you could very efficiently query for the 15th element in the sorted sequence, or the 17th, etc. Order statistic trees aren't part of the C++ standard libraries, but this older question contains answers that can link you to an implementation.
Use a set or a map
For requirement 1 - provide a custom sorting function
For 2 - remove the item and add it again (or provide a wrapper that does that)
3 doesn't make sense (How big is the list, how fast is the processor/ram)
For 4 - Are you sure you need that? It seems to be kind of weird to try to access it by position when the position can change suddenly (some item was added or removed)
5 - same as 1

How to implement an associative array/map/hash table data structure (in general and in C++)

Well I'm making a small phone book application and I've decided that using maps would be the best data structure to use but I don't know where to start. (Gotta implement the data structure from scratch - school work)
Tries are quite efficient for implementing maps where the keys are short strings. The wikipedia article explains it pretty well.
To deal with duplicates, just make each node of the tree store a linked list of duplicate matches
Here's a basic structure for a trie
struct Trie {
struct Trie* letter;
struct List *matches;
};
malloc(26*sizeof(struct Trie)) for letter and you have an array. if you want to support punctuations, add them at the end of the letter array.
matches can be a linked list of matches, implemented however you like, I won't define struct List for you.
Simplest solution: use a vector which contains your address entries and loop over the vector to search.
A map is usually implemented either as a binary tree (look for red/black trees for balancing) or as a hash map. Both of them are not trivial: Trees have some overhead for organisation, memory management and balancing, hash maps need good hash functions, which are also not trivial. But both structures are fun and you'll get a lot of insight understanding by implementing one of them (or better, both :-)).
Also consider to keep the data in the vector list and let the map contain indices to the vector (or pointers to the entries): then you can easily have multiple indices, say one for the name and one for the phone number, so you can look up entries by both.
That said I just want to strongly recommend using the data structures provided by the standard library for real-world-tasks :-)
A simple approach to get you started would be to create a map class that uses two vectors - one for the key and one for the value. To add an item, you insert a key in one and a value in another. To find a value, you just loop over all the keys. Once you have this working, you can think about using a more complex data structure.