How to find a specific string in an array or linked list - C++

I want to find a specific short string in an array or linked list of strings. I am writing a small C++ program that searches for conferences or workshops, like http://dblp.uni-trier.de/. What I wonder is how to search quickly for a string in an array or linked list. If I compare against every element (for example with string.find()), I think the search has O(n) time complexity when the array's length is n. Can I get better than O(n)? Help me, please.

For an array, unless it is sorted, the best you can do is O(n) average/worst case because you have to look linearly until you find the desired string. If it is sorted (which would take O(nlog(n)) to do the sorting), you can make it O(log(n)) searching using a binary search. For linked lists, the best you can do, regardless of sorted-ness, is O(n).
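The sort-once, binary-search-afterwards approach described above could look roughly like this. It is a minimal sketch: the function names `make_sorted` and `contains_sorted` are made up for illustration, and it assumes the data lives in a `std::vector<std::string>`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts once (O(n log n)); afterwards every lookup needs O(log n) comparisons.
std::vector<std::string> make_sorted(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}

// Returns true if `key` is present in `sorted_names`.
// Precondition: `sorted_names` is sorted in ascending order.
bool contains_sorted(const std::vector<std::string>& sorted_names,
                     const std::string& key) {
    assert(std::is_sorted(sorted_names.begin(), sorted_names.end()));
    return std::binary_search(sorted_names.begin(), sorted_names.end(), key);
}
```

Note that this finds exact matches only. If new strings arrive often, re-sorting or inserting in order has its own cost, so this pays off mainly when lookups dominate.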

If you really want to complicate your code, on each insert into the list also store a pointer to the new node in some balanced tree, where nodes are ordered by comparing the strings they point to. Then you can find a string in O(log n) time.
If you want fast retrievals, use a hash map; it gives you O(1) lookups on average.
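One possible shape for the "list plus index" idea, using a hash map as the index instead of a balanced tree. This is only a sketch: `IndexedList` is a hypothetical name, and ignoring duplicates is an assumption made here to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

class IndexedList {
public:
    // Appends `s` to the list and records its node in the index.
    // Assumption for this sketch: duplicates are silently ignored.
    void push_back(const std::string& s) {
        if (index_.count(s) != 0) return;
        items_.push_back(s);
        index_[s] = std::prev(items_.end());
        assert(items_.size() == index_.size());  // list and index stay in sync
    }

    // Average O(1): returns a pointer to the stored string, or nullptr.
    const std::string* find(const std::string& s) const {
        auto it = index_.find(s);
        return it == index_.end() ? nullptr : &*it->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::list<std::string> items_;
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

Swapping `std::unordered_map` for `std::map` would give the balanced-tree variant with O(log n) lookups. If the list order does not matter at all, the index alone (a plain `std::unordered_set`) is enough.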

Related

c++ find vector of element present in the string

I have a vector of strings. It contains some names. I need to check whether a particular string contains any of them. E.g. the vector of strings contains "Name" and "Age", and the search string is "NameXYZ". So I have to check whether "NameXYZ" contains any of the vector elements. Since one of the vector elements is "Name", it should return true. Is there any way to achieve this without iterating?
The answer is NO.
It's impossible to search for something in a vector without iterating.
A vector is an unordered and unmapped container, so you need to iterate over it to find something.
Here is the link to the cppreference page:
std::vector - cppreference.com
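The iteration does not have to be a hand-written loop, though. A small sketch with `std::any_of` (the helper name `contains_any` is made up here):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if `text` contains at least one element of `needles` as a
// substring. An empty needle matches any text, because find("") returns 0.
bool contains_any(const std::string& text,
                  const std::vector<std::string>& needles) {
    return std::any_of(needles.begin(), needles.end(),
                       [&text](const std::string& needle) {
                           return text.find(needle) != std::string::npos;
                       });
}
```

This still visits the elements one by one, it just stops at the first match.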
The more complex answer is SOMETIMES.
What you are looking for is something like a hash set, or, for your situation, a hash map with Key=Name and Value=Age.
This works by defining a function that turns a string into a number, called a hash. When you want to test whether the string is in the list, you calculate its hash. You then get a list of potential candidates and iterate through those.
If you are lucky, every string has a unique hash and you need to test at most one string. You still have to look up the hash, but if you use a standard container, you can be certain that this is optimized. Unless you want to spend a long time making your own, it's a simple, easy win.
However, be aware of what the complexity really is. A hash-based container (std::unordered_set, std::unordered_map) gives O(1) lookups on average but O(N) in the worst case, when many strings collide. A tree-based container (std::set, std::map) gives O(log(N)).

What is the fastest searching algorithm for finding an IPv6:port combination in a given list of IPv6:ports in O(1) time complexity?

I want an efficient searching algorithm for finding an IP:PORT combination in a pre-stored list of IP:PORT entries held in a vector of strings.
Any help appreciated.
With a vector as the data structure, it is very hard to get O(1) complexity; a search is O(n) because you must read the list at least once. But if you do some preprocessing and put the entries in a map, you can get faster (almost O(1)) lookups later in your program.
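A minimal sketch of that preprocessing step, assuming the entries are stored as plain strings such as "[2001:db8::1]:443". The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Builds the lookup set once: O(n) on average for n endpoints.
std::unordered_set<std::string> build_endpoint_set(
        const std::vector<std::string>& endpoints) {
    return std::unordered_set<std::string>(endpoints.begin(), endpoints.end());
}

// Average O(1) membership test (plus the cost of hashing the key itself).
bool is_known_endpoint(const std::unordered_set<std::string>& known,
                       const std::string& endpoint) {
    return known.count(endpoint) != 0;
}
```

The comparison is purely textual. Two different spellings of the same IPv6 address (for example "::1" and "0:0:0:0:0:0:0:1") would not match unless you normalize them before inserting and before searching.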

Count of previously smaller elements encountered in an input stream of integers?

Given an input stream of numbers ranging from 1 to 10^5 (non-repeating) we need to be able to tell at each point how many numbers smaller than this have been previously encountered.
I tried to use the set in C++ to maintain the elements already encountered and then taking upper_bound on the set for the current number. But upper_bound gives me the iterator of the element and then again I have to iterate through the set or use std::distance which is again linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT: Found an older question about Fenwick trees that is helpful here. By the way, I have now solved this problem using segment trees, taking hints from @doynax's comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
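For reference, the Fenwick (binary indexed) tree idea from the linked question could be sketched like this. It is not the asker's segment tree solution; the names `Fenwick` and `count_smaller_before` are invented for this example, and it relies on the stated guarantee that values are non-repeating and lie in [1, max_value].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Fenwick {
public:
    explicit Fenwick(int max_value)
        : tree_(static_cast<std::size_t>(max_value) + 1, 0) {}

    // Marks `value` as seen. Values must lie in [1, max_value].
    void add(int value) {
        assert(value >= 1 && value < static_cast<int>(tree_.size()));
        for (int i = value; i < static_cast<int>(tree_.size()); i += i & -i)
            ++tree_[i];
    }

    // Number of seen values in [1, value]; O(log max_value).
    int prefix_count(int value) const {
        int total = 0;
        for (int i = value; i > 0; i -= i & -i) total += tree_[i];
        return total;
    }

private:
    std::vector<int> tree_;  // 1-based; tree_[0] is unused
};

// For each element of the stream, counts the earlier elements that are
// strictly smaller.
std::vector<int> count_smaller_before(const std::vector<int>& stream,
                                      int max_value) {
    Fenwick seen(max_value);
    std::vector<int> result;
    result.reserve(stream.size());
    for (int x : stream) {
        result.push_back(seen.prefix_count(x - 1));
        seen.add(x);
    }
    return result;
}
```

Each query and each update touches O(log max_value) cells, so a stream of n numbers costs O(n log max_value) overall.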
Regardless of the container you are using, it is a very good idea to keep the elements sorted, so that at any point you can get an element's index or iterator and know how many elements come before it.
You need to implement your own binary search tree. Each node should store two counters holding the number of nodes in its left and right subtrees.
Insertion into a binary tree takes O(log n), provided the tree stays balanced. During the insertion, the counters of all ancestors of the new element have to be incremented, which is also O(log n).
The number of elements smaller than the new element can be derived from the stored counters in O(log n).
So the total running time is O(n log n).
Keep your table sorted at each step and use binary search. When you search for the number that was just read from the input stream, binary search finds the position where it belongs, between the next smaller and the next greater number. That position is its index in the sorted table, and the index equals the count of numbers that are less than the current one. The search itself is O(log n), but inserting into the middle of an array shifts up to n elements, so this algorithm takes O(n^2) time in the worst case.
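A short sketch of this sorted-table approach, with `std::lower_bound` doing the binary search (the function name is made up for this example):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// For each element of the stream, counts the earlier elements that are
// strictly smaller, by keeping a sorted copy of everything seen so far.
std::vector<int> count_smaller_sorted_insert(const std::vector<int>& stream) {
    std::vector<int> sorted;
    std::vector<int> result;
    for (int x : stream) {
        // Binary search: first position whose value is not less than x.
        auto pos = std::lower_bound(sorted.begin(), sorted.end(), x);
        result.push_back(static_cast<int>(pos - sorted.begin()));
        sorted.insert(pos, x);  // O(n) shift, the bottleneck of this method
    }
    return result;
}
```

For inputs of around 10^5 numbers the shifting is often fast enough in practice because it is a contiguous memory move, but the worst case remains quadratic.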
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
You have to use the skip list approach, so you can use a binary search to insert each new element. (One cannot use binary search on a normal linked list.) If you're tracking the length with an accumulator, returning the number of larger elements would be as simple as length-index.
One more point in favor of this approach is that std::set::insert() is already O(log n) without a hint, so insertion is not where std::set loses time; counting the smaller elements is.

How to improve linked list searching. C++

I have a simple method in C++ which searches for a string in a linked list. It works well, but I need to make it faster. Is that possible? Maybe I need to insert items into the list in alphabetical order? But I don't think that would help with searching the list. The list holds about 300,000 items (words).
int GetItemPosition(const char* stringToFind)
{
    int i = 0;
    MyList* Tmp = FistListItem;
    while (Tmp) {
        if (!strcmp(Tmp->Value, stringToFind))
        {
            return i;
        }
        Tmp = Tmp->NextItem;
        i++;
    }
    return -1;
}
The method returns the position number if the item is found, otherwise it returns -1.
Any suggestion will be helpful.
Thanks for the answers. I can change the structure. I have only one constraint: the code must implement the following interface:
int Count(void);
int AddItem(const char* StringValue, int WordOccurrence);
int GetItemPosition(const char* StringValue);
char* GetString(int Index);
int GetOccurrenceNum(int Index);
void SetInteger(int Index, int WordOccurrence);
So which structure would be the most suitable in your opinion?
Searching a linked list is linear: you have to iterate from the beginning one element at a time, so it is O(n). Linked lists are not the best choice if you mainly search; you can use more suitable data structures such as binary trees.
Ordering the elements does not help much, because you still have to walk through the elements one by one.
The Wikipedia article says:
In an unordered list, one simple heuristic for decreasing average search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more efficient external data structure. For example, one can build a red-black tree or hash table whose elements are references to the linked list nodes. Multiple such indexes can be built on a single list. The disadvantage is that these indexes may need to be updated each time a node is added or removed (or at least, before that index is used again).
So in the first case you can slightly improve (by statistical assumptions) your search performance by moving items found previously closer to the beginning of the list. This assumes that previously found elements will be searched more frequently.
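A minimal sketch of the move-to-front heuristic, written against `std::list` rather than the asker's own `MyList` type (that substitution and the function name are assumptions of this example):

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Linear search that moves a found item to the front of the list.
// Returns true if `key` was found.
bool find_move_to_front(std::list<std::string>& items, const std::string& key) {
    auto it = std::find(items.begin(), items.end(), key);
    if (it == items.end()) return false;
    items.splice(items.begin(), items, it);  // O(1) relink, nothing is copied
    return true;
}
```

Keep in mind that this reorders the list, so it does not fit an interface that promises stable positions, such as the GetItemPosition/GetString pair in the question.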
The second method requires additional data structures.
If using linked lists is not a hard requirement, consider using hash tables, sorted arrays (random access) or balanced trees.
Consider using an array or std::vector as storage instead of a linked list, and use binary search to find a particular string, or even better, std::set, if you don't need a numerical index. If for some reason it is not possible to use other containers, there is not much you can do; you may want to speed up the comparisons by storing a hash of the string alongside it in each node.
I suggest hashing.
Since you've already got a linked list of your own, you can try chaining with linked lists for collision resolution.
Rather than using a linear linked list, you may want to use a binary search tree, or a red/black tree. These trees are designed on minimizing the traversals to find an item.
You could also store "short cut links". For example, if the list is of strings, you could have an array of links of where to start searching based on the first letter.
For example, shortcut['B'] would return a pointer to the first link to start searching for strings starting with 'B'.
The answer is no, you cannot improve the search without changing your data-structure.
As it stands, sorting the list will not give you a faster search for any random item.
It will only allow you to quickly decide if the given item is in the list by testing against the first item (which will be either the smallest or the largest entry) and this improvement is not likely to make a big difference.
So can you please edit your question and explain to us your constraints?
Can you use a completely different data structure, like an array or a tree? (as others have suggested)
If not, can you modify the way your linked list is linked?
If not, we will be unlikely to help you...
The best option is to use faster data structure for storing strings:
std::map - red-black tree behind the scenes. Has O(logn) for search/insert/delete operations. Suitable if you want to store additional values with strings (for example - positions).
std::set - basically the same tree but without values. Best for case when you need only contains operation.
std::unordered_map - hash table. O(1) access.
std::unordered_set - hash set. Also O(1) access.
Note: in all of these cases there is a catch. The complexity is calculated only in terms of n (the number of strings). In reality, string comparison is not free, so O(1) becomes O(m) and O(log n) becomes O(m log n), where m is the maximal string length. This does not matter for relatively short strings. If your strings are long, consider using a trie. In practice a trie can be even faster than a hash table, because each character of the query string is accessed only once instead of multiple times. For a hash table/set it is at least once for the hash calculation and at least once for the actual string comparison (depending on the collision resolution strategy; I am not sure how it is implemented in C++).
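To make this concrete for the interface in the question, here is one possible sketch that keeps the words in a std::vector (for index access) and adds a std::unordered_map from word to position (for fast GetItemPosition). The class name `WordTable`, the return value of AddItem, and the handling of duplicates are all assumptions, since the question does not specify them.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

class WordTable {
public:
    int Count() const { return static_cast<int>(entries_.size()); }

    // Assumption: returns the position of the word. A word that already
    // exists keeps its position and its stored occurrence count.
    int AddItem(const char* StringValue, int WordOccurrence) {
        assert(StringValue != nullptr);
        auto inserted = index_.emplace(StringValue, Count());
        if (inserted.second) entries_.push_back({StringValue, WordOccurrence});
        return inserted.first->second;
    }

    // Average O(1) hash lookup instead of an O(n) list walk; -1 if missing.
    int GetItemPosition(const char* StringValue) const {
        auto it = index_.find(StringValue);
        return it == index_.end() ? -1 : it->second;
    }

    char* GetString(int Index) {
        assert(Index >= 0 && Index < Count());
        return &entries_[Index].word[0];
    }

    int GetOccurrenceNum(int Index) const {
        assert(Index >= 0 && Index < Count());
        return entries_[Index].occurrences;
    }

    void SetInteger(int Index, int WordOccurrence) {
        assert(Index >= 0 && Index < Count());
        entries_[Index].occurrences = WordOccurrence;
    }

private:
    struct Entry {
        std::string word;
        int occurrences;
    };
    std::vector<Entry> entries_;                  // position -> entry
    std::unordered_map<std::string, int> index_;  // word -> position
};
```

The price is that every word is stored twice (once in the vector, once as a map key). With about 300,000 words that is usually acceptable, but it is a trade-off worth knowing about.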

Why does inserting sequential elements in a tree require more time than inserting random elements into a tree?

This is not homework. I'm taking a data structures class and we recently finished trees. At the end of class, my professor showed this image.
ConcreteBTree is a binary tree that doesn't self-balance. I have a few questions about the times it took to complete these procedures.
Why does it take so much more time to insert 100,000 sequential elements into ConcreteBTree than it takes to insert random elements into it? My intuition would be that since elements are sequential, it should take less time than it takes to insert 1,000,000 random elements.
Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n)
I'd really like to understand what is going on here, any explanation would be greatly appreciated. Thanks
Inserting sequential items (1, 2, 3, 4, ...) into a binary tree causes it to always add the nodes on the same side (the right side for ascending keys).
When you insert random items, you add nodes randomly to the left and to the right.
Adding sequentially makes the tree behave like an ordinary linked list, because each new item has to visit every previously added item, which takes O(n) steps per insertion. Adding randomly takes O(log N) steps on average.
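You can see the effect by measuring the height of the tree after the insertions. The sketch below is not the professor's ConcreteBTree, just a plain non-balancing binary search tree written for illustration; all names in it are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Inserts `keys` in order into a plain (non-balancing) binary search tree and
// returns its height, counted in nodes on the longest root-to-leaf path.
int bst_height_after_inserts(const std::vector<int>& keys) {
    struct Node { int key; int left; int right; };  // children are indices, -1 = none
    std::vector<Node> nodes;
    nodes.reserve(keys.size());
    int height = 0;
    for (int key : keys) {
        int depth = 1;  // depth the new node will get; 0 marks a duplicate
        if (!nodes.empty()) {
            int cur = 0;  // start at the root
            for (;;) {
                if (key == nodes[cur].key) { depth = 0; break; }  // ignore duplicates
                int& next = key < nodes[cur].key ? nodes[cur].left : nodes[cur].right;
                ++depth;
                if (next == -1) { next = static_cast<int>(nodes.size()); break; }
                cur = next;
            }
        }
        if (depth > 0) nodes.push_back({key, -1, -1});
        height = std::max(height, depth);
    }
    return height;
}

// Keys 1..n in ascending order.
std::vector<int> sequential_keys(int n) {
    assert(n >= 0);
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 1);
    return keys;
}

// The same keys in a pseudo-random order determined by `seed`.
std::vector<int> shuffled_keys(int n, unsigned seed) {
    std::vector<int> keys = sequential_keys(n);
    std::mt19937 rng(seed);
    std::shuffle(keys.begin(), keys.end(), rng);
    return keys;
}
```

With sequential keys the height equals the number of keys, so the i-th insertion walks past i - 1 nodes and the total work is quadratic. With shuffled keys the height is typically a small multiple of log2(n), which is why the random case finishes so much faster even with more elements.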
Armin's answered Q1.
Q2. Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n)
insert and find have to do the same work - they go down through whatever weird tree you've put together looking for that last node under which the value either is linked or would be (and will be in the case of insert), so they do the same number of comparisons and node traversals, taking similar time.
Insertion of random elements into a balanced tree is O(log N). Your insertions of random values into a tree that doesn't self-rebalance will be a bit worse, but not dramatically so, as some branches will end up considerably longer than others; you'll probably get some kind of bell curve of branch lengths. insert is only O(1) if you already know the node in the tree under which the insert is to be done (i.e. the find step above is normally needed). find is only O(n) if every node in the tree has to be visited, which is only the case for a pathologically unbalanced tree that effectively forms a linked list, as you've already been told you can generate by inserting pre-sorted elements.