How to find the count of an element in a matrix without brute force? Can we do that? - c++

I want to check whether an element is present in a given 2D array, and to find its count towards the left of a cell, to the right of a cell, and also to the top and bottom. How can I do it without using brute force?

If the array is sorted, then you can find it in O(n log m) time using binary search, where n and m are the numbers of rows and columns.
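For example, if each row is sorted, one binary search per row is enough. A minimal sketch, assuming a vector-of-vectors matrix of ints (the function name is made up):

#include <algorithm>
#include <vector>

// Assuming each row of the matrix is sorted, one binary search per row is
// enough to test membership: O(n log m) for n rows and m columns.
bool containsValue(const std::vector<std::vector<int>>& matrix, int value) {
    for (const auto& row : matrix)
        if (std::binary_search(row.begin(), row.end(), value))
            return true;
    return false;
}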

In addition to the other answers (probably not useful for any present application, just an idea for thought), there is Grover's algorithm. An excerpt from Wikipedia:
Grover's algorithm is a quantum algorithm for searching an unsorted database with N entries in O(√N) time and using O(log N) storage space (see big O notation). Lov Grover formulated it in 1996.
In models of classical computation, searching an unsorted database cannot be done in less than linear time (so merely searching through every item is optimal). Grover's algorithm illustrates that in the quantum model searching can be done faster than this; in fact its time complexity O(√N) is asymptotically the fastest possible for searching an unsorted database in the linear quantum model.

If your matrix is unsorted, and you don't have anything like a hash table for quick access, there is no way to avoid examining every element.
If your matrix is sorted, for example, you can use more efficient search algorithms (such as binary search) to find the element faster. Don't forget that a 2D array can be represented with a flat vector plus a variable holding the column count.
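A minimal sketch of that flat-vector representation (the struct and member names are made up):

#include <cstddef>
#include <vector>

// A 2D matrix stored as one flat vector plus a column count; element (r, c)
// lives at index r * cols + c (row-major storage).
struct FlatMatrix {
    std::vector<int> data;
    std::size_t cols = 0;

    int& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    const int& at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};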

Related

How is binary search faster than linear search?

We need a sorted array to perform a binary search. In that case, the up-front cost of sorting already exceeds the cost of a linear search, so isn't linear search the better option?
A linear search runs in O(N) time, because it scans through the array from start to end.
On the other hand, a binary search first sorts the array in O(N log N) time (if it is not already sorted), then performs lookups in O(log N) time.
For a small number of lookups, using a linear search would be faster than using binary search. However, whenever the number of lookups is greater than log N, binary search will theoretically have the upper hand in performance.
So, the answer to your question is: Linear search and binary search perform lookups in different ways. Linear search scans through the whole array, while binary search sorts the array first. These two search techniques have differing time complexities, but that does not mean that one will always be better than the other.
Specifically, linear search works well when the size of the list is small and/or you only need to perform a small number of lookups. Binary search should perform better in all other situations.
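A small illustration of that trade-off (the function names are made up):

#include <algorithm>
#include <cstddef>
#include <vector>

// One lookup: a linear scan is fine. Many lookups: sort once, then binary search.
bool singleLookup(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();         // O(N)
}

std::size_t countPresent(std::vector<int> v, const std::vector<int>& queries) {
    std::sort(v.begin(), v.end());                               // O(N log N), once
    std::size_t hits = 0;
    for (int q : queries)
        if (std::binary_search(v.begin(), v.end(), q)) ++hits;   // O(log N) each
    return hits;
}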
It'll be better if your container is sorted already or if you want to search for many values.
First of all, the precondition for binary search is that the array is sorted, which means you do not need to re-sort it. Secondly, if you are talking about integer arrays, you can use radix sort, O(d*n), or counting sort, O(n+k), which are similar to linear search in terms of complexity.
Binary search is faster than linear when the given array is already sorted.
For a sorted array, binary search offers O(log n) on average, while linear search is O(n).
For an array that is not sorted, linear search is the better choice, since O(n) beats sorting the array first (with quicksort, for example, O(n log n)) and then applying binary search, which gives O(n log n + log n) overall.

Count of previously smaller elements encountered in an input stream of integers?

Given an input stream of numbers ranging from 1 to 10^5 (non-repeating) we need to be able to tell at each point how many numbers smaller than this have been previously encountered.
I tried using a std::set in C++ to maintain the elements already encountered and then calling upper_bound on the set for the current number. But upper_bound gives me an iterator, and then I again have to iterate through the set or use std::distance, which is linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT: Found an older question about Fenwick trees that is helpful here. By the way, I have now solved this problem using segment trees, taking hints from doynax's comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
Regardless of the container you are using, it is a very good idea to keep the elements sorted as you insert them, so that at any point you can take an element's index (or iterator) and know how many elements come before it.
You can implement your own binary search tree where each node stores two counters: the number of nodes in its left and right subtrees.
Insertion into a (balanced) binary search tree takes O(log n); during the insertion, the counters of all ancestors of the new element are updated, also in O(log n).
The number of elements smaller than the new element can be derived from the stored counters in O(log n).
So the total running time is O(n log n).
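A Fenwick tree (the binary indexed tree mentioned in the question's edit) realizes the same O(log n)-per-element idea with very little code. A minimal sketch, assuming the values fit in 1..10^5 as stated in the question:

#include <cstdio>
#include <vector>

// Fenwick (binary indexed) tree over the value range 1..MAXV.
struct Fenwick {
    std::vector<int> bit;
    explicit Fenwick(int n) : bit(n + 1, 0) {}
    void add(int i) {                       // record one occurrence of value i
        for (; i < (int)bit.size(); i += i & -i) ++bit[i];
    }
    int prefix(int i) const {               // how many recorded values are <= i
        int s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
};

int main() {
    const int MAXV = 100000;                // the question bounds values by 10^5
    Fenwick seen(MAXV);
    int x;
    while (std::scanf("%d", &x) == 1) {     // stream of distinct numbers
        std::printf("%d\n", seen.prefix(x - 1));  // count of smaller values so far
        seen.add(x);                              // then record the current value
    }
}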
Keep your table sorted at each step and use binary search. When you search for the number just given by the input stream, binary search finds either the next greater or the next smaller element; from that comparison you get the current input's insertion index, and that index is exactly the count of numbers smaller than the current one. Since inserting into a sorted array is linear, this algorithm takes O(n^2) time overall.
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
You have to use the skip-list approach so that you can insert each new element with a binary-search-like descent (one cannot binary search a plain linked list). If you're tracking the length with an accumulator, returning the number of larger elements is as simple as length - index.
One more point to weigh: std::set::insert() is already O(log n) without a hint, so whether the skip list actually buys you efficiency is open to question.

Fastest way to search and sort vectors

I'm doing a project in which I need to insert data into vectors, sort it, and search it.
I need the fastest possible algorithms for sorting and searching. I've been searching and found out that std::sort is basically quicksort, which is one of the fastest sorts, but I can't figure out which search algorithm is best. Binary search? Can you help me with it? Thanks. So I've got 3 methods:
void addToVector(Obj o)
{
    fvector.push_back(o);
}
void sortVector()
{
    sort(fvector.begin(), fvector.end());
}
Obj* search(string& bla)
{
    // I would write binary search here
    return binarysearch(..);
}
I've been searching and found out that std::sort is basically
quicksort.
Answer: Not quite. Most implementations use a hybrid algorithm like
introsort, which combines quick-sort, heap-sort and insertion sort.
Quick-sort is one of the fastest sorting methods.
Answer: Not quite. In general it holds that quicksort is O(n log n) in the average case. However, quicksort has quadratic worst-case performance, i.e., O(n^2). Furthermore, for a small number of inputs (e.g., a std::vector with only a few elements), sorting with quicksort tends to perform worse than other sorting algorithms that are considered "slower":
I can't figure out which searching algorithm is the best. Is it binary-search?
Answer: Binary search has the same average and worst-case performance, O(log n). Also keep in mind that binary search requires the container to be sorted in ascending or descending order. However, whether it is better than other searching methods (e.g., linear search, which has O(n) time complexity) depends on a number of factors. Some of them are:
The number of elements/objects.
The type of elements/objects.
Bottom Line:
Looking for the "fastest" algorithm usually denotes premature optimization, and, according to one of the great ones, "premature optimization is the root of all evil" (Donald Knuth). The "fastest", as I hope has been clearly shown, depends on quite a number of factors.
Use std::sort to sort your std::vector.
After sorting your std::vector, use std::binary_search to find out whether a certain element exists, or use std::lower_bound or std::upper_bound to locate it.
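A small usage sketch of that combination with plain ints:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{42, 7, 23, 7, 19};

    std::sort(v.begin(), v.end());                              // 7 7 19 23 42

    bool present = std::binary_search(v.begin(), v.end(), 19);  // membership only

    auto it = std::lower_bound(v.begin(), v.end(), 19);         // first element >= 19
    std::size_t pos = it - v.begin();

    std::cout << std::boolalpha << present << " at index " << pos << '\n'; // true at index 2
}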
For amortised O(1) access times, use a std::unordered_map, maybe using a custom hash for best effect.
Sorting seems to be unnecessary extra work.
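A minimal sketch of the hash-based lookup, using std::unordered_set for a bare membership test (the std::unordered_map suggestion above works the same way when each key also carries a payload):

#include <string>
#include <unordered_set>

// If only "is this key present?" queries are needed, a hash container avoids
// sorting altogether; lookup is amortized O(1).
bool contains(const std::unordered_set<std::string>& index, const std::string& key) {
    return index.count(key) != 0;
}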
Searching and Sorting efficiency is highly dependent on the type of data, the ordering of the raw data, and the quantity of the data.
For example, for small sorted data sets, a linear search may be faster than a binary search; or the time differences between the two is negligible.
Some sort algorithms will perform horribly on inversely ordered data, such as binary tree sort. Data that does not have much variation may cause a high degree of collisions for hash algorithms.
Perhaps you need to answer the bigger question: Is search or sorting the execution bottleneck in my program? Profile and find out.
If you need the fastest or the best sorting algorithm... there is no such thing; at least it hasn't been found yet. There are algorithms that provide better results for certain data, and there are algorithms that provide good results for most data. You either need to analyze your data and find the best one for your case, or use a generic algorithm like std::sort and expect it to provide good results, but not necessarily the best.
If your elements are integers, you should consider a bucket sort algorithm, which runs in O(N) time instead of the O(n log n) average case of quicksort:
http://en.wikipedia.org/wiki/Bucket_sort
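A hedged sketch of that idea for non-negative integers whose maximum value is known and modest (the names are made up):

#include <vector>

// A very simple bucket sort: one bucket per possible value, assuming the
// maximum value is known and not vastly larger than the element count.
void bucketSort(std::vector<int>& v, int maxValue) {
    std::vector<std::vector<int>> buckets(maxValue + 1);
    for (int x : v) buckets[x].push_back(x);          // scatter: O(N)
    v.clear();
    for (const auto& b : buckets)                     // gather: O(N + maxValue)
        v.insert(v.end(), b.begin(), b.end());
}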
Sorting
In case you want to know about the fastest sorting technique for integer values in a vector, I would suggest referring to the following link:
https://github.com/fenilgmehta/Fastest-Integer-Sort
It uses radix sort and counting sort for large arrays and merge sort along with insertion sort for small arrays.
According to its benchmarks, this sorting algorithm is far faster than C++ std::sort for integral values.
It is 6 times faster than C++ STL std::sort for "int64_t array[10000000]"
Searching
If you want to know whether a particular value is present in the vector or not, then you should use binary_search(...)
If you want to know the exact location of an element, then use lower_bound(...) and upper_bound(...)

Can we know if a collection is almost sorted without applying a sort algorithm?

In the wikipedia article on sorting algorithms,
http://en.wikipedia.org/wiki/Sorting_algorithm#Summaries_of_popular_sorting_algorithms
under Bubble sort it says: "Bubble sort can also be used efficiently on a list of any length that is nearly sorted (that is, the elements are not significantly out of place)."
So my question is: without first sorting the list with a sorting algorithm, how can one know whether it is nearly sorted or not?
Are you familiar with the general sorting lower bound? You can prove that any comparison-based sorting algorithm must make Ω(n log n) comparisons in the average case. The way you prove this is through an information-theoretic argument: there are n! possible permutations of the input array, and since the only way you can learn which permutation you got is to make comparisons, you have to make at least lg(n!) comparisons in order to be certain that you know the structure of your input permutation.
I haven't worked out the math on this, but I suspect that you could make similar arguments to show that it's difficult to learn how sorted a particular array is. Essentially, if you don't do a large number of comparisons, then you wouldn't be able to tell apart an array that's mostly sorted from an array that is actually quite far from sorted. As a result, all the algorithms I'm aware of that measure "sortedness" take a decent amount of time to do so.
For example, one measure of the level of "sortedness" in an array is the number of inversions in that array. You can count the number of inversions in an array in time O(n log n) using a divide-and-conquer algorithm based on mergesort, but with that runtime you could just sort the array instead.
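A hedged sketch of that divide-and-conquer inversion count (as the answer notes, it sorts the array along the way):

#include <algorithm>
#include <cstddef>
#include <vector>

// Counts inversions (pairs i < j with a[i] > a[j]) with a merge-sort style
// divide and conquer in O(n log n); the range [lo, hi) ends up sorted.
long long countInversions(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    std::size_t mid = lo + (hi - lo) / 2;
    long long inv = countInversions(a, lo, mid) + countInversions(a, mid, hi);

    std::vector<int> merged;
    merged.reserve(hi - lo);
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            merged.push_back(a[i++]);
        } else {
            inv += mid - i;              // a[j] is smaller than the remaining mid - i left elements
            merged.push_back(a[j++]);
        }
    }
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi)  merged.push_back(a[j++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
    return inv;
}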
Typically, the way that you'd know that your array was mostly sorted was to know something a priori about how it was generated. For example, if you're looking at temperature data gathered from 8AM - 12PM, it's very likely that the data is already mostly sorted (modulo some variance in the quality of the sensor readings). If your data looks at a stock price over time, it's also likely to be mostly sorted unless the company has a really wonky trajectory. Some other algorithms also partially sort arrays; for example, it's not uncommon for quicksort implementations to stop sorting when the size of the array left to sort is small and to follow everything up with a final insertion sort pass, since every element won't be very far from its final position then.
I don't believe there exists any standardized measure of how sorted or random an array is.
You can come up with your own measure, like counting the number of adjacent pairs that are out of order (suggested in a comment, and sketched below), or counting the number of larger numbers that occur before smaller numbers in the array (which is trickier than a simple single pass).
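A minimal sketch of that first measure:

#include <cstddef>
#include <vector>

// One home-grown "sortedness" measure: the number of adjacent pairs that are
// out of order. 0 means fully sorted; computed in a single O(n) pass.
std::size_t adjacentDisorder(const std::vector<int>& v) {
    std::size_t out_of_order = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i - 1] > v[i]) ++out_of_order;
    return out_of_order;
}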

An efficient sorting algorithm for almost sorted list containing time data?

The name says it all, really. I suspect that insertion sort is best, since it's the best sort for mostly-sorted data in general. However, since I know more about the data, there is a chance other sorts are worth looking at. So the other relevant pieces of information are:
1) This is time data, which means I presumably could create an effective hash for ordering the data.
2) The data won't all exist at one time. Instead, I'll be reading in records which may contain a single vector, or dozens or hundreds of vectors. I want to output all times within a 5-second window, so it's possible that a sort that does the sorting as I insert the data would be a better option.
3) Memory is not a big issue, but CPU speed is, as this may be a bottleneck of the system.
Given these conditions, can anyone suggest an algorithm that may be worth considering in addition to insertion sort? Also, how does one define 'mostly sorted' when deciding what a good sort option is? What I mean by that is: how do I look at my data and decide 'this isn't as sorted as I thought it was, maybe insertion sort is no longer the best option'? Any link to an article that relates sorting complexity to the degree to which data is already sorted would be appreciated.
Thanks
Edit:
Thank you everyone for the information. I will be going with an easy insertion or merge sort (whichever I have already pre-written) for now. However, I'll be trying some of the other methods once we're closer to the optimization phase (since they take more effort to implement). I appreciate the help.
You could adopt option (2) you suggested - sort the data while you insert elements.
Use a skip list, sorted by time in ascending order, to maintain your data.
Once a new entry arrives, check whether it is larger than the last element (easy and quick). If it is, simply append it (easy to do in a skip list); the skip list needs to add only 2 nodes on average for these cases, so appending is O(1) on average.
If the element is not larger than the last element, add it to the skip list as a standard insert operation, which is O(log n).
This approach yields an O(n + k log n) algorithm, where k is the number of elements inserted out of order.
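Standard C++ has no skip list, but the same in-order fast path can be sketched with a std::multiset and a hinted insert (the class name and the plain integer timestamps are assumptions):

#include <set>

// A std::multiset keyed by time behaves similarly for this pattern: appending
// an in-order timestamp with an end() hint is amortized O(1), while an
// out-of-order insert costs O(log n).
class TimeOrderedBuffer {
    std::multiset<long long> data;
public:
    void insert(long long t) {
        if (data.empty() || t >= *data.rbegin())
            data.insert(data.end(), t);   // arrives in order: hinted insert, amortized O(1)
        else
            data.insert(t);               // arrives out of order: O(log n)
    }
    const std::multiset<long long>& sorted() const { return data; }
};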
I would throw in merge sort: if you implement the natural (run-detecting) variant, you get a best case of O(N) with a typical and worst case of O(N log N), so you are covered if you hit any problems. With insertion sort you get a worst case of O(N^2) and a best case of O(N).
You can sort a list of size n with k elements out of place in O(n + k lg k) time.
See: http://www.quora.com/How-can-I-quickly-sort-an-array-of-elements-that-is-already-sorted-except-for-a-small-number-of-elements-say-up-to-1-4-of-the-total-whose-positions-are-known/answer/Mark-Gordon-6?share=1
The basic idea is this:
Iterate over the elements of the array, building an increasing subsequence (if the current element is greater than or equal to the last element of the subsequence, append it to the end of the subsequence. Otherwise, discard both the current element and the last element of the subsequence). This takes O(n) time.
You will have discarded no more than 2k elements since k elements are out of place.
Sort the 2k elements that were discarded using an O(k lg k) sorting algorithm like merge sort or heapsort.
You now have two sorted lists. Merge the lists in O(n) time like you would in the merge step of merge sort.
Overall time complexity = O(n + k lg k)
Overall space complexity = O(n)
(this can be modified to run in O(1) space if you can merge in O(1) space, but it's by no means trivial)
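A minimal C++ sketch of those steps (the function name is made up):

#include <algorithm>
#include <vector>

// Peel off an increasing subsequence, sort the (at most 2k) discarded
// elements, then merge the two sorted sequences: O(n + k lg k) overall.
std::vector<int> sortNearlySorted(const std::vector<int>& a) {
    std::vector<int> inc, discarded;
    for (int x : a) {
        if (inc.empty() || x >= inc.back()) {
            inc.push_back(x);                 // extends the increasing subsequence
        } else {
            discarded.push_back(inc.back());  // discard both the offending tail...
            inc.pop_back();
            discarded.push_back(x);           // ...and the current element
        }
    }
    std::sort(discarded.begin(), discarded.end());                  // O(k lg k)
    std::vector<int> result(a.size());
    std::merge(inc.begin(), inc.end(),
               discarded.begin(), discarded.end(), result.begin()); // O(n)
    return result;
}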
Without fully understanding the problem, Timsort may fit the bill as you're alleging that your data is mostly sorted already.
There are many adaptive sorting algorithms out there that are specifically designed to sort mostly-sorted data. Ignoring the fact that you're storing dates, you might want to look at smoothsort or Cartesian tree sort as algorithms that can sort reasonably sorted data in worst-case O(n log n) time and best-case O(n) time. Smoothsort also has the advantage of requiring only O(1) space, like insertion sort.
Using the fact that everything is a date and therefore can be converted into an integer, you might want to look at binary quicksort (MSD radix sort) using a median-of-three pivot selection. This algorithm has best-case O(n log n) performance, but has a very low constant factor that makes it pretty competitive. Its worst case is O(n log U), where U is the number of bits in each date (probably 64), which isn't too bad.
Hope this helps!
If your OS or C library provides a mergesort function, it is very likely that it already handles the case where the data is partially ordered (in either direction), running in O(N) time.
Otherwise, you can just copy the mergesort available from your favorite BSD operating system.