What would be the most efficient way to find a[i] = i in a sorted array? - c++

Given an array a[], what would be the most efficient way to determine whether or not at least one element i satisfies the condition a[i] == i?
All the elements in the array are sorted and distinct, but they aren't necessarily integer types (i.e. they might be floating point types).

Several people have made claims about the relevance of “sorted”, “distinct” and “aren't necessarily integers”. In fact, proper selection of an efficient algorithm to solve this problem hinges on these characteristics. A more efficient algorithm would be possible if we could know that the values in the array were both distinct and integral, while a less efficient algorithm would be required if the values might be non-distinct, whether or not they were integral. And of course, if the array was not already sorted, you could sort it first (at average complexity O(n log n)) and then use the more efficient pre-sorted algorithm (i.e. for a sorted array), but in the unsorted case it would be more efficient to simply leave the array unsorted and run through it directly comparing the values in linear time (O(n)).

Note that regardless of the algorithm chosen, best-case performance is O(1) (when the first element examined contains its index value); at any point during execution of any algorithm we might come across an element where a[i] == i at which point we return true; what actually matters in terms of algorithm performance in this problem is how quickly we can exclude all elements and declare that there is no such element a[i] where a[i] == i.
The problem does not state the sort order of a[], which is a pretty critical piece of missing information. If it's ascending, the worst-case complexity will always be O(n); there's nothing we can do to make the worst-case complexity better. But if the sort order is descending, even the worst-case complexity is O(log n): since values in the array are distinct and descending, there is only one possible index where a[i] could equal i, and basically all you have to do is a binary search to find the crossover point (where the ascending index values cross over the descending element values, if there even is such a crossover), and determine if a[c] == c at the crossover point index value c. Since that's pretty trivial, I'll proceed assuming that the sort order is ascending.

Interestingly, if the elements were integers, even in the ascending case there is a similar “crossover-like” situation (though in the ascending case there could be more than one a[i] == i match), so if the elements were integers, a binary search would also be applicable in the ascending case, in which case even the worst-case performance would be O(log n) (see Interview question - Search in sorted array X for index i such that X[i] = i). But we aren't given that luxury in this version of the problem.
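For reference, a minimal sketch of that descending-order crossover search might look like this (hypothetical function name); it relies only on the values being distinct and sorted descending, which makes a[i] - i strictly decreasing:

#include <cstddef>

bool has_fixed_point_descending(const double* a, size_t len)
{
    size_t lo = 0, hi = len;             // search the half-open range [lo, hi)
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == mid)
            return true;                 // found the crossover exactly
        if (a[mid] > mid)
            lo = mid + 1;                // a[i] - i is still positive; crossover lies to the right
        else
            hi = mid;                    // a[i] - i is already negative; crossover lies to the left
    }
    return false;
}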
Here is how we might solve this problem:
Begin with the first element, a[0]. If its value is == 0, you’ve found an element which satisfies a[i] == i so return true. If its value is < 1, the next element (a[1]) could possibly contain the value 1, so you proceed to the next index. If, however, a[0] >= 1, you know (because the values are distinct) that the condition a[1] == 1 cannot possibly be true, so you can safely skip index 1. But you can even do better than that: For example, if a[0] == 12, you know (because the values are sorted in ascending order) that there cannot possibly be any elements that satisfy a[i] == i prior to element a[13]. Because the values in the array can be non-integral, we cannot make any further assumptions at this point, so the next element we can safely skip to directly is a[13] (e.g. a[1] through a[12] may all contain values between 12.000... and 13.000... such that a[13] could still equal exactly 13, so we have to check it).
Continuing that process yields an algorithm as follows:
#include <cmath>   // std::floor
#include <cstddef> // size_t

// Algorithm 1
bool algorithm1(const double* a, size_t len)
{
    for (size_t i = 0; i < len; ++i) // worst case is O(n)
    {
        if (a[i] == i)
            return true; // of course we could also return i here (as an int)...
        if (a[i] > i)
            i = static_cast<size_t>(std::floor(a[i])); // the loop's ++i then lands on floor(a[i]) + 1
    }
    return false; // ......in which case we'd want to return -1 here (an int)
}
This has pretty good performance if many of the values in a[] are greater than their index value, and has excellent performance if all values in a[] are greater than n (it returns false after only one iteration!), but it has dismal performance if all values are less than their index value (it will return false after n iterations). So we return to the drawing board... but all we need is a slight tweak. Consider that the algorithm could have been written to scan backwards from n down to 0 just as easily as it can scan forward from 0 to n. If we combine the logic of iterating from both ends toward the middle, we get an algorithm as follows:
// Algorithm 2
bool algorithm2(const double* a, size_t len)
{
    if (len == 0)
        return false;
    size_t i = 0, j = len - 1;
    for (; i < j; ++i, --j) // worst case is still O(n)
    {
        if (a[i] == i || a[j] == j)
            return true;
        if (a[i] > i)
            i = static_cast<size_t>(std::floor(a[i]));
        if (a[j] < j)
        {
            if (a[j] <= 0.0)  // no index at or below j can match a non-positive value
                return false; // (this also avoids a negative-to-size_t conversion)
            j = static_cast<size_t>(std::ceil(a[j]));
        }
    }
    // i and j may meet on a single element that has not been examined yet
    return i == j && a[i] == i;
}
This has excellent performance in both of the extreme cases (all values are less than 0 or greater than n), and has pretty good performance with pretty much any other distribution of values. The worst case is if all of the values in the lower half of the array are less than their index and all of the values in the upper half are greater than their index, in which case the performance degrades to the worst-case of O(n). Best case (either extreme case) is O(1), while average case is probably O(log n) but I’m deferring to someone with a math major to determine that with certainty.
Several people have suggested a “divide and conquer” approach to the problem, without specifying how the problem could be divided and what one would do with the recursively divided sub-problems. Of course such an incomplete answer would probably not satisfy the interviewer. The naïve linear algorithm and the worst-case performance of algorithm 2 above are both O(n), while algorithm 2 improves the average-case performance to (probably) O(log n) by skipping (not examining) elements whenever it can. The divide-and-conquer approach can only outperform algorithm 2 if, in the average case, it is somehow able to skip more elements than algorithm 2 can skip. Let's assume we divide the problem by splitting the array into two (nearly) equal contiguous halves, recursively, and decide if, with the resulting sub-problems, we are likely to be able to skip more elements than algorithm 2 could skip, especially in algorithm 2's worst case.

For the remainder of this discussion, let's assume an input that would be worst-case for algorithm 2. After the first split, we can check both halves' top & bottom elements for the same extreme case that results in O(1) performance for algorithm 2, yet results in O(n) performance with both halves combined. This would be the case if all elements in the bottom half are less than 0 and all elements in the upper half are greater than n-1. In these cases, we can immediately exclude the bottom and/or top half with O(1) performance for any half we can exclude. Of course the performance of any half that cannot be excluded by that test remains to be determined after recursing further, dividing that half by half again until we find any segment whose top or bottom element contains its index value. That's a reasonably nice performance improvement over algorithm 2, but it occurs in only certain special cases of algorithm 2's worst case. All we've done with divide-and-conquer is decrease (slightly) the proportion of the problem space that evokes worst-case behavior. There are still worst-case scenarios for divide-and-conquer, and they exactly match most of the problem space that evokes worst-case behavior for algorithm 2.
So, given that the divide-and-conquer algorithm has fewer worst-case scenarios, doesn't it make sense to go ahead and use a divide-and-conquer approach?
In a word, no. Well, maybe. If you know up front that about half of your data is less than 0 and half is greater than n, this special case would generally fare better with the divide-and-conquer approach. Or, if your system is multicore and your ‘n’ is large, it might be helpful to split the problem evenly between all of your cores, but once it’s split between them, I maintain that the sub-problems on each core are probably best solved with algorithm 2 above, avoiding further division of the problem and certainly avoiding recursion, as I argue below....
At each recursion level of a recursive divide-and-conquer approach, the algorithm needs some way to remember the as-yet-unsolved 2nd half of the problem while it recurses into the 1st half. Often this is done by having the algorithm recursively call itself first for one half and then for the other, a design which maintains this information implicitly on the runtime stack. Another implementation might avoid recursive function calls by maintaining essentially this same information on an explicit stack. In terms of space growth, algorithm 2 is O(1), but any recursive implementation is unavoidably O(log n) due to having to maintain this information on some sort of stack. But aside from the space issue, a recursive implementation has extra runtime overhead of remembering the state of as-yet-unrecursed-into subproblem halves until such time as they can be recursed into. This runtime overhead is not free, and given the simplicity of algorithm 2’s implementation above, I posit that such overhead is proportionally significant. Therefore I suggest that algorithm 2 above will roundly spank any recursive implementation for the vast majority of cases.
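For completeness, here is a minimal sketch of the divide-and-conquer variant described above (hypothetical function names); I still maintain that algorithm 2 is the better choice for the reasons just given:

#include <cstddef>

// Check the half-open range [lo, hi) recursively.
static bool dc_fixed_point(const double* a, size_t lo, size_t hi)
{
    if (lo >= hi)
        return false;
    if (a[lo] == lo || a[hi - 1] == hi - 1)
        return true;
    if (a[hi - 1] < lo || a[lo] > hi - 1)   // every value lies outside this segment's index range,
        return false;                       // so the whole segment can be excluded in O(1)
    size_t mid = lo + (hi - lo) / 2;
    return dc_fixed_point(a, lo, mid) || dc_fixed_point(a, mid, hi);
}

bool algorithmDC(const double* a, size_t len)
{
    return dc_fixed_point(a, 0, len);
}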

In the worst case, you can't do any better than checking every element. (Imagine something like a[i] = i + uniform_random(-.25, .25).) You'll need some information on what your input looks like.

Actually I would start from the last element, and do a basic check (for example, if you have 1000 elements but the highest is 100, you know you only need to check 0..100). In a worst-case scenario you still need to check every element, but it should be faster to find the areas where a match may be possible. If it is as stated above (a[i] = i + [-0.25..0.25]), you are f($!ed and need to search every single element.

For a sorted array, you can perform an interpolation search. Similar to a binary search, but, assuming an even distribution of values, it can be faster.

I think the main problem here is your conflicting statements:
a[i] == i
All the elements in the array are sorted and distinct, they need not always be integers.
If an array value is equal to its subscript, that value must be an integer. If it's not an integer, and the elements are, say, char, what is considered "sorted"? ASCII value (A < B < C)?
If it were an array of chars would we consider:
a[i] == i
to be true if
i == 65 && a[i] == 'A'
If I were in this interview I would be grilling the interviewer with follow up questions before answering. That said...
If all we know is what you stated, we can safely say that we can find the value in O(n) because that is the time to make one full pass of the array. With more details we can probably limit this to O(log(n)) with a binary search of the array.

Notice that all the elements in the array are sorted and distinct, so if we construct a new array b with b[i] = a[i] - i, array b is also sorted, and what we need to find is a zero in array b. I think binary search can solve the problem! Here is a link for counting the number of occurrences in a sorted array. You can also apply the same divide-and-conquer technique to the original array without constructing an auxiliary array! The time complexity is O(log n)!
Take this as an example:
a=[0,1,2,4,8]
b=[0,0,0,1,4]
What we need to find is exactly indices 0, 1 and 2.
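A minimal sketch of that binary search (hypothetical function name), assuming integer values, which is what makes b non-decreasing, and computing b on the fly instead of building an auxiliary array:

#include <cstddef>

bool has_fixed_point(const long long* a, size_t len)
{
    size_t lo = 0, hi = len;                          // half-open range [lo, hi)
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        long long b = a[mid] - static_cast<long long>(mid);  // b[mid], computed on the fly
        if (b == 0)
            return true;
        if (b < 0)
            lo = mid + 1;                             // zeros of b can only lie to the right
        else
            hi = mid;                                 // zeros of b can only lie to the left
    }
    return false;
}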
Hope it helps!

Related

performance: find the index of max value in an array (tie allowed)

Just as the title says; BTW, it's just out of curiosity and not a homework question. It might seem trivial to people with a CS major. The problem is that I would like to find the indices of the max value in an array. Basically I have two approaches.
scan over and find the maximum, then scan twice to get the vector of indices
scan over and find the maximum, and along this scan construct the indices array, abandoning it if a better value is there.
May I know how I should weigh these two approaches in terms of performance (mainly time complexity, I suppose)? It is hard for me because I have no idea what the worst case should be for the second approach! It's not a hard problem per se. But I just want to know how to approach this problem, or how I should google this type of problem to get the answer.
In terms of complexity:
scan over and find the maximum,
then scan twice to get the vector of indices
First scan is O(n).
Second scan is O(n) + k insertions (with k the number of occurrences of the max value)
vector::push_back has amortized complexity of O(1).
so a total of O(2 * n + k), which might be simplified to O(n) as k <= n
scan over and find the maximum,
along this scan construct indices array and abandon if a better one is there.
Scan is O(n).
Number of insertions is more complicated to compute.
The number of clears (and the number of elements cleared) is more complicated to compute too (clear's complexity is at most proportional to the number of elements removed).
But both are bounded above by n, so the complexity is at most O(3 * n) = O(n), and also at least O(n) (the scan), so it is O(n) too.
So for both methods, complexity is the same: O(n).
For performance timing, as always, you have to measure.
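A minimal C++ sketch of the two approaches being compared (hypothetical function names):

#include <algorithm>
#include <cstddef>
#include <vector>

// Two passes: find the maximum, then collect every matching index.
std::vector<size_t> max_indices_two_pass(const std::vector<int>& v)
{
    std::vector<size_t> idx;
    if (v.empty()) return idx;
    const int mx = *std::max_element(v.begin(), v.end());  // first scan: O(n)
    for (size_t i = 0; i < v.size(); ++i)                  // second scan: O(n) + k insertions
        if (v[i] == mx) idx.push_back(i);
    return idx;
}

// Single pass: keep the indices of the best value seen so far,
// clearing them whenever a strictly larger value appears.
std::vector<size_t> max_indices_one_pass(const std::vector<int>& v)
{
    std::vector<size_t> idx;
    if (v.empty()) return idx;
    int best = v[0];
    idx.push_back(0);
    for (size_t i = 1; i < v.size(); ++i)
    {
        if (v[i] > best)       { best = v[i]; idx.clear(); idx.push_back(i); }
        else if (v[i] == best) { idx.push_back(i); }
    }
    return idx;
}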
For the single-pass method, you can set a condition to add the index to the array; whenever the max changes, you clear the array. You don't need to iterate twice.
For the two-pass method, the implementation is easier: you just find the max on the first pass, then collect the indices that match on the second pass.
As stated in a previous answer, complexity is O(n) in both cases, and measures are needed to compare performances.
However, I would like to add two points:
The first one is that the performance comparison may depend on the compiler and how optimisation is performed.
The second point is more critical: performance may depend on the input array.
For example, let us consider the corner case: 1,1,1, .., 1, 2, i.e. a huge number of 1s followed by one 2. With your second approach, you will create a huge temporary array of indices, only to provide at the end an array of one element. It is possible at the end to shrink the memory allocated to this array. However, I don't like the idea of creating an unnecessarily huge temporary vector, independently of the time performance concern. Note that such an array could suffer several reallocations, which would impact time performance.
This is why in the general case, without any knowledge on the input, I would prefer your first approach, two scans. The situation could be different if you want to implement a function dedicated to a specific type of data.

Optimal data structure (in C++) for random access and looping through elements

I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand elements, let's say between 500 and 3000 elements). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, and assign 1 to elements with property X, and zero otherwise, this array of N elements will have n ones and N-n zeros (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls that is somewhat stable, so the total number "n" fluctuates a bit, but is reasonably stable in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, the indices of the array where the value is 1), to perform some task on them.
One simple and slow way to achieve this is to simply loop through the array and, if index k has value 1, perform the task; but this is kinda slow because well over 95% of all the elements have value 0. The solution would be to put all the 1s into a different structure (with n elements) and then loop through that structure, instead of looping through all N elements. The question is what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no order there of any sort (the time when an element flipped from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread), I do not need to loop in any particular order (in other words, I just need to get them all, but it's not relevant in which order).
Any suggestions as to what would be the optimal structure for this? "std::map" comes to mind, but since the keys of std::map are sorted (and I don't need that feature), the question is whether there is anything faster?
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with "N" elements, there is a continuously changing subset "s" of "n" elements (with n much smaller than N), and I need to loop through that set "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through them. So while suggestions like having 2 arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements to an array would be prohibitively slow. It sounds like some hash-based approach (such as std::unordered_set) would work reasonably fast on both the iteration and addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower the iteration through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases; I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest way.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range as iterators are normally used, you should reverse the order of the operations above: swap, then increment/decrement. If you keep an index instead of an iterator then the index does double duty as the count of the number of X.
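A minimal single-threaded sketch of that idea (hypothetical class name), where the N elements are identified by the indices 0..N-1; a position table makes the swap partner findable in O(1), and the split index doubles as the count of X elements. Locking would still be needed for the multi-threaded case:

#include <cstddef>
#include <utility>
#include <vector>

class packed_flags {
    std::vector<size_t> order;   // permutation of 0..N-1, elements with X kept first
    std::vector<size_t> pos;     // pos[e] == current position of element e in "order"
    size_t split = 0;            // order[0..split) are the elements with property X
public:
    explicit packed_flags(size_t n) : order(n), pos(n) {
        for (size_t i = 0; i < n; ++i) order[i] = pos[i] = i;
    }
    bool has_x(size_t e) const { return pos[e] < split; }
    void set_x(size_t e) {
        if (has_x(e)) return;
        size_t p = pos[e], q = split;        // swap e with the first non-X element
        std::swap(order[p], order[q]);
        pos[order[p]] = p; pos[order[q]] = q;
        ++split;
    }
    void unset_x(size_t e) {
        if (!has_x(e)) return;
        size_t p = pos[e], q = split - 1;    // swap e with the last X element
        std::swap(order[p], order[q]);
        pos[order[p]] = p; pos[order[q]] = q;
        --split;
    }
    // iterate over the elements that currently have property X (no particular order)
    auto begin() const { return order.begin(); }
    auto end() const { return order.begin() + static_cast<ptrdiff_t>(split); }
};

With this, "for (size_t e : flags)" visits only the n flagged elements, and both set_x and unset_x are O(1).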
N=3000 isn't really much. If you use a single bit for each of them, you have a structure smaller than 400 bytes. You can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: if you just allocate a pointer for each element on a 64-bit architecture, you'll use at least 8*50 = 400 bytes, much more than the bitset.
@geza: perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality, I don't care about the others, so I really need one array. Adding an element is obviously simple if I can just add it to the end of the array; now, correct me if I'm wrong here, but finding an element in that array is an O(n) operation (since the array is unsorted), and then removing it from the array requires shifting all the elements by one place, so on average this requires n/2 operations. If I use a linked list instead of a vector, then deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest at adding/deleting elements, since it's O(1) to find an element, but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iteration through the keys of std::unordered_map is slower than iteration through the keys of std::map, but it's not quantified in any way just how slow "slower" is, and how fast "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution; I'm interested in one for small "n". So if for example I have two solutions, one that's k_1*log(n) and a second that's k_2*n^2, the first might be faster in principle (and for large n), but if k_1 >> k_2 (let's say for example k_1 = 1000, k_2 = 2 and n = 20), the second one can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2). So even if addition/deletion in std::unordered_map might be done in constant time O(1), for small "n" it still matters whether that constant time is 1 nanosecond, 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worthwhile only if the number of elements increases at least tenfold) might be to keep a double index:
#include <algorithm>
#include <cstddef>
#include <vector>

class didx {
    // invariant: v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;  // 1-based position of each flagged index (0 = not flagged)
    std::vector<ptrdiff_t> flagged;  // the flagged indices, in no particular order
public:
    didx(size_t size) : indexes(size) {}
    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end() { return flagged.end(); }
    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }
    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // swap last item with item to be removed in "flagged", update indexes accordingly
            // in "flagged" we swap last element with element at index to be removed
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }
    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};

Find that unique element from the 10^5 array size [duplicate]

This question already has answers here:
How to find the only number in an array that doesn't occur twice [duplicate]
(5 answers)
Closed 7 years ago.
What would be the best algorithm for finding a number that occurs only once in a list which has all other numbers occurring exactly twice.
So, in the list of integers (let's take it as an array) each integer repeats exactly twice, except one. To find that one, what is the best algorithm?
The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.
In C:
int arr[] = {3, 2, 5, 2, 1, 5, 3};
int num = 0, i;
for (i = 0; i < 7; i++)
    num ^= arr[i];
printf("%i\n", num);
This prints "1", which is the only one that occurs once.
This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.
By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.
Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.
Bam.
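A hedged C++ sketch of that two-unique-numbers trick (hypothetical function name; unsigned values assumed so the bit operations stay well-defined):

#include <cstddef>
#include <utility>
#include <vector>

std::pair<unsigned, unsigned> two_uniques(const std::vector<unsigned>& arr)
{
    unsigned axb = 0;
    for (unsigned v : arr) axb ^= v;     // axb == a ^ b, nonzero because a != b
    unsigned mask = axb & (~axb + 1u);   // isolate one bit where a and b differ
    unsigned a = 0, b = 0;
    for (unsigned v : arr)
    {
        if (v & mask) a ^= v;            // bucket where the chosen bit is set
        else          b ^= v;            // bucket where it is clear; duplicates pair up within each bucket
    }
    return {a, b};
}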
O(N) time, O(N) memory
HT = Hash Table

HT.clear()
go over the list in order
for each item you see
    if (HT.Contains(item)) -> HT.Remove(item)
    else -> HT.Add(item)
at the end, the item in the HT is the item you are looking for.
Note (credit @Jared Updike): this system will find all odd-count instances of items.
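A hedged C++ sketch of this hash-table approach (hypothetical function name), using std::unordered_set so that only odd-count items survive:

#include <unordered_set>
#include <vector>

int odd_one_out(const std::vector<int>& list)
{
    std::unordered_set<int> seen;
    for (int x : list)
    {
        if (!seen.erase(x))   // erase returns 0 if x was not already present...
            seen.insert(x);   // ...so this is the first sighting; a second sighting removes it
    }
    return *seen.begin();     // assumes exactly one unpaired value remains
}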
Comment: I don't see how people can vote up solutions that give you N log N performance; in which universe is that "better"?
I am even more shocked that you marked as accepted an N log N solution...
I do agree, however, that if memory is required to be constant, then N log N would be (so far) the best solution.
Kyle's solution would obviously not catch situations where the data set does not follow the rules. If all numbers were in pairs the algorithm would give a result of zero, exactly the same value as if zero were the only value with a single occurrence.
If there were multiple single-occurrence values or triples, the result would be erroneous as well.
Testing the data set might well end up with a more costly algorithm, either in memory or time.
Csmba's solution does detect some erroneous data (no single-occurrence value, or more than one), but not other cases (quadruples). Regarding his solution, depending on the implementation of HT, either memory and/or time is more than O(n).
If we cannot be sure about the correctness of the input set, sorting and counting, or using a hashtable that counts occurrences with the integer itself as the hash key, would both be feasible.
I would say that using a sorting algorithm and then going through the sorted list to find the number is a good way to do it.
And now the problem is finding "the best" sorting algorithm. There are a lot of sorting algorithms, each of them with its strong and weak points, so this is quite a complicated question. The Wikipedia entry seems like a nice source of info on that.
Implementation in Ruby:
a = [1,2,3,4,123,1,2,.........]
t = a.length-1
for i in 0..t
  s = a.index(a[i])+1
  b = a[s..t]
  w = b.include?a[i]
  if w == false
    puts a[i]
  end
end
You need to specify what you mean by "best" - to some, speed is all that matters and would qualify an answer as "best" - for others, they might forgive a few hundred milliseconds if the solution was more readable.
"Best" is subjective unless you are more specific.
That said:
Iterate through the numbers; for each number, search the list for that number, and when you reach a number whose search returns only one result, you are done.
Seems like the best you could do is to iterate through the list, for every item add it to a list of "seen" items or else remove it from the "seen" if it's already there, and at the end your list of "seen" items will include the singular element. This is O(n) in regards to time and n in regards to space (in the worst case, it will be much better if the list is sorted).
The fact that they're integers doesn't really factor in, since there's nothing special you can do with adding them up... is there?
Question
I don't understand why the selected answer is "best" by any standard. O(N*lgN) > O(N), and it changes the list (or else creates a copy of it, which is still more expensive in space and time). Am I missing something?
Depends on how large/small/diverse the numbers are though. A radix sort might be applicable which would reduce the sorting time of the O(N log N) solution by a large degree.
The sorting method and the XOR method have the same time complexity. The XOR method is only O(n) if you assume that bitwise XOR of two strings is a constant time operation. This is equivalent to saying that the size of the integers in the array is bounded by a constant. In that case you can use Radix sort to sort the array in O(n).
If the numbers are not bounded, then bitwise XOR takes time O(k) where k is the length of the bit string, and the XOR method takes O(nk). Now again Radix sort will sort the array in time O(nk).
You could simply put the elements in the set into a hash until you find a collision. In ruby, this is a one-liner.
def find_dupe(array)
  h = {}
  array.detect { |e| h[e] || (h[e] = true; false) }
end
So, find_dupe([1,2,3,4,5,1]) would return 1.
This is actually a common "trick" interview question though. It is normally about a list of consecutive integers with one duplicate. In this case the interviewer is often looking for you to use the Gaussian sum of n-integers trick e.g. n*(n+1)/2 subtracted from the actual sum. The textbook answer is something like this.
def find_dupe_for_consecutive_integers(array)
  n = array.size - 1  # subtract one from array.size because of the dupe
  array.sum - n * (n + 1) / 2
end

How to efficiently *nearly* sort a list?

I have a list of items; I want to sort them, but I want a small element of randomness so they are not strictly in order, only on average ordered.
How can I do this most efficiently?
I don't mind if the quality of the randomness is not especially good, e.g. if it is simply based on the chance ordering of the input, such as an early-terminated incomplete sort.
The context is implementing a nearly-greedy search by introducing a very slight element of inexactness; this is in a tight loop, and so the speed of sorting and of calling random() both need to be considered.
My current code is to do a std::sort (this being C++) and then do a very short shuffle just in the early part of the array:
for (int i = 0; i < 3; i++) // I know I have more than 6 elements
    std::swap(order[i], order[i + rand() % 3]);
Use the first two passes of JSort: build the heap twice, but do not perform the insertion sort. If the element of randomness is not small enough, repeat.
There is an approach that (unlike an incomplete JSort) allows finer control over the resulting randomness and has time complexity dependent on the randomness (the more random the result needs to be, the lower the time complexity): use heapsort with a soft heap. For a detailed description of the soft heap, see pdf 1 or pdf 2.
You could use a standard sort algorithm (is a standard library available?) and pass a predicate that "knows", given two elements, which is less than the other, or if they are equal (returning -1, 0 or 1). In the predicate then introduce a rare (configurable) case where the answer is random, by using a random number:
pseudocode:
if random(1000) == 0 then
    return random(2)-1   <-- -1, 0 or 1 chosen randomly
Here we have a 1/1000 chance of "scrambling" two elements, but that number strictly depends on the size of the container you are sorting.
Another thing to add in that 1-in-1000 case could be to exclude the "correct" answer, because returning it would not scramble the result!
Edit:
if random(100 * container_size) == 0 then   <-- here I consider the container size
{
    if element_1 < element_2
        return random(1);              <-- do not return the "correct" value of -1
    else if element_1 > element_2
        return random(1)-1;            <-- do not return the "correct" value of 1
    else
        return random(1)==0 ? -1 : 1;  <-- do not return 0
}
in my pseudocode:
random(x) = y where 0 <= y <=x
One possibility that requires a bit more space but would guarantee that existing sort algorithms could be used without modification would be to create a copy of the sort value(s) and then modify those in some fashion prior to sorting (and then use the modified value(s) for the sort).
For example, if the data to be sorted is a simple character field Name[N] then add a field (assuming data is in a structure or class) called NameMod[N]. Fill in the NameMod with a copy of Name but add some randomization. Then 3% of the time (or some appropriate amount) change the first character of the name (e.g., change it by +/- one or two characters). And then 10% of the time change the second character +/- a few characters.
Then run it through whatever sort algorithm you prefer. The benefit is that you could easily change those percentages and randomness. And the sort algorithm will still work (e.g., it would not have problems with the compare function returning inconsistent results).
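A hedged C++ sketch of the same copy-and-perturb idea (hypothetical function name), using a numeric key for simplicity instead of the Name/NameMod character fields; because the sort compares only the fixed, pre-perturbed copies, the comparator stays perfectly consistent:

#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

void nearly_sort(std::vector<double>& data, double noise)  // noise > 0 assumed
{
    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> jitter(-noise, noise);

    std::vector<std::pair<double, double>> keyed;          // {perturbed copy of key, original value}
    keyed.reserve(data.size());
    for (double v : data)
        keyed.push_back({v + jitter(rng), v});

    std::sort(keyed.begin(), keyed.end(),
              [](const std::pair<double, double>& a, const std::pair<double, double>& b)
              { return a.first < b.first; });              // compares only the perturbed copies

    for (size_t i = 0; i < data.size(); ++i)
        data[i] = keyed[i].second;                         // write back in the (nearly sorted) order
}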
If you are sure that each element is at most k positions away from where it should be, you can reduce quicksort's N log(N) sorting time complexity down to N log(k)....
edit
More specifically, you would create k buckets, each containing N/k elements.
You can do quick sort for each bucket, which takes k * log(k) times, and then sort N/k buckets, which takes N/k log(N/k) time. Multiplying these two, you can do sorting in N log(max(N/k,k))
This can be useful because you can run sorting for each bucket in parallel, reducing total running time.
This works if you are sure that any element in the list is at most k indices away from its correct position after sorting,
but I do not think you meant to impose any such restriction.
Split the list into two equally-sized parts. Sort each part separately, using any usual algorithm. Then merge these parts. Perform some merge iterations as usual, comparing merged elements. For other merge iterations, do not compare the elements, but instead select the element from the same part as in the previous step. It is not necessary to use an RNG to decide how to treat each element; just ignore the sorting order for every N-th element (a sketch of this variant appears below).
Another variant of this approach nearly sorts an array, nearly in-place. Split the array into two parts with odd/even indexes. Sort them. (It is even possible to use a standard C++ algorithm with an appropriately modified iterator, like boost::permutation_iterator.) Reserve some limited space at the end of the array. Merge the parts, starting from the end. If the merged part is about to overwrite one of the not-yet-merged elements, just select this element. Otherwise select the element in sorted order. The level of randomness is determined by the amount of reserved space.
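A hedged C++ sketch of the first variant (hypothetical function and parameter names): sort the two halves, then merge them, but for every skip-th output element ignore the comparison and keep drawing from whichever half supplied the previous element:

#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<int> nearly_sorted_merge(std::vector<int> v, size_t skip)
{
    const size_t mid = v.size() / 2;
    std::sort(v.begin(), v.begin() + mid);
    std::sort(v.begin() + mid, v.end());

    std::vector<int> out;
    out.reserve(v.size());
    size_t i = 0, j = mid;
    bool from_left = true;                             // which half supplied the previous element
    for (size_t n = 0; i < mid && j < v.size(); ++n)
    {
        bool take_left = (skip != 0 && n % skip == 0)
                             ? from_left               // "blind" step: reuse the previous side
                             : (v[i] <= v[j]);         // normal merge comparison
        out.push_back(take_left ? v[i++] : v[j++]);
        from_left = take_left;
    }
    out.insert(out.end(), v.begin() + i, v.begin() + mid);  // drain whatever is left of the left half
    out.insert(out.end(), v.begin() + j, v.end());          // drain whatever is left of the right half
    return out;
}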
Assuming you want the array sorted in ascending order, I would do the following:
for M iterations
    pick a random index i
    pick a random index k
    if (i<k) != (array[i]<array[k]) then swap(array[i], array[k])
M controls the "sortedness" of the array - as M increases the array becomes more and more sorted. I would say a reasonable value for M is n^2 where n is the length of the array. If it is too slow to pick random elements then you can precompute their indices beforehand. If the method is still too slow then you can always decrease M at the cost of getting a poorer sort.
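A hedged C++ version of that pseudocode (hypothetical function name):

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

void partially_sort(std::vector<int>& a, size_t M)
{
    if (a.size() < 2) return;
    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<size_t> pick(0, a.size() - 1);
    for (size_t m = 0; m < M; ++m)
    {
        size_t i = pick(rng), k = pick(rng);
        if ((i < k) != (a[i] < a[k]))        // the pair is out of order: fix it
            std::swap(a[i], a[k]);
    }
}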
Take a small random subset of the data and sort it. You can use this as a map to provide an estimate of where every element should appear in the final nearly-sorted list. You can scan through the full list now and move/swap elements that are not in a good position.
This is basically O(n), assuming the small initial sorting of the subset doesn't take a long time. Hopefully you can build the map such that the estimate can be extracted quickly.
Bubblesort to the rescue!
For an unsorted array, you could pick a few random elements and bubble them up or down (maybe by rotation, which is a bit more efficient). It will be hard to control the amount of (dis)order: even if you pick all N elements, you are not sure that the whole array will be sorted, because elements are moved and you cannot ensure that you touched every element only once.
BTW: this kind of problem tends to occur in game playing engines, where the list with candidate moves is kept more-or-less sorted (because of weighted sampling), and sorting after each iteration is too expensive, and only one or a few elements are expected to move.

Algorithm to find a duplicate entry in constant space and O(n) time

Given an array of N integers such that only one integer is repeated, find the repeated integer in O(n) time and constant space. There is no bound on the value of the integers or on the value of N.
For example given an array of 6 integers as 23 45 67 87 23 47. The answer is 23
(I hope this covers ambiguous and vague part)
I searched on the net but was unable to find any such question in which the range of integers was not fixed.
Also, here is an example that answers a question similar to mine, but there he created a hash table sized by the highest integer value in C++. But C++ does not allow creating an array with 2^64 elements (on a 64-bit computer).
I am sorry I didn't mention it before: the array is immutable.
Jun Tarui has shown that any duplicate finder using O(log n) space requires at least Ω(log n / log log n) passes over the input, which means more than linear total time. In other words, your question is provably unsolvable even if you allow logarithmic space.
There is an interesting algorithm by Gopalan and Radhakrishnan that finds duplicates in one pass over the input and O((log n)^3) space, which sounds like your best bet a priori.
Radix sort has time complexity O(kn), where k > log_2 n, though k often gets viewed as a constant, albeit a large one. You obviously cannot implement a radix sort in constant space, but you could perhaps reuse your input data's space.
There are numerical tricks if you assume features about the numbers themselves. If almost all numbers between 1 and n are present, then simply add them up and subtract n(n+1)/2. If all the numbers are primes, you could cheat by ignoring the running time of division.
As an aside, there is a well-known lower bound of Ω(log_2(n!)) on comparison sorting, which suggests that google might help you find lower bounds on simple problems like finding duplicates as well.
If the array isn't sorted, you can only do it in O(n log n).
Some approaches can be found here.
If the range of the integers is bounded, you can perform a counting sort variant in O(n) time. The space complexity is O(k) where k is the upper bound on the integers(*), but that's a constant, so it's O(1).
If the range of the integers is unbounded, then I don't think there's any way to do this, but I'm not an expert at complexity puzzles.
(*) It's O(k) since there's also a constant upper bound on the number of occurrences of each integer, namely 2.
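A hedged C++ sketch of the counting variant (hypothetical function name), under the stated assumption that the values lie in a bounded range [0, k]:

#include <cstddef>
#include <vector>

long long find_repeat_bounded(const long long* a, size_t len, size_t k)
{
    std::vector<size_t> count(k + 1, 0);       // one counter per possible value, O(k) space
    for (size_t i = 0; i < len; ++i)
        if (++count[static_cast<size_t>(a[i])] == 2)
            return a[i];                       // second occurrence found
    return -1;                                 // hypothetical sentinel: no repeated value
}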
In the case where the entries are bounded by the length of the array, then you can check out Find any one of multiple possible repeated integers in a list and the O(N) time and O(1) space solution.
The generalization you mention is discussed in this follow up question: Algorithm to find a repeated number in a list that may contain any number of repeats and the O(n log^2 n) time and O(1) space solution.
The approach that would come closest to O(N) in time is probably a conventional hash table, where the hash entries are simply the numbers, used as keys. You'd walk through the list, inserting each entry in the hash table, after first checking whether it was already in the table.
Not strictly O(N), however, since hash search/insertion gets slower as the table fills up. And in terms of storage it would be expensive for large lists -- at least 3x and possibly 10-20x the size of the array of numbers.
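A hedged C++ sketch of that hash-table walk (hypothetical function name); as noted above, it uses extra space proportional to N, so it does not meet the constant-space requirement:

#include <cstddef>
#include <unordered_set>

long long first_duplicate(const long long* a, size_t len)
{
    std::unordered_set<long long> seen;
    seen.reserve(len);                        // avoid rehashing as the table fills up
    for (size_t i = 0; i < len; ++i)
        if (!seen.insert(a[i]).second)        // insert() reports whether the value was new
            return a[i];
    return -1;                                // hypothetical sentinel: no duplicate found
}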
As was already mentioned by others, I don't see any way to do it in O(n).
However, you can try a probabilistic approach by using a Bloom Filter. It will give you O(n) if you are lucky.
Since extra space is not allowed, this can't be done without comparisons. The concept of the lower bound on the time complexity of comparison sorting can be applied here to argue that the problem in its original form can't be solved in O(n) in the worst case.
We can also do it by sorting first and then scanning adjacent elements (O(n log n) because of the sort):
import java.util.Arrays;

public class DuplicateInOnePass {

    public static void duplicate()
    {
        int[] ar = {6, 7, 8, 8, 7, 9, 9, 10};
        Arrays.sort(ar);
        for (int i = 0; i < ar.length - 1; i++)
        {
            if (ar[i] == ar[i + 1])
                System.out.println("Duplicate element: " + ar[i]);
        }
    }

    public static void main(String[] args) {
        duplicate();
    }
}