find duplicate number in an array

find duplicate number in an array - c++

I am debugging below problem and post the solution I am debugging and working on, the solution or similar is posted on a couple of forums, but I think the solution has a bug when num[0] = 0 or in general num[x] = x? Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
if (nums.size() > 1)
{
int slow = nums[0];
int fast = nums[nums[0]];
while (slow != fast)
{
slow = nums[slow];
fast = nums[nums[fast]];
}
fast = 0;
while (fast != slow)
{
fast = nums[fast];
slow = nums[slow];
}
return slow;
}
return -1;
}

Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;
int findDup(vector<int>&arr){
int len = arr.size();
if(len>1){
int slow = arr[0];
int fast = arr[arr[0]];
while(slow!=fast){
slow = arr[slow];
fast = arr[arr[fast]];
}
fast = 0;
while(slow!=fast){
slow = arr[slow];
fast = arr[fast];
}
return slow;
}
return -1;
}
int main() {
vector<int>v = {1,2,2,3,4};
cout<<findDup(v)<<endl;
return 0;
}
Comment This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both outside and inside the cycle. If zeroes were allowed, this would fail if arr[0] were on a cycle. E.g., [0,1,1].

The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.

Start with two pointers to the first element: fast and slow.
Define a 'move' as incrementing fast by 2 step(positions) and slow by 1.
After each move, check if slow & fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.

Since you cannot use any additional space, using another hash table would be ruled out.
Now, coming to the approach of hashing on existing array, it can be acheived if we are allowed to modify the array in place.
Algo:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value of hash.Let's say this transformation is making the value -ve.
3)Proceed to next element.Hash the element and before applying the transformation, check if a transformation has already been applied.
4) If yes, then element is a duplicate.
Code:
for(i = 0; i < size; i++)
{
if(arr[abs(arr[i])] > 0)
arr[abs(arr[i])] = -arr[abs(arr[i])];
else
cout<< abs(arr[i]) <<endl;
}
This transformation is required since if we are to use hashing approach,then, there has to be a collision for hashing the same key.
I cant think of a way in which hashing can be used without any additional space and not modifying the array.

Related

1838. Frequency of the Most Frequent Element leetcode C++

I am trying LeetCode problem 1838. Frequency of the Most Frequent Element:
The frequency of an element is the number of times it occurs in an array.
You are given an integer array nums and an integer k. In one operation, you can choose an index of nums and increment the element at that index by 1.
Return the maximum possible frequency of an element after performing at most k operations.
I am getting a Wrong Answer error for a specific test case.
My code
int checkfreq(vector<int>nums,int k,int i)
{
//int sz=nums.size();
int counter=0;
//int i=sz-1;
int el=nums[i];
while(k!=0 && i>0)
{
--i;
while(nums[i]!=el && k>0 && i>=0)
{
++nums[i];
--k;
}
}
counter=count(nums.begin(),nums.end(),el);
return counter;
}
class Solution {
public:
int maxFrequency(vector<int>& nums, int k) {
sort(nums.begin(),nums.end());
vector<int> nums2=nums;
auto distinct=unique(nums2.begin(),nums2.end());
nums2.resize(distance(nums2.begin(),distinct));
int xx=nums.size()-1;
int counter=checkfreq(nums,k,xx);
for(int i=nums2.size()-2;i>=0;--i)
{
--xx;
int temp=checkfreq(nums,k,xx);
if(temp>counter)
counter=temp;
}
return counter;
}
};
Failing test case
Input
nums = [9968,9934,9996,9928,9934,9906,9971,9980,9931,9970,9928,9973,9930,9992,9930,9920,9927,9951,9939,9915,9963,9955,9955,9955,9933,9926,9987,9912,9942,9961,9988,9966,9906,9992,9938,9941,9987,9917,10000,9919,9945,9953,9994,9913,9983,9967,9996,9962,9982,9946,9924,9982,9910,9930,9990,9903,9987,9977,9927,9922,9970,9978,9925,9950,9988,9980,9991,9997,9920,9910,9957,9938,9928,9944,9995,9905,9937,9946,9953,9909,9979,9961,9986,9979,9996,9912,9906,9968,9926,10000,9922,9943,9982,9917,9920,9952,9908,10000,9914,9979,9932,9918,9996,9923,9929,9997,9901,9955,9976,9959,9995,9948,9994,9996,9939,9977,9977,9901,9939,9953,9902,9926,9993,9926,9906,9914,9911,9901,9912,9990,9922,9911,9907,9901,9998,9941,9950,9985,9935,9928,9909,9929,9963,9997,9977,9997,9938,9933,9925,9907,9976,9921,9957,9931,9925,9979,9935,9990,9910,9938,9947,9969,9989,9976,9900,9910,9967,9951,9984,9979,9916,9978,9961,9986,9945,9976,9980,9921,9975,9999,9922]
k = 1524
Output
Expected: 81
My code returns: 79
I tried to solve as many cases as I could. I realise this is a bruteforce approach, but don't understand why my code is giving the wrong answer.
My approach is to convert numbers from last into the specified number. I need to check these as we have to count how many maximum numbers we can convert. Then this is repeated for every number till second last number. This is basically what I was thinking while writing this code.

The reason for the different output is that your xx index is only decreased one unit at each iteration of the i loop. But that loop is iterating for the number of unique elements, while xx is an index in the original vector. When there are many duplicates, that means xx is coming nowhere near the start of the vector and so it misses opportunities there.
You could fix that problem by replacing:
--xx;
...with:
--xx;
while (xx >= 0 && nums[xx] == nums[xx+1]) --xx;
if (xx < 0) break;
That will solve the issue you raise. You can also drop the unique call, and the distinct, nums2 and i variables. The outer loop could just check that xx > 0.
Efficiency is your next problem
Your algorithm is not as efficient as needed, and other tests with huge input data will time out.
Hint 1: checkfreq's inner loop is incrementing nums[i] one unit at a time. Do you see a way to have it increase with a larger amount, so to avoid that inner loop?
Hint 2 (harder): checkfreq is often incrementing the same value in different calls -- even more so when k is large and the section of the vector that can be incremented is large. Can you think of a way to avoid that checkfreq needs to redo that much work in subsequent calls, and can only concentrate on what is different compared to what it had to calculate in the previous call?

What is the use of last--; here?

I am new to C++ as well as algorithm, can anyone helps explain me the use of the (last--;) in the middle of my code? The explanation I have got is every times the array pass through will add one more value, so we need to put a last-- out there. I have tried to remove it, it doesn't affect anything, so is there a necessary to put a last--;?
void bubbleSort(int array[], int size)
{
bool swap;
int temp;
int last = size - 1;
do
{
swap = false;
for (int count = 0; count < last; count++)
{
if (array[count] > array[count + 1])
{
temp = array[count];
array[count] = array[count + 1];
array[count + 1] = temp;
swap = true;
}
}
last--;
} while (swap != false);
}

I have tried to remove it, it doesn't affect anything,
Well, have you tested performance?
Try a huge array and measure the time it takes to sort with and without that line.
The line makes sure that the inner loop doesn't visit numbers that already have been sorted.
If you delete the line, the inner loop will iterate size times every time. In worst case that will give size x size iterations.
With the line, the inner loop will iterate size times first, then size-1, then size-2... In worst case that will give size x (size-1) / 2 iterations, i.e. aprox. half the iterations and thereby better performance.

The for loop of your algorithm loops array elements and swaps adjacent elements if a[i]>a[i+1]. After one pass of this loop, the last element passed over must surely be the largest of all elements looped over, it has been bubbled up. So, in the next round of the while loop, the for loop does not have to consider this element again. This is ensured by last--.
If you remove that line, the algorithm will work, but will do twice as many comparisons, all of which are unnecessary.

It is merely an optimization of the algorithm.
After each pass, the far part the array becomes sorted. So we don't need to check it anymore. Since the for loop is based on limit, adding limit--; just contracts the loop.
I have tried to remove it, it doesn't affect anything, so is there a necessary to put a last--;?
No, it is not for the algorithm to work. It will work just as happily without it. It just purely an optimization that will have an impact on performance, especially with larger arrays.

How to convert a simple computer algorithm into a mathematical function in order to determine the big o notation?

In my University we are learning Big O Notation. However, one question that I have in light of big o notation is, how do you convert a simple computer algorithm, say for example, a linear searching algorithm, into a mathematical function, say for example 2n^2 + 1?
Here is a simple and non-robust linear searching algorithm that I have written in c++11. Note: I have disregarded all header files (iostream) and function parameters just for simplicity. I will just be using basic operators, loops, and data types in order to show the algorithm.
int array[5] = {1,2,3,4,5};
// Variable to hold the value we are searching for
int searchValue;
// Ask the user to enter a search value
cout << "Enter a search value: ";
cin >> searchValue;
// Create a loop to traverse through each element of the array and find
// the search value
for (int i = 0; i < 5; i++)
{
if (searchValue == array[i])
{
cout << "Search Value Found!" << endl;
}
else
// If S.V. not found then print out a message
cout << "Sorry... Search Value not found" << endl;
In conclusion, how do you translate an algorithm into a mathematical function so that we can analyze how efficient an algorithm really is using big o notation? Thanks world.

First, be aware that it's not always possible to analyze the time complexity of an algorithm, there are some where we do not know their complexity, so we have to rely on experimental data.
All of the methods imply to count the number of operations done. So first, we have to define the cost of basic operations like assignation, memory allocation, control structures (if, else, for, ...). Some values I will use (working with different models can provide different values):
Assignation takes constant time (ex: int i = 0;)
Basic operations take constant time (+ - * ∕)
Memory allocation is proportional to the memory allocated: allocating an array of n elements takes linear time.
Conditions take constant time (if, else, else if)
Loops take time proportional to the number of time the code is ran.
Basic analysis
The basic analysis of a piece of code is: count the number of operations for each line. Sum those cost. Done.
int i = 1;
i = i*2;
System.out.println(i);
For this, there is one operation on line 1, one on line 2 and one on line 3. Those operations are constant: This is O(1).
for(int i = 0; i < N; i++) {
System.out.println(i);
}
For a loop, count the number of operations inside the loop and multiply by the number of times the loop is ran. There is one operation on the inside which takes constant time. This is ran n times -> Complexity is n * 1 -> O(n).
for (int i = 0; i < N; i++) {
for (int j = i; j < N; j++) {
System.out.println(i+j);
}
}
This one is more tricky because the second loop starts its iteration based on i. Line 3 does 2 operations (addition + print) which take constant time, so it takes constant time. Now, how much time line 3 is ran depends on the value of i. Enumerate the cases:
When i = 0, j goes from 0 to N so line 3 is ran N times.
When i = 1, j goes from 1 to N so line 3 is ran N-1 times.
...
Now, summing all this we have to evaluate N + N-1 + N-2 + ... + 2 + 1. The result of the sum is N*(N+1)/2 which is quadratic, so complexity is O(n^2).
And that's how it works for many cases: count the number of operations, sum all of them, get the result.
Amortized time
An important notion in complexity theory is amortized time. Let's take this example: running operation() n times:
for (int i = 0; i < N; i++) {
operation();
}
If one says that operation takes amortized constant time, it means that running n operations took linear time, even though one particular operation may have taken linear time.
Imagine you have an empty array of 1000 elements. Now, insert 1000 elements into it. Easy as pie, every insertion took constant time. And now, insert another element. For that, you have to create a new array (bigger), copy the data from the old array into the new one, and insert the element 1001. The 1000 first insertions took constant time, the last one took linear time. In this case, we say that all insertions took amortized constant time because the cost of that last insertion was amortized by the others.
Make assumptions
In some other cases, getting the number of operations require to make hypothesises. A perfect example for this is insertion sort, because it is simple and it's running time depends of how is the data ordered.
First, we have to make some more assumptions. Sorting involves two elementary operations, that is comparing two elements and swapping two elements. Here I will consider both of them to take constant time. Here is the algorithm where we want to sort array a:
for (int i = 0; i < a.length; i++) {
int j = i;
while (j > 0 && a[j] < a[j-1]) {
swap(a, i, j);
j--;
}
}
First loop is easy. No matter what happens inside, it will run n times. So the running time of the algorithm is at least linear. Now, to evaluate the second loop we have to make assumptions about how the array is ordered. Usually, we try to define the best-case, worst-case and average case running time.
Best-case: We do never enter the while loop. Is this possible ? Yes. If a is a sorted array, then a[j] > a[j-1] no matter what j is. Thus, we never enter the second loop. So, what operations are done in this case is the assignation on line 2 and the evaluation of the condition on line 3. Both take constant time. Because of the first loop, those operations are ran n times. Then in the best case, insertion sort is linear.
Worst-case: We leave the while loop only when we reach the beginning of the array. That is, we swap every element all the way to the 0 index, for every element in the array. It corresponds to an array sorted in reverse order. In this case, we end up with the first element being swapped 0 times, element 2 is swapped 1 times, element 3 is swapped 2 times, etc up to element n being swapped n-1 times. We already know the result of this: worst-case insertion is quadratic.
Average case: For the average case, we assume the items are randomly distributed inside the array. If you're interested in the maths, it involves probabilities and you can find the proof in many places. Result is quadratic.
Conclusion
Those were basics about analyzing the time complexity of an algorithm. The cases were easy, but there are some algorithms which aren't as nice. For example, you can look at the complexity of the pairing heap data structure which is much more complex.

Insertion Sort Optimization

I'm trying to practice making some different sort functions and the insertion function that I came up with is giving me some trouble. I can sort lists that are less than 30K fairly quickly. But I have a list of 100K integers and it literally takes 15 minutes for the function to complete the sort. Everything is sorted correctly, but I don't believe it should take that long.
Am I missing something with my code that is making it take so long? Many thanks in advance.
void Sort::insertion_Sort(vector <int> v)
{
int vecSize = v.size();
//for loop to advance through the vector
for (int i=0; i < vecSize; i++)
{
//delcare some variables
int cursor = i;
int inputCursor = i-1;
int temp = v[cursor];
//check to see if we are considering only a single element
if (cursor > 0)
{
//if there is more than 1 element, then we test the following.
//1. is the cursor element less than the inputCursor(which
//is the previous element)
//2. is the input cursor greater than -1
while (inputCursor > -1 && v[cursor] < v[inputCursor] )
{
//if so, we swap the variables
//then move the cursors back to check
//the previous elment and see if we need to swap again.
temp = v[cursor];
v[cursor] = v[inputCursor];
v[inputCursor] = temp;
inputCursor--;
cursor--;
}
}
}
}

Insertion sort is an O(n^2) algorithm. It's slow for large inputs. It's going to take roughly 11 times longer to process a list of 100k items than a list of 30k items. For inputs larger than 20 or so, you should use something like quicksort, which is O(n*log(n)).

The O(n^2) vs O(n*log(n)) problem, as pointed out by the other answer, is the center of this problem. I would suggest a binary search algorithm, as it is more similar to the insert algorithm, and is simplier to implement. It would look for the point of insertion dividing the already inserted vector in half, and trying to see if the integer to be inserted is greater or not of the integer in the middle. Then, it will try again to split one of the half (the one on the choosen side) and so on, recursively.
I think this is the best approach without starting from scratch.

Given an array of integers, find the first integer that is unique

Given an array of integers, find the first integer that is unique.
my solution: use std::map
put integer (number as key, its index as value) to it one by one (O(n^2 lgn)), if have duplicate, remove the entry from the map (O(lg n)), after putting all numbers into the map, iterate the map and find the key with smallest index O(n).
O(n^2 lgn) because map needs to do sorting.
It is not efficient.
other better solutions?

I believe that the following would be the optimal solution, at least based on time / space complexity:
Step 1:
Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).
Step 2:
Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).
Some possible optimizations for special cases:
Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).
Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.
For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.
The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.
Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.

All this for no reason. Just using 2 for-loops & a variable would give you a simple O(n^2) algo.
If you are taking all the trouble of using a hash map, then it might as well be what #Micheal Goldshteyn suggests
UPDATE: I know this question is 1 year old. But was looking through the questions I answered and came across this. Thought there is a better solution than using a hashtable.
When we say unique, we will have a pattern. Eg: [5, 5, 66, 66, 7, 1, 1, 77]. In this lets have moving window of 3. first consider (5,5,66). we can easily estab. that there is duplicate here. So move the window by 1 element so we get (5,66,66). Same here. move to next (66,66,7). Again dups here. next (66,7,1). No dups here! take the middle element as this has to be the first unique in the set. The left element belongs to the dup so could 1. Hence 7 is the first unique element.
space: O(1)
time: O(n) * O(m^2) = O(n) * 9 ≈ O(n)

Inserting to a map is O(log n) not O(n log n) so inserting n keys will be n log n. also its better to use set.

Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem() which is fast.
for(int x=0;x<len-1;x++)
if(memmem(&array[x+1], sizeof(int)*(len-(x+1)), array[x], sizeof(int))==NULL &&
memmem(&array[x+1], sizeof(int)*(x-1), array[x], sizeof(int))==NULL)
return array[x];

public static string firstUnique(int[] input)
{
int size = input.Length;
bool[] dupIndex = new bool[size];
for (int i = 0; i < size; ++i)
{
if (dupIndex[i])
{
continue;
}
else if (i == size - 1)
{
return input[i].ToString();
}
for (int j = i + 1; j < size; ++j)
{
if (input[i]==input[j])
{
dupIndex[j] = true;
break;
}
else if (j == size - 1)
{
return input[i].ToString();
}
}
}
return "No unique element";
}

#user3612419
Solution given you is good with some what close to O(N*N2) but further optimization in same code is possible I just added two-3 lines that you missed.
public static string firstUnique(int[] input)
{
int size = input.Length;
bool[] dupIndex = new bool[size];
for (int i = 0; i < size; ++i)
{
if (dupIndex[i])
{
continue;
}
else if (i == size - 1)
{
return input[i].ToString();
}
for (int j = i + 1; j < size; ++j)
{
if(dupIndex[j]==true)
{
continue;
}
if (input[i]==input[j])
{
dupIndex[j] = true;
dupIndex[i] = true;
break;
}
else if (j == size - 1)
{
return input[i].ToString();
}
}
}
return "No unique element";
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js