Is there a more efficient way to do this algorithm? - c++

To the best of my knowledge, this algorithm will search correctly and turn out true when it needs to. In class we are talking about Big O analysis, so this assignment is to show how the recursive search is faster than an iterative search. The point is to search for a number such that A[i] = i (find an index that is the same as the number stored at the index). This algorithm versus an iterative one only varies by about 100 nanoseconds, but sometimes the iterative one is faster. I set up the vector in main using the rand() function. I run the two algorithms a million times and record the times. The question I am asking is: is this algorithm as efficient as possible, or is there a better way to do it?
bool recursiveSearch(vector<int> &myList, int beginning, int end)
{
    int mid = (beginning + end) / 2;
    if (myList[beginning] == beginning) //check if the vector at "beginning" is
    {                                   //equal to the value of "beginning"
        return true;
    }
    else if (beginning == end) //when this is true, the recursive loop ends.
    {                          //when passed into the method: end = size - 1
        return false;
    }
    return (recursiveSearch(myList, beginning, mid) || recursiveSearch(myList, mid + 1, end));
}
Edit: The list is pre-ordered before being passed in and a check is done in main to make sure that beginning and the end both exist

One possible "improvement" would be to not copy the vector in each recursion by passing a reference:
bool recursiveSearch(const vector<int>& myList, int beginning, int end)

Unless you know something special about the ordering of the data, there is absolutely no advantage to performing a partitioned search like this.
Indeed, your code is actually [trying] to do a linear search, so it is actually implementing a simple for loop with the cost of a lot of stack and overhead.
Note that there is a weirdness in your code: If the first element doesn't match, you will call recursiveSearch(myList, beginning /*=0*/, mid). Since we already know that element 0 doesn't match, you're going to subdivide again, but only after re-testing the element.
So given a vector that has no matches, searched with beginning = 0 and end = 6, you're going to call:
recursiveSearch(myList, 0, 6);
-> < recursiveSearch(myList, 0, 3) || recursiveSearch(myList, 4, 6) >
-> < recursiveSearch(myList, 0, 1) || recursiveSearch(myList, 2, 3) > < recursiveSearch(myList, 4, 5) || recursiveSearch(myList, 6, 6) >
-> < recursiveSearch(myList, 0, 0) || recursiveSearch(myList, 1, 1) > < recursiveSearch(myList, 2, 2) || recursiveSearch(myList, 3, 3) > ...
In the end, you fail on a given index only because you reached the condition where begin and end were both that value. That seems an expensive way of eliminating each node, and the end result is not a partitioned search; it is a simple linear search that just uses a lot of stack depth to get there.
So, a simpler and faster way to do this would be:
for (size_t i = beginning; i < end; ++i) {
    if (myList[i] != i)
        continue; // likely case: no match at i, keep scanning
    return i;
}
Since we're trying to optimize here, it's worth pointing out that MSVC, GCC and Clang all assume that if expresses the likely case, so I'm optimizing here for the degenerate case where we have a large vector with no or late matches. In the case where we get lucky and we find a result early, then we're willing to pay the cost of a potential branch miss because we're leaving. I realize that the branch cache will soon figure this out for us, but again - optimizing ;-P
As others have pointed out, you could also benefit from not passing the vector by value (forcing a copy)
const std::vector<int>& myList
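The remark above about "something special about the ordering" can be made concrete. The question's edit says the list is pre-ordered; if we additionally assume the values are distinct (an assumption, not something the question states), then A[i] - i never decreases and a binary search can find an index with A[i] == i in O(log n). A minimal sketch under that assumption, with the hypothetical name hasFixedPoint:

```cpp
#include <vector>

// Assumes v is sorted in strictly increasing order (distinct integers).
// Then v[i] - i never decreases, so we can binary-search for v[i] == i.
bool hasFixedPoint(const std::vector<int>& v)
{
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (v[mid] == mid)
            return true;
        if (v[mid] < mid)
            lo = mid + 1;   // any match must lie to the right
        else
            hi = mid - 1;   // any match must lie to the left
    }
    return false;
}
```

If the vector can contain repeated values, this shortcut is no longer valid and the linear scan is the safe choice.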

An obvious "improvement" would be to run threads on all the remaining cores. Simply divvy up the vector into number of cores - 1 pieces and use a condition variable to signal the main thread when found.

If you need to find an element in an unsorted array such that A[i] == i, then the only way to do it is to go through every element until you find one.
The simplest way to do this is like so:
bool find_index_matching_value(const std::vector<int>& v)
{
    for (int i=0; i < v.size(); i++) {
        if (v[i] == i)
            return true;
    }
    return false; // no such element
}
This is O(n), and you're not going to be able to do any better than that algorithmically. So we have to turn our attention to micro-optimisations.
In general, I would be quite astonished if on modern machines, your recursive solution is faster in general than the simple solution above. While the compiler will (possibly) be able to remove the extra function call overhead (effectively turning your recursive solution into an iterative one), running through the array in order (as above) allows for optimal use of the cache, whereas, for large arrays, your partitioned search will not.


What is the time complexity of below program?

Below is a program that finds the length of the longest substring without repeating characters, given a string str.
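int test(string str) {
    int left = 0, right = 0, ans = 0;
    unordered_set<char> set;
    while(left < str.size() and right < str.size()) {
        if(set.find(str[right]) == set.end()) set.insert(str[right]);
        else {
            // drop characters from the left until the earlier copy of str[right] is reached
            while(str[left] != str[right]){
                set.erase(str[left]);
                left++;
            }
            left++; // step past the earlier copy; str[right] stays in the set
        }
        right++;
        ans = (ans > set.size() ? ans : set.size());
    }
    return ans;
}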
What is the time complexity of the above solution? Is it O(n^2) or O(n), where n is the length of the string?
Please note that I have gone through multiple questions on the internet and also read about big O, but I am still confused. To me, it looks like O(n^2) complexity due to the two while loops, but I want to confirm with the experts here.
It's O(n) on average.
What you see here is a sliding window technique (with variable window size, also called "two pointers technique").
Yes there are two loops, but if you look, any iteration of any of the two loops will always increase one of the pointers (either left or right).
In the first loop, either you call the second loop or you don't, but you will increase right at each iteration. The second loop always increases left.
Both left and right can have n different values (because both loops would stop when either right >= n or left == right).
So the first loop will have n executions (all the values of right from 0 to n-1) and the second loop can have at most n executions (all the possible values of left), which is a worst case of 2n = O(n) executions.
Worst case complexity
For the sake of completeness, please note that I wrote O(n) on average. The reason is that set.find has a complexity of O(1) on average but O(n) in the worst case. The same goes for set.erase. This is because unordered_set is implemented with a hash table, and in the very unlikely case of all your items being in the same bucket, it needs to iterate over all the items.
So even though we have O(n) iterations of the loop, some iterations could be O(n). It means that in some very unlikely cases, the execution could go up to O(n^2). You shouldn't really worry about it, as the probability of this happening is close to 0, and even though I don't know exactly what the hashing technique for char is in C++, I would bet that we will never end up with all characters in the same bucket.
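The counting argument above can be checked empirically. The following is a minimal sketch, not the asker's exact code: it is a sliding-window variant that also counts every increment of left and right, so the total can be compared against 2n. The names WindowResult and longestUnique are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

struct WindowResult {
    std::size_t longest;  // length of the longest substring without repeats
    std::size_t steps;    // total pointer increments (left plus right)
};

WindowResult longestUnique(const std::string& s)
{
    std::unordered_set<char> window;
    std::size_t left = 0, longest = 0, steps = 0;
    for (std::size_t right = 0; right < s.size(); ++right) {
        // Shrink from the left until s[right] is no longer in the window.
        while (window.count(s[right]) > 0) {
            window.erase(s[left]);
            ++left;
            ++steps;
        }
        window.insert(s[right]);
        ++steps;  // accounts for the increment of right
        if (window.size() > longest)
            longest = window.size();
    }
    return {longest, steps};
}
```

Because left never passes right and neither pointer ever moves backwards, steps can never exceed twice the string length, whatever the input.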

Optimizing this 'statistical coincidence' finding algorithm

The code below is designed to take in a vector<vector<float> > of random numbers from a Gaussian distribution, and perform the following:
Iterate simultaneously through all n columns of the vector until you encounter the first value exceeding some threshold.
Continue iterating until either a) you encounter a second value exceeding that threshold such that the value comes from a different column than the first found value, or b) you exceed some maximum number of iterations.
In the case of a), continue iterating until either c) you find a third value exceeding the threshold such that the value comes from a different column than the first and second found values, or d) you exceed some maximum number of iterations from the first found value. In the case of b), start over again, except this time start iterating at one row after the first found value.
In the case of c), add one to a counter, and jump forward some x rows. In the case of d), start over, except this time start iterating at one row after the first found value.
How I accomplish this:
In my opinion, the most challenging part is making sure all three values are contributed by a unique column. To tackle this, I used std::set. I iterate through each row of the vector<vector<float> >, then iterate through each column of that row. I check each column for a value exceeding the threshold, and store its column number in a std::set.
I continue iterating. If I reach max_iterations, I jump back to one after the first-found value, empty the set, and reset the counter. If the std::set has size 3, I add one to the counter.
My issue:
This code will need to run on multidimensional vectors of sizes on the order of tens of columns and hundreds of thousands to millions of rows. As of now, that's excruciatingly slow. I'd like to improve performance significantly, if possible.
My code:
void findRate(float thresholdVolts){
    set<size_t> cache;
    vector<size_t> index;
    size_t count = 0, found = 0;
    for(auto rowItr = waveform.begin(); rowItr != waveform.end(); ++rowItr){
        auto &row = *rowItr;
        for(auto colnItr = row.begin(); colnItr != row.end(); ++colnItr){
            auto &cell = *colnItr;
            if(abs(cell/rmsVoltage) >= (thresholdVolts/rmsVoltage)){
                cache.insert(std::distance(row.begin(), colnItr));
                index.push_back(std::distance(row.begin(), colnItr));
            }
        }
        if(cache.size() == 0) count == 0;
        if(cache.size() == 3){
            if(std::distance(rowItr, output.end()) > ((4000 - count) + 4E+6)){
                std::advance(rowItr, ((4000 - count) + 4E+6));
            }
        }
    }
}
One thing you could do right away is in your inner loop. I understand that rmsVoltage is an external variable that is constant during execution of the function.
for(auto colnItr = row.begin(); colnItr != row.end(); ++colnItr){
    auto &cell = *colnItr;
    // you can remove 2 divisions here. Divisions are the slowest
    // arithmetic instructions on any cpu
    // (the two tests are equivalent as long as rmsVoltage is positive)
    // this:
    // if(abs(cell/rmsVoltage) >= (thresholdVolts/rmsVoltage)){
    // becomes this
    if (abs(cell) >= thresholdVolts) {
        cache.insert(std::distance(row.begin(), colnItr));
        index.push_back(std::distance(row.begin(), colnItr));
    }
}
And a bit below: why are you adding a floating-point constant to a size_t?
This may cause unnecessary conversions of size_t to double and then back to size_t. Some compilers may handle this, but definitely not all.
These are relatively costly operations.
// this:
// if(std::distance(rowItr, output.end()) > ((4000 - count) + 4E+6)){
//     std::advance(rowItr, ((4000 - count) + 4E+6));
// }
// becomes this
if (std::distance(rowItr, output.end()) > (4'004'000 - count))
    std::advance(rowItr, 4'004'000 - count);
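To see why the floating-point literal matters, the types of the two expressions can be checked at compile time. This is a small sketch; the helper rowsToSkip is a hypothetical name, and the constant 4004000 is just 4000 + 4E+6 written as an integer.

```cpp
#include <cstddef>
#include <type_traits>

// Mixed arithmetic: size_t combined with a double literal yields double.
using MixedType = decltype((4000 - std::size_t{0}) + 4E+6);
static_assert(std::is_same<MixedType, double>::value,
              "size_t plus a floating-point literal is a double");

// Integer-only arithmetic stays in size_t.
using IntegerType = decltype(std::size_t{4004000} - std::size_t{0});
static_assert(std::is_same<IntegerType, std::size_t>::value,
              "size_t minus size_t stays size_t");

// Hypothetical helper: rows to skip, computed without any floating point.
// Assumes count <= 4004000; a larger count would wrap around (unsigned).
constexpr std::size_t rowsToSkip(std::size_t count)
{
    return std::size_t{4004000} - count;
}
```

The integer form also avoids passing a double to std::advance, which would otherwise be converted back to the iterator's difference type.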
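Also, after observing the memory needs of your function, you should preallocate some reasonable space for the index container using vector<>::reserve(). Note that std::set has no reserve() member; to preallocate for cache you would need a different container, such as std::unordered_set, which does provide reserve().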
Did you give us the entire algorithm? The contents of container index are not used anywhere.
Please let me know how much time you've gained with these changes.

Sorting Optimization

I'm currently following an algorithms class and thus decided it would be good practice to implement a few of the sorting algorithms and compare them.
I implemented merge sort and quick sort and then compared their run time, along with the std::sort:
My computer isn't the fastest but for 1000000 elements I get on average after 200 attempts:
std::sort -> 0.620342 seconds
quickSort -> 2.2692 seconds
mergeSort -> 2.19048 seconds
I would like to ask if possible for comments on how to improve and optimize the implementation of my code.
void quickSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    if(s >= e)
        return;
    int pivot;
    int a = s + (rand() % (e-s));
    int b = s + (rand() % (e-s));
    int c = s + (rand() % (e-s));
    //find median of the 3 random pivots
    int min = std::min(std::min(nums[a],nums[b]),nums[c]);
    int max = std::max(std::max(nums[a],nums[b]),nums[c]);
    if(nums[a] < max && nums[a] > min)
        pivot = a;
    else if(nums[b] < max && nums[b] > min)
        pivot = b;
    else
        pivot = c;
    //move the pivot to the front of the range
    int temp = nums[s];
    nums[s] = nums[pivot];
    nums[pivot] = temp;
    //partition: elements accepted by the comparator end up before index i
    int i = s + 1, j = s + 1;
    for(; j < e; j++){
        if(comparator(nums[j] , nums[s])){
            temp = nums[i];
            nums[i++] = nums[j];
            nums[j] = temp;
        }
    }
    //place the pivot between the two partitions
    temp = nums[i-1];
    nums[i-1] = nums[s];
    nums[s] = temp;
    //sort left and right of partition
    quickSort(nums, s, i-1, comparator);
    quickSort(nums, i, e, comparator);
}
Here s is the index of the first element, e the index of the element after the last. defaultComparator is just the following lambda function:
auto defaultComparator = [](int a, int b){ return a <= b; };
std::vector<int> mergeSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    std::vector<int> sorted(e-s);
    if(s == e)
        return sorted;
    int mid = (s+e)/2;
    if(s == mid){
        sorted[0] = nums[s];
        return sorted;
    }
    //pass the comparator down so a custom ordering is used at every level
    std::vector<int> left = mergeSort(nums, s, mid, comparator);
    std::vector<int> right = mergeSort(nums, mid, e, comparator);
    unsigned int i = 0, j = 0;
    unsigned int c = 0;
    while(i < left.size() || j < right.size()){
        if(i == left.size()){
            sorted[c++] = right[j++];
        }
        else if(j == right.size()){
            sorted[c++] = left[i++];
        }
        else if(comparator(left[i], right[j])){
            sorted[c++] = left[i++];
        }
        else{
            sorted[c++] = right[j++];
        }
    }
    return sorted;
}
Thank you all
The first thing I see, you're passing a std::function<> which involves a virtual call, one of the most expensive calling strategies. Give it a try with simply a template T (which might be a function) - the result will be direct function calls.
Second thing, never do this result-in-local-container (vector<int> sorted;) when optimizing and when an in-place variant exists. Do an in-place sort. The client should be aware of you sorting their vector; if they wish, they can make a copy in advance. You take a non-const reference for a reason. [1]
Third, there's a cost associated with rand() and it's far from negligible. Unless you're sure you need the randomized variant of quicksort() (and its benefits regarding 'no too bad sequence'), use just the first element as pivot. Or the middle.
Use std::swap() to swap two elements. Chances are, it gets translated to xchg (on x86 / x64) or an equivalent, which is hard to beat. Whether the optimizer identifies your intent to swap at these places without being explicit could be verified from assembly output.
The way you found the median of three elements is full of conditional moves / branches. It's simply nums[a] + nums[b] + nums[c] - max - min; but getting nums[...], min and max at the same time could also be optimized further.
Avoid i++ when aiming at speed. While most optimizers will usually create good code, there's a small chance that it's suboptimal. Be explicit when optimizing (++i after the swap), but _only_when_optimizing_.
But the most important one: valgrind/callgrind/kcachegrind. Profile, profile, profile. Only optimize what's really slow.
[1] There's an exception to this rule: const containers that you build from non-const ones. These are usually in-house types and are shared across multiple threads, hence it's better to keep them const & copy when modification is needed. In this case, you'll allocate a new container (either const or not) in your function, but you'll probably keep const one for users' convenience on API.
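Several of the points above (a template parameter instead of std::function, sorting in place, no rand(), std::swap, ++i) can be combined in one short sketch. This is only an illustration under those suggestions, not a tuned implementation; the name quickSortInPlace is hypothetical, and the range is [s, e) as in the question.

```cpp
#include <utility>
#include <vector>

// Sorts nums[s, e) in place. Compare is a template parameter, so a lambda or
// function object can be called directly instead of through std::function.
template <typename Compare>
void quickSortInPlace(std::vector<int>& nums, int s, int e, Compare comp)
{
    if (e - s < 2)
        return;
    // Use the middle element as pivot (no rand() call) and move it to the front.
    std::swap(nums[s], nums[s + (e - s) / 2]);
    int i = s + 1;
    for (int j = s + 1; j < e; ++j) {
        if (comp(nums[j], nums[s])) {
            std::swap(nums[i], nums[j]);
            ++i;
        }
    }
    std::swap(nums[s], nums[i - 1]);  // put the pivot in its final position
    quickSortInPlace(nums, s, i - 1, comp);
    quickSortInPlace(nums, i, e, comp);
}
```

A call might look like quickSortInPlace(v, 0, static_cast<int>(v.size()), [](int a, int b) { return a < b; });. Whether this actually beats the original should be confirmed with a profiler, as the answer says.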
For quick sort, use Hoare like partition scheme.
Median of 3 only needs 3 if / swap statements (effectively a bubble sort). No need for min or max check.
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
if(nums[b] > nums[c])
    std::swap(nums[b], nums[c]);
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
// use nums[b] as pivot value
For merge sort, use an entry function that does a one time creation of a working vector, then pass that vector by reference to the actual merge sort function. For top down merge sort, the indices determine the start, middle, and end of each sub-vector.
If using top down merge sort, the code can avoid copying data by alternating the direction of merge depending on the level of recursion. This can be done using two mutually recursive functions, the first one where the result ends up in the original vector, the second one where the result ends up in the working vector. The first one calls the second one twice, then merges from the working vector back to the original vector, and vice versa for the second one. For the second one, if the size == 1, then it needs to copy 1 element from the original vector to the working vector. An alternative to two functions is to pass a boolean for which direction to merge.
If using bottom up merge sort (which will be a bit faster), then each pass swaps vectors. The number of passes needed is determined up front and in the case of an odd number of passes, the first pass swaps in place, so that the data ends up in the original vector after all merge passes are done.
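As a rough illustration of the entry-function idea, here is a minimal top-down sketch that allocates the working vector once and passes it by reference. It uses the simpler merge-then-copy-back form, not the alternating-direction or bottom-up variants described above; the names mergeSortRange and mergeSortInPlace are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sorts nums[s, e) using work as scratch space; work must be at least as
// large as nums.
void mergeSortRange(std::vector<int>& nums, std::vector<int>& work,
                    std::size_t s, std::size_t e)
{
    if (e - s < 2)
        return;
    std::size_t mid = s + (e - s) / 2;
    mergeSortRange(nums, work, s, mid);
    mergeSortRange(nums, work, mid, e);

    // Merge the two sorted halves into work, then copy the range back.
    std::size_t i = s, j = mid, c = s;
    while (i < mid && j < e)
        work[c++] = (nums[j] < nums[i]) ? nums[j++] : nums[i++];
    while (i < mid)
        work[c++] = nums[i++];
    while (j < e)
        work[c++] = nums[j++];
    for (std::size_t k = s; k < e; ++k)
        nums[k] = work[k];
}

// Entry function: allocates the working vector exactly once.
void mergeSortInPlace(std::vector<int>& nums)
{
    std::vector<int> work(nums.size());
    mergeSortRange(nums, work, 0, nums.size());
}
```

Compared with the question's version, no vector is created or returned inside the recursion; the copy-back loop is the part the alternating-direction scheme would remove.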

find duplicate number in an array

I am debugging the problem below and have posted the solution I am working on. The same or a similar solution is posted on a couple of forums, but I think it has a bug when num[0] = 0, or in general when num[x] = x. Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
    if (nums.size() > 1)
    {
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast)
        {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }
        fast = 0;
        while (fast != slow)
        {
            fast = nums[fast];
            slow = nums[slow];
        }
        return slow;
    }
    return -1;
}
Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;
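int findDup(vector<int>&arr){
    int len = arr.size();
    if (len > 1) {
        int slow = arr[0];
        int fast = arr[arr[0]];
        // phase 1: advance until slow and fast meet inside the cycle
        while (slow != fast) {
            slow = arr[slow];
            fast = arr[arr[fast]];
        }
        // phase 2: restart fast from index 0; they meet at the duplicate
        fast = 0;
        while (slow != fast) {
            slow = arr[slow];
            fast = arr[fast];
        }
        return slow;
    }
    return -1;
}
int main() {
    vector<int>v = {1,2,2,3,4};
    cout << findDup(v) << endl;
    return 0;
}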
Comment: This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both outside and inside the cycle. If zeroes were allowed, this would fail if arr[0] were on a cycle. E.g., [0,1,1].
The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.
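A minimal sketch of the sum approach, valid only under the assumption stated in the update (the array holds each of 1..N exactly once, plus one extra copy of a single value). The name findDuplicateBySum is hypothetical.

```cpp
#include <vector>

// Assumes nums holds each of 1..n exactly once plus one extra copy of a
// single value, so nums.size() == n + 1. The result is meaningless if the
// duplicate appears more than twice or other values are missing.
int findDuplicateBySum(const std::vector<int>& nums)
{
    long long n = static_cast<long long>(nums.size()) - 1;
    long long sum = 0;  // long long so the sum cannot overflow int
    for (int x : nums)
        sum += x;
    return static_cast<int>(sum - n * (n + 1) / 2);
}
```

Since the problem statement allows the duplicate to be repeated more than once, this cannot replace the cycle-finding solution in general.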
Start with two pointers to the first element: fast and slow.
Define a 'move' as incrementing fast by 2 steps (positions) and slow by 1.
After each move, check if slow & fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.
Since you cannot use any additional space, using another hash table would be ruled out.
Now, coming to the approach of hashing on the existing array, it can be achieved if we are allowed to modify the array in place.
1) Start with the first element.
2) Hash the first element and apply a transformation to the value at that hash. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check if a transformation has already been applied.
4) If yes, then the element is a duplicate.
for(i = 0; i < size; i++)
{
    if(arr[abs(arr[i])] > 0)
        arr[abs(arr[i])] = -arr[abs(arr[i])];  // first visit: mark as seen
    else
        cout<< abs(arr[i]) <<endl;             // already marked: duplicate
}
This transformation is required since, if we are to use a hashing approach, there has to be a collision when hashing the same key.
I can't think of a way in which hashing can be used without any additional space and without modifying the array.

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for(int i = 0; i < n; ++i)
        for(int j = 0; j < n; ++j)
            if(arr[i] == - arr[j])
                return true;
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not quite sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
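The sort-and-two-pointers idea above could be sketched as follows. This is an illustration with a hypothetical name, hasOppositePair; it takes the vector by value so the caller's data is left untouched, which costs O(n) extra space. Sorting the caller's array in place instead keeps the extra space at O(1) apart from what the sort itself uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns true if two elements at different positions sum to zero.
bool hasOppositePair(std::vector<int> a)
{
    if (a.size() < 2)
        return false;
    std::sort(a.begin(), a.end());
    std::size_t front = 0;
    std::size_t back = a.size() - 1;
    while (front < back) {
        // Widen to long long so the sum cannot overflow int.
        long long sum = static_cast<long long>(a[front]) + a[back];
        if (sum == 0)
            return true;
        if (sum > 0)
            --back;    // sum too large: try a smaller largest element
        else
            ++front;   // sum too small: try a larger smallest element
    }
    return false;
}
```

The sort dominates the cost, so the whole check is O(n log n). Note that a single zero does not count as a pair here, while two zeros do.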
I would use a std::unordered_set and check to see if the opposite of the number already exists in the set. If not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if(res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists
Note that you can simply add all of the elements to an unordered_set and, when you are adding x, check whether -x is already in the set. The complexity of this solution is O(n). (as @Hurkyl said, thanks)
UPDATE: Second idea is: Sort the elements and then for all of the elements check (using binary search algorithm) if the opposite element exists.
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
    if (-e) is in t:
        return true
    insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).