Copy another vector for a smaller size - c++

I have two vectors. I want the second vector to be a copy of the first n elements of the first vector, where n is less than the length of the first vector (so the second vector's length should be n too).
I tried doing this with a loop:
for (int i = 0; i < n; ++i)
{
    secVector[i] = firstVector[i]; // n is less than firstVector's length
}
but the time complexity of this is O(n) and it takes a lot of time for large lengths. I wonder if there is any function that could do this faster.

This cannot be done faster than O(n) with std::vector: every one of the n elements has to be copied.
There are immutable (persistent) vectors where this can be done in logarithmic time, such as https://sinusoid.es/immer/ - it uses wide B-trees and copy-on-write to give near-vector performance with O(1) copy and O(log n) slice.
Such structures are considered exotic.
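For reference, the O(n) copy itself does not need a hand-written loop. A minimal sketch using std::vector's iterator-range constructor (the helper name is mine, not from the question):

#include <vector>

// Still O(n): there is no way around copying each of the n elements,
// but the bulk copy is typically faster in practice than an index loop.
std::vector<int> copy_prefix(const std::vector<int>& firstVector, std::size_t n) {
    return std::vector<int>(firstVector.begin(), firstVector.begin() + n);
}

An existing vector can be overwritten the same way with secVector.assign(firstVector.begin(), firstVector.begin() + n).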

Related

Can I use the normal min-heap method for solving "Merge k sorted arrays"?

We have been given k sorted arrays. Let's say k = 3:
a1={1,4,7}, a2={3,5}, a3={2,6,7}. Now we are supposed to merge these 3 arrays in sorted order, hence the output will be {1,2,3,4,5,6,7,7}.
Now in the tutorial that I am following, they maintain an index and use pairs to solve this question with min-heaps.
But my question is: since a min-heap stores the elements in sorted order, can we just simply use the push function of the min-heap for all the elements from the k arrays and then at the end print the min-heap, instead of keeping an index and making pairs, in C++?
Sure, but that's slow. You are throwing away the work that has already gone into the input arrays (that they are already sorted) and basically making the sorted array from the unsorted collection of all the elements. Concretely, if all the input arrays have average length n, then you perform k*n inserts into the heap, and then you extract the minimum k*n times. The heap operations have complexity O(log(k*n)). So the entire algorithm takes O(k*n*log(k*n)) time, which you may recognize as the time it takes to sort an unsorted array of size k*n. Surely there's a better way, because you know the input arrays are sorted.
I presume the given solution is to construct k "iterators" into the arrays, place them into a heap sorted by the value at each iterator, and then repeatedly remove the least iterator, consume its value, increment it, and place it back in the heap. The key is that the heap (which is where all the work is happening) is smaller: it contains only k elements instead of k*n. This makes every operation on the heap faster: now the heap operations in this algorithm are O(log k). The overall algorithm is now O(k*n*log k), an improvement.
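A minimal sketch of that idea in C++ (the function name and the choice of std::priority_queue are mine, not from the tutorial); the heap holds at most k entries of (value, array index, position):

#include <functional>
#include <queue>
#include <tuple>
#include <vector>

std::vector<int> merge_k_sorted(const std::vector<std::vector<int>>& arrays) {
    // (value, which array it came from, position within that array)
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;  // min-heap

    for (std::size_t k = 0; k < arrays.size(); ++k)
        if (!arrays[k].empty())
            heap.emplace(arrays[k][0], k, 0);        // seed with the first element of each array

    std::vector<int> result;
    while (!heap.empty()) {
        auto [value, k, pos] = heap.top();           // smallest remaining value
        heap.pop();
        result.push_back(value);
        if (pos + 1 < arrays[k].size())
            heap.emplace(arrays[k][pos + 1], k, pos + 1);  // advance within the same array
    }
    return result;
}

With a1={1,4,7}, a2={3,5}, a3={2,6,7} this yields {1,2,3,4,5,6,7,7}, performing O(k*n) heap operations of cost O(log k) each.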
I think this algorithm is what you are looking for:
Algorithm:
Create a min-heap and insert the first element of all k arrays. Run a loop until the size of the min-heap is greater than zero. Remove the top element of the min-heap and print it. Then insert the next element from the same array the removed element came from. If that array doesn't have any more elements, replace the root with infinity. After replacing the root, heapify the tree.
And about the time needed: insertion and deletion in a min-heap of size k take O(log k) time, so the overall time complexity is O(n * k * log k).
But my question is that since min heaps stores the elements in sorted order so can we just simply use the push function of min heap for all the elements from k arrays and then at the end printing the min heap??
The main rule for a min-heap is that each parent must be less than or equal to both of its children. It does not mean that every element in the left subtree is less than every element in the right subtree, so the array underlying a min-heap is not, in general, a sorted array. For example, {1, 3, 2, 7, 4, 5, 6} satisfies the min-heap property but is not sorted.
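A quick way to verify this with the standard library (a small illustrative example, not part of the original answer):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> h{1, 3, 2, 7, 4, 5, 6};  // every parent <= both children
    std::cout << std::boolalpha
              << std::is_heap(h.begin(), h.end(), std::greater<int>{}) << '\n'  // true: valid min-heap
              << std::is_sorted(h.begin(), h.end()) << '\n';                    // false: not sorted
}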

Space complexity of an array of pairs

So I'm wondering what the space complexity of an array of integer pairs is?
std::pair<int,int> arr[n];
I'm thinking that, because a pair has constant size and the array has n elements, the space complexity is O(2) * O(n) = O(2n) = O(n). Or is the space complexity O(n^2) because the array of pairs is still essentially a 2D array?
The correct space complexity is O(n).
The fact that it superficially resembles a 2D array is immaterial: the magnitude of the second dimension is known, and as such, it remains O(n). This would also be true if the pairs were, instead, 100-element arrays. Because the dimensions of the elements (each a 100-element array) are known, the space complexity of the structure is O(100 * n), which is O(n).
Conversely, however, if the elements were instead explicitly always the same as the size of the container as a whole, i.e. this were something like this:
int n = /*...*/;
std::vector<std::vector<int>> arr(n);
for(std::vector<int> & subarr : arr) {
    subarr.resize(n);
}
Then it would indeed be O(n^2) instead, because now both dimensions depend on the same unknown quantity.
Conversely, if the second dimension were unknown but known to not be correlated to the first dimension, you'd instead express it as O(nm), i.e. an array constructed like this:
int n = /*...*/;
int m = /*...*/;
std::vector<std::vector<int>> arr(n);
for(std::vector<int> & subarr : arr) {
    subarr.resize(m);
}
Now this might seem contradictory: "But Xirema, you just said that if we knew the dimensions were n X 100 elements, it would be O(n), but if we substitute 100 for m, would we not instead have a O(nm) or O(100n) space complexity?"
But like I said: we remove known quantities. O(2n) is equivalent to O(5n) because all we care about is the unknowns. Once an unknown becomes known, we no longer include it when evaluating Space Complexity.
Space complexity (and Runtime Complexity, etc.) are intended to function as abstract representations of an algorithm or data structure. We use these concepts to work out, at a high level conception, how well they scale to larger and larger inputs. Two different data structures, one requiring 100 bytes per element, another requiring 4 bytes per element squared, will not have consistent space ranks between each other when scaling from a small environment to a large environment; in a smaller environment, the latter data structure will consume less memory, and in a larger environment, the former data structure will consume less memory. Space/Runtime Order complexity is just a shorthand for expressing that relationship, without needing to get bogged down in the details or semantics. If details or semantics are what you care about, then you're not going to just use the Order of the structure/algorithm, you're going to actually test and measure those different approaches.
The space taken is n * sizeof(std::pair<int, int>) bytes. sizeof(std::pair<int, int>) is a constant, and O(n * (constant)) == O(n).
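A quick check of that constant (the exact value is implementation-defined, but it does not depend on n):

#include <iostream>
#include <utility>

int main() {
    // Commonly prints 8 (two 4-byte ints, no padding), but whatever it is,
    // it is a constant factor and drops out of the big-O.
    std::cout << sizeof(std::pair<int, int>) << '\n';
}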
The space complexity of an array can in general be said to be:
O(<size of array> * <size of each array element>)
Here you have:
std::pair<int,int> arr[n];
So arr is an array with n elements, and each element is a std::pair<int,int>. Let's suppose an int takes 4 bytes, so a pair of two int should take 8 bytes (these numbers could be slightly different depending on the implementation, but that doesn't matter for the purposes of complexity calculation). So the complexity would be O(n * 8), which is the same as O(n), because constants do not make a difference in complexity.
When would you have something like O(n^2)? Well, you would need a multi-dimensional array. For example, something like this:
std::pair<int,int> arr[n][m];
Now arr is an array with n elements, and each element is in turn an array of m std::pair<int,int> elements. So you have O(n * <size of array of m pairs>), which is to say O(n * m * 8), that is, O(n * m). If m happens to be the same as n, then you get O(n * n), or O(n^2).
As you can imagine, the same reasoning follows for any number of array dimensions.

Radix sorting a vector of ints using a vector of int vectors

I recently tried implementing radix sort for a vector of pairs of integers (where the second element is considered only when the first elements are equal). I did so by applying counting sort twice - first to the second element of the pair, and then to the first element. Here is how I implemented the counting sort at first:
//vector to be sorted (of size n).
vector<int> arr;
//arr gets filled here
//N-1 is the maximum number which can occur in the array. N was equal to n in my case
vector<vector<int> > mat(N);
for (int i = 0; i < n; i++)
{
    mat[arr[i]].push_back(i);
}
//array in which the sorted array will be stored
vector<int> arr2;
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < (int)mat[i].size(); j++) arr2.push_back(arr[mat[i][j]]);
}
The first for loop obviously runs in O(n). Since the 'mat' array has exactly n entries, it will be accessed at most 2n times in the second (nested) loop. This implies that the above code has a time complexity of O(n), as it should have.
I then compared the running time of this code with the STL sort() (which has a time complexity of O(n log(n))) by running both of them on an array of 10^6 elements. To my great surprise, the STL sort() ended up performing slightly better than my implementation of radix sort.
I then changed my counting sort implementation to the following:
//vector to be sorted (of size n).
vector<int> arr;
//arr gets filled here
//N-1 is the maximum number which can occur in the array. N was equal to n in my case
vector<int> temp(N,0);
for(int i=0;i<n;i++) temp[arr[i]]++;
for(int i=1;i<N;i++) temp[i]+=temp[i-1];
//array in which the sorted array will be stored
vector<int> arr2(n);
for(int i=n-1;i>=0;i--) arr2[--temp[arr[i]]]=arr[i];
This time, the radix sort did run about 5-6 times faster than the STL sort(). This observation has left me wondering why my first radix sort implementation runs so much slower than the second one, when both of them are O(n).
You are using a pseudo-linear algorithm. Its complexity is O(N + M) where
M = *std::max_element(arr.begin(), arr.end())
You cannot compare it to std::sort, whose complexity is O(N log(N)) where
N = arr.size()
The second version allocates temp once while the push_back calls in the first version can cause many allocations affecting performance.
The radix sort is a different algorithm. Check this link.
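As for the allocation overhead mentioned above: if you want to keep the bucket-of-indices structure, a possible workaround (a sketch reusing arr, n and N from the question's snippets) is to count occurrences first and reserve each bucket, so the push_back calls never reallocate:

vector<int> counts(N, 0);
for (int i = 0; i < n; i++) counts[arr[i]]++;          // how big each bucket will be

vector<vector<int> > mat(N);
for (int v = 0; v < N; v++) mat[v].reserve(counts[v]); // pre-size each bucket

for (int i = 0; i < n; i++) mat[arr[i]].push_back(i);  // no reallocations now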

How to select a column from a row-major array in sub-linear time?

Let's say that I'm given a row-major array.
int* a = (int *)malloc(9 * 9 * sizeof(int));
Look at this as a 2D 9x9 array where a (row,column) index corresponds to [row * 9 + column]
Is there a way where I can select a single column from this array in sub-linear time?
Since the columns won't be contiguous, we can't do a direct memcpy like we do to get a single row.
The linear-time solution would be obvious I guess, but I'm hoping for some sub-linear solution.
Thanks.
It is not clear what you mean by sublinear. If you consider the 2D array as NxN size, then sublinear on N is impossible. To copy N elements you need to perform N copy operations, the copy will be linear on the number of elements being copied.
The comment about memcpy seems to indicate that you mistakenly believe that memcpy is sublinear on the number of elements being copied. It is not. The advantage of memcpy is that the constant hidden in the big-O notation is small, but the operation is linear on the size of the memory being copied.
The next question is whether the big-O analysis actually makes sense. If your array is 9x9, then the effect hidden in the constant of the big-O notation can be more important than the complexity.
I don't really get what you mean but consider:
const size_t x_sz=9;
size_t x=3, y=6; //or whichever element you wish to access
int value=a[y*x_sz+x];
This will be a constant-time O(1) expression: it just calculates the offset and loads the value.
To iterate through every value in a column:
const size_t x_sz=9, y_sz=9;
size_t x=3; //or whichever column you wish to access
for(size_t y=0; y!=y_sz; ++y){
    int value=a[y*x_sz+x];
    //value is the current column value
}
Again, each iteration is constant time, so the whole iteration sequence is O(n) (linear); note that it would still be linear even if the column were stored contiguously.

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then resulting array should contain {1,2,3,4}
The solution can be achieved using two for loops but that would be O(n^2) I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
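A sketch of the heap-sort-and-collapse approach described above (order of the original elements is not preserved):

#include <algorithm>
#include <vector>

void remove_duplicates_unordered(std::vector<int>& a) {
    if (a.empty()) return;
    std::make_heap(a.begin(), a.end());   // O(n)
    std::sort_heap(a.begin(), a.end());   // heap sort: O(n log n), in place
    std::size_t dest = 0;                 // two indices collapse adjacent repeats
    for (std::size_t src = 1; src < a.size(); ++src)
        if (a[src] != a[dest])
            a[++dest] = a[src];
    a.resize(dest + 1);
}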
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i-numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
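A C++ rendering of that pseudocode (a sketch assuming std::unordered_set provides the usual average-case O(1) insert and lookup):

#include <unordered_set>
#include <vector>

std::size_t remove_duplicates(std::vector<int>& input) {
    std::unordered_set<int> seen;
    std::size_t dest = 0;
    for (int value : input)
        if (seen.insert(value).second)   // true only for the first occurrence
            input[dest++] = value;       // keep it, preserving original order
    input.resize(dest);
    return dest;                         // new logical length
}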
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With the 32-bit integers, the bitvector will require 512MB (i.e. 2^32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid would point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
// assuming e.g. std::vector<bool> bitvector(1ull << 32, false) and an
// iterator pair over the input array of (unsigned) 32-bit values
auto src = input.begin(), dest = input.begin();
while (src != input.end())
{
    if (!bitvector[*src])
    {
        bitvector[*src] = true;
        *dest = *src; dest++;
    }
    src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
Take your example. If the array elements are bounded integers, you can create a lookup bit array.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, using a hashtable would be a good choice, since hashtable lookup costs constant time on average.
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
    if( first == last ) return first;
    Fwd result = first;
    while( ++first != last ) {
        if( !(*result == *first) )
            *(++result) = *first;
    }
    return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements then you've cornered yourself and you have no other choice but to use an algorithm with runtime performance worse than O(n) for the task.
This algorithm runs in O(n) time. That's big-O of n, worst case, not amortized time. It uses O(1) space.
The example you have given is a sorted array. It is possible only in that case (given your constant-space constraint).