Space complexity of an array of pairs - C++

So I'm wondering: what is the space complexity of an array of integer pairs?
std::pair<int,int> arr[n];
I'm thinking that, because a pair has constant size and the array has n elements, the space complexity is O(2) * O(n) = O(2n) = O(n). Or is the space complexity O(n^2), because the array of pairs is still essentially a 2D array?

The correct space complexity is O(n).
The fact that it superficially resembles a 2D array is immaterial: the magnitude of the second dimension is known, and as such, it remains O(n). This would also be true if the pairs were, instead, 100-element arrays. Because the dimensions of the elements (each a 100-element array) are known, the space complexity of the structure is O(100 * n), which is O(n).
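To make that concrete, here's a small illustrative sketch (not part of the original answer; the sizes are arbitrary) showing that an array of n fixed-size elements occupies n * sizeof(element) bytes, where sizeof(element) is a compile-time constant:
#include <array>
#include <cstddef>
#include <iostream>

int main() {
    const std::size_t n = 1000;
    // Each element is a 100-int array: a known, constant 100 * sizeof(int) bytes.
    std::array<int, 100>* elems = new std::array<int, 100>[n];
    std::cout << n * sizeof(std::array<int, 100>) << " bytes total\n"; // grows linearly in n
    delete[] elems;
}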
Conversely, however, if the size of each element were instead always the same as the size of the container as a whole, i.e. if this were something like this:
int n = /*...*/;
std::vector<std::vector<int>> arr(n);
for(std::vector<int> & subarr : arr) {
    subarr.resize(n);
}
Then it would indeed be O(n^2) instead, because now both dimensions depend on the same unknown quantity.
Conversely, if the second dimension were unknown but known to not be correlated to the first dimension, you'd instead express it as O(nm), i.e. an array constructed like this:
int n = /*...*/;
int m = /*...*/;
std::vector<std::vector<int>> arr(n);
for(std::vector<int> & subarr : arr) {
    subarr.resize(m);
}
Now this might seem contradictory: "But Xirema, you just said that if we knew the dimensions were n X 100 elements, it would be O(n), but if we substitute 100 for m, would we not instead have a O(nm) or O(100n) space complexity?"
But like I said: we remove known quantities. O(2n) is equivalent to O(5n) because all we care about is the unknowns. Once an unknown becomes known, we no longer include it when evaluating space complexity.
Space complexity (and runtime complexity, etc.) are intended to function as abstract representations of an algorithm or data structure. We use these concepts to work out, at a high level, how well they scale to larger and larger inputs. Two different data structures, one requiring 100 bytes per element, another requiring 4 bytes per element squared, will not rank consistently against each other when scaling from a small environment to a large environment: in a smaller environment, the latter data structure will consume less memory, and in a larger environment, the former data structure will consume less memory. Space/runtime order complexity is just a shorthand for expressing that relationship without getting bogged down in the details or semantics. If details or semantics are what you care about, then you're not going to just use the order of the structure/algorithm; you're going to actually test and measure those different approaches.

The space taken is n * sizeof(std::pair<int, int>) bytes. sizeof(std::pair<int, int>) is a constant, and O(n * (constant)) == O(n).
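As a quick illustration (the exact byte counts are platform-dependent assumptions, typically 8 bytes for std::pair<int,int>):
#include <cstddef>
#include <iostream>
#include <utility>

int main() {
    const std::size_t n = 1000;
    std::cout << sizeof(std::pair<int, int>) << " bytes per element\n"; // a compile-time constant
    std::cout << n * sizeof(std::pair<int, int>) << " bytes total\n";   // n * constant => O(n)
}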

The space complexity of an array can in general be said to be:
O(<size of array> * <size of each array element>)
Here you have:
std::pair<int,int> arr[n];
So arr is an array with n elements, and each element is a std::pair<int,int>. Let's suppose an int takes 4 bytes, so a pair of two ints should take 8 bytes (these numbers could be slightly different depending on the implementation, but that doesn't matter for the purposes of complexity calculation). So the complexity would be O(n * 8), which is the same as O(n), because constants do not make a difference in complexity.
When would you have something like O(n^2)? Well, you would need a multi-dimensional array. For example, something like this:
std::pair<int,int> arr[n][m];
Now arr is an array with n elements, but each element is in turn an array of m std::pair<int,int> elements. So you have O(n * <size of array of m pairs>), which is to say O(n * m * 8), that is, O(n * m). If m happens to be the same as n, then you get O(n * n), or O(n^2).
As you can imagine, the same reasoning follows for any number of array dimensions.

Related

Copy another vector for a smaller size

I have two vectors. I want the second vector to copy the first n elements of the first vector, where n is less than the length of the first vector (the second vector's length should be n too).
I tried doing this with a loop:
for (int i = 0; i < n; ++i)
{
    secVector[i] = firstVector[i]; // n is less than firstVector's length
}
but the time complexity of this is O(n), and it takes a lot of time for large lengths. I wonder if there is any function that could do this faster.
This cannot be done any faster with std::vector.
There are immutable vectors where this can be done in logarithmic time, such as https://sinusoid.es/immer/ - this uses wide B-trees and copy-on-write to give near-vector performance with O(1) copy and O(lg n) slice.
Such structures are considered exotic.
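For completeness, here's a minimal sketch (variable names are assumptions) of the idiomatic standard-library ways to copy the first n elements; they are still O(n), just more concise than a hand-written loop:
#include <cassert>
#include <vector>

int main() {
    std::vector<int> firstVector = {1, 2, 3, 4, 5, 6};
    int n = 4;
    assert(n <= static_cast<int>(firstVector.size()));

    // Option 1: construct the second vector directly from an iterator range.
    std::vector<int> secVector(firstVector.begin(), firstVector.begin() + n);

    // Option 2: assign into an already-existing vector.
    std::vector<int> thirdVector;
    thirdVector.assign(firstVector.begin(), firstVector.begin() + n);

    // Both copy exactly n elements, so both are O(n), like the loop in the question.
}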

Space Complexity of an initialised pointer data structure

I have got a question.
In terms of theoretical computer science, when we analyse an algorithm, if an algorithm initialises a new data structure, then we consider that data structure as part of space complexity.
Now I am not too sure about this part then.
Let's say I have an array of int and I would like to map the elements using a map of int pointers, such as:
std::map<int*,int*> mymap;
for (int i = 1; i < arraySize; i++) {
    mymap[&arr[i-1]] = &arr[i];
}
If this algorithm were not using pointers, then we could clearly state that it is initialising a map of size n, hence the space complexity is O(n). However, in this case, where we are using pointers, what would be the space complexity of this algorithm?
The space complexity of a single pointer is the same as that of any other primitive - i.e. O(1).
std::map<K,V> is implemented as a tree of N nodes. Its space complexity is O(N*space-complexity-of-one-node), so the total space complexity in your case is O(N).
Note that the big-O notation factors out the constant multiplier: although the big-O space complexity of an std::map<Ptr1,Ptr2> and std::vector<Ptr1> is the same, the multiplier for the map is higher, because tree construction imposes its overhead for storing tree nodes and connections among them.

How to select a column from a row-major array in sub-linear time?

Let's say that I'm given a row-major array:
int* a = (int *)malloc(9 * 9 * sizeof(int));
Look at this as a 2D 9x9 array where a (row,column) index corresponds to [row * 9 + column]
Is there a way where I can select a single column from this array in sub-linear time?
Since the columns won't be contiguous, we can't do a direct memcpy like we do to get a single row.
The linear-time solution would be obvious I guess, but I'm hoping for some sub-linear solution.
Thanks.
It is not clear what you mean by sublinear. If you consider the 2D array as an NxN array, then sublinear in N is impossible. To copy N elements you need to perform N copy operations; the copy will be linear in the number of elements being copied.
The comment about memcpy seems to indicate that you mistakenly believe that memcpy is sublinear in the number of elements being copied. It is not. The advantage of memcpy is that the constant hidden in the big-O notation is small, but the operation is linear in the size of the memory being copied.
The next question is whether the big-O analysis actually makes sense. If your array is 9x9, then the effect hidden in the constant of the big-O notation can be more important than the complexity.
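An illustrative sketch (assuming the 9x9 row-major layout from the question): copying a row with memcpy and copying a column with a strided loop both touch N elements, so both are linear; memcpy merely has a smaller constant factor.
#include <cstdlib>
#include <cstring>

int main() {
    const int N = 9;
    int* a = (int*)std::calloc(N * N, sizeof(int)); // zero-initialised 9x9 array

    int row[N], col[N];
    const int r = 2, c = 3; // arbitrary row and column to extract

    std::memcpy(row, a + r * N, N * sizeof(int)); // row is contiguous: one linear copy

    for (int i = 0; i < N; ++i)                   // column is strided: still N element copies
        col[i] = a[i * N + c];

    std::free(a);
}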
I don't really get what you mean but consider:
const size_t x_sz=9;
size_t x=3, y=6; // or whichever element you wish to access
int value = a[y*x_sz + x];
This will be a constant-time, O(1) expression: it just calculates the offset and loads the value.
to iterate through every value in a column:
const size_t x_sz=9, y_sz=9;
size_t x=3; // or whichever column you wish to access
for(size_t y=0; y!=y_sz; ++y){
    int value = a[y*x_sz + x];
    // value is the current column value
}
Again, each iteration is constant time; the whole iteration sequence is therefore O(n) (linear). Note that it would still be linear even if the column were contiguous.

What is the complexity of the below program?

What is the complexity of the below program? I think it must be O(n), since there is a for loop that runs n times.
It is a program to reverse the bits in a given integer.
unsigned int reverseBits(unsigned int num)
{
    unsigned int NO_OF_BITS = sizeof(num) * 8;
    unsigned int reverse_num = 0;
    int i;
    for (i = 0; i < NO_OF_BITS; i++)
    {
        if ((num & (1 << i)))
            reverse_num |= 1 << ((NO_OF_BITS - 1) - i);
    }
    return reverse_num;
}
What is the complexity of the above program and how? Someone said that the actual complexity is O(log n), but I can't see why.
Considering your above program, the complexity is O(1) because 8 * sizeof(unsigned int) is a constant. Your program will always run in constant time.
However if n is bound to NO_OF_BITS and you make that number an algorithm parameter (which is not the case), then the complexity will be O(n).
Note that with n bits the maximal value possible for num is 2^n, so if you want to express the complexity as a function of the maximal value N allowed for num, the complexity is O(log2(N)), i.e. O(log N).
O-notation describes how the time or space requirements for an algorithm depend on the size of the input (denoted n), in the limit as n becomes very large. The input size is the number of bits required to represent the input, not the range of values that those bits can represent.
(Formally, describing an algorithm with running time t(n) as O(f(n)) means that there is some size N and some constant C for which t(n) <= C*f(n) for all n > N).
This algorithm does a fixed amount of work for each input bit, so the time complexity is O(n). It uses a working space, reverse_num, of the same size as the input (plus some asymptotically smaller variables), so the space complexity is also O(n).
This particular implementation imposes a limit on the input size, and therefore a fixed upper bound on the time and space requirements. This does not mean that the algorithm is O(1), as some answers say. O-notation describes the algorithm, not any particular implementation, and is meaningless if you place an upper bound on the input size.
If n == num, the complexity is constant, O(1), as the loop always runs a fixed number of times. The space complexity is also O(1), as it does not depend on the input.
If n is the input number, then NO_OF_BITS is O(log n) (think about it: to represent a binary number n, you need about log2(n) bits).
EDIT: Let me clarify, in the light of other responses and comments.
First, let n be the input number (num). It's important to clarify this because if we consider n to be NO_OF_BITS instead, we get a different answer!
The algorithm is conceptually O(log n). We need to reverse the bits of n. There are O(log n) bits needed to represent the number n, and reversing the bits involves a constant amount of work for each bit; hence the complexity is O(log n).
Now, in reality, built-in types in C cannot represent integers of arbitrary size. In particular, this implementation uses unsigned int to represent the input, and this type is limited to a fixed number of bits (32 on most systems). Moreover, rather than just going through as many bits as necessary (from the lowest-order bit to the highest-order bit that is 1), this implementation chooses to go through all 32 bits. Since 32 is a constant, this implementation technically runs in O(1) time.
Nonetheless, the algorithm is conceptually O(log n), in the sense that if the input were 2^5, 5 iterations would be sufficient; if the input were 2^10, 10 iterations would be sufficient; and if there were no limit on the range of numbers an unsigned int could represent and the input were 2^1000, then 1000 iterations would be necessary.
Under no circumstances is this algorithm O(n) (unless we define n to be NO_OF_BITS, in which case it is).
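For what it's worth, here is a sketch (not the code from the question, and not a drop-in replacement, since it reverses within num's own significant bit-width) where the iteration count really is proportional to log2(num):
unsigned int reverseSignificantBits(unsigned int num)
{
    unsigned int reversed = 0;
    while (num > 0) {
        reversed = (reversed << 1) | (num & 1u); // append the lowest bit of num to the result
        num >>= 1;                               // one iteration per significant bit
    }
    return reversed;
}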
You need to be clear what n is. If n is num, then of course your code is O(log n), as NO_OF_BITS is roughly log_2(n).
Also, as you are dealing with fixed-size values, the whole thing is O(1). Of course, if you are viewing this as a more general concept and intend to extend it beyond fixed-width numbers, then feel free to think of it as O(log n) in that more general context.

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then the resulting array should contain {1,2,3,4}.
The solution can be achieved using two for loops, but that would be O(n^2), I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.
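A short usage sketch for the unsorted case: sort first (O(n lg n)), then std::unique (O(n)):
#include <algorithm>
#include <iostream>

int main() {
    int a[5] = {3, 1, 2, 2, 4};
    std::sort(a, a + 5);                  // required: std::unique only removes *adjacent* duplicates
    int* new_end = std::unique(a, a + 5); // returns the new logical end of the array
    for (int* p = a; p != new_end; ++p)
        std::cout << *p << ' ';           // prints: 1 2 3 4
    std::cout << '\n';
}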
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i-numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
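Here's a C++ sketch of that pseudocode (the function name is mine), relying on the average-case O(1) lookups of std::unordered_set; note the set itself uses O(n) extra space, so only the array manipulation is in place:
#include <cstddef>
#include <unordered_set>

std::size_t dedupe(int* input, std::size_t length) {
    std::unordered_set<int> seen;
    std::size_t numdups = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (seen.find(input[i]) == seen.end()) {
            input[i - numdups] = input[i]; // shift the element left over the removed duplicates
            seen.insert(input[i]);
        } else {
            ++numdups;
        }
    }
    return length - numdups; // number of unique elements now at the front of the array
}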
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With 32-bit integers, the bitvector will require 512 MB (i.e. 2^32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid should point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin ();
while (src != input.end ())
{
    if (!bitvector[*src])
    {
        bitvector[*src] = true;
        *dest = *src; dest++;
    }
    src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
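Here's a concrete C++ sketch of the bitvector idea for the 16-bit case described above as practical (2^16 bits is only 8 KB); it assumes the values fit in uint16_t, and the function name is mine:
#include <cstddef>
#include <cstdint>
#include <vector>

std::size_t dedupe_small(std::uint16_t* input, std::size_t length) {
    std::vector<bool> seen(1u << 16, false); // one bit per possible value
    std::size_t dest = 0;
    for (std::size_t src = 0; src < length; ++src) {
        if (!seen[input[src]]) {
            seen[input[src]] = true;
            input[dest++] = input[src]; // keep the first occurrence, drop later ones
        }
    }
    return dest; // new logical length of the array
}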
Take your example. If the array elements are bounded integers, you can create a lookup bitarray.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, using a hashtable would be a good choice, since hashtable lookup cost is constant.
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
    if( first == last ) return first;
    Fwd result = first;
    while( ++first != last ) {
        if( !(*result == *first) )
            *(++result) = *first;
    }
    return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements then you've cornered yourself, and you have no other choice but to use an algorithm with runtime performance worse than O(n) for the task.
This algorithm runs in O(n) runtime. That's big-oh of n, worst case in all cases, not amortized time. It uses O(1) space.
The example you have given is a sorted array. It is possible only in that case (given your constant-space constraint).