How do you fill a dynamic matrix with 0 in C++? I mean, without:
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
I need it in O(n), not O(n*m) or O(n^2).

For the specific case where your array is going to be large and sparse and you want to zero it at allocation time, you can get some benefit from using calloc: on most platforms this results in lazy allocation with zero pages, e.g.
int **a = malloc(n * sizeof(a[0])); // allocate row pointers
int *b = calloc(n * n, sizeof(b[0])); // allocate n x n array (zeroed)
a[0] = b; // initialise row pointers
for (int i = 1; i < n; ++i)
a[i] = a[i - 1] + n;
Note that this is, of course, premature optimisation. It is also C-style coding rather than C++. You should only use this optimisation if you have established that performance is a bottleneck in your application and there is no better solution.
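For readers who prefer a C++-style equivalent of the contiguous layout above, here is a minimal sketch. It assumes a square n x n matrix of int; make_zero_matrix and at are hypothetical helper names, not part of the answer. Note that std::vector value-initialises its elements, so it normally writes the zeros itself; whether any lazy zero-page behaviour survives is implementation-dependent.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one contiguous block of n * n ints, all zero.
// std::vector value-initialises its elements, so every int starts at 0.
std::vector<int> make_zero_matrix(std::size_t n) {
    return std::vector<int>(n * n);
}

// Hypothetical accessor: maps (row, col) to the flat index row * n + col.
int& at(std::vector<int>& m, std::size_t n, std::size_t row, std::size_t col) {
    return m[row * n + col];
}
```

With this layout there are no row pointers to set up, and at(m, n, i, j) plays the role of a[i][j].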

From your code:
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
I assume that your matrix is a two-dimensional array declared as either
int matrix[a][b];
int** matrix;
In the first case, change this for loop to a single call to memset():
memset(matrix, 0, sizeof(int) * a * b);
In the second case, you will have to do it this way:
for(int n = 0; n < a; ++n)
memset(matrix[n], 0, sizeof(int) * b);
On most platforms, a call to memset() will be replaced with proper compiler intrinsic.

Not every nested loop is O(n^2).
The following code is O(n) in the number of cells:
No 1
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
Imagine that all of the cells in matrix a were copied into a one-dimensional flat array and you set all of its elements to zero with a single loop. What would the order be then? Of course you would say that is O(n).
No 2 for(int i=0;i<n*m;i++) b[i]=0;
Now compare No 2 with No 1 and ask yourself the following questions:
Does this code traverse the cells of matrix a more than once?
If I measure the time, will there be a difference?
Both answers are NO.
Both codes are O(n), where n counts the cells. A multi-level nested loop over a multi-dimensional array is linear in the number of elements.


Sort array of n elements which has k sorted sections

What is the best way to sort an section-wise sorted array as depicted in the second image?
The problem is performing a quick-sort using Message Passing Interface. The solution is performing quick-sort on array sections obtained by using MPI_Scatter() then joining the sorted pieces using MPI_Gather().
Problem is that the array as a whole is unsorted but sections of it are.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays are already within a single array other sorting algorithms may prove better.
The inputs for a sort function would be the array, its length and the number of equally sized sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter value is greater than 0, a multiple of sections and smaller than 2^32.
This is what I am currently using:
int* merge(int* input, int length, int sections)
{
    int* sub_sections_indices = new int[sections];
    int* result = new int[length];
    int section_size = length / sections;
    for (int i = 0; i < sections; i++) //initialisation
        sub_sections_indices[i] = 0;
    int min, min_index, current_index;
    for (int i = 0; i < length; i++) //merging
    {
        min_index = 0;
        min = INT_MAX;
        for (int j = 0; j < sections; j++)
        {
            if (sub_sections_indices[j] < section_size)
            {
                current_index = j * section_size + sub_sections_indices[j];
                if (input[current_index] < min)
                {
                    min = input[current_index];
                    min_index = j;
                }
            }
        }
        sub_sections_indices[min_index]++; // consume the chosen element
        result[i] = min;
    }
    delete[] sub_sections_indices;
    return result;
}
Optimizing for performance
I think this answer that maintains a min-heap of the smallest item of each sub-array is the best way to handle arbitrary input. However, for small values of k (think somewhere between 10 and 100), it might be faster to implement the more naive solutions given in the question you linked to: while maintaining the min-heap costs only O(log k) per step, it might have a higher overhead for small k than the simple linear scan from the naive solutions.
All these solutions create a copy of the input, and they maintain O(k) state.
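The min-heap approach can be sketched roughly as follows. This is an illustration under assumptions, not the linked answer's code: merge_sections is a hypothetical name, the input is taken as a std::vector<int> rather than a raw pointer, and its size is assumed to be a multiple of the number of sections, as in the question.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges `sections` equally sized sorted runs stored back to back in `input`.
// Assumes sections > 0 and input.size() is a multiple of sections.
std::vector<int> merge_sections(const std::vector<int>& input, std::size_t sections) {
    const std::size_t section_size = input.size() / sections;
    std::vector<int> result;
    result.reserve(input.size());
    if (section_size == 0) return result;

    // Heap entries are (value, index into input); the smallest value is on top.
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t s = 0; s < sections; ++s)
        heap.push({input[s * section_size], s * section_size});

    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        result.push_back(top.first);
        const std::size_t next = top.second + 1;
        // Push the successor only if it is still inside the same section.
        if (next % section_size != 0)
            heap.push({input[next], next});
    }
    return result;
}
```

The heap never holds more than one entry per section, so each step costs O(log k) for k sections, and the extra state besides the output copy is O(k).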
Optimizing for space
The only way to save space I see is to sort in-place. This will be a problem for the algorithms mentioned above. An in-place algorithm will have to swap elements, but any swap will likely destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped to, which will result in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.

Initializing large 2-dimensional array to all one value in C++

I want to initialize a large 2-dimensional array (say 1000x1000, though I'd like to go even larger) to all -1 in C++.
If my array were 1-dimensional, I know I could do:
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
However, memset does not allow for initializing all the elements of an array to be another array. I know I could just make a 1-dimensional array of length 1000000, but for readability's sake I would prefer a 2-dimensional array. I could also just loop through the 2-dimensional array to set the values after initializing it to all 0, but this bit of code will be run many times in my program and I'm not sure how fast that would be. What's the best way of achieving this?
Edited to add minimal reproducible example:
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j
I am a little bit surprised.
And I know, it is C++. And, I would never use plain C-Style arrays.
And therefore the accepted answer is maybe the best.
But, if we come back to the question
int my_array[1000];
memset(my_array, -1, sizeof(my_array));
int my_array[1000][1000];
// I want my_array[i][j] to be -1 for each i, j
Then the easiest and fastest solution is the same as the original assumption:
int my_array[1000][1000];
memset(my_array, -1, sizeof(my_array));
There is no difference. The compiler will even optimize the call away and use fast assembler loop instructions.
Sizeof is smart enough. It will do the trick. And the memory is contiguous: So, it will work. Fast. Easy.
(A good compiler will do the same optimizations for the other solutions).
Please consider.
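One caveat worth spelling out: memset writes bytes, so it only gives -1 because every byte of a two's-complement -1 is 0xFF. For a fill value that is not a repeated byte pattern (7, say), a per-row std::fill is one option. The sketch below uses a hypothetical helper name, fill_grid, and assumes a built-in 2-D array of int.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical helper: sets every element of a built-in 2-D array to `value`.
// Works for any int value, not just byte-repeating ones such as 0 or -1.
template <std::size_t Rows, std::size_t Cols>
void fill_grid(int (&grid)[Rows][Cols], int value) {
    for (auto& row : grid)
        std::fill(std::begin(row), std::end(row), value);
}
```

Calling fill_grid(my_array, -1) on the array from the question has the same effect as the memset call, and an optimizing compiler will often generate similar code for both.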
With GNU GCC you can:
int my_array[1000][1000] = { [0 ... 999] = { [0 ... 999] = -1, }, };
With any other compiler you need to:
int my_array[1000][1000] = { { -1, -1, -1, .. repeat -1 1000 times }, ... repeat { } 1000 times ... };
Side note: The following is doing assignment, not initialization:
int my_array[1000][1000];
for (auto&& i : my_array)
for (auto&& j : i)
j = -1;
Is there any real difference between doing what you wrote and doing for(int i=0; i<1000; i++){ for(int j=0; j<1000; j++){ my_array[i][j]=-1; } }?
It depends. If you have a bad compiler, you compile without optimization, etc., then yes. Most probably, no. Anyway, don't use indexes. I believe the range based for loop in this case roughly translates to something like this:
for (int (*i)[1000] = my_array; i < my_array + 1000; ++i)
for (int *j = *i; j < *i + 1000; ++j)
*j = -1;
Side note: Ach! It hurts to calculate my_array + 1000 and *i + 1000 each loop. That's like 3 operations done each loop. This is CPU time wasted! It can be easily optimized to:
for (int (*i)[1000] = my_array, (*max_i)[1000] = my_array + 1000; i < max_i; ++i)
for (int *j = *i, *max_j = *i + 1000; j < max_j; ++j)
*j = -1;
The my_array[i][j] used in your loop translates into *(*(my_array + i) + j) (see array subscript operator). By pointer arithmetic, that reads the int at address (uintptr_t)my_array + i * sizeof(int[1000]) + j * sizeof(int). Counting operations, my_array[i][j] is behind the scenes doing a multiplication, an addition, another multiplication, another addition and a dereference, so about five operations. With a bad or non-optimizing compiler, your version could be way slower.
That said, a good compiler should optimize each version to the same code, as shown here.
And are either of these significantly slower than just initializing it explicitly by typing a million -1's?
I believe assigning each array element (in this particular case of elements having the easy-to-optimize type int) will be as fast as or slower than initialization. It really depends on your particular compiler and on your architecture. A bad compiler can produce a very slow version of iterating over array elements, so it would take forever. On the other hand, a static initialization can embed the values in your program, so your program size will increase by sizeof(int) * 1000 * 1000, and during program startup it will do a plain memcpy when initializing static regions for your program. So, when compared to a properly optimized loop with assignment, you will gain nothing in terms of speed and lose tons of read-only memory.
If the array is static, it's placed in sequential memory (check this question). So char [1000][1000] is equal to char [1000000] (if your stack can hold that much).
If the array has been created with multidimensional new (say char(*x)[5] = new char[5][5]) then it's also contiguous.
If it's not (if you create it with separate dynamic allocations), then you can use the solutions found in my question to map an n-dimensional array to a single one after you have memset it.

Heap corruption while freeing memory in a recursion function

I'm implementing an algorithm to select the Kth smallest element of an array. When I try to free heap memory I get this error: CRT detected that the application wrote to memory after end of heap buffer ...
int SEQUENTIAL_SELECT(int *S , int k , int n)
if(n<=Q) // sort S and return the kth element directly
return S[k];
// subdivide S into n/Q subsequences of Q elements each
int countSets = ceil((float)n/(float)Q);
//sort each subsequnce and determine its median
int *medians = new int[countSets];
for(int i=0;i<countSets;i++)
int size = Q - (n%Q);
medians[i] = S[i*Q+size/2];
medians[i] = S[i*Q+Q/2];
// call SEQUENTIAL_SELECT recursively to find median of medians
int m = SEQUENTIAL_SELECT(medians,countSets/2,countSets);
delete[] medians;
int size = (3*n)/4;
int* s1 = new int[size]; // contains values less than m
int* s3 = new int[size]; // contains values graten than m
for(int i=0;i<size;i++)
s1[i] = INT_MAX;
s3[i] = INT_MAX;
int i1=0;
int i2=0;
int i3=0;
for(int i=0;i<n;i++)
s3[i3++] = S[i];
else if(S[i]<m)
s1[i1++] = S[i];
i2++; // count number of values equal to m
if( i1>=k )
else if( i1+i2+i3 >= k)
m = SEQUENTIAL_SELECT(s3,k-i1-i2,i3);
delete[] s3;
delete[] s1;
return m;
@Dcoder is certainly correct that Q - n%Q is incorrect; it should be n%Q. In addition, the computation size = (3*n)/4 is not reliable; try it with n = 6 (assuming, as seems certain, that Q is actually 5) given the vector [1, 2, 3, 4, 5, 0].
You could have avoided having a lot of eyes looking at your code by simply checking the values of the indexes at every array subscript assignment (although that wouldn't have caught the assignments inside of qsort, but more on that below).
It must surely have occurred to you that you are using an awful lot of memory to perform a simple operation, which could in fact be done in-place. Normally the reason to avoid doing an in-place operation would be that you need to preserve the original vector, but you're computing medians with qsort which sorts in-place, so the original vector is already modified. If that's acceptable, then there is no reason not to do the rest of the median-of-medians algorithm in-place. [1]
By the way, although I'm certainly not one of those who fears floating-point computations, there is no reason at all for countSets = ceil(float(n)/float(Q)). (n + Q - 1)/Q will work just fine. That idiom could usefully have been used in the computation of size as well, although I'm not at all sure where you got the 3n/4 computation from in the first place.
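The integer idiom can be wrapped up as in the sketch below; ceil_div is a hypothetical helper name, and the code assumes n >= 0, q > 0 and that n + q - 1 does not overflow int.

```cpp
// Hypothetical helper: number of groups of size q needed to hold n items,
// i.e. ceil(n / q) computed with integers only.
// Assumes n >= 0, q > 0 and that n + q - 1 fits in an int.
int ceil_div(int n, int q) {
    return (n + q - 1) / q;
}
```

For the case discussed above, ceil_div(6, 5) gives 2 subsequences, with no floating-point conversion involved.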
[Note 1] Hint: instead of grouping consecutively, divide the vector into five regions and find the median of the ith element of each region. Once you've found it, swap it with the ith element of the first region; once that is done, your first region -- the first fifth of the vector -- contains the medians and you can recurse on that subvector. That means actually writing out the median code as a series of comparisons, which is tedious but a lot faster than calling qsort. That also avoids the degenerate case I mentioned above, where the median-of-medians computation incorrectly returns the smallest element in the vector.

Double buffer vs double array c++

I was asked to create a matrix with 5 rows and unknown column.
And my boss wants me to use a 1-dimensional buffer, formed by concatenating the 5 row buffers.
I don't get what that means. Can someone provide me a simple example, please?
With array I can do
double[][] arr = new double[5][someNumber];
But he says then the size would be limited.
So I don't know what he means by using a DOUBLE buffer. I am not very good at C++.
Thank you very much, an example would be nice!
For R rows and C columns declare double arr[R * C], and arr[i * C + j] is the element at cell [i, j].
This generalizes to arbitrary dimensions.
Flattening out an array like that can be a very useful optimization, especially when you use dynamic arrays such as std::vector, where you can get a single dynamic array rather than one for each row.
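A minimal sketch of that flattened layout on top of std::vector follows. FlatMatrix is a hypothetical class name chosen for illustration; the column count is supplied at run time, which matches the "5 rows, unknown columns" case in the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a rows x cols matrix stored in one contiguous buffer.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Element at row i, column j lives at flat index i * cols + j.
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

Usage would look like FlatMatrix m(5, someNumber); m(2, 7) = 1.0;. Because the rows sit back to back in a single allocation, data() can be handed to code that expects one 1-dimensional buffer.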
Sounds like you're saying
double *arr[5];
for(unsigned int x = 0; x < 5; ++x)
arr[x] = new double[someNumber];
Since you know that you have 5 rows for sure and an unknown number of columns, my assumption is that this is what you're referring to.

Finding repeating signed integers with O(n) in time and O(1) in space

(This is a generalization of: Finding duplicates in O(n) time and O(1) space)
Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respectively that finds the repeating integers in a given array without altering it.
Example: Given {1, 0, -2, 4, 4, 1, 3, 1, -2} function must print 1, -2, and 4 once (in any order).
EDIT: The following solution requires a duo-bit (to represent 0, 1, and 2) for each integer in the range of the minimum to the maximum of the array. The number of necessary bytes (regardless of array size) never exceeds (INT_MAX - INT_MIN)/4 + 1.
#include <stdio.h>
#include <stdlib.h>

void set_min_max(int a[], long long unsigned size,
                 int* min_addr, int* max_addr)
{
    long long unsigned i;
    if(!size) return;
    *min_addr = *max_addr = a[0];
    for(i = 1; i < size; ++i)
    {
        if(a[i] < *min_addr) *min_addr = a[i];
        if(a[i] > *max_addr) *max_addr = a[i];
    }
}

void print_repeats(int a[], long long unsigned size)
{
    long long unsigned i;
    int min = 0, max = 0;
    long long diff, q, r;
    unsigned char* duos; /* unsigned so the shifts and additions are well defined */
    if(!size) return;
    set_min_max(a, size, &min, &max);
    diff = (long long)max - (long long)min;
    duos = calloc(diff / 4 + 1, 1);
    if(!duos) return;
    for(i = 0; i < size; ++i)
    {
        diff = (long long)a[i] - (long long)min; /* index of duo-bit corresponding to a[i] in sequence of duo-bits */
        q = diff / 4; /* index of byte containing duo-bit in "duos" */
        r = diff % 4; /* offset of duo-bit */
        switch( (duos[q] >> (6 - 2*r )) & 3 )
        {
            case 0: duos[q] += (1 << (6 - 2*r)); /* first occurrence */
                    break;
            case 1: duos[q] += (1 << (6 - 2*r)); /* second occurrence: report once */
                    printf("%d ", a[i]);
        }
    }
    free(duos);
}

int main(void)
{
    int a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
    print_repeats(a, sizeof(a)/sizeof(int));
    return 0;
}
The definition of big-O notation is that its argument is a function (f(x)) that, as the variable in the function (x) tends to infinity, there exists a constant K such that the objective cost function will be smaller than Kf(x). Typically f is chosen to be the smallest such simple function such that the condition is satisfied. (It's pretty obvious how to lift the above to multiple variables.)
This matters because that K (which you aren't required to specify) allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n^2), it allows all sorts of other O(1), O(log n), O(n), O(n log n), O(n^(3/2)), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That's right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)
So where is this going? Well, you can assume that int is a fixed size easily enough (e.g., 32-bit) and use that information to skip a lot of trouble and allocate fixed size arrays of flag bits to hold all the information that you really need. Indeed, by using two bits per potential value (one bit to say whether you've seen the value at all, another to say whether you've printed it) then you can handle the code with fixed chunk of memory of 1GB in size. That will then give you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck that's even practical on 64-bit machines.) Yes, it's going to take some time to set that memory block up, but it's constant so it's formally O(1) and so drops out of the analysis. Given that, you then have constant (but whopping) memory consumption and linear time (you've got to look at each value to see whether it's new, seen once, etc.) which is exactly what was asked for.
It's a dirty trick though. You could also try scanning the input list to work out the range allowing less memory to be used in the normal case; again, that adds only linear time and you can strictly bound the memory required as above so that's constant. Yet more trickiness, but formally legal.
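The range-scanning variant could look roughly like this in C++. It is a sketch under assumptions: find_repeats is a hypothetical name, the input is a std::vector<int>, and the repeats are returned in a vector for convenience (printing them directly would avoid that extra storage). The flag arrays grow with max - min, not with the number of elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns each value that occurs more than once in `a`, in the order of its
// second occurrence. The input is not modified.
// Flag memory is proportional to (max - min), not to a.size().
std::vector<int> find_repeats(const std::vector<int>& a) {
    std::vector<int> repeats;
    if (a.empty()) return repeats;

    const auto mm = std::minmax_element(a.begin(), a.end());
    const long long lo = *mm.first;
    const long long hi = *mm.second;
    const std::size_t range = static_cast<std::size_t>(hi - lo) + 1;

    std::vector<bool> seen(range, false);  // value encountered at least once
    std::vector<bool> done(range, false);  // value already reported

    for (int v : a) {
        const std::size_t idx = static_cast<std::size_t>(static_cast<long long>(v) - lo);
        if (done[idx]) continue;
        if (seen[idx]) {
            done[idx] = true;
            repeats.push_back(v);
        } else {
            seen[idx] = true;
        }
    }
    return repeats;
}
```

For the example array {1, 0, -2, 4, 4, 1, 3, 1, -2} this reports 4, 1 and -2, each once. The worst case still needs two bits for every possible int value, which is the constant-but-large bound described above.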
[EDIT] Sample C code (this is not C++, but I'm not good at C++; the main difference would be in how the flag arrays are allocated and managed):
#include <stdio.h>
#include <stdlib.h>
// Bit fiddling magic (unsigned words so that 1u<<31 is well defined)
int is(unsigned *ary, unsigned int value) {
    return (ary[value>>5] & (1u<<(value&31))) != 0;
}
void set(unsigned *ary, unsigned int value) {
    ary[value>>5] |= 1u<<(value&31);
}
// Main loop
void print_repeats(int a[], unsigned size) {
    unsigned *seen, *done;
    unsigned i;
    seen = calloc(134217728, sizeof(unsigned));
    done = calloc(134217728, sizeof(unsigned));
    if (!seen || !done) { free(seen); free(done); return; }
    for (i=0; i<size; i++) {
        if (is(done, (unsigned) a[i]))
            continue; // already printed
        if (is(seen, (unsigned) a[i])) {
            set(done, (unsigned) a[i]);
            printf("%d ", a[i]);
        } else
            set(seen, (unsigned) a[i]);
    }
    free(seen);
    free(done);
}
int main(void) {
    int a[] = {1,0,-2,4,4,1,3,1,-2};
    print_repeats(a, sizeof(a)/sizeof(int));
    return 0;
}
Since you have an array of integers you can use the straightforward solution with sorting the array (you didn't say it can't be modified) and printing duplicates. Integer arrays can be sorted with O(n) and O(1) time and space complexities using Radix sort. Although, in general it might require O(n) space, the in-place binary MSD radix sort can be trivially implemented using O(1) space (look here for more details).
The O(1) space constraint is intractable.
The very fact of printing the array itself requires O(N) storage, by definition.
Now, feeling generous, I'll give you that you can have O(1) storage for a buffer within your program and consider that the space taken outside the program is of no concern to you, and thus that the output is not an issue...
Still, the O(1) space constraint feels intractable, because of the immutability constraint on the input array. It might not be, but it feels so.
And your solution overflows, because you try to store O(N) information in a finite datatype.
There is a tricky problem with definitions here. What does O(n) mean?
Konstantin's answer claims that the radix sort time complexity is O(n). In fact it is O(n log M), where the base of the logarithm is the radix chosen, and M is the range of values that the array elements can have. So, for instance, a binary radix sort of 32-bit integers will have log M = 32.
So this is still, in a sense, O(n), because log M is a constant independent of n. But if we allow this, then there is a much simpler solution: for each integer in the range (all 4294967296 of them), go through the array to see if it occurs more than once. This is also, in a sense, O(n), because 4294967296 is also a constant independent of n.
I don't think my simple solution would count as an answer. But if not, then we shouldn't allow the radix sort, either.
I doubt this is possible. Assuming there is a solution, let's see how it works. I'll try to be as general as I can and show that it can't work... So, how does it work?
Without losing generality we could say we process the array k times, where k is fixed. The solution should also work when there are m duplicates, with m >> k. Thus, in at least one of the passes, we should be able to output x duplicates, where x grows when m grows. To do so, some useful information has been computed in a previous pass and stored in the O(1) storage. (The array itself can't be used, this would give O(n) storage.)
The problem: we have O(1) of information, and when we walk over the array we have to identify x numbers (to output them). We need O(1) storage that can tell us in O(1) time whether an element is in it. Or, said in a different way, we need a data structure to store n booleans (of which x are true) that uses O(1) space and takes O(1) time to query.
Does this data structure exist? If not, then we can't find all duplicates in an array with O(n) time and O(1) space (or there is some fancy algorithm that works in a completely different manner???).
I really don't see how you can have only O(1) space and not modify the initial array. My guess is that you need an additional data structure. For example, what is the range of the integers? If it's 0..N like in the other question you linked, you can have an additional count array of size N. Then in O(N) traverse the original array and increment the counter at the position of the current element. Then traverse the other array and print the numbers with count >= 2. Something like:
int* counts = new int[N](); // value-initialised to 0
for(int i = 0; i < N; i++) {
    counts[a[i]]++; // a is the input array (name assumed here)
}
for(int i = 0; i < N; i++) {
    if(counts[i] >= 2) cout << i << " ";
}
delete [] counts;
Say you can use the fact that you are not using all the space you have. You only need one more bit per possible value, and you have lots of unused bits in your 32-bit int values.
This has serious limitations, but works in this case. Numbers have to be between -n/2 and n/2 and if they repeat m times, they will be printed m/2 times.
void print_repeats(long a[], unsigned size) {
    long i, val, pos, topbit = 1 << 31, mask = ~topbit;
    for (i = 0; i < size; i++)
        a[i] &= mask;
    for (i = 0; i < size; i++) {
        val = a[i] & mask;
        if (val <= mask/2) {
            pos = val;
        } else {
            val += topbit;
            pos = size + val;
        }
        if (a[pos] < 0) {
            printf("%d\n", val);
            a[pos] &= mask;
        } else {
            a[pos] |= topbit;
        }
    }
}
void main() {
    long a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
    print_repeats(a, sizeof (a) / sizeof (long));
}