I have implemented radix sort in C++:
...
void countSort(int *tab, int size, int exp, string *comp, bool *stat) {
int output[size];
int i, index, count[10] = {0};
sysinfo(&amem);
for (i = 0; i < size; i++){
index = (tab[i]/exp)%10;
count[index]++;
}
for (i = 1; i < 10; i++)
count[i] += count[i - 1];
for (i = size - 1; i >= 0; i--) {
index = count[ (tab[i]/exp)%10 ] - 1;
output[index] = tab[i];
count[ (tab[i]/exp)%10 ]--;
}
if((*comp).rfind("<",0) == 0){
for (i = 0; i < size; i++){
tab[i] = output[i];
swap_counter++;
if(!*stat){ fprintf(stderr, "swapping\n"); }
}
}else{
for (i = 0; i < size; i++){
tab[i] = output[size-i-1];
swap_counter++;
if(!*stat){ fprintf(stderr, "swapping\n"); }
}
}
}
void radix_sort(int size, int *tab, string *comp, bool *stat) {
int m;
auto max = [tab, size](){
int m = tab[0];
for (int i = 1; i < size; i++) {
if (tab[i] > m)
m = tab[i];
}
return m;
};
m = max();
for (int exp = 1; m/exp > 0; exp *= 10)
countSort(tab, size, exp, comp, stat);
}
...
int main(){
    string comp = ">=";
    bool stat = true;
    for(int n = 100; n <= 10000; n += 100){
        int *tab = (int *) malloc(n*sizeof(int)); // allocate for the current n inside the loop
        generate_random_tab(tab, n);
        radix_sort(n, tab, &comp, &stat);
        free(tab);
    }
    return 0;
}
Now I want to check and print out how much memory radix sort uses.
I want to do this to compare how much memory different sorting algorithms use.
How can I achieve this?
I was given a hint to use sysinfo() to analyze how system memory usage changes, but I couldn't get consistent results.
(I'm working on linux)
Your program has linear memory usage: malloc(n*sizeof(int)) and int output[size], one of them on the heap, the other on the stack, so you don't really need run-time measurements; you can calculate the usage easily.
As you are on Linux, for more complicated cases there is e.g. the massif tool in valgrind, but it is focused on heap measurements (which is enough in the normal cases where you want to measure memory usage, as the stack is usually too small for serious amounts of data).
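For illustration, a typical massif run looks something like valgrind --tool=massif ./your_program (the program name here is just a placeholder); the resulting massif.out.<pid> file can then be inspected with ms_print.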
sysinfo only shows whole system memory, not individual process memory.
For process memory usage, you might try mallinfo, e.g.
struct mallinfo before = mallinfo();
// radix sort code
struct mallinfo after = mallinfo();
Now you may compare the various entries before and after your sorting code.
Be aware that this doesn't include stack memory.
I also don't know how accurate these numbers are in a C++ context.
Testing a complete example
#include <malloc.h>
#include <stdio.h>
#define SHOW(m) printf(#m "=%d-%d\n", after.m, before.m)
int main()
{
struct mallinfo before = mallinfo();
void *p1 = malloc(1000000);
//int *p2 = new int[1000000];
struct mallinfo after = mallinfo();
SHOW(arena);
SHOW(ordblks);
SHOW(smblks);
SHOW(hblks);
SHOW(hblkhd);
SHOW(usmblks);
SHOW(fsmblks);
SHOW(uordblks);
SHOW(fordblks);
SHOW(keepcost);
return 0;
}
shows different values, depending on whether you use malloc
arena=135168-0
ordblks=1-1
smblks=0-0
hblks=1-0
hblkhd=1003520-0
usmblks=0-0
fsmblks=0-0
uordblks=656-0
fordblks=134512-0
keepcost=134512-0
or new
arena=135168-135168
ordblks=1-1
smblks=0-0
hblks=1-0
hblkhd=4001792-0
usmblks=0-0
fsmblks=0-0
uordblks=73376-73376
fordblks=61792-61792
keepcost=61792-61792
It looks like C++ (Ubuntu, GCC 9.2.1) does some preallocation, but the relevant number seems to be hblkhd (on my machine).
Since your only dynamic allocation is at the beginning of main, you must do the first mallinfo there. Testing only the radix sort code reveals that there are no additional dynamic memory allocations.
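If you also want a per-process number that does include the stack and all other mappings, a common Linux approach (my own suggestion, not part of the answer above) is to read the peak resident set size via getrusage(). A minimal sketch:
#include <sys/resource.h>
#include <cstdio>

// Prints the peak resident set size of the current process.
// On Linux, ru_maxrss is reported in kilobytes.
void print_peak_rss()
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) == 0)
        std::printf("peak RSS: %ld kB\n", usage.ru_maxrss);
}
Call it before and after the sort and compare; note that the peak never decreases, so to compare algorithms fairly you would measure each one in a separate run.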
Related
In one of the tutorial videos for merge sort, it was mentioned that once the right and left sub-arrays have been merged into the parent array, we need to free the memory allocated for the left and right sub-arrays in order to reduce the space complexity. But whenever we come out of the function call, the local variables will be destroyed anyway. Do correct me if I am wrong. So will the act of freeing the memory make any difference?
Here is the code that I wrote:
#include <iostream>
#include <bits/stdc++.h>
using namespace std;
void mergeArr(int *rarr, int *larr, int *arr, int rsize, int lsize) {
int i = 0, r = 0, l = 0;
while (r < rsize && l < lsize) {
if (rarr[r] < larr[l]) {
arr[i++] = rarr[r++];
} else {
arr[i++] = larr[l++];
}
}
while (r < rsize) {
arr[i++] = rarr[r++];
}
while (l < lsize) {
arr[i++] = larr[l++];
}
}
void mergeSort(int *arr, int length) {
if (length > 1) {
int l1 = length / 2;
int l2 = length - l1;
int rarr[l1], larr[l2];
for (int i = 0; i < l1; i++) {
rarr[i] = arr[i];
}
for (int i = l1; i < length; i++) {
larr[i - l1] = arr[i];
}
mergeSort(rarr, l1);
mergeSort(larr, l2);
mergeArr(rarr, larr, arr, l1, l2);
// will free(rarr); free(larr); make any difference in space complexity
}
}
int main() {
int arr[5] = { 1, 10, 2, 7, 5 };
mergeSort(arr, 5);
for (int i = 0; i < 5; i++)
cout << arr[i] << " ";
}
I have multiple things to say about this, mostly from a C++ point of view:
int rarr[l1], larr[l2]; - this is not legal C++. It is an extension provided by g++ (variable-length arrays) and is not valid across other compilers. You should either do int* rarr = new int[l1]; or, even better, use a std::vector: std::vector<int> rarr(l1).
If you are doing the former (dynamic allocation using new, i.e. int* rarr = new int[l1]), you have to manage the memory on your own. So when you're done using it you have to delete it: delete[] rarr. Mind you, malloc and free are not C++, they are C; new and delete are the C++ way of allocating/deallocating memory.
If you use a vector, C++ will handle the deallocation of the memory, so you need not worry.
Now coming back to your original question, whether or not an idea like this would improve your space complexity: the answer is no, it won't.
Why? Think about the maximum temporary storage you're using at any one time. Look at the first level of your recursion: the space you're using there is already O(N), because larr and rarr are each of size N/2. Moreover, the space complexity is O(N) only on the assumption that the temporary storage is freed when each call finishes. If the space were never freed, the total would grow to O(N) + 2*O(N/2) + 4*O(N/4) + ..., which over the log2(N) levels of recursion sums to O(N log N), because each level of the recursion allocates about N elements in total without releasing them.
In your implementation, the left and right arrays are defined with automatic storage, so deallocation is automatic when the function returns, but this poses two problems:
a sufficiently large array will invoke undefined behavior because allocating too much space with automatic storage will cause a stack overflow.
variable sized arrays are not standard C++. You are relying on a compiler specific extension.
The maximum stack space used by your function is proportional to N, so the space complexity is O(N) as expected. You could allocate these arrays with new, and of course you would then have to deallocate them with delete, otherwise you would have memory leaks and the amount of memory lost would be proportional to N*log2(N).
An alternative approach would use a temporary array, allocated at the initial call and passed to the recursive function.
Note also that the names for the left and right arrays are very confusing. rarr is actually to the left of larr!
Here is a modified version:
#include <iostream>
using namespace std;
void mergeArr(int *larr, int *rarr, int *arr, int lsize, int rsize) {
int i = 0, r = 0, l = 0;
while (l < lsize && r < rsize) {
if (larr[l] <= rarr[r]) {
arr[i++] = larr[l++];
} else {
arr[i++] = rarr[r++];
}
}
while (l < lsize) {
arr[i++] = larr[l++];
}
while (r < rsize) {
arr[i++] = rarr[r++];
}
}
void mergeSort(int *arr, int length) {
if (length > 1) {
int l1 = length / 2;
int l2 = length - l1;
int *larr = new int[l1];
int *rarr = new int[l2];
for (int i = 0; i < l1; i++) {
larr[i] = arr[i];
}
for (int i = l1; i < length; i++) {
rarr[i - l1] = arr[i];
}
mergeSort(larr, l1);
mergeSort(rarr, l2);
mergeArr(larr, rarr, arr, l1, l2);
delete[] larr;
delete[] rarr;
}
}
int main() {
int arr[] = { 1, 10, 2, 7, 5 };
int length = sizeof arr / sizeof *arr;
mergeSort(arr, length);
for (int i = 0; i < length; i++) {
cout << arr[i] << " ";
}
return 0;
}
Freeing the temporary arrays does not influence the space complexity, because we must consider the maximum memory consumption, which is on the order of the size of the initial array.
From the performance point of view, it seems reasonable to allocate the temporary storage once at the beginning of the sort, reuse it at every stage, and free it after all the work is done.
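As a sketch of that idea (my own illustration, not code from the answers; the helper names are made up), the recursion can share a single scratch buffer allocated by a top-level wrapper:
#include <vector>

// Merges arr[lo..mid) and arr[mid..hi) using the shared scratch buffer.
static void mergeRange(int *arr, int lo, int mid, int hi, std::vector<int> &scratch) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        scratch[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i < mid) scratch[k++] = arr[i++];
    while (j < hi)  scratch[k++] = arr[j++];
    for (int m = lo; m < hi; m++) arr[m] = scratch[m];
}

static void mergeSortRange(int *arr, int lo, int hi, std::vector<int> &scratch) {
    if (hi - lo <= 1) return;
    int mid = lo + (hi - lo) / 2;
    mergeSortRange(arr, lo, mid, scratch);
    mergeSortRange(arr, mid, hi, scratch);
    mergeRange(arr, lo, mid, hi, scratch);
}

void mergeSort(int *arr, int length) {
    std::vector<int> scratch(length); // allocated once, reused by every merge
    mergeSortRange(arr, 0, length, scratch);
}
The peak extra memory is then a single buffer of N ints for the whole sort, and there is nothing to free by hand.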
int newHeight = _height/2;
int newWidth = _width/2;
double*** imageData = new double**[newHeight];
for (int i = 0; i < newHeight; i++)
{
imageData[i] = new double*[newWidth];
for (int j = 0; j < newWidth; j++)
{
imageData[i][j] = new double[4];
}
}
I have dynamically allocated this 3D matrix.
What is the fastest and safest way to free the memory here?
Here is what I have done, but this takes a few seconds; my matrix is big (1500, 2000, 4):
for (int i = 0; i != _height/2; i++)
{
for (int j = 0; j != _width/2; j++)
{
delete[] imageData[i][j];
}
delete[] imageData[i];
}
delete[] imageData;
Update
As suggested I have chosen this solution:
std::vector<std::vector<std::array<double,4>>>
The performance is great for my case.
Allocate the entire image data as one block so you can free it as one block, i.e. double* imageData = new double[width*height*4]; and delete[] imageData;, and index into it using offsets. Right now you are making 3 million separate allocations, which is thrashing your heap.
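A sketch of what that could look like (the function and variable names here are just placeholders based on the question):
#include <cstddef>

// One flat block instead of millions of small ones.
void allocateAndFree(std::size_t height, std::size_t width)
{
    double *imageData = new double[height * width * 4]; // single allocation
    // element (i, j, k) of the conceptual 3-D matrix:
    std::size_t i = 10, j = 20, k = 3;
    imageData[(i * width + j) * 4 + k] = 0.5;
    delete[] imageData; // single deallocation frees the whole image
}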
I agree with qartar's answer right up until he said "index into it using offsets". That isn't necessary. You can have your single allocation and multiple-subscript access (imageData[i][j][k]) too. I previously showed this method here; it's not difficult to adapt it for the 3-D case.
Allocation code as follows:
double*** imageData;
imageData = new double**[width];
imageData[0] = new double*[width * height];
imageData[0][0] = new double[width * height * 4];
for (int i = 0; i < width; i++) {
if (i > 0) {
imageData[i] = imageData[i-1] + height;
imageData[i][0] = imageData[i-1][0] + height * 4;
}
for (int j = 1; j < height; j++) {
imageData[i][j] = imageData[i][j-1] + 4;
}
}
Deallocation becomes simpler:
delete[] imageData[0][0];
delete[] imageData[0];
delete[] imageData;
Of course, you can and should use std::vector to do the deallocation automatically:
std::vector<double**> imageData(width);
std::vector<double*> imageDataRows(width * height);
std::vector<double> imageDataCells(width * height * 4);
for (int i = 0; i < width; i++) {
imageData[i] = &imageDataRows[i * height];
for (int j = 0; j < height; j++) {
imageData[i][j] = &imageDataCells[(i * height + j) * 4];
}
}
and deallocation is completely automatic.
See my other answer for more explanation.
Or use std::array<double,4> for the last subscript, and use 2-D dynamic allocation via this method.
A slight variation on the first idea of Ben Voigt's answer:
double ***imagedata = new double**[height];
double **p = new double*[height * width];
double *q = new double[height * width * length]; // length is the innermost dimension, 4 in the question
for (int i = 0; i < height; ++i, p += width) {
imagedata[i] = p;
for (int j = 0; j < width; ++j, q += length) {
imagedata[i][j] = q;
}
}
// ...
delete[] imagedata[0][0];
delete[] imagedata[0];
delete[] imagedata;
It is possible to do the whole thing with a single allocation, but that would introduce a bit of complexity that you might not want to pay.
Now, given that each table lookup involves a couple of back-to-back reads of pointers from memory, this solution will pretty much always be quite inferior to allocating a flat array and doing index calculations to convert a triple of indices into one flat index (and you should write a wrapper class that does these index calculations for you).
The main reason to use arrays of pointers to arrays of pointers to arrays is when your array is ragged — that is, imagedata[a][b] and imagedata[c][d] have different lengths — or maybe for swapping rows around, such as swap(imagedata[a][b], imagedata[c][d]). And under these circumstances, vector as you've used it is preferable to use until proven otherwise.
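A minimal sketch of such a wrapper class (my own illustration; the class and member names are made up):
#include <vector>
#include <cstddef>

// Flat storage with 3-D indexing; the last dimension is fixed at 4.
class Image3D {
public:
    Image3D(std::size_t height, std::size_t width)
        : width_(width), data_(height * width * 4) {}

    double &operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[(i * width_ + j) * 4 + k];
    }

private:
    std::size_t width_;
    std::vector<double> data_; // one contiguous allocation, freed automatically
};
Access then becomes img(i, j, k), with a single contiguous allocation underneath and no manual deallocation.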
The primary portion of your algorithm that is killing performance is the granularity and sheer number of allocations you're making. In total you're producing 3,001,501 allocations, broken down as:
1 allocation of 1500 double**
1500 allocations, each of which obtains 2000 double*
3,000,000 allocations, each of which obtains a double[4]
This can be considerably reduced. You can certainly do as others suggest and simply allocate 1 massive array of double, leaving the index calculation to accessor functions. Of course, if you do that you need to ensure you bring the sizes along for the ride. The result, however, will easily deliver the fastest allocation time and access performance. Using a std::vector<double> arr(d1*d2*4); and doing the offset math as needed will serve very well.
Another Way
If you are dead set on using a pointer-array approach, you can eliminate the 3,000,000 smallest allocations by obtaining both of the inferior dimensions in a single allocation per row. Your most-inferior dimension is fixed (4), thus you could do this (but you'll see in a moment there is a much more C++-centric mechanism):
double (**allocPtrsN(size_t d1, size_t d2))[4]
{
typedef double (*Row)[4];
Row *res = new Row[d1];
for (size_t i=0; i<d1; ++i)
res[i] = new double[d2][4];
return res;
}
and simply invoke as:
double (**arr3D)[4] = allocPtrsN(d1,d2);
where d1 and d2 are your two superior dimensions. This produces exactly d1 + 1 allocations, the first being d1 pointers, the remaining being d1 allocations, one for each double[d2][4].
Using C++ Standard Containers
The prior code is obviously tedious, and frankly prone to considerable error. C++ offers a tidy solution to this using a vector of vectors of fixed arrays:
std::vector<std::vector<std::array<double,4>>> arr(1500, std::vector<std::array<double,4>>(2000));
Ultimately this will use nearly the same allocation technique as the rather obtuse code shown earlier, but provides you with all the lovely benefits of the standard library while doing it. You get all those handy members of the std::vector and std::array templates, and RAII features as an added bonus.
However, there is one significant difference. The raw pointer method shown earlier will not value-initialize each allocated entity; the vector of vector of array method will. If you think it doesn't make a difference...
#include <iostream>
#include <vector>
#include <array>
#include <chrono>
using Quad = std::array<double, 4>;
using Table = std::vector<Quad>;
using Cube = std::vector<Table>;
Cube allocCube(size_t d1, size_t d2)
{
return Cube(d1, Table(d2));
}
double ***allocPtrs(size_t d1, size_t d2)
{
double*** ptrs = new double**[d1];
for (size_t i = 0; i < d1; i++)
{
ptrs[i] = new double*[d2];
for (size_t j = 0; j < d2; j++)
{
ptrs[i][j] = new double[4];
}
}
return ptrs;
}
void freePtrs(double***& ptrs, size_t d1, size_t d2)
{
for (size_t i=0; i<d1; ++i)
{
for (size_t j=0; j<d2; ++j)
delete [] ptrs[i][j];
delete [] ptrs[i];
}
delete [] ptrs;
ptrs = nullptr;
}
double (**allocPtrsN(size_t d1, size_t d2))[4]
{
typedef double (*Row)[4];
Row *res = new Row[d1];
for (size_t i=0; i<d1; ++i)
res[i] = new double[d2][4];
return res;
}
void freePtrsN(double (**p)[4], size_t d1, size_t d2)
{
for (size_t i=0; i<d1; ++i)
delete [] p[i];
delete [] p;
}
template<class C>
void print_duration(const std::chrono::time_point<C>& beg,
const std::chrono::time_point<C>& end)
{
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - beg).count() << "ms\n";
}
int main()
{
using namespace std::chrono;
time_point<system_clock> tp;
volatile double vd;
static constexpr size_t d1 = 1500, d2 = 2000;
tp = system_clock::now();
for (int i=0; i<10; ++i)
{
double ***cube = allocPtrs(d1,d2);
cube[d1/2][d2/21][1] = 1.0;
vd = cube[d1/2][d2/2][3];
freePtrs(cube, 1500, 2000);
}
print_duration(tp, system_clock::now());
tp = system_clock::now();
for (int i=0; i<10; ++i)
{
Cube cube = allocCube(1500,2000);
cube[d1/2][d2/21][1] = 1.0;
vd = cube[d1/2][d2/2][3];
}
print_duration(tp, system_clock::now());
tp = system_clock::now();
for (int i=0; i<10; ++i)
{
auto cube = allocPtrsN(d1,d2);
cube[d1/2][d2/21][1] = 1.0;
vd = cube[d1/2][d2/21][1];
freePtrsN(cube, d1, d2);
}
print_duration(tp, system_clock::now());
}
Output
5328ms
418ms
95ms
Thus, if you're planning on loading up every element with something besides zero anyway, it is something to keep in mind.
Conclusion
If performance were critical I would use the 24MB (on my implementation, anyway) single allocation, likely in a std::vector<double> arr(d1*d2*4);, and do the offset calculations as needed using one form of secondary indexing or another. Other answers proffer up interesting ideas on this, notably Ben's, which radically reduces the allocation count to a mere three blocks (data, and two secondary pointer arrays). Sorry, I didn't have time to bench it, but I would suspect the performance would be stellar. But if you really want to keep your existing technique, consider doing it in a C++ container as shown above. If the extra cycles spent value-initializing the world aren't too heavy a price to pay, it will be much easier to manage (and obviously less code to deal with in comparison to raw pointers).
Best of luck.
In Linux, the kernel doesn't allocate any physical memory pages until we actually use that memory, but I am having a hard time trying to figure out why it does in fact allocate this memory:
for(int t = 0; t < T; t++){
for(int b = 0; b < B; b++){
Matrix[t][b].length = 0;
Matrix[t][b].size = 60;
Matrix[t][b].pointers = (Node**)malloc(60*sizeof(Node*));
}
}
I then access this data structure to add one element to it like this:
Node* elem = NULL;
Matrix[a][b].pointers[ Matrix[a][b].length ] = elem;
Matrix[a][b].length++;
Essentially, I run my program with htop on the side, and Linux does allocate more memory if I increase the number "60" I have in the code above. Why? Shouldn't it only allocate one page when the first element is added to the array?
It depends on how your Linux system is configured.
Here's a simple C program that tries to allocate 1TB of memory and touches some of it.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
char *array[1000];
int i;
for (i = 0; i < 1000; ++i)
{
if (NULL == (array[i] = malloc((int) 1e9)))
{
perror("malloc failed!");
return -1;
}
array[i][0] = 'H';
}
for (i = 0; i < 1000; ++i)
printf("%c", array[i][0]);
printf("\n");
sleep(10);
return 0;
}
When I run top by its side, it says the VIRT memory usage goes to 931g (where g means GiB), while RES only goes to 4380 KiB.
Now, when I change my system to use a different overcommit strategy by /sbin/sysctl -w vm.overcommit_memory=2 and re-run it, I get:
malloc failed!: Cannot allocate memory
So your system may be using a different overcommit strategy than you expected. For more information read this.
Your assumption that malloc / new doesn't cause any memory to be written, and therefore assigned physical memory by the OS, is incorrect (for the memory allocator implementation you have).
I've reproduced the behavior you are describing in the following simple program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
char **array[128][128];
int size;
int i, j;
if (1 == argc || 0 >= (size = atoi(argv[1])))
fprintf(stderr, "usage: %s <num>; where num > 0\n", argv[0]), exit(-1);
for (i = 0; i < 128; ++i)
for (j = 0; j < 128; ++j)
if (NULL == (array[i][j] = malloc(size * sizeof(char*))))
{
fprintf(stderr, "malloc failed when i = %d, j = %d\n", i, j);
perror(NULL);
return -1;
}
sleep(10);
return 0;
}
When I run this with various small size parameters as input, the VIRT and RES memory footprints (as reported by top) grow together in-step, even though I'm not explicitly touching the inner arrays that I'm allocating.
This basically holds true until size exceeds ~512. Thereafter, RES stays constant at 64 MiB while VIRT can be extremely large (e.g. - 1220 GiB when size is 10M). That is because 512 * 8 = 4096, which is a common virtual page size on Linux systems, and 128 * 128 * 4096 B = 64 MiB.
Therefore, it looks like the first page of every allocation is being mapped to physical memory, probably because malloc / new itself is writing to part of the allocation for its own internal bookkeeping. Of course, lots of small allocations may fit in and be placed on the same page, so only one page gets mapped to physical memory for many such allocations.
In your code example, changing the size of the array matters because it means fewer of those arrays can fit on one page, therefore requiring more memory pages to be touched by malloc / new itself (and therefore mapped to physical memory by the OS) over the run of the program.
When you use 60, that takes about 480 bytes, so ~8 of those allocations can be put on one page. When you use 100, that takes about 800 bytes, so only ~5 of those allocations can be put on one page. So, I'd expect the "100 program" to use about 8/5ths as much memory as the "60 program", which seems to be a big enough difference to make your machine start swapping to stable storage.
If each of your smaller "60" allocations were already over 1 page in size, then changing it to be bigger "100" wouldn't affect your program's initial physical memory usage, just like you originally expected.
PS - I think whether you explicitly touch the initial page of your allocations or not will be irrelevant as malloc / new will have already done so (for the memory allocator implementation you have).
Here's a sketch of what you could do if you typically expect that your b arrays will usually be small, usually be less than 2^X pointers (X = 5 in the code below), but also handles exceptional cases where they get even bigger.
You can adjust X down if your expected usage doesn't match. You could also adjust the minimum size arrays up from 0 (and not allocate the smaller 2^i levels), if you expect most of your arrays will usually use at least 2^Y pointers (e.g. - Y = 3).
If you think that actually X == Y (e.g. - 4) for your usage pattern, then you can just do one allocation of B * (0x1 << X) * sizeof(Node*) and divvy up that T array to your b's. Then if a b array needs to exceed 2^X pointers, then resort to malloc for it followed by realloc's if it needs to grow even further.
The main point here is that the initial allocation will map to very little physical memory, addressing the problem that initially spurred your original question.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define T 1278
#define B 131072
#define CAP_MAX_LG2 5
#define CAP_MAX (0x1 << CAP_MAX_LG2) // pre-alloc T's to handle all B arrays of length up to 2^CAP_MAX_LG2
typedef struct Node Node;
typedef struct
{
int t; // so a matrix element can know to which T_Allocation it belongs
int length;
int cap_lg2; // log base 2 of capacity; -1 if capacity is zero
Node **pointers;
} MatrixElem;
typedef struct
{
Node **base; // pre-allocs B * 2^(CAP_MAX_LG2 + 1) Node pointers; every b array can be any of { 0, 1, 2, 4, 8, ..., CAP_MAX } capacity
Node **frees_pow2[CAP_MAX_LG2 + 1]; // frees_pow2[i] will point at the next free array of 2^i pointers to Node to allocate to a growing b array
} T_Allocation;
MatrixElem Matrix[T][B];
T_Allocation T_Allocs[T];
int Node_init(Node *n) { return 0; } // just a dummy
void Node_fini(Node *n) { } // just a dummy
int Node_eq(const Node *n1, const Node *n2) { return 0; } // just a dummy
void Init(void)
{
for(int t = 0; t < T; t++)
{
T_Allocs[t].base = malloc(B * (0x1 << (CAP_MAX_LG2 + 1)) * sizeof(Node*));
if (NULL == T_Allocs[t].base)
abort();
T_Allocs[t].frees_pow2[0] = T_Allocs[t].base;
for (int x = 1; x <= CAP_MAX_LG2; ++x)
T_Allocs[t].frees_pow2[x] = &T_Allocs[t].base[B * ((0x1 << x) - 1)]; // classes 0..x-1 occupy B * (2^x - 1) pointers before this one
for(int b = 0; b < B; b++)
{
Matrix[t][b].t = t;
Matrix[t][b].length = 0;
Matrix[t][b].cap_lg2 = -1;
Matrix[t][b].pointers = NULL;
}
}
}
Node *addElement(MatrixElem *elem)
{
if (-1 == elem->cap_lg2 || elem->length == (0x1 << elem->cap_lg2)) // elem needs a bigger pointers array to add an element
{
int new_cap_lg2 = elem->cap_lg2 + 1;
int new_cap = (0x1 << new_cap_lg2);
if (new_cap_lg2 <= CAP_MAX_LG2) // new b array can still fit in pre-allocated space in T
{
Node **new_pointers = T_Allocs[elem->t].frees_pow2[new_cap_lg2];
memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
elem->pointers = new_pointers;
T_Allocs[elem->t].frees_pow2[new_cap_lg2] += new_cap;
}
else if (elem->cap_lg2 == CAP_MAX_LG2) // exceeding pre-alloc'ed arrays in T; use malloc
{
Node **new_pointers = malloc(new_cap * sizeof(Node*));
if (NULL == new_pointers)
return NULL;
memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
elem->pointers = new_pointers;
}
else // already exceeded pre-alloc'ed arrays in T; use realloc
{
Node **new_pointers = realloc(elem->pointers, new_cap * sizeof(Node*));
if (NULL == new_pointers)
return NULL;
elem->pointers = new_pointers;
}
++elem->cap_lg2;
}
Node *ret = malloc(sizeof(Node));
if (ret)
{
Node_init(ret);
elem->pointers[elem->length] = ret;
++elem->length;
}
return ret;
}
int removeElement(const Node *a, MatrixElem *elem)
{
int i;
for (i = 0; i < elem->length && !Node_eq(a, elem->pointers[i]); ++i);
if (i == elem->length)
return -1;
Node_fini(elem->pointers[i]);
free(elem->pointers[i]);
--elem->length;
memmove(&elem->pointers[i], &elem->pointers[i+1], sizeof(Node*) * (elem->length - i));
return 0;
}
int main()
{
return 0;
}
I'm trying to create a magic square program that will print four different grid sizes (5x5, 7x7, 9x9, 15x15). The error I'm getting is that the magsquare array within the function needs a constant integer as its size. (I can't use pointers.) This is a class assignment.
#include <iostream>
#include <iomanip>
using namespace std;
void magicSquare(int n){
int magsquare[n][n] = { 0 }; /*THIS is the error with [n][n]*/
int gridsize = n * n;
int row = 0;
int col = n / 2;
for (int i = 1; i <= gridsize; ++i)
{
magsquare[row][col] = i;
row--;
col++;
if (i%n == 0)
{
row += 2;
--col;
}
else
{
if (col == n)
col -= n;
else if (row < 0)
row += n;
}
}
for (int i = 0; i < n; i++){
for (int j = 0; j < n; j++){
cout << setw(3) << right << magsquare[i][j];
}
cout << endl;
}
}
int main(){
int n = 5;
magicSquare(n);
return 0;
}
Indentation may look incorrect, but it's right. Sorry.
The failure is because standard C++ cannot allocate a dynamically sized array on the stack, as you are trying to do:
int magsquare[n][n];
As far as magicSquare is concerned, n is only known at runtime, and for an array to be allocated on the stack its size must be known at compile time.
Use a 15 x 15 array.
int magsquare[15][15];
As long as you know this is the largest you'll ever need, you should be ok.
Alternatives (which you've already said you can't use)
Use new to declare a 2d array of the required dimensions. (Remember to delete[] it though)
Use std::vector
It may also be a good idea to add a check that n values over 15 or under 1 are rejected, otherwise you'll face undefined behaviour if any values outside of 1-15 are passed into the function.
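For reference, a sketch of the std::vector alternative (shown only for illustration, since the assignment disallows it):
#include <vector>

void magicSquare(int n)
{
    // n*n elements, value-initialized to 0; the size may be chosen at runtime
    std::vector<std::vector<int>> magsquare(n, std::vector<int>(n, 0));
    // ... fill and print exactly as in the original function;
    // magsquare[row][col] is used the same way as before
}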
This is my code. When I access the dtr array in the initImg function, it gives a stack overflow exception. What might be the reason?
#define W 1000
#define H 1000
#define MAX 100000
void initImg(int img[], float dtr[])
{
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
img[i*W+j]=255;
for(int j=0;j<H;j++)
{
img[j] = 0;
img[W*(W-1)+j] = 0;
}
for(int i=0;i<W;i++)
{
img[i*W] = 0;
img[i*W+H-1] = 0;
}
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
}
int main()
{
int image[W*H];
float dtr[W*H];
initImg(image,dtr);
return 0;
}
This:
int image[W*H];
float dtr[W*H];
This creates two arrays of 4 * 1000 * 1000 bytes, roughly 4 MB each, on the stack. The stack space is limited, often to just a few MB. Don't do that; create the arrays on the heap using new:
int *image = new int[W*H];
float *dtr = new float[W*H];
Your stack probably isn't big enough to hold a million ints and a million floats (8MB). So as soon as you try to access beyond your stack size, your operating system throws you an error. Objects or arrays above a certain size need to be allocated on the heap - preferably using a self-managing self-bounds-checking class such as std::vector - the specific size depends on your implementation.
In addition to the stack overrun, you have another problem -- one which is masked by your definitions of W and H.
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
Your i loop should count from 0 to H-1, rather than W-1 (and the j loop should swap as well). Otherwise your code will only work correctly if W == H. If W != H you will overrun your buffers.
This same problem exists elsewhere in your code sample as well.
You're creating giant arrays on the stack. Just use std::vector instead:
std::vector<int> image(W*H);
std::vector<float> dtr(W*H);
Your stack is full. You can allocate memory on the heap or increase the stack size. From what I know the default maximum is about 8 MB, but increasing it is not a very good idea. The best solution is to use heap allocation or one of the containers (vector) available in std.
You will eventually get to the lines marked "here" (dtr[i*W+j] = ...), which touch far more memory than your stack has room for.
Your compiler will define the stack size. A way to get around this is to dynamically allocate your arrays, e.g. using std::vector<float> array_one(W*H).
You are trying to allocate memory on the stack. The maximum memory which can be allocated on the stack is compiler dependent.
So try something like this to avoid this kind of exception:
#include <stdlib.h>
#define W 1000
#define H 1000
#define MAX 100000
void initImg(int img[], float dtr[])
{
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
img[i*W+j]=255;
for(int j=0;j<H;j++)
{
img[j] = 0;
img[W*(W-1)+j] = 0;
}
for(int i=0;i<W;i++)
{
img[i*W] = 0;
img[i*W+H-1] = 0;
}
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
}
int main()
{
int *image = (int*)malloc(W*H*sizeof(int)); // malloc the memory (allocated from the heap)
float *dtr = (float*)malloc(W*H*sizeof(float));
if(image && dtr) // if neither pointer is NULL, the memory was allocated
{
initImg(image,dtr);
}
return 0;
}
You can also use new instead of malloc to allocate the memory from the heap.
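For example, the same main with new/delete could look roughly like this (a sketch; note that new throws std::bad_alloc on failure instead of returning NULL, so the explicit check is not needed):
int main()
{
    int *image = new int[W*H];   // allocated on the heap
    float *dtr = new float[W*H];
    initImg(image, dtr);
    delete[] image;              // release the heap memory when done
    delete[] dtr;
    return 0;
}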