Linux really allocating memory it shouldn't in C++ code - c++

In Linux, the kernel doesn't allocate any physical memory pages until we actually use that memory, but I am having a hard time trying to find out why it does in fact allocate this memory:
for(int t = 0; t < T; t++){
    for(int b = 0; b < B; b++){
        Matrix[t][b].length = 0;
        Matrix[t][b].size = 60;
        Matrix[t][b].pointers = (Node**)malloc(60*sizeof(Node*));
    }
}
I then access this data structure to add one element to it like this:
Node* elem = NULL;
Matrix[a][b].length++;
Matrix[a][b].pointers[ Matrix[a][b].length ] = elem;
Essentially, I run my program with htop on the side, and Linux does allocate more memory if I increase the number 60 in the code above. Why? Shouldn't it only allocate one page when the first element is added to the array?

It depends on how your Linux system is configured.
Here's a simple C program that tries to allocate 1TB of memory and touches some of it.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
    char *array[1000];
    int i;
    for (i = 0; i < 1000; ++i)
    {
        if (NULL == (array[i] = malloc((int) 1e9)))
        {
            perror("malloc failed!");
            return -1;
        }
        array[i][0] = 'H';
    }
    for (i = 0; i < 1000; ++i)
        printf("%c", array[i][0]);
    printf("\n");
    sleep(10);
    return 0;
}
When I run top by its side, it says the VIRT memory usage goes to 931g (where g means GiB), while RES only goes to 4380 KiB.
Now, when I change my system to use a different overcommit strategy by /sbin/sysctl -w vm.overcommit_memory=2 and re-run it, I get:
malloc failed!: Cannot allocate memory
So your system may be using a different overcommit strategy than you expected. For more information, read the kernel documentation on the vm.overcommit_memory sysctl.
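If you want to check from inside a program which overcommit mode your system is currently using, a minimal sketch (my illustration, assuming a Linux /proc filesystem; not part of the original answer) is to read /proc/sys/vm/overcommit_memory directly:
#include <stdio.h>

int main(void)
{
    /* 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = strict accounting */
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode;
    if (f != NULL && fscanf(f, "%d", &mode) == 1)
        printf("vm.overcommit_memory = %d\n", mode);
    else
        perror("could not read overcommit mode");
    if (f != NULL)
        fclose(f);
    return 0;
}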

Your assumption that malloc / new doesn't cause any memory to be written, and therefore doesn't cause physical memory to be assigned by the OS, is incorrect (for the memory allocator implementation you have).
I've reproduced the behavior you are describing in the following simple program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char **array[128][128];
    int size;
    int i, j;

    if (1 == argc || 0 >= (size = atoi(argv[1])))
        fprintf(stderr, "usage: %s <num>; where num > 0\n", argv[0]), exit(-1);

    for (i = 0; i < 128; ++i)
        for (j = 0; j < 128; ++j)
            if (NULL == (array[i][j] = malloc(size * sizeof(char*))))
            {
                fprintf(stderr, "malloc failed when i = %d, j = %d\n", i, j);
                perror(NULL);
                return -1;
            }

    sleep(10);
    return 0;
}
When I run this with various small size parameters as input, the VIRT and RES memory footprints (as reported by top) grow together in-step, even though I'm not explicitly touching the inner arrays that I'm allocating.
This basically holds true until size exceeds ~512. Thereafter, RES stays constant at 64 MiB while VIRT can be extremely large (e.g. - 1220 GiB when size is 10M). That is because 512 * 8 = 4096, which is a common virtual page size on Linux systems, and 128 * 128 * 4096 B = 64 MiB.
Therefore, it looks like the first page of every allocation is being mapped to physical memory, probably because malloc / new itself is writing to part of the allocation for its own internal bookkeeping. Of course, lots of small allocations may fit on the same page, so only one page gets mapped to physical memory for many such allocations.
In your code example, changing the size of the array matters because it means less of those arrays can be fit on one page, therefore requiring more memory pages to be touched by malloc / new itself (and therefore mapped to physical memory by the OS) over the run of the program.
When you use 60, that takes about 480 bytes, so ~8 of those allocations can be put on one page. When you use 100, that takes about 800 bytes, so only ~5 of those allocations can be put on one page. So, I'd expect the "100 program" to use about 8/5ths as much memory as the "60 program", which seems to be a big enough difference to make your machine start swapping to stable storage.
If each of your smaller "60" allocations were already over 1 page in size, then changing it to be bigger "100" wouldn't affect your program's initial physical memory usage, just like you originally expected.
PS - I think whether you explicitly touch the initial page of your allocations or not will be irrelevant as malloc / new will have already done so (for the memory allocator implementation you have).
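If you want to see the same effect from inside a program rather than in top, here is a minimal sketch (my own illustration, Linux-specific, using /proc/self/statm) that prints the resident page count before and after a batch of small allocations that are never explicitly touched:
#include <stdio.h>
#include <stdlib.h>

/* Resident set size in pages, read from /proc/self/statm (Linux-specific). */
static long resident_pages(void)
{
    long size = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

int main(void)
{
    enum { N = 100000 };
    static char *ptrs[N];
    long before = resident_pages();
    for (int i = 0; i < N; ++i)
        ptrs[i] = malloc(480);  /* roughly the "60 pointers" case from the question */
    long after = resident_pages();
    printf("resident pages: %ld -> %ld\n", before, after);
    for (int i = 0; i < N; ++i)
        free(ptrs[i]);
    return 0;
}
On a glibc-based system I would expect the resident count to grow with the number of pages the allocator itself touches, even though the program never writes to the allocated blocks.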

Here's a sketch of what you could do if you expect that your b arrays will usually be small, usually less than 2^X pointers (X = 5 in the code below), while still handling the exceptional cases where they get even bigger.
You can adjust X down if your expected usage doesn't match. You could also adjust the minimum size arrays up from 0 (and not allocate the smaller 2^i levels), if you expect most of your arrays will usually use at least 2^Y pointers (e.g. - Y = 3).
If you think that actually X == Y (e.g. - 4) for your usage pattern, then you can just do one allocation of B * (0x1 << X) * sizeof(Node*) and divvy up that T array to your b's. Then if a b array needs to exceed 2^X pointers, then resort to malloc for it followed by realloc's if it needs to grow even further.
The main point here is that the initial allocation will map to very little physical memory, addressing the problem that initially spurred your original question.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define T 1278
#define B 131072
#define CAP_MAX_LG2 5
#define CAP_MAX (0x1 << CAP_MAX_LG2) // pre-alloc T's to handle all B arrays of length up to 2^CAP_MAX_LG2
typedef struct Node { int dummy; } Node; // dummy definition so the example is self-contained and sizeof(Node) compiles
typedef struct
{
int t; // so a matrix element can know to which T_Allocation it belongs
int length;
int cap_lg2; // log base 2 of capacity; -1 if capacity is zero
Node **pointers;
} MatrixElem;
typedef struct
{
Node **base; // pre-allocs B * 2^(CAP_MAX_LG2 + 1) Node pointers; every b array can be any of { 0, 1, 2, 4, 8, ..., CAP_MAX } capacity
Node **frees_pow2[CAP_MAX_LG2 + 1]; // frees_pow2[i] will point at the next free array of 2^i pointers to Node to allocate to a growing b array
} T_Allocation;
MatrixElem Matrix[T][B];
T_Allocation T_Allocs[T];
int Node_init(Node *n) { return 0; } // just a dummy
void Node_fini(Node *n) { } // just a dummy
int Node_eq(const Node *n1, const Node *n2) { return 0; } // just a dummy
void Init(void)
{
for(int t = 0; t < T; t++)
{
T_Allocs[t].base = malloc(B * (0x1 << (CAP_MAX_LG2 + 1)) * sizeof(Node*));
if (NULL == T_Allocs[t].base)
abort();
T_Allocs[t].frees_pow2[0] = T_Allocs[t].base;
for (int x = 1; x <= CAP_MAX_LG2; ++x)
T_Allocs[t].frees_pow2[x] = &T_Allocs[t].base[B * ((0x1 << x) - 1)]; // region for capacity 2^x starts at the cumulative offset B * (2^x - 1)
for(int b = 0; b < B; b++)
{
Matrix[t][b].t = t;
Matrix[t][b].length = 0;
Matrix[t][b].cap_lg2 = -1;
Matrix[t][b].pointers = NULL;
}
}
}
Node *addElement(MatrixElem *elem)
{
if (-1 == elem->cap_lg2 || elem->length == (0x1 << elem->cap_lg2)) // elem needs a bigger pointers array to add an element
{
int new_cap_lg2 = elem->cap_lg2 + 1;
int new_cap = (0x1 << new_cap_lg2);
if (new_cap_lg2 <= CAP_MAX_LG2) // new b array can still fit in pre-allocated space in T
{
Node **new_pointers = T_Allocs[elem->t].frees_pow2[new_cap_lg2];
if (elem->length) memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
elem->pointers = new_pointers;
T_Allocs[elem->t].frees_pow2[new_cap_lg2] += new_cap;
}
else if (elem->cap_lg2 == CAP_MAX_LG2) // exceeding pre-alloc'ed arrays in T; use malloc
{
Node **new_pointers = malloc(new_cap * sizeof(Node*));
if (NULL == new_pointers)
return NULL;
memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
elem->pointers = new_pointers;
}
else // already exceeded pre-alloc'ed arrays in T; use realloc
{
Node **new_pointers = realloc(elem->pointers, new_cap * sizeof(Node*));
if (NULL == new_pointers)
return NULL;
elem->pointers = new_pointers;
}
++elem->cap_lg2;
}
Node *ret = malloc(sizeof(Node));
if (ret)
{
Node_init(ret);
elem->pointers[elem->length] = ret;
++elem->length;
}
return ret;
}
int removeElement(const Node *a, MatrixElem *elem)
{
int i;
for (i = 0; i < elem->length && !Node_eq(a, elem->pointers[i]); ++i);
if (i == elem->length)
return -1;
Node_fini(elem->pointers[i]);
free(elem->pointers[i]);
--elem->length;
memmove(&elem->pointers[i], &elem->pointers[i+1], sizeof(Node*) * (elem->length - i));
return 0;
}
int main()
{
return 0;
}
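A hypothetical usage sketch of the code above (my addition, not part of the original answer; note that with the T and B values used here, Init alone needs overcommit and several GB of RAM just for the Matrix metadata):
int example(void) // illustrative name, not part of the original code
{
    Init();
    MatrixElem *elem = &Matrix[0][0];
    for (int i = 0; i < 100; ++i)   // the first 32 adds come out of the pre-allocated arena,
        if (!addElement(elem))      // later ones fall back to malloc / realloc
            return -1;
    return elem->length;            // 100
}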

Related

Check memory usage of radixsort C++

I have implemented radix sort in C++
...
void *countSort(int *tab, int size, int exp, string *comp, bool *stat) {
int output[size];
int i, index, count[10] = {0};
sysinfo(&amem);
for (i = 0; i < size; i++){
index = (tab[i]/exp)%10;
count[index]++;
}
for (i = 1; i < 10; i++)
count[i] += count[i - 1];
for (i = size - 1; i >= 0; i--) {
index = count[ (tab[i]/exp)%10 ] - 1;
output[index] = tab[i];
count[ (tab[i]/exp)%10 ]--;
}
if((*comp).rfind("<",0) == 0){
for (i = 0; i < size; i++){
tab[i] = output[i];
swap_counter++;
if(!*stat){ fprintf(stderr, "przestawiam\n"); }
}
}else{
for (i = 0; i < size; i++){
tab[i] = output[size-i-1];
swap_counter++;
if(!*stat){ fprintf(stderr, "przestawiam\n"); }
}
}
}
void *radix_sort(int size, int *tab, string *comp, bool *stat) {
int m;
auto max = [tab, size](){
int m = tab[0];
for (int i = 1; i < size; i++) {
if (tab[i] > m)
m = tab[i];
}
return m;
};
m = max();
for (int exp = 1; m/exp > 0; exp *= 10)
countSort(tab, size, exp, comp, stat);
}
...
int main(){
for(int n = 100; n <= 10000; n += 100){
int *tab = (int *) malloc(n*sizeof(int));
generate_random_tab(tab, n);
radix_sort(sorted_tab, 0, n-1, ">=", 1);
free(tab);
}
}
Now I want to check and print out how much memory radix sort uses.
I want to do this to compare how much memory different sorting algorithms use.
How can I achieve this?
I was given a hint to use sysinfo() to analyze how system memory usage changes, but I couldn't get consistent results.
(I'm working on Linux.)
Your program has linear memory usage: malloc(n*sizeof(int)) and int output[size], one of them on the heap, the other on the stack, so you basically don't need run-time measurements; you can calculate the usage easily.
Since you are on Linux, for more complicated cases there is, for example, the massif tool in valgrind, but it is focused on heap measurements (which is usually enough when you want to measure memory usage, as the stack is normally too small for serious amounts of data).
sysinfo only shows whole system memory, not individual process memory.
For process memory usage, you might try mallinfo, e.g.
struct mallinfo before = mallinfo();
// radix sort code
struct mallinfo after = mallinfo();
Now you may compare the various entries before and after your sorting code.
Be aware that this doesn't include stack memory.
Although I don't know how accurate these numbers are in a C++ context.
Testing a complete example
#include <malloc.h>
#include <stdio.h>
#define SHOW(m) printf(#m "=%d-%d\n", after.m, before.m)
int main()
{
struct mallinfo before = mallinfo();
void *p1 = malloc(1000000);
//int *p2 = new int[1000000];
struct mallinfo after = mallinfo();
SHOW(arena);
SHOW(ordblks);
SHOW(smblks);
SHOW(hblks);
SHOW(hblkhd);
SHOW(usmblks);
SHOW(fsmblks);
SHOW(uordblks);
SHOW(fordblks);
SHOW(keepcost);
return 0;
}
shows different values, depending on whether you use malloc
arena=135168-0
ordblks=1-1
smblks=0-0
hblks=1-0
hblkhd=1003520-0
usmblks=0-0
fsmblks=0-0
uordblks=656-0
fordblks=134512-0
keepcost=134512-0
or new
arena=135168-135168
ordblks=1-1
smblks=0-0
hblks=1-0
hblkhd=4001792-0
usmblks=0-0
fsmblks=0-0
uordblks=73376-73376
fordblks=61792-61792
keepcost=61792-61792
It looks like C++ (Ubuntu, GCC 9.2.1) does some preallocation, but the relevant number seems to be hblkhd (on my machine).
Since your only dynamic allocation is at the beginning of main, you must do the first mallinfo there. Testing only the radix sort code reveals that there are no additional dynamic memory allocations.
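If you also want a number that goes beyond what the glibc heap reports (for example the peak resident set size, which will also reflect the int output[size] VLA once it has been touched), one Linux-specific option, sketched here purely as an illustration, is to read VmHWM from /proc/self/status:
#include <stdio.h>

/* Peak resident set size in kB, from /proc/self/status (Linux-specific). */
static long peak_rss_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "VmHWM: %ld kB", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

int main(void)
{
    printf("peak RSS so far: %ld kB\n", peak_rss_kb());
    return 0;
}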

In merge sort algorithm, will freeing the left and right sub arrays after the arrays have been merged make any difference in the space complexity?

In one of the tutorial videos for merge sort, it was mentioned that once the right and left sub-arrays have been merged into the parent array, we need to free the memory allocated for the left and right sub-arrays in order to reduce the space complexity. But whenever we come out of the function call, the local variables will be destroyed. Do correct me if I am wrong. So will the action of freeing the memory make any difference?
Here is the code that I wrote:
#include <iostream>
#include <bits/stdc++.h>
using namespace std;
void mergeArr(int *rarr, int *larr, int *arr, int rsize, int lsize) {
int i = 0, r = 0, l = 0;
while (r < rsize && l < lsize) {
if (rarr[r] < larr[l]) {
arr[i++] = rarr[r++];
} else {
arr[i++] = larr[l++];
}
}
while (r < rsize) {
arr[i++] = rarr[r++];
}
while (l < lsize) {
arr[i++] = larr[l++];
}
}
void mergeSort(int *arr, int length) {
if (length > 1) {
int l1 = length / 2;
int l2 = length - l1;
int rarr[l1], larr[l2];
for (int i = 0; i < l1; i++) {
rarr[i] = arr[i];
}
for (int i = l1; i < length; i++) {
larr[i - l1] = arr[i];
}
mergeSort(rarr, l1);
mergeSort(larr, l2);
mergeArr(rarr, larr, arr, l1, l2);
// will free(rarr); free(larr); make any difference in space complexity
}
}
int main() {
int arr[5] = { 1, 10, 2, 7, 5 };
mergeSort(arr, 5);
for (int i = 0; i < 5; i++)
cout << arr[i] << " ";
}
I have multiple things to say about this, more from a C++ point of view:
int rarr[l1], larr[l2]; - this is illegal C++. It is just an extension provided by g++ and is not valid across other compilers. You should either do int* rarr = new int[l1]; or, even better, use a std::vector: std::vector<int> rarr(l1).
If you are doing the former (dynamic allocation using new, i.e. int* rarr = new int[l1]), you have to manage the memory on your own. So when you're done using it you have to delete it: delete[] rarr. Mind you, malloc and free are not C++, they are C; new and delete are the C++ way of allocating/deallocating memory.
If you use a vector, C++ will handle the deletion/deallocation of the memory, so you need not worry.
Now coming back to your original question: whether or not an idea like this would improve your space complexity: the answer is NO, it won't.
Why? Think about the maximum temporary storage you're using. Look at the first level of your recursion: isn't the space you're using already O(N), because larr and rarr will both be of size N/2? Moreover, the space complexity is O(N) assuming the temporary storage is being freed. If the space is somehow not freed, the space complexity increases to O(N) + 2*O(N/2) + 4*O(N/4) + ..., which is O(N log2 N), because each level of the recursion allocates space that it never frees.
In your implementation, the left and right arrays are defined with automatic storage, so deallocation is automatic when the function returns but it poses 2 problems:
a sufficiently large array will invoke undefined behavior because allocating too much space with automatic storage will cause a stack overflow.
variable sized arrays are not standard C++. You are relying on a compiler specific extension.
The maximum stack space used by your function is proportional to N, so the space complexity is O(N) as expected. You could allocate these arrays with new, and of course you would then have to deallocate them with delete otherwise you would have memory leaks and the amount of memory lost would be proportional to N*log2(N).
An alternative approach would use a temporary array, allocated at the initial call and passed to the recursive function.
Note also that the names for the left and right arrays are very confusing. rarr is actually to the left of larr!
Here is a modified version:
#include <iostream>
using namespace std;
void mergeArr(int *larr, int *rarr, int *arr, int lsize, int rsize) {
int i = 0, r = 0, l = 0;
while (l < lsize && r < rsize) {
if (larr[l] <= rarr[r]) {
arr[i++] = larr[l++];
} else {
arr[i++] = rarr[r++];
}
}
while (l < lsize) {
arr[i++] = larr[l++];
}
while (r < rsize) {
arr[i++] = rarr[r++];
}
}
void mergeSort(int *arr, int length) {
if (length > 1) {
int l1 = length / 2;
int l2 = length - l1;
int *larr = new int[l1];
int *rarr = new int[l2];
for (int i = 0; i < l1; i++) {
larr[i] = arr[i];
}
for (int i = l1; i < length; i++) {
rarr[i - l1] = arr[i];
}
mergeSort(larr, l1);
mergeSort(rarr, l2);
mergeArr(larr, rarr, arr, l1, l2);
delete[] larr;
delete[] rarr;
}
}
int main() {
int arr[] = { 1, 10, 2, 7, 5 };
int length = sizeof arr / sizeof *arr;
mergeSort(arr, length);
for (int i = 0; i < length; i++) {
cout << arr[i] << " ";
}
return 0;
}
Freeing the temporary arrays does not influence the space complexity, because we must consider the maximum memory consumption, which is about the size of the initial array.
From a performance point of view, it seems reasonable to allocate the temporary storage once at the beginning of sorting, reuse it at every stage, and free it after all the work is done.
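A minimal sketch of that approach (my own illustration, not from the answers above): allocate one scratch buffer at the initial call and reuse it at every level of the recursion, so the extra space stays at O(N) and there is only a single allocation.
#include <iostream>
#include <vector>

// Merge the sorted ranges arr[lo, mid) and arr[mid, hi) using the shared scratch buffer.
static void mergeRange(std::vector<int>& arr, std::vector<int>& scratch, int lo, int mid, int hi) {
    int l = lo, r = mid, i = lo;
    while (l < mid && r < hi)
        scratch[i++] = (arr[l] <= arr[r]) ? arr[l++] : arr[r++];
    while (l < mid) scratch[i++] = arr[l++];
    while (r < hi)  scratch[i++] = arr[r++];
    for (int k = lo; k < hi; ++k) arr[k] = scratch[k];
}

static void mergeSortRange(std::vector<int>& arr, std::vector<int>& scratch, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    mergeSortRange(arr, scratch, lo, mid);
    mergeSortRange(arr, scratch, mid, hi);
    mergeRange(arr, scratch, lo, mid, hi);
}

void mergeSort(std::vector<int>& arr) {
    std::vector<int> scratch(arr.size()); // single O(N) allocation, reused at every level
    mergeSortRange(arr, scratch, 0, static_cast<int>(arr.size()));
}

int main() {
    std::vector<int> arr{ 1, 10, 2, 7, 5 };
    mergeSort(arr);
    for (int x : arr) std::cout << x << " ";
    std::cout << "\n";
}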

Why is deallocating heap memory much slower than allocating it?

This is an empirical assumption (that allocating is faster than deallocating).
This is also one of the reasons, I guess, why heap-based storage (like STL containers and the like) chooses not to return currently unused memory to the system (that is why the shrink-to-fit idiom was born).
And we shouldn't confuse, of course, 'heap' memory with 'heap'-like data structures.
So why is deallocation slower?
Is it Windows-specific (I see it on Win 8.1) or OS-independent?
Is there some C++-specific memory manager automatically involved when using 'new' / 'delete', or does the whole memory management rely completely on the OS? (I know C++11 introduced some garbage-collection support, which I never really used, preferring the old stack and static duration or self-managed containers and RAII.)
Also, in the code of the FOLLY string I saw old C heap allocation / deallocation being used; is it faster than C++ 'new' / 'delete'?
P.S. Please note that the question is not about virtual memory mechanics; I understand that user-space programs don't use real memory addressing.
The assertion that allocating memory is faster than deallocating it seemed a bit odd to me, so I tested it. I ran a test where I allocated 64MB of memory in 32-byte chunks (so 2M calls to new), and I tried deleting that memory in the same order it was allocated, and in a random order. I found that linear-order deallocation was about 3% faster than allocation, and that random deallocation was about 10% slower than linear allocation.
I then ran a test where I started with 64MB of allocated memory, and then 2M times either allocated new memory or deleted existing memory (at random). Here, I found that deallocation was about 4.3% slower than allocation.
So, it turns out you were correct - deallocation is slower than allocation (though I wouldn't call it "much" slower). I suspect this has simply to do with more random accesses, but I have no evidence for this other than that the linear deallocation was faster.
To answer some of your questions:
Is there some C++ specific memory manager automatically involved on using 'new' / 'delete'?
Yes. The OS has system calls which allocate pages of memory (typically 4KB chunks) to processes. It's the process' job to divide up those pages into objects. Try looking up the "GNU Memory Allocator."
I saw using old C heap allocation / deallocation, is it faster then C++ 'new' / 'delete'?
Most C++ new/delete implementations just call malloc and free under the hood. This is not required by the standard, however, so it's a good idea to always use the same allocation and deallocation function on any particular object.
I ran my tests with the native testing framework provided in Visual Studio 2015, on a Windows 10 64-bit machine (The tests were also 64-bit). Here's the code:
#include "stdafx.h"
#include "CppUnitTest.h"
using namespace Microsoft::VisualStudio::CppUnitTestFramework;
namespace AllocationSpeedTest
{
class Obj32 {
uint64_t a;
uint64_t b;
uint64_t c;
uint64_t d;
};
constexpr int len = 1024 * 1024 * 2;
Obj32* ptrs[len];
TEST_CLASS(UnitTest1)
{
public:
TEST_METHOD(Linear32Alloc)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
}
TEST_METHOD(Linear32AllocDealloc)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
for (int i = 0; i < len; ++i) {
delete ptrs[i];
}
}
TEST_METHOD(Random32AllocShuffle)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
int pos = (rand() % (len - i)) + i;
Obj32* temp = ptrs[i];
ptrs[i] = ptrs[pos];
ptrs[pos] = temp;
}
}
TEST_METHOD(Random32AllocShuffleDealloc)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
int pos = (rand() % (len - i)) + i;
Obj32* temp = ptrs[i];
ptrs[i] = ptrs[pos];
ptrs[pos] = temp;
}
for (int i = 0; i < len; ++i) {
delete ptrs[i];
}
}
TEST_METHOD(Mixed32Both)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
if (rand() % 2) {
ptrs[i] = new Obj32();
}
else {
delete ptrs[i];
}
}
}
TEST_METHOD(Mixed32Alloc)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
if (rand() % 2) {
ptrs[i] = new Obj32();
}
else {
//delete ptrs[i];
}
}
}
TEST_METHOD(Mixed32Dealloc)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
if (rand() % 2) {
//ptrs[i] = new Obj32();
}
else {
delete ptrs[i];
}
}
}
TEST_METHOD(Mixed32Neither)
{
for (int i = 0; i < len; ++i) {
ptrs[i] = new Obj32();
}
srand(0);
for (int i = 0; i < len; ++i) {
if (rand() % 2) {
//ptrs[i] = new Obj32();
}
else {
//delete ptrs[i];
}
}
}
};
}
And here are the raw results over several runs. All numbers are in milliseconds.
I had much the same idea as @Basile: I wondered whether your base assumption was actually (even close to) correct. Since you tagged the question C++, I wrote a quick benchmark in C++ instead.
#include <vector>
#include <iostream>
#include <numeric>
#include <chrono>
#include <iomanip>
#include <locale>
int main() {
std::cout.imbue(std::locale(""));
using namespace std::chrono;
using factor = microseconds;
auto const size = 2000;
std::vector<int *> allocs(size);
auto start = high_resolution_clock::now();
for (int i = 0; i < size; i++)
allocs[i] = new int[size];
auto stop = high_resolution_clock::now();
auto alloc_time = duration_cast<factor>(stop - start).count();
start = high_resolution_clock::now();
for (int i = 0; i < size; i++)
delete[] allocs[i];
stop = high_resolution_clock::now();
auto del_time = duration_cast<factor>(stop - start).count();
std::cout << std::left << std::setw(20) << "alloc time: " << alloc_time << " uS\n";
std::cout << std::left << std::setw(20) << "del time: " << del_time << " uS\n";
}
I also used VC++ on Windows instead of gcc on Linux. The result wasn't much different though: freeing the memory took substantially less time than allocating it did. Here are the results from three successive runs.
alloc time: 2,381 uS
del time: 1,429 uS
alloc time: 2,764 uS
del time: 1,592 uS
alloc time: 2,492 uS
del time: 1,442 uS
I'd warn, however, that allocation and freeing are handled (primarily) by the standard library, so this could differ between one standard library and another (even when using the same compiler). I'd also note that it wouldn't surprise me if this were to change somewhat in multi-threaded code. Although it's not actually correct, there appear to be a few authors who are under the misapprehension that freeing in a multithreaded environment requires locking the heap for exclusive access. This can be avoided, but the means of doing so isn't necessarily immediately obvious.
I am not sure of your observation. I wrote the following program (on Linux, hopefully you could port it to your system).
// public domain code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <assert.h>
const unsigned possible_word_sizes[] = {
1, 2, 3, 4, 5,
8, 12, 16, 24,
32, 48, 64, 128,
256, 384, 2048
};
long long totalsize;
// return a calloc-ed array of nbchunks malloced zones of
// somehow random size
void **
malloc_chunks (int nbchunks)
{
const int nbsizes =
(int) (sizeof (possible_word_sizes)
/ sizeof (possible_word_sizes[0]));
void **ad = calloc (nbchunks, sizeof (void *));
if (!ad)
{
perror ("calloc chunks");
exit (EXIT_FAILURE);
};
for (int ix = 0; ix < nbchunks; ix++)
{
unsigned sizindex = random () % nbsizes;
unsigned size = possible_word_sizes[sizindex];
void *zon = malloc (size * sizeof (void *));
if (!zon)
{
fprintf (stderr,
"malloc#%d (%d words) failed (total %lld) %s\n",
ix, size, totalsize, strerror (errno));
exit (EXIT_FAILURE);
}
((int *) zon)[0] = ix;
totalsize += size;
ad[ix] = zon;
}
return ad;
}
void
free_chunks (void **chks, int nbchunks)
{
// first, free the two thirds of chunks in random order
for (int i = 0; 3 * i < 2 * nbchunks; i++)
{
int pix = random () % nbchunks;
if (chks[pix])
{
free (chks[pix]);
chks[pix] = NULL;
}
}
// then, free the rest in reverse order
for (int i = nbchunks - 1; i >= 0; i--)
if (chks[i])
{
free (chks[i]);
chks[i] = NULL;
}
}
int
main (int argc, char **argv)
{
assert (sizeof (int) <= sizeof (void *));
int nbchunks = (argc > 1) ? atoi (argv[1]) : 32768;
if (nbchunks < 128)
nbchunks = 128;
srandom (time (NULL));
printf ("nbchunks=%d\n", nbchunks);
void **chks = malloc_chunks (nbchunks);
clock_t clomall = clock ();
printf ("clomall=%ld totalsize=%lld words\n",
(long) clomall, totalsize);
free_chunks (chks, nbchunks);
clock_t clofree = clock ();
printf ("clofree=%ld\n", (long) clofree);
return 0;
}
I compiled it with gcc -O2 -Wall mf.c -o mf on my Debian/Sid/x86-64 (i3770k, 16GB). I ran time ./mf 100000 and got:
nbchunks=100000
clomall=54162 totalsize=19115681 words
clofree=83895
./mf 100000 0.02s user 0.06s system 95% cpu 0.089 total
On my system, clock gives CPU microseconds. If the cost of calling random is negligible (and I don't know if it is) relative to the malloc and free time, I tend to disagree with your observation: free seems to be about twice as fast as malloc. My gcc is 6.1, my libc is Glibc 2.22.
Please take time to compile the above benchmark on your system and report the timings.
FWIW, I took Jerry's code and
g++ -O3 -march=native jerry.cc -o jerry
time ./jerry; time ./jerry; time ./jerry
gives
alloc time: 1940516
del time: 602203
./jerry 0.00s user 0.01s system 68% cpu 0.016 total
alloc time: 1893057
del time: 558399
./jerry 0.00s user 0.01s system 68% cpu 0.014 total
alloc time: 1818884
del time: 527618
./jerry 0.00s user 0.01s system 70% cpu 0.014 total
When you allocate small memory blocks, the block size you specify maps directly to a suballocator for that size, which is commonly represented as a "slab" of memory containing same-size records, to avoid memory fragmentation. This can be very fast, similar to an array access. But freeing such blocks is not so straightforward, because you are passing a pointer to memory of unknown size, requiring additional work to determine what slab it belongs to before the block can be returned to its proper place.
When you allocate large blocks of virtual memory, a memory page range is set up in your process space without actually mapping any physical memory to it, and that requires very little work to accomplish. But freeing such large blocks can require much more work, because the pointer freed must first be matched to the page tables for that range, followed by walking through all of the page entries for the memory range that it spans, and releasing all of the physical memory pages assigned to that range by the intervening page faults.
Of course, the details of this will vary depending on the implementation being used, but the principles remain much the same: memory allocation of a known block size requires less effort than releasing a pointer to a memory block of unknown size. My knowledge of this comes directly from my experience developing high-performance commercial grade RAII memory allocators.
I should also point out that since every heap allocation has a matching and corresponding release, this pair of operations represents a single allocation cycle, i.e. as the two sides of one coin. Together, their execution time can be accurately measured, but separately such measurement is difficult to pin down, as it varies widely depending on block size, previous activity across similar sizes, caching and other operational considerations. But in the end, allocate/free differences may not much matter, since you don't do one without the other.
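As a very rough illustration of that asymmetry, here is a toy size-class allocator sketch (my own simplification; it is not how any particular production allocator works): allocation picks a free list directly from the requested size, while free first has to recover the size class from the pointer before the block can go back on a list.
#include <stdlib.h>

enum { NUM_CLASSES = 8 };                     /* size classes: 16, 32, ..., 128 bytes */

typedef struct Block { struct Block *next; } Block;
typedef struct { size_t klass; } Header;      /* stored just before each user block   */

static Block *free_list[NUM_CLASSES];

static void *toy_alloc(size_t n)
{
    size_t k = (n + 15) / 16 - 1;             /* map request size to class index, O(1) */
    if (n == 0 || k >= NUM_CLASSES)
        return NULL;                          /* large blocks not handled in this toy  */
    if (free_list[k]) {                       /* fast path: pop the class's free list  */
        Block *b = free_list[k];
        free_list[k] = b->next;
        ((Header *)b)->klass = k;             /* restore header clobbered by the link  */
        return (Header *)b + 1;
    }
    Header *h = malloc(sizeof(Header) + (k + 1) * 16);   /* refill from the system     */
    if (h == NULL)
        return NULL;
    h->klass = k;
    return h + 1;
}

static void toy_free(void *p)
{
    if (p == NULL)
        return;
    Header *h = (Header *)p - 1;
    size_t k = h->klass;                      /* extra work: recover the size class... */
    Block *b = (Block *)h;                    /* ...the link then reuses the header    */
    b->next = free_list[k];
    free_list[k] = b;
}

int main(void)
{
    void *a = toy_alloc(24);                  /* falls in the 32-byte class */
    toy_free(a);
    void *b = toy_alloc(20);                  /* same class: reuses the block just freed */
    toy_free(b);
    return 0;
}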
The problem here is heap fragmentation. Programs written in languages with explicit pointer arithmetic have no realistic way of defragmenting the heap.
If your heap is fragmented, you can't return memory to the OS. The OS, setting virtual memory aside, depends on a brk(2)-like mechanism: you set an upper bound on all the memory addresses you will refer to. So if even one buffer is still allocated and in use near the existing boundary, you can't return memory to the OS explicitly, no matter whether 99% of all the memory in your program has been freed.
Deallocation doesn't have to be slower than allocation. But the fact that you have manual deallocation with heap fragmentation makes allocation slower and more complex.
GCs fight this by compacting the heap. That way, allocation is just incrementing a pointer for them, and deallocation is not needed for the bulk of objects.

Keeping track of a list of integers in C, and sampling from the list

Let's just say that at the simplest, I have a function that generates random integers. Every number generated I want to tack onto the end of the list.
Then, at the end, I want to sample a random number from this list.
I'm quite a newbie at C, manoeuvring around singly-linked lists and pointer arguments, so any advice would be welcome.
edit: I have no problem switching to C++ if there are structures that would help me out. I just need lists and sampling.
Here's a go at it:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
int main() {
srand(time(NULL));
int size = 0;
int capacity = 4;
int *data = malloc(capacity * sizeof(int));
for (int i=0; i < 1000; ++i) {
// Allocate more space
if (size == capacity) {
capacity *= 2;
data = realloc(data, capacity * sizeof(int));
if (data == NULL)
exit(-1);
}
// Append a random number
data[size] = rand();
size++;
}
// Choose a random number (poorly)
printf("%d\n", data[rand() % size]);
}
Now, why is this code terrible?
rand() is terrible. It's allowed to return random numbers in the range [0, 32768). Or larger if your implementation supports it.
rand() is allowed to use relatively terrible algorithms to generate random numbers.
rand() % size is not necessarily uniform.
time(NULL) is a terrible seed. time(NULL) returns the current time with a precision of seconds. So if I run this code twice quickly, it will often return the same result.
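Since you said you're happy to switch to C++, here is a minimal sketch of the same idea using std::vector and <random>, which addresses the rand() / time(NULL) complaints above (my illustration; the value range and element count are arbitrary):
#include <iostream>
#include <random>
#include <vector>

int main() {
    std::mt19937 gen{std::random_device{}()};            // well-seeded PRNG
    std::uniform_int_distribution<int> value_dist(0, 999);

    std::vector<int> data;
    for (int i = 0; i < 1000; ++i)
        data.push_back(value_dist(gen));                  // append generated numbers

    // Sample one element uniformly from the list.
    std::uniform_int_distribution<int> index_dist(0, static_cast<int>(data.size()) - 1);
    std::cout << data[index_dist(gen)] << "\n";
}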
So if you know how many elements you will be putting into your list, then you can just do this:
double array[100]; // or whatever size you want
for(int i = 0; i < 100; i++){
    array[i] = yourRandomNumber;
}
int index = Rand(0,100);
return array[index];
Keep in mind I did not use the correct syntax for 'yourRandomNumber' and the Rand function. This is just an outline of what you can do.
Your second option is to create a mutable list if you do not know the size of the list beforehand, i.e. basically a C++ vector. This has to be coded by hand.
typedef struct
{
size_t elementsAllocated;
size_t elementsUsed;
int* buffer;
} vector;
vector* push_front(vector* v, int item)
{
if (v->elementsUsed == v->elementsAllocated)
{
// Time to grow the buffer.
int elementsAllocated = v->elementsAllocated * 2;
if (elementsAllocated <= v->elementsAllocated)
{
abort(); // Overflow occurred.
}
int* buffer = realloc(v->buffer, elementsAllocated * sizeof *buffer);
if (buffer == NULL)
{
abort();
}
// Shift the existing data over.
memmove(buffer + elementsAllocated - v->elementsUsed,
buffer + v->elementsAllocated - v->elementsUsed,
v->elementsUsed * sizeof *buffer);
v->buffer = buffer;
v->elementsAllocated = elementsAllocated;
}
// Prepend the new item.
int* p = v->buffer + v->elementsAllocated - v->elementsUsed - 1;
*p = item;
v->elementsUsed++;
return p;
}
Using std::vectors in c++
#include <iostream>
#include <vector>
int main(){
std::vector<double> myVec;
while(cin){ // this line's syntax is not correct because I don't know how you are
// getting your values
myVec.push_back(value);
}
std::cout << myVec[Rand(0, myVec.size())] << std::endl;
}
Not all the syntax is correct (i.e. cin and Rand), because I don't know how you are getting your inputs, and Rand can be implemented differently. BUT the rest should be straightforward.
If you know in advance at least an upper bound on the count of points you may accept, then you can use a variation on @Jay's answer. Otherwise, you need some kind of dynamic allocation. You could use a linked list and allocate nodes as you go, but it might be faster and easier to simply malloc() an array and realloc() if you find you need more space:
#include <stdlib.h>
double select_one() {
int capacity = 100;
int count;
double rval;
double *accepted = malloc(capacity * sizeof(double));
// accepted == NULL on malloc() failure
count = 0;
while (/* more points to test */) {
double point = /* next point to test */;
if (/* point is accepted */) {
if (count >= capacity) {
/* expand the array */
double *temp;
capacity = 3 * capacity / 2;
temp = realloc(accepted, capacity * sizeof(double));
if (temp == NULL) {
/* need to handle allocation failure to avoid a memory leak */
} else {
accepted = temp;
}
}
accepted[count++] = point;
}
}
/* Note: not perfectly uniform because of the modulus */
rval = accepted[rand() % count];
/* clean up the allocated memory, AFTER selecting the value */
free(accepted);
return rval;
}

Stack overflow C++

This is my code. When I access dtr array in initImg function it gives a stack overflow exception. What might be the reason?
#define W 1000
#define H 1000
#define MAX 100000
void initImg(int img[], float dtr[])
{
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
img[i*W+j]=255;
for(int j=0;j<H;j++)
{
img[j] = 0;
img[W*(W-1)+j] = 0;
}
for(int i=0;i<W;i++)
{
img[i*W] = 0;
img[i*W+H-1] = 0;
}
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
}
int main()
{
int image[W*H];
float dtr[W*H];
initImg(image,dtr);
return 0;
}
This:
int image[W*H];
float dtr[W*H];
Creates two arrays of 4 * 1000 * 1000 bytes, ~4 MB each, on the stack. The stack space is limited, usually to just a few megabytes. Don't do that; create the arrays on the heap using new.
int *image = new int[W*H];
float *dtr = new float[W*H];
Your stack probably isn't big enough to hold a million ints and a million floats (8MB). So as soon as you try to access beyond your stack size, your operating system throws you an error. Objects or arrays above a certain size need to be allocated on the heap - preferably using a self-managing self-bounds-checking class such as std::vector - the specific size depends on your implementation.
In addition to the stack overrun, you have another problem -- one which is masked by your definitions of W and H.
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
Your i loop should count from 0 to H-1 rather than W-1 (and the j loop should run to W-1). Otherwise your code will only work correctly if W == H. If W > H you will overrun your buffers.
This same problem exists elsewhere in your code sample as well.
You're creating giant arrays on the stack. Just use std::vector instead:
std::vector<int> image(W*H);
std::vector<float> dtr(W*H);
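For completeness, a sketch of how main could then look while keeping the existing initImg signature (my illustration; vector storage is contiguous, so passing .data() gives the same element layout as the old arrays):
#include <vector>

int main()
{
    std::vector<int> image(W * H);
    std::vector<float> dtr(W * H);
    initImg(image.data(), dtr.data()); // heap-backed storage, same indexing as before
    return 0;
}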
Your stack is full. You can allocate memory on the heap or increase the stack size. From what I know the maximum size is about 8 MB, but increasing it is not a very good idea. The best solution is to use heap allocation or one of the containers (vector) available in std.
You will eventually get to
dtr[W*W+j] = 0;   <------here
Which is much more than you have allocated.
Your compiler will define the stack size. A way to get around this is to dynamically allocate your arrays using std::vector<int> array_one(W*H).
You are trying to allocate memory from the stack. The maximum memory which can be allocated on the stack is compiler dependent.
So try something like this to avoid this kind of exception.
#include <stdlib.h>
#define W 1000
#define H 1000
#define MAX 100000
void initImg(int img[], float dtr[])
{
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
img[i*W+j]=255;
for(int j=0;j<H;j++)
{
img[j] = 0;
img[W*(W-1)+j] = 0;
}
for(int i=0;i<W;i++)
{
img[i*W] = 0;
img[i*W+H-1] = 0;
}
for(int i=0;i<W;i++)
for(int j=0;j<H;j++)
{
if(img[i*W+j]==0)
dtr[i*W+j] = 0; // <------here
else
dtr[i*W+j] = MAX; // <------here
}
}
int main()
{
int *image = (int*)malloc(W*H*sizeof(int)); //Malloc the memory....(Allocated from Heap..)
float *dtr = (float*)malloc(W*H*sizeof(float));
if(image && dtr) //If neither ptr is NULL, the memory was allocated...
{
initImg(image,dtr);
}
return 0;
}
You can use new as well, instead of malloc, to allocate memory from the heap...