I am testing the following code with Visual Studio 2019 Diagnostic Tools.
It reports a memory consumption of 55 KB instead of the 20 KB I calculated beforehand. That is far more memory than I expected, and I don't know why.
What I want to know is: what is happening or how could I calculate the correct memory consumption? (since I don't always have the "Diagnostic Tools" at hand.)
#include <iostream>

#define TEST_SIZE_ARR 1000

struct Node
{
    Node(int)
        : id(0),
          time(0),
          next(0),
          back(0)
    {}

    int id;
    int time;
    Node* next;
    Node* back;
};

int main()
{
    int counter = 0;

    std::cout << "= Node =" << std::endl;
    std::cout << "Array size: " << sizeof(Node*) << " * " << TEST_SIZE_ARR << " = " << sizeof(Node*) * TEST_SIZE_ARR << std::endl;
    std::cout << "Element size: " << sizeof(Node) << " * " << TEST_SIZE_ARR << " = " << sizeof(Node) * TEST_SIZE_ARR << std::endl;

    Node **dataArr = new Node*[TEST_SIZE_ARR]; // break point

    for (counter = 0; counter < TEST_SIZE_ARR; counter++) // break point
    {
        dataArr[counter] = new Node(counter);
    }

    counter++; // break point

    return 0;
}
Console:
Array size: 4 * 1000 = 4000
Element size: 16 * 1000 = 16000
Diagnostic tool:
Array size: 3.94 KB
Element size: 50.78 KB
Your diagnostic tool is measuring an allocation overhead of 36 bytes per allocation.
50.78 KB is about 52,000 bytes, or 52 bytes per element allocation; minus the 16 bytes of payload, that is 36 bytes of overhead.
4000 bytes plus 36 bytes of overhead is 4036 bytes, which is 3.94 KB.
The heap has to track which blocks of memory are in use and which are not. Your diagnostic tool may also add overhead of its own, or measure itself poorly; I don't know.
In your case, it appears to be adding an additional 36 bytes per value returned from new. Your system appears to use 32-bit pointers (ick), so that is enough room for 9 pointers. That overhead probably includes the size of each allocation, stored in its block, which is 4 bytes on a 32-bit system. That leaves room for 8 pointers.
What your heap is using those 8 pointers for, I don't know. Maybe a skip list, or a red-black tree, or even buffers around each allocation to detect memory corruption, because you profiled a debug build with a debug heap.
In general, small heap allocations are inefficient and a bad idea. That is one of the many reasons why block containers, like std::vector, are a good idea, and node containers are iffy.
I'm currently learning C++. During some heap-allocation exercises I tried to trigger a bad allocation. My physical memory is about 38 GB. Why is it possible to allocate such a large amount of memory? Is my basic calculation of bytes wrong? I don't get it. Can anyone give me a hint, please? Thanks.
#include <iostream>

int main(int argc, char **argv){
    const size_t MAXLOOPS {1'000'000'000};
    const size_t NUMINTS {2'000'000'000};
    int* p_memory {nullptr};

    std::cout << "Starting program heap_overflow.cpp" << std::endl;
    std::cout << "Max Loops: " << MAXLOOPS << std::endl;
    std::cout << "Number of Int per allocation: " << NUMINTS << std::endl;

    for(size_t loop=0; loop<MAXLOOPS; ++loop){
        std::cout << "Trying to allocate new heap in loop " << loop
                  << ". current allocated mem = " << (NUMINTS * loop * sizeof(int))
                  << " Bytes." << std::endl;

        p_memory = new (std::nothrow) int[NUMINTS];

        if (nullptr != p_memory)
            std::cout << "Mem Allocation ok." << std::endl;
        else {
            std::cout << "Mem Allocation FAILED!." << std::endl;
            break;
        }
    }
    return 0;
}
Output:
...
Trying to allocate new heap in loop 17590. current allocated mem = 140720000000000 Bytes.
Mem Allocation ok.
Trying to allocate new heap in loop 17591. current allocated mem = 140728000000000 Bytes.
Mem Allocation FAILED!.
Many (but not all) virtual-memory-capable operating systems use a concept known as demand-paging - when you allocate memory, you perform bookkeeping allowing you to use that memory. However, you do not reserve actual pages of physical memory at that time.1
When you actually attempt to read or write any byte within a page of that allocated memory, a page fault occurs. The fault handler detects that the page has been pre-allocated but not demand-paged in. It then reserves a page of physical memory and sets up the page-table entry (PTE) before returning control to the program.
If you attempt to write into the memory you allocate right after each allocation, you may find that you run out of physical memory much faster.
Notes:
1 It is possible to have an OS implementation that supports virtual memory but immediately allocates physical memory to back virtual allocations; virtual memory is a necessary, but not sufficient, condition for replicating your experiment.
One comment mentions swapping to disk. This is likely a red herring - the pagefile size is typically comparable in size to memory, and the total allocation was around 140 TB which is much larger than individual disks. It's also ineffective to page-to-disk empty, untouched pages.
Okay, I have a weird (in my opinion) behaviour in my program, which is now reduced to just reading 3 arrays from pretty large (approximately 24 GB and 48 GB) binary files. The structure of those files is simple: they contain a small header, followed by 3 arrays of type int, int and float, each of size N, where N is very large: 2147483648 for the 24 GB file and 4294967296 for the 48 GB one.
To track down the memory consumption, I'm using a simple function based on Linux sysinfo to check how much free memory I have at each stage of my program (for example, after allocating the arrays to store the data, and while reading the file). This is the code of the function:
#include <sys/sysinfo.h>

size_t get_free_memory_in_MB()
{
    struct sysinfo info;
    sysinfo(&info);
    return info.freeram / (1024 * 1024);
}
Now straight to the problem: the strange part is that after reading each of the 3 arrays from the file, using either the standard C fread function or the C++ read function (it doesn't matter which), and then checking how much free memory remains, I see that the amount of free memory drops heavily (by approximately edges_count * sizeof(int) in the next example).
fread(src_ids, sizeof(int), edges_count, graph_file);
cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;
So basically, after reading the whole file, my memory consumption according to sysinfo is almost twice as large as expected. To illustrate the problem better, I provide the code of the whole function together with its output; please read it, it's very small and will illustrate the problem much better.
bool load_from_edges_list_bin_file(string _file_name)
{
    bool directed = true;
    int vertices_count = 1;
    long long int edges_count = 0;

    // open the file
    FILE *graph_file = fopen(_file_name.c_str(), "r");
    if(graph_file == NULL)
        return false;

    // just reading a simple header here
    fread(reinterpret_cast<char*>(&directed), sizeof(bool), 1, graph_file);
    fread(reinterpret_cast<char*>(&vertices_count), sizeof(int), 1, graph_file);
    fread(reinterpret_cast<char*>(&edges_count), sizeof(long long), 1, graph_file);

    cout << "edges count: " << edges_count << endl;
    cout << "Before graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;

    // allocate the arrays to store the result
    int *src_ids = new int[edges_count];
    int *dst_ids = new int[edges_count];
    _TEdgeWeight *weights = new _TEdgeWeight[edges_count];

    cout << "After graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;

    memset(src_ids, 0, edges_count * sizeof(int));
    memset(dst_ids, 0, edges_count * sizeof(int));
    memset(weights, 0, edges_count * sizeof(_TEdgeWeight));

    cout << "After memset: " << get_free_memory_in_MB() << " MB" << endl;

    // add edges from file
    fread(src_ids, sizeof(int), edges_count, graph_file);
    cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;
    fread(dst_ids, sizeof(int), edges_count, graph_file);
    cout << "2 test: " << get_free_memory_in_MB() << " MB" << endl;
    fread(weights, sizeof(_TEdgeWeight), edges_count, graph_file);
    cout << "3 test: " << get_free_memory_in_MB() << " MB" << endl;

    cout << "After actual load: " << get_free_memory_in_MB() << " MB" << endl;

    delete []src_ids;
    delete []dst_ids;
    delete []weights;

    cout << "After we removed the graph load: " << get_free_memory_in_MB() << " MB" << endl;

    fclose(graph_file);

    cout << "After we closed the file: " << get_free_memory_in_MB() << " MB" << endl;
    return true;
}
So, nothing complicated. Straight to the output (with some comments from me after //). First, for the 24 GB file:
Loading graph...
edges count: 2147483648
Before graph alloc free memory: 91480 MB
After graph alloc free memory: 91480 MB // allocated memory here, but nothing changed, why?
After memset: 66857 MB // ok, we put some data into the memory (memset) and consumed exactly 24 GB, seems correct
1 test: 57658 MB // first read and we have lost 9 GB...
2 test: 48409 MB // -9 GB again...
3 test: 39161 MB // and once more...
After actual load: 39161 MB // we lost in total 27 GB during the reads. How???
After we removed the graph load: 63783 MB // removed the arrays from memory and freed the memory we have allocated
// 24 GB freed, but 27 are still consumed somewhere
After we closed the file: 63788 MB // closing the file doesn't help
Complete!
After we quit the function: 63788 MB // quitting the function doesn't help too.
Similar for 48GB file:
edges count: 4294967296
Before graph alloc free memory: 91485 MB
After graph alloc free memory: 91485 MB
After memset: 42236 MB
1 test: 23784 MB
2 test: 5280 MB
3 test: 490 MB
After actual load: 490 MB
After we removed the graph load: 49737 MB
After we closed the file: 49741 MB
Complete!
After we quit the function: 49741 MB
So, what is happening inside my program?
1) Why is so much memory lost during the reads (using both C's fread and C++ file streams)?
2) Why doesn't closing the file free the consumed memory?
3) Could sysinfo be showing me incorrect info?
4) Can this problem be connected to memory fragmentation?
By the way, I'm running my program on a supercomputer node to which I have exclusive access (so other people can't influence it), and there are no side applications that could affect my program.
Thank you for reading this!
This is almost certainly the disk (/page) cache. When you read a file the operating system stores some or all of the contents in memory, thus decreasing the amount of free memory. This is to optimise future reads.
This however does not mean the memory is either used by the process or otherwise unavailable. If/when the memory is needed then it will be freed by the OS and made available.
You should be able to confirm this by tracking the value of the bufferram parameter in the sysinfo structure (https://www.systutorials.com/docs/linux/man/2-sysinfo/), or by looking at the output of the free -m command before and after running your program.
For more detailed information on this, see the following answer: https://superuser.com/questions/980820/what-is-the-difference-between-memfree-and-memavailable-in-proc-meminfo
I am using Ubuntu 14.04 64-bit. Here is my C++ code to see how memory is used.
#include <iostream>
using namespace std;

int main() {
    int **ptr;
    ptr = new int* [2];
    cout << &ptr << " -> " << ptr << endl;
    for (int r = 1; r <= 2; r++) {
        ptr[r-1] = new int [2 * r];
        cout << &ptr[r-1] << " -> " << ptr[r-1] << endl;
        for (int c = 0; c < 2 * r; c++) {
            ptr[r-1][c] = r * c;
            cout << &ptr[r-1][c] << " -> " << ptr[r-1][c] << endl;
        }
    }
    return 0;
}
Here is my output:
0x7fff09faf018 -> 0x1195010
0x1195010 -> 0x1195030
0x1195030 -> 0
0x1195034 -> 1
0x1195018 -> 0x1195050
0x1195050 -> 0
0x1195054 -> 2
0x1195058 -> 4
0x119505c -> 6
I expected the OS to allocate memory contiguously, so ptr[0][0] would be at 0x1195020 instead of 0x1195030!? What does the OS use 0x1195020 - 0x119502F and 0x1195038 - 0x119504F for?
Because:
Some space at the beginning and end of each block of allocated memory is often used for bookkeeping. (In particular, many allocators find it useful to store the size of the preceding/following blocks, or pointers to them, around there.)
The memory allocator may "round up" the size of an allocated block to make things easier for it. For instance, an allocation of 7 bytes will likely be rounded up to 8 bytes, if not even 16 or 32.
Blocks of memory may already be available in noncontiguous locations. (Keep in mind that the C runtime may have been making some memory allocations of its own before main() even runs.)
The allocator may have a plan in mind for laying out memory which would be ruined by putting the next block at the "next" address. (It may, for instance, have reserved that memory for allocations of a particular size.)
Why should it? There are no guarantees. Allocated memory could end up anywhere. (Well, almost.) Don't make any assumptions, just let memory go wherever the allocator says it'll go, and you'll be fine.
I need to store a dynamic array of bits.
The C++ reference page on vector< bool > has the following information:
The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
How do I make sure that my program that uses vector<bool> does in fact store bits in a vector instead of boolean values (bytes)?
Don't try to do that. Instead, use boost::dynamic_bitset which clearly indicates what you actually want. The vector<bool> optimization actually creates a number of possibilities for bugs, for example when using iterators (because it usually returns a proxy object).
Well, you can always look into the header files that come with your compiler. Since STL containers are almost exclusively template classes, most if not all parts of the implementation will be visible in the headers.
Maybe looking at a vector object in a debugger can also be helpful.
Note: You should also be aware that vector<bool> is meanwhile rather frowned upon by the C++ community, and that this optimization is for size, not for speed:
https://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
It may be possible to check this at compile time, by checking the return type of the non-const version of vector<bool>::operator[]: An implementation that stores its values as bits will have to return a proxy reference class rather than a bool&.
There's really nothing to check here at all. The specialization of vector<bool> to store bits instead of larger objects is required by the standard. §23.2.5: "To optimize space allocation, a specialization of vector for bool elements is provided:".
I suppose from some viewpoint, what you've quoted is at least sort of correct. Since there's essentially nobody to certify conformance of a compiler, and essentially no compiler that even attempts to fulfill all conformance requirements, a compiler could choose to ignore this requirement as well.
I don't know of any compiler that does so though -- and if anybody did, I'd guess it would probably be pretty well known. There have been pretty heated discussions at times about removing the vector<bool> specialization, so if somebody had real-life examples of how much better (or worse) that made things, I suspect we'd have heard about it.
Edit: In C++11, the requirements for std::vector<bool> have been moved to §23.3.7. More importantly, the wording has been changed, to specify that a packed representation where each bool is stored as a single bit instead of a contiguous allocation of bool values is now only a recommendation.
At least IMO, this makes little real difference. As far as I know, all real implementations still use a packed representation, so even though packed storage is no longer theoretically guaranteed, it happens in practice anyway.
This program kind of proves it.
#include <vector>
#include <iostream>

template <typename T>
void showSize() {
    std::vector<T> myvec;
    size_t capacity = myvec.capacity();

    std::cout << "capacity: " << myvec.capacity() << std::endl;
    std::cout << "size: " << myvec.size() << std::endl;

    while (myvec.capacity() < 1024) {
        while (myvec.capacity() == capacity) {
            myvec.push_back(T());
        }
        capacity = myvec.capacity();
        std::cout << "capacity: " << myvec.capacity() << std::endl;
        std::cout << "size: " << myvec.size() << std::endl;
    }
}

int main(int, char**) {
    std::cout << std::endl << std::endl;
    std::cout << "*********************" << std::endl << std::endl;
    std::cout << "Booleans: " << std::endl << std::endl;
    showSize<bool>();

    std::cout << std::endl << std::endl;
    std::cout << "*********************" << std::endl << std::endl;
    std::cout << "Chars: " << std::endl << std::endl;
    showSize<char>();
}
output:
*********************
Booleans:
capacity: 0
size: 0
capacity: 64
size: 1
capacity: 128
size: 65
capacity: 256
size: 129
capacity: 512
size: 257
capacity: 1024
size: 513
*********************
Chars:
capacity: 0
size: 0
capacity: 1
size: 1
capacity: 2
size: 2
capacity: 4
size: 3
capacity: 8
size: 5
capacity: 16
size: 9
capacity: 32
size: 17
capacity: 64
size: 33
capacity: 128
size: 65
capacity: 256
size: 129
capacity: 512
size: 257
capacity: 1024
size: 513
So the key is that the capacity for bools increases 64 entries at a time (the machine word size here), which hints that the vector is reserving just 8 bytes, i.e. 64 bits, at a time.
Create a huge vector<bool> and look at the memory usage of the program.
Or simply check out the source code - you can peek at the vector header.
I started noticing that sometimes when deallocating memory in some of my programs, they would inexplicably crash. I began narrowing down the culprit and have come up with an example that illustrates a case that I am having difficulty understanding:
#include <iostream>
#include <stdlib.h>
#include <string.h>

using namespace std;

int main() {
    char *tmp = (char*)malloc(16);
    char *tmp2 = (char*)malloc(16);

    long address = reinterpret_cast<long>(tmp);
    long address2 = reinterpret_cast<long>(tmp2);
    cout << "tmp = " << address << "\n";
    cout << "tmp2 = " << address2 << "\n";

    memset(tmp, 1, 16);
    memset(tmp2, 1, 16);

    char startBytes[4] = {0};
    char endBytes[4] = {0};

    memcpy(startBytes, tmp - 4, 4);
    memcpy(endBytes, tmp + 16, 4);

    cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
    cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";
    cout << "---------------\n";

    free(tmp);

    memcpy(startBytes, tmp - 4, 4);
    memcpy(endBytes, tmp + 16, 4);

    cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
    cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";

    free(tmp2);
    return 0;
}
Here is the output that I am seeing:
tmp = 8795380
tmp2 = 8795400
Start: 16 0 0 0
End: 16 0 0 0
---------------
Start: 17 0 0 0
End: 18 0 0 0
I am using Borland's free compiler. I am aware that the header bytes that I am looking at are implementation specific, and that things like "reinterpret_cast" are bad practice. The question I am merely looking to find an answer to is: why does the first byte of "End" change from 16 to 18?
The 4 bytes that are considered "end" are 16 bytes after tmp, which are 4 bytes before tmp2. They are tmp2's header - why does a call to free() on tmp affect this place in memory?
I have tried the same example using new [] and delete [] to create/delete tmp and tmp2 and the same results occur.
Any information or help in understanding why this particular place in memory is being affected would be much appreciated.
You will have to ask your libc implementation why it changes. In any case, why does it matter? This is a memory area that libc has not allocated to you, and may be using to maintain its own data structures or consistency checks, or may not be using at all.
Basically you are looking at memory you didn't allocate. You can't make any assumptions about what happens to memory outside what you requested (i.e., the 16 bytes you allocated). There is nothing abnormal going on.
The runtime and the compiler are free to do whatever they want with those bytes, so you should not use them in your programs. The runtime probably changes their values to keep track of its internal state.
Deallocating memory is very unlikely to crash a program. On the other hand, accessing memory you have already deallocated, as in your sample, is a big programming mistake that is likely to do so.
A good way to avoid this is to set any pointer you free to NULL. Doing so forces your program to crash immediately if it accesses the freed variable.
It's possible that the act of removing an allocated element from the heap modifies other heap nodes, or that the implementation reserves one or more header bytes around each allocation as guard bytes.
The memory manager must remember for example what is the size of the memory block that has been allocated with malloc. There are different ways, but probably the simplest one is to just allocate 4 bytes more than the size requested in the call and store the size value just before the pointer returned to the caller.
The implementation of free can then subtract 4 bytes from the passed pointer to get a pointer to where the size has been stored and then can link the block (for example) to a list of free reusable blocks of that size (may be using again those 4 bytes to store the link to next block).
You are not supposed to change, or even look at, bytes before or after the area you allocated. Accessing memory you didn't allocate, even just for reading, is undefined behavior (and yes, a program really can crash or behave erratically just because it read memory that wasn't allocated).