I have a program (full code here) that is exiting around the 46000th iteration:
{
PROCESSER<MONO_CONT> processer;
c_start = std::clock();
for (unsigned long long i = 0; i < iterations; i++) {
BloombergLP::bdlma::BufferedSequentialAllocator alloc(pool, sizeof(pool));
MONO_CONT* container = new(alloc) MONO_CONT(&alloc);
container->reserve(elements);
processer(container, elements);
}
c_end = std::clock();
std::cout << (c_end - c_start) * 1.0 / CLOCKS_PER_SEC << " ";
}
In this case, MONO_CONT is a vector<string, scoped_allocator_adaptor<alloc_adaptor<BloombergLP::bdlma::BufferedSequentialAllocator>>>.
My understanding is that the scoped_allocator_adaptor would make sure the supplied allocator is also used for the allocations made by the strings being inserted, thus ensuring the strings' memory is released at the end of each loop iteration (avoiding the problem 1201ProgramAlarm suggested). The alloc_adaptor is just a wrapper to make Bloomberg allocators conform to the proper interface.
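For reference, here is a minimal standalone sketch of the propagation I have in mind, using a hypothetical counting allocator instead of the Bloomberg types (so this illustrates the mechanism, not my actual alloc_adaptor):

#include <cstddef>
#include <iostream>
#include <memory>
#include <scoped_allocator>
#include <string>
#include <vector>

// Hypothetical allocator that just counts the bytes it hands out.
static std::size_t bytes_through_allocator = 0;

template <typename T>
struct counting_alloc {
    using value_type = T;
    counting_alloc() = default;
    template <typename U> counting_alloc(const counting_alloc<U>&) {}
    T* allocate(std::size_t n) {
        bytes_through_allocator += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const counting_alloc<T>&, const counting_alloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const counting_alloc<T>&, const counting_alloc<U>&) { return false; }

// The inner string must itself be allocator-aware with the same allocator type,
// otherwise scoped_allocator_adaptor has nothing to propagate into it.
using inner_string = std::basic_string<char, std::char_traits<char>, counting_alloc<char>>;
using outer_vector = std::vector<inner_string,
                                 std::scoped_allocator_adaptor<counting_alloc<inner_string>>>;

int main() {
    outer_vector v;
    v.emplace_back("a string long enough to defeat the small-string optimization");
    // Counts the vector's buffer plus the string's character buffer.
    std::cout << bytes_through_allocator << " bytes went through the allocator\n";
}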
PROCESSER is the following templated functor that just performs some basic operations on the templated container, MONO_CONT:
template<typename DS2>
struct process_DS2 {
void operator() (DS2 *ds2, size_t elements) {
escape(ds2);
for (size_t i = 0; i < elements; i++) {
ds2->emplace_back(&random_data[random_positions[i]], random_lengths[i]);
}
clobber();
}
};
Note that escape and clobber are just some magic that do nothing other than defeat the optimizer (see this talk if you're interested). random_data is just an array of chars containing garbage. random_positions defines valid indices into random_data. random_lengths defines a valid string length (does not go off the end of the garbage data) starting from the corresponding position in random_positions.
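For reference, the definitions usually shown in that talk are the empty inline-asm pair below (GCC/Clang syntax); anything equivalent works:

static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");   // pretend p's pointee is read and written
}

static void clobber() {
    asm volatile("" : : : "memory");          // pretend all memory is read and written
}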
I have similar code that runs the exact same number of iterations, and does not fail:
{
PROCESSER<MONO_CONT> processer;
c_start = std::clock();
for (unsigned long long i = 0; i < iterations; i++) {
BloombergLP::bdlma::BufferedSequentialAllocator alloc(pool, sizeof(pool));
MONO_CONT container(&alloc);
container.reserve(elements);
processer(&container, elements);
}
c_end = std::clock();
std::cout << (c_end - c_start) * 1.0 / CLOCKS_PER_SEC << " ";
}
The main difference between the two snippets is that in the first, I'm newing the container into the allocator and then passing the allocator to the container, relying on the allocator's destruction to deallocate all the memory of the container (without having to actually call the destructor of the container itself). In the second snippet I'm allowing the more natural destruction of the container as it goes out of scope at the end of each iteration of the loop.
I'm building this with Clang, running in a Docker container on Debian. Any suggestions on what the issue could be, or how I could start debugging this?
While you're relying on the allocator's destruction to deallocate the memory allocated for the container, this will not free up the memory used by the contained strings, which are not using the vector's allocator but the global heap (new). When the program runs out of memory, it exits without reporting anything, possibly because it doesn't have enough memory available to do so.
In your second version, container is destroyed at the end of each iteration, which frees the memory allocated for the strings as well.
As far as how to debug something like this, the usual advice of "try stepping through it in the debugger" is a start. If you run attached to a debugger, it may break when the std::bad_alloc exception is created or thrown. You can also monitor the process's memory usage as it runs.
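For example, a cheap first step is to wrap the loop body so that a std::bad_alloc is at least reported along with the iteration count. This is only a sketch, and it only helps if the failure surfaces as an exception rather than the kernel's OOM killer simply terminating the process:

try {
    BloombergLP::bdlma::BufferedSequentialAllocator alloc(pool, sizeof(pool));
    MONO_CONT* container = new(alloc) MONO_CONT(&alloc);
    container->reserve(elements);
    processer(container, elements);
} catch (const std::bad_alloc&) {
    std::cerr << "allocation failed at iteration " << i << "\n";
    throw;
}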
Related
Suppose I have a forever loop to create hashmap:
#include <map>       // std::map
#include <climits>   // INT_MAX
using namespace std;

void createMap() {
map<int, int> mymap;
for (int i = 0; i < INT_MAX; i++) {
mymap[i] = i;
}
mymap.clear(); // <-- this line doesn't seem to make a difference in memory growth
}
int main (void) {
while (1) {
createMap();
}
return 0;
}
I watched the code run on macOS with Activity Monitor open, and the application keeps growing its memory usage, with or without the mymap.clear() at the end of the createMap() function.
Shouldn't memory usage be constant for the case where mymap.clear() is used?
What's the general recommendation for using STL containers? Do I need to .clear() them before the end of a function?
I asked in another forum, and the folks there helped me understand the answer. It turns out I didn't wait long enough for createMap to finish, nor do I have enough memory to sustain this program.
The loop creates INT_MAX = 2147483647 elements, each a pair<int, int> of 8 bytes, plus about 24 bytes for the map object itself.
Total minimum memory = 2147483647 * 8 + 24 = 17179869200 bytes ~= 17.2 GB.
I reduced the number of elements and tested both with and without .clear(); the program grew and shrank in memory accordingly.
The container you create is bound to the scope of your function. If the function returns, its lifetime ends. And as std::map owns its data, the memory it allocates is freed upon destruction.
Your code therefore constantly allocates and frees the same amount of memory, so its memory consumption is constant, although the exact memory locations will probably differ. This also means that you should not manually call clear at the end of this function. Use clear when you want to empty a container that you intend to continue using afterwards.
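To make the two idioms concrete, a small sketch:

#include <map>

void use_scope() {
    std::map<int, int> m;                    // destroyed, and its memory released, on return
    for (int i = 0; i < 1000; ++i) m[i] = i;
}

void reuse(std::map<int, int>& m) {          // container that outlives the call
    m.clear();                               // clear() is for a container you keep using
    for (int i = 0; i < 1000; ++i) m[i] = i;
}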
As a side note, std::map is not a hash map (std::unordered_map is one).
Let's assume I have a structure MyStruct and I want to allocate a "big" chunk of memory like this:
std::size_t memory_chunk_1_size = 10;
MyStruct * memory_chunk_1 = reinterpret_cast <MyStruct *> (new char[memory_chunk_1_size * sizeof(MyStruct)]);
and because of the "arbitrary reasons" I would like to "split" this chunk of memory into two smaller chunks without; moving data, resizing the "dynamic array", deallocating/allocating/reallocating memory, etc.
so I am doing this:
std::size_t memory_chunk_2_size = 5; // to remember how many elements there are in this chunk;
MyStruct * memory_chunk_2 = &memory_chunk_1[5]; // points to the 6th element of memory_chunk_1;
memory_chunk_1_size = 5; // to remember how many elements there are in this chunk;
memory_chunk_1 = memory_chunk_1; // nothing changes still points to the 1st element.
Unfortunately, when I try to deallocate the memory, I'm encountering an error:
// release memory from the 2nd chunk
for (int i = 0; i < memory_chunk_2_size ; i++)
{
memory_chunk_2[i].~MyStruct();
}
delete[] reinterpret_cast <char *> (memory_chunk_2); // deallocates memory from both "memory_chunk_2" and "memory_chunk_1"
// release memory from the 1st chunk
for (int i = 0; i < memory_chunk_1_size ; i++)
{
memory_chunk_1[i].~MyStruct(); // Throws exception.
}
delete[] reinterpret_cast <char *> (memory_chunk_1); // Throws exception. This part of the memory was already deallocated.
How can I delete only a selected number of elements (to solve this error)?
Compilable example:
#include <iostream>
using namespace std;
struct MyStruct
{
int first;
int * second;
void print()
{
cout << "- first: " << first << endl;
cout << "- second: " << *second << endl;
cout << endl;
}
MyStruct() :
first(-1), second(new int(-1))
{
cout << "constructor #1" << endl;
print();
}
MyStruct(int ini_first, int ini_second) :
first(ini_first), second(new int(ini_second))
{
cout << "constructor #2" << endl;
print();
}
~MyStruct()
{
cout << "destructor" << endl;
print();
delete second;
}
};
int main()
{
// memory chunk #1:
std::size_t memory_chunk_1_size = 10;
MyStruct * memory_chunk_1 = reinterpret_cast <MyStruct *> (new char[memory_chunk_1_size * sizeof(MyStruct)]);
// initialize:
for (int i = 0; i < memory_chunk_1_size; i++)
{
new (&memory_chunk_1[i]) MyStruct(i,i);
}
// ...
// Somewhere here I decided I want to have two smaller chunks of memory instead of one big,
// but i don't want to move data nor reallocate the memory:
std::size_t memory_chunk_2_size = 5; // to remember how many elements there are in this chunk;
MyStruct * memory_chunk_2 = &memory_chunk_1[5]; // points to the 6th element of memory_chunk_1;
memory_chunk_1_size = 5; // to remember how many elements there are in this chunk;
memory_chunk_1 = memory_chunk_1; // nothing changes still points to the 1st element.
// ...
// some time later i want to free memory:
// release memory from the 2nd chunk
for (int i = 0; i < memory_chunk_2_size ; i++)
{
memory_chunk_2[i].~MyStruct();
}
delete[] reinterpret_cast <char *> (memory_chunk_2); // deallocates memory from both "memory_chunk_2" and "memory_chunk_1"
// release memory from the 1st chunk
for (int i = 0; i < memory_chunk_1_size ; i++)
{
memory_chunk_1[i].~MyStruct(); // Throws exception.
}
delete[] reinterpret_cast <char *> (memory_chunk_1); // Throws exception. This part of the memory was already deallocated.
// exit:
return 0;
}
This kind of selective deallocation is not supported by the C++ programming language, and is probably never going to be supported.
If you intend to deallocate individual portions of memory, those individual portions need to be individually allocated in the first place.
It is possible that a specific OS or platform might support this kind of behavior, but it would be with OS-specific system calls, not through C++ standard language syntax.
Memory allocated with malloc or new cannot be partially deallocated. Many heaps use bins of different-sized allocations for performance and to prevent fragmentation, so allowing partial frees would make such a strategy impossible.
That of course does not prevent you writing your own allocator.
The simplest way I can think of using standard C++ follows this idiomatic pattern:
std::vector<int> v1(1000);
auto block_start = v1.begin() + 400;
auto block_end = v1.begin() + 500;
std::vector<int> v2(block_start,block_end);
v1.erase(block_start,block_end);
v1.shrink_to_fit();
Whether a compiler is intelligent enough to translate such a pattern into the most efficient low-level OS and CPU memory management operations is an implementation detail.
Here's the working example.
Let's be honest: this is very bad practice! Casting the result of new, casting again for delete, and in addition calling the destructors yourself in between, is a telltale sign of low-level manual memory management.
Alternatives
The proper way to manage dynamic memory structures in contiguous blocks in C++ is to use std::vector instead of manual arrays or manual memory management, and let the library proceed. You can resize() a vector to delete the unneeded elements. You can shrink_to_fit() to say that you no longer need the extra free capacity, although it's not guaranteed that unneeded memory is released.
The use of the C memory allocation functions, and in particular realloc(), is to be avoided, as it is very error-prone and works only with trivially copyable objects.
Edit: Your own container
If you want to implement your own special container that must allow this kind of dynamic behaviour, due to unusual special constraints, then you should consider writing your own memory management functions that manage a kind of "private heap".
Heap management is often implemented via a linked list of free chunks.
One strategy could be to allocate a new chunk when there is not enough contiguous memory left in your private heap. You could then offer a more permissive myfree() function that reinserts a freed or partially freed chunk into that linked list. Of course, this requires iterating through the linked list to find out whether the released memory is adjacent to any other free chunk in the private heap, and merging the blocks if so.
I see that MyStruct is very small. Another approach could then be to write a special allocation function optimised for small fixed-size blocks. There is an example in Loki's small-object library, which is described in depth in "Modern C++ Design".
Finally, you could perhaps have a look at the Pool library of Boost, which also offers a chunk based approach.
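To give an idea of the fixed-size-block variant, here is a deliberately simplified, hypothetical sketch: one pre-allocated chunk, equal-sized blocks threaded on a free list, no growth and no merging of adjacent free chunks.

#include <cstddef>

class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          buffer_(new char[block_size_ * block_count]),
          free_list_(nullptr)
    {
        for (std::size_t i = 0; i < block_count; ++i)
            push(buffer_ + i * block_size_);             // thread every block onto the free list
    }
    ~FixedPool() { delete[] buffer_; }                    // the whole chunk is released at once

    void* myalloc() {                                     // pop one block, or nullptr if exhausted
        if (!free_list_) return nullptr;
        void* p = free_list_;
        free_list_ = *static_cast<void**>(free_list_);
        return p;
    }
    void myfree(void* p) { push(static_cast<char*>(p)); } // reinsert one block into the free list

private:
    static std::size_t round_up(std::size_t n) {          // keep every block pointer-aligned
        const std::size_t a = alignof(void*);
        return n < sizeof(void*) ? sizeof(void*) : (n + a - 1) / a * a;
    }
    void push(char* p) {
        *reinterpret_cast<void**>(p) = free_list_;
        free_list_ = p;
    }
    std::size_t block_size_;
    char* buffer_;
    void* free_list_;
};

You would combine myalloc()/myfree() with placement new and explicit destructor calls, much like in the question, but individual MyStruct-sized blocks can now be returned without touching the rest of the chunk.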
I need to construct a large std::vector<std::shared_ptr<A>> many_ptr_to_A.
Ideally, a non-default constructor of A that takes arguments is used.
Several variants are defined in the code sample below:
#include <iostream>
#include <vector>
#include <memory>
#include <ctime>
class A
{
public:
A(std::vector<double> data):
data(data)
{}
A():
data(std::vector<double>(3, 1.))
{}
std::vector<double> data;
};
int main()
{
int n = 20000000;
std::vector<std::shared_ptr<A>> many_ptr_to_A;
// option 1
std::clock_t start = std::clock();
std::vector<A> many_A(n, std::vector<double>(3, 1.));
std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
// end option 1
many_ptr_to_A.clear();
// option 2
start = std::clock();
many_ptr_to_A.reserve(n);
for (int i=0; i<n; i++) {
many_ptr_to_A.push_back(std::shared_ptr<A>(new A(std::vector<double>(3, 1.))));
}
std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
// end option 2
many_ptr_to_A.clear();
// option 3
start = std::clock();
A* raw_ptr_to_A = new A[n];
for (int i=0; i<n; i++) {
many_ptr_to_A.push_back(std::shared_ptr<A>(&raw_ptr_to_A[i]));
}
std::cout << double(std::clock() - start) / CLOCKS_PER_SEC << std::endl;
// end option 3
return 0;
}
Option 1
Rather fast, but unfortunately I need pointers instead of raw objects. A method to create pointers into the resulting allocated space while preventing the vector from deleting the objects would be great, but I can't think of one.
Option 2
This works and I can feed specific data in the constructor for every A. Unfortunately, this is rather slow. Using std::make_shared instead of new is not really improving the situation.
Even worse, this seems to be a big bottleneck when used in multiple threads. Assuming I run option 2 in 10 threads with n_thread = n / 10, instead of being around ten times faster the whole thing is around four times slower. Why does this happen? Is it a problem when multiple threads try to allocate many small pieces of memory?
The number of cores on the server I'm using is larger than the number of threads. The rest of my application scales nicely with the number of cores, thus this actually represents a bottleneck.
Unfortunately, I'm not really experienced when it comes to parallelization...
Option 3
With this approach I tried to combine the fast allocation with a raw new at one go and the shared_ptrs. This compiles, but unfortunately yields a segmentation fault when the destructor of the vector is called. I don't fully understand why this happens. Is it because A is not POD?
In this approach I would manually fill the object-specific data into the objects after their creation.
Question
How can I perform the allocation of a large number of shared_ptr to A in an efficient way which also scales nicely when used on many threads/cores?
Am I missing an obvious way to construct the std::vector<std::shared_ptr<A>> many_ptr_to_A in one go?
My system is a Linux/Debian server. I compile with g++ and -O3, -std=c++11 options.
Any help is highly appreciated :)
Option 3 is undefined behaviour: you have n shared_ptrs which will all try to delete a single A, but there must be only one delete[] for the whole array, not delete used n times. You could do this though:
std::unique_ptr<A[]> array{ new A[n] };
std::vector<std::shared_ptr<A>> v;
v.reserve(n);
v.emplace_back(std::move(array));
for (int i = 1; i < n; ++i)
v.push_back(std::shared_ptr<A>{v[0], v[0].get() + i});
This creates a single array, then creates n shared_ptr objects which all share ownership of the array and which each point to a different element of the array. This is done by creating one shared_ptr that owns the array (and a suitable deleter) and then creating n-1 shared_ptrs that alias the first one, i.e. share the same reference count, even though their get() member will return a different pointer.
A unique_ptr<A[]> is initialized with the array first, so that default_delete<A[]> will be used as the deleter, and that deleter will be transferred into the first shared_ptr, so that when the last shared_ptr gives up ownership the right delete[] will be used to free the whole array. To get the same effect you could create the first shared_ptr like this instead:
v.push_back(std::shared_ptr<A>{new A[n], std::default_delete<A[]>{}});
Or:
v.emplace_back(std::unique_ptr<A[]>{new A[n]});
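A quick sanity check of the aliasing scheme, assuming v and n (with n > 5) from above and <cassert> included:

assert(v[0].use_count() == n);          // every element shares the one control block
assert(v[5].get() == v[0].get() + 5);   // but each one points at its own array element
v.clear();                              // last owner released: a single delete[] frees the array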
Let's say I have a loop:
for(int i = 0; i < 1000; i++)
{
vector<int> table(100000, 0);
int result = some_function(&table);
// ...
}
Is the memory of the previous "table" instance freed in each loop cycle?
Depends what you mean by "memory freed".
At the end of each iteration, the destructor for the vector is called, and therefore the destructor for each contained element is called. So you don't have a memory leak (if that's what your concern was).
But whether the memory is returned to the operating system is a different question; that's implementation-specific. There are at least two levels of abstraction involved, the container allocator, and the standard new/delete implementation underneath that.
Yes; the vector is destroyed on each iteration (and created on each iteration).
Yes, the destructor of vector will free the memory.
I would do something like this. This way the memory is allocated just once, and the vector's contents are reset to 0 on every iteration:
size_t vector_size = 100000;
vector<int> table(vector_size);
for(int i = 0; i < 1000; ++i)
{
memset( &table[0], 0, table.size() * sizeof(int) );
int result = some_function(&table);
// ...
}
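A more type-safe variant of the same reuse pattern (just a sketch), using std::fill or assign instead of the raw memset:

#include <algorithm>   // std::fill

size_t vector_size = 100000;
vector<int> table(vector_size);
for (int i = 0; i < 1000; ++i)
{
    std::fill(table.begin(), table.end(), 0);   // or: table.assign(vector_size, 0);
    int result = some_function(&table);
    // ...
}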
std::map<int, int> * mp = new std::map<int, int>;
for(int i = 0; i < 999999; i++){
mp->insert(std::pair<int, int>(i, 999999-i ));
}
p("created");
//mp->clear(); - doesn't help either
delete mp;
p("freed");
The problem is: "delete mp" doesn't do anything. To compare:
std::vector<int> * vc = new std::vector<int>;
for(int i = 0; i < 9999999; i++){
vc->push_back(i);
}
p("created");
delete vc;
p("freed");
releases the memory. How can I release the memory used by the map?
PS: p("string") just pauses program and waits for input.
The RAM used by the application is not a precise way to tell if the memory has been semantically freed.
Freed memory is memory that you can reuse. Sometimes, though, you don't observe this freed memory directly in what the OS reports as memory used by your app.
You know that the memory is freed because the semantics of the language say so.
Actually, if the following code doesn't leak:
{
std::map<int, int> some_map;
}
The following code shouldn't leak as well:
{
std::map<int, int>* some_map = new std::map<int, int>();
/* some instructions that do not throw or exit the function */
delete some_map;
}
This applies to whatever type you use with new, as long as the type is well written. And std::map is probably very well written.
I suggest you use valgrind to check for your leaks. I highly doubt that what you observed was a real leak.
As Daniel mentions, RAM used by an application is not necessarily an indicator of a memory leak.
With respect to the behavior you notice regarding vectors: a vector guarantees that its memory layout is contiguous, so when you create a vector of that size, all the elements are laid out in sequence and constitute a pretty sizable chunk of memory. Deleting it will definitely impact the process size. A map behaves differently (as per Wikipedia, it is often implemented as a self-balancing binary tree), so I guess deleting it causes fragmentation in the process memory space, and maybe because of that the process memory does not drop immediately.
The best way to detect memory leaks would be to use a tool like valgrind which will clearly indicate whats wrong.
To pinpoint whether you are releasing memory or not, try adding observable effects to the destructors of your objects and... observe them.
For instance, instead of a map, create a custom class which emits output when the destructor is invoked.
Something like this:
#include <map>
#include <iostream>
#include <utility>
class Dtor{
int counter;
public:
explicit Dtor(int c):counter(c) {std::cout << "Constructing counter: " << counter << std::endl; }
Dtor(const Dtor& d):counter(d.counter) {std::cout << "Copy Constructing counter: " << counter << std::endl; }
~Dtor(){ std::cout << "Destroying counter: " << counter << std::endl; }
};
int main(){
std::map<int, Dtor> * mp = new std::map<int, Dtor>;
for (int i = 0; i < 10; ++i){
mp -> insert(std::make_pair(i, Dtor(i)));
}
delete mp;
return 0;
}
You will witness that deleting the pointer invokes the destructor of your objects, as expected.