Vector of structs containing allocation pointers is failing to destruct - c++

In my project I use a class for paged memory allocation.
This class uses a struct to store all of its allocations:
enum PageStatus : uint_fast8_t { //!< possible states of an allocated page
PAGE_STATUS_INVALID = 0b00,
PAGE_STATUS_FREE = 0b01, //!< the page is free
PAGE_STATUS_USED = 0b10, //!< the page is (partially) used
};
struct PhysicalPage { //!< represents a page that has been allocated
char* pData; //!< pointer to the allocation
PageStatus status; //!< status of the allocation
};
These PhysicalPages are stored in a vector std::vector<PhysicalPage> physicalAllocations {};.
During runtime, pages are added to the vector and some may be removed. During removal, the last element is popped and the memory is returned using delete page.pData. However, a problem occurs when the allocator class reaches the end of its life and is destroyed as it goes off the stack. When physicalAllocations's vector destructor is called, it tries to destruct not only the elements themselves but also the reserved memory (which the vector keeps as a buffer for when the size changes). That causes invalid memory pointers to be deleted, stopping program execution:
double free or corruption (!prev)
Signal: SIGABRT (Aborted)
It's probably also worth mentioning that allocations are done in chunks larger than a page, which means that only one in every x pointers is an actual allocation. All other pointers are just offsets from the actual memory locations.
To prevent the error from occurring I tried:
deleting manually (this is a bit overcomplicated due to chunked allocation)
for (size_t i = physicalAllocations.size(); 0 < i; i -= 1 << allocationChunkSize) {
delete physicalAllocations[i - (1 << allocationChunkSize)].pData;
for (size_t a = 0; a < 1 << allocationChunkSize; a++)
physicalAllocations.pop_back();
}
clearing the vector
physicalAllocations.clear();
swapping for a clear vector
std::vector<PhysicalPage>(0).swap(physicalAllocations);
None of these worked.
I've been working on this problem for a lot longer than I would like to admit, and your help is very much appreciated. Thanks!

std::shared_ptr<char[]> pData and its aliasing constructor (8) might help (it might even let you get rid of PageStatus).
It would look something like:
#include <cstddef>
#include <iostream>
#include <memory>
#include <string_view>
#include <vector>
constexpr std::size_t page_size = 6;
struct PhysicalPage {
std::shared_ptr<char[]> pData;
};
int main()
{
std::vector<PhysicalPage> pages;
{
std::shared_ptr<char[]> big_alloc = std::unique_ptr<char[]>(new char[42]{"hello world. 4 8 15 16 23 42"});
for (std::size_t i = 0; i != 42 / page_size; ++i) {
pages.push_back(PhysicalPage{std::shared_ptr<char[]>{big_alloc, big_alloc.get() + i * page_size}});
}
}
pages.erase(pages.begin());
pages.erase(pages.begin() + 2);
for (auto& p : pages) {
std::cout << std::string_view(p.pData.get(), page_size) << std::endl;
}
}
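Another option, if shared ownership is more than you need, is to keep chunk ownership separate from the page list: a vector of std::unique_ptr<char[]> owns each chunk (so delete[] runs exactly once per chunk, matching the new[] that created it), while PhysicalPage only holds non-owning pointers into those chunks. The following is a minimal sketch under that assumption; PagePool, grow() and shrink() are hypothetical names, not part of the asker's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

enum PageStatus : std::uint_fast8_t {
    PAGE_STATUS_INVALID = 0b00,
    PAGE_STATUS_FREE    = 0b01,
    PAGE_STATUS_USED    = 0b10,
};

struct PhysicalPage {
    char* pData;        // non-owning: points into a chunk owned by PagePool
    PageStatus status;
};

// Hypothetical pool that hands out pages carved from larger chunks.
class PagePool {
public:
    PagePool(std::size_t pageSize, std::size_t pagesPerChunk)
        : pageSize_(pageSize), pagesPerChunk_(pagesPerChunk) {
        assert(pageSize > 0 && pagesPerChunk > 0);
    }

    // Allocates one chunk and registers each of its pages as free.
    void grow() {
        chunks_.push_back(std::make_unique<char[]>(pageSize_ * pagesPerChunk_));
        char* base = chunks_.back().get();
        for (std::size_t i = 0; i < pagesPerChunk_; ++i)
            pages_.push_back(PhysicalPage{base + i * pageSize_, PAGE_STATUS_FREE});
    }

    // Drops the newest chunk's pages, then the chunk itself (one delete[]).
    void shrink() {
        assert(!chunks_.empty());
        pages_.resize(pages_.size() - pagesPerChunk_);
        chunks_.pop_back();
    }

    std::size_t pageCount() const { return pages_.size(); }
    std::size_t chunkCount() const { return chunks_.size(); }
    const PhysicalPage& page(std::size_t i) const { return pages_.at(i); }

private:
    std::size_t pageSize_;
    std::size_t pagesPerChunk_;
    std::vector<std::unique_ptr<char[]>> chunks_;  // owns every allocation
    std::vector<PhysicalPage> pages_;              // owns nothing
};
```

Because PagePool has no hand-written destructor, destroying it frees every chunk through chunks_ and nothing through pages_. A vector only ever destroys its size() live elements; the spare capacity it has reserved is released as raw memory without running any destructor on it.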

Related

Allocating memory through malloc for std::string array not working

I'm having trouble allocating memory on the heap for an array of strings. Allocating with new works but malloc segfaults each time. The reason I want to use malloc in the first place is that I don't want to call the constructor unnecessarily.
This works fine
std::string* strings = new std::string[6];
This doesn't
std::string* strings = (std::string *)malloc(sizeof(std::string[6]));
One thing I've noticed is that the first variant (using new) allocates 248 bytes of memory while the second allocates only 240. This 8 byte difference is constant no matter the size of the array from what I've gathered and I cannot find what the source of the difference is.
Here's the whole code that segfaults.
#include <cstdlib>
#include <iostream>
#include <string>
void* operator new(size_t size)
{
std::cout << size << std::endl;
return malloc(size);
}
void* operator new [](size_t size)
{
std::cout << size << std::endl;
return malloc(size);
}
int main() {
std::string* strings = new std::string[6];
strings = (std::string *)malloc(sizeof(std::string[6]));
strings[0] = std::string("test");
return 0;
}
Another thing I've noticed is that the above code seems to work if I use memset after malloc to set all of the bytes I allocated with malloc to 0. I don't understand where the 8 byte difference comes from if this works and also why this variant works at all. Why would it work just because I set all of the bytes to 0?
malloc() only allocates raw memory, but it does not construct any objects inside of that memory.
new and new[] both allocate memory and construct objects.
If you really want to use malloc() to create an array of C++ objects (which you really SHOULD NOT do!), then you will have to call the object constructors yourself using placement-new, and also call the object destructors yourself before freeing the memory, eg:
std::string* strings = static_cast<std::string*>(
malloc(sizeof(std::string) * 6)
);
for(int i = 0; i < 6; ++i) {
new (&strings[i]) std::string;
}
...
for(int i = 0; i < 6; ++i) {
strings[i].~basic_string(); // "~std::string()" is not valid destructor-call syntax
}
free(strings);
In C++11 and C++14, you should use std::aligned_storage to help calculate the necessary size of the array memory (std::aligned_storage has since been deprecated in C++23), eg:
using string_storage = std::aligned_storage<sizeof(std::string), alignof(std::string)>::type;
void *buffer = malloc(sizeof(string_storage) * 6);
std::string* strings = reinterpret_cast<std::string*>(buffer);
for(int i = 0; i < 6; ++i) {
new (&strings[i]) std::string;
}
...
for(int i = 0; i < 6; ++i) {
strings[i].~basic_string(); // "~std::string()" is not valid destructor-call syntax
}
free(buffer);
In C++17 and later, you should use std::aligned_alloc() instead of malloc() directly, eg:
std::string* strings = static_cast<std::string*>(
std::aligned_alloc(alignof(std::string), sizeof(std::string) * 6)
);
for(int i = 0; i < 6; ++i) {
new (&strings[i]) std::string;
}
...
for(int i = 0; i < 6; ++i) {
strings[i].~basic_string(); // "~std::string()" is not valid destructor-call syntax
}
std::free(strings);
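Since C++17, <memory> also provides std::uninitialized_default_construct_n and std::destroy_n, which can replace the hand-written loops; if a constructor throws, uninitialized_default_construct_n destroys the elements it had already built. A minimal sketch on top of malloc, where make_strings and free_strings are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <string>

// Hypothetical helper: raw malloc storage for n strings, each default-constructed in place.
std::string* make_strings(std::size_t n)
{
    assert(n > 0);  // malloc(0) may legitimately return nullptr
    void* raw = std::malloc(sizeof(std::string) * n);
    if (raw == nullptr)
        throw std::bad_alloc();
    std::string* strings = static_cast<std::string*>(raw);
    try {
        std::uninitialized_default_construct_n(strings, n);
    } catch (...) {
        std::free(raw);  // the algorithm already destroyed what it had built
        throw;
    }
    return strings;
}

// Hypothetical helper: run the n destructors, then release the raw memory.
void free_strings(std::string* strings, std::size_t n)
{
    std::destroy_n(strings, n);
    std::free(strings);
}
```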
Allocating via new means the constructor is run. Please always use new and delete with C++ classes (and std::string is a C++ class), whenever you can.
When you use malloc() / free(), memory is only allocated or released; no constructor or destructor is run. This means the object is not initialized. Technically you could still use placement new (i.e., new(pointer) Type) to initialize it, but it's better and more idiomatic to use classic new.
If you want to allocate multiple objects, that's what containers are for. Please use them. Many top-grade engineers work on std::vector<>, std::array<>, std::set<> and std::map<> to make them correct and efficient. It's very hard to beat them in performance, stability or other metrics, and even if you do, the next developer at your company has to learn your custom data structures. So avoid custom, locally implemented allocation where a container could be used, unless you have a very strong reason (or, of course, are doing it for didactic purposes).
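For the motivation in the question (not wanting to run constructors unnecessarily), std::vector already separates allocation from construction: reserve() obtains capacity without constructing any strings, and emplace_back() constructs each element only when it is added. A short sketch; build_labels is a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: one allocation up front, one construction per element.
std::vector<std::string> build_labels(std::size_t n)
{
    std::vector<std::string> labels;
    labels.reserve(n);   // allocates room for n strings, constructs none
    assert(labels.empty());
    for (std::size_t i = 0; i < n; ++i)
        labels.emplace_back("label " + std::to_string(i));  // moved into place, no default construction first
    return labels;
}
```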

C++ placement new, Invalid read and Invalid free() / delete / delete[] / realloc()

I'm experimenting with placement new to try to understand how it works.
Executing the code below:
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#define SHOW(x) std::cout << #x ": " << x << '\n'
template<typename T>
static T *my_realloc(T *ptr, size_t count) {
return (T *) realloc((void *) ptr, count * sizeof(T));
}
template<typename T>
static void my_free(T *ptr) {
free((T *) ptr);
}
int main() {
constexpr int count = 40;
int cap = 0;
int size = 0;
std::string *strs = nullptr;
auto tmp_str = std::string();
for(int i = 0; i < count; i++) {
tmp_str = std::to_string(i);
if(size == cap) {
if(cap == 0)
cap = 1;
else
cap *= 2;
strs = my_realloc(strs, cap);
}
new (&strs[size++]) std::string(std::move(tmp_str));
}
for(int i = 0; i < size; i++)
SHOW(strs[i]);
std::destroy_n(strs, size);
my_free(strs);
return 0;
}
I get the errors:
Invalid read of size 1
Invalid free() / delete / delete[] / realloc()
Removing the line
std::destroy_n(strs, size);
the invalid free error goes away, and somehow all of the program's memory is still freed and no leaks are reported. But I can't find where the std::string destructor is called in the program.
If you want to store non-trivial types (such as std::string), then realloc simply cannot be used. You will find that standard library containers like e.g. std::vector will also not use it.
realloc may either extend the current allocation, without moving it in memory, or it might internally make a new allocation in separate memory and copy the contents of the old allocation to the new one. This step is performed as if by std::memcpy. The problem here is that std::memcpy will only work to actually create new objects implicitly in the new allocation and copy the values correctly if the type is trivially-copyable and if it and all of its subobjects are implicit-lifetime types. This definitely doesn't apply to std::string or any other type that manages some (memory) resource.
You are also forgetting to check the return value of realloc. If allocation failed, it may return a null pointer, which you will then use regardless, causing a null pointer dereference, as well as a memory leak of the old allocation.
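For a trivially copyable type such as int, where realloc is fine, checking the return value means keeping the result in a temporary first, so the old block is still reachable (and can still be freed) if the reallocation fails. A minimal sketch; grow_buffer is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical helper: resizes a malloc'ed buffer of T in place or by moving it.
// On failure it returns false and leaves `buf` pointing at the old block.
template <typename T>
bool grow_buffer(T*& buf, std::size_t new_count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc only copies bytes, so T must be trivially copyable");
    assert(new_count > 0);  // realloc(ptr, 0) is best avoided
    void* p = std::realloc(buf, new_count * sizeof(T));
    if (p == nullptr)
        return false;       // old block is untouched and still owned by the caller
    buf = static_cast<T*>(p);
    return true;
}
```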
Instead of using realloc you should make a new allocation for the new size, then placement-new copies of the objects already in the old allocation into the new one and then destroy the objects in the old allocation and free it.
If you want to guarantee that there won't be memory leaks when exceptions are thrown things become somewhat complicated. I suggest you look at the std::vector implementation of one of the standard library implementations if you want to figure out the details.
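The allocate / move / destroy / free sequence described above can be sketched with the C++17 algorithms std::uninitialized_move_n and std::destroy_n. grow_strings is a hypothetical helper, and the sketch relies on std::string's move constructor being noexcept, so it sidesteps the harder exception-safety cases mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <string>

// Hypothetical helper: moves `size` live strings from old_buf into a new block
// with room for new_cap strings, destroys the originals and frees old_buf.
std::string* grow_strings(std::string* old_buf, std::size_t size, std::size_t new_cap)
{
    assert(new_cap > 0 && new_cap >= size);
    void* raw = std::malloc(new_cap * sizeof(std::string));
    if (raw == nullptr)
        throw std::bad_alloc();  // old_buf is left untouched
    std::string* new_buf = static_cast<std::string*>(raw);
    std::uninitialized_move_n(old_buf, size, new_buf);  // noexcept moves for std::string
    std::destroy_n(old_buf, size);                      // end the moved-from objects' lifetimes
    std::free(old_buf);                                 // free(nullptr) is a no-op
    return new_buf;
}
```

In the question's loop, strs = grow_strings(strs, size, cap); after doubling cap would take the place of my_realloc, and the final std::destroy_n / free pair then works as intended.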
strs = my_realloc(strs, cap);
strs is a pointer to std::string, so this realloc()s the memory holding the std::string objects themselves, which may copy them byte by byte to a new location.
This is undefined behavior. C++ objects such as std::string cannot simply be malloced, realloced, or freed like raw bytes. Using a wrapper function, or placement new, at some point along the way does not change that.
Everything from this point on is undefined behavior, and the resulting consequences are of no importance.

Heap corruption when using make_shared

I have a class with private member variable
shared_ptr<short> m_p_data;
I get heap corruption when I use this constructor:
Volume2D(const int dimX, const int dimY) :m_dimX{ dimX }, m_dimY{ dimY }, m_p_data{ make_shared<short>(dimX*dimY) } {
}
but there is no heap corruption if I do this instead:
Volume2D(const int dimX, const int dimY) :m_dimX(dimX), m_dimY(dimY) {
m_p_data.reset(new short[dimX*dimY]);
}
To be more specific, here is the code that corrupts the heap:
Volume2D vol(10, 1);
for (auto i = 0; i < 10; ++i) {
vol(i, 0) = i;
cout << "value = " << vol(i, 0) << endl;
}
return 0;
Both versions of your code are problematic.
The first version,
make_shared<short>(dimX*dimY)
creates a single heap-allocated short with the value dimX*dimY. It is apparent from the rest of your question that your code later treats this logically as an array of dimension dimX*dimY, which is exactly what's causing the heap corruption (you only allocated a single short, but you're treating it like many).
The second version has the opposite problem. You're allocating dimX*dimY shorts, but your shared_ptr doesn't know that. So you don't get heap corruption, but the shared_ptr destructor calls delete, not delete[] (even though you allocated with new[], not new), which is undefined behavior as well.
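If shared ownership really is required, C++17 lets shared_ptr own an array directly: std::shared_ptr<short[]> uses delete[] as its default deleter and provides operator[] (C++20 additionally allows std::make_shared<short[]>(n)). A minimal sketch, assuming a C++17 compiler; make_shorts is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: n value-initialized (zeroed) shorts owned by a shared_ptr.
std::shared_ptr<short[]> make_shorts(std::size_t n)
{
    assert(n > 0);
    // For shared_ptr<short[]> the default deleter calls delete[], matching new[].
    return std::shared_ptr<short[]>(new short[n]());
}
```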
For this case, it's unclear why you need a shared_ptr to begin with. Why not use std::vector<short>, or std::vector<std::vector<short>>?
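A sketch of the std::vector<short> variant, assuming row-major storage and the operator()(x, y) accessor used in the question; the member names are guesses, since the full class is not shown. Note the parentheses in m_data(...): braces would select the std::initializer_list<short> constructor and treat the size as an element value rather than a count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of Volume2D backed by a std::vector<short>.
class Volume2D {
public:
    Volume2D(int dimX, int dimY)
        : m_dimX{dimX}, m_dimY{dimY},
          m_data(static_cast<std::size_t>(dimX) * static_cast<std::size_t>(dimY)) {}  // dimX*dimY zeroed shorts

    short& operator()(int x, int y) { return m_data.at(index(x, y)); }
    short operator()(int x, int y) const { return m_data.at(index(x, y)); }

private:
    // Row-major index of (x, y).
    std::size_t index(int x, int y) const {
        assert(x >= 0 && x < m_dimX && y >= 0 && y < m_dimY);
        return static_cast<std::size_t>(y) * m_dimX + x;
    }

    int m_dimX;
    int m_dimY;
    std::vector<short> m_data;  // frees itself; copying the volume copies the data
};
```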

How to get the number of bytes actually occupied by a string in memory?

As far as I know, there is an implementation-dependent string optimization in C++ that lets a string avoid allocating additional heap memory by storing its characters in the string object itself. So if the string s allocates additional memory on the heap, the total memory it consumes is sizeof(string) + s.capacity(); however, if it does not allocate any additional memory on the heap, i.e. stores its characters in the string object, then the total memory consumption is sizeof(string).
Is there a way to figure out this quantity - the total memory consumed by a string? The problem is that I don't see a way to figure out whether a string object has allocated memory on the heap or not, so I don't know which formula to use for a certain string.
EDIT: a hack injecting something into the std namespace to figure out the implementation-dependent detail (the threshold at which a string starts to allocate additional memory) would be OK if there is no other solution.
Since s.data() points to the first character of the string, you can check whether that address lies within the string object itself.
Make sure to use equality-comparison only, since pointers to objects that are not subobjects of the same complete object do not have a specified ordering. Moreover, you cannot use pointer arithmetic, which is only allowed for pointers into a given array, and you don't know beforehand whether the data pointer lies inside the string object.
For example:
bool data_is_inline(const std::string & s)
{
const char * p = reinterpret_cast<const char *>(&s), * q = s.data();
for (std::size_t i = 0; i != sizeof(s); ++i)
if (p + i == q) return true;
return false;
}
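Building on data_is_inline, an estimate of a string's total footprint could look like the following. It assumes the heap buffer holds capacity() characters plus the terminating null and ignores allocator overhead, which the standard does not guarantee (as noted elsewhere in this thread), so treat the result as an approximation; approx_string_footprint is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool data_is_inline(const std::string & s)
{
    const char * p = reinterpret_cast<const char *>(&s), * q = s.data();
    for (std::size_t i = 0; i != sizeof(s); ++i)
        if (p + i == q) return true;
    return false;
}

// Hypothetical estimate: the object itself, plus capacity() + 1 bytes
// if the characters live in a separate heap buffer.
std::size_t approx_string_footprint(const std::string & s)
{
    std::size_t total = sizeof(s);
    if (!data_is_inline(s))
        total += s.capacity() + 1;
    return total;
}
```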
You can replace the global malloc function and ::operator new with versions that add the requested size to some global variable. Then you can compare the total allocated before and after creating or modifying the string.
size_t before = getTotalAllocated();
std::string str("Hello!");
size_t after = getTotalAllocated();
size_t diff = after - before;
You can add sizeof(std::string) to diff if you like.
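A minimal sketch of the counting approach that replaces only the global operator new and operator delete (malloc itself is left alone). std::allocator, and therefore std::string with its default allocator, obtains its storage through ::operator new, so string allocations are counted. getTotalAllocated is the hypothetical helper from the snippet above, and this counter is not thread-safe:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Running total of bytes requested through the replaced global operator new.
static std::size_t g_total_allocated = 0;

void* operator new(std::size_t size)
{
    g_total_allocated += size;
    if (void* p = std::malloc(size == 0 ? 1 : size))
        return p;
    throw std::bad_alloc();
}

// Matching replacements so memory from the operator new above goes back to free().
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper named after the snippet above.
std::size_t getTotalAllocated() { return g_total_allocated; }
```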
So if the string s allocates additional memory on the heap, the total memory it consumes is sizeof(string) + s.capacity(), however, if it does not allocate any additional memory on the heap, i.e. stores its characters in the string object, then the total memory consumption is sizeof(string).
I believe you cannot make the distinction without diving into implementation-specific details (different on Linux/GCC libstdc++ and on Windows...). And the standard does not require (even if it is highly probable) that only sizeof(string) + s.capacity() is spent in the bad case (perhaps some std::string implementations could spend more memory).
AFAIK, many implementations deal differently with "short" strings and with "long" ones, the threshold being an implementation-specific detail.
The standard does not prohibit std::string from using its own specific allocator, doing some hash consing (even if that is unlikely), or whatever.
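This code shows how to use a glibc allocation hook to measure string allocations on Linux with GCC. These hooks are deprecated and were removed from the glibc API in version 2.34, so this only works with older glibc versions.
Please note there are other hooks that may be interesting, such as the realloc and free hooks.
Here is the reference: "Hooks for malloc" in the glibc manual.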
#include <malloc.h>
#include <iostream>
#include <string>
using namespace std;
static void my_init_hook (void);
static void *my_malloc_hook (size_t, const void *);
void* (* old_malloc_hook)(size_t size, const void *caller);
void (* volatile __malloc_initialize_hook) (void) = my_init_hook;
static void my_init_hook (void)
{
old_malloc_hook = __malloc_hook;
__malloc_hook = my_malloc_hook;
}
size_t g_counter = 0;
static void * my_malloc_hook (size_t size, const void *caller)
{
g_counter += size;
__malloc_hook = old_malloc_hook;
cout << "Allocated: " << size << endl;
void* result = malloc(size);
__malloc_hook = my_malloc_hook;
return result; // hand the block back to the caller
}
int main()
{
cout << "*** On stack ***" << endl;
string a("12345");
string b("1234567890");
string c("12345678901234567890123456789012345678901234567890");
string d(c);
d[1] = 'A'; // this makes big allocation!
cout << "*** On heap ***" << endl;
string* x = new string("1234567890");
string* y = new string(*x);
string* z = new string(*x);
cout << "Total allocation:" << g_counter << endl;
}
The result is:
*** On stack ***
Allocated: 18
Allocated: 23
Allocated: 63
Allocated: 63
*** On heap ***
Allocated: 4
Allocated: 23
Allocated: 4
Allocated: 4
Total allocation:202
You can verify that line d[1] = 'A'; makes another allocation of 63 bytes.
This can be improved to count only the allocations that originate from a specific caller, e.g. the string allocator, or to collect allocation statistics per caller. I've done something similar a long time ago to find out who was allocating strings, but I used stack unwinding instead.

How to avoid dynamic allocation of memory C++

[edit] Outside of this get method (see below), I'd like to have a pointer double * result; and then call the get method, i.e.
// Pull results out
int story = 3;
double * data;
int len;
m_Scene->GetSectionStoryGrid_m(story, data, len);
With that said, I want a get method that simply sets the result (*&data) by reference and does not dynamically allocate memory.
The results I am looking for already exist in memory, but they are within C structs and are not in one contiguous block of memory. FYI, &len is just the length of the array. I want one big array that holds all of the results.
The actual results I am looking for are stored within the native C structs, e.g. story_ptr->int_hv[i].ab.center.x. How would I avoid dynamically allocating memory like I am doing below? I'd like to point data at the results, but I just don't know how to do it. It's probably something simple I am overlooking… The code is below.
Is this even possible? From what I've read, it is not, but as my username implies, I'm not a software developer. Thanks to all who have replied so far by the way!
Here is a snippet of code:
void GetSectionStoryGrid_m( int story_number, double *&data, int &len )
{
std::stringstream LogMessage;
if (!ValidateStoryNumber(story_number))
{
data = NULL;
len = -1;
}
else
{
// Check to see if we already retrieved this result
if ( m_dStoryNum_To_GridMap_m.find(story_number) == m_dStoryNum_To_GridMap_m.end() )
{
data = new double[GetSectionNumInternalHazardVolumes()*3];
len = GetSectionNumInternalHazardVolumes()*3;
Story * story_ptr = m_StoriesInSection.at(story_number-1);
int counter = 0; // counts the current int hv number we are on
for ( int i = 0; i < GetSectionNumInternalHazardVolumes() && story_ptr->int_hv != NULL; i++ )
{
data[0 + counter] = story_ptr->int_hv[i].ab.center.x;
data[1 + counter] = story_ptr->int_hv[i].ab.center.y;
data[2 + counter] = story_ptr->int_hv[i].ab.center.z;
m_dStoryNum_To_GridMap_m.insert( std::pair<int, double*>(story_number,data));
counter += 3;
}
}
else
{
data = m_dStoryNum_To_GridMap_m.find(story_number)->second;
len = GetSectionNumInternalHazardVolumes()*3;
}
}
}
Consider returning a custom accessor class instead of the "double *&data". Depending on your needs that class would look something like this:
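class StoryGrid {
public:
// Sketch: assumes m_StoriesInSection and GetSectionNumInternalHazardVolumes() are accessible here
StoryGrid(int story_index):m_storyIndex(story_index) {
m_storyPtr = m_StoriesInSection.at(story_index-1);
}
inline int length() { return GetSectionNumInternalHazardVolumes()*3; }
double &operator[](int index) {
int i = index / 3;     // which hazard volume
int axis = index % 3;  // 0 = x, 1 = y, 2 = z
switch(axis){
case 0: return m_storyPtr->int_hv[i].ab.center.x;
case 1: return m_storyPtr->int_hv[i].ab.center.y;
default: return m_storyPtr->int_hv[i].ab.center.z; // axis == 2; every path now returns
}
}
private:
int m_storyIndex;
Story * m_storyPtr;
};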
Sorry for any syntax problems, but you get the idea. Return a reference to this and record it in your map. If done correctly, the map will then manage all of the dynamic allocation required.
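A self-contained version of the accessor idea, with hypothetical stand-ins for the C structs (Vec3, Box, HazardVolume, Story) since their real definitions are not shown. The view stores only a pointer and hands out references into the existing structs, so nothing is copied or allocated:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the native C structs in the question.
struct Vec3 { double x, y, z; };
struct Box { Vec3 center; };
struct HazardVolume { Box ab; };
struct Story { HazardVolume* int_hv; int num_hv; };

// Non-owning view: element 3*i + k is component k (x, y, z) of volume i.
class StoryGrid {
public:
    explicit StoryGrid(Story* story) : m_story(story) { assert(story != nullptr); }

    int length() const { return m_story->num_hv * 3; }

    double& operator[](int index) const {
        assert(index >= 0 && index < length());
        Vec3& c = m_story->int_hv[index / 3].ab.center;
        switch (index % 3) {
            case 0:  return c.x;
            case 1:  return c.y;
            default: return c.z;
        }
    }

private:
    Story* m_story;  // the Story must outlive this view
};
```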
So you want the allocated array to go "down" the call stack, i.e. to outlive the function that fills it. You can only achieve this by allocating it on the heap, using dynamic allocation, or by creating a static variable, since the lifetime of a static variable is not controlled by the call stack.
void GetSectionStoryGrid_m( int story_number, double *&data, int &len )
{
static double g_data[DATA_SIZE];
data = g_data;
// continues ...
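A sketch of the static-buffer approach with the missing pieces filled in by assumption: DATA_SIZE is a hypothetical compile-time upper bound, GetGrid is a hypothetical name, and the values written are placeholders. The buffer is shared by every call, so each call overwrites the previous result and the function is not safe to call from several threads at once:

```cpp
#include <cassert>
#include <cstddef>

constexpr int DATA_SIZE = 300;  // hypothetical upper bound on 3 * number of volumes

// Fills a function-local static buffer and hands out a pointer to it.
// No dynamic allocation, but every call reuses (overwrites) the same storage.
void GetGrid(int n_values, double *&data, int &len)
{
    static double g_data[DATA_SIZE];
    assert(n_values >= 0 && n_values <= DATA_SIZE);
    for (int i = 0; i < n_values; ++i)
        g_data[i] = i * 0.5;  // placeholder for the real x/y/z values
    data = g_data;
    len = n_values;
}
```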
If you want to "avoid any allocation", the solution by @Speed8ump is your first choice! But then you will not have your double * result; anymore. You will be turning your "offline" solution (calculates the whole array first, then use the array elsewhere) to an "online" solution (calculates values as they are needed). This is a good refactoring to avoid memory allocation.
The answer to this question depends on the lifetime of the doubles you want pointers to. Consider:
// "pointless" because it takes no input and throws away all its work
void pointless_function()
{
double foo = 3.14159;
int j = 0;
for (int i = 0; i < 10; ++i) {
j += i;
}
}
foo exists and has a value inside pointless_function, but ceases to exist as soon as the function exits. Even if you could get a pointer to it, that pointer would be useless outside of pointless_function. It would be a dangling pointer, and dereferencing it would trigger undefined behavior.
On the other hand, you are correct that if you have data in memory (and you can guarantee it will live long enough for whatever you want to do with it), it can be a great idea to get pointers to that data instead of paying the cost to copy it. However, the main way for data to outlive the function that creates it is to call new, new[], or malloc. You really can't get out of that.
Looking at the code you posted, I don't see how you can avoid new[]-ing up the doubles when you create story. But you can then get pointers to those doubles later without needing to call new or new[] again.
I should mention that pointers to data can be used to modify the original data. Often that can lead to hard-to-track-down bugs. So there are times that it's better to pay the price of copying the data (which you're then free to muck with however you want), or to get a pointer-to-const (in this case const double* or double const*, they are equivalent; a pointer-to-const will give you a compiler error if you try to change the data being pointed to). In fact, that's so often the case that the advice should be inverted: "there are a few times when you don't want to copy or get a pointer-to-const; in those cases you must be very careful."