I'm trying to write my own memory allocator in C++ for educational purposes, and I have code like this:
class IntObj
{
public:
IntObj(): var_int(6) {}
void setVar(int var)
{
var_int = var;
}
int getVar()
{
return var_int;
}
virtual size_t getMemorySize()
{
return sizeof(*this);
}
int a = 8;
~IntObj()
{}
private:
int var_int;
};
I'm stuck on how to merge unused memory chunks. I'm trying to test it like this:
char *pz = new char[sizeof(IntObj) * 2]; //In MacOS, IntObj takes 16 bytes
char *pz2 = &pz[sizeof(IntObj)]; // Take address of 16-th cell
char *pz3 = new char[sizeof(IntObj) / 2]; //Array of 8 bytes
char **pzz = &pz2;
pzz[sizeof(IntObj)] = pz3; // Set address of cell 16 to the pz3 array
new (&pzz) IntObj; //placement new
IntObj *ss = reinterpret_cast<IntObj *>(&pzz);
cout << ss->a;
The output is 8 as expected. My questions:
Why is the output correct?
Is code like this correct? If not, are there any other ways to implement coalescing of two memory chunks?
UPDATE: All methods work correctly.
e.g this would work:
ss->setVar(54);
cout << ss->getVar();
The output is 54.
UPDATE 2: First of all, my task is not to request a new memory block from OS for instantiating an object, but to give it from a linked list of free blocks(that were allocated when starting a program). My problem is that I can have polymorphic objects with different sizes, and don't know how to split memory blocks, or merge (that is what I understand by merging or coalescing chunks) them (if allocation is requested) effectively.
There are a number of misunderstandings apparent here:
char *pz = new char[sizeof(IntObj) * 2]; // fine
char *pz2 = &pz[sizeof(IntObj)]; // fine
char *pz3 = new char[sizeof(IntObj) / 2]; // fine
char **pzz = &pz2; // fine
pzz[sizeof(IntObj)] = pz3; // bad
pzz is a pointer that is pointing to only a single char*, which is the variable pz2. Meaning that any access or modification past pzz[0] is undefined behavior (very bad). You're likely modifying the contents of some other variable.
new (&pzz) IntObj; // questionable
This is constructing an IntObj in the space of the variable pzz, not where pzz is pointing to. The constructor of course sets a to 8 thereby stomping on the contents of pzz (it won't be pointing to pz2 anymore). I'm uncertain if this in-and-of-itself is undefined behavior (since there would be room for a whole IntObj), but using it certainly is:
IntObj *ss = reinterpret_cast<IntObj *>(&pzz); // bad
This violates the strict-aliasing rule. While the standard is generous for char* aliases, it does not allow char** to IntObj* aliases. This exhibits more undefined behavior.
If your question comes down to whether or not you can use two independent and contiguous blocks of memory as a single block then no, you cannot.
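For the task described in UPDATE 2, the usual way around this is to carve every block out of one contiguous buffer, so that neighbouring free blocks really are adjacent and can be merged. Below is a minimal sketch of that idea, not a production allocator. The names Arena, Header, kCapacity and largestFree are hypothetical, and the first-fit search and full-scan coalescing are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena. Blocks tile the buffer back to back, each one
// preceded by a Header, so "the next block" is always physically adjacent.
class Arena {
    struct alignas(std::max_align_t) Header {
        std::size_t size;  // payload bytes that follow this header
        bool free;
    };
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kCapacity = 1024;

    alignas(std::max_align_t) unsigned char buffer_[kCapacity];

    Header* first() { return reinterpret_cast<Header*>(buffer_); }
    Header* next(Header* h) {
        unsigned char* p = reinterpret_cast<unsigned char*>(h + 1) + h->size;
        return p < buffer_ + kCapacity ? reinterpret_cast<Header*>(p) : nullptr;
    }
    // Merge every run of adjacent free blocks into a single block.
    void coalesce() {
        for (Header* h = first(); h != nullptr;) {
            Header* n = next(h);
            if (h->free && n != nullptr && n->free) {
                h->size += sizeof(Header) + n->size;  // absorb the neighbour
            } else {
                h = n;
            }
        }
    }

public:
    // The whole buffer starts out as one big free block.
    Arena() { new (buffer_) Header{kCapacity - sizeof(Header), true}; }

    void* allocate(std::size_t bytes) {
        if (bytes == 0) bytes = 1;
        bytes = (bytes + kAlign - 1) / kAlign * kAlign;  // round up to alignment
        for (Header* h = first(); h != nullptr; h = next(h)) {
            if (!h->free || h->size < bytes) continue;
            // Split only if the remainder can hold a header plus some payload.
            if (h->size >= bytes + sizeof(Header) + kAlign) {
                unsigned char* rest = reinterpret_cast<unsigned char*>(h + 1) + bytes;
                new (rest) Header{h->size - bytes - sizeof(Header), true};
                h->size = bytes;
            }
            h->free = false;
            return h + 1;  // payload starts right after the header
        }
        return nullptr;  // no free block is large enough
    }

    void deallocate(void* p) {
        if (p == nullptr) return;
        Header* h = static_cast<Header*>(p) - 1;
        assert(!h->free);  // catches a double free of the same block
        h->free = true;
        coalesce();
    }

    // Size of the largest free payload, useful for checking that merging worked.
    std::size_t largestFree() {
        std::size_t best = 0;
        for (Header* h = first(); h != nullptr; h = next(h))
            if (h->free && h->size > best) best = h->size;
        return best;
    }
};
```

An object would then be created with placement new on the returned memory, for example IntObj *obj = new (arena.allocate(sizeof(IntObj))) IntObj; after checking the result for nullptr. Use the pointer that the new expression returns rather than a reinterpret_cast of some other pointer, and call obj->~IntObj() before handing the memory back with deallocate.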
From what I've learned about C++, it has no direct equivalent of C's realloc, and using realloc itself is not recommended. But while writing a function that concatenates strings and dynamically reallocates the destination string's memory without using vector, I found I needed something that behaves like realloc.
So what I came up with is taking a reference to a pointer (char* &des in the code), so that the function can adjust the size of the memory with the usual C++ new and delete. However, an error occurred: "[Error] invalid initialization of non-const reference of type 'char*&' from an rvalue of type 'char*'" Why is it impossible to initialize a char*& with a char*? Isn't it the same as the statement char* &des = str0? The complete code is as follows:
void Mystrcat(char* &des, const char* src) {
int des_len = Mystrlen(des); // Mystrlen just returns the length of a string with the type unsigned int excluding null character
int src_len = Mystrlen(src);
char* temp_str = des;
des = new char[des_len + src_len + 1];
//a copy process
for(int i = 0; i < des_len; i++) {
des[i] = *(temp_str + i);
}
for(int i = des_len + 1; i < des_len + src_len + 1; i++)
des[i - 1] = *(src + i - des_len - 1);
}
int main() {
char str0[100] = "Hello";
Mystrcat(str0, ", World!");
std::cout << str0 << std::endl; //expecting "Hello, World!" to be printed
return 0;
}
What I tried before was declaring the parameter as char* des instead of char* &des. But unlike in main, it was not possible to get the total size of the str0 array inside Mystrcat by simply using sizeof. So I thought a reference to a pointer would be a good fit. I expected the reference-to-pointer parameter to work because it is equivalent to the statement char* &des = str0.
The problem here is:
char str0[100] = "Hello";
str0 in this case has a pinned (static) memory address. Its address is immutable, so to speak, because it is not a pointer to a string but an array of characters whose size is known at compile time (not dynamically allocated). Passing str0 where a char* is expected makes the array decay to a temporary char* (an rvalue), and a non-const reference such as char*& cannot bind to a temporary, which is exactly what the error message says. Making str0 itself point to a different address makes no sense and invites a whole lot of chaos. Even modifying the original pointer address to a dynamically-allocated array is chaos since you need the original address to properly free it. Think of an array of T as T* const (the address is immutable even if the contents are mutable and even if dynamically allocated, you need to keep the original address unmodified).
But in general as a non-profit advertisement of sorts, I want to encourage embracing value semantics as much as you can over pointer/reference ones. So instead of:
void Mystrcat(char* &des, const char* src)
{
// Modify the address of 'des' in place.
}
You can do:
[[nodiscard]] char* Mystrcat(char* des, const char* src)
{
// Input an address to a string and return an address to a new string.
}
Then you can pass an address to your array, get a pointer to a new modified copy (same thing you were doing before), and store the pointer to the new array (along with freeing it when you're done). There's little benefit to modifying things in place if you're just going to allocate a new string anyway.
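Here is a minimal sketch of that value-returning version. It swaps the raw char* return type for std::unique_ptr<char[]> so the caller cannot forget to free the result, and it uses std::strlen in place of the Mystrlen helper from the question. Both of those are my assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Returns a newly allocated, null-terminated string holding des followed by src.
// Neither input is modified, so a plain char array can be passed for des.
[[nodiscard]] std::unique_ptr<char[]> Mystrcat(const char* des, const char* src) {
    assert(des != nullptr && src != nullptr);
    const std::size_t des_len = std::strlen(des);
    const std::size_t src_len = std::strlen(src);

    std::unique_ptr<char[]> result(new char[des_len + src_len + 1]);
    std::memcpy(result.get(), des, des_len);
    // Copy src_len + 1 bytes so the terminating '\0' comes along too.
    std::memcpy(result.get() + des_len, src, src_len + 1);
    return result;
}
```

With this signature, Mystrcat(str0, ", World!") compiles even when str0 is a char[100], because the array only needs to decay to a const char* passed by value. The returned buffer is released automatically when the unique_ptr goes out of scope.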
This still ignores the conventional advice to use std::string, which is what I think you need now, and which I wholeheartedly recommend over all this low-level pointer work and manual heap allocation and deallocation (which can be disastrous without RAII once exceptions are thrown). Later you might want to deviate from it, for example if its small-buffer optimization (SBO) is too large or too small for your data, or is counter-productive. But that is diving deep into things like custom memory allocators, something you typically reserve until you encounter profiler hotspots and really know what you're doing.
[edit] Outside of this get method (see below), I'd like to have a pointer double * result; and then call the get method, i.e.
// Pull results out
int story = 3;
double * data;
int len;
m_Scene->GetSectionStoryGrid_m(story, data, len);
With that said, I want a get method that simply sets the result (*&data) by reference and does not dynamically allocate memory.
The results I am looking for already exist in memory, but they are within C-structs and are not in one continuous block of memory. Fyi, &len is just the length of the array. I want one big array that holds all of the results.
Since the actual results that I am looking for are stored within the native C-struct pointer story_ptr->int_hv[i].ab.center.x;. How would I avoid dynamically allocating memory like I am doing above? I’d like to point the data* to the results, but I just don’t know how to do it. It’s probably something simple I am overlooking… The code is below.
Is this even possible? From what I've read, it is not, but as my username implies, I'm not a software developer. Thanks to all who have replied so far by the way!
Here is a snippet of code:
void GetSectionStoryGrid_m( int story_number, double *&data, int &len )
{
std::stringstream LogMessage;
if (!ValidateStoryNumber(story_number))
{
data = NULL;
len = -1;
}
else
{
// Check to see if we already retrieved this result
if ( m_dStoryNum_To_GridMap_m.find(story_number) == m_dStoryNum_To_GridMap_m.end() )
{
data = new double[GetSectionNumInternalHazardVolumes()*3];
len = GetSectionNumInternalHazardVolumes()*3;
Story * story_ptr = m_StoriesInSection.at(story_number-1);
int counter = 0; // counts the current int hv number we are on
for ( int i = 0; i < GetSectionNumInternalHazardVolumes() && story_ptr->int_hv != NULL; i++ )
{
data[0 + counter] = story_ptr->int_hv[i].ab.center.x;
data[1 + counter] = story_ptr->int_hv[i].ab.center.y;
data[2 + counter] = story_ptr->int_hv[i].ab.center.z;
m_dStoryNum_To_GridMap_m.insert( std::pair<int, double*>(story_number,data));
counter += 3;
}
}
else
{
data = m_dStoryNum_To_GridMap_m.find(story_number)->second;
len = GetSectionNumInternalHazardVolumes()*3;
}
}
}
Consider returning a custom accessor class instead of the "double *&data". Depending on your needs that class would look something like this:
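class StoryGrid {
public:
// m_StoriesInSection and GetSectionNumInternalHazardVolumes() are assumed to be
// reachable from here, e.g. through the object that owns the stories
StoryGrid(int story_index):m_storyIndex(story_index) {
m_storyPtr = m_StoriesInSection.at(story_index-1);
}
int length() { return GetSectionNumInternalHazardVolumes()*3; }
double &operator[](int index) {
int i = index / 3; // which hazard volume
int axis = index % 3; // 0 = x, 1 = y, 2 = z
switch(axis){
case 0: return m_storyPtr->int_hv[i].ab.center.x;
case 1: return m_storyPtr->int_hv[i].ab.center.y;
default: return m_storyPtr->int_hv[i].ab.center.z; // axis == 2
}
}
private:
int m_storyIndex;
Story *m_storyPtr;
};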
Sorry for any syntax problems, but you get the idea. Return a reference to this and record this in your map. If done correctly the map will then manage all of the dynamic allocation required.
So you want the allocated array to go "down" in the call stack. You can only achieve this by allocating it on the heap with dynamic allocation, or by creating a static variable, since the lifetime of a static variable is not controlled by the call stack.
void GetSectionStoryGrid_m( int story_number, double *&data, int &len )
{
static double g_data[DATA_SIZE];
data = g_data;
// continues ...
If you want to "avoid any allocation", the solution by @Speed8ump is your first choice! But then you will not have your double * result; anymore. You will be turning your "offline" solution (calculate the whole array first, then use the array elsewhere) into an "online" solution (calculate values as they are needed). This is a good refactoring to avoid memory allocation.
The answer to this question depends on the lifetime of the doubles you want pointers to. Consider:
// "pointless" because it takes no input and throws away all its work
void pointless_function()
{
double foo = 3.14159;
int j = 0;
for (int i = 0; i < 10; ++i) {
j += i;
}
}
foo exists and has a value inside pointless_function, but ceases to exist as soon as the function exits. Even if you could get a pointer to it, that pointer would be useless outside of pointless_function. It would be a dangling pointer, and dereferencing it would trigger undefined behavior.
On the other hand, you are correct that if you have data in memory (and you can guarantee it will live long enough for whatever you want to do with it), it can be a great idea to get pointers to that data instead of paying the cost to copy it. However, the main way for data to outlive the function that creates it is to call new, new[], or malloc. You really can't get out of that.
Looking at the code you posted, I don't see how you can avoid new[]-ing up the doubles when you create story. But you can then get pointers to those doubles later without needing to call new or new[] again.
I should mention that pointers to data can be used to modify the original data. Often that can lead to hard-to-track-down bugs. So there are times that it's better to pay the price of copying the data (which you're then free to muck with however you want), or to get a pointer-to-const (in this case const double* or double const*, they are equivalent; a pointer-to-const will give you a compiler error if you try to change the data being pointed to). In fact, that's so often the case that the advice should be inverted: "there are a few times when you don't want to copy or get a pointer-to-const; in those cases you must be very careful."
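To make the pointer-to-const idea concrete, here is a rough sketch in which a cache owns the flattened arrays and callers only ever receive a const double*. The names GridCache and build are hypothetical, the placeholder values stand in for the center.x/y/z fields from the question, and I am assuming that building each array once and keeping it for the lifetime of the cache is acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical cache: it owns the arrays, callers get read-only views.
class GridCache {
public:
    // Builds the array on the first request for a story, then reuses it.
    // The returned pointer stays valid for as long as the cache lives.
    const double* get(int story, std::size_t& len) {
        assert(story > 0);  // story numbers are 1-based in the question
        auto it = cache_.find(story);
        if (it == cache_.end()) {
            it = cache_.emplace(story, build(story)).first;
        }
        len = it->second.size();
        return it->second.data();
    }

private:
    // Placeholder for copying center.x, center.y, center.z out of the structs.
    static std::vector<double> build(int story) {
        return {story * 1.0, story * 2.0, story * 3.0};
    }

    std::map<int, std::vector<double>> cache_;
};
```

Because get returns const double*, a line such as data[0] = 1.0; in the caller is rejected by the compiler. The vectors inside the map take care of freeing the memory, so nobody has to call delete[] by hand.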
I am working on a test which checks if all class attributes are initialized in a constructor.
My current solution works for non pointer attributes:
void CSplitVectorTest::TestConstructorInitialization()
{
const size_t memorySize = sizeof(CSplitVector);
char* pBuffer1 = (char*) malloc(memorySize);
char* pBuffer2 = (char*) malloc(memorySize);
memset(pBuffer1,'?',memorySize);
memset(pBuffer2,'-',memorySize);
new(pBuffer1) CSplitVector;
new(pBuffer2) CSplitVector;
const bool bObjectsAreEqual = memcmp(pBuffer1,pBuffer2,memorySize)==0;
if (!TEST(bObjectsAreEqual))
{
COMMENT("Constructor initialization list not complete!");
}
free(pBuffer1);
free(pBuffer2);
}
Do you have an idea how it could be improved to test whether pointers are initialized?
Your test checks whether every byte of the object has been written over by the constructor. As a straight memory check it looks OK, although if the class contains other objects which don't necessarily initialise themselves fully, you may be in trouble.
That said, my main question would be: Is it really an effective test? For example, is it critical that every attribute in the CSplitVector class is initialised by the initialisation list? Do you perhaps have some which may not need to be initialised at this point? Also, how about checking whether the attributes are set to values that you'd expect?
Instead of comparing byte by byte, you should probably use the right padding or word size, and test whether any byte of each word got initialized. That way you will probably get around the compiler inserting padding and the constructor leaving uninitialized bytes between padded shorter-than-word fields.
To test the real padding size, shooting from the hip, the following code should do it pretty reliably:
struct PaddingTest {
volatile char c; // volatile probably not needed, but will not hurt either
volatile int i;
static int getCharPadding() {
PaddingTest *t = new PaddingTest;
// subtract byte addresses; casting a pointer to int does not work on 64-bit targets
int diff = (int)(reinterpret_cast<volatile char *>(&t->i) - &t->c);
delete t;
return diff;
}
};
Edit: You still need the two objects, but you no longer compare them to each other. Instead, you compare each word of each object against that object's memset value; if the word has changed in either object, it means the word got touched (also in the other one, where it's just chance that it got initialized to the same value you used for memset).
I found a solution for the problems mentioned, and tested it with initialized and uninitialized pointers and with types of different lengths.
In the test header I added #pragma pack(1) (I am working with gcc):
#pragma pack(1)
#include <CSplitVector>
The test got a little more complicated:
void CSplitVectorTest::TestConstructorInitialization()
{
const size_t memorySize = sizeof(CSplitVector);
char* pBuffer = (char*) malloc(memorySize);
memset(pBuffer,'?',memorySize);
CSplitVector* pSplitVector = new(pBuffer) CSplitVector;
// find pointers for all '?'
QList<char*> aFound;
char* pFoundChar = (char*) memchr(pBuffer,'?',memorySize);
while (pFoundChar)
{
aFound.append(pFoundChar);
char* pStartFrom = pFoundChar+1;
pFoundChar = (char*) memchr(pStartFrom,'?',memorySize-(int)(pStartFrom-pBuffer));
}
// if there are any '?'....
if (aFound.count())
{
// allocate the same area with '-'...
pSplitVector->~CSplitVector();
memset(pBuffer,'-',memorySize);
pSplitVector = new(pBuffer) CSplitVector;
// and check if places found before contain '-'
while (aFound.count())
{
pFoundChar = aFound.takeFirst();
if (*pFoundChar=='-')
{
// if yes then class has uninitialized attribute
TEST_FAILED("Constructor initialization list not complete!");
pSplitVector->~CSplitVector();
free(pBuffer);
return;
}
}
}
// if no then all attributes are initialized
pSplitVector->~CSplitVector();
free(pBuffer);
TEST(true);
}
Feel free to point out any flaws in this solution.
I'm wondering whether built-in types in objects created on the heap with new will be initialized to zero. Is it mandated by the standard or is it compiler specific?
Given the following code:
#include <iostream>
using namespace std;
struct test
{
int _tab[1024];
};
int main()
{
test *p(new test);
for (int i = 0; i < 1024; i++)
{
cout << p->_tab[i] << endl;
}
delete p;
return 0;
}
When run, it prints all zeros.
You can choose whether you want default-initialisation, which leaves fundamental types (and POD types in general) uninitialised, or value-initialisation, which zero-initialises fundamental (and POD) types.
int * garbage = new int[10]; // No initialisation
int * zero = new int[10](); // Initialised to zero.
This is defined by the standard.
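Applied to the struct from the question, the only difference is the pair of parentheses after the type name. The following is a small sketch. The helper names makeZeroed, makeUninitialised, allZero and demo are mine, and std::unique_ptr is used only so the example does not leak.

```cpp
#include <cassert>
#include <memory>

struct test {
    int _tab[1024];
};

// Value-initialisation: the trailing () zero-initialises every element of _tab.
std::unique_ptr<test> makeZeroed() { return std::unique_ptr<test>(new test()); }

// Default-initialisation: the ints hold indeterminate values until written to.
std::unique_ptr<test> makeUninitialised() { return std::unique_ptr<test>(new test); }

// Returns true if every element of the array is zero.
bool allZero(const test& t) {
    for (int v : t._tab) {
        if (v != 0) return false;
    }
    return true;
}

void demo() {
    assert(allZero(*makeZeroed()));
    // Reading makeUninitialised()->_tab[i] before writing to it would be
    // undefined behaviour, even if it happens to print zeros on a given run.
}
```

Seeing all zeros from plain new test, as in the question, is therefore luck (freshly mapped pages from the OS are often zeroed), not a guarantee.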
No, if you do something like this:
int *p = new int;
or
char *p = new char[20]; // array of 20 bytes
or
struct Point { int x; int y; };
Point *p = new Point;
then the memory pointed to by p will have indeterminate/uninitialized values.
However, if you do something like this:
std::string *pstring = new std::string();
Then you can be assured that the string will have been initialized as an empty string, but that is because of how class constructors work, not because of any guarantees about heap allocation.
It's not mandated by the standard. The memory for the primitive type members may contain any value that was last left in memory.
Some compilers I guess may choose to initialize the bytes. Many do in debug builds of code. They assign some known byte sequence to give you a hint when debugging that the memory wasn't initialized by your program code.
Using calloc will return bytes initialized to 0, but that's not standard-specific. calloc has been around since C, along with malloc. However, you will pay a run-time overhead for using calloc.
The advice given previously about using the std::string is quite sound, because after all, you're using the std, and getting the benefits of class construction/destruction behaviour. In other words, the less you have to worry about, like initialization of data, the less that can go wrong.
If I have a typedef of a struct
typedef struct
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[];
} GPATH2;
and it contains an uninitialized array, how can I create an instance of this type so that it will hold, say, 4 values in Nodes[]?
Edit: This belongs to an API for a program written in Assembler. I guess as long as the underlying data in memory is the same, an answer changing the struct definition would work, but not if the underlying memory is different. The Assembly Language application is not using this definition .... but .... a C program using it can create GPATH2 elements that the Assembly Language application can "read".
Can I ever resize Nodes[] once I have created an instance of GPATH2?
Note: I would have placed this with a straight C tag, but there is only a C++ tag.
You could use a bastard mix of C and C++ if you really want to:
#include <new>
#include <cstdlib>
#include "definition_of_GPATH2.h"
using namespace std;
int main(void)
{
int i;
/* Allocate raw memory buffer */
void * raw_buffer = calloc(1, sizeof(GPATH2) + 4 * sizeof(GPOINT2));
/* Initialize struct with placement-new */
GPATH2 * path = new (raw_buffer) GPATH2;
path->Count = 4;
for ( i = 0 ; i < 4 ; i++ )
{
path->Nodes[i].x = rand();
path->Nodes[i].y = rand();
}
/* Resize raw buffer */
raw_buffer = realloc(raw_buffer, sizeof(GPATH2) + 8 * sizeof(GPOINT2));
/* 'path' still points to the old buffer that might have been freed
* by realloc, so it has to be re-initialized
* realloc copies old memory contents, so I am not certain this would
* work with a proper object that actually does something in the
* constructor
*/
path = new (raw_buffer) GPATH2;
/* now we can write more elements of array */
path->Count = 5;
path->Nodes[4].x = rand();
path->Nodes[4].y = rand();
/* Because this is allocated with malloc/realloc, free it with free
* rather than delete.
* If 'path' was a proper object rather than a struct, you should
* call the destructor manually first.
*/
free(raw_buffer);
return 0;
}
Granted, it's not idiomatic C++ as others have observed, but if the struct is part of legacy code it might be the most straightforward option.
Correctness of the above sample program has only been checked with valgrind using dummy definitions of the structs, your mileage may vary.
If it is a fixed size, write:
typedef struct
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[4];
} GPATH2;
If it is not fixed, change the declaration to
GPOINT2* Nodes;
After creation, or in the constructor, do
Nodes = new GPOINT2[size];
If you want to resize it, you should use vector<GPOINT2>, because you can't resize an array; you can only create a new one. If you decide to do that, don't forget to delete the previous one.
Also, typedef is not needed in C++; you can write
struct GPATH2
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[4];
};
This appears to be a C99 flexible array member, the standardized form of the older idiom known as the "struct hack". A variable declared directly with this type has no room for any array elements (some compilers have an extension that lets you give one an initializer), so in practice you work through pointers to it. You have to allocate objects of this type with malloc, providing extra space for the appropriate number of array elements. If nothing holds a pointer to an array element, you can resize the array with realloc.
Code that needs to be backward compatible with C89 needs to use
GPOINT2 Nodes[1];
as the last member, and take note of this when allocating.
This is very much not idiomatic C++ -- note for instance that you would have to jump through several extra hoops to make new and delete usable -- although I have seen it done. Idiomatic C++ would use vector<GPOINT2> as the last member of the struct.
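For comparison, a sketch of the vector-based version might look like the following. The definitions of GPOINT2 and WORD are stand-ins I made up so the example compiles, and GPathVec and node are hypothetical names. Note that this layout is not binary-compatible with the original struct, so it only helps on the C++ side of the API boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in definitions (assumptions); the real ones come from the API header.
struct GPOINT2 {
    float x;
    float y;
};
typedef unsigned short WORD;

// Same leading fields as GPATH2, but the trailing array becomes a vector.
// Count is dropped because the vector already tracks its own size.
struct GPathVec {
    char SmType = 0;
    char SRes = 0;
    float SParm = 0.0f;
    float EParm = 0.0f;
    char Flags = 0;
    std::vector<GPOINT2> Nodes;

    // Bounds-checked access in debug builds.
    GPOINT2& node(std::size_t i) {
        assert(i < Nodes.size());
        return Nodes[i];
    }
    WORD count() const { return static_cast<WORD>(Nodes.size()); }
};
```

Creating a path with four nodes is then path.Nodes.resize(4); and growing it later is another resize call, which keeps the existing elements. To hand the data to the assembly-language side you would still have to copy it into a malloc'ed GPATH2 with the original layout.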
Arrays of unknown size are not valid as C++ data members. They are valid in C99, and your compiler may be mixing C99 support with C++.
What you can do in C++ is 1) give it a size, 2) use a vector or another container, or 3) ditch both automatic (local variable) and normal dynamic storage in order to control allocation explicitly. The third is particularly cumbersome in C++, especially with non-POD, but possible; example:
struct A {
int const size;
char data[1];
~A() {
// if data was of non-POD type, we'd destruct data[1] to data[size-1] here
}
static auto_ptr<A> create(int size) {
// because new is used, auto_ptr's use of delete is fine
// consider another smart pointer type that allows specifying a deleter
void *mem = ::operator new(sizeof(A) + (size - 1) * sizeof(char));
A *p;
try { // not necessary in our case, but is if A's ctor can throw
// ::new is needed because the private A::operator new hides placement new
p = ::new(mem) A(size);
}
catch (...) {
::operator delete(mem);
throw;
}
return auto_ptr<A>(p);
}
private:
A(int size) : size (size) {
// if data was of non-POD type, we'd construct here, being very careful
// of exception safety
}
A(A const &other); // be careful if you define these,
A& operator=(A const &other); // but it likely makes sense to forbid them
void* operator new(size_t size); // doesn't prevent all erroneous uses,
void* operator new[](size_t size); // but this is a start
};
Note you cannot trust sizeof(A) anywhere else in the code, and using an array of size 1 guarantees alignment (matters when the type isn't char).
This type of structure is not trivially usable on the stack; you'll have to malloc it. The significant thing to know is that sizeof(GPATH2) doesn't include the trailing array. So to create one, you'd do something like this:
GPATH2 *somePath;
size_t numPoints;
numPoints = 4;
somePath = malloc(sizeof(GPATH2) + numPoints*sizeof(GPOINT2));
I'm guessing GPATH2.Count is the number of elements in the Nodes array, so if it's up to you to initialize that, be sure and set somePath->Count = numPoints; at some point. If I'm mistaken, and the convention used is to null terminate the array, then you would do things just a little different:
somePath = malloc(sizeof(GPATH2) + (numPoints+1)*sizeof(GPOINT2));
somePath->Nodes[numPoints] = Some_Sentinel_Value;
Make darn sure you know which convention the library uses.
As other folks have mentioned, realloc() can be used to resize the struct, but it will invalidate old pointers to the struct, so make sure you aren't keeping extra copies of it (like passing it to the library).