How do I allocate variably-sized structures contiguously in memory? - c++

I'm using C++, and I have the following structures:
struct ArrayOfThese {
int a;
int b;
};
struct DataPoint {
int a;
int b;
int c;
};
In memory, I want to have 1 or more ArrayOfThese elements at the end of each DataPoint. There are not always the same number of ArrayOfThese elements per DataPoint.
Because I have a ridiculous number of DataPoints to assemble and then stream across a network, I want all my DataPoints and their ArrayOfThese elements to be contiguous. Wasting space for a fixed number of the ArrayOfThese elements is unacceptable.
In C, I would have made an element at the end of DataPoint that was declared as ArrayOfThese d[0];, allocated a DataPoint plus enough extra bytes for however many ArrayOfThese elements I had, and used the dummy array to index into them. (Of course, the number of ArrayOfThese elements would have to be in a field of DataPoint.)
In C++, is using placement new and the same 0-length array hack the correct approach? If so, does placement new guarantee that subsequent calls to new from the same memory pool will allocate contiguously?

Since you are dealing with plain structures that have no constructors, you could revert to C memory management:
void *ptr = malloc(sizeof(DataPoint) + n * sizeof(ArrayOfThese));
DataPoint *dp = reinterpret_cast<DataPoint *>(ptr);
ArrayOfThese *aotp = reinterpret_cast<ArrayOfThese *>(reinterpret_cast<char *>(ptr) + sizeof(DataPoint));

Since your structs are PODs you might as well do it just as you would in C. The only thing you'll need is a cast. Assuming n is the number of things to allocate:
DataPoint *p=static_cast<DataPoint *>(malloc(sizeof(DataPoint)+n*sizeof(ArrayOfThese)));
Placement new does come into this sort of thing, if your objects have a non-trivial constructor. It guarantees nothing about any allocations though, for it does no allocating itself and requires the memory to have been already allocated somehow. Instead, it treats the block of memory passed in as space for the as-yet-unconstructed object, then calls the right constructor to construct it. If you were to use it, the code might go like this. Assume DataPoint has the ArrayOfThese arr[0] member you suggest:
void *p=malloc(sizeof(DataPoint)+n*sizeof(ArrayOfThese));
DataPoint *dp=new(p) DataPoint;
for(size_t i=0;i<n;++i)
new(&dp->arr[i]) ArrayOfThese;
What gets constructed must get destructed so if you do this you should sort out the call of the destructor too.
(Personally I recommend using PODs in this sort of situation, because it removes any need to call constructors and destructors, but this sort of thing can be done reasonably safely if you are careful.)

As Adrian said in his answer, what you do in memory doesn't have to be the same as what you stream over the network. In fact, it might even be good to clearly divide the two, because a communication protocol that relies on your data being laid out in a specific way creates huge problems if you later need to refactor your data.
The C++ way to store an arbitrary number of elements contiguously is of course std::vector. Since you didn't even consider this, I assume that there's something that makes this undesirable. (Do you only have small numbers of ArrayOfThese and fear the space overhead associated with std::vector?)
While the trick with over-allocating a zero-length array probably isn't guaranteed to work and might, technically, invoke the dreaded undefined behavior, it's a widespread one. What platform are you on? On Windows, this is done in the Windows API itself, so it's hard to imagine a vendor shipping a C++ compiler which wouldn't support it.
If there's a limited number of possible ArrayOfThese element counts, you could also use fnieto's trick to specify those few numbers and then new one of the resulting template instances, depending on the run-time number:
struct DataPoint {
int a;
int b;
int c;
};
template <std::size_t sz>
struct DataPointWithArray : DataPoint {
ArrayOfThese array[sz];
};
DataPoint* create(std::size_t n)
{
switch(n) {
case 1: return new DataPointWithArray<1>;
case 2: return new DataPointWithArray<2>;
case 5: return new DataPointWithArray<5>;
case 7: return new DataPointWithArray<7>;
case 27: return new DataPointWithArray<27>;
default: assert(false);
}
return NULL;
}

Prior to C++0X, the language had no memory model to speak of. And with the new standard, I don't recall any talk of guarantees of contiguity.
Regarding this particular question, it sounds as if what you want is a pool allocator, many examples of which exist. Consider, for instance, Modern C++ Design, by Alexandrescu. The small object allocator discussion is what you should look at.

I think boost::variant might accomplish this. I haven't had an opportunity to use it, but I believe it's a wrapper around unions, so a std::vector of them should be contiguous; of course, each item will take up the larger of the two sizes, since you can't have a vector with differently-sized elements.
Take a look at the comparison of boost::variant and boost::any.
If you want the offset of each element to be dependent on the composition of the previous elements, you will have to write your own allocator and accessors.

Seems like it would be simpler to allocate an array of pointers and work with that rather than using placement new. That way you could just reallocate the whole array to the new size with little runtime cost. Also if you use placement new, you have to explicitly call destructors, which means mixing non-placement and placement in a single array is dangerous. Read http://www.parashift.com/c++-faq-lite/dtors.html before you do anything.

Don't confuse data organisation inside your program with data organisation for serialization: they do not have the same goal.
For streaming across a network, you have to consider both sides of the channel, the sending and the receiving side: how does the receiving side differentiate between a DataPoint and an ArrayOfThese? How does the receiving side know how many ArrayOfThese are appended after a DataPoint? (Also to consider: what is the byte ordering of each side? Do data types have the same size in memory?)
Personally, I think you need a different structure for streaming your data, in which you add the number of DataPoints you are sending as well as the number of ArrayOfThese after each DataPoint. I would also not care about the way data is already organized in my program, and would reorganize/reformat to suit my protocol rather than my program. After that, writing a function for sending and another for receiving is not a big deal.

Why not have DataPoint contain a variable-length array of ArrayOfThese items? This will work in C or C++. There are some concerns if either struct contains non-primitive types, and you must use free() rather than delete on the result:
struct ArrayOfThese {
int a;
int b;
};
struct DataPoint {
int a;
int b;
int c;
int length;
ArrayOfThese those[0];
};
DataPoint* allocDP(int a, int b, int c, size_t length)
{
// There might be alignment issues, but not for most compilers:
size_t sz = sizeof(DataPoint) + length * sizeof(ArrayOfThese);
DataPoint *dp = (DataPoint*)calloc(1, sz);
// (Check for out of memory)
dp->a = a; dp->b = b; dp->c = c; dp->length = length;
return dp;
}
Then you can use it "normally" in a loop where the DataPoint knows its length:
DataPoint *dp = allocDP( 5, 8, 3, 20 );
for(int i=0; i < dp->length; ++i)
{
// Initialize or access: dp->those[i]
}

Could you make those into classes with the same superclass and then use your favourite stl container of choice, using the superclass as the template?

Two questions: Is the similarity between ArrayOfThese and DataPoint real, or a simplification for posting? I.e. is the real difference just one int (or some arbitrary number of the same type of items)?
Is the number of ArrayOfThese associated with a particular DataPoint known at compile time?
If the first is true, I'd think hard about simply allocating an array of as many items as necessary for one DataPoint + N ArrayOfThese. I'd then build a quick bit of code to overload operator[] to return item i+3, and overload a(), b() and c() to return the first three items.
If the second is true, I was going to suggest essentially what I see fnieto has just posted, so I won't go into more detail.
As far as placement new goes, it doesn't really guarantee anything about allocation -- in fact, the whole idea about placement new is that it's completely unrelated to memory allocation. Rather, it allows you to create an object at an arbitrary address (subject to alignment restrictions) in a block of memory that's already allocated.

Here's the code I ended up writing:
#include <iostream>
#include <cstdlib>
#include <cassert>
using namespace std;
struct ArrayOfThese {
int e;
int f;
};
struct DataPoint {
int a;
int b;
int c;
int numDPars;
ArrayOfThese d[0];
DataPoint(int numDPars) : numDPars(numDPars) {}
DataPoint* next() {
return reinterpret_cast<DataPoint*>(reinterpret_cast<char*>(this) + sizeof(DataPoint) + numDPars * sizeof(ArrayOfThese));
}
const DataPoint* next() const {
return reinterpret_cast<const DataPoint*>(reinterpret_cast<const char*>(this) + sizeof(DataPoint) + numDPars * sizeof(ArrayOfThese));
}
};
int main() {
const size_t BUF_SIZE = 1024*1024*200;
char* const buffer = new char[BUF_SIZE];
char* bufPtr = buffer;
const int numDataPoints = 1024*1024*2;
for (int i = 0; i < numDataPoints; ++i) {
// This wouldn't really be random.
const int numArrayOfTheses = random() % 10 + 1;
DataPoint* dp = new(bufPtr) DataPoint(numArrayOfTheses);
// Here, do some stuff to fill in the fields.
dp->a = i;
bufPtr += sizeof(DataPoint) + numArrayOfTheses * sizeof(ArrayOfThese);
}
DataPoint* dp = reinterpret_cast<DataPoint*>(buffer);
for (int i = 0; i < numDataPoints; ++i) {
assert(dp->a == i);
dp = dp->next();
}
// Here, send it out.
delete[] buffer;
return 0;
}

Related

Calculate length of double pointer array

I have a double pointer Array of a structure:
typedef struct Position{
int x;
int y;
} Position;
Position** array = (Position**)malloc(sizeof(Position*)*10); //10 elements
array[0] = (Position*)malloc(sizeof(Position));
array[0]->x = 10;
array[0]->y = 5;
Can I calculate the length of said array and if so, how?
The normal way for arrays does not work:
int length = sizeof(<array>)/sizeof(<array>[0]);
Once you have dynamically allocated an array, there is no way of finding out the number of elements in it.
I once heard of some hacky way to obtain the size of a memory block (msize), which would allegedly allow you to infer the size of the data within the block, but I would advise against any such weird tricks, because they are not covered by the standard; they are compiler-vendor-specific extensions.
So, the only way to know the size of your array is to keep the size of the array around. Declare a struct, put the array and its length in the struct, and use that instead of the naked array.
As you marked the question as C++, I would suggest that you use std::vector; then, after you "allocated some memory" (or requested memory to be allocated by the std::vector constructor, push_back, or resize), you can simply get the size back by using std::vector::size.
typedef struct Position{
int x;
int y;
} Position;
std::vector<Position> array(10);
array[0].x = 10;
array[0].y = 5;
size_t size = array.size(); // will be 10
Having only a pointer to some memory block, you cannot infer the size of this memory block. So you cannot infer the number of elements in it.
For arrays of pointers, however, you could infer the number of elements in it under the following conditions:
make sure that every pointer (except the last one) points to a valid object.
for the last pointer in the array, make sure that it is always NULL.
Then you can derive the length by counting until you reach NULL.
Maybe there are some other similar strategies.
Solely from the pointer itself, however, you cannot derive the number of elements in it.
Old question, but in case someone needs it:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    /* Works only if the array ends with a NULL sentinel pointer. */
    char **double_pointer_char = malloc(4 * sizeof(char *));
    double_pointer_char[0] = "a";
    double_pointer_char[1] = "b";
    double_pointer_char[2] = "c";
    double_pointer_char[3] = NULL;

    int length_counter = 0;
    while (double_pointer_char[length_counter])
        length_counter++;

    printf("%d\n", length_counter); /* prints 3 */
    free(double_pointer_char);
    return 0;
}

Reallocation of a struct array

I would like to ask you how to reallocate a struct array in C++?
In C there is realloc which is quite good, but it is not recommended to use it in C++. Maybe some of you would tell me that I should not use a struct array?
Well, in this task we cannot use any STL containers, so struct is the only option, I suppose. It is for the matter of practice with allocation, reallocation of memory and other things...
In the example below I wrote the code as I would do it in C, using malloc and realloc. Can you give me advice on how to do it in C++?
Thanks.
class CCompany
{
public:
CCompany();
bool NewAccount(const char * accountID, int initialBalance);
struct ACCOUNT
{
char *accID;
int initialBalance;
...
};
ACCOUNT* accounts ;
...
...
private:
int ReallocationStep = 100;
int accountCounter = 1;
int allocatedAccounts = 100;
...
}
CCompany::CCompany()
{
accounts = (ACCOUNT*)malloc(allocatedAccounts*sizeof(*accounts));
}
bool CCompany::NewAccount(const char * accountID, int initialBalance)
{
// Firstly I check if there is already an account in the array of struct. If so, return false.
...
// Account is not there, lets check if there is enough memory allocated.
if (accountCounter == allocatedAccounts)
{
allocatedAccounts += ReallocationStep;
accounts = (ACCOUNT *) realloc(accounts, allocatedAccounts * sizeof(*accounts));
}
// Everything is okay, we can add it to the struct array
ACCOUNT account = makeStruct(accountID, initialBalance);
accounts[accountCounter++] = account;
return true;
}
If you have no possibility to use STL containers, maybe you should consider using some sort of list instead of an array. Based on your code, this could be a better solution than reallocating memory over and over.
Personally, I don't think realloc is frowned upon in C++; rather, for many uses of malloc, realloc, free there are other concepts in C++ (like new, placement new, delete, ...), shifting the semantics more toward "objects" rather than "plain memory".
So it is still valid to use the realloc approach as you did. And, if dynamic data structures like linked lists are not a choice, the realloc metaphor is actually the best I can think of, because it avoids unnecessary copying, deleting, and recreating of items again and again, while still providing a contiguous block of memory holding all the objects.
According to other questions+answers(1, 2), you should avoid using malloc and realloc in C++ where possible.
The latter of those two references gives a good suggestion: If you're not allowed to use std::vector due to it being an STL container, perhaps std::fstream might be worth looking into as an alternative. This would suggest that working with files without relying upon excess working memory could be the intended goal of the assessment task. I can't see the assignment criteria, so I can't say whether or not this would be compliant.
Even with an assignment criteria on your side, some lecturers like to change requirements with little or no notice; in fact, sometimes just seeing a solution to the assignment that isn't what they had in mind will (unfairly) prompt such a modification. Any assessment that prompts you to reinvent std::vector seems silly to me, but if you have two options, and only one of them involves staying in your degree, I think your only solution will be to use realloc; there's no need for malloc here.
To reduce the overhead of calling realloc so often (as pointed out by another answer), you could remove two of your three counters, call realloc when the remaining counter is about to become a power of two, and reallocate by a factor of two like I did in push_back:
/* Needs <stdint.h>, <stdlib.h>, <string.h>; the VLA typedef below is C99. */
void *push_back(void **base, void const *value, size_t *nelem, size_t size) {
typedef unsigned char array[size];
array *b = *base;
if (SIZE_MAX / sizeof *b <= *nelem) {
return NULL;
}
if (!(*nelem & -~*nelem)) { /* true when *nelem is 2^k - 1, i.e. the next push needs more space */
b = realloc(b, (*nelem * 2 + 1) * sizeof *b);
if (!b) {
return NULL;
}
*base = b;
}
b += (*nelem)++;
return value
? memmove(b, value, sizeof *b)
: b;
}
The correct C++ way would be to use a std::vector which can deal nicely with reallocations. As your assignment do not allow you to use standard containers, you can:
either build a custom container using new and delete for reallocation and based on an array or a linked list
or directly use an array and stick to new and delete for reallocations - still acceptable C++
or revert to the good old malloc and realloc from the C standard library which is included in the C++ standard library. But you must be aware that this will not initialize the structs.
Because malloc/realloc do not call constructors, the last approach must be seen as a low-level optimization, and the lack of initialization should be explicitly documented.

Is it worth to use vector in case of making a map

I have got a class that represents a 2D map with size 40x40.
I read some data from sensors and build this map, marking cells when my sensors find something, and setting the value of the probability of finding an obstacle. For example, when I find some obstacle in cell [52,22], I add 10 to its value and add 5 to the surrounding cells.
So each cell of this map should keep some small value (probably not bigger). So when a cell is marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
And my question is: is it worth using a plain array, or is it better to use a vector even though I do not sort these cells, don't remove them, etc.? I just set a value and read it later.
Update:
Actually I have in my header file:
using cell = uint8_t;
class Grid {
private:
int xSize, ySize;
cell *cells;
public:
//some methods
}
In cpp :
using cell = uint8_t;
Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
cells = new cell[xSize * ySize];
for (int i = 0; i < xSize; i++) {
for (int j = 0; j < ySize; j++)
cells[i + j * xSize] = 0;
}
}
Grid::~Grid(void) {
delete[] cells;
}
inline cell* Grid::getCell(int x, int y) const{
return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
A static (C-style) array is possible in your case, since the size is known at compile time.
BUT. It may be interesting to have the data on the heap instead of the stack.
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (let's say, in your main() function), then a stack overflow may occur. Well, it's very unlikely for a 40*40 array of tiny things, but I'd prefer to have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO you should definitely go for the vector, it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data. If you know your values are storable on a single byte, go for it !
An uint8_t (same as unsigned char) can store values from 0 to 255. If it's enough, use it.
using cell = uint8_t; // define a nice name for your data type
std::vector<cell> myMap;
size_t size = 40;
myMap.resize(size*size); // resize (not reserve): the cells must exist before you index them
Side note: don't use new[]. Well, you can, but it has no advantages over a vector. You will probably only gain headaches from handling memory manually.
Some advantages of using a std::vector are that it can be dynamically allocated (flexible size, can be resized during execution, etc.) and can be passed to or returned from a function. Since you have a fixed size of 40x40 and you know you have one int element in every cell, I don't think it matters that much in your case, and I would NOT suggest using std::vector for this simple task.
And here is a possible duplicate.

elegant way to create&pass multi-dimensional array in c++?

first question:
for known dimensions, we don't need new/malloc for the creation
const int row = 3;
const int col = 2;
int tst_matrix[row][col] = {{1,2},{3,4},{5,6}};
however, there is no easy way to pass this two-dimensional array to another function, right? because
int matrix_process(int in_matrix[][])
is illegal; you have to specify all the dimensions except the first one. If I need to change the content of in_matrix, how can I easily pass tst_matrix to the function matrix_process?
second question:
what's the standard way to create a 2-dimensional array in C++ with new? I don't want to use std::vector etc. here.
here is what I came up with; is it the best way?
int **tst_arr = new int*[5];
int i=0, j=0;
for (i=0;i<5;i++)
{
tst_arr[i] = new int[5];
for (j=0;j<5;j++)
{
tst_arr[i][j] = i*5+j;
}
}
In addition, if I pass tst_array to another function, like:
int change_row_col( int **a)
{
.....................
//check which element is 0
for (i=0; i<5; i++)
for(j=0;j<5;j++)
{
if (*(*(a+i)+j)==0) //why I can not use a[i][j] here?
{
row[i]=1;
col[j]=1;
}
}
.....................
}
In addition, if I use *((a+i)+j), the result is not what I want.
Here is the complete testing code I had:
#include <iostream>
using namespace std;
//Input Matrix--a: Array[M][N]
int change_row_col( int **a)
{
int i,j;
int* row = new int[5];
int* col = new int[5];
//initialization
for(i=0;i<5;i++)
{
row[i]=0;
}
for(j=0;j<5;j++)
{
col[j]=0;
}
//check which element is 0
for (i=0; i<5; i++)
for(j=0;j<5;j++)
{
if (*(*(a+i)+j)==0) //why I can not use a[i][j] here?
{
row[i]=1;
col[j]=1;
}
}
for(i=0;i<5;i++)
for (j=0;j<5;j++)
{
if (row[i] || col[j])
{
*(*(a+i)+j)=0;
}
}
return 1;
}
int main ()
{
int **tst_arr = new int*[5];
int i=0, j=0;
for (i=0;i<5;i++)
{
tst_arr[i] = new int[5];
for (j=0;j<5;j++)
{
tst_arr[i][j] = i*5+j;
}
}
for (i=0; i<5;i++)
{
for(j=0; j<5;j++)
{
cout<<" "<<tst_arr[i][j];
}
cout<<endl;
}
change_row_col(tst_arr);
for (i=0; i<5;i++)
{
for(j=0; j<5;j++)
{
cout<<" "<<tst_arr[i][j];
}
cout<<endl;
}
for (i=0;i<5;i++)
{
delete []tst_arr[i];
}
delete []tst_arr;
}
For multidimensional arrays where all the bounds are variable at run time, the most common approach that I know of is to use a dynamically allocated one-dimensional array and do the index calculations "manually". In C++ you would normally use a class such as a std::vector specialization to manage the allocation and deallocation of this array.
This produces essentially the same layout as a multidimensional array with fixed bounds and doesn't have any real implied overhead as, without fixed bounds, any approach would require passing all bar one of the array dimensions around at run time.
I honestly think the best idea is to eschew raw C++ arrays in favor of a wrapper class like the boost::multi_array type. This eliminates all sorts of weirdness that arises with raw arrays (difficulty passing them as parameters to functions, issues keeping track of the sizes of the arrays, etc.).
Also, I strongly urge you to reconsider your stance on std::vector. It's so much safer than raw arrays that there really isn't a good reason to use dynamic arrays over vectors in most circumstances. If you have a C background, it's worth taking the time to make the switch.
My solution using function template:
template<size_t M,size_t N>
void Fun(int (&arr)[M][N])
{
for ( size_t i = 0 ; i < M ; i++ )
{
for ( size_t j = 0 ; j < N ; j++ )
{
/*................*/
}
}
}
1)
template < typename T, size_t Row_, size_t Col_>
class t_two_dim {
public:
static const size_t Row = Row_;
static const size_t Col = Col_;
/* ... */
T at[Row][Col];
};
template <typename T>
int matrix_process(T& in_matrix) {
return T::Row * T::Col + in_matrix.at[0][0];
}
2) Use std::vector. You're adding a few function calls (which may be inlined in an optimized build) and may be exporting a few additional symbols. I suppose there are very good reasons to avoid this, but appropriate justifications are sooooo rare. Do you have an appropriate justification?
The simple answer is that the elegant way of doing it in C++ (you tagged C and C++, but your code is C++ new/delete) is by creating a bidimensional matrix class and pass that around (by reference or const reference). After that, the next option should always be std::vector (and again, I would implement the matrix class in terms of a vector). Unless you have a very compelling reason for it, I would avoid dealing with raw arrays of arrays.
If you really need to, but only if you really need to, you can perfectly well work with multidimensional arrays; it is just a little more cumbersome than with plain arrays. If all dimensions are known at compile time, as in your first block, these are some of the options:
const unsigned int dimX = ...;
const unsigned int dimY = ...;
int array[dimY][dimX];
void foo( int (*array)[dimX], unsigned int dimY ); // [1]
void foo( int (&array)[dimY][dimX] ); // [2]
In [1], by using pass-by-value syntax, the array decays into a pointer to its first element, which is a pointer to an int [dimX], and that is what you need to pass. Note that you should pass the other dimension in another argument, as it will be unknown to the code in the function. In [2], by passing a reference to the array, all dimensions are fixed and known. The compiler will ensure that you call it only with an array of the proper size (both dimensions coincide), and thus there is no need to pass the extra parameter. The second option can be templated to accommodate different sizes (all of them known at compile time):
template <unsigned int DimX, unsigned int DimY>
void foo( int (&array)[DimY][DimX] );
The compiler will deduce the sizes (if a real array is passed to the template) and you will be able to use them inside the template as DimX and DimY. This enables use of the function with different array sizes, as long as they are all known at compile time.
If dimensions are not known at compile time, then things get quite messy and the only sensible approach is encapsulating the matrix in a class. There are basically two approaches. The first is allocating a single contiguous block of memory (as the compiler would do in the previous cases) and then providing functions that index that block by two dimensions. Look at the link up in the first paragraph for a simple approach, even if I would use std::vector instead of a raw pointer internally. Note that with the raw pointer you need to manually manage deletion of the pointer at destruction or your program will leak memory.
The other approach, which is what you started in the second part of your question, is the one I would avoid at all costs: keeping a pointer to a block of pointers to integers. This complicates memory management (you moved from having to delete one pointer to having to delete DimY+1 pointers: each array[i], plus array itself) and you also need to manually guarantee during allocation that all rows contain the same number of columns. There is a substantial increase in the number of things that can go wrong and no gain, but some actual loss (more memory required to hold the intermediate pointers, worse runtime performance as you have to double-dereference, probably worse locality of data...).
Wrapping up: write a class that encapsulates the bidimensional object in terms of a contiguous block of memory (array if sizes are known at compile time --write a template for different compile time sizes--, std::vector if sizes are not known until runtime, pointer only if you have a compelling reason to do so), and pass that object around. Any other thing will more often than not just complicate your code and make it more error prone.
For your first question:
If you need to pass an N-dimensional array of variable size, you can declare the function as follows, passing the required size arguments before the array parameter.
I have tested this in gcc and it works (note that variable-length array parameters are a C99 feature, available in C++ only as a compiler extension).
Example for 2D case:
void editArray(int M,int N,int matrix[M][N]){
//do something here
}
int mat[4][5];
editArray(4,5,mat); //call in this way

C++ Allocate Memory Without Activating Constructors

I'm reading in values from a file which I will store in memory as I read them in. I've read on here that the correct way to handle memory location in C++ is to always use new/delete, but if I do:
DataType* foo = new DataType[sizeof(DataType) * numDataTypes];
Then that's going to call the default constructor for each instance created, and I don't want that. I was going to do this:
DataType* foo;
char* tempBuffer=new char[sizeof(DataType) * numDataTypes];
foo=(DataType*) tempBuffer;
But I figured that would be something poo-poo'd for some kind of type-unsafeness. So what should I do?
And in researching for this question now I've seen that some people are saying arrays are bad and vectors are good. I was trying to use arrays more because I thought I was being a bad boy by filling my programs with (what I thought were) slower vectors. What should I be using???
Use vectors!!! Since you know the number of elements, make sure that you reserve the memory first (by calling myVector.reserve(numObjects) before you insert the elements).
By doing this, you will not call the default constructors of your class.
So use
std::vector<DataType> myVector; // does not reserve anything
...
myVector.reserve(numObjects); // tells vector to reserve memory
You can use ::operator new to allocate an arbitrarily sized hunk of memory.
DataType* foo = static_cast<DataType*>(::operator new(sizeof(DataType) * numDataTypes));
The main advantage of using ::operator new over malloc here is that it throws on failure and will integrate with any new_handlers etc. You'll need to clean up the memory with ::operator delete
::operator delete(foo);
Regular new Something will of course invoke the constructor, that's the point of new after all.
It is one thing to avoid extra constructions (e.g. default constructor) or to defer them for performance reasons, it is another to skip any constructor altogether. I get the impression you have code like
DataType dt;
read(fd, &dt, sizeof(dt));
If you're doing that, you're already throwing type safety out the window anyway.
What are you trying to accomplish by not invoking the constructor?
You can allocate memory with new char[], call the constructor you want for each element in the array, and then everything will be type-safe. Read What are uses of the C++ construct "placement new"?
That's how std::vector works underneath, since it allocates a little extra memory for efficiency, but doesn't construct any objects in the extra memory until they're actually needed.
You should be using a vector. It will allow you to construct its contents one-by-one (via push_back or the like), which sounds like what you're wanting to do.
I think you shouldn't care about efficiency using vector if you will not insert new elements anywhere but at the end of the vector (since elements of vector are stored in a contiguous memory block).
vector<DataType> dataTypeVec(numDataTypes);
And as you've been told, your first line there contains a bug (no need to multiply by sizeof).
Building on what others have said, if you ran this program while piping in a text file of integers that would fill the data field of the below class, like:
./allocate < ints.txt
Then you can do:
#include <vector>
#include <iostream>
using namespace std;
class MyDataType {
public:
int dataField;
};
int main() {
const int TO_RESERVE = 10;
vector<MyDataType> everything;
everything.reserve( TO_RESERVE );
MyDataType temp;
while( cin >> temp.dataField ) {
everything.push_back( temp );
}
for( unsigned i = 0; i < everything.size(); i++ ) {
cout << everything[i].dataField;
if( i < everything.size() - 1 ) {
cout << ", ";
}
}
}
Which, for me with a list of 4 integers, gives:
5, 6, 2, 6