Array of doubles and heap corruption - c++

I have a function like
template <class Type>
myFunc(Type** arrayToBeFilled);
I call it like this:
double* array = NULL;
myFunc(&array);
And inside the function I do some reading and parsing numbers with strtod function:
//Here comes file opening, getting number of lines and number of doubles in every line
...
char *inputString = new char[LONG_STRING_SIZE];
char *pNext = NULL;
(*arrayToBeFilled) = new Type[length*rowSize];
for (int i=0; i<length; i++)
{
source.getline(inputString, LONG_STRING_SIZE);
pNext = NULL;
for (int j=0; j<rowSize; j++)
{
double d = strtod(inputString, &pNext);
(*arrayToBeFilled)[i*rowSize+j] = d;
inputString = pNext;
pNext = NULL;
}
}
The variable d is just there for checking with the debugger - and it looks just fine while running.
But after filling the array I try to print it (just as a check):
for (int i=0; i<length; i++)
{
for (int j=0; j<rowSize; j++)
{
cout<<(*arrayToBeFilled)[i*rowSize+j]<<" ";
}
cout<<"\n";
}
And here comes the bad output - different numbers, sometimes heap corruption and so on. I was printing it both inside and outside the function - the same results. And I can't delete this array either inside or outside the function - run-time errors follow me!

Why do you use raw C arrays in C++? If you use STL classes like std::vector instead of raw new[], your code will become cleaner, simpler to read and maintain (for example, you don't need explicit delete[] calls: the destructor will cleanup heap memory). In general, in modern C++ the rule is "if you are writing new or delete, you are doing it wrong" (with some exceptions).
Note also that with C++11 move semantics, you can simply return the vector instead of using output reference/pointer arguments:
template <typename Type>
inline std::vector<Type> myFunc()
{
...
}
Inside your function body, instead of your code
(*arrayToBeFilled) = new Type[length*rowSize];
just write:
std::vector<Type> arrayToBeFilled(length*rowSize);
and then simply return arrayToBeFilled; .
(Note also that vectors can be nested: you may also use vector<vector<Type>> to make a 2D array, but this is less efficient than a single vector<Type>, which more directly maps to your raw new[] call.)
In addition, in the code you posted you create a raw C array on the heap with new char[LONG_STRING_SIZE] and assign the pointer to it to inputString; then you modify inputString with an assignment from pNext: but in doing so, you leak the initial array whose pointer was stored in inputString.
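Putting the two points together, here is a minimal sketch of how the parsing could look with std::vector; it assumes source, length and rowSize have already been obtained as in your code, and the buffer size is just a placeholder:
#include <cstdlib>
#include <fstream>
#include <vector>
template <typename Type>
std::vector<Type> myFunc(std::ifstream& source, int length, int rowSize)
{
    const int LONG_STRING_SIZE = 4096;           // placeholder size
    std::vector<char> line(LONG_STRING_SIZE);    // owns the line buffer, nothing leaks
    std::vector<Type> result(length * rowSize);
    for (int i = 0; i < length; ++i)
    {
        source.getline(line.data(), LONG_STRING_SIZE);
        char* cursor = line.data();              // walk this, keep the buffer pointer intact
        for (int j = 0; j < rowSize; ++j)
        {
            char* next = nullptr;
            result[i * rowSize + j] = static_cast<Type>(std::strtod(cursor, &next));
            cursor = next;                       // continue after the number just parsed
        }
    }
    return result;                               // moved out cheaply in C++11
}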

Seems you don't have a return type
template <class Type>
void myFunc(Type** arrayToBeFilled);
and you should instantiate it explicitly when you call it (note that array must be a pointer):
double* array = NULL;
myFunc<double>(&array);
Also, when it comes to input, print out the values that you read; more often than not you will find something unexpected that causes the error.

Related

How do I allocate memory with "new" instead of malloc?

As homework I have to write a program that operates on structures. As a first task I have to write a function that allocates memory for an array of N pointers that point to new structures (the user decides the value of N) and returns the address of the array. The first problem I have is understanding the form of malloc. I asked my professor what the equivalent would be using "new", because it is more transparent to me, but he answered that I should stick to malloc so I avoid making mistakes. The function looks like this:
struct Structure
{
int a;
char b;
float c;
};
Structure** allocating(int N)
{
struct Structure** tab = (struct Structure**) malloc(N * sizeof(struct Structure*));
for (int i = 0; i < N; i++)
{
tab[i] = (struct Structure*) malloc(sizeof(struct Structure));
}
return tab;
}
I have tried to understand this form of allocating memory, but so far I read it as if I was allocating memory for a pointer pointing to an array of pointers (double **), which is not what was stated in the task. To sum up, I don't understand how the allocation is written and how it could be written using new.
Your C code allocates an array of pointers to Structure. I have no idea why it is done this way instead of simply allocating an array of Structure; you have to ask the author of the task about that.
The direct equivalent using new would look like this:
Structure** allocating(int N)
{
struct Structure** tab = new Structure*[N];
for (int i = 0; i < N; i++)
{
tab[i] = new Structure;
}
return tab;
}
Deallocating can be a bit tricky:
void deallocating(Structure** tab, int N)
{
for (int i = 0; i < N; i++)
{
delete tab[i]; //delete objects
}
delete[] tab; //delete array itself, notice the [] after delete!
}
If you can move to array of objects instead of array of pointers, the code gets much simpler:
Structure* allocating(int N)
{
return new Structure[N];
}
Later you also deallocate it with delete[]
As noticed in the comments, in C++ we prefer not to use manual memory management directly (especially in modern C++, the new keyword is considered a code smell). std::vector can handle all your array-on-the-heap needs perfectly well; it will never leak memory and can resize itself easily.
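For illustration only (your professor asked you to stick with malloc, so treat this as a side note), the vector-based version could be as short as this, with Structure as defined in your question:
#include <vector>
std::vector<Structure> allocating(int N)
{
    return std::vector<Structure>(N);   // N value-initialized Structure objects
}
No matching deallocating function is needed: the vector's destructor releases the memory when it goes out of scope.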

C++ returning array

I am trying to wrap my mind around arrays and pointers in C++.
I am trying to write a function that allocates memory for an int array of variable size. The array is then filled with random numbers and returned.
The problem I have is assigning values with pointers.
This is the function:
int* getArray(int length) {
int* values = new int[length];
for (int i=0; i<length; i++) {
values[i]=i;
}
return values;
}
And it works just fine.
But, if I change it to this:
int* getArray(int length) {
int* values = new int[length];
for (int i=0; i<length; i++, values ++) {
*values=i;
}
return values;
}
It doesn't work.
Can someone explain to me why I cannot do this?
This is the main method that I used to print all the values:
int main() {
int* v = getArray(100);
for(int i=0;i<100;i++, v++) {
cout << *v <<endl;
}
delete[] v;
return 0;
}
I think it is because the new operator allocates new memory in the heap and not in the stack so the pointer might actually point to the wrong direction.
But, if that is true, why is the main method working?
Side note: I do not get any kind of warning or error, but the output is just a bunch of zeros and some random numbers.
I am using the GCC compiler on Ubuntu.
values++ is modifying values, which eventually forms the function return value.
i.e. you return one past the end of the allocated array. That in itself is perfectly legal C++.
But things go very badly wrong at the function call site: you access invalid elements of the array (which very informally speaking are your "random numbers"), and calling delete[] on that is also undefined behaviour. Boom!
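For illustration, here is a sketch of how the pointer-incrementing style could be written without losing the start of the array (same overall structure as the code in the question):
#include <iostream>
int* getArray(int length) {
    int* values = new int[length];
    int* p = values;                    // advance a copy, keep values pointing at the start
    for (int i = 0; i < length; i++, p++) {
        *p = i;
    }
    return values;                      // still the address new[] returned
}
int main() {
    int* v = getArray(100);
    for (int i = 0; i < 100; i++) {
        std::cout << v[i] << std::endl; // index instead of advancing v
    }
    delete[] v;                         // valid: v still points at the start of the block
    return 0;
}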

How can I make my dynamic array or vector operate at a similar speed to a standard array? C++

I'm still quite inexperienced in C++ and I'm trying to write some summation code to add numbers precisely. This is a DLL plugin for some finite difference software and the code is called several million times during a run. I want to write a function where any number of arguments can be passed in and the sum will be returned. My code looks like:
#include <cstdarg>
double SumFunction(int numArgs, ...){ // this allows me to pass any number
// of arguments to my function.
va_list args;
va_start(args,numArgs); //necessary prerequisites for using cstdarg
double myarray[10];
for (int i = 0; i < numArgs; i++) {
myarray[i] = va_arg(args,double);
} // I imagine this is sloppy code; however I cannot create
// myarray[numArgs] because numArgs is not a const int.
sum(myarray); // The actual method of addition is not relevant here, but
//for more complicated methods, I need to put the summation
// terms in a list.
vector<double> vec(numArgs); // instead, place all values in a vector
for (int i = 0; i < numArgs; i++) {
vec.at(i) = va_arg(args,double);
}
sum(vec); //This would be passed by reference, of course. The function sum
// doesn't actually exist, it would all be contained within the
// current function. This method is twice as slow as placing
//all the values in the static array.
double *vec;
vec = new double[numArgs];
for (int i = 0; i < (numArgs); i++) {
vec[i] = va_arg(args,double);
}
sum(vec); // Again half of the speed of using a standard array and
// increasing in magnitude for every extra dynamic array!
delete[] vec;
va_end(args);
}
So the problem I have is that using an oversized static array is sloppy programming, but using either a vector or a dynamic array slows the program down considerably. So I really don't know what to do. Can anyone help, please?
One way to speed the code up (at the cost of making it more complicated) is to reuse a dynamic array or vector between calls, then you will avoid incurring the overhead of memory allocation and deallocation each time you call the function.
For example declare these variables outside your function either as global variables or as member variables inside some class. I'll just make them globals for ease of explanation:
double* sumArray = NULL;
int sumArraySize = 0;
In your SumFunction, check if the array exists and if not allocate it, and resize if necessary:
double SumFunction(int numArgs, ...){ // this allows me to pass any number
// of arguments to my function.
va_list args;
va_start(args,numArgs); //necessary prerequisites for using cstdarg
// if the array has already been allocated, check if it is large enough and delete if not:
if((sumArray != NULL) && (numArgs > sumArraySize))
{
delete[] sumArray;
sumArray = NULL;
}
// allocate the array, but only if necessary:
if(sumArray == NULL)
{
sumArray = new double[numArgs];
sumArraySize = numArgs;
}
double *vec = sumArray; // set to your array, reusable between calls
for (int i = 0; i < (numArgs); i++) {
vec[i] = va_arg(args,double);
}
sum(vec, numArgs); // you will need to pass the array size
va_end(args);
// note no array deallocation
}
The catch is that you need to remember to deallocate the array at some point by calling a function similar to this (like I said, you pay for speed with extra complexity):
void freeSumArray()
{
if(sumArray != NULL)
{
delete[] sumArray;
sumArray = NULL;
sumArraySize = 0;
}
}
You can take a similar (and simpler/cleaner) approach with a vector: allocate it the first time if it doesn't already exist, or call resize() on it with numArgs if it does. A sketch of that follows below.
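For example, such a vector variant might look like this; sum is assumed to take a pointer and a count, as in the array version above:
#include <cstdarg>
#include <vector>
double sum(const double* vals, int n);      // assumed to exist, as in the question
std::vector<double> sumBuffer;              // reused between calls; grows when needed, never shrinks
double SumFunction(int numArgs, ...)
{
    if (static_cast<int>(sumBuffer.size()) < numArgs)
        sumBuffer.resize(numArgs);          // reallocates only when a larger call comes in
    va_list args;
    va_start(args, numArgs);
    for (int i = 0; i < numArgs; i++)
        sumBuffer[i] = va_arg(args, double);
    va_end(args);
    return sum(&sumBuffer[0], numArgs);     // pass pointer + count, same as the array version
}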
When using a std::vector the optimizer must consider that relocation is possible and this introduces an extra indirection.
In other words the code for
v[index] += value;
where v is for example a std::vector<int> is expanded to
int *p = v._begin + index;
*p += value;
i.e. from vector you need first to get the field _begin (that contains where the content starts in memory), then apply the index, and then dereference to get the value and mutate it.
If the code performing the computation on the elements of the vector in a loop calls any unknown non-inlined code, the optimizer is forced to assume that unknown code may mutate the _begin field of the vector and this will require doing the two-steps indirection for each element.
(NOTE: the fact that the vector is passed by const std::vector<T>& reference is totally irrelevant: a const reference doesn't mean that the vector is const, it simply puts a limitation on what operations are permitted through that reference; external code could have a non-const reference to the vector, and constness can also be legally cast away... constness of references is basically ignored by the optimizer).
One way to remove this extra lookup (if you know that the vector is not being resized during the computation) is to cache this address in a local and use that instead of the vector operator [] to access the element:
int *p = &v[0];
for (int i=0,n=v.size(); i<n; i++) {
/// use p[i] instead of v[i]
}
This will generate code that is almost as efficient as a static array because, given that the address of p is not published, nothing in the body of the loop can change it and the value p can be assumed constant (something that cannot be done for v._begin as the optimizer cannot know if someone else knows the address of _begin).
I'm saying "almost" because a static array only requires indexing, while using a dynamically allocated area requires "base + indexing" access; most CPUs however provide this kind of memory access at no extra cost. Moreover if you're processing elements in sequence the indexing addressing becomes just a sequential memory access but only if you can assume the start address constant (i.e. not in the case of std::vector<T>::operator[]).
Assuming that the "max storage ever needed" is in the order of 10-50, I'd say using a local array is perfectly fine.
Using vector<T> will use 3 * sizeof(T*) (at least) to track the contents of the vector. So if we compare that to an array of double arr[10];, the plain array takes about 7 elements' worth more stack space than the vector object itself (or 8.5 in a 32-bit build). But the vector also needs a call to new, which takes a size argument. So that takes up AT LEAST one, more likely 2-3 elements of stack space, and the implementation of new is quite possibly not straightforward, so further calls are needed, which take up further stack space.
If you "don't know" the number of elements, and need to cope with quite large numbers of elements, then using a hybrid solution, where you have a small stack-based local array, and if numargs > small_size use vector, and then pass vec.data() to the function sum.

Filling an array of pointers, deleting when exiting

In C++, let's say I'm creating an array of pointers and each element should point to a data type MyType. I want to fill this array in a function fillArPtr(MyType *arPtr[]). Let's also say I can create MyType objects with a function createObject(int x). It works the following way:
MyType *arptr[10]; // Before there was a mistake, it was written: "int *arptr[10]"
void fillArPtr(MyType *arptr[])
{
for (int i = 0; i < 10; i++)
{
MyType myObject = createObject(i);
arptr[i] = new MyType(myObject);
}
}
Is it the best way to do it? In this program how should I use delete to delete objects created by "new" (or should I use delete at all?)
Since you asked "What is the best way", let me go out on a limb here and suggest a more C++-like alternative. Since your createObject is already returning objects by value, the following should work:
#include <vector>
std::vector<MyType> fillArray()
{
std::vector<MyType> res;
for (size_t i = 0; i != 10; ++i)
res.push_back(createObject(i));
return res;
}
Now you don't need to do any memory management at all, as allocation and clean-up is done by the vector class. Use it like this:
std::vector<MyType> myArr = fillArray();
someOtherFunction(myArr[2]); // etc.
someLegacyFunction(&myArr[4]); // suppose it's "void someLegacyFunction(MyType*)"
Do say if you have a genuine requirement for manual memory management and for pointers, though, but preferably with a usage example.
Your method places the array of pointers on the stack, which is fine. Just thought I'd point out that it's also possible to store your array of pointers on the heap, like so. You'd do this if you want your array to persist beyond the current scope:
MyType **arptr = new MyType*[10];
void fillArPtr(MyType *arptr[])
{
for (int i = 0; i < 10; i++)
{
MyType myObject = createObject(i);
arptr[i] = new MyType(myObject);
}
}
If you do this, don't forget to delete the array itself from the heap
for ( int i = 0 ; i < 10 ; i++ ) {
delete arptr[i];
}
delete [] arptr;
If you're going to use vector, and you know the size of the array beforehand, you should pre-size the array. You'll get much better performance.
vector<MyType*> arr(10);
I suggest you look into boost shared_ptr (also in TR1 library)
Much better already:
std::vector<MyType*> vec;
for (int i=0; i<10; i++)
vec.push_back(new MyType(createObject(i)));
// do stuff
// cleanup:
while (!vec.empty())
{
delete (vec.back());
vec.pop_back();
}
Shooting for the stars:
typedef boost::shared_ptr<MyType> ptr_t;
std::vector<ptr_t> vec;
for (int i=0; i<10; i++)
vec.push_back(ptr_t(new MyType(createObject(i))));
You would basically go through each element of the array and call delete on it, then set the element to 0 or null.
for (int i = 0; i < 10; i++)
{
delete arptr[i];
arptr[i] = 0;
}
Another way to do this is with an std::vector.
Use an array of auto_ptrs if you don't have to return the array anywhere. As long as you don't make copies of the auto_ptrs, they won't change ownership, and they will deallocate their resources upon exiting the function since they are RAII-based. It's also part of the standard already, so you don't need Boost to use it :) They're not useful in most places, but this sounds like a good one.
You can delete the allocated objects using delete objPtr. In your case,
for (int i = 0; i < 10; i++)
{
delete arptr[i];
arptr[i] = 0;
}
The rule of thumb to remember is, if you allocate an object using new, you should delete it. If you allocate an array of objects using new[N], then you must delete[] it.
Instead of sticking pointers into a raw array, have a look at std::array or std::vector. If you also use a smart pointer, like std::unique_ptr to hold the objects within an std::array you don't need to worry about deleting them.
typedef std::array<std::unique_ptr<MyType>, 10> MyTypeArray;
MyTypeArray arptr;
int i = 0;
for( MyTypeArray::iterator it = arptr.begin(); it != arptr.end(); ++it ) {
it->reset( new MyType( createObject(i++) ) );
}
You don't need to worry about deleting those when you're done using them.
Is the createObject(int x) function using new to create objects and returning a pointer to them? In that case, you need to delete that as well, because in this statement
new MyType( createObject(i++) )
you're making a copy of the object returned by createObject, but the original is then leaked. If you change createObject also to return an std::unique_ptr<MyType> instead of a raw pointer, you can prevent the leak.
If createObject is creating objects on the stack and returning them by value, the above should work correctly.
If createObject is not using new to create objects, but is creating them on the stack and returning pointers to these, your program is not going to work as you want it to, because the stack object will be destroyed when createObject exits.
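As an illustration of that last suggestion, here is a sketch of a unique_ptr-returning createObject; the real createObject isn't shown in the question, so the struct and the body are placeholders:
#include <memory>
struct MyType { int value; };               // stand-in for the question's MyType
std::unique_ptr<MyType> createObject(int x)
{
    std::unique_ptr<MyType> p(new MyType);  // assumes MyType is default-constructible
    p->value = x;
    return p;
}
The filling loop then becomes *it = createObject(i++); (or arptr[i] = createObject(i);), ownership moves straight into the array, and there is still nothing to delete by hand.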

elegant way to create&pass multi-dimensional array in c++?

first question:
for known dimensions, we don't need new/malloc for the creation
const int row = 3;
const int col = 2;
int tst_matrix[row][col] = {{1,2},{3,4},{5,6}};
however, there is no easy way to pass this two-dimensional array to another function, right? Because
int matrix_process(int in_matrix[][])
is illegal; you have to specify all the dimensions except the first one. If I need to change the content of in_matrix, how could I easily pass tst_matrix to the function matrix_process?
second question:
what's the standard way to create a 2-dimensional array in C++ with new? I don't want to use std::vector etc. here.
Here is what I came up with; is it the best way?
int **tst_arr = new int*[5];
int i=0, j=0;
for (i=0;i<5;i++)
{
tst_arr[i] = new int[5];
for (j=0;j<5;j++)
{
tst_arr[i][j] = i*5+j;
}
}
In addition, if I pass tst_array to another function, like:
int change_row_col( int **a)
{
.....................
//check which element is 0
for (i=0; i<5; i++)
for(j=0;j<5;j++)
{
if (*(*(a+i)+j)==0) //why I can not use a[i][j] here?
{
row[i]=1;
col[j]=1;
}
}
.....................
}
In addition, if I use ((a+i)+j), the result is not what I want.
Here is the complete testing code I had:
#include <iostream>
using namespace std;
//Input Matrix--a: Array[M][N]
int change_row_col( int **a)
{
int i,j;
int* row = new int[5];
int* col = new int[5];
//initialization
for(i=0;i<5;i++)
{
row[i]=0;
}
for(j=0;j<5;j++)
{
col[j]=0;
}
//check which element is 0
for (i=0; i<5; i++)
for(j=0;j<5;j++)
{
if (*(*(a+i)+j)==0) //why I can not use a[i][j] here?
{
row[i]=1;
col[j]=1;
}
}
for(i=0;i<5;i++)
for (j=0;j<5;j++)
{
if (row[i] || col[j])
{
*(*(a+i)+j)=0;
}
}
return 1;
}
int main ()
{
int **tst_arr = new int*[5];
int i=0, j=0;
for (i=0;i<5;i++)
{
tst_arr[i] = new int[5];
for (j=0;j<5;j++)
{
tst_arr[i][j] = i*5+j;
}
}
for (i=0; i<5;i++)
{
for(j=0; j<5;j++)
{
cout<<" "<<tst_arr[i][j];
}
cout<<endl;
}
change_row_col(tst_arr);
for (i=0; i<5;i++)
{
for(j=0; j<5;j++)
{
cout<<" "<<tst_arr[i][j];
}
cout<<endl;
}
for (i=0;i<5;i++)
{
delete []tst_arr[i];
}
delete []tst_arr;
}
For multidimensional arrays where all the bounds are variable at run time, the most common approach that I know of is to use a dynamically allocated one-dimensional array and do the index calculations "manually". In C++ you would normally use a class such as a std::vector specialization to manage the allocation and deallocation of this array.
This produces essentially the same layout as a multidimensional array with fixed bounds and doesn't have any real implied overhead as, without fixed bounds, any approach would require passing all bar one of the array dimensions around at run time.
I honestly think the best idea is to eschew raw C++ arrays in favor of a wrapper class like the boost::multi_array type. This eliminates all sorts of weirdness that arises with raw arrays (difficulty passing them as parameters to functions, issues keeping track of the sizes of the arrays, etc.).
Also, I strongly urge you to reconsider your stance on std::vector. It's so much safer than raw arrays that there really isn't a good reason to use dynamic arrays over vectors in most circumstances. If you have a C background, it's worth taking the time to make the switch.
My solution using function template:
template<size_t M,size_t N>
void Fun(int (&arr)[M][N])
{
for ( int i = 0 ; i < M ; i++ )
{
for ( int j = 0 ; j < N ; j++ )
{
/*................*/
}
}
}
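For example, it can be called like this; the compiler deduces M and N from the array you pass (just a usage sketch):
int main()
{
    int m[3][2] = { {1, 2}, {3, 4}, {5, 6} };
    Fun(m);   // instantiates Fun<3, 2>
    return 0;
}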
1)
template < typename T, size_t Row_, size_t Col_>
class t_two_dim {
public:
static const size_t Row = Row_;
static const size_t Col = Col_;
/* ... */
T at[Row][Col];
};
template <typename T>
int matrix_process(T& in_matrix) {
return T::Row * T::Col + in_matrix.at[0][0];
}
2) Use std::vector. You're adding a few function calls (which may be inlined in an optimized build) and may be exporting a few additional symbols. I suppose there are very good reasons to avoid this, but appropriate justifications are sooooo rare. Do you have an appropriate justification?
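For example, option 1) could be used like this (just a sketch; whatever goes into the /* ... */ part is up to you):
#include <iostream>
int main()
{
    t_two_dim<int, 3, 2> m;
    m.at[0][0] = 42;
    std::cout << matrix_process(m) << "\n";   // prints 48, i.e. 3 * 2 + 42
    return 0;
}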
The simple answer is that the elegant way of doing it in C++ (you tagged C and C++, but your code is C++ new/delete) is to create a bidimensional matrix class and pass that around (by reference or const reference). After that, the next option should always be std::vector (and again, I would implement the matrix class in terms of a vector). Unless you have a very compelling reason for it, I would avoid dealing with raw arrays of arrays.
If you really need to, but only if you really need to, you can perfectly well work with multidimensional arrays; it is just a little more cumbersome than with plain arrays. If all dimensions are known at compile time, as in your first block, these are some of the options:
const unsigned int dimX = ...;
const unsigned int dimY = ...;
int array[dimY][dimX];
void foo( int (*array)[dimX], unsigned int dimy ); // [1]
void foo( int (&array)[dimY][dimX] ); // [2]
In [1], by using pass-by-value syntax, the array decays into a pointer to the first element, which is a pointer to an int [dimX], and that is what you need to pass. Note that you should pass the other dimension in another argument, as it will be unknown to the code in the function. In [2], by passing a reference to the array, all dimensions are fixed and known. The compiler will ensure that you call it only with an array of the proper size (both dimensions coincide), so there is no need to pass the extra parameter. The second option can be templated to accommodate different sizes (all of them known at compile time):
template <unsigned int DimX, unsigned int DimY>
void foo( int (&array)[DimY][DimX] );
The compiler will deduce the sizes (if a real array is passed to the template) and you will be able to use them inside the template as DimX and DimY. This enables the use of the function with different array sizes, as long as they are all known at compile time.
If dimensions are not known at compile time, then things get quite messy and the only sensible approach is encapsulating the matrix in a class. There are basically two approaches. The first is allocating a single contiguous block of memory (as the compiler would do in the previous cases) and then providing functions that index that block by two dimensions. Look at the link up in the first paragraph for a simple approach, even if I would use std::vector instead of a raw pointer internally. Note that with the raw pointer you need to manually manage deletion of the pointer at destruction or your program will leak memory.
The other approach, which is what you started in the second part of your question, is the one I would avoid at all costs: keeping a pointer to a block of pointers to integers. This complicates memory management (you moved from having to delete one pointer to having to delete DimY+1 pointers --each array[i], plus array) and you also need to manually guarantee during allocation that all rows contain the same number of columns. There is a substantial increase in the number of things that can go wrong and no gain, but some actual loss (more memory required to hold the intermediate pointers, worse runtime performance as you have to double-dereference, probably worse locality of data...).
Wrapping up: write a class that encapsulates the bidimensional object in terms of a contiguous block of memory (array if sizes are known at compile time --write a template for different compile time sizes--, std::vector if sizes are not known until runtime, pointer only if you have a compelling reason to do so), and pass that object around. Any other thing will more often than not just complicate your code and make it more error prone.
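As a sketch of that contiguous-block approach (the class name and interface here are illustrative, not a standard component):
#include <cstddef>
#include <vector>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}
    int& operator()(std::size_t r, std::size_t c)             { return data_[r * cols_ + c]; }
    const int& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
private:
    std::size_t rows_, cols_;
    std::vector<int> data_;    // owns the block; no manual new/delete anywhere
};
Passing it around is then just a matter of taking a Matrix& (or const Matrix&) parameter.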
For your first question:
If you need to pass an N-dimensional array whose sizes vary at run time, you can define the function as shown below and pass the required size arguments along with the array.
I have tested this in gcc and it works (note that variable-length array parameters are a C99 feature; gcc accepts them in C++ only as an extension).
Example for 2D case:
void editArray(int M,int N,int matrix[M][N]){
//do something here
}
int mat[4][5];
editArray(4,5,mat); //call in this way