Dynamic memory allocation, C++ - c++

I need to write a function that can read a file, and add all of the unique words to a dynamically allocated array. I know how to create a dynamically allocated array if, for instance, you are asking for the number of entries in the array:
int value;
cin >> value;
int *number;
number = new int[value];
My problem is that I don't know ahead of time how many unique words are going to be in the file, so I can't initially just read the value or ask for it. Also, I need to make this work with arrays, and not vectors. Is there a way to do something similar to a push_back using a dynamically allocated array?
Right now, the only thing I can come up with is first to create an array that stores ALL of the words in the file (1000), then have it pass through it and find the number of unique words. Then use that value to create a dynamically allocated array which I would then pass through again to store all the unique words. Obviously, that solution sounds pretty overboard for something that should have a more effective solution.
Can someone point me in the right direction, as to whether or not there is a better way? I feel like this would be rather easy to do with vectors, so I think it's kind of silly to require it to be an array (unless there's some important thing that I need to learn about dynamically allocated arrays in this homework assignment).
EDIT: Here's another question. I know there are going to be 1000 words in the file, but I don't know how many unique words there will be. Here's an idea. I could create a 1000-element array and write all of the unique words into that array while keeping track of how many I've stored. Once I've finished, I could dynamically allocate a new array with that count, and then just copy the words from the initial array to the second. Not sure if that's the most efficient, but with us not being able to use vectors, I don't think efficiency is a huge concern in this assignment.

A vector really is a better fit for this than an array. Really.
But if you must use an array, you can at least make it behave like a vector :-).
Here's how: allocate the array with some capacity. Store the allocated capacity in a "capacity" variable. Each time you add to the array, increment a separate "length" variable. When you go to add something to the array and discover it's not big enough (length == capacity), allocate a second, longer array, then copy the original's contents to the new one, then finally deallocate the original.
This gives you the effect of being able to grow the array. If performance becomes a concern, grow it by more than one element at a time.
Congrats, after following these easy steps you have implemented a small subset of std::vector functionality atop an array!
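The steps above can be sketched roughly as follows. This is only a minimal sketch, not a full vector replacement: the names WordArray, contains, addUnique and release are made up for illustration, and the starting capacity of 4 and the doubling factor are arbitrary assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: a growable array of unique words.
struct WordArray {
    std::string* data = nullptr;
    std::size_t length = 0;    // slots currently in use
    std::size_t capacity = 0;  // slots currently allocated
};

// Returns true if word is already stored in the array.
bool contains(const WordArray& a, const std::string& word) {
    for (std::size_t i = 0; i < a.length; ++i)
        if (a.data[i] == word) return true;
    return false;
}

// Appends word if it is not stored yet, growing the storage when it is full.
void addUnique(WordArray& a, const std::string& word) {
    if (contains(a, word)) return;
    if (a.length == a.capacity) {
        // Assumed growth policy: start at 4 slots, then double.
        std::size_t newCapacity = (a.capacity == 0) ? 4 : a.capacity * 2;
        std::string* bigger = new std::string[newCapacity];
        for (std::size_t i = 0; i < a.length; ++i)
            bigger[i] = a.data[i];   // copy the old contents
        delete[] a.data;             // deleting a null pointer is a no-op
        a.data = bigger;
        a.capacity = newCapacity;
    }
    a.data[a.length++] = word;
}

// Frees the storage and resets the array to empty.
void release(WordArray& a) {
    delete[] a.data;
    a = WordArray{};
}
```

To use it, call addUnique for every word read from the file, then read a.length for the number of unique words and call release when done.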

As you have rightly pointed out this is trivial with a Vector.
However, given that you are limited to using an array, you will likely need to do one of the following:
Initialize the array with a suitably large size and live with poor memory utilization
Write your own code to dynamically increase the size of the array at run time (basically the internals of a Vector)
If you were permitted to do so, some sort of hash map or linked list would also be a good solution.

If I had to use an array, I'd just allocate one with some initial size, then keep doubling that size when I fill it to accommodate any new values that won't fit in an array with the previous sizes.
Since this question is about C++, memory allocation would be done with the new keyword. It would be nice if one could use the realloc() function, which resizes a block and retains the values already stored in it. That way one wouldn't need to copy the values from the old array to the new array by hand. However, realloc() does not work with memory allocated with new: it may only be used on memory obtained from malloc(), calloc(), or realloc(), and it copies raw bytes, which is not safe for non-trivial types such as std::string.

You can "resize" array like this (N is size of currentArray, T is type of its elements):
// create new array
T *newArray = new T[N * 2];
// Copy the data
for ( int i = 0; i < N; i++ )
    newArray[i] = currentArray[i];
// Change the size to match
N *= 2;
// Destroy the old array
delete [] currentArray;
// set currentArray to newArray
currentArray = newArray;
Using this solution you have to copy the data. There might be a solution that does not require it.
But I think it would be more convenient for you to use std::vectors. You can just push_back into them and they will resize automatically for you.

You can cheat a bit:
use std::set to get all the unique words then copy the set into a dynamically allocated array (or preferably vector).
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

int main()
{
    // Copy every word from standard input into a set;
    // the set keeps only one copy of each word.
    std::set<std::string> data;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::inserter(data, data.end()));

    // Copy the unique words into your array (or vector).
    std::string* words = new std::string[data.size()];
    std::copy(data.begin(), data.end(), words);

    // ... use words[0] .. words[data.size() - 1] here ...

    delete[] words;
}

This could be going a bit overboard, but you could implement a linked list in C++... it would actually allow you to use a vector-like implementation without actually using vectors (which are actually the best solution).
The implementation is fairly easy: each node just holds a pointer to the next and previous nodes, and you store the "head" node somewhere you can easily access. Then looping through the list lets you check which words are already in it and which are not. You could even add a counter to each node and count the number of times a word is repeated throughout the text.
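A rough sketch of that idea, simplified to a singly linked list (only a next pointer) to keep it short. The names WordNode, record, uniqueCount and destroy are hypothetical, chosen just for this example.

```cpp
#include <string>

// One node per unique word, with a counter for repeated occurrences.
struct WordNode {
    std::string word;
    int count;
    WordNode* next;
};

// Records one occurrence of word and returns the (possibly new) head.
// A word that is already in the list only has its counter incremented.
WordNode* record(WordNode* head, const std::string& word) {
    for (WordNode* n = head; n != nullptr; n = n->next) {
        if (n->word == word) {
            ++n->count;
            return head;
        }
    }
    // Not found: put a new node in front of the list.
    return new WordNode{word, 1, head};
}

// Counts the nodes, i.e. the number of unique words.
int uniqueCount(const WordNode* head) {
    int total = 0;
    for (const WordNode* n = head; n != nullptr; n = n->next)
        ++total;
    return total;
}

// Frees every node in the list.
void destroy(WordNode* head) {
    while (head != nullptr) {
        WordNode* next = head->next;
        delete head;
        head = next;
    }
}
```

Once the whole file has been read, uniqueCount gives the size for the final dynamically allocated array, and the words can be copied out of the list into it.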

Related

Dynamic and static array

I am studying C++ reading Stroustrup's book that in my opinion is not very clear in this topic (arrays). From what I have understood C++ has (like Delphi) two kind of arrays:
Static arrays that are declared like
int test[3] = {10,487,-22};
Dynamic arrays that are called vectors
std::vector<int> a;
a.push_back(10);
a.push_back(487);
a.push_back(-22);
I have already seen answers about this (with tons of lines and concepts in them), but they didn't make the concept clear to me.
From what I have understood vectors consume more memory but they can change their size (dynamically, in fact). Arrays instead have a fixed size that is given at compile time.
In the chapter Stroustrup says that vectors are safe while arrays aren't, without explaining the reason. I trust him, but why? Is the safety related to where the memory is located (heap/stack)?
I would like to know why I am using vectors if they are safe.
The reason arrays are unsafe is because of memory leaks.
If you declare a dynamic array
int * arr = new int[size];
and you don't do delete [] arr, then the memory remains uncleared and this is known as a memory leak. It should be noted, ANY time you use the word new in C++, there must be a delete somewhere in there to free that memory. If you use malloc(), then free() should be used.
http://ptolemy.eecs.berkeley.edu/ptolemyclassic/almagest/docs/prog/html/ptlang.doc7.html
It is also very easy to go out of bounds in an array, for example by inserting a value at an index larger than its size - 1. With a vector, you can push_back() as many elements as you want and the vector will resize automatically. If you have an array of size 15 and you try to say arr[18] = x, then you may get a segmentation fault. The program will compile, but may crash when it reaches a statement that takes it outside the array bounds.
In general when you have large code, arrays are used infrequently. Vectors are objectively superior in almost every way, and so using arrays becomes sort of pointless.
EDIT: As Paul McKenzie pointed out in the comments, going out of array bounds does not guarantee a segmentation fault; it is undefined behavior, so the language places no requirements on what happens.
Let us take the case of reading numbers from a file.
We don't know how many numbers are in the file.
To declare an array to hold the numbers, we need to know the capacity or quantity, which is unknown. We could pick a number like 64. If the file has more than 64 numbers, we start overwriting the array. If the file has fewer than 64 (like 16), we are wasting memory (by not using 48 slots). What we need is to dynamically adjust the size of the container (array).
To dynamically adjust the capacity of an array, a new larger array must be created, then elements copied and the old array deleted.
The std::vector will adjust its capacity as necessary. It handles the dynamic allocation of memory for you.
Another aspect is the passing of the container to a function. With an array, you need to pass the array and the capacity. With std::vector, you only need to pass the vector. The vector object can be queried about its capacity.
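As a small illustration of both points (growing as you read, and passing only the container), here is a sketch. The function names readNumbers, readNumbersFromText and sum are invented for this example, and reading from a string instead of a real file is just an assumption to keep it self-contained.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated integers until the stream runs out.
// The vector grows on its own; no capacity has to be guessed up front.
std::vector<int> readNumbers(std::istream& in) {
    std::vector<int> numbers;
    int value;
    while (in >> value)
        numbers.push_back(value);
    return numbers;
}

// Convenience wrapper so the same code can be tried on a string.
std::vector<int> readNumbersFromText(const std::string& text) {
    std::istringstream in(text);
    return readNumbers(in);
}

// Only the vector is passed; the function asks it for its size.
long long sum(const std::vector<int>& numbers) {
    long long total = 0;
    for (int n : numbers)
        total += n;
    return total;
}
```

With a real file you would pass a std::ifstream to readNumbers instead.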
One safety benefit I can see is that you can't access something in a vector that is not there.
What I mean is, if you push_back only 4 elements and then try to access index 7, it will throw an error. With an array that doesn't happen.
In short, it stops you from accessing corrupt data.
edit:
The programmer has to compare the index with vector.size() to throw an error; it doesn't happen automatically with operator[]. You have to do it yourself (or use at(), which performs the check and throws std::out_of_range).
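To make the difference concrete: std::vector has a checked accessor, at(), that throws std::out_of_range, while operator[] performs no check. A minimal sketch follows; atThrows and inBounds are hypothetical helper names used only here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading v.at(index) throws std::out_of_range.
bool atThrows(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);   // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// The manual check you need before using v[index], which is unchecked.
bool inBounds(const std::vector<int>& v, std::size_t index) {
    return index < v.size();
}
```

So with 4 elements in the vector, at(7) throws, whereas v[7] is undefined behavior just like it would be for a plain array.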

assign space for vector of pointer to struct

For some reason, I have a vector of pointers to a struct, and I would like to allocate new space for every element of the vector. But I don't want to do it in a loop for every element, as that may slow the whole process down. Is there a faster way to do it?
Could anyone provide me a solution with code?
This is what I am currently doing ("pool" is the struct name):
vector<pool*> poolPointer(vectorSize);
for (size_t i = 0; i < poolPointer.size(); i++) {
    poolPointer.at(i) = new pool;
}
I think it is very slow, so I am looking for a faster way to allocate the space and store a pointer to a struct in each individual element of the vector.
It has nothing to do with vector. What you are looking for is custom memory allocation, and placement new in particular. That way you can allocate a single chunk of memory for all your pool instances and then construct the instances inside that chunk.
EDIT:
As @JSF commented, you can allocate many instances all together as an array of "values", not pointers. You can then use a vector of pointers if you wish, or you can use a vector of values and not bother with pointers at all. I'd start with a vector of values, and only if profiling showed that frequent removal from the vector is a bottleneck would I think about a vector of pointers as an optimisation.
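A sketch of the "vector of values" suggestion, with an optional vector of pointers layered on top. The pool struct here is only a stand-in with one made-up member, and makePools / pointersInto are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the real struct; the id member is an assumption.
struct pool {
    int id = 0;
};

// One allocation holds every pool object, instead of one new per element.
std::vector<pool> makePools(std::size_t count) {
    return std::vector<pool>(count);
}

// Optional: a vector of pointers into that storage.
// The pointers stay valid only while storage is neither resized nor destroyed.
std::vector<pool*> pointersInto(std::vector<pool>& storage) {
    std::vector<pool*> result;
    result.reserve(storage.size());
    for (pool& p : storage)
        result.push_back(&p);
    return result;
}
```

Nothing has to be deleted by hand here: the storage vector owns all the objects and frees them when it goes out of scope.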

C++: Dynamically growing 2d array

I have the following situation solved with a vector, but one of my older colleagues told me in a discussion that it would be much faster with an array.
I calculate lots (and I mean lots!) of 12-dimensional vectors from lots of audio files and have to store them for processing. I really need all those vectors before I can start my calculation. However, I cannot predict how many audio files there will be, and I cannot predict how many vectors are extracted from each one. Therefore I need a structure that holds the vectors dynamically.
Therefore I create a new double array for each vector and push it to a vector.
I now want to test whether my colleague is really right that the calculation can be sped up by also using an array instead of a vector for storage.
vector<double*>* Features = new vector<double*>();
double* feature = new double[12];
// adding elements
Features->push_back(feature);
As far as I know, to create a 2D array dynamically I need to know the number of rows.
double** container = new double*[rows];
container[0] = new double[12];
// and so on..
I only know rows after processing all the audio files, and I don't want to process the audio twice.
Does anyone have an idea how to solve this and append to it, or is it just not possible that way, so that I should either use a vector or create my own structure (which I assume may be slower than a vector)?
Unless you have any strong reasons not to, I would suggest something like this:
std::vector<std::array<double, 12>> Features;
You get all the memory locality you could want, and all of the automagic memory management you need.
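For illustration, appending to such a container could look like the sketch below. makeFeature is a made-up stand-in for the real feature extraction, and appendFeatures is a hypothetical helper representing "process one audio file".

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Feature = std::array<double, 12>;

// Stand-in for the real extraction step (an assumption for this example):
// fills the 12 values with seed, seed + 1, ..., seed + 11.
Feature makeFeature(double seed) {
    Feature f{};
    for (std::size_t i = 0; i < f.size(); ++i)
        f[i] = seed + static_cast<double>(i);
    return f;
}

// Appends howMany features, as one audio file might produce.
// Neither the number of files nor the number of features per file
// has to be known in advance.
void appendFeatures(std::vector<Feature>& features, int howMany) {
    for (int k = 0; k < howMany; ++k)
        features.push_back(makeFeature(k));
}
```

If a rough estimate of the total is available, calling features.reserve(estimate) up front avoids most of the reallocations.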
You can certainly do this, but it would be much better if you perform this with std::vector. For dynamic growth of a 2D array, you would have to perform all these things.
Create a temporary 2D Array
Allocate memory to it.
Allocate memory to its each component array.
Copy data into its component arrays.
Delete each component array of the original 2D Array.
Delete the 2D Array.
Take new Input.
Add new item to the temporary 2D array.
Create the original 2D Array and allocate memory to it.
Allocate memory to its component arrays.
Copy temporary data into it again.
With all of this done at each step, it is hard to believe that arrays would be any faster. Use std::vector. The answers above explain why.
Using a vector will make the problem easier because it makes growing the data automatic. Unfortunately, due to how vectors grow, a vector may not be the best solution because of the number of reallocations required for a large data set. On the other hand, if you set the initial size of the vector quite large but only need a small number of 12-element arrays, you have just wasted a large amount of memory. If there is some way to produce a guess of the size required, you might use that guess to dynamically allocate arrays or to set the vector to that size initially.
If you are only going to calculate with the data once or twice, then maybe you should consider using a map or a list. For large data sets these two structures will create a memory structure that matches your exact needs, and bypass the extra time required for growing the arrays. On the other hand, the calculations with these data structures will be slower.
I hope these thoughts add some alternative solutions to this discussion.

C++ Deleting part of dynamic array

Say I have a dynamic array like:
int* integers = new int[100];
Is there a way to delete only part of the array such as:
int* integers2 = integers + 50;
delete[] integers2;
I want to not only delete everything 50 and beyond, but if I call another delete[] on the original integers array, then it would only delete the correct amount of memory and not try to delete the originally allocated amount and seg fault.
Why I want to do this: I have a data structure that is structured in levels of arrays, and I want to be able to create this data structure from a full array. So I want to be able to say
int* level1 = integers;
int* level2 = integers + 50;
int* level3 = integers + 100;
But when level 3 is no longer needed, the data structure will automatically delete[] level3. I need to know that this will behave correctly and not just destroy everything in the array. If it will then I need to just create new arrays and copy the contents over, but it would be nice to avoid doing that for performance reasons.
Edit: Everyone seems to be jumping to the conclusion that I just should use a dynamic resizing container (ie vector, deque) in the first place for my data structure. I am using levels of arrays for a good reason (and they aren't equally sized like I make it look like in my example). I was merely looking for a good way to have a constructor to my data structure that takes in an array or vector and not need to copy the original contents over into the new data structure.
No, this will not behave correctly. You can only delete[] pointers that you got from new[], else the results are undefined and bad things might happen.
If you really need the array to get smaller you have to allocate a new one and copy the contents manually.
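Typically when memory gets allocated, there is some housekeeping data stored just before the address the pointer refers to.
i.e. [housekeeping][data], with the pointer pointing at the start of the data
If you delete[] a pointer into the middle of the block, you will mess that up.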
int* integers2 = integers + 50;
delete[] integers2;
This will not work, because new[] allocated space for 100 ints and returned it as integers. integers2 is only a pointer to the 50th location inside that block; it has no allocation of its own, so calling delete[] on it will not free the rest of the block. It is undefined behavior and will only give erratic results.
What you can do is copy the first 50 elements into another array and delete the previous array completely.
delete[] only works on a pointer that was returned by new[]. Calling it on another pointer that points into the first pointer's block will not free any part of that block.
In particular, delete[] integers2 will not free any of the space allocated to integers or to any other pointer.
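The copy-and-delete approach mentioned above could be sketched like this. shrinkArray is a hypothetical helper name; it assumes the old pointer came from new[] and that the new size is not larger than the old one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns a new array holding the first newSize elements of old,
// and frees the old block as a whole.
// Assumes old was allocated with new int[oldSize].
int* shrinkArray(int* old, std::size_t oldSize, std::size_t newSize) {
    assert(newSize <= oldSize);
    int* smaller = new int[newSize];
    std::copy(old, old + newSize, smaller);  // keep the leading elements
    delete[] old;                            // release the original block
    return smaller;
}
```

The caller replaces its pointer with the returned one (integers = shrinkArray(integers, 100, 50);) and later frees it with a single delete[].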
Dynamic allocators (like new) generally don't like you releasing part of the memory they gave you. If you use the C library functions malloc() and free() (declared in <cstdlib>) instead of new and delete, then you can use realloc(), though in most cases where you would care about the size difference it's just going to copy for you anyway.
Dynamically sizing containers generally use an exponential rule for resizing: if you run out of space, they (for example) double the allocation (and copy the old data over), if you remove data until you are using (for example) less than half the allocation they copy into a smaller allocation. This means you never waste more than half the memory, and the cost of copying per element added or removed is effectively constant. Implementing all of this is a pain in the ass, though, so just use std::vector and let it do it for you :). (Note that std::vector grows automatically but does not shrink its allocation on its own; you can request that with shrink_to_fit().)
No, you can't do this with a fixed-size array allocated with new[]. If you want to have a dynamic array, use one of the STL containers, such as std::vector.

Why "delete [][]... multiDimensionalArray;" operator in C++ does not exist

I was always wondering if there is operator for deleting multi dimensional arrays in the standard C++ language.
If we have created a pointer to a single dimensional array
int *array = new int[size];
the delete looks like:
delete [] array;
That's great. But if we have a two-dimensional array, we cannot do
delete [][] twoDimensionalArray;
Instead, we should loop and delete the items, like in this example.
Can anybody explain why?
Technically, there aren't two dimensional arrays in C++. What you're using as a two dimensional array is a one dimensional array with each element being a one dimensional array. Since it doesn't technically exist, C++ can't delete it.
Because there is no way to call
int **array = new int[dim1][dim2];
All news/deletes must be balanced, so there's no point to a delete [][] operator.
new int[dim1][dim2] returns a pointer to the first element of an array of dim1 elements, each of type int[dim2] (that is, an int (*)[dim2], not an int**). So dim2 must be a compile-time constant. This is similar to allocating multi-dimensional arrays on the stack.
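A small sketch of that case, where the inner dimension is a compile-time constant and a single new[] is matched by a single delete[]. The names kCols, Row, makeGrid and freeGrid are invented for the example.

```cpp
#include <cstddef>

// The inner dimension must be a compile-time constant.
constexpr std::size_t kCols = 4;

// Each element of the allocated array is itself an int[kCols].
using Row = int[kCols];

// Allocates rows * kCols zero-initialized ints in ONE contiguous block.
// The result has type int (*)[kCols], written here as Row*.
Row* makeGrid(std::size_t rows) {
    return new int[rows][kCols]();
}

// One new[] was used, so one delete[] releases everything.
void freeGrid(Row* grid) {
    delete[] grid;
}
```

Elements are still addressed as grid[r][c], because the compiler knows kCols and can compute the offset itself.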
The reason delete is called multiple times in that example is because new is called multiple times too. Delete must be called for each new.
For example, if I allocate 1,000,000 bytes of memory I cannot later delete the entries from 200,000 to 300,000; it was allocated as one whole chunk and must be freed as one whole chunk.
The reason you have to loop, like in the example you mention, is that the number of arrays that needs to be deleted is not known to the compiler / allocator.
When you allocated your two-dimensional array, you really created N one-dimensional arrays. Now each of those have to be deleted, but the system does not know how many of them there are. The size of the top-level array, i.e. the array of pointers to your second-level arrays, is just like any other array in C: its size is not stored by the system.
Therefore, there is no way to implement delete [][] as you describe (without changing the language significantly).
I'm not sure of the exact reason from a language design perspective; I'm guessing it has something to do with the fact that when allocating memory you are creating an array of arrays, and each one needs to be deleted.
int ** mArr = new int*[10];
for(int i=0;i<10;i++)
{
mArr[i]=new int[10];
}
My C++ is rusty and I'm not sure if that's syntactically correct, but I think it's close.
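For completeness, here is a sketch of the allocation together with the matching clean-up, showing one delete[] for every new[]. allocate2D and free2D are hypothetical names; note that the caller has to remember the row count, since the language does not store it in a way you can query.

```cpp
#include <cstddef>

// Allocates an array of row pointers, then one zero-initialized
// int array per row: rows + 1 calls to new[] in total.
int** allocate2D(std::size_t rows, std::size_t cols) {
    int** grid = new int*[rows];
    for (std::size_t r = 0; r < rows; ++r)
        grid[r] = new int[cols]();
    return grid;
}

// Mirrors the allocation: rows + 1 calls to delete[].
// The rows must be freed before the array of pointers that refers to them.
void free2D(int** grid, std::size_t rows) {
    for (std::size_t r = 0; r < rows; ++r)
        delete[] grid[r];
    delete[] grid;
}
```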
While all these answers are relevant, I will try to explain where the expectation comes from that something like delete[][] array; might work on dynamically allocated arrays, and why it is not possible:
The syntax int array[ROWS][COLS]; allowed for statically allocated arrays is just an abstraction for programmers; in memory it is laid out like a one-dimensional array int array[ROWS*COLS];. During compilation (when the dimension sizes COLS and ROWS must be constants, per the standard) the compiler also remembers the sizes of those dimensions, which it needs later to address elements with syntax such as array[x][y] = 45. Knowing these sizes, the compiler replaces [x][y] with the corresponding index into the one-dimensional layout using simple math: [COLS*x + y].
This is not the case for dynamically allocated arrays, if you want the same multi-dimensional functionality (in fact, notation). Since their size can be determined at runtime, they would have to remember the size of each additional dimension for later use, and keep it for the whole lifetime of the array. Moreover, the language would have to change to treat such arrays as truly multi-dimensional, keeping the [x][y] access notation in the code and translating it to a one-dimensional index at runtime rather than at compile time.
Therefore the absence of array = new int[ROWS][COLS] (with both dimensions chosen at runtime) means there is no need for delete[][] array;. And as already mentioned, it could not be used in your example to delete your "multi-dimensional" array, because your sub-arrays (the additional dimensions) are allocated separately (with separate new calls), so they are independent of the top-level array (array_2D) that contains them, and they cannot all be deleted at once.
delete[] applies to any non-scalar (array).
You can use a wrapper class to do all those things for you.
Working with "primitive" data types usually is not a good solution (the arrays should be encapsulated in a class). For example std::vector is a very good example that does this.
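As a rough idea of what such a wrapper might look like, here is a minimal sketch. The class name Grid and its interface are assumptions for illustration; it stores all elements in one flat block and does the row/column arithmetic itself, so only one new[] and one delete[] are ever needed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: a rows x cols grid of ints in one flat allocation.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(new int[rows * cols]()) {}

    // The destructor releases the single block automatically.
    ~Grid() { delete[] data_; }

    // Copying is disabled to keep the sketch short and avoid double deletes.
    Grid(const Grid&) = delete;
    Grid& operator=(const Grid&) = delete;

    // Maps (r, c) to the flat index r * cols + c.
    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    int* data_;
};
```

Both dimensions can be chosen at runtime, and there is nothing to loop over when the object is destroyed. In real code, holding a std::vector<int> inside the class instead of a raw pointer would remove the need for the hand-written destructor as well.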
Delete should be called exactly as many times as new is called. Because you cannot call "a = new X[a][b]", you also cannot call "delete [][]a".
Technically it's a good design decision preventing the appearance of weird initialization of an entire n-dimensional matrix.
Well, I think it is easy to implement, but too dangerous. It is easy to tell whether a pointer is created by new[], but hard to tell about new[]...[](if allowed).