How do I dynamically add to an array? - c++

This is a Windows C program. In the code below I'm converting a delimited char string (pbliblist) into an array. The array is passed to a function from a third party.
My question is: how can this be done without knowing in advance how many entries there are (libcount)? I was thinking I might be able to 'new' each array entry within the token loop.
LPCTSTR * LibList = new LPCTSTR[libcount];
token = strtok_s(pbliblist, seps, &next_token);
while (token != NULL) {
    LibList[cnt] = token;
    token = strtok_s(NULL, seps, &next_token);
    cnt++;
}

There are different ways you can do it.
But, right off the bat, out of the box, you can't just "add" to an array in C, and, honestly, you don't want to.
There's a couple things you can try.
One, is to assume the size. If you know, for example, that there won't be more than 100 fields in the string, then you can just use 100 for libcount. This "wastes" memory, but it's really efficient, because it's allocated only once. Just make sure you check for the boundary condition and signal an error if you go over 100, because "it should never happen".
Now, if you do the 100 thing, then at the end, now that you do indeed know how many items you read, you can create a NEW array exactly as big as you need, copy the elements from the original array, free it, and return the new one.
Similarly, continuing down this path, there's no need to new the original 100-element array. Simply allocate it on the stack, parse the elements into it, and at the end copy them to the newly allocated result array. The ones on the stack "vanish" when the stack unwinds (super cheap).
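If it helps, here is a minimal sketch of that fixed-limit approach. It reuses the MSVC strtok_s from the question, but uses plain const char* in place of LPCTSTR; the 100 limit and the parse_libs name are made up for illustration:

#include <cstring>   // strtok_s (MSVC)
#include <cstddef>

const char** parse_libs(char* pbliblist, const char* seps, std::size_t* out_count) {
    const std::size_t MAX_LIBS = 100;   // "should never happen" limit
    const char* temp[MAX_LIBS];         // lives on the stack, vanishes on return
    std::size_t cnt = 0;

    char* next_token = nullptr;
    char* token = strtok_s(pbliblist, seps, &next_token);
    while (token != nullptr && cnt < MAX_LIBS) {   // check the boundary; report an error if exceeded
        temp[cnt++] = token;
        token = strtok_s(nullptr, seps, &next_token);
    }

    // Now that the real count is known, allocate an exactly-sized array and copy into it.
    const char** result = new const char*[cnt];
    for (std::size_t i = 0; i < cnt; ++i)
        result[i] = temp[i];

    *out_count = cnt;
    return result;   // caller delete[]s it when done
}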
If you don't want to have any limit, you can still expand on the above concept.
Start with an original array (but this time, you'll want to allocate it). Then start filling it up. When it fills, and you have more coming, create a new, bigger array, copy the old one to the new one, and release the old one.
In the end, you'll have everything in a single array. You don't want to do this for each element, however. You don't want to new 4 bytes, then new 8 bytes, then new 12 bytes, copying all the time. If you do that you fragment your heap something awful, and you copy way, way too much.
Instead you do it in chunks. 10 entries, 20 entries, 40 entries.
At the end, you can, like above, copy the array one last time into a correctly sized block of memory (or not, if this array is going to be returned and destroyed 5ms later, kind of "who cares" about the excess at the end).
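A rough sketch of that grow-in-chunks loop, with the same caveats as above (plain const char*, made-up names):

#include <cstring>   // strtok_s (MSVC)
#include <cstddef>

const char** parse_libs_unbounded(char* pbliblist, const char* seps, std::size_t* out_count) {
    std::size_t capacity = 10;                 // initial chunk
    std::size_t cnt = 0;
    const char** libs = new const char*[capacity];

    char* next_token = nullptr;
    char* token = strtok_s(pbliblist, seps, &next_token);
    while (token != nullptr) {
        if (cnt == capacity) {                 // full: build a bigger array
            std::size_t new_capacity = capacity * 2;
            const char** bigger = new const char*[new_capacity];
            for (std::size_t i = 0; i < cnt; ++i)
                bigger[i] = libs[i];           // copy the old entries
            delete[] libs;                     // release the old block
            libs = bigger;
            capacity = new_capacity;
        }
        libs[cnt++] = token;
        token = strtok_s(nullptr, seps, &next_token);
    }

    *out_count = cnt;
    return libs;   // optionally shrink-copy to exactly cnt entries here
}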
Finally, of course, since you're talking C++, there's something in the template libraries that does all this for you (not that I know what it is, I imagine you could just push to extend a vector). But, see, I can barely spell C++, so...don't defer to me for expertise and details on that.
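For what it's worth, a std::vector does exactly that growth dance for you. A fragment, assuming #include <vector> and the same pbliblist/seps as in the question (the thirdPartyCall name is a placeholder):

std::vector<const char*> libs;
char* next_token = nullptr;
for (char* token = strtok_s(pbliblist, seps, &next_token);
     token != nullptr;
     token = strtok_s(nullptr, seps, &next_token)) {
    libs.push_back(token);    // the vector grows itself in amortized-constant time
}
// libs.data() is a contiguous const char** you can hand to the third-party call:
// thirdPartyCall(libs.data(), libs.size());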

Related

Add new index in the middle of an array

I know that I can remove something in the middle of an array such as
char* cArray = (char*) malloc(sizeof(char) * sizeTracker);
using the memmove function. In this way something is removed from the array without having to use a temp array, switch to vectors, etc. The question here is: can I add a new index in the middle of the array (is there a function for it)? Or let's say that using realloc I add a new index at the end; how can I then move the values down efficiently?
Alternative Answer
I have been thinking about this and the comments where @DietmarKühl started talking about inserting blocks like a deque does. The problem with this is that a deque is a linked list of blocks, so you can't start with an array. If you start with an array and then want to insert something in the middle, you have to do something else, and I think I have an idea - it isn't fleshed out very much, so it may not work, but I will share it anyway. Please leave comments telling me what you think of the idea.
If you had an array of items and then want to add an item into the middle all you really want to do is add a block and update the mapping. The mapping is the thing that makes it all work - but it slows down access because you need to check the mapping before every access of the array.
The mapping would be a binary tree. It would start empty but the nodes would contain a value: if the index you want is < the value you traverse the left pointer and if it is >= you traverse the right pointer.
So, an example:
Before the insert:
root -> (array[100000], offset: 0)
After the insert at 5000:
root -> {value: 5000,
         left:  (array[100000], offset: 0),
         right: {value: 5001,
                 left:  (newarray[10], offset: -5000),
                 right: (array[100000], offset: 1)
                }
        }
I have used blocks of 10 here - newarray is 10 in size. If you just randomly insert indexes all over the place, the block size should be 1, but if you insert groups of consecutive indexes, having a block size larger than 1 would be good. It really depends on your usage pattern...
When you check index 7000 you check the root node: 7000 is >= 5000 so you follow the right pointer: 7000 is >= 5001 so you follow the right pointer: it points to the original array with an offset of 1 so you access array[index+offset].
When you check index 700 you check the root node: 700 is < 5000 so you follow the left pointer: it points to the original array with an offset of 0 so you access array[index+offset].
When you check index 5000 you check the root node: 5000 is >= 5000 so you follow the right pointer: 5000 is < 5001 so you follow the left pointer: it points to the new array with an offset of -5000 so you access newarray[index+offset].
Of course optimizations to this would be really important to make this useful - you would have to balance the tree after each insert because otherwise the right side would be much much longer than the left side.
The downside to this is that accesses to the array are now O(log inserts) instead of O(1) so if there are lots of inserts you will want to realloc every so often to compact the data structure back to an array but you could save that for an opportune time.
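For the curious, here is a very rough sketch of the node layout described above, purely illustrative: int elements, no balancing, no memory management.

// A leaf refers to one block plus an offset; an inner node splits the index space.
struct Node {
    bool leaf;
    // inner node: indexes < value go left, indexes >= value go right
    long value;
    Node* left;
    Node* right;
    // leaf: the element lives at block[index + offset]
    int* block;
    long offset;
};

int lookup(const Node* n, long index) {
    while (!n->leaf)
        n = (index < n->value) ? n->left : n->right;   // walk the mapping
    return n->block[index + n->offset];                // then hit the real array
}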
Like I said it isn't very fleshed out so it may not work in practice but I hope it is worth sharing anyway.
Original Answer
If you have a C-style array and want to insert an index in the middle, you need an array that is larger than the data it currently holds (plus a variable like sizeTracker to keep track of the size).
Then if there was room left you could just memmove the last half of the array out one to create a spot in the middle.
If there wasn't any room left you could malloc another whole array that includes extra space and then memmove the first half and memmove the second half separately leaving a gap.
If you want to make the malloc amortized constant time you need to double the size of the array each time you reallocate it. The memmove can compile down to a single string-move instruction on x86, but even then it is still O(n) because every value has to move.
But performance isn't any worse than your deleting trick - if you can delete anywhere in the array, that costs O(n) as well, because on average you memmove half the values when you delete.
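A small sketch of the room-is-left case, assuming a char array with used elements stored out of capacity slots (names are made up):

#include <cassert>
#include <cstddef>
#include <cstring>

// Insert value at position pos; the caller over-allocated, so there is room for one more.
void insert_at(char* arr, std::size_t used, std::size_t capacity,
               std::size_t pos, char value) {
    assert(used < capacity && pos <= used);
    std::memmove(arr + pos + 1, arr + pos, used - pos);   // shift the tail right by one
    arr[pos] = value;                                     // drop the new value into the gap
}

The no-room-left case is the same idea, except you first malloc (or new) a larger block and memmove the two halves into it separately, leaving the gap in the middle.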
There is no standard C function that grows an array and inserts an object into the middle for you. Essentially you'd build the functionality yourself using malloc(), free(), and memmove() (when enough space is available and elements are just moved back within the memory), or memcpy() (if you need to allocate new memory and you want to avoid first copying and then moving the tail).
In C++, where object locations tend to matter, you'd obviously use std::copy(), std::reverse_copy(), and/or std::move() (both forms thereof), as there may be relevant constructors/destructors for the respective objects. Most likely you'd also obtain the memory differently, e.g., using operator new() and/or an allocator, if you really deal in raw memory.
The fun implementation of the actual insertion (assuming there is enough space for another element) is to use std::rotate(): construct the new element past the end, then shuffle the elements into place:
#include <algorithm>   // std::rotate
#include <cstddef>     // std::size_t
#include <new>         // placement new
template <typename T>
void insert(T* array, std::size_t size, T const& value) {
    // precondition: array points to storage for at least size + 1 elements
    new (array + size) T(value);                          // construct the new element past the end
    std::rotate(array, array + size, array + size + 1);   // rotate it into the first position
}
Of course, this doesn't avoid potentially unnecessary shuffling of elements when the array needs to be relocated. In that case it is more efficient to allocate new memory, move the initial objects to the start, construct the newly inserted element, and then move the trailing objects to the location right past the new object.
If you are using manually allocated memory, you have to reallocate, and you should hope this operation does not move the memory block to a new location. In that case the best approach is the rotate algorithm above.
By the way, prefer STL containers such as std::vector to manually allocated memory for this kind of task. If you are using vectors, you should have reserved memory up front.
You have marked this post as C++.
can I add a new index in the middle of the array (is there a function for it)
No. From cppreference.com, std::array:
std::array is a container that encapsulates fixed size arrays.
I interpret this to mean you can change the elements, but not the indexes.
(sigh) But I suspect C style arrays are still allowed.
And I notice Dietmar's answer also says no.

C++ doesn't tell you the size of a dynamic array. But why?

I know that there is no way in C++ to obtain the size of a dynamically created array, such as:
int* a;
a = new int[n];
What I would like to know is: Why? Did people just forget this in the specification of C++, or is there a technical reason for this?
Isn't the information stored somewhere? After all, the command
delete[] a;
seems to know how much memory it has to release, so it seems to me that delete[] has some way of knowing the size of a.
It's a follow-on from the fundamental rule of "don't pay for what you don't need". In your example delete[] a; doesn't need to know the size of the array, because int doesn't have a destructor. If you had written:
std::string* a;
a = new std::string[n];
...
delete [] a;
Then the delete has to call destructors (and needs to know how many to call) - in which case the new has to save that count. However, given it doesn't need to be saved on all occasions, Bjarne decided not to give access to it.
(In hindsight, I think this was a mistake ...)
Even with int of course, something has to know about the size of the allocated memory, but:
Many allocators round up the size to some convenient multiple (say 64 bytes) for alignment and convenience reasons. The allocator knows that a block is 64 bytes long - but it doesn't know whether that is because n was 1 ... or 16.
The C++ run-time library may not have access to the size of the allocated block. If for example, new and delete are using malloc and free under the hood, then the C++ library has no way to know the size of a block returned by malloc. (Usually of course, new and malloc are both part of the same library - but not always.)
One fundamental reason is that there is no difference between a pointer to the first element of a dynamically allocated array of T and a pointer to any other T.
Consider a fictitious function that returns the number of elements a pointer points to.
Let's call it "size".
Sounds really nice, right?
If it weren't for the fact that all pointers are created equal:
char* p = new char[10];
size_t ps = size(p+1); // What?
char a[10] = {0};
size_t as = size(a); // Hmm...
size_t bs = size(a + 1); // Wut?
char i = 0;
size_t is = size(&i); // OK?
You could argue that the first should be 9, the second 10, the third 9, and the last 1, but to accomplish this you need to add a "size tag" on every single object.
With such a size tag, a char would require 128 bits of storage (because of alignment) on a 64-bit machine. This is sixteen times more than what is necessary.
(Above, the ten-character array a would require at least 168 bytes.)
This may be convenient, but it's also unacceptably expensive.
You could of course envision a version that is only well-defined if the argument really is a pointer to the first element of a dynamic allocation by the default operator new, but this isn't nearly as useful as one might think.
You are right that some part of the system will have to know something about the size. But getting that information is probably not covered by the API of the memory management system (think malloc/free), and the exact size that you requested may not be known, because it may have been rounded up.
You will often find that memory managers will only allocate space in a certain multiple, 64 bytes for example.
So, you may ask for new int[4], i.e. 16 bytes, but the memory manager will allocate 64 bytes for your request. To free this memory it doesn't need to know how much memory you asked for, only that it has allocated you one block of 64 bytes.
The next question may be, can it not store the requested size? This is an added overhead which not everybody is prepared to pay for. An Arduino Uno for example only has 2k of RAM, and in that context 4 bytes for each allocation suddenly becomes significant.
If you need that functionality then you have std::vector (or equivalent), or you have higher-level languages. C and C++ were designed to let you work with as little overhead as you choose to make use of, this being one example.
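As a tiny illustration of the vector doing that bookkeeping for you: the element count you asked for and the capacity that was actually allocated are tracked separately, and both are queryable.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(4);                 // roughly what new int[4] would have given you
    std::cout << v.size() << '\n';         // 4 - the count you asked for
    std::cout << v.capacity() << '\n';     // >= 4 - whatever was actually allocated
}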
There is a curious case of overloading the operator delete that I found in the form of:
void operator delete[](void *p, size_t size);
The parameter size seems to default to the size (in bytes) of the block of memory to which void *p points. If this is true, it is reasonable to at least hope that it has a value passed by the invocation of operator new and, therefore, would merely need to be divided by sizeof(type) to deliver the number of elements stored in the array.
As for the "why" part of your question, Martin's rule of "don't pay for what you don't need" seems the most logical.
There's no way to know how you are going to use that array.
The allocation size does not necessarily match the element number so you cannot just use the allocation size (even if it was available).
This is a deep flaw in other languages not in C++.
You achieve the functionality you desire with std::vector yet still retain raw access to arrays. Retaining that raw access is critical for any code that actually has to do some work.
Many times you will perform operations on subsets of the array and when you have extra book-keeping built into the language you have to reallocate the sub-arrays and copy the data out to manipulate them with an API that expects a managed array.
Just consider the trite case of sorting the data elements.
If you have managed arrays then you can't use recursion without copying data to create new sub-arrays to pass recursively.
Another example is an FFT which recursively manipulates the data starting with 2x2 "butterflies" and works its way back to the whole array.
To fix the managed array you now need "something else" to patch over this defect, and that "something else" is called 'iterators'. (You now have managed arrays but almost never pass them to any functions, because you need iterators more than 90% of the time.)
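To make that concrete, here is a small sketch (not tied to any particular code base) of a recursive routine working on sub-ranges through a plain pointer and length; no sub-array is ever allocated or copied:

#include <cstddef>

// Recursive sum over halves of the array: each call just narrows the view.
double sum(const double* data, std::size_t n) {
    if (n == 0) return 0.0;
    if (n == 1) return data[0];
    std::size_t half = n / 2;
    return sum(data, half) + sum(data + half, n - half);
}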
The size of an array allocated with new[] is not visibly stored anywhere, so you can't access it. And the new[] operator doesn't return an array, just a pointer to the array's first element. If you want to know the size of a dynamic array, you must store it manually or use classes from libraries, such as std::vector.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems. Especially if you'll try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index] instead.
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
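For instance, a minimal sketch of the vector alternative, which both grows on demand and can bounds-check for you:

#include <vector>

int main() {
    std::vector<int> arr(4);   // instead of new int[4]
    arr.push_back(0);          // grows the container instead of writing past the end
    int nextPos = arr.at(4);   // checked access: throws std::out_of_range when out of bounds
    return nextPos;
}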
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of ram are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Yes, write it as arrayptr[index]; the position in memory of *(arrayptr + 4) is past the end of the space you allocated for the array. It's a limitation of C++ that an array's size can't be extended once allocated.

Efficient approaches for parsing objects from consecutive fixed size buffers that don't align with object size

I am trying to achieve something in C++, where I have an API that reads out objects from a byte array, while the array I pass in is constrained to a fixed size. After it parses out a complete object, the API knows the pointer location where it finishes reading (the beginning of next object to be read from but not complete in the current byte array).
Then I simply need to attach the remaining byte array with the next same fixed size array, and start reading a new object out at the pointer location as if it's the beginning of the new array.
I am new to C++, and I have the following approach working, but it looks rather cumbersome and inefficient. It requires three vectors and lots of cleanup, reserve and insert calls. I wonder if there is any alternative that may be more efficient, or at least as efficient but with much more concise code? I've been reading about things like stringstream and such, but they don't seem to require any less memory copying (probably more, as my API requires a byte array to be passed in). Thanks!
std::vector<char> checkBuffer;
std::vector<char> remainingBuffer;
std::vector<char> readBuffer(READ_BUFFER_SIZE);
size_t pointerPosition = 0;

// loop while I still have stuff to read from the input stream
while (in.good()) {
    in.read(readBuffer.data(), READ_BUFFER_SIZE);

    // This is the holding buffer for the API to parse objects from.
    checkBuffer.clear();

    // Concatenate what's remaining in remainingBuffer (initially empty)
    // with what's newly read from the input inside readBuffer.
    checkBuffer.reserve(remainingBuffer.size() + readBuffer.size());
    checkBuffer.insert(checkBuffer.end(), remainingBuffer.begin(),
                       remainingBuffer.end());
    checkBuffer.insert(checkBuffer.end(), readBuffer.begin(),
                       readBuffer.end());

    // Call the API here; it also gives back pointerPosition, telling me
    // where I am inside the buffer when it finishes reading the object.
    Object parsedObject = parse(checkBuffer, &pointerPosition);

    // Then calculate the number of bytes not yet consumed in checkBuffer,
    // and copy whatever is remaining in checkBuffer into remainingBuffer
    // so it can be used in the next iteration.
    size_t remainingBufSize = checkBuffer.size() - pointerPosition;
    remainingBuffer.clear();
    remainingBuffer.reserve(remainingBufSize);
    remainingBuffer.insert(remainingBuffer.end(),
                           checkBuffer.data() + pointerPosition,
                           checkBuffer.data() + checkBuffer.size());
}
Write append_chunk_into(in, vect): it appends one chunk of data at the end of vect, resizing as needed. As an aside, a char-sized, does-not-zero-memory, standard-layout struct might be a better choice than char.
To append to end:
size_t old_size=vect.size();
vect.resize(vect.size()+new_bytes);
in.read(vect.data()+old_size, new_bytes);
or whatever the read api is.
To parse, feed it vect.data(). Get back the pointer ptr to where parsing ended.
Then vect.erase(vect.begin(), vect.begin() + (ptr - vect.data())) to remove the parsed bytes. (Only do this after you have parsed everything you can from the buffer, to avoid wasted memory moves.)
One vector. It will reuse its memory, and never grow larger than the read size plus the size of the largest object minus 1. So you can pre-reserve it.
But really, most of the time spent will usually be I/O, so focus optimization on keeping the data flowing smoothly.
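Putting those pieces together, a sketch of the single-buffer loop. Object, try_parse(), and MAX_OBJECT_SIZE are hypothetical stand-ins for the asker's API: try_parse() is assumed to return false when the buffer does not yet hold a complete object, and otherwise to fill out and consumed.

std::vector<char> buffer;
buffer.reserve(READ_BUFFER_SIZE + MAX_OBJECT_SIZE);   // assumed upper bound on carry-over

while (in.good()) {
    // append one chunk at the end of the buffer: resize, then read into the new tail
    std::size_t old_size = buffer.size();
    buffer.resize(old_size + READ_BUFFER_SIZE);
    in.read(buffer.data() + old_size, READ_BUFFER_SIZE);
    buffer.resize(old_size + static_cast<std::size_t>(in.gcount()));

    // parse every complete object currently in the buffer
    std::size_t pos = 0;
    Object obj;
    std::size_t consumed = 0;
    while (try_parse(buffer.data() + pos, buffer.size() - pos, &obj, &consumed)) {
        pos += consumed;
        // ... use obj ...
    }

    // erase the consumed bytes once; the partial tail stays for the next chunk
    buffer.erase(buffer.begin(), buffer.begin() + pos);
}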
If I were in your position I would keep only the readBuffer, and reserve READ_BUFFER_SIZE + sizeof(LargestMessage).
After parsing you get back a pointer to the last thing the API was able to read in the vector. I would then convert the end iterator to a pointer (&*readBuffer.end()) and use it to bound the data that has to be copied back to the head of the vector. Once that data is at the head of the vector, you can read the rest in using the same read call, except you add in the number of bytes remaining. There does need to be some way of determining how many characters were in the remaining array, but that shouldn't be insurmountable.

Dynamic memory allocation, C++

I need to write a function that can read a file, and add all of the unique words to a dynamically allocated array. I know how to create a dynamically allocated array if, for instance, you are asking for the number of entries in the array:
int value;
cin >> value;
int *number;
number = new int[value];
My problem is that I don't know ahead of time how many unique words are going to be in the file, so I can't initially just read the value or ask for it. Also, I need to make this work with arrays, and not vectors. Is there a way to do something similar to a push_back using a dynamically allocated array?
Right now, the only thing I can come up with is first to create an array that stores ALL of the words in the file (1000), then have it pass through it and find the number of unique words. Then use that value to create a dynamically allocated array which I would then pass through again to store all the unique words. Obviously, that solution sounds pretty overboard for something that should have a more effective solution.
Can someone point me in the right direction, as to whether or not there is a better way? I feel like this would be rather easy to do with vectors, so I think it's kind of silly to require it to be an array (unless there's some important thing that I need to learn about dynamically allocated arrays in this homework assignment).
EDIT: Here's another question. I know there are going to be 1000 words in the file, but I don't know how many unique words there will be. Here's an idea: I could create a 1000-element array and write all of the unique words into that array while keeping track of how many I've added. Once I've finished, I could dynamically allocate a new array with that count, and then just copy the words from the initial array to the second. Not sure if that's the most efficient approach, but since we're not able to use vectors, I don't think efficiency is a huge concern in this assignment.
A vector really is a better fit for this than an array. Really.
But if you must use an array, you can at least make it behave like a vector :-).
Here's how: allocate the array with some capacity. Store the allocated capacity in a "capacity" variable. Each time you add to the array, increment a separate "length" variable. When you go to add something to the array and discover it's not big enough (length == capacity), allocate a second, longer array, then copy the original's contents to the new one, then finally deallocate the original.
This gives you the effect of being able to grow the array. If performance becomes a concern, grow it by more than one element at a time.
Congrats, after following these easy steps you have implemented a small subset of std::vector functionality atop an array!
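A bare-bones sketch of those steps as a push_back-style helper for the word array (the names are made up; error handling omitted):

#include <cstddef>
#include <string>

// Append word to a heap array, growing it whenever length == capacity.
void push_word(std::string*& data, std::size_t& length, std::size_t& capacity,
               const std::string& word) {
    if (length == capacity) {
        std::size_t new_capacity = (capacity == 0) ? 8 : capacity * 2;  // grow in chunks
        std::string* bigger = new std::string[new_capacity];
        for (std::size_t i = 0; i < length; ++i)
            bigger[i] = data[i];     // copy the original's contents
        delete[] data;               // deallocate the original
        data = bigger;
        capacity = new_capacity;
    }
    data[length++] = word;
}

Start with data = nullptr and length = capacity = 0, and call push_word(data, length, capacity, w) for each unique word.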
As you have rightly pointed out, this is trivial with a vector.
However, given that you are limited to using an array, you will likely need to do one of the following:
Initialize the array with a suitably large size and live with poor memory utilization
Write your own code to dynamically increase the size of the array at run time (basically the internals of a Vector)
If you were permitted to do so, some sort of hash map or linked list would also be a good solution.
If I had to use an array, I'd just allocate one with some initial size, then keep doubling that size when I fill it to accommodate any new values that won't fit in an array with the previous sizes.
Since this question regards C++, memory allocation would be done with the new keyword. But it would be nice if one could use the realloc() function, which resizes the memory and retains the values in the previously allocated block. That way one wouldn't need to copy the values from the old array to the new array. Although I'm not so sure realloc() would play well with memory allocated with new.
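For what it's worth, a minimal sketch of the realloc route; it is only valid when the block came from malloc in the first place, never from new:

#include <cstdlib>

// Grow a malloc'd array of C-string pointers; realloc keeps the existing contents.
char** grow(char** words, std::size_t new_capacity) {
    void* p = std::realloc(words, new_capacity * sizeof(char*));
    if (p == nullptr) {
        std::free(words);   // realloc failed; the old block is still ours to release
        return nullptr;
    }
    return static_cast<char**>(p);
}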
You can "resize" array like this (N is size of currentArray, T is type of its elements):
// create new array
T *newArray = new T[N * 2];
// Copy the data
for ( int i = 0; i < N; i++ )
newArray[i] = currentArray[i];
// Change the size to match
N *= 2;
// Destroy the old array
delete [] currentArray;
// set currentArray to newArray
currentArray = newArray;
Using this solution you have to copy the data. There might be a solution that does not require it.
But I think it would be more convenient for you to use std::vectors. You can just push_back into them and they will resize automatically for you.
You can cheat a bit:
use std::set to get all the unique words, then copy the set into a dynamically allocated array (or, preferably, a vector).
#include <algorithm>
#include <iterator>
#include <set>
#include <iostream>
#include <string>

int main() {
    // Copy into a set;
    // this will make sure they are all unique.
    std::set<std::string> data;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::inserter(data, data.end()));

    // Copy the data into your array (or vector).
    std::string* words = new std::string[data.size()];
    std::copy(data.begin(), data.end(), &words[0]);
}
This might be going a bit overboard, but you could implement a linked list in C++... it would give you a vector-like implementation without actually using vectors (which really are the best solution here).
The implementation is fairly easy: just a pointer to the next and previous nodes, and the "head" node stored in a place you can easily access. Then looping through the list lets you check which words are already in it and which are not. You could even add a counter and count the number of times a word is repeated throughout the text.
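A rough sketch of the node layout that idea describes, purely illustrative (doubly linked, with a per-word counter; list management and cleanup omitted):

#include <string>

// One node per unique word; count tracks how often the word appeared.
struct WordNode {
    std::string word;
    int count = 1;
    WordNode* prev = nullptr;
    WordNode* next = nullptr;
};

// Walk the list from head; return the node for w if the word is already stored.
WordNode* find(WordNode* head, const std::string& w) {
    for (WordNode* n = head; n != nullptr; n = n->next)
        if (n->word == w)
            return n;
    return nullptr;
}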