I know that I can remove something in the middle of an array such as
char* cArray = (char*) malloc(sizeof(char) * sizeTracker);
using the memmove function. In this way something will be removed from the array without having to use a temp array, switching to vectors, etc. The question here is that can I add a new index in the middle of the array (is there a function for it)? Or let's say that using realloc I add a new index at the end, then how can I move the values down efficiently?
Alternative Answer
I have been thinking about this and the comments where #DietmarKühl started talking about inserting blocks like a deque does. The problem with this is that a deque is a linked list of blocks so then you can't start with an array. if you start with an array and then want to insert something in the middle you have to do something else and I think I have an idea - it isn't fleshed out very much so it may not work but I will share it anyway. Please leave comments telling me what you think of the idea.
If you had an array of items and then want to add an item into the middle all you really want to do is add a block and update the mapping. The mapping is the thing that makes it all work - but it slows down access because you need to check the mapping before every access of the array.
The mapping would be a binary tree. It would start empty but the nodes would contain a value: if the index you want is < the value you traverse the left pointer and if it is >= you traverse the right pointer.
So, an example:
Before the insert:
root -> (array[100000], offset: 0)
After the insert at 5000:
root -> {value: 5000,
left: (array[100000], offset: 0),
right: {value: 5001,
left: (newarray[10], offset: -5000),
right: (array[100000], offset: 1),
}
}
I have used blocks of 10 here - newarray is 10 in size. If you just randomly insert indexes all over the place the block size should be 1 but if you insert groups of consecutive indexes having a blovk size larger than 1 would be good. It really depends on your usage pattern...
When you check index 7000 you check the root node: 7000 is >= 5000 so you follow the right pointer: 7000 is >= 5001 so you follow the right pointer: it points to the original array with an offset of 1 so you access array[index+offset].
When you check index 700 you check the root node: 700 is < 5000 so you follow the left pointer: it points to the original array with an offset of 0 so you access array[index+offset].
When you check index 5000 you check the root node: 5000 is >= 5000 so you follow the right pointer: 5000 is < 5001 so you follow the left pointer: it points to the new array with an offset of -5000 so you access newarray[index+offset].
Of course optimizations to this would be really important to make this useful - you would have to balance the tree after each insert because otherwise the right side would be much much longer than the left side.
The downside to this is that accesses to the array are now O(log inserts) instead of O(1) so if there are lots of inserts you will want to realloc every so often to compact the data structure back to an array but you could save that for an opportune time.
Like I said it isn't very fleshed out so it may not work in practice but I hope it is worth sharing anyway.
Original Answer
If you have a C style array and want to insert an index in the middle you would need to either have an array larger than you need (plus a variable like sizeTracker to keep track of the size).
Then if there was room left you could just memmove the last half of the array out one to create a spot in the middle.
If there wasn't any room left you could malloc another whole array that includes extra space and then memmove the first half and memmove the second half separately leaving a gap.
If you want to make the malloc amortized constant time you need to double the size of the array each time you reallocate it. The memmove becomes one machine instruction on x86 but even then it will still be O(n) because of moving every value.
But performance isn't any worse then your deleting trick - if you can delete everywhere throughout the array the cost is O(n) for that as well because you memmove half the values in average when you delete.
There is no custom C function which allows to increase an array using the C memory function and inserting an object into the middle. Essentially you'd build the functionality using malloc(), free(), memmove() (when enough space is available and elements are just moved back within the memory), or memcpy() (if you need to allocate new memory and you want to avoid first copying and then moving the tail).
In C++ where object locations tend to matter you'd obviously use std::copy(), std::reverse_copy() and/or std::move() (both forms thereof) as there may be relevant structors for the respect objects. Most likely you'd also obtain memory different, e.g., using operator new() and/or an allocator if you really travel in terms of raw memory.
The fun implementation of the actual insertion (assuming there is enough space for another element) is using std::rotate() to construct the last element and then shuffle elements:
void insert(T* array, std::size_t size, T const& value) {
// precodition: array points to at least size+1 elements
new(array + size) T(value);
std::rotate(array, array + size, array + size + 1);
}
Of course, this doesn't avoid potentially unnecessarily shuffling elements when the array needs to be relocated. In that case it more effective to allocate new memory and move the initial objects to the start, add the newly inserted element, move the trailing objects to the location right past the new object.
If you are using manually allocated memory you have to reallocate and you should hope this operation does not move the memory block to a new location. Then the best is to use the rotate algorithm.
By the way, prefer stl containers such as vectors to manually allocated memory for this kind of tasks. If you are using vectors you should have reserved memory.
You have marked this post as C++.
can I add a new index in the middle of the array (is there a function
for it)
No. From cppreference.com, std::array:
std::array is a container that encapsulates fixed size arrays.
I interpret this to mean you can change the elements, but not the indexes.
(sigh) But I suspect C style arrays are still allowed.
And I notice Dietmar's answer also says no.
Related
This is in continuation of my last question. I am failed to understand the memory taken up by vector. Problem skeleton:
Consider an vector which is an collection of lists and lists is an collection of pointers. Exactly like:
std::vector<std::list<ABC*> > vec;
where ABC is my class. We work on 64bit machines, so size of pointer is 8 bytes.
At the start of my flow in the project, I resize this vector to an number so that I can store lists at respective indexes.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686. Right. After resizing, I am inserting the lists at corresponding indexes as:
// Some where down in the program, make these lists. Simple push for now.
std::list<ABC*> l1;
l1.push_back(<pointer_to_class_ABC>);
l1.push_back(<pointer_to_class_ABC>);
// Copy the list at location
setInfo(613284686, l1);
void setInfo(uint64_t index, std::list<ABC*> list>) {
std::copy(list.begin(), list.end(), std::back_inserter(vec.at(index));
}
Alright. So inserting is done. Notable things are:
Size of vector is : 613284686
Entries in the vector is : 3638243731 // Calculated this by going over vector indexes and add the size of std::lists at each index.
Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes) = ~30Gb.
BUT BUT When I have this data in memory, memory peaks to, 400G.
And then I clear up this vector with:
std::vector<std::list<nl_net> >& ccInfo = getVec(); // getVec defined somewhere and return me original vec.
std::vector<std::list<nl_net> >::iterator it = ccInfo.begin();
for(; it != ccInfo.end(); ++it) {
(*it).clear();
}
ccInfo.clear(); // Since it is an reference
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Well, after clearing up this vector, memory drops down to 100G. That is too much holding from an vector.
Would you all like to correct me what I am failing to understand here?
P.S. I can not reproduce it on smaller cases and it is coming in my project.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686
It would be at least 613284686. It could be more.
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Technically, there is no guarantee by the standard that a default constructed vector wouldn't have capacity other than 0... But in practice, this is probably true.
Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes)
But the vector doesn't contain pointers. It contains std::list<ABC*> objects. So, you should expect vec.capacity() * sizeof(std::list<ABC*>) bytes used by the buffer of the vector itself. Each list has at least a pointer to beginning and the end.
Furthermore, you should expect each element in each of the lists to use memory as well. Since the list is doubly linked, you should expect about two pointers plus the data (a third pointer) worth of memory for each element.
Also, each pointer in the lists apparently points to an ABC object, and each of those use sizeof(ABC) memory as well.
Furthermore, since each element of the linked lists are allocated separately, and each dynamic allocation requires book-keeping so that they can be individually de-allocated, and each allocation must be aligned to the maximum native alignment, and the free store may have fragmented during the execution, there will be much overhead associated with each dynamic allocation.
Well, after clearing up this vector, memory drops down to 100G.
It is quite typical for the language implementation to retain (some) memory it has allocated from the OS. If your target system documents an implementation specific function for explicitly requesting release of such memory, then you could attempt using that.
However, if the vector buffer wasn't the latest dynamic allocation, then its deallocation may have left a massive reusable area in the free store, but if there exists later allocations, then all that memory might not be releasable back to the OS.
Even if the langauge implementation has released the memory to the OS, it is quite typical for the OS to keep the memory mapped for the process until another process actually needs the memory for something else. So, depending on how you're measuring memory use, the results might not necessarily be meaningful.
General rules of thumb that may be useful:
Don't use a vector unless you use all (or most) of the indices. In case where you don't, consider a sparse array instead (there is no standard container for such data structure though).
When using vector, reserve before resize if you know the upper bound of allocation.
Don't use linked lists without a good reason.
Don't rely on getting all memory back from peak usage (back to the OS that is; The memory is still usable for further dynamic allocations).
Don't stress about virtual memory usage.
std::list is a fragmented memory container. Typically each node MUST have the data it is storing, plus the 2 prev/next pointers, and then you have to add in the space required within the OS allocation table (typically 16 or 32 bytes per allocation - depending on OS). You then have to account for the fact all allocations must be returned on a 16byte boundary (on Intel/AMD based 64bit machines anyway).
So using the example of std::list<ABC*> the size of a pointer is 8, however you will need at least 48bytes to store each element (at least).
So memory usage for ONLY the list entries is going to be around: 3638243731 * 48(bytes) = ~162Gb.
This is of course assuming that there is no memory fragmentation (where there may be a block of 62bytes free, and the OS returns the entire block of 62 rather than the 48 requested). We are also assuming here that the OS has a minimum allocation size of 48 bytes (and not say, 64bytes, which would not be overly silly, but would push the usage up far higher).
The size of the std::lists themselves within the vector comes to around 18GB. So in total we are looking at 180Gb at least to store that vector. It would not be beyond the realm of possibility that the extra allocations are additional OS book keeping info, for all of those individual memory allocations (e.g. lists of loaded memory pages, lists of swapped out memory pages, the read/write/mmap permissions, etc, etc).
As a final note, instead of using swap on a newly constructed vector, you can just use shrink to fit.
ccInfo.clear();
ccInfo.shrinkToFit();
The main vector needs some more consideration. I get the impression it will always be a fixed size. So why not use a std::array instead? A std::vector always allocates more memory than it needs to allow for growth. The bigger your vector the bigger the reservation of memory to allow for more even growth. The reasononing behind is to keep relocations in memory to a minimum. Relocations on really big vectors take up huge amounts of time so a lot of extra memory is reserved to prevent this.
No vector function that can delete elements (such as vector::clear and ::erase) also deallocates memory (e.g. lower the capacity). The size will decrease but the capacity doesn't. Again, this is meant to prevent relocations; if you delete you are also very likely to add again. ::shrink_to_fit also doesn't guarantuee you that all of the used memory is released.*
Next is the choice of a list to store elements. Is a list really applicable? Lists are strong in random access/insertion/removal operations. Are you really constantly adding and removing ABC objects to the list in random locations? Or is another container type with different properties but with contiguous memory more suitable? Another std::vector or std::array perhaps. If the answer is yes than you're pretty much stuck with a list and its scattered memory allocations. If no, than you could win back a lot of memory by using a different container type.
So, what is it you really want to do? Do you really need dynamic growth on both the main container and its elements? Do you really need random manipulation? Or can you use fixed-size arrays for both container and ABC objects and use iteration instead? When contemplating this you might want to read up on the available containers and their properties on en.cppreference.com. It will help you decide what is most appropriate.
*For the fun of it I dug around in VS2017's implementation and it creates an entirely new vector without the growth segment, copies the old elements and then reassigns the internal pointers of the old vector to the new one while deleting the old memory. So at least with that compiler you can count on memory being released.
Remove first N elements from a std::vector
This question talks about removing items from a vector and 'compacting memory'. What is 'compacting memory' and why is it important here?
Inside the implementation of the std::vector class is some code that dynamically allocates an array of data-elements. Often not all of the elements in this internal array will be in use -- the array is often allocated to be bigger than what is currently needed, in order to avoid having to reallocate a bigger array too often (array-reallocations are expensive!).
Similarly, when items are removed from the std::vector, the internal data-array is not immediately reallocated to be smaller (because doing so would be expensive); rather, the now-extra slots in the array are left "empty" in the expectation that the calling code might want to re-use them in the near future.
However, those empty slots still take up RAM, so if the calling code has just removed a lot of items from the vector, it might want to force the vector to reallocate a smaller internal array that doesn't contain so many empty slots. That is what they are referring to as compacting in that question.
The OP is talking about shrinking the memory the vector takes up. When you erase elements from a vector its size decreases but it capacity (the memory it is using) remains the same. When the OP says
(that also compacts memory)
They want the removal of the elements to also shrink the capacity of the vector so it reduces its memory consumption.
It means that the vector shouldn't use more memory than it needs to. In other words, the OP wants:
size() == capacity()
This can be achieved in C++11 and later by calling shrink_to_fit() on a vector. This is only a request though, it is not binding.
To make sure it will compact memory you should create a new vector and call reserve(oldvector.size()).
Imagine the following requirements:
measurement data should be logged and the user should be able to iterate through the data.
uint32_t timestamp;
uint16_t place;
struct SomeData someData;
have a timestamp (uint32_t), a place (uint16_t) and some data in a struct
have a constant number of datasets. If a new one arrives, the oldest is thrown away.
the number of "place" is dynamic, the user can insert new ones during runtime
it should be possible to iterate through the data to the next newer or older dataset but only if the place is the same
need to insert at the end only
memory should be allocated once at program start
insertion need not to be fast but should not block other threads for a long time which might be iterating through the container
memory requirement should be low
EDIT: - The container should all the memory which is not used otherwise, therefore it can be large.
I am not sure which container I should use. It is an embedded system and should not use boost etc.
I see the following possibilities:
std::vector - drawbacks: The insertion at the end requires that all objects are copied and during this time another thread cannot access the vector. Edit: This can be avoided by implementing it as a circular buffer - see comments below. When iterating throught the vector, I have to test the place ID. Maybe it might also be a problem to allocate much memory as one block - because the memory could be segmented?
std::deque - compared to std::vector insertion (and pop_back) is faster but memory requirement? Iterators do not become invalid if the insertion is at the end. But I still have to iterate and test the second ID ("place"). I think it does not need to allocate all the memory in one big block as it is the case with vector or array. If an element is added in front and another one is removed at the end (or removed first and added after), I guess there does no memory allocation take place?
std::queue - instead of deque, I should rather use a queue? Is it true that in many implementations a queue ist implented just as a deque?
std::map - Like deque any iterators to existing elements will not become invalid. If I make the key a combination of place and timestamp, then iteration through the map is maybe faster because it is already sorted? Memory requirements of a map?
std::multimap - as the number of places is not constant I cannot make a multimap with "place" as the index.
std::list - has no advantage over deque here?
Some suggested the use of a circular buffer. If I do not want that the memory is allocated as one big block I still have to use a container and most questions above stay valid.
Update:
I will use a ring buffer as suggested here but using a deque as the underlying container. In order to being able to scroll fast through the datasets with the preselected "place" I will eventually introduce two additional indices into the data struct which will point to the previous and the next index with the same place.
How much memory will be used? In my special case the size of the struct is 56 bytes. The gnu lib uses 512 bytes as minimum block size, the IAR compiler 16 bytes. Hence the used block size will be 512 or 56 bytes respectively. Besides two iterators (using 4 pointers each) and the size there will be a pointer stored for each block. Therefore in the implementation of the iar compiler (block size 56 bytes) there will be 7 % overhead (on a 32 bit system) compared to the use of a std::vector or array. In the gcc implementation there will fit 9 objects in the block (504 bytes) while 512 + 4 bytes are needed per block which is 2 % more.
The block size is not large but the continuous memory size needed for the pointer array is already relatively large, especially for the implementation where one block is one struct.
A std::list would need 2 pointers per struct which is 14 % overhead in my case on 32 bit systems.
std::vector
... the memory could be segmented?
No, std::vector allocates contiguous memory, as is documented in that link. Arrays are also contiguous, but you might just as well use vector for this.
std::deque is segmented, which you said you didn't want. Or do you want to avoid a single large allocated block? It's not clear.
Anyway, it has no benefit over vector if you really want a circular buffer (because you'll never be adding/removing elements from the front/back anyway), and you can't control the block size.
std::queue
... Is it true that in many implementations a queue is implented just as a deque?
Yes, that's the default in all implementations. See the linked documentation or any decent book.
It doesn't sound like you want a FIFO queue, so I don't know why you're considering this one - the interface doesn't match your stated requirement.
'std::map`
... iteration through the map is maybe faster because it is already sorted?
On most modern server/desktop architectures, map will be slower because advancing an iterator involves a pointer chase (which impairs pipelining) and a likely cache miss. Your anonymous embedded architecture may be less sensitive to these effects, so map may be faster for you.
... Memory requirements of a map?
Higher. You have the node size (at least a couple of pointers) added to each element.
Say you have an array of size 13. Imagine there is no contiguous available memory section for storing an array of size 14; copying the original array into a new one is not an option. So the only way of extending the array by one element while keeping the first 13 elements the same is to store the 14th element at the very next memory address after the 13th element. How would I go about doing this, if this memory slot was available? Dynamic memory addressing?
There is no standard C++ methods. standard library usually have defined realloc(), but on most platforms it all it does that is call of another malloc() and memcpy() to copy memory. You may want to use standard library containers which hide that mechanism - that's most common - or use memory pool object (allocate all possibly required memory, then "allocate" objects within it), that's less common, usually applied for FEMA or image-processing or in OpenGL render engines.
I would suggest take array's length and insert data in length + 1 location.
eg :
int arraylength = abc.length;
abc[arraylength + 1].value = "Your Value";
I need to write a function that can read a file, and add all of the unique words to a dynamically allocated array. I know how to create a dynamically allocated array if, for instance, you are asking for the number of entries in the array:
int value;
cin >> value;
int *number;
number = new int[value];
My problem is that I don't know ahead of time how many unique words are going to be in the file, so I can't initially just read the value or ask for it. Also, I need to make this work with arrays, and not vectors. Is there a way to do something similar to a push_back using a dynamically allocated array?
Right now, the only thing I can come up with is first to create an array that stores ALL of the words in the file (1000), then have it pass through it and find the number of unique words. Then use that value to create a dynamically allocated array which I would then pass through again to store all the unique words. Obviously, that solution sounds pretty overboard for something that should have a more effective solution.
Can someone point me in the right direction, as to whether or not there is a better way? I feel like this would be rather easy to do with vectors, so I think it's kind of silly to require it to be an array (unless there's some important thing that I need to learn about dynamically allocated arrays in this homework assignment).
EDIT: Here's another question. I know there are going to be 1000 words in the file, but I don't know how many unique words there will be. Here's an idea. I could create a 1000 element array, write all of the unique words into that array while keeping track of how many I've done. Once I've finished, I could provision a dynamically allocate a new array with that count, and then just copy the words from the initial array to the second. Not sure if that's the most efficient, but with us not being able to use vectors, I don't think efficiency is a huge concern in this assignment.
A vector really is a better fit for this than an array. Really.
But if you must use an array, you can at least make it behave like a vector :-).
Here's how: allocate the array with some capacity. Store the allocated capacity in a "capacity" variable. Each time you add to the array, increment a separate "length" variable. When you go to add something to the array and discover it's not big enough (length == capacity), allocate a second, longer array, then copy the original's contents to the new one, then finally deallocate the original.
This gives you the effect of being able to grow the array. If performance becomes a concern, grow it by more than one element at a time.
Congrats, after following these easy steps you have implemented a small subset of std::vector functionality atop an array!
As you have rightly pointed out this is trivial with a Vector.
However, given that you are limited to using an array, you will likely need to do one of the following:
Initialize the array with a suitably large size and live with poor memory utilization
Write your own code to dynamically increase the size of the array at run time (basically the internals of a Vector)
If you were permitted to do so, some sort of hash map or linked list would also be a good solution.
If I had to use an array, I'd just allocate one with some initial size, then keep doubling that size when I fill it to accommodate any new values that won't fit in an array with the previous sizes.
Since this question regards C++, memory allocation would be done with the new keyword. But what would be nice is if one could use the realloc() function, which resizes the memory and retains the values in the previously allocated memory. That way one wouldn't need to copy the new values from the old array to the new array. Although I'm not so sure realloc() would play well with memory allocated with new.
You can "resize" array like this (N is size of currentArray, T is type of its elements):
// create new array
T *newArray = new T[N * 2];
// Copy the data
for ( int i = 0; i < N; i++ )
newArray[i] = currentArray[i];
// Change the size to match
N *= 2;
// Destroy the old array
delete [] currentArray;
// set currentArray to newArray
currentArray = newArray;
Using this solution you have to copy the data. There might be a solution that does not require it.
But I think it would be more convenient for you to use std::vectors. You can just push_back into them and they will resize automatically for you.
You can cheat a bit:
use std::set to get all the unique words then copy the set into a dynamically allocated array (or preferably vector).
#include <iterator>
#include <set>
#include <iostream>
#include <string>
// Copy into a set
// this will make sure they are all unique
std::set<std::string> data;
std::copy(std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::inserter(data, data.end()));
// Copy the data into your array (or vector).
std::string* words = new std::string[data.size()];
std::copy(data.begin(), data.end(), &words[0]);
This could be going a bit overboard, but you could implement a linked list in C++... it would actually allow you to use a vector-like implementation without actually using vectors (which are actually the best solution).
The implementation is fairly easy: just a pointer to the next and previous nodes and storing the "head" node in a place you can easily access to. Then just looping through the list would let you check which words are already in, and which are not. You could even implement a counter, and count the number of times a word is repeated throughout the text.