For example, if I have a full array of pointers with 5 elements and I want to insert another element at the second position, I have to allocate another array (of size+1), copy the first element from the old array, insert the new element, and then copy the remaining elements. This application can't waste any space.
This is the code so far:
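Sometype **newArray = new Sometype*[++Count];   // Count now includes the new element
size_t s = sizeof(*Array);                      // size of one element (a Sometype*)
memcpy(newArray, Array, s*position);            // copy the elements before the insertion point
newArray[position] = new Sometype();
memcpy(newArray+position+1, Array+position, s*(Count-position-1));  // copy the rest, shifted by one
delete [] Array;
Array = newArray;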
Is there a more efficient way to do this? It is a bottleneck for my application. I'm new to C++, so I don't know any advanced techniques.
Could vector be used for this purpose?
I think I read somewhere that it doubles its previously used space when it resizes.
Is this true, or can this behavior be modified?
If you can't waste any space and you have to stick to sequential containers, then I'm afraid this is the most efficient way. But I still don't believe that you can't waste any space. If you can anticipate that you will later need to add 5 more elements, then sizing your array for them from the beginning will prove much more effective. In any case, you should use vector to avoid this awful C-style code and to make your intent clearer. You might want to take a look at the std::vector<T>::reserve() function. Whether vector doubles its capacity when it grows is unspecified and varies across implementations.
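As a rough illustration of the vector-based version, here is a minimal sketch. The names Sometype and insert_at are placeholders, and std::unique_ptr is swapped in for the raw owning pointers of the question. It assumes that reserve() allocates exactly the requested capacity, which common implementations do but the standard only guarantees as a lower bound.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Sometype { int value = 0; };  // placeholder for the question's element type

// Insert a freshly created element at `position`.
// When the vector is full, reserve exactly one more slot first so that the
// implementation's own growth policy (often 1.5x or 2x) is never triggered.
// Assumption: reserve(n) yields capacity n on this implementation.
void insert_at(std::vector<std::unique_ptr<Sometype>>& array, std::size_t position) {
    assert(position <= array.size());
    if (array.size() == array.capacity()) {
        array.reserve(array.size() + 1);
    }
    array.insert(array.begin() + static_cast<std::ptrdiff_t>(position),
                 std::make_unique<Sometype>());
}
```

Growing by one slot at a time keeps memory tight but makes every insert reallocate, just like the hand-written version; reserving the final size once up front avoids that if the size can be anticipated.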
Have a look at the standard containers std::vector, std::list, std::unordered_set and std::unordered_map
I might be missing something very basic here but here is what I was wondering -
We know removing an element from the beginning of an std::vector ( vector[0] ) in C++ is an O(n) operation because all the other elements have to be shifted one place backwards.
But why isn't it implemented such that the pointer to the first element is moved one position ahead so that now the vector starts from the second element and, in essence, the first element is removed? This would be an O(1) operation.
std::array and C-style arrays are fixed-length, and you can't change their length at all, so I think you made a typo there and meant std::vector instead.
"Why was it done that way?" is a bit of a historical question. Looking at it from today's perspective, if your system library allowed giving unused memory back to the operating system, your pointer-shifting trick would prevent any later reuse of the memory of the former first elements.
Also, std::vector comes from systems (as still used in basically every operating system) with calls like free and malloc, where you need to keep the pointer to the beginning of an allocated region around to be able to free it later on. Hence, you'd have to lug around another pointer in the std::vector structure to be able to free the vector, and that would only be useful if someone deleted from the front. And if you're deleting from the front, chances are you might be better off using a reversed vector (and deleting from the end), or a vector isn't the right data structure altogether.
It is not impossible for a vector to be implemented like that (it wouldn't be std::vector though). You need to keep the pointer to first element in addition to a pointer to the underlying array (alternatively some offset can be stored, but no matter how you put it you need to store more data in the vector).
Consider that this is useful only for one quite specific use-case: Erasing the first element. Well, once you got that you can also benefit while inserting an element in the front when there is free space left. If there is free space left then even inserting in the first half could benefit by shifting only the first half.
However, all this does not fit with the concept of capacity. With std::vector you know exactly how many elements you can add before a reallocation occurs: capacity() - size(). With your proposal this wouldn't hold any more. Erasing the first element would affect capacity in an odd way. It would complicate the interface and usage of vectors for all use cases.
Further, erasing elements anywhere else would still not be O(1). In total it would incur a cost and add complexity for any use of the vector, while bringing an advantage only in a very narrow use case.
If you do find yourself in the situation that you need to erase the front element very often, then you can either store the vector in reverse, and erasing the last element is already O(1), or use a different container.
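A minimal sketch of the "store the vector in reverse" idea follows. FrontPopper is a hypothetical wrapper name, not a standard class; it assumes the sequence is built once and afterwards only consumed from the front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper: keeps the elements back to front, so the logical
// front of the sequence is the vector's back and can be removed in O(1).
template <typename T>
class FrontPopper {
public:
    explicit FrontPopper(std::vector<T> items) : data_(std::move(items)) {
        std::reverse(data_.begin(), data_.end());  // one-time O(n) cost
    }

    bool empty() const { return data_.empty(); }
    std::size_t size() const { return data_.size(); }

    const T& front() const {
        assert(!data_.empty());
        return data_.back();
    }

    void pop_front() {  // O(1): nothing is shifted
        assert(!data_.empty());
        data_.pop_back();
    }

    // Logical index i maps to the mirrored physical index.
    const T& operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[data_.size() - 1 - i];
    }

private:
    std::vector<T> data_;
};
```

The trade-off simply moves: appending to the logical back would now be the O(n) operation, which is why std::deque is the usual answer when both ends are used.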
I need to append a large number of elements to a stxxl vector. What is the most efficient way of adding elements to a stxxl vector? Right now, I'm using push_back of the stxxl vector, but it doesn't seem very efficient. It's far from saturating the disk bandwidth. Is there a better way?
Thanks,
Da
Most of the things written about "Efficient Sequential Reading and Writing to Vectors" apply in your case.
Besides vector_bufwriter, which fills a vector using an imperative loop, there is also a variant of stxxl::stream::materialize() which does it in a functional programming style.
About previously knowing the vector's size: this is not really necessary for EM, since one can allocate blocks on the fly. These will then generally not be in order, but so be it, there is no guarantee on that anyway.
I see someone (me) made vector_bufwriter automatically double the vector's size if the filling reaches the vector's end. At the moment, I don't think this is necessary, maybe one should change this behaviour.
According to the documentation:
If one needs only to sequentially write elements to the vector in n/B I/Os the currently fastest method is stxxl::generate.
This does not really answer why push_back should be I/O-inefficient, though.
One approach:
First reserve the number of elements you need. Resizing a vector with some types can be very time consuming. Appending many elements can result in several resizes as the vector grows.
Once the space is reserved, append using emplace_back (or simply push_back if the type is trivial, e.g. int).
Also review the member functions. An implementation which suits your needs well may already exist.
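The reserve-then-append approach can be sketched as follows. This uses std::vector for illustration; whether stxxl::vector behaves the same way is not assumed here. Record and build_records are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-trivial element type.
struct Record {
    int id;
    std::string name;
    Record(int i, std::string n) : id(i), name(std::move(n)) {}
};

std::vector<Record> build_records(std::size_t count) {
    std::vector<Record> records;
    records.reserve(count);  // single allocation, no regrowth while appending
    for (std::size_t i = 0; i < count; ++i) {
        // emplace_back constructs the Record directly in the vector's storage.
        records.emplace_back(static_cast<int>(i), "item" + std::to_string(i));
    }
    assert(records.size() == count);
    return records;
}
```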
Let's say I have a vector v with 10 elements.
If I erase the first element (at index 0) using v.erase(v.begin()), how does the STL vector handle this?
Does it create a new vector, copy the elements from the old vector to the new one, and deallocate the old one? Or does it copy each element, starting from index 1, to index-1?
If I need a vector of size 100,000 at one point and later don't use that much space, let's say I only need a vector of size 10, does it automatically reduce the size? (I don't think so.)
I looked online and found only API references and tutorials on how to use the STL.
Are there any good references that would give me an idea of the implementation or complexity of the STL?
Actually, the implementation of vector is visible, since it's a template, so you can look into that for details:
iterator erase(const_iterator _Where)
    {   // erase element at where
    if (_Where._Mycont != this
        || _Where._Myptr < _Myfirst || _Mylast <= _Where._Myptr)
        _DEBUG_ERROR("vector erase iterator outside range");
    _STDEXT unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
    _Destroy(_Mylast - 1, _Mylast);
    _Orphan_range(_Where._Myptr, _Mylast);
    --_Mylast;
    return (iterator(_Where._Myptr, this));
    }
Basically, the line
unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
does exactly what you thought - copies the following elements over (or moves them in C++11 as bames53 pointed out).
To answer your second question, no, the capacity cannot decrease on its own.
The complexities of the algorithms in std can be found at http://www.cplusplus.com/reference/stl/ and the implementation, as previously stated, is visible.
Does it copy each element starting from index 1 and copy the element to index-1 ?
Yes (though it actually moves them since C++11).
does it automatically reduce the size?
No. Reducing the capacity means moving the elements to a smaller buffer, which would invalidate iterators to existing elements, and that's only allowed on certain function calls.
I looked online and there are only APIs and tutorials how to use STL library. Is there any good references that I can have an idea of the implementation or complexity of STL library?
You can read the C++ specification which will tell you exactly what's allowed and what isn't in terms of implementation. You can also go look at your actual implementation.
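One way to check the two claims above on your own implementation is a small probe like the sketch below. EraseReport and erase_front are hypothetical names. That the buffer address and capacity stay the same after erase is what common implementations do; it is treated here as an observation, not a quotation from the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What we observe around a single erase-from-front.
struct EraseReport {
    const int* buffer_before;
    const int* buffer_after;
    std::size_t capacity_before;
    std::size_t capacity_after;
};

EraseReport erase_front(std::vector<int>& v) {
    assert(!v.empty());
    EraseReport report;
    report.buffer_before = v.data();
    report.capacity_before = v.capacity();
    v.erase(v.begin());  // elements 1..n-1 are moved down by one slot
    report.buffer_after = v.data();
    report.capacity_after = v.capacity();
    return report;
}
```

If the buffer pointer and the capacity are unchanged, no new vector was allocated: the remaining elements were shifted within the same storage.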
Vector will copy (move in C++11) the elements to the beginning, that's why you should use deque if you would like to insert and erase from the beginning of a collection. If you want to truly resize the vector's internal buffer you can do this:
vector<Type>(v).swap(v);
This will hopefully make a temporary vector with the correct size, then swap its internal buffer with the old one; when the temporary goes out of scope, the large buffer gets deallocated with it.
As others noted, you may use vector::shrink_to_fit() in C++11.
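Both ways of shrinking can be put side by side in a short sketch. The function names shrink_by_swap and shrink_by_request are made up for illustration. Note that shrink_to_fit() is a non-binding request, and that the copy made by the swap idiom is only typically, not necessarily, sized exactly to size().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-C++11 idiom: copy into a temporary (usually sized to fit) and swap buffers.
// The oversized buffer is released when the temporary is destroyed.
template <typename T>
void shrink_by_swap(std::vector<T>& v) {
    std::vector<T>(v).swap(v);
    assert(v.capacity() >= v.size());
}

// C++11 and later: ask the vector to drop unused capacity.
// The implementation is allowed to ignore the request.
template <typename T>
void shrink_by_request(std::vector<T>& v) {
    v.shrink_to_fit();
    assert(v.capacity() >= v.size());
}
```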
That's one of my (many) objections to C++. Everybody says "use the standard libraries" ... but even when you have the STL source (which is freely available from many different places, including, in this case, the header file itself!) ... it's basically an incomprehensible nightmare to dig into and try to understand.
The (C-only) Linux kernel is a paragon of simplicity and clarity in contrast.
But we digress :)
Here's the 10,000-foot answer to your question:
http://www.cplusplus.com/reference/stl/vector/
Vector containers are implemented as dynamic arrays; Just as regular
arrays, vector containers have their elements stored in contiguous
storage locations, which means that their elements can be accessed not
only using iterators but also using offsets on regular pointers to
elements.
But unlike regular arrays, storage in vectors is handled
automatically, allowing it to be expanded and contracted as needed.
Vectors are good at:
Accessing individual elements by their position index (constant time).
Iterating over the elements in any order (linear time).
Add and remove elements from its end (constant amortized time).
Compared to arrays, they provide almost the same performance for these
tasks, plus they have the ability to be easily resized. Although, they
usually consume more memory than arrays when their capacity is handled
automatically (this is in order to accommodate extra storage space for
future growth).
Compared to the other base standard sequence containers (deques and
lists), vectors are generally the most efficient in time for accessing
elements and to add or remove elements from the end of the sequence.
For operations that involve inserting or removing elements at
positions other than the end, they perform worse than deques and
lists, and have less consistent iterators and references than lists.
...
Reallocations may be a costly operation in terms of performance, since
they generally involve the entire storage space used by the vector to
be copied to a new location. You can use member function
vector::reserve to indicate beforehand a capacity for the vector. This
can help optimize storage space and reduce the number of reallocations
when many enlargements are planned.
...
I only need a vector of size 10 then does it automatically reduce the size?
No it doesn't automatically shrink.
Traditionally you swap the vector with a new empty one: reduce the capacity of an stl vector
But C++11 includes std::vector::shrink_to_fit(), which requests this directly (the request is non-binding).
I have a class that looks like this:
typedef std::list<char*> PtrList;
class Foo
{
public:
void DoStuff();
private:
PtrList m_list;
PtrList::iterator m_it;
};
The function DoStuff() basically adds elements to m_list or erases elements from it, finds an iterator to some special element in it and stores it in m_it. It is important to note that each value of m_it is used in every following call of DoStuff().
So what's the problem?
Everything works, except that profiling shows that the operator new is invoked too much due to list::push_back() called from DoStuff().
To increase performance I want to preallocate memory for m_list in the initialization of Foo as I would do if it were an std::vector. The problem is that this would introduce new problems such as:
Less efficient insert and erase of elements.
m_it becomes invalid as soon as the vector is changed from one call to DoStuff() to the next. EDIT: Alan Stokes suggested to use an index instead of an iterator, solving this issue.
My solution: the simplest solution I could think of is to implement a pool of objects that also has a linked-list functionality. This way I get a linked list and can preallocate memory for it.
Am I missing something or is it really the simplest solution? I'd rather not "re-invent the wheel", and use a standard solution instead, if it exists.
Any thoughts, workarounds or enlightening comments would be appreciated!
I think you are using the wrong container.
If you want fast push_back, don't automatically assume that you need a linked list. A linked list is a slow container; it is mainly suitable for reordering.
A better container is std::deque. A deque is basically an array of arrays. It allocates a block of memory and fills it as you push back; when the block runs out, it allocates another one. This means that it allocates only infrequently, and you do not have to know the size of the container ahead of time to be efficient, as you do with std::vector and reserve.
You can use the splice function in std::list to implement a pool. Add a new member variable PtrList m_Pool. When you want to add a new object and the pool is not empty, assign the value to the first element in the pool and then splice it into the list. To erase an element, splice it from the list to the pool.
But if you don't care about the order of the elements, then a deque can be much faster. If you want to erase an element in the middle, copy the last element onto the element you want to delete, then erase the last element.
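The splice-based pool described in this answer might look like the sketch below. PooledList is a hypothetical class name, and the fallback to a normal push_back when the pool is empty is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

typedef std::list<char*> PtrList;

// Hypothetical wrapper: nodes are allocated once into m_pool and then
// moved between m_pool and m_list with splice, which relinks nodes in O(1)
// without calling operator new or delete.
class PooledList {
public:
    explicit PooledList(std::size_t preallocated)
        : m_pool(preallocated, static_cast<char*>(nullptr)) {}

    PtrList::iterator push_back(char* value) {
        if (m_pool.empty()) {
            m_list.push_back(value);  // pool exhausted: ordinary allocation
        } else {
            m_pool.front() = value;
            m_list.splice(m_list.end(), m_pool, m_pool.begin());
        }
        return std::prev(m_list.end());
    }

    // Hands the node back to the pool; returns the iterator after it.
    PtrList::iterator erase(PtrList::iterator it) {
        assert(it != m_list.end());
        PtrList::iterator next = std::next(it);
        m_pool.splice(m_pool.begin(), m_list, it);
        return next;
    }

    const PtrList& items() const { return m_list; }
    std::size_t pooled() const { return m_pool.size(); }

private:
    PtrList m_list;
    PtrList m_pool;
};
```

Iterators to elements that stay in the list remain valid across these operations, so a stored iterator such as m_it keeps working from one DoStuff() call to the next, as long as it does not refer to an erased element.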
My advice is the same as 111111's, try switching to deque before you write any significant code.
However, to directly answer your question: you could use std::list with a custom allocator. It's a bit fiddly, and I'm not going to work through all the details here, but the gist of it is that you write a class that represents the memory allocation strategy for list nodes. The nodes allocated by list will be a small implementation-defined amount larger than char*, but they will all be the same size, which means you can write an optimized allocator just for that size (a pool of memory blocks rather than a pool of objects), and you can add functions to it that let you reserve whatever space you want in the allocator, at the time you want. Then the list can allocate/free quickly. This saves you needing to re-implement any of the actual list functionality.
If you were (for some reason) going to implement a pool of objects with list functionality, then you could start with boost::intrusive. That might also be useful when writing your own allocator, for keeping track of your list of free blocks.
List and vector are completely different in the way they manage objects.
Vector constructs elements in place in an allocated buffer of a given capacity. A new allocation happens when the capacity is exhausted.
List allocates elements one by one, each in an individually allocated space.
Vector elements shift when something is inserted or removed; hence, vector indexes and element addresses are not stable.
List elements are re-linked when something is inserted or removed; hence, list iterators and element addresses are stable.
A way to make a list behave similarly to a vector is to replace the default allocator (which allocates from the system every time it is invoked) with another one that allocates objects in larger chunks and hands out sub-chunks to the list when the list invokes it.
This is not something the standard library provides by default.
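In C++17 and later, the opt-in std::pmr facilities offer one way to get this chunked behaviour without writing an allocator by hand. The sketch below is only an illustration under that assumption: fill_list_from_arena is a made-up name, and the 64 KiB buffer size is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>

// Builds a list whose nodes are carved out of one preallocated buffer.
// null_memory_resource() as upstream means that running out of buffer
// throws std::bad_alloc instead of silently falling back to the heap.
std::size_t fill_list_from_arena(std::size_t count) {
    std::array<std::byte, 64 * 1024> buffer;  // arbitrary size for the sketch
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::list<char*> items(&arena);
    for (std::size_t i = 0; i < count; ++i) {
        items.push_back(nullptr);  // node memory comes from `buffer`
    }
    assert(items.size() == count);
    return items.size();
}
```

A monotonic resource never reuses memory freed by erased nodes; for a list with many insertions and erasures, std::pmr::unsynchronized_pool_resource is the variant that recycles freed blocks.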
Could potentially use list::get_allocator().allocate(). Afaik, default behaviour would be to acquire memory as and when due to the non-contiguous nature of lists - hence the lack of reserve() - but no major drawbacks with using the allocator method occur to me immediately. Provided you have a non-critical section in your program, at the start or whatever, you can at least choose to take the damage at that point.
Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? So in a vector containing an array foo, foo is a pointer to foo[0], so pop_front() would make the pointer foo = foo[1] and the bracket operator would just do the normal pointer math. Is this something to do with how C++ keeps track of the memory you're using for what when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I will admit, but I haven't seen anyone asking why you can't shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
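To make the extra bookkeeping concrete, here is a deliberately simplified sketch of a buffer that supports the pointer-shifting pop_front. OffsetBuffer is a hypothetical type, fixed-capacity and int-only; it is not how std::vector is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity buffer with an O(1) pop_front.
class OffsetBuffer {
public:
    explicit OffsetBuffer(std::size_t capacity)
        : storage_(new int[capacity]), capacity_(capacity) {}

    void push_back(int value) {
        assert(offset_ + size_ < capacity_);
        storage_[offset_ + size_] = value;
        ++size_;
    }

    void pop_front() {  // O(1), but the vacated slot is never reused
        assert(size_ > 0);
        ++offset_;
        --size_;
    }

    int& operator[](std::size_t i) {
        assert(i < size_);
        return storage_[offset_ + i];
    }

    std::size_t size() const { return size_; }

    // Room left for push_back: popping from the front does not increase it.
    std::size_t remaining() const { return capacity_ - offset_ - size_; }

private:
    std::unique_ptr<int[]> storage_;  // original handle, needed to free the block
    std::size_t capacity_;
    std::size_t offset_ = 0;          // the extra member every instance would carry
    std::size_t size_ = 0;
};
```

The sketch shows both costs mentioned in the answers: one more member per object, and slots before the offset that stay allocated but unusable until the whole buffer is released.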
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you require adds another pointer-sized member (4 or 8 bytes) to every vector (and there are a lot of those in a C++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements; removing and inserting elsewhere are "unusual" operations, and their performance matters less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front otherwise the container is not being consistent. push_front is definitely expensive for a vector container (unless you match your pushes with your pops which is not a good design). Without push_front the vector is really wasting memory if one does lots of pop_front operations with no push_front functionality.
Now the need for pop_front and push_front is there for a container that is similar to a vector (constant time random access) which is why there is deque.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
You can use std::deque instead of std::vector. It's a double-ended queue that also has the vector-like access members. It implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
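A short sketch of the deque alternative, with a hypothetical helper drop_front:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Removes the first `count` elements. Each pop_front on a std::deque is
// constant time, and no remaining element is shifted.
template <typename T>
void drop_front(std::deque<T>& d, std::size_t count) {
    assert(count <= d.size());
    for (std::size_t i = 0; i < count; ++i) {
        d.pop_front();
    }
}
```

After the call, d[0] refers to the new first element, so indexing code written for a vector carries over unchanged.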
Another shortcoming of your suggestion is that you'll waste memory, because the slots to the left of the shifted pointer can't be used any more. The more you execute pop_front(), the more you'll waste, until the vector is destroyed.