Reference to a subset of a container object - c++

I have a quick question about holding a reference to a subset of a collection.
Say I have a vector of objects. I want to create another vector that is a subset of this vector, without copying the objects in the subset.
One approach I was considering is a vector<auto_ptr<MyClass> >. Is this a good approach?
Please suggest any other containers, idioms or patterns that would help in this case.
Thank you

No! See: Why is it wrong to use std::auto_ptr<> with STL containers?
Now, as an alternative, you could store raw pointers or boost::shared_ptr depending on your needs.

Another, possibly more STL-like way is to keep just the one vector and track sub-ranges using pairs of iterators (note that all the algorithms use iterators for exactly this reason).
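A minimal sketch of the iterator-pair idea, assuming a vector<int> as the one container. The names IntRange, make_range and sum are made up for illustration; nothing here is copied, and the pair is only valid while the vector is not reallocated or destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// A sub-range is just a [first, last) pair of iterators into the one vector.
using IntRange = std::pair<std::vector<int>::iterator, std::vector<int>::iterator>;

// Hypothetical helper: describe `count` elements starting at index `offset`.
IntRange make_range(std::vector<int>& v, std::size_t offset, std::size_t count) {
    assert(offset + count <= v.size());
    return { v.begin() + offset, v.begin() + offset + count };
}

// Sum the elements of the sub-range without copying them.
int sum(const IntRange& r) {
    return std::accumulate(r.first, r.second, 0);
}
```

Writing through r.first changes the element in the original vector, since the range refers to the same storage.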

You can use a vector of indices: vector<int> (or vector<size_t> if you want to be pedantic). This is better than storing pointers (pointers in the general sense: raw C/C++ pointers, shared_ptr, iterators, etc.) if the containing vector is not constant.
Consider the following scenario: the "big" vector contains an apple, an orange and a lemon, while the "small" vector contains a pointer to the apple. If you add a bunch of other fruits to the big vector, it may have to reallocate its storage, and the pointer to the apple then becomes invalid (it points to deallocated memory).
If the above scenario is possible, use a vector of indices. If it is not possible, use the other techniques (e.g. a vector of raw pointers, or a vector of copies of the objects).
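The fruit scenario can be sketched as follows. FruitSubset and apple_after_growth are hypothetical names used only for this illustration; the point is that a stored position is resolved against the big vector on each access, so it survives reallocation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The subset stores positions in the big vector, not addresses.
struct FruitSubset {
    std::vector<std::size_t> indices;

    // Look the element up through the owning vector on every access.
    const std::string& at(const std::vector<std::string>& big, std::size_t i) const {
        return big.at(indices.at(i));
    }
};

// Demonstration: the index still finds the apple after the big vector grows.
std::string apple_after_growth() {
    std::vector<std::string> big{"apple", "orange", "lemon"};
    FruitSubset subset;
    subset.indices.push_back(0);       // refers to "apple" by position
    for (int i = 0; i < 1000; ++i)     // growth that will reallocate storage
        big.push_back("fruit");
    return subset.at(big, 0);
}
```

Indices have their own limitation: erasing or inserting elements before a stored position shifts what that position refers to, so this fits best when the big vector only grows at the end.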

If the subsection is contiguous, you can reference it using an iterator and a count indicating how many items you are referencing.
The sane way to do this would be to create some sort of templated class that you construct with a container reference and two indices, and let the class do all the bounds and error checking. I'm unsure, though, how you'd be able to tell whether the underlying container still exists at some later time...
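One possible shape for such a class, as a sketch only. SubrangeView is a made-up name, and it assumes the container offers size() and operator[] (as std::vector and std::deque do). It does not solve the lifetime problem mentioned above: the view cannot detect that the container has been destroyed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename Container>
class SubrangeView {
public:
    using value_type = typename Container::value_type;

    // View over the half-open index range [first, last) of c.
    SubrangeView(Container& c, std::size_t first, std::size_t last)
        : c_(c), first_(first), last_(last) {
        if (first > last || last > c.size())
            throw std::out_of_range("SubrangeView: bad bounds");
    }

    std::size_t size() const { return last_ - first_; }

    value_type& at(std::size_t i) {
        // Re-check against the current container size, since it may have shrunk.
        if (i >= size() || first_ + i >= c_.size())
            throw std::out_of_range("SubrangeView: index out of range");
        return c_[first_ + i];
    }

private:
    Container& c_;              // non-owning: the container must outlive the view
    std::size_t first_, last_;
};
```

Because it stores indices rather than iterators, the view keeps working after the container reallocates, as long as the container itself is still alive.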

Related

How valid positions in vector::insert()? [duplicate]

This question is related to Item 16 of the Effective STL book, which states that when using a vector (let's assume vector<int> vec) instead of an array with legacy code, we must use &vec[0] instead of vec.begin():
void doSomething(const int* pInts, size_t numInts);
doSomething(&vec[0], vec.size()); // correct!!
doSomething(vec.begin(), vec.size()); // wrong!! why???
The book states that vec.begin() is not the same as &vec[0]. Why? What is the difference between the two?
A std::vector is a sequence container that encapsulates dynamically sized arrays. This lets you conveniently store a bunch of elements without having to manage the underlying array that stores them. A large part of the convenience of these classes comes from the methods they give you for working with the sequence without dealing with raw pointers; iterators are one example.
&vec[0] is a pointer to the first element of the underlying storage that the vector is using. vec.begin() is an iterator that starts at the beginning of the vector. While both give you a way to access the elements in the sequence, they are two distinct concepts. Look up iterators to get a better idea of how this works.
If your code supports iterators, it's often easiest to use them to iterate over the data. Part of the reason is that iterators are not pointers: they let you iterate over the elements of a data structure without needing to know as much about its implementation details.
However, sometimes you need the raw array of items. For example, some legacy APIs or calls into C code need a pointer to the array. In this case you have no choice but to extract the raw array from the vector, which you can do with something such as &vec[0]. Note that if you have C++11 support there's an explicit way to do this, std::vector::data, which gives you access to the underlying storage array. The C++11 way has the additional benefit of stating your intent more clearly to the people reading your code.
Formally, one produces an iterator, and the other a pointer, but I think the major difference is that vec[0] will do bad stuff if the vector is empty, while vec.begin() will not.
vec.begin() has type std::vector<int>::iterator. &vec[0] has type pointer to std::vector<int>::value_type. These are not necessarily the same type.
It is possible that a given implementation uses pointers as the iterator implementation for a vector, but this is not guaranteed, and thus you should not rely on that assumption. In fact most implementations do provide a wrapping iterator type.
Regarding your question about pointers being iterators, this is partly true. Pointers do meet the criteria of a random access iterator, but not all iterators are pointers. And there are iterators that do not support the random access behavior of pointers.
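To make the pointer-versus-iterator distinction concrete, here is a small sketch of both ways of handing a vector to a pointer-based function. legacySum is a hypothetical stand-in for a legacy API (it is not the doSomething from the question), and the two wrapper names are made up as well.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a legacy C-style API (hypothetical): sums numInts values.
int legacySum(const int* pInts, std::size_t numInts) {
    int total = 0;
    for (std::size_t i = 0; i < numInts; ++i) total += pInts[i];
    return total;
}

int callLegacy(const std::vector<int>& vec) {
    // vec.data() (C++11) may be called on an empty vector, unlike &vec[0].
    return legacySum(vec.data(), vec.size());
}

int callLegacyPre11(const std::vector<int>& vec) {
    // vec[0] on an empty vector is undefined behavior, so guard that case.
    return vec.empty() ? 0 : legacySum(&vec[0], vec.size());
}
```

Passing vec.begin() instead would only compile on implementations where the iterator type happens to be a raw pointer, which is exactly the assumption the answers above warn against.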

C++ STL vector for existing data

Can I create an std::vector using my pre-existing data instead of it allocating new memory and copying the data?
To be clearer, if I have a memory area (either a c-array or part of another vector or whatever) and I want to provide vector-like access to it, can I create a vector and tell it to use this block of memory?
No, but you could write your own class that does this. Since this would be a fairly common need I wouldn't be surprised if someone else has done this already.
However the normal C++ way would be to write template code to operate on iterators. You can create iterators for any portion of a vector, or for any portion of a C array (and much else). So writing template code for iterators is probably what you should be doing.
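As a rough illustration of writing template code against iterators, the sketch below defines one function that accepts any [first, last) range. range_max is a made-up name, and the function assumes the range is non-empty.

```cpp
#include <iterator>
#include <vector>

// Works on any [first, last) range: part of a vector, part of a C array, a list...
template <typename Iter>
typename std::iterator_traits<Iter>::value_type
range_max(Iter first, Iter last) {
    // Precondition (assumed, not checked): the range is non-empty.
    typename std::iterator_traits<Iter>::value_type best = *first;
    for (++first; first != last; ++first)
        if (best < *first) best = *first;
    return best;
}
```

The same function can be called as range_max(v.begin() + 2, v.end()) on a vector or as range_max(a, a + 2) on a plain array, with no copying and no vector-specific code.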
Since you can use a custom allocator when creating a vector, it is technically possible.
However, I wouldn't recommend it. I'd just create a vector with a fixed size (apparently you can get a hold of that) and then use std::copy.
Algorithms that iterate over a container accept a pair of iterators that define the input range. You can call such an algorithm with iterators that point into the middle of a big container.
Examples:
std::vector<int> big_vector(100000);
// initialize it
//...
std::sort(big_vector.begin()+100, big_vector.begin()+200); // sort a subrange
int big_array[100000]; //c-style array
// initialize it
//...
std::sort(std::begin(big_array)+300, std::begin(big_array)+400); // sort a subrange
Since C++20 we have std::span, which provides exactly what you are looking for. Take a look at https://en.cppreference.com/w/cpp/container/span.

Sharing an array with STL vectors

I would like to share the contents of an array of doubles a of size k with one or more STL vectors v1, v2...vn.
The effect that I want from this shared storage is that if the underlying array gets modified the change can be observed from all the vectors that share its contents with the array.
I can do that by defining the vectors v1...vn as vectors of pointers
vector<double*> v1;
and copy the pointers a to a + k into this vector. However, I do not like that solution. I want the vectors to be a vector of doubles.
Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared. Would appreciate help about how to do this.
"Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared."
No, you can't do this. The Standard Library containers always manage their own memory.
Your best option is to create the std::vector<double> and then use it as an array where you need to do so (via &v[0], assuming the vector is non-empty).
If you just want to have the container interface, consider using std::array (or boost::array or std::tr1::array) or writing your own container interface to encapsulate the array.
This sounds to me like you want to alias the array with a vector. So logically you want a vector of references (which doesn't work for syntactical reasons). If you really need this feature, you can write your own ref wrapper class that behaves like an actual C++ reference, so the users of your vn vectors won't be able to distinguish between vector<T> and vector<ref<T> > (e.g. with T = double). Internally, you could then link the items in the vectors to the items in your "master" array.
But you should have darned good reasons to do this overhead circus :)
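A sketch of the aliasing idea using the standard std::reference_wrapper (C++11) in place of a hand-written ref class. This is a substitution for illustration, not the answer's own wrapper, and it is less transparent than a real reference: assigning to an element requires .get(). The function name alias is made up.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Build a vector whose elements refer to a[0..k) instead of copying them.
std::vector<std::reference_wrapper<double>> alias(double* a, std::size_t k) {
    return std::vector<std::reference_wrapper<double>>(a, a + k);
}
```

Given double a[3] and auto v1 = alias(a, 3), a change to a[1] is visible through v1[1].get(), and v1[2].get() = 30.0 writes back into the array. The array must outlive every vector built this way.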
OK, Standard Library containers are both holders of information, and enumerators for those elements. That is, roughly any container can be used in almost any algorithm, and at least, you can go through them using begin() and end().
When you separate the two (element holding and element enumeration), as in your case, you may consider Boost.Range. Boost.Range gives you a pair of iterators that delimit the extent to which algorithms will be applied, while the actual memory store stays in your array. This works mostly for read access, because modifying the structure of a vector will normally invalidate the iterators. You can recreate them, though.
To answer your question, as far as I know std::vector cannot be given an already constructed array to use. I cannot even think how that could be done, since there are also the size/capacity related variables. You could possibly try to hack a way to do it using a custom allocator, but I feel it would be ugly, error prone and unintuitive for future maintenance.
That said, if I may rephrase your words a bit, you are asking for multiple references to the same std::vector. I would either do just that or maybe consider using a shared_ptr to a vector.

Dynamic size of array in c++?

I am confused. I don't know which container I should use. Let me first tell you what I need. Basically I need a container that can store X number of objects (the number of objects is unknown; it could be 1 - 50k).
I have read a lot. Over here, array vs list, it says that an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++). It also states that with a linked list, searching for a certain item means looping through (iterating) from first to end (or vice versa), while with an array you can ask for "array object at index".
Then I looked at other solutions: map, vector, etc. Like this one: array vs vector. Some responders say never use an array.
I am new to C++; I have only used array, vector, list and map before. Now, for my case, which kind of container would you recommend? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object
std::vector is what you need.
You have to consider two things when selecting an STL container:
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO that depicts this. I cannot find the link to it, but I saved it a long time ago; here it is:
You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
Do you need to loop through the container, or do you have a key or ID for your objects?
If you have a key or ID, you can use a map to quickly access the object by it. If the ID is simply an index, you can use a vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.
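The two lookup styles discussed above might look like this. Object, find_in_vector and find_in_map are hypothetical names for illustration, and the int id field is an assumption about what "a key or ID" means here.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Object {      // hypothetical element type
    int id;
    std::string name;
};

// Linear search through a vector: O(n) per lookup.
const Object* find_in_vector(const std::vector<Object>& objects, int id) {
    auto it = std::find_if(objects.begin(), objects.end(),
                           [id](const Object& o) { return o.id == id; });
    return it == objects.end() ? nullptr : &*it;
}

// Keyed lookup in a map: O(log n) per lookup.
const Object* find_in_map(const std::map<int, Object>& objects, int id) {
    auto it = objects.find(id);
    return it == objects.end() ? nullptr : &it->second;
}
```

Both return a null pointer when nothing matches. Which one is faster for a given workload is worth measuring rather than assuming, in line with the advice above about locality.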

Should I return an iterator or a pointer to an element in a STL container?

I am developing an engine for porting existing code to a different platform. The existing code has been developed using a third party API, and my engine will redefine those third party API functions in terms of my new platform.
The following definitions come from the API:
typedef unsigned long shape_handle;
shape_handle make_new_shape( int type );
I need to redefine make_new_shape and I have the option to redefine shape_handle.
I have defined this structure ( simplified ):
struct Shape
{
    int type;
};
The Caller of make_new_shape doesn't care about the underlying structure of Shape, it just needs a "handle" to it so that it can call functions like:
set_shape_color( myshape, RED );
where myshape is the handle to the shape.
My engine will manage the memory for the Shape objects and other requirements dictate that the engine should be storing Shape objects in a list or other iterable container.
My question is, what is the safest way to represent this handle - if the Shape itself is going to be stored in a std::list - an iterator, a pointer, an index?
Both an iterator and a pointer will do bad stuff if you try to access them after the object has been deleted, so neither is intrinsically safer. The advantage of an iterator is that it can be used to access other members of your collection.
So, if you just want to access your Shape then a pointer will be simplest. If you want to iterate through your list then use an iterator.
An index is useless in a list since std::list does not overload the [] operator.
The answer depends on your representation:
for std::list, use an iterator (not a pointer), because an iterator allows you to remove the element without walking the whole list.
for std::map or boost::unordered_map, use the Key (of course)
Your design would be much stronger if you used an associative container, because associative containers give you the ability to query for the presence of the object, rather than invoking Undefined Behavior.
Try benchmarking both map and unordered_map to see which one is faster in your case :)
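A sketch of the key-as-handle approach with std::map. ShapeStore, destroy, find and the color field are made up for this example; only shape_handle, make_new_shape and set_shape_color come from the question, and their signatures here are assumptions.

```cpp
#include <map>

typedef unsigned long shape_handle;

struct Shape {
    int type;
    int color;   // hypothetical field, added for illustration
};

class ShapeStore {   // hypothetical engine-side registry
public:
    shape_handle make_new_shape(int type) {
        shape_handle h = next_++;
        shapes_[h] = Shape{type, 0};
        return h;
    }

    // Returns false for an unknown or stale handle instead of invoking UB.
    bool set_shape_color(shape_handle h, int color) {
        auto it = shapes_.find(h);
        if (it == shapes_.end()) return false;
        it->second.color = color;
        return true;
    }

    bool destroy(shape_handle h) { return shapes_.erase(h) == 1; }

    const Shape* find(shape_handle h) const {
        auto it = shapes_.find(h);
        return it == shapes_.end() ? nullptr : &it->second;
    }

private:
    std::map<shape_handle, Shape> shapes_;
    shape_handle next_ = 1;   // handles are never reused in this sketch
};
```

The handle stays an unsigned long, so the third-party typedef does not need to change, and a handle to a destroyed shape is simply reported as missing.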
If the internal representation will be a list of Shapes, then pointers and iterators are safe. Once an element is allocated, no relocation will ever occur. I wouldn't recommend an index, for obvious access performance reasons: it is O(n) in the case of lists.
If you were using a vector, then don't use iterators or pointers, because elements can be relocated when you exceed the vector's capacity, and your pointers/iterators would become invalid.
If you want a representation that is safe regardless of the internal container, then create a container (list/vector) of pointers to your shapes, and return the shape pointer to your client. Even if the container is moved around in memory, the Shape objects will stay in the same location.
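A minimal sketch of the container-of-pointers idea, using std::unique_ptr (C++11) so the engine keeps ownership. ShapePool is a hypothetical name, and returning Shape* as the handle is an assumption about how the handle would be exposed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Shape {
    int type;
};

class ShapePool {   // hypothetical engine-side owner of all shapes
public:
    // Each Shape lives in its own heap allocation, so its address is stable.
    Shape* make_new_shape(int type) {
        shapes_.push_back(std::unique_ptr<Shape>(new Shape{type}));
        return shapes_.back().get();
    }

    std::size_t size() const { return shapes_.size(); }

private:
    // The vector may reallocate, but that only moves the owning pointers.
    std::vector<std::unique_ptr<Shape>> shapes_;
};
```

A pointer handed out early remains usable after many more shapes are added. It still dangles once the pool erases that shape or is itself destroyed, so this addresses relocation, not deletion.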
Iterators aren't safer than pointers, but they have much better diagnostics than raw pointers if you're using a checked STL implementation!
For example, in a debug build, if you return a pointer to a list element, then erase that list element, you have a dangling pointer. If you access it you get a crash and all you can see is junk data. That can make it difficult to work out what went wrong.
If you use an iterator and you have a checked STL implementation, as soon as you access the iterator to an erased element, you get a message something like "iterator was invalidated". That's because you erased the element it points to. Boom, you just saved yourself potentially a whole lot of debugging effort.
So, not indices for O(n) performance. Between pointers and iterators - always iterators!