A few C++ vector questions - c++

I'm trying to learn some C++. To start off, I created some methods to handle outputting to and reading from a console.
I'm having 2 major problems, marked in the code, manipulating/accessing values within a std::vector of strings passed in by reference.
The method below takes in a question (std::string) to ask the user and a vector of std::strings containing the responses that are deemed acceptable. I also wanted, in the interest of learning, to access a string within the vector and change its value.
std::string My_Namespace::My_Class::ask(std::string question, std::vector<std::string> *validInputs){
bool val = false;
std::string response;
while(!val){
//Ask and get a response
response = ask(question);
//Iterate through the acceptable responses looking for a match
for(unsigned int i = 0; i < validInputs->size(); i++){
if(response == validInputs->at(i)){
////1) Above condition always returns true/////
val = true;
break;
}
}
}
//////////2) does not print anything//////////
println(validInputs->at(0)); //note the println method is just cout << param << "\n" << std::endl
//Really I want to manipulate its value (not the pointer the actual value)
//So I'd want something analogous to validInputs.set(index, newVal); from java
///////////////////////////////////////////
}
A few additional questions:
3) I'm using .at(index) on the vector to get the value, but I've read that [] should be used instead. However, I'm not sure what that should look like (validInputs[i] doesn't compile).
4) I assume that since a deep copy is unnecessary, it's good practice to pass in a pointer to the vector as above. Can someone verify that?
5) I've heard that ++i is better practice than i++ in loops, is that true? why?

3) There should not be a significant difference using at and operator[] in this case. Note that you have a pointer-to-vector, not a vector (nor reference-to-vector) so you will have to use either (*validInputs)[i] or validInputs->operator[](i) to use the operator overload. Using validInputs->at(i) is fine if you don't want to use either of these other approaches. (The at method will throw an exception if the argument is out of the array bounds, while the operator[] method has undefined behavior when the argument is out of the array bounds. Since operator[] skips the bounds check, it is faster if you know for a fact that i is within the vector's bounds. If you are not sure, use at and be prepared to catch an exception.)
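As a minimal sketch of the two access styles through a pointer-to-vector (the helper names unchecked_get and checked_get are made up for illustration and are not part of the question's code):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: unchecked access through a pointer-to-vector.
// The caller must guarantee i < v->size(); otherwise the behavior is undefined.
std::string unchecked_get(const std::vector<std::string>* v, std::size_t i) {
    return (*v)[i];  // equivalent to v->operator[](i)
}

// Hypothetical helper: bounds-checked access. at() throws std::out_of_range
// for a bad index, which is caught here and turned into a fallback value.
std::string checked_get(const std::vector<std::string>* v, std::size_t i,
                        const std::string& fallback) {
    try {
        return v->at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

Whether catching the exception or validating the index up front is the better choice depends on the calling code; the fallback parameter is only one possible design.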
4) A pointer is good, but a reference would be better. And if you're not modifying the vector in the method, a reference-to-const-vector would be best (std::vector<std::string> const &). This ensures that you cannot be passed a null pointer (references cannot be null), while also ensuring that you don't accidentally modify the vector.
5) It usually is. i++ is post-increment, which means that the original value must be copied, then i is incremented and the copy of the original value is returned. ++i increments i and then returns i, so it is usually faster, especially when dealing with complex iterators. With an unsigned int the compiler should be smart enough to realize that a pre-increment will be fine, but it's good to get into the practice of using ++i if you don't need the original, unincremented value of i.
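To illustrate where the extra copy comes from, here is a sketch of the conventional way the two operators are written for a user-defined type. Counter is a hypothetical type invented for this example:

```cpp
// Hypothetical type showing the conventional shape of both operators.
struct Counter {
    int value = 0;

    // Pre-increment: modify in place and return a reference to *this.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Post-increment: must keep a copy of the old state to return it.
    Counter operator++(int) {
        Counter old = *this;  // the extra copy the answer refers to
        ++value;
        return old;
    }
};
```

For a type this small an optimizer will very likely remove the copy when the result is unused, so the difference only tends to matter for heavier iterator types.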

I'd use a reference-to-const, and std::find. Note that I also take the string by reference (it gets deep copied otherwise) :
std::string My_Class::
ask (const std::string& question, const std::vector<std::string>& validInputs)
{
for (;;) {
auto response = ask (question);
auto i = std::find (validInputs.begin (), validInputs.end (), response);
if (i != validInputs.end ()) {
std::cout << *i << '\n'; // Prints the value found
return *i;
}
}
}
Read about iterators if you don't understand the code. Of course, feel free to ask other questions if you need.

I'm not going to address points 1 and 2 since we don't know what you are doing and we don't even see the code for ask and println.
I'm using .at(index) on the vector to get the value but I've read that [] should be used instead, however I'm not sure what that should look like (validInputs[i] doesn't compile).
Subscript access and the at member function are different things. They give you the very same thing, a reference to the indexed element, but they behave differently if you pass an out-of-bounds index: at will throw an exception, while [] will invoke undefined behavior (as built-in arrays do). Using [] on a pointer is somewhat ugly, (*validInputs)[i], but you really should avoid pointers when possible.
I assume that since a deep copy is unnecessary it's good practice to pass in a pointer to the vector as above, can someone verify that?
A deep copy is unnecessary, but so is a pointer. You want a reference instead, and a const one since I presume you shouldn't be modifying those:
ask(std::string const& question, std::vector<std::string> const& validInputs)
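The question also asked for something analogous to Java's validInputs.set(index, newVal). A small sketch of how the two kinds of reference divide the work, assuming hypothetical helper names set_value and contains:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a non-const reference lets the callee change the
// caller's vector. Returns false instead of touching an invalid index.
bool set_value(std::vector<std::string>& values, std::size_t index,
               const std::string& newVal) {
    if (index >= values.size()) {
        return false;
    }
    values[index] = newVal;  // assigns to the element itself, no pointer needed
    return true;
}

// Hypothetical helper: a const reference is enough for read-only access.
bool contains(const std::vector<std::string>& values, const std::string& wanted) {
    for (const std::string& v : values) {
        if (v == wanted) {
            return true;
        }
    }
    return false;
}
```

Neither function copies the vector, and the const in contains makes an accidental modification a compile error.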
I've heard that ++i is better practice than i++ in loops, is that true? why?
It's true in the general case. The two operations are different: ++i increments i and returns the new value, while i++ increments i but returns the value from before the increment, which requires a temporary to be held and returned. For ints this hardly matters, but for potentially fat iterators pre-increment is more efficient and a better choice if you don't need or care about the return value.

To answer questions 1 and 2, we'll probably need more information, like: How did you initialize validInputs? What's the source of ask?
3) First dereference the pointer, then index the vector:
(*validInputs)[i]
4) References are considered better style, especially as a replacement for pointers that are never NULL.
5) For integers, it doesn't matter (unless you evaluate the result of the expression). For other objects, with overloaded ++ operators (iterators, for example) it may be better to use ++i. But in practice, for inline definitions of the ++ operator, it will probably be optimized to the same code.


Pass vectors by pointer and reference in C++

A quick question about how to safely pass and use vectors in C++.
I know that when using vectors you have to be very careful with addresses of their elements, because when the vector's size changes dynamically the elements may move to a new address (unless you use reserve etc., but I'm imagining I will not know how much space I will need).
Now I want to pass an existing vector (created elsewhere) to a function which adapts it, changes its size, etc., but I'm a little unclear as to what is safe to do, because I would normally achieve all of this with pointers. On top of this there is using references to the vector, and this just muddies the water for me.
For instance take the two following functions and comments in them
void function1(std::vector<int>* vec){
std::cout<<"the size of the vector is: "<<vec->size()<<std::endl; //presumably valid here
for (int i=0;i<10;i++){
(*vec).push_back(i); //Is this safe? Or will this fail?
// Or: vec->push_back(i); Any difference?
}
std::cout<<"the size of the vector is: "<<vec->size()<<std::endl; //Is this line valid here??
}
AND
void function2(std::vector<int>& vec){
std::cout<<"the size of the vector is: "<<vec.size()<<std::endl; //presumably valid here
for (int i=0;i<10;i++){
vec.push_back(i); //Is this safe? Or will this fail?
}
std::cout<<"the size of the vector is: "<<vec.size()<<std::endl; //Is this line valid here??
}
Is there any difference between the two functions, both in terms of functionality and in terms of safety?
Or in other words, if I only have a pointer/reference to a vector and need to resize it how can I be sure where the vector will actually be in memory, or what the pointer to the vector really is, after I operate on it. Thanks.
In terms of functionality, in the very limited context you gave us, they are essentially the same.
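A short sketch of why both versions are safe: push_back may reallocate the vector's element storage, which invalidates pointers, references, and iterators to elements, but the vector object itself does not move, so a pointer or reference to the vector stays valid. The function names below are illustrative only:

```cpp
#include <vector>

// Appends through a pointer-to-vector. vec->push_back(i) and
// (*vec).push_back(i) are the same call.
void append_via_pointer(std::vector<int>* vec) {
    for (int i = 0; i < 10; ++i) {
        vec->push_back(i);
    }
}

// Appends through a reference-to-vector; same effect, no dereference syntax.
void append_via_reference(std::vector<int>& vec) {
    for (int i = 0; i < 10; ++i) {
        vec.push_back(i);
    }
}
```

What would not be safe is holding on to something like &(*vec)[0] across the loop, since that address belongs to the element storage rather than to the vector object.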
From a more general point of view, if you want to write generic code, consider that operations and operators bind directly to references, but not to pointers. For
a = b + c;
to compile, you need
A operator+(const B&, const C&);
whereas
A* operator+(const B*, const C*);
is an entirely different beast (and the language does not even allow it: an overloaded operator needs at least one parameter of class or enumeration type).
Also, an expression taking references has the same syntax as one taking values, but an expression taking pointers requires the pointers to be dereferenced to provide equivalent semantics. That leads to a different expression syntax (*a + *b versus a + b) and thus to "less general" code.
On the other hand, if you are writing a class that relies on runtime polymorphism (with Liskov substitution in mind), you will most likely deal with dynamically allocated objects, so manipulating them through pointers may be more natural.
There are "grey areas" where the two approaches mix, but in general pointer-taking functions are more frequent in runtime-based OOP frameworks, while reference-taking functions are more frequent in "value-based generic algorithms", where static type deduction is expected and stack-based allocation is most likely wanted.

Is it wise to use a pointer to access values in an std::map

Is it dangerous to return a pointer to the data found by std::map::find and use that, as opposed to getting a copy of the data?
Currently, I get a pointer to an entry in my map and pass it to another function to display the data. I'm concerned that items moving could cause the pointer to become invalid. Is this a legitimate concern?
Here is my sample function:
MyStruct* StructManagementClass::GetStructPtr(int structId)
{
std::map<int, MyStruct>::iterator foundStruct;
foundStruct= myStructList.find(structId);
if (foundStruct== myStructList.end())
{
MyStruct newStruct;
memset(&newStruct, 0, sizeof(MyStruct));
newStruct.structId= structId;
myStructList.insert(pair<int, MyStruct>(structId, newStruct));
foundStruct= myStructList.find(structId);
}
return (MyStruct*) &foundStruct->second;
}
It would undoubtedly be more typical to return an iterator than a pointer, though it probably makes little difference.
As far as remaining valid goes: a map iterator remains valid until/unless the item it refers to is removed/erased from the map.
When you insert or delete some other node in the map, that can result in the nodes in the map being rearranged. That's done by manipulating the pointers between the nodes though, so it changes what other nodes contain pointers to the node you care about, but does not change the address or content of that particular node, so pointers/iterators to that node remain valid.
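This can be checked with a small experiment. The function below is a hypothetical test written for this illustration: it takes the address of one mapped value, inserts and erases other entries, and reports whether the address and the value are unchanged.

```cpp
#include <map>

// Hypothetical check of std::map's node stability.
bool pointer_survives_other_inserts() {
    std::map<int, int> m;
    m[1] = 42;
    int* p = &m[1];  // address of the mapped value for key 1

    // Insert many other keys, which rebalances the underlying tree.
    for (int k = 2; k < 1000; ++k) {
        m[k] = k;
    }
    // Erasing a different key does not affect the entry for key 1 either.
    m.erase(500);

    return p == &m[1] && *p == 42;
}
```

Erasing key 1 itself would of course leave p dangling.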
As long as you, your code, and your development team understand the lifetime of std::map values ( valid after insert, and invalid after erase, clear, assign, or operator= ), then using an iterator, const_iterator, ::mapped_type*, or ::mapped_type const* are all valid. Also, if the return is always guaranteed to exist, then ::mapped_type&, or ::mapped_type const& are also valid.
As for wise, I'd prefer the const versions over the mutable versions, and I'd prefer references over pointers over iterators.
Returning an iterator vs. a pointer is bad:
it exposes an implementation detail.
it is awkward to use, as the caller has to know to dereference the iterator, that the result is an std::pair, and that one must then call .second to get the actual value.
.first is the key that the user may not care about.
determining if an iterator is invalid requires knowledge of ::end(), which is not obviously available to the caller.
It's not dangerous - the pointer remains valid just as long as an iterator or a reference does.
However, in your particular case, I would argue that it is not the right thing anyway. Your function unconditionally returns a result. It never returns null. So why not return a reference?
Also, some comments on your code.
std::map<int, MyStruct>::iterator foundStruct;
foundStruct = myStructList.find(structId);
Why not combine declaration and assignment into initialization? Then, if you have C++11 support, you can just write
auto foundStruct = myStructList.find(structId);
Then:
myStructList.insert(pair<int, MyStruct>(structId, newStruct));
foundStruct = myStructList.find(structId);
You can simplify the insertion using make_pair. You can also avoid the redundant lookup, because insert returns an iterator to the newly inserted element (as the first element of a pair).
foundStruct = myStructList.insert(make_pair(structId, newStruct)).first;
Finally:
return (MyStruct*) &foundStruct->second;
Don't ever use C-style casts. It might not do what you expect. Also, don't use casts at all when they're not necessary. &foundStruct->second already has type MyStruct*, so why insert a cast? The only thing it does is hide a place that you need to change if you ever, say, change the value type of your map.
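Putting these comments together, one possible shape for the function is sketched below. The members of MyStruct, the GetStruct name, and the size() accessor are assumptions made for the example, since the original struct definition is not shown:

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical layout: the real MyStruct is not shown in the question.
struct MyStruct {
    int structId = 0;
    int payload = 0;
};

class StructManagementClass {
public:
    // Returns a reference because a result always exists after the call.
    MyStruct& GetStruct(int structId) {
        MyStruct fresh;  // default member initializers replace the memset
        fresh.structId = structId;
        // insert leaves an existing entry untouched; either way, .first is
        // an iterator to the entry for structId, so no second lookup is needed.
        auto result = myStructList.insert(std::make_pair(structId, fresh));
        return result.first->second;
    }

    std::size_t size() const { return myStructList.size(); }

private:
    std::map<int, MyStruct> myStructList;
};
```

This version builds a temporary MyStruct even when the key already exists, which is a simplicity trade-off rather than a requirement.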
Yes,
If you build a generic function without knowing how it will be used, it can be dangerous to return the pointer (or the iterator), since it can become invalid.
I would advise doing one of two things:
1. work with std::shared_ptr and return that. (see below)
2. return the struct by value (can be slower)
//change the definition of the list to
std::map<int, std::shared_ptr<MyStruct>> myStructList;
std::shared_ptr<MyStruct> StructManagementClass::GetStructPtr(int structId)
{
std::map<int, std::shared_ptr<MyStruct>>::iterator foundStruct;
foundStruct = myStructList.find(structId);
if (foundStruct == myStructList.end())
{
MyStruct newStruct;
memset(&newStruct, 0, sizeof(MyStruct));
newStruct.structId = structId;
// a shared_ptr cannot be built from an object directly; make_shared copies newStruct onto the heap
// insert returns a pair whose first member is an iterator to the new entry
foundStruct = myStructList.insert(std::make_pair(structId, std::make_shared<MyStruct>(newStruct))).first;
}
return foundStruct->second;
}

Returning an object or a pointer in C++

In C++, should my method return an object or a pointer to an object? How to decide? What if it's an operator? How can I define?
And one more thing - if the pointer turns to be a vector, how can I find out its size after returned? And if it's impossible, as I think it is, how should I proceed to correctly return an array without this limitation?
In C++, should my method return an object or a pointer to an object?
How to decide?
Since C++11 we have move semantics in C++, which means that returning by value is as easy as before and now also fast. That should be the default.
What if it's an operator? How can I define?
Many operators such as operator= normally return a reference to *this
X& X::operator=(X rhs);
You need to look that up for each operator if you would like to comply with the usual patterns (and you should). Start here: Operator overloading
As pointed out by Ed S., return value optimization also applies (even before C++11), meaning that often the object you return needs to be neither copied nor moved.
So, this is now the way to return stuff:
std::string getstring(){
std::string foo("hello");
foo+=" world";
return foo;
}
The fact that I made a foo object here is not my point; even if you just wrote return "hello world"; this is the way to go.
And one more thing - if the pointer turns to be a vector, how can I find out its size after returned? And if it's impossible, as I think it is, how should I proceed to correctly return an array without this limitation?
The same goes for all copyable or movable types in the standard library (these are almost all types, for example vectors, sets, and so on), with a few exceptions. For example, a std::array does not gain from moving: moving it takes time proportional to the number of elements. There you could return it in a unique_ptr to avoid the copy.
typedef std::array<int,15> MyArray;
std::unique_ptr<MyArray> getArray(){
std::unique_ptr<MyArray> someArrayObj(new MyArray());
someArrayObj->at(3)=5;
return someArrayObj;
}
int main(){
auto x=getArray();
std::cout << x->at(3) <<std::endl; // or since we know the index is right: (*x)[3]
}
Now, to avoid ever writing new anymore (except for experts in rare cases) you should use a helper function called make_unique. That will vastly help exception safety, and is as convenient:
std::unique_ptr<MyArray> getArray(){
auto someArrayObj=make_unique<MyArray>();
someArrayObj->at(3)=5;
return someArrayObj;
}
For more motivation and the (really short) implementation of make_unique, have a look here:
make_unique and perfect forwarding
Update
Now make_unique is part of the C++14 standard. If you don't have it, you can find and use the whole implementation from the proposal by S.T.L.:
Ideone example on how to do that
In C++, should my method return an object or a pointer to an object?
You should return an object by default. Usual exceptions are functions that return a subclass of a given class, and cases where returning nothing is a legal option for a function [1].
What if it's an operator?
Operators return references or objects; although it is technically possible to return pointers from overloaded operators, it is not usually done.
And one more thing - if the pointer turns to be a vector, how can I find out its size after returned?
I think you meant an array rather than a vector, because std::vector has a size() member function returning the size of the vector. Finding the size of a variable-length array is indeed not possible.
And if it's impossible, as I think it is, how should I proceed to correctly return an array without this limitation?
You should use std::vector; it does not limit you on the size or the type of elements that go into it.
[1] In which case you return NULL, or nullptr in C++11.
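As a small sketch of the std::vector suggestion (first_squares is a made-up example function), returning the vector by value keeps the size available to the caller:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: builds and returns a vector by value.
// The returned object is moved or elided, not deep-copied.
std::vector<int> first_squares(std::size_t n) {
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result;
}
```

The caller can then write auto v = first_squares(4); and query v.size() directly, which is not possible with a raw pointer to an array.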
Unless there is some specific reason to use plain pointers, always return something memory-safe. In an estimated 95% of all cases, simply returning objects is fine, and then return-by-value is definitely the canonical thing to do (simple, efficient, good!).
The remaining 5% are mostly when the returned object is runtime-polymorphic; such an object can't be returned by value in C++ since that would happen on the stack. In such a case, you should return a smart pointer to the new object, in C++11 the standard choice is std::unique_ptr. There is also the case when you want to optionally return something, but that's IMO a case for a specific container, not for pointers, boost::optional or something like that.
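For the runtime-polymorphic case, a minimal sketch might look like the following. Shape, Circle, Square, and make_shape are hypothetical names, and std::make_unique assumes a C++14 or later compiler:

```cpp
#include <memory>
#include <string>

// Hypothetical class hierarchy for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

struct Square : Shape {
    std::string name() const override { return "square"; }
};

// The concrete type is chosen at run time, so the result cannot be
// returned by value as a Shape; a unique_ptr owns the new object instead.
std::unique_ptr<Shape> make_shape(bool round) {
    if (round) {
        return std::make_unique<Circle>();
    }
    return std::make_unique<Square>();
}
```

The caller owns the result and the object is destroyed automatically when the unique_ptr goes out of scope.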

Curious behaviour of std::string::operator[] in MSVC

I've been using some semi-iterators to tokenize a std::string, and I've run into a curious problem with operator[]. When constructing a new string from a position using char*, I've used something like the following:
t.begin = i;
t.end = i + 1;
t.contents = std::string(&arg.second[t.begin], &arg.second[t.end]);
where arg.second is a std::string. But, if i is the position of the last character, then arg.second[t.end] will throw a debugging assertion- even though taking a pointer of one-past-the-end is well defined behaviour and even common for primitive arrays, and since the constructor is being called using iterators I know that the end iterator will never be de-referenced. Doesn't it seem logical that arg.second[arg.second.size()] should be a valid expression, producing the equivalent of arg.second.end() as a char*?
You're not taking a pointer to one past the end, you're ACCESSING one past the end and then getting the address of that. That is entirely different: while the former is well defined and well formed, the latter is neither. I suggest using the iterator constructor, which is basically what you ARE using, but with iterators instead of char*. See Alexandre's comment.
operator[](size_type pos) const doesn't return one-past-the-end if pos == size(); it returns charT(), which is a temporary. In the non-const version of operator[], the behavior is undefined.
21.3.4/1
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1 Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const
version returns charT(). Otherwise, the behavior is undefined.
What is well-defined is creating an iterator one past the end. (Pointers might be iterators, too.) However, dereferencing such an iterator will yield Undefined Behavior.
Now, what you're doing is array subscription, and that is very different from forming iterators, because it returns a reference to the referred-to object (much akin to dereferencing an iterator). You are certainly not allowed to access an array one-past-the-end.
std::string is not an array. It is an object, whose interface loosely resembles an array (namely, provides operator[]). But that's when the similarity ends.
Even if we for a second assume that std::string is just a wrapper built on top of an ordinary array, then in order to obtain the one-past-the-end pointer for the stored sequence, you have to do something like &arg.second[0] + t.end, i.e. instead of going through the std::string interface, first move into the domain of ordinary pointers and use ordinary low-level pointer arithmetic.
However, even that assumption is not correct under the pre-C++11 standard, and doing something like &arg.second[0] + t.end is a recipe for disaster there. std::string is not guaranteed to store its controlled sequence as an array. It is not guaranteed to be stored contiguously, meaning that regardless of where your pointers point, you cannot assume that you'll be able to iterate from one to another by using pointer arithmetic. (C++11 and later do guarantee contiguous storage.)
If you want to use an std::string in some legacy pointer-based interface the only choice you have is to go through the std::string::c_str() method, which will generate a non-permanent array-based copy of the controlled sequence.
P.S. Note, BTW, that in the original C and C++ specifications it is illegal to use the &a[N] method to obtain the one-past-the-end pointer even for an ordinary built-in array. You always have to make sure that you are not using the [] operator with past-the-end index. The legal way to obtain the pointer has always been something like a + N or &a[0] + N, but not &a[N]. Recent changes legalized the &a[N] approach as well, but nevertheless originally it was not legal.
A string is not a primitive array, so I'd say the implementation is free to add some debug diagnostics if you are doing something dangerous like accessing elements outside its range. I would guess that a release build will probably work.
But...
For what you are trying to do, why not just use the basic_string( const basic_string& str, size_type index, size_type length ); constructor to create the sub strings?
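Both alternatives mentioned in the answers, the iterator constructor and the substring constructor, can be sketched as follows. The function names are invented for this example, and both assume begin <= end <= s.size():

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the token [begin, end) from iterators.
// s.begin() + s.size() equals s.end(), which is valid as long as it is
// not dereferenced, so the last character poses no problem.
std::string token_by_iterators(const std::string& s, std::size_t begin,
                               std::size_t end) {
    typedef std::string::difference_type diff;
    return std::string(s.begin() + static_cast<diff>(begin),
                       s.begin() + static_cast<diff>(end));
}

// Hypothetical helper: the same token via the (str, index, length) constructor.
std::string token_by_substring(const std::string& s, std::size_t begin,
                               std::size_t end) {
    return std::string(s, begin, end - begin);
}
```

Neither version applies operator[] at position size(), so the debug assertion from the question does not come into play.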

Efficient passing of std::vector

When a C++ function accepts an std::vector argument, the usual pattern is to pass it by const reference, such as:
int sum2(const std::vector<int> &v)
{
int s = 0;
for(size_t i = 0; i < v.size(); i++) s += fn(v[i]);
return s;
}
I believe that this code results in double dereferencing when the vector elements are accessed, because the CPU should first dereference v to read the pointer to the first element, which pointer needs to be dereferenced again to read the first element. I would expect that it would be more efficient to pass a shallow copy of the vector object on the stack. Such shallow copy would encapsulate a pointer to the first element, and the size, with the pointer referencing the same memory area as the original vector does.
int sum2(vector_ref<int> v)
{
int s = 0;
for(size_t i = 0; i < v.size(); i++) s += fn(v[i]);
return s;
}
Similar performance, but with much less convenience, could be achieved by passing a random-access iterator pair. My question is: what is flawed with this idea? I expect that there should be some good reason that smart people accept paying the performance cost of a vector reference, or deal with the inconvenience of iterators.
Edit: Based on the comments below, please consider the situation if I simply rename the suggested vector_ref class to slice or range. The intention is to use random-access iterator pairs with more natural syntax.
I believe that this code results in double dereferencing when the vector elements are accessed
Not necessarily. Compilers are pretty smart and should be able to eliminate common subexpressions. They can see that operator[] doesn't change the 'pointer to the first element', so they have no need to make the CPU reload it from memory for every loop iteration.
What's wrong with your idea is that you already have two perfectly good solutions:
Pass the vector as is, either by value (where the compiler will often eliminate the copy), or by (const) reference, and trust the compiler to eliminate the double indirection, or
Pass an iterator pair.
Of course you can argue that the iterator pair is "less natural syntax", but I disagree. It is perfectly natural to anyone who's used to the STL. It is efficient, and gives you exactly what you need to work with the range, using std algorithms or your own functions.
Iterator pairs are a common C++ idiom, and a C++ programmer reading your code will understand them without a problem, whereas they're going to be surprised at your home-brewed vector wrappers.
If you're really paranoid about performance, pass the pair of iterators. If the syntax really bothers you, pass the vector and trust the compiler.
What is flawed with this idea?
Simple: It's premature optimization. Alternatives: Accept a vector<int> const& and use iterators or pass iterators directly to the function.
You're right that there's an extra indirection here. It's conceivable (though it would be surprising) that the compiler (with the help of link-time code generation) optimizes it away.
What you've proposed is sometimes called slicing, and it's used extensively in some situations. Though, in general, I'm not sure it's worth the dangers. You have to be very careful about invalidating your slice (or someone else's).
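For concreteness, a minimal sketch of such a non-owning view is shown below. int_slice and sum_slice are hypothetical names, not the question's vector_ref; C++20's std::span fills a similar role in the standard library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a vector's elements: a pointer plus a size.
// It becomes invalid as soon as the vector reallocates or is destroyed.
class int_slice {
public:
    explicit int_slice(const std::vector<int>& v)
        : data_(v.data()), size_(v.size()) {}

    std::size_t size() const { return size_; }
    const int& operator[](std::size_t i) const { return data_[i]; }

private:
    const int* data_;
    std::size_t size_;
};

// The slice is passed by value: copying it copies two words, not the elements.
int sum_slice(int_slice s) {
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        total += s[i];
    }
    return total;
}
```

The danger mentioned above is visible in the constructor: nothing ties the lifetime of data_ to the vector it came from.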
Note that if you used iterators for the loop instead of indexing, then you'd deref the reference only a couple times (to call begin() and end()) rather than n times (to index into the vector).
int sum(const vector<int> &v)
{
int s = 0;
for (auto it = v.begin(); it != v.end(); ++it) {
s += fn(*it);
}
return s;
}
(I'm assuming the optimizer will hoist the end() calls out of the loop. You could do it explicitly to be certain.)
Passing a pair of iterators instead of the container itself seems like the STL idiom. That would give you more generality, as the type of container can vary, but so can the number of dereferences needed.
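A sketch of the iterator-pair version might look like this. The fn below is only a stand-in, since the question never shows what fn does, and sum_range is a made-up name:

```cpp
#include <vector>

// Stand-in for the question's fn, which is not shown; assumed to map int to int.
inline int fn(int x) { return 2 * x; }

// Works with any input iterator whose elements convert to int,
// including raw pointers into a built-in array.
template <typename Iterator>
int sum_range(Iterator first, Iterator last) {
    int s = 0;
    for (; first != last; ++first) {
        s += fn(*first);
    }
    return s;
}
```

A call such as sum_range(v.begin(), v.end()) reads the vector's begin and end once each, and the same function also accepts a sub-range or a plain array.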
Pass by value unless you're certain that passing by reference improves performance.
When you pass by value, copy elision may occur, which can result in similar if not better performance.
Dave wrote about it here:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
There is no double dereferencing because the compiler will probably pass the real pointer to the vector as the argument and not a pointer to a pointer. You can simply try this out and check the disassembly view of your IDE for what is actually going on behind the scenes:
void Method(std::vector<int> const& vec) {
int i = vec.back();
}
void SomeOtherMethod() {
std::vector<int> vec;
vec.push_back(1);
Method(vec);
}
What happens here? The vector is allocated on the stack. The first push back is translated to:
push eax // this is the constant one that has been stored in eax
lea ecx,[ebp-24h] // ecx is the pointer to vec on the stack
call std::vector<int,std::allocator<int> >::push_back
Now we call Method(), passing the vector const&:
lea ecx,[ebp-24h]
push ecx
call Method (8274DC0h)
Unsurprisingly, the pointer to the vector is passed as references are nothing but permanently dereferenced pointers. Now inside Method(), the vector is accessed again:
mov ecx,dword ptr [ebp+8]
call std::vector<int,std::allocator<int> >::back (8276100h)
The vector pointer is taken directly from the stack and written to ecx.