C++: Why does this string input fail while the other does not


I got this problem from a friend
#include <string>
#include <vector>
#include <iostream>
void riddle(std::string input)
{
auto strings = std::vector<std::string>{};
strings.push_back(input);
auto raw = strings[0].c_str();
strings.emplace_back("dummy");
std::cout << raw << "\n";
}
int main()
{
riddle("Hello world of!"); // Why does this print garbage?
//riddle("Always look at the bright side of life!"); // And why doesn't this?
std::cin.get();
}
My first observation is that riddle() does not produce garbage when the input is longer than three words. I am still trying to see why it fails in the first case and not in the second. Anyway, I thought this would be fun to share.

This is undefined behavior (UB), meaning that anything can happen, including the code working.
It is UB because emplace_back invalidates all pointers into the objects in the vector whenever the vector reallocates (which it apparently does here).
The first case of UB "doesn't work" because of the small string optimization (SSO). With SSO the text of a short string is stored inside the string object itself, so the raw pointer points into the memory block allocated by the vector, and that block is freed on reallocation.
The second case of UB "works" because the text is too long for SSO and resides in an independent memory block. During reallocation the string object is moved from, which transfers ownership of the text's memory block to the newly created string object. Since the block merely changes owner, it remains valid after emplace_back.
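The SSO explanation above can be probed with a small sketch. The helper name stores_text_inline is hypothetical (it is not part of the standard library), and the length at which a string stops being stored inline is implementation-defined, so treat the result as an observation about one particular standard library rather than a guarantee.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: reports whether the text of `s` is stored inside the
// std::string object itself (SSO) rather than in a separate heap block.
bool stores_text_inline(const std::string& s)
{
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* text = s.data();
    assert(text != nullptr);  // data() never returns a null pointer
    // std::less yields a total order over pointers, even unrelated ones.
    const std::less<const char*> before{};
    return !before(text, object_begin) && before(text, object_end);
}
```

On implementations that use SSO, calling this on a short string such as "Hello world of!" will probably return true, while a string far longer than sizeof(std::string) always returns false. Only in the inline case does the c_str() pointer point into storage owned by the vector.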

std::string::c_str() :
The pointer returned may be invalidated by further calls to other member functions that modify the object.
std::vector::emplace_back :
If a reallocation happens, all contained elements are modified.
Since there is no way to know whether a vector reallocation is going to happen when calling emplace_back you have to assume that subsequent use of the earlier return value from string::c_str() leads to undefined behavior.
Since undefined behavior is - undefined - anything can happen. Hence, your code may seem to work or it may seem to fail. It's in error either way.
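One way to remove the undefined behavior, sketched under the assumption that the number of elements is known in advance, is to reserve capacity before taking the pointer. The name riddle_fixed is made up for this sketch, and it returns the text instead of printing it so the result can be checked.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Variant of riddle() that returns the text instead of printing it.
std::string riddle_fixed(std::string input)
{
    std::vector<std::string> strings;
    strings.reserve(2);                    // room for both elements up front
    strings.push_back(std::move(input));
    const char* raw = strings[0].c_str();
    strings.emplace_back("dummy");         // size 2 <= capacity: no reallocation
    assert(strings.capacity() >= 2);
    return std::string(raw);               // raw still points at valid text
}
```

The alternative is to call c_str() only after the last modification of the vector, which needs no knowledge of the final size.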

Related

What happens to the memory in a vector when the vector is re-assigned

What would happen to the vector myv and the memory that it was using if it gets re-assigned to a different vector? More importantly, what happens when you pass a vector as a reference to a function that re-assigns the vector? Is it just a copy type operation, or does the passed in vector reference the right hand side vector? What if that right hand side vector goes out of scope?
TLDR: what does this actually accomplish?
#include <iostream>
#include <string>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
std::vector<std::vector<int> > vectors;
void SetVector(std::vector<int> &v){
v = vectors.at(0);
}
int main()
{
std::vector<int> ints;
ints.push_back(1);
ints.push_back(1);
ints.push_back(1);
vectors.push_back(ints);
std::vector<int> myv;
myv.push_back(6);
myv.push_back(6);
myv.push_back(6);
myv.push_back(6);
SetVector(myv);
for(size_t i = 0; i < myv.size(); i++){
std::cout << myv[i] << std::endl;
}
}

This will call the assignment operator of the vector you pass into SetVector. It's equivalent to doing myv = vectors.at(0). The vector implementation will handle managing the memory of myv (freeing if needed), before copying the values in vectors.at(0) to it.
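A minimal sketch of that behavior follows. The function names are invented for illustration, and the source vector is passed explicitly instead of being read from a global as in the question.

```cpp
#include <cassert>
#include <vector>

// Same shape as SetVector in the question, with the source passed explicitly.
void set_vector(std::vector<int>& target, const std::vector<int>& source)
{
    target = source;  // copy assignment: target gets its own copy of the elements
}

std::vector<int> assign_then_drop_source()
{
    std::vector<int> myv{6, 6, 6, 6};
    {
        std::vector<int> source{1, 1, 1};
        set_vector(myv, source);
        source[0] = 99;        // changes source only
        assert(myv[0] == 1);   // myv is independent of source
    }                          // source is destroyed here
    return myv;                // still holds 1, 1, 1
}
```

Because the assignment copies the elements, the later fate of the right-hand side vector does not affect myv.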

What would happen to [...] the memory that [the vector] was using if it gets re-assigned to a different vector?
A vector's memory management is a black box. If you think your program's correctness hinges upon such a question, you probably are asking the wrong question. Focus on the behavior of a vector, not how that behavior is implemented.
What would happen to the vector myv [...] if it gets re-assigned to a different vector?
In this case, myv becomes a copy of that other vector. The length of myv is adjusted (if needed) to match the other vector, and each element of the other vector is copied to the corresponding element in myv.
I should note that it is possible for the assignment operator to move the contents of the other vector to myv instead of copying. However, your sample code indicates copy-assignment rather than move-assignment so I'll stick to assuming copying instead of moving. (Moving is similar to copying with the additional caveat that the object being assigned from is left in an indeterminate, but valid, state. However, that's a separate topic.)
More importantly, what happens when you pass a vector as a reference to a function that re-assigns the vector?
There is nothing special about vectors in this context. If you have a reference to an object, then that reference is equivalent to the variable naming the object (modulo lifetime considerations); assigning a value to that reference is equivalent to assigning a value to the original object. In this case, the original vector becomes a copy of the other vector.
Is it just a copy type operation, or does the passed in vector reference the right hand side vector?
I'm not sure how to parse this. Perhaps you would be reassured to be told that a reference variable cannot be bound to a different object after initialization? See also What are the differences between a pointer variable and a reference variable in C++?
What if that right hand side vector goes out of scope?
None of the information provided indicates that this has any detrimental effect.

Using a reference member out of scope

This question concerns the function stack and reference members (which I read are considered bad practice in general). My test code:
#include <iostream>
using namespace std;
struct Person
{
Person(const int& s) : score(s) {}
const int& score;
};
int main()
{
Person p(123);
cout << "P's score is: " << p.score << endl;
return 0;
}
We pass an integer to Person's constructor. A temporary object is created because the int literal has to be bound to a const int& (and that's why we need const). Then we make score refer to the constructor's argument. Finally, we exit the constructor and the argument is destroyed.
Output:
P's score is: 123
How come we are still getting the value 123 if the argument was destroyed? It would make sense to me if we copied the argument to the member. My logic tells me the member would point to an empty location which is obviously incorrect. Maybe the argument is not really destroyed but instead it just goes out of scope?
This question arose when I read this question: Does a const reference prolong the life of a temporary?
I find Squirrelsama's answer clear and I thought I understood it until I tried this code.
Update 2/12/2018:
More information about this:
What happens when C++ reference leaves it's scope?
Update 2/18/2018:
This question came from an unclear understanding of how references, pointers and dynamic memory work in C++. To anyone struggling with this, I recommend reading up on those topics.

How come we are still getting the value 123 if the argument was destroyed?
Because nothing guarantees you won't. In C++, accessing an object whose lifetime has ended (and your temporary is dead when you access it) results in undefined behavior. Undefined behavior doesn't mean "crash", or "get empty result". It means the language specification doesn't prescribe an outcome. You can't reason about the results of the program from a pure C++ perspective.
Now what may happen, is that your C++ implementation reserves storage for that temporary. And even though it may reuse that location after p is initialized, it doesn't mean it has to. So you end up reading the "proper value" by sheer luck.

By storing a reference in your object, the only guarantee you have is that you keep track of the object, as long as the object is valid. When the object is not valid anymore, you have access to something not valid anymore.
In your example you allocate a temporary object (123) somewhere, and you keep track of the object, via the reference mechanism. You do not have any guarantee the object you are tracking is still valid when you use this reference.
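Two possible ways out are sketched below: store a copy, or keep the reference member but reject temporaries at compile time. The type and function names are hypothetical. The deleted rvalue overload is one common idiom for this, not the only one.

```cpp
#include <cassert>

// Option 1: own a copy, so there is no lifetime question at all.
struct PersonByValue {
    explicit PersonByValue(int s) : score(s) {}
    int score;
};

// Option 2: keep the reference, but make PersonByRef(123) a compile error.
struct PersonByRef {
    explicit PersonByRef(const int& s) : score(s) {}
    PersonByRef(const int&&) = delete;  // temporaries select this overload
    const int& score;
};

int score_from_temporary()
{
    PersonByValue p(123);  // the value 123 is copied into p
    return p.score;
}

int score_from_named_int()
{
    int stored = 123;
    PersonByRef p(stored);        // refers to a named object that outlives p
    stored = 456;                 // visible through the reference
    assert(&p.score == &stored);
    return p.score;
}
```

With option 2 the caller is still responsible for keeping the referenced int alive for as long as the PersonByRef object is used.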

Modifying a vector reference. What gets invalidated?

Assume I have the following code:
void appendRandomNumbers(vector<double> &result) {
for (int i = 0; i < 10000; i++) {
result.push_back(rand());
}
}
vector<double> randomlist;
appendRandomNumbers(randomlist);
for (double i : randomlist) cout << i << endl;
The repeated push_back() operations will eventually cause a reallocation and I suspect a memory corruption.
Indeed, the vector.push_back() documentation says that
If a reallocation happens, all iterators, pointers and references related to the container are invalidated.
After the reallocation happens, which of the scopes will have a correct vector? Will the reference used by appendRandomNumbers be invalid so it pushes numbers into places it shouldn't, or will the "correct" location be known by appendRandomNumbers only and the vector is deleted as soon as it gets out of scope?
Will the printing loop iterate over an actual vector or over a stale area of memory where the vector formerly resided?
Edit: Most answers right now say that the vector reference itself should be fine. I have a piece of code similar to the one above which caused memory corruption when I modified a vector received by reference and stopped having memory corruption when I changed the approach. Still, I cannot exclude that I incidentally fixed the real reason during the change. Will experiment on this.

I think you are confused about what is going on. push_back() can invalidate iterators and references that point to objects in the vector, not the vector itself. In your situation there will be no invalidation and your code is correct.

The reference vector<double> &result will be fine. The problem would arise if you had something referencing the underlying memory, such as
double& some_value = result[74];
result.push_back(1.0); // assume this caused a reallocation
Now some_value refers to freed memory. The same happens when you access the underlying array using data():
double* values = result.data();
result.push_back(1.0); // again assume this caused a reallocation
Now values points at garbage.

I think you're confused about what gets invalidated. Everything in your example is perfectly well-behaved code. The issue arises when you keep references to data that the vector itself owns. For instance:
vector<double> v;
v.push_back(x);
double& first = v[0];
v.push_back(y);
v.push_back(z);
v.push_back(w);
cout << first;
Here, first is a reference to v's internal data, which could be invalidated by any of the push_back() calls. Unless you specifically accounted for the additional size (for example with reserve()), you should assume that it was invalidated, so the cout is undefined behavior because first is a dangling reference. That's the sort of thing you should be worried about, not situations where you pass the whole vector itself by reference.
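The following sketch shows how a reallocation can be observed and one way to stay safe across it. Both function names are invented for illustration. It relies on the rule that push_back reallocates exactly when the new size would exceed the current capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `value` and reports whether the vector had to reallocate.
bool push_back_reallocates(std::vector<double>& v, double value)
{
    const std::size_t old_capacity = v.capacity();
    v.push_back(value);
    return v.capacity() != old_capacity;
}

// Remembers a position by index, which stays meaningful after reallocation,
// instead of by reference or pointer, which would dangle.
double first_element_after_growth(std::vector<double>& v, std::size_t extra)
{
    assert(!v.empty());
    const std::size_t first = 0;
    for (std::size_t i = 0; i < extra; ++i) v.push_back(0.0);
    return v[first];
}
```

The reference parameter v stays valid throughout in both functions. Only pointers, references and iterators to the elements are at risk.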

C++: Issue with clearing a vector of objects

EDIT: It seems this is a problem with using memset on my struct, rather than with clearing the vector. Thanks to all who have provided advice!
I'm attempting to clear my vector of Subject objects (my own defined class) called people. The vector sits in a struct (pQA) and is defined as the following:
typedef struct _FSTRUCT_
{
const char * filePath;
std::vector<Subject> people;
long srcImageWidth;
long srcImageHeight;
STRUCT_CONFIG_PARAMS * configParam;
unsigned char * imageBuf;
int imageBufLen;
} STRUCT_FSTRUCT;
I am creating the pQA struct by:
STRUCT_FSTRUCT *pQA = NULL;
pQA = new STRUCT_FSTRUCT();
memset(pQA,0,sizeof(STRUCT_FSTRUCT));
I populate 'people' with data by using the Subject class' set methods. This is all fine. What I am wanting to do is then reset 'people', i.e. clear out all data and set the size to 0. I call the below method:
int ResetFaceCollection()
{
if (!pQA->people.empty())
{
pQA->people.clear();
}
}
The clear() line throws a debug assertion failed error message which states "Expression: vector iterators incompatible".
I'm not sure if this has anything to do with Subject's destructor:
Subject::~Subject(void)
{
}
I'm not using any pointers, so from what I've gathered, the destructor looks OK. I have, of course, defined the destructor in my .h file also ~Subject(void);.
I'm a bit lost as to why this is happening. Can anyone provide some insight?
I apologize if I've omitted any necessary code; I can update upon request!

Your std::memset call is (a) redundant, as
pQA = new STRUCT_FSTRUCT(); // <---- note the parens
value-initializes the object, which initializes integers to 0 and pointers to nullptr here,
and (b) actually very wrong:
If the object is not trivially-copyable (e.g., scalar, array, or a C-compatible struct), the behavior is undefined.
Source
Your _FSTRUCT_ contains a non-trivially-copyable object of type std::vector<Subject>, which makes _FSTRUCT_ itself non-trivially-copyable.
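A reduced sketch of the point: the struct below is a cut-down, hypothetical version of the one in the question (FaceData and the empty Subject are stand-ins, not the original definitions). It shows that value-initialization already zeroes the plain members, and that the compiler can confirm the type is not safe to memset.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

struct Subject {};  // hypothetical stand-in for the question's class

struct FaceData {   // reduced version of the struct in the question
    const char* filePath;
    std::vector<Subject> people;
    long srcImageWidth;
    long srcImageHeight;
};

// The std::vector member makes the whole struct non-trivially-copyable.
static_assert(!std::is_trivially_copyable<FaceData>::value,
              "memset on FaceData would be undefined behavior");

std::unique_ptr<FaceData> make_face_data()
{
    // The parentheses request value-initialization: pointers become null,
    // integers become 0, and the vector is default-constructed (empty).
    return std::unique_ptr<FaceData>(new FaceData());
}

void reset_face_collection(FaceData& data)
{
    data.people.clear();  // safe because the vector was never overwritten
    assert(data.people.empty());
}
```

With the memset call removed, clear() operates on a properly constructed vector.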

Note: at first the OP did not show that memset was being used anywhere in the code, which is why I gave this answer. I thought the weird behavior was caused by a problem with clear(), as discussed in the links below.
1) cppreference.com says that clear() invalidates any references, pointers, or iterators referring to contained elements, and may invalidate any past-the-end iterators. It leaves the capacity() of the vector unchanged.
2) cplusplus.com says: A reallocation is not guaranteed to happen, and the vector capacity is not guaranteed to change due to calling this function. A typical alternative that forces a reallocation is to use swap:
vector<T>().swap(x); // clear x reallocating
But you can also use this:
int ResetFaceCollection()
{
if (!pQA->people.empty())
{
pQA->people.erase(pQA->people.begin(),pQA->people.end());
}
}
Then check whether it still gives an error.
Here is what is probably the same environment, working fine with g++, clang, and VC++: link
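The difference between clear() and the swap idiom quoted above can be sketched as follows. The function names are hypothetical. The capacity of a default-constructed vector is not fixed by the standard, although it is 0 on the common implementations, so the second result should be read as typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// clear() destroys the elements but keeps the allocated storage.
std::size_t capacity_after_clear(std::vector<int> v)
{
    v.clear();
    assert(v.empty());
    return v.capacity();
}

// Swapping with an empty temporary hands the old storage to the temporary,
// which releases it when it is destroyed at the end of the statement.
std::size_t capacity_after_swap_trick(std::vector<int> v)
{
    std::vector<int>().swap(v);
    assert(v.empty());
    return v.capacity();
}
```

Both functions take the vector by value so the caller's vector is left untouched.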

Is it wrong to dereference a pointer to get a reference?

I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value. And I feel dirty converting back to a reference, it just seems wrong.
Is it?
To clarify...
MyType *pObj = ...
MyType &obj = *pObj;
Isn't this 'dirty', since you can (even if only in theory since you'd check it first) dereference a NULL pointer?
EDIT: Oh, and you don't know if the objects were dynamically created or not.

Ensure that the pointer is not NULL before you try to convert the pointer to a reference, and that the object will remain in scope as long as your reference does (or remain allocated, in reference to the heap), and you'll be okay, and morally clean :)
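The null check described here can be wrapped in a small helper. This is only a sketch: deref_checked and bump are made-up names, and throwing std::invalid_argument is one possible policy among several (returning an error code or asserting would also work).

```cpp
#include <cassert>
#include <stdexcept>

// Converts a pointer to a reference, refusing null pointers.
template <typename T>
T& deref_checked(T* pointer)
{
    if (pointer == nullptr) throw std::invalid_argument("null pointer");
    return *pointer;
}

// Example caller: increments the pointed-to int through a reference.
int bump(int* maybe_value)
{
    int& value = deref_checked(maybe_value);
    assert(&value == maybe_value);  // the reference names the same object
    ++value;
    return value;
}
```

The helper checks only for null. It cannot tell whether the pointed-to object is still alive, so the lifetime condition remains the caller's job.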

Initialising a reference with a dereferenced pointer is absolutely fine, nothing wrong with it whatsoever. If p is a pointer, and if dereferencing it is valid (so it's not null, for instance), then *p is the object it points to. You can bind a reference to that object just like you bind a reference to any object. Obviously, you must make sure the reference doesn't outlive the object (like any reference).
So for example, suppose that I am passed a pointer to an array of objects. It could just as well be an iterator pair, or a vector of objects, or a map of objects, but I'll use an array for simplicity. Each object has a function, order, returning an integer. I am to call the bar function once on each object, in order of increasing order value:
void bar(Foo &f) {
// does something
}
bool by_order(Foo *lhs, Foo *rhs) {
return lhs->order() < rhs->order();
}
void call_bar_in_order(Foo *array, int count) {
std::vector<Foo*> vec(count); // vector of pointers
for (int i = 0; i < count; ++i) vec[i] = &(array[i]);
std::sort(vec.begin(), vec.end(), by_order);
for (int i = 0; i < count; ++i) bar(*vec[i]);
}
The reference that my example has initialized is a function parameter rather than a variable directly, but I could just have validly done:
for (int i = 0; i < count; ++i) {
Foo &f = *vec[i];
bar(f);
}
Obviously a vector<Foo> would be incorrect, since then I would be calling bar on a copy of each object in order, not on each object in order. bar takes a non-const reference, so quite aside from performance or anything else, that clearly would be wrong if bar modifies the input.
A vector of smart pointers, or a boost pointer vector, would also be wrong, since I don't own the objects in the array and certainly must not free them. Sorting the original array might also be disallowed, or for that matter impossible if it's a map rather than an array.

No. How else could you implement operator=? You have to dereference this in order to return a reference to yourself.
Note though that I'd still store the items in the STL container by value -- unless your object is huge, overhead of heap allocations is going to mean you're using more storage, and are less efficient, than you would be if you just stored the item by value.

My answer doesn't directly address your initial concern, but it appears you encounter this problem because you have an STL container that stores pointer types.
Boost provides the ptr_container library to address these types of situations. For instance, a ptr_vector internally stores pointers to types, but returns references through its interface. Note that this implies that the container owns the pointer to the instance and will manage its deletion.
Here is a quick example to demonstrate this notion.
#include <string>
#include <boost/ptr_container/ptr_vector.hpp>
void foo()
{
boost::ptr_vector<std::string> strings;
strings.push_back(new std::string("hello world!"));
strings.push_back(new std::string());
const std::string& helloWorld(strings[0]);
std::string& empty(strings[1]);
}

I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value.
Just to be clear: STL containers were designed to support certain semantics ("value semantics"), such as "items in the container can be copied around." Since references aren't rebindable, they don't support value semantics (i.e., try creating a std::vector<int&> or std::list<double&>). You are correct that you cannot put references in STL containers.
Generally, if you're using references instead of plain objects you're either using base classes and want to avoid slicing, or you're trying to avoid copying. And, yes, this means that if you want to store the items in an STL container, then you're going to need to use pointers to avoid slicing and/or copying.
And, yes, the following is legit (although in this case, not very useful):
#include <iostream>
#include <vector>
// note signature, inside this function, i is an int&
// normally I would pass a const reference, but you can't add
// a "const int*" to a "std::vector<int*>"
void add_to_vector(std::vector<int*>& v, int& i)
{
v.push_back(&i);
}
int main()
{
int x = 5;
std::vector<int*> pointers_to_ints;
// x is passed by reference
// NOTE: this line could have simply been "pointers_to_ints.push_back(&x)"
// I simply wanted to demonstrate (in the body of add_to_vector) that
// taking the address of a reference returns the address of the object the
// reference refers to.
add_to_vector(pointers_to_ints, x);
// get the pointer to x out of the container
int* pointer_to_x = pointers_to_ints[0];
// dereference the pointer and initialize a reference with it
int& ref_to_x = *pointer_to_x;
// use the reference to change the original value (in this case, to change x)
ref_to_x = 42;
// show that x changed
std::cout << x << '\n';
}
Oh, and you don't know if the objects were dynamically created or not.
That's not important. In the above sample, x is on the stack and we store a pointer to x in pointers_to_ints. Sure, pointers_to_ints uses a dynamically-allocated array internally (and delete[]s that array when the vector goes out of scope), but that array holds the pointers, not the pointed-to things. When pointers_to_ints falls out of scope, the internal int*[] is delete[]-ed, but the int*s are not deleted.
This, in fact, makes using pointers with STL containers hard, because the STL containers won't manage the lifetime of the pointed-to objects. You may want to look at Boost's pointer containers library. Otherwise, you'll either (1) want to use STL containers of smart pointers (like boost::shared_ptr, which is legal for STL containers) or (2) manage the lifetime of the pointed-to objects some other way. You may already be doing (2).

If you want the container to actually contain objects that are dynamically allocated, you shouldn't be using raw pointers. Use unique_ptr or whatever similar type is appropriate.
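A minimal sketch of that suggestion, assuming C++14 for std::make_unique (the function name demo_owned_strings is made up). The container owns the strings, and references are obtained by dereferencing the smart pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

std::string demo_owned_strings()
{
    std::vector<std::unique_ptr<std::string>> strings;
    strings.push_back(std::make_unique<std::string>("hello world!"));
    strings.push_back(std::make_unique<std::string>());

    // Dereference the smart pointer to get an ordinary reference.
    std::string& hello = *strings[0];

    // Growing the vector moves the unique_ptrs, not the strings they own,
    // so `hello` keeps referring to the same heap-allocated string.
    for (int i = 0; i < 1000; ++i) {
        strings.push_back(std::make_unique<std::string>("filler"));
    }
    assert(&hello == strings[0].get());

    hello += " again";
    return *strings[0];
}  // every string is deleted automatically when the vector is destroyed
```

The reference stays valid only while the owning unique_ptr is alive, so erasing that element or destroying the vector still invalidates it.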

There's nothing wrong with it, but please be aware that on machine-code level a reference is usually the same as a pointer. So, usually the pointer isn't really dereferenced (no memory access) when assigned to a reference.
So in practice the reference can end up referring to address 0, and the crash occurs only when the reference is used, which can be much later than its initialization.
Of course what happens exactly heavily depends on compiler version and hardware platform as well as compiler options and the exact usage of the reference.
Officially the behaviour of dereferencing a 0-Pointer is undefined and thus anything can happen. This anything includes that it may crash immediately, but also that it may crash much later or never.
So always make sure that you never bind a reference to a dereferenced 0-pointer: bugs like this are very hard to find.
Edit: Made the "usually" italic and added paragraph about official "undefined" behaviour.
