C++, threads, and pointers - c++

I am using std::thread to execute multiple threads. I pass a pointer to an array as an argument, something akin to:
my_type* rest[count];
//Fill rest array
std::thread(fnc, rest, count);
The issue I seem to be having, is that somewhere along the way, the pointer values in 'rest' get corrupted. I print out the pointer values before the call to std::thread, and first thing in the function that std::thread calls on my behalf, and the values do not match. It seems fairly random, sometimes they will match, and sometimes not (and a segfault results when the latter happens).
I know (from what little I could find on the topic) that std::thread copies the arguments, and I am thinking that my issue stems from this, and that there is a special function std::ref() that allows it to pass references, but none of them mention pointers specifically. I have tried various techniques to attempt to pass this array with std::ref() but I have yet to solve the issue.
Am I correct in thinking that this could be the cause of my issue, or am I barking up the wrong tree?

If it gets converted in some fashion (the array pointer, not the contents) then I would have a problem.
Yes, that's exactly what happens.
It's often incorrectly said that arrays are just pointers. The truth of the matter is that whenever you declare a function that takes an array:
void foo(int x[10]);
The declaration is 'adjusted' so that the parameter is a pointer:
void foo(int *x); // C++ can't tell the difference between this and the first declaration
and when you call the function:
int x[10];
foo(x);
There's an implicit conversion equivalent to the following:
int x[10];
int *tmp = &x[0];
foo(tmp);
So what happens is that you have a block of memory containing your pointers to long lived objects:
my_type *rest[count] = {new my_type, new my_type, new my_type};
You pass a pointer to that block of memory to the thread:
thread(fnc, &rest[0], count);
Then when the function returns rest goes out of scope, and that block of memory is no longer valid.
Then the thread follows the pointer to the block of memory and reads garbage. If by some chance it does read the correct array contents then it can access the long lived objects just fine. The problem is getting the pointers to the long lived objects from the corrupt block of memory where rest used to be on the stack.
Is there a way to suppress this behavior?
In most cases the only thing that makes sense is not to use raw arrays as function parameters. You can wrap a raw array in a struct and get the sensible behavior:
struct int_array {
int x[10];
};
void foo(int_array x);
int main() {
int_array x = {1,2,3,4,5,6,7,8,9,0};
foo(x); // the array is copied rather than getting strangely converted
}
This is pretty much exactly what std::array does, so you're better off using it.
In cases where you don't want a copy of the array you can take a reference to the array:
int foo(int (&x)[10]);
This gives you essentially the same behavior as the weird 'adjustments' and implicit conversions that are done behind your back with int foo(int x[10]); foo(x);. The benefit here is that it's explicit and that you get type checking on the size of the array. That is, due to the 'adjustment' the following does not result in a compiler error:
int foo(int x[10]);
int x[3];
foo(x);
Whereas this will:
int foo(int (&x)[10]);
int x[3];
foo(x); // the implicit conversion to &x[0] does not happen when the function takes a reference to array

Just so you are aware of the risk in your code, try executing this:
#include <chrono>
#include <thread>
#include <iostream>
void f() { std::cout << "hello" << std::endl; }
int main()
{
{
auto t = std::thread(f);
std::cout << "0" << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "1" << std::endl;
}
std::cout << "2" << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(400));
std::cout << "3" << std::endl;
}
You'll see that 2 and 3 are never printed, because the application is always terminated: t is still joinable when it is destroyed at the end of the inner block, and the destructor of a joinable std::thread calls std::terminate.
In truth, it's more subtle, since in my sample I've moved the thread into t. When I mirror your original sample and don't assign the thread to any variable, I see no early termination, but "hello" is never printed. (Perhaps an optimization eliminated the unused temporary, or it was destroyed before becoming joinable; I'm not sure.)

Related

Using std::vector from an object referenced by shared_ptr after shared_ptr's destruction

I apologise if the title is different from what I will be describing, I don't quite know how to describe it apart from using examples.
Suppose I have a shared_ptr of an object, and within that object, is a vector. I assign that vector to a variable so I can access it later on, and the shared_ptr gets destroyed as it goes out of scope. Question, is the vector I saved "safe" to access?
In the example below, from main(), outer() is called, and within outer(), inner() is called. inner() creates a shared_ptr to an object that contains a std::vector, and assigns it to a variable passed by reference. The role of outer() is to create some form of separation, so that we know that the shared_ptr is destroyed. In main(), this referenced variable is accessed, but is it safe to use this variable?
#include <iostream>
#include <vector>
#include <memory>
struct sample_compound_obj {
std::vector<int> vektor;
sample_compound_obj(){std::cout << "I'm alive!" << std::endl;};
~sample_compound_obj(){std::cout << "Goodbye, thank you forever!" << std::endl;};
};
bool inner(std::vector<int>& input) {
std::cout << "About to create sample_compound_obj..." << std::endl;
std::shared_ptr<sample_compound_obj> hehe(new sample_compound_obj);
hehe->vektor.push_back(1);
hehe->vektor.push_back(2);
hehe->vektor.push_back(3);
input = hehe->vektor;
std::cout << "About to return from inner()..." << std::endl;
return true;
}
bool outer(std::vector<int>& input) {
std::cout << "About to enter inner()..." << std::endl;
inner(input);
std::cout << "About to return from outer()..." << std::endl;
return true;
}
int main() {
std::cout << "About to enter outer()..." << std::endl;
std::vector<int> vector_to_populate;
outer(vector_to_populate);
for (std::vector<int>::iterator it = vector_to_populate.begin(); it != vector_to_populate.end(); it++) {
std::cout << *it <<std::endl; // <-- is it even "safe" to access this vector
}
}
https://godbolt.org/z/47EWfPGK3
To avoid XY problem, I first thought of this issue when I was writing some ROS code, where a subscriber callback passes by reference the incoming message as a const shared_ptr&, and the message contains a std::vector. In this callback, the std::vector is assigned (via =) to a global/member variable, to be used some time later, after the end of the callback, so presumably the original shared_ptr is destroyed. One big difference is that in my example, I passed the std::vector by reference between the functions, instead of a global variable, but I hope it does not alter the behavior. Question is, is the std::vector I have "saved", suitable to be used?
is it safe to use this variable?
Yes, in the below statement, you copy the whole vector using the std::vector::operator= overload doing copy assignment. The two vectors do not share anything and live their separate lives and can be used independently of each other.
input = hehe->vektor;
In this case it's safe, because you get copy of the vector in this line:
input = hehe->vektor;
One big difference is that in my example, I passed the std::vector by reference between the functions, instead of a global variable, but I hope it does not alter the behavior.
Any reference can be bound only once, and in your scenario the input reference is already bound to the argument passed (to be precise, std::vector<int>& input of the inner function is bound to std::vector<int>& input of the outer function, which itself is bound to std::vector<int> vector_to_populate). After a reference is bound, it acts as the object itself, so in the assignment statement you actually end up calling something like this:
input.operator=(hehe->vektor);
Where operator= refers to the std::vector<T>::operator=(const std::vector<T>& rhs) function.
If you're familiar with the RAII concept and C++ references, you can pretty much skip the following explanations
In C++, vectors, and as a matter of fact, pretty much all structs, defined in std work very differently from what you might be used to in other higher level languages, like Java or C#. In C++, the RAII (resource acquisition is initialization) technique is used. I highly recommend that you actually read the article, but in short, it means that an object will define a constructor and a destructor, that allocate and free all memory used by the object, in your case, a std::vector, and the language is going to call the destructor when the object falls out of scope. This ensures that there are no memory leaks.
However, how would we go about passing a std::vector to a function, for example? Well, if we just made a straight up copy of the object, byte by byte, as we'd do in C, the function would run fine, until we reach the end of the function, and we call the destructor of the vector. In that case, after the function executes, when we go back to the caller, the vector is no longer valid, because its data got freed by the callee.
void callee(std::vector<int> vec) { }
void caller() {
std::vector<int> vec;
vec.push_back(10);
callee(vec);
vec.push_back(10); // This will break, with our current logic
}
Well, the keyword here is "copy". We copied the vector so that we can pass it in the callee function. We can solve this issue by creating custom copy behavior, in C++ terms that would be a copy constructor. In simple terms, a copy constructor takes an instance of the type itself as an argument, and copies it in the current instance. This allows us now to execute the code from above without any issues.
There are a lot more intricacies to it than I've written here, but other people have said it better than I have. In short, whenever you pass an argument by value you use the copy constructor, and whenever you assign to an existing object you use the copy assignment operator (with some exceptions). In your case, you assign a vector value to a vector variable. The value gets copied, so the copy stays valid for as long as the destination variable lives, regardless of what happens to the original. There is a catch, though: if you modify the copy, you modify only the copy, not the original. If you want to modify the original, you'll need to use references.
In C++, we have the concept of references. You can consider the reference something like a pointer, with some caveats to it. First of all, unlike pointers, you can't have a reference to a reference. The reference tells C++ that you don't work with the object itself, but an "alias" of the object. The object will exist in one place in the memory, but you will be able to access it in two places:
int a = 10;
int &ref_a = a;
std::cout << ref_a << ", " << a << "\n"; // 10, 10
ref_a = 5;
std::cout << ref_a << ", " << a << "\n"; // 5, 5
a = 8;
std::cout << ref_a << ", " << a << "\n"; // 8, 8
In the line int &ref_a = a, we don't actually copy a, but we tell ref_a that it is a reference of a. This means that any operations (including assignment) will be applied to a, not to ref_a. We can use references with variables, fields, return values and parameters. A reference is valid as long as the value it refers to is valid, so as soon as the value goes out of scope, the reference is no longer valid.
References can be used in parameters, in order to avoid copying the value. This can provide a lot of performance benefits, since we don't need to copy the value, but just pass a "pointer" to it. Of course, this means that if the function modifies the parameter, that is reflected in the caller:
void func(int &ref) {
ref = 5;
}
void func2() {
int a = 10;
func(a);
std::cout << a; // 5
}
TL; DR
In your case, you're returning a vector via an out parameter (reference parameter). Even when you're working with references, assigning to a reference actually assigns to the object behind the reference, and uses the copy assignment operator of the object. You can avoid the copy by making the parameter a reference to a pointer to a vector, but working with raw pointers in C++ is strongly advised against. Regardless, the answer to your question is that this code is completely safe. Still, if you modify input in outer, you won't modify the vector in hehe; you'll modify the copy that inner has created.

Why do references passed as arguments work for modifying variables via a function?

As I got stuck between whether shortening the question or clarifying it, I do not think I can be understood with the question sentence. I tried to explain myself with examples here:
Let's see a non-working void function for decreasing a variable by 1 :
#include <iostream>
using namespace std;
void decrease (int a)
{
a--;
}
int main ()
{
int x = 17;
decrease (x);
cout << x << endl;
}
As expected it prints 17. The reason is super clear because when a variable is passed to a function, actually what it is passed is a copy of the original variable (they hold same values, but they are in different addresses) and therefore all the modifications are done to the copy of that variable, and the copy is destroyed after function terminates.
What we need here is pointers. If we modify the above code piece like this :
#include <iostream>
using namespace std;
void decrease (int* a)
{
(*a)--;
}
int main ()
{
int x = 17;
decrease (&x);
cout << x << endl;
}
It works because what we passed in the line decrease (&x) is a copy of x's address. The decrement is applied to the object that our copy points to, and because the copy holds x's address, the decrement affects the value stored at that address, which is the value of the original x.
So far all is understandable. But when we solve the issue by references like below :
#include <iostream>
using namespace std;
void decrease (int&a)
{
a--;
}
int main ()
{
int x = 17;
int& y = x;
decrease (y);
cout << x << endl;
}
How is this situation explained? When we decrease the copy of the reference of x, isn't it the same with trying to decrease the copy of x?
If so, what makes references work here? Why does it work when the first code piece doesn't work?
How does it modify the actual value of x?
NOTE : I found a post of user @Angew is no longer proud of SO stating that whether a reference takes memory or not is unspecified. As he stated, if we consider that references take memory in this kind of implementation and references behave like pointers I think it gets explainable, but still confused with the usage.
EDIT : I was using C and I decided to move on with C++, that is why I can't get the logic of references.
Let's make this simple:
The meaning of "reference" is to refer to a specific variable.
You tell the compiler: "I want to work with this variable under another name, not make a copy".
How the compiler accomplishes this, is up to whoever writes the compiler (that is what the question you linked tries to explain).
But the meaning of the syntax is "use the variable itself".
As for pointers, they just hold an address.
It can be any address, and it is really just a number.
You can do math with it, you can reinterpret the data stored there into different types, and so on.
The use of pointers you demonstrate is just one single use out of many possible ones.
Another way to think about it is that when a variable is passed to the function by reference, it is as if the variable were global, and the function could just reach and touch it, but using the name of the parameter (as an alias) instead of the original variable name.

Unique pointer still holds the object after moving

I'm going through some tutorials on how smart pointers work in C++, but I'm stuck on the first one I tried: the unique pointer. I'm following guidelines from wikipedia, cppreference and cplusplus. I've also looked at this answer already. A unique pointer is supposed to be the only pointer that has ownership over a certain memory cell/block if I understood this correctly. This means that only the unique pointer (should) point to that cell and no other pointer. From wikipedia they use the following code as an example:
std::unique_ptr<int> p1(new int(5));
std::unique_ptr<int> p2 = p1; //Compile error.
std::unique_ptr<int> p3 = std::move(p1); //Transfers ownership. p3 now owns the memory and p1 is rendered invalid.
p3.reset(); //Deletes the memory.
p1.reset(); //Does nothing.
Until the second line, that worked fine for me when I test it. However, after moving the first unique pointer to a second unique pointer, I find that both pointers have access to the same object. I thought the whole idea was for the first pointer to be rendered useless so to speak? I expected a null pointer or some undetermined result. The code I ran:
class Figure {
public:
Figure() {}
void three() {
cout << "three" << endl;
}
};
class SubFig : public Figure {
public:
void printA() {
cout << "printed a" << endl;
}
};
int main()
{
unique_ptr<SubFig> testing (new SubFig());
testing->three();
unique_ptr<SubFig> testing2 = move(testing);
cout << "ok" << endl;
int t;
cin >> t; // used to halt execution so I can verify everything works up til here
testing->three(); // why is this not throwing a runtime error?
}
Here, testing has been moved to testing2, so I'm surprised to find I can still call the method three() on testing.
Also, calling reset() doesn't seem to delete the memory like it said it would. When I modify the main method to become:
int main()
{
unique_ptr<SubFig> testing (new SubFig());
testing->three();
unique_ptr<SubFig> testing2 = move(testing);
cout << "ok" << endl;
int t;
cin >> t;
testing.reset(); // normally this should have no effect since the pointer should be invalid, but I added it anyway
testing2.reset();
testing2->three();
}
Here I expect three() not to work for testing2 since the example from wikipedia mentioned the memory should be deleted by resetting. I'm still seeing three printed as if everything is fine. That seems weird to me.
So can anyone explain to me why:
moving from one unique pointer to another unique pointer doesn't make the first one invalid?
resetting does not actually remove the memory? What's actually happening when reset() is called?
Essentially you invoke a member function through a null pointer:
int main()
{
SubFig* testing = nullptr;
testing->three();
}
... which is undefined behavior.
From 20.8.1 Class template unique_ptr (N4296)
4 Additionally, u can, upon request, transfer ownership to another
unique pointer u2. Upon completion of such a transfer, the following
postconditions hold:
u2.p is equal to the pre-transfer u.p,
u.p is equal to nullptr, and
if the pre-transfer u.d maintained state, such state has been transferred to u2.d.
(emphasis mine)
After the std::move() the original pointer testing is set to nullptr.
The likely reason std::unique_ptr doesn't check for null access to throw a runtime error is that it would slow down every time you used the std::unique_ptr. By not having a runtime check the compiler is able to optimize the std::unique_ptr call away entirely, making it just as efficient as using a raw pointer.
The reason you didn't get a crash when calling the nullptr is likely because the function you called doesn't access the (non-existent) object's memory. But it is undefined behavior so anything could happen.
After std::unique_ptr<int> p3 = std::move(p1); your original pointer p1 holds nullptr, so dereferencing it results in undefined behavior. Simply stated, never ever do it.

Return newly allocated pointer or update the object through parameters?

I'm actually working on pointers to user-defined objects but for simplicity, I'll demonstrate the situation with integers.
int* f(){
return new int(5);
}
void f2(int* i){
*i = 10;
}
int main(){
int* a;
int* b = new int();
a = f();
f2(b);
std::cout << *a << std::endl; // 5
std::cout << *b << std::endl; // 10
delete b;
delete a;
return 0;
}
Consider that in functions f() and f2() there are some more complex calculations that determine the value of the pointer to be returned (f()) or updated through parameters (f2()).
Since both of them work, I wonder if there is a reason to choose one over the other?
From looking at the toy code, my first thought is just to put the f/f2 code into the actual object's constructor and do away with the free functions entirely. But assuming that isn't an option, it's mostly a matter of style/preference. Here are a few considerations:
The first is easier to read (in my opinion) because it's obvious that the pointer is an output value. When you pass a (nonconst) pointer as a parameter, it's hard to tell at a glance whether it's input, output or both.
Another reason to prefer the first is if you subscribe to the school of thought that says objects should be made immutable whenever possible in order to simplify reasoning about the code and to preclude thread safety problems. If you're attempting that, then the only real choice is for f() to create the object, configure it, and return const Foo* (again, assuming you can't just move the code to the constructor).
A reason to prefer the second is that it allows you to configure objects that were created elsewhere, and the objects can be either dynamic or automatic. Though this can actually be a point against this approach depending on context--sometimes it's best to know that objects of a certain type will always be created and initialized in one spot.
If the allocation function f() does the same thing as new, then just call new. You can do whatever initialisation in the object's construction.
As a general rule, try to avoid passing around raw pointers, if possible. However that may not be possible if the object must outlive the function that creates it.
For a simple case, like you have shown, you might do something like this.
void DoStuff(MyObj &obj)
{
// whatever
}
int Func()
{
MyObj o(someVal);
DoStuff(o);
// etc
}
f2 is better only because the ownership of the int is crystal clear, and because you can allocate it however you want. That's reason enough to pick it.

C++ Parameter Reference

void (int a[]) {
a[5] = 3; // this is wrong?
}
Can I do this so that the array that is passed in is modified?
Sorry for deleting, a bit new here...
I have another question which might answer my question:
If I have
void Test(int a) {
}
void Best(int &a) {
}
are these two statements equivalent?
Test(a);
Best(&a);
void Test(int a[])
{
a[5] = 3;
}
is just alternate syntax for:
void Test(int* a)
{
*(a+5) = 3;
}
No array is passed, just a pointer. The original array is modified.
As for your second revision, given:
void Test(int a)
{
}
void Best(int &a)
{
}
then
Test(aa); // Passes aa by value. Changes to a in Test() do not affect aa
Best(aa); // Passes aa by reference. Changes to a DO affect aa
Best(&aa); // Is a compile error: passes a pointer (int*) where an int& is expected.
If you get the variable not by reference and not by pointer, it means that the function is essentially isolated, getting an ad-hoc copy of a. No matter what you do (without trying to hack the stack or things like that) you wouldn't have access to that value in the calling context.
If you know something about the calling context, you may be able to do things based on some anticipation of stack contents, but it's generally a bad idea.
If your function takes int a[], which is essentially int* a, then yes, you can alter the contents of the cell that a points to, but you won't be able to make the caller's pointer point at something else.
Nope.
Your options for altering a value from outside the function are call by reference f(int& a), call by pointer f(int* a), and using a global (shudder...) variable.
Read the answer given here about the difference of int[] and int* in a parameter list: Difference between char* and char[] . I've really put so much love into that answer! :)
Regarding your question about your Test and Best functions, James Curran provided an excellent answer.
Your original Function should work.
If you give it a name:
#include <iostream>
// Arrays always decay to pointers when passed like this.
void plop(int a[]) // Make sure this function has a name.
{
a[5] = 3;
}
int main()
{
int test[] = { 1,1,1,1,1,1,1,1};
plop(test);
std::cout << test[5] << std::endl;
}
This is because arrays always decay into pointers when passed as an argument to a function declared like this. So this should always work as expected, assuming you don't index beyond the end of the array. Inside plop there is no way to determine the size of the array passed.
The primary motivator for passing arrays by reference is to prevent stack overflows and needless copying of large objects. For example, imagine if I had a function like this:
void foo(int x[500000000000]);
The stack would probably overflow the first time you called the function if all arrays were passed by value (but of course this is an obvious exaggeration).
This will become useful when using object-oriented methods. Suppose instead of an array, you had this:
void foo(SomeClass x);
where SomeClass is a class with 500000000000 data members. If you called a function like this, the compiler would copy x member by member, which would be a very long process to say the least. The same concept as you use in arrays still applies, but you have to specify manually that the argument is to be passed by reference:
void foo(SomeClass &x);
(and don't go trying to create a 500000000000 element array to begin with unless you have a 64 bit machine and lots of RAM)