Thread function with passed by reference vector is slow to start

I've been looking at C++0x threads and have this code:
#include <vector>
#include <iostream>
#include <thread>
using namespace std;
// Thread entry point: takes the vector by const reference.
void TestFunc(const vector<int>& vVec)
{
    cout << "in" << endl;
}
int main()
{
    int sizer = 400000000;
    vector<int> vTest(sizer);
    for (int f = 0; f < sizer; f++)
        vTest[f] = f;
    cout << "V created." << endl;
    thread one(TestFunc, vTest);
    one.join();
}
As you can see it just passes a vector to a thread.
The thing I don't understand is that there is a pause after the message "V created" appears. Originally this (I assumed) was the vector being copied for use in the function.
To stop this I passed by reference instead but this made no difference.
The delay seems to be proportional to the size of the vector which indicates that it is still copying (or doing something with the array).
If I try the same experiment without threads and just call the function directly the delay is there when passing by value but not when passing by reference as I expected.
I tried the same using Boost threads instead of C++0x (even though I've read that they are much the same) and got the same result.
Is there some reason for this behaviour or have I missed something blindingly obvious?
Thanks.
Sorry, posted the wrong test code. Corrected.
Edit: Added includes as requested.
Compiled with:
g++44 -std=c++0x -lpthread tester.cpp -o test
...as I have GCC 4.4 installed alongside the standard GNU compiler that comes with my Linux (CentOS); the standard one doesn't support C++11.

I'm just speculating, since you haven't posted the version of the code that uses threads, but I would suspect your problem is that, by default, std::bind (or boost::bind) makes copies of all the arguments you bind. To avoid this, you can use std::ref or std::cref.
To make this concrete, you're probably using bind like this:
std::bind(TestFunc, vTest)
Instead, you should use it like this:
std::bind(TestFunc, std::cref(vTest));

Where are the threads here? Looks like the for loop is causing the delay you are referring to. Nothing unusual here - as you are assigning a vector of size 200000000.

Related

Why thread still work with deleted variable?

I expect to crash program with this code:
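#include <chrono>
#include <functional>
#include <iostream>
#include <thread>
using namespace std::chrono_literals; // for the 3s literal
void f(int& ref)
{
    for (auto i{0}; i < 0xfffff; ++i)
    {
        std::clog << i << ":" << ref++ << std::endl;
    }
}
void run(void)
{
    int n = 10;
    // The detached thread keeps a reference to the local variable n.
    std::thread(f, std::ref(n)).detach();
}
int main(void)
{
    run();
    std::this_thread::sleep_for(3s);
}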
I have GCC 9.3 and compile the above program with default parameters. I expected the program to crash, because by the time f(int&) runs, the local variable n declared in run() no longer exists. Instead, the program runs without complaint, incrementing and printing ref until the 3-second sleep in main is over. What am I doing wrong?

Your code has undefined behavior because you are using a reference to access an object whose lifetime already ended. There are some special rules regarding extension of lifetime when binding to a const reference, but that does not apply here.
The C++ standard never guarantees that your program will crash if you make a mistake. The C++ standard only specifies what you get when you compile and run valid code. There would be no gain from specifying things like "If you dereference a null pointer you will get a segmentation fault". Instead the standard specifies what is valid code and mostly stays silent about what happens when you write invalid C++ code. In a nutshell, this is what undefined behavior is.
Repeating a comment of mine:
If you cross a street when the lights are red, you will not necessarily get hit by a car. It just means you should not cross the street when the lights are red (and when you do, you might be hit by a car).
Like any analogy, this isn't a perfect one. Even if you don't see a car coming and you know that you will not get hit, you should not "cross the street" when "the lights are red". That is because code that relies on undefined behavior like yours may appear to work today, but tomorrow with a different compiler, a different version of the same compiler, or even the same compiler targeting a different platform, bad things may happen.

A pathological way of passing an array: by reference to the first element

It seems one can pass an array by reference to the first element:
void passMe(int& firstElement)
{
    int* p = &firstElement;
    std::cout << p[0] << p[1];
}
Main program:
int hat[]={ 5,2,8 };
passMe(hat[0]);
Usually, instead of the above function definition, I would do void passMe(int* myArray) or void passMe(int myArray[]). But would the method above cause any issues? Knowing the answer would allow me to gain a better understanding of the things at play here, if nothing else.

As far as the language-lawyers and the compiler are concerned, it's fine. The main issue will be with the human programmers (including future versions of yourself). When your standard C++ programmer sees this in a program:
void passMe(int& firstElement); // function declaration (in a .h file)
int hat[]={ 5,2,8 };
passMe(hat[0]);
He is going to expect that passMe() might read and/or modify hat[0]. He is definitely not going to expect that passMe() might read and/or modify hat[1]. And when he eventually figures out what you are doing, he's going to be very upset with you for misleading him via unnecessarily "clever" code. :)
Furthermore, if another programmer (who doesn't know about your trick) tries to call your function from his own code, after seeing only the function declaration in the header, he's likely to try to do something like this:
int cane = 5;
passMe(cane);
... which will lead directly to mysterious undefined behavior at runtime when passMe() tries to reference the second item in the array after cane, which doesn't actually exist because cane is not actually in an array.

How can I pass a C++ array of structs to a CUDA device?

I've spent 2 days trying to figure this out and getting nowhere. Say I had a struct that looks like this:
struct Thing {
    bool is_solid;
    double matrix[9];
};
I want to create an array of that struct called things and then process that array on the GPU. Something like:
Thing *things;
int num_of_things = 100;
cudaMallocManaged((void **)&things, num_of_things * sizeof(Thing));
// Something missing here? Malloc individual structs? Everything I try doesn't work.
things[10].is_solid = true; // Segfaults
Is it even best practice to do it this way rather than pass a single struct with arrays that are num_of_things large? It seems to me that can get pretty nasty, especially when you already have arrays (like matrix, which would need to be 9 * num_of_things large).
Any info would be much appreciated!

After some dialog in the comments, it seems that OP's posted code has no issues. I was able to successfully compile and run this test case built around that code, and so was OP:
$ cat t1005.cu
#include <iostream>
struct Thing {
    bool is_solid;
    double matrix[9];
};
int main(){
    Thing *things;
    int num_of_things = 100;
    cudaError_t ret = cudaMallocManaged((void **)&things, num_of_things * sizeof(Thing));
    if (ret != cudaSuccess) {
        std::cout << cudaGetErrorString(ret) << std::endl;
        return 1;
    }
    else {
        things[10].is_solid = true;
        std::cout << "Success!" << std::endl;
        return 0;
    }
}
$ nvcc -arch=sm_30 -o t1005 t1005.cu
$ ./t1005
Success!
$
Regarding this question:
Is it even best practice to do it this way rather than pass a single struct with arrays that are num_of_things large?
Yes, this is a sensible practice and is usable whether managed memory is being used or not. An array of more or less any structure that does not contain embedded pointers to dynamically allocated data elsewhere can be transferred to the GPU in a simple fashion using a single cudaMemcpy call (for example, if managed memory were not being used.)
To address the question about the 3rd (flags) parameter to cudaMallocManaged:
If it is specified, it is not correct to pass zero (although OP's posted code gives no evidence of that.) You should use one of the documented choices.
If it is not specified, this is still valid, and a default argument of cudaMemAttachGlobal is provided. This can be confirmed by reviewing the cuda_runtime.h file or else simply compiling/running the test code above. This particular point appears to be an oversight in the documentation, and I've filed an internal issue at NVIDIA to take a look at that. So it's possible the documentation may change in the future with respect to this.
Finally, proper cuda error checking is always in order any time you are having trouble with a CUDA code, and the use of such may shed some light on any errors that are made. The seg fault that the OP reported in code comments was almost certainly due to the cudaMallocManaged call failing (perhaps because a zero parameter was supplied incorrectly) and as a result the pointer in question (things) had no actual allocation. Subsequent usage of that pointer would lead to a seg fault. My test code demonstrates how to avoid that seg fault, even if the cudaMallocManaged call fails for some reason, and the key is proper error checking.

Why is return value of queue::front() valid after queue::pop()

I am new to C++ and ran into the following supposed bug, but somehow my program just works.
Here is the code:
#include <iostream>
#include <queue>
#include <string>
int main()
{
    std::string s("cat");
    std::queue<std::string> _queue;
    _queue.push(s);
    std::string& s1 = _queue.front();
    _queue.pop();
    // at this point s1 should be invalid, as pop() destroyed the queue's copy of s
    std::cout << s1 << std::endl;
    return 0;
}
It just works, even though s1 is a reference to an invalid object. Is there a way I can assert that s1 truly refers to an invalid object?

Trying to access a destroyed object the way you do it in your code results in undefined behavior. And no, there's no language-provided way to perform a run-time check for this situation. It is entirely your responsibility to make sure things like that do not happen in your code.
The fact that "it just works" in your experiment is just an accident (with a certain degree of typical computer determinism, as usual). Something completely unrelated might change in your program, and this code will no longer "work".

Using "extern C" from C++ with vector for ctypes

My goal is to use ctypes to call from within Python a C++ library I'm creating separately, where I am passing numpy vectors to the C++ routines via pointers. So Python would allocate the memory and just pass the address, and then the C++ routine would access the data and perform its calculations and return the results.
Since I'm new to ctypes, right now I'm gradually building a working toy example. I'm starting off writing the C++ code and creating a C extern interface that the Python wrapper code will use. I haven't even started on the Python wrapper code yet but have some experience with that aspect already.
Here is my example which includes a constructor and a simple total function.
// foo.cpp //
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class Data {
public:
    Data(vector<double>*);
    void PrintTotal(void);
private:
    vector<double>* myContents;
};
Data::Data(vector<double>* input) {
    std::cout << "Entered constructor in C++" << std::endl;
    myContents = input;
}
void Data::PrintTotal(void) {
    double total = 0;
    for (int i=0; i<myContents->size(); i++){
        total += (*myContents)[i];
    }
    std::cout << "Hello, my total is " << total << std::endl;
}
extern "C" {
    Data* Data_new(double (**input)) {return new Data(input);}
    void Print_Total(Data* pData) {pData->PrintTotal();}
}
Note in particular that I'm using the vector class from the STL, which may be part of the problem described below. The idea is that the class Data holds a pointer to the data and does not replicate the data.
When I try to compile this using the command
g++ -c -fPIC foo.cpp -o foo.o
on our Linux machine, I get the following error:
foo.cpp: In function 'Data* Data_new(double**)':
foo.cpp:26: error: no matching function for call to 'Data::Data(double**&)'
foo.cpp:13: note: candidates are: Data::Data(std::vector<double, std::allocator<double> >*)
foo.cpp:6: note: Data::Data(const Data&)
This seems pretty clear to me: it's saying that the way I call the Data constructor in the third to last line (#26) in foo.cpp does not coincide with the constructor I wrote in the C++ part of the code. In both the C++ code and the C extern code, I'm trying to say that the input is a pointer to a vector/array of doubles. I've tried other options on line #26 but it still doesn't compile.
How do I write the extern C part while still using the vector class (which I will find useful when I code the real algorithm) in the C++ part? Am I running into trouble because vector is part of the STL, which doesn't work well with extern C?
Thanks in advance.

First, let's answer your question. You should change your constructor so that it takes the same arguments as Data_new, and converts the input into a std::vector. However, it looks like there are a couple of conceptual issues that you need to focus on first:
Arrays and vectors are not equivalent: if your input points to an array, but you want to work with a vector, you'll need to copy all of the elements from the array into the vector (or ask a vector constructor to do it for you). You need to do the copy because all STL containers hold onto copies of the elements that are put in them.
You can construct the vector using two iterators, with:
myContents(data, data + numberOfElements);
But this brings up the second conceptual point: there is nothing in the signature of your code that tells you how large the input array is.
In Data you're holding onto a pointer to a vector, but you never free it. This is a memory leak. That being said, you should probably just hold onto a vector, not a pointer to a vector.
