Question on function references and threads - c++

I was randomly testing std::thread in my virtual linux machine (GCC 4.4.5-Debian) with this test program:
#include <algorithm>
#include <thread>
#include <iostream>
#include <vector>
#include <functional>
using namespace std;
static int i=0;
void f( vector<int> &test)
{
++i;
cout << "Push back called" << endl;
test.push_back(i);
}
int main()
{
vector<thread> t;
vector<int> test;
for( int i=0; i<1000; ++i )
{
t.push_back( thread( bind(f, test) ) );
}
for( auto it = t.begin(); it != t.end(); ++it )
{
(*it).join();
}
cout << test.size() << endl;
for( auto it = test.begin(); it != test.end(); ++it )
{
cout << *it << endl;
}
return 0;
}
Why does vector test remain empty? Am I doing something stupid with references (probably) or is it something with bind or some threading problem?
Thanks!
UPDATE: with the combined help of Kos and villintehaspan I "fixed" the "problem":
#include <algorithm>
#include <thread>
#include <iostream>
#include <vector>
#include <functional>
using namespace std;
static int i=0;
void f( vector<int> &test)
{
++i;
test.push_back(i);
}
int main()
{
vector<thread> t;
vector<int> test;
for( int i=0; i<1000; ++i )
{
t.push_back( thread(f, std::ref(test)) );
}
for( auto it = t.begin(); it != t.end(); ++it )
{
(*it).join();
}
cout << test.size() << endl;
for( auto it = test.begin(); it != test.end(); ++it )
{
cout << *it << endl;
}
return 0;
}
Which prints all values in order and seems to work OK. Now only one question remains: is this just lucky (aka undefined behavior (TM) ) or is the static variable causing a silent mutex-like step in the code?
PS: I understand the "killing multithreadedness" problem here, and that's not my point. I'm just trying to test the robustness of the basic std::thread functionality...

Looks to me like a threading problem.
While I'm not 100% sure, it should be noted that all 1000 threads:
do ++i on the same int value (it's not an atomic operation- you may encounter problems here, you can use __sync_fetch_and_add(&i,1) instead (note that it's a gcc extension not standard C++);
do push_back simultaneously on a std::vector, which is not a thread-safe container AFAIK... Same for cout I think. I believe you'd need to use a locking mechanism around that (std::mutex perhaps? I've only used pthreads so far but I believe it's what you need).
Note that this kind of kills any benefit of using threads here, but that's a consequence of the fact that you shouldn't use multiple threads at once on a non-thread-safe object.
----EDIT----
I had a google on this threading API (not present on my tdm gcc 4.5 on Windows, unfortunately).
Aparrently instead of:
thread( bind(f, test) )
you can just say
thread( f, test )
and pass an arbitrary number of arguments in this way.
Source: http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=422
This should also solve your problem with making a copy of the vector which I haven't noticed before
(+1 for #villintehaspam here).
Actually, one more thing is needed to make sure the copy isn't created here:
thread( f, std::ref(test) )
will make sure that the vector isn't copied.
Wow, I got confused too. :)

The bind will actually make a copy of the vector, so that each thread push_back's on their own copy (yes, that & won't help here). You need to provide the threads with a pointer or similar so that they use the same vector. You should also make sure to use access protection like suggested by Kos.
Edit: After your fix to use std::ref instead of making a copy of the vector, the multithreaded access problem still remains. My guess is that the only reason you don't get any problems right now is because the example is so trivial (or maybe you've only tried in debug mode) - there is no automatic guarantee that the ++ is atomic just because the int is static.

Related

Why this code does allow to push_back unique_ptr do vector?

so I thought adding unique to vector shouldn't work.
Why does it work for the below code?
Is it cause by not setting copy ctor as "deleted"??
#include <iostream>
#include <vector>
#include <memory>
class Test
{
public:
int i = 5;
};
int main()
{
std::vector<std::unique_ptr<Test>> tests;
tests.push_back(std::make_unique<Test>());
for (auto &test : tests)
{
std::cout << test->i << std::endl;
}
for (auto &test : tests)
{
std::cout << test->i << std::endl;
}
}
There is no copy here, only moves.
In this context, make_unique will produce an instance of unique pointer which is not named, and this push_back sees it as a r-value reference, which it can use as it wants.
It produce pretty much the same result than this code would:
std::vector<std::unique_ptr<Test>> tests;
auto ptr = std::make_unique<Test>();
tests.push_back(std::move(ptr));
This is called move semantics if you want to search more info on the matter. (and this only works from c++11 and beyond)
There are two overloads of std::vector::push_back according to https://en.cppreference.com/w/cpp/container/vector/push_back
In your case you will use the one with rvalue-ref so no copying required.

C++ Variable doesn't change until thread finished

I wrote a very simple code to reproduce my problem.
#include <iostream>
#include "tools.h" //contains s_sleep()
#include <thread>
using namespace std;
void change( int *i)
{
while (true)
{
*i = 4356;
}
}
int main()
{
int v=3;
cout << v <<endl;
thread t(change, &v);
s_sleep(1); //sleep one second
cout << v << endl;
t.join();
}
The output is 3, and after a second 3 again. But when I change one line to
//while ( true )
I receive 3, and a second later 4356.
How can that be?
Hope somebody can help.
Please specify what compiler you are using. I am using Microsoft Visual C++ compiler, and with my visual studio, I see for both time, the output is 3 followed by 4356.
Here is the code I ran on my computer.
#include <ctime>
#include <thread>
#include <iostream>
using namespace std;
void change(int *i) {
while (true) { // commented out this later, the result is same.
*i = 4356;
}
}
int main() {
clock_t tstart = clock();
int v = 3;
cout << v << endl;
thread t(change, &v);
while(double(clock() - tstart)/CLOCKS_PER_SEC < 3.0) { // Instead of your s_sleep
int x = 1; // Just a dummy instruction
}
cout << v << endl;
t.join();
return 0;
}
The explanation to my result is that the thread "t" does not know anything about the variable "v". It just gets a pointer of type int and it edits the value at the pointers location directly to the memory. So, when the main(first) thread
again accesses the variable "v", it simply reads the memory assigned to "v" and prints what it gets.
And also, what code is in the "tools.h"? Does it have anything to do with the variable "v".
If it doesn't, then it must be a compiler variance(Your compiler may be different than mine, maybe gcc or g++?). That means, your compiler must have cached(or something like that) the variable for faster access. And as in the current thread, the variable has not been changed, whenever it is accessed, compiler gives the old value(which the compiler sees as unchanged) of the variable. (I AM NOT SURE ABOUT THIS)
This might also be due to caching. You are first reading a variable frome one thread, then manipulating the variable from another thread and reading it again from the first thread. The compiler cannot know that it changed in the meantime.
To safely do this "v" has to be declared volatile.

Wunused-but-set-variable warning with C++11 auto

I am getting a -Wunused-but-set-variable warning with GCC v4.6 with the code below:
for ( auto i : f.vertexIndices ) {
Sy_MatrixFuzzyHashable< Vector3f > wrapper( cp );
if ( !vMapper.contains( wrapper ) ) {
mesh.vertexNormals() << cp;
i.normal = mesh.vertexNormals().size() - 1;
} else {
i.normal = vMapper.value( wrapper );
}
}
The warning specifically is:
warning: variable 'i' set but not used [-Wunused-but-set-variable]
The warning would make sense if i was a copy of an element, but since vertexIndices is a QList object (an STL-compliant Qt container class) the range-based for loop should call the begin() and end() iterator getters, which will always return a non-const iterator (as long as the container is non-const - which it is).
I can't currently test if it is working as I think it should because I'm changing my code base to take advantage of the new C++11 features, so nothing compiles yet. But I was hoping someone could tell me if this warning is is nonsense, or if I have misunderstood auto and range-based for loops...
I think the issue is that your for loop, as written like this:
for ( auto i : f.vertexIndices )
is getting back a copy of the stored vertex, not a reference to it. The compiler warning here says that you're then setting the value of i but not reading it because you're modifying the temporary copy rather than the stored vertex.
If you change it to
for ( auto& i : f.vertexIndices )
then this problem should go away, since you're actually modifying the vertices stored internally.
Hope this helps!
You misunderstood auto. This loop:
for ( auto i : f.vertexIndices )
should really be :
for ( auto & i : f.vertexIndices )
http://en.wikipedia.org/wiki/Foreach_loop#C.2B.2B
illustrates an example but yes, it needs to be a reference object for the foreach
#include <iostream>
int main()
{
int myint[] = {1,2,3,4,5};
for (int& i: myint)
{
std::cout << i << std::endl;
}
}
or
#include <QList>
#include <QDebug>
int main() {
QList<int> list;
list << 1 << 2 << 3 << 4 << 5;
foreach (int i, list) {
qDebug() << i;
}
}
Courtesy of Wikipedia

About unique_ptr performances

I often read that unique_ptr would be preferred in most situations over shared_ptr because unique_ptr is non-copyable and has move semantics; shared_ptr would add an overhead due to copy and ref-counting;
But when I test unique_ptr in some situations, it appears it's noticably slower (in access) than its counterparts
For example, under gcc 4.5 :
edit : the print method doesn't print anything actually
#include <iostream>
#include <string>
#include <memory>
#include <chrono>
#include <vector>
class Print{
public:
void print(){}
};
void test()
{
typedef vector<shared_ptr<Print>> sh_vec;
typedef vector<unique_ptr<Print>> u_vec;
sh_vec shvec;
u_vec uvec;
//can't use initializer_list with unique_ptr
for (int var = 0; var < 100; ++var) {
shared_ptr<Print> p(new Print());
shvec.push_back(p);
unique_ptr<Print> p1(new Print());
uvec.push_back(move(p1));
}
//-------------test shared_ptr-------------------------
auto time_sh_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var)
{
for(auto it = shvec.begin(), end = shvec.end(); it!= end; ++it)
{
(*it)->print();
}
}
auto time_sh_2 = std::chrono::system_clock::now();
cout <<"test shared_ptr : "<< (time_sh_2 - time_sh_1).count() << " microseconds." << endl;
//-------------test unique_ptr-------------------------
auto time_u_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var)
{
for(auto it = uvec.begin(), end = uvec.end(); it!= end; ++it)
{
(*it)->print();
}
}
auto time_u_2 = std::chrono::system_clock::now();
cout <<"test unique_ptr : "<< (time_u_2 - time_u_1).count() << " microseconds." << endl;
}
On average I get (g++ -O0) :
shared_ptr : 1480 microseconds
unique_ptr : 3350 microseconds
where does the difference come from ? is it explainable ?
UPDATED on Jan 01, 2014
I know this question is pretty old, but the results are still valid on G++ 4.7.0 and libstdc++ 4.7. So, I tried to find out the reason.
What you're benchmarking here is the dereferencing performance using -O0 and, looking at the implementation of unique_ptr and shared_ptr, your results are actually correct.
unique_ptr stores the pointer and the deleter in a ::std::tuple, while shared_ptr stores a naked pointer handle directly. So, when you dereference the pointer (using *, ->, or get) you have an extra call to ::std::get<0>() in unique_ptr. In contrast, shared_ptr directly returns the pointer. On gcc-4.7 even when optimized and inlined, ::std::get<0>() is a bit slower than the direct pointer.. When optimized and inlined, gcc-4.8.1 fully omits the overhead of ::std::get<0>(). On my machine, when compiled with -O3, the compiler generates exactly the same assembly code, which means they are literally the same.
All in all, using the current implementation, shared_ptr is slower on creation, moving, copying and reference counting, but equally as fast *on dereferencing*.
NOTE: print() is empty in the question and the compiler omits the loops when optimized. So, I slightly changed the code to correctly observe the optimization results:
#include <iostream>
#include <string>
#include <memory>
#include <chrono>
#include <vector>
using namespace std;
class Print {
public:
void print() { i++; }
int i{ 0 };
};
void test() {
typedef vector<shared_ptr<Print>> sh_vec;
typedef vector<unique_ptr<Print>> u_vec;
sh_vec shvec;
u_vec uvec;
// can't use initializer_list with unique_ptr
for (int var = 0; var < 100; ++var) {
shvec.push_back(make_shared<Print>());
uvec.emplace_back(new Print());
}
//-------------test shared_ptr-------------------------
auto time_sh_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var) {
for (auto it = shvec.begin(), end = shvec.end(); it != end; ++it) {
(*it)->print();
}
}
auto time_sh_2 = std::chrono::system_clock::now();
cout << "test shared_ptr : " << (time_sh_2 - time_sh_1).count()
<< " microseconds." << endl;
//-------------test unique_ptr-------------------------
auto time_u_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var) {
for (auto it = uvec.begin(), end = uvec.end(); it != end; ++it) {
(*it)->print();
}
}
auto time_u_2 = std::chrono::system_clock::now();
cout << "test unique_ptr : " << (time_u_2 - time_u_1).count()
<< " microseconds." << endl;
}
int main() { test(); }
NOTE: That is not a fundamental problem and can be easily fixed by discarding the use of ::std::tuple in current libstdc++ implementation.
All you did in the timed blocks is access them. That won't involve any additional overhead at all. The increased time probably comes from the console output scrolling. You can never, ever do I/O in a timed benchmark.
And if you want to test the overhead of ref counting, then actually do some ref counting. How is the increased time for construction, destruction, assignment and other mutating operations of shared_ptr going to factor in to your time at all if you never mutate shared_ptr?
Edit: If there's no I/O then where are the compiler optimizations? They should have nuked the whole thing. Even ideone junked the lot.
You're not testing anything useful here.
What you are talking about: copy
What you are testing: iteration
If you want to test copy, you actually need to perform a copy. Both smart pointers should have similar performance when it comes to reading, because good shared_ptr implementations will keep a local copy of the object pointed to.
EDIT:
Regarding the new elements:
It's not even worth talking about speed when using debug code, in general. If you care about performance, you will use release code (-O2 in general) and thus that's what should be measured, as there can be significant differences between debug and release code. Most notably, inlining of template code can seriously decrease the execution time.
Regarding the benchmark:
I would add another round of measures: naked pointers. Normally, unique_ptr and naked pointers should have the same performance, it would be worth checking it, and it need not necessarily be true in debug mode.
You might want to "interleave" the execution of the two batches or if you cannot, take the average of each among several runs. As it is, if the computer slows down during the end of the benchmark, only the unique_ptr batch will be affected which will perturbate the measure.
You might be interested in learning more from Neil: The Joy of Benchmarks, it's not a definitive guide, but it's quite interesting. Especially the part about forcing side-effects to avoid dead-code removal ;)
Also, be careful about how you measure. The resolution of your clock might be less precise than what it appears to be. If the clock is refreshed only every 15us for example, then any measure around 15us is suspicious. It might be an issue when measuring release code (you might need to add a few turns to the loop).

C++: Keep track of times function is called

Keeping track of how many times a function is called is easy when passing the counter as an argument into the function. It's also easy when returning a one from the called function. But, I do not want to go that route. The reason behind this is because it seems like bad programming (letting the function know too much information). Is there a better way to keep track of how many times this function has been called?
I'm just looking for concepts that I could study. Providing code examples is not neccessary, but might be helpful.
Edit: I'm not actually looking for profiling tools. Let me add some code to get my point across. Because scope for funcCounter ends in main, I have no way of getting back a variable from myFunction that will increment funcCounter. I could possibly return 1 from myFunction and then increment funcCounter that way, but this doesn't seem like very good programming. Is there another way to do it?
int main()
{
int funcCounter = 0;
char *mystring = "This is a silly function.";
myFunction(mystring);
cout << "Times function is called: " << funcCounter << endl;
return 0;
}
void myFunction(char *mystring)
{
cout << mystring << endl;
}
Have a static variable in your function and keep incrementing it each time the function in called.
void my_Function(void) {
static unsigned int call_count = 0;
call_count++;
}
If you want to do it for debugging reasons, then there are tools like gcov which do this for you. (I'm pretty sure Microsoft doesn't have an alternative bundled with Microsoft Visual C++)
I would do this through the use of a profiling tool like gcov (which is for linux). These programs do the work of inserting code into your program during compilation and give you a report of how many times a function is called, where its called from, and how long the program spent executing that function.
It sounds like what you are looking for is a profiler. Depending on the platform you are using there are a slew of tools available that can help you hunt down the (ab)uses of a routine.
Please revise your question with the platform for which you need profiling tools.
If the function is part of a class, you can add a static counter to the class, plus an accessor and/or reset functions:
class X
{
private:
/* diagnostics */
static int counter = 0;
int read_counter() const { return counter; }
void reset_counter() { counter = 0; }
public:
/* real code */
fcn() {
++counter;
/* ... */
}
};
The problem with adding a static counter to a standalone function is that there's no way to get at the value.
You could add a global, of course, but instead of a raw global I'd suggest an instance of a singleton containing all your diagnostic code and data.
Use a class like this one, and simply instantiate it at the top of a function (or any other block) like is done in f() below.
Note: There is some overhead for gettimeofday() so you may want to use a different timing method, but that is a completely different topic worthy of it's own question (and has been addressed before on SO).
#include <iostream>
#include <string>
#include <map>
#include <sstream>
#include <ctime>
#include <cstdlib>
#include <sys/time.h>
class PerfStats
{
private:
std::string which_;
timeval begin_;
public:
PerfStats(std::string const &file, int line)
{
std::stringstream ss;
ss << file << ':' << line;
which_ = ss.str();
gettimeofday(&begin_, NULL);
}
~PerfStats()
{
timeval end;
gettimeofday(&end, NULL);
Times[which_] = (end.tv_sec - begin_.tv_sec) + (end.tv_usec - begin_.tv_usec)/1000000.0;
++Counts[which_];
}
static std::map<std::string, double> Times;
static std::map<std::string, unsigned int> Counts;
static void Print()
{
for(std::map<std::string, double>::iterator it = Times.begin(); it != Times.end(); ++it)
std::cout << it->first << " :\t" << it->second << "s" << std::endl;
for(std::map<std::string, unsigned int>::iterator it = Counts.begin(); it != Counts.end(); ++it)
std::cout << it->first << " :\t" << it->second << " times" << std::endl;
}
};
std::map<std::string, double> PerfStats::Times;
std::map<std::string, unsigned int> PerfStats::Counts;
void f()
{
PerfStats(__FILE__, __LINE__);
usleep(1);
}
main()
{
srand(time(NULL));
for(int i = 0; i < rand(); ++i)
f();
PerfStats::Print();
}
Sample output:
test.cpp:54 : 2e-06s
test.cpp:54 : 21639 times
Bad coding style, but maybe adding global variables and if necessary mutex locks may do the trick.