Performance penalty using 'auto' keyword in Visual Studio 2010 - c++

Using the new auto keyword has degraded my code execution times. I narrowed the problem to the following simple code snippet:
#include <iostream>
#include <map>
#include <vector>
#include <deque>
#include <time.h>
using namespace std;
void func1(map<int, vector<deque<float>>>& m)
{
vector<deque<float>>& v = m[1];
}
void func2(map<int, vector<deque<float>>>& m)
{
auto v = m[1];
}
int main() {
map<int, vector<deque<float>>> m;
m[1].push_back(deque<float>(1000,1));
clock_t begin=clock();
for(int i = 0; i < 100000; ++i) func1(m);
cout << "100000 x func1: " << (((double)(clock() - begin))/CLOCKS_PER_SEC) << " sec." << endl;
begin=clock();
for(int i = 0; i < 100000; ++i) func2(m);
cout << "100000 x func2: " << (((double)(clock() - begin))/CLOCKS_PER_SEC) << " sec." << endl;
}
The output I get on my i7 / Win7 machine (Release mode; VS2010) is:
100000 x func1: 0.001 sec.
100000 x func2: 3.484 sec.
Can anyone explain why using auto results in such different execution times?
Obviously, there is a simple workaround, i.e., stop using auto altogether, but I hope there is a better way to overcome this issue.

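You are copying the vector to v: plain auto deduces a value type, never a reference, so every call to func2 copies the whole vector of deques.
Use this instead to bind a reference: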
auto& v = ...

As Bo said, you have to use auto& instead of auto (note that there is also auto* for other cases). Here is an updated version of your code:
#include <functional>
#include <iostream>
#include <map>
#include <vector>
#include <deque>
#include <time.h>
using namespace std;
typedef map<int, vector<deque<float>>> FooType; // this should have a meaningful name
void func1(FooType& m)
{
vector<deque<float>>& v = m[1];
}
void func2(FooType& m)
{
auto v = m[1];
}
void func3(FooType& m)
{
auto& v = m[1];
}
void measure_time(std::function<void(FooType&)> func, FooType& m)
{
clock_t begin=clock();
for(int i = 0; i < 100000; ++i) func(m);
cout << "100000 x func: " << (((double)(clock() - begin))/CLOCKS_PER_SEC) << " sec." << endl;
}
int main()
{
FooType m;
m[1].push_back(deque<float>(1000,1));
measure_time(func1, m);
measure_time(func2, m);
measure_time(func3, m);
}
On my computer, it gives the following output:
100000 x func: 0 sec.
100000 x func: 3.136 sec.
100000 x func: 0 sec.

Related

How can I sort a Boost matrix by column?

I have a 2d boost matrix (boost::numeric::ublas::matrix) of shape (n,m), with the first column being the timestamp. However, the data I'm getting is out of order. How can I sort it with respect to the first column, and what would be the most efficient way to do so? Speed is critical in this particular application.
As I commented, ublas::matrix might not be the most natural choice for a task like this. Here is the naive approach, using matrix_row and some range magic:
Live on Coliru
#define _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/irange.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
using namespace boost::adaptors;
using Matrix = boost::numeric::ublas::matrix<float>;
using Row = boost::numeric::ublas::matrix_row<Matrix>;
static auto by_col0 = [](Row const& a, Row const& b) { return a(0) < b(0); };
int main()
{
constexpr int nrows = 3, ncols = 4;
Matrix m(nrows, ncols);
for (unsigned i = 0; i < m.size1(); ++i)
for (unsigned j = 0; j < m.size2(); ++j)
m(i, j) = (10 - 3.f * i) + j;
std::cout << "before: " << m << "\n";
auto getrow = [&](int i) { return Row(m, i); };
sort(boost::irange(nrows) | transformed(getrow), by_col0);
std::cout << "after: " << m << "\n";
}
This sadly confirms that the proxy abstraction doesn't hold:
before: [3,4]((10,11,12,13),(7,8,9,10),(4,5,6,7))
after: [3,4]((10,11,12,13),(10,11,12,13),(10,11,12,13))
Oops.
Analysis?
I can't say I know what's wrong. std::sort is defined in terms of ValueSwappable which at first glance seems to work fine for matrix_row:
auto r0 = Row(m, 0);
auto r1 = Row(m, 1);
using std::swap;
swap(r0, r1);
Prints Live On Coliru
Maybe this starting point gives you something helpful. Since this is tricky, I'd strongly consider using another data structure that is more conducive to your task (boost::multi_array[_ref] comes to mind).

How to get frequency of std::vectors in C++?

I have 5 vectors and want to count how many times each distinct vector occurs among them. I used the following code to check whether 2 vectors are equal, but now I have more than 2 vectors and want to compare all 5 together.
How can I do it?
The output should be:
(0,0,1,2,3,0,0,0) = 2 time(s)
(0,0,1,2,3,4,0,0) = 1 time(s)
(0,0,2,4,3,0,0,0) = 1 time(s)
(0,0,6,2,3,5,6,0) = 1 time(s)
Here is my code:
#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;
void checkVec(vector<int> v){
vector<int> v0;
if(v0 == v){
cout << "Equal\n";
}
else{
cout << "Not Equal\n";
}
}
int main(){
vector<int> v1={0,0,1,2,3,0,0,0};
vector<int> v2={0,0,1,2,3,4,0,0};
vector<int> v3={0,0,2,4,3,0,0,0};
vector<int> v4={0,0,1,2,3,0,0,0};
vector<int> v5={0,0,6,2,3,5,6,0};
checkVec(v1);
return 0;
}
You can use a std::map to count the number of occurrences of each vector:
#include <map>
#include <vector>
#include <iostream>
using vec = std::vector<int>;
int main(){
vec v1={0,0,1,2,3,0,0,0};
vec v2={0,0,1,2,3,4,0,0};
vec v3={0,0,2,4,3,0,0,0};
vec v4={0,0,1,2,3,0,0,0};
vec v5={0,0,6,2,3,5,6,0};
std::map<vec,std::size_t> counter;
// Initializer list creates copies by default
// But you should not create vX variables anyway.
for(const auto& v: {v1,v2,v3,v4,v5}){
++counter[v];
}
std::cout<<"V1 is present " <<counter[v1]<<" times.\n";
return 0;
}
V1 is present 2 times.
To add to Quimby's answer: if you know the number of vectors at compile time, you can store them all in a std::array. If the number is only known at runtime, use a std::vector of vectors, as shown below:
#include <map>
#include <vector>
#include <iostream>
#include <cstddef>
int main(){
std::vector<std::vector<int>> allVector
{
std::vector<int>{0,0,1,2,3,0,0,0},
std::vector<int>{0,0,1,2,3,4,0,0},
std::vector<int>{0,0,2,4,3,0,0,0},
std::vector<int>{0,0,1,2,3,0,0,0},
std::vector<int>{0,0,6,2,3,5,6,0},
};
std::map<std::vector<int>, std::size_t> counter;
for(const auto& v : allVector)
{
++counter[v];
}
// print out each vector and its frequency, in the format the question asks for
for(const auto& pr : counter)
{
std::cout << '(';
for(std::size_t i {0}; i < pr.first.size(); ++i)
{
std::cout << pr.first[i];
if(i != pr.first.size() - 1)
std::cout << ',';
}
std::cout << ") = " << pr.second << " time(s)\n";
}
return 0;
}

How to limit boost::combine to the minimum of two ranges

Finding the following as the source of a segfault just cost me about 4h of work:
#include <boost/range/combine.hpp>
#include <boost/foreach.hpp>
#include <iostream>
#include <vector>
#include <list>
int main(int, const char*[])
{
std::vector<int> v;
std::list<char> l;
for (int i = 0; i < 5; ++i)
{
v.push_back(i);
l.push_back(static_cast<char>(i) + 'a');
}
v.push_back(5);
int ti;
char tc;
BOOST_FOREACH(boost::tie(ti, tc), boost::combine(v, l))
{
std::cout << '(' << ti << ',' << tc << ')' << '\n';
}
return 0;
}
If you execute this example, you will note that combine happily iterates as long as the longer range has values. I do not see this in the documentation.
Is there a way to limit the iteration to the shorter of the two ranges?

g++ 1000 times slower than visual studio using lists?

Consider the following code snippet:
#include <iostream>
#include <ctime>
#include <vector>
#include <list>
using namespace std;
#define NUM_ITER 100000
int main() {
clock_t t = clock();
std::list< int > my_list;
std::vector< std::list< int >::iterator > list_ptr;
list_ptr.reserve(NUM_ITER);
for(int i = 0; i < NUM_ITER; ++i) {
my_list.push_back(0);
list_ptr.push_back(--(my_list.end()));
}
while(my_list.size() > 0) {
my_list.erase(list_ptr[list_ptr.size()-1]);
list_ptr.pop_back();
}
cout << "Done in: " << 1000*(clock()-t)/CLOCKS_PER_SEC << " msec!" << endl;
}
When I compile and run it with Visual Studio, with all optimizations enabled, I get the output:
Done in: 8 msec!
When I compile and run it with g++, using the flags
g++ main.cpp -pedantic -O2
I get the output
Done in: 7349 msec!
Which is roughly 1000 times slower. Why is that? According to cppreference, calling erase on a list is supposed to take only constant time.
The code was compiled and executed on the same machine.
It might be that the std::list implementation shipped with GCC doesn't store the size, while the one MSVC ships does. In that case each size() call is O(n) with GCC, which makes the while loop O(n^2) overall, versus O(n) with MSVC.
In any case, C++11 mandates that list::size is constant time, so you may want to report this as a bug.
UPDATE Workaround:
You can avoid calling size() so many times:
size_t my_list_size = my_list.size();
while(my_list_size > 0) {
accum += *list_ptr[list_ptr.size()-1]; // accum is the volatile accumulator declared in the full listing below
my_list.erase(list_ptr[list_ptr.size()-1]);
--my_list_size;
list_ptr.pop_back();
}
Now it reports 10 msec.
EDIT
Their list implementation isn't as efficient. I tried replacing the standard containers with their Boost.Container counterparts:
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <boost/container/vector.hpp>
#include <boost/container/list.hpp>
using namespace std;
#define NUM_ITER 100000
int main() {
clock_t t = clock();
boost::container::list< int > my_list;
boost::container::vector< boost::container::list< int >::iterator > list_ptr;
list_ptr.reserve(NUM_ITER);
for(int i = 0; i < NUM_ITER; ++i) {
my_list.push_back(rand());
list_ptr.push_back(--(my_list.end()));
}
unsigned long long volatile accum = 0;
while(my_list.size() > 0) {
accum += *list_ptr[list_ptr.size()-1];
my_list.erase(list_ptr[list_ptr.size()-1]);
list_ptr.pop_back();
}
cout << "Done in: " << 1000*(clock()-t)/CLOCKS_PER_SEC << " msec!" << endl;
cout << "Accumulated: " << accum << "\n";
}
This now runs in ~0ms on my machine, vs. ~7s using std::list on the same machine.
sehe@desktop:/tmp$ ./test
Done in: 0 msec!
Accumulated: 107345864261546

Why is std::vector erase very slow in release only when I step over with the debugger?

Ok let's start over
I'm trying to erase an element from a std::vector, and for some reason it is very slow in release mode, but only when I step over the call in the debugger.
Here is the complete source:
#include <iostream>
#include <vector>
#include <windows.h>
class data
{
public:
int i;
};
void Test(int n)
{
std::vector<data> v;
data d;
for (int i=0; i<n; ++i)
{
v.push_back(d);
}
ULONGLONG nTick = GetTickCount64();
v.erase(v.begin()+1);
std::cout << n << " " << GetTickCount64() - nTick << std::endl;
}
int main()
{
Test(10000);
Test(100000);
Test(1000000);
return 0;
}
When I step over the line
v.erase(v.begin()+1);
it takes, respectively, in release mode:
10000 -> 2 seconds
100000 -> 18 seconds
1000000 -> 182 seconds
but it is pretty much instant in debug mode for all of them. Why?