Why is vector<vector<int>> slower than vector<int> []?

Why is vector<vector<int>> slower than vector<int> []? - c++

I was trying to solve leetcode323. My code and my way to solve the problem was basically identical to the official answer. The only difference was that I was using vector<vector> while the official answer used vector [] to keep the neighbors of each node. When I used vector [], the system accepted my answer. Is there any advantages of using vector [] over using vector<vector>? I put my code and the official solution code below. Thank you so much in advance.
My code:
class Solution {
public :
void explore(vector<bool> & visited,vector<int> nei[],int cur){
visited[cur]=true;
for(int i=0;i<nei[cur].size();i++){
if(!visited[nei[cur][i]]){
explore(visited,nei,nei[cur][i]);
}
}
}
public:
int countComponents(int n, vector<vector<int>>& edges) {
vector<bool> visited(n);
vector<vector<int>> neighbors(n);
int count=0;
for(int i=0;i<edges.size();i++){
neighbors[edges[i][0]].push_back(edges[i][1]);
neighbors[edges[i][1]].push_back(edges[i][0]);
}
for(int j=0;j<n;j++){
if(!visited[j]){
count++;
explore(visited,neighbors,j);
}
}
return count;
}
};
Official solution
class Solution {
public: void dfs(vector<int> adjList[], vector<int> &visited, int src) {
visited[src] = 1;
for (int i = 0; i < adjList[src].size(); i++) {
if (visited[adjList[src][i]] == 0) {
dfs(adjList, visited, adjList[src][i]);
}
}
}
int countComponents(int n, vector<vector<int>>& edges) {
if (n == 0) return 0;
int components = 0;
vector<int> visited(n, 0);
vector<int> adjList[n];
for (int i = 0; i < edges.size(); i++) {
adjList[edges[i][0]].push_back(edges[i][1]);
adjList[edges[i][1]].push_back(edges[i][0]);
}
for (int i = 0; i < n; i++) {
if (visited[i] == 0) {
components++;
dfs(adjList, visited, i);
}
}
return components;
}
};

I'm not sure, but I think the main problem of your solution is std::vector<bool> which is special case of std::vector.
In the '90s memory size was a problem. So to save memory std::vector<bool> is a specialization of std::vector template and single bits are used to store bool value.
This compacts memory, but comes with performance penalty. Now this has to remain forever to be compatible with already existing code.
I would recommend you to replace std::vector<bool> with std::vector<char> and do not change anything else. Let implicit conversion between bool and char do the magic.
Second candidate is missing reserve for adjList[i] as mentioned in other answer, but "official" solution doesn't do that either.
Here I refactor your code.

The only difference was that I was using vector<vector<int>>
There are several differences:
Official uses (non standard C++) VLA, whereas you use compliant vector<vector<int>>.
I would say that VLA "allocation" (similar to alloca) is faster than real allocation from std::vector (new[]).
From your test, assuming timing is done correctly, VLA seems to have a real impact.
Official use vector<int> whereas you use std::vector<bool>
With specialization, vector<bool> is more compact than std::vector<int /*or char*/> but would require a little more work to set/retrieve individual value.
You have some different names.
Naming difference should not impact runtime.
In some circonstance, very long difference and template usage might impact compilation time. But it should not be the case here.
Order of parameters of dfs/explore are different.
It might probably allow micro optimization in some cases, but swapping the 2 vectors doesn't seem to be relevant here.
Is there any advantages of using vector [] over using vector<vector>?
VLA is non standard C++, that is a big disadvantage.
Stack is generally more limited than heap, so size of "array" is more limited.
Its advantage seems to be a faster allocation.
The usage speed should be similar though.

Related

Performance penalty of using boost::irange over raw loop

A few answers and discussions and even the source code of boost::irange mention that there should be a performance penalty to using these ranges over raw for loops.
However, for example for the following code
#include <boost/range/irange.hpp>
int sum(int* v, int n) {
int result{};
for (auto i : boost::irange(0, n)) {
result += v[i];
}
return result;
}
int sum2(int* v, int n) {
int result{};
for (int i = 0; i < n; ++i) {
result += v[i];
}
return result;
}
I see no differences in the generated (-O3 optimized) code (Compiler Explorer). Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?
EDIT: Clearly, debug performance might be impacted, but that's not my aim here.
Concerning the strided (step size > 1) example, I think it might be possible to modify the irange code to more closely match the code of a strided raw for-loop.

Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?
Yes. It is not stated that your particular case is affected. But changing the step to anything else than 1:
#include <boost/range/irange.hpp>
int sum(int* v, int n) {
int result{};
for (auto i : boost::irange(0, n, 8)) {
result += v[i]; //^^^ different steps
}
return result;
}
int sum2(int* v, int n) {
int result{};
for (int i = 0; i < n; i+=8) {
result += v[i]; //^^^ different steps
}
return result;
}
Live.
While sum now looks worse (the loop did not get unrolled) sum2 still benefits from loop unrolling and SIMD optimization.
Edit:
To comment on your edit, it's true that it might be possible to modify the irange code to more closely. But:
To fit how range-based for loops are expanded, boost::irange(0, n, 8) must create some sort of temporary, implementing begin/end iterators and a prefix operator++ (which is crearly not as trivial as an int += operation). Compilers are using pattern matching for optimization, which is trimmed to work with standard C++ and standard libraries. Thus, any result from irange; if it is slightly different than a pattern the compiler knows to optimize, optimization won't kick in. And I think, these are the reason why the author of the library mentions performance penalties.

Advantage of function taking a pointer to a collection, to avoid copying on return?

Suppose I have the following C++ function:
// Returns a set containing {1!, 2!, ..., n!}.
set<int> GetFactorials(int n) {
set<int> ret;
int curr = 1;
for (int i = 1; i < n; i++) {
curr *= i;
ret.insert(curr);
}
return ret;
}
set<int> fs = GetFactorials(5);
(This is just a dummy example. The key is that the function creates the set itself and returns it.)
One of my friends tells me that instead of writing the function the way I did, I should write it so that the function takes in a pointer to a set, in order to avoid copying the set on return. I'm guessing he meant something like:
void GetFactorials2(int n, set<int>* fs) {
int curr = 1;
for (int i = 1; i < n; i++) {
curr *= i;
fs->insert(curr);
}
}
set<int> fs;
GetFactorials2(5, &fs);
My question: is this second way really a big advantage? It seems pretty weird to me. I'm new to C++, and don't know much about compilers, but I would assume that through some compiler magic, my original function wouldn't be that much more expensive. (And I'd get to avoid having to initialize the set myself.) Am I wrong? What should I know about pointers and copying-on-return to understand this?

No, it is generally not advantageous at all. Just about any reasonable compiler these days will utilize named return value optimization (see here). This effectively removes any performance penalty from the former example.
If you really want to get into the nitty gritty, read this article by Dave Abrahams (one of the big contributors to boost). Long story short, however, just return the value. It's probably faster.

Yes it can be expensive. Especially when the set gets bigger. There is no reason not to use pointers or reference here. It will save you a lot and you don't sacrifice much regarding readability.
And why rely on compiler optimizations when you can optimize it yourself. The compiler knows your code but not always understands your algorithm.
I would do this
void GetFactorials2(int n, set<int>& fs) {
// ^^
int curr = 1;
for (int i = 1; i < n; i++) {
curr *= i;
fs->insert(curr);
}
}
and the call will stay normal.
set<int> fs;
GetFactorials2(5, fs);
^^

Dynamic bool array in C++

// All right? This is really good working code?
//Need init array with value "false"
bool **Madj;
int NodeCount=4;
bool **Madj = new bool*[NodeCount];
for (int i=0; i<NodeCount; i++){
Madj[i] = new bool [NodeCount];
for (int j=0; j<NodeCount; j++){
Madj[i][j] = false;
}
}

You could consider using Boost's builtin multi-dimensional array as a less brittle alternative. As noted the code you supplied will work, but has issues.

What about:
std::vector<std::vector<bool> > Madj(4,std:vector<bool>(4, false));
Unfortunately std::vector<bool> is specialized to optimize for size (not speed).
So it can be inefficient (especially if used a lot). So you could use an int array (if you find the bool version is slowing you down).
std::vector<std::vector<int> > Madj(4,std:vector<int>(4, 0));
Note: int can be used in a boolean context and auto converted (0 => false, any other number is true (though best to use 1).

At least IMO, if you insist on doing this at all, you should normally do it rather differently, something like:
class bool_array {
bool *data_;
size_t width_;
// no assignment or copying
bool_array &operator=();
bool_array(bool_array const &);
public:
bool_array(size_t x, size_t y) width_(x) {
data_ = new bool[x*y];
std::fill_n(data_, x*y, false);
}
bool &operator()(size_t x, size_t y) {
return data_[y+width_+x];
}
~bool_array() { delete [] data_; }
};
This can be embellished (e.g., using a proxy to enforce constness), but the general idea remains: 1) allocate your bools in a single block, and 2) put them into a class, and 3) overload an operator to support reasonably clean indexing into the data.
You should also consider using std::vector<bool>. Unlike other instantiations of std::vector, it's not a container (as the standard defines that term), which can be confusing -- but what you're creating isn't a container either, so that apparently doesn't matter to you.

bool **Madj = new bool*[NodeCount];
for (int i=0; i<NodeCount; i++){
Madj[i] = new bool [NodeCount];
for (int j=0; j<NodeCount; j++){
Madj[i][j] = false;
}
}
If the first call to new succeeds but any of the ones in the loop fails, you have a memory leak since Madj and the subarrays up to the current i are not deleted. Use a vector<vector<bool> >, or a vector<bool> of size NodeCount * NodeCount. With the latter option, you can get to element (i,j) with [i*NodeCount+j].

I think this looks fine!
Depending on the use, you could use std::vector instead of a raw array.
But its true that the first Madj declaration should be "extern" to avoid linking or shadowing errors.

If you have only bools, consider using bitsets. You can combine that with other containers for multidimensional arrays, e.g vector<bitset>.

C++ metaprogramming with templates versus inlining

Is it worth to write code like the following to copy array elements:
#include <iostream>
using namespace std;
template<int START, int N>
struct Repeat {
static void copy (int * x, int * y) {
x[START+N-1] = y[START+N-1];
Repeat<START, N-1>::copy(x,y);
}
};
template<int START>
struct Repeat<START, 0> {
static void copy (int * x, int * y) {
x[START] = y[START];
}
};
int main () {
int a[10];
int b[10];
// initialize
for (int i=0; i<=9; i++) {
b[i] = 113 + i;
a[i] = 0;
}
// do the copy (starting at 2, 4 elements)
Repeat<2,4>::copy(a,b);
// show
for (int i=0; i<=9; i++) {
cout << a[i] << endl;
}
} // ()
or is it better to use a inlined function?
A first drawback is that you can't use variables in the template.

That's not better. First of all, it's not really compile time, since you make function calls here. If you are lucky, the compiler will inline these and end up with a loop you could have written yourself with much less amount of code (or just by using std::copy).

General rule: Use templates for things known at compile time, use inlining for things known at run time. If you don't know the size of your array at compile time, then don't be using templates for it.

You shouldn't do this. Templates were invented for different purpose, not for calculations, although you can do it. First you can't use variables, second templates will produce vast of unused structures at compilation, and third is: use for (int i = start; i <= end; i++) b[i] = a[i];

That's better because you control and enforce the loop unrolling by yourself.
A loop can be unrolled by the compiler depending on optimizing options...
The fact that copying with copy is almost the best is not a good general answer because the loop unrolling can be done whatever is the computation done inside...

A proper way to create a matrix in c++

I want to create an adjacency matrix for a graph. Since I read it is not safe to use arrays of the form matrix[x][y] because they don't check for range, I decided to use the vector template class of the stl. All I need to store in the matrix are boolean values. So my question is, if using std::vector<std::vector<bool>* >* produces too much overhead or if there is a more simple way for a matrix and how I can properly initialize it.
EDIT: Thanks a lot for the quick answers. I just realized, that of course I don't need any pointers. The size of the matrix will be initialized right in the beginning and won't change until the end of the program. It is for a school project, so it would be good if I write "nice" code, although technically performance isn't too important. Using the STL is fine. Using something like boost, is probably not appreciated.

Note that also you can use boost.ublas for matrix creation and manipulation and also boost.graph to represent and manipulate graphs in a number of ways, as well as using algorithms on them, etc.
Edit: Anyway, doing a range-check version of a vector for your purposes is not a hard thing:
template <typename T>
class BoundsMatrix
{
std::vector<T> inner_;
unsigned int dimx_, dimy_;
public:
BoundsMatrix (unsigned int dimx, unsigned int dimy)
: dimx_ (dimx), dimy_ (dimy)
{
inner_.resize (dimx_*dimy_);
}
T& operator()(unsigned int x, unsigned int y)
{
if (x >= dimx_ || y>= dimy_)
throw std::out_of_range("matrix indices out of range"); // ouch
return inner_[dimx_*y + x];
}
};
Note that you would also need to add the const version of the operators, and/or iterators, and the strange use of exceptions, but you get the idea.

Best way:
Make your own matrix class, that way you control every last aspect of it, including range checking.
eg. If you like the "[x][y]" notation, do this:
class my_matrix {
std::vector<std::vector<bool> >m;
public:
my_matrix(unsigned int x, unsigned int y) {
m.resize(x, std::vector<bool>(y,false));
}
class matrix_row {
std::vector<bool>& row;
public:
matrix_row(std::vector<bool>& r) : row(r) {
}
bool& operator[](unsigned int y) {
return row.at(y);
}
};
matrix_row& operator[](unsigned int x) {
return matrix_row(m.at(x));
}
};
// Example usage
my_matrix mm(100,100);
mm[10][10] = true;
nb. If you program like this then C++ is just as safe as all those other "safe" languages.

The standard vector does NOT do range checking by default.
i.e. The operator[] does not do a range check.
The method at() is similar to [] but does do a range check.
It will throw an exception on out of range.
std::vector::at()
std::vector::operator[]()
Other notes:
Why a vector<Pointers> ?
You can quite easily have a vector<Object>. Now there is no need to worry about memory management (i.e. leaks).
std::vector<std::vector<bool> > m;
Note: vector<bool> is overloaded and not very efficient (i.e. this structure was optimized for size not speed) (It is something that is now recognized as probably a mistake by the standards committee).
If you know the size of the matrix at compile time you could use std::bitset?
std::vector<std::bitset<5> > m;
or if it is runtime defined use boost::dynamic_bitset
std::vector<boost::dynamic_bitset> m;
All of the above will allow you to do:
m[6][3] = true;

If you want 'C' array performance, but with added safety and STL-like semantics (iterators, begin() & end() etc), use boost::array.
Basically it's a templated wrapper for 'C'-arrays with some NDEBUG-disable-able range checking asserts (and also some std::range_error exception-throwing accessors).
I use stuff like
boost::array<boost::array<float,4>,4> m;
instead of
float m[4][4];
all the time and it works great (with appropriate typedefs to keep the verbosity down, anyway).
UPDATE: Following some discussion in the comments here of the relative performance of boost::array vs boost::multi_array, I'd point out that this code, compiled with g++ -O3 -DNDEBUG on Debian/Lenny amd64 on a Q9450 with 1333MHz DDR3 RAM takes 3.3s for boost::multi_array vs 0.6s for boost::array.
#include <iostream>
#include <time.h>
#include "boost/array.hpp"
#include "boost/multi_array.hpp"
using namespace boost;
enum {N=1024};
typedef multi_array<char,3> M;
typedef array<array<array<char,N>,N>,N> C;
// Forward declare to avoid being optimised away
static void clear(M& m);
static void clear(C& c);
int main(int,char**)
{
const clock_t t0=clock();
{
M m(extents[N][N][N]);
clear(m);
}
const clock_t t1=clock();
{
std::auto_ptr<C> c(new C);
clear(*c);
}
const clock_t t2=clock();
std::cout
<< "multi_array: " << (t1-t0)/static_cast<float>(CLOCKS_PER_SEC) << "s\n"
<< "array : " << (t2-t1)/static_cast<float>(CLOCKS_PER_SEC) << "s\n";
return 0;
}
void clear(M& m)
{
for (M::index i=0;i<N;i++)
for (M::index j=0;j<N;j++)
for (M::index k=0;k<N;k++)
m[i][j][k]=1;
}
void clear(C& c)
{
for (int i=0;i<N;i++)
for (int j=0;j<N;j++)
for (int k=0;k<N;k++)
c[i][j][k]=1;
}

What I would do is create my own class for dealing with matrices (probably as an array[x*y] because I'm more used to C (and I'd have my own bounds checking), but you could use vectors or any other sub-structure in that class).
Get your stuff functional first then worry about how fast it runs. If you design the class properly, you can pull out your array[x*y] implementation and replace it with vectors or bitmasks or whatever you want without changing the rest of the code.
I'm not totally sure, but I thing that's what classes were meant for, the ability to abstract the implementation well out of sight and provide only the interface :-)

In addition to all the answers that have been posted so far, you might do well to check out the C++ FAQ Lite. Questions 13.10 - 13.12 and 16.16 - 16.19 cover several topics related to rolling your own matrix class. You'll see a couple of different ways to store the data and suggestions on how to best write the subscript operators.
Also, if your graph is sufficiently sparse, you may not need a matrix at all. You could use std::multimap to map each vertex to those it connects.

my favourite way to store a graph is vector<set<int>>; n elements in vector (nodes 0..n-1), >=0 elements in each set (edges). Just do not forget adding a reverse copy of every bi-directional edge.

Consider also how big is your graph/matrix, does performance matter a lot? Is the graph static, or can it grow over time, e.g. by adding new edges?

Probably, not relevant as this is an old question, but you can use the Armadillo library, which provides many linear algebra oriented data types and functions.
Below is an example for your specific problem:
// In C++11
Mat<bool> matrix = {
{ true, true},
{ false, false},
};
// In C++98
Mat<bool> matrix;
matrix << true << true << endr
<< false << false << endr;

Mind you std::vector doesn't do range checking either.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Why is vector<vector<int>> slower than vector<int> []? - c++

Related

Performance penalty of using boost::irange over raw loop

Advantage of function taking a pointer to a collection, to avoid copying on return?

Dynamic bool array in C++

C++ metaprogramming with templates versus inlining

A proper way to create a matrix in c++

Categories

Resources