I was testing the performance of these two operations, and with G++ 4.7.3 the string::operator+= version is about twice as fast. What can be the cause of such a big difference?
My simple test, compiled with g++ -O2 --std=c++11:
#include <iostream>
#include <ctime>
#include <string>
#include <vector>
using namespace std;
class Timer {
public:
Timer(const std::string &label)
:label_(label)
{
begin_clock_ = clock();
cout <<label<<"- Timer starts!"<<endl;
}
~Timer() {
clock_t clock_used = clock() - begin_clock_;
cout<<label_<<"- Clock used:"<<clock_used
<<" Time:"<<clock_used/CLOCKS_PER_SEC<<endl;
}
private:
clock_t begin_clock_;
string label_;
};
int str(int loop)
{
Timer t("str");
string s;
for(int i=0;i<loop;++i)
s+=(i%2);
return s.length();
}
int vec(int loop)
{
Timer t("vec");
vector<bool> v;
for(int i=0;i<loop;++i)
v.push_back(i%2);
return v.size();
}
int main()
{
int loop = 1000000000;
int s1=str(loop);
int s2=vec(loop);
cout <<"s1="<<s1<<endl;
cout <<"s2="<<s2<<endl;
}
Strings and vectors both store their content contiguously. If there's not enough room for adding a new element, the capacity must be increased (memory allocation) and the existing content must be moved to the new location.
Hence, the performance should depend significantly on the allocation strategy of your implementation. If one container reserves bigger chunks when the current capacity is exhausted, it will be more efficient (less allocation, less moving).
Of course, the results are implementation dependent. In my tests, for example, the vector implementation was one third faster than the string variant.
Here is how to see the effect:
int str(int loop)
{
Timer t("str");
string s;
size_t capa = 0, ncapa, alloc = 0; // counters for monitoring allocations
long long mw = 0;                  // number of elements moved on reallocation
for(int i = 0; i < loop; ++i){
    if((ncapa = s.capacity()) != capa) // check if capacity increased
    {
        capa = ncapa; alloc++; mw += s.size();
    }
    s += (i % 2);
}
cout << "allocations: " << alloc << " and elements moved: " << mw << endl;
return s.length();
}
On my compiler, for example, for strings I got capacities of 2, 4, 8, ..., while for vectors the capacity started immediately at 32, 64, ...
Now, this doesn't explain everything. If you want to see what part of the performance comes from the allocation policy and what comes from other factors, you can simply pre-allocate your string (s.reserve(loop);) and your vector (v.reserve(loop);) before starting to add any elements.
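For example, a minimal sketch of the pre-allocated variants (it reuses the Timer class and the using namespace std from the question, and is only meant to isolate the reallocation cost):
int str_reserved(int loop)
{
    Timer t("str reserved");
    string s;
    s.reserve(loop);              // one allocation up front, so no reallocation in the loop
    for(int i = 0; i < loop; ++i)
        s += (i % 2);
    return s.length();
}
int vec_reserved(int loop)
{
    Timer t("vec reserved");
    vector<bool> v;
    v.reserve(loop);              // likewise: capacity is fixed before the loop starts
    for(int i = 0; i < loop; ++i)
        v.push_back(i % 2);
    return v.size();
}
Any remaining difference between these two is then due to the per-element work (bit manipulation for vector<bool> vs. a plain byte store for string) rather than to the allocation policy.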
I'm learning C++ and I'm wondering if anyone can explain some strange behaviour I'm seeing.
I'm currently learning memory management and have been playing around with the following code:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
// pass back by pointer (old C++)
const int array_size = 1e6; // determines size of the random number array
vector<int> *RandomNumbers1()
{
vector<int> *random_numbers = new vector<int>[array_size]; // allocate memory on the heap...
for (int i = 0; i < array_size; i++)
{
int b = rand();
(*random_numbers).push_back(b); // ...and fill it with random numbers
}
return random_numbers; // return pointer to heap memory
}
int main (){
vector<int> *random_numbers = RandomNumbers1();
for (int i = 0; i < (*random_numbers).size(); i++){
cout << (*random_numbers)[i] + "\n";
}
delete random_numbers;
}
What I'm trying to do is get a pointer to a vector containing random integers by calling the RandomNumbers1() function, and then print each random number on a new line.
However, when I run the above code, instead of printing out a random number, I get all sorts of random information. It seems as though the code is accessing random places in memory and printing out the contents.
Now I know that I'm doing something stupid here - I have an int and I am adding the string "\n" to it. If I change the code in main() to the following, it works fine:
int main (){
vector<int> *random_numbers = RandomNumbers1();
for (int i = 0; i < (*random_numbers).size(); i++){
cout << to_string((*random_numbers)[i]) + "\n";
}
}
However, I just can't understand the behaviour I'm getting with the "wrong" code - i.e. how adding the string "\n" to (*random_numbers)[i] causes the program to access random areas of memory instead of where my pointer is pointing. Surely I have de-referenced the pointer and accessed the element at position i before "adding" "\n" to it? So how is the program instead accessing a totally different memory address?
"\n" is a string literal. It is an array and it is converted to a pointer pointing at its first element in your expression.
(*random_numbers)[i] is an integer.
Adding an integer to a pointer advances the pointer by that many elements.
This drives the pointer out of range, because "\n" has only 2 elements ('\n' and '\0'), but the numbers returned from the rand() function are likely to be much larger than 2.
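A minimal sketch (not from the original question) of what that expression actually evaluates to:
#include <iostream>
int main()
{
    int i = 1;
    // "\n" is a 2-element array {'\n', '\0'}; in this expression it decays to a
    // const char*.  i + "\n" is pointer arithmetic: the same as &"\n"[i].
    const char* p = i + "\n";   // points at the '\0' -- still inside the array
    std::cout << p << "\n";     // operator<< treats p as a C string (here: empty)
    // With a larger i (for example a random number), i + "\n" points far outside
    // the two-character array, and printing it reads arbitrary memory --
    // undefined behaviour, which is why the original code printed garbage.
    return 0;
}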
There are several issues with your code.
you are using delete instead of delete[] to free the array allocated with new[].
you are creating an array of 1000000 vectors, but populating only the 1st vector with 1000000 integers. You probably meant to create just 1 vector instead.
you can and should use the -> operator when accessing an object's members via a pointer. Using the * and . operators also works, but it is more verbose and harder to read.
you are trying to print a "\n" after each number, but you are using the + operator when you should be using the << operator instead. You can't append a string literal to an integer (well, you can, but it will invoke pointer arithmetic and thus the result will not be what you want, as you have seen).
With that said, try something more like this:
#include <iostream>
#include <vector>
#include <cstdlib> // for rand()
using namespace std;
const int array_size = 1e6; // determines size of the random number array
vector<int>* RandomNumbers1()
{
vector<int> *random_numbers = new vector<int>;
random_numbers->reserve(array_size);
for (int i = 0; i < array_size; ++i)
{
int b = rand();
random_numbers->push_back(b);
}
return random_numbers;
}
int main (){
vector<int> *random_numbers = RandomNumbers1();
for (size_t i = 0; i < random_numbers->size(); ++i){
cout << (*random_numbers)[i] << "\n";
}
/* alternatively:
for (int number : *random_numbers){
cout << number << "\n";
}
*/
delete random_numbers; // plain delete: a single vector was allocated with new, not new[]
}
However, if you are going to return a pointer to dynamic memory, you really should wrap it inside a smart pointer like std::unique_ptr or std::shared_ptr, and let it deal with the delete for you:
#include <iostream>
#include <vector>
#include <cstdlib> // for rand()
#include <memory>
using namespace std;
const int array_size = 1e6; // determines size of the random number array
unique_ptr<vector<int>> RandomNumbers1()
{
auto random_numbers = make_unique<vector<int>>();
// or: unique_ptr<vector<int>> random_numbers(new vector<int>);
random_numbers->reserve(array_size);
for (int i = 0; i < array_size; ++i)
{
int b = rand();
random_numbers->push_back(b);
}
return random_numbers;
}
int main (){
auto random_numbers = RandomNumbers1();
for (size_t i = 0; i < random_numbers->size(); ++i){
cout << (*random_numbers)[i] << "\n";
}
/* alternatively:
for (int number : *random_numbers){
cout << number << "\n";
}
*/
}
Though, in this case, there is really no good reason to create the vector dynamically at all. 99% of the time, it is unnecessary and undesirable to use standard containers that way. Since the vector manages dynamic memory internally, there is no reason for the vector itself to also be created in dynamic memory. Return the vector by value instead, and let the compiler optimize the return for you.
#include <iostream>
#include <vector>
#include <cstdlib> // for rand()
using namespace std;
const int array_size = 1e6; // determines size of the random number array
vector<int> RandomNumbers1()
{
vector<int> random_numbers;
random_numbers.reserve(array_size);
for (int i = 0; i < array_size; ++i)
{
int b = rand();
random_numbers.push_back(b);
}
return random_numbers;
}
int main (){
vector<int> random_numbers = RandomNumbers1();
for (size_t i = 0; i < random_numbers.size(); ++i){
cout << random_numbers[i] << "\n";
}
/* alternatively:
for (int number : random_numbers){
cout << number << "\n";
}
*/
}
I was told on here to use vectors instead of arrays for this particular solution (my original solution dynamically allocates arrays). My problem is that I don't really understand vectors; my book only covers them in a small section. So maybe I am doing something wrong? I'm just trying to learn a different way of solving it with vectors. When I ask whether it is possible to dynamically allocate a vector, what I mean is: is it legal to do this: std::vector<int*> vect= nullptr;, and if not, why? It would also be really helpful to see an example of how to modify my original solution with vectors. I'm a beginner just trying to learn from my mistakes.
#include <iostream>
#include <vector>
void sortAscendingOrder(int*, int );
void LowestTestScore(int*, int);
void calculatesAverage(int*,int);
int main()
{
std::vector<int*> vect= nullptr;
int input;
std::cout << "Enter the number of testscores you want to enter." <<std::endl;
std::cin >> input;
vect = new int[input];
for(int count =0; count < input; count++)
{
std::cout << "Enter the test score" << (count +1) <<":" <<std::endl;
std::cin >> vect[count] ;
while( vect[count] < 0)
{
std::cout <<"You enter a negative number. Please enter a postive number." <<std::endl;
std::cin >> vect[count];
}
}
sortAscendingOrder(vect,input);
for(int count =0; count < input;count++)
{
std::cout << "\n" <<vect[count];
std::cout << std::endl;
}
LowestTestScore(vect,input);
calculatesAverage(vect,input);
return 0;
}
void sortAscendingOrder(int* input,int size)
{
int startScan,minIndex,minValue;
for(startScan =0; startScan < (size-1);startScan++)
{
minIndex = startScan;
minValue = input[startScan];
for(int index = startScan+1;index<size;index++)
{
if(input[index] < minValue)
{
minValue = input[index];
minIndex = index;
}
}
input[minIndex]=input[startScan];
input[startScan]=minValue;
}
}
void LowestTestScore(int* input, int size)
{
int count =0;
int* lowest = nullptr;
lowest = input;
for(count =1; count <size;count++)
{
if(input[count] < lowest[0])
{
lowest[0] = input[count];
}
}
std::cout << "Lowest score" << *lowest;
}
void calculatesAverage(int* input, int size)
{
int total = 0;
int average =0;
for(int count = 0; count < size; count++)
{
total += input[count];
}
average =(total/size-1);
std::cout << "Your average is" << average;
}
Dynamic allocation of an array is required when you want to increase the size of an array at run-time. Just like arrays, vectors use contiguous storage locations for their elements. But, unlike arrays, their size can change dynamically, with their storage being handled automatically by the container.
Vectors do not reallocate each time an element is added to the container. They pre-allocate some extra storage to accommodate future growth. Libraries can implement different strategies for growth to balance memory usage against reallocation, but in any case reallocation should only happen at logarithmically growing intervals of size, so that inserting individual elements at the end of the vector has amortized constant time complexity (see push_back).
To answer your question:
Yes, it is possible to dynamically allocate a vector, and no, it requires no extra effort, because std::vector already allocates its storage dynamically. Dynamically allocating the vector itself is therefore redundant.
You can write:
#include<vector>
typedef std::vector< std::vector<double> > vec_array;
vec_array name(size_x, std::vector<double>(size_y));
This means: size_x instances of a std::vector<double>, each containing size_y doubles (initialized to 0.0).
Or you can simply let the vector grow as elements are added (it handles the storage requirements by itself), like:
#include <vector>
std::vector<int> vec_array;
for(int i = 0; i < 10; i++)
    vec_array.push_back(i);
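To address the last part of the question, here is a rough sketch (mine, and simplified: it only shows the input loop and the average) of how the original solution might look with std::vector<int> instead of a dynamically allocated array:
#include <iostream>
#include <vector>
// Helpers can take the vector by reference; no separate size parameter is
// needed because the vector knows its own size.
double average(const std::vector<int>& scores)
{
    int total = 0;
    for (int s : scores)
        total += s;
    return scores.empty() ? 0.0 : static_cast<double>(total) / scores.size();
}
int main()
{
    int input = 0;
    std::cout << "Enter the number of test scores you want to enter." << std::endl;
    std::cin >> input;
    std::vector<int> scores;           // no new/delete anywhere
    scores.reserve(input);
    for (int count = 0; count < input; count++)
    {
        int score = 0;
        std::cout << "Enter test score " << (count + 1) << ":" << std::endl;
        std::cin >> score;
        while (score < 0)
        {
            std::cout << "You entered a negative number. Please enter a positive number." << std::endl;
            std::cin >> score;
        }
        scores.push_back(score);
    }
    std::cout << "Your average is " << average(scores) << std::endl;
    return 0;
}
The sorting and lowest-score functions from the original code can be changed the same way: take std::vector<int>& (or const std::vector<int>&) instead of int* plus a size.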
I am using a couple of structs in my code. The first struct is constructed after reading data from a line of text file and the second struct contains a vector of structs of the first kind.
struct A{
long a, b, c, d, e, f;
string x, y, z;
};
struct B{
string id1;
vector<A> aVec;
};
Now I read my file and initialize a vector of struct B's. Then based on what the incoming new line's id1 and id2 are, I create a new A and push it into the correct B.
vector<B> bVec;
vector<A> atmpVec;
B btmp;
//Initializing bVec
for(int i = 0; i < 7; i++)
{
btmp.id1 = "c"+to_string(i);
btmp.aVec = atmpVec;
//tried using reserve too.
//btmp.aVec.reserve(50000);
bVec.push_back(btmp);
}
//readerCode
while(getline(file, line))
{
A a = readA(line);          //readA reads and sets the fields of struct A.
int idx = getBIdx(bVec, a); //getBIdx returns which B struct I should be using.
bVec[idx].aVec.push_back(a);
}
Now the last line has become a bottleneck. If I simply declare a vector of A and keep on pushing back to it, the time taken to process a million records is ~10 seconds.
On the other hand, with this approach, it takes 60 seconds to just process 50k records.
Is there a way I can keep the above general structure without losing the performance?
Any ways to efficiently implement this?
Isn't the time spent in the getBIdx method? Pushing to a single vector or to one of N vectors should cost almost the same.
Trying with a simple getBIdx:
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include <time.h>
using namespace std;
const int NUMBER_OF_A = 3E7;
const int NUMBER_OF_B = 7;
struct A {
long a, b, c, d, e, f;
string x, y, z;
};
struct B {
string id1;
vector<A> aVec;
};
struct A readA() {
A a;
a.a = 1;
a.b = 2;
return a;
}
int getBIdx(const A& a) {
return rand() % NUMBER_OF_B;
}
void Test1() {
vector<B> bVec;
for(int i = 0; i < NUMBER_OF_B; i++) {
B btmp;
bVec.push_back(btmp);
}
for(int i = 0; i < NUMBER_OF_A; ++i) {
A a = readA();
int idx = getBIdx(a);
bVec[idx].aVec.push_back(a);
}
}
void Test2() {
vector<A> vector;
for(int i = 0; i < NUMBER_OF_A; ++i) {
A a = readA();
int idx = getBIdx(a);
vector.push_back(a);
}
}
int main() {
time_t start = time(0);
Test1();
time_t end_of_test1 = time(0);
Test2();
time_t end_of_test2 = time(0);
cout << "Elapsed test 1:" << end_of_test1 - start << " s" << endl;
cout << "Elapsed test 2:" << end_of_test2 - end_of_test1 << " s" << endl;
return 0;
}
Result: (old Pentium 4 single core machine)
Elapsed test 1:17 s
Elapsed test 2:13 s
So it's slower, but not that much slower.
With -O3 the difference is even smaller:
Elapsed test 1:9 s
Elapsed test 2:7 s
I would try to optimize this code in two ways:
define vector<A*> aVec instead of vector<A> aVec to avoid the copy constructor call (since you are using C++0x)
estimate the size of aVec in B; using reserve() to set aside some space might save some time
You should probably use the sizing constructor on bVec, as its size is known.
Then, for the main culprit, filling it with vectors of A, you probably want to call vector::reserve on each aVec inside bVec with a size estimated from the amount of data to be fed (see the sketch below).
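A sketch of both suggestions together (the per-bucket estimate is made up; pick one based on how many records you expect per id):
#include <string>
#include <vector>
struct A { long a, b, c, d, e, f; std::string x, y, z; };
struct B { std::string id1; std::vector<A> aVec; };
int main()
{
    const size_t NUMBER_OF_B = 7;
    const size_t ESTIMATED_PER_BUCKET = 200000;     // made-up estimate
    std::vector<B> bVec(NUMBER_OF_B);               // sizing constructor
    for (size_t i = 0; i < NUMBER_OF_B; ++i) {
        bVec[i].id1 = "c" + std::to_string(i);
        bVec[i].aVec.reserve(ESTIMATED_PER_BUCKET); // reserve up front to avoid repeated reallocation
    }
    return 0;
}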
Also, are you sure you are compiling with -O3?
Is using a vector of boolean values slower than a dynamic bitset?
I just heard about boost's dynamic bitset, and I was wondering whether it is worth the trouble. Can I just use a vector of boolean values instead?
A great deal here depends on how many Boolean values you're working with.
Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.
On one hand, that imposes some overhead in the form of bit manipulation to access a single value.
On the other hand, that also means many more of your Booleans will fit in your cache.
If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.
Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.
If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.
Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.
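To make the "not a container" point concrete, here is a minimal sketch (mine, not from the original answer) of the kind of surprise involved: operator[] returns a proxy object rather than a bool&, so code that works for every other vector<T> behaves differently here:
#include <vector>
int main()
{
    std::vector<bool> vb(8, false);
    std::vector<char> vc(8, 0);
    char& rc = vc[0];        // fine: a real reference into the vector
    // bool& rb = vb[0];     // does not compile: operator[] returns a proxy, not a bool&
    auto p = vb[0];          // p is std::vector<bool>::reference, not bool
    p = true;                // writing through the proxy modifies the underlying bit
    bool copy = vb[0];       // reading converts the proxy to a plain bool
    (void)rc; (void)copy;
    return 0;
}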
Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:
#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>
unsigned long primes = 0;
template <class bool_t>
unsigned long sieve(unsigned max) {
std::vector<bool_t> sieve(max, false);
sieve[0] = sieve[1] = true;
for (int i = 2; i < max; i++) {
if (!sieve[i]) {
++primes;
for (int temp = 2 * i; temp < max; temp += i)
sieve[temp] = true;
}
}
return primes;
}
// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
auto start = std::chrono::high_resolution_clock::now();
primes += f(max);
auto stop = std::chrono::high_resolution_clock::now();
return stop - start;
}
int main() {
using namespace std::chrono;
unsigned number = 100000000;
auto using_bool = timer(sieve<bool>, number);
auto using_char = timer(sieve<char>, number);
std::cout << "ignore: " << primes << "\n";
std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "\n";
std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "\n";
}
We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:
ignore: 34568730
Time using bool: 2623
Time using char: 3108
...then the time using g++ 5.1:
ignore: 34568730
Time using bool: 2359
Time using char: 3116
Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:
ignore: 3987474
Time using bool: 72
Time using char: 249
Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.
You should usually avoid std::vector<bool> because it is not a standard container. It's a packed version, so it breaks some valuable guarantees usually given by a vector. A valid alternative would be to use std::vector<char> which is what Herb Sutter recommends.
You can read more about it in his GotW on the subject.
Update:
As has been pointed out, vector<bool> can be used to good effect, since the packed representation improves locality on large data sets. It may very well be the fastest alternative depending on circumstances. However, I would still not recommend it by default, since it breaks many of the promises established by std::vector, and the packing is a speed/memory tradeoff that can, in some cases, end up benefiting both speed and memory.
If you choose to use it, I would do so after measuring it against vector<char> for your application. Even then, I'd recommend using a typedef to refer to it via a name that does not suggest guarantees it does not hold.
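For example, a sketch of such an alias (the name is just an illustration, not an established convention):
#include <vector>
// The name advertises "packed bits with proxy references" rather than
// promising full std::vector container semantics.
using packed_bool_sequence = std::vector<bool>;
int main()
{
    packed_bool_sequence flags(1024, false);
    flags[3] = true;
    return flags[3] ? 0 : 1;
}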
#include "boost/dynamic_bitset.hpp"
#include <chrono>
#include <cstdlib>    // for EXIT_SUCCESS
#include <functional> // for std::bind
#include <iostream>
#include <random>
#include <vector>
int main(int, char*[])
{
auto gen = std::bind(std::uniform_int_distribution<>(0, 1), std::default_random_engine());
std::vector<char> randomValues(1000000);
for (char & randomValue : randomValues)
{
randomValue = static_cast<char>(gen());
}
// many accesses, few initializations
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
std::vector<bool> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Time taken1: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< " milliseconds" << std::endl;
auto start2 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
boost::dynamic_bitset<> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end2 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken2: " << std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2).count()
<< " milliseconds" << std::endl;
auto start3 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
std::vector<char> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end3 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken3: " << std::chrono::duration_cast<std::chrono::milliseconds>(end3 - start3).count()
<< " milliseconds" << std::endl;
// few accesses, many initializations
auto start4 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
std::vector<bool> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end4 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken4: " << std::chrono::duration_cast<std::chrono::milliseconds>(end4 - start4).count()
<< " milliseconds" << std::endl;
auto start5 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
boost::dynamic_bitset<> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end5 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken5: " << std::chrono::duration_cast<std::chrono::milliseconds>(end5 - start5).count()
<< " milliseconds" << std::endl;
auto start6 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
std::vector<char> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end6 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken6: " << std::chrono::duration_cast<std::chrono::milliseconds>(end6 - start6).count()
<< " milliseconds" << std::endl;
return EXIT_SUCCESS;
}
Time taken1: 1821 milliseconds
Time taken2: 1722 milliseconds
Time taken3: 25 milliseconds
Time taken4: 1987 milliseconds
Time taken5: 1993 milliseconds
Time taken6: 10970 milliseconds
dynamic_bitset behaves essentially like std::vector<bool> here.
If you allocate many times but only access each array a few times, go for std::vector<bool>, because it has the lower allocation/initialization time.
If you allocate once and access many times, go for std::vector<char>, because of its faster access.
Also keep in mind that std::vector<bool> is NOT safe to use from multiple threads: writes to different bits may end up touching the same underlying byte.
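As an illustration of that last point, here is a small sketch (mine) of the pattern that is safe with std::vector<char>: two threads writing to different elements. The analogous code with std::vector<bool> would be a data race, because neighbouring elements can share the same underlying word:
#include <thread>
#include <vector>
int main()
{
    std::vector<char> flags(16, 0);
    // Writing to *different* elements of vector<char> from different threads is
    // fine: each element is a distinct memory location.
    std::thread t1([&flags] { for (int i = 0; i < 8;  ++i) flags[i] = 1; });
    std::thread t2([&flags] { for (int i = 8; i < 16; ++i) flags[i] = 1; });
    t1.join();
    t2.join();
    // With std::vector<bool>, flags[0..7] and flags[8..15] would typically live in
    // the same packed word, so the two threads would perform unsynchronized
    // read-modify-write operations on the same object.
    return 0;
}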
It appears that the size of a dynamic bitset cannot be changed:
"The dynamic_bitset class is nearly identical to the std::bitset class. The difference is that the size of the dynamic_bitset (the number of bits) is specified at run-time during the construction of a dynamic_bitset object, whereas the size of a std::bitset is specified at compile-time through an integer template parameter." (from http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html)
As such, it should be slightly faster since it will have slightly less overhead than a vector, but you lose the ability to insert elements.
UPDATE: I just realized that the OP was asking about vector<bool> vs bitset, and my answer does not answer the question, but I think I should leave it: if you search for "c++ vector bool slow", you end up here.
vector<bool> is terribly slow. At least on my Arch Linux system (you can probably get a better implementation or something... but I was really surprised). If anybody has any suggestions why this is so slow, I'm all ears! (Sorry for the blunt beginning; here's the more professional part.)
I've written two implementations of the Sieve of Eratosthenes, and the 'close to the metal' C implementation is 10 times faster. sievec.c is the C implementation, and sievestl.cpp is the C++ implementation. I just compiled with make (implicit rules only, no makefile), and the results were 1.4 s for the C version and 12 s for the C++/STL version:
sievecmp % make -B sievec && time ./sievec 27
cc sievec.c -o sievec
aa 1056282
./sievec 27 1.44s user 0.01s system 100% cpu 1.455 total
and
sievecmp % make -B sievestl && time ./sievestl 27
g++ sievestl.cpp -o sievestl
1056282./sievestl 27 12.12s user 0.01s system 100% cpu 12.114 total
sievec.c is as follows:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned long prime_t;
typedef unsigned long word_t;
#define LOG_WORD_SIZE 6
#define INDEX(i) ((i)>>(LOG_WORD_SIZE))
#define MASK(i) ((word_t)(1) << ((i)&(((word_t)(1)<<LOG_WORD_SIZE)-1)))
#define GET(p,i) (p[INDEX(i)]&MASK(i))
#define SET(p,i) (p[INDEX(i)]|=MASK(i))
#define RESET(p,i) (p[INDEX(i)]&=~MASK(i))
#define p2i(p) ((p)>>1) // (((p-2)>>1))
#define i2p(i) (((i)<<1)+1) // ((i)*2+3)
unsigned long find_next_zero(unsigned long from,
unsigned long *v,
size_t N){
size_t i;
for (i = from+1; i < N; i++) {
if(GET(v,i)==0) return i;
}
return -1;
}
int main(int argc, char *argv[])
{
size_t N = atoi(argv[1]);
N = 1lu<<N;
// printf("%u\n",N);
unsigned long *v = malloc(N/8);
for(size_t i = 0; i < N/64; i++) v[i]=0;
unsigned long p = 3;
unsigned long pp = p2i(p * p);
while( pp <= N){
for(unsigned long q = pp; q < N; q += p ){
SET(v,q);
}
p = p2i(p);
p = find_next_zero(p,v,N);
p = i2p(p);
pp = p2i(p * p);
}
unsigned long sum = 0;
for(unsigned long i = 0; i+2 < N; i++)
if(GET(v,i)==0 && GET(v,i+1)==0) {
unsigned long p = i2p(i);
// cout << p << ", " << p+2 << endl;
sum++;
}
printf("aa %lu\n",sum);
// free(v);
return 0;
}
sievestl.cpp is as follows:
#include <iostream>
#include <vector>
#include <sstream>
using namespace std;
inline unsigned long i2p(unsigned long i){return (i<<1)+1; }
inline unsigned long p2i(unsigned long p){return (p>>1); }
inline unsigned long find_next_zero(unsigned long from, vector<bool> v){
size_t N = v.size();
for (size_t i = from+1; i < N; i++) {
if(v[i]==0) return i;
}
return -1;
}
int main(int argc, char *argv[])
{
stringstream ss;
ss << argv[1];
size_t N;
ss >> N;
N = 1lu<<N;
// cout << N << endl;
vector<bool> v(N);
unsigned long p = 3;
unsigned long pp = p2i(p * p);
while( pp <= N){
for(unsigned long q = pp; q < N; q += p ){
v[q] = 1;
}
p = p2i(p);
p = find_next_zero(p,v);
p = i2p(p);
pp = p2i(p * p);
}
unsigned sum = 0;
for(unsigned long i = 0; i+2 < N; i++)
if(v[i]==0 and v[i+1]==0) {
unsigned long p = i2p(i);
// cout << p << ", " << p+2 << endl;
sum++;
}
cout << sum;
return 0;
}
So I am aware of this question, and others on SO that deal with the issue, but most of those deal with the theoretical complexities of the data structures.
I understand the complexities would seem to indicate that a list would be better, but I am more concerned with the real-world performance.
Note: This question was inspired by slides 45 and 46 of Bjarne Stroustrup's presentation at Going Native 2012 where he talks about how processor caching and locality of reference really help with vectors, but not at all (or enough) with lists.
Question: Is there a good way to test this using CPU time as opposed to wall time, and getting a decent way of "randomly" inserting and deleting elements that can be done beforehand so it does not influence the timings?
As a bonus, it would be nice to be able to apply this to two arbitrary data structures (say vector and hash maps or something like that) to find the "real world performance" on some hardware.
I guess if I were going to test something like this, I'd probably start with code something on this order:
#include <list>
#include <vector>
#include <algorithm>
#include <deque>
#include <time.h>
#include <iostream>
#include <iterator>
static const int size = 30000;
template <class T>
double insert(T &container) {
srand(1234);
clock_t start = clock();
for (int i=0; i<size; ++i) {
int value = rand();
typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
container.insert(pos, value);
}
// uncomment the following to verify correct insertion (in a small container).
// std::copy(container.begin(), container.end(), std::ostream_iterator<int>(std::cout, "\t"));
return double(clock()-start)/CLOCKS_PER_SEC;
}
template <class T>
double del(T &container) {
srand(1234);
clock_t start = clock();
for (int i=0; i<size/2; ++i) {
int value = rand();
typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
container.erase(pos);
}
return double(clock()-start)/CLOCKS_PER_SEC;
}
int main() {
std::list<int> l;
std::vector<int> v;
std::deque<int> d;
std::cout << "Insertion time for list: " << insert(l) << "\n";
std::cout << "Insertion time for vector: " << insert(v) << "\n";
std::cout << "Insertion time for deque: " << insert(d) << "\n\n";
std::cout << "Deletion time for list: " << del(l) << '\n';
std::cout << "Deletion time for vector: " << del(v) << '\n';
std::cout << "Deletion time for deque: " << del(d) << '\n';
return 0;
}
Since it uses clock, this should give processor time not wall time (though some compilers such as MS VC++ get that wrong). It doesn't try to measure the time for insertion exclusive of time to find the insertion point, since 1) that would take a bit more work and 2) I still can't figure out what it would accomplish. It's certainly not 100% rigorous, but given the disparity I see from it, I'd be a bit surprised to see a significant difference from more careful testing. For example, with MS VC++, I get:
Insertion time for list: 6.598
Insertion time for vector: 1.377
Insertion time for deque: 1.484
Deletion time for list: 6.348
Deletion time for vector: 0.114
Deletion time for deque: 0.82
With gcc I get:
Insertion time for list: 5.272
Insertion time for vector: 0.125
Insertion time for deque: 0.125
Deletion time for list: 4.259
Deletion time for vector: 0.109
Deletion time for deque: 0.109
Factoring out the search time would be somewhat non-trivial, because you'd have to time each iteration separately. You'd need something more precise than clock usually is to produce meaningful results from that (more on the order of reading a clock-cycle register). Feel free to modify for that if you see fit -- as I mentioned above, I lack motivation because I can't see how it's a sensible thing to do.
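For what it's worth, if someone did want to factor out the search time, a per-iteration timing sketch with <chrono> might look like the following (note that steady_clock measures wall time per insertion, not CPU time, so it only narrows the problem rather than solving it):
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>
int main() {
    std::srand(1234);
    std::vector<int> v;
    std::chrono::steady_clock::duration insert_time{0};
    for (int i = 0; i < 30000; ++i) {
        int value = std::rand();
        auto pos = std::lower_bound(v.begin(), v.end(), value); // not timed
        auto start = std::chrono::steady_clock::now();
        v.insert(pos, value);                                   // timed
        insert_time += std::chrono::steady_clock::now() - start;
    }
    std::cout << "insert-only time: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(insert_time).count()
              << " ms\n";
    return 0;
}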
This is the program I wrote after watching that talk. I tried running each timing test in a separate process to make sure the allocators weren't doing anything sneaky to alter performance. I have amended the test to allow timing of the random number generation. If you are concerned it is affecting the results significantly, you can time it and subtract the time spent there from the rest of the timings. But I get zero time spent there for anything but very large N. I used getrusage(), which I am pretty sure isn't portable to Windows, but it would be easy to substitute in something using clock() or whatever you like.
#include <assert.h>
#include <algorithm>
#include <iostream>
#include <list>
#include <string> // for std::string (used for the command-line selector)
#include <vector>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>
void f(size_t const N)
{
std::vector<int> c;
//c.reserve(N);
for (size_t i = 0; i < N; ++i) {
int r = rand();
auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
c.insert(p, r);
}
}
void g(size_t const N)
{
std::list<int> c;
for (size_t i = 0; i < N; ++i) {
int r = rand();
auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
c.insert(p, r);
}
}
int h(size_t const N)
{
int r;
for (size_t i = 0; i < N; ++i) {
r = rand();
}
return r;
}
double usage()
{
struct rusage u;
if (getrusage(RUSAGE_SELF, &u) == -1) std::abort();
return
double(u.ru_utime.tv_sec) + (u.ru_utime.tv_usec / 1e6) +
double(u.ru_stime.tv_sec) + (u.ru_stime.tv_usec / 1e6);
}
int
main(int argc, char* argv[])
{
assert(argc >= 3);
std::string const sel = argv[1];
size_t const N = atoi(argv[2]);
double t0, t1;
srand(127);
if (sel == "vector") {
t0 = usage();
f(N);
t1 = usage();
} else if (sel == "list") {
t0 = usage();
g(N);
t1 = usage();
} else if (sel == "rand") {
t0 = usage();
h(N);
t1 = usage();
} else {
std::abort();
}
std::cout
<< (t1 - t0)
<< std::endl;
return 0;
}
To get a set of results I used the following shell script.
seq=`perl -e 'for ($i = 10; $i < 100000; $i *= 1.1) { print int($i), " "; }'`
for i in $seq; do
vt=`./a.out vector $i`
lt=`./a.out list $i`
echo $i $vt $lt
done