I am currently studying different search algorithms, and I have written a small program to compare their efficiency. Binary search should be faster than linear search, but the time measurements show otherwise. Did I make a mistake in the code, or is this some special case?
#include <chrono>
#include <iostream>
using namespace std;
const int n=1001;
int a[n];
void assign() {
for (int i=0; i<n; i++) {
a[i]=i;
}
}
void print() {
for (int i=0; i<n; i++) {
cout << a[i] << endl;
}
}
bool find1 (int x) {
for (int i=0; i<n; i++) {
if (x==a[i]){
return true;
}
} return false;
}
bool binsearch(int x) {
int l=0,m;
int r=n-1;
while (l<=r) {
m = ((l+r)/2);
if (a[m]==x) return true;
if (a[m]<x) l=m+1;
if (a[m]>x) r=m-1;
}
return false;
}
int main() {
assign();
//print();
auto start1 = chrono::steady_clock::now();
cout << binsearch(500) << endl;
auto end1 = chrono::steady_clock::now();
auto start2 = chrono::steady_clock::now();
cout << find1(500) << endl;
auto end2 = chrono::steady_clock::now();
cout << "binsearch: " << chrono::duration_cast<chrono::nanoseconds>(end1 - start1).count()
<< " ns " << endl;
cout << "linsearch: " << chrono::duration_cast<chrono::nanoseconds>(end2 - start2).count()
<< " ns " << endl;
return 0;
}
Your test dataset is too small (1001 integers). It will fit entirely in the fastest (L1) cache when you fill it; consequently, you're bound by branch complexity, not memory.
The binary search version exhibits more branch mispredictions, resulting in more pipeline stalls than a simple linear pass.
I increased n to 1000001 and also increased the number of test passes:
auto start1 = chrono::steady_clock::now();
for (int i = 0; i < n; i += n/13) {
if (!binsearch(i%n)) {
std::cerr << i << std::endl;
}
}
auto end1 = chrono::steady_clock::now();
auto start2 = chrono::steady_clock::now();
for (int i = 0; i < n; i += n / 13) {
if (!find1(i%n)) {
std::cerr << i << std::endl;
}
}
auto end2 = chrono::steady_clock::now();
and I'm getting different results:
binsearch: 10300 ns
linsearch: 3129600 ns
Note also that you should not call cout in a timed loop, but you do need to use the result of the find in order for it to not get optimized away.
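The advice above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the names Timing, SearchFn and time_lookups are hypothetical, and the key pattern is an arbitrary choice. Nothing is printed inside the timed region, and the number of hits is returned so the compiler cannot discard the searches.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Linear scan; returns true if x is present.
bool linsearch(const std::vector<int>& a, int x) {
    for (int v : a)
        if (v == x) return true;
    return false;
}

// Binary search over a sorted vector, half-open range [lo, hi).
bool binsearch(const std::vector<int>& a, int x) {
    std::size_t lo = 0, hi = a.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == x) return true;
        if (a[mid] < x) lo = mid + 1;
        else hi = mid;
    }
    return false;
}

// Hypothetical result type: number of successful lookups and elapsed time.
struct Timing {
    long long hits;
    long long ns;
};

using SearchFn = bool (*)(const std::vector<int>&, int);

// Runs `passes` lookups of keys taken from `a` itself (so each one should hit).
// Assumes `a` is non-empty. No I/O happens between the two clock reads.
Timing time_lookups(SearchFn search, const std::vector<int>& a, int passes) {
    auto start = std::chrono::steady_clock::now();
    long long hits = 0;
    for (int p = 0; p < passes; ++p) {
        std::size_t idx = (static_cast<std::size_t>(p) * 7919u) % a.size();
        hits += search(a, a[idx]) ? 1 : 0;  // use the result so it is not optimized away
    }
    auto end = std::chrono::steady_clock::now();
    long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    return Timing{hits, ns};
}
```

Call time_lookups once per algorithm and print hits and ns only after both calls have returned. A dedicated benchmarking framework remains the more reliable option.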
To my mind, N=1001 is enough to notice that binary search performs better. Specific implementations of linear search could be faster only for small N (approximately < 100). However, in your case the reason for such strange results is a flawed measurement. All your data was cached while the first algorithm (binary search) ran, which dramatically improved the performance of the second (linear search).
If you just swap their calls, you will get the opposite result:
binsearch: 6019 ns
linsearch: 77587 ns
For precise measurements you should use a dedicated framework (Google Benchmark, for example), which ensures 'fair conditions' for both algorithms.
Another online benchmarking tool (it runs the test code on a pool of many AWS machines whose load is unknown and returns the average result) gives these charts for your unchanged code (with the same n=1001):
Get the best of both!
Do a binary search down to some level, then switch to linear. Think of it this way: a binary search has a bunch of bookkeeping; a linear search is faster because it is 'simpler'.
When I first experimented with this (back in the 1970s) in assembly language, I found that doing a binary search down to about 4 items, then a linear search, was about optimal. However, YMMV; it depends on the hardware, the complexity of comparing two items (float / int / string / whatever), etc.
Tip: Count the number of operations in your code. I see about twice as many operations are needed for each step in your binsearch() routine versus the linear scan.
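A minimal sketch of the hybrid idea, assuming a sorted std::vector<int>. The function name hybrid_search and the constant kLinearCutoff are hypothetical, and the cutoff of 4 is just the figure mentioned above, so it should be measured on the target hardware rather than trusted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cutoff: below this many candidates, scan linearly.
constexpr std::size_t kLinearCutoff = 4;

// Narrow the half-open range [lo, hi) by bisection, then finish with a scan.
bool hybrid_search(const std::vector<int>& a, int x) {
    std::size_t lo = 0, hi = a.size();
    while (hi - lo > kLinearCutoff) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x)
            lo = mid + 1;   // x, if present, lies strictly right of mid
        else
            hi = mid + 1;   // a[mid] >= x: keep mid inside the range
    }
    for (std::size_t i = lo; i < hi; ++i)   // at most kLinearCutoff elements
        if (a[i] == x) return true;
    return false;
}
```

The bisection loop does one comparison per step and leaves the equality test to the final scan, which is one way of trimming the bookkeeping the answer refers to.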
I've been using std::vector mostly and was wondering if I should use std::map for a key lookup to improve performance.
And here's my full test code.
#include <algorithm>
#include <iostream>
#include <string>
#include <map>
#include <vector>
#include <ctime>
#include <chrono>
using namespace std;
vector<string> myStrings = {"aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr", "sss", "ttt", "uuu", "vvv", "www", "xxx", "yyy", "zzz"};
struct MyData {
string key;
int value;
};
int findStringPosFromVec(const vector<MyData> &myVec, const string &str) {
auto it = std::find_if(begin(myVec), end(myVec),
[&str](const MyData& data){return data.key == str;});
if (it == end(myVec))
return -1;
return static_cast<int>(it - begin(myVec));
}
int main(int argc, const char * argv[]) {
const int testInstance = 10000; //HOW MANY TIMES TO PERFORM THE TEST
//----------------------------std::map-------------------------------
clock_t map_cputime = std::clock(); //START MEASURING THE CPU TIME
for (int i=0; i<testInstance; ++i) {
map<string, int> myMap;
//insert unique keys
for (int i=0; i<myStrings.size(); ++i) {
myMap[myStrings[i]] = i;
}
//iterate again, if key exists, replace value;
for (int i=0; i<myStrings.size(); ++i) {
if (myMap.find(myStrings[i]) != myMap.end())
myMap[myStrings[i]] = i * 100;
}
}
//FINISH MEASURING THE CPU TIME
double map_cpu = (std::clock() - map_cputime) / (double)CLOCKS_PER_SEC;
cout << "Map Finished in " << map_cpu << " seconds [CPU Clock] " << endl;
//----------------------------std::vector-------------------------------
clock_t vec_cputime = std::clock(); //START MEASURING THE CPU TIME
for (int i=0; i<testInstance; ++i) {
vector<MyData> myVec;
//insert unique keys
for (int i=0; i<myStrings.size(); ++i) {
const int pos = findStringPosFromVec(myVec, myStrings[i]);
if (pos == -1)
myVec.push_back({myStrings[i], i});
}
//iterate again, if key exists, replace value;
for (int i=0; i<myStrings.size(); ++i) {
const int pos = findStringPosFromVec(myVec, myStrings[i]);
if (pos != -1)
myVec[pos].value = i * 100;
}
}
//FINISH MEASURING THE CPU TIME
double vec_cpu = (std::clock() - vec_cputime) / (double)CLOCKS_PER_SEC;
cout << "Vector Finished in " << vec_cpu << " seconds [CPU Clock] " << endl;
return 0;
}
And this is the result I got.
Map Finished in 0.38121 seconds [CPU Clock]
Vector Finished in 0.346863 seconds [CPU Clock]
Program ended with exit code: 0
I mostly store less than 30 elements in a container.
Does this mean it is better to use std::vector instead of std::map in my case?
EDIT: when I move map<string, int> myMap; before the loop, std::map was faster than std::vector.
Map Finished in 0.278136 seconds [CPU Clock]
Vector Finished in 0.328548 seconds [CPU Clock]
Program ended with exit code: 0
So if this is the proper test, I guess std::map is faster.
But if I reduce the number of elements to 10, std::vector is faster, so I guess it really depends on the number of elements.
I would say that, in general, a vector can perform better than a map for lookups, but only for a tiny amount of data, e.g. the fewer than 30 elements you mentioned.
The reason is that a linear search through a contiguous chunk of memory is the cheapest way to access memory. A map keeps its data at scattered memory locations, so accessing them is a little more expensive. With a tiny number of elements, this can matter. In real life, with hundreds or thousands of elements, the algorithmic complexity of the lookup will outweigh this gain.
BUT! You are benchmarking completely different things:
You are populating a map. In case of a vector, you don't do this
Your code could perform TWO map lookups: first, find to check existence, second [] operator to find an element to modify. These are relatively heavy operations. You can modify an element just with single find (figure this out yourself, check references!)
Within each test iteration, you are performing additional heavy operations, like memory allocation for each map/vector. It means that your tests are measuring not only lookup performance but something else.
Benchmarking is a difficult problem; don't roll your own. For example, there are side effects like cache warm-up that you have to deal with. Use something like Celero, hayai or Google Benchmark.
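For the double-lookup point in the list above, one possible shape of a single-lookup update is sketched here. The helper name update_if_present is hypothetical; the only library calls are std::map::find and std::map::end.

```cpp
#include <map>
#include <string>

// Replaces the value stored under `key` only if the key already exists.
// The iterator returned by find() is reused, so the tree is searched once.
bool update_if_present(std::map<std::string, int>& m,
                       const std::string& key, int value) {
    auto it = m.find(key);
    if (it == m.end()) return false;  // key absent: nothing inserted
    it->second = value;               // modify in place through the iterator
    return true;
}
```

In the question's second loop this would replace the find() followed by operator[] with one call per key.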
Your vector has constant content, so the compiler optimizes most of your code away anyway.
There is little use in measuring for such small counts, and no use measuring for hard coded values.
I am making a test program to measure time for storage of each container. The following is my code for the test.
#include <list>
#include <vector>
#include <iostream>
#include <iomanip>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;
void insert(list<short>& l, const short& value);
void insert(vector<short>& v, const short& value);
void insert(short arr[], int& logicalSize, const int& physicalSize, const short& value);
int main() {
clock_t start, end;
srand(time(nullptr));
const int SIZE = 50000;
const short RANGE = 10000;
list<short> l;
vector<short> v;
short* arr = new short[SIZE];
int logicalSize = 0;
// array
start = clock();
cout << "Array storage time test...";
for (int i = 0; i < SIZE; i++) {
try {
insert(arr, logicalSize, SIZE, (short)(rand() % (2 * RANGE + 1) - RANGE));
} catch (string s) {
cout << s << endl;
system("pause");
exit(-1);
}
}
end = clock();
cout << "Time: " << double(end - start) / CLOCKS_PER_SEC << " s" << endl << endl;
// list
cout << "List storage time test...";
start = clock();
for (int i = 0; i < SIZE; i++) {
insert(l, (short)(rand() % (2 * RANGE + 1) - RANGE));
}
end = clock();
cout << "Time: " << double(end - start) / CLOCKS_PER_SEC << " s" << endl << endl;
// vector
cout << "Vector storage time test...";
start = clock();
for (int i = 0; i < SIZE; i++) {
insert(v, (short)(rand() % (2 * RANGE + 1) - RANGE));
}
end = clock();
cout << "Time: " << double(end - start) / CLOCKS_PER_SEC << " s" << endl << endl;
delete[] arr;
system("pause");
return 0;
}
void insert(list<short>& l, const short& value) {
for (auto it = l.begin(); it != l.end(); it++) {
if (value < *it) {
l.insert(it, value);
return;
}
}
l.push_back(value);
}
void insert(vector<short>& v, const short& value) {
for (auto it = v.begin(); it != v.end(); it++) {
if (value < *it) {
v.insert(it, value);
return;
}
}
v.push_back(value);
}
void insert(short arr[], int& logicalSize, const int& physicalSize, const short& value) {
if (logicalSize == physicalSize) throw string("No spaces in array.");
for (int i = 0; i < logicalSize; i++) {
if (value < arr[i]) {
for (int j = logicalSize - 1; j >= i; j--) {
arr[j + 1] = arr[j];
}
arr[i] = value;
logicalSize++;
return;
}
}
arr[logicalSize] = value;
logicalSize++;
}
However, when I execute the code, the result seems a little different from the theory. The list should be fastest, but the result said that insertion in the list is slowest. Can you tell me why?
Inserting into a vector or array requires moving everything after the insertion point; so for a random position it requires an average of 1.5 accesses per element: 0.5 to find the spot, and 0.5*2 (read and write) to do the insert.
Inserting into a list requires 0.5 accesses per element (to find the spot).
This means the vector makes only 3 times as many element accesses.
List nodes are 5 to 9 times larger than vector "nodes" (which are just elements). Forward iteration requires reading 3 to 5 times as much memory (a 16-bit element plus a 32- to 64-bit pointer).
So the list solution reads/writes more memory! Worse, it is sparser (with the back pointer), and it may not be arranged in a cache-friendly way in memory (vectors are contiguous; list nodes may be scattered all over the address space), which defeats the CPU's cache prefetching.
List is very rarely faster than vector; you have to be inserting/deleting many times more often than you iterate over the list.
Finally vector uses exponential allocation with reserved unused space. List allocates each time. Calling new is slow, and often not much slower when you ask for bigger chunks than when you ask for smaller ones. Growing a vector by 1 at a time 1000 times results in about 15 allocations (give or take); for list, 1000 allocations.
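The allocation count can be checked empirically. The sketch below (the function name count_reallocations is hypothetical) counts how often capacity() changes while elements are pushed one at a time. The exact number depends on the standard library's growth factor, which is implementation-defined, so no particular result is claimed here.

```cpp
#include <cstddef>
#include <vector>

// Pushes n elements one at a time and counts how often the capacity changes,
// i.e. how many times the vector had to allocate a new, larger buffer.
std::size_t count_reallocations(std::size_t n) {
    std::vector<short> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(0);
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Calling count_reallocations(1000) shows the count for the case discussed above; a std::list<short> would instead perform one node allocation per inserted element.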
Insertion in a list is blisteringly fast, but first you have to find where you want to insert. This is where list comes out a loser.
It might be helpful to stop and read Why is it faster to process a sorted array than an unsorted array? sometime around now because it covers similar material and covers it really well.
With a vector or array each element comes one after the next. Prediction is dead easy, so the CPU can be loading the cache with values you won't need for a while at the same time as it is processing the current value.
With a list predictability is shot, you have to get the next node before you can load the node after that, and that pretty much nullifies the cache. Without the cache you can see an order of magnitude degradation in performance as the CPU sits around waiting for data to be retrieved from RAM.
Bjarne Stroustrup has a number of longer pieces on this topic. The keynote video is definitely worth watching.
One important take-away: take Big-O notation with a grain of salt, because it measures the efficiency of the algorithm, not how well the algorithm takes advantage of the hardware.
I wanted to learn to use C++ 11 std::threads with VS2012 and I wrote a very simple C++ console program with two threads which just increment a counter. I also want to test the performance difference when two threads are used. Test program is given below:
#include <chrono>
#include <iostream>
#include <thread>
#include <conio.h>
#include <atomic>
std::atomic<long long> sum(0);
//long long sum;
using namespace std;
const int RANGE = 100000000;
void test_without_threds()
{
sum = 0;
for(unsigned int j = 0; j < 2; j++)
for(unsigned int k = 0; k < RANGE; k++)
sum ++ ;
}
void call_from_thread(int tid)
{
for(unsigned int k = 0; k < RANGE; k++)
sum ++ ;
}
void test_with_2_threds()
{
std::thread t[2];
sum = 0;
//Launch a group of threads
for (int i = 0; i < 2; ++i) {
t[i] = std::thread(call_from_thread, i);
}
//Join the threads with the main thread
for (int i = 0; i < 2; ++i) {
t[i].join();
}
}
int _tmain(int argc, _TCHAR* argv[])
{
chrono::time_point<chrono::system_clock> start, end;
cout << "-----------------------------------------\n";
cout << "test without threds()\n";
start = chrono::system_clock::now();
test_without_threds();
end = chrono::system_clock::now();
chrono::duration<double> elapsed_seconds = end-start;
cout << "finished calculation for "
<< chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< "ms.\n";
cout << "sum:\t" << sum << "\n";
cout << "-----------------------------------------\n";
cout << "test with 2_threds\n";
start = chrono::system_clock::now();
test_with_2_threds();
end = chrono::system_clock::now();
cout << "finished calculation for "
<< chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< "ms.\n";
cout << "sum:\t" << sum << "\n";
_getch();
return 0;
}
Now, when I use a plain long long variable for the counter (the commented-out line), I get a value different from the correct one: 100000000 instead of 200000000. I am not sure why that is. I suppose the two threads are changing the counter at the same time, but I am not sure how that really happens, because ++ is just a very simple instruction. It seems that the threads are caching the sum variable at the beginning. Performance is 110 ms with two threads vs. 200 ms for one thread.
So the correct way, according to the documentation, is to use std::atomic. However, the performance is now much worse in both cases: about 3300 ms without threads and 15820 ms with threads. What is the correct way to use std::atomic in this case?
> I am not sure why that is. I suppose the two threads are changing the counter at the same time, but I am not sure how that really happens, because ++ is just a very simple instruction.
Each thread is pulling the value of sum into a register, incrementing the register, and finally writing it back to memory at the end of the loop.
> So the correct way, according to the documentation, is to use std::atomic. However, the performance is now much worse in both cases: about 3300 ms without threads and 15820 ms with threads. What is the correct way to use std::atomic in this case?
You're paying for the synchronization std::atomic provides. It won't be nearly as fast as using an un-synchronized integer, though you can get a small improvement to performance by refining the memory order of the add:
sum.fetch_add(1, std::memory_order_relaxed);
In this particular case, you're compiling for x86 and operating on a 64-bit integer. This means that the compiler has to generate code to update the value in two 32-bit operations; if you change the target platform to x64, the compiler will generate code to do the increment in a single 64-bit operation.
As a general rule, the solution to problems like this is to reduce the number of writes to shared data.
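One way to apply that rule, sketched under the assumption that only the final total matters: each thread counts into a local variable and touches the shared atomic exactly once. The function name count_with_local_subtotals is hypothetical. An optimizing compiler may reduce the trivial inner loop to a single assignment, so this shows the structure rather than serving as a benchmark.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread accumulates privately and publishes one subtotal, so the shared
// atomic is written num_threads times instead of num_threads * range times.
long long count_with_local_subtotals(int num_threads, long long range) {
    std::atomic<long long> sum(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&sum, range] {
            long long local = 0;                 // private to this thread
            for (long long k = 0; k < range; ++k)
                ++local;
            // Relaxed ordering suffices here: join() below synchronizes
            // with the end of each thread before the total is read.
            sum.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) th.join();
    return sum.load();
}
```

With two threads and the question's RANGE, the shared variable would be written twice rather than 200000000 times.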
Your code has a couple of problems. First of all, all the "inputs" involved are compile-time constants, so a good compiler can pre-compute the value for the single-threaded code, so (regardless of the value you give for range) it shows as running in 0 ms.
Second, you're sharing a single variable (sum) between all the threads, forcing all of their accesses to be synchronized at that point. Without synchronization, that gives undefined behavior. As you've already found, synchronizing the access to that variable is quite expensive, so you usually want to avoid it if at all reasonable.
One way to do that is to use a separate subtotal for each thread, so they can all do their additions in parallel without synchronizing, then add together the individual results at the end.
Another point is to ensure against false sharing. False sharing arises when two (or more) threads are writing to data that really is separate, but has been allocated in the same cache line. In this case, access to the memory can be serialized even though (as already noted) you don't have any data actually shared between the threads.
Based on those factors, I've rewritten your code slightly to create a separate sum variable for each thread. Those variables are of a class type that gives (fairly) direct access to the data, but does stop the optimizer from seeing that it can do the whole computation at compile-time, so we end up comparing one thread to 4 (which reminds me: I did increase the number of threads from 2 to 4, since I'm using a quad-core machine). I moved that number into a const variable though, so it should be easy to test with different numbers of threads.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <thread>
#include <conio.h>
#include <atomic>
#include <numeric>
const int num_threads = 4;
struct val {
long long sum;
int pad[2];
val &operator=(long long i) { sum = i; return *this; }
operator long long &() { return sum; }
operator long long() const { return sum; }
};
val sum[num_threads];
using namespace std;
const int RANGE = 100000000;
void test_without_threds()
{
sum[0] = 0LL;
for(unsigned int j = 0; j < num_threads; j++)
for(unsigned int k = 0; k < RANGE; k++)
sum[0] ++ ;
}
void call_from_thread(int tid)
{
for(unsigned int k = 0; k < RANGE; k++)
sum[tid] ++ ;
}
void test_with_threads()
{
std::thread t[num_threads];
std::fill_n(sum, num_threads, 0);
//Launch a group of threads
for (int i = 0; i < num_threads; ++i) {
t[i] = std::thread(call_from_thread, i);
}
//Join the threads with the main thread
for (int i = 0; i < num_threads; ++i) {
t[i].join();
}
long long total = std::accumulate(std::begin(sum), std::end(sum), 0LL);
}
int main()
{
chrono::time_point<chrono::system_clock> start, end;
cout << "-----------------------------------------\n";
cout << "test without threds()\n";
start = chrono::system_clock::now();
test_without_threds();
end = chrono::system_clock::now();
chrono::duration<double> elapsed_seconds = end-start;
cout << "finished calculation for "
<< chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< "ms.\n";
cout << "sum:\t" << sum << "\n";
cout << "-----------------------------------------\n";
cout << "test with threads\n";
start = chrono::system_clock::now();
test_with_threads();
end = chrono::system_clock::now();
cout << "finished calculation for "
<< chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< "ms.\n";
cout << "sum:\t" << sum << "\n";
_getch();
return 0;
}
When I run this, my results are closer to what I'd guess you hoped for:
-----------------------------------------
test without threds()
finished calculation for 78ms.
sum: 000000013FCBC370
-----------------------------------------
test with threads
finished calculation for 15ms.
sum: 000000013FCBC370
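Note that the two "sum" lines show the same hexadecimal value because cout << sum prints the address of the sum array here, not the totals. The timings are the point: N threads increase speed by a factor of approximately N (up to the number of cores available).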
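As a variation on the manual padding used in the struct above, the per-thread counters can be aligned to a full cache line. This is only a sketch: the 64-byte line size is an assumption that holds on many x86 machines but not everywhere, and the names kCacheLine, PaddedCounter and sum_with_padded_counters are hypothetical.

```cpp
#include <cstddef>
#include <thread>

// Assumed cache line size; check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// alignas rounds both alignment and size up to kCacheLine, so two counters
// in an array never share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    long long value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// N threads each increment their own counter; totals are combined after join().
template <int N>
long long sum_with_padded_counters(long long range) {
    PaddedCounter counters[N];
    std::thread threads[N];
    for (int i = 0; i < N; ++i) {
        threads[i] = std::thread([&counters, i, range] {
            for (long long k = 0; k < range; ++k)
                ++counters[i].value;   // touches only this thread's cache line
        });
    }
    long long total = 0;
    for (int i = 0; i < N; ++i) {
        threads[i].join();
        total += counters[i].value;
    }
    return total;
}
```

For example, sum_with_padded_counters<4>(RANGE) would correspond to the four-thread run above and returns the combined total, which can then be printed directly.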
Try using prefix increment (++sum), which may give a performance improvement.
Tested on my machine, std::memory_order_relaxed does not give any advantage.
I'm new to C++ programming and am trying to write some sparse matrix and vector code as practice.
The sparse matrix is built from a vector of maps, where the vector indexes the rows and each map holds the sparse entries in the columns.
What I was trying to do is fill a diagonally dominant sparse matrix with the equation system for a Poisson equation.
When filling the matrix in test cases, I was able to provoke the following very weird problem, which I have reduced to the essential operations.
#include <vector>
#include <iterator>
#include <iostream>
#include <map>
#include <ctime>
int main()
{
unsigned int nDim = 100000;
double clock1;
// alternative std::map<unsigned int, std::map<unsigned int, double> > mat;
std::vector<std::map<unsigned int, double> > mat;
mat.resize(nDim);
// if clause and number set
clock1 = double(clock())/CLOCKS_PER_SEC;
for(unsigned int rowIter = 0; rowIter < nDim; rowIter++)
{
for(unsigned int colIter = 0; colIter < nDim; colIter++)
{
if(rowIter == colIter)
{
mat[rowIter][colIter] = 1.;
}
}
}
std::cout << "time for diagonal fill: " << 1e3 * (double(clock())/CLOCKS_PER_SEC - clock1) << " ms" << std::endl;
// if clause and number insert
clock1 = double(clock())/CLOCKS_PER_SEC;
for(unsigned int rowIter = 0; rowIter < nDim; rowIter++)
{
for(unsigned int colIter = 0; colIter < nDim; colIter++)
{
if(rowIter == colIter)
{
mat[rowIter].insert(std::pair<unsigned int, double>(colIter,1.));
}
}
}
std::cout << "time for insert diagonal fill: " << 1e3 * (double(clock())/CLOCKS_PER_SEC - clock1) << " ms" << std::endl;
// only number set
clock1 = double(clock())/CLOCKS_PER_SEC;
for(unsigned int rowIter = 0; rowIter < nDim; rowIter++)
{
mat[rowIter][rowIter] += 1.;
}
std::cout << "time for easy diagonal fill: " << 1e3 * (double(clock())/CLOCKS_PER_SEC - clock1) << " ms" << std::endl;
// only if clause
clock1 = double(clock())/CLOCKS_PER_SEC;
for(unsigned int rowIter = 0; rowIter < nDim; rowIter++)
{
for(unsigned int colIter = 0; colIter < nDim; colIter++)
{
if(rowIter == colIter)
{
}
}
}
std::cout << "time for if clause: " << 1e3 * (double(clock())/CLOCKS_PER_SEC - clock1) << " ms" << std::endl;
return 0;
}
Running this in gcc (newest version, 4.8.1 I think) the following times appear:
time for diagonal fill: 26317ms
time for insert diagonal: 8783ms
time for easy diagonal fill: 10ms !!!!!!!
time for if clause: 0ms
I only included the loop with the bare if clause to be sure that it is not responsible for the slowdown.
Optimization level is O3, but the problem also appears on other levels.
So I thought let's try the Visual Studio (2012 Express).
It is a little bit faster, but still as slow as ketchup:
time for diagonal fill: 9408ms
time for insert diagonal: 8860ms
time for easy diagonal fill: 11ms !!!!!!!
time for if clause: 0ms
So MSVC++ fails, too.
It will probably not even be necessary to use this combination of if clause and matrix fill, but if it is... I'm screwed.
Does anybody know where this huge performance gap is coming from and how I could deal with it?
Is it some optimization problem caused by the fact, that the if-clause is inside the loop? Do I maybe just need another compiler flag?
I would also be interested, if it occurs with other systems/compilers, too. I might run it on the Xeon E5 machine at work and see what this baby makes with this devil piece of code :).
EDIT:
I ran it on the Xeon machine: Much faster, still slow.
Times with gcc:
2778ms
2684ms
1ms
0ms
The most obvious performance issue would be allocations within your map. Each time you assign/insert a new item in a map, it's got to allocate space for it and sort the tree appropriately. Doing that thousands of times is bound to be slow.
It's also very significant that you're not clearing the maps after your first loop. That means your subsequent loops don't have to do as much work, so your performance comparisons are not equivalent.
Finally, the nested loops perform nDim times as many iterations as your single loop (100,000 × 100,000 = 10^10 in total, versus 100,000). From a strict algorithm analysis standpoint, they may be doing the same amount of actual work on the data. However, the program still has to run through all those extra iterations because that's what you've told it to do. The compiler can only optimise the loop away if there is literally nothing being processed/modified in the loop body.
In the first loop, the runtime system is doing loads of memory allocation, so it takes a lot of time on memory management.
The other loops don't have that overhead; you didn't release the allocation done by the first loop, so they don't have to repeat the memory allocation and it doesn't take anywhere near as long.
The last loop is optimized out by the compiler; it has no side effects, so it doesn't get included in the program.
Morals:
memory allocation has a cost.
benchmarking is hard.
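Putting these points together, a fairer comparison could build a fresh matrix for each timed variant and visit only the diagonal. The following is a rough sketch, not the asker's code; the alias SparseMatrix and the function make_diagonal are hypothetical names.

```cpp
#include <map>
#include <vector>

using SparseMatrix = std::vector<std::map<unsigned int, double>>;

// Builds a new nDim x nDim matrix with `value` on the diagonal.
// Each call starts from empty maps, so every entry costs one node allocation,
// and the single loop runs nDim times rather than nDim * nDim times.
SparseMatrix make_diagonal(unsigned int nDim, double value) {
    SparseMatrix mat(nDim);
    for (unsigned int row = 0; row < nDim; ++row)
        mat[row].emplace(row, value);
    return mat;
}
```

Timing one call per variant (operator[] versus insert or emplace) on fresh matrices keeps the allocation work the same in each measurement.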
EDIT:
I've fixed the insertion. As Blastfurnace kindly mentioned, the insertion invalidated the iterators. I believe the loop is needed to compare performance (see my comment on Blastfurnace's answer). My code is updated. I have essentially identical code for the list, just with vector replaced by list. However, with this code I find that the list performs better than the vector for both small and large data types, and even for linear search (if I remove the insertion). According to http://java.dzone.com/articles/c-benchmark-%E2%80%93-stdvector-vs and other sites, that should not be the case. Any clues as to how that can be?
I am taking a course on programming of mathematical software (exam on Monday), and for that I would like to present a graph that compares the performance of random insertion of elements into a vector and a list. However, when I test the code I get random slowdowns. For instance, I might have 2 iterations where inserting 10 elements at random into a vector of size 500 takes 0.01 seconds, and then 3 similar iterations that each take roughly 12 seconds. This is my code:
void AddRandomPlaceVector(vector<FillSize> &myContainer, int place) {
int i = 0;
vector<FillSize>::iterator iter = myContainer.begin();
while (iter != myContainer.end())
{
if (i == place)
{
FillSize myFill;
iter = myContainer.insert(iter, myFill);
}
else
++iter;
++i;
}
//cout << i << endl;
}
double testVector(int containerSize, int iterRand)
{
cout << endl;
cout << "Size: " << containerSize << endl << "Random inserts: " << iterRand << endl;
vector<FillSize> myContainer(containerSize);
boost::timer::auto_cpu_timer tid;
for (int i = 0; i != iterRand; i++)
{
double randNumber = (int)(myContainer.size()*((double)rand()/RAND_MAX));
AddRandomPlaceVector(myContainer, randNumber);
}
double wallTime = tid.elapsed().wall/1e9;
cout << "New size: " << myContainer.size();
return wallTime;
}
int main()
{
int testSize = 500;
int measurementIters = 20;
int numRand = 1000;
int repetionIters = 100;
ofstream tidOutput1_sum("VectorTid_8bit_sum.txt");
ofstream tidOutput2_sum("ListTid_8bit_sum.txt");
for (int i = 0; i != measurementIters; i++)
{
double time = 0;
for (int j = 0; j != repetionIters; j++) {
time += testVector((i+1)*testSize, numRand);
}
std::ostringstream strs;
strs << double(time/repetionIters);
tidOutput1_sum << ((i+1)*testSize) << "," << strs.str() << endl;
}
for (int i = 0; i != measurementIters; i++)
{
double time = 0;
for (int j = 0; j != repetionIters; j++) {
time += testList((i+1)*testSize, numRand);
}
std::ostringstream strs;
strs << double(time/repetionIters);
tidOutput2_sum << ((i+1)*testSize) << "," << strs.str() << endl;
}
return 0;
}
struct FillSize
{
double fill1;
};
The struct is just for me to easily add more values so I can test for elements with different size. I know that this code is probably not perfect concerning performance-testing, but they would rather have me make a simple example than simply reference to something I found.
I've tested this code on two computers now, both having the same issues. How can that be? And can you help me with a fix so I can graph it and present it Monday? Perhaps adding some seconds of wait time between each iteration will help?
Kind regards,
Bjarke
Your AddRandomPlaceVector function has a serious flaw. Using insert() will invalidate iterators so the for loop is invalid code.
If you know the desired insertion point there's no reason to iterate over the vector at all.
void AddRandomPlaceVector(vector<FillSize> &myContainer, int place)
{
FillSize myFill;
myContainer.insert(myContainer.begin() + place, myFill);
}
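For the list half of the comparison, a matching helper might look like the sketch below. The names AddAtIndexVector and AddAtIndexList are hypothetical, FillSize is repeated so the snippet stands alone, and clamping an out-of-range index to the end is an added assumption, not something the original code does.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

struct FillSize {
    double fill1;
};

// Vector: random-access iterators reach the position in constant time,
// then insert() shifts the tail.
void AddAtIndexVector(std::vector<FillSize>& c, std::size_t place) {
    if (place > c.size()) place = c.size();   // assumption: clamp to end
    c.insert(c.begin() + static_cast<std::ptrdiff_t>(place), FillSize{});
}

// List: std::next walks `place` nodes (linear time), then insert() only
// relinks pointers.
void AddAtIndexList(std::list<FillSize>& c, std::size_t place) {
    if (place > c.size()) place = c.size();   // assumption: clamp to end
    c.insert(std::next(c.begin(), static_cast<std::ptrdiff_t>(place)), FillSize{});
}
```

With both helpers, the benchmark measures the traversal cost for the list and the element-shifting cost for the vector, which is the trade-off the comparison is meant to show.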