I'm trying to make run-time measurements of simple algorithms like linear search. The problem is that no matter what I do, the time measurement won't work as intended: I get the same search time no matter what problem size I use. Both I and the other people who've tried to help me are equally confused.
I have a linear search function that looks like this:
// Search the N first elements of 'data'.
int linearSearch(vector<int> &data, int number, const int N) {
    if (N < 1 || N > data.size()) return 0;
    for (int i = 0; i < N; i++) {
        if (data[i] == number) return 1;
    }
    return 0;
}
I've tried to take time measurements with both time_t and chrono from C++11, without any luck, except for more decimals. This is how it looks right now when I'm searching:
vector<int> listOfNumbers = large list of numbers;
for (int i = 15000; i <= 5000000; i += 50000) {
    const clock_t start = clock();
    for (int a = 0; a < NUMBERS_TO_SEARCH; a++) {
        int randNum = rand() % INT_MAX;
        linearSearch(listOfNumbers, randNum, i);
    }
    cout << float(clock() - start) / CLOCKS_PER_SEC << endl;
}
The result?
0.126, 0.125, 0.125, 0.124, 0.124, ... (same values?)
I have tried the code with both VC++ and g++, and on different computers.
First I thought it was my implementation of the search algorithm that was at fault. But a linear search like the one above can't get any simpler; it's clearly O(N). How can the time be the same even when the problem size is increased by so much? I'm at a loss as to what to do.
Edit 1:
Someone else might have an explanation for why this is the case, but it actually worked in release mode after changing:
if (data[i] == number)
To:
if (data.at(i) == number)
I have no idea why, but the linear search could be timed correctly after that change.
The reason for the near-constant execution times is that the compiler is able to optimize away parts of the code.
Specifically looking at this part of the code:
for (int a = 0; a < NUMBERS_TO_SEARCH; a++) {
    int randNum = rand() % INT_MAX;
    linearSearch(listOfNumbers, randNum, i);
}
When compiling with g++ 5.2 at optimization level -O3, the compiler can optimize away the call to linearSearch() completely, because the observable behaviour of the code is the same with or without that call.
The return value of linearSearch is not used anywhere, and the function has no side effects, so the compiler can remove it.
You can cross-check and modify the inner loop as follows. The execution times shouldn't change:
for (int a = 0; a < NUMBERS_TO_SEARCH; a++) {
    int randNum = rand() % INT_MAX;
    // linearSearch(listOfNumbers, randNum, i);
}
What remains in the loop is the call to rand(), and that is what you seem to be measuring. When data[i] == number is changed to data.at(i) == number, the call to linearSearch is no longer free of side effects, as at(i) may throw an out-of-range exception. So the compiler does not completely optimize the linearSearch code away. With g++ 5.2, however, it will still inline it and not make a function call.
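If you want the call to survive optimization, a minimal sketch (using the question's own names) is to consume the return value:
long long found = 0;
for (int a = 0; a < NUMBERS_TO_SEARCH; a++) {
    int randNum = rand() % INT_MAX;
    found += linearSearch(listOfNumbers, randNum, i);  // result is now used
}
cout << float(clock() - start) / CLOCKS_PER_SEC
     << " (found " << found << ")" << endl;            // printing 'found' keeps it alive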
clock() measures CPU time; maybe you want wall-clock time instead, e.g. time(NULL) or a C++11 clock? Check this related issue.
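For wall-clock measurements, a C++11 sketch along these lines should work (steady_clock is the usual choice for intervals):
#include <chrono>

auto t0 = std::chrono::steady_clock::now();
// ... code under test ...
auto t1 = std::chrono::steady_clock::now();
double seconds = std::chrono::duration<double>(t1 - t0).count();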
The start should be taken before the outer for loop. In your code, start is re-taken on every iteration of the outer loop, so each printed time covers only the work between the { ... }.
const clock_t start = clock();
for (int i = 15000; i <= 5000000; i += 50000) {
    ...
}
Related
I am trying to create something that generates a random array with no duplicate values. I've already looked at other answers but none seem to help me understand. I cannot think of a way to actually generate random numbers that contain no duplicates. Here is what I have tried so far:
srand(time(NULL));
int numbers[4];
for (int x = 0; x != 4; x++)
{
    numbers[x] = 1 + (rand() % 4);
    printf("%d ", numbers[x]);
}
You start off by filling a container with consecutive elements beginning at 0:
std::iota(begin(vec), end(vec), 0);
then you get yourself a decent random number generator and seed it properly:
std::mt19937 rng(std::random_device{}());
finally you shuffle the elements using the rng:
std::shuffle(begin(vec), end(vec), rng);
On some implementations std::random_device doesn't work properly (most notably GCC on Windows) and you have to use an alternative seed, e.g. the current time via <chrono>.
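Putting the three pieces together, a complete minimal example might look like this (the size of 10 is arbitrary):
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    std::vector<int> vec(10);                  // arbitrary size
    std::iota(vec.begin(), vec.end(), 0);      // fill with 0, 1, ..., 9
    std::mt19937 rng(std::random_device{}());  // seeded Mersenne Twister
    std::shuffle(vec.begin(), vec.end(), rng); // a permutation: no duplicates possible
    for (int v : vec) std::cout << v << ' ';
    std::cout << '\n';
}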
First of all, rand() generates random numbers, but not without duplicates.
If you want to generate a random array without duplicates, drawing from rand() and retrying is not a good approach at all.
Say you want to generate an array of 1000 distinct numbers out of 1000 possible values. In the best case, you have generated the first 999 numbers without duplicates and the last thing to do is generate the final number. The probability of drawing that one remaining value is 1/1000, so on average the last number alone takes about 1000 attempts; over the whole array the expected number of draws grows like n*ln(n) (the coupon collector problem). In practice, even 10 numbers cause real trouble.
The best method is to generate all your numbers by incrementation (or any strictly monotonic sequence) and then shuffle them. In that case there will be no duplicates.
Here is an example of how to do it with 10 numbers. It works even with 1000 numbers.
Note: the shuffle function is from Jhon Leehey's answer.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Fisher-Yates shuffle. Note that srand() would normally be called once
   in main() rather than on every call; it is kept here to match the
   original answer. */
void shuffle(int *arr, size_t n)
{
    if (n > 1)
    {
        size_t i;
        srand(time(NULL));
        for (i = 0; i < n - 1; i++)
        {
            /* pick j uniformly from [i, n) */
            size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
            int t = arr[j];
            arr[j] = arr[i];
            arr[i] = t;
        }
    }
}

int main()
{
    int i;
    int arr[10];
    for (i = 0; i < 10; i++){
        arr[i] = i;          /* consecutive values 0..9 */
    }
    shuffle(arr, 10);
    for (i = 0; i < 10; i++){
        printf("%d ", arr[i]);
    }
}
There are two solutions to choose from:
1. Generate random numbers using something like rand() and check for duplicates.
2. Find a mathematical sequence that is strictly monotonic (preferably strictly increasing) and use its terms as the members of your array; then shuffle the array. The result will not be truly random, but neither is rand(): rand() uses a similar technique, which is why we seed it with something that changes, like the time. You can, for example, use the time to generate the first element of the sequence, and with a good sequence your results will be at least decent. Note that the sequence MUST be strictly monotonic, to avoid generating duplicates. The sequence need not be complex: taking Unix time modulo 10000 as the first term and generating the remaining terms with a recurrence like x[i] = x[i-1] + 3*x[i-2] should be fine (see the sketch below). Of course, you may use more sophisticated sequences too, but be careful about overflow (you cannot apply the modulo operator to the result, because the sequence would no longer be increasing) and about the number of digits you would like to have.
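As a hedged sketch of the second option (the size of 10, the +1 guard that keeps the first term positive, and the seeding are illustrative choices, not part of the answer):
#include <algorithm>
#include <ctime>
#include <random>
#include <vector>

int main() {
    std::vector<long long> a(10);            // 10 values, as an example
    a[0] = std::time(nullptr) % 10000 + 1;   // time-based first term, kept > 0
    a[1] = a[0] + 1;
    for (int i = 2; i < (int)a.size(); ++i)
        a[i] = a[i - 1] + 3 * a[i - 2];      // strictly increasing => no duplicates
    std::shuffle(a.begin(), a.end(), std::mt19937(std::random_device{}()));
}
The snippet below, which checks each candidate against the values generated so far, implements the first option: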
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

const int N = 4;
int numbers[N];

bool isAlreadyAdded(int value, int index)
{
    for (int i = 0; i < index; i++)
        if (numbers[i] == value)
            return true;
    return false;
}

int main()
{
    srand(time(NULL));
    for (int x = 0; x != N; x++)
    {
        int tmp = 1 + (rand() % N);
        while (x != 0 && isAlreadyAdded(tmp, x))
            tmp = 1 + (rand() % N);
        numbers[x] = tmp;
        printf("%d ", numbers[x]);
    }
}
It's just one way to do it. It should work, though of course there are better approaches.
How about this:
#define NUMS (10)

int randomSequence[NUMS] = {0}, i = 0, randomNum;
bool numExists[NUMS] = {false};
while (i != NUMS)
{
    randomNum = rand() % NUMS;
    if (numExists[randomNum] == false)
    {
        randomSequence[i++] = randomNum;
        numExists[randomNum] = true;
    }
}
Of course, the bigger NUMS is, the longer the while loop takes to execute: by the coupon collector argument, it needs about NUMS * ln(NUMS) iterations on average.
In C++, all you need is std::random_shuffle():
http://www.cplusplus.com/reference/algorithm/random_shuffle/
(Note that std::random_shuffle was deprecated in C++14 and removed in C++17; in modern code, prefer std::shuffle with an explicit engine, as shown in the answer above.)
#include <algorithm>

int numbers[4];
for (int x = 0; x != 4; x++)
{
    numbers[x] = x;
}
std::random_shuffle(numbers, numbers + 4);
Update: OK, I had been thinking that a suitable mapping function could go from each index to a random number, but thinking again I realize that may be hard. The following should work:
int size = 10;
int range = 100;                   // must be >= size, or the loop never terminates
std::set<int> sample;
while ((int)sample.size() != size)
    sample.insert(rand() % range); // or whatever random source
std::vector<int> result(sample.begin(), sample.end());
std::random_shuffle(result.begin(), result.end());
You can use your own random number generator whose period is greater than or equal to the length of the array. Refer to http://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length for instructions.
So you need an LCG with the recurrence X(n+1) = (a*X(n) + c) mod m. The value m must be at least as large as the length of the array. Check the "if and only if" conditions for maximum period length (the Hull-Dobell theorem) and make sure that your constants satisfy them.
As a result, you will be able to generate random numbers with satisfactory randomness for most uses, guaranteed not to repeat any value within the first m calls.
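As a minimal sketch with illustrative constants: a = 5, c = 3, m = 16 satisfy the full-period conditions (c coprime to m; a - 1 divisible by every prime factor of m; a - 1 divisible by 4 since m is), so each of the 16 values in [0, m) appears exactly once per cycle:
#include <cstdio>

int main() {
    const unsigned a = 5, c = 3, m = 16;  // Hull-Dobell conditions hold
    unsigned x = 7;                       // any seed in [0, m)
    for (int i = 0; i < 16; ++i) {
        x = (a * x + c) % m;              // X(n+1) = (a*X(n) + c) mod m
        std::printf("%u ", x);            // visits each of 0..15 exactly once
    }
    std::printf("\n");
}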
After you generate each random number, loop through the previous values and compare. If there's a match, re-generate a new value and try again.
If you want to pseudo-randomly traverse a large space without maintaining visited indices, you should look at this project I contributed to years ago for the basic technique. http://packetfactory.openwall.net/projects/ipspace/index.html
You should be able to adapt it to your purposes; the source is at the bottom of the page.
I'm trying to get a good understanding of branch prediction by measuring the time to run loops with predictable branches vs. loops with random branches.
So I wrote a program that takes large arrays of 0s and 1s arranged in different orders (i.e. all 0s, repeating 0-1, all random), and iterates through the array, branching based on whether the current element is 0 or 1 and doing time-wasting work.
I expected that harder-to-guess arrays would take longer to run on, since the branch predictor would guess wrong more often, and that the time delta between runs on two sets of arrays would remain the same regardless of the amount of time-wasting work.
However, as the amount of time-wasting work increased, the difference in time-to-run between arrays increased, A LOT.
(X-axis is amount of time-wasting work, Y-axis is time-to-run)
Does anyone understand this behavior? The code I'm running is below:
#include <stdlib.h>
#include <time.h>
#include <chrono>
#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;

static const int s_iArrayLen = 999999;
static const int s_iMaxPipelineLen = 60;
static const int s_iNumTrials = 10;

int doWorkAndReturnMicrosecondsElapsed(int* vals, int pipelineLen){
    int* zeroNums = new int[pipelineLen];
    int* oneNums = new int[pipelineLen];
    for(int i = 0; i < pipelineLen; ++i)
        zeroNums[i] = oneNums[i] = 0;

    chrono::time_point<chrono::system_clock> start, end;
    start = chrono::system_clock::now();
    for(int i = 0; i < s_iArrayLen; ++i){
        if(vals[i] == 0){
            for(int i = 0; i < pipelineLen; ++i)
                ++zeroNums[i];
        }
        else{
            for(int i = 0; i < pipelineLen; ++i)
                ++oneNums[i];
        }
    }
    end = chrono::system_clock::now();
    int elapsedMicroseconds = (int)chrono::duration_cast<chrono::microseconds>(end-start).count();

    //This should never fire, it just exists to guarantee the compiler doesn't compile out our zeroNums/oneNums
    for(int i = 0; i < pipelineLen - 1; ++i)
        if(zeroNums[i] != zeroNums[i+1] || oneNums[i] != oneNums[i+1])
            return -1;

    delete[] zeroNums;
    delete[] oneNums;
    return elapsedMicroseconds;
}
struct TestMethod{
    string name;
    void (*func)(int, int&);
    int* results;

    TestMethod(string _name, void (*_func)(int, int&)) { name = _name; func = _func; results = new int[s_iMaxPipelineLen]; }
};
int main(){
    srand( (unsigned int)time(nullptr) );

    vector<TestMethod> testMethods;
    testMethods.push_back(TestMethod("all-zero",       [](int index, int& out) { out = 0; } ));
    testMethods.push_back(TestMethod("repeat-0-1",     [](int index, int& out) { out = index % 2; } ));
    testMethods.push_back(TestMethod("repeat-0-0-0-1", [](int index, int& out) { out = (index % 4 == 0) ? 0 : 1; } ));
    testMethods.push_back(TestMethod("rand",           [](int index, int& out) { out = rand() % 2; } ));

    int* vals = new int[s_iArrayLen];

    for(int currentPipelineLen = 0; currentPipelineLen < s_iMaxPipelineLen; ++currentPipelineLen){
        for(int currentMethod = 0; currentMethod < (int)testMethods.size(); ++currentMethod){
            int resultsSum = 0;
            for(int trialNum = 0; trialNum < s_iNumTrials; ++trialNum){
                //Generate a new array...
                for(int i = 0; i < s_iArrayLen; ++i)
                    testMethods[currentMethod].func(i, vals[i]);

                //And record how long it takes
                resultsSum += doWorkAndReturnMicrosecondsElapsed(vals, currentPipelineLen);
            }
            testMethods[currentMethod].results[currentPipelineLen] = (resultsSum / s_iNumTrials);
        }
    }

    cout << "\t";
    for(int i = 0; i < s_iMaxPipelineLen; ++i){
        cout << i << "\t";
    }
    cout << "\n";

    for (int i = 0; i < (int)testMethods.size(); ++i){
        cout << testMethods[i].name.c_str() << "\t";
        for(int j = 0; j < s_iMaxPipelineLen; ++j){
            cout << testMethods[i].results[j] << "\t";
        }
        cout << "\n";
    }

    int end;
    cin >> end;

    delete[] vals;
}
Pastebin link: http://pastebin.com/F0JAu3uw
I think you may be measuring the cache/memory performance more than the branch prediction. Your inner 'work' loop is accessing an ever-increasing chunk of memory, which may explain the linear growth, the periodic behaviour, etc.
I could be wrong, as I've not tried replicating your results, but if I were you I'd factor out memory accesses before timing other things. Perhaps sum one volatile variable into another, rather than working in an array.
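For instance, one way to read that suggestion, as a sketch of the question's inner work loop with the arrays replaced by volatile counters (zeroSink/oneSink are hypothetical names; the bounds mirror the question's code):
volatile int zeroSink = 0, oneSink = 0;
for (int i = 0; i < s_iArrayLen; ++i) {
    if (vals[i] == 0) {
        for (int k = 0; k < pipelineLen; ++k)
            zeroSink = zeroSink + 1;   // volatile access: cannot be optimized away
    } else {
        for (int k = 0; k < pipelineLen; ++k)
            oneSink = oneSink + 1;
    }
}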
Note also that, depending on the CPU, the branch prediction can be a lot smarter than just recording the last time a branch was taken - repeating patterns, for example, aren't as bad as random data.
Ok, a quick and dirty test I knocked up on my tea break, which tried to mirror your own test method but without thrashing the cache, produced the results below (chart not reproduced here).
Is that more what you expected?
If I can spare any time later there's something else I want to try, as I've not really looked at what the compiler is doing...
Edit:
And here's my final test: I recoded it in assembler to remove the loop branching and ensure an exact number of instructions in each path, etc.
I also added an extra case, of a 5-bit repeating pattern. It seems pretty hard to upset the branch predictor on my ageing Xeon.
In addition to what JasonD pointed out, I would also like to note that there is a condition inside the for loop, which may affect branch prediction:
if(vals[i] == 0)
{
    for(int i = 0; i < pipelineLen; ++i)
        ++zeroNums[i];
}
i < pipelineLen; is a condition, just like your ifs. The compiler might unroll this loop; however, since pipelineLen is an argument passed to the function, it probably does not.
I'm not sure if this can explain the wavy pattern in your results, but:
Since the BTB is only 16 entries long in the Pentium 4 processor, the prediction will eventually fail for loops that are longer than 16 iterations. This limitation can be avoided by unrolling a loop until it is only 16 iterations long. When this is done, a loop conditional will always fit into the BTB, and a branch misprediction will not occur on loop exit. The following is an example of loop unrolling:
Read full article: http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts
So your loops are not only measuring memory throughput; they are also affecting the BTB.
If you passed a 0-1 pattern in your list but then executed the for loop with pipelineLen = 2, your BTB will be filled with something like 0-1-1-0 - 1-1-1-0 - 0-1-1-0 - 1-1-1-0, and then it will start to overlap, so this can indeed explain the wavy pattern of your results (some overlaps will be more harmful than others).
Take this as an example of what may happen rather than a literal explanation. Your CPU may have a much more sophisticated branch prediction architecture.
This question already has answers here:
Why are elementwise additions much faster in separate loops than in a combined loop?
(10 answers)
Performance of breaking apart one loop into two loops
(6 answers)
Closed 9 years ago.
What is the overhead in splitting a for-loop like this,
int i;
for (i = 0; i < exchanges; i++)
{
    // some code
    // some more code
    // even more code
}
into multiple for-loops like this?
int i;
for (i = 0; i < exchanges; i++)
{
    // some code
}
for (i = 0; i < exchanges; i++)
{
    // some more code
}
for (i = 0; i < exchanges; i++)
{
    // even more code
}
The code is performance-sensitive, but doing the latter would improve readability significantly. (In case it matters, there are no other loops, variable declarations, or function calls, save for a few accessors, within each loop.)
I'm not exactly a low-level programming guru, so it'd be even better if someone could quantify the performance hit in terms of basic operations, e.g. "each additional for-loop costs the equivalent of two int allocations." But I understand (and wouldn't be surprised) if it's not that simple.
Many thanks in advance.
There are often way too many factors at play... And it's easy to demonstrate both ways:
For example, splitting the following loop results in almost a 2x slow-down (full test code at the bottom):
for (int c = 0; c < size; c++){
    data[c] *= 10;
    data[c] += 7;
    data[c] &= 15;
}
And this is almost stating the obvious since you need to loop through 3 times instead of once and you make 3 passes over the entire array instead of 1.
On the other hand, if you take a look at this question: Why are elementwise additions much faster in separate loops than in a combined loop?
for(int j = 0; j < n; j++){
    a1[j] += b1[j];
    c1[j] += d1[j];
}
The opposite is sometimes true due to memory alignment.
What to take from this?
Pretty much anything can happen. Neither way is always faster and it depends heavily on what's inside the loops.
And as such, determining whether such an optimization will increase performance is usually trial-and-error. With enough experience you can make fairly confident (educated) guesses. But in general, expect anything.
"Each additional for-loop would cost the equivalent of two int allocations."
You are correct that it's not that simple. In fact it's so complicated that the numbers don't mean much. A loop iteration may take X cycles in one context, but Y cycles in another due to a multitude of factors such as Out-of-order Execution and data dependencies.
Not only is the performance context-dependent, it also varies across different processors.
Here's the test code:
#include <time.h>
#include <iostream>
using namespace std;

int main(){
    int size = 10000;
    int *data = new int[size];

    clock_t start = clock();

    for (int i = 0; i < 1000000; i++){
#ifdef TOGETHER
        for (int c = 0; c < size; c++){
            data[c] *= 10;
            data[c] += 7;
            data[c] &= 15;
        }
#else
        for (int c = 0; c < size; c++){
            data[c] *= 10;
        }
        for (int c = 0; c < size; c++){
            data[c] += 7;
        }
        for (int c = 0; c < size; c++){
            data[c] &= 15;
        }
#endif
    }

    clock_t end = clock();
    cout << (double)(end - start) / CLOCKS_PER_SEC << endl;

    system("pause");
}
Output (one loop): 4.08 seconds
Output (3 loops): 7.17 seconds
Processors prefer to have a higher ratio of data instructions to jump instructions.
Branch instructions may force your processor to clear the instruction pipeline and reload.
Based on the reloading of the instruction pipeline, the first method would be faster, but not significantly. You would add at least 2 new branch instructions by splitting.
A faster optimization is to unroll the loop. Unrolling the loop tries to improve the ratio of data instructions to branch instructions by performing more instructions inside the loop before branching to the top of the loop.
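For example, a 4x unroll of one pass from the test code above might look like this sketch (it assumes size is a multiple of 4):
for (int c = 0; c < size; c += 4) {
    data[c]     += 7;   // four data operations...
    data[c + 1] += 7;
    data[c + 2] += 7;
    data[c + 3] += 7;   // ...per one compare-and-branch
}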
Another significant performance optimization is to organize the data so it fits into one of the processor's cache lines: for example, you could have inner loops that each process a single cache-sized block of data, while the outer loop loads new blocks into the cache.
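As a rough sketch of that idea applied to the three-pass version above (the block of 16 ints = 64 bytes matches a common cache-line size; the constant is illustrative, not tuned):
const int BLOCK = 16;                 // 16 ints = 64 bytes, a typical cache line
for (int base = 0; base < size; base += BLOCK) {
    int end = base + BLOCK < size ? base + BLOCK : size;
    for (int c = base; c < end; ++c) data[c] *= 10; // pass 1 warms the block
    for (int c = base; c < end; ++c) data[c] += 7;  // passes 2 and 3 reuse it
    for (int c = base; c < end; ++c) data[c] &= 15;
}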
These optimizations should only be applied after the program runs correctly and robustly, and only when the environment demands more performance. "Environment" here means observers (animation / movies), users (waiting for a response), or hardware (operations that must finish before a critical time event). Any other purpose is a waste of your time; the OS (running concurrent programs) and storage access will contribute more to your program's performance issues.
This will give you a good indication of whether or not one version is faster than another.
#include <array>
#include <chrono>
#include <iostream>
#include <numeric>
#include <string>

const int iterations = 100;

namespace
{
    const int exchanges = 200;

    template<typename TTest>
    void Test(const std::string &name, TTest &&test)
    {
        typedef std::chrono::high_resolution_clock Clock;
        typedef std::chrono::duration<float, std::milli> ms;

        std::array<float, iterations> timings;
        for (auto i = 0; i != iterations; ++i)
        {
            auto t0 = Clock::now();
            test();
            timings[i] = ms(Clock::now() - t0).count();
        }
        // Note: the initial value must be 0.0f; a plain 0 would make
        // std::accumulate sum in int and truncate the timings.
        auto avg = std::accumulate(timings.begin(), timings.end(), 0.0f) / iterations;
        std::cout << "Average time, " << name << ": " << avg << std::endl;
    }
}

int main()
{
    Test("single loop",
        []()
        {
            for (auto i = 0; i < exchanges; ++i)
            {
                // some code
                // some more code
                // even more code
            }
        });

    Test("separated loops",
        []()
        {
            for (auto i = 0; i < exchanges; ++i)
            {
                // some code
            }
            for (auto i = 0; i < exchanges; ++i)
            {
                // some more code
            }
            for (auto i = 0; i < exchanges; ++i)
            {
                // even more code
            }
        });
}
The thing is quite simple: the first code is like taking a single lap on a race track, and the other code is like running a full 3-lap race, so three laps take more time than one. However, if the loops are doing something that needs to be done in sequence, and the later steps depend on the earlier ones, then the second code is what you need. For example, if the first loop does some calculations and the second loop does some work with those calculations, the two loops have to run one after the other.
I made a program that returns the sum of all primes under 2 million. I really have no idea what's going on with this one: I get 142891895587 as my answer, when the correct answer is 142913828922. It seems like it's missing a few primes in there. I'm pretty sure the getPrimes function works as it is supposed to; I used it a couple of times before and it worked correctly then. The code is as follows:
vector<int> getPrimes(int number);

int main()
{
    unsigned long int sum = 0;
    vector<int> primes = getPrimes(2000000);

    for(int i = 0; i < primes.size(); i++)
    {
        sum += primes[i];
    }

    cout << sum;
    return 0;
}

vector<int> getPrimes(int number)
{
    vector<bool> sieve(number+1,false);
    vector<int> primes;

    sieve[0] = true;
    sieve[1] = true;

    for(int i = 2; i <= number; i++)
    {
        if(sieve[i]==false)
        {
            primes.push_back(i);
            unsigned long int temp = i*i;
            while(temp <= number)
            {
                sieve[temp] = true;
                temp = temp + i;
            }
        }
    }
    return primes;
}
The expression i*i overflows because i is an int: for i >= 46341, i*i exceeds INT_MAX (46341^2 = 2,147,488,281 > 2,147,483,647), and the truncated result is what gets assigned to temp. To avoid the overflow, cast it: static_cast<unsigned long>( i ) * i.
Even better, terminate the iteration before that condition can occur: for(int i = 2; i*i <= number; i++). (Note that if you do this, the loop no longer visits primes above sqrt(number), so they must be collected in a separate pass.)
Tested fixed.
Incidentally, you're somewhat (un)lucky that this doesn't produce extra primes as well as missing some: the int value is signed, and can be negative upon overflow, and by my reading of §4.7/2 that would cause the inner loop to be skipped.
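For reference, here is a minimal corrected sketch that keeps the question's structure; doing the multiplication in 64 bits fixes the overflow, and sum is widened as well, since 142913828922 does not fit in 32 bits:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    const int number = 2000000;
    vector<bool> sieve(number + 1, false);
    unsigned long long sum = 0;                      // needs 64 bits
    for (int i = 2; i <= number; i++)
    {
        if (!sieve[i])
        {
            sum += i;                                // i is prime
            for (unsigned long long temp = 1ULL * i * i;  // multiply in 64 bits
                 temp <= (unsigned long long)number; temp += i)
                sieve[temp] = true;                  // mark composites
        }
    }
    cout << sum << endl;                             // prints 142913828922
}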
You may be running into datatype limits: http://en.wikipedia.org/wiki/Long_integer.
This line is the problem:
unsigned long int temp = i*i;
I'll give you a hint. Take a closer look at the initial value you give to temp. What's the first value you exclude from sieve? Are there any other smaller multiples of i that should also be excluded? What different initial value could you use to make sure all the right numbers get skipped?
There are some techniques you can use to help figure this out yourself. One is to try to get your program working using a lower limit. Instead of 2 million, try, say, 30. It's small enough that you can calculate the correct answer quickly by hand, and even walk through your program on paper one line at a time. That will let you discover where things start to go wrong.
Another option is to use a debugger to walk through your program line-by-line. Debuggers are powerful tools, although they're not always easy to learn.
Instead of using a debugger to trace your program, you could print out messages as your program progressed. Say, have it print out each number in the result of getPrimes instead of just printing the sum. (That's another reason you'd want to try a lower limit first — to avoid being overwhelmed by the volume of output.)
Your platform must have 64-bit longs. This line:
unsigned long int temp = i * i;
does not compute correctly, because i is declared int, so the multiplication result is also int (32-bit). Force the multiplication to be done as long:
unsigned long int temp = (unsigned long int) i * i;
On my system, long is 32-bit, so I had to change both temp and sum to be unsigned long long.