I am using Dev-C++. It doesn't report any errors in the code, but the program fails to work.
It works when I try small numbers like 10 or 20.
I am working on this problem:
Each new term in the Fibonacci sequence is generated by adding the
previous two terms. By starting with 1 and 2, the first 10 terms will
be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not
exceed four million, find the sum of the even-valued terms.
#include<stdio.h>
#include<stdlib.h>
int main()
{
const int N=100;
int a=1,b=2,i,t[N],S=0,c,j;
t[0]=1;
t[1]=2;
for(i=2;i<N;i++){
t[i]=t[i-2]+t[i-1];
if(t[i]>4000000)
{
for(j=1;j<=i-1;j++){
c=t[j]%2;
if(c==0){
S=S+t[j];
}
else {
continue;
}}
break;
}
}
printf("%d\n",S);
system("pause");
}
You cannot define a variable-size array (T[N]). If you make N const, the problem should be solved.
const int N = 3999998;
int T[N];
Also, main should have a return type. Change it to "int main()".
You don't need an array to store all those numbers; you can get away with storing only the last two terms in the sequence, since that's all that's needed to calculate the next term.
Trying to allocate that much space on the stack is asking for trouble since the stack is a relatively limited resource.
In fact, that exact code entered into gcc on a Linux box gives me a segmentation violation when I try to run it, for precisely that reason.
On top of that, your code is not getting the even valued terms, it's getting every term, and you're getting the first four million values, rather than the values below four million which was specified.
The sort of code you're after would look like this:
#include <stdio.h>
int main (void) {
// Accumulator and terms (acc is zero because first two terms are odd).
int acc = 0, t1 = 1, t2 = 1, t3;
// Continue until next term is 4mil or more.
while ((t3 = t1 + t2) < 4000000) {
// printf ("DEBUG: %d %d %d %s\n", t1, t2, t3,
// ((t3 % 2) == 0) ? "<<" : "");
// Accumulate only even terms.
if ((t3 % 2) == 0) acc += t3;
// Cycle through terms.
t1 = t2; t2 = t3;
}
// Print the accumulated value.
printf ("%d\n", acc);
return 0;
}
And the output:
4613732
If you test that program by un-commenting the debug statement, you see:
DEBUG: 1 1 2 <<
DEBUG: 1 2 3
DEBUG: 2 3 5
DEBUG: 3 5 8 <<
DEBUG: 5 8 13
DEBUG: 8 13 21
DEBUG: 13 21 34 <<
DEBUG: 21 34 55
DEBUG: 34 55 89
DEBUG: 55 89 144 <<
DEBUG: 89 144 233
DEBUG: 144 233 377
DEBUG: 233 377 610 <<
DEBUG: 377 610 987
DEBUG: 610 987 1597
DEBUG: 987 1597 2584 <<
DEBUG: 1597 2584 4181
DEBUG: 2584 4181 6765
DEBUG: 4181 6765 10946 <<
DEBUG: 6765 10946 17711
DEBUG: 10946 17711 28657
DEBUG: 17711 28657 46368 <<
DEBUG: 28657 46368 75025
DEBUG: 46368 75025 121393
DEBUG: 75025 121393 196418 <<
DEBUG: 121393 196418 317811
DEBUG: 196418 317811 514229
DEBUG: 317811 514229 832040 <<
DEBUG: 514229 832040 1346269
DEBUG: 832040 1346269 2178309
DEBUG: 1346269 2178309 3524578 <<
4613732
and, if you add up all the even numbers at the end of those DEBUG lines, you do indeed get the given value.
Two things I notice are that main doesn't have a return type (try int main()) and N is used as an array size but isn't constant.
It's a very common programming error called "Stack overflow". In fact, it's so common that a very popular question and answer site, "Stack Overflow", is named after it; maybe you have heard of it?
(I've been waiting for being able to give this answer ever since I joined "Stack Overflow"!!!)
I am currently developing a chess engine in C++, and I am in the process of debugging my move generator. For this purpose, I wrote a simple perft() function:
int32_t Engine::perft(GameState game_state, int32_t depth)
{
int32_t last_move_nodes = 0;
int32_t all_nodes = 0;
Timer timer;
timer.start();
int32_t output_depth = depth;
if (depth == 0)
{
return 1;
}
std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);
for (Move move : legal_moves)
{
game_state.make_move(move);
last_move_nodes = perft_no_print(game_state, depth - 1);
all_nodes += last_move_nodes;
std::cout << index_to_square_name(move.get_from_index()) << index_to_square_name(move.get_to_index()) << ": " << last_move_nodes << "\n";
game_state.unmake_move(move);
}
std::cout << "\nDepth: " << output_depth << "\nTotal nodes: " << all_nodes << "\nTotal time: " << timer.get_milliseconds() << "ms/" << timer.get_milliseconds()/1000.0f << "s\n\n";
return all_nodes;
}
int32_t Engine::perft_no_print(GameState game_state, int32_t depth)
{
int32_t nodes = 0;
if (depth == 0)
{
return 1;
}
std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);
for (Move move : legal_moves)
{
game_state.make_move(move);
nodes += perft_no_print(game_state, depth - 1);
game_state.unmake_move(move);
}
return nodes;
}
Its results for the initial chess position (FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1) for depths 1 and 2 match the results of stockfish's perft command, so I assume they are correct:
h2h3: 1
h2h4: 1
g2g3: 1
g2g4: 1
f2f3: 1
f2f4: 1
e2e3: 1
e2e4: 1
d2d3: 1
d2d4: 1
c2c3: 1
c2c4: 1
b2b3: 1
b2b4: 1
a2a3: 1
a2a4: 1
g1h3: 1
g1f3: 1
b1c3: 1
b1a3: 1
Depth: 1
Total nodes: 20
Total time: 1ms/0.001s
h2h3: 20
h2h4: 20
g2g3: 20
g2g4: 20
f2f3: 20
f2f4: 20
e2e3: 20
e2e4: 20
d2d3: 20
d2d4: 20
c2c3: 20
c2c4: 20
b2b3: 20
b2b4: 20
a2a3: 20
a2a4: 20
g1h3: 20
g1f3: 20
b1c3: 20
b1a3: 20
Depth: 2
Total nodes: 400
Total time: 1ms/0.001s
The results stop matching at depth 3, though:
Stockfish:
go perft 3
a2a3: 380
b2b3: 420
c2c3: 420
d2d3: 539
e2e3: 599
f2f3: 380
g2g3: 420
h2h3: 380
a2a4: 420
b2b4: 421
c2c4: 441
d2d4: 560
e2e4: 600
f2f4: 401
g2g4: 421
h2h4: 420
b1a3: 400
b1c3: 440
g1f3: 440
g1h3: 400
Nodes searched: 8902
My engine:
h2h3: 361
h2h4: 380
g2g3: 340
g2g4: 397
f2f3: 360
f2f4: 436
e2e3: 380
e2e4: 437
d2d3: 380
d2d4: 437
c2c3: 399
c2c4: 326
b2b3: 300
b2b4: 320
a2a3: 280
a2a4: 299
g1h3: 281
g1f3: 280
b1c3: 357
b1a3: 320
Depth: 3
Total nodes: 7070
Total time: 10ms/0.01s
I figured that my move generator was just buggy, and tried to track down the bugs by making a move the engine gives incorrect values for on the board and then calling perft() with depth = 2 on it to find out which moves are missing. But for all moves I tried this with, the engine suddenly starts to output the correct results I expected to get earlier!
Here is an example for the move a2a3:
When calling perft() on the initial position in stockfish, it calculates 380 subnodes for a2a3 at depth 3.
When calling perft() on the initial position in my engine, it calculates 280 subnodes for a2a3 at depth 3.
When calling perft() on the position you get after making the move a2a3 in the initial position in my engine, it calculates the correct number of total nodes at depth 2, 380:
h7h5: 19
h7h6: 19
g7g5: 19
g7g6: 19
f7f5: 19
f7f6: 19
e7e5: 19
e7e6: 19
d7d5: 19
d7d6: 19
c7c5: 19
c7c6: 19
b7b5: 19
b7b6: 19
a7a5: 19
a7a6: 19
g8h6: 19
g8f6: 19
b8c6: 19
b8a6: 19
Depth: 2
Total nodes: 380
Total time: 1ms/0.001s
If you have any idea what the problem could be here, please help me out. Thank you!
EDIT:
I discovered some interesting new facts that might help to solve the problem, but I don't know what to do with them:
For some reason, using std::sort() like this in perft():
std::sort(legal_moves.begin(), legal_moves.end(), [](auto first, auto second){ return first.get_from_index() % 8 > second.get_from_index() % 8; });
to sort the vector of legal moves causes the found number of total nodes for the initial position (for depth 3) to change from the wrong 7070 to the (also wrong) 7331.
When printing the game state after calling game_state.make_move() in perft(), it seems to have had no effect on the position bitboards (the other properties change like they are supposed to). This is very strange, because isolated, the make_move() method works just fine.
I'm not sure whether you have pinned down the issue yet, but from the limited information in the question, my best guess (and something I ran into myself) is that there is a problem in your unmake_move() function when it comes to captures, since:
Your perft fails only at depth 3. That is the first depth at which a legal capture is possible; the first two plies cannot contain any captures.
Your perft works fine when it is run at depth 2 on the position after a2a3, but not when it searches at depth 3 from the start.
This probably means that your unmake_move() fails at depths greater than 1, where you need to restore board state that cannot be derived from the move parameter alone (e.g. the en passant square, castling rights, etc. as they were before you made the move).
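One common way to make unmake_move() robust is to push the irreversible fields onto a history stack in make_move() and pop them again when unmaking. The following is only a minimal sketch of that idea: ToyState, Undo and the parameters of make_move() are hypothetical stand-ins, not the asker's GameState or Move API, and the board itself is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified position: it models only the fields
// that cannot be reconstructed from the move being unmade.
struct ToyState {
    std::uint8_t castling_rights = 0xF;  // KQkq as four bits
    int en_passant_square = -1;          // -1 means "none"
    int halfmove_clock = 0;

    // Snapshot of the irreversible fields, taken on every make_move().
    struct Undo {
        std::uint8_t castling_rights;
        int en_passant_square;
        int halfmove_clock;
    };
    std::vector<Undo> history;

    // new_ep, lost_rights and resets_clock stand in for whatever a real
    // Move object would imply; they are assumptions for this sketch.
    void make_move(int new_ep, std::uint8_t lost_rights, bool resets_clock) {
        history.push_back({castling_rights, en_passant_square, halfmove_clock});
        castling_rights = static_cast<std::uint8_t>(castling_rights & ~lost_rights);
        en_passant_square = new_ep;
        halfmove_clock = resets_clock ? 0 : halfmove_clock + 1;
    }

    // Restores exactly what the matching make_move() saved.
    void unmake_move() {
        assert(!history.empty());
        const Undo u = history.back();
        history.pop_back();
        castling_rights = u.castling_rights;
        en_passant_square = u.en_passant_square;
        halfmove_clock = u.halfmove_clock;
    }
};
```

A captured piece would be saved in the same snapshot, so that unmaking a capture can put it back on its square.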
This is how you would typically debug your move generator using perft:
1. Given startpos as p1, generate perft(3) for your engine and sf. (you did that)
2. Pick any move whose node count differs; you picked a2a3. (you did that)
3. Given startpos + a2a3 as p2, generate perft(2) for your engine and sf. (you partially did this)
4. Pick any move whose node count differs in step 3. Let's say move x.
5. Given startpos + a2a3 + x as p3, generate perft(1) for your engine and sf.
Since that is only perft(1), by this time you will be able to identify the wrong or missing move in your generator. Set up that last position (p3) on the board and compare your engine's moves against sf's perft(1) result to find the wrong/missing ones.
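Comparing the two per-move listings by eye gets tedious, so the comparison at each level can be automated. This is a rough sketch that assumes you have already parsed both outputs into a map from move name to node count; the Divide alias and differing_moves() are names invented here, not part of any engine.

```cpp
#include <map>
#include <string>
#include <vector>

// Move name (e.g. "a2a3") -> node count reported for that move.
using Divide = std::map<std::string, long>;

// Returns every move whose count differs between the two listings,
// followed by moves that only the engine under test produced.
std::vector<std::string> differing_moves(const Divide& mine, const Divide& reference) {
    std::vector<std::string> out;
    for (const auto& entry : reference) {
        const auto it = mine.find(entry.first);
        // Missing move or mismatching count: worth descending into.
        if (it == mine.end() || it->second != entry.second) out.push_back(entry.first);
    }
    for (const auto& entry : mine) {
        // Extra move that the reference never generated.
        if (reference.find(entry.first) == reference.end()) out.push_back(entry.first);
    }
    return out;
}
```

Any move it returns is a candidate for the next, shallower perft run.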
Consider the following code snippet of this class template...
template<class T>
class FileTemplate {
private:
std::vector<T> vals_;
std::string filenameAndPath_;
public:
inline FileTemplate( const std::string& filenameAndPath, const T& multiplier ) :
filenameAndPath_( filenameAndPath ) {
std::fstream file;
if ( !filenameAndPath_.empty() ) {
file.open( filenameAndPath_ );
T val = 0;
while ( file >> val ) {
vals_.push_back( val );
}
file.close();
for ( unsigned i = 0; i < vals_.size(); i++ ) {
vals_[i] *= multiplier;
}
file.open( filenameAndPath_ );
for ( unsigned i = 0; i < vals_.size(); i++ ) {
file << vals_[i] << " ";
}
file.close();
}
}
inline std::vector<T> getValues() const {
return vals_;
}
};
When used in main as such with the lower section commented out with the following pre-populated text file:
values.txt
1 2 3 4 5 6 7 8 9
int main() {
std::string filenameAndPath( "_build/values.txt" );
std::fstream file;
FileTemplate<unsigned> ft( filenameAndPath, 5 );
std::vector<unsigned> results = ft.getValues();
for ( auto r : results ) {
std::cout << r << " ";
}
std::cout << std::endl;
/*
FileTemplate<float> ft2( filenameAndPath, 2.5f );
std::vector<float> results2 = ft2.getValues();
for ( auto r : results2 ) {
std::cout << r << " ";
}
std::cout << std::endl;
*/
std::cout << "\nPress any key and enter to quit." << std::endl;
char q;
std::cin >> q;
return 0;
}
and I run this code through the debugger, sure enough both the screen output and the file are changed to
values.txt - overwritten with -
5 10 15 20 25 30 35 40 45
Then, let's say I don't change any code, just stop the application and run it 2 more times; the outputs respectively are:
values.txt - iterations 2 & 3
25 50 75 100 125 150 175 200 225 250
125 250 375 500 625 750 875 1000 1125 1250
Okay, good so far; now let's reset the values in the text file back to default, uncomment the 2nd instantiation of this class template (for float, with a multiplier value of 2.5f) and then run this 3 times.
values.txt - reset to default
1 2 3 4 5 6 7 8 9
-iterations 1,2 & 3 with both unsigned & float the multipliers are <5,2.5> respectively. 5 for the unsigned and 2.5 for the float
- Iteration 1
cout:
5 10 15 20 25 30 35 40 45
12.5 25 37.5 50 62.5 75 87.5 100 112.5
values.txt:
12.5 25 37.5 50 62.5 75 87.5 100 112.5
- Iteration 2
cout:
60
150 12.5 62.5 93.75 125 156.25 187.5 218.75 250 281.25
values.txt:
150 12.5 62.5 93.75 125 156.25 187.5 218.75 250 281.25
- Iteration 3
cout:
750 60
1875 150 12.5 156.25 234.375 312.5 390.625 468.75 546.875 625 703.125
values.txt:
1875 150 12.5 156.25 234.375 312.5 390.625 468.75 546.875 625 703.125
A couple of questions come to mind, both regarding the same behavior of this program.
The first and primary question is: are the file read and write calls being done at compile time, considering this is a class template and the constructor is inline?
After running the debugger a couple of times, why does the number of values in the file keep growing? I started off with 9, but after an iteration or so there are 10, then 11.
This part is just for fun, if you want to answer:
The third and final question is admittedly opinion-based, but I ask it for educational purposes, because I would like to see what the community thinks: what are the pros & cons of this type of programming? What are the potentials and the limits? Are there any practical real-world applications & production benefits to this?
In terms of the other issues: the main issue is that you are not truncating the file when you make the second file.open call. You need:
file.open( filenameAndPath_, std::fstream::trunc|std::fstream::out );
What is happening is that when you read an unsigned int from a file containing floating-point values, it reads only the first number (e.g. 12.5) up to the decimal point and then stops (i.e. it reads only 12), because the next character (the '.') cannot start an unsigned int, so the extraction fails and the read loop ends. This means it reads just the number 12, multiplies it by 5 to get 60, and writes that to the file.
Unfortunately, because you don't truncate the file when writing the 60, the original text is left at the end and is interpreted as additional numbers in the next read loop. Hence, 12.5 appears in the file as 60 5.
stream buffers
Extracts as many characters as possible from the stream and inserts them into the output sequence controlled by the stream buffer object pointed by sb (if any), until either the input sequence is exhausted or the function fails to insert into the object pointed by sb.
(http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/)
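Both effects can be reproduced in isolation. The snippet below is a small sketch, not the asker's class: slurp(), rewrite() and count_unsigned() are helper names made up for the demonstration, and the file contents used in the note afterwards are just an example.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file into a string (helper for inspecting the result).
std::string slurp(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Write `text` at the start of the file, optionally truncating it first.
void rewrite(const std::string& path, const std::string& text, bool truncate) {
    std::fstream file;
    if (truncate)
        file.open(path, std::fstream::out | std::fstream::trunc);
    else
        file.open(path);  // default mode is in | out: old bytes are kept
    assert(file.is_open());
    file << text;
}

// Count how many unsigned values operator>> extracts before it fails.
int count_unsigned(const std::string& path) {
    std::ifstream in(path);
    unsigned value = 0;
    int count = 0;
    while (in >> value) ++count;
    return count;
}
```

For a file containing `12.5 25 37.5 `, count_unsigned() returns 1, because extraction stops at the first decimal point. Calling rewrite(path, "60 ", false) then leaves `60 5 25 37.5 ` in the file, whereas rewrite(path, "60 ", true) leaves only `60 `.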
I'm using C++. Using sort from STL is allowed.
I have an array of int, like this :
1 4 1 5 145 345 14 4
The numbers are stored in a char* (I read them from a binary file, 4 bytes per number)
I want to do two things with this array :
swap each number with the one after that
4 1 5 1 345 145 4 14
sort it by group of 2
4 1 4 14 5 1 345 145
I could code it step by step, but it wouldn't be efficient. What I'm looking for is speed. O(n log n) would be great.
Also, this array can be bigger than 500MB, so memory usage is an issue.
My first idea was to sort the array starting from the end (to swap the numbers 2 by 2) and treating it as a long* (to force the sorting to take 2 int each time). But I couldn't manage to code it, and I'm not even sure it would work.
I hope I was clear enough, thanks for your help : )
This is the most memory efficient layout I could come up with. Obviously the vector I'm using would be replaced by the data blob you're using, assuming endian-ness is all handled well enough. The premise of the code below is simple.
Generate 1024 random values in pairs, each pair consisting of the first number between 1 and 500, the second number between 1 and 50.
Iterate the entire list, flipping all even-index values with their following odd-index brethren.
Send the entire thing to std::qsort with an item width of two (2) int32_t values and a count of half the original vector.
The comparator function simply sorts on the immediate value first, and on the second value if the first is equal.
The sample below does this for 1024 items. I've tested it without output for 134217728 items (exactly 536870912 bytes) and the results were pretty impressive for a measly macbook air laptop, about 15 seconds, only about 10 of that on the actual sort. What is ideally most important is no additional memory allocation is required beyond the data vector. Yes, to the purists, I do use call-stack space, but only because q-sort does.
I hope you get something out of it.
Note: I only show the first part of the output, but I hope it shows what you're looking for.
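#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdint>
#include <cstdlib>
#include <ctime>
// the code below uses vector, cout, endl, etc. unqualified
using namespace std;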
// a most-wacked-out random generator. every other call will
// pull from a rand modulo either the first, or second template
// parameter, in alternation.
template<int N,int M>
struct randN
{
int i = 0;
int32_t operator ()()
{
i = (i+1)%2;
return (i ? rand() % N : rand() % M) + 1;
}
};
// compare two pairs of integer values by address.
int pair_cmp(const void* arg1, const void* arg2)
{
const int32_t *left = (const int32_t*)arg1;
const int32_t *right = (const int32_t *)arg2;
// compare instead of subtracting, so large values cannot overflow
const int k = (left[0] == right[0]) ? 1 : 0;
return (left[k] > right[k]) - (left[k] < right[k]);
}
int main(int argc, char *argv[])
{
// a crapload of int values
static const size_t N = 1024;
// seed rand()
srand((unsigned)time(0));
// fill the array with N random values: pairs of (1..500, 1..50)
vector<int32_t> data;
data.reserve(N);
std::generate_n(back_inserter(data), N, randN<500,50>());
// flip all the values
for (size_t i=0;i<data.size();i+=2)
{
int32_t tmp = data[i];
data[i] = data[i+1];
data[i+1] = tmp;
}
// now sort in pairs. using qsort only because it lends itself
// *very* nicely to performing block-based sorting.
std::qsort(&data[0], data.size()/2, sizeof(data[0])*2, pair_cmp);
cout << "After sorting..." << endl;
std::copy(data.begin(), data.end(), ostream_iterator<int32_t>(cout,"\n"));
cout << endl << endl;
return EXIT_SUCCESS;
}
Output
After sorting...
1
69
1
83
1
198
1
343
1
367
2
12
2
30
2
135
2
169
2
185
2
284
2
323
2
325
2
347
2
367
2
373
2
382
2
422
2
492
3
286
3
321
3
364
3
377
3
400
3
418
3
441
4
24
4
97
4
153
4
210
4
224
4
250
4
354
4
356
4
386
4
430
5
14
5
26
5
95
5
145
5
302
5
379
5
435
5
436
5
499
6
67
6
104
6
135
6
164
6
179
6
310
6
321
6
399
6
409
6
425
6
467
6
496
7
18
7
65
7
71
7
84
7
116
7
201
7
242
7
251
7
256
7
324
7
325
7
485
8
52
8
93
8
156
8
193
8
285
8
307
8
410
8
456
8
471
9
27
9
116
9
137
9
143
9
190
9
190
9
293
9
419
9
453
With some additional constraints on both your input and your platform, you can probably use an approach like the one you are thinking of. These constraints would include
Your input contains only positive numbers (i.e. can be treated as unsigned)
Your platform provides uint8_t and uint64_t in <cstdint>
You address a single platform with known endianness.
In that case you can divide your input into groups of 8 bytes, do some byte shuffling to arrange each group as one uint64_t with the "first" number from the input in the lower-valued half, and run std::sort on the resulting array. Depending on endianness you may need to do more byte shuffling to rearrange each sorted 8-byte group as a pair of uint32_t in the expected order.
If you can't code this on your own, I'd strongly advise you not to take this approach.
A better and more portable approach (you have some inherent non-portability by starting from a not clearly specified binary file format), would be:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>
std::vector<int> swap_and_sort_int_pairs(const unsigned char buffer[], std::size_t buflen) {
    const std::size_t intsz = sizeof(int);
    // We have to assume that the binary format in buffer is compatible with our int representation
    // we also require an even number of integers
    assert(buflen % (2*intsz) == 0);
    // load pairwise
    std::vector< std::pair<int,int> > pairs;
    pairs.reserve(buflen/(2*intsz));
    for (const unsigned char* bufp=buffer; bufp<buffer+buflen; bufp+= 2*intsz) {
        // It would be better to have a more portable binary -> int conversion;
        // memcpy at least avoids unaligned reads and casting away const
        int first_value, second_value;
        std::memcpy(&first_value, bufp, intsz);
        std::memcpy(&second_value, bufp + intsz, intsz);
        // swap each pair here
        pairs.emplace_back( second_value, first_value );
    }
    // less<pair<..>> does lexicographical ordering, which is what you are looking for
    std::sort(pairs.begin(), pairs.end());
    // convert back to linear vector
    std::vector<int> result;
    result.reserve(2*pairs.size());
    for (auto& entry : pairs) {
        result.push_back(entry.first);
        result.push_back(entry.second);
    }
    return result;
}
Both the initial parse/swap pass (which you need anyway) and the final conversion are O(N), so the total complexity is still O(N log(N)).
If you can continue to work with pairs, you can save the final conversion. The other way to save that conversion would be to use a hand-coded sort with two-int strides and two-int swap: much more work - and possibly still hard to get as efficient as a well-tuned library sort.
Do one thing at a time. First, give your data some *struct*ure. It seems that every 8 bytes form a unit of the form
struct unit {
int key;
int value;
};
If the endianness is right, you can do this in O(1) with a reinterpret_cast. If it isn't, you'll have to live with an O(n) conversion effort. Both vanish compared to the O(n log n) sorting effort.
When you have an array of these units, you can use std::sort like:
bool compare_units(const unit& a, const unit& b) {
return a.key < b.key;
}
std::sort(array, array + length, compare_units);
The key to this solution is that you do the "swapping" and byte-interpretation first and then do the sorting.
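Put together, the swap-then-sort idea might look like the sketch below. It assumes the file has already been read straight into a std::vector<Unit> (so no second copy of the data is needed) and that the stored integers are 32-bit in the machine's native byte order; Unit, unit_less and swap_and_sort are illustrative names. Unlike compare_units above, the comparator also breaks ties on the second field so that equal keys come out in a deterministic order.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// One 8-byte record: two consecutive 32-bit integers from the file.
struct Unit {
    std::int32_t key;
    std::int32_t value;
};

// Order by key first, then by value.
bool unit_less(const Unit& a, const Unit& b) {
    return a.key != b.key ? a.key < b.key : a.value < b.value;
}

// Swap the two numbers inside every record, then sort the records in place.
void swap_and_sort(std::vector<Unit>& units) {
    for (Unit& u : units) std::swap(u.key, u.value);  // O(n) swap pass
    std::sort(units.begin(), units.end(), unit_less); // O(n log n) sort
}
```

For the example input 1 4 1 5 145 345 14 4, this leaves the records as 4 1, 4 14, 5 1, 345 145, which matches the order asked for in the question.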
You get 10 numbers that you have to split into two lists where the sum of numbers in the lists have the smallest difference possible.
so let's say you get:
10 29 59 39 20 17 29 48 33 45
How would you sort this into two lists so that the difference between the sums of the lists is as small as possible?
In this case, the answer (I think) would be:
59 48 29 17 10 = 163
45 39 33 29 20 = 166
I'm using mIRC script as the language but perl or C++ is just as good for me.
edit: actually there can be multiple answers such as in this scenario, it could also be:
59 48 29 20 10 = 166
45 39 33 29 17 = 163
to me, it doesn't matter so long as the end result is that the difference of the sum of the lists is as small as possible
edit 2: each list must contain 5 numbers.
What you have listed is exactly the partition problem (for more details look at http://en.wikipedia.org/wiki/Partition_problem).
The point is that this is an NP-complete problem, so no algorithm is known that solves every instance efficiently (i.e. in polynomial time) as the amount of numbers grows.
But if your problem always involves only ten numbers to be divided into two lists of exactly five items each, then it is feasible to naively try all possible solutions: there are at most p^N assignments, where p=2 is the number of lists and N=10 is the number of integers, i.e. only 2^10=1024 combinations (fewer once you require five items per list), and each takes only O(N) to verify (i.e. to compute the difference).
Otherwise you can implement the greedy algorithm described on the Wikipedia page. It is simple to implement, but there is no guarantee of optimality, as you can see from this implementation in Java:
static void partition() {
int[] set = {10, 29, 59, 39, 20, 17, 29, 48, 33, 45}; // array of data
Arrays.sort(set); // sort data (note: Arrays.sort sorts in ascending order)
ArrayList<Integer> A = new ArrayList<Integer>(5); //first list
ArrayList<Integer> B = new ArrayList<Integer>(5); //second list
String stringA=new String(); //only to print result
String stringB=new String(); //only to print result
int sumA = 0; //sum of items in A
int sumB = 0; //sum of items in B
for (int i : set) {
if (sumA <= sumB) {
A.add(i); //add item to first list
sumA+=i; //update sum of first list
stringA+=" "+i;
} else {
B.add(i); //add item to second list
sumB+=i; //update sum of second list
stringB+=" "+i;
}
}
System.out.println("First list:" + stringA + " = " + sumA);
System.out.println("Second list:"+ stringB+ " = " + sumB);
System.out.println("Difference (first-second):" + (sumA-sumB));
}
It does not return a good result:
First list: 10 20 29 39 48 = 146
Second list: 17 29 33 45 59 = 183
Difference (first-second):-37
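For comparison, the exhaustive approach for exactly ten numbers is short enough to write out. This is a sketch in C++ (one of the languages the question accepts) rather than a translation of the Java code above; Split and best_equal_split are names chosen here for illustration.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdlib>
#include <numeric>

struct Split {
    unsigned mask;   // bit i set => values[i] goes into the first list
    int difference;  // absolute difference between the two sums
};

// Tries every way of putting exactly five of the ten values into the
// first list and keeps the assignment with the smallest difference.
Split best_equal_split(const std::array<int, 10>& values) {
    const int total = std::accumulate(values.begin(), values.end(), 0);
    Split best{0u, -1};
    for (unsigned mask = 0; mask < (1u << 10); ++mask) {
        if (std::bitset<10>(mask).count() != 5) continue;  // five items each
        int sum = 0;
        for (std::size_t i = 0; i < values.size(); ++i)
            if (mask & (1u << i)) sum += values[i];
        // The second list sums to total - sum.
        const int diff = std::abs(total - 2 * sum);
        if (best.difference < 0 || diff < best.difference) best = Split{mask, diff};
    }
    return best;
}
```

For the ten numbers in the question the smallest difference is 1, for example 59 45 33 17 10 = 164 against 48 39 29 29 20 = 165; the search may report a different split with the same difference.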
I was having a discussion about the relative cost of fork() vs. threads for parallelizing a task.
We understand the basic differences between processes and threads:
Thread:
Easy to communicate between threads
Fast context switching.
Processes:
Fault tolerance.
Communicating with parent not a real problem (open a pipe)
Communication with other child processes hard
But we disagreed on the start-up cost of processes Vs threads.
So to test the theories I wrote the following code. My question: is this a valid test for measuring the start-up cost, or am I missing something? Also, I would be interested in how each test performs on different platforms.
fork.cpp
#include <boost/lexical_cast.hpp>
#include <vector>
#include <unistd.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
extern "C" int threadStart(void* threadData)
{
return 0;
}
int main(int argc,char* argv[])
{
int threadCount = boost::lexical_cast<int>(argv[1]);
std::vector<pid_t> data(threadCount);
clock_t start = clock();
for(int loop=0;loop < threadCount;++loop)
{
data[loop] = fork();
if (data[loop] == -1)
{
std::cout << "Abort\n";
exit(1);
}
if (data[loop] == 0)
{
exit(threadStart(NULL));
}
}
clock_t middle = clock();
for(int loop=0;loop < threadCount;++loop)
{
int result;
waitpid(data[loop], &result, 0);
}
clock_t end = clock();
std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n";
}
Thread.cpp
#include <boost/lexical_cast.hpp>
#include <vector>
#include <iostream>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
extern "C" void* threadStart(void* threadData)
{
return NULL;
}
int main(int argc,char* argv[])
{
int threadCount = boost::lexical_cast<int>(argv[1]);
std::vector<pthread_t> data(threadCount);
clock_t start = clock();
for(int loop=0;loop < threadCount;++loop)
{
if (pthread_create(&data[loop], NULL, threadStart, NULL) != 0)
{
std::cout << "Abort\n";
exit(1);
}
}
clock_t middle = clock();
for(int loop=0;loop < threadCount;++loop)
{
void* result;
pthread_join(data[loop], &result);
}
clock_t end = clock();
std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n";
}
I expect Windows to do worse at process creation.
But I would expect modern Unix-like systems to have a fairly light fork cost that is at least comparable to thread creation. On older Unix-style systems (before fork() was implemented using copy-on-write pages) I would expect it to be worse.
Anyway, my timing results are:
> uname -a
Darwin Alpha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
> gcc --version | grep GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
> g++ thread.cpp -o thread -I~/include
> g++ fork.cpp -o fork -I~/include
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./thread ${a} >> A
foreach? end
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./fork ${a} >> A
foreach? end
vi A
Thread: Fork:
C Start Wait Total C Start Wait Total
==============================================================
1 26 145 171 1 160 37 197
2 44 198 242 2 290 37 327
3 62 234 296 3 413 41 454
4 77 275 352 4 499 59 558
5 91 107 10808 5 599 57 656
6 99 332 431 6 665 52 717
7 130 388 518 7 741 69 810
8 204 468 672 8 833 56 889
9 164 469 633 9 1067 76 1143
10 165 450 615 10 1147 64 1211
12 343 585 928 12 1213 71 1284
15 232 647 879 15 1360 203 1563
20 319 921 1240 20 2161 96 2257
30 461 1243 1704 30 3005 129 3134
40 559 1487 2046 40 4466 166 4632
50 686 1912 2598 50 4591 292 4883
60 827 2208 3035 60 5234 317 5551
70 973 2885 3858 70 7003 416 7419
80 3545 2738 6283 80 7735 293 8028
90 1392 3497 4889 90 7869 463 8332
100 3917 4180 8097 100 8974 436 9410
Edit:
Doing a 1000 children caused the fork version to fail.
So I have reduced the children count. But doing a single test also seems unfair so here is a range of values.
mumble ... I do not like your solution for many reasons:
You are not taking into account the execution time of the child processes/threads.
You should compare CPU usage, not the bare elapsed time. This way your statistics will not depend on, e.g., disk access congestion.
Let your child process do something. Remember that "modern" fork uses copy-on-write mechanisms to avoid allocating memory to the child process until needed. Exiting immediately is too easy: this way you avoid almost all the disadvantages of fork.
CPU time is not the only cost you have to account for. Memory consumption and slow IPC are both disadvantages of the fork solution.
You could use "rusage" instead of "clock" to measure real resource usage.
P.S. I do not think you can really measure the process/thread overhead by writing a simple test program. There are too many factors and, usually, the choice between threads and processes is driven by reasons other than mere CPU usage.
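As a rough illustration of the "rusage" suggestion, the sketch below uses the POSIX getrusage() call to read user plus system CPU time separately for the parent (RUSAGE_SELF) and for the children it has waited for (RUSAGE_CHILDREN). ForkCost, cpu_us and measure_fork are names invented here. Note also that, unlike the test in the question, it waits for each child before forking the next one, and the children still do no work.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ForkCost {
    int reaped;            // children successfully waited for
    long parent_cpu_us;    // user + system CPU time spent by this process
    long children_cpu_us;  // user + system CPU time of the reaped children
};

// Total CPU time (user + system) of one rusage record, in microseconds.
static long cpu_us(const struct rusage& r) {
    return (r.ru_utime.tv_sec + r.ru_stime.tv_sec) * 1000000L
         + (r.ru_utime.tv_usec + r.ru_stime.tv_usec);
}

ForkCost measure_fork(int children) {
    assert(children >= 0);
    struct rusage self_before{}, kids_before{}, self_after{}, kids_after{};
    getrusage(RUSAGE_SELF, &self_before);
    getrusage(RUSAGE_CHILDREN, &kids_before);

    int reaped = 0;
    for (int i = 0; i < children; ++i) {
        const pid_t pid = fork();
        if (pid == 0) _exit(0);  // child: leave at once, skip stdio flushing
        if (pid > 0 && waitpid(pid, nullptr, 0) == pid) ++reaped;
    }

    getrusage(RUSAGE_SELF, &self_after);
    getrusage(RUSAGE_CHILDREN, &kids_after);
    return ForkCost{reaped,
                    cpu_us(self_after) - cpu_us(self_before),
                    cpu_us(kids_after) - cpu_us(kids_before)};
}
```

The same struct rusage also carries fields such as ru_maxrss, which gives a first look at the memory side of the comparison.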
Under Linux fork is a special call to sys_clone, either within the library or within the kernel. Clone has lots of switches to flip on and off, and each of them affects how expensive it is to start.
The actual library function clone is probably more expensive than fork though because it does more, though most of that is on the child side (stack swapping and calling a function by pointer).
What that micro-benchmark shows is that thread creation and joining (there are no fork results when I'm writing this) takes tens or hundreds of microseconds (assuming your system has CLOCKS_PER_SEC=1000000, which it probably has, since it's an XSI requirement).
Since you said that fork() takes 3 times the cost of threads, we are still talking tenths of a millisecond at worst. If that is noticeable on an application, you could use pools of processes/threads, like Apache 1.3 did. In any case, I'd say that startup time is a moot point.
The important difference of threads vs processes (on Linux and most Unix-likes) is that on processes you choose explicitly what to share, using IPC, shared memory (SYSV or mmap-style), pipes, sockets (you can send file descriptors over AF_UNIX sockets, meaning you get to choose which fd's to share), ... While on threads almost everything is shared by default, whether there's a need to share it or not. In fact, that is the reason Plan 9 had rfork() and Linux has clone() (and recently unshare()), so you can choose what to share.