OpenCV - Stitching Images from a grid of images - c++

I have found some basic working examples on stitching via OpenCV for panoramic images. I have also found some useful documentation in the API docs, but I can't find out how to speed up the processing by providing additional information.
In my case, I generate a set of images in a 20x20 grid of individual frames, for a total of 400 images to be stitched into a single large one. This takes an enormous amount of time on a modern PC, so it would likely take hours on a developer board.
Is there any way to give OpenCV additional information about the images, such as their relative positions on the grid, which I know in advance? So far, the only API usage I see is adding all the images indiscriminately to a queue via vImg.push_back().
References
Stitching. Image Stitching - OpenCV API Documentation, Accessed 2014-02-26, <http://docs.opencv.org/modules/stitching/doc/stitching.html>
OpenCV Stitching example (Stitcher class, Panorama), Accessed 2014-02-26, <http://feelmare.blogspot.ca/2013/11/opencv-stitching-example-stitcher-class.html>
Panorama – Image Stitching in OpenCV, Accessed 2014-02-26, <http://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/>

I did some work with the stitching pipeline, and although I do not consider myself an expert in the field, I got better performance (and better results as well) by adjusting each step of the pipeline separately. As you can see in the picture, the Stitcher class is nothing but a wrapper around this pipeline:
Some interesting parts you can adjust are the resizing steps (there comes a point where more resolution just means more computation time and less accurate features), the matching process, and (though this is just a guess) supplying good camera parameters instead of having them estimated. This requires obtaining the camera parameters before stitching, but that is not really hard. Here is a reference: OpenCV Camera Calibration and 3D Reconstruction.
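One concrete way to feed the known grid layout into the matching step is a matching mask. This is a minimal sketch, assuming the OpenCV 2.4/3.x `Stitcher::setMatchingMask()` setter, which takes a square `CV_8U` matrix with one row and one column per image. As far as I know, only pairs (i, j) with a non-zero entry are passed to the pairwise matcher, and in 3.x the setter takes a `cv::UMat`. The helper below only builds the mask data in plain C++. `buildGridMatchMask`, `countMatchedPairs` and the row-major image order are assumptions for illustration, not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build an N x N matching mask (N = rows * cols) for
// frames stored row-major, i.e. image index = r * cols + c.
// mask[i * N + j] == 1 means "try to match image i with image j".
// diagonals == false: 4 edge neighbours; diagonals == true: 8-neighbourhood.
std::vector<unsigned char> buildGridMatchMask(int rows, int cols, bool diagonals)
{
    assert(rows > 0 && cols > 0);
    const std::size_t n = static_cast<std::size_t>(rows) * cols;
    std::vector<unsigned char> mask(n * n, 0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    if (dr == 0 && dc == 0) continue;               // not with itself
                    if (!diagonals && dr != 0 && dc != 0) continue;  // edge neighbours only
                    const int r2 = r + dr, c2 = c + dc;
                    if (r2 < 0 || r2 >= rows || c2 < 0 || c2 >= cols) continue;
                    const std::size_t i = static_cast<std::size_t>(r) * cols + c;
                    const std::size_t j = static_cast<std::size_t>(r2) * cols + c2;
                    mask[i * n + j] = 1;                             // symmetric by construction
                }
    return mask;
}

// Count the unordered pairs (i < j) that the mask allows to be matched.
std::size_t countMatchedPairs(const std::vector<unsigned char>& mask, std::size_t n)
{
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (mask[i * n + j] != 0) ++pairs;
    return pairs;
}
```

Handing the mask to OpenCV might look roughly like this (untested): `cv::Mat mask(n, n, CV_8U, data.data()); stitcher.setMatchingMask(mask);`. For a 20x20 grid, this cuts the candidate pairs from 400·399/2 = 79,800 to 760 (edge neighbours) or 1,482 (with diagonals). Feature detection still runs on every image.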
Again: I am not an expert, this is just based on my experience as an intern doing some experiments with the library!

As far as I know, there is no way to provide additional data to the OpenCV engine beyond giving it a list of images. It does a pretty good job on its own, though. I would look at some of the example code and test how long each stitching operation takes. From my experiments with 4x6, 4x8, ..., 4x20 panoramic reconstructions, the required CPU time seems to increase with the number of overlapping images. I would imagine your case would require at least a minute to compute on a modern machine.
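This small C++11 sketch uses `std::chrono::steady_clock` to time each stitching run, as suggested. `secondsToRun` is a hypothetical helper name, not an OpenCV function.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run a callable once and return the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so it is not affected by system clock changes.
template <typename F>
double secondsToRun(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    assert(seconds >= 0.0);   // guaranteed by a monotonic clock
    return seconds;
}
```

Here is an example that uses the `stitcher`, `imgs` and `pano` objects from the stitching.cpp sample in this answer: `double t = secondsToRun([&] { stitcher.stitch(imgs, pano); });`.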
Source:
https://code.ros.org/trac/opencv/browser/trunk/opencv/samples/cpp/stitching.cpp?rev=6682
/*M///////////////////////////////////////////////////////////////////////////////////////
//
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
//
// By downloading, copying, installing or using the software you agree to this license.
// If you do not agree to this license, do not download, install,
// copy or use the software.
//
//
// License Agreement
// For Open Source Computer Vision Library
//
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
// * Redistribution's of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// * Redistribution's in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
//
// * The name of the copyright holders may not be used to endorse or promote products
// derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the Intel Corporation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
//M*/

// We follow these papers:
// 1) Construction of panoramic mosaics with global and local alignment.
// Heung-Yeung Shum and Richard Szeliski. 2000.
// 2) Eliminating Ghosting and Exposure Artifacts in Image Mosaics.
// Matthew Uyttendaele, Ashley Eden and Richard Szeliski. 2001.
// 3) Automatic Panoramic Image Stitching using Invariant Features.
// Matthew Brown and David G. Lowe. 2007.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/stitching/stitcher.hpp"

using namespace std;
using namespace cv;

void printUsage()
{
    cout <<
        "Rotation model images stitcher.\n\n"
        "stitching img1 img2 [...imgN]\n\n"
        "Flags:\n"
        "  --try_use_gpu (yes|no)\n"
        "      Try to use GPU. The default value is 'no'. All default values\n"
        "      are for CPU mode.\n"
        "  --output <result_img>\n"
        "      The default is 'result.jpg'.\n";
}

bool try_use_gpu = false;
vector<Mat> imgs;
string result_name = "result.jpg";

int parseCmdArgs(int argc, char** argv)
{
    if (argc == 1)
    {
        printUsage();
        return -1;
    }
    for (int i = 1; i < argc; ++i)
    {
        if (string(argv[i]) == "--help" || string(argv[i]) == "/?")
        {
            printUsage();
            return -1;
        }
        else if (string(argv[i]) == "--try_use_gpu")   // must match the flag shown by printUsage()
        {
            if (i + 1 >= argc)   // flag given without a value
            {
                cout << "Missing --try_use_gpu flag value\n";
                return -1;
            }
            if (string(argv[i + 1]) == "no")
                try_use_gpu = false;
            else if (string(argv[i + 1]) == "yes")
                try_use_gpu = true;
            else
            {
                cout << "Bad --try_use_gpu flag value\n";
                return -1;
            }
            i++;
        }
        else if (string(argv[i]) == "--output")
        {
            if (i + 1 >= argc)   // flag given without a file name
            {
                cout << "Missing --output file name\n";
                return -1;
            }
            result_name = argv[i + 1];
            i++;
        }
        else
        {
            Mat img = imread(argv[i]);
            if (img.empty())
            {
                cout << "Can't read image '" << argv[i] << "'\n";
                return -1;
            }
            imgs.push_back(img);
        }
    }
    return 0;
}

int main(int argc, char* argv[])
{
    int retval = parseCmdArgs(argc, argv);
    if (retval) return -1;

    Mat pano;
    // OpenCV 2.4 API; newer versions use Stitcher::create(), which returns a Ptr<Stitcher>.
    Stitcher stitcher = Stitcher::createDefault(try_use_gpu);
    Stitcher::Status status = stitcher.stitch(imgs, pano);

    if (status != Stitcher::OK)
    {
        cout << "Can't stitch images, error code = " << status << endl;
        return -1;
    }

    imwrite(result_name, pano);
    return 0;
}

Maybe this could help?
https://software.intel.com/en-us/articles/fast-panorama-stitching
Specifically, the part about pairwise matching.
Ronen

Consider enabling GPU use in the OpenCV Stitcher (this only has an effect if OpenCV was built with CUDA support and a CUDA-capable GPU is present):
bool try_use_gpu = true;
Stitcher myStitcher = Stitcher::createDefault(try_use_gpu);
Stitcher::Status status = myStitcher.stitch(Imgs, pano);

If you know the relative positions of the images, it seems you could break the problem into sub-problems and possibly reduce the computational load by exploiting that known structure. Basically, split the set of images into groups of 4 adjacent images, stitch each group, then process the resulting images using the same idea until you arrive at your panorama. That being said, I've only recently begun toying with this part of OpenCV. I know it's a pretty simple idea, but it might be useful to someone.
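This sketch covers the bookkeeping for this divide-and-conquer idea. It assumes the frames are indexed row-major on a rows x cols grid. Each round groups 2x2 blocks of adjacent tiles (edge blocks may be smaller), and the stitched results form a grid of ceil(rows/2) x ceil(cols/2) tiles for the next round. `groupInto2x2` and `countStitchCalls` are hypothetical helpers, and the actual stitching of each group (for example, one `Stitcher::stitch` call per group) is left out. Intermediate mosaics are no longer single camera views, so the default rotation model may not suit them. Treat this as an untested stitching strategy.

```cpp
#include <cassert>
#include <vector>

// One merge round on a rows x cols grid of tiles (row-major indices):
// returns the 2x2 blocks of adjacent tiles (smaller at the right/bottom edge
// when a dimension is odd) that would each be stitched into one new tile.
std::vector<std::vector<int>> groupInto2x2(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    std::vector<std::vector<int>> groups;
    for (int r = 0; r < rows; r += 2)
        for (int c = 0; c < cols; c += 2) {
            std::vector<int> group;
            for (int dr = 0; dr < 2 && r + dr < rows; ++dr)
                for (int dc = 0; dc < 2 && c + dc < cols; ++dc)
                    group.push_back((r + dr) * cols + (c + dc));
            groups.push_back(group);
        }
    return groups;
}

// Number of stitch calls needed until one mosaic is left. Single-tile groups
// are just carried over, so they are not counted. Each round shrinks the grid
// to ceil(rows/2) x ceil(cols/2).
int countStitchCalls(int rows, int cols)
{
    int calls = 0;
    while (rows > 1 || cols > 1) {
        for (const std::vector<int>& group : groupInto2x2(rows, cols))
            if (group.size() > 1) ++calls;
        rows = (rows + 1) / 2;
        cols = (cols + 1) / 2;
    }
    return calls;
}
```

For the 20x20 grid in the question, this plans 137 stitch calls of at most 4 images each (100 + 25 + 8 + 3 + 1 over five rounds), instead of one call with 400 images.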

Related

How to print the table of 12 using recursion: I want it up to 10, but when I run it, it goes up to 12

I was trying to write a program that prints the table of 12 using recursion. I wrote a simple program and it does print the table, but the table goes up to 144 (12 times 12 = 144) instead of 120 (12 times 10 = 120). I'm sharing my code and output below; I wrote the code in C++.
//we will print table of 12 using concept of recursion
//a table of 12 is like this
//12 24 36 48 60 72 84 96 108 120
#include<iostream>
using namespace std;
void table(int n)
{
    if(n==1)
    {
        cout<<12<<"\n";
        return;
    }
    table(n-1);
    cout<<n*12<<"\n";
}
int main(void)
{
    table(12);
}
And here is the output of this program:
12
24
36
48
60
72
84
96
108
120
132
144
Please help me see what I'm missing here. I'm sure that adding some condition will help. I tried adding if(n==12) { return; }, but it doesn't do what I want, since in the end the function still prints n*12.
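In the posted program, `n` is both the number of rows printed and the multiplier, and `main` calls `table(12)`, so twelve rows (up to 144) are printed. The recursion itself is fine; calling `table(10)` prints 12 through 120. Here is a minimal sketch of that fix. The `std::ostream&` parameter and the `tableString` helper are additions for testing, not part of the original code.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Prints 12*1, 12*2, ..., 12*n (one per line) by recursing down to n == 1.
// n is the number of rows, so table(10, ...) stops at 120.
void table(int n, std::ostream& out)
{
    assert(n >= 1);              // the recursion below needs n >= 1
    if (n == 1) {
        out << 12 << "\n";
        return;
    }
    table(n - 1, out);           // print 12*1 .. 12*(n-1) first
    out << n * 12 << "\n";       // then this row
}

// Convenience wrapper: capture the table as a string.
std::string tableString(int n)
{
    std::ostringstream ss;
    table(n, ss);
    return ss.str();
}
```

In `main`, `table(10, std::cout);` prints the ten rows from 12 to 120.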

Fast read of large text file to 1D structure in C++

I need to read a batch of text files of up to 20 MB each, fast.
The text files have the following format. The numbers need to be stored as double, because some other files may have 3-decimal-place precision:
0 0 29 175 175 175 175 174
0 1 29 175 175 175 175 174
0 2 29 28 175 175 175 174
0 3 29 28 175 175 175 174
0 4 29 29 175 175 175 174
...
I would like to store the last six numbers of each line in a single 1D structure like the one below, skipping the first two columns. It basically transposes each column and concatenates the transposed columns horizontally:
29 29 29 29 29 175 175 28 28 29 175 175 175 175 175...
Here is my class method that attempts this, but it is too slow for my purposes:
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

void MyClass::GetFromFile(std::string filename, int headerLinestoSkip, int ColumnstoSkip, int numberOfColumnsIneed)
{
    std::ifstream file(filename);
    std::string file_line;
    double temp;
    std::vector<std::vector<double>> temp_vector(numberOfColumnsIneed);
    if(file.is_open())
    {
        SkipLines(file, headerLinestoSkip);   // my own helper, not shown
        while(std::getline(file, file_line, '\n'))
        {
            std::istringstream ss(file_line);
            for(int i=0; i<ColumnstoSkip; i++)
            {
                ss >> temp;   // discard the leading columns
            }
            for(int i=0; i<numberOfColumnsIneed; i++)
            {
                ss >> temp;
                temp_vector[i].push_back(temp);
            }
        }
        // concatenate the columns one after another
        for(int i=0; i<numberOfColumnsIneed; i++)
        {
            this->ClassMemberVector.insert(this->ClassMemberVector.end(), temp_vector[i].begin(), temp_vector[i].end());
        }
    }
}
I have read that memory-mapping the file may help, but my attempts at getting the data into the 1D structure I need have not been successful. An example from someone would be very much appreciated!
With 20 MB and short lines like the ones you show, that's roughly 500,000 lines. Knowing this, there are several factors that could slow down your code:
I/O: with current hardware and OS performance, I can't imagine that this plays a role here.
Parsing/conversion: you read each line and build a string stream out of it, only to then extract the numbers. This could be an overhead, especially on some C++ implementations where stream extraction is slower than the old sscanf(). I may be wrong, but again I'm not sure that this overhead would be that large.
Memory allocation for your vectors: this is definitely the first place to look. A vector has a size and a capacity. Each time you add an item beyond its capacity, the vector needs to be reallocated, which may require moving all of its contents again and again.
I'd strongly advise you to run your code under a profiler to identify the bottleneck. Manual timing will be difficult here, because your loop contains all the potential problems, but each iteration is certainly too quick for std::chrono to measure the different parts of the loop with sufficient accuracy.
If you can't use a profiler, I'd suggest computing a rough estimate of the number of lines from the file size and taking half of it. Then pre-reserve the corresponding capacity in each temp_vector[i]. If you observe good progress, you'll be on the right track and can then fine-tune this approach. If not, edit your question with your new findings and post a comment on this answer.
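This sketch follows the direction described in this answer and assumes the whole file fits comfortably in memory (20 MB does). It uses one bulk read, `std::strtod` instead of a `std::istringstream` per line, and `reserve()` on each column vector based on a row estimate from the text size. The function names and the ~4-bytes-per-number estimate are assumptions. The parser also assumes every data row has exactly `skipCols + needCols` numbers. None of this has been benchmarked against the original.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read the whole file with one bulk read instead of line-by-line getline().
std::string readWholeFile(const std::string& filename)
{
    std::ifstream file(filename, std::ios::binary);
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

// Parse rows of (skipCols + needCols) numbers and return the needed columns
// concatenated column after column (the "transposed" 1D layout).
std::vector<double> columnsToVector(const std::string& text, int headerLines,
                                    int skipCols, int needCols)
{
    assert(skipCols >= 0 && needCols > 0);
    const char* p = text.c_str();
    for (int h = 0; h < headerLines && *p != '\0'; ++h) {   // skip header lines
        while (*p != '\0' && *p != '\n') ++p;
        if (*p == '\n') ++p;
    }

    const int perRow = skipCols + needCols;
    // Capacity hint from the text size; ~4 bytes per number is a guess.
    const std::size_t rowsGuess = text.size() / (4 * static_cast<std::size_t>(perRow)) + 1;
    std::vector<std::vector<double>> cols(needCols);
    for (auto& col : cols) col.reserve(rowsGuess);

    std::vector<double> row(perRow);
    for (;;) {
        int k = 0;
        for (; k < perRow; ++k) {
            char* end = nullptr;
            row[k] = std::strtod(p, &end);   // also skips leading spaces/newlines
            if (end == p) break;              // no number left
            p = end;
        }
        if (k < perRow) break;                // end of data (or a short row)
        for (int i = 0; i < needCols; ++i) cols[i].push_back(row[skipCols + i]);
    }

    std::vector<double> result;
    result.reserve(cols[0].size() * needCols);
    for (const auto& col : cols) result.insert(result.end(), col.begin(), col.end());
    return result;
}
```

Inside the asker's class, this could replace the loop with something like `ClassMemberVector = columnsToVector(readWholeFile(filename), headerLinestoSkip, ColumnstoSkip, numberOfColumnsIneed);`. Note that `std::strtod` uses the current C locale's decimal separator.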

Understanding some C++ coding practice [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 9 years ago.
I am currently trying to understand how the following code (http://pastebin.com/zTHUrmyx) works. My approach is to compile the software in debug mode and use gdb to step through the code.
However, I'm running into the problem that 'step' does not always tell me what is going on. Particularly unclear to me is the EXECUTE {...} which I cannot step into.
How do I go about learning what the code is doing?
/*
Copyright 2008 Brain Research Institute, Melbourne, Australia

Written by J-Donald Tournier, 27/06/08.

This file is part of MRtrix.

MRtrix is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

MRtrix is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with MRtrix. If not, see <http://www.gnu.org/licenses/>.


15-10-2008 J-Donald Tournier <d.tournier#brain.org.au>
* fix -prs option handling
* remove MR::DICOM_DW_gradients_PRS flag

15-10-2008 J-Donald Tournier <d.tournier#brain.org.au>
* add -layout option to manipulate data ordering within the image file

14-02-2010 J-Donald Tournier <d.tournier#brain.org.au>
* fix -coord option so that the "end" keyword can be used

*/

#include "app.h"
#include "image/position.h"
#include "image/axis.h"
#include "math/linalg.h"

using namespace std;
using namespace MR;

SET_VERSION_DEFAULT;

DESCRIPTION = {
  "perform conversion between different file types and optionally extract a subset of the input image.",
  "If used correctly, this program can be a very useful workhorse. In addition to converting images between different formats, it can be used to extract specific studies from a data set, extract a specific region of interest, flip the images, or to scale the intensity of the images.",
  NULL
};

ARGUMENTS = {
  Argument ("input", "input image", "the input image.").type_image_in (),
  Argument ("ouput", "output image", "the output image.").type_image_out (),
  Argument::End
};


const gchar* type_choices[] = { "REAL", "IMAG", "MAG", "PHASE", "COMPLEX", NULL };
const gchar* data_type_choices[] = { "FLOAT32", "FLOAT32LE", "FLOAT32BE", "FLOAT64", "FLOAT64LE", "FLOAT64BE",
    "INT32", "UINT32", "INT32LE", "UINT32LE", "INT32BE", "UINT32BE",
    "INT16", "UINT16", "INT16LE", "UINT16LE", "INT16BE", "UINT16BE",
    "CFLOAT32", "CFLOAT32LE", "CFLOAT32BE", "CFLOAT64", "CFLOAT64LE", "CFLOAT64BE",
    "INT8", "UINT8", "BIT", NULL };

OPTIONS = {
  Option ("coord", "select coordinates", "extract data only at the coordinates specified.", false, true)
    .append (Argument ("axis", "axis", "the axis of interest").type_integer (0, INT_MAX, 0))
    .append (Argument ("coord", "coordinates", "the coordinates of interest").type_sequence_int()),

  Option ("vox", "voxel size", "change the voxel dimensions.")
    .append (Argument ("sizes", "new dimensions", "A comma-separated list of values. Only those values specified will be changed. For example: 1,,3.5 will change the voxel size along the x & z axes, and leave the y-axis voxel size unchanged.")
      .type_sequence_float ()),

  Option ("datatype", "data type", "specify output image data type.")
    .append (Argument ("spec", "specifier", "the data type specifier.").type_choice (data_type_choices)),

  Option ("scale", "scaling factor", "apply scaling to the intensity values.")
    .append (Argument ("factor", "factor", "the factor by which to multiply the intensities.").type_float (NAN, NAN, 1.0)),

  Option ("offset", "offset", "apply offset to the intensity values.")
    .append (Argument ("bias", "bias", "the value of the offset.").type_float (NAN, NAN, 0.0)),

  Option ("zero", "replace NaN by zero", "replace all NaN values with zero."),

  Option ("output", "output type", "specify type of output")
    .append (Argument ("type", "type", "type of output.")
      .type_choice (type_choices)),

  Option ("layout", "data layout", "specify the layout of the data in memory. The actual layout produced will depend on whether the output image format can support it.")
    .append (Argument ("spec", "specifier", "the data layout specifier.").type_string ()),

  Option ("prs", "DW gradient specified as PRS", "assume that the DW gradients are specified in the PRS frame (Siemens DICOM only)."),

  Option::End
};


inline bool next (Image::Position& ref, Image::Position& other, const std::vector<int>* pos)
{
  int axis = 0;
  do {
    ref.inc (axis);
    if (ref[axis] < ref.dim(axis)) {
      other.set (axis, pos[axis][ref[axis]]);
      return (true);
    }
    ref.set (axis, 0);
    other.set (axis, pos[axis][0]);
    axis++;
  } while (axis < ref.ndim());
  return (false);
}


EXECUTE {
  std::vector<OptBase> opt = get_options (1); // vox
  std::vector<float> vox;
  if (opt.size())
    vox = parse_floats (opt[0][0].get_string());

  opt = get_options (3); // scale
  float scale = 1.0;
  if (opt.size()) scale = opt[0][0].get_float();

  opt = get_options (4); // offset
  float offset = 0.0;
  if (opt.size()) offset = opt[0][0].get_float();

  opt = get_options (5); // zero
  bool replace_NaN = opt.size();

  opt = get_options (6); // output
  Image::OutputType output_type = Image::Default;
  if (opt.size()) {
    switch (opt[0][0].get_int()) {
      case 0: output_type = Image::Real; break;
      case 1: output_type = Image::Imaginary; break;
      case 2: output_type = Image::Magnitude; break;
      case 3: output_type = Image::Phase; break;
      case 4: output_type = Image::RealImag; break;
    }
  }

  Image::Object &in_obj (*argument[0].get_image());

  Image::Header header (in_obj);

  if (output_type == 0) {
    if (in_obj.is_complex()) output_type = Image::RealImag;
    else output_type = Image::Default;
  }

  if (output_type == Image::RealImag) header.data_type = DataType::CFloat32;
  else if (output_type == Image::Phase) header.data_type = DataType::Float32;
  else header.data_type.unset_flag (DataType::ComplexNumber);

  opt = get_options (2); // datatype
  if (opt.size()) header.data_type.parse (data_type_choices[opt[0][0].get_int()]);

  for (guint n = 0; n < vox.size(); n++)
    if (isfinite (vox[n])) header.axes.vox[n] = vox[n];

  opt = get_options (7); // layout
  if (opt.size()) {
    std::vector<Image::Axis> ax = parse_axes_specifier (header.axes, opt[0][0].get_string());
    if (ax.size() != (guint) header.axes.ndim())
      throw Exception (String("specified layout \"") + opt[0][0].get_string() + "\" does not match image dimensions");

    for (guint i = 0; i < ax.size(); i++) {
      header.axes.axis[i] = ax[i].axis;
      header.axes.forward[i] = ax[i].forward;
    }
  }

  opt = get_options (8); // prs
  if (opt.size() && header.DW_scheme.rows() && header.DW_scheme.columns()) {
    for (guint row = 0; row < header.DW_scheme.rows(); row++) {
      double tmp = header.DW_scheme(row, 0);
      header.DW_scheme(row, 0) = header.DW_scheme(row, 1);
      header.DW_scheme(row, 1) = tmp;
      header.DW_scheme(row, 2) = -header.DW_scheme(row, 2);
    }
  }

  std::vector<int> pos[in_obj.ndim()];

  opt = get_options (0); // coord
  for (guint n = 0; n < opt.size(); n++) {
    int axis = opt[n][0].get_int();
    if (pos[axis].size()) throw Exception ("\"coord\" option specified twice for axis " + str (axis));
    pos[axis] = parse_ints (opt[n][1].get_string(), header.dim(axis)-1);
    header.axes.dim[axis] = pos[axis].size();
  }

  for (int n = 0; n < in_obj.ndim(); n++) {
    if (pos[n].empty()) {
      pos[n].resize (in_obj.dim(n));
      for (guint i = 0; i < pos[n].size(); i++) pos[n][i] = i;
    }
  }

  in_obj.apply_scaling (scale, offset);

  Image::Position in (in_obj);
  Image::Position out (*argument[1].get_image (header));

  for (int n = 0; n < in.ndim(); n++) in.set (n, pos[n][0]);

  ProgressBar::init (out.voxel_count(), "copying data...");

  do {

    float re, im = 0.0;
    in.get (output_type, re, im);
    if (replace_NaN) if (gsl_isnan (re)) re = 0.0;
    out.re (re);

    if (output_type == Image::RealImag) {
      if (replace_NaN) if (gsl_isnan (im)) im = 0.0;
      out.im (im);
    }

    ProgressBar::inc();
  } while (next (out, in, pos));

  ProgressBar::done();
}
As was noted in the comments, EXECUTE seems to be a macro. From the context, it apparently expands to a function header (and maybe a bit more, e.g. some global variables and functions), so the part in curly braces is the function body.
To get to the definition of EXECUTE, you will have to examine the headers.
However, if you can reach some part of the code during debugging, you could insert a string or char[] at that point, giving it the stringified version of EXECUTE, so you get whatever the preprocessor will emit for EXECUTE at that position in the code.
#define STR(x) #x
#define STRINGIFY(x) STR(x)
char c[] = STRINGIFY(EXECUTE);
The two macros are a well-known little trick for getting the content of any macro as a string literal. Try it out and inspect the char array in your debugger to see what EXECUTE expands to.
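Here is a self-contained illustration of the trick. The real definition of `EXECUTE` lives in the MRtrix headers, so the one below is a hypothetical stand-in; only `STR` and `STRINGIFY` come from this answer. The second level of indirection is what makes it work: `#x` stringifies its argument as written, so `STR(EXECUTE)` yields the text `EXECUTE`, while `STRINGIFY(EXECUTE)` macro-expands the argument first.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the framework macro (NOT the real MRtrix definition).
#define EXECUTE void execute()

#define STR(x) #x                // stringify the argument exactly as written
#define STRINGIFY(x) STR(x)      // expand the argument first, then stringify

const char expanded[] = STRINGIFY(EXECUTE);  // holds "void execute()"
const char unexpanded[] = STR(EXECUTE);      // holds "EXECUTE"
```

In the debugger, `expanded` shows the preprocessor's expansion at that point in the file.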
My wild guess here: EXECUTE is the main function or a replacement for it, the OPTIONS and ARGUMENTS describe what arguments the program expects and what command line options you can pass to it. Those macros and some of the used functions and variables (get_options, argument) are part of a little framework that should facilitate the usage, evaluation and user information about command line options.

Direct-inclusion sorting

What is the other name for direct-inclusion sorting and what is the algorithm for the same sort?
I have been searching the Internet, but I can't find a straight answer. I found this algorithm for straight insertion sort, and some books say it is the same as direct-inclusion sorting. I have doubts because the book is in Russian, so I want to confirm whether that is true or whether I might have made a translation error.
Code in C++:
#include <iostream>
#include <utility>   // std::swap
using namespace std;

int main(int argc, char* argv[])
{
    int arr[8] = {27, 412, 71, 81, 59, 14, 273, 87}, i, j;
    for (j = 1; j < 8; j++) {
        if (arr[j] < arr[j-1]) {
            // Work with i so that we don't modify j
            i = j;
            // Swap until we find the right place
            do {
                swap(arr[i], arr[i-1]);
                i--;
                // Guard against running past the start of the array
                if (i == 0)
                    break;
            }
            while (arr[i] < arr[i-1]);
        }
        for (i = 0; i < 8; i++)
            cout << arr[i] << ' ';
        cout << '\n';
    }
    cin.get();   // wait for Enter (the non-standard getch() from <conio.h> also works on Windows)
    return 0;
}
Result
27 412 71 81 59 14 273 87
27 71 412 81 59 14 273 87
27 71 81 412 59 14 273 87
27 59 71 81 412 14 273 87
14 27 59 71 81 412 273 87
14 27 59 71 81 273 412 87
14 27 59 71 81 87 273 412
The posted code is Insertion sort.
Most implementations will copy an out-of-order element to a temporary variable and then work backwards, moving elements up until the correct open spot is found to "insert" the current element. That's what the pseudocode in the Wikipedia article shows.
Some implementations just bubble the out-of-order element backwards while it's less than the element to its left. That's what the inner do...while loop in the posted code shows.
Both methods are valid ways to implement Insertion sort.
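For comparison with the swap-based code in the question, here is a sketch of the first variant described above: copy the element to a temporary, shift larger elements right, then insert it into the gap. It follows the usual textbook pseudocode; `insertionSort` is a hypothetical name. For the question's input, it ends in the same order as the posted program: 14 27 59 71 81 87 273 412.

```cpp
#include <cassert>
#include <cstddef>

// Insertion sort in the "copy to a temporary and shift" style: each element
// is lifted out, larger elements to its left are moved one slot to the right,
// and the element is dropped into the gap that opens up.
void insertionSort(int* arr, std::size_t n)
{
    assert(arr != nullptr || n == 0);
    for (std::size_t j = 1; j < n; ++j) {
        const int key = arr[j];          // element to insert
        std::size_t i = j;
        while (i > 0 && arr[i - 1] > key) {
            arr[i] = arr[i - 1];         // shift larger element right
            --i;
        }
        arr[i] = key;                    // insert into the open spot
    }
}
```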
The code you posted does not look like an insertion sort algorithm, since you are repeatedly swapping two neighboring elements.
Your code looks much more like some kind of bubble sort.
Here is a list of common sorting algorithms:
https://en.wikipedia.org/wiki/Sorting_algorithm
"Straight insertion" and "direct inclusion" sound pretty much the same, so I guess they are probably different names for the same algorithm.
Edit:
Possibly the "straight" prefix indicates that only one container is used. However, if two neighboring elements are swapped, I would not call it insertion sort, since no "insert" is done at all.
Given that the term "direct inclusion sort" yields no Google hits at all, and "direct insertion sorting" only 27 hits, the first three of which are this post and two identically phrased blog posts, I doubt that this term has any widely accepted meaning. So the part of your question about
some book its saying they are the same with direct direct-inclusion sorting
is hard to answer, unless we find a clear definition of what direct-inclusion sorting actually is.

Fair comparison of fork() Vs Thread [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
I was having a discussion about the relative cost of fork() vs. threads for parallelizing a task.
We understand the basic differences between processes and threads:
Thread:
Easy to communicate between threads
Fast context switching.
Processes:
Fault tolerance.
Communicating with parent not a real problem (open a pipe)
Communication with other child processes hard
But we disagreed on the start-up cost of processes vs. threads.
So to test the theories, I wrote the following code. My question: is this a valid way of measuring the start-up cost, or am I missing something? I would also be interested in how each test performs on different platforms.
fork.cpp
#include <boost/lexical_cast.hpp>
#include <vector>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>   // waitpid()
#include <iostream>
#include <stdlib.h>
#include <time.h>

extern "C" int threadStart(void* threadData)
{
    return 0;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        std::cout << "Usage: fork <count>\n";
        return 1;
    }
    int threadCount = boost::lexical_cast<int>(argv[1]);
    std::vector<pid_t> data(threadCount);
    clock_t start = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        data[loop] = fork();
        if (data[loop] == -1)   // fork failed
        {
            std::cout << "Abort\n";
            exit(1);
        }
        if (data[loop] == 0)    // child: run the (empty) task and exit
        {
            exit(threadStart(NULL));
        }
    }
    clock_t middle = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        int result;
        waitpid(data[loop], &result, 0);
    }
    clock_t end = clock();
    std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t" << end - start << "\n";
}
Thread.cpp
#include <boost/lexical_cast.hpp>
#include <vector>
#include <iostream>
#include <pthread.h>
#include <stdlib.h>     // exit()
#include <time.h>

extern "C" void* threadStart(void* threadData)
{
    return NULL;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        std::cout << "Usage: thread <count>\n";
        return 1;
    }
    int threadCount = boost::lexical_cast<int>(argv[1]);
    std::vector<pthread_t> data(threadCount);
    clock_t start = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        if (pthread_create(&data[loop], NULL, threadStart, NULL) != 0)
        {
            std::cout << "Abort\n";
            exit(1);
        }
    }
    clock_t middle = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        void* result;
        pthread_join(data[loop], &result);
    }
    clock_t end = clock();
    std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t" << end - start << "\n";
}
I expect Windows to do worse at process creation.
But I would expect modern Unix-like systems to have a fairly light fork cost, at least comparable to threads. On older Unix-style systems (before fork() was implemented using copy-on-write pages), I would expect it to be worse.
Anyway, my timing results are:
> uname -a
Darwin Alpha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
> gcc --version | grep GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
> g++ thread.cpp -o thread -I~/include
> g++ fork.cpp -o fork -I~/include
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./thread ${a} >> A
foreach? end
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./fork ${a} >> A
foreach? end
vi A
(C = number of threads/children; Start, Wait and Total are clock() tick counts)
Thread:                      Fork:
C    Start  Wait   Total     C    Start  Wait   Total
==============================================================
1    26     145    171       1    160    37     197
2    44     198    242       2    290    37     327
3    62     234    296       3    413    41     454
4    77     275    352       4    499    59     558
5    91     107    10808     5    599    57     656
6    99     332    431       6    665    52     717
7    130    388    518       7    741    69     810
8    204    468    672       8    833    56     889
9    164    469    633       9    1067   76     1143
10   165    450    615       10   1147   64     1211
12   343    585    928       12   1213   71     1284
15   232    647    879       15   1360   203    1563
20   319    921    1240      20   2161   96     2257
30   461    1243   1704      30   3005   129    3134
40   559    1487   2046      40   4466   166    4632
50   686    1912   2598      50   4591   292    4883
60   827    2208   3035      60   5234   317    5551
70   973    2885   3858      70   7003   416    7419
80   3545   2738   6283      80   7735   293    8028
90   1392   3497   4889      90   7869   463    8332
100  3917   4180   8097      100  8974   436    9410
Edit:
Creating 1000 children caused the fork version to fail.
So I have reduced the child count. But running a single test also seems unfair, so here is a range of values.
Mumble... I do not like your approach, for several reasons:
You are not taking into account the execution time of the child processes/threads.
You should compare CPU usage, not bare elapsed time. That way your statistics will not depend on, e.g., disk access congestion.
Let your child process do something. Remember that "modern" fork uses copy-on-write to avoid allocating memory for the child process until it is needed. Exiting immediately is too easy: that way you avoid almost all the disadvantages of fork.
CPU time is not the only cost you have to account for. Memory consumption and slower IPC are both disadvantages of the fork approach.
You could use "rusage" instead of "clock" to measure real resource usage.
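This sketch illustrates the `rusage` suggestion and assumes a POSIX system. `getrusage(RUSAGE_SELF, ...)` reports this process's CPU time, and `getrusage(RUSAGE_CHILDREN, ...)` reports the CPU time of children that have terminated and been waited for. Taking differences around the fork/wait loop therefore separates the parent's cost from the children's. `forkAndMeasure` and `CpuTimes` are hypothetical names. The children still do no work; per the point above, a realistic test would put a workload in the `pid == 0` branch.

```cpp
#include <cassert>
#include <vector>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Total user + system CPU time, in microseconds, for RUSAGE_SELF or RUSAGE_CHILDREN.
static long long cpuMicros(int who)
{
    rusage ru{};
    getrusage(who, &ru);
    const auto micros = [](const timeval& tv) {
        return static_cast<long long>(tv.tv_sec) * 1000000LL + tv.tv_usec;
    };
    return micros(ru.ru_utime) + micros(ru.ru_stime);
}

struct CpuTimes {
    long long parentMicros;    // CPU time spent in this process during the test
    long long childrenMicros;  // CPU time of the children reaped during the test
};

// Same structure as fork.cpp: start 'count' children that exit at once,
// then wait for all of them, measuring CPU usage instead of clock().
CpuTimes forkAndMeasure(int count)
{
    assert(count >= 0);
    const long long selfBefore = cpuMicros(RUSAGE_SELF);
    const long long kidsBefore = cpuMicros(RUSAGE_CHILDREN);

    std::vector<pid_t> pids;
    for (int i = 0; i < count; ++i) {
        const pid_t pid = fork();
        if (pid == -1) break;      // fork failed: measure what was started
        if (pid == 0) _exit(0);    // child: no work; _exit skips atexit handlers and stdio flushes
        pids.push_back(pid);
    }
    for (pid_t pid : pids)
        waitpid(pid, nullptr, 0);  // only waited-for children show up in RUSAGE_CHILDREN

    return { cpuMicros(RUSAGE_SELF) - selfBefore,
             cpuMicros(RUSAGE_CHILDREN) - kidsBefore };
}
```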
P.S. I do not think you can really measure the process/thread overhead writing a simple test program. There are too many factors and, usually, the choice between threads and processes is driven by other reasons than mere cpu-usage.
Under Linux, fork is a special call to sys_clone, either within the library or within the kernel. Clone has lots of switches to flip on and off, and each of them affects how expensive it is to start.
The actual library function clone is probably more expensive than fork because it does more, though most of that is on the child side (stack swapping and calling a function by pointer).
What that micro-benchmark shows is that thread creation and joining (there are no fork results when I'm writing this) takes tens or hundreds of microseconds (assuming your system has CLOCKS_PER_SEC=1000000, which it probably has, since it's an XSI requirement).
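The reasoning above relies on a simple unit conversion, shown here as a tiny sketch (`clockTicksToMicros` is a hypothetical helper). A difference of `std::clock()` values divided by `CLOCKS_PER_SEC` gives seconds of processor time, so with `CLOCKS_PER_SEC == 1000000`, one tick is one microsecond.

```cpp
#include <cassert>
#include <ctime>

// Convert a difference of std::clock() readings to microseconds of CPU time.
// On XSI-conformant systems CLOCKS_PER_SEC is 1000000, so one tick is 1 us.
double clockTicksToMicros(std::clock_t ticks)
{
    static_assert(CLOCKS_PER_SEC > 0, "CLOCKS_PER_SEC must be positive");
    return 1e6 * static_cast<double>(ticks) / CLOCKS_PER_SEC;
}
```

With that value, the 171-tick total for a single thread in the question's table corresponds to 171 µs of processor time.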
Since you said that fork() takes 3 times the cost of threads, we are still talking about tenths of a millisecond at worst. If that is noticeable in an application, you could use pools of processes/threads, like Apache 1.3 did. In any case, I'd say that start-up time is a moot point.
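This is a minimal C++11 sketch of the pooling idea using threads. It is a generic illustration, not how Apache implements it. Workers are started once and then pull tasks from a shared queue, so each task costs a queue push/pop rather than a thread start-up. `ThreadPool` is a hypothetical class. A pre-forked process pool follows the same pattern, with processes and IPC in place of threads and a shared queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-size pool: threads are created once in the constructor and reused.
class ThreadPool {
public:
    explicit ThreadPool(unsigned workers)
    {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;   // nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();   // run outside the lock
        }
    }

    std::vector<std::thread> threads_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```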
The important difference between threads and processes (on Linux and most Unix-likes) is that with processes you explicitly choose what to share, using IPC, shared memory (SysV or mmap-style), pipes, sockets (you can send file descriptors over AF_UNIX sockets, meaning you get to choose which fds to share), ... With threads, on the other hand, almost everything is shared by default, whether there's a need to share it or not. In fact, that is the reason Plan 9 had rfork() and Linux has clone() (and, more recently, unshare()), so you can choose what to share.
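To make the contrast concrete, here is a POSIX sketch using `MAP_ANONYMOUS`, which Linux and the BSDs support. After `fork()`, an ordinary variable is copied (copy-on-write), so the child's write stays private. A `MAP_SHARED` mapping created before the fork, by contrast, is memory that both processes have explicitly chosen to share. `demoExplicitSharing` and `SharingResult` are hypothetical names.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct SharingResult {
    int privateValue;  // what the parent sees in an ordinary variable
    int sharedValue;   // what the parent sees in the MAP_SHARED mapping
};

// Fork once; the child writes 42 to both an ordinary variable and a shared
// mapping. Returns {-1, -1} if mmap() or fork() fails.
SharingResult demoExplicitSharing()
{
    int privateValue = 0;
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return {-1, -1};
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    const pid_t pid = fork();
    if (pid == -1) {
        munmap(mem, sizeof(int));
        return {-1, -1};
    }
    if (pid == 0) {            // child
        privateValue = 42;     // lands in the child's own copy-on-write page
        *shared = 42;          // lands in memory the parent also maps
        _exit(0);
    }
    waitpid(pid, nullptr, 0);  // make sure the child has finished writing

    const SharingResult result{privateValue, *shared};
    const int rc = munmap(mem, sizeof(int));
    assert(rc == 0);
    (void)rc;                  // avoid an unused-variable warning under NDEBUG
    return result;
}
```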