I'm trying to use the OpenCV stitcher class to stitch multiple frames from a stereo setup, in which neither camera moves. I'm getting poor stitching results when running across multiple frames. I've tried a few different ways, which I'll try to explain here.
Using stitcher.stitch( )
Given a stereo pair of views, I ran the following code for some frames (VideoFile is a custom wrapper for the OpenCV VideoCapture object):
VideoFile f1( ... );
VideoFile f2( ... );
std::vector<cv::Mat> currentFrames;
cv::Mat output_frame;
cv::Stitcher stitcher = cv::Stitcher::createDefault(true);

for( int i = 0; i < num_frames; i++ ) {
    currentFrames.push_back(f1.frame( ));
    currentFrames.push_back(f2.frame( ));
    stitcher.stitch( currentFrames, output_frame );
    // Write output_frame, put it in a named window, etc...
    f1.next_frame();
    f2.next_frame();
    currentFrames.clear();
}
This gave quite good results on each frame, but because the parameters are re-estimated for every frame, when the frames are put into a video you can see small differences in the stitching where the parameters came out slightly different.
Using estimateTransform( ) & composePanorama( )
To get around the problem with the above method, I decided to estimate the parameters only on the first frame, then use composePanorama( ) to stitch all subsequent frames.
for( int i = 0; i < num_frames; i++ ) {
    currentFrames.push_back(f1.frame( ));
    currentFrames.push_back(f2.frame( ));

    if( ! have_transform ) {
        status = stitcher.estimateTransform( currentFrames );
        have_transform = true;
    }

    status = stitcher.composePanorama(currentFrames, output_frame );
    // ... as above
}
Sadly there appears to be a bug (documented here) causing the two views to move apart in a very odd way, as in the images below:
Frame 1:
Frame 2:
...
Frame 8:
Clearly this is useless, but I thought it might just be down to the bug, which essentially multiplies the intrinsic parameter matrix by a constant each time composePanorama() is called. So I applied a minor patch to stop that happening, but then the stitching results were poor. Patch below (modules/stitching/src/stitcher.cpp), results afterwards:
// modules/stitching/src/stitcher.cpp, around line 243:
for (size_t i = 0; i < imgs_.size(); ++i)
{
    // Update intrinsics
    // change the following to *= 1 to prevent the scaling error, but that messes up the stitching
    cameras_[i].focal *= compose_work_aspect;
    cameras_[i].ppx *= compose_work_aspect;
    cameras_[i].ppy *= compose_work_aspect;
Results:
Does anyone have a clue how I can fix this problem? Basically I need to work out the transformation once, then use it on the remaining frames (we're talking 30 minutes of video).
I'm ideally looking for some advice on patching the Stitcher class, but I'd be willing to try hand-coding a different solution. An earlier attempt, which involved finding SURF points, matching them and computing the homography, gave fairly poor results compared to the Stitcher class, so I'd rather use it if possible.
So in the end I hacked about with the stitcher.cpp code and got something close to a solution (though not perfect: the stitching seam still moves about a lot, so your mileage may vary).
Changes to stitcher.hpp
Added a new function setCameras() at line 136:
void setCameras( std::vector<detail::CameraParams> c ) {
    this->cameras_ = c;
}
Added a new private member variable to keep track of whether this is our first estimation:
bool not_first;
Changes to stitcher.cpp
In estimateTransform() (line ~100):
this->not_first = false;
images.getMatVector(imgs_);
// ...
In composePanorama() (line ~227):
// ...
compose_work_aspect = compose_scale / work_scale_;
// Update warped image scale
if( !this->not_first ) {
    warped_image_scale_ *= static_cast<float>(compose_work_aspect);
    this->not_first = true;
}
w = warper_->create((float)warped_image_scale_);
// ...
Code calling stitcher object:
So basically we create a stitcher object, then get the transform on the first frame (storing the camera matrices outside the stitcher class). Somewhere along the line the stitcher then corrupts the intrinsic matrix, causing the next frame to come out wrong, so before composing each frame we simply reset the cameras to the ones we extracted from the class.
Be warned: I had to add some error checking in case the stitcher couldn't produce an estimation with the default settings. You may need to iteratively decrease the confidence threshold using setPanoConfidenceThresh(...) before you get a result; there's a sketch of that retry loop after the code below.
cv::Stitcher stitcher = cv::Stitcher::createDefault(true);
std::vector<cv::detail::CameraParams> cams;
std::vector<cv::Mat> currentFrames;
cv::Mat output_frame;
cv::Stitcher::Status status;
bool have_transform = false;

for( int i = 0; i < num_frames; i++ ) {
    currentFrames.push_back(f1.frame( ));
    currentFrames.push_back(f2.frame( ));

    if( ! have_transform ) {
        status = stitcher.estimateTransform( currentFrames );
        have_transform = true;
        cams = stitcher.cameras();
        // some code to check the status of the stitch and handle errors...
    }

    stitcher.setCameras( cams );
    status = stitcher.composePanorama(currentFrames, output_frame );
    // ... Doing stuff with the panorama

    f1.next_frame();
    f2.next_frame();
    currentFrames.clear();
}
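For reference, here's a minimal sketch of that retry loop; the starting threshold, step size and lower bound are arbitrary placeholder values rather than anything from my actual code:
// Relax the confidence threshold until estimateTransform() succeeds (or we give up).
double conf = 1.0;                                        // arbitrary starting threshold
cv::Stitcher::Status est_status = cv::Stitcher::ERR_NEED_MORE_IMGS;
while( est_status != cv::Stitcher::OK && conf > 0.1 ) {
    stitcher.setPanoConfidenceThresh( conf );
    est_status = stitcher.estimateTransform( currentFrames );
    conf -= 0.1;                                          // arbitrary step size
}
if( est_status != cv::Stitcher::OK ) {
    // give up on this frame pair (or report the error)
}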
Please be aware that this is very much a hack of the OpenCV code, which is going to make updating to a newer version a pain. Unfortunately I was short of time so a nasty hack was all I could get round to!
Related
I am building an Android app to create panoramas. The user captures a set of images, and those images are sent to my native stitch function, which is based on https://github.com/opencv/opencv/blob/master/samples/cpp/stitching_detailed.cpp.
Since the images are in order, I would like to match each image only to the next image in the vector.
I found an Intel article that does just that, with the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
matcher(features, pairwise_matches, matchMask);
matcher.collectGarbage();
The problem is, this won't compile. I'm guessing it's because I'm using OpenCV 3.1.
Then I found somewhere that this code would do the same:
int range_width = 2;
BestOf2NearestRangeMatcher matcher(range_width, try_cuda, match_conf);
matcher(features, pairwise_matches);
matcher.collectGarbage();
And for most of my samples this works fine. However, sometimes, especially when I'm stitching a large set of images (around 15), some objects appear on top of each other and in places they shouldn't.
I've also noticed that the "beginning" (left side) of the end result is not the first image in the vector either, which is strange.
I am using "orb" as features_type and "ray" as ba_cost_func. Seems like I cant use SURF on OpenCV 3.1.
The rest of my initial parameters look like this:
bool try_cuda = false;
double compose_megapix = -1; //keeps resolution for final panorama
float match_conf = 0.3f; //0.3 default for orb
string ba_refine_mask = "xxxxx";
bool do_wave_correct = true;
WaveCorrectKind wave_correct = detail::WAVE_CORRECT_HORIZ;
int blend_type = Blender::MULTI_BAND;
float blend_strength = 5;
double work_megapix = 0.6;
double seam_megapix = 0.08;
float conf_thresh = 0.5f;
int expos_comp_type = ExposureCompensator::GAIN_BLOCKS;
string seam_find_type = "dp_colorgrad";
string warp_type = "spherical";
So could anyone enlighten me as to why this is not working and how I should match my features? Any help or direction would be much appreciated!
TL;DR: I want to stitch images in the order they were taken, but the code above is not working for me. How can I do that?
So I found out that the issue here is not the order in which the images are stitched, but rather the rotation that is estimated for the camera parameters in the homography-based estimator and the bundle ray adjuster.
Those rotation angles are estimated assuming a camera that rotates about itself, whereas my use case involves a user rotating the camera (which means there will be some translation too).
Because of that (I guess) the horizontal angles (around the Y axis) are highly overestimated, which makes the algorithm think the set of images covers >= 360 degrees; that results in overlapping areas that shouldn't overlap.
I still haven't found a solution to that problem, though.
matcher() takes a UMat as the mask instead of a Mat object, so try the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
UMat umask = matchMask.getUMat(ACCESS_READ);
matcher(features, pairwise_matches, umask);
matcher.collectGarbage();
I'm trying to teach myself OpenCV and C++, and this example program for face and eye detection includes the line:
for(size_t i = 0; i < faces.size(); i++)
I don't understand what faces.size() means and, following from that, at what point i can be greater than faces.size().
How does it acquire a numerical value?
I see plenty of instances of faces throughout the rest of the program, but the only time I see size is as a parameter for face_cascade.detectMultiScale. It is capitalized though, which makes me think that it has nothing to do with faces.size().
faces.size()
Returns the size of 'faces', i.e. how many faces there are in 'faces'.
In general a basic for loop is structured like so:
for ( init; condition; increment )
{
//your code...
}
It will run as long as the condition is true, i.e. as long as 'i' is less than faces.size() (which might be '10' or some other integer value).
'i' gets bigger because 1 is added to it on each loop iteration; this is done by the i++ increment.
If you're struggling with loop syntax, I'd suggest that OpenCV might not be the best place to start learning C++, as a lot of the examples expect a level of competence above 'beginner' (both intentionally and unintentionally, through plain bad coding, lack of comments, etc.).
faces is being populated here:
//-- Detect faces
face_cascade.detectMultiScale( frame_gray, faces, 1.1, 2, 0|CASCADE_SCALE_IMAGE, Size(30, 30) );
According to the OpenCV documentation:
void cv::CascadeClassifier::detectMultiScale ( InputArray image,
std::vector< Rect > & objects,
double scaleFactor = 1.1,
int minNeighbors = 3,
int flags = 0,
Size minSize = Size(),
Size maxSize = Size()
)
where std::vector< Rect > & objects (faces in your case) is a "vector of rectangles where each rectangle contains the detected object; the rectangles may be partially outside the original image".
As you can see, objects is passed by reference to allow its modification inside the function.
Also, std::vector<Type>::size() gives you the number of elements in the vector, so i < faces.size() is needed to keep the index i within the bounds of the vector.
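For completeness, here is a minimal sketch (not from the original sample) of how the detected rectangles are typically used after detectMultiScale, which makes the role of faces.size() concrete; frame is assumed to be the colour image that frame_gray was made from:
std::vector<cv::Rect> faces;
face_cascade.detectMultiScale( frame_gray, faces, 1.1, 2,
                               0 | cv::CASCADE_SCALE_IMAGE, cv::Size(30, 30) );

// faces.size() is however many rectangles the detector found (possibly zero),
// so the loop body runs exactly once per detected face.
for( size_t i = 0; i < faces.size(); i++ )
{
    cv::rectangle( frame, faces[i], cv::Scalar(0, 255, 0), 2 ); // draw each detection
}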
I'm trying to implement video stabilization using the OpenCV videostab module. I need to do it on a stream, so I'm trying to get the motion between two frames. After reading the documentation, I decided to do it this way:
estimator = new cv::videostab::MotionEstimatorRansacL2(cv::videostab::MM_TRANSLATION);
keypointEstimator = new cv::videostab::KeypointBasedMotionEstimator(estimator);
bool res;
auto motion = keypointEstimator->estimate(this->firstFrame, thisFrame, &res);
std::vector<float> matrix(motion.data, motion.data + (motion.rows*motion.cols));
where firstFrame and thisFrame are fully initialized frames. The problem is that the estimate method always returns a matrix like this:
In this matrix only the last value (matrix[8]) changes from frame to frame. Am I using the videostab objects correctly, and how can I apply this matrix to a frame to get the result?
I am new to OpenCV but here is how I have solved this issue.
The problem lies in the line:
std::vector<float> matrix(motion.data, motion.data + (motion.rows*motion.cols));
For me, the motion matrix is of type 64-bit double (check yours from here), and copying it into a std::vector<float> of 32-bit floats messes up the values.
To solve this issue, try replacing the above line with:
std::vector<float> matrix;
for (auto row = 0; row < motion.rows; row++) {
    for (auto col = 0; col < motion.cols; col++) {
        // motion is CV_64F here, so read each element as a double
        // and narrow it to float explicitly.
        matrix.push_back(static_cast<float>(motion.at<double>(row, col)));
    }
}
I have tested it by running the estimator on a duplicate set of points, and it gives the expected results, with most entries close to 0.0 and matrix[0], matrix[4] and matrix[8] equal to 1.0 (using the author's code with this setting gave the same erroneous values as the author's picture shows).
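The question also asked how to apply the matrix to a frame; that part isn't covered above, but a minimal sketch (assuming motion is the 3x3 matrix returned by estimate()) is to warp the frame with cv::warpPerspective. Depending on which direction the motion was estimated in, you may need the matrix itself or its inverse:
// Warp thisFrame so it lines up with firstFrame.
// WARP_INVERSE_MAP treats 'motion' as the dst -> src mapping; if the result
// drifts the wrong way, drop the flag (or use motion.inv()) instead.
cv::Mat stabilized;
cv::warpPerspective( thisFrame, stabilized, motion, thisFrame.size(),
                     cv::INTER_LINEAR | cv::WARP_INVERSE_MAP );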
I want to count the vehicles in a video. After frame differencing I get a greyscale, almost binary image. I have defined a Region of Interest to work on a specific area of each frame; the pixels belonging to vehicles passing through the Region of Interest have values greater than 0, often greater than 40 or 50, because they are white.
My idea is that when a certain number of pixels are white within a specific interval of time (say 1-2 seconds), a vehicle must be passing, so I will increment the counter.
What I want is to check whether white pixels are still arriving after 1-2 seconds. If no more white pixels arrive, it means the vehicle has passed and the next vehicle is about to come, and at that point the counter should be incremented.
One method that came to my mind is to count the frames of the video and store the count in a variable called No_of_frames. Using that variable I think I can estimate the time that has passed: if No_of_frames is greater than, say, 20, then nearly one second has passed, given that my video's frame rate is 25-30 fps.
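As a rough sketch of that frame-count-to-time conversion (assuming the frames come from a cv::VideoCapture called cap, and that the file reports its frame rate correctly, which is not always the case):
double fps = cap.get(CV_CAP_PROP_FPS);       // e.g. 25 or 30
if (fps <= 0) fps = 25.0;                    // fall back if the file doesn't report it
double seconds_elapsed = No_of_frames / fps; // frames seen so far -> seconds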
I am using Qt Creator on Windows 7 with OpenCV 2.3.1.
My code is something like:
for (int i = 0; i < matFrame.rows; i++)
{
    for (int j = 0; j < matFrame.cols; j++)
    {
        if (matFrame.at<uchar>(i, j) > 100) // values of pixels greater than 100
                                            // will be considered as white
        {
            whitePixels++;
        }
    }

    if () // here I want to use time. The 'if' statement must be like:
          // if (total_no._of_whitepixels > 100 && no_white_pixel_came_after 2secs)
          // which means that a vehicle has just passed, so increment the counter.
    {
        counter++;
    }
}
Any other idea for counting the vehicles, better than mine, would be most welcome. Thanks in advance.
For background segmentation I am using the following algorithm, but it is very slow and I don't know why. The whole code is as follows:
// opencv2/video/background_segm.hpp OPENCV header file must be included.
IplImage* tmp_frame = NULL;
CvCapture* cap = NULL;
bool update_bg_model = true;

Mat element = getStructuringElement( 0, Size( 2, 2 ), Point() );
Mat eroded_frame;
Mat before_erode;

if( argc > 2 )
    cap = cvCaptureFromCAM(0);
else
    // cap = cvCreateFileCapture( "C:\\4.avi" );
    cap = cvCreateFileCapture( "C:\\traffic2.mp4" );

if( !cap )
{
    printf("can not open camera or video file\n");
    return -1;
}

tmp_frame = cvQueryFrame(cap);
if( !tmp_frame )
{
    printf("can not read data from the video source\n");
    return -1;
}

cvNamedWindow("BackGround", 1);
cvNamedWindow("ForeGround", 1);

CvBGStatModel* bg_model = 0;

for( int fr = 1; tmp_frame; tmp_frame = cvQueryFrame(cap), fr++ )
{
    if( !bg_model )
    {
        // create BG model
        bg_model = cvCreateGaussianBGModel( tmp_frame );
        // bg_model = cvCreateFGDStatModel( temp );
        continue;
    }

    double t = (double)cvGetTickCount();
    cvUpdateBGStatModel( tmp_frame, bg_model, update_bg_model ? -1 : 0 );
    t = (double)cvGetTickCount() - t;
    printf( "%d. %.1f\n", fr, t/(cvGetTickFrequency()*1000.) );

    before_erode = bg_model->foreground;
    cv::erode( (Mat)bg_model->background, (Mat)bg_model->foreground, element );
    // eroded_frame = bg_model->foreground;
    // frame = (IplImage *)erode_frame.data;

    cvShowImage("BackGround", bg_model->background);
    cvShowImage("ForeGround", bg_model->foreground);

    char k = cvWaitKey(5);
    if( k == 27 ) break;
    if( k == ' ' )
    {
        update_bg_model = !update_bg_model;
        if( update_bg_model )
            printf("Background update is on\n");
        else
            printf("Background update is off\n");
    }
}

cvReleaseBGStatModel( &bg_model );
cvReleaseCapture(&cap);
return 0;
A great deal of research has been done on vehicle tracking and counting. The approach you describe appears to be quite fragile, and is unlikely to be robust or accurate. The main issue is using a count of pixels above a certain threshold, without regard for their spatial connectivity or temporal relation.
Frame differencing can be useful for separating a moving object from its background, provided the object of interest is the only (or largest) moving object.
What you really need is to first identify the object of interest, segment it from the background, and track it over time using an adaptive filter (such as a Kalman filter). Have a look at the OpenCV video reference. OpenCV provides background subtraction and object segmentation to do all the required steps.
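As a concrete starting point, here is a minimal sketch of that background-subtraction step using cv::BackgroundSubtractorMOG2 (which should already be available in your OpenCV 2.3.1; in 3.x you would create it via createBackgroundSubtractorMOG2 instead). The file name, area threshold and kernel size below are placeholder values, and this only produces per-frame detections; you still need to associate detections across frames (e.g. with a Kalman filter) before you can actually count vehicles:
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("traffic2.mp4");          // placeholder file name
    if (!cap.isOpened()) return -1;

    cv::BackgroundSubtractorMOG2 bg_subtractor;    // OpenCV 2.x style object
    cv::Mat frame, fg_mask;
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));

    while (cap.read(frame))
    {
        bg_subtractor(frame, fg_mask);             // update the model, get the foreground mask
        cv::erode(fg_mask, fg_mask, element);      // remove speckle noise
        cv::dilate(fg_mask, fg_mask, element);

        // Treat each sufficiently large foreground blob as a candidate vehicle.
        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(fg_mask.clone(), contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
        for (size_t i = 0; i < contours.size(); i++)
        {
            if (cv::contourArea(contours[i]) > 500.0)   // placeholder area threshold
                cv::rectangle(frame, cv::boundingRect(contours[i]), cv::Scalar(0, 255, 0), 2);
        }

        cv::imshow("ForeGround", fg_mask);
        cv::imshow("Detections", frame);
        if (cv::waitKey(5) == 27) break;
    }
    return 0;
}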
I suggest you read up on OpenCV - Learning OpenCV is a great read. And also on more general computer vision algorithms and theory - http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm has a good list.
Normally they just put a small pneumatic pipe across the road (a soft pipe semi-filled with air). It is attached to a simple counter. Each vehicle passing over the pipe generates two pulses (first the front wheels, then the rear wheels). The counter records the number of pulses in specified time intervals and divides by 2 to get the approximate vehicle count.
Maybe not really big, but a hundred frames or something. Is the only way to load them in by making an array and loading each image individually?
load_image() is a function I made which loads the images and converts their BPP.
expl[0] = load_image( "explode1.gif" );
expl[1] = load_image( "explode2.gif" );
expl[2] = load_image( "explode3.gif" );
expl[3] = load_image( "explode4.gif" );
...
expl[99] = load_image( "explode100.gif" );
Seems like there should be a better way... at least I hope.
A common technique is spritesheets, in which a single, large image is divided into a grid of cells, with each cell containing one frame of an animation. Often, all animation frames for any game entity are placed on a single, sometimes huge, sprite sheet.
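As a minimal sketch of the bookkeeping involved (the column count and cell size are whatever you chose when packing the sheet):
// One cell of a sprite sheet laid out left-to-right, top-to-bottom.
struct CellRect { int x, y, w, h; };

CellRect frame_cell(int frameIndex, int columns, int cellW, int cellH)
{
    CellRect r;
    r.x = (frameIndex % columns) * cellW;   // which column the frame sits in
    r.y = (frameIndex / columns) * cellH;   // which row the frame sits in
    r.w = cellW;
    r.h = cellH;
    return r;
}
At draw time you then blit just that sub-rectangle of the sheet instead of loading a separate image per frame.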
Maybe simplify your loading with a utility function that builds a filename on each iteration of a loop:
void LoadAnimation(char* isFileBase, int numFrames)
{
    char szFileName[255];
    for (int i = 0; i < numFrames; i++)
    {
        // append the frame number and .gif to the file base to get the filename
        // (the files are numbered from 1, e.g. explode1.gif)
        sprintf(szFileName, "%s%d.gif", isFileBase, i + 1);
        expl[i] = load_image(szFileName);
    }
}
Instead of loading as a grid, stack all the frames in one vertical strip (same image). Then you only need to know how many rows per frame and you can set a pointer to the frame row offset. You end up still having contiguous scan lines that can be displayed directly or trivially chewed off into separate images.
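A sketch of that pointer arithmetic, assuming a raw pixel buffer with pitch bytes per scan line and frameHeight rows per frame:
// Returns a pointer to the first scan line of frame 'frameIndex'
// inside a vertical strip of identically sized frames.
const unsigned char* frame_rows(const unsigned char* stripPixels,
                                size_t pitch, int frameHeight, int frameIndex)
{
    return stripPixels + static_cast<size_t>(frameIndex) * frameHeight * pitch;
}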