OpenCV video stabilization algorithm - C++

I am writing a video stabilizer using OpenCV. The algorithm is as follows:
while there are more frames in the video:
    take a new frame from the video
    detect keypoints in the new frame
    compute descriptors for the new keypoints
    match descriptors of the new and the previous frame
    filter matches to get good matches
    find the homography between the previous and the new frame
    apply the homography (warpPerspective) to the new frame and thus create an "adjusted new frame"
    set the previous frame to be equal to the "adjusted new frame" (descriptors, keypoints)
I have a few questions. Am I on the right track? How do I do the actual stabilization (using a Gaussian filter or something else)?

Here is a possible sequence of steps:
Step 1. Read Frames from a Movie File
Step 2. Collect Salient Points from Each Frame
Step 3. Select Correspondences Between Points
Step 4. Estimating Transform from Noisy Correspondences
Step 5. Transform Approximation and Smoothing
Step 6. Run on the Full Video
You can find more details on each step here:
I think you can follow the same steps in OpenCV.
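Step 5 is where the actual stabilization happens: the per-frame motion is accumulated into a camera trajectory, the trajectory is smoothed, and each frame is shifted by the difference. Here is a minimal NumPy-only sketch of that idea. It assumes each frame-to-frame transform has already been reduced to a (dx, dy, da) triple (translation and rotation angle); the function name, the `radius` parameter and the moving-average filter are illustrative choices, and a Gaussian kernel could be substituted for the box kernel.

```python
import numpy as np

def smooth_trajectory(transforms, radius=15):
    """transforms: (N, 3) array of per-frame (dx, dy, da) motion estimates.

    Returns (smoothed, correction), both of shape (N, 3).
    """
    transforms = np.asarray(transforms, dtype=np.float64)
    # Accumulate frame-to-frame motion into a camera trajectory.
    trajectory = np.cumsum(transforms, axis=0)
    # Moving-average filter of width 2 * radius + 1, with edge padding so
    # the output has the same length as the input.
    window = 2 * radius + 1
    kernel = np.ones(window) / window
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode="edge")
    smoothed = np.stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(3)],
        axis=1,
    )
    # Offset that moves each frame from its measured position onto the
    # smoothed path.
    correction = smoothed - trajectory
    return smoothed, correction
```

The `correction` row for a frame would then be turned back into a 2x3 or 3x3 matrix and applied with the warp function of your choice. The sign convention depends on how the frame-to-frame transforms were estimated.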

If you're using Python, you can use my threaded VidGear video processing library, which now provides real-time video stabilization with low latency and little to no additional computational cost through its Stabilizer class. Here's a basic usage example for your convenience:
# import libraries
from vidgear.gears import VideoGear
from vidgear.gears import WriteGear
import cv2

# open any valid video stream (e.g. the device at index 0) with stabilization enabled
stream = VideoGear(source=0, stabilize=True).start()

# infinite loop
while True:
    # read stabilized frames
    frame = stream.read()

    # check if frame is None
    if frame is None:
        # if True, break the infinite loop
        break

    # do something with the stabilized frame here

    # show output window
    cv2.imshow("Stabilized Frame", frame)

    # check for 'q' key-press
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        # if 'q' key pressed, break out
        break

# close output window
cv2.destroyAllWindows()
# safely close video stream
stream.stop()
More advanced usage can be found here:


How can I merge 2 videos together in OpenCV? Similar to that of a SimulCam

I am looking for a way to combine/blend 2 videos irrespective of alignment in OpenCV.
I have 2 videos of the same scene, one is a ball rolling fast, the other slow from a standardised start point.
I have previously managed to work out how to use addWeighted() to blend two images together, but have little knowledge of performing something similar for videos.
I understand that it involves reading frames from the respective sources and processing them, but that is all.
Any help or direction would be greatly appreciated.
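Perform the operation on every frame of the video:
import cv2
import numpy as np

video1 = cv2.VideoCapture('output.avi')
video2 = cv2.VideoCapture('output1.avi')

while True:
    ret1, frame1 = video1.read()
    ret2, frame2 = video2.read()
    # stop as soon as either video runs out of frames
    if not ret1 or not ret2:
        break
    # both frames must have the same size before blending
    frame1 = cv2.resize(frame1, (240, 320))
    frame2 = cv2.resize(frame2, (240, 320))
    dst = cv2.addWeighted(frame1, 0.3, frame2, 0.7, 0)
    cv2.imshow('dst', dst)
    key = cv2.waitKey(1)
    if key == ord('q'):
        break

video1.release()
video2.release()
cv2.destroyAllWindows()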
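For reference, the blend applied to each pair of frames is a per-pixel weighted sum, dst = src1 * alpha + src2 * beta + gamma, saturated to the 8-bit range. A small NumPy sketch of that arithmetic (the function name `blend` is made up for illustration, and this is a stand-in to show the formula rather than a replacement for the OpenCV call):

```python
import numpy as np

def blend(frame1, frame2, alpha=0.3, beta=0.7, gamma=0.0):
    """Weighted sum of two same-sized uint8 frames, clipped to [0, 255]."""
    mixed = (frame1.astype(np.float64) * alpha
             + frame2.astype(np.float64) * beta
             + gamma)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

With alpha + beta = 1 the result stays in range; larger weights simply saturate at 255.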

How to make image comparison in openCV more coarse

I am writing code in Python on a Raspberry Pi to compare two images using the mean squared error. The project is a personal home security system.
My main goal is to detect a change between the images that I capture from the Pi camera (something added to or removed from the current image), but right now my code is too sensitive. It is affected by changes in background lighting, which I do not want.
I have two options in front of me: either scrap my current logic and start a new one, or improve my current logic to account for this noise (if I can call it that). I am searching for ways to improve my logic, but I wanted some guidance on how to go about it.
My biggest fear is that I am wasting time flogging a dead horse. Should I look for some other algorithm to detect a change in the image, or should I use edge detection?
import numpy as np
import cv2
import os
from threading import Thread

###### Function Definition ########################################
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared difference between the two images,
    # divided by the number of pixels;
    # NOTE: the two images must have the same dimension
    err = np.sum((imageA.astype("int") - imageB.astype("int")) ** 2)
    err = err / float(imageA.shape[0] * imageA.shape[1])
    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err

def compare_images(imageA, imageB):
    # compute the mean squared error
    m = mse(imageA, imageB)
    return m

def capture_image():
    ## shell command to click photos
    os.system(image_args)

## original image path variable
original_image_path = "/home/pi/Downloads/python-compare-two-images/originalimage.png"
## original_image_args is a shell command to click photos
original_image_args = "raspistill -o " + original_image_path + " -w 320 -h 240 -q 50 -t 500"
os.system(original_image_args)
## read the greyscale of the image into the variable original_image
original_image = cv2.imread(original_image_path, 0)
## Three images
image_args = "raspistill -o /home/pi/Downloads/python-compare-two-images/Test_Images/image.png -w 320 -h 240 -q 50 --nopreview -t 10 --exposure sports"
# create a new thread to take pictures
t = Thread(target=capture_image)
# start the thread
t.start()
flag = 0
image1 = cv2.imread((image_path + image1_name), 0)
compare_images(original_image, image1)
A first improvement is to adjust a gain to compensate for the global variation of the light, for example by taking the average intensity of the two images and correcting one of them by the ratio of the intensities.
This can fail when the foreground changes, because that change also influences the global average. If the changed foreground area is not too large, you can get a better estimate by robustly fitting a linear model y = a.x.
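A rough NumPy sketch of both gain estimates, assuming two same-sized greyscale images. The function names, the `min_value` cutoff, and the use of the median of per-pixel ratios as the robust fit of y = a.x are my own illustrative choices, not the only way to do it:

```python
import numpy as np

def mean_gain(reference, current):
    """Global gain from the ratio of the average intensities."""
    return float(reference.mean()) / max(float(current.mean()), 1e-6)

def robust_gain(reference, current, min_value=10):
    """Robust estimate of a in reference = a * current (median of ratios)."""
    ref = reference.astype(np.float64).ravel()
    cur = current.astype(np.float64).ravel()
    # Skip near-black pixels, where the ratio is unstable.
    valid = cur >= min_value
    return float(np.median(ref[valid] / cur[valid]))

def compensated_mse(reference, current, gain):
    """MSE after scaling the current image by the estimated gain."""
    corrected = np.clip(current.astype(np.float64) * gain, 0, 255)
    return float(np.mean((reference.astype(np.float64) - corrected) ** 2))
```

The median ignores a changed region as long as it covers well under half of the image, whereas the mean ratio is pulled towards it.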
A worse, but unfortunately common, scenario is when the background illumination changes in a non-uniform way. A partial solution is to fit a non-uniform gain model, such as one obtained by bilinear interpolation between gains estimated at the corners, or on a finer subdivision of the image.
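As a sketch of the corner-based variant, the following estimates one gain in a small patch at each corner and interpolates bilinearly across the image. The patch size and the function name are assumptions made for the example; a finer grid would follow the same pattern.

```python
import numpy as np

def corner_gain_map(reference, current, patch=32):
    """Per-pixel gain map interpolated from gains at the four corners."""
    h, w = reference.shape
    ref = reference.astype(np.float64)
    cur = current.astype(np.float64)

    def gain(rows, cols):
        # Ratio of mean intensities inside one corner patch.
        return ref[rows, cols].mean() / max(cur[rows, cols].mean(), 1e-6)

    top, bottom = slice(0, patch), slice(h - patch, h)
    left, right = slice(0, patch), slice(w - patch, w)
    g_tl, g_tr = gain(top, left), gain(top, right)
    g_bl, g_br = gain(bottom, left), gain(bottom, right)

    # Normalised coordinates: v runs down the rows, u across the columns.
    v = np.linspace(0.0, 1.0, h)[:, None]
    u = np.linspace(0.0, 1.0, w)[None, :]
    return ((1 - v) * (1 - u) * g_tl + (1 - v) * u * g_tr
            + v * (1 - u) * g_bl + v * u * g_br)
```

Multiplying the current image by this map before computing the error compensates for a smooth lighting gradient, provided the corner patches themselves contain only background.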
Change detection is a well-studied field. One of the basic options is to model each pixel as a Gaussian distribution: sample a lot of images and calculate the mean and variance of each pixel.
Pixels that tend to change when the lighting changes will have a larger variance than pixels that don't change as much.
To detect movement at a given pixel, you just need to choose the probability below which you consider a change in the pixel value to be out of the ordinary, and use the Gaussian distribution you calculated to find the corresponding pixel value.
To make this solution efficient on your Raspberry Pi, first do an "offline" calculation of the per-pixel threshold values beyond which a change in the pixel value is considered movement, and store them in a file. Then, in the "online" stage, you just compare each pixel to its stored value.
For the "offline" stage I recommend using images that were recorded over an entire day, in order to capture all the variation you need per pixel. This stage can of course be done on your computer, with only the output file uploaded to the Raspberry Pi.
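A minimal sketch of the offline and online stages in NumPy, assuming greyscale frames of identical size. The function names, the `k` multiplier and the `min_std` floor are illustrative; with k = 3, a pixel is flagged when it takes a value that a Gaussian would produce roughly 0.3% of the time.

```python
import numpy as np

def fit_pixel_model(frames, k=3.0, min_std=2.0):
    """Offline stage: per-pixel mean and threshold from many sample frames."""
    stack = np.asarray(frames, dtype=np.float64)  # shape (num_frames, H, W)
    mean = stack.mean(axis=0)
    # Floor the standard deviation so perfectly static pixels do not end up
    # with a zero threshold.
    std = np.maximum(stack.std(axis=0), min_std)
    return mean, k * std

def detect_change(frame, mean, threshold):
    """Online stage: boolean mask of pixels outside their usual range."""
    return np.abs(frame.astype(np.float64) - mean) > threshold
```

The two arrays returned by `fit_pixel_model` could be written to disk with `np.savez` on the computer and loaded with `np.load` on the Raspberry Pi, so that only `detect_change` runs there.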

selecting multiple ROI in an image

Hey guys, I am using OpenCV 2.4 with Python 2.7 on Ubuntu 14.04.
I want to select multiple regions of interest (ROI) in an image. Is it possible to do so?
I want to do motion detection only in the areas I have selected. Either of the following approaches would solve my problem, but I don't know how to implement them:
Mask the area of the image which is not ROI
After creating multiple ROI images, add them together such that all the ROIs are at their original locations and the remaining area is masked
Yes, it is possible to do so. The main idea behind the solution is to create a mask and set it to 0 wherever you do not want the motion tracker to track.
If you are using NumPy, then you can create the mask and set the regions you do not want the detector to use to zero (similar to setting cv::Rect(start.col, start.row, numberof.cols, numberof.rows) to 0 in C++).
In Python using NumPy you can create a mask somewhat like this:
import cv2
import numpy as np

# cap is assumed to be an already opened cv2.VideoCapture
ret, frame = cap.read()
# convert to grayscale depending on the number of channels
if frame.ndim == 3 and frame.shape[2] == 3:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
elif frame.ndim == 3 and frame.shape[2] == 4:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
else:
    gray = frame
# create mask
mask = np.ones_like(gray)
mask[start_row:end_row, start_col:end_col] = 0
mask[another_starting_row:another_ending_row, another_start_col:another_end_col] = 0
# and so on you can create your own mask
# use for loops to create specific masks
It is a somewhat crude solution, but it will do the job. Check the NumPy documentation (PDF) for more info.
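For the complementary case in the question (keep several ROIs and mask everything else), a loop over a list of rectangles is enough. This is only a sketch: the (x, y, width, height) rectangle format and the function names are assumptions made for the example.

```python
import numpy as np

def roi_mask(shape, rois):
    """Mask that is 255 inside every (x, y, w, h) rectangle and 0 elsewhere."""
    mask = np.zeros(shape[:2], dtype=np.uint8)
    for x, y, w, h in rois:
        mask[y:y + h, x:x + w] = 255
    return mask

def apply_mask(gray, mask):
    """Keep pixels inside the ROIs at their original location, zero the rest."""
    return np.where(mask > 0, gray, 0).astype(gray.dtype)
```

The masked greyscale frame can then be passed to the motion detector, so that only changes inside the selected rectangles are seen.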

Tuning background subtraction with OpenCV

My question is the final paragraph.
I am trying to use one of OpenCV's background subtractors as a means of detecting human hands. The code that tries to do this is as follows:
cv::Ptr<cv::BackgroundSubtractor> pMOG2 = cv::createBackgroundSubtractorMOG2();
cv::Mat fgMaskMOG2;
pMOG2->apply(input, fgMaskMOG2, -1);
cv::namedWindow("FG Mask MOG 2");
cv::imshow("FG Mask MOG 2", fgMaskMOG2);
When I initially ran the program on my own test video I was greeted with this (ignore the name of the right most window):
As you can see a mask is not detected for my moving hand at all, given that the background in my video is completely stationary (there were maybe one or two white pixels at a time showing up in the mask). So I tried using a different video, one that many examples seemed to use which was moving traffic.
You can see it picked up on a moving car very slightly. I have tried (for both these videos) setting the learning rate (the third argument of the apply method) to many values between 0 and 1, and there was not much variation at all from the results you can see above.
Have I missed anything with regards to setting up the background subtraction or are the videos particularly hard examples to deal with? Where can I adjust the settings of the background subtraction to favour my setup (if anywhere)? I will repeat the fact that in both videos the camera is stationary.
My answer is in Python, but you can convert it and try it. Accept the answer if it works.
import cv2

# cap is assumed to be a cv2.VideoCapture opened on your video;
# min_thresh and max_thresh are component area bounds (in pixels) that you choose
if not cap.isOpened():
    print("Error opening video stream or file")

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
fgbg = cv2.createBackgroundSubtractorMOG2()
connectivity = 4

# Read until video is completed
while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        break
    print("Frame detected")
    frame1 = frame.copy()
    fgmask = fgbg.apply(frame1)
    # morphological opening removes small specks of noise from the mask
    fgmask = cv2.morphologyEx(fgmask, cv2.MORPH_OPEN, kernel)
    output = cv2.connectedComponentsWithStats(fgmask, connectivity, cv2.CV_32S)
    # output[0] is the number of labels, output[2] holds the per-label stats
    for i in range(output[0]):
        x, y, w, h, area = (int(v) for v in output[2][i])
        if min_thresh <= area <= max_thresh:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('detection', frame)
    cv2.imshow('mask', fgmask)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
Tune cv2.createBackgroundSubtractorMOG2 by changing history and varThreshold, and by setting detectShadows=True. You can also change the kernel size, add further noise removal, etc.
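To build some intuition for what a learning rate and a threshold do in background subtraction, here is a deliberately simplified stand-in written with NumPy only. It is a single running-average background model, not the mixture-of-Gaussians model that MOG2 uses, and the class name and default values are made up for the example.

```python
import numpy as np

class RunningAverageBackground:
    """Toy background model: one running average per pixel."""

    def __init__(self, learning_rate=0.05, threshold=25.0):
        self.learning_rate = learning_rate
        self.threshold = threshold
        self.background = None

    def apply(self, gray):
        frame = gray.astype(np.float64)
        if self.background is None:
            # The first frame initialises the background.
            self.background = frame.copy()
        # Foreground = pixels that differ enough from the background.
        diff = np.abs(frame - self.background)
        mask = (diff > self.threshold).astype(np.uint8) * 255
        # Blend the new frame into the background. A rate of 1.0 replaces
        # the background completely on every frame.
        self.background += self.learning_rate * (frame - self.background)
        return mask
```

In this toy model, a high learning rate absorbs a slowly moving object into the background almost immediately, while a low one keeps it in the foreground for longer. A high threshold suppresses noise but also hides low-contrast objects.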
Try using the MOG subtractor instead of the MOG2 background subtractor. It might help, because most of the time the MOG subtractor is handy.
The drawback is that the MOG subtractor has been moved to the bgsegm package, which is a contrib package. It is available on the OpenCV GitHub page.

Retrieving the current frame number in OpenCV

How can I retrieve the current frame number of a video using OpenCV? Does OpenCV have a built-in function for getting the current frame, or do I have to do it manually?
You can use the "get" method of your capture object like below :
capture.get(CV_CAP_PROP_POS_FRAMES); // retrieves the current frame number
and also :
capture.get(CV_CAP_PROP_FRAME_COUNT); // returns the number of total frames
Btw, these methods return a double value.
You can also use the cvGetCaptureProperty function (if you use the old C interface):
cvGetCaptureProperty(CvCapture* capture,int property_id);
property_id options are below with definitions:
POS_MSEC is the current position in a video file, measured in milliseconds.
POS_FRAMES is the position of the current frame in the video (like the 55th frame of the video).
POS_AVI_RATIO is the current position given as a number between 0 and 1 (this is actually quite useful when you want to position a trackbar to allow folks to navigate around your video).
FRAME_WIDTH and FRAME_HEIGHT are the dimensions of the individual frames of the video to be read (or to be captured at the camera’s current settings).
FPS is specific to video files and indicates the number of frames per second at which the video was captured. You will need to know this if you want to play back your video and have it come out at the right speed.
FOURCC is the four-character code for the compression codec to be used for the video you are currently reading.
FRAME_COUNT should be the total number of frames in the video, but this figure is not entirely reliable.
(from the Learning OpenCV book)
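If you do end up tracking the position manually, the time and frame positions are related through the frame rate. A small sketch of that arithmetic, assuming a constant frame rate (it will not hold for variable frame rate files); the helper names are made up for the example:

```python
def frame_index_from_msec(pos_msec, fps):
    """Frame index that corresponds to a timestamp in milliseconds."""
    return int(round(pos_msec / 1000.0 * fps))

def msec_from_frame_index(frame_index, fps):
    """Timestamp in milliseconds at which a given frame starts."""
    return frame_index * 1000.0 / fps
```

For example, at 25 frames per second a position of 2000 ms corresponds to frame 50.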
In OpenCV version 3.4, the correct flag is:
The way to do it in OpenCV Python is like this:
import cv2
cam = cv2.VideoCapture(<filename>)
# in OpenCV 2.4 the constant is cv2.cv.CV_CAP_PROP_POS_FRAMES
print(cam.get(cv2.CAP_PROP_POS_FRAMES))