Dense optical flow in real time on hardware for 4K resolution at 30 fps - computer-vision

I am working on a hardware-based solution (without GPU) for dense optical flow to get real-time performance at 30 fps with decent accuracy, something comparable to or better than NVIDIA's Optical Flow SDK. Can someone please suggest good algorithms other than pyramidal Lucas-Kanade and Horn-Schunck? I found SGM to be a good starting point, but it is difficult to implement on an FPGA or DSP core. The target is to measure large displacements and to handle occlusion, as in real-world videos.
It would be great if someone could tell me exactly which algorithm NVIDIA has used.

For dense optical flow estimation in a real-time setup, FlowNet is a good option; it can estimate optical flow at a high FPS. You can take their trained model and run inference with it. Since you want to run the estimation in a non-GPU environment, you can try converting the model to ONNX format (a minimal inference sketch follows below). A good implementation of FlowNet is available in NVIDIA's GitHub repo. I am not sure exactly which algorithm NVIDIA is using in its optical flow SDK.
FlowNet2 builds upon the original FlowNet to handle large displacements. If you are concerned about occlusion, you may want to check out the follow-up work on FlowNet3. Another alternative to FlowNet is PWC-Net.
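As a rough illustration of the ONNX route, here is a minimal sketch of running a converted flow model with ONNX Runtime on CPU. The model file name, input name, and input layout are assumptions; check the actual exported graph of your network.

```python
# Minimal ONNX Runtime inference sketch (CPU only).
# Assumes a FlowNet-style model exported to ONNX that takes a pair of RGB
# frames stacked along the channel axis; adjust names/shapes to your export.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("flownet2.onnx",          # hypothetical file name
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def estimate_flow(frame_a, frame_b):
    """frame_a, frame_b: HxWx3 uint8 images of identical size."""
    pair = np.concatenate([frame_a, frame_b], axis=2)               # HxWx6
    pair = pair.transpose(2, 0, 1)[None].astype(np.float32) / 255.0  # 1x6xHxW
    flow = session.run(None, {input_name: pair})[0]                  # 1x2xHxW
    return flow[0].transpose(1, 2, 0)                                # HxWx2 (dx, dy)
```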

Related

Nvidia VPI vs Nvidia VisionWorks performance difference for optical flow

I have a computer vision application that works with Nvidia VisionWorks. Now that Nvidia has replaced VisionWorks with VPI (Vision Programming Interface), I am trying to rewrite my application using VPI. At some point I realized that the Pyramidal LK optical flow performance is not even close to VisionWorks. I checked the logs and confirmed that VPI optical flow is not as performant, but I couldn't find anyone reporting the same problem. Is my judgement skewed, or is there a performance problem in this new product from Nvidia?

How to run Flow training on 2 GPU(s)?

Repost from Wouter J.
How can I run the flow training on GPU(s)?
I'm currently trying to get the ppo_runner benchmark to work with my 2 GPUs. However, I can't find any way to make use of these resources.
What would I need to change in the code to make use of the GPUs?
Repost from the Flow team
Perhaps a good place to start is to learn how to use multiple GPUs through Ray:
https://ray.readthedocs.io/en/latest/using-ray-with-gpus.html
Learning how to use multiple GPUs with Ray will enable you to train with multiple GPUs too, because we use Ray RLlib for training (a minimal configuration sketch follows below).
Although a fair warning: unless your problem is well suited for GPUs (large image input, large neural nets), more often than not training on a multi-threaded CPU (e.g. a 64-thread Xeon) is much faster than training on multiple GPUs.
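For reference, here is a minimal sketch of where GPUs are declared in an RLlib run, assuming a Ray version where `tune.run` is available. The environment and hyperparameters are placeholders, not the Flow benchmark's actual configuration.

```python
# Minimal Ray RLlib sketch showing where GPUs are declared.
# The environment and hyperparameters are placeholders; plug in the
# Flow/ppo_runner configuration you already use.
import ray
from ray import tune

ray.init(num_gpus=2)                      # make both GPUs visible to Ray

tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",             # placeholder environment
        "num_gpus": 1,                    # GPUs used by the trainer/learner
        "num_workers": 8,                 # CPU rollout workers
        "num_gpus_per_worker": 0.125,     # optional: share the second GPU across workers
    },
    stop={"training_iteration": 10},
)
```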

Improve performance of dense optical flow analysis (easily)?

I wrote a program that uses OpenCV's cvCalcOpticalFlowLK. It performs fine on low-resolution webcam input, but I need to run it on a full-HD stream, with significant additional computation following the optical flow analysis for each frame. Processing a 5-minute video scaled down to 1440x810 took 4 hours :( Most of the time is spent in cvCalcOpticalFlowLK.
I've researched improving the speed by adding more raw CPU, but even if I get an 8-core beast and the speedup is the theoretical ideal (say 8x, since I'm basically only using one of my 2.9 GHz cores), I'd only be getting 4 FPS. I'd like to reach 30 FPS.
More research seems to point to implementing it on the GPU with CUDA, OpenCL, or GLSL(?). I've found some proof-of-concept implementations (e.g. http://nghiaho.com/?page_id=189) and many papers saying, essentially, "it's a great application for the GPU, we did it, it was awesome, and no, we won't share our code". Needless to say, I haven't gotten any of them to run.
Does anyone know of a GPU-based implementation that would run on a Mac with an NVIDIA card? Are there resources that might help me approach writing my own? Are there other dense OF algorithms that might perform better?
Thanks!
What about OpenVidia Bayesian Optical Flow? Also, the paper Real-Time Dense and Accurate Parallel Optical Flow using CUDA says that their work is freely available in the CUDA Zone. I couldn't find it there immediately, but maybe you will, or you could write to the authors?
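Another option worth trying, if you can use an OpenCV build with the CUDA modules enabled, is OpenCV's GPU Farneback implementation. It is a different dense method than LK, so accuracy will differ; this is only a sketch of the call pattern.

```python
# Sketch: dense optical flow on the GPU via OpenCV's CUDA Farneback module.
# Requires an OpenCV build with CUDA support; parameters are typical defaults.
import cv2

flow_engine = cv2.cuda_FarnebackOpticalFlow.create(
    numLevels=5, pyrScale=0.5, winSize=13, numIters=10)

def dense_flow_gpu(prev_gray, next_gray):
    """prev_gray, next_gray: 8-bit single-channel frames of equal size."""
    gpu_prev, gpu_next = cv2.cuda_GpuMat(), cv2.cuda_GpuMat()
    gpu_prev.upload(prev_gray)
    gpu_next.upload(next_gray)
    gpu_flow = flow_engine.calc(gpu_prev, gpu_next, None)
    return gpu_flow.download()            # HxWx2 float32 (dx, dy) per pixel
```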

Fastest deskew algorithm?

I am a little overwhelmed by the task at hand. We have a toolkit that we use for TWAIN scanning. Some of our customers are complaining about slower scan speeds when the deskew option is set. This is because, if their scanner does not support hardware deskew, it is done in post-processing on the CPU. I was wondering if anyone knows of a good (i.e. fast) algorithm to achieve this. It is hard for me to say which algorithm we are using now. What algorithms are out there for this, and how do they rank as far as speed/accuracy? If I knew the names of the algorithms, it would be easier for me to do a Google search on them.
Thank You.
-Tom
Are you scanning in color or B/W?
Deskew is processor intensive. A Group4 TIFF or JPEG must be decompressed, the skew angle determined, the image deskewed and then recompressed.
There are many image processing libraries out there with deskew, and I have evaluated many over the years. There are some huge differences in processing speed between the different libraries, and a lot of it comes down to how well the code is written rather than the algorithm used. There is a huge difference between commercial libraries even in just reading and writing images.
The fastest commercial deskew I have used by far comes from Unisoft Imaging (www.unisoftimaging.com). I assume much of it is written in assembler. Unisoft has been around for many years and is very fast and efficient. It supports many different deskew options, including black border removal and both color and B/W deskew. The Group4 routines are very solid and very fast. The library comes with many other image processing options as well as TWAIN and native SCSI scanner support. It also supports Unix.
If you want a free deskew, you might want to have a look at Leptonica. It does not come with much documentation, but it is very stable and well written. http://www.leptonica.com/
Developing code from scratch could be quite time consuming, and the result may well be buggy and error-prone.
The other option is to process the documents in a separate process so that scanning can run at the speed of the scanner. At the moment you are probably processing everything serially, one task after another, hence the slowdown.
Consider doing it as a post-processing step, because deskew cannot be done in real time (unless it is hardware accelerated).
Deskew consists of two steps: skew detection and rotation (a rough sketch of both steps follows this answer). Detecting the skew angle can usually be done faster on a B&W (1-bit) image. Rotation speed depends on the quality of the interpolation. A good-quality deskew takes a lot of time to run, much more than scanning the pages.
A good high-speed scanner can do 120 double-sided pages per minute if it has hardware JPEG or TIFF Group 4 compression and your TWAIN library takes advantage of it (hint: do not use native mode). You barely have enough time to save the file to the hard drive at that speed, let alone decompress, detect the skew, rotate, and re-compress. Quality deskew takes several seconds per page, unless you can use the video card's hardware acceleration to rotate and compress.
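To make the two steps concrete, here is a rough Python/OpenCV sketch of a simple projection-profile skew estimator followed by a rotation. It is only an illustration of the idea, not a production-quality deskew; the angle range, step size, and scoring are arbitrary choices.

```python
# Rough deskew sketch: projection-profile skew detection + rotation.
# Illustrative only; angle range, step, and scoring are arbitrary choices.
import cv2
import numpy as np

def estimate_skew(gray, angle_range=5.0, step=0.25):
    """Return the skew angle (degrees) that maximizes row-projection variance."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    small = cv2.resize(binary, None, fx=0.25, fy=0.25)   # detect on a reduced image
    h, w = small.shape
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-angle_range, angle_range + step, step):
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        rotated = cv2.warpAffine(small, m, (w, h))
        profile = rotated.sum(axis=1)                     # ink per row
        score = np.var(profile)                           # sharp peaks => aligned text lines
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def deskew(image):
    """image: BGR color scan; returns the rotated (deskewed) image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    angle = estimate_skew(gray)
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, m, (w, h),
                          flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
```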
Do I understand correctly that you already have such an algorithm implemented? If so, are you sure there is no room for optimization? I'd start by profiling the existing solution.
In any case, I suggest you look for a fast digital Radon transform algorithm.
Take a look at http://pagetools.sourceforge.net. They have a deskew algorithm implementation.

High Quality Image Magnification on GPU

I'm looking for interesting algorithms for image magnification that can be implemented on a GPU for real-time scaling of video. Linear and bicubic interpolation algorithms are not good enough.
Suggestions?
Here are some papers I've found; I'm unsure about their suitability for GPU implementation.
Adaptive Interpolation
Level Set
I've seen some demos of the Cell processor used in TVs for scaling which had some impressive results; no link, unfortunately.
Lanczos3 is a very nice interpolation algorithm (you can test it in GIMP or VirtualDub). It generally performs better than cubic interpolation and can be parallelized (a quick CPU comparison sketch follows below).
A GPU-based version is implemented in Chromium:
http://code.google.com/p/chromium/issues/detail?id=47447
Check out the Chromium source code.
It may still be too slow for real-time video processing, but it may be worth trying if you don't use too high a resolution.
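As a quick way to compare Lanczos against linear and bicubic before committing to a GPU port, OpenCV ships a CPU Lanczos resampler. Note that INTER_LANCZOS4 uses an 8x8 neighborhood rather than lanczos3, but it gives a feel for the quality difference. The input file name is a placeholder.

```python
# Quick CPU comparison of interpolation kernels using OpenCV.
# INTER_LANCZOS4 is OpenCV's Lanczos variant (8x8 neighborhood), not lanczos3.
import cv2

frame = cv2.imread("frame.png")            # placeholder input image
scale = 2

upscaled = {
    "linear":  cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR),
    "cubic":   cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC),
    "lanczos": cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_LANCZOS4),
}
for name, img in upscaled.items():
    cv2.imwrite(f"upscaled_{name}.png", img)
```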
You may also want to try out CUVI Lib, which offers a good set of GPU-accelerated image processing algorithms. Find out more at: http://www.cuvilib.com
Disclosure: I am part of the team that developed CUVI.
Still slightly 'work in progress', but gpuCV is a drop-in replacement for the OpenCV image processing functions, implemented in OpenCL on a GPU.
Prefiltered cubic B-spline interpolation delivers good results (you can have a look here for some theoretical background, and at the CPU reference sketch below).
CUDA source code can be downloaded here.
WebGL examples can be found here.
Edit: The cubic interpolation code is now available on GitHub: CUDA version and WebGL version.
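For a CPU reference of what prefiltered cubic B-spline interpolation produces (useful when validating a GPU port), SciPy's ndimage routines apply the B-spline prefilter internally. A small sketch, assuming a single-channel image:

```python
# CPU reference for prefiltered cubic B-spline magnification using SciPy.
# ndimage.zoom applies the cubic B-spline prefilter when order=3 and
# prefilter=True (the default).
import numpy as np
from scipy import ndimage

def bspline_upscale(image, factor):
    """image: 2-D float array (single channel); factor: magnification."""
    return ndimage.zoom(image, zoom=factor, order=3, prefilter=True)

# Example: upscale a synthetic 64x64 gradient by 4x.
test = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
big = bspline_upscale(test, 4)
print(big.shape)    # (256, 256)
```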
You may want to have a look at super-resolution algorithms. A starting point on CiteSeerX.
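If you explore the super-resolution route, OpenCV's contrib dnn_superres module gives a quick way to try pretrained single-image super-resolution models. A sketch, assuming you have separately downloaded a model such as an EDSR .pb file; the file paths are placeholders.

```python
# Sketch: single-image super-resolution via OpenCV's dnn_superres module
# (opencv-contrib-python). The model file must be downloaded separately.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")        # placeholder path to a pretrained model
sr.setModel("edsr", 4)            # model name and upscale factor

image = cv2.imread("frame.png")   # placeholder input
result = sr.upsample(image)
cv2.imwrite("frame_x4.png", result)
```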