Nvidia VPI vs Nvidia VisionWorks performance difference for optical flow

I have a computer vision application that works with Nvidia VisionWorks. Now that Nvidia has replaced VisionWorks with VPI (Vision Programming Interface), I am trying to rewrite the application using VPI. At some point I realized that the Pyramidal LK optical flow performance is not even close to VisionWorks. I checked the logs and confirmed that the VPI optical flow is noticeably slower, but I couldn't find anyone reporting the same problem. Is my judgement skewed, or is there a genuine performance problem with Nvidia's new product?

Related

Dense optical flow in real time on hardware for 4K resolution at 30 fps

I am working on a hardware-based solution (without a GPU) for dense optical flow to get real-time performance at 30 fps with decent accuracy, something comparable to or better than NVIDIA's Optical Flow SDK. Can someone please suggest good algorithms other than pyramidal Lucas-Kanade and Horn-Schunck? I found SGM to be a good starting point, but it is difficult to implement on an FPGA or DSP core. The target is to measure large displacements, including occlusions, similar to real-world videos.
It would also be great if someone could tell me exactly which algorithm NVIDIA uses.
For dense optical flow estimation in a real-time setup, FlowNet is a good option; it can estimate optical flow at a high FPS. You can take their trained model and perform inference with it. Since you want to run the estimation in a non-GPU environment, you can try converting the model to ONNX format. A good implementation of FlowNet is available in NVIDIA's GitHub repo. I am not sure exactly which algorithm NVIDIA uses in its Optical Flow SDK.
FlowNet2 builds on the original FlowNet to handle large displacements. If occlusion is a concern, you may also want to check out their follow-up work, FlowNet3. Another alternative to FlowNet is PWC-Net.
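If you do try the ONNX route, here is a minimal sketch of CPU-only inference with OpenCV's DNN module. Everything specific in it is an assumption: the model file name, the input resolution, and the input tensor names are placeholders, and whether a particular FlowNet export loads at all depends on its layers (the correlation layer, for example, may not be supported), so treat this only as the general inference pattern, not as a verified FlowNet recipe.

#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Hypothetical exported model; adjust the path, size and input names to your export.
    cv::dnn::Net net = cv::dnn::readNetFromONNX("flownet.onnx");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);  // plain CPU backend
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    cv::Mat frame0 = cv::imread("frame0.png");
    cv::Mat frame1 = cv::imread("frame1.png");

    // Scale to [0,1] and resize to the (assumed) network input resolution.
    cv::Mat blob0 = cv::dnn::blobFromImage(frame0, 1.0 / 255.0, cv::Size(512, 384));
    cv::Mat blob1 = cv::dnn::blobFromImage(frame1, 1.0 / 255.0, cv::Size(512, 384));

    // The input tensor names depend on how the model was exported -- placeholders here.
    net.setInput(blob0, "frame_first");
    net.setInput(blob1, "frame_second");

    cv::Mat flow = net.forward();  // typically a 1x2xHxW tensor holding (u, v) per pixel
    return 0;
}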

Does OpenGL display image faster than OpenCV?

I am using OpenCV to show an image on a projector, but cv::imshow does not seem to be fast enough, or maybe the data transfer from the CPU to the GPU and then to the projector is slow. Is there a faster way to display images than OpenCV?
I considered OpenGL: since OpenGL uses the GPU directly, issuing the commands there may be faster than going through the CPU path that OpenCV uses. Correct me if I am wrong.
OpenCV already supports OpenGL for image output by itself. No need to write this yourself!
See the documentation:
http://docs.opencv.org/modules/highgui/doc/user_interface.html#imshow
http://docs.opencv.org/modules/highgui/doc/user_interface.html#namedwindow
Create the window first with namedWindow, where you can pass the WINDOW_OPENGL flag.
Then you can even use OpenGL buffers or GPU matrices as input to imshow (the data never leaves the GPU). But it will also use OpenGL to show regular matrix data.
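For illustration, a minimal sketch of that pattern, assuming an OpenCV build configured with WITH_OPENGL=ON and CUDA (in 2.4.x the GPU matrix type lives under cv::gpu rather than cv::cuda, but the idea is the same):

#include <opencv2/core/cuda.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // The window must be created with the OpenGL flag before the first imshow call.
    cv::namedWindow("display", cv::WINDOW_OPENGL);

    cv::Mat cpuImage = cv::imread("image.png");
    cv::cuda::GpuMat gpuImage;
    gpuImage.upload(cpuImage);        // upload once; afterwards the data stays on the GPU

    cv::imshow("display", gpuImage);  // rendered through OpenGL, no download to the host
    cv::waitKey(0);
    return 0;
}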
Please note:
To enable OpenGL support, configure OpenCV using CMake with WITH_OPENGL=ON. Currently OpenGL is supported only with the WIN32, GTK and Qt backends on Windows and Linux (macOS and Android are not supported). For the GTK backend, the gtkglext-1.0 library is required.
Note that this is OpenCV 2.4.8 and this functionality has changed quite recently. I know there was OpenGL support in earlier versions in conjunction with the Qt backend, but I don't remember when it was introduced.
About the performance: It is a quite popular optimization in the CV community to output images using OpenGL, especially when outputting video sequences.
OpenGL is optimised for rendering images, so it's likely faster. It really depends on whether the OpenCV implementation uses any GPU acceleration and whether the bottleneck is on the rendering side of things.
Have you tried GPU-accelerated OpenCV? - http://opencv.org/platforms/cuda.html
How big is the image you are displaying, and how long does cv::imshow currently take to display it?
I know it's an old question, but I happened to have exactly the same problem, and from my observations I've concluded that the root of the problem is the projector's own latency, especially with an older model.
How did I conclude that?
I displayed the same video sequence with cv::imshow() on the laptop monitor and on the projector, then waved my hand. It was obvious that the projector introduces significant latency.
To double-check, I opened a webcam feed, waved my hand in front of it, and observed the difference between the monitor and the projector. The webcam does no processing and no OpenCV operations, so in my understanding the only thing that would explain the latency is the projector itself.
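If you want to put a number on it rather than eyeball it, one crude approach is to stamp each frame with a millisecond counter before displaying it, mirror the window on both outputs, and photograph the monitor and the projector together; the difference between the two visible timestamps approximates the projector's extra latency. A minimal sketch (plain OpenCV, nothing projector-specific):

#include <chrono>
#include <string>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>

int main()
{
    cv::VideoCapture cam(0);  // any live source works
    cv::Mat frame;
    const auto t0 = std::chrono::steady_clock::now();

    while (cam.read(frame)) {
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        // Draw the elapsed time into the frame before it is shown.
        cv::putText(frame, std::to_string(ms) + " ms", cv::Point(20, 40),
                    cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2);
        cv::imshow("latency test", frame);
        if (cv::waitKey(1) == 27) break;  // Esc quits
    }
    return 0;
}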

Hardware to use for 3D Depth Perception

I am planning to give a prebuilt robot 3D vision by integrating a 3D depth sensor such as a Kinect or an Asus Xtion Pro. These are the only two that I have been able to find so far, though I imagine many more are being built or already exist.
Does anyone have any recommendations for hardware I could use, or which of these two is better for an open-source project with integration into ROS (Robot Operating System)?
I would vote for the Kinect for Windows over the Asus Xtion Pro based on the hardware (the Kinect has better range), but depending on your project there's a chance neither will work out well for you. I'm not familiar with the Robot Operating System, but the official Kinect SDK only runs on Windows 7, sort of on Windows 8, and supposedly on Windows Server 2008. The Asus Xtion Pro seems to have SDKs available for Linux distros, so if your robot is running something similar it might work.
Depending on exactly what you need to be able to do, you might want to go with a much simpler depth sensor. For example, buy a handful of these and you'll still spend a lot less than you would for a Kinect. They might also be easier to integrate with your robot; hook them up to a microcontroller, plug the microcontroller into your robot through USB, and life should be easy. Or just plug them straight into your robot. I have no idea how such things work.
edit: I've spent too much time working with the Kinect SDK; I forgot there are third-party SDKs available that might be able to run on whatever operating system you've got going. Still, it really depends. The Kinect has a better minimum depth, which seems important to me, but a worse FOV than the Xtion. If you just need the basics (is there a wall in front of me?), definitely go with the mini IR sensors, which are available all over the web and probably at stores near you.
Kinect + Linux + ROS + PCL (http://pointclouds.org/) is a very powerful (and relatively cheap) combination. I am not sure what you are trying to do with the system, but there are enough libraries available in this combination to do a lot. Your hardware will be constrained by what you can put Linux on and what will be fast enough to run some point cloud processing. Although there are ports of Linux and ROS for embedded devices like the Gumstix, I would go for something closer to a standard PC, like a mini-ITX board. You will have fewer headaches in the long run.
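To give a taste of the PCL side, here is a minimal grabber sketch following the pattern from PCL's own openni_grabber tutorial (it assumes a PCL build with OpenNI support; recent PCL releases take an std::function instead of boost::function, so adjust accordingly):

#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <boost/function.hpp>
#include <chrono>
#include <iostream>
#include <thread>

// Called for every cloud the Kinect/Xtion delivers (roughly 30 Hz).
void cloudCallback(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
{
    std::cout << "got a cloud with " << cloud->size() << " points\n";
    // ...filtering, plane detection, obstacle checks, etc. would go here
}

int main()
{
    pcl::OpenNIGrabber grabber;
    boost::function<void(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> f = cloudCallback;
    grabber.registerCallback(f);

    grabber.start();
    std::this_thread::sleep_for(std::chrono::seconds(30));  // stream for a while
    grabber.stop();
    return 0;
}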

Improve performance of dense optical flow analysis (easily)?

I wrote a program that uses OpenCV's cvCalcOpticalFlowLK. It performs fine on low-resolution webcam input, but I need to run it on a full HD stream with significant additional computation following the optical flow analysis for each frame. Processing a 5-minute video scaled down to 1440x810 took 4 hours :( Most of the time is spent in cvCalcOpticalFlowLK.
I've researched improving the speed by adding more raw CPU power, but even if I get an 8-core beast and the speedup is the theoretical ideal (say 8x, since I'm basically only using one of my 2.9GHz cores), I'd only be getting about 4 FPS. I'd like to reach 30 FPS.
More research seems to point to implementing it on the GPU with CUDA, OpenCL, or GLSL(?). I've found some proof-of-concept implementations (e.g. http://nghiaho.com/?page_id=189), and many papers that basically say "it's a great application for the GPU, we did it, it was awesome, and no, we won't share our code". Needless to say, I haven't gotten any of them to run.
Does anyone know of a GPU-based implementation that would run on a Mac with an NVIDIA card? Are there resources that might help me approach writing my own? Are there other dense OF algorithms that might perform better?
Thanks!
What about OpenVidia Bayesian Optical Flow? Also, the paper Real-Time Dense and Accurate Parallel Optical Flow using CUDA says that their work is freely available in the CUDA Zone. I couldn't find it there immediately, but maybe you will, or you can write to the authors?
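For what it's worth, if you can get an OpenCV build with the CUDA modules enabled on your machine, a rough sketch of running dense flow entirely on the GPU looks like the following. It uses Farneback rather than Lucas-Kanade, so the accuracy characteristics differ, and in older 2.x builds the same classes live under cv::gpu instead of cv::cuda, but it is a quick way to test whether the GPU gets you anywhere near 30 FPS:

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaoptflow.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>

int main()
{
    cv::VideoCapture cap("input.mp4");
    cv::Mat frame, gray, prevGray;
    cv::cuda::GpuMat dPrev, dNext, dFlow;

    // Dense Farneback optical flow running on the GPU.
    cv::Ptr<cv::cuda::FarnebackOpticalFlow> flow = cv::cuda::FarnebackOpticalFlow::create();

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        if (!prevGray.empty()) {
            dPrev.upload(prevGray);
            dNext.upload(gray);
            flow->calc(dPrev, dNext, dFlow);  // dFlow: CV_32FC2 per-pixel (u, v)

            cv::Mat flowCpu;
            dFlow.download(flowCpu);          // only download if the CPU actually needs it
            // ...per-frame analysis on flowCpu goes here
        }
        gray.copyTo(prevGray);
    }
    return 0;
}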

Graphics Profiling

I've got an application which drops to around 10 fps. I profiled it with xperf, which showed my app was using just 20% of the CPU, with none of my methods using a larger-than-expected share of that 20%.
This seems to indicate that the vast drop in fps is because the graphics card isn't able to keep up with rendering the frames, so my program stalls while it catches up.
Is there some way to profile what the graphics card is up to, and to work out what my program is telling it to do that slows it down, so that I can try to improve the frame rate?
For debugging / profiling graphics, try Nvidia PerfHUD
NVIDIA PerfHUD is a powerful real-time performance analysis tool for Direct3D applications.
There is also an ATI solution, called 'GPU PerfStudio'
GPU PerfStudio is a real-time performance analysis tool which has been designed to help tune the graphics performance of your DirectX 9, DirectX 10, and OpenGL applications. GPU PerfStudio displays real-time API, driver and hardware data which can be visualized using extremely flexible plotting and bar chart mechanisms. The application being profiled may be executed locally or remotely over the network. GPU PerfStudio allows the developer to override key rendering states in real-time for rapid bottleneck detection. An auto-analysis window can be used for identifying performance issues at various stages of the graphics pipeline. No special drivers or code modifications are needed to use GPU PerfStudio.
You can find more information and download links here:
http://developer.nvidia.com/object/nvperfhud_home.html
http://developer.amd.com/tools-and-sdks/graphics-development/gpu-perfstudio/
Also, check out this article on FPS:
FPS vs Frame Time
Basically it makes the point that a drop from 200 fps to 190 fps is negligible, whereas a drop from 30 fps to 20 fps is a MUCH bigger deal. For more meaningful performance measurement, you should be tracking frame time rather than FPS.
You never told us what your fps is or what the program is doing at all, so your "vast drop" might not be a big deal at all.
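If it helps, measuring frame time is just a matter of timing the loop body; a minimal sketch with std::chrono (renderFrame() below is a stand-in for whatever your update-and-draw call is):

#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in for your own update + draw call; here it just simulates ~16 ms of work.
void renderFrame()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(16));
}

int main()
{
    using clock = std::chrono::steady_clock;
    for (int i = 0; i < 100; ++i) {
        auto start = clock::now();
        renderFrame();
        auto end = clock::now();

        // Frame time in milliseconds is the number worth watching, not FPS.
        double ms = std::chrono::duration<double, std::milli>(end - start).count();
        std::printf("frame time: %.2f ms (%.1f fps)\n", ms, 1000.0 / ms);
    }
    return 0;
}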
For DirectX, there is PIX for profiling the CPU and GPU operations. It can give very detailed info, and might be worth looking into.
Hope that helps!
You can try using dxprof (search for it on Google). It's a lightweight app that draws real-time bars, each bar corresponding to one DirectX event (such as a draw call or a resource copy). You can freeze the bars and inspect the call stack to find out where a draw call originates.
Are you developing for Windows? If so, avoid using Video for Windows, as it will limit you in the manner you describe. Use DirectX instead.
No need to guess. Just pause it a few times under the IDE, and it will show you exactly what it's waiting for.