FFMpeg vs. OpenCV for format conversion/simple transformation - c++

I had to create a system that can process images in real time. I have implemented in C++ a pixel format conversion system that can also do some simple transformations (currently rotation and mirroring).
The input and output of the system are frames in the following formats:
RGB (24, 32)
YUYV420, YUYV 422
JPEG
Raw greyscale
For instance, one operation can be:
YUYV422 -> rotation 90 -> flip Horiz -> RGB24
Greyscale -> rotation 270 -> flip Vert -> YUYV420
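To make the pixel-format step of such a chain concrete, here is a minimal pure-Python sketch of a packed YUYV 4:2:2 to RGB24 conversion. It is only an illustration, not the asker's C++ code: the function names are hypothetical, and it assumes full-range BT.601 (JFIF) coefficients, which may not match a given camera or decoder.

```python
def clamp8(v):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, int(round(v))))


def yuyv422_to_rgb24(data):
    """Convert packed YUYV (Y0 U Y1 V) bytes to packed RGB24 bytes.

    Assumption: full-range BT.601 (JFIF) coefficients. A real pipeline
    must use the range and matrix of its actual source.
    """
    if len(data) % 4:
        raise ValueError("YUYV data length must be a multiple of 4")
    out = bytearray()
    for k in range(0, len(data), 4):
        y0, u, y1, v = data[k:k + 4]
        for y in (y0, y1):  # two neighbouring pixels share one U/V pair
            out.append(clamp8(y + 1.402 * (v - 128)))
            out.append(clamp8(y - 0.344136 * (u - 128) - 0.714136 * (v - 128)))
            out.append(clamp8(y + 1.772 * (u - 128)))
    return bytes(out)
```

Each 4-byte group yields two RGB pixels, which is why the per-pixel cost of this conversion is dominated by the three multiply-add expressions that SIMD implementations vectorize.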
The goal of the system is to offer the best performance for rotation/mirroring and pixel format conversion. My current implementation relies on OpenCV, but I suffer from performance issues when processing data above 2k resolution.
The current implementation uses cv::Mat and cv::transpose/cv::flip/cv::cvtColor, and I optimized the system to remove transitional buffers and copies as much as possible.
I am not very happy to reinvent the wheel, and I know that with swscale and some filters from FFMpeg it is possible to achieve the same result. My questions are:
The FFMpeg system is rather generic, do you think I might suffer from footprint/performance caveat with this solution?
Format conversion seems somewhat optimized in OpenCV, but I have no idea about the FFMpeg implementation... (note: I'm on an x86_64 Intel platform with SSE)
Do you know any library that can handle this kind of simple transformation in real time?
Thank you

OpenCV implementation is optimised for your configuration. Don't expect improvements from ffmpeg. Recently, OpenCV switched to libjpeg-turbo with SSE optimizations, this may improve JPEG conversions.
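For reference, the transpose-plus-flip approach mentioned in the question rests on a simple identity: a 90 degree clockwise rotation is a transpose followed by a horizontal flip, and a 270 degree clockwise rotation is a transpose followed by a vertical flip. The sketch below checks this on nested lists; the helper names are hypothetical stand-ins for cv::transpose and cv::flip, not OpenCV calls.

```python
def transpose(img):
    """Swap rows and columns of a list-of-rows image."""
    return [list(col) for col in zip(*img)]


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows."""
    return [list(row) for row in img[::-1]]


def rotate90_cw(img):
    """Rotate 90 degrees clockwise: transpose, then mirror horizontally."""
    return flip_horizontal(transpose(img))


def rotate270_cw(img):
    """Rotate 270 degrees clockwise: transpose, then mirror vertically."""
    return flip_vertical(transpose(img))


if __name__ == "__main__":
    for row in rotate90_cw([[1, 2, 3], [4, 5, 6]]):
        print(row)
```

A rotation followed by a further flip (as in the question's examples) can often be folded into a single transpose plus at most one flip, which saves one full pass over the image.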

Related

Decoding to specific pixel format in ffmpeg with C++

I need to decode video but my video player only supports RGB8 pixel format. So I'm looking into how to do pixel format conversion in the GPU, preferably in the decoding process, but if not possible, after it.
I've found How to set decode pixel format in libavcodec? which explains how to decode video with ffmpeg to a specific pixel format, as long as it's supported by the codec.
Basically, get_format() is a function which chooses, from a list of supported pixel formats from the codec, a pixel format for the decoded video. My questions are:
Is this list of supported codec output formats the same on all computers? For example, if my codec is for H264, will it always give me the same list on all computers (assuming the same ffmpeg version on all computers)?
If I choose any of these supported pixel formats, will the pixel format conversion always happen in the GPU?
If some of the pixel format conversions won't happen in the GPU, then my question is: does the sws_scale() function convert in the GPU or the CPU?
It depends. First, H264 is just a codec standard. libx264 and openh264 both implement this standard, and each implementation may support different formats. But let's assume (as you did in your question) that you are using the same implementation on different machines: even then, there might still be cases where different machines support different formats. Take H264_AMF for example. You will need an AMD graphics card to use the codec, and the supported formats will depend on your graphics card as well.
Decoding will generally happen on your CPU unless you explicitly specify a hardware decoder. See this example for Hardware decoding: https://github.com/FFmpeg/FFmpeg/blob/release/4.1/doc/examples/hw_decode.c
When using hardware decoding you rely heavily on your machine, and each hardware decoder will output its own (proprietary) frame format, e.g. NV12 for an Nvidia graphics card. Now comes the tricky part. The decoded frames will remain in your GPU memory, which means you might be able to reuse the AVFrame buffer to do the pixel conversion using OpenCL/GL. But achieving GPU zero-copy when working with different frameworks is not that easy, and I don't have enough knowledge to help you there. So what I would do is download the decoded frame from the GPU via av_hwframe_transfer_data, like in the example.
From this point on it doesn't make much of a difference if you used hardware or software decoding.
To my knowledge, sws_scale doesn't use hardware acceleration, since it doesn't accept "hwframes". If you want to do color conversion at the hardware level, you might want to take a look at OpenCV: you can use GpuMat there, upload your frame, call cvtColor and download it again.
Some general remarks:
Almost any image operation (scaling etc.) is faster on your GPU, but uploading and downloading the data can take ages. For single operations, it's often not worth using your GPU.
In your case, I would try to work with CPU decoding and CPU color conversion first. Just make sure to use well threaded and vectorized algorithms like OpenCV or Intel IPP. If you still lack performance then you can think about Hardware Acceleration.

Using OpenGL to perform video compositing with YUV color format - performance

I have written a C/C++ implementation of what I term a "compositor" (I come from a video background) to composite/overlay video/graphics on the top of a video source. My current compositor implementation is rather naive and there is room for CPU optimization improvements (ex: SIMD, threading, etc).
I've created a high-level diagram of what I am currently doing:
The diagram is self explanatory. Nonetheless, I'll elaborate on some of the constraints:
The main video always comes served in an 8-bit YUV 4:2:2 packed format
The secondary video (optional) will come served in either an 8-bit YUV 4:2:2 or YUVA 4:2:2:4 packed format.
The output from the overlay must come out in an 8-bit YUV 4:2:2 packed format
Some other bits of information:
The number of graphics inputs will vary; it may (or may not) be a constant value.
The colour format of the Graphics can be pinned to either ARGB or YUVA format (ie. I can provide it as you see fit). At the moment, I pin it to YUVA to keep a consistent colour format.
The potential of using OpenGL and accompanying shaders is rather appealing:
No need to reinvent the wheel (in terms of actually performing the composition)
The possibility of using GPU where available.
My concern with using OpenGL is performance. Looking around on the web, it is my understanding that a YUV surface would be converted to RGB internally; I would like to minimize the number of colour format conversions and ensure optimal performance. Without prior OpenGL experience, I hope someone can shed some light and suggest if I'm about to venture down the wrong path.
Perhaps my concern relating to performance is less of an issue when using a dedicated GPU? Do I need to consider separate code paths:
Hardware with GPU(s)
Hardware with only CPU(s)?
Additionally, am I going to struggle when I need to process 10-bit YUV?
You should be able to treat YUV as independent channels throughout. OpenGL shaders will be calling them r, g, and b, but it's just data that can be treated as whatever you want.
Most GPUs will support 10 bits per channel (+ 2 alpha bits). Various will support 16 bits per channel for all 4 channels but I'm a little rusty here so I have no idea how common support is for this. Not sure about the 4:2:2 data, but you can always treat it as 3 separate surfaces.
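The point about treating Y, U and V as plain channels can be illustrated on the CPU as well: because an "over" blend uses weights that sum to one, blending each YUV channel separately gives the same result as converting to RGB, blending, and converting back (up to rounding). A minimal sketch, assuming non-premultiplied 8-bit alpha and one full YUVA sample per pixel (the chroma sharing of 4:2:2 is ignored here); the function names are hypothetical.

```python
def blend_channel(fg, bg, alpha):
    """Rounded integer 'over' blend of one 8-bit channel, alpha in 0..255."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255


def composite_yuva_over_yuv(fg_yuva, bg_yuv):
    """Blend one YUVA foreground sample over one YUV background sample.

    Assumption: straight (non-premultiplied) alpha, 4:4:4 sampling.
    """
    y, u, v, a = fg_yuva
    by, bu, bv = bg_yuv
    return (
        blend_channel(y, by, a),
        blend_channel(u, bu, a),
        blend_channel(v, bv, a),
    )
```

In a fragment shader the same expression would typically be written as a mix of the two samples weighted by alpha, with the channels merely named r, g and b.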
The number of graphics inputs will vary; it may (or may not) be a constant value.
This is something I'm a little less sure about. Shaders like this to be predictable. If your implementation allows you to add each input iteratively then you should be fine.
As an alternative suggestion, have you looked into OpenCL?

OpenCV -- how to optimize color tracking program?

I want to optimize my program, in which I am using color object tracking algorithm described here. The only difference is that I am using cvBlob library, instead of cv::moments (cvBlob was faster and more accurate). Using profiler (valgrind + kcachegrind) I have found that ~29% of time is taken by colorspace conversion method (cv::cvtColor; I am tracking objects in three colors). I am converting from BGR to HSV.
I've read in some papers that the YCbCr colorspace is even better for color tracking. Is it a good idea to convert from BGR to YCbCr? It should be slightly faster, as it requires fewer multiplications (I am not sure about that; I do not know how OpenCV does it internally). Does this algorithm need some changes, or can I just convert the lower and upper boundaries for the tracked color from HSV to YCbCr, and then use the inRangeS method, as I did with HSV?
Is there any way to get the frame from the driver in YCbCr (or YUV)? I am not asking about HSV, because this is not supported by v4l2, AFAIR.
Do you have any other ideas? I don't want to use IPP or GPU.
Check out the OpenCV documentation for cvtColor. It talks about conversion between BGR2YCbCr using cvtColor.
(Please try that and also comment here about result, ie how much percentage of total time it takes in YCbCr mode. Because it will help lots of people in future.)
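As a rough illustration of what the YCbCr route involves, the sketch below applies the 8-bit BGR to YCrCb formulas as I understand them from the OpenCV cvtColor documentation (Y = 0.299 R + 0.587 G + 0.114 B, Cr = (R - Y) * 0.713 + 128, Cb = (B - Y) * 0.564 + 128) and then performs an inRange-style test. Note that OpenCV orders the channels Y, Cr, Cb. The function names are hypothetical, and the thresholds in the usage are made-up values: a box in HSV does not generally map to a box in YCrCb, so bounds are best re-derived from sample pixels rather than converted corner by corner.

```python
def bgr_to_ycrcb(b, g, r):
    """Convert one 8-bit BGR pixel to (Y, Cr, Cb), rounded and clamped."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return tuple(max(0, min(255, round(c))) for c in (y, cr, cb))


def in_range(pixel, lower, upper):
    """True if every channel lies within its [lower, upper] bound."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))


if __name__ == "__main__":
    red = bgr_to_ycrcb(0, 0, 255)
    # Hypothetical thresholds for a "strongly red" region.
    print(red, in_range(red, (0, 200, 0), (255, 255, 120)))
```

The conversion needs only a handful of multiplications per pixel and no branches, in contrast to HSV, whose hue computation depends on which channel is the maximum.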

Low memory image resizing

I am looking for some advice on how to construct a very low memory image resizing program that will be run as a child process of my nodejs application in linux.
The solution I am looking for is a linux executable that will take a base64 string image (uploaded from a client) using stdin, resizing the photo to a specified size and then pumping the resulting image data back through stdout.
I've looked into image magick and it might be what I end up using, but I figured I would ask and see if anyone had a suggestion.
Suggestions of libraries or examples of pre compiled executables in C/C++ would be greatly appreciated. Also a helpful answer would include general strategies for low memory image resizing.
Thank you
Depending on the image formats you want to support, it's almost surely possible to perform incremental decoding and scaling by decoding only a few lines at a time and discarding the data once you write the output. However it may require writing your own code or adapting an existing decoder library to support this kind of operation.
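The incremental idea can be sketched independently of any decoder: if rows arrive one at a time, an integer-factor box downscale only ever needs one row of accumulators. The generator below is a simplified illustration on greyscale rows (lists of ints); the name downscale_rows is hypothetical, and trailing rows or columns that do not fill a whole block are simply dropped.

```python
def downscale_rows(rows, factor):
    """Yield box-filtered output rows from an iterator of input rows.

    Only one accumulator row is held in memory at any time.
    """
    acc = None
    pending = 0
    for row in rows:
        out_w = len(row) // factor
        if acc is None:
            acc = [0] * out_w
        for i in range(out_w):
            # Sum `factor` horizontally adjacent samples into one bucket.
            acc[i] += sum(row[i * factor:(i + 1) * factor])
        pending += 1
        if pending == factor:
            # A full block of rows has been seen: emit the averages.
            yield [s // (factor * factor) for s in acc]
            acc = None
            pending = 0


if __name__ == "__main__":
    source = iter([[0, 2, 4, 6], [2, 4, 6, 8]])
    print(list(downscale_rows(source, 2)))
```

With a decoder that can deliver scanlines progressively, peak memory is then proportional to the image width rather than to its area.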
It's also worth noting that downsizing giant jpegs can be performed efficiently by simply skipping the high-frequency coefficients and using a smaller IDCT. For example, to decode at half width and half height, discard all but the upper-left quadrant of the coefficients (horizontal and vertical frequency < 4) and use a 4x4 IDCT on them instead of the usual 8x8. Both the libjpeg decoder and the libavcodec decoder support this operation for power-of-2 scalings (1/2, 1/4, or 1/8). This type of approach might make incremental decoding/scaling unnecessary.
You can try it out with djpeg -scale 1/4 < src.jpg | cjpeg > dest.jpg. If you want a fixed output size, you'll probably first scale by whichever of 1/2, 1/4, or 1/8 puts you closest to the desired size without going too low, then perform interpolation to go the final step, e.g. djpeg -scale 1/4 < src.jpg | convert pnm:- -scale 640x480 dest.jpg.
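Choosing the decode-time scale described above can be expressed as a small helper: take the largest power-of-two denominator whose output is still at least as large as the target in both dimensions. This is a sketch with a hypothetical function name; it assumes the decoder rounds the scaled dimensions up, which is my understanding of libjpeg's behaviour.

```python
def pick_jpeg_scale(src_w, src_h, dst_w, dst_h):
    """Return d so that decoding at 1/d stays >= the target size.

    Assumption: scaled dimensions are computed with ceiling division.
    """
    for d in (8, 4, 2):
        scaled_w = -(-src_w // d)  # ceiling division
        scaled_h = -(-src_h // d)
        if scaled_w >= dst_w and scaled_h >= dst_h:
            return d
    return 1


if __name__ == "__main__":
    d = pick_jpeg_scale(4000, 3000, 640, 480)
    print("djpeg -scale 1/%d" % d)
```

For a 4000x3000 source and a 640x480 target this selects 1/4, leaving only a modest final interpolation step.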
When working on very large images, such as 0.25 GPix and larger, ImageMagick uses ~2 GB ram, even when using djpeg to decode the JPEG image first.
This command chain will resize JPEG images of about any size using only ~3 MB ram:
djpeg my-large.jpg | pnmscale -xysize 16000 16000 | cjpeg > scaled-large.jpg
GraphicsMagick is generally a better version of ImageMagick, I'd take a look at that. If you really need something fast, you probably want to drop to something like libjpeg - while you say you want something that's non-blocking IO, the operation you want to do is relatively CPU-bound (i.e decoding the image, then trying to resize it).
If anything, this is just a sample following what he described:
import base64
import io
import sys
from PIL import Image
# First stdin line: "<width> <height>"; remaining input: the base64-encoded image.
x, y = sys.stdin.readline().strip().split(' ')
x, y = int(x), int(y)
# Decode the base64 text into raw image bytes before handing it to PIL.
data = base64.b64decode(sys.stdin.read())
img = Image.open(io.BytesIO(data)).resize((x, y))
# PNG output is binary, so write to the underlying byte stream of stdout.
img.save(sys.stdout.buffer, format="png")
As that has to read the input, decode it, resize, encode, and write it out, there is no way to reduce the memory used to less than the size of the input image.
Nothing can beat Intel Integrated Performance Primitives in terms of performance. If you can afford it I strongly recommend to use it.
Otherwise just implement your own resizing routine. Lanczos gives quite good results albeit it won't be tremendously fast.
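For anyone going the roll-your-own route, here is a minimal one-dimensional Lanczos resampler in pure Python. It is a sketch under stated assumptions, not a tuned implementation: the function names are hypothetical, edges are handled by clamping, and the kernel is stretched when downscaling so that it also acts as a low-pass filter. A 2-D resize applies the same pass to rows and then to columns.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, zero outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample_row(row, new_len, a=3):
    """Resample a list of samples to new_len samples."""
    scale = len(row) / new_len
    stretch = max(scale, 1.0)  # widen the kernel when shrinking
    support = a * stretch
    out = []
    for i in range(new_len):
        # Centre of output sample i, expressed in input coordinates.
        center = (i + 0.5) * scale - 0.5
        total = weight_sum = 0.0
        first = math.ceil(center - support)
        last = math.floor(center + support)
        for j in range(first, last + 1):
            w = lanczos((j - center) / stretch, a)
            src = row[min(max(j, 0), len(row) - 1)]  # clamp at borders
            total += w * src
            weight_sum += w
        out.append(total / weight_sum)  # normalise the weights
    return out
```

The inner loop touches up to 2a taps per output sample when upscaling and proportionally more when downscaling, which is why this filter is noticeably slower than bilinear interpolation.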
Edit: I strongly suggest you NOT use Image Magick or Graphics Magick. They are both great libraries, but designed for a completely different purpose - handling many file formats, depths, pixel formats, etc. They sacrifice performance and memory efficiency for the things I've mentioned.
You might need this: https://github.com/zhangyuanwei/node-images
Cross-platform image decoder(png/jpeg/gif) and encoder(png/jpeg) for Nodejs

What is the difference between ImageMagick and GraphicsMagick?

I've found myself evaluating both of these libs. Apart from what the GraphicsMagick comparison says, I see that ImageMagick still gets updates and it seems that the two are almost identical.
I'm just looking to do basic image manipulation in C++ (i.e. image load, filters, display); are there any differences I should be aware of when choosing between these libraries?
As with many things in life, different people have different ideas about what is best. If you ask a landscape photographer who wanders around in the rain in Scotland's mountains which is the best camera in the world, he's going to tell you a light-weight, weather-sealed camera. Ask a studio photographer, and he'll tell you the highest resolution one with the best flash sync speed. And if you ask a sports photographer he'll tell you the one with the fastest autofocus and highest frame rate. So it is with ImageMagick and GraphicsMagick.
Having answered around 2,000 StackOverflow questions on ImageMagick over the last 5+ years, I make the following observations...
In terms of popularity...
ImageMagick questions on SO outnumber GraphicsMagick questions by a factor of 12:1 (7,375 questions vs 611 at May 2019), and
ImageMagick followers on SO outnumber GraphicsMagick followers by 15:1 (387 followers versus 25 at May 2019)
In terms of performance...
I am happy to concede that GraphicsMagick may be faster for some, but not all problems. However, if speed is your most important consideration, I think you should probably be using either libvips, or parallel code on today's multi-core CPUs or heavily SIMD-optimised (or GPU-optimised) libraries like OpenCV.
In terms of features and flexibility...
There is one very clear winner here - ImageMagick. My experience is that there are many features missing from GraphicsMagick which are present in ImageMagick and I list some of these below, in no particular order.
I freely admit I am not as familiar with GraphicsMagick as I am with ImageMagick, but I made my very best effort to find any mention of the features in the most recent GraphicsMagick source code. So, for Canny Edge Detector, I ran the following command on the GM source code:
find . -type f -exec grep -i Canny {} \;
and found nothing.
Canny Edge detector
This appears to be completely missing in GM. See -canny radiusxsigma{+lower-percent}{+upper-percent} in IM.
See example here and sample of edge-detection on Lena image:
Parenthesised processing, sophisticated re-sequencing
This is a killer feature of ImageMagick that I frequently sorely miss when having to use GM. IM can load, or create, or clone a whole series of images and apply different processing selectively to specific images and re-sequence, duplicate and re-order them very simply and conveniently. It is hard to convey the incredible flexibility this affords you in a short answer.
Imagine you want to do something fairly simple like load image A and blur it, load image B and make it greyscale and then place the images side-by-side with Image B on the left. That looks like this with ImageMagick:
magick imageA.png -blur x3 \( imageB.png -colorspace gray \) +swap +append result.png
You can't even get started with GM, it will complain about the parentheses. If you remove them, it will complain about swapping the image order. If you remove that it will apply the greyscale conversion to both images because it doesn't understand parentheses and place imageA on the left.
See the following sequencing commands in IM:
-swap
-clone
-duplicate
-delete
-insert
-reverse
fx DIY Image Processing Operator
IM has the -fx operator which allows you to create and experiment with incredibly sophisticated image processing. You can have a function evaluated for every single pixel in an image. The function can be as complicated as you like (save it in a file if you want to) and use all mathematical operations, ternary-style if statements, references to pixels even in other images and their brightness or saturation and so on.
Here are a couple of examples:
magick rose: -channel G -fx 'sin(pi*i/w)' -separate fx_sine_gradient.gif
magick -size 80x80 xc: -channel G -fx 'sin((i-w/2)*(j-h/2)/w)/2+.5' -separate fx_2d_gradient.gif
A StackOverflow answer that uses this feature to great effect in processing green-screen (chroma-keyed) images is here.
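To show what a per-pixel expression of this kind computes, the sketch below evaluates sin(pi*i/w) for each column index i of one row, where w is the image width, and maps the result to 8 bits. It only mimics the arithmetic of the first example; it assumes that -fx results are normalised to the range 0..1, and the function names are hypothetical.

```python
import math


def fx_sine_row(width):
    """Evaluate sin(pi * i / w) for every column index i of one row."""
    return [math.sin(math.pi * i / width) for i in range(width)]


def to_8bit(value):
    """Clamp a normalised 0..1 value and scale it to 0..255."""
    return round(max(0.0, min(1.0, value)) * 255)


if __name__ == "__main__":
    print([to_8bit(v) for v in fx_sine_row(4)])
```

The result is dark at the left edge, brightest in the middle, and falls off again towards the right.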
Fourier (frequency domain) Analysis
There appears to be no mention of forward or reverse Fourier Analysis in GM, nor the High Dynamic Range support (see later) that is typically required to support it. See -fft in IM.
Connected Component Analysis / Labelling/ Blob Analysis
There appears to be no "Connected Component Analysis" in GM - also known as "labelling" and "Blob Analysis". See -connected-components connectivity for 4- and 8-connected blob analysis.
This feature alone has provided 60+ answers - see here.
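For readers unfamiliar with the technique, connected component labelling can be sketched in a few lines. The version below labels the non-zero pixels of a binary grid using a breadth-first flood fill with either 4- or 8-connectivity. It is a simplified illustration with a hypothetical function name, not a description of how ImageMagick implements the operator.

```python
from collections import deque


def label_components(grid, connectivity=4):
    """Label non-zero pixels; returns (label grid, component count)."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    h = len(grid)
    w = len(grid[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or labels[y][x]:
                continue
            count += 1  # found an unlabelled foreground pixel
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Two pixels that touch only at a corner form one blob under 8-connectivity but two blobs under 4-connectivity, which is the practical difference between the two settings.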
Hough Line Detection
There appears to be no Hough Line Detection in GM. See -hough-lines widthxheight{+threshold} in IM.
See description of the feature here and following example of detected lines:
Moments and Perceptual Hash (pHash)
There appears to be no support for image moments calculation (centroids and higher orders), nor Perceptual Hashing in GM. See -moments in IM.
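As a reminder of what the lowest-order moments give you, the centroid of a greyscale image is (m10/m00, m01/m00), where m00 is the sum of all pixel values and m10 and m01 weight each value by its x and y coordinate. A minimal sketch with a hypothetical function name:

```python
def centroid(img):
    """Return the intensity-weighted centroid (x, y) of a list-of-rows image."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            m00 += value       # zeroth moment: total mass
            m10 += x * value   # first moment in x
            m01 += y * value   # first moment in y
    if m00 == 0:
        raise ValueError("image has no mass; centroid is undefined")
    return m10 / m00, m01 / m00
```

Higher-order moments follow the same pattern with higher powers of x and y.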
Morphology
There appears to be no support for Morphological processing in GM. In IM there is sophisticated support for:
dilation
erosion
morphological opening and closing
skeletonisation
distance morphology
top hat and bottom hat morphology
Hit and Miss morphology - line ends, line junctions, peaks, ridges, Convex Hulls etc
See all the sophisticated processing you can do with this great tutorial.
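The two basic operations in that list are easy to state in code: with a 3x3 square structuring element, dilation takes the maximum over each pixel's neighbourhood and erosion takes the minimum, and opening is erosion followed by dilation. The sketch below is a plain illustration with hypothetical function names; it clips the neighbourhood at the image border, which is only one of several possible edge policies.

```python
def _morph(grid, combine):
    """Apply `combine` (max or min) over each pixel's 3x3 neighbourhood."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [grid[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            row.append(combine(window))
        out.append(row)
    return out


def dilate(grid):
    """Grow bright regions by one pixel."""
    return _morph(grid, max)


def erode(grid):
    """Shrink bright regions by one pixel."""
    return _morph(grid, min)


def opening(grid):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(grid))
```

Opening removes an isolated pixel entirely while leaving a solid 3x3 block unchanged, which is why it is a common clean-up step before blob analysis.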
Contrast Limited Adaptive Histogram Equalisation - CLAHE
There appears to be no support for Contrast Limited Adaptive Histogram Equalisation in GM. See -clahe widthxheight{%}{+}number-bins{+}clip-limit{!} in IM.
HDRI - High Dynamic Range Imaging
There appears to be no support for High Dynamic Range Imaging in GM - just 8, 16, and 32-bit integer types.
Convolution
ImageMagick supports many types of convolution:
Difference of Gaussians DoG
Laplacian
Sobel
Compass
Prewitt
Roberts
Frei-Chen
None of these are mentioned in the GM source code.
Magick Persistent Register (MPR)
This is an invaluable feature present in ImageMagick that allows you to write intermediate processing results to named chunks of memory during processing without the overhead of writing to disk. For example, you can prepare a texture or pattern and then tile it over an image, or prepare a mask and then alter it and apply it later in the same processing without going to disk.
Here's an example:
magick tree.gif -flip -write mpr:tree +delete -size 64x64 tile:mpr:tree mpr_tile.gif
Broader Colourspace Support
IM supports the following colourspaces not found in GM:
CIELab
HCL
HSI
LMS
others.
Pango Support
IM supports Pango Text Markup Language which is similar to HTML and allows you to annotate images with text that changes:
font, colour, size, weight, italics
subscript, superscript, strike-through
justification
mid-sentence and much, much more. There is a great example here.
Shrink-on-load with JPEG
This invaluable feature allows the library to shrink JPEG images as they are read from disk, so that only the necessary coefficients are read, so the I/O is lessened, and the memory consumption is minimised. It can massively improve performance when down-scaling images.
See example here.
Defined maximum JPEG size when writing
IM supports the much-requested option to specify a maximum filesize when writing JPEG files, -define jpeg:extent=400KB for example.
Polar coordinate transforms
IM supports conversion between cartesian and polar coordinates, see -distort polar and -distort depolar.
Statistics and operations on customisable areas
With its -statistic MxN operator, ImageMagick can generate many useful kinds of statistics and effects. For example, you can set each pixel in an image to the gradient (difference between brightest and darkest) of its 5x3 neighbourhood:
magick image.png -statistic gradient 5x3 result.png
Or you can set each pixel to the median of its 1x200 neighbourhood:
magick image.png -statistic median 1x200 result.png
See example of application here.
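What a neighbourhood statistic computes can be sketched directly: for every pixel, collect the values in a width x height window centred on it and reduce them with a function. In the sketch below the function names are hypothetical and the window is clipped at the image border, which may differ from ImageMagick's own edge handling.

```python
from statistics import median


def neighbourhood_statistic(img, width, height, reduce_fn):
    """Apply reduce_fn to the width x height window around each pixel."""
    h, w = len(img), len(img[0])
    rx, ry = width // 2, height // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[ny][nx]
                      for ny in range(max(0, y - ry), min(h, y + ry + 1))
                      for nx in range(max(0, x - rx), min(w, x + rx + 1))]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def gradient(window):
    """Difference between the brightest and darkest value in the window."""
    return max(window) - min(window)


if __name__ == "__main__":
    image = [[1, 2, 3, 4, 5]]
    print(neighbourhood_statistic(image, 5, 3, gradient))
    print(neighbourhood_statistic(image, 3, 1, median))
```

Passing a different reducer (median, minimum, maximum and so on) changes the effect without changing the traversal.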
Sequences of images
ImageMagick supports sequences of images, so if you have a set of very noisy images shot at high ISO, you can load up the entire sequence of images and, for example, take the median or average of all images to reduce noise. See the -evaluate-sequence operator. I do not mean the median in a surrounding neighbourhood in a single image, I mean by finding the median of all images at each pixel position.
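The per-position median across a stack of images can be written down in a few lines, which also makes clear how it differs from a neighbourhood median: each output pixel depends only on the pixels at the same coordinates in every input image. A minimal sketch with a hypothetical function name, assuming all images are greyscale lists of rows with identical dimensions:

```python
from statistics import median


def median_stack(images):
    """Return the per-pixel median of equally sized greyscale images."""
    h = len(images[0])
    w = len(images[0][0])
    for img in images:
        if len(img) != h or any(len(row) != w for row in img):
            raise ValueError("all images must have the same dimensions")
    return [[median(img[y][x] for img in images) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    shots = [[[10, 200]], [[12, 20]], [[11, 22]]]
    print(median_stack(shots))
```

In the usage above, the outlier value 200 in the first shot is discarded because the other two shots agree on a much lower value at that position.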
The above is not an exhaustive list by any means, they are just the first few things that came to mind when I thought about the differences. I didn't even mention support for HEIC (Apple's format for iPhone images), increasingly common High Dynamic Range formats such as EXR, or any others. In fact, if you compare the file formats supported by the two products (gm convert -list format and magick identify -list format) you will find that IM supports 261 formats and GM supports 192.
As I said, different people have different opinions. Choose the one that you like and enjoy using it.
As always, I am indebted to Anthony Thyssen for his excellent insights and discourse on ImageMagick at https://www.imagemagick.org/Usage/ Thanks also to Fred Weinhaus for his examples.
From what I have read GraphicsMagick is more stable and is faster.
I did a couple of unscientific tests and found gm to be twice as fast as im (doing a resize).
I found ImageMagick to be incredibly slow for processing TIFF group-4 images (B&W document images), mainly due to the fact that it converts from 1-bit-per-pixel to 8 and back again to do any image manipulation. The GraphicsMagick group overhauled the TIFF format support with their version 1.2, and it is much faster at processing these types of images than the original ImageMagick was. The current GraphicsMagick stable release is at 1.3.5.
I use ImageMagick when speed isn't a factor. However on the server side, where tens of thousands of images are being processed daily, GraphicsMagick is quite noticeably faster - in some cases up to 50% faster in benchmarks!
History
graphicsmagick was forked from imagemagick back in 2002 due to disputes between founding developers. thus they share the same codebase.
Ref : https://en.wikipedia.org/wiki/GraphicsMagick
Goal
graphicsmagick
focuses on simple, stable, and clearer codebase / architecture
imagemagick
focuses on rolling out new features, extend a wider toolbase
Other than speed, imagemagick adds a number of cli tools to terminal shell whereas graphicsmagick is a single tool which you can call.
CLI interface design
graphicsmagick
gm <command> <options> <file>
imagemagick
convert <options> <file>
compare <options> <file>
imho, i prefer (in fact, only use) graphicsmagick(gm) over imagemagick as the latter has higher chance of tool name clash, which causes lots of issues in finding out why certain tools are not running, especially during server side automation tasks. in summary graphicsmagick has much clearer design.
imagine a project that contains a binary called convert: will a call run imagemagick's convert or the project's own tool?
list of imagemagick tools (including convert, compare, display) : https://imagemagick.org/script/command-line-tools.php
list of graphicsmagick commands :
http://www.graphicsmagick.org/utilities.html
note : as of v7 as mentioned by Mark S, imagemagick is now distributed as single binary, and also supporting older v6 commands.
Performance
a simple memory consumption test can be found here :
https://coderwall.com/p/1l7h-a/imagemagick-bloat-graphicsmagick
Dependencies
GraphicsMagick depends on 36 libraries whereas ImageMagick requires 64. Ref : http://www.graphicsmagick.org/1.3/FAQ.html
Note that GraphicsMagick provides API and ABI stability, which isn't part of the guarantee for ImageMagick. This would be important in the long run unless you are vendoring all your dependencies.
GraphicsMagick was an early fork from Imagemagick. You can read about Imagemagick's history and the fork to GraphicsMagick at https://imagemagick.org/script/history.php. It seems that Imagemagick has continued to be developed rather extensively, while GraphicsMagick has remained more or less stagnant since the fork.