Decoding a PNG image of size 1080x1920 takes over 30 ms and I'm looking to do it faster.
In Android, BitmapFactory has an option that lets you pass the sample size to be returned. This causes the returned decoded image to be smaller than the actual source. This in turn makes the decoding process a lot faster, at the cost of a lower quality image.
I want to do something similar in C++ using a PNG decoding library such as libpng, but for some reason I can't find any details about decoding at a lower quality.
Any pointers or ideas to improve decoding time would be appreciated!
Asking the decoder for a lower-resolution image would have essentially no effect on CPU work: a PNG stream is basically a compressed zlib stream that must be fully decompressed, and on top of that there is PNG-specific unfiltering to be done, which again requires the neighbouring pixels. Subsampling can of course reduce memory usage (which in itself can reduce decoding time). For this you need to decode the PNG progressively, so that the subsampling is done line by line. You can do that with (my) Java library PNGJ; it is optimized for that usage pattern, and some people have used it successfully on Android.
If you want to do it in C, with libpng, the idea would be the same. Decode the image progressively, line by line, and do the subsampling yourself.
Bear in mind that this usage pattern would break with interlaced PNG (in that case, you'd want to decode one of the subimages), but, anyway, to store a 1080x1920 image as interlaced PNG would be a bad idea.
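The line-by-line subsampling described above can be sketched roughly as follows. This is only an illustration, not libpng code: RowSubsampler and feedRow are made-up names, and the sketch assumes a non-interlaced image whose rows arrive fully decoded, top to bottom (with libpng, each row would typically come from png_read_row).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keeps every `factor`-th pixel of every `factor`-th row.
// Only the reduced image is stored; the caller can reuse one row buffer.
class RowSubsampler {
public:
    RowSubsampler(std::size_t width, std::size_t channels, std::size_t factor)
        : width_(width), channels_(channels), factor_(factor) {
        assert(width > 0 && channels > 0 && factor > 0);
    }

    // `row` must hold width * channels bytes (one fully decoded scanline).
    void feedRow(const std::uint8_t* row) {
        if (rowsSeen_++ % factor_ != 0) return;  // drop this scanline entirely
        for (std::size_t x = 0; x < width_; x += factor_) {
            const std::uint8_t* px = row + x * channels_;
            out_.insert(out_.end(), px, px + channels_);
        }
        ++outHeight_;
    }

    std::size_t outWidth() const { return (width_ + factor_ - 1) / factor_; }
    std::size_t outHeight() const { return outHeight_; }
    const std::vector<std::uint8_t>& pixels() const { return out_; }

private:
    std::size_t width_, channels_, factor_;
    std::size_t rowsSeen_ = 0;
    std::size_t outHeight_ = 0;
    std::vector<std::uint8_t> out_;
};
```

Every row still has to be decoded, so this saves memory rather than decode work; dropping pixels like this is also the crudest form of subsampling (no averaging).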
Android is open source; you could look at the source: the Java interface and the C++ backend - from there, it calls to the SKIA library.
This class appears to be where the sampling is done; it is called from here.
In order to accomplish some specific editing on some .avi files, I'd like to create an application (in C++) that is able to load, edit, and save those .avi files. But what is the most efficient way? When first thinking about it, a simple 3D array containing a 2D array of pixels for every frame seems the simplest solution, but then its size would be ENORMOUS. I mean, let's assume that a pixel only needs a color. One color would take 3 bytes (one char each for r, g and b). For a 1920x1080 video, that is about 6 MEGABYTES for only one frame! This data may or may not be smaller if using pointers for the colors, so that already used colors won't take more space. I don't really know, since I'm pretty new to C++ and the whole low-level stuff. (As a comparison: one of my AVI files recorded with the Xvid codec is 40 seconds long, 30 fps, and only takes 2 MB.)
So how would you actually store the video data (Not even the audio, just the video) efficiently (while still being easily able to perform per-frame-changes on it)?
As you have realised, uncompressed video is enormous and it is not practical to store an entire video in this way.
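To put numbers on "enormous", here is a small back-of-the-envelope sketch. The function names are invented for illustration, and it assumes 3 bytes per pixel with no padding or alpha channel.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed frame.
std::uint64_t rawFrameBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytesPerPixel) {
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Bytes needed to hold a whole clip uncompressed in memory.
std::uint64_t rawVideoBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytesPerPixel,
                            std::uint64_t fps, std::uint64_t seconds) {
    return rawFrameBytes(width, height, bytesPerPixel) * fps * seconds;
}
```

For a 1920x1080 RGB frame this gives 6,220,800 bytes, and a 40-second clip at 30 fps comes to 7,464,960,000 bytes (about 7.5 GB), which is why only a few decoded frames are normally kept in memory at once.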
Video compression is an extremely complex topic, but more-or-less, it works as follows: certain "key-frames" are compressed using fairly standard compression techniques similar or identical to still-photo compression such as JPEG. Frames following key-frames are compressed by comparing the frame with the previous one and looking for changes (such as moving blocks). Every now and again, a new key-frame is used.
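As a toy illustration of the key-frame idea only: the sketch below stores a later frame as the list of bytes that changed relative to the previous one. Real codecs do something far more elaborate (motion-compensated blocks, transforms, entropy coding), so treat this purely as a mental model; Frame, PixelChange, encodeDelta and applyDelta are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // raw pixel bytes of one frame

struct PixelChange {
    std::size_t index;   // byte position inside the frame
    std::uint8_t value;  // new value at that position
};

// Records only the bytes that differ between two equally sized frames.
std::vector<PixelChange> encodeDelta(const Frame& prev, const Frame& next) {
    assert(prev.size() == next.size());
    std::vector<PixelChange> changes;
    for (std::size_t i = 0; i < next.size(); ++i) {
        if (prev[i] != next[i]) changes.push_back({i, next[i]});
    }
    return changes;
}

// Rebuilds the next frame from the previous frame plus the recorded changes.
Frame applyDelta(const Frame& prev, const std::vector<PixelChange>& changes) {
    Frame out = prev;
    for (const PixelChange& c : changes) {
        assert(c.index < out.size());
        out[c.index] = c.value;
    }
    return out;
}
```

If most of the picture is static, the change list is much smaller than the frame itself; the key-frame is the one frame that must be stored in full.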
You don't really have to worry much about that as you are not going to write your own video coder/decoder (codec). There are standard ones.
What will happen is that your program will decode the compressed video frame-by-frame and keep a certain number of frames in memory while you are working on them and then re-encode them when it is finished. In the uncompressed form, you will have access to the individual pixels and can work on them how you want.
You are probably not going to do that either by yourself - it is very hard. You probably need to use a framework, such as OpenCV. There are a huge number of standard filters and tools built in to these frameworks, and it may be that what you want to do is already implemented somewhere.
The OpenCV framework can return individual frames in a Mat object and you can then access the pixels. See this post Get Pixels from Mat
OpenCV
Tutorial page: Open CV Tutorial
Given that FFmpeg is the leading multimedia framework and most video/audio players use it, I'm wondering about a few things regarding audio/video players that use FFmpeg as an intermediate layer.
I'm studying how audio/video players work and I have some questions.
I was reading the ffplay source code and saw that ffplay handles the subtitle stream. I tried an mkv file with a subtitle in it and it doesn't work. I tried arguments such as -sst but nothing happened. - I was reading about subtitles and how video files (or should I say containers?) use them. I saw that there are two ways of adding a subtitle: hardsubs and softsubs. Roughly speaking, hardsubs are burned in and become part of the video, while softsubs are a separate stream of subtitles (I might be wrong; please correct me).
The question is: how do players handle this? I mean, when the subtitle is part of the video there's nothing to do, the video stream itself shows the subtitle, but what about softsubs? How are they handled? (I heard something about text subs as well.) - How does the subtitle appear on the screen, and how can fonts, size and colors be changed without encoding everything again?
I was studying the source code of some video players; some or most of them use OpenGL to render the frame and others (such as those based on Qt's QWidget) use some kind of canvas. - Which is the most used, and which one is fastest and better? OpenGL with shaders and so on? Handling YUV or RGB? How does that work?
It might be a dumb question, but what is the format that AVFrame returns? For example, when we want to save frames as images, we first need the frame and then we convert it; which format are we converting from? Does it change according to the video codec or is it always the same?
Most of the videos I've been trying to handle use YUV420P. I tried to save the frames as PNG, and I need to convert to RGB first. I did a test with the players: I paused them at the same frame, took screenshots and compared. The video players show the frames with more vivid colors. I tried the same with ffplay, which uses SDL (OpenGL), and the colors (quality) of the frames seem to be really low. What might be the cause? What do they do? Is it shaders (or some kind of magic? haha).
Well, I think that is it for now. I hope you can help me with that.
If this isn't the correct place, please let me know where. I haven't found another place in Stack Exchange communities.
There are a lot of questions in one post:
How are 'soft subtitles' handled
The same way as any other stream:
Read packets from a stream in the container
Give the packet to a decoder
Use the decoded frame as you wish. With most containers that support subtitles, the presentation time will be present. All you need to do at this point is get the text and draw it onto the image that has the same presentation time. There are a lot of ways to print the text on the video, with ffmpeg or another library
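The "same presentation time" step can be sketched like this. It is not FFmpeg API: SubtitleCue and activeCue are hypothetical names, and the sketch assumes the subtitle packets have already been decoded into plain text cues with start and end times in milliseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoded subtitle: text plus the interval it should be shown.
struct SubtitleCue {
    std::int64_t startMs;
    std::int64_t endMs;  // exclusive
    std::string text;
};

// Returns the cue to draw over the video frame shown at `ptsMs`,
// or nullptr when no subtitle is visible at that time.
const SubtitleCue* activeCue(const std::vector<SubtitleCue>& cues,
                             std::int64_t ptsMs) {
    for (const SubtitleCue& cue : cues) {
        assert(cue.startMs <= cue.endMs);
        if (ptsMs >= cue.startMs && ptsMs < cue.endMs) return &cue;
    }
    return nullptr;
}
```

Because the text is only drawn at display time, the player is free to pick the font, size and color for each frame it renders, and the video stream itself is never re-encoded.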
What is the most used renderer and which one is fastest and better?
Most used depends on the underlying system. For instance, Qt only wraps native renderers, and even has an OpenGL version
You can only be as fast as the underlying system allows. Does it support double-buffering? Can it render in your decoded pixel format, or do you have to perform a color conversion first? This topic is too broad
Better depends only on the use case. This is too broad
what is the format that AVFrame returns?
It is a raw format (enum AVPixelFormat), and it depends on the codec. There is a list of YUV and RGB FOURCCs which covers most formats in ffmpeg. Programmatically, you can access the table AVCodec::pix_fmts to obtain the pixel formats a specific codec supports.
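To make "converting from a raw YUV format" concrete, here is a sketch that converts one pixel of a planar YUV 4:2:0 image to RGB. It makes several assumptions that a real AVFrame does not guarantee: tightly packed planes (row stride equal to the width), an even width, and full-range BT.601 coefficients. FFmpeg programs normally leave this to sws_scale from libswscale; the names below are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

static std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, std::round(v))));
}

// Converts pixel (x, y). In 4:2:0 each U and V sample is shared by a 2x2
// block of luma samples, so the chroma planes are half the width and height.
// Assumption: planes are tightly packed and values are full range (0..255).
Rgb yuv420pPixelToRgb(const std::uint8_t* yPlane, const std::uint8_t* uPlane,
                      const std::uint8_t* vPlane, int width, int x, int y) {
    assert(width > 0 && width % 2 == 0 && x >= 0 && x < width && y >= 0);
    const int chromaWidth = width / 2;
    const int chromaIndex = (y / 2) * chromaWidth + x / 2;
    const double luma = yPlane[y * width + x];
    const double cb = uPlane[chromaIndex] - 128.0;
    const double cr = vPlane[chromaIndex] - 128.0;
    return {clampToByte(luma + 1.402 * cr),
            clampToByte(luma - 0.344136 * cb - 0.714136 * cr),
            clampToByte(luma + 1.772 * cb)};
}
```

Which matrix (BT.601 or BT.709) and which range (limited or full) apply depends on the stream's metadata, and picking the wrong one visibly shifts colors and contrast, so it is worth checking when screenshots from two programs disagree.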
I am making a game with a large number of sprite sheets in cocos2d-x. There are many characters and effects, and each of them uses a sequence of frames. The apk file is larger than 400 MB, so I have to compress those images.
In fact, each frame in a sequence differs only slightly from the others. So I wonder if there is a tool to compress a sequence of frames instead of just putting them into a sprite sheet? (Armature animation can help, but the effects cannot be treated as an armature.)
For example, take an effect made of 10 PNG files of 1 MB each. If I use TexturePacker to pack them into a sprite sheet, I get a big PNG file of 8 MB and a plist file of 100 KB, 8.1 MB in total. But if I could compress them using the differences between frames, maybe I would get one PNG file of 1 MB and 9 files of 100 KB for reconstructing the other 9 PNG files during loading. This would need only 1.9 MB on disk. And if I can convert them to PVRTC format, the memory required at runtime can also be reduced.
By the way, I am now trying to convert .bmp to .pvr during game loading. Is there any library for converting to pvr?
Thanks! :)
If you have lots of textures to convert to pvr, I suggest you get the PowerVR tools from www.imgtec.com. They come in GUI and CLI variants. PVRTexToolCLI did the job for me; I scripted a massive conversion job. Free to download, free to use, but you must register on their site.
I just tested it; it converts many formats to pvr (bmp and png included).
Before you go there (the massive batch job), I suggest you experiment with some variants. PVR is (generally) fat on disk, fast to load, and equivalent to other formats in RAM ... the RAM requirement is essentially dictated by the number of pixels and the number of bits you encode for each pixel. You can get some interesting disk sizes with pvr, depending on the output format and number of bits you use ... but it may be lossy, and you could get visible artefacts. So experiment with a limited sample before deciding to go full bore.
The first place I would look at, even before any conversion, is your animations. Since you are using TP, it can detect duplicate frames and alias N frames to a single frame on the texture. For example, my design team provides me all 'walk/stance' animations with 5 pictures, but 8 frames! The plist contains frame aliases for the missing textures. In all my stances, frame 8 is the same as frame 2, so the texture only contains frame 2, but the plist artificially produces a frame8 that crops the image of frame 2.
The other place I would look at is using 16 bits per pixel. This will help bundle size, memory requirement at runtime, and load speed. Use RGB565 for textures with no transparency, or RGBA5551 for animations, for example. Once again, try a few to make certain you get acceptable rendering.
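To show what going to 16 bits actually does to a pixel, here is a small sketch of packing 8-bit channels into RGB565 and RGBA5551, plus the pixels-times-bits memory estimate. The function names are made up, and the bit layouts assume red in the high bits and, for 5551, alpha in the lowest bit (the layout OpenGL uses for its packed 16-bit types).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 5 bits red, 6 bits green, 5 bits blue; the low bits of each channel are dropped.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// 5 bits per color channel plus a single on/off alpha bit.
std::uint16_t packRgba5551(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                           bool opaque) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 3) << 6) |
                                      ((b >> 3) << 1) | (opaque ? 1 : 0));
}

// Memory for an uncompressed texture: number of pixels times bits per pixel.
std::size_t textureBytes(std::size_t width, std::size_t height,
                         std::size_t bitsPerPixel) {
    assert(bitsPerPixel > 0);
    return width * height * bitsPerPixel / 8;
}
```

A 1024x1024 texture drops from 4 MB at 32 bits per pixel to 2 MB at 16 bits. The dropped low bits are where banding in smooth gradients comes from, hence the advice to test a few assets first.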
have fun :)
We're currently developing some functionality for our program that needs OpenCV. One of the ideas being tossed around is the use of a "buffer" which keeps a minute of video data in memory; we would then need to extract something like a 13-second video file from that buffer for every event trigger.
Currently we don't have enough experience with OpenCV, so we don't know if it is possible or not. Looking at the documentation, the only functions that work in memory seem to be imencode and imdecode, but those are for images. If we can find a way to write sequences of images to a video file that would be neat, but for now our idea is to use a video buffer.
We're also using the OpenCV version 2 specifications.
TL;DR We want to know if it is possible to write a portion of a video to memory.
In OpenCV, every video is treated as a collection of frames (images). Depending on your camera's FPS, you can capture frames periodically and fill the buffer with them. Meanwhile you can discard the oldest frame (taken one minute earlier). So a FIFO data structure can be used to achieve your goal. Getting a 13-second sample is easy: just jump to any frame and write 13*FPS frames sequentially to a video file.
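A minimal sketch of such a FIFO, kept independent of OpenCV by making the frame type a template parameter. FrameBuffer, push and clip are made-up names. With OpenCV, Frame would be cv::Mat; since copying a cv::Mat shares the pixel data, each captured frame would need a clone() before being pushed, and the clip would then be written out with cv::VideoWriter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

template <typename Frame>
class FrameBuffer {
public:
    // Holds at most fps * seconds frames, e.g. 30 fps * 60 s.
    FrameBuffer(std::size_t fps, std::size_t seconds)
        : fps_(fps), capacity_(fps * seconds) {
        assert(capacity_ > 0);
    }

    // Appends the newest frame and drops the oldest one once full.
    void push(Frame frame) {
        if (frames_.size() == capacity_) frames_.pop_front();
        frames_.push_back(std::move(frame));
    }

    // Copies up to `seconds` worth of frames starting at buffer index
    // `firstFrame` (0 is the oldest frame still held).
    std::vector<Frame> clip(std::size_t firstFrame, std::size_t seconds) const {
        const std::size_t begin = std::min(firstFrame, frames_.size());
        const std::size_t end = std::min(frames_.size(), begin + seconds * fps_);
        return std::vector<Frame>(
            frames_.begin() + static_cast<std::ptrdiff_t>(begin),
            frames_.begin() + static_cast<std::ptrdiff_t>(end));
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::size_t fps_;
    std::size_t capacity_;
    std::deque<Frame> frames_;
};
```

Keep the memory cost in mind: a minute of uncompressed frames is large, so storing them downscaled or JPEG-encoded (imencode) may be necessary.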
But there will be some sync and timing problems AFAIK and as far as I've used OpenCV.
Here is the link of OpenCV documentation about video i/o. Especially the last chunk of code is what you will use for writing.
TL;DR : There is no video, there are sequential images with little differences. So you need to treat them as such.
The task is to write a program that crops JPEG files. The problem is that some JPEG files are very large, hundreds of megabytes. So the question is: is it possible to crop a JPEG file without loading the whole file into RAM, using something like fseek() and decoding only the parts that are needed?
Is that possible? If yes, maybe there are libraries that do this.
Upd. All this will be used for deep zoom technology. So when deep zoom asks for a file, this program will provide it, and this should happen in real time
There are two ways to accomplish this.
The first is lossless cropping, where you don't decode the file all the way but work with the 8x8 DCT blocks. You'll need to use a library that has this capability, and it places some restrictions on the cropping ability. You can't crop to a boundary that isn't on the DCT square, which limits you to multiples of 8 or 16 depending on the subsampling in the file.
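The alignment restriction can be sketched as follows. This shows one possible policy, not any particular library's API: the top-left corner is moved up and left onto the block grid, and the region grows so that the bottom-right corner stays where it was requested. CropRect and alignToMcu are hypothetical names, and `mcu` is the block size (8 or 16) for the file's subsampling.

```cpp
#include <cassert>

struct CropRect {
    int x, y, width, height;
};

// Snaps the top-left corner down to a multiple of `mcu` and enlarges the
// rectangle by the same amount so the bottom-right corner is unchanged.
CropRect alignToMcu(CropRect r, int mcu) {
    assert(mcu > 0 && r.x >= 0 && r.y >= 0);
    const int dx = r.x % mcu;  // how far x is past the previous block edge
    const int dy = r.y % mcu;
    return {r.x - dx, r.y - dy, r.width + dx, r.height + dy};
}
```

For example, a request for 200x150 at (100, 37) with 16-pixel blocks becomes 204x155 at (96, 32); the result is never smaller than what was asked for.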
The second way is to use a library that allows you to read and write one line at a time. I know that the IJG library can do this, and probably others as well. This is the easy way, but the downside is that the image goes through a decompression/recompression pass and will lose quality and/or be larger.
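The one-line-at-a-time approach might look roughly like this. ScanlineCropper is a made-up helper; it assumes rows arrive fully decoded from top to bottom (with the IJG library they would come from jpeg_read_scanlines) and that the crop window lies inside the image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps only the pixels inside the crop window as scanlines stream past.
class ScanlineCropper {
public:
    ScanlineCropper(std::size_t x, std::size_t y, std::size_t width,
                    std::size_t height, std::size_t channels)
        : x_(x), y_(y), width_(width), height_(height), channels_(channels) {
        assert(width > 0 && height > 0 && channels > 0);
    }

    // `row` is one full decoded scanline of the source image.
    void feed(const std::uint8_t* row) {
        if (row_ >= y_ && row_ < y_ + height_) {
            const std::uint8_t* first = row + x_ * channels_;
            out_.insert(out_.end(), first, first + width_ * channels_);
        }
        ++row_;
    }

    // True once the last needed row has been seen; decoding can stop here.
    bool done() const { return row_ >= y_ + height_; }
    const std::vector<std::uint8_t>& pixels() const { return out_; }

private:
    std::size_t x_, y_, width_, height_, channels_;
    std::size_t row_ = 0;
    std::vector<std::uint8_t> out_;
};
```

Memory use is bounded by one source row plus the cropped region, but the decoder still has to work through every row above the window, which matters for real-time use on files of this size.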