HD Video Calling in Unity3D - c++

I am an amateur in video/image processing but I am trying to create an app for HD video calling. I hope someone would see where I may be doing wrong and guide me on the right path. Here is what I am doing and what I think I understand, please correct me if you know better.
I am using OpenCV currently to grab an image from my webcam in a DLL. (I will be using this image for other things later)
Currently, the image that opencv gives me is a Opencv::Mat. I resized this and converted to a byte array size of a 720p image, which is about 3 Megapixels.
I pass this ptr back to my C# code then I can now render this onto a texture.
Now I created a TCP socket and connect the server and client and start to transmit previously gotten image byte array. I am able to transmit the byte array over to the client then I use the GPU to render it to a texture.
Currently, there is a big delay of about 4-500ms delay. This is after I tried compressing the buffer with gzipstream for unity. It was able to compress the byte array from about 3 million bytes to 1.5 million. I am trying to get this to smallest as possible and also fastest as possible but this is where I am completely lost. I saw that Skype requires only 1.2Mbps connection for a 720p video calling at 22 fps. I have no idea how they can achieve such a small frame, but of course I don't need it to be that small. I need to be at least decent.
Please give me a lecture on how this can be done! And let me know if you need anything else from me.

I found a link that may be very useful to anyone working on something similar. https://www.cs.utexas.edu/~teammco/misc/udp_video/
https://github.com/chenxiaoqino/udp-image-streaming/

Related

Video players questions

Given that FFmpeg is the leading multimedia framework and most of the video/audio players uses it, I'm wondering somethings about audio/video players using FFmpeg as intermediate.
I'm studying and I want to know how audio/video players works and I have some questions.
I was reading the ffplay source code and I saw that ffplay handles the subtitle stream. I tried to use a mkv file with a subtitle on it and doesn't work. I tried using arguments such as -sst but nothing happened. - I was reading about subtitles and how video files uses it (or may I say containers?). I saw that there's two ways putting a subtitle: hardsubs and softsubs - roughly speaking hardsubs mode is burned and becomes part of the video, and softsubs turns a stream of subtitles (I might be wrong - please, correct me).
The question is: How does they handle this? I mean, when the subtitle is part of the video there's nothing to do, the video stream itself shows the subtitle, but what about the softsubs? how are they handled? (I heard something about text subs as well). - How does the subtitle appears on the screen and can be configured changing fonts, size, colors, without encoding everything again?
I was studying some video players source codes and some or most of them uses OpenGL as renderer of the frame and others uses (such as Qt's QWidget) (kind of or for sure) canvas. - What is the most used and which one is fastest and better? OpenGL with shaders and stuffs? Handling YUV or RGB and so on? How does that work?
It might be a dump question but what is the format that AVFrame returns? For example, when we want to save frames as images first we need the frame and then we convert, from which format we are converting from? Does it change according with the video codec or it's always the same?
Most of the videos I've been trying to handle is using YUV720P, I tried to save the frames as png and I need to convert to RGB first. I did a test with the players and I put at the same frame and I took also screenshots and compared. The video players shows the frames more colorful. I tried the same with ffplay that uses SDL (OpenGL) and the colors (quality) of the frames seems to be really low. What might be? What they do? Is it shaders (or a kind of magic? haha).
Well, I think that is it for now. I hope you help me with that.
If this isn't the correct place, please let me know where. I haven't found another place in Stack Exchange communities.
There are a lot of question in one post:
How are 'soft subtitles' handled
The same way as any other stream :
read packets from a stream to the container
Give the packet to a decoder
Use the decoded frame as you wish. Here with most containers supporting subtitles the presentation time will be present. All you need at this time is get the text and burn it onto the image at the same presentation time. There are a lot of ways to print the text on the video, with ffmpeg or another library
What is the most used renderer and which one is fastest and better?
most used depend on the underlying system. For instance Qt only wrap native renderers, and even has a openGL version
You can only be as fast as the underlying system allows. Does it support ouble-buffering? Can it render in your decoded pixel format or do you have to perform color conversion before? This topic is too broad
Better only depend on the use case. this is too broad
what is the format that AVFrame returns?
It is a raw format (enum AVPixelFormat), and depends on the codec. There is a list of YUV and RGB FOURCCs which cover most formats in ffmpeg. Programmatically you can access the table AVCodec::pix_fmts to obtain the pixel format a specific codec support.

Writing an OpenMAX IL component, where to start

I am about to grab the video output of my raspberry pi to pass it to kinda adalight ambient lightning system.
The XBMC's player for PI, omxplayer, users OpenMAX API for decoding and other functions.
Looking into the code gives the following:
m_omx_tunnel_sched.Initialize(&m_omx_sched, m_omx_sched.GetOutputPort(), &m_omx_render, m_omx_render.GetInputPort());
as far as I understand, this sets a pipeline between the video scheduler and the renderer [S]-->[R].
Now my idea is to write a grabber component and plug-in it hardly into the pipeline [S]-->[G]->[R]. The grabber will extract the pixels from the framebuffer and pass it to a deamon which will drive the leds.
Now I am about to dig into OpenMAX API which seems to be pretty weird. Where should I start? Is it a feasible approach?
Best Regards
If you want the decoded data then just do not send to the renderer. Instead of rendering, take the data and do whatever you want to do. The decoded data should be taken from the output port of the video_decode OpenMAX IL component. I suppose you'll also need to set the correct output pixel format, so set the component output port to the correct format you need, so the conversion is done by the GPU (YUV or RGB565 are available).
At first i think you should attach a buffer to the output of camera component, do everything you want with that frame in the CPU, and send a frame through a buffer attached to the input port of the render, its not going to be a trivial task, since there is little documentation about OpenMax on the raspberry.
Best place to start:
https://jan.newmarch.name/RPi/
Best place to have on hands:
http://home.nouwen.name/RaspberryPi/documentation/ilcomponents/index.html
Next best place: source codes distributed across the internet.
Good luck.

Recording and Saving the Screen using C++ on Windows

I'm trying to write an application that records and saves the screen in C++ on the windows platform. I'm not sure where to start with this. I assume I need some sort of API, (FFMPEG, maybe OpenGL?). Could someone point me in the right direction?
You could start by looking at Windows remote desktop protocol, maybe some programming libraries are provided for that.
I know of a product that intercepts calls into the Windows GDI dll and uses that to store the screen drawing activities.
A far more simpler approach would be to do screenshots as often as possible and somehow minimize redundant data (parts of the screen that didn't change between frames).
If the desired output of your app is a video file (like mpeg) you are probably better off just grabbing frames and feeding them into a video encoder. I don't know how fast the encoders are these days. Ffmpeg would be a good place to start.
If the encoder turns out not fast enough, you can try storing the frames and encoding the video file afterwards. Consecutive frames should have many matching pixels, so you could use that to reduce the amount of data stored.

strange behavior with showing images using OpenCV and Qt

I'm capturing images from a Cam using OpenCV C API and send them using TCP sockets.
The server is running C++ (QT) and receive the frame.
The process is working fine and I can see the images on the server.
The weird problem is when I close both programs and rerun the client and the server, I can see the previous frame again that I saw in the previous test.
If, I close both programs again and rerun them, I can see a new frame not the second one, and the process continues.
To make it more clear:
capture1, close, cap1, close, cap3, close, cap3, close, cap5 ......etc
I didn't see something like this before!
I had the same problem before.
the frame size is pretty much and you read from the buffer in a random way (just guessing) , you have to make a timer or an acknowledge between the camera and OpenCV.
Just try to control the way the camera is capturing frames.
I dont know about TCP/IP programming or client/server much...but all I can suggest is initialize the images, generally in the constructors of the camera/client/server class ,
Mat Frame = Mat::zeros(rows,cols,CV_8UC3);
so that every time the client/server is initialized or before you are ready to exchange images...the start up image is a blank image...
you must be initializing using cvCreatImage()..so you can do the following...
IplImage *m = cvCreateImage(cvSize(200,200),8,3);// say its 200 x 200
cvZero(m);
cvShowImage("BLANK",m);
cvvWaitKey();
this shows a black image with each pixel as zero...
Of course this issue comes from the camera. It seems that camera has to receive any acknowledgment once a frame is grabbed. One thing you can try is go to the line of code that sends the image and save the image in the disk in order to check if cap1 is sent twice.

Writing video on memory OpenCV 2

We're currently developing some functionality for our program that needs OpenCV. One of the ideas being tossed at the table is the use of a "buffer" which saves a minute of video data to the memory and then we need to extract like a 13-second video file from that buffer for every event trigger.
Currently we don't have enough experience with OpenCV so we don't know if it is possible or not. Looking at the documentation the only allowable function to write in memory are imencode and imdecode, but those are images. If we can find a way to write sequences of images to a video file that would be neat, but for now our idea is to use a video buffer.
We're also using OpenCV version 2 specifications.
TL;DR We want to know if it is possible to write a portion of a video to memory.
In OpenCV, every video is treated as a collection of frames(images). Depending on your cameras' FPS you can capture frames periodically and fill the buffer with them. Meanwhile you can destroy the oldest frame(taken 1 min before). So a FIFO data structure can be implemented to achieve your goal. Getting a 13 second sample is easy, just jump to a random frame and write 13*FPS frames sequentially to a video file.
But there will be some sync and timing problems AFAIK and as far as I've used OpenCV.
Here is the link of OpenCV documentation about video i/o. Especially the last chunk of code is what you will use for writing.
TL;DR : There is no video, there are sequential images with little differences. So you need to treat them as such.