Embed custom metadata in a video stream - C++

I have an optical system that provides a UDP video stream.
From device specification FAQ:
Both single metadata (KLV) stream and compressed video (H.264) with metadata (KLV) are available on Ethernet link. Compressed video and metadata are coupled in the same stream compliant with STANAG 4609 standard. Each encoded video stream is encapsulated with the associated metadata within an MPEG-TS single program stream over Ethernet UDP/IP. The video and metadata are synchronized through the use of timestamps.
There are also other devices that provide data about the state of an aircraft (velocity, coordinates, etc.). This data should be displayed on a client GUI alongside the video, and of course it has to be synchronized with the current video frame.
One of the approaches I thought of is to embed this data into the video stream. But I am not sure whether that is possible, or whether I should use a protocol other than UDP for this purpose.
Is it possible/reasonable to use such an approach? Is the ffmpeg library suitable in this case?
If not, what are the other ways to synchronize data with a video frame?
Latency is crucial, and bandwidth is limited to 2-5 Mbps.
It seems to be possible using ffmpeg: an AVPacket can carry additional data via the function av_packet_add_side_data, which takes a preallocated buffer, its size and an AVPacketSideDataType.
However, I am not sure which enum value of AVPacketSideDataType can be used for custom, user-provided binary data.
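For reference, this is roughly how av_packet_add_side_data is called (a sketch only: the helper name is mine, the buffer must come from av_malloc because the packet takes ownership of it, and AV_PKT_DATA_NEW_EXTRADATA is just a stand-in since no AVPacketSideDataType value is dedicated to arbitrary user data):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/mem.h>
}
#include <cstdint>
#include <cstring>

// Hypothetical helper: attach an opaque blob to a packet as side data.
// The buffer must be allocated with av_malloc; the packet takes ownership.
int attach_blob(AVPacket *pkt, const uint8_t *blob, size_t blob_size)
{
    uint8_t *buf = static_cast<uint8_t *>(av_malloc(blob_size));
    if (!buf)
        return AVERROR(ENOMEM);
    std::memcpy(buf, blob, blob_size);
    // Placeholder type only: no enum value is meant for custom user data.
    return av_packet_add_side_data(pkt, AV_PKT_DATA_NEW_EXTRADATA, buf, blob_size);
}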
Something similar that might be used for my needs:
How do I encode KLV packets to an H.264 video using libav*

The quote sounds like you have a transport stream containing two elementary streams (the H.264 video in one, and the KLV data in another). The transport stream is sent over UDP (or TCP, or is just a file, whatever you want - it's mostly independent of the transport).
There is a discussion of implementing this kind of thing in the Motion Imagery Handbook (which you can download from the MISB part of the NSG Registry at https://nsgreg.nga.mil/misb.jsp - it's towards the bottom of the Non-cited Standards Documents table) and in detail in ST 1402 (which you can find in the same table). I'm avoiding providing direct links because the versions change - just look for whatever is current.
The short version is that you can embed the timestamp in the video (see ST 0603 and ST 0604), and then correlate that to the metadata timestamp (Precision Time Stamp, see ST 0601). You don't want to do that at the AVPacket level though. Instead, you need to put side data into AVFrame, with the AV_FRAME_DATA_SEI_UNREGISTERED key (https://ffmpeg.org/doxygen/trunk/group__lavu__frame.html#ggae01fa7e427274293aacdf2adc17076bca4f2dcaee18e5ffed8ff4ab1cc3b326aa). You will need a fairly recent FFmpeg version.
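As a rough illustration of that route, here is a sketch (the helper name is mine; it assumes an FFmpeg build recent enough to define AV_FRAME_DATA_SEI_UNREGISTERED) of attaching "user data unregistered" SEI to a frame before encoding; the side-data blob is the 16-byte UUID followed by the payload:

extern "C" {
#include <libavutil/frame.h>
}
#include <cstdint>
#include <cstring>

// Illustrative helper: attach unregistered SEI user data to a video frame
// before sending it to the encoder.
bool attach_sei_unregistered(AVFrame *frame, const uint8_t uuid[16],
                             const uint8_t *payload, size_t payload_size)
{
    // SEI "user data unregistered" = 16-byte UUID followed by the payload.
    AVFrameSideData *sd = av_frame_new_side_data(
        frame, AV_FRAME_DATA_SEI_UNREGISTERED, 16 + payload_size);
    if (!sd)
        return false;
    std::memcpy(sd->data, uuid, 16);
    std::memcpy(sd->data + 16, payload, payload_size);
    return true;
}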
Note: if all you want to do is see the UDP data stream - video on one side, and decoded KLV on the other, then you might like to check out the jMISB Viewer application: https://github.com/WestRidgeSystems/jmisb
It also provides an example of encoding (generator example). Disclaimer: I contribute to the project.

Related

Routing Audio from microphone to network using QT 6.4.x

With Qt 6.4.x (Windows), how can I capture microphone audio, repackage it, and forward the repackaged audio to a QUdpSocket?
The repackaging will involve converting the captured audio from its typical 16-bit little-endian format to a 24-bit big-endian format, where each packet will have a constant-size payload, potentially different in size from the payload coming from the microphone. I am not sure, but I think I need to replace the QAudioSink with a QAudioDecoder, as its description indicates:
The QAudioDecoder class is a high level class for decoding audio media files. It is similar to the QMediaPlayer class except that audio is provided back through this API rather than routed directly to audio hardware.
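Regarding the repackaging step described above, here is a minimal sketch (the function name is mine; it assumes the captured QByteArray holds raw interleaved 16-bit little-endian PCM) of converting those samples to 24-bit big-endian bytes:

#include <QByteArray>

// Illustrative helper: convert interleaved 16-bit little-endian PCM samples
// to 24-bit big-endian samples (each 16-bit value is scaled up by 8 bits).
QByteArray convertS16LEtoS24BE(const QByteArray &in)
{
    QByteArray out;
    out.reserve(in.size() / 2 * 3);
    const auto *src = reinterpret_cast<const quint8 *>(in.constData());
    for (qsizetype i = 0; i + 1 < in.size(); i += 2) {
        // Assemble the little-endian sample, then scale it to 24 bits.
        qint16 s16 = static_cast<qint16>(src[i] | (src[i + 1] << 8));
        qint32 s24 = static_cast<qint32>(s16) * 256;
        // Emit the 24-bit value most-significant byte first (big-endian).
        out.append(static_cast<char>((s24 >> 16) & 0xFF));
        out.append(static_cast<char>((s24 >> 8) & 0xFF));
        out.append(static_cast<char>(s24 & 0xFF));
    }
    return out;
}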
I have a partially working example that, among other things, sends synthesized audio directly to the speaker. This functionality is based on the 'Audio Output Example' that ships with Qt 6 (my modified example sends a sine-wave tone to the speakers).
Also, in this RtpWorker thread, using the 'Audio Source Example' for inspiration, I was able to capture and intercept audio packets from the microphone, but I do not know how to send these packets (repackaged per the above) to a UDP socket in fixed-size datagrams; for now I just log the captured packets. I think I need an intermediate circular buffer, whose write side is filled with captured microphone audio while the read side gets called by a QAudioSink or QAudioDecoder in pull mode.
Per my comment above I think I might need to send them to a QAudioDevice so I can handle the packaging and sending over the network myself.
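And here is a small sketch of the sending side, draining an intermediate buffer into fixed-size datagrams with QUdpSocket (m_buffer, PAYLOAD_SIZE and flushToNetwork are illustrative names, not Qt API):

#include <QByteArray>
#include <QHostAddress>
#include <QUdpSocket>

static constexpr int PAYLOAD_SIZE = 960;   // hypothetical fixed datagram payload

// Send only complete, fixed-size datagrams; keep any remainder buffered for
// the next call.
void flushToNetwork(QByteArray &m_buffer, QUdpSocket &socket,
                    const QHostAddress &dest, quint16 port)
{
    while (m_buffer.size() >= PAYLOAD_SIZE) {
        socket.writeDatagram(m_buffer.constData(), PAYLOAD_SIZE, dest, port);
        m_buffer.remove(0, PAYLOAD_SIZE);
    }
}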
My code is contained in two attachments in QTBUG-108383.
It would be great if someone could point to some useful examples that try to do something similar.
Try running on macOS or Linux; it seems to be a Windows bug.

Saving video from an RTP stream to a file

I'm trying to get and store the data from an IP camera and I'd appreciate some high level advice as to the best way to do this.
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
I'm very happy to do the work, I'd just appreciate some pointers / a high level overview of the steps so I can deconstruct the project!
There is no direct answer to the OP's question here, as the question is a bit broad, and without further information about what the OP intends to do with the data it is difficult to give a precise answer. What I can do here is suggest steps that may be taken and problems to consider.
The OP stated:
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
Now that you have established communication with the camera and are able to receive data packets from the video stream, it is a matter of understanding what the RTP payloads are, that is, how to interpret that data. At this point you will have to research the RTP protocol, which is a network protocol. Once you have written your structures and functions to work with this protocol, it is a matter of breaking the UDP packets down into useful bytes of information. In many cases, graphics, video or audio data, whether read directly from a file or from a stream object, is accompanied by some type of header information. Next, it is a matter of understanding this header, which normally takes the form of a structure describing what type of content the file or stream holds, so that you know how many bytes of information to extract from it.
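As a concrete starting point, here is a small sketch of pulling apart the fixed 12-byte RTP header defined in RFC 3550 from a received UDP datagram (field names follow the RFC; error handling is minimal and header extensions/CSRC lists are ignored):

#include <cstddef>
#include <cstdint>
#include <optional>

struct RtpHeader {
    uint8_t  version;        // should be 2
    bool     padding;
    bool     extension;
    uint8_t  csrcCount;
    bool     marker;
    uint8_t  payloadType;    // tells you which codec the payload carries
    uint16_t sequenceNumber; // detect loss / reordering
    uint32_t timestamp;      // media timestamp used for synchronization
    uint32_t ssrc;           // stream identifier
};

std::optional<RtpHeader> parseRtpHeader(const uint8_t *data, size_t len)
{
    if (len < 12)
        return std::nullopt;
    RtpHeader h;
    h.version        = data[0] >> 6;
    h.padding        = (data[0] >> 5) & 0x1;
    h.extension      = (data[0] >> 4) & 0x1;
    h.csrcCount      = data[0] & 0x0F;
    h.marker         = data[1] >> 7;
    h.payloadType    = data[1] & 0x7F;
    h.sequenceNumber = (data[2] << 8) | data[3];
    h.timestamp      = (uint32_t(data[4]) << 24) | (uint32_t(data[5]) << 16) |
                       (uint32_t(data[6]) << 8)  |  uint32_t(data[7]);
    h.ssrc           = (uint32_t(data[8]) << 24) | (uint32_t(data[9]) << 16) |
                       (uint32_t(data[10]) << 8) |  uint32_t(data[11]);
    return h;
}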
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
The steps involved may vary depending on your needs and what you intend to do with the information. Are you trying to write the properties or general information about the video content to a file, such as its compression type, its audio and video codec types, its resolution and frame rate, its byte rate, etc.? Or are you trying to write the actual video content itself to a file that your application will use either for playback or for editing? This all depends on your intentions.
Is the data compressed, in which case I have to decompress it?
Once you have successfully interpreted the RTP protocol and parsed the data packets by understanding their header information and saving it into a proper structure, it is then a matter of using that header information to determine what is actually within the stream object. For example, according to the PDF you supplied about the camera's properties, the video compression can be one of two types, H.264 or MJPEG; you will have to determine which from the information provided in the header. From there you would either branch your code so it can read and parse each type of compression, or accept the one you are willing to work with and disregard the other. Next is the audio compression, if you care about audio; the available types are AAC (encoding only), G.711 A-Law and G.711 U-Law, and the same mechanism applies. Once you are able to get past the audio and video compression, you will need vital information about the video itself, such as the resolution and frame rate (buffer sizes) stored in the header, so that you know how many bytes to read from the stream and how far to advance your pointer through it. Notice that for resolution and frame rate there are different acceptable formats for each type of compression being used:
H.264
1920 x 1080 (2.1 MP) @ 30 fps (1080p)
1280 x 720 @ 60 fps (720p)
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/20 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
MJPEG
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/25 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
Now this covers resolution and frame rate, but the next thing to consider is that you are working with a video stream, so the above may not apply in your case. According to the camera's video-streaming capabilities, these are the stream types you will have to take into account:
Single-stream H.264 up to 1080p (1920 x 1080) @ 30 fps
Dual-stream H.264 and MJPEG
H.264: primary stream programmable up to 1280 x 720 @ 25/20 fps
MJPEG: secondary stream programmable up to 720 x 576 @ 25/20 fps
With these different types available to the camera, you have to take them all into consideration. Again, this depends on the intentions of your application and what you intend to do with the information: you can write your program to accept all of these types, or program it to accept only one type with a specific format. That is up to you.
Do I have to do any other modifications?
I don't think you would have any modifications to make, unless your application's intention is to modify the video or audio data itself. If your application just reads the file for simple playback, then the answer is no, as long as all the appropriate information was saved properly and your parser for your custom file structure can read the file's contents and parse the data appropriately for playback.
Where can I learn about what I'll need to do specific to this camera?
I don't think you need much more information about the camera itself; the PDF you linked in your question already gives you enough to go on. What you need from here is information and documentation about the specific protocols, packet types, compression and stream types, and a general search for these should suffice.
UDP
Do a Google search for C++ UDP socket programming, for either Linux or Winsock.
RTP
Do a Google search for C++ RTP packet programming.
Video Compression
Do a Google search for both H.264 and MJPEG compression and their structure within stream objects.
Audio Compression
Do a Google search for each of AAC (encoding only), G.711 A-Law and G.711 U-Law, if you are interested in the audio as well.
From there, once you have the valid specifications for these data structures as stream objects and have acquired the appropriate header information to determine which type and format the video content is saved in, you should be able to parse the data packets appropriately. How you save or write them to a file depends on your intentions.
I have provided this as a guideline to help lead you in the right direction, in a similar manner to how a chemist, physicist or engineer would approach any typical problem. The general steps follow a scientific approach to the problem at hand. These typically are:
Assessing the Situation
Create either a Hypothesis or a Thesis about the Situation.
Gather the Known Facts
Determine the Unknowns
Draft a Model that Shows a Relationship Between the Known and Unknowns.
Perform both Research and Experimentation
Record or Log Events and Data
Analyze the Data
Draw a Conclusion
In the case of writing a software application the concept is similar, but the approach may vary, as not all of the steps above may be needed and/or some additional steps may be required. One step in the application development cycle that is not found in the scientific approach is debugging. But the general guideline still applies. If you keep to this type of strategy, I am sure you will have the confidence to gather what you need and use it, step by step, to achieve your goals.
I'm trying to get and store the data from a Cisco IPC camera, and I'd appreciate some high level advice as to the best way to do this.
You can probably use openRTSP to do this, which can also output to file. For this approach you would have to write NO code. Implementing RTP, RTSP and RTCP correctly is complex and a lot of work. Should you have requirements that openRTSP doesn't meet, you can use the live555 libraries for RTSP/RTP/RTCP and write some minimal code to do something with the received video. The mailing list is very responsive provided that you ask "good" questions, and make sure you read the FAQ first.
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
You don't need to know this if you use openRTSP. If you use the live555 libraries directly, you'll be passed entire video frames that you would then have to decode or write to file yourself, depending on what you want to achieve. If you DO need/want to know about RTP and RTP payload formats, read the corresponding RFCs, e.g. RFC2326, RFC3550, RFC6184.
Is the data compressed, in which case I have to decompress it?
Generally you want to store compressed media in a file, and use media player software to decode it on playback (Otherwise you end up with huge files).
Where can I learn about what I'll need to do specific to this camera?
If you just want to save the video you ideally don't need to know anything about the camera, other than what standards it implements (which you already do).

Process audio packets decoded with ffmpeg

Following my other post, I am wondering if it is possible to do some processing, such as MFCC extraction, on the decoded audio packets. The code I use decodes audio and video from an MPEG-2 file using ffmpeg. Processing of the video is done using OpenCV, as this library allows grabbing frames one by one. I need to process the corresponding audio samples at the same time.
Thanks.
I've created a C++ audio engine named "Crosstalk".
Although it's referred to as an "audio engine", it's really just a real-time C++ data (floating-point) processing engine. Crosstalk allows you to create and route systems at design time and in real time. Basically, the engine takes care of all the data routing and gives you a simple platform for creating components through which the data gets processed (e.g. your "Audio Feed" component connected in parallel with the "Video Feed" component). As long as your branches are of equal total buffer length, they will be perfectly synchronized.
It's very easy to use. Here's an example of how to configure a system to play an mp3 file (The components used here are provided with the engine):
XtSystem system;                // the routing engine that owns the components
XtMp3Decoder mp3Decoder;        // decodes an mp3 file into audio buffers
XtAudioDevice audioDevice;      // plays incoming buffers on an audio device

// Register both components with the system and keep their handles.
long md = system.addComponent(&mp3Decoder);
long ad = system.addComponent(&audioDevice);

// Route the decoder's two outputs (stereo channels) to the device's inputs.
system.connOutToIn(md, 0, ad, 0);
system.connOutToIn(md, 1, ad, 1);

mp3Decoder.loadFile("../05 Tchaikovski-Swan Lake-Scene.mp3");
mp3Decoder.play();
You can check out the API documentation and licensing details here: http://www.adaptaudio.com/Crosstalk
EDIT (01-12-2012):
Crosstalk has been replaced by an open-source project called "DSPatch". DSPatch is essentially an upgraded version of the routing engine behind Crosstalk that is no longer limited to audio processing. DSPatch allows you to create and route almost any type of process chain imaginable, and it is free for personal AND proprietary use :)
I downloaded your library and I'm playing with it.
Did you perform some kind of performance comparison with other IPC techniques like socket/localhost, message queues, circular buffers for audio streams?
I'm developing a software application that receives a multichannel UDP stream (128 channels), performs FFT on a subset of it, plays one selected channel, visualizes the spectrum of 2 channels and spectrogram of one channel.
Do you think that DSPatch is fast enough to use for this?

How to create a video streaming HTTP server?

I'm using C++ and the POCO libraries. I'm trying to implement a video-streaming HTTP server.
Initially I used Poco::StreamCopier.
But the client failed to stream.
Instead, the client downloads the video.
How can I make the server send a streaming response so that the client can play the video in the browser instead of downloading it?
While not within POCO, you could use ffmpeg. It has streaming servers for a number of video protocols and is written in C (which you could write POCO-like adapters for).
http://ffmpeg.org/ffmpeg.html#rtp
http://ffmpeg.org/ffmpeg.html#toc-Protocols
http://git.videolan.org/?p=ffmpeg.git;a=tree
And it has a pretty liberal license:
http://ffmpeg.org/legal.html
You need to research which video encoding and container format are right for streaming -- not all video files can be streamed.
Without using something to decode the video at the other end, but simply over HTTP, you can use the MIME type "content-type: multipart/x-mixed-replace; boundary=..." and send a series of JPEG images.
This is actually called M-JPEG over HTTP. See: http://en.wikipedia.org/wiki/Motion_JPEG
The browser will replace each image as it receives it, which makes it look like video. It's probably the easiest way to stream video to a browser, and many IP webcams support this natively.
However, it's not bandwidth friendly by any means, since it has to send a whole JPEG file for each frame. So if you're going to use this over the internet it will work, but it will use more bandwidth than other methods.
However, it is natively supported in most browsers now, and it sounds like that is what you're after.
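For illustration, here is a rough sketch of that approach with POCO (not a drop-in implementation: getNextJpegFrame() and the boundary string are placeholders for however you obtain encoded JPEG frames, and the handler still has to be registered with an HTTPServer via a handler factory):

#include <Poco/Net/HTTPRequestHandler.h>
#include <Poco/Net/HTTPServerRequest.h>
#include <Poco/Net/HTTPServerResponse.h>
#include <string>
#include <vector>

std::vector<char> getNextJpegFrame();   // hypothetical frame source

class MjpegStreamHandler : public Poco::Net::HTTPRequestHandler
{
public:
    void handleRequest(Poco::Net::HTTPServerRequest &request,
                       Poco::Net::HTTPServerResponse &response) override
    {
        const std::string boundary = "frameboundary";
        response.setContentType("multipart/x-mixed-replace; boundary=" + boundary);
        std::ostream &out = response.send();

        while (out.good()) {
            std::vector<char> jpeg = getNextJpegFrame();
            // Each part replaces the previous one in the browser.
            out << "--" << boundary << "\r\n"
                << "Content-Type: image/jpeg\r\n"
                << "Content-Length: " << jpeg.size() << "\r\n\r\n";
            out.write(jpeg.data(), jpeg.size());
            out << "\r\n";
            out.flush();
        }
    }
};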

streaming video to and from multiple sources

I wanted to get some ideas one how some of you would approach this problem.
I've got a robot that is running Linux and uses a webcam (with a v4l2 driver) as one of its sensors. I've written a control panel with gtkmm. Both the server and client are written in C++. The server is the robot, the client is the "control panel". The image analysis happens on the robot, and I'd like to stream the video from the camera back to the control panel for two reasons:
A) for fun
B) to overlay image analysis results
So my question is, what are some good ways to stream video from the webcam to the control panel while still giving priority to the robot code that processes it? I'm not interested in writing my own video compression scheme and putting it through the existing networking port; a new network port (dedicated to video data) would be best, I think. The second part of the problem is how to display video in gtkmm. The video data arrives asynchronously and I don't have control over main() in gtkmm, so I think that would be tricky.
I'm open to using things like vlc, gstreamer or any other general compression libraries I don't know about.
thanks!
EDIT:
The robot has a 1GHz processor, running a desktop like version of linux, but no X11.
GStreamer solves nearly all of this for you, with very little effort, and also integrates nicely with the GLib event system. GStreamer includes V4L source plugins, GTK+ output widgets, various filters to resize/encode/decode the video, and best of all, network sinks and sources to move the data between machines.
For prototyping, you can use the 'gst-launch' tool to assemble video pipelines and test them; then it's fairly simple to create pipelines programmatically in your code. Search for 'GStreamer network streaming' to see examples of people doing this with webcams and the like.
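As a sketch of what that can look like in code, here is a minimal sender built with gst_parse_launch (the pipeline string, host and port are only examples; swap in elements appropriate to your camera and bitrate):

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    // Example pipeline: webcam -> H.264 encode -> RTP -> UDP.
    GError *error = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src ! videoconvert ! x264enc tune=zerolatency bitrate=1000 "
        "! rtph264pay ! udpsink host=192.168.0.10 port=5000",
        &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        g_error_free(error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until an error or end-of-stream, then clean up.
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        static_cast<GstMessageType>(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if (msg)
        gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}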
I'm not sure about the actual technologies used, but this can end up being a huge synchronization headache if you want to avoid dropped frames. I was streaming a video to a file and the network at the same time. What I eventually ended up doing was using a big circular buffer with three pointers: one write and two read. There were three control threads (and some additional encoding threads): one writing to the buffer, which would pause if it reached a point in the buffer not yet read by both of the others, and two reader threads that would read from the buffer and write to the file/network (and pause if they caught up with the producer). Since everything was written and read as frames, sync overhead could be kept to a minimum.
My producer was a transcoder (from another file source), but in your case, you may want the camera to produce whole frames in whatever format it normally does and only do the transcoding (with something like ffmpeg) for the server, while the robot processes the image.
Your problem is a bit more complex, though, since the robot needs real-time feedback so can't pause and wait for the streaming server to catch up. So you might want to get frames to the control system as fast as possible and buffer some up in a circular buffer separately for streaming to the "control panel". Certain codecs handle dropped frames better than others, so if the network gets behind you can start overwriting frames at the end of the buffer (taking care they're not being read).
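Here is a stripped-down sketch of that buffer layout, with one writer and two independent readers over the same ring of frames (all names are illustrative, and the mutex/condition-variable synchronization a real implementation needs is omitted for brevity):

#include <array>
#include <cstddef>
#include <optional>
#include <vector>

using Frame = std::vector<unsigned char>;

class DualReaderRing {
public:
    static constexpr size_t kSlots = 64;

    // Writer: refuse (so the producer can pause or overwrite elsewhere) if
    // advancing would clobber a slot either reader has not consumed yet.
    bool push(const Frame &f) {
        size_t next = (writePos_ + 1) % kSlots;
        if (next == readPos_[0] || next == readPos_[1])
            return false;                 // a reader is still behind us
        slots_[writePos_] = f;
        writePos_ = next;
        return true;
    }

    // Reader 0 = file writer, reader 1 = network sender (per the answer).
    std::optional<Frame> pop(int reader) {
        size_t &pos = readPos_[reader];
        if (pos == writePos_)
            return std::nullopt;          // caught up with the producer
        Frame f = slots_[pos];
        pos = (pos + 1) % kSlots;
        return f;
    }

private:
    std::array<Frame, kSlots> slots_;
    size_t writePos_ = 0;
    size_t readPos_[2] = {0, 0};
};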
When you say 'a new video port' and then start talking about vlc/gstreamer, I find it hard to work out what you want. Obviously these software packages will assist in streaming and compressing via a number of protocols, but clearly you'll need a 'network port', not a 'video port', to send the stream.
If what you really mean is sending display output via a wireless video/TV feed, that's another matter; however, you'll need advice from hardware experts rather than software experts on that.
Moving on. I've done plenty of streaming over MMS/UDP protocols and vlc handles it very well (as both server and client). However, it's designed for desktops and may not be as lightweight as you want. Something like gstreamer, mencoder or ffmpeg, on the other hand, is going to be better, I think. What kind of CPU does the robot have? You'll need a bit of grunt if you're planning real-time compression.
On the client side I think you'll find a number of widgets to handle video in GTK. I would look into that before worrying about interface details.