How to add text information to AVI using DirectShow - C++

Thanks for looking at my question.
To produce a muxed AVI video stream, I have been using the DirectShow AVI Mux filter.
However, the DirectShow AVI Mux filter only accepts media/image streams.
How can I add text information to an AVI file so that the muxed stream contains audio + video + text?

The AVI Mux filter is built on top of the AVI API and is limited to supporting video, audio and DV interleaved streams. Hence, no text option. From the documentation on its input pin media types:
Input Pin Media Types: Any major type that corresponds to an old-style FOURCC, or MEDIATYPE_AUXLine21Data. (For more information, see the FOURCCMap Class.)
If the major type is MEDIATYPE_Audio, the format must be FORMAT_WaveFormatEx.
If the major type is MEDIATYPE_Video, the format must be FORMAT_VideoInfo or FORMAT_DvInfo.
If the major type is MEDIATYPE_Interleaved, the format must be FORMAT_DvInfo.
To embed text as an additional stream, you need to write a custom filter (on top of either the Windows AVI API, FFmpeg, or something else) or find an appropriate third-party replacement for the stock AVI multiplexer.
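To give an idea of the AVI API route, below is a minimal sketch using the old VfW AVIFile functions to append a text ('txts') stream to an already written AVI. The file name, the one-sample-per-second timing and the caption payload are illustrative assumptions, and players differ in how (or whether) they display such a stream.

```c++
// Sketch only: append a text stream to an existing AVI with the VfW AVIFile API.
#include <windows.h>
#include <vfw.h>
#pragma comment(lib, "vfw32.lib")

int main() {
    AVIFileInit();

    PAVIFILE file = nullptr;
    if (AVIFileOpen(&file, TEXT("output.avi"),               // assumed existing file
                    OF_WRITE | OF_SHARE_DENY_WRITE, nullptr) != AVIERR_OK)
        return 1;

    AVISTREAMINFO si = {};
    si.fccType = streamtypeTEXT;   // 'txts': additional text stream
    si.dwScale = 1;
    si.dwRate  = 1;                // one text sample per second (assumption)

    PAVISTREAM text = nullptr;
    if (AVIFileCreateStream(file, &text, &si) == AVIERR_OK) {
        const char caption[] = "Example caption at t = 0s";  // sample payload
        LONG samplesWritten = 0, bytesWritten = 0;
        AVIStreamWrite(text, 0, 1, (LPVOID)caption, sizeof(caption),
                       AVIIF_KEYFRAME, &samplesWritten, &bytesWritten);
        AVIStreamRelease(text);
    }

    AVIFileRelease(file);
    AVIFileExit();
    return 0;
}
```

A DirectShow-only alternative is to write your own multiplexer filter, but for experimenting with a text stream in the container the AVIFile approach above is usually quicker.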

Related

Speech to Text audio formats

Can we use an MP3 audio file with the Watson Speech to Text API?
What are the popular formats that are not supported by the Watson Speech to Text API?
I suggest you use the WAV format if you want a popular one; it depends on your use case.
If you really need to start from MP3, you can simply convert the MP3 to WAV first (a rough C++ decode sketch appears at the end of this answer).
That said, the formats Speech to Text supports are:
audio/flac: Free Lossless Audio Codec (FLAC), a lossless compressed audio coding format. For more information, see en.wikipedia.org/wiki/FLAC.
audio/l16: Linear 16-bit Pulse-Code Modulation (PCM), an uncompressed audio data format. Use this media type to pass a raw PCM file. Note that linear PCM audio can also reside inside a container Waveform Audio File Format (WAV) file. For more information, see the Internet Engineering Task Force (IETF) Request for Comment (RFC) 2586 and en.wikipedia.org/wiki/Pulse-code_modulation.
audio/wav: Waveform Audio File Format (WAV), a standard created by Microsoft® and IBM. A WAV file is a container that is often used for uncompressed audio bitstreams but can contain compressed audio, as well. For more information, see en.wikipedia.org/wiki/WAV.
The service supports WAV files that use any encoding. It accepts audio with a maximum of nine channels (due to an FFmpeg limitation).
audio/ogg / audio/ogg;codecs=opus / audio/ogg;codecs=vorbis: Ogg is a free, open container format maintained by the Xiph.org Foundation; for more information, see www.xiph.org/ogg/.
Both codecs are free, open, lossy audio-compression formats. Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio.
audio/webm / audio/webm;codecs=opus / audio/webm;codecs=vorbis: Web Media (WebM) is an open media-file format; for more information, see webmproject.org. WebM supports audio streams compressed with the Opus and Vorbis audio codecs; Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio. The official documentation includes JavaScript code that shows how to capture audio from a microphone in a Chrome browser and encode it into a WebM data stream.
All formats are described in more detail in the Speech to Text official documentation.
I suggest you edit your question with more details and read the documentation; generally, the documentation from IBM is very objective and complete.
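As for the MP3-to-WAV conversion mentioned above, here is a rough sketch of one way to do it in C++ using libmpg123 (my own suggestion, not something from the Watson documentation); the file names are placeholders and error handling is minimal.

```c++
// Sketch: decode input.mp3 to 16-bit PCM and wrap it in a minimal WAV header.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <mpg123.h>

// Write an integer as little-endian bytes (WAV headers are little-endian).
static void write_le(std::FILE* f, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        std::fputc((value >> (8 * i)) & 0xFF, f);
}

int main() {
    mpg123_init();
    int err = 0;
    mpg123_handle* mh = mpg123_new(nullptr, &err);
    if (mh == nullptr || mpg123_open(mh, "input.mp3") != MPG123_OK) return 1;

    long rate = 0;
    int channels = 0, encoding = 0;
    mpg123_getformat(mh, &rate, &channels, &encoding);

    // Lock the output to signed 16-bit so the header below is always right.
    mpg123_format_none(mh);
    mpg123_format(mh, rate, channels, MPG123_ENC_SIGNED_16);

    std::FILE* out = std::fopen("output.wav", "wb");
    std::fwrite("RIFF\0\0\0\0WAVEfmt ", 1, 16, out);          // sizes patched later
    write_le(out, 16, 4);                                     // fmt chunk size
    write_le(out, 1, 2);                                      // PCM
    write_le(out, (std::uint32_t)channels, 2);
    write_le(out, (std::uint32_t)rate, 4);
    write_le(out, (std::uint32_t)(rate * channels * 2), 4);   // byte rate
    write_le(out, (std::uint32_t)(channels * 2), 2);          // block align
    write_le(out, 16, 2);                                     // bits per sample
    std::fwrite("data\0\0\0\0", 1, 8, out);

    std::vector<unsigned char> buf(32768);
    std::size_t done = 0, total = 0;
    int status;
    do {
        status = mpg123_read(mh, buf.data(), buf.size(), &done);
        std::fwrite(buf.data(), 1, done, out);
        total += done;
    } while (status == MPG123_OK);

    // Patch the RIFF and data chunk sizes now that the PCM length is known.
    std::fseek(out, 4, SEEK_SET);
    write_le(out, (std::uint32_t)(36 + total), 4);
    std::fseek(out, 40, SEEK_SET);
    write_le(out, (std::uint32_t)total, 4);
    std::fclose(out);

    mpg123_close(mh);
    mpg123_delete(mh);
    mpg123_exit();
    return 0;
}
```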
No MP3 support:
Watson Speech to Text audio formats
Don't struggle with choosing a particular audio format for speech-to-text conversion; most manual speech-to-text or transcription services accept all common formats. For automatic speech-to-text services I always prefer WAV over MP3, since it contains high-bit-rate audio data without losing quality and is accepted by most speech engines. Here is the list of formats supported by one transcription company: https://www.transcriptionwave.com/format.html

Get encoding of audio track

Suppose I have a .3g2 file. I noticed such files can contain audio tracks with different encodings (AAC, AMR).
Or, for example, an .m4a file can contain an AAC- or ALAC-encoded audio track.
MediaInfo detects it pretty well, but I want to be able to do that using C++.
My question is, how can I detect the type of the audio track in a media file?
Thanks.
MediaInfo is also available with a C++ interface; just download the MediaInfo library package, which ships with a C++ example.
To get the format of the first audio track: MediaInfo::Get(Stream_Audio, 0, "Format")
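A rough sketch of that in C++, using the MediaInfoDLL wrapper shipped with the library package. It assumes a non-Unicode build where MediaInfoDLL::String maps to std::string; otherwise switch the literals to __T() and the output to std::wcout.

```c++
// Sketch: query the format of the first audio track with the MediaInfo library.
#include <iostream>
#include "MediaInfoDLL/MediaInfoDLL.h"   // dynamic-loading wrapper from the library package

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: audioformat <mediafile>\n";
        return 1;
    }

    MediaInfoDLL::MediaInfo MI;
    if (!MI.Open(argv[1])) {             // Open() returns 0 on failure
        std::cerr << "cannot open " << argv[1] << "\n";
        return 1;
    }

    // "Format" of the first audio track, e.g. "AAC", "AMR" or "ALAC".
    MediaInfoDLL::String format  = MI.Get(MediaInfoDLL::Stream_Audio, 0, "Format");
    MediaInfoDLL::String codecId = MI.Get(MediaInfoDLL::Stream_Audio, 0, "CodecID");
    std::cout << "Audio format: " << format  << "\n";
    std::cout << "Codec ID:     " << codecId << "\n";

    MI.Close();
    return 0;
}
```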

Gstreamer subtitles while recording

I am new to GStreamer and am trying to encode a video stream (for now v4l2src) together with a subtitle stream and mux them into an MPEG-TS container. I am able to use 'textoverlay' to set the text, but I don't want to burn the data into the image. Instead I want to use the subtitle stream to carry 'metadata' that is generated while the video is being recorded.
Is there a way I can add subtitles into the MPEG TS as time passes? The content of the subtitle text is not known beforehand, for example the GPS coordinates of a moving camera.
There is the 'subtitleoverlay' plugin, but I do not fully understand it. Does it burn the text into the image like 'textoverlay', or does it add a separate stream?
I think that subtitleoverlay renders and burns the text into the video frames; check the example pipeline, there is no magic: after subtitleoverlay there is a videoconvert, which works with video frames.
I guess you can just attach a subtitle stream to the mpegtsmux element. I hope this is possible now; there is this bug/feature request which I hope makes it possible.
I checked the capabilities of mpegtsmux and it supports:
subpicture/x-dvb
application/x-teletext
If you can somehow manage to feed the subtitles in as subpicture/x-dvb, then later on the receiver you can use the dvbsuboverlay element to display them.
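For reference, here is a small C++ sketch (assuming GStreamer 1.x development files, built against gstreamer-1.0 via pkg-config) that prints the sink pad templates of mpegtsmux, which is one way to verify which subtitle caps your build accepts:

```c++
// Sketch: list the sink pad templates of mpegtsmux to see which caps
// (e.g. subpicture/x-dvb, application/x-teletext) it will accept.
#include <gst/gst.h>

int main(int argc, char* argv[]) {
    gst_init(&argc, &argv);

    GstElementFactory* factory = gst_element_factory_find("mpegtsmux");
    if (factory == nullptr) {
        g_printerr("mpegtsmux not found (is gst-plugins-bad installed?)\n");
        return 1;
    }

    const GList* templates = gst_element_factory_get_static_pad_templates(factory);
    for (const GList* l = templates; l != nullptr; l = l->next) {
        GstStaticPadTemplate* tmpl = static_cast<GstStaticPadTemplate*>(l->data);
        if (tmpl->direction != GST_PAD_SINK)
            continue;                                   // only the input side matters here
        GstCaps* caps = gst_static_pad_template_get_caps(tmpl);
        gchar* desc = gst_caps_to_string(caps);
        g_print("sink template '%s': %s\n", tmpl->name_template, desc);
        g_free(desc);
        gst_caps_unref(caps);
    }

    gst_object_unref(factory);
    return 0;
}
```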
I didn't find a way to actually create such a stream from a text file (I found this question but no answer; maybe ask on IRC).
I have a feeling that teletext was capable of showing subtitles, but this may not be what you want (I don't know).
In either case, I think that if you had a stream containing only the rendered subtitles, e.g. a subtitles.mpg, you could use that; I guess there are some tools out in the wild for producing one.
Hope you can use that somehow.

MJPEG Video from IP Camera too fast

I'm just trying to read a video Stream out of an IP Camera (Basler BIP-1280c).
The stream I want is saved in a buffer on the camera, has a length of 40 seconds and is encoded as MJPEG.
If I access the stream via my web browser it shows me the 40 seconds without any problems.
But actually I need an application which is capable of downloading and saving the stream by itself.
The camera is accessed via HTTP, so I am using libcurl to access it. This works fine and I can also download the stream without any trouble. I have chosen to save the stream data into an *.avi file (hope that's correct…?).
But now to the problem: I can open the video (tried with Totem Video Player and VLC) and view everything that has been recorded, but it plays way too fast. The whole video lasts about 5 seconds instead of 40. Is there anything in an MJPEG header where information like the total video length or the fps can be stored? There must be some information missing for the video players, so that they play it way too fast.
Update:
As suggested in the answers, I opened the file with a hexeditor and what I found was this:
--myboundary..Content-Type: image/jpeg..Content-Length: 39050.........*Exif..II*...............V...........................2...................0210................FrameNr=000398732
6.AOI=(0800x0720)#(0240,0060)/(1280x0720).Motion=00000 (no)
[00000 | 00000 | 00000 | 00000 | 00000].Alarm=0000 (no) .IO
=000.RtTrigger=0...Basler..BIP2-1280c..1970:01:05 23:08:10.8
98286......JFIF.................................. ....&"((
This header recurs throughout the file (followed by a lot of bytes of binary data). This is actually okay, since I read in the camera manual that every MJPEG picture gets this header.
More interesting is the JFIF in the last line. As suggested in the answers, this may be the indicator of the file format. But AFAIK JFIF is a single-picture format just like JPG. So does this mean that the whole video file is just a "brainless" chain of pictures, and that my player simply assumes it should show these pictures one after another, without any knowledge of the frame rate?
There is not a single format to use with MJPEG. From Wikipedia:
[...] there is no document that defines a single exact format that is universally recognized as a complete specification of “Motion JPEG” for use in all contexts.
The formats differ by vendor. My advice would be to closely inspect the file you download. Check if it is really an AVI container. (Some cameras can send the frames wrapped in a MIME container).
After the container format is clear, you can check out the documentation of that container and look for a file which has that format and the desired fps. Then you can start adjusting your downloaded file to have the desired effect.
You might also find this project useful: http://mjpeg.sourceforge.net/
Edit:
According to your sample data, your camera sends the frames packed into a MIME container. (The first line is the boundary, then the headers until you encounter an empty line, then the file data itself, followed by the boundary, and so on.)
These are JPEG files as the header suggests: image/jpeg. JFIF is the standard file format to store JPEG data.
I recommend you to:
Extract the contents of the file into multiple jpeg files (with munpack for instance), then
use ffmpeg or mplayer to create a movie file out of the series of jpegs.
This way you can specify the desired frame rate too.
It can make things more complicated if the camera dynamically changes the AOI (area of interest), meaning it can send only a smaller part of the image where a change occurred. But you should check first whether the simple approach works (a rough C++ sketch of the splitting step follows below).
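To illustrate the first step (splitting the multipart dump into individual JPEGs) directly in C++, here is a rough sketch. It assumes the layout shown in your hex dump: a boundary line, headers with a Content-Length giving the size of the JPEG payload, a blank CRLF line, then the image bytes; the input file name is a placeholder.

```c++
// Sketch: split a downloaded "--myboundary" MJPEG dump into numbered JPEG files,
// using each part's Content-Length header to find the image payload.
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

int main() {
    std::ifstream in("stream.dump", std::ios::binary);        // placeholder input name
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    const std::string lenKey = "Content-Length: ";
    std::size_t pos = 0;
    int frame = 0;
    while ((pos = data.find(lenKey, pos)) != std::string::npos) {
        std::size_t eol = data.find("\r\n", pos);
        if (eol == std::string::npos) break;
        long length = std::stol(data.substr(pos + lenKey.size(), eol - pos - lenKey.size()));

        std::size_t body = data.find("\r\n\r\n", eol);         // blank line ends the headers
        if (body == std::string::npos) break;
        body += 4;

        char name[32];
        std::snprintf(name, sizeof(name), "frame_%05d.jpg", frame++);
        std::ofstream out(name, std::ios::binary);
        out.write(data.data() + body, length);

        pos = body + length;                                   // continue after this part
    }
    std::printf("wrote %d frames\n", frame);
    return 0;
}
```

The resulting frame_00000.jpg, frame_00001.jpg, ... can then be turned into a movie at the real frame rate with ffmpeg or mplayer, as suggested above.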
On un*x systems (Linux, OS X, ...) you can use the file command-line tool to make a (usually good) guess about the file format.
--myboundary is an indication that the stream is regular M-JPEG streamed as multipart content over HTTP. There is no well-known file format which can hold this stream "as is" and be playable (that is, if you rename it to .avi it is not supposed to play back).
The format itself is a sequence of (boundary, subheader, JPEG image), (boundary, subheader, JPEG image), ... etc. The stream does not have time stamps, so playback speed completely depends on the player.

c++ audio conversion ( mp3 -> ogg ) question

I was wondering if anyone knew how to convert an MP3 audio file to an Ogg audio file. I know there are programs you can buy online, but I would rather just have my own little app that allows me to convert as many files as I want.
It's relatively simple. I wouldn't use the Windows Media Format SDK, simply because it's overkill for the job.
You need an MP3 decoder and an Ogg encoder and a little bit of glue code around that (opening files, setting up the codecs, piping raw audio data around, etc.).
For the MP3 decoder I suggest you take a look at the LAME library, or use this decoding lib http://www.codeproject.com/KB/audio-video/madlldlib.aspx as a starting point.
For Ogg there aren't many choices: you need libogg and libvorbis. Easy as that. The example code that comes with those libs shows you how to do the encoding.
Good luck.
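To give an idea of the encoding side, here is a condensed sketch of the libvorbis/libogg setup, following the pattern of the encoder_example.c that ships with libvorbis. It encodes about one second of silent PCM (a stand-in for whatever your MP3 decoder produces) to out.ogg; the sample rate, channel count and the 0.4 VBR quality are illustrative values.

```c++
// Sketch: Ogg Vorbis encoding of raw float PCM, condensed from encoder_example.c.
#include <cstdio>
#include <vorbis/vorbisenc.h>

static void write_page(std::FILE* f, const ogg_page& og) {
    std::fwrite(og.header, 1, og.header_len, f);
    std::fwrite(og.body, 1, og.body_len, f);
}

int main() {
    const int channels = 2;
    const long rate = 44100;
    const int block = 1024;
    const int blocks = (int)(rate / block);          // roughly one second of audio

    vorbis_info vi;
    vorbis_info_init(&vi);
    if (vorbis_encode_init_vbr(&vi, channels, rate, 0.4f) != 0) return 1;

    vorbis_comment vc;
    vorbis_comment_init(&vc);
    vorbis_comment_add_tag(&vc, "ENCODER", "mp3-to-ogg sketch");

    vorbis_dsp_state vd;
    vorbis_analysis_init(&vd, &vi);
    vorbis_block vb;
    vorbis_block_init(&vd, &vb);

    ogg_stream_state os;
    ogg_stream_init(&os, 12345 /* stream serial number */);

    std::FILE* out = std::fopen("out.ogg", "wb");

    // The three Vorbis header packets must be written before any audio.
    ogg_packet h_main, h_comments, h_codebooks;
    vorbis_analysis_headerout(&vd, &vc, &h_main, &h_comments, &h_codebooks);
    ogg_stream_packetin(&os, &h_main);
    ogg_stream_packetin(&os, &h_comments);
    ogg_stream_packetin(&os, &h_codebooks);
    ogg_page og;
    while (ogg_stream_flush(&os, &og)) write_page(out, og);

    for (int b = 0; b <= blocks; ++b) {
        if (b < blocks) {
            // In a real converter, copy the decoded MP3 samples (floats in [-1, 1])
            // here; this sketch just submits silence.
            float** buf = vorbis_analysis_buffer(&vd, block);
            for (int c = 0; c < channels; ++c)
                for (int i = 0; i < block; ++i) buf[c][i] = 0.0f;
            vorbis_analysis_wrote(&vd, block);
        } else {
            vorbis_analysis_wrote(&vd, 0);           // signal end of stream
        }

        while (vorbis_analysis_blockout(&vd, &vb) == 1) {
            vorbis_analysis(&vb, nullptr);
            vorbis_bitrate_addblock(&vb);
            ogg_packet op;
            while (vorbis_bitrate_flushpacket(&vd, &op) == 1) {
                ogg_stream_packetin(&os, &op);
                while (ogg_stream_pageout(&os, &og)) write_page(out, og);
            }
        }
    }
    while (ogg_stream_flush(&os, &og)) write_page(out, og);   // remaining pages

    ogg_stream_clear(&os);
    vorbis_block_clear(&vb);
    vorbis_dsp_clear(&vd);
    vorbis_comment_clear(&vc);
    vorbis_info_clear(&vi);
    std::fclose(out);
    return 0;
}
```

Feeding it with real data means replacing the silence loop with the PCM your MP3 decoder produces, converted to floats, which is the "glue code" mentioned above.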
It's a bad idea. To quote from the Vorbis FAQ:
You can convert any audio format to Ogg Vorbis. However, converting from one lossy format, like MP3, to another lossy format, like Vorbis, is generally a bad idea. Both MP3 and Vorbis encoders achieve high compression ratios by throwing away parts of the audio waveform that you probably won't hear. However, the MP3 and Vorbis codecs are very different, so they each will throw away different parts of the audio, although there certainly is some overlap. Converting a MP3 to Vorbis involves decoding the MP3 file back to an uncompressed format, like WAV, and recompressing it using the Ogg Vorbis encoder. The decoded MP3 will be missing the parts of the original audio that the MP3 encoder chose to discard. The Ogg Vorbis encoder will then discard other audio components when it compresses the data. At best, the result will be an Ogg file that sounds the same as your original MP3, but it is most likely that the resulting file will sound worse than your original MP3. In no case will you get a file that sounds better than the original MP3.
Since many music players can play both MP3 and Ogg files, there is no reason that you should have to switch all of your files to one format or the other. If you like Ogg Vorbis, then we would encourage you to use it when you encode from original, lossless audio sources (like CDs). When encoding from originals, you will find that you can make Ogg files that are smaller or of better quality (or both) than your MP3s.
(If you absolutely must convert from MP3 to Ogg, there are several conversion scripts available on Freshmeat.)
http://www.vorbis.com/faq/#transcode
And, for the sake of accuracy, from the same FAQ:
Ogg: Ogg is the name of Xiph.org's container format for audio, video, and metadata.
Vorbis: Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg. Note that other formats are capable of being embedded in Ogg, such as FLAC and Speex.
I imagine it's theoretically possible to embed MP3 in Ogg, though I'm not sure why anyone would want to. FLAC is a lossless audio codec. Speex is a very lossy audio codec optimised for encoding speech. Vorbis is a general-use lossy audio codec. "Ogg audio" is, therefore, a bit of a misnomer. Ogg Vorbis is the proper term for what I imagine you mean.
All that said, if you still want to convert from MP3 to Ogg Vorbis, you could (a) try the Freshmeat link above, (b) look at the other answers, or (c) look at FFmpeg. FFmpeg is a general-purpose library for converting lots of video and audio codecs and formats. It can do a lot of clever stuff. I have heard that its default Vorbis encoder is poor quality, but it can be configured to use libvorbis instead of its inbuilt Vorbis encoder. (That last sentence may be out of date now. I don't know.)
Note that FFmpeg will be using LAME and libvorbis, just as you already are. It won't do anything new for you that way. It just gives you the option to do all sorts of other conversions too.
Foobar2000 (http://www.foobar2000.org/) is free and makes it quite easy to convert between file formats. It would take only a few clicks to convert from MP3 to OGG.
Keep in mind that moving from a lossy format to a lossy format will reduce the quality of the audio more than moving from a lossless format (FLAC, CD Audio, Apple Lossless Codec) to a lossy format (MP3, OGG, M4A). If you have access to the lossless source audio use that to convert it instead.
You will need to decode the MP3 and then encode it into Ogg.
One possibility is to use LAME for MP3 decoding and libogg/libvorbis for encoding into Ogg, or just use the command-line versions of those tools.
But I wouldn't say converting from one lossy format to another is a great idea.
You can certainly do this in C++ with the Windows Media Format SDK.
I have only used WMFSDK9 myself. It contains a sample called UncompAVIToWMV, which may get you started. From the Readme:
It shows how to merge samples for audio and video streams from several AVI files and either merge these into similar streams or create a new stream based on the source stream profile. It also shows how to create an arbitrary stream, do multipass encoding and add SMPTE time codes.