c++ audio conversion ( mp3 -> ogg ) question - c++

I was wondering if anyone knows how to convert an MP3 audio file to an Ogg audio file. I know there are programs you can buy online, but I would rather have my own little app that lets me convert as many files as I want.

It's relatively simple. I wouldn't use the Windows Media Format SDK, simply because it's overkill for the job.
You need an MP3 decoder and an Ogg Vorbis encoder, plus a little glue code around them (opening files, setting up the codecs, piping raw audio data around, etc.).
For the MP3 decoder, I suggest you take a look at the liblame library or use this decoding lib http://www.codeproject.com/KB/audio-video/madlldlib.aspx as a starting point.
For Ogg there aren't many choices. You need libogg and libvorbis. Easy as that. The example code that comes with the libraries shows you how to do the encoding.
Good luck.
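The "piping raw audio data around" part of the glue code can be sketched as below. This is a minimal sketch, assuming the MP3 decoder hands back interleaved signed 16-bit PCM (not every decoder does); libvorbis takes one float buffer per channel via vorbis_analysis_buffer(). The function name deinterleave_s16 is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved signed 16-bit PCM (L R L R ...) into one float buffer
// per channel, scaled to [-1.0, 1.0). This is the per-channel float layout
// that libvorbis fills through vorbis_analysis_buffer().
// Assumption: the decoder output is interleaved 16-bit samples.
std::vector<std::vector<float>> deinterleave_s16(
    const std::vector<std::int16_t>& pcm, std::size_t channels) {
    std::vector<std::vector<float>> planar(channels);
    if (channels == 0) return planar;
    const std::size_t frames = pcm.size() / channels;  // a trailing partial frame is ignored
    for (std::size_t c = 0; c < channels; ++c) {
        planar[c].resize(frames);
        for (std::size_t i = 0; i < frames; ++i) {
            planar[c][i] = pcm[i * channels + c] / 32768.0f;
        }
    }
    return planar;
}
```

In a real converter each planar[c] would be copied into the matching buffer returned by vorbis_analysis_buffer() before calling vorbis_analysis_wrote().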

It's a bad idea. To quote from the Vorbis FAQ:
You can convert any audio format to Ogg Vorbis. However, converting from one lossy format, like MP3, to another lossy format, like Vorbis, is generally a bad idea. Both MP3 and Vorbis encoders achieve high compression ratios by throwing away parts of the audio waveform that you probably won't hear. However, the MP3 and Vorbis codecs are very different, so they each will throw away different parts of the audio, although there certainly is some overlap. Converting a MP3 to Vorbis involves decoding the MP3 file back to an uncompressed format, like WAV, and recompressing it using the Ogg Vorbis encoder. The decoded MP3 will be missing the parts of the original audio that the MP3 encoder chose to discard. The Ogg Vorbis encoder will then discard other audio components when it compresses the data. At best, the result will be an Ogg file that sounds the same as your original MP3, but it is most likely that the resulting file will sound worse than your original MP3. In no case will you get a file that sounds better than the original MP3.
Since many music players can play both MP3 and Ogg files, there is no reason that you should have to switch all of your files to one format or the other. If you like Ogg Vorbis, then we would encourage you to use it when you encode from original, lossless audio sources (like CDs). When encoding from originals, you will find that you can make Ogg files that are smaller or of better quality (or both) than your MP3s.
(If you must absolutely must convert from MP3 to Ogg, there are several conversion scripts available on Freshmeat.)
http://www.vorbis.com/faq/#transcode
And, for the sake of accuracy, from the same FAQ:
Ogg: Ogg is the name of Xiph.org's container format for audio, video, and metadata.
Vorbis: Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg. Note that other formats are capable of being embedded in Ogg such as FLAC and Speex.
I imagine it's theoretically possible to embed MP3 in Ogg, though I'm not sure why anyone would want to. FLAC is a lossless audio codec. Speex is a very lossy audio codec optimised for encoding speech. Vorbis is a general-use lossy audio codec. "Ogg audio" is, therefore, a bit of a misnomer. Ogg Vorbis is the proper term for what I imagine you mean.
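The container-versus-codec distinction is visible in the bytes of a file. The sketch below looks at the first Ogg page and reports which codec's identification header it carries. It is a minimal illustration rather than a full Ogg parser, the function name ogg_codec is made up, and it assumes the caller has read at least the first page into memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Inspect the first Ogg page of a stream and name the codec it carries.
std::string ogg_codec(const std::vector<std::uint8_t>& page) {
    // An Ogg page header is 27 bytes and begins with the capture pattern "OggS".
    if (page.size() < 27 || std::memcmp(page.data(), "OggS", 4) != 0) return "not Ogg";
    const std::size_t segments = page[26];   // number of entries in the segment table
    const std::size_t body = 27 + segments;  // payload starts right after that table
    auto starts_with = [&](const char* magic, std::size_t n) {
        return page.size() >= body + n &&
               std::memcmp(page.data() + body, magic, n) == 0;
    };
    if (starts_with("\x01vorbis", 7)) return "Vorbis";
    if (starts_with("OpusHead", 8)) return "Opus";
    if (starts_with("\x7f" "FLAC", 5)) return "FLAC";
    if (starts_with("Speex   ", 8)) return "Speex";
    return "unknown";
}
```

A file that reports "Vorbis" here is what people usually mean by "an Ogg file"; the same container check passes for Opus, FLAC, and Speex streams too.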
All that said, if you still want to convert from MP3 to Ogg Vorbis, you could (a) try the Freshmeat link above, (b) look at the other answers, or (c) look at FFmpeg. FFmpeg is a general-purpose library for converting lots of video and audio codecs and formats. It can do a lot of clever stuff. I have heard that its default Vorbis encoder is poor quality, but it can be configured to use libvorbis instead of its inbuilt Vorbis encoder. (That last sentence may be out of date now. I don't know.)
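Note that FFmpeg performs the same decode-then-re-encode step as the other answers describe: it decodes the MP3 with its own built-in decoder and, when configured for it, encodes with libvorbis. It won't do anything new for you in terms of quality. It just gives you the option to do all sorts of other conversions too.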
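If you go the FFmpeg route from your own C++ app, the simplest approach is probably to spawn the command-line tool rather than link the libraries. The sketch below only builds the argument list; it assumes an ffmpeg binary on the PATH that was built with libvorbis support, and the function name is made up for illustration. Launching the process (CreateProcess, posix_spawn, and so on) is left out because it is platform specific.

```cpp
#include <string>
#include <vector>

// Build the argument vector for:
//   ffmpeg -i <input> -c:a libvorbis -q:a <quality> <output>
// "-c:a libvorbis" selects the libvorbis encoder instead of FFmpeg's
// built-in one; "-q:a" sets the variable-bitrate quality level.
std::vector<std::string> ffmpeg_mp3_to_vorbis_args(const std::string& input,
                                                   const std::string& output,
                                                   int quality) {
    return {"ffmpeg", "-i", input,
            "-c:a", "libvorbis",
            "-q:a", std::to_string(quality),
            output};
}
```

Passing the arguments as a vector rather than one concatenated string avoids quoting problems with file names that contain spaces.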

Foobar2000 (http://www.foobar2000.org/) is free and makes it quite easy to convert between file formats. It would take only a few clicks to convert from MP3 to OGG.
Keep in mind that moving from a lossy format to a lossy format will reduce the quality of the audio more than moving from a lossless format (FLAC, CD Audio, Apple Lossless Codec) to a lossy format (MP3, OGG, M4A). If you have access to the lossless source audio use that to convert it instead.

You will need to decode mp3 then encode into ogg.
One possibility is to use liblame for mp3 decoding and libogg/libvorbis for encoding into ogg. Or just use the command line versions of those.
But I wouldn't say converting from one lossy format to another is a great idea.

You can certainly do this in C++ with the Windows Media Format SDK.
I have only used WMFSDK9 myself. It contains a sample called UncompAVIToWMV, which may get you started. From the Readme:
It shows how to merge samples for audio and video streams from several AVI files and either merge these into similar streams or create a new stream based on the source stream profile. It also shows how to create an arbitrary stream, do multipass encoding and add SMPTE time codes.

Related

How to convert wav to mp3 and mp3 to wav while keeping the same size

I cannot find out how to convert a WAV to MP3 and MP3 back to WAV. Does anyone know how to convert a .wav file into a .mp3 or .ogg and later convert it back into .wav so that it matches the untouched original 100% in size (if it can be done on the command line, so much the better)? I tried LAME and then converted back to .wav with some tools, but the file wouldn't match byte for byte as if it had never been touched. Does anyone know a SoX or FFmpeg command line that can help me? Thanks!
Most WAV files are raw PCM. MP3 is MP3. And, most Ogg files are going to contain Vorbis or Opus.
MP3, Vorbis, and Opus are all lossy codecs. They work by taking advantage of what we hear and what we don't hear (psychoacoustics and all that) to save bandwidth. It's a tradeoff between bandwidth and audio quality.
You cannot use the output of a lossy codec to get back to the original source. Therefore, you definitely can't expect to binary compare the outputs and get them to be the same.
You also can't even get the same file size really without knowing more about the source. For instance, the input of your MP3 codec might have been 24-bit audio, but the output of the receiving codec is almost always going to be configured for 16-bit. Also, it's common for these lossy codecs to not be sample-accurate. MP3 in particular has a problem with this. Read up on "gapless playback" if you're in doubt.
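The size argument comes down to simple arithmetic, sketched below. The helper names are made up for illustration. The only format fact relied on is that MPEG-1 Layer III stores audio in frames of 1152 samples; real encoders and decoders also add delay and padding that this sketch does not model.

```cpp
#include <cstdint>

// Bytes of raw PCM for a given number of sample frames
// (one sample frame = one sample for every channel).
constexpr std::uint64_t pcm_bytes(std::uint64_t frames, unsigned channels,
                                  unsigned bits_per_sample) {
    return frames * channels * (bits_per_sample / 8);
}

// MPEG-1 Layer III stores audio in frames of 1152 samples, so a decoder that
// does not trim the padding returns a sample count rounded up to a multiple
// of 1152 (at least; encoder delay is ignored here).
constexpr std::uint64_t round_up_to_mp3_frames(std::uint64_t frames) {
    return (frames + 1151) / 1152 * 1152;
}
```

One second of 44.1 kHz stereo is 264600 bytes at 24-bit but 176400 bytes at 16-bit, and 44100 samples round up to 44928 once they have passed through whole MP3 frames, so neither the bit depth nor the length is guaranteed to survive the round trip.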

Speech to Text audio formats

Can we use MP3 audio file in speech to text Watson API ?
What are the popular unsupported formats for speech to text Watson API ?
I suggest you use the WAV format, since it is a popular one, though the best choice depends on your use case.
If you really need to use MP3, you can simply convert the MP3 to WAV.
The formats that Speech to Text supports are:
audio/flac: Free Lossless Audio Codec (FLAC), a lossless compressed audio coding format. For more information, see en.wikipedia.org/wiki/FLAC.
audio/l16: Linear 16-bit Pulse-Code Modulation (PCM), an uncompressed audio data format. Use this media type to pass a raw PCM file. Note that linear PCM audio can also reside inside a container Waveform Audio File Format (WAV) file. For more information, see the Internet Engineering Task Force (IETF) Request for Comment (RFC) 2586 and en.wikipedia.org/wiki/Pulse-code_modulation.
audio/wav: Waveform Audio File Format (WAV), a standard created by Microsoft® and IBM. A WAV file is a container that is often used for uncompressed audio bitstreams but can contain compressed audio, as well. For more information, see en.wikipedia.org/wiki/WAV.
The service supports WAV files that use any encoding. It accepts audio with a maximum of nine channels (due to an FFmpeg limitation).
audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis: Ogg is a free, open container format maintained by the Xiph.org Foundation; for more information, see www.xiph.org/ogg/.
Both codecs are free, open, lossy audio-compression formats. Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio.
audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis: Web Media (WebM) is an open media-file format; for more information, see webmproject.org. WebM supports audio streams compressed with the Opus and Vorbis audio codecs; Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio. For JavaScript code that shows how to capture audio from a microphone in a Chrome browser and encode it into a WebM data stream.
You can find all the formats, with more details, in the official Speech to Text documentation.
I suggest you edit your question to add more details and read the documentation; IBM's documentation is usually very objective and complete.
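A client can check a file against that list before uploading it. The sketch below is only a lookup table built from the formats quoted above; the function name and the extension-to-type mapping are made up for illustration, and audio/l16 is left out because raw PCM has no standard file extension.

```cpp
#include <map>
#include <string>

// Returns the Content-Type to send for a file extension, or an empty string
// when the list quoted above has no matching format (for example "mp3").
// Assumption: the extension is already lower case and has no leading dot.
std::string stt_content_type(const std::string& extension) {
    static const std::map<std::string, std::string> types = {
        {"flac", "audio/flac"},
        {"wav", "audio/wav"},
        {"ogg", "audio/ogg"},
        {"webm", "audio/webm"},
    };
    const auto it = types.find(extension);
    return it == types.end() ? std::string() : it->second;
}
```

An empty result is the cue to convert the file (for instance MP3 to WAV) before sending it.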
No MP3 support:
Watson Speech to Text audio formats
Don't struggle over choosing a particular audio format for speech-to-text conversion: most manual speech-to-text or transcription services accept all common formats. For automatic speech-to-text services, I always prefer WAV over MP3, since it keeps the audio data without lossy compression and is accepted by most speech engines. Here is the list of formats supported by one transcription company: https://www.transcriptionwave.com/format.html

Are WAV files better candidates for steganography than MP3? If so, why?

I read about the WAV file format and found many steganography projects based on it, but I didn't find nearly as many based on MP3, even though MP3 is more common on the web than WAV.
A WAV file typically holds uncompressed PCM audio behind a short header. You can change a few bits in the sample data without significantly affecting the audio; you will not break the file format, and a listener will not be able to tell the difference between the original file and the modified one.
The mp3 format is compressed audio. If you change bits in mp3, you run risks:
You modify a header and the audio no longer plays back
You modify the audio, and a listener can tell the file is weird. The audio is compressed, so changes in the audio data get magnified upon decompression.
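To make the WAV case concrete, here is a minimal sketch of the usual least-significant-bit approach, assuming the audio has already been loaded as 16-bit PCM samples (header parsing is left out). The function names are made up for illustration, and there is no encryption or error handling beyond a capacity check.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hide each bit of `message` in the least significant bit of one sample,
// most significant bit of each byte first. Every sample changes by at most 1.
// Returns false if the audio is too short to hold the message.
bool embed_lsb(std::vector<std::int16_t>& samples, const std::string& message) {
    if (samples.size() < message.size() * 8) return false;
    for (std::size_t i = 0; i < message.size() * 8; ++i) {
        const unsigned char byte = static_cast<unsigned char>(message[i / 8]);
        const int bit = (byte >> (7 - i % 8)) & 1;
        samples[i] = static_cast<std::int16_t>((samples[i] & ~1) | bit);
    }
    return true;
}

// Read `length` bytes back out of the sample LSBs.
std::string extract_lsb(const std::vector<std::int16_t>& samples, std::size_t length) {
    std::string message(length, '\0');
    for (std::size_t i = 0; i < length * 8 && i < samples.size(); ++i) {
        const int bit = samples[i] & 1;
        const unsigned char so_far = static_cast<unsigned char>(message[i / 8]);
        message[i / 8] = static_cast<char>((so_far << 1) | bit);
    }
    return message;
}
```

The same trick does not carry over to MP3: the stored bits are entropy-coded frequency data, not samples, so flipping one bit does not map to a tiny amplitude change.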

mp3 recognition using Sphinx 4

Can we use mp3 files for the voice recognition process without using wav files? or can we generate a wav file from a mp3 and then do the voice recognition without a serious impact on the accuracy? The problem is I need to minimize the load transferred through the network in my application. Will the information which is lost in the conversion be a huge factor for accuracy?
Can we use mp3 files for the voice recognition process without using wav files?
Not directly. To recognize MP3 streams, you need to use a Java library that reads MP3 and converts it to a PCM stream (tritonus-mp3, lameonj). You can also invoke ffmpeg as a separate process to do the decoding.
or can we generate a wav file from a mp3 and then do the voice recognition without a serious impact on the accuracy?
Accuracy is affected in both cases, no matter where you decode mp3 file.
The problem is I need to minimize the load transferred through the network in my application. Will the information which is lost in the conversion be a huge factor for accuracy?
It's better to use a lossless codec like FLAC for the transfer. MP3 conversion degrades ASR accuracy. Another approach would be to calculate the features on the client and transfer them to the server.
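A rough back-of-the-envelope comparison shows why sending features can save bandwidth. The numbers used in the note below (16 kHz mono 16-bit input, 13 coefficients per 10 ms frame stored as 32-bit floats) are illustrative assumptions, not values taken from the question or from a particular Sphinx configuration, and the helper names are made up.

```cpp
// Bytes per second of raw PCM audio.
constexpr unsigned pcm_bytes_per_second(unsigned sample_rate, unsigned channels,
                                        unsigned bits_per_sample) {
    return sample_rate * channels * (bits_per_sample / 8);
}

// Bytes per second of a feature stream: one vector of `coefficients`
// values, each `bytes_per_coefficient` wide, for every analysis frame.
constexpr unsigned feature_bytes_per_second(unsigned frames_per_second,
                                            unsigned coefficients,
                                            unsigned bytes_per_coefficient) {
    return frames_per_second * coefficients * bytes_per_coefficient;
}
```

Under those assumptions, pcm_bytes_per_second(16000, 1, 16) is 32000 while feature_bytes_per_second(100, 13, 4) is 5200, roughly a sixth of the raw audio, before any further compression.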

Creating web-browser playable webm files with vp8 SDK?

I'm using the vp8 SDK (www.webmproject.org) to create a vp8-encoded video file. However, the SDK sample produces an IVF file, which the browser doesn't play.
I know the webm format is a matroska container so I guess I should store the video data in that format, but the mkv format specification is lengthy and complex and I don't think I should reinvent the wheel by figuring it out by myself.
So I would like to know if someone can recommend a sample of how to encode and produce a playable webm vp8 file.
If there is no such sample (as my searches on google suggest) at least point me to a simple and usable matroska lib which is proven to work for the browsers.
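The difference between the two containers shows up in the first four bytes of the file, which gives a quick way to check what an encoder actually wrote. This is a minimal sketch with a made-up function name; it only inspects the signature and does not validate the rest of the file.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Identify a video file by its leading bytes.
// IVF files start with the ASCII signature "DKIF"; Matroska and WebM files
// start with the EBML header ID 0x1A 0x45 0xDF 0xA3.
std::string sniff_video_container(const std::vector<std::uint8_t>& head) {
    static const std::uint8_t ebml[4] = {0x1A, 0x45, 0xDF, 0xA3};
    if (head.size() >= 4 && std::memcmp(head.data(), "DKIF", 4) == 0) return "IVF";
    if (head.size() >= 4 && std::memcmp(head.data(), ebml, 4) == 0) return "Matroska/WebM";
    return "unknown";
}
```

A file reported as "IVF" still needs its VP8 frames remuxed into a WebM container before a browser will play it.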