Can we use mp3 files for the voice recognition process without using wav files? Or can we generate a wav file from an mp3 and then do the voice recognition without a serious impact on the accuracy? The problem is that I need to minimize the load transferred over the network in my application. Will the information that is lost in the conversion be a huge factor for accuracy?
Can we use mp3 files for the voice recognition process without using
wav files?
Not directly. To recognize mp3 streams, you need to use a Java library to read the mp3 and convert it to a PCM stream (tritonus-mp3, lameonj). You can also invoke ffmpeg as a separate process to do the decoding.
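As a rough illustration of the "invoke ffmpeg as a separate process" option, here is a minimal sketch. It is written in Python rather than Java purely for brevity, it assumes ffmpeg is on the PATH, and the 16 kHz mono 16-bit output is an assumption about what the recognizer expects, not something stated above. The function names are made up for this example.

```python
import subprocess

def ffmpeg_decode_command(mp3_path, sample_rate=16000):
    """Build an ffmpeg command that decodes an mp3 to raw mono 16-bit PCM on stdout."""
    return [
        "ffmpeg",
        "-i", mp3_path,            # input file
        "-f", "s16le",             # raw container: signed 16-bit little-endian
        "-acodec", "pcm_s16le",    # uncompressed PCM samples
        "-ac", "1",                # mono (assumed recognizer input)
        "-ar", str(sample_rate),   # resample to the assumed recognizer rate
        "-",                       # write the result to stdout
    ]

def decode_mp3_to_pcm(mp3_path, sample_rate=16000):
    """Run ffmpeg as a separate process and return the decoded PCM bytes."""
    result = subprocess.run(
        ffmpeg_decode_command(mp3_path, sample_rate),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,                # raise if ffmpeg exits with an error
    )
    return result.stdout
```

The returned bytes can then be wrapped in whatever audio stream type the recognizer accepts.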
or can we generate a wav file from a mp3 and then do the voice recognition without a serious impact on the accuracy?
Accuracy is affected in both cases, no matter where you decode the mp3 file, because the information is already lost when the audio is encoded to mp3.
The problem is I need to minimize the load transferred through the
network in my application. Will the information which is lost in the
conversion be a huge factor for accuracy?
It's better to use a lossless codec like FLAC for the transfer, since mp3 conversion degrades ASR accuracy. Another approach would be to calculate the features on the client and transfer them to the server.
Related
I read about the wav file format and found many steganography projects based on it, but I didn't find nearly as many projects based on mp3, even though mp3 is found more frequently on the web than wav.
The wav format typically holds uncompressed PCM audio behind a small, simple header. You can change the low-order bits of a few samples without significantly affecting the audio; as long as you leave the header alone you will not break the file format, and a listener will not be able to tell the difference between the original file and the modified one.
The mp3 format is compressed audio. If you change bits in mp3, you run risks:
You modify a header and the audio no longer plays back
You modify the audio, and a listener can tell the file is weird. The audio is compressed, so changes in the audio data get magnified upon decompression.
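To make the wav case concrete, here is a minimal sketch of least-significant-bit embedding on raw 16-bit little-endian PCM sample data (the bytes after the wav header). The function names are made up for this example, and real steganography tools do considerably more (spreading, encryption, capacity headers); this only shows why flipping low bits is harmless in uncompressed audio.

```python
def embed_bits(pcm, message):
    """Hide message bytes in the least significant bit of successive 16-bit samples."""
    # Expand the message into individual bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) * 2 > len(pcm):
        raise ValueError("message too long for this audio")
    out = bytearray(pcm)
    for n, bit in enumerate(bits):
        # Little-endian 16-bit samples: the low byte of sample n sits at offset 2*n.
        out[2 * n] = (out[2 * n] & 0xFE) | bit
    return bytes(out)

def extract_bits(pcm, length):
    """Recover `length` bytes previously hidden by embed_bits."""
    message = bytearray()
    for i in range(length):
        byte = 0
        for j in range(8):
            byte = (byte << 1) | (pcm[2 * (8 * i + j)] & 1)
        message.append(byte)
    return bytes(message)
```

Each sample changes by at most one quantization step out of 65,536, which is why the modification is inaudible. Doing the same thing to the bytes of an mp3 stream would corrupt compressed data rather than nudge a sample value.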
How can I code a speech recognition engine (using the Microsoft Speech SDK) to "listen" to a video file and save the recognized text to a file?
This is very similar to this question and has a very similar answer. You need to separate out the audio portion, convert it to WAV format, and send it to an inproc recognizer.
However, it has the same problems that I described before (requires training, assumes a single voice, and assumes the microphone is close to the speaker). If that's the case, then you can likely get reasonably good results. If that's not the case (i.e., you're trying to transcribe a TV show, or worse, some sort of camcorder audio), then the results will likely be unsatisfactory.
I was wondering if anyone knew how to convert an mp3 audio file to an ogg audio file. I know there are programs you can buy online, but I would rather just have my own little app that allowed me to convert as many files as I wanted.
It's relatively simple. I wouldn't use the Windows Media Format SDK, simply because it's overkill for the job.
You need an MP3 decoder, an OGG encoder, and a little bit of glue code around them (opening files, setting up the codecs, piping raw audio data around, etc.).
For the MP3 decoder I suggest that you take a look at the liblame library or use this decoding lib http://www.codeproject.com/KB/audio-video/madlldlib.aspx as a starting point.
For OGG there aren't many choices. You need libogg and libvorbis. Easy as that. The example codes that come with the libs show you how to do the encoding.
Good luck.
It's a bad idea. To quote from the Vorbis FAQ
You can convert any audio format to
Ogg Vorbis. However, converting from
one lossy format, like MP3, to another
lossy format, like Vorbis, is
generally a bad idea. Both MP3 and
Vorbis encoders achieve high
compression ratios by throwing away
parts of the audio waveform that you
probably won't hear. However, the MP3
and Vorbis codecs are very different,
so they each will throw away different
parts of the audio, although there
certainly is some overlap. Converting
a MP3 to Vorbis involves decoding the
MP3 file back to an uncompressed
format, like WAV, and recompressing it
using the Ogg Vorbis encoder. The
decoded MP3 will be missing the parts
of the original audio that the MP3
encoder chose to discard. The Ogg
Vorbis encoder will then discard other
audio components when it compresses
the data. At best, the result will be
an Ogg file that sounds the same as
your original MP3, but it is most
likely that the resulting file will
sound worse than your original MP3. In
no case will you get a file that
sounds better than the original MP3.
Since many music players can play both
MP3 and Ogg files, there is no reason
that you should have to switch all of
your files to one format or the other.
If you like Ogg Vorbis, then we would
encourage you to use it when you
encode from original, lossless audio
sources (like CDs). When encoding from
originals, you will find that you can
make Ogg files that are smaller or of
better quality (or both) than your
MP3s.
(If you absolutely must convert
from MP3 to Ogg, there are several
conversion scripts available on
Freshmeat.)
http://www.vorbis.com/faq/#transcode
And, for the sake of accuracy, from the same FAQ:
Ogg: Ogg is the name of Xiph.org's container format for audio, video, and metadata.
Vorbis: Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg. Note that other formats are capable of being embedded in Ogg such as FLAC and Speex.
I imagine it's theoretically possible to embed MP3 in Ogg, though I'm not sure why anyone would want to. FLAC is a lossless audio codec. Speex is a very lossy audio codec optimised for encoding speech. Vorbis is a general-use lossy audio codec. "Ogg audio" is, therefore, a bit of a misnomer. Ogg Vorbis is the proper term for what I imagine you mean.
All that said, if you still want to convert from MP3 to Ogg Vorbis, you could (a) try the Freshmeat link above, (b) look at the other answers, or (c) look at FFmpeg. FFmpeg is a general-purpose library for converting lots of video and audio codecs and formats. It can do a lot of clever stuff. I have heard that its default Vorbis encoder is poor quality, but it can be configured to use libvorbis instead of its inbuilt Vorbis encoder. (That last sentence may be out of date now. I don't know.)
Note that FFmpeg will be using LAME and libvorbis, just as you already are. It won't do anything new for you that way. It just gives you the option to do all sorts of other conversions too.
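If you go the FFmpeg route, a small script can handle a whole folder. The sketch below is in Python and assumes an ffmpeg build on the PATH that includes libvorbis; the function names and the default quality of 5 are choices made for this example, not recommendations from the answers above.

```python
from pathlib import Path
import subprocess

def vorbis_command(src, quality=5):
    """Build an ffmpeg command that transcodes one file to Ogg Vorbis next to the source."""
    src = Path(src)
    dst = src.with_suffix(".ogg")
    return [
        "ffmpeg",
        "-i", str(src),
        "-vn",                   # ignore embedded cover art / video streams
        "-c:a", "libvorbis",     # use libvorbis rather than the built-in encoder
        "-q:a", str(quality),    # variable-bitrate quality level
        str(dst),
    ]

def convert_all(folder, quality=5):
    """Transcode every .mp3 in `folder`, one ffmpeg process per file."""
    for mp3 in sorted(Path(folder).glob("*.mp3")):
        subprocess.run(vorbis_command(mp3, quality), check=True)
```

The lossy-to-lossy caveat still applies: this only automates the conversion, it does not recover anything the MP3 encoder discarded.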
Foobar2000 (http://www.foobar2000.org/) is free and makes it quite easy to convert between file formats. It would take only a few clicks to convert from MP3 to OGG.
Keep in mind that moving from a lossy format to a lossy format will reduce the quality of the audio more than moving from a lossless format (FLAC, CD Audio, Apple Lossless Codec) to a lossy format (MP3, OGG, M4A). If you have access to the lossless source audio use that to convert it instead.
You will need to decode mp3 then encode into ogg.
One possibility is to use liblame for mp3 decoding and libogg/libvorbis for encoding into ogg. Or just use the command line versions of those.
But I wouldn't say converting from one lossy format to another is a great idea.
You can certainly do this in C++ with the Windows Media Format SDK.
I have only used WMFSDK9 myself. It contains a sample called UncompAVIToWMV, which may get you started. From the Readme:
It shows how to merge samples for
audio and video streams from several
AVI files and either merge these into
similar streams or create a new stream
based on the source stream profile.
It also shows how to create an
arbitrary stream, do multipass
encoding and add SMPTE time codes.
Duplicate: audio and video file compressor
I would like to compress a wmv 2mb or larger file to 3gp 250kb file for mobile devices.
Are there any great compressors for video or audio?
I'm a huge fan of ffmpeg. Find out what codec and resolution your mobile device wants. If you're lucky, H.264 will be supported.
You might have some trouble here. WMV is a container, not a codec, so we can't tell specifically the level of compression we're dealing with and what needs to be changed where, but it may be difficult to get such a dramatic reduction in filesize without making huge compromises, like decreasing the resolution of the video by several orders of magnitude. These compromises may be acceptable for mobile viewing, but there's no guarantee you'll be able to get that filesize down, especially if your file is encoded in a modern codec like H.264 or VC-1.
My first piece of advice is to attempt to locate a good wizard-like transcoder, with a nice non-developer interface on it, etc. Video compression is intense work, and the power tools behind it, and the tools that these wizard-like applications use to actually perform their work, are very complex and take lots of practice and tweaking to get right, and are usually restricted to commandlines. If your mobile device's vendor provides these utilities, for instance, you'll be much better off using them.
If you aren't able to locate such a utility, godspeed and spend lots of time with mencoder and ffmpeg's man pages and IRC rooms. It's not difficult per se, it just takes a lot of study and reading to get acceptable output, especially when you're going after the reductions you've mentioned. Good luck.
How do you programmatically compress a WAV file to another format (PCM, 11.025 kHz sampling rate, etc.)?
I'd look into audacity... I'm pretty sure they don't have a command line utility that can do it, but they may have a library...
Update:
It looks like they use libsndfile, which is released under the LGPL. I for one, would probably just try using that.
Use sox (Sound eXchange : universal sound sample translator) in Linux:
SoX is a command line program that can convert most popular audio files to most other popular audio file formats. It can optionally change the audio sample data type and apply one or more sound effects to the file during this translation.
If you mean how do you compress the PCM data to a different audio format then there are a variety of libraries you can use to do this, depending on the platform(s) that you want to support. If you just want to change the sample rate of the PCM data then you need a sample rate conversion algorithm instead, which is a completely different problem. Can you be more specific in your requirements?
You're asking about resampling, and more specifically downsampling, not compression. While both processes are lossy (meaning that you will suffer loss of information), downsampling works on raw samples instead of in the frequency domain.
If you are interested in doing compression, then you should look into lame or OGG vorbis libraries; you are no doubt familiar with MP3 and OGG technology, though I have a feeling from your question that you are interested in getting back a PCM file with a lower sampling rate.
In that case, you need a resampling library, of which there are a few possibilities. The most widely known is libsamplerate, which I honestly would not recommend because of quality issues, both in the generated audio files and in the stability of the library's code. The other non-commercial possibility is sox, as a few others have mentioned. Depending on the nature of your program, you can either exec sox as a separate process or call it from your own code by using it as a library. I personally have not tried the library approach, but I'm working on a product now where we use sox (for upsampling, actually), and we're quite happy with the results.
The other option is to write your own sample rate conversion library, which can be a significant undertaking. However, if you are only interested in converting by an integer factor (i.e., from 44.1 kHz to 22.05 kHz, or from 44.1 kHz to 11.025 kHz), then it is actually fairly easy: low-pass filter the signal so that frequencies above the new Nyquist limit do not alias, and then keep only every Nth sample.
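A minimal sketch of integer-factor downsampling, in Python for brevity. It averages each group of N samples, which amounts to a moving-average low-pass filter followed by keeping one sample out of every N. A moving average is a crude filter and is used here only as an assumption to keep the example short; a production resampler would use a properly designed FIR filter. The function name is made up for this example.

```python
def decimate(samples, factor):
    """Downsample integer PCM samples by an integer factor.

    Each output sample is the average of `factor` consecutive input samples,
    i.e. a length-`factor` moving-average filter sampled once every `factor` inputs.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    # Step through the input one full window at a time; a trailing partial window is dropped.
    for start in range(0, len(samples) - factor + 1, factor):
        window = samples[start:start + factor]
        out.append(sum(window) // factor)
    return out
```

For example, calling decimate(samples, 4) on 44.1 kHz mono samples yields data at 11.025 kHz. Interleaved stereo would need to be split into channels first.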
In Windows, you can make use of the Audio Compression Manager to convert between files (the acm... functions). You will also need a working knowledge of the WAVEFORMAT structure, and WAV file formats. Unfortunately, to write all this yourself will take some time, which is why it may be a good idea to investigate some of the open source options suggested by others.
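To give a feel for the format knowledge involved, here is a small Python sketch that reads the "fmt " chunk of a WAV file, whose fields correspond to the leading members of the Windows WAVEFORMATEX structure. It is not ACM code; the helper names are made up for this example, and pcm_wav_bytes exists only to produce sample input.

```python
import io
import struct
import wave

def pcm_wav_bytes(rate, frames=10):
    """Create a tiny mono 16-bit PCM WAV file in memory (sample input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * frames)
    return buf.getvalue()

def read_wave_format(data):
    """Parse the 'fmt ' chunk of a RIFF/WAVE byte string into a dict."""
    riff, _size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        if chunk_id == b"fmt ":
            fields = struct.unpack_from("<HHIIHH", data, offset + 8)
            names = ("wFormatTag", "nChannels", "nSamplesPerSec",
                     "nAvgBytesPerSec", "nBlockAlign", "wBitsPerSample")
            return dict(zip(names, fields))
        # Chunks are padded to an even number of bytes.
        offset += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")
```

Calling read_wave_format(pcm_wav_bytes(11025)) returns a format tag of 1 (PCM), one channel, and a sample rate of 11025.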
I have written my own open source .NET audio library called NAudio that can convert WAV files from one format to another, making use of the ACM codecs that are installed on your machine. I know you have tagged this question with C++, but if .NET is acceptable then this may save you some time. Have a look at the NAudioDemo project for an example of converting files.