I am trying to play sound using QAudioOutput and a WAV file in "raw format". On each timer timeout (every 50 ms) I do the following:
QByteArray TempSBuffer;
short int *hi;
// 1. If the WAV data has reached its end, reset the position to the beginning
if((m_timerStepNum+1)*m_audioOutput->periodSize()>=m_soundBuffer.size()) {
m_timerStepNum=0;
}
// 2. Write the buffer data for the next timer cycle into a temporary QByteArray TempSBuffer
TempSBuffer=m_soundBuffer.mid(m_timerStepNum*m_audioOutput->periodSize(), m_audioOutput->periodSize());
// 3. Scale every sample by the current volume (the data is treated as 16-bit samples here)
hi=(short int *)TempSBuffer.data();
for(int i=0;i < m_audioOutput->periodSize() / 2;i++) { hi[i]*= m_audioOutput->volume(); }
// 4. Play the resulting buffer
m_ioDevice->write(TempSBuffer, m_audioOutput->periodSize());
m_timerStepNum++;
Everything plays OK, but when I change the volume in QAudioOutput to, for example, 0.2 (with my master volume at 100%), I get horrible noise. I should note that this happens with only one of my WAV files, which has this format:
bitsPerSample: 8
channels: 1
frequency: 16000
Other files play fine, as I said. Example formats of WAV files that play correctly:
bitsPerSample: 16
channels: 1
frequency: 22050
bitsPerSample: 16
channels: 2
frequency: 22050
bitsPerSample: 16
channels: 2
frequency: 22050
Well, according to the Final Notes of "The ABCs of PCM (Uncompressed) digital audio":
For some reason, WAV files don't support signed 8-bit format, so when reading and writing WAV files, be aware that 8-bits means unsigned, but in virtually all other cases it's safe to assume integers are signed.
For now I have worked around the problem by converting my raw WAV to 16-bit format.
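If converting the file is not an option, the unsigned 8-bit samples can also be scaled directly. The following is only a minimal sketch under that assumption; scaleUnsigned8 is a hypothetical helper, not part of Qt. Since silence in unsigned 8-bit PCM is 128 rather than 0, each sample is re-centred around zero before it is multiplied by the volume and shifted back afterwards.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scale unsigned 8-bit PCM in place.
// volume is expected in the range 0.0 .. 1.0, like QAudioOutput::volume().
void scaleUnsigned8(std::uint8_t* samples, std::size_t count, double volume)
{
    assert(volume >= 0.0 && volume <= 1.0);
    for (std::size_t i = 0; i < count; ++i) {
        // Re-centre: 0..255 becomes -128..127, with silence at 0.
        const int centred = static_cast<int>(samples[i]) - 128;
        // Scale around zero, then round to the nearest integer.
        const int scaled = static_cast<int>(std::lround(centred * volume));
        // Shift back to the unsigned range; with volume <= 1 this stays in 0..255.
        samples[i] = static_cast<std::uint8_t>(scaled + 128);
    }
}
```

In the timer handler above, this would take the place of the short int loop whenever the format reports 8 bits per sample, with count equal to the number of bytes in the period.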
I'm attempting to write a simple Windows Media Foundation command line tool that uses IMFSourceReader and IMFSinkWriter to load a video, read the video and audio as uncompressed streams, and re-encode them to H.264/AAC with some specific hard-coded settings.
The simple program Gist is here
sample video 1
sample video 2
sample video 3
(Note: the videos I've been testing with are all stereo, 48000 Hz sample rate)
The program works. However, in some cases when comparing the newly output video to the original in an editing program, I see that the copied video streams match, but the audio stream of the copy is prefixed with some amount of silence and the audio is offset, which is unacceptable in my situation.
audio samples:
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[silence] [silence] [silence] [audio1] [audio2] [audio3] ... etc
In cases like this the first video frames coming in have a non-zero timestamp, but the first audio frames have a timestamp of 0.
I would like to produce a copied video whose first frame in both the video and audio streams has a timestamp of 0. I first attempted to subtract that initial timestamp (videoOffset) from all subsequent video frames, which produced the video I wanted but resulted in this situation with the audio:
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[audio4] [audio5] [audio6] [audio7] [audio8] ... etc
The audio track is shifted now in the other direction by a small amount and still doesn't align. This can also happen sometimes when a video stream does have a starting timestamp of 0 yet WMF still cuts off some audio samples at the beginning anyway (see sample video 3)!
I've been able to fix this sync alignment, and to offset the video stream so that it starts at 0, with the following code inserted at the point where the audio sample data is passed to the IMFSinkWriter:
//inside read sample while loop
...
// LONGLONG llDuration is the duration of the sample just read
// DWORD audioOffset is the global audio offset, starts as 0
// LONGLONG audioFrameTimeStamp is the timestamp of the sample just read
// add some arbitrary amount of silence in blocks of 1024 samples
static bool runOnce{ false };
if (!runOnce)
{
size_t numberOfSilenceBlocks = 1; // how do I derive how many I need? It's arbitrary
size_t samples = 1024 * numberOfSilenceBlocks;
audioOffset = samples * 10000000 / audioSamplesPerSecond;
std::vector<uint8_t> silence(samples * audioChannels * bytesPerSample, 0);
WriteAudioBuffer(silence.data(), silence.size(), audioFrameTimeStamp, audioOffset);
runOnce= true;
}
LONGLONG audioTime = audioFrameTimeStamp + audioOffset;
WriteAudioBuffer(dataPtr, dataSize, audioTime, llDuration);
Oddly, this creates an output video file that matches the original.
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
The solution was to insert extra silence in blocks of 1024 samples at the beginning of the audio stream. It doesn't matter what audio chunk sizes IMFSourceReader provides; the padding is always a multiple of 1024 samples.
My problem is that there seems to be no detectable reason for the silence offset. Why do I need it? How do I know how much I need? I stumbled across the 1024-sample silence block solution after days of fighting this problem.
Some videos seem to only need 1 padding block, some need 2 or more, and some need no extra padding at all!
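As a worked example of the arithmetic in the snippet above, the duration of the padding in 100-nanosecond units (hns) can be computed as below. This is only a sketch; silenceDurationHns is a hypothetical name. It uses 64-bit integers throughout because the intermediate product 1024 * 10000000 no longer fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: duration, in 100-ns units (hns), of `blocks` blocks
// of 1024 samples each at the given sample rate.
std::int64_t silenceDurationHns(std::int64_t blocks, std::int64_t samplesPerSecond)
{
    assert(blocks >= 0 && samplesPerSecond > 0);
    const std::int64_t samples = 1024 * blocks;
    // 10,000,000 hns per second; integer division truncates the fraction.
    return samples * 10000000 / samplesPerSecond;
}
```

At 48000 Hz, one block comes to 213333 hns (about 21.3 ms) and two blocks to 426666 hns.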
My questions are:
Does anyone know why this is happening?
Am I using Media Foundation incorrectly in this situation to cause this?
If I am using it correctly, how can I use the video metadata to determine whether I need to pad an audio stream, and how many 1024-sample blocks of silence the pad needs?
EDIT:
For the sample videos above:
sample video 1: the video stream starts at 0 and needs no extra blocks; passthrough of the original data works fine.
sample video 2: the video stream starts at 834166 (hns) and needs one 1024-sample block of silence to sync.
sample video 3: the video stream starts at 0 and needs two 1024-sample blocks of silence to sync.
UPDATE:
Other things I have tried:
Increasing the duration of the first video frame to account for the offset: Produces no effect.
I wrote another version of your program that handles the NV12 format correctly (yours was not working):
EncodeWithSourceReaderSinkWriter
I use Blender as my video editing tool. Here are my results with Tuning_against_a_window.mov:
From bottom to top:
Original file
Encoded file
The original file, changed by setting the number of entries in the "elst" atoms to 0 (I used the Visual Studio hex editor)
Like Roman R. said, the Media Foundation MP4 source doesn't use the "edts/elst" atoms, but Blender and your video editing tool do. The "tmcd" track is also ignored by the MP4 source.
"edts/elst" :
Edits Atom ( 'edts' )
Edit lists can be used for hint tracks...
MPEG-4 File Source
The MPEG-4 file source silently ignores hint tracks.
So in fact, the encoding is good. I think there is no audio stream sync offset relative to the real audio/video data. For example, you can add "edts/elst" to the encoded file to get the same result.
PS: in the encoded file, I added "edts/elst" for both the audio and video tracks. I also increased the sizes of the trak atoms and the moov atom. I confirm that Blender shows the same waveform for both the original and the encoded file.
EDIT
I tried to understand the relation between the mvhd/tkhd/mdhd/elst atoms in the 3 sample videos. (Yes, I know, I should read the spec. But I'm lazy...)
You can use an MP4 explorer tool to get the atoms' values, or use the MP4 parser from my H264Dxva2Decoder project:
H264Dxva2Decoder
Tuning_against_a_window.mov
elst (media time) from tkhd video : 20689
elst (media time) from tkhd audio : 1483
GREEN_SCREEN_ANIMALS__ALPACA.mp4
elst (media time) from tkhd video : 2002
elst (media time) from tkhd audio : 1024
GOPR6239_1.mov
elst (media time) from tkhd video : 0
elst (media time) from tkhd audio : 0
As you can see, with GOPR6239_1.mov the media time from elst is 0. That's why there is no video/audio sync problem with this file.
For Tuning_against_a_window.mov and GREEN_SCREEN_ANIMALS__ALPACA.mp4, I tried to calculate the video/audio offset.
I modified my project to take this into account:
EncodeWithSourceReaderSinkWriter
For now, I haven't found a generic calculation that works for all files.
I only found the video/audio offsets needed to encode both of these files correctly.
For Tuning_against_a_window.mov, I begin encoding after (movie time - video/audio mdhd time).
For GREEN_SCREEN_ANIMALS__ALPACA.mp4, I begin encoding after the video/audio elst media time.
That works, but I still need to find a single calculation that is right for all files.
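To compare an elst media time with the timestamps Media Foundation reports, it helps to express both in 100-nanosecond units (hns). The sketch below assumes the media time is expressed in the track's media timescale (the value stored in mdhd); mediaTimeToHns is a hypothetical helper, and the timescales used in the example are assumptions, since the post does not list them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert an elst media time, expressed in units of the
// track's media timescale (ticks per second, read from mdhd), to 100-ns units.
std::int64_t mediaTimeToHns(std::int64_t mediaTime, std::uint32_t timescale)
{
    assert(timescale > 0);
    // 10,000,000 hns per second; integer division truncates the fraction.
    return mediaTime * 10000000 / static_cast<std::int64_t>(timescale);
}
```

With an assumed audio timescale of 48000, a media time of 1024 comes to 213333 hns, the length of one 1024-sample block. With an assumed video timescale of 24000, a media time of 2002 comes to 834166 hns, which is the start time reported in the question for sample video 2. The real timescales would have to be read from each file's mdhd atoms.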
So you have 2 options:
encode the file and add the elst atom
encode the file using the right offset calculation
It depends on your needs:
The first option lets you keep the original data, but you have to add the elst atom.
With the second option, you have to read the atoms from the file before encoding, and the encoded file will lose a few original frames.
If you choose the first option, I will explain how I add the elst atom.
PS: I'm interested in this question because, in my H264Dxva2Decoder project, handling the edts/elst atom is on my to-do list.
I parse it, but I don't use it...
PS2: this link sounds interesting:
Audio Priming - Handling Encoder Delay in AAC
I’m using NAudio and LAME to convert WAV to MP3, and I’m new to this kind of audio conversion code. Thanks to Mark, I’m using his Audio File Inspector to get the details.
Here are the details:
Input - Wave Format details
Opening D:\Data\Test\NAudio\Wav\8777828760-e5749e4c563bf5411c954442085d1ce1#10.58.13.40.wav
DviAdpcm 8000Hz 2 channels 4 bits per sample
Extra Size: 2 Block Align: 512 Average Bytes Per Second: 8110
WaveFormat: DviAdpcm
Length: 788808 bytes: 00:01:37.2640000
Chunk: fact, length 420 D9 0B 00
Output Mp3
Opening D:\Data\Test\NAudio\Mp3\8777828760-e5749e4c563bf5411c954442085d1ce1#10.58.13.40.mp3
MP3 File WaveFormat: MpegLayer3 8000Hz 2 channels 0 bits per sample
Extra Size: 12 Block Align: 1 Average Bytes Per Second: 3000
ID: Mpeg Flags: PaddingIso Block Size: 216 Frames per Block: 1
Length: 3119616 bytes: 00:01:37.4880000
ID3v1 Tag: None
ID3v2 Tag: None
Version25,Layer3,8000Hz,JointStereo,24000bps, length 216
Version25,Layer3,8000Hz,JointStereo,24000bps, length 216
….
….
I’m converting WAV to MP3 (voice recording files).
Question: I’m seeing some loss of quality in the MP3. The converted MP3 is smaller than the WAV, but its audio quality is a little worse. Can I increase the quality of the MP3 file?
For example, by increasing the bitrate.
Code for WAV to MP3 conversion using NAudio / LAME
string filePath = @"D:\Data\Test\NAudio\Wav\11mb.wav";
string outputPath = @"D:\Data\Test\NAudio\Mp3\11mb.mp3";
using (WaveFileReader wavReader = new WaveFileReader(filePath))
using (WaveStream pcm = WaveFormatConversionStream.CreatePcmStream(wavReader))
using (LameMP3FileWriter fileWriter = new LameMP3FileWriter(outputPath, pcm.WaveFormat, LAMEPreset.VBR_90))
{
pcm.CopyTo(fileWriter);
}
This link has more details related to my question above:
http://mark-dot-net.blogspot.com/search/label/NAudio
MP3 is a lossy, heavily compressed format; it will never quite match the quality of the original WAV.
However, look at the original WAV: you are starting from a very poor recording. When the sample rate and bit depth are that low, all sorts of artifacts are created because the waveform is poorly represented digitally to begin with.
Anything below CD quality is going to have a lot of problems being compressed, because so much is missing to begin with.
Perfect for making "Boards of Canada" music though. :)
I'm learning how to read WAV files in C++ and extract data according to the header. I have a few WAV files lying around. Looking at their headers, I see that they all follow the rules for WAV files. However, recordings produced by TeamSpeak are weird, even though they're still playable in media players.
So looking at the standard format of WAV files, it looks like this:
In all files that look normal, I get legitimate values for every field from "AudioFormat" up to "BitsPerSample" (from the picture). However, in TeamSpeak files, ALL of these values are exactly zero.
The first 3 fields, on the other hand, are not zero: "RIFF" and "WAVE" appear in the first and third strings, and the ChunkSize seems legitimate.
So my question is: how does the player know anything about such a file? How does it recognize whether the file is mono or stereo? The sample rate? Anything about it? Is there some standard set of assumptions to make when all these values are zero?
Update
I examined the file with MediaInfo and got this:
General
Complete name : ts3_recording_16_10_02_17_53_54.wav
Format : Wave
File size : 2.45 MiB
Duration : 13 s 380 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Audio
Format : PCM
Format settings, Endianness : Little
Format settings, Sign : Signed
Codec ID : 1
Duration : 13 s 380 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 2.45 MiB (100%)
I still don't understand how it arrived at these conclusions, though.
After examining your file in a hex editor with WAV binary templates, it is clear that there is an additional "JUNK" chunk before the "fmt " one (screenshot attached). The JUNK chunk is possibly there for padding, and all its values are 0. In your code you need to seek (with fseek, for example) to the first occurrence of the "fmt " bytes in the WAV file and parse the WAVEFORMATEX info from there.
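An alternative to searching for the raw "fmt " bytes is to walk the RIFF chunks one by one, skipping any chunk (such as JUNK) that is not the one wanted. The following is a minimal sketch that assumes the whole file has already been read into memory; ChunkInfo, readLe32 and findChunk are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct ChunkInfo {
    bool found = false;
    std::size_t dataOffset = 0;  // offset of the chunk payload within the file
    std::uint32_t size = 0;      // payload size in bytes
};

// Read a little-endian 32-bit value.
static std::uint32_t readLe32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walk the chunks that follow the 12-byte "RIFF" <size> "WAVE" preamble and
// return the first chunk whose four-character ID equals `id`.
ChunkInfo findChunk(const std::vector<std::uint8_t>& file, const char* id)
{
    assert(std::strlen(id) == 4);
    ChunkInfo info;
    if (file.size() < 12 ||
        std::memcmp(file.data(), "RIFF", 4) != 0 ||
        std::memcmp(file.data() + 8, "WAVE", 4) != 0) {
        return info;  // not a RIFF/WAVE file
    }
    std::size_t pos = 12;
    while (pos + 8 <= file.size()) {
        const std::uint32_t size = readLe32(file.data() + pos + 4);
        if (std::memcmp(file.data() + pos, id, 4) == 0) {
            info.found = true;
            info.dataOffset = pos + 8;
            info.size = size;
            return info;
        }
        // Chunks are word-aligned: an odd size is followed by one pad byte.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return info;
}
```

Calling findChunk(bytes, "fmt ") then gives the offset at which the format fields start, whether or not a JUNK chunk precedes them, and findChunk(bytes, "data") locates the sample data the same way.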
I have searched for an answer to this question for several hours. I remove the 44-byte header and write out the remaining data using an ofstream. The input stereo WAV file is 16-bit PCM at a 44.1 kHz sample rate.
int szm;
char* buff = new char[szm];
ifstream ssn(f_infile,ios::binary);
ssn.seekg(0,ssn.end);
szm = ssn.tellg();
ssn.seekg(0,ssn.beg);
ssn.read(buff,szm);
ssn.close();
ofstream sso(f_outfile,ios::binary);
for(int i =0; i < szm; i++)
{
if(i > 44)
{
word_w(file, buff[i],1);
word_w(file, 0-(buff[i]), 1);
}
}
sso.close();
file.close();
I got the size of the file and read the data into a buffer. I know that a RAW data file is nothing but binary data, and I thought this simple technique would work. However, I got mixed results.
The first one worked like a charm. It was the original sample I wanted to convert. It is a side-by-side comparison of the original WAV file [top] and the raw data [bottom] imported into Audacity at 44.1 kHz.
The next one distorted the right channel for some reason and doubled the length of the file. It is also a stereo WAV file, 16-bit PCM, 44.1 kHz sample rate.
The third one is completely distorted, and its length has increased even more than the previous one.
Why did it work on the first file but not on the others, when they are all in the exact same file format (16-bit, 44.1 kHz sample rate, 2 channels)?
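For comparison, here is a minimal sketch of the same conversion, assuming a canonical 44-byte header with no extra chunks before the sample data; stripHeader and wavToRaw are hypothetical names. It differs from the snippet above in three ways that may be related to the symptoms: the buffer is sized only after the file length is known, the copy starts at byte index 44 (the test i > 44 starts at 45), and every byte is written exactly once and unchanged.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Return everything after the first `headerSize` bytes, unchanged.
std::vector<char> stripHeader(const std::vector<char>& wav, std::size_t headerSize = 44)
{
    if (wav.size() <= headerSize) {
        return {};  // nothing but header (or a truncated file)
    }
    return std::vector<char>(wav.begin() + static_cast<std::ptrdiff_t>(headerSize),
                             wav.end());
}

// Read the whole input file, drop the header, write the rest as raw PCM.
bool wavToRaw(const std::string& inPath, const std::string& outPath)
{
    std::ifstream in(inPath, std::ios::binary);
    if (!in) {
        return false;
    }
    // The vector grows to the real file size; no buffer is allocated up front.
    const std::vector<char> wav((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
    const std::vector<char> raw = stripHeader(wav);

    std::ofstream out(outPath, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(raw.data(), static_cast<std::streamsize>(raw.size()));
    return static_cast<bool>(out);
}
```

If a file carries additional chunks before its sample data, the fixed offset of 44 no longer holds and the position of the "data" chunk has to be looked up instead.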
To play my test WAV file, I set up the following format, fmt:
fmt.setChannelCount(2);
fmt.setCodec("audio/pcm");
fmt.setByteOrder(QAudioFormat::LittleEndian);
fmt.setSampleType(QAudioFormat::SignedInt);
fmt.setSampleRate(44100);
fmt.setSampleSize(16);
It also works with these settings:
fmt.setSampleRate(22050);
fmt.setSampleSize(32);
Those settings are meant for a QAudioOutput:
player = new QAudioOutput(fmt);
file = new QFile(fileName);
file->open(QIODevice::ReadOnly);
player->start(file);
With these settings I can play my test WAV file correctly.
But I want to detect the format settings by reading the header.
When I parse it, I get:
Opening WAV file at: "C:/Deep Purple - Anthology (Disc 2) - 09 - Hold On.wav"
The size of the WAV file is: 53994908
WAV File Header read:
File Type: "RIFFWAVE"
File Size: 53994900
WAV Marker: "WAVE"
Format Name: "fmt??("
Format Length: 4128
Format Type: 256
Number of Channels: 512
Sample Rate: 11289600
Sample Rate * Bits/Sample * Channels / 8: 45158400
Bits per Sample * Channels / 8: 1024
Bits per Sample: 4096
Data Header: ""
Data Size: 937783393
If I divide the sample rate by the number of channels, I get a sample rate per channel of 22050. But why do I have to set 44100 to make it sound good? And why are there 512 channels? Opening the file with Audacity, there are only 2 (Audacity says: Stereo, 44100Hz, 32-bit float).
Here are a bunch of links that helped me a lot when I was working on this kind of project.
https://github.com/visore/QAudioCoder
http://qt-project.org/forums/viewthread/6899
http://doc.qt.digia.com/qt-maemo/demos-spectrum-app-wavfile-cpp.html
http://fledisplace.com/QtMultimediaExample2.html
I did find a typo in one of the examples: it wasn't reading one of the parameters with the correct endianness. Let me double-check which one it was, and I'll get back to you.
Hope that helps.
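One observation about the header dump in the question: every numeric value is exactly 256 times a plausible one (512 = 2 × 256, 11289600 = 44100 × 256, 4096 = 16 × 256, and 4128 is the space of "fmt " followed by the length byte 16). That is what a little-endian reader produces when it starts one byte too early, although the thread does not confirm this as the cause. The sketch below decodes the 16-byte body of a "fmt " chunk field by field; FmtFields, le16, le32 and parseFmt are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct FmtFields {
    std::uint16_t formatType;
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint32_t byteRate;
    std::uint16_t blockAlign;
    std::uint16_t bitsPerSample;
};

// Little-endian readers: least significant byte comes first.
static std::uint16_t le16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t le32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode the body of a "fmt " chunk. `p` must point at the first byte after
// the chunk's 4-byte ID and 4-byte length.
FmtFields parseFmt(const std::uint8_t* p, std::size_t available)
{
    assert(available >= 16);
    FmtFields f;
    f.formatType    = le16(p + 0);
    f.channels      = le16(p + 2);
    f.sampleRate    = le32(p + 4);
    f.byteRate      = le32(p + 8);
    f.blockAlign    = le16(p + 12);
    f.bitsPerSample = le16(p + 14);
    return f;
}
```

Fed the bytes of a 16-bit stereo 44100 Hz header, parseFmt returns 2 channels, 44100 Hz and 16 bits; started one byte earlier on the same bytes, it returns 512, 11289600 and 4096, the values shown in the dump.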