I am trying to record sound from my PC audio device. I can hear the recorded sound only when I set the sample size to 8, but when I set the sample size to 16, the recording is only a whistle. Why?
Here is the code:
void audioprocess::startRecording()
{
    outputFile.setFileName("C:/Users/Rem/Documents/Qtproject/remaudio.wav");
    outputFile.open(QIODevice::WriteOnly);

    QAudioFormat format;
    // Set up the format we want, e.g.:
    format.setCodec("audio/pcm");
    format.setSampleRate(8000);
    format.setChannelCount(1);
    format.setSampleSize(8);
    format.setByteOrder(QAudioFormat::LittleEndian);
    format.setSampleType(QAudioFormat::UnSignedInt);

    QAudioDeviceInfo info = QAudioDeviceInfo::defaultInputDevice();
    if (!info.isFormatSupported(format)) {
        qWarning() << "default format not supported, trying to use the nearest";
        format = info.nearestFormat(format);
    }
    // Create the input only after the format check, so the nearestFormat()
    // result is actually used.
    audioInput = new QAudioInput(format);

    QTimer::singleShot(120000, this, SLOT(stopRecording()));  // record for 120000 ms
    audioInput->start(&outputFile);
}
The whistle is probably because every second sample is zero: that gives you a periodic signal at exactly half the sample rate.
Now, what does your code actually do? You haven't shown us the definition of outputFile, but the snippet is taken almost literally from the Qt documentation, which defines outputFile as a QFile. The file there is called test.raw, and for good reason: raw files have no header, so it's impossible for a player to determine their sample size.
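If you actually want a playable .wav, the 44-byte RIFF header has to be written by hand. Below is a minimal sketch, assuming plain PCM and the format that was really used for recording (after nearestFormat()). writeWavHeader() is a hypothetical helper, not part of Qt; write it once with a placeholder size before recording, then seek(0) and write it again with the final data size in stopRecording().

#include <QFile>
#include <QDataStream>

void writeWavHeader(QFile &file, quint32 sampleRate, quint16 channels,
                    quint16 bitsPerSample, quint32 dataBytes)
{
    QDataStream out(&file);
    out.setByteOrder(QDataStream::LittleEndian);  // WAV headers are little-endian
    quint16 blockAlign = channels * bitsPerSample / 8;

    out.writeRawData("RIFF", 4);
    out << quint32(36 + dataBytes);          // file size minus the first 8 bytes
    out.writeRawData("WAVE", 4);

    out.writeRawData("fmt ", 4);
    out << quint32(16)                       // size of the fmt chunk
        << quint16(1)                        // 1 = uncompressed PCM
        << channels
        << sampleRate
        << quint32(sampleRate * blockAlign)  // byte rate
        << blockAlign
        << bitsPerSample;

    out.writeRawData("data", 4);
    out << dataBytes;                        // size of the PCM payload
}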
Finally, a question in my area of expertise. When it comes to audio development, there are a lot of factors to take into account, and one of them is hardware capability. My guess is that your audio device is being sampled too quickly and the high-order bits are just random data.
I would try increasing the buffer size (the number of audio frames collected before they are processed) so the system can buffer and process audio while the device is still receiving new input.
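For instance, a sketch against the code from the question (audioInput and outputFile are the objects defined there; the buffer size value is arbitrary):

audioInput->setBufferSize(16384);   // in bytes; must be called before start()
audioInput->start(&outputFile);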
Another possible problem is the endianness of your processor. This would usually only be an issue on a PowerPC, or on an embedded system with a big-endian processor.
I'm going to take a wild guess here that you are leaving the sample type as unsigned int and only raising the bit depth to 16. Conventionally, 8-bit waveforms are unsigned while 16-bit waveforms are signed. Try patching in the following:
format.setSampleSize(16);
format.setSampleType(QAudioFormat::SignedInt);
Related
For my purposes, I want to record sound in raw format (samples only), 8 kHz, 16-bit little-endian, 1 channel. Then I would like to transfer those samples to Windows and play them with QAudioOutput. So I have two separate programs: one records voice with QAudioInput, and the other takes a file containing the samples and plays it with QAudioOutput. Below is my source code for creating the QAudioInput and QAudioOutput.
//Initialize audio
void AudioBuffer::initializeAudio()
{
    m_format.setFrequency(8000);                        //set sample rate to 8000 Hz
    m_format.setChannels(1);                            //set channels to mono
    m_format.setSampleSize(16);                         //set sample size to 16 bit
    m_format.setSampleType(QAudioFormat::UnSignedInt);  //sample type as unsigned integer
    m_format.setByteOrder(QAudioFormat::LittleEndian);  //byte order
    m_format.setCodec("audio/pcm");                     //set codec as plain audio/pcm

    QAudioDeviceInfo infoIn(QAudioDeviceInfo::defaultInputDevice());
    if (!infoIn.isFormatSupported(m_format))
    {
        //Default format not supported - trying to use nearest
        m_format = infoIn.nearestFormat(m_format);
    }

    QAudioDeviceInfo infoOut(QAudioDeviceInfo::defaultOutputDevice());
    if (!infoOut.isFormatSupported(m_format))
    {
        //Default format not supported - trying to use nearest
        m_format = infoOut.nearestFormat(m_format);
    }

    createAudioInput();
    createAudioOutput();
}
void AudioBuffer::createAudioOutput()
{
    m_audioOutput = new QAudioOutput(m_Outputdevice, m_format, this);
}

void AudioBuffer::createAudioInput()
{
    if (m_input != 0) {
        disconnect(m_input, 0, this, 0);
        m_input = 0;
    }
    m_audioInput = new QAudioInput(m_Inputdevice, m_format, this);
}
These programs work well on Windows and on Linux separately. However, there is a lot of noise when I record a voice on Linux and play it on Windows.
I found that the captured samples differ between Windows and Linux. The first picture shows the sound captured on Linux, the second the sound captured on Windows.
Captured sound in Linux:
Captured sound in Windows:
One more detail: silence is represented differently on Windows and Linux. I have tried many things, including swapping bytes, even though I set little-endian on both platforms.
Now I suspect the ALSA configuration. Are there any settings I have missed?
Do you think it would be better to record the voice directly, without using QAudioInput?
The format is set to UnSignedInt, but your sample values are both negative and positive, so the capture is actually producing signed data. Change QAudioFormat::UnSignedInt to QAudioFormat::SignedInt.
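That is, in initializeAudio() above:

m_format.setSampleSize(16);                       //16-bit PCM...
m_format.setSampleType(QAudioFormat::SignedInt);  //...is conventionally signed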
I'm using the WinAPI wave functions to create a recording program that records the microphone for X seconds. I've searched a bit over the net and found out that raw PCM data is large, and it'll be a problem to send it through sockets...
How can I compress it to something smaller? Any simple / "cheap" way?
Also, when declaring the format using the wave API functions, I'm using this code:
WAVEFORMATEX pFormat;
pFormat.wFormatTag      = WAVE_FORMAT_PCM;  // simple, uncompressed format
pFormat.nChannels       = 1;                // 1 = mono, 2 = stereo
pFormat.nSamplesPerSec  = sampleRate;       // 44100
pFormat.nAvgBytesPerSec = sampleRate * 2;   // = nSamplesPerSec * nChannels * wBitsPerSample/8
pFormat.nBlockAlign     = 2;                // = nChannels * wBitsPerSample/8
pFormat.wBitsPerSample  = 16;               // 16 for high quality, 8 for telephone-grade
pFormat.cbSize          = 0;
As you can see, pFormat.wFormatTag = WAVE_FORMAT_PCM;
maybe I can insert something other than WAVE_FORMAT_PCM there, so the audio is compressed right away?
I've checked MSDN for other values, though none of them worked for me in Visual Studio...
So what can I do?
Thanks!
The simplest way is to reduce your sample rate from 44100 to something more manageable like 22050, 16000, 11025, or even 8000. Most voice codecs don't go above 16000 Hz anyway, and the older ones are optimized for 8 kHz.
The next step is to find a codec. There are a handful of codecs that work with the Windows Audio Compression Manager, but almost all of them date back to Windows 95 and sound terrible by modern standards once decompressed.
You can always convert to WMA in real time using the Format SDK or the Media Foundation APIs, or just grab an open-source MP3 library like LAME.
For telephone-quality speech you can drop to 8 bits per sample and a sample rate of 8000 Hz. This will greatly reduce the amount of data.
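For example, a sketch of the same WAVEFORMATEX filled in for telephone-grade capture (8-bit PCM is unsigned by convention):

WAVEFORMATEX pFormat;
pFormat.wFormatTag      = WAVE_FORMAT_PCM;
pFormat.nChannels       = 1;       // mono is enough for speech
pFormat.nSamplesPerSec  = 8000;    // telephone-grade sample rate
pFormat.wBitsPerSample  = 8;       // one byte per sample
pFormat.nBlockAlign     = 1;       // nChannels * wBitsPerSample / 8
pFormat.nAvgBytesPerSec = 8000;    // nSamplesPerSec * nBlockAlign
pFormat.cbSize          = 0;
// => 8000 bytes/s instead of 88200 bytes/s: roughly 11x less data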
GSM has good compression. You can convert a block of PCM data to GSM (or any other codec you have installed) using acmStreamConvert(). Refer to MSDN for more details:
Converting Data from One Format to Another
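For illustration, here is a sketch (not production code) of that conversion: 8 kHz 16-bit mono PCM to GSM 6.10 through the Audio Compression Manager. pcmToGsm() is a hypothetical helper and error handling is mostly omitted; GSM 6.10 packs 320 samples into each 65-byte block, hence the constants below.

#include <windows.h>
#include <mmreg.h>
#include <msacm.h>
#pragma comment(lib, "msacm32.lib")

bool pcmToGsm(BYTE* pcm, DWORD pcmBytes, BYTE* gsm, DWORD gsmBufBytes, DWORD* gsmBytesOut)
{
    WAVEFORMATEX src = {};
    src.wFormatTag      = WAVE_FORMAT_PCM;
    src.nChannels       = 1;
    src.nSamplesPerSec  = 8000;
    src.wBitsPerSample  = 16;
    src.nBlockAlign     = 2;          // nChannels * wBitsPerSample / 8
    src.nAvgBytesPerSec = 16000;      // nSamplesPerSec * nBlockAlign

    GSM610WAVEFORMAT dst = {};
    dst.wfx.wFormatTag      = WAVE_FORMAT_GSM610;
    dst.wfx.nChannels       = 1;
    dst.wfx.nSamplesPerSec  = 8000;
    dst.wfx.nBlockAlign     = 65;     // one GSM 6.10 block
    dst.wfx.nAvgBytesPerSec = 1625;   // 8000 / 320 * 65
    dst.wfx.cbSize          = sizeof(dst.wSamplesPerBlock);
    dst.wSamplesPerBlock    = 320;    // samples encoded per 65-byte block

    HACMSTREAM hStream = NULL;
    if (acmStreamOpen(&hStream, NULL, &src, &dst.wfx, NULL, 0, 0, 0) != 0)
        return false;                 // no GSM 6.10 codec installed

    ACMSTREAMHEADER hdr = {};
    hdr.cbStruct    = sizeof(hdr);
    hdr.pbSrc       = pcm;
    hdr.cbSrcLength = pcmBytes;
    hdr.pbDst       = gsm;
    hdr.cbDstLength = gsmBufBytes;
    acmStreamPrepareHeader(hStream, &hdr, 0);
    acmStreamConvert(hStream, &hdr, ACM_STREAMCONVERTF_BLOCKALIGN);
    *gsmBytesOut = hdr.cbDstLengthUsed;   // actual compressed size
    acmStreamUnprepareHeader(hStream, &hdr, 0);
    acmStreamClose(hStream, 0);
    return true;
}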
Can anyone help decipher the correct implementation of the libspotify get_audio_buffer_stats callback? Specifically, we are supposed to populate an sp_audio_buffer_stats struct consisting of samples and stutter.
According to the Docs:
int samples - Samples in buffer.
int stutter - Number of stutters (audio dropouts) since last query.
I'm wondering about "samples": what exactly is this referring to?
The music playback (music_delivery) callback has a num_frames variable, but then you have the issue of audio format (channels and/or sample_rate).
Is it correct to set "samples" to the total number of "num_frames" currently in my buffer? Or do I need to run some math based on the total "num_frames", "channels", and "sample_rate"?
It should be the number of frames in your output buffer. That is, int samples is slightly misnamed and should probably have been called int frames instead.
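A sketch of what that looks like in practice, assuming you track the queued frame count and an underrun counter yourself (buffered_frames and stutter_count are hypothetical):

#include <libspotify/api.h>

static int buffered_frames = 0;   // frames currently queued in our output buffer
static int stutter_count   = 0;   // underruns since the last query

static void get_audio_buffer_stats(sp_session *session, sp_audio_buffer_stats *stats)
{
    stats->samples = buffered_frames;  // despite the name, report frames
    stats->stutter = stutter_count;
    stutter_count  = 0;                // "since last query" resets the counter
}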
I am reading a .wav file in C++ and then trying to play it using some Qt functions. Here is how I read the file:
// Assumes the canonical 44-byte WAV layout (no extra chunks before "data")
char ChunkID[4], Format[4], Subchunk1ID[4], Subchunk2ID[4];
quint32 ChunkSize, Subchunk1Size, SampleRate, ByteRate, Subchunk2Size;
quint16 AudioFormat, NumChannels, BlockAlign, BitsPerSample;
quint16 *Data;

FILE *fhandle = fopen("myAudioFile.wav", "rb");
fread(ChunkID, 1, 4, fhandle);
fread(&ChunkSize, 4, 1, fhandle);
fread(Format, 1, 4, fhandle);
fread(Subchunk1ID, 1, 4, fhandle);
fread(&Subchunk1Size, 4, 1, fhandle);
fread(&AudioFormat, 2, 1, fhandle);
fread(&NumChannels, 2, 1, fhandle);
fread(&SampleRate, 4, 1, fhandle);
fread(&ByteRate, 4, 1, fhandle);
fread(&BlockAlign, 2, 1, fhandle);
fread(&BitsPerSample, 2, 1, fhandle);
fread(Subchunk2ID, 1, 4, fhandle);
fread(&Subchunk2Size, 4, 1, fhandle);
Data = new quint16[Subchunk2Size / (BitsPerSample / 8)];
fread(Data, BitsPerSample / 8, Subchunk2Size / (BitsPerSample / 8), fhandle);
fclose(fhandle);
So my audio file is in Data, and each element of Data is an unsigned 16-bit integer.
To play the sound, I split each 16-bit integer into two bytes and then, every 3 ms (using a timer), send 256 bytes to the audio card.
Assuming myData is a character array of 256 bytes, I do this (every 3 ms) to play the sound:
m_output->write(myData, 256);
Also m_output is defined as:
m_output = m_audioOutput->start();
and m_audioOutput is defined as:
m_audioOutput = new QAudioOutput(m_Outputdevice, m_format, this);
And the audio format is set correctly as:
m_format.setFrequency(44100);
m_format.setChannels(2);
m_format.setSampleSize(16);
m_format.setSampleType(QAudioFormat::UnSignedInt );
m_format.setByteOrder(QAudioFormat::LittleEndian);
m_format.setCodec("audio/pcm");
However, when I run the code I hear noise that is very different from the real audio file.
Is there anything I am doing wrong?
Thanks,
TJ
I think the problem is that you are using QTimer. QTimer is absolutely not going to run your code exactly every three milliseconds, regardless of the platform, and if you're off by just one sample your audio is going to sound horrible. According to the QTimer docs:
...they are not guaranteed to time out at the exact value specified. In
many situations, they may time out late by a period of time that
depends on the accuracy of the system timers.
and
...the accuracy of the timer will not equal [1 ms] in many real-world situations.
As much as I love Qt, I wouldn't try to use it for signal processing. I would use another framework such as JUCE.
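That said, if you do stay with Qt, the usual pattern is to let the audio device pull data rather than pushing it on a timer. A sketch, assuming the m_audioOutput and m_output objects from the question (feedAudio(), m_data, m_offset, and m_bytesLeft are hypothetical):

// In setup, after m_output = m_audioOutput->start():
//     m_audioOutput->setNotifyInterval(20);  // ms; coarse is fine
//     connect(m_audioOutput, SIGNAL(notify()), this, SLOT(feedAudio()));

void AudioPlayer::feedAudio()
{
    // Top the device buffer up with whole periods; the sound card's clock,
    // not a QTimer, then paces playback.
    while (m_bytesLeft > 0 &&
           m_audioOutput->bytesFree() >= m_audioOutput->periodSize()) {
        qint64 n = qMin<qint64>(m_audioOutput->periodSize(), m_bytesLeft);
        m_output->write(reinterpret_cast<const char*>(m_data) + m_offset, n);
        m_offset    += n;
        m_bytesLeft -= n;
    }
}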
Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
    f = snd_pcm_writei(handle, buffer, frames);
    ...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean that I shouldn't give snd_pcm_writei() the number of all the frames in the buffer, but only
sample_rate * latency = frames
?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for loop should run would be
(int) 100000/22050 = 4, with frames = 22050,
plus one extra iteration with only
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9
frames should be the number of frames (one sample per channel) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played back at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop; just go ahead and write everything in one go. But there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver writes data to the sound card, or you'll pile up data and accumulate more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at JACK (http://jackaudio.org/), which is based on registering a callback function with the sound playback engine. In fact, if you're really concerned about latency, you're probably better off using JACK than rolling your own.
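For illustration, a sketch of writing a whole buffer in blocking mode (write_all is a hypothetical helper; it assumes a mono, interleaved 16-bit stream):

#include <alsa/asoundlib.h>

// snd_pcm_writei() blocks until space is free in the driver's ring buffer,
// so this loop naturally runs in lockstep with the card.
static void write_all(snd_pcm_t *pcm, const short *samples, snd_pcm_uframes_t total)
{
    snd_pcm_uframes_t done = 0;
    while (done < total) {
        snd_pcm_sframes_t n = snd_pcm_writei(pcm, samples + done, total - done);
        if (n == -EPIPE) {            // underrun: recover and retry
            snd_pcm_prepare(pcm);
            continue;
        }
        if (n < 0)                    // any other error: give up
            break;
        done += n;                    // n frames accepted (frame == sample for mono)
    }
}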
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle) prior to snd_pcm_close(handle) to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me, using several examples I found in the ALSA tutorials, and what I concluded was that the simple examples were calling snd_pcm_close() before the sound device could play the complete stream sent to it.
I set the rate to 11025 Hz, used a 128-byte random buffer, and looped snd_pcm_writei() 11025/128 ≈ 86 times for each second of sound; two seconds required 86*2 calls to snd_pcm_writei().
To give the device sufficient time to convert the data to audio, I used a for loop after the snd_pcm_writei() loop to delay execution of snd_pcm_close().
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before snd_pcm_close() was called, which implies that the close function acts with less latency than snd_pcm_writei() does.
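In other words, the fix is just one extra call (using the handle from the examples):

snd_pcm_drain(handle);   // blocks until everything queued has actually been played
snd_pcm_close(handle);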
If the ALSA driver's start threshold is not set properly (in your case it appears to be about 2 s), you will need to call snd_pcm_start() immediately after snd_pcm_writei() to start rendering the data.
Alternatively, you can set an appropriate start threshold in the software params of the ALSA device.
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html
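A minimal sketch of the second option, assuming an already-configured playback handle; period_frames (the period size in frames) is a hypothetical variable:

snd_pcm_sw_params_t *sw;
snd_pcm_sw_params_alloca(&sw);            // stack-allocated, nothing to free
snd_pcm_sw_params_current(handle, sw);    // start from the current settings
// Begin playback as soon as one period is queued, instead of waiting
// for the buffer to fill:
snd_pcm_sw_params_set_start_threshold(handle, sw, period_frames);
snd_pcm_sw_params(handle, sw);            // apply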