Playback pure tone, variable phase stream with pyaudio - python-2.7

I'm building an acoustic cancelling device based on PyAudio, Fourier transforms and a C-Media USB audio card. The software is threaded, using the producer/consumer model.
The device detects pure tones in the environment (it reads chunks of microphone audio) and uses a Fourier transform to identify the tone. So far so good, it works like a charm.
The final step, however, is getting tricky. I'm aiming to generate a 100 ms sine wave which holds a certain number of periods of the frequency to be cancelled.
This wave buffer has to be played continuously with PyAudio on a separate thread, which must also increase the phase little by little until the detected amplitude of the tone in the environment drops. This is basically destructive interference.
My problem is that when using PyAudio's stream.write(), the buffer keeps overrunning, since I have no idea what the function is doing internally. I have tried many combinations of "frame_buffer_size" and audio length, and no matter what I do, the buffer is overrun.
Ideally, the buffer would not have to be recalculated with a different phase on each run. Instead, I'm trying to get PyAudio to read a different part of the buffer (a window), so that the sine wave starts from a different origin each time.
I have no idea how to do that.
Long story short, how would you:
1) create a thread to fill a circular buffer continuously with audio data.
2) create a PyAudio consumer thread that continuously reads the buffer without overrunning.
3) manipulate the volume in real time
My output data must be 44100 Hz, little endian, 16-bit signed int. Any hints, advice, references or suggestions will be greatly appreciated.
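One possible way to sidestep the windowed-buffer problem is to synthesize each chunk on demand from a running oscillator angle, so that the phase offset and the volume can be changed between chunks without any discontinuity other than the intended phase step. The sketch below is only an illustration under assumptions: the class name ToneGenerator and its attributes are hypothetical, it uses only the standard library, and it produces mono 16-bit little-endian samples at 44100 Hz.

```python
import math
import struct

SAMPLE_RATE = 44100  # Hz, as required in the question


class ToneGenerator(object):
    """Hypothetical helper: phase-continuous sine source with adjustable
    phase offset and volume, emitting 16-bit little-endian mono PCM."""

    def __init__(self, frequency_hz, volume=0.5, phase_rad=0.0):
        self.frequency_hz = frequency_hz
        self.volume = volume        # linear gain, expected in 0.0 .. 1.0
        self.phase_rad = phase_rad  # extra phase offset, may be changed any time
        self._angle = 0.0           # running oscillator angle in radians

    def next_chunk(self, frame_count):
        """Return frame_count samples as a byte string."""
        step = 2.0 * math.pi * self.frequency_hz / SAMPLE_RATE
        samples = []
        for _ in range(frame_count):
            value = self.volume * math.sin(self._angle + self.phase_rad)
            value = max(-1.0, min(1.0, value))  # clamp so it fits in int16
            samples.append(int(round(value * 32767)))
            # Keep the angle bounded so precision does not degrade over time.
            self._angle = (self._angle + step) % (2.0 * math.pi)
        return struct.pack('<%dh' % frame_count, *samples)
```

A generator like this could be driven from PyAudio's callback mode (the stream_callback argument of open(), which asks for frame_count frames each time) instead of a blocking write loop, while the detection thread only assigns new values to phase_rad and volume. Whether that fully removes the overruns on a given sound card is not something this sketch can guarantee.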

Related

Seeking within MP3 file

I am working on the development of driving software for the hardware implementation by these people. The decoder works properly overall, but I am struggling to make it start playing from the middle of a file. I suspect this is a common trait of MP3 decoders, as they need some history of data in order to properly reconstruct the current sound (I am not that skilled in MPEG, but I have an idea of the basics).
The problem is that this decoder is a black box, and digging into its code takes enormous time and effort.
I found empirically that the sound garbage, when starting somewhere in the middle, lasts no more than 1 (one) second after the start with a file at 320 kbps and a 44100 Hz sampling rate. I am actually OK with muting the decoder for a second (while it gathers and decodes the data required for further playback), and then unmuting it to continue playback.
I searched the internet on the matter and did not find anything useful. I tried to invalidate the first frames by corrupting their frame headers (the easiest thing to do without going into the MP3 headers/data), which made things even worse.
Questions:
Is there any body of knowledge on how players perform seeking in MP3 files and keep the sound uncorrupted?
Does my action plan seem valid: mute for 1 second while the decoder plays garbage? Is there any way to (easily) calculate how long I must mute the output for?
Update: I just tried another file at 128 kbps/48 kHz, and the maximum garbage time was about 2 seconds... I cannot believe that a decoder with such limited resources (the input buffer used is 2 kB plus some intermediate working buffers, in total no more than 36 kB) can keep history for 2 seconds. The alternative is that the decoder has problems finding the sync word in the stream, and thus my driver needs to figure out the frame start itself (by finding the sync word, reading the frame header, calculating the frame size, and checking that another sync word follows the frame).
I've found workarounds. The difficulty was that there were actually two problems overlaying each other, but they were easy to cope with using a structured approach.
The decoder has trouble locking onto the first sync word of the stream, and works very well when the first bytes supplied to it are FF FB or FF FA. Any other bytes, from the middle of a frame, will with very high probability cause major sound corruption until the decoder catches the correct sync. Thus I designed the code to seek to the next frame start after the seek point, and to check that this is an actual frame start by calculating the frame size and verifying that the next frame also begins with FF FB/FA.
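The frame-start check described above can be sketched in a few lines. This is a minimal illustration, not the driver's actual code: the function names are hypothetical, and it assumes MPEG-1 Layer III only (headers starting with FF FB or FF FA), using the standard Layer III bitrate and sampling-rate tables and the usual frame length of 144 * bitrate / sample_rate plus the padding bit.

```python
# MPEG-1 Layer III tables, indexed by the header's bitrate / sampling-rate bits.
BITRATES_KBPS = (0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
SAMPLE_RATES_HZ = (44100, 48000, 32000)


def frame_size(data, pos):
    """Return the frame length in bytes if a plausible MPEG-1 Layer III
    header starts at pos, otherwise None."""
    if pos < 0 or pos + 4 > len(data):
        return None
    b0, b1, b2 = bytearray(data[pos:pos + 3])
    if b0 != 0xFF or b1 not in (0xFB, 0xFA):
        return None
    bitrate_idx = b2 >> 4
    rate_idx = (b2 >> 2) & 0x03
    if bitrate_idx in (0, 15) or rate_idx == 3:
        return None  # free-format, forbidden or reserved values
    padding = (b2 >> 1) & 0x01
    bitrate = BITRATES_KBPS[bitrate_idx] * 1000
    return 144 * bitrate // SAMPLE_RATES_HZ[rate_idx] + padding


def find_frame_start(data, start):
    """Return the first offset >= start where a header is found AND the
    position one frame later also holds a valid header; None if not found."""
    pos = start
    while pos + 4 <= len(data):
        size = frame_size(data, pos)
        if size is not None and frame_size(data, pos + size) is not None:
            return pos
        pos += 1
    return None
```

Checking the following header as well is what filters out FF FB byte pairs that merely happen to occur inside the audio payload.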
Having fixed problem 1, I was left with minor corruption from the decoder starting to decode a frame without historical data. I solved it by muting the decoder for the first 4 buffering transactions.
Major corruption still happens, but it is rare, and the nature of the corruption seems to depend on what was in the decoder buffers (not only the Huffman input buffer, but also other intermediate buffers) before the decoder is instructed to start. My hardware clears the input buffers to 0 while the decoder is in the reset state, but that seems to be not enough (or just incorrect)...
The decoder itself is a kind of PoC (proof of concept) work, a student term project aimed at proving that they were able to make it; the package includes test bench code, but lacks low-level documentation and comments in the code, and is not ready for field implementation and production. In general, the fact that it worked for me at all (almost) out of the box does credit to the developers and is a mark of the high quality of their work. I have reviewed and tried several published MP3 decoder projects for silicon implementation (FPGA) and concluded that this one is the best available. In addition, the license under which they provide their work is a generous one.
Update: my research has shown that most of the problem lies not in the input buffer (although the situation can be improved by uploading 528 bytes of historical data to the decoder's buffer so that it can grab main data from the previous frame), but in the internal state of the decoder. Its documentation says:
To reduce resource usage, part of the RAM for buffering the intermediate data is also shared with Huffman decoding as bit reservoir ...
Thus it is the contents of the reservoir and the intermediate computed data that affect the decoding. I confirmed this by playing various sets of frames in different sequences: when the same set of frames is played in a different sequence, the nature of the garbage changes, or the garbage may simply not appear.
Thus, unfortunately, my conclusion: it is not possible to properly seek using this decoder as is. I do not even think it is possible to "fake" playback (to quickly "play" the file up to the needed point in the buffers), as all three clocks are tied to each other.
I will keep my "best tested" implementation, with the notes on the quality.
Update 2: I was wrong, it is possible to seek softly, but to mitigate the sound corruption (yes, I am still unsure whether I fixed it completely) I had to find another deficiency in the decoder. It is related to timing: the decoder assumes that further data is always available in the buffer, while it may not be there yet. (This is actually clear from the test bench code supplied with the IP, from the way data was replenished during QA and testing.) In the cases where I caught the corruption, the first frames in the first part of the input buffer RAM were not decoded properly and were skipped, and the decoder quickly moved on to the second part of the RAM, assuming new data was there. However, the driving hardware had not yet finished fetching the required data and putting it into the second part of the decoder's buffer RAM. The corruption thus persisted for quite a long time, with the decoder looping and skipping "invalid" frames until it caught a correct image of a frame and normalized its pace through the buffer.
Now the solution:
1) play (almost) 5 frames of silence through the decoder before unmuting it. This ensures that all of the decoder's internal buffers are purged. It does not take much time, but requires some coding;
2) introduce the possibility of setting the Huffman decoder's starting pointer readptr (in huffctl.v) after reset to a value other than 0. This gives the flexibility to upload some history data into the decoder's buffer and start the Huffman decoder from the middle of the buffer rather than from its very start;
3) calculate the position to seek to. It is relatively easy to calculate for MPEG-1 Layer-3: duration=(filesize-ID3size)/(bitrate/8*1000), newPosition=ID3size+seekTime*(bitrate/8*1000). The duration is needed to check that the position to seek to fits within the play time; alternatively, newPosition can be checked against the file size. These formulas do not take into account older tag versions appearing at the end of the file, but those are usually no more than 128 bytes, and thus negligible for the timing calculation relative to the average MP3 file size. They also assume CBR (VBR requires a completely different approach, needing more power and data I/O for accurate seeking). Funnily enough, I found web pages with an incorrect duration formula, so beware of posts by ignorant people with cool job titles;
4) seek to the calculated position, find the next frame from this position on, calculate its frame size, and ensure that there is a next valid frame at that distance. The new pointer will point to this next frame found at that distance;
5) find the main_data_begin lookback pointer of the frame pointed to in step 4. Decrease the new pointer by this value so that it points, within the previous frame, to the start of the main data for the current frame. This is the pointer for the decoder's data start. Note that this fails if the main data begins more than one frame back (removing the headers of the previous frame(s) is then required for proper operation);
6) fill the decoder's buffer starting from the pointer identified in step 5, and set the decoder's decoding start pointer to the one identified in step 4. While the implementation assumes you fill the buffer in halves, do it differently at the start: fill the whole buffer instead of just the first half. For this, after reset, set the idle bit, check for a data request, reset the idle bit, perform two 1024-byte transfers to the decoder's buffer (effectively filling it completely), then set the idle bit, then reset it, and then set it again;
7) after performing step 6, continue normally, replenishing 1024 bytes per decoder request.
Employing this plan, I had zero sound corruption cases. As you can see, it requires some changes to the Verilog, but that should be easy if you know the basics of hardware, know Verilog and can do some reverse engineering.
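The arithmetic from the plan (the CBR seek position and the main_data_begin lookback) can be illustrated with a short sketch. The function names are hypothetical, and the side-information layout is an assumption based on MPEG-1 Layer III: main_data_begin is taken to be the first 9 bits of the side information, which follows the 4-byte header and, when the protection bit is 0, a 2-byte CRC.

```python
def cbr_bytes_per_second(bitrate_kbps):
    """Bytes of audio data per second for a constant-bitrate stream."""
    return bitrate_kbps * 1000 // 8


def cbr_duration_seconds(file_size, id3_size, bitrate_kbps):
    """duration = (filesize - ID3size) / (bitrate / 8 * 1000), CBR only."""
    return (file_size - id3_size) / float(cbr_bytes_per_second(bitrate_kbps))


def cbr_seek_offset(seek_seconds, id3_size, bitrate_kbps):
    """newPosition = ID3size + seekTime * (bitrate / 8 * 1000), CBR only.
    The result is a raw byte offset; a frame start still has to be
    searched for from this position onwards."""
    return id3_size + int(seek_seconds * cbr_bytes_per_second(bitrate_kbps))


def main_data_begin(frame):
    """Read the 9-bit main_data_begin lookback from an MPEG-1 Layer III
    frame (assumed layout: header, optional 2-byte CRC, side info)."""
    b = bytearray(frame[:8])
    # Protection bit set (FF FB) means no CRC; clear (FF FA) means CRC present.
    side_info = 4 if (b[1] & 0x01) else 6
    return (b[side_info] << 1) | (b[side_info + 1] >> 7)
```

For a frame found at offset frame_pos, the data start pointer from step 5 would then be frame_pos minus main_data_begin(frame), subject to the caveat in the list about main data that begins more than one frame back.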

Are there any constraints on encoding an audio signal?

I capture PCM sound at some sampling rate, e.g. 24 kHz. I need to encode it with a codec (I use Opus) to send it over the network. I noticed that at some of the sampling rates I use for encoding with Opus, I often hear extra "crackling" noise at the receiving end. At other rates, it sounds OK. That might be an implementation bug, but I thought there might also be some constraints that I don't know about.
I also noticed that if I use a different sampling rate while decoding the Opus-encoded audio stream, I get a lower or higher pitch, which seems logical to me. So I've read that I need to resample on the other end if the receiving side doesn't support the original PCM sampling rate.
So I have 2 questions regarding all this:
Are there any constraints on sampling rate (or other parameters) of audio encoding? (Like I have a 24kHz pcm sound - maybe there are certain sample rates to use with it?)
Are there any common techniques to provide the same sound quality at both sides when sending audio stream over network?
The crackling noises are most likely a bug, since there are no limitations on the sample rate that would result in this kind of noise (there are other kinds of signal changes that come with sample rate conversion, especially when downsampling to a lower sample rate, but definitely not crackling).
A wild guess would be that there is something wrong with the input buffer. Crackling often occurs if samples are omitted or duplicated, which is often the result of the boundaries between subsequent buffers not being correct.
Sending audio data over a network in real time will require compression, no matter what. The required data rate is simply too high. There are codecs which provide lossless audio compression (e.g. FLAC), but their compression ratio is low compared to e.g. Opus.
The problem was solved by buffering packets at the receiving end and writing them to the soundcard buffer as soon as a certain amount had accumulated. The "crackling" noise was most likely due to the gaps between subsequent frames that were sent to the soundcard buffer.
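The receive-side buffering described in the last paragraph could look roughly like the sketch below. It is an assumption-laden illustration rather than the poster's code: the class name PrebufferQueue and the threshold parameter are hypothetical, and decoding and soundcard output are left out entirely.

```python
import collections


class PrebufferQueue(object):
    """Hypothetical receive-side buffer: playback only starts once
    start_threshold packets are queued, and falls back to buffering
    again after an underrun."""

    def __init__(self, start_threshold):
        self.start_threshold = start_threshold
        self._packets = collections.deque()
        self._playing = False

    def push(self, packet):
        """Called by the network side for every received packet."""
        self._packets.append(packet)
        if len(self._packets) >= self.start_threshold:
            self._playing = True

    def pop(self):
        """Called by the audio side; returns the next packet, or None
        while the queue is still (re)filling."""
        if not self._playing:
            return None
        if not self._packets:
            self._playing = False  # underrun: wait for the threshold again
            return None
        return self._packets.popleft()
```

When pop() returns None, the audio side would typically write silence instead of leaving a gap. A larger threshold trades extra latency for fewer underruns.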

How can I get the frequency value at a given time with XAudio2?

I've already loaded the .wav audio to the buffer with XAudio2 (Windows 8.1) and to play it I just have to use:
//start consuming audio in the source voice
/* IXAudio2SourceVoice* */ g_source->Start();
//play the sound
g_source->SubmitSourceBuffer(buffer.xaBuffer());
I wonder, how can I get the frequency value at a given time with XAudio2?
The question does not make much sense as asked: a .wav file contains a great many frequencies. It is the blend of them that makes it sound like music to your ears, instead of just an artificially generated tone. And that blend is constantly changing.
A signal processing step is required to convert the samples in the .wav file from the time domain to the frequency domain. This is generally known as spectrum analysis, and the Fast Fourier Transform (FFT) is the standard technique.
A random Google hit on "xaudio2 fft" produced this code sample. No idea how good it is, but it is something to play with to get the lay of the land. You'll find more about it in this gamedev question.
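To make the time-domain to frequency-domain step concrete, here is a small sketch that picks a window of samples at a given time and reports the strongest frequency bin. It is independent of XAudio2 and, for brevity, uses a naive O(n^2) discrete Fourier transform rather than a real FFT; the function names are hypothetical, and the samples are assumed to be mono and already decoded to numbers.

```python
import cmath
import math


def window_at(samples, sample_rate, time_s, size):
    """Return `size` samples starting at time_s seconds into the signal."""
    start = int(time_s * sample_rate)
    return samples[start:start + size]


def dominant_frequency(window, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin in the window,
    ignoring the DC bin. Naive DFT, fine for small windows only."""
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = 0j
        for t in range(n):
            acc += window[t] * cmath.exp(-2j * math.pi * k * t / n)
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / float(n)
```

The frequency resolution is sample_rate / window size, so a longer window gives finer frequency steps but a coarser notion of "at a given time". For real audio an FFT library and a window function would be the usual choice.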

WASAPI lagging playback

I'm writing a Windows Store program in C++ which plays back the microphone. I have to modify the bits before sending them to the speakers. First I wanted to play back the microphone without any effect, but it is lagging. The sample rate and bit depth are the same on both sides (24 bit, 192000 Hz), and I also tried (24 bit, 96000 Hz). I debugged it, and it seems that the speaker is faster and therefore has to wait for data from the microphone, as if the speakers were running at a higher frequency, although according to the settings they are not. Does anyone have the slightest idea what the problem is here?
When you say that there is some "lag", do you mean that there is a delay between when you feed the audio capture device with data and when the playback device renders it, or do you mean that the audio stream is "chopped", with small pauses between the samples being rendered?
If there's a delay in playback, I would take a look at the latency value with which you've initialized the audio capture client.
If there are small pauses, then I would recommend double buffering the sample data, so that one buffer is being rendered while the other is being refilled from the audio capture device.

XAudio2 delay with small buffer size

I'm writing a video player. For the audio part I'm using XAudio2. For this I have a separate thread that waits for the BufferEnd event, then fills the buffer with new data and calls SubmitSourceBuffer.
The problem is that XAudio2 (or the driver, or the sound card) has huge delays before playing the next buffer if the buffer size is small (1024 bytes). I made measurements, and XAudio2 takes up to twice as long as it should to play such a chunk. (A 1024-byte chunk of 48 kHz raw 2-channel PCM should play in about 5 ms, but on my computer it takes up to 10 ms.) There are nearly no delays if I make the buffer 4 kB or more.
I need such a small buffer to be able to synchronize with the video clock or an external clock (like ffplay does). If I make my buffer too big, the end user will hear a lot of noise in the output due to the synchronization.
I have also timed all my functions that decode and synchronize audio, and anything else that could block or produce delays. They take 0 or 1 ms to execute, so they are definitely not the problem.
Does anybody know what it could be and why it's happening? Can anyone check whether they have the same delay problems with a small buffer?
I've not experienced any delay or pause using .wav files. If you are using the MP3 format, the encoder may add silence at the beginning and end of the sound during compression, thus causing a delay in playback. See this post for more information.
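For reference, the "about 5 ms" figure quoted in the question can be checked with a few lines of arithmetic. The sketch assumes 16-bit (2-byte) samples, which the question does not state explicitly; the function name is hypothetical.

```python
def buffer_duration_ms(buffer_bytes, sample_rate_hz, channels, bytes_per_sample):
    """Playback time of a raw PCM buffer in milliseconds."""
    bytes_per_frame = channels * bytes_per_sample
    frames = buffer_bytes / float(bytes_per_frame)
    return 1000.0 * frames / sample_rate_hz


# 1024 bytes of 48 kHz stereo, assuming 16-bit samples: 256 frames
small = buffer_duration_ms(1024, 48000, 2, 2)
# 4096 bytes under the same assumptions: 1024 frames
large = buffer_duration_ms(4096, 48000, 2, 2)
```

Under these assumptions the 1024-byte buffer holds roughly 5.3 ms of audio and the 4096-byte buffer roughly 21.3 ms.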