libespeak voicing extra syllables at end of message - c++

I have an application that uses libespeak (version 1.47.11) to announce various status messages in a human-like voice.
This was working well until a new thread was introduced into the application. Now, commonly, the expected words are followed by gibberish. Occasionally these are the final syllables of a longer, previously announced message. Other times they're numbers or just stray letters.
My code resembles:
#include <espeak/speak_lib.h>
#include <iostream>
#include <string>
// ...
std::string message = "The rain in Spain falls mainly in the plain.";
espeak_Initialize(
    AUDIO_OUTPUT_PLAYBACK, // plays audio data asynchronously
    500,                   // length of buffers for synth function, in ms
    nullptr,               // dir containing espeak-data, or null for default
    0);                    // options are mostly for phoneme callbacks, so 0
espeak_ERROR err = espeak_Synth(
    message.c_str(),       // text
    message.size(),        // size
    0,                     // position to start from
    POS_CHARACTER,         // whether the above position is chars/words/sentences
    message.size(),        // end position, 0 indicating no end
    espeakCHARS_AUTO |     // flags: AUTO detects 8-bit or UTF-8 automatically
    espeakENDPAUSE,        // ENDPAUSE adds a sentence pause at end of text
    nullptr,               // message identifier given to callback (unused)
    nullptr);              // user data, passed to the callback function (unused)
if (err != EE_OK)
    std::cerr << "Error synthesising speech" << std::endl;
// Wait until everything has been spoken
espeak_Synchronize();
I tried allocating a large, zeroed buffer and copying my string into it before passing it off to libespeak, but it didn't help.
These variables remain in scope, and the call to espeak_Synchronize blocks until speech completes, so nothing is deleting the message string. It's as though libespeak is ignoring the length I'm passing.
Note that if I shorten the size argument (the second one) then the spoken string is truncated.
Note too that I'm only calling libespeak from a single thread within my multithreaded application.

I found a solution to this problem which doesn't explain why speech was failing before, but does make speech sound as expected. And actually the code reads a little better now too.
Instead of using AUDIO_OUTPUT_PLAYBACK for asynchronous playback and then waiting for speech to finish via espeak_Synchronize, use AUDIO_OUTPUT_SYNCH_PLAYBACK for synchronous playback and remove the final call (it doesn't hurt, but it's no longer needed).
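For reference, a minimal sketch of the synchronous variant, reusing the parameter values from the snippet above:

espeak_Initialize(
    AUDIO_OUTPUT_SYNCH_PLAYBACK, // synchronous playback: calls block until audio completes
    500, nullptr, 0);
espeak_ERROR err = espeak_Synth(
    message.c_str(), message.size(), 0, POS_CHARACTER, message.size(),
    espeakCHARS_AUTO | espeakENDPAUSE, nullptr, nullptr);
if (err != EE_OK)
    std::cerr << "Error synthesising speech" << std::endl;
// No espeak_Synchronize() needed: espeak_Synth returns once playback finishes.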

Related

socket write in for loop mixes string buffers

I call a function multiple times using a for loop like this:
for ( int con = 0; con < this->controller_info.size(); con++ ) {
    try {
        this->pi.home_axis( this->controller_info.at(con).addr );
    }
    catch( std::out_of_range &e ) { ... }
}
where the home_axis() function is defined as:
long ServoInterface::home_axis( int addr ) {
    std::stringstream cmd;
    if ( addr > 0 ) cmd << addr << " ";
    cmd << "FRF";
    cmd << "\n";
    int bytes = this->controller.Write( cmd.str() );
    return NO_ERROR;
}
and the controller.Write() function is just a wrapper for the standard write(2) which writes the characters in the string to a socket file descriptor.
You can see that each time home_axis() is called it should have its own, fresh std::stringstream cmd buffer. But what is happening is that, the first time the for loop executes, the host receiving the bytes written by home_axis receives a single string, once:
1 FRF2 FRF
but if I print the number of bytes written, it prints 6 twice. So the writer is writing correctly (6 bytes, two separate times), but the host apparently receives them as a single buffer.
If I execute that for loop again, then the host receives (properly),
1 FRF
and then
2 FRF
handling the two received buffers each as they come in.
How can the std::stringstream cmd buffers be getting mixed like this?
There are no threads involved here.
In an effort to pick this apart a bit: if I insert just 1 µs of delay in that for loop, i.e. usleep(1);, then it works properly. Also, if I call the home_axis() function manually, in equally rapid succession but without using a for loop, like this,
this->pi.home_axis( this->controller_info.at(0).addr );
this->pi.home_axis( this->controller_info.at(1).addr );
then that also works.
So I'm wondering if it's possible there is a compiler optimization going on?
This has nothing to do with the compiler at all.
TCP is a byte stream. It has no concept of message boundaries. There is no 1:1 relationship between writes and reads. You can write 2 messages of 6 bytes each, and the receiver may receive all 12 bytes at a time, or 1 byte and then 11 bytes, or any combination in between. That is just the way TCP works. By default, it breaks up data packets as it sees fit to optimize transmissions.
What is important is that TCP guarantees the bytes will be delivered (unless the connection is lost), and it will deliver the bytes in the same order that they are written.
As such, the sender must indicate in the data itself where each message begins and ends, either by sending a message's length before its content, or by separating messages with a unique delimiter (as you are).
On the receiving side, a single read may receive a partial message, or pieces of multiple messages, etc. It is the receiver's responsibility to buffer incoming bytes and extract only complete messages from that buffer as needed, regardless of however many reads it takes to complete them.
As you are delimiting your messages with a trailing \n, the receiver should buffer all bytes and extract only messages that have received their \n, leaving any incomplete message at the end of the buffer for subsequent reads to finish.
This way, message boundaries are preserved and handled correctly.
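For illustration, a minimal receive-side sketch under this answer's assumptions: sock_fd and handle_message are hypothetical names, and messages are delimited by a trailing \n as in the question:

#include <string>
#include <unistd.h>

void handle_message(const std::string &msg); // hypothetical application callback

std::string rxbuf; // accumulates bytes across reads

void on_readable(int sock_fd)
{
    char chunk[256];
    ssize_t n = read(sock_fd, chunk, sizeof chunk);
    if (n <= 0)
        return; // connection closed or error; handle elsewhere
    rxbuf.append(chunk, n);
    // Extract every complete, \n-terminated message;
    // leave any partial tail in rxbuf for subsequent reads.
    std::string::size_type pos;
    while ((pos = rxbuf.find('\n')) != std::string::npos)
    {
        handle_message(rxbuf.substr(0, pos));
        rxbuf.erase(0, pos + 1);
    }
}

With this in place it no longer matters whether the kernel delivers "1 FRF\n2 FRF\n" as one read or several; each call to handle_message sees exactly one command.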

How to tell if SSL_read has received and processed all the records from single message

Here is the dilemma:
SSL_read, on success, returns the number of bytes read. SSL_pending is used to tell whether the processed record has more data to be read, which probably means the buffer provided wasn't big enough to contain the whole record.
SSL_read may return n > 0, but what if this happens when only the first record has been processed and the message is effectively a multi-record communication?
Question: I am using epoll to send/receive messages, which means I have to queue up an event in case I expect more data. What check will ensure that all the records have been read from a single message, so that it's time to remove this event and queue up a response event that will write the response back to the client?
PS: This code hasn't been tested, so it may be incorrect. Its purpose is to share the idea that I am trying to implement.
Here is the code snippet for the read:
// Read whatever is available.
while (1)
{
    auto n = SSL_read(ssl_, ptr_ + tail_, sz_ - tail_);
    if (n <= 0)
    {
        int ssle = SSL_get_error(ssl_, n);
        auto old_ev = evt_.events;
        if (ssle == SSL_ERROR_WANT_READ)
        {
            // need more data to process, wait for epoll notification again
            evt_.events = EPOLLIN | EPOLLERR;
        }
        else if (ssle == SSL_ERROR_WANT_WRITE)
        {
            evt_.events = EPOLLOUT | EPOLLERR;
        }
        else
        {
            /* connection closed by peer, or
               some irrecoverable error */
            done_ = true;
            tail_ = 0; // invalidate the data
            break;
        }
        if (old_ev != evt_.events)
            if (epoll_ctl(epoll_fd_, EPOLL_CTL_MOD, socket_fd_, &evt_) < 0)
            {
                perror("handshake failed at EPOLL_CTL_MOD");
                SSL_free(ssl_);
                ssl_ = nullptr;
                return false;
            }
        break; // wait for the next epoll notification instead of spinning
    }
    else // some data has been read
    {
        tail_ += n; // accumulate, since we read into ptr_ + tail_
        if (SSL_pending(ssl_) > 0)
            // buffer wasn't enough to hold the content. resize and reread
            resize();
        else
            break;
    }
}
SSL_read() returns the number of decrypted bytes returned in the caller's buffer, not the number of bytes received on the connection. This mimics the return value of recv() and read().
SSL_pending() returns the number of decrypted bytes that are still in the SSL's buffer and haven't been read by the caller yet. This would be equivalent to calling ioctl(FIONREAD) on a socket.
There is no way to know how many SSL/TLS records constitute an "application message", that is for the decrypted protocol data to dictate. The protocol needs to specify where a message ends and a new message begins. For instance, by including the message length in the message data. Or delimiting messages with terminators.
Either way, the SSL/TLS layer has no concept of "messages", only an arbitrary stream of bytes that it encrypts and decrypts as needed, and transmits in "records" of its choosing. Similar to how TCP breaks up a stream of arbitrary bytes into IP frames, etc.
So, while your loop is reading arbitrary bytes from OpenSSL, it needs to process those bytes to detect separations between protocol messages, so it can then act accordingly per message.
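As a hedged illustration of that per-message processing, assuming a hypothetical protocol that prefixes each message with a 4-byte big-endian length (ptr_ and tail_ as in the question's snippet):

#include <arpa/inet.h> // ntohl
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true once the tail_ bytes buffered at ptr_ contain a complete message.
bool have_complete_message(const unsigned char *ptr_, std::size_t tail_,
                           std::size_t &msg_len_out)
{
    if (tail_ < 4)
        return false; // the length header itself isn't complete yet
    std::uint32_t len;
    std::memcpy(&len, ptr_, 4);
    msg_len_out = ntohl(len);
    return tail_ >= 4 + msg_len_out; // header plus full payload buffered?
}

Once this returns true, the loop can stop queueing read events and hand the message off; any bytes beyond 4 + msg_len_out already belong to the next message.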
What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
I'd have hoped that your message has a header with the number of records in it. Otherwise the protocol you've got is probably unparseable.
What you'd need is to have a stateful parser that consumes all the available bytes and outputs records once they are complete. Such a parser needs to suspend its state once it reaches the last byte of decrypted input, and then must be called again when more data is available to be read. But in all cases if you can't predict ahead of time how much data is expected, you won't be able to tell when the message is finished - that is unless you're using a self-synchronizing protocol. Something like ATM headers would be a starting point. But such complication is unnecessary when all you need is just to properly delimit your data so that the packet parser can know exactly whether it's got all it needs or not.
That's the problem with sending messages: it's very easy to send stuff that can't be decoded by the receiver, since the sender is perfectly fine with losing data; it just doesn't care. But the receiver will certainly need to know, somehow, how many bytes or records are expected. It can be told this a priori by sending headers that include byte counts or fixed-size record counts (the same size information, just in different units), or a posteriori by using unique record delimiters. For example, when sending printable text split into lines, such delimiters can be Unicode paragraph separators (U+2029).
It's very important to ensure that the record delimiters can't occur within the record data itself. Thus you need some sort of "stuffing" mechanism, where if a delimiter sequence appears in the payload, you can alter it so that it's not a valid delimiter anymore. You also need an "unstuffing" mechanism so that such altered delimiter sequences can be detected and converted back to their original form, of course without being interpreted as a delimiter.
A very simple example of such a delimiting process is the octet-stuffed framing in the PPP protocol, a form of HDLC framing. The record separator is 0x7E. Whenever this byte is detected in the payload, it is escaped: replaced by the sequence 0x7D 0x5E. On the receiving end, the 0x7D is interpreted to mean "the following character has been XOR'd with 0x20". Thus, the receiver converts 0x7D 0x5E to 0x5E first (it removes the escape byte), and then XORs it with 0x20, yielding the original 0x7E.
Such framing is easy to implement but potentially has more overhead than framing with a longer delimiter sequence, or even a dynamic delimiter sequence whose form differs for each position within the stream. A dynamic delimiter sequence, especially an unpredictable one, e.g. negotiated anew for every connection, can prevent denial-of-service attacks in which an attacker maliciously provides a payload that incurs a large escaping overhead.
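A minimal sketch of that PPP-style octet stuffing (delimiter 0x7E, escape 0x7D, escaped bytes XOR'd with 0x20):

#include <cstddef>
#include <cstdint>
#include <vector>

// Escape a payload so that 0x7E can safely mark frame boundaries.
std::vector<std::uint8_t> stuff(const std::vector<std::uint8_t> &payload)
{
    std::vector<std::uint8_t> out;
    for (std::uint8_t b : payload)
    {
        if (b == 0x7E || b == 0x7D)   // delimiter or escape byte in the payload
        {
            out.push_back(0x7D);      // insert the escape marker
            out.push_back(b ^ 0x20);  // 0x7E -> 0x5E, 0x7D -> 0x5D
        }
        else
            out.push_back(b);
    }
    out.push_back(0x7E);              // terminate the frame
    return out;
}

// Reverse the stuffing on one complete received frame (0x7E already stripped).
std::vector<std::uint8_t> unstuff(const std::vector<std::uint8_t> &frame)
{
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < frame.size(); ++i)
    {
        if (frame[i] == 0x7D && i + 1 < frame.size())
            out.push_back(frame[++i] ^ 0x20); // undo the XOR after the escape byte
        else
            out.push_back(frame[i]);
    }
    return out;
}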

AT command response parser

I am working on my own implementation to read AT command responses from a modem using a microcontroller and C/C++.
But!! always a BUT!! it stopped working once my program had two "threads". The first is where I compare the possible reply from the modem using strcmp, which I believe is terribly slow.
Comparing function:
if (strcmp(reply, m_buffer) == 0)
{
    memset(buffer, 0, buffer_size);
    buffer_size = 0;
    memset(m_buffer, 0, m_buffer_size);
    m_buffer_size = 0;
    return 0;
}
else
    return 1;
This works fine for me with AT commands like AT or AT+CPIN?, where the last response from the modem is "OK" and nothing in the middle, but it is not working with commands like AT+CREG?, where the response is:
+CREG: n,n
OK
I am expecting "+CREG: n,n", but I believe strncpy is very slow and my buffer data gets replaced by "OK".
2nd "thread" where it enables a UART RX interruption and replaces my buffer data every time it receives new data
Interruption handle:
m_buffer_size = buffer_size;
strncpy(m_buffer, buffer, buffer_size + m_buffer_size);
Do you know of anything out there faster than strcmp, or anything else to improve the reading of AT command responses?
This has the scent of an XY Problem
If you have seen the buffer contents being overwritten, you might want to look into a thread-safe queue to deliver messages from the RX thread to the parsing thread. That way, even if a second message arrives while you're processing the first, you won't run into "buffer overwrite" problems.
Move the data out of the receive buffer and place it in another buffer. Two buffers is rarely enough, so create a pool of buffers. In the past I have used linked lists of pre-allocated buffers to keep fragmentation down, but depending on the memory management and caching smarts in your microcontroller, and the language you elect to use, something along the lines of std::deque may be a better choice.
So:
Make a list of free buffers.
The UART handling thread loop then looks something like:
Get a buffer from the free list.
Read into the buffer until full or timeout.
Pass the buffer to the parser.
The parser puts the buffer in its own receive list.
The UART thread signals the parser thread to wake up.
Repeat until terminated.
If the free list is emptied, your program is too slow to keep up. Perhaps adding more buffers will allow the program to get through a busy period, but if the data flow is relatively constant and the free list empties out... well, you have a problem.
The parser loop also repeats until terminated, and looks like:
If receive list not empty,
Get buffer from receive list
Process buffer
Return buffer to free list
Otherwise
Sleep
Remember to protect the lists from concurrent access by the different threads. C11 and C++11 have a number of useful tools to assist you here.
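As a hedged sketch of that hand-off (all names hypothetical), here is a minimal C++11 thread-safe queue that the UART thread pushes filled buffers into and the parser thread blocks on:

#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

using Buffer = std::vector<char>;

class BufferQueue {
public:
    void push(Buffer buf) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(std::move(buf));
        }
        cv_.notify_one(); // wake the parser thread
    }
    Buffer pop() { // blocks until a buffer is available
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Buffer buf = std::move(q_.front());
        q_.pop_front();
        return buf;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Buffer> q_;
};

Note that if the receiving side is a genuine interrupt handler rather than a thread, a mutex can't be taken there; the usual substitute is a lock-free ring buffer that the ISR fills and a thread drains.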

waveOutWrite buffers are never returned to application

I have a problem with Microsoft's WaveOut API:
edit1: added a link to a sample project
edit2: removed the link, it's not representative of the issue
After playing some audio, when I want to terminate a given playback stream, I call the function:
waveOutClose(hWaveOut_);
However, even after waveOutClose() is called, sometimes the library will still access memory previously passed to it by waveOutWrite(), causing an invalid memory access.
I then tried to ensure all the buffers are marked as done before freeing the buffer:
PcmPlayback::~PcmPlayback()
{
    if (hWaveOut_ == nullptr)
        return;
    waveOutReset(hWaveOut_); // infinite-loops, never returns
    for (auto it = buffers_.begin(); it != buffers_.end(); ++it)
        waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
    while (buffers_.empty() == false) // infinite loops
        removeCompletedBuffers();
    waveOutClose(hWaveOut_);
    // Unhandled exception at 0x75629E80 (msvcrt.dll) in app.exe:
    //   0xC0000005: Access violation reading location 0xFEEEFEEE.
}
void PcmPlayback::removeCompletedBuffers()
{
    for (auto it = buffers_.begin(); it != buffers_.end();)
    {
        if (it->wavehdr_.dwFlags & WHDR_DONE)
        {
            waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
            it = buffers_.erase(it);
        }
        else
            ++it;
    }
}
However, this never happens: the buffer list never becomes empty. There will be 4-5 blocks remaining with wavehdr_.dwFlags == 18, i.e. WHDR_PREPARED | WHDR_INQUEUE (I believe this means the blocks are still marked as queued for playback).
How can I resolve this issue?
@Martin Schlott ("Can you provide the loop where you write the buffer to waveOutWrite?")
It's not quite a loop; instead I have a function that is called whenever I receive an audio packet over the network:
void PcmPlayback::addData(const std::vector<short> &rhs)
{
    removeCompletedBuffers();
    if (rhs.empty())
        return;
    // add new data
    buffers_.push_back(Buffer());
    Buffer &buffer = buffers_.back();
    buffer.data_ = rhs;
    ZeroMemory(&buffer.wavehdr_, sizeof(WAVEHDR));
    buffer.wavehdr_.dwBufferLength = buffer.data_.size() * sizeof(short);
    buffer.wavehdr_.lpData = (char *)(buffer.data_.data());
    waveOutPrepareHeader(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR)); // prepare block for playback
    waveOutWrite(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR));
}
The described behavior can happen if you do not call waveOutUnprepareHeader for every buffer you used before you call waveOutClose.
The flag field dwFlags seems to indicate that the buffers are still enqueued (WHDR_INQUEUE | WHDR_PREPARED). Try calling waveOutReset before unpreparing the buffers.
After analyzing your code, I found two problems/bugs which are not related to waveOut (funny, you use C++11 but the oldest media interface). You use a vector as a buffer, and during some call operations the vector is copied! One bug I found is:
typedef std::function<void(std::vector<short>)> CALLBACK_FN;
instead of:
typedef std::function<void(std::vector<short>&)> CALLBACK_FN;
which forces a copy of the vector.
Try to avoid using vectors if you expect to use them mostly as raw buffers; better to use std::unique_ptr as a buffer pointer.
Your callback in the recorder is not protected by a mutex, nor does it check whether the destructor was already called. The destruction happens (mostly) during the callback, which leads to an exception.
For your test program, go back to raw pointers and static callbacks before blaming waveOut. Your code is not bad, but the first bug already shows that a small bug can lead to unpredictable errors. As you also organize your buffers in a std::array, I would search for bugs there. I guess you make an unintentional copy of your whole buffer array, unpreparing the wrong buffers.
I did not have time to dig deeper, but I guess those are the problems.
I managed to find my problem in the end; it was caused by multiple bugs and a deadlock. I will document what happened here so people can learn from this in the future.
I was clued in to what was happening when I fixed the bugs in the sample:
call waveInStop() before waveInClose() in ~Recorder.cpp
wait for all buffers to have the WHDR_DONE flag before calling waveOutClose() in ~PcmPlayback.
After doing this, the sample worked fine and did not display the behavior of the WHDR_DONE flag never being marked.
In my main program, that behavior was caused by a deadlock that occurs in the following situation:
I have a vector of objects representing each peer I am streaming audio with
Each Object owns a Playback class
This vector is protected by a mutex
Recorder callback:
mutex.lock()
send audio packet to each peer.
Remove Peer:
mutex.lock()
~PcmPlayback
wait for WHDR_DONE flags to be marked
A deadlock occurs when I remove a peer: removal locks the mutex, and the recorder callback then tries to acquire the same lock.
Note that this will happen often, because the playback buffer is usually ~4 × 20 ms long while the recorder has a cadence of 20 ms.
In ~PcmPlayback, the buffers will never be marked as WHDR_DONE and any calls to the WaveOut API will never return because the WaveOut API is waiting for the Recorder callback to complete, which is in turn waiting on mutex.lock(), causing a deadlock.
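To illustrate the second fix, here is a hedged sketch of the destructor (Buffer, buffers_, and removeCompletedBuffers as in the question), waiting for WHDR_DONE without holding any lock the recorder callback needs:

PcmPlayback::~PcmPlayback()
{
    if (hWaveOut_ == nullptr)
        return;
    waveOutReset(hWaveOut_); // stops playback; queued buffers get marked WHDR_DONE
    // Poll until the driver has marked every buffer WHDR_DONE,
    // unpreparing and erasing them as they complete.
    while (!buffers_.empty())
    {
        removeCompletedBuffers();
        Sleep(1); // give the driver time; do not hold the peers mutex here
    }
    waveOutClose(hWaveOut_);
}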

OpenAL unqueueing error code, incomplete documentation

I am trying to implement streaming audio and I've run into a problem where OpenAL is giving me an error code that seems impossible given the information in the documentation.
int buffersProcessed = 0;
alGetSourcei(m_Source, AL_BUFFERS_PROCESSED, &buffersProcessed);
PrintALError();
int toAddBufferIndex;
// Remove the first buffer from the queue and move it to
// the end after buffering new data.
if (buffersProcessed > 0)
{
    ALuint unqueued;
    alSourceUnqueueBuffers(m_Source, 1, &unqueued);
    PrintALError(); // Prints AL_INVALID_OPERATION
    toAddBufferIndex = firstBufferIndex;
}
According to the documentation [PDF], AL_INVALID_OPERATION means: "There is no current context." This seems like it can't be true because OpenAL has been, and continues to play other audio just fine!
Just to be sure, I called ALCcontext* temp = alcGetCurrentContext( ); here and it returned a valid context.
Is there some other error condition that's possible here that's not mentioned in the docs?
More details: The sound source is playing when this code is being called, but the impression I got from reading the spec is you can safely unqueue processed buffers while the source is playing. PrintALError is just a wrapper for alGetError that prints if there is any error.
I am on a Mac (OS 10.8.3), in case it matters.
So far what I've gathered is that this OpenAL implementation seems to incorrectly raise an error if you unqueue a buffer while the source is playing. The spec says that you should be able to unqueue a buffer that has been marked as processed while the source is playing:
Removal of a given queue entry is not possible unless either the source is stopped (in which case the entire queue is considered processed), or if the queue entry has already been processed (AL_PLAYING or AL_PAUSED source).
On that basis I'm gonna say this is probably a bug in my OpenAL implementation. I'm gonna leave the question open in case someone can give a more concrete answer though.
To handle the condition for multiple processed buffers, use a loop. The following works on iOS and Linux:
// Unqueue used buffers
ALint buffers_processed = 0;
alGetSourcei(streaming_source, AL_BUFFERS_PROCESSED, &buffers_processed); // how many buffers have been consumed
while (buffers_processed > 0) { // we have a consumed buffer, so we need to replenish
    ALuint unqueued_buffer;
    alSourceUnqueueBuffers(streaming_source, 1, &unqueued_buffer);
    available_AL_buffer_array_curr_index--;
    available_AL_buffer_array[available_AL_buffer_array_curr_index] = unqueued_buffer;
    buffers_processed--;
}