eSpeak Functionality - C++

I am trying to implement some functionality with eSpeak, but I am missing some parameters (I don't know them). I am working in Code::Blocks on Linux.
The following code runs well and reads Arabic text:
#include <string.h>
#include <malloc.h>
#include </usr/local/include/espeak/speak_lib.h>

int main(int argc, char* argv[])
{
    char text[] = {"الله لطيف "};
    espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0);
    espeak_SetVoiceByName("ar");
    unsigned int size = 0;
    while (text[size] != '\0') size++;
    unsigned int flags = espeakCHARS_AUTO | espeakENDPAUSE;
    espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, flags, NULL, NULL);
    espeak_Synchronize();
    return 0;
}
Now, could you help me find these parameters in eSpeak:
1. A function which returns the generated wave so it can be stored in a variable
2. Frequency
3. Number of channels
4. Sample size
5. A buffer in which to store the samples
6. Number of samples

If you can't find a suitable example, you will have to read the documentation in the header file. I haven't used it, but it looks pretty comprehensible:
http://espeak.sourceforge.net/speak_lib.h
When you called espeak_Initialize you passed in AUDIO_OUTPUT_PLAYBACK. You will need to pass in AUDIO_OUTPUT_RETRIEVAL instead, and then it looks like you must call espeak_SetSynthCallback with a function of your own creation to accept the samples.
Your adapted code would look something like this (UNTESTED):
#include <string.h>
#include <vector>
#include </usr/local/include/espeak/speak_lib.h>

int samplerate;             // determined by espeak, will be in Hertz (Hz)
const int buflength = 200;  // passed to espeak, in milliseconds (ms)
std::vector<short> sounddata;

int SynthCallback(short *wav, int numsamples, espeak_EVENT *events) {
    if (wav == NULL)
        return 1;           // NULL means done.

    /* process your samples here, let's just gather them */
    sounddata.insert(sounddata.end(), wav, wav + numsamples);
    return 0;               // 0 continues synthesis, 1 aborts
}

int main(int argc, char* argv[]) {
    char text[] = {"الله لطيف "};
    samplerate = espeak_Initialize(AUDIO_OUTPUT_RETRIEVAL, buflength, NULL, 0);
    espeak_SetSynthCallback(&SynthCallback);
    espeak_SetVoiceByName("ar");

    unsigned int flags = espeakCHARS_AUTO | espeakENDPAUSE;
    size_t size = strlen(text);
    espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, flags, NULL, NULL);
    espeak_Synchronize();

    /* in theory sounddata holds your samples now... */
    return 0;
}
So for your questions:
Function which returns the generated wave to store it in a variable - You write a callback function, and that function gets little buflength-long bits of the wav to process. If you are going to accumulate the data into a larger buffer, I've shown how you could do that yourself.
Frequency - Through this API it doesn't look like you pick it, espeak does. It's in Hz and returned as samplerate above.
Number of Channels - There's no mention of it, and voice synthesis is generally mono, one would think. (Vocals are mixed center by default in most stereo mixes...so you'd take the mono data you got back and play the same synthesized data on left and right channels.)
Sample Size - You get shorts. Those are signed integers, 2 bytes, range of -32,768 to 32,767. Probably it uses the entire range, doesn't seem to be configurable, but you could test and see what you get out.
A Buffer In Which We Store Samples - The synthesis buffer appears to belong to espeak, which handles the allocation and freeing of it. I've shown an example of using a std::vector to gather chunks from multiple calls.
Number of Samples - Each call to your SynthCallback will get a potentially different number of samples. You might get 0 for that number and it might not mean it's at the end.
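If you want to keep the result around after synthesis, here is a rough, untested sketch of dumping the gathered samples to disk as a minimal PCM WAV file. writeWav is a hypothetical helper of mine, not an eSpeak function, and it assumes a little-endian machine with a 4-byte int, plus the mono 16-bit format discussed above:
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of eSpeak): dump mono 16-bit samples to a
// minimal 44-byte-header PCM WAV file. Assumes a little-endian machine and
// a 4-byte int, like the rest of this answer.
void writeWav(const char *path, const std::vector<short> &samples, int samplerate)
{
    const uint32_t dataBytes     = (uint32_t)(samples.size() * sizeof(short));
    const uint32_t chunkSize     = 36 + dataBytes;
    const uint32_t fmtSize       = 16;
    const uint16_t audioFormat   = 1;   // PCM
    const uint16_t channels      = 1;   // eSpeak's output is mono
    const uint16_t bitsPerSample = 16;  // samples arrive as shorts
    const uint16_t blockAlign    = channels * bitsPerSample / 8;
    const uint32_t byteRate      = samplerate * blockAlign;

    FILE *f = fopen(path, "wb");
    if (!f) return;

    fwrite("RIFF", 1, 4, f);  fwrite(&chunkSize, 4, 1, f);
    fwrite("WAVE", 1, 4, f);
    fwrite("fmt ", 1, 4, f);  fwrite(&fmtSize, 4, 1, f);
    fwrite(&audioFormat, 2, 1, f);   fwrite(&channels, 2, 1, f);
    fwrite(&samplerate, 4, 1, f);    fwrite(&byteRate, 4, 1, f);
    fwrite(&blockAlign, 2, 1, f);    fwrite(&bitsPerSample, 2, 1, f);
    fwrite("data", 1, 4, f);  fwrite(&dataBytes, 4, 1, f);
    fwrite(samples.data(), sizeof(short), samples.size(), f);
    fclose(f);
}

// e.g. after espeak_Synchronize():  writeWav("out.wav", sounddata, samplerate);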


OpenH264 DecodeFrameNoDelay output format

I've used the OpenH264 tutorial (https://github.com/cisco/openh264/wiki/UsageExampleForDecoder) to successfully decode an H264 frame, but I can't figure out from the tutorial what the output format is.
I'm using the "unsigned char *pDataResult[3];" (pData in the tutorial), and this gets populated, but I need to know the length in order to convert it to byte buffers to return to Java. I also need to know what the ownership of this data is (it seems to be owned by the decoder). This info isn't mentioned in the tutorial or the docs as far as I can find.
unsigned char *pDataResult[3];
int iRet = pSvcDecoder->DecodeFrameNoDelay(pBuf, iSize, pDataResult, &sDstBufInfo);
The tutorial also lists an initializer, but gives "..." as the assignment.
//output: [0~2] for Y,U,V buffer for Decoding only
unsigned char *pData[3] =...;
Is the YUV data null terminated?
There is the SBufferInfo last parameter with TagSysMemBuffer:
typedef struct TagSysMemBuffer {
    int iWidth;     ///< width of decoded pic for display
    int iHeight;    ///< height of decoded pic for display
    int iFormat;    ///< type is "EVideoFormatType"
    int iStride[2]; ///< stride of 2 component
} SSysMEMBuffer;
And the length is probably in there, but it's not exactly clear where. Maybe it is "iWidth*iHeight" for each buffer?
pData is freed in the decoder destructor with WelsFreeDynamicMemory in decoder.cpp, just as you supposed.
The decoder itself assigns nullptrs to the channels, but it's fine to initialize pData with them as a good habit.
You have the iSize parameter as input; that is the byte buffer length you want.
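If it helps, here is an untested sketch of how the plane sizes fall out of the SSysMEMBuffer fields, assuming the output is I420 (the usual case, iFormat == videoFormatI420): the Y plane is iWidth x iHeight bytes, U and V are each half that in both dimensions, and every row has to be copied with its iStride because the stride is usually wider than the visible width.
#include <algorithm>
#include <vector>
#include "codec_api.h"  // OpenH264 API header; the exact include path depends on your setup

// Untested sketch: pack the decoded picture into one tight buffer, assuming I420.
std::vector<unsigned char> packI420(unsigned char *pData[3], const SBufferInfo &info)
{
    const int w = info.UsrData.sSystemBuffer.iWidth;
    const int h = info.UsrData.sSystemBuffer.iHeight;
    const int strideY  = info.UsrData.sSystemBuffer.iStride[0];
    const int strideUV = info.UsrData.sSystemBuffer.iStride[1];

    std::vector<unsigned char> yuv((size_t)w * h * 3 / 2);
    unsigned char *dst = yuv.data();

    for (int r = 0; r < h; ++r)       // Y plane: w x h bytes
        dst = std::copy(pData[0] + r * strideY, pData[0] + r * strideY + w, dst);
    for (int r = 0; r < h / 2; ++r)   // U plane: (w/2) x (h/2) bytes
        dst = std::copy(pData[1] + r * strideUV, pData[1] + r * strideUV + w / 2, dst);
    for (int r = 0; r < h / 2; ++r)   // V plane: (w/2) x (h/2) bytes
        dst = std::copy(pData[2] + r * strideUV, pData[2] + r * strideUV + w / 2, dst);

    return yuv; // yuv.size() is the total length; the data is not null-terminated
}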

8bpp BMP - referring pixels to the color table; want to read only one row of pixels; C++

I have a problem with reading an 8-bit grayscale BMP. I am able to get the info from the header and to read the palette, but I can't map pixel values to the palette entries. Here I have found how to read the pixel data, but not how to use it in the case of a BMP with a palette. I am a beginner. My goal is to read only one row of pixels at a time.
Code:
#include <cstdio>   // fopen, fread, remove
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, char** argv)
{
    const char* filename = "Row_tst.bmp";
    remove("test.txt");
    ofstream out("test.txt", ios_base::app); // file for monitoring the results

    FILE* f = fopen(filename, "rb");
    unsigned char info[54];
    fread(info, sizeof(unsigned char), 54, f); // read the header
    int width = *(int*)&info[18];
    int height = *(int*)&info[22];

    unsigned char palette[1024]; // read the palette
    fread(palette, sizeof(unsigned char), 1024, f);
    for (int i = 0; i < 1024; i++)
    {
        out << "\n";
        out << (int)palette[i];
    }

    int paletteSmall[256]; // the 1024-byte palette won't be needed in the future
    for (int i = 0; i < 256; i++)
    {
        paletteSmall[i] = (int)palette[4*i];
        out << paletteSmall[i] << "\n";
    }

    int size = width;
    //for (int j = 0; j < height; j++)
    {
        unsigned char* data = new unsigned char[size];
        fread(data, sizeof(unsigned char), size, f);
        for (int i = 0; i < width; i++)
        {
            cout << "\n" << i << "\t" << paletteSmall[*(int*)&data[i]];
        }
        delete [] data;
    }
    fclose(f);
    return 0;
}
What I get in test.txt seems fine - first the values from 0 0 0 0 to 255 255 255 0 (the palette), then values from 0 to 255 (paletteSmall).
The problem is that I can't map the pixel values to the color table entries. My application crashes, with symptoms suggesting that it tried to use a nonexistent element of an array. If I understand properly, a pixel in a BMP with a color table should contain the index of a color table entry, so I have no idea why it doesn't work. I ask for your help.
You are forcing your 8-bit values to be read as int:
cout<<"\n"<<i<<"\t"<<paletteSmall[*(int*)&data[i]];
The amount of casting indicates you were having problems here and probably resorted to adding one cast after another until "it compiled". As it turns out, compiling without errors is not the same as working without errors.
What happens here is that you force the data pointer to read 4 bytes (or as much as your local int size is, anyway) and so the value will almost always exceed the size of paletteSmall. (In addition, the last couple of values will be invalid under all circumstances, because you read bytes from beyond the valid range of data.)
Because the image data itself is 8-bit, all you need here is
cout<<"\n"<<i<<"\t"<<paletteSmall[data[i]];
No casts necessary; data is an unsigned char * so its values are limited from 0 to 255, and paletteSmall is exactly the correct size.
On Casting
The issue with casting is that your compiler will complain if you tell it flat out to treat a certain type of value as if it is another type altogether. By using a cast, you are telling it "Trust me. I know what I am doing."
This can lead to several problems if you actually do not know :)
For example: a line such as your own
int width = *(int*)&info[18];
appears to work because it returns the proper information, but that is in fact a happy accident.
The array info contains several disconnected unsigned char values, and you tell your compiler that there is an int stored starting at position #18 – it trusts you and reads an integer. It assumes that (1) the number of bytes that you want to combine into an integer is in fact the number of bytes that itself uses for an int (sizeof(int)), and (2) the individual bytes are in the same order as it uses internally (Endianness).
If either of these assumptions is false, you can get surprising results; and almost certainly not what you wanted.
The proper procedure is to check the BMP file format for how the value for width is stored, and then use that information to get the data you want. In this case, width is stored in little-endian format at offset 18 as 4 bytes. With that, you can use this instead:
int width = info[18]+(info[19]<<8)+(info[20]<<16)+(info[21]<<24);
No assumptions on how large an int is (except that it needs to be at least 4 bytes), and no assumption on the byte order (shifting values 'internally' does not depend on endianness).
So why did it work anyway (at least, on your computer)? The most common size for an int in this decade is 4 bytes. The most popular CPU type happens to store multi-byte values in the same order as they are stored inside a BMP. Add that together, and your code works, on most computers, in this decade. A happy accident.
The above may not be true if you compile your code on another type of computer (such as an embedded ARM system that uses a different endianness), if your compiler uses a smaller int (which by now would mean a very old compiler) or a larger one (just wait another 10 years or so), or if you want to adjust your code to read other types of files (which will have parameters of their own, and the endianness used is one of them).
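If you end up reading several multi-byte fields this way (width, height, the offset to the pixel data, and so on), it is worth putting the shift-and-add pattern into one small helper. A minimal, untested sketch:
#include <cstddef>
#include <cstdint>

// Read a 4-byte little-endian value starting at buf[offset], independent of
// the host's int size and byte order.
uint32_t read_le32(const unsigned char *buf, std::size_t offset)
{
    return  (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}

// Usage with the header read earlier:
//   int width  = (int)read_le32(info, 18);
//   int height = (int)read_le32(info, 22);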

How to make secondary DirectBuffer sound?

I am trying to get sound from simply tapping the keyboard - it should work like a little drum machine.
If DirectSound is not the proper way to do this, please suggest something else.
In my code I don't know what's wrong. Here it is, without error checking and with the comments translated:
// Declaring the IDirectSound object
IDirectSound* device;
DirectSoundCreate(NULL, &device, NULL);
device->SetCooperativeLevel(hWnd, DSSCL_NORMAL);

/* Declaring secondary buffers */
IDirectSoundBuffer* kickbuf;
IDirectSoundBuffer* snarebuf;

/* Declaring the .wav file pointers
   and two structures for reading the information at the beginning of the .wav files */
FILE* fkick;
FILE* fsnare;
sWaveHeader kickHdr;
sWaveHeader snareHdr;
The structure sWaveHeader is declared this way:
typedef struct sWaveHeader
{
    char RiffSig[4];                 // 'RIFF'
    unsigned long WaveformChunkSize; // 8
    char WaveSig[4];                 // 'WAVE'
    char FormatSig[4];               // 'fmt '
    unsigned long FormatChunkSize;   // 16
    unsigned short FormatTag;        // WAVE_FORMAT_PCM
    unsigned short Channels;         // Channels
    unsigned long SampleRate;
    unsigned long BytesPerSec;
    unsigned short BlockAlign;
    unsigned short BitsPerSample;
    char DataSig[4];                 // 'data'
    unsigned long DataSize;
} sWaveHeader;
Opening the .wav files:
#define KICK "D:/muzic/kick.wav"
#define SNARE "D:/muzic/snare.wav"
fkick = fopen(KICK, "rb");
fsnare = fopen(SNARE, "rb");
Here I make a function that does the common work for snarebuf and kickbuf:
int read_wav_to_WaveHeader (sWaveHeader* , FILE* , IDirectSoundBuffer* ); // the declaration
But I will not write out this function here, just show the way it works with kickbuf, for instance.
fseek(fkick, 0, SEEK_SET); // Zero the position in file
fread(&kickHdr, 1, sizeof(sWaveHeader), fkick); // reading the sWaveHeader structure from file
Here is a check that the file header matches the sWaveHeader structure:
if (memcmp(pwvHdr.RiffSig,   "RIFF", 4) ||
    memcmp(pwvHdr.WaveSig,   "WAVE", 4) ||
    memcmp(pwvHdr.FormatSig, "fmt ", 4) ||
    memcmp(pwvHdr.DataSig,   "data", 4))
    return 1;
Declaring the format and descriptor for a buffer and filling them:
DSBUFFERDESC bufDesc;
WAVEFORMATEX wvFormat;

ZeroMemory(&wvFormat, sizeof(WAVEFORMATEX));
wvFormat.wFormatTag = WAVE_FORMAT_PCM;
wvFormat.nChannels = kickHdr.Channels;
wvFormat.nSamplesPerSec = kickHdr.SampleRate;
wvFormat.wBitsPerSample = kickHdr.BitsPerSample;
wvFormat.nBlockAlign = wvFormat.wBitsPerSample / 8 * wvFormat.nChannels;

ZeroMemory(&bufDesc, sizeof(DSBUFFERDESC));
bufDesc.dwSize = sizeof(DSBUFFERDESC);
bufDesc.dwFlags = DSBCAPS_CTRLVOLUME |
                  DSBCAPS_CTRLPAN |
                  DSBCAPS_CTRLFREQUENCY;
bufDesc.dwBufferBytes = kickHdr.DataSize;
bufDesc.lpwfxFormat = &wvFormat;
Now, creating the buffer:
device->CreateSoundBuffer(&bufDesc, &kickbuf, NULL); // Any mistakes by this point?
Now locking the buffer and loading some data to it.
This data starts after sizeof(sWaveHeader) bytes in a WAVE file, am I wrong?
LPVOID Ptr1;        // pointer to the first block of data
LPVOID Ptr2;        // pointer to the second block of data
DWORD Size1, Size2; // their sizes
Now calling the Lock() method:
kickbuf->Lock((DWORD)LockPos, (DWORD)Size,
&Ptr1, &Size1,
&Ptr2, &Size2, 0);
Loading data (is it ok?):
fseek(fkick, sizeof(sWaveHeader), SEEK_SET);
fread(Ptr1, 1, Size1, fkick);
if(Ptr2 != NULL)
fread(Ptr2, 1, Size2, fkick);
Unlocking the buffer:
kickbuf->Unlock(Ptr1, Size1, Ptr2, Size2);
Setting the volume:
kickbuf->SetVolume(-2500);
Then I make a while(1) loop:
1. ask for a key press
2. if it is pressed:
kickbuf->SetCurrentPosition(0);
kickbuf->Play(0, 0, 0);
But there's no sound playing. Please tell me what is wrong in my code, or maybe in the whole concept. Thank you.
When you initialize the WAVEFORMATEX, you are forgetting to set the nAvgBytesPerSec member. Add this line after the initialization of wvFormat.nBlockAlign:
wvFormat.nAvgBytesPerSec = wvFormat.nSamplesPerSec * wvFormat.nBlockAlign;
Also, I suspect this could be a problem:
kickbuf->SetVolume(-2500);
I suspect that will just attenuate your sample to absolute silence. Try taking that call out so that it plays at full volume.
More likely: none of your sample code above shows validation of the return values from any of the DirectSound APIs, nor of any of the file I/O calls. Have you validated that the HRESULTs returned by all the DSound APIs are S_OK? Have you tried printing, or using OutputDebugString to print, the values you computed for the members of WAVEFORMATEX?
Have you debugged the fread calls to validate that you are getting valid data into your buffers?
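For example, a minimal, untested sketch of that kind of checking (names follow your snippets) might look like:
HRESULT hr = device->CreateSoundBuffer(&bufDesc, &kickbuf, NULL);
if (FAILED(hr))
{
    // A failing HRESULT here usually points at a bad WAVEFORMATEX or DSBUFFERDESC.
    char msg[64];
    sprintf(msg, "CreateSoundBuffer failed: 0x%08lX\n", (unsigned long)hr);
    OutputDebugStringA(msg);
    return hr;
}

size_t got = fread(Ptr1, 1, Size1, fkick);
if (got != Size1)
    OutputDebugStringA("fread returned fewer bytes than requested\n");
The same pattern applies to SetCooperativeLevel(), Lock(), Unlock(), Play() and every fopen/fread call.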
Hope this helps.

ReadConsoleOutputCharacter gives ERROR_NOT_ENOUGH_MEMORY when requesting more than 0xCFE1 characters, is there a way around that?

The code:
#include <windows.h>
#include <stdio.h>

int main() {
    system("mode 128");
    int range = 0xCFE2;
    char* buf = new char[range+1];
    DWORD dwChars;
    if (!ReadConsoleOutputCharacter(
            GetStdHandle(STD_OUTPUT_HANDLE),
            buf,      // buffer in which the characters are stored
            range,    // how many characters to read
            {0, 0},   // read starting at row=0, column=0
            &dwChars  // how many characters were actually read
        )) {
        printf("GetLastError: %lu\n", GetLastError());
    }
    system("pause");
    return 0;
}
Console screen buffers cannot be larger than 64K. Each character in the buffer requires 2 bytes, one for the character code and another for the color attributes. It therefore never makes any sense to try to read more than 32K chars with ReadConsoleOutputCharacter().
You don't have a real problem.
The documentation for WriteConsole() says:
If the total size of the specified number of characters exceeds the available heap, the function fails with ERROR_NOT_ENOUGH_MEMORY.
ReadConsoleOutputCharacter() probably has a similar restriction if you try to read too much, even though it is not documented. Try using GetConsoleScreenBufferInfo() or a similar function to determine how many rows and columns there are, and then don't read more than that.
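An untested sketch of that approach: query the real buffer size first, then pull the text out one row at a time so that no single call asks for too much.
#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);

    CONSOLE_SCREEN_BUFFER_INFO csbi;
    if (!GetConsoleScreenBufferInfo(hOut, &csbi)) {
        printf("GetConsoleScreenBufferInfo failed: %lu\n", GetLastError());
        return 1;
    }

    // Total number of character cells in the screen buffer.
    DWORD total = (DWORD)csbi.dwSize.X * csbi.dwSize.Y;
    char* buf = new char[total];

    // Read one row at a time to stay well below any per-call limit.
    DWORD copied = 0;
    for (SHORT row = 0; row < csbi.dwSize.Y; ++row) {
        COORD pos = { 0, row };
        DWORD got = 0;
        if (!ReadConsoleOutputCharacterA(hOut, buf + copied, csbi.dwSize.X, pos, &got)) {
            printf("GetLastError: %lu\n", GetLastError());
            break;
        }
        copied += got;
    }

    printf("Read %lu of %lu cells\n", copied, total);
    delete[] buf;
    return 0;
}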

How to write only regularly spaced items from a char buffer to disk in C++

How can I write only every third item in a char buffer to file quickly in C++?
I get a three-channel image from my camera, but each channel contains the same info (the image is grayscale). I'd like to write only one channel to disk to save space and make the writes faster, since this is part of a real-time, data collection system.
C++'s ofstream::write command seems to only write contiguous blocks of binary data, so my current code writes all three channels and runs too slowly:
char * data = getDataFromCamera();
int dataSize = imageWidth * imageHeight * imageChannels;
std::ofstream output;
output.open( fileName, std::ios::out | std::ios::binary );
output.write( data, dataSize );
I'd love to be able to replace the last line with a call like:
int skipSize = imageChannels;
output.write( data, dataSize, skipSize );
where skipSize would cause write to put only every third byte into the output file. However, I haven't been able to find any function that does this.
I'd love to hear any ideas for getting a single channel written to disk quickly.
Thanks.
You'll probably have to copy every third element into a buffer, then write that buffer out to disk.
You can use a codecvt facet on a locale to filter out part of the output.
Once created, you can imbue any stream with the appropriate locale, and the file will only receive every third character of the input.
#include <locale>
#include <fstream>
#include <iostream>
#include <vector>   // used by the example in main() below

class Filter: public std::codecvt<char,char,mbstate_t>
{
    public:
        typedef std::codecvt<char,char,mbstate_t>  MyType;
        typedef MyType::state_type                 state_type;
        typedef MyType::result                     result;

        // This indicates that we are converting the input,
        // thus forcing a call to do_out().
        virtual bool do_always_noconv() const throw() { return false; }

        // Reads from  from -> from_end
        // Writes to   to   -> to_limit
        virtual result do_out(state_type &state,
                              const char *from, const char *from_end, const char* &from_next,
                              char *to, char *to_limit, char* &to_next) const
        {
            // Notice the increment of from
            for (; (from < from_end) && (to < to_limit); from += 3, to += 1)
            {
                (*to) = (*from);
            }
            from_next = from;
            to_next   = to;

            // partial: the output buffer filled up before all the input was consumed
            return ((from < from_end) ? partial : ok);
        }
};
Once you have the facet, all you need to know is how to use it:
int main(int argc, char* argv[])
{
    // Construct a custom filter facet and add it to a locale.
    const std::locale filterLocale(std::cout.getloc(), new Filter());

    // Create a stream and imbue it with the locale.
    std::ofstream saveFile;
    saveFile.imbue(filterLocale);

    // Now the stream is imbued we can open it.
    // NB If you open the file stream first,
    // any attempt to imbue it with a locale will silently fail.
    saveFile.open("Test");
    saveFile << "123123123123123123123123123123123123123123123123123123";

    std::vector<char> data(1000);
    saveFile.write( &data[0], data.size() /* The filter implements the skipSize */ );

    // With a tiny amount of extra work
    // you can make the filter take a skip-size
    // parameter.
    return(0);
}
Let's say your buffer is 24-bit RGB, and you're using a 32-bit processor (so that operations on 32-bit entities are the most efficient).
For the most speed, let's work with a 12-byte chunk at a time. In twelve bytes, we'll have 4 pixels, like so:
AAABBBCCCDDD
Which is 3 32-bit values:
AAAB
BBCC
CDDD
We want to turn this into ABCD (a single 32-bit value).
We can create ABCD by applying a mask to each input and ORing.
ABCD = A000 | 0BC0 | 000D
In C++, with a little-endian processor, I think it would be:
unsigned int turn12grayBytesInto4ColorBytes( unsigned int buf[3] )
{
    return (buf[0]&0x000000FF)  // mask seems reversed because of little-endianness
         | (buf[1]&0x00FFFF00)
         | (buf[2]&0xFF000000);
}
It's probably fastest to do this conversion into another buffer and THEN dump that to disk, instead of going directly to disk.
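As a rough, untested usage sketch (same little-endian and 4-byte-int caveats, and assuming dataSize is a multiple of 12 and the camera buffer is suitably aligned), that convert-then-write step could look like this:
#include <vector>

// Compact the 3-channel buffer 12 bytes (4 pixels) at a time, then write once.
unsigned int *src = reinterpret_cast<unsigned int *>(data); // data from getDataFromCamera()
std::vector<unsigned int> packed(dataSize / 12);

for (size_t i = 0; i < packed.size(); ++i)
    packed[i] = turn12grayBytesInto4ColorBytes(src + i * 3);

output.write(reinterpret_cast<const char *>(packed.data()),
             packed.size() * sizeof(unsigned int));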
There is no such functionality in the standard library AFAIK. Jerry Coffin's solution will work best. I wrote a simple snippet which should do the trick:
const char * data = getDataFromCamera();
const int channelNum = 0;
const int channelSize = imageWidth * imageHeight;
const int dataSize = channelSize * imageChannels;

char * singleChannelData = new char[channelSize];
for (int i = 0; i < channelSize; ++i)
    singleChannelData[i] = data[i*imageChannels + channelNum];

try {
    std::ofstream output;
    output.open( fileName, std::ios::out | std::ios::binary );
    output.write( singleChannelData, channelSize );
}
catch(const std::ios_base::failure& output_error) {
    delete [] singleChannelData;
    throw;
}
delete [] singleChannelData;
EDIT: I added the try..catch. Of course you could just as well use a std::vector for nicer code, but it might be a tiny bit slower.
First, I'd mention that to maximize writing speed, you should write buffers that are multiples of the sector size (e.g. 64KB or 256KB).
To answer your question, you're going to have to copy every 3rd element from your source data into another buffer, and then write that to the stream.
If I recall correctly, Intel Integrated Performance Primitives (IPP) has functions for copying buffers while skipping a certain number of elements. Using IPP will probably give faster results than your own copy routine.
I'm tempted to say that you should read your data into a struct and then overload the insertion operator.
ostream& operator<< (ostream& out, const struct data* s) {
    out.write(&s->first, 1);  // write only the one channel kept in the struct
    return out;
}