Calculating sum of array with nonblocking operations - c++

Every process need to calculate its partial sums and send them to 0 process, then to count sum of array
I wrote this code
double* a;
a = new double[N];
for (int i = 0; i < N; i++)
a[i] = 1.0;
int k = (N - 1) / proc_count + 1;
int ibeg = proc_this * k;
int iend = (proc_this + 1) * k - 1;
if (ibeg >= N)
iend = ibeg - 1;
else if(iend >= N)
iend = N - 1;
double s = 0;
for (int i = ibeg; i <= iend; i++)
s += a[i];
MPI_Status* stats = new MPI_Status[proc_count];
MPI_Request* reqs = new MPI_Request[proc_count];
double* inmes = new double[proc_count];
inmes[0] = s;
if (proc_this != 0)
MPI_Isend(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &reqs[proc_this]);
else
for (int i = 1; i < proc_count; i++)
MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Waitall(proc_count, reqs, stats);
MPI_Finalize();
if (proc_this == 0) {
for (int i = 1; i<proc_count; i++)
inmes[0] += inmes[i];
printf("sum = %f", inmes[0]);
}
delete[] a;
but it keeps giving an error
Fatal error in PMPI_Waitall: Invalid MPI_Request, error stack:
PMPI_Waitall(274): MPI_Waitall(count=1, req_array=00000212B7B24A40, status_array=00000212B7B34740) failed
PMPI_Waitall(250): The supplied request in array element 0 was invalid (kind=3)
Could you explain what am I doing wrong?

In short, you need to set all elements of reqs to MPI_REQUEST_NULL right after allocating it.
The longer answer is that MPI programs run as multiple instances of one or more source programs and each instance (rank) has its own set of variables that aren't shared. When you have:
if (proc_this != 0)
MPI_Isend(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &reqs[proc_this]);
else
for (int i = 1; i < proc_count; i++)
MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
you expect that the result will be that reqs will be full of values:
reqs: [ Irecv req | Isend req 1 | Isend req 2 | ... ]
The reality is that you'll have:
reqs in rank 0: [ Irecv req | ??? | ??? | ... ??? ... ]
reqs in rank 1: [ ??? | Isend req | ??? | ... ??? ... ]
reqs in rank 2: [ ??? | ??? | Isend req | ... ??? ... ]
etc.
where ??? stands for uninitialised memory. MPI_Waitall() is a local operation and it only sees the local copy of reqs. It cannot complete requests posted by other ranks.
Uninitialised memory can have any value in it and if that value results in an invalid request handle, MPI_Waitall() will abort with an error. If you set all the requests to MPI_REQUEST_NULL, this will not happen as null requests are ignored.
There is also a semantic error in your code:
MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
stores all receive requests in the same place with each new request overwriting the previous one. So you will never be able to wait for any request except the last one.
Given that you are calling MPI_Waitall() right after MPI_Isend(), there is no point in using non-blocking sends. A much cleaner version of the code would be:
if (proc_this != 0)
MPI_Send(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
else {
MPI_Request* reqs = new MPI_Request[proc_count - 1];
for (int i = 1; i < proc_count; i++)
MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[i-1]);
MPI_Waitall(proc_count-1, reqs, MPI_STATUSES_IGNORE);
delete [] reqs;
}

Related

MPI Gathering arrays of strings

I'm trying to merge a collection of dictionaries into the root process. Here's a short example:
#define MAX_CF_LENGTH 55
map<string, int> dict;
if (rank == 0)
{
dict = {
{"Accelerator Defective", 33},
{"Aggressive Driving/Road Rage", 27},
{"Alcohol Involvement", 19},
{"Animals Action", 30}};
}
if (rank == 1)
{
dict = {
{"Driver Inexperience", 6},
{"Driverless/Runaway Vehicle", 46},
{"Drugs (Illegal)", 38},
{"Failure to Keep Right", 24}};
}
if (rank == 2)
{
dict = {
{"Lost Consciousness", 1},
{"Obstruction/Debris", 8},
{"Other Electronic Device", 25},
{"Other Lighting Defects", 43},
{"Other Vehicular", 7}};
}
Scatterer scatterer(rank, MPI_COMM_WORLD, num_workers);
scatterer.gatherDictionary(dict, MAX_CF_LENGTH);
The idea inside gatherDictionary() is to put every key in a char array at each process (duplicates are allowed). After that, gathering all keys into the root and creating the final (merged) dictionary before broadcasting it. Here's the code:
void Scatterer::gatherDictionary(map<string,int> &dict, int maxKeyLength)
{
// Calculate destination dictionary size
int numKeys = dict.size();
int totalLength = numKeys * maxKeyLength;
int finalNumKeys = 0;
MPI_Reduce(&numKeys, &finalNumKeys, 1, MPI_INT, MPI_SUM, 0, comm);
// Computing number of elements that are received from each process
int *recvcounts = NULL;
if (rank == 0)
recvcounts = new int[num_workers];
MPI_Gather(&totalLength, 1, MPI_INT, recvcounts, 1, MPI_INT, 0, comm);
// Computing displacement relative to recvbuf at which to place the incoming data from each process
int *displs = NULL;
if (rank == 0)
{
displs = new int[num_workers];
displs[0] = 0;
for (int i = 1; i < num_workers; i++)
displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;
}
char(*dictKeys)[maxKeyLength];
char(*finalDictKeys)[maxKeyLength];
dictKeys = (char(*)[maxKeyLength])malloc(numKeys * sizeof(*dictKeys));
if (rank == 0)
finalDictKeys = (char(*)[maxKeyLength])malloc(finalNumKeys * sizeof(*finalDictKeys));
// Collect keys for each process
int i = 0;
for (auto pair : dict)
{
strncpy(dictKeys[i], pair.first.c_str(), maxKeyLength);
i++;
}
MPI_Gatherv(dictKeys, totalLength, MPI_CHAR, finalDictKeys, recvcounts, displs, MPI_CHAR, 0, comm);
// Create new dictionary and distribute it to all processes
dict.clear();
if (rank == 0)
{
for (int i = 0; i < finalNumKeys; i++)
dict[finalDictKeys[i]] = dict.size();
}
delete[] dictKeys;
if (rank == 0)
{
delete[] finalDictKeys;
delete[] recvcounts;
delete[] displs;
}
broadcastDictionary(dict, maxKeyLength);
}
I'm sure of broadcastDicitonary() correctness as I've already tested it. Debugging into the gathering function I'm getting the following partial results:
Recvcounts:
220
220
275
Displacements:
0
221
442
FinalDictKeys:
Rank:0 Accelerator Defective
Rank:0 Aggressive Driving/Road Rage
Rank:0 Alcohol Involvement
Rank:0 Animals Action
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Rank:0
Since only root data is being collected I'm wondering if this has something to do with the characters allocation even if it should be contiguous. I don't think this is related to a missing null character at the end since there's already a lot of padding for each string/key.
Thanks in advance for pointing out any missings or improvements and please comment if you need any extra infos.
If you wish to test it yourself I've put in a one-file only the code all together, it is compile&run ready (of course this works with 3 mpi processes). Code Here
displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;
That + 1 at the end is superfluous. Change it to:
displs[i] = displs[i - 1] + recvcounts[i - 1];

c++ copy array to array

I have taken code from here Webduino Network Setp
I added one more field.
struct config_t
{
....
...
.....
byte subnet[4];
byte dns_server[4];
unsigned int webserverPort;
char HostName[10]; // Added code Here..
} eeprom_config;
Snippet..
#define NAMELEN 5
#define VALUELEN 10
void setupNetHTML(WebServer &server, WebServer::ConnectionType type, char *url_tail, bool tail_complete)
{
URLPARAM_RESULT rc;
char name[NAMELEN];
char value[VALUELEN];
boolean params_present = false;
byte param_number = 0;
char buffer [13];
.....
.....
}
Added Lines to read date from web page and Wire to eeprom
Write to eeprom: ( Facing issue here, I need to copy value to eeprom_config.HostName[0] ... )
// read Host Name
if (param_number >= 25 && param_number <= 35) {
// eeprom_config.HostName[param_number - 25] = strtol(value, NULL, 10);
eeprom_config.HostName[param_number - 25] = value ; // Facing Issue here..
}
and...
for (int a = 0; a < 10; a++) {
server.printP(Form_input_text_start);
server.print(a + 25);
server.printP(Form_input_value);
server.print(eeprom_config.HostName[a]);
server.printP(Form_input_size1);
server.printP(Form_input_end);
}
Issue was resolved.
Thanks , got idea from this post.
invalid conversion from char' tochar*'
How ! changed
// read Host Name
if (param_number >= 25 && param_number <= 35) {
// eeprom_config.HostName[param_number - 25] = strtol(value, NULL, 10);
eeprom_config.HostName[param_number - 25] = value ; // Facing Issue here..
}
changed to
// read Host Name
if (param_number >= 25 && param_number <= 35) {
eeprom_config.HostName[param_number - 25] = value[0];
}

SQLITE_MISUSE in prepared statement

I'm getting a SQLITE_MISUSE error on the following code, and I am wondering if it might be caused by having the table name be a bound parameter? What are some different causes of SQLITE_MISUE?
const char sqlNeuralStateInsert[] =
"INSERT INTO ?1(LAYER_ID, NEURON_ID, INPUT_ID, VALUE)"
"VALUES(?2, ?3, ?4, ?5);";
sqlite3_stmt* stmt1;
rc = sqlite3_prepare_v2(db, sqlNeuralStateInsert, -1, &stmt1, NULL);
if(rc){
//!< Failed to prepare insert statement
}
sqlite3_bind_text(stmt1, 1, this->getNName().c_str(), -1, SQLITE_STATIC);
for(uint32_t i = 0; i < m_nlayers; i++){
sqlite3_bind_int(stmt1, 2, i); // Layer id
for(uint32_t j = 0; j < m_layers[i]->getNeuronCount(); j++){
std::vector<double> weights = m_layers[i]->getWeights(j);
sqlite3_bind_int(stmt1, 3, j); // Neuron id
for(uint32_t k = 0; k < weights.size(); k++){
sqlite3_bind_int(stmt1, 4, k);
sqlite3_bind_double(stmt1, 5, weights[k]);
rc = sqlite3_step(stmt1);
printf("%d\n", rc);
}
}
}
sqlite3_finalize(stmt1);
You're right; you cannot bind the table name:
Generally one cannot use SQL parameters/placeholders for database identifiers (tables, columns, views, schemas, etc.) or database functions (e.g., CURRENT_DATE), but instead only for binding literal values.
You could have trivially tested this hypothesis by hard-coding the table name.

playing created Audio-Data has noise and periodical clicking in sound

I write an application, which plays a sound getting from Hardware (like a ring buffer filled with a sinus wave with certain frequency). Everything works fine, and I can playback the created sound correctly except a periodical clicking (maybe at the end of buffer?) and noise.
I initialize and run the Buffer:
void Audiooutput::InitializeAudioParameters()
{
Audio_DataWritten = 0;
Audio_fragments = 4;
Audio_channels = 2;
Audio_BufferSize = 256;
Audio_Samplerate = 8000;
Audio_ResamplingFactor = 1;
Audio_Framesize = 2;
// (SND_PCM_FORMAT_S16_LE / 8);
Audio_frames = Audio_BufferSize / Audio_Framesize * Audio_fragments;
snd_pcm_uframes_t size;
err = snd_pcm_hw_params_any(pcmPlaybackHandle, hw_params);
err = snd_pcm_hw_params_set_rate_resample(pcmPlaybackHandle, hw_params, 1);
// qDebug()<<a1.sprintf(" % d \t snd_pcm_hw_params_set_rate: %s",Audio_Samplerate,snd_strerror(err));
err =
snd_pcm_hw_params_set_format(pcmPlaybackHandle, hw_params,
SND_PCM_FORMAT_S16_LE);
err =
snd_pcm_hw_params_set_channels(pcmPlaybackHandle, hw_params,
Audio_channels);
err = snd_pcm_hw_params_set_rate_near(pcmPlaybackHandle, hw_params, &Audio_Samplerate, 0);
// qDebug()<<a1.sprintf(" % d \t snd_pcm_hw_params_set_rate: %s",Audio_Samplerate,snd_strerror(err));
if ((err =
snd_pcm_hw_params_set_periods_near(pcmPlaybackHandle, hw_params,
&Audio_fragments, 0)) < 0) {
qDebug() << a1.sprintf("Error setting # fragments to %d: %s\n",
Audio_fragments, snd_strerror(err));
} else
qDebug() << a1.sprintf("setting # fragments to %d: %s\n",
Audio_fragments, snd_strerror(err));
err = snd_pcm_hw_params_get_buffer_size(hw_params, &size);
if ((err =
snd_pcm_hw_params_set_buffer_size_near(pcmPlaybackHandle,
hw_params,
&Audio_frames)) < 0) {
qDebug() << a1.
sprintf("Error setting buffer_size %d frames: %s",
Audio_frames, snd_strerror(err));
} else
qDebug() << a1.sprintf("setting Buffersize to %d --> %d: %s\n",
Audio_BufferSize, Audio_frames,
snd_strerror(err));
Audio_BufferSize = Audio_frames;
if ((err = snd_pcm_hw_params(pcmPlaybackHandle, hw_params)) < 0) {
qDebug() << a1.sprintf("Error setting HW params: %s",
snd_strerror(err));
}
Q_ASSERT(err >= 0);
}
void Audiooutput::ProduceAudioOutput(int n, int mmodes, int totalMModeGates,
short *sinusValue, short *cosinusValue)
{
for (int audioSample = 0; audioSample < n;
audioSample += Audio_ResamplingFactor) {
currentposition =
(int)(m_Audio.generalPos % (Audio_BufferSize / 2));
if (currentposition == 0) {
QueueAudioBuffer();
m_Audio.currentPos = 0;
}
m_Audio.generalPos++;
AudioData[currentposition * 2] =
(short)(sinusValue[audioSample]);
AudioData[currentposition * 2 + 1] =
(short)(cosinusValue[audioSample]);
}
}
void Audiooutput::QueueAudioBuffer()
{
snd_pcm_prepare(pcmPlaybackHandle);
Audio_DataWritten +=
snd_pcm_writei(pcmPlaybackHandle, AudioData, Audio_BufferSize);
}
Changing the audiobuffer size or fragments changes also the clicking period.
Can anyone help me with this issue ?
I checked also the first and Last Values. Thy are always difference.
OS: Ubuntu 11
more detail.
the count of received data is dynamically, and changes depend of different parameters. But I play always a certain part e.g. 128 values or 256 or 512....
// I get the Audiodata from a hardware (in a Timerloop)
audiobuffersize = 256;
short *AudioData = new short[256];
int generalAudioSample = 0;
void CollectDataFromHw()
{
...
int n = 0;
n = GetData(buf1,buf2);//buf1 = new short[MAX_SHRT]
if(n > 0)
FillAudioBuffer(n,buf1,buf2)
...
}
-------------------------------------------
void FillAudioBuffer(int n, short*buf1, short*buf2)
{
for(int audioSample = 0;audioSample < n; audioSample++){
iCurrentAudioSample = (int)(generalAudioSample % (audiobuffersize/2));
if(iCurrentAudioSample == 0) {
snd_pcm_writei(pcmPlaybackHandle,AudioData,audiobuffersize );
memset(AudioData,0x00,audiobuffersize*sizeof(short));
}
generalAudioSample++;
AudioData[iCurrentAudioSample * 2] = (short)(buf1[audioSample];
AudioData[iCurrentAudioSample * 2 +1] = (short)(buf2[audioSample];
}
}
I changed the audiobuffersize also. If I set it to a bigger size, I have some Echo additional to clicks.
any Idea ?
//-----------------------
the Problem is
snd_pcm_prepare(pcmPlaybackHandle);
every call of this function produce a click in sound !
Can't test the source code, but I think that the high-frequency clicks you hear are discontinuities in the sound wave. You have to assure that looping period (or, buffer size) is multiple of wave period.
Check if first and last value of buffer are almost the same (+/- 1, for example). Their distance determines the amplitude of the unwanted click.
solved
buffer has been played several times before it was filled with the data.
stupid error in the code.missing a parantez --> audio_buffersize/2 <--
and therefore the result was very often if(iCurrentAudioSample == 0) true !!!!!
iCurrentAudioSample = (int)(generalAudioSample % (audio_buffersize/2));
if(iCurrentAudioSample == 0)
{
writetoaudioStream(audiobuffer);
}

What am I doing wrong? (multithreading)

Here s what I'm doing in a nutshell.
In my class's cpp file I have:
std::vector<std::vector<GLdouble>> ThreadPts[4];
The thread proc looks like this:
unsigned __stdcall BezierThreadProc(void *arg)
{
SHAPETHREADDATA *data = (SHAPETHREADDATA *) arg;
OGLSHAPE *obj = reinterpret_cast<OGLSHAPE*>(data->objectptr);
for(unsigned int i = data->start; i < data->end - 1; ++i)
{
obj->SetCubicBezier(
obj->Contour[data->contournum].UserPoints[i],
obj->Contour[data->contournum].UserPoints[i + 1],
data->whichVector);
}
_endthreadex( 0 );
return 0;
}
SetCubicBezier looks like this:
void OGLSHAPE::SetCubicBezier(USERFPOINT &a,USERFPOINT &b, int &currentvector )
{
std::vector<GLdouble> temp;
if(a.RightHandle.x == a.UserPoint.x && a.RightHandle.y == a.UserPoint.y
&& b.LeftHandle.x == b.UserPoint.x && b.LeftHandle.y == b.UserPoint.y )
{
temp.clear();
temp.push_back((GLdouble)a.UserPoint.x);
temp.push_back((GLdouble)a.UserPoint.y);
ThreadPts[currentvector].push_back(temp);
temp.clear();
temp.push_back((GLdouble)b.UserPoint.x);
temp.push_back((GLdouble)b.UserPoint.y);
ThreadPts[currentvector].push_back(temp);
}
}
The code that calls the threads looks like this:
for(int i = 0; i < Contour.size(); ++i)
{
Contour[i].DrawingPoints.clear();
if(Contour[i].UserPoints.size() < 2)
{
break;
}
HANDLE hThread[4];
SHAPETHREADDATA dat;
dat.objectptr = (void*)this;
dat.start = 0;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.25);
dat.whichVector = 0;
dat.contournum = i;
hThread[0] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.5);
dat.whichVector = 1;
hThread[1] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.75);
dat.whichVector = 2;
hThread[2] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = Contour[i].UserPoints.size();
dat.whichVector = 3;
hThread[3] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
WaitForMultipleObjects(4,hThread,true,INFINITE);
}
Is there something wrong with this?
I'd expect it to fill ThreadPts[4]; ... There should never be any conflicts the way I have it set up. I usually get error writing at... on the last thread where dat->whichvector = 3. If I remove:
dat.start = dat.end;
dat.end = Contour[i].UserPoints.size();
dat.whichVector = 3;
hThread[3] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
Then it does not seem to crash, what could be wrong?
Thanks
The problem is that you're passing the same dat structure to each thread as the argument to the threadproc.
For example, When you start thread 1, there's no guarantee that it will have read the information in the dat structure before your main thread starts loading that same dat structure with the information for thread 2 (and so on). In fact, you're constantly directly using that dat structure throughout the thread's loop, so the thread won't be finished with the structure passed to it until the thread is basically done with all its work.
Also note that currentvector in SetCubicBezier() is a reference to data->whichVector, which is referring to the exact same location in a threads. So SetCubicBezier() will be performing push_back() calls on the same object in separate threads because of this.
There's a very simple fix: you should use four separate SHAPETHREADDATA instances - one to initialize each thread.