Allreduce with user defined function and MPI_BOTTOM


Consider the following program that is supposed to do some stupid addition of doubles:
#include <iostream>
#include <vector>
#include <mpi.h>
void add(void* invec, void* inoutvec, int* len, MPI_Datatype*)
{
double* a = reinterpret_cast <double*> (inoutvec);
double* b = reinterpret_cast <double*> (invec);
for (int i = 0; i != *len; ++i)
{
a[i] += b[i];
}
}
int main(int argc, char* argv[])
{
MPI_Init(&argc, &argv);
std::vector<double> buffer = { 2.0, 3.0 };
MPI_Op operation;
MPI_Op_create(add, 1, &operation);
MPI_Datatype types[1];
MPI_Aint addresses[1];
int lengths[1];
int count = 1;
MPI_Get_address(buffer.data(), &addresses[0]);
lengths[0] = buffer.size();
types[0] = MPI_DOUBLE;
MPI_Datatype type;
MPI_Type_create_struct(count, lengths, addresses, types, &type);
MPI_Type_commit(&type);
MPI_Allreduce(MPI_IN_PLACE, MPI_BOTTOM, 1, type, operation, MPI_COMM_WORLD);
MPI_Type_free(&type);
MPI_Op_free(&operation);
MPI_Finalize();
std::cout << buffer[0] << " " << buffer[1] << "\n";
}
Because this is part of larger program where the data I want to send is 1) on the heap and 2) consists of different types I think I have to use a user-defined type.
Now something must be wrong here because the program crashes when run with mpirun -n 2 ./a.out. The backtrace from gdb is:
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:158
#1 0x00007ffff65de460 in non_overlap_copy_content_same_ddt () from /usr/local/lib/libopen-pal.so.6
#2 0x00007ffff180a69b in ompi_coll_tuned_allreduce_intra_recursivedoubling () from /usr/local/lib/openmpi/mca_coll_tuned.so
#3 0x00007ffff793bb8b in PMPI_Allreduce () from /usr/local/lib/libmpi.so.1
#4 0x00000000004088b6 in main (argc=1, argv=0x7fffffffd708) at mpi_test.cpp:39
Line 39 is the MPI_Allreduce call. This is probably a dumb mistake, but after staring at it for hours I still don't see it. Does anyone spot the mistake? Thanks!

Edit: There is a bug in how Open MPI handles types with non-zero lower bounds (such as the one that you create when using absolute addresses) while performing in-place reduce-to-all. It seems to exist in all versions, including the development branch. The status can be tracked by following the issue on GitHub.
Your add operator is wrong as you fail to account for the datatype's lower bound. A proper solution would be something like:
void add(void* invec, void* inoutvec, int* len, MPI_Datatype* datatype)
{
MPI_Aint lb, extent;
MPI_Type_get_true_extent(*datatype, &lb, &extent);
double* a = reinterpret_cast <double*> (reinterpret_cast <char*>(inoutvec) + lb);
double* b = reinterpret_cast <double*> (reinterpret_cast <char*>(invec) + lb);
for (int i = 0; i != *len; ++i)
{
a[i] += b[i];
}
}
This will access the data correctly but is still wrong. *len will be 1 as that is what you pass to MPI_Allreduce but there are two doubles behind each element. The correctly written operator will either use the type introspection mechanism to obtain the length of the block of doubles and multiply *len by it or simply hardcode the vector length to be two:
for (int i = 0; i < 2*(*len); i++)
{
a[i] += b[i];
}

Related

Get float values back from byte buffer?

I am learning C++ at the moment and currently I am experimenting with pointers and structures. In the following code, I am copying vector A into a buffer of size 100 bytes. Afterwards I copy vector B into the same buffer with an offset, so that the vectors are right next to each other in the buffer. Afterward, I want to find the vectors in the buffer again and calculate the dot product between the vectors.
#include <iostream>
const short SIZE = 5;
typedef struct vector {
float vals[SIZE];
} vector;
void vector_copy (vector* v, vector* target) {
for (int i=0; i<SIZE; i++) {
target->vals[i] = v->vals[i];
}
}
float buffered_vector_product (char buffer[]) {
float scalar_product = 0;
int offset = SIZE * 4;
for (int i=0; i<SIZE; i=i+4) {
scalar_product += buffer[i] * buffer[i+offset];
}
return scalar_product;
}
int main() {
char buffer[100] = {};
vector A = {{1, 1.5, 2, 2.5, 3}};
vector B = {{0.5, -1, 1.5, -2, 2.5}};
vector_copy(&A, (vector*) buffer);
vector_copy(&B, (vector*) (buffer + sizeof(vector)));
float prod = buffered_vector_product(buffer);
std::cout << prod <<std::endl;
return 0;
}
Unfortunately, this doesn't work yet. The problem lies within the function buffered_vector_product: I am unable to get the float values back from the buffer. Each float value should occupy 4 bytes, but I don't know how to access these 4 bytes and convert them into a float value. Can anyone help me out? Thanks a lot!

In the function buffered_vector_product, change the lines
int offset = SIZE * 4;
for (int i=0; i<SIZE; i=i+4) {
scalar_product += buffer[i] * buffer[i+offset];
}
to
for ( int i=0; i<SIZE; i++ ) {
scalar_product += ((float*)buffer)[i] * ((float*)buffer)[i+SIZE];
}
If you want to calculate the offsets manually, you can instead replace it with the following:
size_t offset = SIZE * sizeof(float);
for ( int i=0; i<SIZE; i++ ) {
scalar_product += *(float*)(buffer+i*sizeof(float)) * *(float*)(buffer+i*sizeof(float)+offset);
}
However, with both solutions, you should beware of both the alignment restrictions and the strict aliasing rule.
The problem with the alignment restrictions can be solved by changing the line
char buffer[100] = {};
to the following:
alignas(float) char buffer[100] = {};
The strict aliasing rule is a much more complex issue, because the exact rule has changed significantly between different C++ standards and is (or at least was) different from the strict aliasing rule in the C language. See the link in the comments section for further information on this issue.
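One way to sidestep both the alignment and the aliasing concern is to copy the bytes out with std::memcpy instead of casting the buffer pointer. The following is a minimal sketch, assuming the same layout as in the question (two blocks of SIZE floats stored back to back). The names load_float and buffered_dot_product are hypothetical and not part of the original code:

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t SIZE = 5;  // same element count as in the question

// Hypothetical helper: read the float stored at position `index` of a raw
// byte buffer. std::memcpy has no alignment requirement on its arguments
// and does not access the bytes through a float lvalue.
float load_float(const char* buffer, std::size_t index) {
    float value;
    std::memcpy(&value, buffer + index * sizeof(float), sizeof(float));
    return value;
}

// Dot product of two SIZE-element float vectors stored back to back.
float buffered_dot_product(const char* buffer) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < SIZE; ++i) {
        sum += load_float(buffer, i) * load_float(buffer, i + SIZE);
    }
    return sum;
}
```

For this to be fully consistent, the vectors should also be written into the buffer with std::memcpy rather than through a (vector*) cast. Compilers typically turn such small fixed-size memcpy calls into plain loads, so the cost is usually negligible.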

Cuda passing char** to kernel

I am having a spot of bother with this basic CUDA code.
I have a char** which is a flat 2d array of passwords. My current implementation simply has the CUDA kernel iterate through this list and display the passwords. However, when I go to display them I simply get "(NULL)". I'm not quite sure why this is. Can someone explain what is happening?
Main:
char ** pwdAry;
pwdAry = new char *[numberOfPwd];
//pwdAry given some values (flat 2d array layout)
const int pwdArySize = sizeof(pwdAry);
dim3 grid(gridSize,gridSize);
dim3 block(blockSize,blockSize);
searchKeywordKernel << <grid, block >> >(pwdAry);
return EXIT_SUCCESS;
Cuda:
__global__ void searchKeywordKernel(char **passwordList)
{
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int pitch = blockDim.x * gridDim.x;
int idx = x + y * pitch;
int tidy = idx / pitch;
int tidx = idx - (pitch * tidy);
int bidx = tidx / blockDim.x;
int bidy = tidy / blockDim.y;
int currentThread = threadIdx.x + blockDim.x * threadIdx.y;
printf("hi, i am thread: %i, and my block x: %i, and y: %i\n", currentThread, bidx, bidy);
printf("My password is: %s\n", passwordList[currentThread]);
}

Based on discussion in the comments, here is an example code that roughly follows the code in the question, using 3 different methods:
1. Use a "flattened" array. This is the traditional advice for beginners who are asking about how to handle a double pointer array (char **, or any other type), or any data structure that contains embedded pointers. The basic idea is to create a single pointer array of the same type (e.g. char *), and copy all the data to that array, end-to-end. In this case, since the array elements are of variable length, we also need to pass an array containing the starting indices of each string.
2. Use a direct double-pointer method. I consider this code difficult to write. It may also have performance implications. The canonical example is here, and a stepwise description of what is required algorithmically is here and/or here is a 3D (i.e. triple-pointer) worked example with method description (yuck!). This is fundamentally doing a deep-copy in CUDA, and I consider it somewhat more difficult than typical CUDA coding.
3. Use the managed memory subsystem, which is available on CUDA platforms that support it. Coding-wise, this is probably simpler than either of the above 2 approaches.
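The host-side part of the flattening idea can be isolated from the CUDA calls. The following is a small sketch in plain C++, assuming the strings are held in std::string objects; the FlatStrings struct and the flatten function are hypothetical names introduced only for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the flattened form: all strings stored
// end-to-end (each NUL-terminated) plus the starting offset of each one.
struct FlatStrings {
    std::vector<char> data;
    std::vector<unsigned> offsets;
};

// Build the flattened buffer and the offset table from a list of strings.
FlatStrings flatten(const std::vector<std::string>& strings) {
    FlatStrings flat;
    for (const std::string& s : strings) {
        flat.offsets.push_back(static_cast<unsigned>(flat.data.size()));
        flat.data.insert(flat.data.end(), s.begin(), s.end());
        flat.data.push_back('\0');  // keep each string usable with %s
    }
    return flat;
}
```

The two resulting buffers (flat.data.data() and flat.offsets.data()) are contiguous, so each can be copied to the device with a single cudaMemcpy call and indexed in a kernel as data + offsets[idx].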
Here is a worked example of all 3 methods:
$ cat t1035.cu
#include <stdio.h>
#include <string.h>
#define nTPB 256
__global__ void kern_1D(char *data, unsigned *indices, unsigned num_strings){
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < num_strings)
printf("Hello from thread %d, my string is %s\n", idx, data+indices[idx]);
}
__global__ void kern_2D(char **data, unsigned num_strings){
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < num_strings)
printf("Hello from thread %d, my string is %s\n", idx, data[idx]);
}
int main(){
const int num_strings = 3;
const char s0[] = "s1\0";
const char s1[] = "s2\0";
const char s2[] = "s3\0";
int ds[num_strings];
ds[0] = sizeof(s0)/sizeof(char);
ds[1] = sizeof(s1)/sizeof(char);
ds[2] = sizeof(s2)/sizeof(char);
// pretend we have a dynamically allocated char** array
char **data;
data = (char **)malloc(num_strings*sizeof(char *));
data[0] = (char *)malloc(ds[0]*sizeof(char));
data[1] = (char *)malloc(ds[1]*sizeof(char));
data[2] = (char *)malloc(ds[2]*sizeof(char));
// initialize said array
strcpy(data[0], s0);
strcpy(data[1], s1);
strcpy(data[2], s2);
// method 1: "flattening"
char *fdata = (char *)malloc((ds[0]+ds[1]+ds[2])*sizeof(char));
unsigned *ind = (unsigned *)malloc(num_strings*sizeof(unsigned));
unsigned next = 0;
for (int i = 0; i < num_strings; i++){
strcpy(fdata+next, data[i]);
ind[i] = next;
next += ds[i];}
//copy to device
char *d_fdata;
unsigned *d_ind;
cudaMalloc(&d_fdata, next*sizeof(char));
cudaMalloc(&d_ind, num_strings*sizeof(unsigned));
cudaMemcpy(d_fdata, fdata, next*sizeof(char), cudaMemcpyHostToDevice);
cudaMemcpy(d_ind, ind, num_strings*sizeof(unsigned), cudaMemcpyHostToDevice);
printf("method 1:\n");
kern_1D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(d_fdata, d_ind, num_strings);
cudaDeviceSynchronize();
//method 2: "2D" (pointer-to-pointer) array
char **d_data;
cudaMalloc(&d_data, num_strings*sizeof(char *));
char **d_temp_data;
d_temp_data = (char **)malloc(num_strings*sizeof(char *));
for (int i = 0; i < num_strings; i++){
cudaMalloc(&(d_temp_data[i]), ds[i]*sizeof(char));
cudaMemcpy(d_temp_data[i], data[i], ds[i]*sizeof(char), cudaMemcpyHostToDevice);
cudaMemcpy(d_data+i, &(d_temp_data[i]), sizeof(char *), cudaMemcpyHostToDevice);}
printf("method 2:\n");
kern_2D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(d_data, num_strings);
cudaDeviceSynchronize();
// method 3: managed allocations
// start over with a managed char** array
char **m_data;
cudaMallocManaged(&m_data, num_strings*sizeof(char *));
cudaMallocManaged(&(m_data[0]), ds[0]*sizeof(char));
cudaMallocManaged(&(m_data[1]), ds[1]*sizeof(char));
cudaMallocManaged(&(m_data[2]), ds[2]*sizeof(char));
// initialize said array
strcpy(m_data[0], s0);
strcpy(m_data[1], s1);
strcpy(m_data[2], s2);
// call kernel directly on managed data
printf("method 3:\n");
kern_2D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(m_data, num_strings);
cudaDeviceSynchronize();
return 0;
}
$ nvcc -arch=sm_35 -o t1035 t1035.cu
$ cuda-memcheck ./t1035
========= CUDA-MEMCHECK
method 1:
Hello from thread 0, my string is s1
Hello from thread 1, my string is s2
Hello from thread 2, my string is s3
method 2:
Hello from thread 0, my string is s1
Hello from thread 1, my string is s2
Hello from thread 2, my string is s3
method 3:
Hello from thread 0, my string is s1
Hello from thread 1, my string is s2
Hello from thread 2, my string is s3
========= ERROR SUMMARY: 0 errors
$
Notes:
I suggest running this code with cuda-memcheck if you are just testing it out for the first time. I have omitted proper cuda error checking for brevity of presentation, but I recommend it any time you are having trouble with a CUDA code. Proper execution of this code depends on having a managed memory subsystem available (read the doc links I have provided). If your platform does not support it, running this code as-is will probably result in a seg fault, because I have not included proper error checking.
Copying a double-pointer array from device to host, although not explicitly covered in this example, is essentially the reverse of the steps for each of the 3 methods. For method 1, a single cudaMemcpy call can do it. For method 2, it requires a for-loop that reverses the steps to copy to the device (including the use of the temp pointers). For method 3, nothing at all is required, other than proper adherence to managed memory coding practices, such as use of cudaDeviceSynchronize() after a kernel call, before attempting to access the device from host code again.
I don't wish to argue about whether or not methods 1 and 3 explicitly adhere to the letter of the question in terms of providing a method to pass a char ** array to a CUDA kernel. If your focus is that narrow, then please use method 2, or else disregard this answer entirely.
EDIT: Based on a question in the comments below, here is the above code modified with a different initialization sequence for the host-side strings (at line 42). There are now compilation warnings, but those warnings arise from the code specifically requested to be used by OP:
$ cat t1036.cu
#include <stdio.h>
#include <string.h>
#define nTPB 256
__global__ void kern_1D(char *data, unsigned *indices, unsigned num_strings){
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < num_strings)
printf("Hello from thread %d, my string is %s\n", idx, data+indices[idx]);
}
__global__ void kern_2D(char **data, unsigned num_strings){
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < num_strings)
printf("Hello from thread %d, my string is %s\n", idx, data[idx]);
}
int main(){
const int num_strings = 3;
#if 0
const char s0[] = "s1\0";
const char s1[] = "s2\0";
const char s2[] = "s3\0";
int ds[num_strings];
ds[0] = sizeof(s0)/sizeof(char);
ds[1] = sizeof(s1)/sizeof(char);
ds[2] = sizeof(s2)/sizeof(char);
// pretend we have a dynamically allocated char** array
char **data;
data = (char **)malloc(num_strings*sizeof(char *));
data[0] = (char *)malloc(ds[0]*sizeof(char));
data[1] = (char *)malloc(ds[1]*sizeof(char));
data[2] = (char *)malloc(ds[2]*sizeof(char));
// initialize said array
strcpy(data[0], s0);
strcpy(data[1], s1);
strcpy(data[2], s2);
#endif
char ** pwdAry; pwdAry = new char *[num_strings]; for (int a = 0; a < num_strings; a++) { pwdAry[a] = new char[1024]; } for (int a = 0; a < 3; a++) { pwdAry[a] = "hello\0"; }
// method 1: "flattening"
char *fdata = (char *)malloc((1024*num_strings)*sizeof(char));
unsigned *ind = (unsigned *)malloc(num_strings*sizeof(unsigned));
unsigned next = 0;
for (int i = 0; i < num_strings; i++){
memcpy(fdata+next, pwdAry[i], 1024);
ind[i] = next;
next += 1024;}
//copy to device
char *d_fdata;
unsigned *d_ind;
cudaMalloc(&d_fdata, next*sizeof(char));
cudaMalloc(&d_ind, num_strings*sizeof(unsigned));
cudaMemcpy(d_fdata, fdata, next*sizeof(char), cudaMemcpyHostToDevice);
cudaMemcpy(d_ind, ind, num_strings*sizeof(unsigned), cudaMemcpyHostToDevice);
printf("method 1:\n");
kern_1D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(d_fdata, d_ind, num_strings);
cudaDeviceSynchronize();
//method 2: "2D" (pointer-to-pointer) array
char **d_data;
cudaMalloc(&d_data, num_strings*sizeof(char *));
char **d_temp_data;
d_temp_data = (char **)malloc(num_strings*sizeof(char *));
for (int i = 0; i < num_strings; i++){
cudaMalloc(&(d_temp_data[i]), 1024*sizeof(char));
cudaMemcpy(d_temp_data[i], pwdAry[i], 1024*sizeof(char), cudaMemcpyHostToDevice);
cudaMemcpy(d_data+i, &(d_temp_data[i]), sizeof(char *), cudaMemcpyHostToDevice);}
printf("method 2:\n");
kern_2D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(d_data, num_strings);
cudaDeviceSynchronize();
// method 3: managed allocations
// start over with a managed char** array
char **m_data;
cudaMallocManaged(&m_data, num_strings*sizeof(char *));
cudaMallocManaged(&(m_data[0]), 1024*sizeof(char));
cudaMallocManaged(&(m_data[1]), 1024*sizeof(char));
cudaMallocManaged(&(m_data[2]), 1024*sizeof(char));
// initialize said array
for (int i = 0; i < num_strings; i++)
memcpy(m_data[i], pwdAry[i], 1024);
// call kernel directly on managed data
printf("method 3:\n");
kern_2D<<<(num_strings+nTPB-1)/nTPB, nTPB>>>(m_data, num_strings);
cudaDeviceSynchronize();
return 0;
}
$ nvcc -arch=sm_35 -o t1036 t1036.cu
t1036.cu(42): warning: conversion from a string literal to "char *" is deprecated
t1036.cu(42): warning: conversion from a string literal to "char *" is deprecated
$ cuda-memcheck ./t1036
========= CUDA-MEMCHECK
method 1:
Hello from thread 0, my string is hello
Hello from thread 1, my string is hello
Hello from thread 2, my string is hello
method 2:
Hello from thread 0, my string is hello
Hello from thread 1, my string is hello
Hello from thread 2, my string is hello
method 3:
Hello from thread 0, my string is hello
Hello from thread 1, my string is hello
Hello from thread 2, my string is hello
========= ERROR SUMMARY: 0 errors
$

why this binary conversion does not work?

#include<stdio.h>
#include<conio.h>
unsigned * bin(unsigned n) {
unsigned a[16];
int i = 0, j = 0;
for (i = 0; i < 16; i++) {
a[i] = n & 0x1;
n = n >> 1;
}
return a;
}
void main() {
unsigned n = 5;
int i = 0;
unsigned * a = bin(n);
for (i = 15; i >= 0; i--) {
printf("%d\n", (*(a + i)));
}
getch();
}
Please help, this binary conversion does not work. I'm trying to calculate x^n using binary conversion.
Can anybody help?

You are returning a pointer to a local variable. This variable is stored on the stack, and will not be valid after the function returns.
Dereferencing this pointer will lead to undefined behavior.
The solution is to either make the variable static, or pass in the array as an argument to the function, or (as noted in a comment by James Kanze) use a type that copies the contents.
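The last option, a type that copies its contents, could look like the sketch below. It assumes the code may be compiled as C++11 or later (the question itself reads like C), and it uses std::array in place of the raw array:

```cpp
#include <array>
#include <cstddef>

// Return the 16 low-order bits of n, least significant bit first.
// std::array is returned by value, so no pointer to a dead local escapes.
std::array<unsigned, 16> bin(unsigned n) {
    std::array<unsigned, 16> bits{};
    for (std::size_t i = 0; i < bits.size(); ++i) {
        bits[i] = n & 0x1u;
        n >>= 1;
    }
    return bits;
}
```

The caller then writes `std::array<unsigned, 16> a = bin(n);` and indexes a[i] as before, with no allocation and nothing to free.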

You cannot return a local array defined in the function in this way.
The array's lifetime ends when the function finishes executing, so the returned pointer no longer refers to valid storage.
Instead of using
unsigned a[16];
you can use the following:
unsigned *a =malloc(16 * (sizeof *a));
Do not forget to free the memory allocated for a in main once the array is no longer needed. You can free the array with:
free(a);

Actually, this is a typical case where using new (or malloc) is a pretty bad choice. However, as others have said, returning a pointer to a local array is bad.
Instead, pass in an array:
void bin(unsigned n, unsigned a[]) {
int i = 0;
for (i = 0; i < 16; i++) {
a[i] = n & 0x1;
n = n >> 1;
}
}
and in main:
unsigned a[16];
bin(n, a);
Now you have no need to allocate or return an array from bin.
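Since the question mentions computing x^n from the binary digits, here is a rough sketch of how the caller-provided array could be used for that. The power function is a hypothetical helper that was not part of the original post. It assumes the exponent fits in 16 bits and that the result fits in unsigned long long (otherwise it wraps):

```cpp
// bin() as in the answer above: fills a[0..15] with the bits of n, LSB first.
void bin(unsigned n, unsigned a[]) {
    for (int i = 0; i < 16; i++) {
        a[i] = n & 0x1;
        n = n >> 1;
    }
}

// Hypothetical helper: x^n by square-and-multiply over the bits of n.
// Only the low 16 bits of n are used.
unsigned long long power(unsigned long long x, unsigned n) {
    unsigned bits[16];
    bin(n, bits);
    unsigned long long result = 1;
    for (int i = 15; i >= 0; i--) {   // walk from the most significant bit
        result *= result;              // square once per bit position
        if (bits[i]) result *= x;      // multiply when the bit is set
    }
    return result;
}
```

The array lives in power's own stack frame and is filled by bin, so no pointer outlives the storage it refers to.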

Segmentation fault when printing copy of array

The following program segfaults when v2 is printed but not during the array copy. Does anyone know why?
#include <stdio.h>
#include <stdlib.h>
void cpyarray (void* dst, void* src, size_t memsize, size_t size) {
for (size_t i = 0; i < size; i++) {
*(((char*) dst) + i*memsize) = *(((char*) src) + i*memsize);
}
}
int main () {
size_t N = 10;
double* v1 = (double *) malloc(N * sizeof(double));
double* v2 = (double *) malloc(N * sizeof(double));
for (size_t i = 0; i < N; i++) *(v1+i) = i;
printf("\n\nv1: ");
for (size_t i = 0; i < N; i++) printf("%g ", v1[i]);
cpyarray(&v2, &v1, sizeof(double), N);
printf("\n\nv2: ");
for (size_t i = 0; i < N; i++) printf("%g ", v2[i]); // program crashes here
return 0;
}
EDIT: the code does not crash if I copy arrays of ints instead of doubles.

v1, v2 are pointers to memory blocks that you want to operate on. But you're passing their addresses to your cpyarray function.
So you're operating on the wrong memory blocks, stepping on memory around the v2 variable, and changing what v2 points to.
cpyarray(v2, v1, sizeof(double), N);

You are passing &v2 to cpyarray(), which means that your function is changing where v2 is pointing! Since it's now pointing to an invalid location, you get a seg fault when you dereference it for printf().
Instead, just pass v2 to cpyarray().

You are passing wrong pointers to cpyarray function.
Your code sends pointer to array (pointer to pointer to double)
cpyarray(&v2, &v1, sizeof(double), N);
You should send array instead (pointer to double)
cpyarray(v2, v1, sizeof(double), N);

In addition to the problem that the current answers have pointed out, cpyarray doesn't copy the array, just some parts of it. Instead of trying to imitate types with i*memsize, just treat the memory block as a bunch of bytes and copy them one by one. Now all you have to do is figure out how many bytes to copy, which isn't hard.
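A byte-by-byte version along those lines might look like the following sketch. It keeps the question's signature (with src made const) and assumes the two regions do not overlap:

```cpp
#include <cstddef>

// Copy `size` elements of `memsize` bytes each from src to dst,
// one byte at a time. The two regions must not overlap.
void cpyarray(void* dst, const void* src, std::size_t memsize, std::size_t size) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    const std::size_t total = memsize * size;  // total number of bytes to copy
    for (std::size_t i = 0; i < total; i++) {
        d[i] = s[i];
    }
}
```

It is called as cpyarray(v2, v1, sizeof(double), N), passing the pointers themselves rather than their addresses. The standard memcpy function does the same job and is normally the better choice outside of an exercise.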

Help Needed!! Compiling C++ program - errors

This is a routine that I believe was written for C. I copied it (legally) out of a book and am trying to get it to compile and run in Visual Studio 2008. I would like to keep it as a C++ program. I have lots of programming experience in IBM mainframe assembler but none in C++. Your help is greatly appreciated. I think it needs just a couple of simple changes, but I have read tutorials and beaten on this for hours, getting nowhere. I am getting four instances of error C2440 ('=' : cannot convert from 'void *' to 'int *') in statements inside the ReedSolomon function. Thanks so much! The program follows:
#include <iostream>
using namespace std;
int wd[50] = {131,153,175,231,5,184,89,239,149,29,181,153,175,191,153,175,191,159,231,3,127,44,12,164,59,209,104,254,150,45};
int nd = 30, nc=20, i, j, k, *log, *alog, *c, gf=256, pp=301;
/* The following is routine which calculates the error correction codewords
for a given data codeword string of length "nd", stored as an integer array wd[].
The function ReedSolomon()first generates log and antilog tables for the Galois
Field of size "gf" (in the case of ECC 200, 2^8) with prime modulus "pp"
(in the case of ECC 200, 301), then uses them in the function prod(), first to
calculate coefficients of the generator polynomial of order "nc" and then to
calculate "nc" additional check codewords which are appended to the data in wd[].*/
/* "prod(x,y,log,alog,gf)" returns the product "x" times "y" */
int prod(int x, int y, int *log, int *alog, int gf)
{if (!x || !y)
return 0;
else
return alog[(log[x] + log[y]) % (gf-1)];
}
/* "ReedSolomon(wd,nd,nc,gf.pp)" takes "nd" data codeword values in wd[] */
/* and adds on "nc" check codewords, all within GF(gf) where "gf" is a */
/* power of 2 and "pp" is the value of its prime modulus polynomial */
void ReedSolomon(int *wd, int nd, int nc, int gf, int pp)
{int i, j, k, *log,*alog,*c;
/* allocate, then generate the log & antilog arrays: */
log = malloc(sizeof(int) * gf);
alog = malloc(sizeof(int) * gf);
log[0] = 1-gf; alog[0] = 1;
for (i = 1; i < gf; i++)
{alog[i] = alog[i-1] * 2;
if (alog[i] >= gf) alog[i] ^= pp;
log[alog[i]] = i;
}
/* allocate, then generate the generator polynomial coefficients: */
c = malloc(sizeof(int) * (nc+1));
for (i=1; i<=nc; i++) c[i] = 0; c[0] = 1;
for (i=1; i<=nc; i++)
{c[i] = c[i-1];
for (j=i-1; j>=1; j--)
{c[j] = c[j-1] ^ prod(c[j],alog[i],log,alog,gf);
}
c[0] = prod(c[0],alog[i],log,alog,gf);
}
/* clear, then generate "nc" checkwords in the array wd[] : */
for (i=nd; i<=(nd+nc); i++) wd[i] = 0;
for (i=0; i<nd; i++)
{k = wd[nd] ^ wd[i] ;
for (j=0; j<nc; j++)
{wd[nd+j] = wd[nd+j+1] ^ prod(k,c[nc-j-1],log, alog,gf);
}
}
free(c);
free(alog);
free(log);
return ();
}
int main ()
{reedsolomon (50,30,20,256,301);
for (i = 1; i < 51; i++)
{cout<< i; "="; wd[i];}
cout<<"HEY, you, I'm alive! Oh, and Hello World!\n";
cin.get();
return 1;
}

In C++, a void pointer can't be implicitly converted to a different pointer type.
So instead of
int *pInt;
pInt = malloc(sizeof(int) * 5);
You need to say
int *pInt;
pInt = (int *) malloc(sizeof(int) * 5);
or preferably
int *pInt = new int[5];
(with a matching delete[] instead of free), or preferably preferably use a vector if it's intended to be dynamic.
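As an illustration of the vector option, the table setup from the question could be written roughly as below. This is a sketch only: the GaloisTables struct and make_tables function are hypothetical names, and the loop body is carried over from the question's ReedSolomon() unchanged:

```cpp
#include <vector>

// Hypothetical holder for the log and antilog tables of GF(gf).
struct GaloisTables {
    std::vector<int> log;
    std::vector<int> alog;
};

// Build the tables exactly as the question's ReedSolomon() does,
// but with std::vector managing the memory (no malloc, no free).
GaloisTables make_tables(int gf, int pp) {
    GaloisTables t{std::vector<int>(gf), std::vector<int>(gf)};
    t.log[0] = 1 - gf;
    t.alog[0] = 1;
    for (int i = 1; i < gf; i++) {
        t.alog[i] = t.alog[i - 1] * 2;
        if (t.alog[i] >= gf) t.alog[i] ^= pp;
        t.log[t.alog[i]] = i;
    }
    return t;
}

// Product of x and y in GF(gf), as in the question's prod().
int prod(int x, int y, const GaloisTables& t, int gf) {
    if (!x || !y) return 0;
    return t.alog[(t.log[x] + t.log[y]) % (gf - 1)];
}
```

With this layout no cast is needed at all, and the memory is released automatically when the GaloisTables object goes out of scope.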

At the beginning of the program, add #include <cstdlib>. Without this header, malloc and free are not declared. In C++, void* to int* is not an implicit conversion, so on lines 31, 32 and 40 you need to cast to int*, e.g.: log = (int *)malloc(sizeof(int) * gf);
In the main function, on line 63, you're calling the function as reedsolomon; it should be ReedSolomon, the way you declared it.
Also, the function is declared as "void ReedSolomon(int *wd, int nd, int nc, int gf, int pp)", but in main you call ReedSolomon (50,30,20,256,301); so you are passing an int value where a pointer to int is expected. That is a type mismatch. I'm not sure what you want to do with wd.
Next time, please post the errors from the compiler so people don't have to compile the code themselves to see what's wrong.
Another good technique that will save you a lot of time is to search the web for the error the compiler gives you (it is very likely somebody has already made the same mistake), and also to read a C++ book to get acquainted with the language.
Cheers!

C++ requires that you cast the return value of malloc to whatever type of pointer you're assigning it to. So e.g. log = malloc(sizeof(int) * gf); needs to become log = (int *) malloc(sizeof(int) * gf);.

You should cast the return value of malloc when assigning it to a pointer. A static_cast is sufficient for converting from void*:
Example:
log = static_cast<int*>(malloc(sizeof(int) * gf));
