How to use struct in OpenCL - c++

How can I use my custom struct in OpenCL? As far as I can tell, OpenCL has no arrays of objects and no 2D arrays apart from images.
struct Block {
char item[4][4];
};
I would like to use array of these structs in OpenCL and access its elements by indices like in C/C++.
For example
Block *keys = new Block[11];
keys[3].item[2][2];
Let me explain. I am working on implementing AES-128 ECB in OpenCL.
Here is AES description.
I use these structs (blocks) to divide the plaintext into blocks of 4x4 bytes. The array of 11 blocks holds the 11 keys, one for each round.
I did the same thing with the plaintext. For example, a plaintext of 67 bytes is divided into 5 blocks. In C this works very well in sequential execution (key scheduling, SubBytes, ShiftRows, MixColumns, AddRoundKey), for both encryption and decryption.
The problem is that it is not that simple in OpenCL. How can I use an array of these blocks in OpenCL? Or do I need to transform everything into a 1D array of char (for example)?

In OpenCL C and OpenCL C++ you can't dynamically allocate memory in a kernel: there is no malloc, and no new other than placement new. You can indeed make 2D arrays like char item[4][4] and declare structs like Block in your kernels. But since you can't allocate memory, if you must have a dynamically sized array you can do one of the following:
Declare an array (of whatever dimension) of automatic storage duration that is sufficiently large for your use. For example declare char item[100][100] if you know you won't need more than a 100x100 array.
Create a buffer on the host with clCreateBuffer and pass it in as a kernel argument.
If you want to build an array of structs on your host and then pass it in to your kernel as a buffer, you can do that too! But you must declare the structs separately in your host source and kernel source, and pay attention to size and alignment characteristics, and byte order. It's on you to make sure bits you pass in from the host are interpreted correctly on your device.
Edit:
To understand the layout of a struct take a look at this question: C struct memory layout?
OpenCL C is based on C, so you can expect the layout of structs to follow the same rules. The sizes of primitive types may differ on your host and device, but OpenCL defines several typedefs like cl_int that you should probably use when declaring a struct on your host to make sure it has the same size as on your device. For example, the size of cl_int on your host will be the same as the size of int on your device.
You can determine the endianness of your device with clGetDeviceInfo using the param_name CL_DEVICE_ENDIAN_LITTLE.
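As a rough sketch of the host/kernel pairing described above, the host side can mirror the question's Block with fixed-width bytes and check its layout at compile time. The kernel text, the kernel name add_round_key, and the helpers blocks_for and flat_offset are hypothetical names made up for this illustration; the kernel string is only shown, not compiled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Host-side mirror of the kernel's Block. Using uint8_t is an assumption:
// the kernel side would use uchar, which is also 8 bits wide.
struct Block {
    std::uint8_t item[4][4];
};

// Layout checks: no padding, so an array of Block is one contiguous run
// of bytes that can be copied into a buffer unchanged.
static_assert(sizeof(Block) == 16, "Block must be exactly 16 bytes");
static_assert(std::is_trivially_copyable<Block>::value,
              "Block must be safe to copy as raw bytes");

// Hypothetical kernel source: the struct is declared again on the device
// side and indexed exactly as in C.
const char* kKernelSource = R"CLC(
typedef struct { uchar item[4][4]; } Block;

__kernel void add_round_key(__global Block* state,
                            __global const Block* keys,
                            int round)
{
    size_t b = get_global_id(0);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            state[b].item[r][c] ^= keys[round].item[r][c];
}
)CLC";

// Number of 16-byte blocks needed for n bytes of plaintext (rounded up).
std::size_t blocks_for(std::size_t n) {
    return (n + sizeof(Block) - 1) / sizeof(Block);
}

// Byte offset of keys[k].item[r][c] inside the flat buffer, in case the
// data has to be treated as a 1D array of bytes instead.
std::size_t flat_offset(std::size_t k, std::size_t r, std::size_t c) {
    assert(r < 4 && c < 4);
    return k * sizeof(Block) + r * 4 + c;
}
```

With this layout, the 11 round keys occupy 11 * sizeof(Block) = 176 bytes, which is the size you would give to clCreateBuffer for the keys buffer.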

Related

How to encode an array of structs to store it in memory?

I have a controller and I need to store in flash memory an array of structs, which can have 2000+ items.
What is the best way to encode the array and save it in the flash memory?
Is this good enough?
char b[sizeof(array)];
memcpy(b, &array, sizeof(array));
Thanks
char b[sizeof(array)];
memcpy(b, &array, sizeof(array));
This will give you a byte-for-byte identical copy of the array. If you then write that copy to flash using whatever interface you have to the flash, you might as well skip the copy: cast a pointer to the original array to a byte pointer (std::byte* or char*) and write it directly.
The problem comes when the objects in the array are not trivially copyable. Anything with pointers (like std::string for example) will not work. You would have to serialize before writing.
You might actually want to do that even with just fundamental types in a struct because the struct might contain padding. The size of the struct might be larger than the size of the data it contains. With 2000+ objects in the array, saving even 1 byte per object would save you 2000 bytes of flash every time you write. You could also use a variable-length encoding if, for example, you have 4-byte ints that mostly hold small numbers.
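To make the padding point concrete, here is a minimal sketch of field-by-field serialization. The Record struct, its fields, and the pack/unpack helpers are hypothetical; the little-endian byte order is a choice made for this example, not something the flash interface requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: a 4-byte integer followed by a 1-byte flag.
// sizeof(Record) is typically 8 because of 3 bytes of tail padding.
struct Record {
    std::uint32_t value;
    std::uint8_t flag;
};

// Bytes actually needed per record once padding is left out (4 + 1).
constexpr std::size_t kPackedSize = 5;

// Serialize each field explicitly, in little-endian order, with no padding.
std::vector<std::uint8_t> pack(const Record* records, std::size_t count) {
    std::vector<std::uint8_t> out;
    out.reserve(count * kPackedSize);
    for (std::size_t i = 0; i < count; ++i) {
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<std::uint8_t>(records[i].value >> shift));
        out.push_back(records[i].flag);
    }
    return out;
}

// Inverse of pack: rebuild the records from the packed byte stream.
void unpack(const std::vector<std::uint8_t>& in, Record* records, std::size_t count) {
    assert(in.size() == count * kPackedSize);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t* p = in.data() + i * kPackedSize;
        records[i].value = static_cast<std::uint32_t>(p[0])
                         | static_cast<std::uint32_t>(p[1]) << 8
                         | static_cast<std::uint32_t>(p[2]) << 16
                         | static_cast<std::uint32_t>(p[3]) << 24;
        records[i].flag = p[4];
    }
}
```

For 2000 such records this writes 10000 bytes instead of the roughly 16000 a raw memcpy would produce on a platform where sizeof(Record) is 8.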

Char*** in OpenCL kernel argument?

I need to pass a vector<vector<string>> to a kernel OpenCL. What is the easiest way of doing it? Passing a char*** gives me an error:
__kernel void vadd(
__global char*** sets,
__global int* m,
__global long* result)
{}
ERROR: clBuildProgram(CL_BUILD_PROGRAM_FAILURE)
In OpenCL 1.x, this sort of thing is basically not possible. You'll need to convert your data such that it fits into a single buffer object, or at least into a fixed number of buffer objects. Pointers on the host don't make sense on the device. (With OpenCL 2's SVM feature, you can pass pointer values between host and kernel code, but you'll still need to ensure the memory is allocated in a way that's appropriate for this.)
One option I can think of, bearing in mind I know nothing about the rest of your program, is as follows:
Create an OpenCL buffer for all the strings. Add up the number of bytes required for all your strings. (possibly including nul termination, depending on what you're trying to do)
Create a buffer for looking up string start offsets (and possibly length). Looks like you have 2 dimensions of lookup (nested vectors) so how you lay this out will depend on whether your inner vectors (second dimension) are all the same size or not.
Write your strings back-to-back into the first buffer, recording start offsets (and length, if necessary) in the second buffer.
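The steps above could be sketched on the host roughly as follows. This assumes the inner vectors may differ in size, so a third lookup array (row_start) is added on top of the two buffers the answer describes; FlatStrings, flatten, and string_at are hypothetical names for this illustration. Each of the three vectors would then be uploaded as its own buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Flat, pointer-free representation of a vector<vector<string>>.
struct FlatStrings {
    std::vector<char> chars;              // all strings back-to-back, each NUL-terminated
    std::vector<std::uint32_t> offsets;   // start of each string inside chars
    std::vector<std::uint32_t> row_start; // index into offsets of each row's first
                                          // string, plus one final end marker
};

FlatStrings flatten(const std::vector<std::vector<std::string>>& sets) {
    FlatStrings flat;
    for (const auto& row : sets) {
        flat.row_start.push_back(static_cast<std::uint32_t>(flat.offsets.size()));
        for (const auto& s : row) {
            flat.offsets.push_back(static_cast<std::uint32_t>(flat.chars.size()));
            flat.chars.insert(flat.chars.end(), s.begin(), s.end());
            flat.chars.push_back('\0');
        }
    }
    // End marker so that row r spans offsets[row_start[r] .. row_start[r + 1]).
    flat.row_start.push_back(static_cast<std::uint32_t>(flat.offsets.size()));
    return flat;
}

// The same lookup a kernel would perform, written on the host for checking.
const char* string_at(const FlatStrings& flat, std::size_t row, std::size_t col) {
    assert(row + 1 < flat.row_start.size());
    assert(flat.row_start[row] + col < flat.row_start[row + 1]);
    return flat.chars.data() + flat.offsets[flat.row_start[row] + col];
}
```

If every inner vector has the same length, row_start can be dropped and the string index computed as row * row_length + col.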

Can I alias an array of structs to an array of struct members?

I am wondering if it is possible to create/copy a "virtual" array of a specific member of a struct in another array. Let's say we have a struct
struct foo {
int value;
char character;
};
Now assume there is an array containing this struct foo and I have an operation that needs to add all int value's together. This would normally be very easy with a loop adding all the values with a pointer. Problem is I am using OpenCL and need to copy an array to some device. In OpenCL this is done using
clEnqueueWriteBuffer(cmdQueue, buffer, CL_TRUE, 0, datasize, A, 0, NULL, NULL);
which will copy an array buffer to the device. It doesn't make sense to copy the entire array of structs, since that would take more time: it also sends the characters, which are not needed. It would also take up more space on the OpenCL device. Is it therefore possible to copy the "array" of values from the structs directly as an array to the device?
I know I can create a new array on the host (CPU) with all the values and then copy that array to the OpenCL device, but then I would spend time copying to a local int-array and afterwards copy that array to the OpenCL device.
Would it be possible to copy a "virtual" array of values directly from the array of foo-structs, containing only the int values?
Please be aware that this is a very simplified example of my actual problem, and I would like to avoid having the values in a separate array from the beginning, which the structs would then point to. I have big doubts that this is possible, or that my explanation even makes sense, but I look forward to feedback!
No.
clEnqueueWriteBuffer expects a contiguous container. You cannot create a "virtual" contiguous container.
[I] like to avoid having the values in a separate array from the beginning.
At that point, you must profile and compare two implementations: one copying the array as-is with the superfluous data, and one creating a local copy of the useful data to send. Compare and choose.
If you have an array of structs you would need a staging buffer with just the values, which is extra copies on the CPU side.
Sometimes such work is unavoidable, but if you can, it is better to have multiple arrays of contiguous values. Even in pure CPU work this is frequently more efficient for the CPU cache, as it avoids reads/writes of unneeded members, and it is often easier for SIMD instruction sets like SSE.
For example you could have int *values and char *chars of the same length (prefer some type like std::vector or std::unique_ptr<T[]> though!), then the copy is easy.
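Both options can be sketched side by side, using the foo struct from the question. gather_values and FooArrays are hypothetical names for this illustration; the OpenCL call itself is left out so the snippet stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct foo {
    int value;
    char character;
};

// Option 1: array of structs plus a staging buffer.
// Costs one extra pass over the data on the CPU before the upload.
std::vector<int> gather_values(const foo* items, std::size_t count) {
    std::vector<int> values(count);
    for (std::size_t i = 0; i < count; ++i)
        values[i] = items[i].value;
    return values;
}

// Option 2: struct of arrays. The values are contiguous from the start,
// so no staging copy is needed.
struct FooArrays {
    std::vector<int> values;
    std::vector<char> characters;

    void push_back(int v, char c) {
        values.push_back(v);
        characters.push_back(c);
    }
    std::size_t size() const { return values.size(); }
};
```

In either case, values.data() and values.size() * sizeof(int) are the pointer and byte count you would hand to clEnqueueWriteBuffer.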

C++ fill an empty buffer with a single value

I apologize in advance if I am using the incorrect terminology, I'm new to the C++ language. I have a class with a constructor that creates an empty buffer using malloc
LPD6803PWM::LPD6803PWM(uint16_t leds, uint8_t dout, uint8_t cout) {
numLEDs = leds;
pixels = (uint16_t *) malloc(numLEDs);
dataPin = dout;
clockPin = cout;
}
My understanding is that this creates an empty buffer with the length of whatever I pass as numLEDs, so this is essentially a dynamically created array, correct? I'm using malloc because this code goes on an Arduino that has very limited memory and I want to avoid overflows. From what I have read, this is the best way to declare an array if you don't know what size it will be and you want to avoid overflow errors.
My question is, once this array has been created is there a faster way than a traditional for loop to fill the array with a single value. Very often I will want to do this and even microseconds make a difference in this application. I know that the array classes in the C++ standard library have a fill method, but what about an array declared in this way?
My question is, once this array has been created is there a faster way than a traditional for loop to fill the array with a single value.
The C standard library provides memset() and related functions for filling a buffer. There's also calloc(), which allocates a buffer just like malloc(), but fills the buffer with 0 at the same time.
Very often I will want to do this and even microseconds make a difference in this application.
In that case you might consider ways to avoid repeatedly allocating the array, which could take more time than filling an existing array. As well, the easiest way to make your code go faster is to run it on faster hardware. Arduino is a great platform, but Raspberry Pi Zero costs less ($5, if you can find them), has a LOT more memory, and has a clock speed that's roughly 60x faster than a typical Arduino (1 GHz vs. 16 MHz). Computing is often a tradeoff between good, cheap, and fast, but in this case you get all three.
You can still use std::fill (or std::fill_n). For element types where it is valid, the standard library implementation or the optimizer (e.g. with gcc and Clang) can turn the call into a memset. Trust the standard library writers!
You can use memset. But you have to be careful about the value you want to set. And you won't be much faster than using a for loop. The computer needs to set all these values somehow! memset may set larger contiguous memory spans and therefore be faster, but a smart compiler may do the same for a for loop.
If you're really concerned about microseconds you need to do some profiling.
Well, you can use memset from stdlib.h:
memset(array, 0, size_of_array_in_bytes);
Note however that memset works byte by byte, i.e. it sets the first byte to 0 (or whatever value you pass as the second parameter), then the second byte, and so on, which means that you must be careful with multi-byte element types.
Just a note:
malloc takes the size of the allocation in bytes, so you should multiply its argument by sizeof(uint16_t).
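Putting the points from these answers together, a minimal sketch might look like this. alloc_pixels, fill_pixels, and clear_pixels are hypothetical helper names, and the code assumes a toolchain that ships the C++ standard library headers, which may not be the case for every Arduino core.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// malloc takes a size in bytes, so scale the element count by the element size.
std::uint16_t* alloc_pixels(std::size_t count) {
    return static_cast<std::uint16_t*>(std::malloc(count * sizeof(std::uint16_t)));
}

// Works for any 16-bit value, because it assigns whole elements.
void fill_pixels(std::uint16_t* pixels, std::size_t count, std::uint16_t value) {
    assert(pixels != nullptr);
    std::fill_n(pixels, count, value);
}

// memset writes one byte value into every byte, so it only gives the
// intended result when both bytes of the element are the same (0 here).
void clear_pixels(std::uint16_t* pixels, std::size_t count) {
    assert(pixels != nullptr);
    std::memset(pixels, 0, count * sizeof(std::uint16_t));
}
```

Calling memset with a byte value of 0x01 on this buffer would leave every element equal to 0x0101 rather than 1, which is the pitfall the answers warn about.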

Create an N dimensional array, where N is determined at runtime (C++)

I'm encoding N-Dim image cubes into a different image format. I don't know the dimensions of the image until runtime and the library I'm using to read from the original image needs an N-dim array destination buffer as a parameter.
How can I declare such an array in C++? Thanks :)
The short answer is that you cannot declare such an array in C++. The dimensions of an array are part of the type (with a miscellaneous exception that sometimes the value of one of the dimensions can be unknown, for an extern array declaration). The number of dimensions is always part of the type, and the type must be known at compile time.
What you might be able to do instead is to use a "flat" array of appropriate size. For example, if you need a 3x3...x3 array then you can compute 3^n at runtime, dynamically allocate that many int (probably using a vector<int> for convenience), and you have memory with the same layout as an int[3][3]...[3]. You can refer to this memory via a void*.
I'm not certain that it's strictly legal in C++ to alias a flat array as a multi-dimensional array. But firstly the function you're calling might not actually alias it that way anyway given that it also doesn't know the dimension at compile-time. Secondly it will work in practice (if it doesn't, the function you're calling is either broken or else has some cunning way to deal with this that you should find out about and copy).
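The flat-array idea can be sketched as a small wrapper that stores the dimensions at runtime and computes row-major offsets. NdArray and its members are hypothetical names, and int is assumed as the element type only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Flat storage for an array whose number of dimensions is known only at runtime.
class NdArray {
public:
    explicit NdArray(std::vector<std::size_t> dims) : dims_(std::move(dims)) {
        std::size_t total = 1;
        for (std::size_t d : dims_) total *= d;
        data_.assign(total, 0);
    }

    // Row-major offset: the last index varies fastest, matching the
    // memory layout of a built-in int[d0][d1]...[dn].
    std::size_t flat_index(const std::vector<std::size_t>& idx) const {
        assert(idx.size() == dims_.size());
        std::size_t flat = 0;
        for (std::size_t k = 0; k < dims_.size(); ++k) {
            assert(idx[k] < dims_[k]);
            flat = flat * dims_[k] + idx[k];
        }
        return flat;
    }

    int& at(const std::vector<std::size_t>& idx) { return data_[flat_index(idx)]; }

    // Untyped pointer to hand to a library that expects a destination buffer.
    void* raw() { return data_.data(); }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<std::size_t> dims_;
    std::vector<int> data_;
};
```

Whether the library accepts such a buffer depends on the layout it expects; the sketch assumes plain row-major order with no padding between rows.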
You can't use a built-in array in this case. Arrays are only for data whose size and number of dimensions are known at compile time. Try using an array of std::vector instead.