Dart/Flutter FFI: Convert List to Array - c++

I have a struct that receives an array of type Float from a C++ library.
class MyStruct extends Struct{
@Array.multi([12])
external Array<Float> states;
}
I am able to receive data and parse it in Dart.
Now I want to do the reverse. I have a List<double> which I want to assign to this struct and pass to C++.
The following cast fails at run time.
myStructObject.states = listObject as Array<Float>;
Neither Array class, nor List class has any related methods. Any idea on this?

There's no way to get around copying elements into FFI arrays.
for (var i = 0; i < listObject.length; i++) {
my_struct.states[i] = listObject[i];
}
This may seem inefficient, but consider that depending on the specialization of listObject, the underlying memory layout of the data may differ significantly from the contiguous FFI layout, and so a type conversion sugar provided by Dart would likely also need to perform conversions on individual elements anyways (as opposed to just performing a single memcpy under the hood).
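The layout argument can be seen from the C++ side as well. The following is only a sketch under assumptions: it supposes the native struct mirrors MyStruct as a plain float[12], and the names NativeStruct and fill_states are hypothetical. A double occupies 8 bytes and a float 4, so one memcpy from a buffer of doubles could not produce valid floats; each element has to be narrowed individually.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical native counterpart of the Dart MyStruct: 12 contiguous floats.
struct NativeStruct {
    float states[12];
};

// Copies at most 12 values, narrowing each double to float one by one.
// Returns the number of elements actually written.
std::size_t fill_states(NativeStruct& target, const std::vector<double>& source) {
    const std::size_t capacity = sizeof(target.states) / sizeof(target.states[0]);
    const std::size_t count = source.size() < capacity ? source.size() : capacity;
    for (std::size_t i = 0; i < count; ++i) {
        target.states[i] = static_cast<float>(source[i]);
    }
    return count;
}
```

The Dart loop above does the same per-element work, just through the FFI Array accessor instead of a raw pointer.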
One possibility for closing the convenience gap would be to define an extension method. For example:
extension FloatArrayFill<T> on ffi.Array<ffi.Float> {
void fillFromList(List<T> list) {
for (var i = 0; i < list.length; i++) {
this[i] = list[i] as double;
}
}
}
Usage:
my_struct.states.fillFromList(list);
Note that a separate extension method would need to be defined for each ffi.Array<T> specialization you want to do this for (Array<Uint32>, Array<Double>, Array<Bool>, etc.).
This is due to the [] operator being implemented through a separate extension method for each of these type specializations internally.

Related

How to Retrieve a Scalar Value from a Compute Function in Apache Arrow

I am looping over the elements of an Arrow Array and trying to apply a compute function to each scalar that will tell me the year, month, day, etc. of each element. The code looks something like this:
arrow::NumericArray<arrow::Date32Type> array = {...}
for (int64_t i = 0; i < array.length(); i++) {
arrow::Result<std::shared_ptr<arrow::Scalar>> result = array->GetScalar(i);
if (!result.ok()) {
// TODO: handle error
}
arrow::Result<arrow::Datum> year = arrow::compute::Year(*result);
}
However, I am not really clear as to how to extract the actual int64_t value from the arrow::compute::Year call. I have tried to do things like
const std::shared_ptr<int64_t> val = year.ValueOrDie();
>>> 'arrow::Datum' to non-scalar type 'const std::shared_ptr<long int>' requested
I've tried similarly to assign to just an int64_t which also fails with error: cannot convert 'arrow::Datum' to 'int64_t'
I didn't see any method of the Datum class that would otherwise return a scalar value in the primitive type that I think arrow::compute::Year should be returning. Any idea what I might be misunderstanding with the Datum / Scalar / Compute APIs?
Arrow's compute functions are really meant to be applied on arrays and not scalars, otherwise the overhead renders the operation rather inefficient. The arrow::compute::Year function takes in a Datum. This is a convenience item that could be a Scalar, an Array, ArrayData, RecordBatch, or Table. Not all functions accept all possible values of Datum (in particular, many do not accept RecordBatch or Table).
Once you have a result, there are a few ways you can get the data, and grabbing individual scalars is probably going to be the least efficient, especially if you know the type of the data ahead of time (in this case we know the type will be int64_t). This is because a scalar is meant to be a type-erased wrapper (e.g. like an "object" in python or java) around some value and it carries some overhead.
So my suggestion would be:
// If you are going to be passing your array through the compute
// infrastructure you'll need to have it in a shared_ptr.
// Also, NumericArray is a base class so you don't often need
// to refer to it directly. You'll typically be getting one of the
// concrete subclasses like Date32Array
std::shared_ptr<arrow::Date32Array> array = {...}
// A datum can be implicitly constructed from a shared_ptr to an
// array. You could also explicitly construct it if that is more
// comfortable to you. Here `array` is being implicitly cast to a Datum.
ARROW_ASSIGN_OR_RAISE(arrow::Datum year_datum, arrow::compute::Year(array));
// Now we have a datum, but the docs tell us the return value from the
// `Year` function is always an array, so let's just unwrap it. This is
// something that could probably be improved in Arrow (might as well
// return an array)
std::shared_ptr<arrow::Array> years_arr = year_datum.make_array();
// Also, we know that the data type is Int64 so let's go ahead and
// cast further
std::shared_ptr<arrow::Int64Array> years = std::dynamic_pointer_cast<arrow::Int64Array>(years_arr);
// The concrete classes can be iterated in a variety of ways. GetScalar
// is the least efficient (but doesn't require knowing the type up front)
// Since we know the type (we've cast to Int64Array) we can use Value
// to get a single int64_t, raw_values() to get a const int64_t* (e.g a
// C-style array) or, perhaps the simplest, begin() and end() to get STL
// compliant iterators. Note that `years` is a shared_ptr, so it has to be
// dereferenced, and each element is an optional that is empty for nulls.
for (const auto& year : *years) {
if (year.has_value()) {
std::cout << "Year: " << *year << std::endl;
}
}
If you really want to work with scalars:
std::shared_ptr<arrow::Array> array = {...}
for (int64_t i = 0; i < array->length(); i++) {
arrow::Result<std::shared_ptr<arrow::Scalar>> result = array->GetScalar(i);
if (!result.ok()) {
// TODO: handle error
}
ARROW_ASSIGN_OR_RAISE(arrow::Datum year_datum, arrow::compute::Year(*result));
std::shared_ptr<arrow::Scalar> year_scalar = year_datum.scalar();
std::shared_ptr<arrow::Int64Scalar> year_scalar_int = std::dynamic_pointer_cast<arrow::Int64Scalar>(year_scalar);
int64_t year = year_scalar_int->value;
}

How to convert arrow::Array to std::vector?

I have an Apache arrow array that is created by reading a file.
std::shared_ptr<arrow::Array> array;
PARQUET_THROW_NOT_OK(reader->ReadColumn(0, &array));
Is there a way to convert it to std::vector or any other native array type in C++?
You can use std::static_pointer_cast to cast the arrow::Array to, for example, an arrow::DoubleArray if the array contains doubles, and then use the Value function to get the value at a particular index. For example:
auto arrow_double_array = std::static_pointer_cast<arrow::DoubleArray>(array);
std::vector<double> double_vector;
for (int64_t i = 0; i < array->length(); ++i)
{
double_vector.push_back(arrow_double_array->Value(i));
}
See the latter part of the ColumnarTableToVector function in this example:
https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html. In that example, table->column(0)->chunk(0) is a std::shared_ptr<arrow::Array>.
To learn more, I found it useful to click on various parts of the inheritance diagram tree here: https://arrow.apache.org/docs/cpp/classarrow_1_1_flat_array.html. For example, strings in an arrow::StringArray are accessed using a GetString function instead of a Value function.
This is just what I've pieced together from these links, johnathan's comment above, and playing around with a small example myself, so I'm not sure if this is the best way, as I'm quite new to this.

Can't extend generic struct for specific type

Wanted to toy with adding some sugar in Swift3. Basically, I wanted to be able to do something like:
let randomAdjust = (-10...10).random
To do that, I decided I would need to extend ClosedRange. But then found it would probably be even better for my case, I really just plan on doing Int's for now, to use CountableClosedRange. My latest of multiple attempts looked like:
extension CountableClosedRange where Bound == Int {
var random:Int {
return Int(arc4random_uniform(UInt32(self.count) + 1)) + self.lowerBound
}
}
But the playground complains:
error: same-type requirement makes generic parameter 'Bound' non-generic
extension CountableClosedRange where Bound == Int {
I don't even know what it's telling me there.
The way this roadblock is commonly encountered is when attempting to extend Array. This is legal:
extension Array where Element : Comparable {
}
But this is illegal:
extension Array where Element == Int {
}
The compiler complains:
Same-type requirement makes generic parameter 'Element' non-generic
The problem is the use of == here in combination with Array's parameterized type Element, because Array is a generic struct.
One workaround with Array is to rise up the hierarchy of Array's inheritance to reach something that is not a generic struct:
extension Sequence where Iterator.Element == Int {
}
That's legal because Sequence and Iterator are generic protocols.
Another solution, though, is to rise up the hierarchy from the target type, namely Int. If we can find a protocol to which Int conforms, then we can use the : operator instead of ==. Well, there is one:
extension CountableClosedRange where Bound : Integer {
}
That's the real difference between our two attempts to implement random on a range. The reason your attempt hits a roadblock and mine doesn't is that you are using == whereas I am using :. I can do that because there's a protocol (FloatingPoint) to which Double conforms.
But, as you've been told, with luck all this trickery will soon be a thing of the past.
In Swift 4, what you are attempting is now completely supported. Hooray!
extension Stack where Element: Equatable {
func isTop(_ item: Element) -> Bool {
guard let topItem = items.last else {
return false
}
return topItem == item
}
}
Example from Swift docs: https://docs.swift.org/swift-book/LanguageGuide/Generics.html#ID553

Sort objects of dynamic size

Problem
Suppose I have a large array of bytes (think up to 4GB) containing some data. These bytes correspond to distinct objects in such a way that every s bytes (think s up to 32) will constitute a single object. One important fact is that this size s is the same for all objects, not stored within the objects themselves, and not known at compile time.
At the moment, these objects are logical entities only, not objects in the programming language. I have a comparison on these objects which consists of a lexicographical comparison of most of the object data, with a bit of different functionality to break ties using the remaining data. Now I want to sort these objects efficiently (this is really going to be a bottleneck of the application).
Ideas so far
I've thought of several possible ways to achieve this, but each of them appears to have some rather unfortunate consequences. You don't necessarily have to read all of these. I tried to print the central question of each approach in bold. If you are going to suggest one of these approaches, then your answer should respond to the related questions as well.
1. C quicksort
Of course the C quicksort algorithm is available in C++ applications as well. Its signature matches my requirements almost perfectly. But the fact that using that function will prohibit inlining of the comparison function will mean that every comparison carries a function invocation overhead. I had hoped for a way to avoid that. Any experience about how C qsort_r compares to STL in terms of performance would be very welcome.
2. Indirection using Objects pointing at data
It would be easy to write a bunch of objects holding pointers to their respective data. Then one could sort those. There are two aspects to consider here. On the one hand, just moving around pointers instead of all the data would mean less memory operations. On the other hand, not moving the objects would probably break memory locality and thus cache performance. Chances that the deeper levels of quicksort recursion could actually access all their data from a few cache pages would vanish almost completely. Instead, each cached memory page would yield only very few usable data items before being replaced. If anyone could provide some experience about the tradeoff between copying and memory locality I'd be very glad.
3. Custom iterator, reference and value objects
I wrote a class which serves as an iterator over the memory range. Dereferencing this iterator yields not a reference but a newly constructed object that holds the pointer to the data and the size s, which is given at construction of the iterator. So these objects can be compared, and I even have an implementation of std::swap for these. Unfortunately, it appears that std::swap isn't enough for std::sort. In some parts of the process, my gcc implementation uses insertion sort (as implemented in __insertion_sort in file stl_algo.h), which moves a value out of the sequence, moves a number of items by one step, and then moves the first value back into the sequence at the appropriate position:
typename iterator_traits<_RandomAccessIterator>::value_type
__val = _GLIBCXX_MOVE(*__i);
_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
*__first = _GLIBCXX_MOVE(__val);
Do you know of a standard sorting implementation which doesn't require a value type but can operate with swaps alone?
So I'd not only need my class which serves as a reference, but I would also need a class to hold a temporary value. And as the size of my objects is dynamic, I'd have to allocate that on the heap, which means memory allocations at the very leaves of the recursion tree. Perhaps one alternative would be a value type with a static size that should be large enough to hold objects of the sizes I currently intend to support. But that would mean that there would be even more hackery in the relation between the reference_type and the value_type of the iterator class. And it would mean I would have to update that size for my application to one day support larger objects. Ugly.
If you can think of a clean way to get the above code to manipulate my data without having to allocate memory dynamically, that would be a great solution. I'm using C++11 features already, so using move semantics or similar won't be a problem.
4. Custom sorting
I even considered reimplementing all of quicksort. Perhaps I could make use of the fact that my comparison is mostly a lexicographical compare, i.e. I could sort sequences by first byte and only switch to the next byte when the first byte is the same for all elements. I haven't worked out the details on this yet, but if anyone can suggest a reference, an implementation or even a canonical name to be used as a keyword for such a byte-wise lexicographical sorting, I'd be very happy. I'm still not convinced that with reasonable effort on my part I could beat the performance of the STL template implementation.
5. Completely different algorithm
I know there are many, many kinds of sorting algorithms out there. Some of them might be better suited to my problem. Radix sort comes to my mind first, but I haven't really thought this through yet. If you can suggest a sorting algorithm more suited to my problem, please do so. Preferably with implementation, but even without.
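For reference, the radix sort idea can be sketched for fixed-width records whose size is only known at run time. This is a minimal sketch, not a tuned implementation: the function name radix_sort_records is hypothetical, it assumes the order is a pure byte-wise lexicographic one (the tie-breaking logic mentioned above is not modelled), and it needs a scratch buffer as large as the data, which matters at 4GB. It is the LSD (least significant byte first) variant; an in-place MSD variant such as American flag sort would avoid the extra buffer at the cost of more code.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <vector>

// Sorts `count` records of `size` bytes each, stored back to back in `data`,
// into ascending memcmp order using an LSD radix sort.
void radix_sort_records(unsigned char* data, std::size_t count, std::size_t size) {
    if (count < 2 || size == 0) return;
    std::vector<unsigned char> scratch(count * size);
    unsigned char* src = data;
    unsigned char* dst = scratch.data();

    // Start with the last byte. Every pass is a stable counting sort, so the
    // passes over earlier bytes end up dominating the final order.
    for (std::size_t byte = size; byte-- > 0;) {
        std::array<std::size_t, 257> offsets{};
        for (std::size_t i = 0; i < count; ++i) {
            ++offsets[src[i * size + byte] + 1];
        }
        // Prefix sum: offsets[k] becomes the first output slot for key k.
        for (std::size_t k = 0; k < 256; ++k) {
            offsets[k + 1] += offsets[k];
        }
        for (std::size_t i = 0; i < count; ++i) {
            const unsigned char key = src[i * size + byte];
            std::memcpy(dst + offsets[key]++ * size, src + i * size, size);
        }
        unsigned char* tmp = src;
        src = dst;
        dst = tmp;
    }
    // After an odd number of passes the result lives in the scratch buffer.
    if (src != data) {
        std::memcpy(data, src, count * size);
    }
}
```

Each pass touches all the data once, so the total work is proportional to count times size, independent of the comparison function.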
Question
So basically my question is this:
“How would you efficiently sort objects of dynamic size in heap memory?”
Any answer to this question which is applicable to my situation is good, no matter whether it is related to my own ideas or not. Answers to the individual questions marked in bold, or any other insight which might help me decide between my alternatives, would be useful as well, particularly if no definite answer to a single approach turns up.
The most practical solution is to use the C style qsort that you mentioned.
#include <cstdlib> // qsort
template <unsigned S>
struct my_obj {
enum { SIZE = S };
const void *p_;
my_obj (const void *p) : p_(p) {}
//...accessors to get data from pointer, plus an operator< for my_obj
static int c_style_compare (const void *a, const void *b) {
my_obj aa(a);
my_obj bb(b);
return (aa < bb) ? -1 : (bb < aa);
}
};
template <unsigned N, typename OBJ>
void my_sort (char (&large_array)[N], const OBJ &) {
// qsort reorders the buffer in place, so the array must not be const
qsort(large_array, N/OBJ::SIZE, OBJ::SIZE, OBJ::c_style_compare);
}
(Or, you can call qsort_r if you prefer.) Since std::sort can inline the comparison calls and qsort cannot, you may not get the fastest possible sort this way. If all your system does is sorting, it may be worth it to add the code to get custom iterators to work. But, if most of the time your system is doing something other than sorting, the extra gain you get may just be noise to your overall system.
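Because the object size in the question is only known at run time, a variant without the template parameter may be closer to what is needed. The following is a sketch under assumptions: the names g_record_size, compare_records and sort_records_qsort are hypothetical, and the comparison is a plain memcmp rather than the asker's real ordering. Standard qsort has no context argument, so the size is passed through a file-scope variable, which makes the sketch non-reentrant; qsort_r removes that limitation, but its signature differs between platforms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace {
// std::qsort offers no context argument, so the record size is kept here.
// This makes the sketch non-reentrant and not thread-safe.
std::size_t g_record_size = 0;

int compare_records(const void* a, const void* b) {
    return std::memcmp(a, b, g_record_size);
}
}  // namespace

// Sorts `count` records of `size` bytes each, stored contiguously in `data`.
void sort_records_qsort(void* data, std::size_t count, std::size_t size) {
    g_record_size = size;
    std::qsort(data, count, size, compare_records);
}
```

Every comparison goes through a function pointer here, which is exactly the overhead the question worries about.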
Since there are only 32 different object variations (1 to 32 bytes), you could easily create an object type for each and select a call to std::sort based on a switch statement. Each call will get inlined and highly optimized.
Some object sizes might require a custom iterator, as the compiler will insist on padding native objects to align to address boundaries. Pointers can be used as iterators in the other cases since a pointer has all the properties of an iterator.
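A minimal sketch of that dispatch follows. The names Record, sort_as and sort_records_dispatch are hypothetical, only a handful of the 32 sizes are spelled out, and plain memcmp stands in for the real comparison. Since the records are modelled as unsigned char arrays, which have alignment 1, no padding is expected and raw pointers can serve as iterators; the static_assert checks that assumption. The sketch also assumes that overlaying Record<S> on a raw byte buffer is acceptable on your compiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// A record of exactly S bytes.
template <std::size_t S>
struct Record {
    unsigned char bytes[S];
};

template <std::size_t S>
bool operator<(const Record<S>& a, const Record<S>& b) {
    return std::memcmp(a.bytes, b.bytes, S) < 0;
}

// Sorts the buffer as an array of Record<S>; S is a compile-time constant,
// so std::sort can inline the comparison and the moves.
template <std::size_t S>
void sort_as(unsigned char* data, std::size_t count) {
    static_assert(sizeof(Record<S>) == S, "unexpected padding");
    Record<S>* first = reinterpret_cast<Record<S>*>(data);
    std::sort(first, first + count);
}

// Picks the instantiation matching the run-time size.
// Returns false for sizes that were not compiled in.
bool sort_records_dispatch(unsigned char* data, std::size_t count, std::size_t size) {
    switch (size) {
        case 1:  sort_as<1>(data, count);  return true;
        case 2:  sort_as<2>(data, count);  return true;
        case 3:  sort_as<3>(data, count);  return true;
        case 4:  sort_as<4>(data, count);  return true;
        case 8:  sort_as<8>(data, count);  return true;
        case 16: sort_as<16>(data, count); return true;
        case 32: sort_as<32>(data, count); return true;
        default: return false;
    }
}
```

Supporting every size from 1 to 32 only means adding the remaining case labels.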
I'd agree with std::sort using a custom iterator, reference and value type; it's best to use the standard machinery where possible.
You worry about memory allocations, but modern memory allocators are very efficient at handing out small chunks of memory, particularly when being repeatedly reused. You could also consider using your own (stateful) allocator, handing out length s chunks from a small pool.
If you can overlay an object onto your buffer, then you can use std::sort, as long as your overlay type is copyable. (In this example, 4 64bit integers). With 4GB of data, you're going to need a lot of memory though.
As discussed in the comments, you can have a selection of possible sizes based on some number of fixed size templates. You would have to pick one of these types at runtime (using a switch statement, for example). Here's a simple example of the template type with various sizes, sorting the 64bit size:
#include <vector>
#include <algorithm>
#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
// WIDTH is given in bits, so each object occupies WIDTH / 8 bytes.
template <int WIDTH>
struct variable_width
{
unsigned char w_[WIDTH / 8];
};
typedef variable_width<8> vw8;
typedef variable_width<16> vw16;
typedef variable_width<32> vw32;
typedef variable_width<64> vw64;
typedef variable_width<128> vw128;
typedef variable_width<256> vw256;
typedef variable_width<512> vw512;
typedef variable_width<1024> vw1024;
// Reads the object as one 64-bit integer. memcpy avoids the alignment and
// aliasing problems of casting the byte pointer directly.
std::int64_t as_int64(const vw64& w)
{
std::int64_t v;
std::memcpy(&v, w.w_, sizeof v);
return v;
}
bool operator<(const vw64& l, const vw64& r)
{
return as_int64(l) < as_int64(r);
}
std::ostream& operator<<(std::ostream& out, const vw64& w)
{
// Write to the stream that was passed in, not to std::cout.
return out << as_int64(w);
}
int main()
{
std::srand(static_cast<unsigned>(std::time(NULL)));
std::vector<unsigned char> buffer(10 * sizeof(vw64));
vw64* w64_arr = reinterpret_cast<vw64*>(&buffer[0]);
for(int x = 0; x < 10; ++x)
{
const std::int64_t value = std::rand();
std::memcpy(w64_arr[x].w_, &value, sizeof value);
}
std::sort(
w64_arr,
w64_arr + 10);
for(int x = 0; x < 10; ++x)
{
std::cout << w64_arr[x] << '\n';
}
std::cout << std::endl;
return 0;
}
Given the enormous size (4GB), I would seriously consider dynamic code generation. Compile a custom sort into a shared library, and dynamically load it. The only non-inlined call should be the call into the library.
With precompiled headers, the compilation times may actually be not that bad. The whole <algorithm> header doesn't change, nor does your wrapper logic. You just need to recompile a single predicate each time. And since it's a single function you get, linking is trivial.
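#include <algorithm> // std::sort
#include <cstdlib> // malloc, free
#include <vector>
#include <tchar.h> // MSVC-specific: _tmain, _TCHAR
#define OBJECT_SIZE 32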
struct structObject
{
unsigned char* pObject;
bool operator < (const structObject &n) const
{
for(int i=0; i<OBJECT_SIZE; i++)
{
if(*(pObject + i) != *(n.pObject + i))
return (*(pObject + i) < *(n.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * OBJECT_SIZE); // 10 Objects
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*OBJECT_SIZE);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to check the sort
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end());
free(pObjects);
return 0;
}
To skip the #define, pass the object size to a comparison functor at run time:
struct structObject
{
unsigned char* pObject;
};
struct structObjectComparerAscending
{
int iSize;
structObjectComparerAscending(int _iSize)
{
iSize = _iSize;
}
bool operator ()(const structObject &stLeft, const structObject &stRight) const
{
for(int i=0; i<iSize; i++)
{
if(*(stLeft.pObject + i) != *(stRight.pObject + i))
return (*(stLeft.pObject + i) < *(stRight.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int iObjectSize = 32; // Read it from somewhere
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * iObjectSize);
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*iObjectSize);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to work with something...
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end(), structObjectComparerAscending(iObjectSize));
free(pObjects);
return 0;
}

accessing multi-dimensional array element c++

I am trying to work with a multi-dimensional array in MSVS2010 console application, and I need to access members of a 2D array. I instantiate the array as
Thing::Thing(int _n){
// size of the array
this->m = _n;
thing = new int*[m];
for(int ii = 0; ii < m; ii++){
thing[ii] = new int[m];
}
}
This works fine. However, when I write an operator= or operator==, which both use a structure similar to:
Thing& Thing::operator=(const Thing & _thing){
for(int ii = 0; ii < m; ii++){
for(int jj = 0; jj < m; jj++){
thing[ii][jj] = _thing[ii][jj]; //error thrown on this line
}
}
return *this;
}
this throws 2 errors
binary "[": 'const Thing' does not define this operator or a conversion to a type acceptable to the predefined operator
IntelliSense: no operator"[]" matches these operands
This doesn't make sense to me, as it is an array of type int and the "[]" operators have not been altered. The error highlighting also only appears under:
_thing[ii][jj];
I can kinda live without the assignment operator, but I need the comparison operator to have functionality.
You should do: thing[ii][jj] = _thing.thing[ii][jj]; in your assignment loop. And you should also check if the array sizes for both (this and _thing) are the same: it may give a crash otherwise.
You get an error because you are trying to use operator[] (the indexing operator) on an object of class Thing, not on its internal array. If you want to use the Thing class like an array, you should define an indexing operator for it. It has to be a const member function to be callable on _thing, which is a const reference, e.g.:
int* Thing::operator[](int idx) const
{
return thing[idx];
}
I think you've got your "thing"-s confused. Since:
Thing& Thing::operator=(const Thing & _thing)
you probably want to have:
thing[ii][jj] = _thing.thing[ii][jj];
_thing is the Thing object
_thing.thing is the multidimensional array
Thing is the class, thing is the member, _thing the parameter... and you forgot that if you want to access the member in the operator= call then you should use _thing.thing.
Your naming choice is quite bad, so bad that it confused even you while you were writing the code (and if it was easy for you to make a mistake now, imagine how much easier it would be for someone else to get confused by this code, or even for you a few months from now).
What about calling the class Array, the member data and the parameter other, for example? I would also suggest avoiding leading underscores in names; they are ugly and dangerous at the same time (do you know all the C++ rules about where you can put underscores in names and how many of them you are allowed to use?).
When designing a class or a function you have many things to consider, and the name is important but only one of many factors. For a data member or a variable, however, you only have to choose the type and the name, so both are the most important choices.
So please get into the habit of thinking carefully about names, especially those of variables. Their relative importance is tremendous. Variables and data members are just names... the name is actually the only reason we like to use variables in programming (the computer only uses numeric addresses and is perfectly happy with them).
About the class design you probably would also like defining operator[](int)...
int *operator[](int index) { return data[index]; }
By doing this you will be able to write code like
Array a(m);
a[0][0] = 42;
without the need to explicitly refer to data. By the way, this addition, together with a const overload for the const parameter, would also make your original code work... but still fix the names!!
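Putting the suggestions together, a minimal sketch of such a class could look as follows. It is an illustration under assumptions rather than a drop-in replacement: it swaps the new int*[m] allocation for a single std::vector (so copy assignment is generated by the compiler and copes with different sizes), and the member names size_ and data_ are hypothetical. The const overload of operator[] is what allows indexing a const reference parameter.

```cpp
#include <cstddef>
#include <vector>

// Square matrix of ints, stored row by row in one contiguous vector.
class Array {
public:
    explicit Array(std::size_t n) : size_(n), data_(n * n, 0) {}

    // Row access: returns a pointer to the row so that a[i][j] works.
    int* operator[](std::size_t row) { return data_.data() + row * size_; }

    // Const overload, required to index a `const Array&`.
    const int* operator[](std::size_t row) const { return data_.data() + row * size_; }

    std::size_t size() const { return size_; }

    // Two arrays are equal when both the dimensions and all elements match.
    friend bool operator==(const Array& a, const Array& b) {
        return a.size_ == b.size_ && a.data_ == b.data_;
    }
    friend bool operator!=(const Array& a, const Array& b) { return !(a == b); }

private:
    std::size_t size_;
    std::vector<int> data_;
};
```

With this, `Array a(2); a[0][0] = 42; Array b(3); b = a;` leaves b equal to a, and no hand-written operator= or destructor is needed.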