C++ Array to HDF5

I need to store data from two float32 arrays in an .h5 file. Both arrays have 76800 elements (240x320), and each one represents an image. I would be happy to just store the two arrays as they are in an .h5 file, but since I'm a total beginner with C++, I have no clue how to do this.
I looked here, but the conversion to a multi-array does not seem to be necessary for my case. Even though this seems like a really simple problem, I couldn't find a simple code example for this.
Here is my code so far:
H5::H5File file("/home/m/Desktop/tryout/file.h5", H5F_ACC_TRUNC);
// Vector X
hsize_t dims_pol[1] = {f->flow_x.size()};
H5::DataSpace dataspace_x(1, dims_pol);
H5::IntType datatype_x(H5::PredType::NATIVE_FLOAT);
H5::DataSet dataset_x(file.createDataSet("p", datatype_x, dataspace_x));
dataset_x.write(f->flow_x.data(), H5::PredType::NATIVE_UINT8);
dataset_x.close();
However, this only writes one vector into the file, and additionally I can't open the file in Python (with pandas). It does work with h5dump, though.
Thanks for your help

One way to solve your issue is to use HDFql in C++, as follows:
// declare variables 'arr1' and 'arr2'
float arr1[240][320];
float arr2[240][320];
// populate variable 'arr1' with values
// populate variable 'arr2' with values
// register variable 'arr1' for subsequent usage (by HDFql)
HDFql::variableTransientRegister(&arr1);
// create dataset 'dset1' of data type float (size 240x320) populated with values from 'arr1'
HDFql::execute("create dataset dset1 as float(240, 320) values from memory 0");
// register variable 'arr2' for subsequent usage (by HDFql)
HDFql::variableTransientRegister(&arr2);
// create dataset 'dset2' of data type float (size 240x320) populated with values from 'arr2'
HDFql::execute("create dataset dset2 as float(240, 320) values from memory 0");
Additional info can be found in the HDFql reference manual.

I think I found the solution, although I'm not entirely happy with it, because pandas (Python) can't open the file and I have to use h5py.
Here's my code anyway. If you see any improvements, please let me know.
#include "H5Cpp.h"
// f->flow_x and f->flow_y are the two float vectors (76800 elements each)
H5::H5File file("/home/m/Desktop/tryout/file.h5", H5F_ACC_TRUNC);
// Vector X
hsize_t dims_pol[1] = {f->flow_x.size()};
H5::DataSpace dataspace_x(1, dims_pol);
H5::FloatType datatype_x(H5::PredType::NATIVE_FLOAT); // float data, so FloatType rather than IntType
H5::DataSet dataset_x(file.createDataSet("x", datatype_x, dataspace_x));
dataset_x.write(f->flow_x.data(), H5::PredType::NATIVE_FLOAT); // memory type must match the buffer
dataset_x.close();
// Vector Y (reuses dims_pol, so flow_y must have the same length as flow_x)
H5::DataSpace dataspace_y(1, dims_pol);
H5::FloatType datatype_y(H5::PredType::NATIVE_FLOAT);
H5::DataSet dataset_y(file.createDataSet("y", datatype_y, dataspace_y));
dataset_y.write(f->flow_y.data(), H5::PredType::NATIVE_FLOAT);
dataset_y.close();
file.close();
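Each 76800-element array is really a 240x320 image. Assuming the pixels were flattened row by row (row-major order, which is also the layout C/C++ and HDF5 use for multi-dimensional data), the same buffer could be written through a rank-2 dataspace with dims {240, 320} instead of {76800}, so readers see the image shape directly. A minimal sketch of the index mapping this relies on (flat_index, kRows and kCols are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical image dimensions taken from the question (240 x 320).
constexpr std::size_t kRows = 240;
constexpr std::size_t kCols = 320;

// Position of pixel (row, col) in a flat buffer, assuming row-major flattening,
// i.e. the same layout a float[240][320] array has in memory.
std::size_t flat_index(std::size_t row, std::size_t col) {
    assert(row < kRows && col < kCols);
    return row * kCols + col;  // skip `row` full rows, then `col` pixels
}

// A 2-D dataspace {kRows, kCols} covers exactly as many elements as the 1-D one.
static_assert(kRows * kCols == 76800, "240 x 320 must equal the 1-D array size");
```

With H5Cpp this would mean declaring `hsize_t dims[2] = {240, 320};`, building `H5::DataSpace dataspace(2, dims);` and passing the same `f->flow_x.data()` pointer to `write()`.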

Related

How to populate a torch::Tensor from a C++ array?

This is very basic: I normally use Eigen3 for my math operations, but I need libtorch for a network forward pass. Now I want to populate the torch::Tensor with the data from my Eigen3 matrix (or a plain C++ array), but without a for loop. How can I do this?
Here is the solution with a loop:
Eigen::Matrix<double, N, 1> inputEigen; // previously initialized
torch::Tensor inputTorch = torch::ones({1, N}); // my torch tensor for the forward pass
for (int i = 0; i < N; i++) {
inputTorch[0][i] = inputEigen[i]; // batch size == 1
}
std::vector<torch::jit::IValue> inputs;
inputs.push_back(inputTorch);
at::Tensor output = net.forward(inputs).toTensor();
This works fine for now, but N might become really large, and I'm looking for a way to set the underlying data of my torch::Tensor directly from a previously used C++ array.
Libtorch provides the torch::from_blob function (see this thread), which asks for a void* pointer to some data and an IntArrayRef to know the dimensions of the interpreted data. So that would give something like:
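Eigen::Matrix<double, N, 1> inputEigen; // previously initialized
// Eigen holds doubles, so tell from_blob the element type (it assumes float by default);
// if the network expects float (like torch::ones above), append .to(torch::kFloat32)
torch::Tensor inputElement = torch::from_blob(inputEigen.data(), {1, N}, torch::kFloat64).clone();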
Please note the call to clone(), which you may or may not need depending on your use case: from_blob does not take ownership of the underlying data, so without the clone the tensor remains shared with (and possibly invalidated by the destruction of) your Eigen matrix.
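The ownership point can be illustrated without libtorch. The sketch below is only an analogy (BorrowedView and clone_of are hypothetical names, not libtorch API): a non-owning view keeps reading whatever is currently in the source buffer, and would dangle once that buffer is destroyed, while a cloned copy owns independent storage.

```cpp
#include <cstddef>
#include <vector>

// Analogy for a from_blob tensor: only a pointer into someone else's buffer plus a length.
struct BorrowedView {
    const double* data;
    std::size_t size;
    double at(std::size_t i) const { return data[i]; }
};

// Analogy for clone(): the values are copied into storage the result owns.
std::vector<double> clone_of(const BorrowedView& view) {
    return std::vector<double>(view.data, view.data + view.size);
}
```

After writing a new value into the source vector, the view reports the new value while the cloned copy still holds the old one.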

Reshaping Tensor in C

How can I reshape a TF_Tensor* using TensorFlow's C API, the way it is done in C++?
TensorShape inputShape({1,1,80,80});
Tensor inputTensor;
Tensor newTensor;
bool result = inputTensor.CopyFrom(newTensor, inputShape);
I don't see a similar method in TensorFlow's C API.
The TensorFlow C API works with a (data, dims) model: it treats the data as a flat raw array supplied together with the needed dimensions.
Step 1: Allocating a new Tensor
Have a look at TF_AllocateTensor(ref):
TF_CAPI_EXPORT extern TF_Tensor* TF_AllocateTensor(TF_DataType,
const int64_t* dims,
int num_dims, size_t len);
Here:
TF_DataType: the TF equivalent of the data type you need, from here.
dims: array of the dimensions of the tensor to be allocated, e.g. {1, 1, 80, 80}
num_dims: length of dims (4 above)
len: the product of all dims times the element size, i.e. 1*1*80*80*sizeof(DataType) = 6400*sizeof(DataType).
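The len arithmetic from the list above, written as a small standalone function so it runs without TensorFlow (tensor_byte_len is a hypothetical helper, not part of the C API; in real code elem_size would be the size of the C type matching your TF_DataType, e.g. sizeof(float) for TF_FLOAT):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte length to pass as `len` to TF_AllocateTensor:
// the number of elements (product of all dims) times the size of one element.
std::size_t tensor_byte_len(const std::vector<std::int64_t>& dims, std::size_t elem_size) {
    std::size_t count = 1;
    for (std::int64_t d : dims) {
        count *= static_cast<std::size_t>(d);
    }
    return count * elem_size;
}
```

For dims {1, 1, 80, 80} and 4-byte floats this gives 6400 * 4 = 25600 bytes.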
Step 2: Copying data
// Get the tensor buffer
auto buff = (DataType *)TF_TensorData(output_of_tf_allocate);
// std::memcpy() ...
Here is some sample code from a project I did a while back: a very lightweight TensorFlow C API wrapper.
So, essentially your reshape will involve allocating your new tensor and copying the data from the original tensor into buff.
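Because the C API only sees a flat buffer plus a dims array, a "reshape" amounts to: check that the element count is unchanged, allocate with the new dims, and copy the raw bytes. The sketch below mimics that flow in plain C++ so it runs without TensorFlow; FlatTensor, element_count and reshape are hypothetical names. With the real API, the allocation would be TF_AllocateTensor, the source and destination pointers would come from TF_TensorData, and TF_TensorByteSize would give the byte count.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a TF_Tensor: flat float data plus its dims.
struct FlatTensor {
    std::vector<std::int64_t> dims;
    std::vector<float> data;
};

std::int64_t element_count(const std::vector<std::int64_t>& dims) {
    std::int64_t n = 1;
    for (std::int64_t d : dims) n *= d;
    return n;
}

// "Reshape" = allocate a tensor with the new dims, then copy the bytes over unchanged.
FlatTensor reshape(const FlatTensor& src, const std::vector<std::int64_t>& new_dims) {
    if (element_count(new_dims) != element_count(src.dims)) {
        throw std::invalid_argument("reshape must keep the number of elements");
    }
    FlatTensor dst;
    dst.dims = new_dims;
    dst.data.resize(src.data.size());  // stands in for TF_AllocateTensor
    if (!src.data.empty()) {
        std::memcpy(dst.data.data(), src.data.data(), src.data.size() * sizeof(float));
    }
    return dst;
}
```

The data order never changes; only the dims describing it do, which is why a plain memcpy is enough.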
The TensorFlow C API isn't meant for everyday use, so it is harder to learn and lacks documentation. I figured a lot of this out by experimentation. Any suggestions from more experienced developers out there?

Changing array shape in a compound type

I'm starting from a struct
typedef struct SpikeInfo {
uint64 timestamp;
int recording;
int16 waveform[MAX_TRANSFORM_SIZE];
} SpikeInfo;
and then I create a HDF5 compound datatype:
CompType spiketype(sizeof(SpikeInfo));
hsize_t dims[2] = {MAX_TRANSFORM_SIZE/nChannels, nChannels};
spiketype.insertMember(H5std_string("waveform"), HOFFSET(SpikeInfo, waveform), ArrayType(getNativeType(I16), 2, dims));
spiketype.insertMember(H5std_string("recording"), HOFFSET(SpikeInfo, recording), getNativeType(U16));
spiketype.insertMember(H5std_string("timestamp"), HOFFSET(SpikeInfo, timestamp), getNativeType(U64));
Thus the array has MAX_TRANSFORM_SIZE/nChannels rows allocated. However, I will most likely keep receiving less data than that (for example, only half of MAX_TRANSFORM_SIZE/nChannels rows), and then I have to fill the rest with zeros, which unnecessarily increases the file size. Is there a way to make that ArrayType resizable, so that it starts small and can grow when necessary?
I know there is a type VarLenType, but the documentation doesn't show any useful methods for it. Is such a thing possible at all?

How to create multi-value attribute in a HDF5 file using C++ API

EDIT STARTS
I'm trying to create a "pair, triplet or n-tuple" attribute based on a native type (float, int...):
pair of float, triplet of float, n-tuple of float attribute
pair of int, triplet of int, n-tuple of int attribute
I'm not trying to create an "Array" attribute, I'm not trying to create a "Compound" attribute
EDIT END
I'm trying to create an attribute based on a native type (float, int...) that contains 2, 3 or more values (equivalent to a pair or an n-tuple).
I don't want to create an array! I want to create something very similar to an array, but not the same.
I can create a single-value attribute this way (for a "double" attribute):
H5::DataSpace dataSpace = H5::DataSpace();
H5::Attribute attribute = group.createAttribute(attributeName, H5::PredType::IEEE_F64LE, dataSpace);
attribute.write(H5::PredType::IEEE_F64LE, &attributeValue);
To create a pair of "double" values, I've tried this:
hsize_t dimension;
dimension = 2;
H5::ArrayType dataType(H5::PredType::IEEE_F64LE, 1, &dimension);
H5::DataSpace dataSpace = H5::DataSpace();
H5::Attribute attribute = group.createAttribute(attributeName, dataType, dataSpace);
double attributeValue[2];
attributeValue[0] = x;
attributeValue[1] = y;
attribute.write(dataType, attributeValue);
But it creates an array type attribute in the HDF5 file.
I know it's possible to create an attribute that contains multiple values, because I can do it with the HDFView GUI (the first attribute was created with the code above, the second with the GUI; the second is the kind of attribute I want):
Any help will be appreciated !
Without exactly knowing what you are trying to accomplish, I believe what you are looking for is a self-defined datatype using the HDF5 compound datatype H5::CompType, which is usually used to save simple structs. Taken from the HDF5 C++ compound example page, the struct
typedef struct s1_t {
int a;
float b;
double c;
} s1_t;
has the associated compound datatype:
CompType mtype1( sizeof(s1_t) );
mtype1.insertMember( MEMBER1, HOFFSET(s1_t, a), PredType::NATIVE_INT);
mtype1.insertMember( MEMBER3, HOFFSET(s1_t, c), PredType::NATIVE_DOUBLE);
mtype1.insertMember( MEMBER2, HOFFSET(s1_t, b), PredType::NATIVE_FLOAT);
Compound datatypes are then treated the same way as native datatypes and may also be saved as attributes.
Edit
The error you made in your code above was to define your datatype to be saved as an H5::ArrayType when you didn't actually want to save an Array. What you really want is a simple datatype (such as PredType::NATIVE_DOUBLE) saved in a higher dimensional dataspace.
#include "H5Cpp.h"
using namespace H5;
const H5std_string FILE_NAME("save.h5");
const H5std_string ATT_NAME("Attribute");
int main() {
    const hsize_t dims = 5;  // number of values stored in the attribute
    const int ndims = 1;     // rank of the dataspace (1-D)
    const PredType& dtype = PredType::NATIVE_DOUBLE;  // plain double, not an ArrayType
    H5File h5file(FILE_NAME, H5F_ACC_TRUNC);
    // A 1-D dataspace with 5 elements is what makes the attribute hold several values
    DataSpace dspace(ndims, &dims);
    Attribute att = h5file.createAttribute(ATT_NAME, dtype, dspace);
    double attvalue[dims];
    for (hsize_t i = 0; i < dims; ++i) attvalue[i] = static_cast<double>(i);
    att.write(dtype, attvalue);
    h5file.close();
    return 0;
}
This should reproduce the "createdUsingHDFVIEW" attribute above (except for the datatype). I can't verify this, as I don't have HDFView. This didn't occur to me at first, because I tend to think of H5::DataSpace as a kind of array (which it actually is).

Function to return SQL query into a vector<double> in C++

I have a query in T-SQL that returns 300 records. Each record has 2 columns (date, int).
What is the easiest way in C++ to put all the dates in one vector and all the integers in another one?
I would like to do it in a function.
It's hard to provide full code without knowing your SQL client library, since that determines how you read each row, but basically you loop through the rows read from the DB, calling push_back on your two vectors with the values retrieved from each row.
The main question is how you are going to return the results. You have two vectors, as you have specified the problem here. You could achieve this by having the caller create the vectors and the function populate them, like this:
#include <vector>
// function declaration - return false on error, or throw exception if preferred
bool populate(std::vector<double>& dates, std::vector<int>& values);
// calling code
std::vector<double> myDates;
std::vector<int> myValues;
// if you know the row count ahead of time (300 in this example), reserve capacity up front
unsigned int rowCount = 300;
myDates.reserve(rowCount);
myValues.reserve(rowCount);
// Populate vectors, checking for error (false = error)
if (populate(myDates, myValues)) {
// work with the returned data
}
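Without knowing the SQL client library, the loop can only be sketched against a stand-in. Below, FetchedRow and the rows argument are hypothetical placeholders for whatever your library hands back per record (a cursor, recordset, etc.); the point is the push_back into the two vectors supplied by the caller.

```cpp
#include <vector>

// Hypothetical stand-in for one record returned by your SQL client library.
struct FetchedRow {
    double date;  // date column, assumed already converted to double
    int value;    // int column
};

// Same idea as the populate() declaration above, but the fetched rows are passed in
// explicitly so the sketch runs without a database. Returns false on error.
bool populateFromRows(const std::vector<FetchedRow>& rows,
                      std::vector<double>& dates,
                      std::vector<int>& values) {
    dates.clear();
    values.clear();
    dates.reserve(rows.size());
    values.reserve(rows.size());
    for (const FetchedRow& row : rows) {
        dates.push_back(row.date);
        values.push_back(row.value);
    }
    return true;  // a real version would return false if the query or a conversion failed
}
```

Both vectors end up the same length, with index i in each referring to the same record.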
For extra credit, because it gives better encapsulation of the row data, I would be inclined to use a vector of POD structures. The advantage is that each date and value remain tightly coupled; you can extend this into a full-blown class if you have operations you wish to perform on each row. Preferably, hide the data behind getters.
struct Row {
public:
double date;
int value;
};
bool populate(std::vector<Row>& rows);
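A hypothetical variant of the Row struct above with the data hidden behind getters, plus one example of a per-row operation (totalValue is an illustrative name, not something the answer defines):

```cpp
#include <vector>

// Row as a small class: values are set once in the constructor and read via getters.
class Row {
public:
    Row(double date, int value) : date_(date), value_(value) {}
    double date() const { return date_; }
    int value() const { return value_; }

private:
    double date_;
    int value_;
};

// Example operation over all rows: each date stays paired with its own value.
int totalValue(const std::vector<Row>& rows) {
    int total = 0;
    for (const Row& row : rows) {
        total += row.value();
    }
    return total;
}
```

Filling it is then a single `rows.emplace_back(date, value);` per record inside the fetch loop.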