How can I reshape a TF_Tensor* using TensorFlow's C API, the way it is done in C++?
TensorShape inputShape({1,1,80,80});
Tensor inputTensor;
Tensor newTensor;
bool result = inputTensor.CopyFrom(newTensor, inputShape);
I don't see a similar method in TensorFlow's C API.
The TensorFlow C API operates on a (data, dims) model: the data is treated as a flat raw array supplied together with the required dimensions.
Step 1: Allocating a new Tensor
Have a look at TF_AllocateTensor:
TF_CAPI_EXPORT extern TF_Tensor* TF_AllocateTensor(TF_DataType,
const int64_t* dims,
int num_dims, size_t len);
Here:
TF_DataType: The TF equivalent of the data type you need from here.
dims: array of the dimensions of the tensor to be allocated, e.g. {1, 1, 80, 80}
num_dims: the length of dims (4 in the example above)
len: the total size in bytes, i.e. the product of dims times the element size: 1*1*80*80*sizeof(DataType) = 6400*sizeof(DataType).
Step 2: Copying data
// Get the tensor buffer
auto buff = (DataType *)TF_TensorData(output_of_tf_allocate);
// std::memcpy() ...
Here is some sample code from a project I did a while back, a very light TensorFlow C API wrapper.
So, essentially your reshape will involve allocating your new tensor and copying the data from the original tensor into buff.
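Putting the two steps together, here is a minimal sketch of such a reshape, assuming float data and that the new shape holds the same number of elements as the original (reshape_tensor is just an illustrative helper of mine, not part of the C API):
#include <cstring>
#include "tensorflow/c/c_api.h"

// Allocate a tensor with the new shape and copy the raw bytes of `src` into it.
// A reshape does not change the byte size, so TF_TensorByteSize(src) is reused as len.
TF_Tensor* reshape_tensor(TF_Tensor* src, const int64_t* dims, int num_dims) {
  size_t len = TF_TensorByteSize(src);
  TF_Tensor* dst = TF_AllocateTensor(TF_FLOAT, dims, num_dims, len);
  std::memcpy(TF_TensorData(dst), TF_TensorData(src), len);
  return dst;
}

// Usage:
// int64_t new_dims[4] = {1, 1, 80, 80};
// TF_Tensor* reshaped = reshape_tensor(original, new_dims, 4);
// ... use reshaped, then TF_DeleteTensor(reshaped);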
The TensorFlow C API isn't meant for regular usage, so it is harder to learn and lacks documentation. I figured a lot of this out by experimentation. Any suggestions from the more experienced developers out there?
I'm trying to apply Thrust algorithms to the data in cuda::GpuMats. Unfortunately, OpenCV basically never produces continuous GpuMats, which breaks virtually all of my algorithms, code, and performance. Normally, when I encounter this with discontinuous host matrices, I just clone the matrix; host-side matrices typically become discontinuous when they come from a rectangular view of another matrix.
This doesn't work with GpuMats: they literally never seem to come out contiguous. I'm not sure what is going on with OpenCV. All I'm doing is the following:
cv::Mat host(600, 400, CV_32F);
cv::gpu::GpuMat device;
device.upload(host);
cv::gpu::GpuMat continuous;
if(device.isContinuous()){
continuous = device;
}else{
continuous = device.clone();
}
//always prints...
if(!continuous.isContinuous()){
std::cout << "isn't Continuous\n";
}
As you can see, the mere act of uploading data produces discontinuous data...
To get a continuous GpuMat, you can use one of the methods below:
use cv::cuda::createContinuous(int rows, int cols, int type, continuous_gpumat) or one of its overloads (a sketch follows after the cudaMalloc example below).
allocate a continuous block of CUDA memory with the cudaMalloc API call (or a similar function), and then construct a GpuMat header over this continuous buffer:
// elem_size depends on the data type
int alloc_size = rows*cols*elem_size;
uchar *data = nullptr;
cudaError_t status = cudaMalloc(&data, alloc_size);
assert(status==cudaSuccess);
continuous_gpumat = cv::cuda::GpuMat(rows, cols, type, data);
// in destructor:
status = cudaFree(data);
assert(status==cudaSuccess);
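For the first option, a minimal sketch (the 600x400 size and CV_32F type are just examples, reusing the host matrix from the question):
#include <opencv2/core/cuda.hpp>
#include <cassert>

cv::Mat host(600, 400, CV_32F);

// Ask OpenCV to allocate a GpuMat whose rows are packed without padding.
cv::cuda::GpuMat continuous;
cv::cuda::createContinuous(600, 400, CV_32F, continuous);

// upload() keeps the existing allocation because size and type already match,
// so the data stays continuous.
continuous.upload(host);
assert(continuous.isContinuous());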
I need to store data from two float32 arrays in an .h5 file. The arrays both have size 76800 (240x320) and each represents an image. I would be happy to just store the two arrays as they are in an .h5 file, but since I'm a total beginner with C++, I have no clue how to do this.
I looked here, but the conversion to a multi-array does not seem to be necessary in my case. Even though this seems like a really simple problem, I couldn't find a simple code example for it.
Here is my code so far:
H5::H5File file("/home/m/Desktop/tryout/file.h5", H5F_ACC_TRUNC);
// Vector X
hsize_t dims_pol[1] = {f->flow_x.size()};
H5::DataSpace dataspace_x(1, dims_pol);
H5::IntType datatype_x(H5::PredType::NATIVE_FLOAT);
H5::DataSet dataset_x(file.createDataSet("p", datatype_x, dataspace_x));
dataset_x.write(f->flow_x.data(), H5::PredType::NATIVE_UINT8);
dataset_x.close();
However, this only writes the one vector into the file, and additionally I can't open the file in Python (with pandas). It works with h5dump, though.
Thanks for your help
One way to solve your issue is to use HDFql in C++, as follows:
// declare variables 'arr1' and 'arr2'
float arr1[240][320];
float arr2[240][320];
// populate variable 'arr1' with values
// populate variable 'arr2' with values
// register variable 'arr1' for subsequent usage (by HDFql)
HDFql::variableTransientRegister(&arr1);
// create dataset 'dset1' of data type float (size 240x320) populated with values from 'arr1'
HDFql::execute("create dataset dset1 as float(240, 320) values from memory 0");
// register variable 'arr2' for subsequent usage (by HDFql)
HDFql::variableTransientRegister(&arr2);
// create dataset 'dset2' of data type float (size 240x320) populated with values from 'arr2'
HDFql::execute("create dataset dset2 as float(240, 320) values from memory 0");
Additional info can be found in the HDFql reference manual.
I think I found the solution, although I'm not super happy with it because pandas (python) can't open it and I have to use h5py.
However, here's my code. If you see any improvements, please let me know.
#include "H5Cpp.h"
H5::H5File file("/home/m/Desktop/tryout/file.h5", H5F_ACC_TRUNC);
// Vector X
hsize_t dims_pol[1] = {f->flow_x.size()};
H5::DataSpace dataspace_x(1, dims_pol);
H5::FloatType datatype_x(H5::PredType::NATIVE_FLOAT);
H5::DataSet dataset_x(file.createDataSet("x", datatype_x, dataspace_x));
dataset_x.write(f->flow_x.data(), H5::PredType::NATIVE_FLOAT);
dataset_x.close();
// Vector Y
H5::DataSpace dataspace_y(1, dims_pol);
H5::FloatType datatype_y(H5::PredType::NATIVE_FLOAT);
H5::DataSet dataset_y(file.createDataSet("y", datatype_y, dataspace_y));
dataset_y.write(f->flow_y.data(), H5::PredType::NATIVE_FLOAT);
dataset_y.close();
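Another option, assuming flow_x and flow_y are std::vector<float> with 240*320 elements stored row-major, is to write each image as a 2-D 240x320 dataset and pass the PredType directly as the dataset's type, which sidesteps the IntType/FloatType question. A minimal sketch:
#include "H5Cpp.h"
#include <vector>

void write_flow(const std::vector<float>& flow_x, const std::vector<float>& flow_y) {
  H5::H5File file("/home/m/Desktop/tryout/file.h5", H5F_ACC_TRUNC);

  // One 240x320 dataspace shared by both datasets.
  hsize_t dims[2] = {240, 320};
  H5::DataSpace space(2, dims);

  H5::DataSet dset_x = file.createDataSet("x", H5::PredType::NATIVE_FLOAT, space);
  dset_x.write(flow_x.data(), H5::PredType::NATIVE_FLOAT);

  H5::DataSet dset_y = file.createDataSet("y", H5::PredType::NATIVE_FLOAT, space);
  dset_y.write(flow_y.data(), H5::PredType::NATIVE_FLOAT);
}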
This is very basic: I normally use Eigen3 for my math operations, but I need to use libtorch for a network forward pass. Now I want to populate the torch::Tensor with the data from my Eigen3 matrix (or a plain C++ array), but without a for loop. How can I do this?
Here is the solution with a loop:
Eigen::Matrix<double, N, 1> inputEigen; // previously initialized
torch::Tensor inputTorch = torch::ones({1, N}); // my torch tensor for the forward pass
for (int i = 0; i < N; i++) {
inputTorch[0][i] = inputEigen[i]; // batch size == 1
}
std::vector<torch::jit::IValue> inputs;
inputs.push_back(inputTorch);
at::Tensor output = net.forward(inputs).toTensor();
This works fine for now, but N might become really large, and I'm just looking for a way to set the underlying data of my torch::Tensor directly from a previously filled C++ array.
Libtorch provides the torch::from_blob function (see this thread), which takes a void* pointer to some data and an IntArrayRef describing the dimensions of the interpreted data. So that would give something like:
Eigen::Matrix<double, N, 1> inputEigen; // previously initialized
torch::Tensor inputElement = torch::from_blob(inputEigen.data(), {1, N}, torch::kFloat64).clone(); // dims + dtype (double, matching the Eigen matrix)
Please note the call to clone, which you may or may not need depending on your use case: from_blob does not take ownership of the underlying data, so without the clone the memory remains shared with (and may be destroyed together with) your Eigen matrix. Also note that from_blob interprets the buffer as float by default, so the dtype (here torch::kFloat64, since the Eigen matrix holds doubles) has to be passed explicitly.
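Put together with the forward pass from the question (net and N come from there; the .to(torch::kFloat32) is only needed if the module expects float inputs), the loop-free version might look like this:
Eigen::Matrix<double, N, 1> inputEigen; // previously initialized

torch::Tensor inputTorch =
    torch::from_blob(inputEigen.data(), {1, N}, torch::kFloat64) // zero-copy view of the Eigen buffer
        .clone()                                                 // own the memory
        .to(torch::kFloat32);                                    // convert if the network expects float

std::vector<torch::jit::IValue> inputs;
inputs.push_back(inputTorch);
at::Tensor output = net.forward(inputs).toTensor();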
I have the following problem:
I have an Eigen::SparseMatrix that I need to send over the network, and my network library only supports sending arrays of primitive types.
I can retrieve the pointers to the backing arrays of my SparseMatrix by doing something like (here's the backing object's code):
// Get pointers to the indices and values, send data over the network
int num_items = sparse_matrix.nonZeros();
auto values_ptr = sparse_matrix.data().valuePtr();
auto index_ptr = sparse_matrix.data().indexPtr();
network_lib::send(values_ptr, num_items);
network_lib::send(index_ptr, 2 * num_items); // Times two b/c we have 2 indices per value
Now on the other side I have access to these two arrays. But AFAIK there is no way to create a SparseMatrix without copying all the data into a new SparseMatrix (see the docs for construction).
I'd like to do something like:
Eigen::SparseMatrix<float> zero_copy_matrix(num_rows, num_cols);
zero_copy_matrix.data().valuePtr() = received_values_ptr;
zero_copy_matrix.data().indexPtr() = received_index_ptr;
But this throws a compiler error:
error: lvalue required as left operand of assignment zero_copy_matrix.data().valuePtr() = received_values_ptr;
Any idea on how we could zero-copy construct a sparse Eigen matrix from existing arrays of indexes and data?
Another approach I tried that didn't work (this is local, no communication):
zero_copy_matrix.reserve(num_non_zeros);
zero_copy_matrix.data().swap(original_matrix.data());
When I try to print out the zero_copy_matrix it has no values in it.
After digging around I think a good option for me would be to use an Eigen::Map<Eigen::SparseMatrix<float>> as such:
Eigen::Map<Eigen::SparseMatrix<float>> sparse_map(num_rows, num_cols, num_non_zeros,
original_outer_index_ptr, original_inner_index_ptr,
original_values_ptr);
AFAIK, this should be zero-copy. Answer from here.
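For completeness, a sketch of both sides under this scheme; network_lib::send stands in for the question's network library, and the receiver needs the three CSC arrays (values, inner/row indices, outer/column starts) rather than a single combined index array:
#include <Eigen/Sparse>

// Sender: make sure the matrix is in compressed (CSC) form, then send the three arrays.
void send_sparse(Eigen::SparseMatrix<float>& m) {
  m.makeCompressed();                                       // Map requires compressed storage
  const int nnz = m.nonZeros();
  network_lib::send(m.valuePtr(), nnz);                     // non-zero values
  network_lib::send(m.innerIndexPtr(), nnz);                // row index of each value
  network_lib::send(m.outerIndexPtr(), m.outerSize() + 1);  // column start offsets
}

// Receiver: wrap the received buffers without copying.
Eigen::Map<Eigen::SparseMatrix<float>> wrap_received(
    int num_rows, int num_cols, int nnz,
    int* outer_index_ptr, int* inner_index_ptr, float* values_ptr) {
  return Eigen::Map<Eigen::SparseMatrix<float>>(
      num_rows, num_cols, nnz, outer_index_ptr, inner_index_ptr, values_ptr);
}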
For flexibility, I'm loading data into dynamic-sized matrices (e.g. Eigen::MatrixXf) using the C++ library Eigen. I've written some functions which require mixed- or fixed-sized matrices as parameters (e.g. Eigen::Matrix<float, 3, Eigen::Dynamic> or Eigen::Matrix4f). Assuming I do the proper assertions for row and column size, how can I convert the dynamic matrix (size set at runtime) to a fixed matrix (size set at compile time)?
The only solution I can think of is to map it, for example:
Eigen::MatrixXf dyn = Eigen::MatrixXf::Random(3, 100);
Eigen::Matrix<float, 3, Eigen::Dynamic> fixed =
Eigen::Map<Eigen::Matrix<float, 3, Eigen::Dynamic>>(dyn.data(), 3, dyn.cols());
But it's unclear to me whether that will work, because the fixed-size Map constructor doesn't accept rows and columns as parameters in the docs. Is there a better solution? Simply assigning a dynamic-sized matrix to a fixed-sized one doesn't work.
You can use Ref for that purpose; its usage in your case is simpler, and it will do the runtime assertion checks for you, e.g.:
MatrixXf A_dyn(4,4);
Ref<Matrix4f> A_fixed(A_dyn);
You might even require a fixed outer-stride and aligned memory:
Ref<Matrix4f,Aligned16,OuterStride<4> > A_fixed(A_dyn);
In this case, A_fixed is really like a Matrix4f.
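A minimal sketch of how this fits the original use case, i.e. passing a dynamic-size matrix to a function that requires a fixed-size one (trace_of is just a hypothetical example function):
#include <Eigen/Dense>
#include <iostream>

// Accepts any 4x4 float expression (Matrix4f, MatrixXf, blocks, maps, ...)
// without copying; the 4x4 size is checked by a runtime assertion.
float trace_of(const Eigen::Ref<const Eigen::Matrix4f>& m) {
  return m.trace();
}

int main() {
  Eigen::MatrixXf A_dyn = Eigen::MatrixXf::Random(4, 4); // size known only at runtime
  std::cout << trace_of(A_dyn) << "\n";
}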