Equivalence of slicing tensor in PyTorch/ATen C++

In Python, given a 2-D tensor, we can use tensor[:, :2] to slice out the first two columns; for a two-row tensor this gives the 2x2 matrix in the top-left corner, e.g.:
x = torch.tensor([[-1.4673, 0.9980, -2.1427, -1.1798, -0.0646, -0.2635, -2.8930, -0.2563,
0.4559, -0.7947, -0.4540, 3.3224, 0.2295, 5.5568, -8.0451, -2.4529,
4.8724, 2.1640, 3.3255, 0.6693, -1.2362, 4.4713, -3.5547, -0.0528,
0.1031, -1.2472, -1.6014, 1.8134],
[ 2.1636, -1.1497, -5.0298, 2.8261, -0.5684, 0.6389, 2.9009, -5.1609,
1.7358, -3.1819, -0.9877, 5.5598, 6.7142, 4.5704, -1.2683, -5.3046,
3.0454, 3.2757, -3.2541, 3.6619, -3.6391, -0.2002, 5.7175, 5.7130,
0.6632, -0.0744, -0.3502, 4.8116]])
y, z = x[:,:2].chunk(2,1)
print(y)
print(z)
[out]:
tensor([[-1.4673],
[ 2.1636]])
tensor([[ 0.9980],
[-1.1497]])
What is the right way to do this in C++, particularly for PyTorch's ATen?
For example, in the LSTM there is the gates.chunk(4, 1) call at https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/RNN.cpp#L253
If I want to do a gate[:,:2].chunk(2,1) to extract different parts of the gates, e.g. auto partial_gates = gates[:,:2].chunk(4, 1);, how can it be done?
template <typename cell_params>
struct LSTMCell : Cell<std::tuple<Tensor, Tensor>, cell_params> {
  using hidden_type = std::tuple<Tensor, Tensor>;
  hidden_type operator()(const Tensor& input, const hidden_type& hidden, const cell_params& params) const override {
    auto hx = std::get<0>(hidden);
    auto cx = std::get<1>(hidden);
    if (input.is_cuda()) {
      auto igates = params.matmul_ih(input);
      auto hgates = params.matmul_hh(hx);
      auto result = at::_thnn_fused_lstm_cell(igates, hgates, cx, params.b_ih, params.b_hh);
      // Slice off the workspace argument (it's needed only for AD).
      return std::make_tuple(std::get<0>(result), std::get<1>(result));
    }
    auto gates = params.linear_ih(input) + params.linear_hh(hx);
    auto chunked_gates = gates.chunk(4, 1);
    auto partial_gates = gates[:,:2].chunk(4, 1); // <- Python-style slice, not valid C++; how to express this?
    auto ingate = chunked_gates[0].sigmoid();
    auto forgetgate = chunked_gates[1].sigmoid();
    auto cellgate = chunked_gates[2].tanh();
    auto outgate = chunked_gates[3].sigmoid();
    auto cy = (forgetgate * cx) + (ingate * cellgate);
    auto hy = outgate * cy.tanh();
    return std::make_tuple(hy, cy);
  }
};

1. You can also use .slice:
Tensor::slice(int64_t dim, int64_t start, int64_t end, int64_t step)
auto partial_gates = gates.slice(1, 0, 2).chunk(4, 1); // end is exclusive, so 0..2 selects the first two columns
2. Since PyTorch 1.5, you can use Tensor::index and Tensor::index_put_:
using namespace torch::indexing;
auto partial_gates = gates.index({"...", Slice(None, 2)}).chunk(4, 1);
It also supports multidimensional indexing.
General translation for Tensor::index and Tensor::index_put_
Python              C++ (assuming `using namespace torch::indexing`)
-------------------------------------------------------------------
0                   0
None                None
...                 "..." or Ellipsis
:                   Slice()
start:stop:step     Slice(start, stop, step)
True / False        true / false
[[1, 2]]            torch::tensor({{1, 2}})
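For example, following the table, the Python expression x[0:2, ::2] becomes (an illustrative snippet, assuming a 2-D tensor x):
// x[0:2, ::2] in Python translates to:
auto sub = x.index({Slice(0, 2), Slice(None, None, 2)});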

Another way is .narrow(), from https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L364:
auto partial_gates = gates.narrow(1, 0, 2).chunk(4, 1);
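For reference, all three spellings select the same sub-tensor. A minimal self-contained sketch (assuming LibTorch's torch/torch.h umbrella header; variable names are illustrative):

#include <torch/torch.h>
#include <iostream>

int main() {
    using namespace torch::indexing;
    auto x = torch::randn({2, 28});
    auto a = x.slice(/*dim=*/1, /*start=*/0, /*end=*/2);      // x[:, :2], end exclusive
    auto b = x.index({Slice(), Slice(None, 2)});              // x[:, :2]
    auto c = x.narrow(/*dim=*/1, /*start=*/0, /*length=*/2);  // first two columns
    std::cout << a.equal(b) << " " << b.equal(c) << std::endl; // prints: 1 1
    auto chunks = a.chunk(2, /*dim=*/1); // two {2, 1} tensors, like y and z above
}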

Related

What should you store or append a batch of tensors to in C++ when using LibTorch?

In C++, when using LibTorch (the C++ version of PyTorch), what should you store a batch of tensors in? I'm running into the problem of not being able to reset the batch on the next step, because C++ doesn't allow declaring a new variable over an existing one.
In my attempt, my batch of tensors is one single 385x385 tensor. The batch size is 385. In a for loop I use torch::cat to concatenate 385 smaller 1-D tensors, each 385 numbers long. (Maybe 'stack' or 'append' are better terms for what I'm doing, since they are stacked together picket-fence style rather than 'concatenated', but that's what I'm using.) Anyway, there is no problem with this shape. It seems to work fine for one forward and backward pass, but then the tensor becomes 770x385 on the next pass instead of a fresh 385x385 tensor of the next 385 arrays. I hope I am painting a picture and not being too verbose.
The code.
Near the bottom I have the line all_step_obs = torch::tensor({}); to try to wipe out the contents of the tensor, i.e. the batch, but this gives me a Segmentation fault (core dumped), I guess from trying to access the tensor outside of the loop(?)
If I don't have this line I get a 770x385 tensor after the next step.
The model
#include "mujoco/mujoco.h"
struct Net : torch::nn::Module {
torch::Tensor action_high, action_low;
public:
Net(torch::Tensor action_high, torch::Tensor action_low) : action_high(action_high), action_low(action_low){
// Construct and register two Linear submodules.
fc1 = torch::nn::Linear(385, 385);
fc2 = torch::nn::Linear(385, 385);
fc3 = torch::nn::Linear(385, 42);
// cholesky_layer = torch::nn::Linear(385, (42 * (42 + 1)) / 2);
cholesky_layer = torch::nn::Linear(385, 385);
}
// Implement the Net's algorithm.
torch::Tensor forward(torch::Tensor x) {
// Use one of many tensor manipulation functions.
x = torch::relu(fc1->forward(x));
x = torch::dropout(x, /*p=*/0.2, /*train=*/is_training());
x = torch::relu(fc2->forward(x));
auto mean_layer = fc3->forward(x);
auto mean = action_low + (action_high - action_low) * mean_layer;
auto chol_l = cholesky_layer->forward(x);
// auto chol = torch::rand({385, 385});
auto chol = torch::matmul(chol_l, chol_l.transpose(0, 1));
chol = torch::nan_to_num(chol, 0, 2.0);
chol = chol.add(torch::eye(385));
auto cholesky = torch::linalg::cholesky(chol);
// return torch::cat({mean, cholesky}, 0);
return mean_layer;
}
// Use one of many "standard library" modules.
torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr}, cholesky_layer{nullptr};
};
The training
auto high = torch::ones({385, 42}) * 0.4;
auto low = torch::ones({385, 42}) * -0.4;
auto actor = Net(low, high);

int max_steps = 385;
int steps = 2000;

auto l1_loss = torch::smooth_l1_loss;
auto optimizer = torch::optim::Adam(actor.parameters(), 3e-4);

torch::Tensor train() {
  torch::Tensor all_step_obs;
  for (int i = 0; i < steps; ++i) {
    for (int i = 0; i < max_steps; ++i) {
      all_step_obs = torch::cat({torch::rand({385}).unsqueeze(0), all_step_obs});
    }
    auto mean = actor.forward(all_step_obs);
    auto loss = l1_loss(mean, torch::rand({385, 42}), 1, 0);
    optimizer.zero_grad();
    loss.backward();
    optimizer.step();
    all_step_obs = torch::tensor({});
    if (steps == 1999) {
      return loss;
    }
  }
};

int main(int argc, const char** argv) {
  std::cout << train();
}
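One common way to sidestep both the growing tensor and the segfault (a sketch under my own assumptions, not a tested drop-in fix for the code above) is to collect the per-step observations in a std::vector<torch::Tensor> and stack them once per outer iteration:

#include <torch/torch.h>
#include <vector>

// Sketch: build a fresh {max_steps, 385} batch on every call; the vector
// lives only inside the function, so nothing needs to be wiped afterwards.
torch::Tensor make_batch(int max_steps) {
    std::vector<torch::Tensor> step_obs;
    step_obs.reserve(max_steps);
    for (int i = 0; i < max_steps; ++i)
        step_obs.push_back(torch::rand({385})); // placeholder for a real observation
    return torch::stack(step_obs); // shape: {max_steps, 385}
}

Inside the training loop, auto all_step_obs = make_batch(max_steps); would then replace both the torch::cat accumulation and the all_step_obs = torch::tensor({}); reset.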

GSL ODE solver returns -nan although the same ODE with the same parameters is solved in Python

I use Python to solve ODEs with scipy.integrate.odeint. Currently, I am working on a small project where I am using GSL in C++ to solve ODEs. I am trying to solve an ODE, but the solver returns -nan for each time point. Following is my code:
#include <stdio.h>
#include <math.h>
#include <iostream>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_odeiv2.h>

struct param_type {
    double k;
    double n;
    double m;
    double s;
};

int func (double t, const double y[], double f[], void *params)
{
    (void)(t); /* avoid unused parameter warning */
    struct param_type *my_params_pointer = (param_type *)params;
    double k = my_params_pointer->k;
    double n = my_params_pointer->n;
    double m = my_params_pointer->m;
    double s = my_params_pointer->s;
    f[0] = m*k*pow(s,n)*pow((y[0]/(k*pow(s,n))),(m-1)/m);
    return GSL_SUCCESS;
}

int * jac;

int main ()
{
    struct param_type mu = {1e-07, 1.5, 0.3, 250};
    gsl_odeiv2_system sys = {func, NULL, 1, &mu};
    gsl_odeiv2_driver * d = gsl_odeiv2_driver_alloc_y_new (&sys, gsl_odeiv2_step_rk8pd, 1e-6, 1e-6, 0.0);

    int i;
    double t = 0.0, t1 = 10.0;
    double step_size = 0.1;
    double y[1] = { 1e-06 };
    gsl_vector *time = gsl_vector_alloc ((t1 / step_size) + 1);
    gsl_vector *fun_val = gsl_vector_alloc ((t1 / step_size) + 1);

    for (i = 1; i <= t1/step_size; i++)
    {
        double ti = i * t1 / (t1 / step_size);
        int status = gsl_odeiv2_driver_apply (d, &t, ti, y);
        if (status != GSL_SUCCESS)
        {
            printf ("error, return value=%d\n", status);
            break;
        }
        printf ("%.5e %.5e\n", t, y[0]);
        gsl_vector_set (time, i, t);
        gsl_vector_set (fun_val, i, y[0]);
    }

    gsl_vector_add(time, fun_val);

    {
        FILE * f = fopen ("test.dat", "w");
        gsl_vector_fprintf (f, time, "%.5g");
        fclose (f);
    }

    gsl_odeiv2_driver_free (d);
    gsl_vector_free (time);
    gsl_vector_free (fun_val);
    return 0;
}
As mentioned here, I don't need a Jacobian for an explicit solver, which is why I passed a NULL pointer for the jac function.
When I run the above code, I get -nan values at all time points.
To cross-check, I wrote the program in Python with the same function and the same parameters, solved using scipy.integrate.odeint. It runs and gives a plausible answer. Following is my Python code:
import numpy as np
from scipy.integrate import odeint

def nb(y, t, *args):
    k = args[0]
    n = args[1]
    m = args[2]
    s = args[3]
    return m*k*s**n*(y/(k*s**n))**((m-1)/m)

t = np.linspace(0, 10, int(10/0.1))
y0 = 1e-06
k = 1e-07
n = 1.5
m = 0.3
s = 250
res = odeint(nb, y0, t, args=(k, n, m, s)).flatten()
print(res)
Could anyone please help me figure out what I am doing wrong in the C++ code using GSL for solving the ODE?
Your problem is here:
f[0] = m*k*pow(s,n)*pow((y[0]/(k*pow(s,n))),(m-1)/m);
As the solver proceeds, it may want to sample negative values of y[0]. In Python this poses no problem; in C++ it produces NaNs.
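This is easy to reproduce in isolation; it is standard pow semantics, nothing GSL-specific:

#include <cmath>
#include <cstdio>

int main() {
    double m = 0.3;
    // A finite negative base raised to a non-integer exponent is a domain
    // error for pow in C/C++, so the result is NaN.
    printf("%f\n", std::pow(-1e-6, (m - 1) / m)); // prints nan (or -nan)
    return 0;
}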
To handle this, you can mimic Python's behavior:
auto sign = (y[0] < 0) ? -1.0 : 1.0;
f[0] = sign*m*k*pow(s,n)*pow((std::abs(y[0])/(k*pow(s,n))),(m-1)/m);
or even set sign effectively to 1:
f[0] = m*k*pow(s,n)*pow((std::abs(y[0])/(k*pow(s,n))),(m-1)/m);
After all, raising a negative value to a non-integer power is an error unless one considers complex numbers, which is not the case here. Note that in both variants y[0] is secured with std::abs.
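For completeness, here is the corrected func in one piece (a sketch; only the f[0] computation changes relative to the question's code, and <cmath> is assumed for std::abs):

int func (double t, const double y[], double f[], void *params)
{
    (void)(t); /* avoid unused parameter warning */
    struct param_type *p = (param_type *)params;
    double k = p->k, n = p->n, m = p->m, s = p->s;
    /* Take the fractional power of |y[0]| and reattach the sign,
       mimicking what the Python version effectively does. */
    double sign = (y[0] < 0) ? -1.0 : 1.0;
    f[0] = sign * m * k * pow(s, n) * pow(std::abs(y[0]) / (k * pow(s, n)), (m - 1) / m);
    return GSL_SUCCESS;
}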

Matrix multiplication in SYCL using nd_range

I am doing matrix multiplication in SYCL, but I am having a problem. I am using two 4x4 matrices. On the first iteration of the for loop it works, but on the second iteration, when i = 1, it works fine up to C[11] = A[11]*B[15]; then it skips one multiplication and moves on. I know why it skips, but unfortunately I have been unable to fix the index into matrix B properly. I would greatly appreciate any help. Thanks.
Here is the code
matSize = 4 and blockSize = 4. Also, I know the for loop bound should equal matSize; it is 2 here just to give a clear idea of the execution flow.
{
    range<1> dimensions(matSize * matSize);
    const property_list props = { property::buffer::use_host_ptr() };
    buffer<T> A_buf(MA, dimensions, props);
    buffer<T> B_buf(MB, dimensions, props);
    buffer<T> C_buf(MC, dimensions, props);

    myQueue.submit([&](handler& cgh) {
        auto A_ptr = A_buf.template get_access<access::mode::read>(cgh);
        auto B_ptr = B_buf.template get_access<access::mode::read_write>(cgh);
        auto C_ptr = C_buf.template get_access<access::mode::write>(cgh);

        auto localRange = range<1>(blockSize * blockSize);
        accessor<T, 1, access::mode::read_write, access::target::local>
            C(matSize * matSize, cgh);

        cgh.parallel_for<mxm_kernel>(
            nd_range<2>(range<2>{matSize, matSize},
                        range<2>{blockSize, blockSize}),
            [=](nd_item<2> item) {
                const auto id_x = item.get_global_id(0);
                const auto id_y = item.get_global_id(1);
                const auto width = item.get_group_range(0) * item.get_local_range(0);
                const auto index = id_x * width + id_y;
                const auto index2 = id_y * width + id_x;
                for (int i = 0; i < 2; i++) {
                    C[index] += A_ptr[index] * B_ptr[index2 + i];
                }
                out << "C is!" << C[index] << sycl::endl;
                item.barrier(cl::sycl::access::fence_space::local_space);
                C_ptr[index] = C[index];
            });
    });
}
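For comparison, here is the conventional way to index the inner product in an nd_range matrix-multiply kernel (a sketch under the same assumptions as the code above: row-major, square matSize x matSize matrices; the loop runs over the full shared dimension, which avoids the skipped multiplication):

cgh.parallel_for<mxm_kernel>(
    nd_range<2>(range<2>{matSize, matSize}, range<2>{blockSize, blockSize}),
    [=](nd_item<2> item) {
        const auto row = item.get_global_id(0);
        const auto col = item.get_global_id(1);
        // C[row][col] is the dot product of row `row` of A with column `col` of B.
        T sum = 0;
        for (size_t k = 0; k < matSize; ++k)
            sum += A_ptr[row * matSize + k] * B_ptr[k * matSize + col];
        C_ptr[row * matSize + col] = sum;
    });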

Keras custom layer with CNTK backend (CRF as RNN)

I am attempting to duplicate the CRF as RNN which has been implemented in Keras but uses TensorFlow as a backend (https://github.com/sadeepj/crfasrnn_keras). The Keras front-end is fine, but some of the backend code is written as a TensorFlow custom op. I am trying to duplicate this in CNTK, but I have a few questions.
import cntk as C
from cntk import ops
import copy
import numpy as np

C.device.try_set_default_device(C.device.cpu())

ops.register_native_user_function('HighDimFilterOp', 'Cntk.HighDimFilter-' + C.__version__.rstrip('+'), 'CreateHighDimFilter')

def high_dim_filter(image=None, rgb=None, **kwargs):
    inputs = [list(image), list(rgb)]
    layer_config = copy.deepcopy(kwargs)
    ops.native_user_function('HighDimFilterOp', inputs, layer_config, 'high_dim_filter')
This code is the Python call to my user C++ function. The C++ interface is as follows:
#include "HighDimFilter.h"
using namespace CNTK;
extern "C"
#ifdef _WIN32
__declspec(dllexport)
#endif
Function* CreateHighDimFilter(const Variable* operands, size_t /*numOperands*/, const Dictionary* attributes, const wchar_t* name)
{
printf("Creating HighDimFilter\n");
return new HighDimFilter({operands[0], operands[1]}, *attributes, name);
}
and the custom function itself is defined as:
#pragma once
#include "CNTKLibrary.h"
#include "modified_permutohedral.h"

using namespace CNTK;

class HighDimFilter final : public Function
{
    bool _bilateral;
    float _theta_alpha;
    float _theta_beta;
    float _theta_gamma;

    enum Input : uint32_t
    {
        SCORES,
        IM_INFO
    };

public:
    HighDimFilter(const std::vector<Variable>& inputs, const Dictionary& attributes, const std::wstring& name = L"HighDimFilter")
        : Function(inputs, attributes, name)
    {
        if (attributes.Contains(L"bilateral"))
            _bilateral = attributes[L"bilateral"].Value<bool>();
        if (_bilateral == false)
        {
            if (attributes.Contains(L"theta_gamma"))
                _theta_gamma = static_cast<float>(attributes[L"theta_gamma"].Value<double>());
        }
        else
        {
            if (attributes.Contains(L"theta_alpha"))
                _theta_alpha = static_cast<float>(attributes[L"theta_alpha"].Value<double>());
            if (attributes.Contains(L"theta_beta"))
                _theta_beta = static_cast<float>(attributes[L"theta_beta"].Value<double>());
        }
    }

private:
    void _compute_spatial_kernel(NDArrayViewPtr& Tensor, const float theta_gamma)
    {
        auto output_kernel = Tensor->WritableDataBuffer<float>();
        auto outputShape = Tensor->Shape();
        //auto channels = outputShape[0];
        auto height = outputShape[1];
        auto width = outputShape[2];
        const auto num_pixels = width * height;
        for (int p = 0; p < num_pixels; ++p)
        {
            output_kernel[2 * p] = static_cast<float>(p % width) / theta_gamma;
            output_kernel[2 * p + 1] = static_cast<float>(p / width) / theta_gamma;
        }
    }

    void _compute_bilateral_kernel(NDArrayViewPtr& Tensor, const NDArrayViewPtr& Image,
                                   const float theta_alpha, const float theta_beta)
    {
        auto output_kernel = Tensor->WritableDataBuffer<float>();
        auto rgb = Image->DataBuffer<float>();
        auto outputShape = Tensor->Shape();
        //auto channels = outputShape[0];
        auto height = outputShape[1];
        auto width = outputShape[2];
        const auto num_pixels = height * width;
        for (int p = 0; p < num_pixels; ++p)
        {
            // Spatial terms
            output_kernel[5 * p] = static_cast<float>(p % width) / theta_alpha;
            output_kernel[5 * p + 1] = static_cast<float>(p / width) / theta_alpha;
            // Color terms
            output_kernel[5 * p + 2] = static_cast<float>(rgb[p] / theta_beta);
            output_kernel[5 * p + 3] = static_cast<float>(rgb[num_pixels + p] / theta_beta);
            output_kernel[5 * p + 4] = static_cast<float>(rgb[2 * num_pixels + p] / theta_beta);
        }
    }

    BackPropStatePtr Forward(const std::vector<ValuePtr>& inputValues,
                             std::unordered_map<Variable, ValuePtr>& outputs,
                             const DeviceDescriptor& computeDevice,
                             const std::unordered_set<Variable>& /*outputsToRetainBackwardStateFor */) override
    {
#if 0
        auto scoresShape = inputValues[Input::SCORES]->Shape();
        auto channels = scoresShape[0];
        auto height = scoresShape[1];
        auto width = scoresShape[2];
        const auto num_pixels = width * height;
        auto &outputValue = outputs[this->Output()];
        if (outputValue == nullptr)
        {
            outputValue = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, scoresShape, computeDevice));
        }
        if (computeDevice.Type() != DeviceKind::CPU)
            throw std::runtime_error("HighDimFilter: only CPU evaluation is supported at the moment.");
        ModifiedPermutohedral mp;
        if (_bilateral)
        {
            auto &kernel_vals = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, NDShape({5, height, width}), computeDevice));
            //float* kernel_vals = new float[5 * num_pixels];
            _compute_bilateral_kernel(kernel_vals->Data(), inputValues[Input::IM_INFO]->Data(),
                                      _theta_alpha, _theta_beta);
            mp.init(kernel_vals->Data(), 5, num_pixels);
            mp.compute(outputValue->Data(), inputValues[Input::SCORES]->Data(), false);
        }
        else
        {
            auto &kernel_vals = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, NDShape({2, height, width}), computeDevice));
            _compute_spatial_kernel(kernel_vals->Data(), _theta_gamma);
            mp.init(kernel_vals->Data(), 2, num_pixels);
            mp.compute(outputValue->Data(), inputValues[Input::SCORES]->Data(), channels, false);
        }
        return MakeSharedObject<BackPropState>(this->shared_from_this(), computeDevice, std::unordered_map<Variable, ValuePtr>({ {Inputs()[Input::IM_INFO], inputValues[Input::IM_INFO]} }));
#else
        return nullptr;
#endif
    }

    void Backward(const BackPropStatePtr& state,
                  const std::unordered_map<Variable, ValuePtr>& rootGradientValues,
                  std::unordered_map<Variable, ValuePtr>& backPropagatedGradientValuesForInputs) override
    {
#if 0
        auto gradOutputVariable = Inputs()[Input::SCORES];
        auto inputVariable = Inputs()[Input::IM_INFO];
        auto &gradValue = backPropagatedGradientValuesForInputs[gradOutputVariable];
        if (gradValue == nullptr)
            gradValue = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, gradOutputVariable.Shape(), state->Device()));
        auto imageData = state->SavedForwardPropValues().at(inputVariable)->Data();
        auto imageShape = imageData->Shape();
        auto channels = imageShape[0];
        auto height = imageShape[1];
        auto width = imageShape[2];
        const auto num_pixels = width * height;
        if (state->Device().Type() != DeviceKind::CPU)
            throw std::runtime_error("HighDimFilter: only CPU evaluation is supported at the moment.");
        auto rootGradientData = rootGradientValues.at(this->Output())->Data();
        ModifiedPermutohedral mp;
        if (_bilateral)
        {
            auto &kernel_vals = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, NDShape({5, height, width}), state->Device()));
            //float* kernel_vals = new float[5 * num_pixels];
            _compute_bilateral_kernel(kernel_vals->Data(), imageData,
                                      _theta_alpha, _theta_beta);
            mp.init(kernel_vals->Data(), 5, num_pixels);
            mp.compute(gradValue->Data(), rootGradientData, true);
        }
        else
        {
            auto &kernel_vals = MakeSharedObject<Value>(MakeSharedObject<NDArrayView>(DataType::Float, NDShape({2, height, width}), state->Device()));
            _compute_spatial_kernel(kernel_vals->Data(), _theta_gamma);
            mp.init(kernel_vals->Data(), 2, num_pixels);
            mp.compute(gradValue->Data(), rootGradientData, channels, true);
        }
#endif
        return;
    }

    const std::wstring& OpName() const override
    {
        static const std::wstring opName = L"HighDimFilterOp";
        return opName;
    }

    size_t CurrentVersion() const override
    {
        NOT_IMPLEMENTED;
    }

    void InferOutputs(std::vector<Variable>& /*outputs */) override
    {
        NOT_IMPLEMENTED;
    }

    FunctionPtr Clone(const std::vector<Variable>& /*clonedInputs */) override
    {
        return nullptr;
    }
};
My python call looks like:
bilateral_high_dim_filter = custom_module.high_dim_filter(image=all_ones_flat,
                                                          rgb=rgb_flat,
                                                          bilateral=True,
                                                          theta_alpha=self.theta_alpha,
                                                          theta_beta=self.theta_beta)
high_dim_filter = custom_module.high_dim_filter(image=all_ones_flat,
                                                rgb=rgb_flat,
                                                bilateral=False,
                                                theta_gamma=self.theta_gamma)
The questions are as follows:
1) What are the "operands" passed to native_user_function on initialization? Are they only passed on initialization (are they intended to be weight and bias initialization)? How are the input operands used in the "Function" constructor? If I set them to "None" in Python, the code crashes.
2) How do you forward propagate the filter? Just call "forward()"? What are the required arguments to forward propagate?
3) Is there a numerical gradient calculation in CNTK, similar to TensorFlow's, to check the gradient?

CNTK Evaluate model has two inputs (C++)

I have a project based on CNTK 2.3. I used the code from the integration tests to train an MNIST classifier, like this:
auto device = DeviceDescriptor::GPUDevice(0);

const size_t inputDim = sizeBlob * sizeBlob;
const size_t numOutputClasses = numberOfClasses;
const size_t hiddenLayerDim = 200;

auto input = InputVariable({ inputDim }, CNTK::DataType::Float, L"features");
auto scaledInput = ElementTimes(Constant::Scalar(0.00390625f, device), input);
auto classifierOutput = FullyConnectedDNNLayer(scaledInput, hiddenLayerDim, device, std::bind(Sigmoid, _1, L""));
auto outputTimesParam = Parameter(NDArrayView::RandomUniform<float>({ numOutputClasses, hiddenLayerDim }, -0.05, 0.05, 1, device));
auto outputBiasParam = Parameter(NDArrayView::RandomUniform<float>({ numOutputClasses }, -0.05, 0.05, 1, device));
classifierOutput = Plus(outputBiasParam, Times(outputTimesParam, classifierOutput), L"classifierOutput");

auto labels = InputVariable({ numOutputClasses }, CNTK::DataType::Float, L"labels");
auto trainingLoss = CNTK::CrossEntropyWithSoftmax(classifierOutput, labels, L"lossFunction");
auto prediction = CNTK::ClassificationError(classifierOutput, labels, L"classificationError");

// Test save and reload of model
Variable classifierOutputVar = classifierOutput;
Variable trainingLossVar = trainingLoss;
Variable predictionVar = prediction;
auto combinedNet = Combine({ trainingLoss, prediction, classifierOutput }, L"MNISTClassifier");
//SaveAndReloadModel<float>(combinedNet, { &input, &labels, &trainingLossVar, &predictionVar, &classifierOutputVar }, device);

classifierOutput = classifierOutputVar;
trainingLoss = trainingLossVar;
prediction = predictionVar;

const size_t minibatchSize = 64;
const size_t numSamplesPerSweep = 60000;
const size_t numSweepsToTrainWith = 2;
const size_t numMinibatchesToTrain = (numSamplesPerSweep * numSweepsToTrainWith) / minibatchSize;

auto featureStreamName = L"features";
auto labelsStreamName = L"labels";
auto minibatchSource = TextFormatMinibatchSource(trainingSet, { { featureStreamName, inputDim }, { labelsStreamName, numOutputClasses } });

auto featureStreamInfo = minibatchSource->StreamInfo(featureStreamName);
auto labelStreamInfo = minibatchSource->StreamInfo(labelsStreamName);

LearningRateSchedule learningRatePerSample = TrainingParameterPerSampleSchedule<double>(0.003125);
auto trainer = CreateTrainer(classifierOutput, trainingLoss, prediction, { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) });

size_t outputFrequencyInMinibatches = 20;
for (size_t i = 0; i < numMinibatchesToTrain; ++i)
{
    auto minibatchData = minibatchSource->GetNextMinibatch(minibatchSize, device);
    trainer->TrainMinibatch({ { input, minibatchData[featureStreamInfo] }, { labels, minibatchData[labelStreamInfo] } }, device);
    PrintTrainingProgress(trainer, i, outputFrequencyInMinibatches);

    size_t trainingCheckpointFrequency = 100;
    if ((i % trainingCheckpointFrequency) == (trainingCheckpointFrequency - 1))
    {
        const wchar_t* ckpName = L"feedForward.net";
        //trainer->SaveCheckpoint(ckpName);
        //trainer->RestoreFromCheckpoint(ckpName);
    }
}

combinedNet->Save(g_dnnFile);
That part works fine: I train the model and then save it to a model file. But when I try to evaluate a simple image to test the model, it looks like something is wrong with the model.
// Load the model.
// The model is trained by <CNTK>/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10.py
// Please see README.md in <CNTK>/Examples/Image/Classification/ResNet about how to train the model.
FunctionPtr modelFunc = Function::Load(modelFile, device);

// Get input variable. The model has only one single input.
std::vector<Variable> inputs = modelFunc->Arguments();
Variable inputVar = modelFunc->Arguments()[0];

// The model has only one output.
// If the model has more than one output, use modelFunc->Outputs to get the list of output variables.
std::vector<Variable> outputs = modelFunc->Outputs();
Variable outputVar = outputs[0];

// Prepare input data.
// For evaluating an image, you first need to perform some image preprocessing to make sure that the input image has the correct size and layout
// that match the model inputs.
// Please note that the model used by this example expects the CHW image layout.
// inputVar.Shape[0] is image width, inputVar.Shape[1] is image height, and inputVar.Shape[2] is channels.
// For simplicity and avoiding external dependencies, we skip the preprocessing step here, and just use some artificially created data as input.
Mat image = imread(".....");
uint8_t* imagePtr = (uint8_t*)(image).data;
auto width = image.cols;
auto height = image.rows;
std::vector<float> inputData(inputVar.Shape().TotalSize());
for (size_t i = 0; i < inputData.size(); ++i)
{
    auto curChVal = imagePtr[i];
    inputData[i] = curChVal;
}

// Create input value and input data map
ValuePtr inputVal = Value::CreateBatch(inputVar.Shape(), inputData, device);
std::unordered_map<Variable, ValuePtr> inputDataMap = { { inputVar, inputVal } };

// Create output data map. Using null as Value to indicate using system allocated memory.
// Alternatively, create a Value object and add it to the data map.
std::unordered_map<Variable, ValuePtr> outputDataMap = { { outputVar, nullptr } };

// Start evaluation on the device
modelFunc->Evaluate(inputDataMap, outputDataMap, device);

// Get evaluate result as dense output
ValuePtr outputVal = outputDataMap[outputVar];
std::vector<std::vector<float>> outputData;
outputVal->CopyVariableValueTo(outputVar, outputData);
PrintOutput<float>(outputVar.Shape().TotalSize(), outputData);
I ran the same code in C# and it works fine. The difference I found is that modelFunc->Arguments() should have one argument but has two: it finds features and labels as two inputs, whereas I need only features as an input, and evaluation throws an error.

Find the input and output variables by name instead of using modelFunc->Arguments()[0]:
Variable inputVar;
GetInputVariableByName(modelFunc, L"features", inputVar);
Variable outputVar;
GetOutputVaraiableByName(modelFunc, L"classifierOutput", outputVar);
GetInputVariableByName and GetOutputVaraiableByName() come from
https://github.com/Microsoft/CNTK/blob/v2.3.1/Tests/EndToEndTests/EvalClientTests/CNTKLibraryCPPEvalExamplesTest/EvalMultithreads.cpp#L316
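The two arguments are expected, by the way: the saved function is Combine({ trainingLoss, prediction, classifierOutput }), and the loss and error branches take labels in addition to features. Selecting the argument by name can also be done inline; a minimal sketch using only Variable::Name() and the calls already shown above:

// Pick the "features" argument by name instead of by position.
Variable inputVar;
for (const auto& arg : modelFunc->Arguments()) {
    if (arg.Name() == L"features") {
        inputVar = arg;
        break;
    }
}

Alternatively, saving classifierOutput alone instead of combinedNet produces a model whose only argument is features.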