Does the TensorFlow C++ API support automatic differentiation to back-propagate gradients?
If I write a graph in C++ and want to run it from C++ code (not from Python!), will automatic differentiation work?
Let's suppose every op in the graph has a gradient implementation.
I think the documentation on what the TensorFlow C++ API can and cannot do is very poor.
Thank you very much for the help
Technically it can, but AFAIK the automatic differentiation is only "configured" in Python. What I mean by this is that, at a lower level, each TensorFlow operation does not declare itself what its gradient is (that is, the corresponding operation that computes its gradient). That is instead declared at the Python level. For example, you can take a look at math_ops.py. You will see that, among other things, there are several functions decorated with @ops.RegisterGradient(...). What this decorator does is add that function to a global registry (in Python) of operations and their gradients. So, for example, optimizer classes are largely implemented in Python, since they make use of this registry to build the backpropagation computation (as opposed to making use of native TensorFlow primitives to that end, which do not exist).
So the point is that you can do the same computations using the same ops (which are then implemented with the same kernels), but I don't think that C++ has (or will ever have) such gradient registry (and optimizer classes), so you would need to work out or copy that backpropagation construction by yourself. In general, the C++ API is not well suited to building the computation graph.
Now a different question (and maybe this was what you were asking about in the first place) is whether you can run an already existing graph that does backpropagation in C++. By this I mean building a computation graph in Python, creating an optimizer (which in turn creates the necessary operations in the graph to compute the gradient and update the variables) and exporting the graph, then load that graph in C++ and run it. That is entirely possible and no different to running any other kind of thing in TensorFlow C++.
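For illustration, here is a minimal sketch of that last approach with the TensorFlow C++ API. The file name trained_graph.pb and the node names input, target and train_op are placeholders for whatever your Python export actually defines:

    #include <memory>
    #include <vector>
    #include "tensorflow/core/public/session.h"
    #include "tensorflow/core/platform/env.h"

    int main() {
        // Load the GraphDef that was built and serialized from Python.
        // "trained_graph.pb" is a placeholder for your exported file.
        tensorflow::GraphDef graph_def;
        TF_CHECK_OK(tensorflow::ReadBinaryProto(
            tensorflow::Env::Default(), "trained_graph.pb", &graph_def));

        std::unique_ptr<tensorflow::Session> session(
            tensorflow::NewSession(tensorflow::SessionOptions()));
        TF_CHECK_OK(session->Create(graph_def));

        // Feed a batch and run the optimizer's update node. The gradient ops
        // already exist in the graph because the optimizer created them in
        // Python before export (remember to also run your init op once).
        tensorflow::Tensor x(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 4}));
        tensorflow::Tensor y(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 1}));
        std::vector<tensorflow::Tensor> outputs;
        TF_CHECK_OK(session->Run({{"input", x}, {"target", y}},
                                 {}, {"train_op"}, &outputs));
        return 0;
    }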
When I was using Matlab, I used the method filloutliers. I was wondering if there is something similar to that in C++.
In other words, I want to know if there is any sort of built-in method in some library that detects outliers in a data set and replaces them.
No, there's no built-in standard library facility which does that. Numerical analysis is not a focus or a strong point of C++, though of course there are numerical analysis libraries out there (a Google search will turn them up). Note that Matlab's method is a very particular one: there's no precise and universal definition of an "outlier" (some would say there's no such thing as an outlier). So expect to have to come up with your own opinion of how to classify a point as an outlier.
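As a starting point, here is a minimal sketch of one common convention: flag points more than three scaled MADs from the median (similar in spirit to filloutliers' default) and replace them with the median. It is one possible definition among many, not the definition:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Median of a copy of the data (leaves the input untouched).
    // Assumes the data is non-empty.
    double median(std::vector<double> v) {
        std::sort(v.begin(), v.end());
        const std::size_t n = v.size();
        return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
    }

    // Replace points more than 3 scaled MADs from the median with the median.
    void replace_outliers(std::vector<double>& data) {
        const double med = median(data);
        std::vector<double> dev;
        dev.reserve(data.size());
        for (double x : data) dev.push_back(std::fabs(x - med));
        // 1.4826 makes the MAD a consistent estimator of sigma for normal data.
        const double scaled_mad = 1.4826 * median(dev);
        for (double& x : data)
            if (std::fabs(x - med) > 3.0 * scaled_mad) x = med;
    }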
I need to implement a custom Mask-RCNN in C++ to perform instance segmentation on a custom dataset. Since I'm a beginner, I just know the theory, but I really don't know how to apply it.
Could you give me some guidelines to start my project? Thank you.
For a beginner, doing machine learning in C++ will be a very high bar.
Pretty much all the packages out there use Python for the API. TensorFlow allows running the session API in C++, but you need to build the graph in Python. And dealing with the build of TensorFlow will be a pain.
Get Mask-RCNN from its GitHub, run it in Python, understand it. Check that the license fits your needs. Then, assuming your project is in C++, brush up on bindings between C++ and Python. Have your C++ make calls to a thin Python layer that imports Mask-RCNN (see the sketch below).
Any other approach will offer significant hurdles to a beginner.
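To make the binding idea concrete, here is a minimal sketch using pybind11's embedding API. The module maskrcnn_wrapper and its function segment are hypothetical names for the small Python file you would write around the Mask-RCNN repo:

    #include <pybind11/embed.h>
    namespace py = pybind11;

    int main() {
        py::scoped_interpreter guard{};  // start an embedded Python interpreter

        // Your own thin Python layer (hypothetical) that imports Mask-RCNN.
        py::module_ wrapper = py::module_::import("maskrcnn_wrapper");

        // Hand an image path to Python; get the segmentation result back.
        py::object result = wrapper.attr("segment")("image.jpg");
        py::print(result);
        return 0;
    }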
C++ is great for making ML applications.
Some concepts you'll need to learn are
Matrix Layouts (Row Major, Column Major)
Vectors
Matrix Vector Multiplication
Matrix Matrix Multiplication
The most important thing is cache locality. Decreasing cache misses, ESPECIALLY in matrix multiplication (gemm and gemv), will be the determining factor in your network's speed. Using a naive matrix multiplication (n^3 loads) that is cache friendly is going to provide you with the best results (see the sketch below).
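A minimal sketch of what "cache friendly" means here: reorder the classic triple loop to i-k-j so that, with row-major storage, the inner loop walks B and C contiguously instead of striding down a column:

    #include <vector>

    // Multiply two n x n row-major matrices: C += A * B.
    // The i-k-j loop order keeps the inner loop reading B and writing C
    // contiguously, which is what makes the naive algorithm cache friendly.
    void gemm(const std::vector<float>& A, const std::vector<float>& B,
              std::vector<float>& C, int n) {
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k) {
                const float a = A[i * n + k];  // reused across the inner loop
                for (int j = 0; j < n; ++j)
                    C[i * n + j] += a * B[k * n + j];
            }
    }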
I am thinking about using Intel Embree in my renderer and am currently playing around with the Embree tutorials.
So, the question is: is it possible to use Intel Embree efficiently via its API?
I mean, I can see that the functions from <embree2/rtcore.h>, <embree2/rtcore_ray.h>, etc. use some internal data structures like RTCRay.
And obviously, since I can't overload any operators I always have to cast my data structures to Embree data structures and vice versa.
And that's not even just a type cast, that's a construction of a new object.
For example, before calling rtcIntersect(scene, ray) I construct an RTCRay from my Ray class object, and then when the function returns some data, I copy some values back.
It doesn't look like a good way to use Intel Embree.
I used the API directly from the pre-built binaries. The downside is that there are no vector or matrix features to work with, but on the upside you can use any other library you want; it's not decided for you. I used the open source single-header-file linalg.h to keep things simple.
Here is my EmbreeTest project that has a single Main.cpp file which gets you started. Just use the Embree installer and that's all you need.
As for efficiency, if you start with this project you should be able to identify if there are any performance bottlenecks, as it does almost nothing but call Embree. The ray cast method just copies the ray org and dir that I've calculated into the RTCRay structure on the stack. I don't think this will be much of an overhead. Restructuring to cast multiple rays at once will make more of a difference to performance than the copy for sure.
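For reference, here is roughly what that copy looks like; a minimal sketch assuming a hypothetical user-side Ray type, with field names taken from the Embree 2 headers mentioned in the question:

    #include <embree2/rtcore.h>
    #include <embree2/rtcore_ray.h>

    struct Ray {                        // hypothetical user-side ray type
        float org[3], dir[3];
        float tMin, tMax;
        unsigned geomID, primID;
        float u, v;
    };

    bool intersect(RTCScene scene, Ray& r) {
        RTCRay er;                      // stack-allocated, as described above
        for (int i = 0; i < 3; ++i) { er.org[i] = r.org[i]; er.dir[i] = r.dir[i]; }
        er.tnear  = r.tMin;
        er.tfar   = r.tMax;
        er.geomID = RTC_INVALID_GEOMETRY_ID;
        er.primID = RTC_INVALID_GEOMETRY_ID;
        er.mask   = -1;
        er.time   = 0.0f;

        rtcIntersect(scene, er);        // Embree 2 single-ray query

        if (er.geomID == RTC_INVALID_GEOMETRY_ID)
            return false;               // no hit
        r.tMax   = er.tfar;             // copy the hit data back
        r.geomID = er.geomID;
        r.primID = er.primID;
        r.u = er.u;
        r.v = er.v;
        return true;
    }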
Constructing an RTCRay, calling rtcIntersect, and then copying the data back is fine. The overhead is negligible (<0.5%) compared to ray traversal and primitive intersection.
I think Corona Renderer uses RTCRay internally, so they save that 0.5% overhead.
I know that V-Ray does exactly that: construct an RTCRay, call rtcIntersect, then copy the data back.
General advice: avoid premature optimization. Implement working code and then use a profiler to guide your optimizations.
We're working on a machine learning project in which we'd like to see the influence of certain online sample embedding methods on SVMs.
In the process we've tried interfacing with Pegasos and dlib as well as designing (and attempting to write) our own SVM implementation.
dlib seems promising as it allows interfacing with user written kernels.
Yet kernels don't give us the desired "online" behavior (unless that assumption is wrong).
Therefore, if you know about an SVM library which supports online embedding and custom written embedders, it would be of great help.
Just to be clear about "online".
It is crucial that the embedding process will happen online in order to avoid heavy memory usage.
We basically want to do the following within stochastic subgradient descent (in very general pseudocode):

    w = 0 vector
    for t = 1:T
        i = random integer from [1, n]
        embed(sample_xi)
        // sample_xi is passed to the subgradient of loss_i as a parameter
        w = w - (alpha/t) * sub_gradient(loss_i)
    end
I think in your case you might want to consider Budgeted Stochastic Gradient Descent for Large-Scale SVM Training (BSGD) [1] by Wang, Crammer, and Vucetic.
This is because of what the paper calls the "curse of kernelization": in online kernel learning the number of support vectors can grow without bound, so you might want to explore this budgeted option instead of the plain update in your pseudocode.
The Shark Machine Learning Library implements BSGD. Check a quick tutorial here
Maybe you want to use something like dlib's empirical kernel map. You can read its documentation, and particularly the example program, for the gory details of what it does, but basically it lets you project a sample into the span of some basis set in a kernel feature space. There are even algorithms in dlib that iteratively build the basis set, which is maybe what you are asking about.
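A minimal sketch of that workflow, loosely following dlib's empirical_kernel_map example program (the toy basis and sample values are made up):

    #include <dlib/svm.h>
    #include <vector>

    int main() {
        using sample_type = dlib::matrix<double, 0, 1>;
        using kernel_type = dlib::radial_basis_kernel<sample_type>;

        // Toy basis set; in practice you would pick representative samples
        // (dlib's linearly_independent_subset_finder can build one iteratively).
        std::vector<sample_type> basis;
        for (int i = 0; i < 5; ++i) {
            sample_type s(2);
            s = i, i * i;
            basis.push_back(s);
        }

        dlib::empirical_kernel_map<kernel_type> ekm;
        ekm.load(kernel_type(0.1), basis);

        sample_type x(2);
        x = 1.5, 2.25;
        // project() is the "embed" step: x is mapped to coordinates in the
        // span of the basis in kernel feature space. Feed this vector to
        // your own linear SGD update (the w update in the question).
        sample_type embedded = ekm.project(x);
        return 0;
    }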
I'm doing some linear algebra math, and was looking for some really lightweight and simple to use matrix class that could handle different dimensions: 2x2, 2x1, 3x1 and 1x2 basically.
I presume such a class could be implemented with templates, using some specialization in some cases for performance.
Anybody know of any simple implementation available for use? I don't want "bloated" implementations, as I'll be running this in an embedded environment where memory is constrained.
Thanks
You could try Blitz++, or Boost's uBLAS.
I've recently looked at a variety of C++ matrix libraries, and my vote goes to Armadillo.
The library is heavily templated and header-only.
Armadillo also leverages templates to implement a delayed evaluation framework (resolved at compile time) to minimize temporaries in the generated code (resulting in reduced memory usage and increased performance).
However, these advanced features are only a burden to the compiler and not your implementation running in the embedded environment, because most Armadillo code 'evaporates' during compilation due to its design approach based on templates.
And despite all that, one of its main design goals has been ease of use - the API is deliberately similar in style to Matlab syntax (see the comparison table on the site).
Additionally, although Armadillo can work standalone, you might want to consider using it with the LAPACK (and BLAS) implementations available to you to improve performance. A good option would be, for instance, OpenBLAS (or ATLAS). Check Armadillo's FAQ; it covers some important topics.
A quick search on Google dug up this presentation showing that Armadillo has already been used in embedded systems.
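For a taste of the Matlab-like API, here is a minimal sketch using Armadillo's fixed-size types (mat22, vec2, and so on), which avoid heap allocation and match the small shapes in the question:

    #include <armadillo>

    int main() {
        arma::mat22 A;                 // fixed-size 2x2, no heap allocation
        A(0, 0) = 4.0; A(0, 1) = 1.0;
        A(1, 0) = 1.0; A(1, 1) = 3.0;

        arma::vec2 b;
        b(0) = 1.0; b(1) = 2.0;

        arma::vec2 x = arma::solve(A, b);  // solve A*x = b, like Matlab's A\b
        arma::mat22 At = A.t();            // transpose, like Matlab's A'
        x.print("x:");
        At.print("A':");
        return 0;
    }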
std::valarray is pretty lightweight.
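If you go that route, here is a tiny sketch of using std::valarray as a bare-bones matrix (row-major values addressed with std::slice), which costs nothing beyond the standard library:

    #include <cstddef>
    #include <valarray>

    int main() {
        const std::size_t cols = 3;
        std::valarray<double> m = {1, 2, 3,
                                   4, 5, 6};  // a 2x3 matrix, row-major

        // Element-wise arithmetic comes for free:
        std::valarray<double> doubled = 2.0 * m;

        // Row 1 as a slice: start at 1*cols, take cols elements, stride 1.
        std::valarray<double> row1 = m[std::slice(1 * cols, cols, 1)];
        return 0;
    }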
I use the Newmat library for matrix computations. It's open source and easy to use, although I'm not sure it fits your definition of lightweight (it includes over 50 source files, which Visual Studio compiles into a 1.8MB static library).
CML matrix is pretty good, but may not be lightweight enough for an embedded environment. Check it out anyway: http://cmldev.net/?p=418
Another option, although it may be too late, is:
https://launchpad.net/lwmatrix
I for one wasn't able to find a simple enough library, so I wrote one myself: http://koti.welho.com/aarpikar/lib/
I think it should be able to handle different matrix dimensions (2x2, 3x3, 3x1, etc.) by simply setting some rows or columns to zero. It won't be the fastest approach, since internally all operations are done with 4x4 matrices. Although in theory there might exist processors that can handle a 4x4 operation in one tick. At least I would much rather believe in the existence of such processors than go optimizing those low-level matrix calculations. :)
How about just storing the matrix in an array, like
2x3 matrix = {2,3,val1,val2,...,val6}
This is really simple, and addition operations are trivial. However, you need to write your own multiplication function.
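A minimal sketch of that idea, including the multiplication you would have to write yourself (the struct name and layout are just one way to do it):

    #include <cstddef>
    #include <vector>

    struct Matrix {
        std::size_t rows, cols;
        std::vector<double> v;        // row-major values, rows*cols long

        Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c) {}
        double& at(std::size_t r, std::size_t c) { return v[r * cols + c]; }
        double  at(std::size_t r, std::size_t c) const { return v[r * cols + c]; }
    };

    // Addition is trivial; multiplication is the part you write yourself.
    // Requires a.cols == b.rows; the result is a.rows x b.cols.
    Matrix multiply(const Matrix& a, const Matrix& b) {
        Matrix c(a.rows, b.cols);     // zero-initialized by the vector
        for (std::size_t i = 0; i < a.rows; ++i)
            for (std::size_t k = 0; k < a.cols; ++k)
                for (std::size_t j = 0; j < b.cols; ++j)
                    c.at(i, j) += a.at(i, k) * b.at(k, j);
        return c;
    }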