Making an Eigen::Vector look like a vector of points - c++

I want to represent a 2D shape in such a way that it can be interacted with as if it were a vector of points. In particular, I want to be able to call operator[] and at() on it and get back references to things that act like 2D points. Currently I just use a class whose only member variable is a vector of points and that has various arithmetic and geometric operations defined pointwise on its elements.
However, in other parts of my code I need to treat a vector of n points as an element of 2n dimensional space and perform basic linear algebra on it (e.g. projecting the vector onto a given subspace of R^2n). Currently I'm creating an Eigen::VectorXd object every time I want to do this, and then converting back after performing these operations. I don't want to do this, as I make the conversion often enough that all the copying is a noticeable source of inefficiency.
If I was storing the data as a flat array of doubles/floats/ints, I could cast a pointer to its nth element to a pointer to a Point (whose members would just be a pair of doubles/floats/ints). However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible.
Is there a sensible way of solving this? I could just use Eigen::Vectors everywhere, but I really want most of the code to be able to pretend that it is dealing with a set of points.

However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible.
Eigen offers the Map classes that allow mapping plain arrays to Eigen structures. For example:
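// The scalar type of the map must match the array (double here, so Vector2d, not Vector2f).
double numbers[2] = {1.0, 2.0};
Eigen::Vector2d::Map( numbers ).dot( Eigen::Vector2d::Constant(1) );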
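To make the idea concrete, here is a minimal sketch of a shape that owns a flat buffer of 2n doubles and hands out point-like proxies. The names Shape and PointRef are hypothetical and not part of Eigen. The sketch itself uses only the standard library; assuming Eigen is available, the same buffer could then be viewed without copying through something like Eigen::Map<Eigen::VectorXd>(shape.data(), shape.dim()).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy: acts like a reference to one 2D point inside a flat buffer.
struct PointRef {
    double& x;
    double& y;
    // Example of a pointwise operation defined on the proxy.
    PointRef& operator+=(const PointRef& other) {
        x += other.x;
        y += other.y;
        return *this;
    }
};

// Hypothetical shape class: n points stored as 2n contiguous doubles
// in the order x0, y0, x1, y1, ...
class Shape {
public:
    explicit Shape(std::size_t n) : coords_(2 * n, 0.0) {}

    // Number of points.
    std::size_t size() const { return coords_.size() / 2; }

    // Point-like access; no data is copied.
    PointRef operator[](std::size_t i) {
        assert(i < size());
        return PointRef{coords_[2 * i], coords_[2 * i + 1]};
    }

    // Raw access to the 2n coordinates, e.g. for a linear algebra library to map.
    double* data() { return coords_.data(); }
    std::size_t dim() const { return coords_.size(); }

private:
    std::vector<double> coords_;
};
```

With this layout, most code can write shape[i].x or shape[i] += shape[j], while the linear algebra code treats the same memory as one vector of length 2n.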

Related

Traverse of multidimensional Array in any axis

I have a (kind of) performance problem in my code, that roots in the chosen architecture.
I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data.
Since the dimension is not known at compile-time, I can't use Boost's MultidimensionalArray (IIRC), but have to come up with my own solution.
Right now, I store each dimension on its own. I have a Tensor of dimension (let's say) 3 that holds a lot of tensors of dimension 2 (in a std::vector), each of which has a std::vector of tensors of dimension 1, each of which holds a std::vector of (numerical) data. I use an abstract base class for my tensor, so everything in there is a pointer to the abstract class, while being (secretly) multi- or one-dimensional.
I extract a single numerical data point by giving a std::list of indices to a tensor, which takes the first element, looks up the corresponding sub-tensor and passes the rest of the list to that tensor in a (kind of) recursive call.
I now have to do a multi-dimensional Fast Fourier Transform on that data. I use a thread pool and job objects that copy data from a Tensor along one dimension, do an FFT and write that data back.
I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:
My data structure is the cache-unfriendliest beast one can think of... Copying data along the first dimension (where the data sits in a single 1D tensor) is reasonably fast, but in the other directions I need to copy my data from all over the place.
Since there are no race conditions (I make sure every concurrent FFT works on distinct data points), I thought I would not use a mutex guard and let everybody copy at the same time. However, this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!" - "But it's my turn now!"...)
Guarding the copy process with a mutex does not increase speed. The FFT of a vector with 1024 elements is way faster than the copy process that gathers these elements, resulting in nearly all of my threads waiting while one is copying.
Long story short:
Is there any kind of multi-dimensional data structure that does not need the dimension to be set at compile-time and that allows me to traverse fast along all axes? I searched for a while now, but nothing came up besides Boost MultiArray. Vectorization also does not work, since the indices would grow too fast to hold in the usual int types.
I can't think of how to present code examples here, since most of that code is rather simple, but if needed, I can add some.
Eigen has multi-dimensional tensor support (nominally unsupported, but written by the DeepMind people, so "somewhat" supported?), and FFTW has 1d to 3d FFTs. Using external libraries with a set of 1D to 3D FFTs would outsource most of the hard work.
Edit: Actually, FFTW has support for threaded n-dimensional FFTs
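If you do end up rolling your own container, a common alternative to nested vectors is one flat buffer plus per-axis strides computed at runtime. The sketch below is only an illustration under that assumption; FlatTensor and its members are hypothetical names, not part of Eigen or FFTW. Offsets are std::size_t, which is 64 bits wide on typical 64-bit platforms, so large linear indices are less of a concern than with int.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical runtime-dimensional tensor: one flat row-major buffer plus strides.
class FlatTensor {
public:
    explicit FlatTensor(std::vector<std::size_t> shape)
        : shape_(std::move(shape)), strides_(shape_.size(), 1) {
        assert(!shape_.empty());
        // Row-major strides: the last axis is contiguous.
        for (std::size_t k = shape_.size() - 1; k > 0; --k)
            strides_[k - 1] = strides_[k] * shape_[k];
        data_.assign(strides_[0] * shape_[0], 0.0);
    }

    // Linear offset of a multi-index.
    std::size_t offset(const std::vector<std::size_t>& index) const {
        assert(index.size() == shape_.size());
        std::size_t off = 0;
        for (std::size_t k = 0; k < index.size(); ++k) {
            assert(index[k] < shape_[k]);
            off += index[k] * strides_[k];
        }
        return off;
    }

    double& at(const std::vector<std::size_t>& index) { return data_[offset(index)]; }

    // Copy the 1D line through `index` along `axis` (index[axis] is ignored).
    std::vector<double> line(std::size_t axis, std::vector<std::size_t> index) const {
        assert(axis < shape_.size());
        index[axis] = 0;
        const std::size_t base = offset(index);
        std::vector<double> out(shape_[axis]);
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = data_[base + i * strides_[axis]];
        return out;
    }

    std::size_t size() const { return data_.size(); }
    double* data() { return data_.data(); }

private:
    std::vector<std::size_t> shape_;
    std::vector<std::size_t> strides_;
    std::vector<double> data_;
};
```

Traversal along any axis is then a fixed-stride walk over a single allocation instead of a chain of pointer lookups. Lines along the last axis are contiguous; lines along other axes are strided, which is still far more predictable for the cache than scattered sub-tensors.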

How to use arrays in machine learning classes?

I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression
    LinearRegression:
        tol = <a supplied tolerance or defaulted to 1e-5>
        max_ite = <a supplied max iter or default to 1k>
    fit(X, y):
        // model learns weights specific to this data set
    _gradient(X, y):
        // compute the gradient
    score(X, y):
        // model uses weights learned from fit to compute accuracy of
        // y_predicted to actual y
My question is this: the fit, score and gradient methods don't need to copy the arrays (X and y) or store them anywhere, so I want to pass a reference or a pointer to those structures. My problem is that if a method accepts a pointer to a 2D array, I need to supply the second dimension's size ahead of time or use templates. If I use templates, I end up with something like this for every method that accepts a 2D array:
template<std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double &y){...}
It seems there is likely a better way. I want my regression class to work with any size of input. How is this done in industry? I know that in some situations the array is just flattened into row- or column-major format and only a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
1. Contemporary C++ discourages working directly with heap-allocated data that you need to manually allocate or deallocate. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
2. Once you use such a class, you can easily get the dimension at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
3. You probably really don't want to use templates for the dimensions.
a. If you do so, you will need to recompile each time you get a different input. Your code will be duplicated (by the compiler) once for each distinct set of dimensions you use. Lots of bad stuff, with little gain (in this case). There's no real drawback to getting the dimension at runtime from the class.
b. Templates (in your setting) are fitting for the type of the matrix (e.g., is it a matrix of doubles or floats), or possibly the number of dimensions (e.g., for specifying tensors).
4. Your regressor doesn't need to store the matrix and/or vector. Pass them by const reference. Your interface looks like that of sklearn. If you like, check the source code there. Calling fit just causes the class object to store the parameter corresponding to the prediction vector β. It doesn't copy or store the input matrix and/or vector.
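As a rough sketch of these points, the class below takes its inputs by const reference, reads the dimensions at runtime, and stores only the learned weights. It uses nested std::vector for brevity; the learning_rate parameter, the absence of an intercept term, and the use of mean squared error in score are assumptions made for this illustration, not something stated in the question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;  // X[i] is sample i

class LinearRegression {
public:
    explicit LinearRegression(double tol = 1e-5, std::size_t max_iter = 1000,
                              double learning_rate = 0.01)  // learning_rate: assumed extra knob
        : tol_(tol), max_iter_(max_iter), learning_rate_(learning_rate) {}

    // Plain gradient descent on the mean squared error; X and y are not stored.
    void fit(const Matrix& X, const Vector& y) {
        assert(!X.empty() && X.size() == y.size());
        weights_.assign(X[0].size(), 0.0);
        for (std::size_t it = 0; it < max_iter_; ++it) {
            const Vector g = gradient(X, y);
            double norm2 = 0.0;
            for (std::size_t j = 0; j < g.size(); ++j) {
                weights_[j] -= learning_rate_ * g[j];
                norm2 += g[j] * g[j];
            }
            if (std::sqrt(norm2) < tol_) break;  // gradient small enough: stop
        }
    }

    double predict(const Vector& x) const {
        double s = 0.0;
        for (std::size_t j = 0; j < weights_.size(); ++j) s += weights_[j] * x[j];
        return s;
    }

    // Mean squared error of the predictions (lower is better).
    double score(const Matrix& X, const Vector& y) const {
        double sum = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            sum += r * r;
        }
        return sum / X.size();
    }

    const Vector& weights() const { return weights_; }

private:
    Vector gradient(const Matrix& X, const Vector& y) const {
        Vector g(weights_.size(), 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            for (std::size_t j = 0; j < g.size(); ++j) g[j] += 2.0 * r * X[i][j] / X.size();
        }
        return g;
    }

    double tol_;
    std::size_t max_iter_;
    double learning_rate_;
    Vector weights_;
};
```

The same interface works unchanged for any number of rows and columns, because the sizes come from the containers rather than from template parameters. Swapping the nested vectors for a proper matrix class would only change the two type aliases and the indexing.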

standard and efficient map between objects

I am working on a clustering problem where I have something called a distance matrix. The setup is as follows:
The number of nodes (g) is N (dynamic).
The matrix is symmetric (dist[i,j]==dist[j,i]).
g1, g2, ... are objects (they contain strings, integers and maybe more).
I want to be able to reach any value in a simple way like dist[4][3], or in an even clearer way like dist(g1,g5) (here g1 and g5 may be some kind of pointer or reference).
Many std algorithms will be applied to this distance matrix, like min, max, accumulate, etc.
Preferably, but not mandatory, I would like not to use Boost or other 3rd-party libraries.
What is the best standard way to declare this matrix?
You can create a two-dimensional vector like so:
std::vector<std::vector<float> > table(N, std::vector<float>(N));
Don't forget to initialize it like this: the constructor allocates all N x N elements up front, so nothing has to be reallocated when you fill in values later, which also avoids fragmenting the memory through repeated growth.
You can access its members like so:
table[1][2] = 2.01;
This does not invoke copy constructors on every access, because the vector index operator returns a reference to the element,
so it is pretty efficient if N does not need to change.
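Since the matrix is symmetric, one possible variation is to store only the lower triangle in a single flat vector and map (i, j) and (j, i) to the same slot. The following is a sketch of that alternative, not the approach described above; DistanceMatrix and largest are hypothetical names, and translating objects such as g1 or g5 into indices is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical symmetric matrix: packed lower triangle (diagonal included).
class DistanceMatrix {
public:
    explicit DistanceMatrix(std::size_t n) : n_(n), d_(n * (n + 1) / 2, 0.0f) {}

    // dist(i, j) and dist(j, i) refer to the same stored element.
    float& operator()(std::size_t i, std::size_t j) {
        assert(i < n_ && j < n_);
        if (i < j) std::swap(i, j);
        return d_[i * (i + 1) / 2 + j];
    }

    // Iterators over the packed storage, so std algorithms can be applied.
    std::vector<float>::const_iterator begin() const { return d_.begin(); }
    std::vector<float>::const_iterator end() const { return d_.end(); }

    // Convenience wrapper around std::max_element.
    float largest() const {
        assert(!d_.empty());
        return *std::max_element(d_.begin(), d_.end());
    }

private:
    std::size_t n_;
    std::vector<float> d_;
};
```

Because the diagonal is stored too, a plain std::min_element over the whole range would normally return the zero self-distances, so a minimum over distinct pairs needs to skip those entries.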

Data structure for handling a list of 3 integers

I'm currently coding a physical simulation on a lattice. I'm interested in describing loops in this lattice: they are closed curves composed of the edges of the lattice cells. I'm storing the information on these lattice cells (by information I mean a Boolean variable saying whether the edge is usable or not for composing a loop) in a 3-dimensional Boolean array.
I'm now thinking about a good structure to handle these loops. They are basically a list of edges, so I would need something like an array of 3D integer vectors, each edge being defined by 3 coordinates in my current parameterization. I'm already thinking about building a class around this "list" object, as I'll need methods computing the loop diameter and probably more in the future.
But I'm not very aware of the choice of structures I have for that; my physics background hasn't taught me enough C++. So I'd like to hear your suggestions for shaping this piece of code. I would really enjoy discovering some new ways of coding this kind of thing.
You want two separate things. One is keeping track of all edges and allowing fast lookup of edge objects by an (int,int,int) index (you probably don't want int there but something like size_t or so). This is entirely independent of your second goal: creating ordered subsets of these.
General Collection (1)
Since your edge database is going to be sparse (i.e. only a few of the possible indices will actually identify a particular edge), my prior suggestion of using a 3D matrix is unsuitable. Instead, you probably want to look up edges with a hash map.
How easy this is depends on the expected size of the individual integers. If you can manage to have no more than 21 bits per integer (for instance if your integers are short int values, which typically have only 16 bits), then you can concatenate them into one 64-bit value, for which an std::hash implementation already exists. Otherwise, you will have to implement your own hash specialisation for, e.g., std::hash<std::array<uint32_t,3>> (which is also quite easy, and highly stackable).
Once you can hash your key, you can throw it into an std::unordered_map and be done with it. That thing is fast.
Loop detection (2)
Then you want to have short-lived data structures for identifying loops, so you want a data structure that extends on one end but never on the other. That means you're probably fine with an std::vector or possibly with an std::deque if you have very large instances (but try the vector first!).
I'd suggest simply keeping the index of an edge in the local vector. You can always look up the edge object in your unordered_map. Then the question is how to represent the index. If Int represents your integer type (e.g. int, size_t, short, ...), it's probably most consistent to use an std::array<Int,3>; if the types of the integers differ, you'll want an std::tuple<...>.
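A minimal sketch of both parts, assuming each coordinate fits in 21 bits: the three integers are packed into one 64-bit key for the std::unordered_map, and a loop is a std::vector of std::array indices. Edge, pack_key, EdgeMap, Loop and all_usable are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-edge payload.
struct Edge {
    bool usable = false;
};

// Pack three coordinates (each below 2^21) into a single 64-bit key.
inline std::uint64_t pack_key(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    constexpr std::uint32_t limit = 1u << 21;
    assert(x < limit && y < limit && z < limit);
    return (std::uint64_t{x} << 42) | (std::uint64_t{y} << 21) | std::uint64_t{z};
}

// Sparse edge database, keyed by the packed index.
using EdgeMap = std::unordered_map<std::uint64_t, Edge>;

// A loop is an ordered list of edge indices; the edges themselves stay in the map.
using Loop = std::vector<std::array<std::uint32_t, 3>>;

// True if every edge of the loop exists in the map and is marked usable.
inline bool all_usable(const EdgeMap& edges, const Loop& loop) {
    for (const auto& idx : loop) {
        const auto it = edges.find(pack_key(idx[0], idx[1], idx[2]));
        if (it == edges.end() || !it->second.usable) return false;
    }
    return true;
}
```

Keeping only indices in the loop keeps it cheap to copy and extend, and methods such as a loop diameter can be added to a class wrapping Loop without touching the edge database.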

Efficient multidimensional data storage in C++

I'm trying to write a C++ program that needs to store and adjust data in a 3D array. The size is given by the user and doesn't change throughout the run, and I don't need to perform any complicated matrix operations on it. I just need it to be optimized to set and get from given 3D coordinates (I do quite some iterations over all the members, and it's a big array). What's the best way to go about defining that array? Vector of vector of vector? Arrays of vectors? CvMat/IplImage with multi channels? Should I even keep it as 3D or just turn it into one very long interleaved vector and calculate indexes accordingly?
Thanks!
I would go with your last option, a single large array with transformed indices. If all you want to do is read and write known indices, this is probably the most efficient structure, both in terms of storage and speed. You can also wrap this in a class and overload operator() to make it easy to access 3D coordinates, e.g. you could write a(1,2,3) = 10; and the overloaded operator would take care of transforming the 3D coordinates into a linear index. Iterating over such an array would also be quite simple, since there's only one dimension.
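A wrapper along those lines might look like the sketch below. Grid3D is a hypothetical name, the element type double is an assumption, and the row-major index formula is just one possible choice of layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D grid stored as one contiguous, row-major buffer.
class Grid3D {
public:
    Grid3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, 0.0) {}

    // Translate (x, y, z) into a linear index; z varies fastest.
    double& operator()(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < nx_ && y < ny_ && z < nz_);
        return data_[(x * ny_ + y) * nz_ + z];
    }

    // Flat iteration over all elements.
    std::vector<double>::iterator begin() { return data_.begin(); }
    std::vector<double>::iterator end() { return data_.end(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t nx_, ny_, nz_;
    std::vector<double> data_;
};
```

With this, a(1,2,3) = 10; works as described, and a full pass over the data is a single range-based for loop over the grid. When iterating by coordinates instead, putting z in the innermost loop matches the memory layout.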
It depends on what you mean by efficient, but have you looked at KD Trees?