In OpenCV 2.0, they switched from having separate image and matrix classes to a unified class called cv::Mat. What was the design decision there? To me, as someone who works with both images and matrices on a daily basis, they are very different objects that just happen to have one thing in common: they are both accessed as a grid. However, what makes a matrix a matrix in my mind is that you can compute y = A*x, where A is m by n, x is n by 1, and y is m by 1. When A is an image, it makes almost no sense to want this operation.
Merging the classes also had the nasty side effect of needing templating and odd matrix types (like CV_32FC3 for a 3-channel floating-point matrix/image). Since I know the guys working on OpenCV aren't crazy, what was the design decision that made them merge image and matrix classes? Was it code reuse? Was it efficiency somehow?
The main drawback is that you can't overload '*' to do a multiplication, but I don't think you should overload '*' for anything more complex than built-in types anyway.
What is a convolution kernel - an image or a matrix?
You only have to learn all the handler/constructor functions once, instead of two sets of them.
I have a (kind of) performance problem in my code that is rooted in the chosen architecture.
I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data.
Since the dimension is not known at compile time, I can't use Boost.MultiArray (IIRC) and have to come up with my own solution.
Right now, I store each dimension on its own. I have a Tensor of dimension (let's say) 3 that holds a lot of tensors of dimension 2 (in a std::vector), each of which has a std::vector of tensors of dimension 1, each of which holds a std::vector of (numerical) data. I use an abstract base class for my tensor, so everything in there is a pointer to the abstract class, while (secretly) being multi- or one-dimensional.
I extract a single numerical data point by passing a std::list of indices to a tensor, which takes the first element, looks up the corresponding sub-tensor and passes the rest of the list to that tensor in a (kind of) recursive call.
I now have to do a multi-dimensional Fast Fourier Transform on that data. I use a thread pool and job objects that copy the data of a Tensor along one dimension, do an FFT and write the data back.
I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:
My data structure is the most cache-unfriendly beast one can think of... Copying data along the first dimension (whose data sits in a single 1D tensor) is reasonably fast, but in the other directions I need to copy my data from all over the place.
Since there are no race conditions (I make sure every concurrent FFT works on distinct data points), I thought I would not use a mutex guard and let everybody copy at the same time. However, this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!" - "But it's my turn now!"...).
Guarding the copy process with a mutex does not increase speed either. The FFT of a vector with 1024 elements is way faster than the copy process to get these elements, so nearly all of my threads end up waiting while one is copying.
Long story short:
Is there any kind of multi-dimensional data structure that does not need the dimension to be set at compile time and that allows me to traverse fast along all axes? I have searched for a while now, but nothing came up besides Boost MultiArray. Vectorization also does not work, since the indices would grow too fast to hold in the usual int types.
I can't think of how to present code examples here, since most of that code is rather simple, but if needed, I can add it.
Eigen has multi-dimensional tensor support (nominally unsupported, but written by the DeepMind people, so "somewhat" supported?), and FFTW has 1d to 3d FFTs. Using external libraries with a set of 1D to 3D FFTs would outsource most of the hard work.
Edit: Actually, FFTW has support for threaded n-dimensional FFTs
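A minimal sketch of the usual alternative to nested tensors (my assumption, not something the answer spells out): keep all values in one contiguous row-major buffer and store the shape and strides at run time. Every line along any axis is then base + i * stride, so copying it is a plain strided loop with no list traversal or pointer chasing, and threads working on disjoint lines can copy and write back without a mutex. FFTW's multi-dimensional transforms also expect row-major contiguous arrays. The class name `Tensor` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical replacement for the nested-tensor design: one contiguous
// row-major buffer plus shape and strides, all chosen at run time.
class Tensor {
public:
    explicit Tensor(std::vector<std::size_t> shape)
        : shape_(std::move(shape)), strides_(shape_.size()) {
        std::size_t total = 1;
        // Last axis is contiguous (stride 1); each earlier axis skips over all later ones.
        for (std::size_t k = shape_.size(); k-- > 0;) {
            strides_[k] = total;
            total *= shape_[k];
        }
        data_.assign(total, 0.0);
    }

    // Flat offset of a multi-index: sum over k of idx[k] * strides_[k].
    std::size_t offset(const std::vector<std::size_t>& idx) const {
        if (idx.size() != shape_.size()) throw std::invalid_argument("rank mismatch");
        std::size_t off = 0;
        for (std::size_t k = 0; k < idx.size(); ++k) off += idx[k] * strides_[k];
        return off;
    }

    double& at(const std::vector<std::size_t>& idx) { return data_[offset(idx)]; }

    // Copy the 1-D line along `axis` through `start` (start[axis] is ignored).
    // Element i of the line lives at base + i * step: no lists, no pointer chasing.
    void copy_line(std::size_t axis, std::vector<std::size_t> start, std::vector<double>& out) const {
        start.at(axis) = 0;
        const std::size_t base = offset(start), step = strides_[axis];
        out.resize(shape_[axis]);
        for (std::size_t i = 0; i < out.size(); ++i) out[i] = data_[base + i * step];
    }

    // Write a transformed line back to the same positions.
    void write_line(std::size_t axis, std::vector<std::size_t> start, const std::vector<double>& in) {
        start.at(axis) = 0;
        const std::size_t base = offset(start), step = strides_[axis];
        for (std::size_t i = 0; i < shape_[axis] && i < in.size(); ++i) data_[base + i * step] = in[i];
    }

private:
    std::vector<std::size_t> shape_, strides_;
    std::vector<double> data_;
};
```

With shape {4, 3, 2} the strides are {6, 2, 1}, so a line along axis 0 reads every sixth element. Offsets are std::size_t, which is 64-bit on common 64-bit platforms, so the index-overflow worry from the question should not arise for anything that fits in memory.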
I have a dataset of custom abstract objects and a custom distance function. Are there any good SVM libraries that allow me to train on my custom objects (not 2D points) with my custom distance function?
I searched the answers to this similar Stack Overflow question, but none of them allows me to use custom objects and distance functions.
First things first.
SVM does not work on distance functions; it only accepts dot products. So your distance function (actually a similarity, but usually 1 - distance is a similarity) has to:
be symmetric: s(a,b) = s(b,a)
be positive definite: s(a,a) >= 0, and s(a,a) = 0 <=> a = 0
be linear in the first argument: s(ka, b) = k s(a,b) and s(a+b, c) = s(a,c) + s(b,c)
This can be tricky to check, as you are actually asking: is there a function phi from my objects to some vector space such that s(x, y) = <phi(x), phi(y)>, a dot product in that space? This leads to the definition of a so-called kernel, K(x, y) = <phi(x), phi(y)>. If your objects are themselves elements of a vector space and s is already a dot product there, it can be enough to put phi(x) = x, so that K = s, but this is not true in general.
Once you have this kind of similarity, nearly any SVM library (for example libSVM) will work if you provide the Gram matrix, which is simply defined as
G_ij = K(x_i, x_j)
This requires O(N^2) memory and time. Consequently, it does not matter what your objects are, as the SVM only works on pairwise dot products, nothing more.
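A small sketch of the Gram-matrix route, assuming your similarity has already been shown to be a valid kernel and that you use libSVM's precomputed-kernel mode (`-t 4`), whose training lines have the form `<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)` with i the 1-based instance number. The object type and `char_histogram_kernel` are hypothetical placeholders: here phi(s) is a string's character histogram, so K is an explicit dot product and therefore valid.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Gram matrix G[i][j] = K(x_i, x_j) for any object type T and kernel K.
template <typename T, typename Kernel>
std::vector<std::vector<double>> gram_matrix(const std::vector<T>& xs, Kernel k) {
    const std::size_t n = xs.size();
    std::vector<std::vector<double>> g(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j)
            g[i][j] = g[j][i] = k(xs[i], xs[j]);   // kernel is symmetric: compute upper triangle only
    return g;
}

// Hypothetical example kernel: phi(s) = character histogram, K = dot product of histograms.
double char_histogram_kernel(const std::string& a, const std::string& b) {
    std::array<double, 256> ha{}, hb{};
    for (unsigned char c : a) ha[c] += 1;
    for (unsigned char c : b) hb[c] += 1;
    double dot = 0;
    for (std::size_t c = 0; c < 256; ++c) dot += ha[c] * hb[c];
    return dot;
}

// One training line in libSVM's precomputed-kernel format for instance i (0-based here).
std::string libsvm_precomputed_line(int label, std::size_t i, const std::vector<double>& row) {
    std::ostringstream out;
    out << label << " 0:" << (i + 1);
    for (std::size_t j = 0; j < row.size(); ++j) out << ' ' << (j + 1) << ':' << row[j];
    return out.str();
}
```

Only `char_histogram_kernel` depends on the object type; swapping in your own objects means swapping in your own kernel function.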
If you lack the mathematical tools to prove this property, what you can do is look into kernel learning from similarity. These methods are able to create a valid kernel that behaves similarly to your similarity function.
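A cheap empirical check can at least catch an invalid similarity early: on any sample of objects, a valid kernel must give a symmetric positive semidefinite Gram matrix. The sketch below (hypothetical helper `gram_looks_psd`) tests this by attempting a Cholesky factorization of G + eps*I. Passing is necessary but not sufficient, since it says nothing about objects outside the sample.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns false if g is not symmetric or if Cholesky of g + eps*I fails,
// i.e. (up to rounding) g has an eigenvalue <= -eps on this sample.
bool gram_looks_psd(const std::vector<std::vector<double>>& g, double eps = 1e-9) {
    const std::size_t n = g.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j)
            if (std::fabs(g[i][j] - g[j][i]) > 1e-12) return false;   // must be symmetric

    std::vector<std::vector<double>> l(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = g[j][j] + eps;
        for (std::size_t k = 0; k < j; ++k) d -= l[j][k] * l[j][k];
        if (d <= 0.0) return false;   // pivot not positive: not PSD
        l[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = g[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
            l[i][j] = s / l[j][j];
        }
    }
    return true;
}
```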
Check out the following:
MLPack: a lightweight library that provides lots of functionality.
DLib: a very popular toolkit that is used both in industry and academia.
Apart from these, you can also use Python packages and call them from C++.
This is a very simple question: what is the best practice for working with triangular matrices and with sparse matrices in C++?
For triangular matrices I suggest a data format as simple as
double* myMatrix;
int dimension;
as the data members of a custom class. (I assume it is a square matrix in its full form.) There will be methods for setting and accessing elements.
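A minimal sketch of that packed format, using std::vector instead of a raw pointer and assuming a lower-triangular matrix: row i starts at offset i*(i+1)/2, so n*(n+1)/2 values are stored instead of n*n. Since the matrices are meant for linear systems, a forward-substitution solver is included. The class name `LowerTriangular` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Lower-triangular n x n matrix in packed row-major storage: only entries with j <= i are kept.
class LowerTriangular {
public:
    explicit LowerTriangular(std::size_t n) : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    std::size_t size() const { return n_; }

    double get(std::size_t i, std::size_t j) const {
        return j > i ? 0.0 : data_[index(i, j)];   // upper part is implicitly zero
    }
    void set(std::size_t i, std::size_t j, double v) {
        if (j > i) throw std::out_of_range("upper triangle is fixed at zero");
        data_[index(i, j)] = v;
    }

    // Solve L x = b by forward substitution: O(n^2) time, no extra matrix storage.
    std::vector<double> solve(const std::vector<double>& b) const {
        std::vector<double> x(n_);
        for (std::size_t i = 0; i < n_; ++i) {
            double s = b[i];
            const std::size_t row = i * (i + 1) / 2;   // start of row i in data_
            for (std::size_t j = 0; j < i; ++j) s -= data_[row + j] * x[j];
            x[i] = s / data_[row + i];   // assumes a nonzero diagonal
        }
        return x;
    }

private:
    std::size_t index(std::size_t i, std::size_t j) const { return i * (i + 1) / 2 + j; }
    std::size_t n_;
    std::vector<double> data_;
};
```

BLAS offers the same idea for packed triangular storage (for example `dtpsv` solves a packed triangular system), though BLAS packs by columns, so its offsets differ from this row-major sketch.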
For sparse matrices, I know a couple of methods, like saving just the positions of the elements in the row/column and their values. This is a question about your experience: which sparse matrix implementation will be the best one?
P.S. Less memory and less CPU usage are my targets; I am looking for the best solution, not the simplest one. All matrices will be used for solving systems of linear equations, and the matrices will be huge.
Thanks a lot for any advice!
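If you have no idea about the structure of the matrix, then it is basically the same as a map. You could use std::map<std::pair<int,int>,double>, or perhaps std::unordered_map if you have it (std::pair has no standard std::hash specialization, so you would need to supply your own hash function).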
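A sketch combining that map with Compressed Sparse Row (CSR) storage, a standard format the answer does not mention: assemble entries in std::map<std::pair<int,int>,double>, whose ordering of (row, col) keys is exactly the row-by-row order CSR needs, then convert once and use the compact arrays for the matrix-vector products inside an iterative solver. The names `Csr`, `to_csr` and `multiply` are hypothetical.

```cpp
#include <map>
#include <utility>
#include <vector>

// Assembly format: keys sorted lexicographically by (row, col).
using SparseMap = std::map<std::pair<int, int>, double>;

// Compressed Sparse Row: nonzeros stored row by row; row r occupies
// positions row_ptr[r] .. row_ptr[r + 1] - 1 of col and val.
struct Csr {
    int rows = 0;
    std::vector<int> row_ptr, col;
    std::vector<double> val;
};

Csr to_csr(const SparseMap& m, int rows) {
    Csr a;
    a.rows = rows;
    a.row_ptr.assign(rows + 1, 0);
    for (const auto& entry : m) {
        ++a.row_ptr[entry.first.first + 1];   // count nonzeros per row
        a.col.push_back(entry.first.second);
        a.val.push_back(entry.second);
    }
    for (int r = 0; r < rows; ++r) a.row_ptr[r + 1] += a.row_ptr[r];   // prefix sum
    return a;
}

// y = A x, touching only the stored nonzeros.
std::vector<double> multiply(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k)
            y[r] += a.val[k] * x[a.col[k]];
    return y;
}
```

For actually solving huge systems, a library such as Eigen (whose SparseMatrix uses compressed column storage by default) together with one of its sparse solvers is likely to beat a hand-written version.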
I'm looking for a data structure that would allow me to store an M-by-N 2D matrix of values contiguously in memory, such that the distance in memory between any two points approximates the Euclidean distance between those points in the matrix. That is, in a typical row-major representation as a one-dimensional array of M * N elements, the memory distance differs between adjacent cells in the same row (1) and adjacent cells in neighbouring rows (N).
I'd like a data structure that reduces or removes this difference. Really, the name of such a structure is sufficient—I can implement it myself. If answers happen to refer to libraries for this sort of thing, that's also acceptable, but they should be usable with C++.
I have an application that needs to perform fast image convolutions without hardware acceleration, and though I'm aware of the usual optimisation techniques for this sort of thing, I feel a specialised data structure or data ordering could improve performance.
Given the requirement that you want to store the values contiguously in memory, I'd strongly suggest you research space-filling curves, especially Hilbert curves.
To give a bit of context, such curves are sometimes used in database indexes to improve the locality of multidimensional range queries (e.g., "find all items with x/y coordinates in this rectangle"), thereby aiming to reduce the number of distinct pages accessed. This is a bit similar to the R-trees that have been suggested here already.
Either way, it looks like you're bound to an M*N array of values in memory, so the whole question is about how to arrange the values in that array, I figure. (Unless I misunderstood the question.)
So in fact, such orderings would probably only change the characteristics of the distance distribution. The average distance between any two randomly chosen points of the matrix should not change, so I have to agree with Oli there. The potential benefit depends largely on your specific use case, I suppose.
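To make the Hilbert-curve suggestion concrete, here is a sketch of the well-known iterative conversion from a cell (x, y) to its position along a Hilbert curve on an n x n grid, where n must be a power of two (an M-by-N matrix would be padded to the next such square). Element (x, y) would then be stored at `data[hilbert_index(n, x, y)]`; the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Position of cell (x, y) along a Hilbert curve covering an n x n grid (n a power of two).
std::uint64_t hilbert_index(std::uint32_t n, std::uint32_t x, std::uint32_t y) {
    std::uint64_t d = 0;
    for (std::uint32_t s = n / 2; s > 0; s /= 2) {
        const std::uint32_t rx = (x & s) ? 1 : 0;
        const std::uint32_t ry = (y & s) ? 1 : 0;
        // Skip over the quadrants the curve visits before this one.
        d += static_cast<std::uint64_t>(s) * s * ((3 * rx) ^ ry);
        // Rotate/reflect so the sub-curve in this quadrant has the canonical orientation.
        if (ry == 0) {
            if (rx == 1) {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::swap(x, y);
        }
    }
    return d;
}
```

Consecutive positions along the curve are always neighbouring cells, but the converse does not hold: some neighbouring cells still end up far apart in memory, which matches the caveats in the other answers.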
I would guess "no"! And if the answer happens to be "yes", then it's almost certainly so irregular that it'll be way slower for a convolution-type operation.
EDIT
To qualify my guess, take an example. Let's say we store a[0][0] first. We want a[k][0] and a[0][k] to be similar distances, and proportional to k, so we might choose to interleave the storage of first row and first column (i.e. a[0][0], a[1][0], a[0][1], a[2][0], a[0][2], etc.) But how do we now do the same for e.g. a[1][0]? All the locations near it in memory are now taken up by stuff that's near a[0][0].
Whilst there are other possibilities than my example, I'd wager that you always end up with this kind of problem.
EDIT
If your data is sparse, then there may be scope to do something clever (re Cubbi's suggestion of R-trees). However, it'll still require irregular access and pointer chasing, so will be significantly slower than straightforward convolution for any given number of points.
You might look at space-filling curves, in particular the Z-order curve, which (mostly) preserves spatial locality. It might be computationally expensive to look up indices, however.
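The Z-order (Morton) index can be cheaper to compute than it may sound: it just interleaves the bits of the two coordinates, which the sketch below does with the classic shift-and-mask trick for 16-bit coordinates. The function names are hypothetical, and x and y are assumed to be at most 65535.

```cpp
#include <cstdint>

// Spread the 16 low bits of v apart so a zero bit sits between each pair: ...dcba -> ...0d0c0b0a.
std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Z-order index of (x, y): bits of x in the even positions, bits of y in the odd positions.
std::uint32_t morton_index(std::uint16_t x, std::uint16_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```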
If you are using this to try and improve cache performance, you might try a technique called "bricking", which is a little bit like one or two levels of the space filling curve. Essentially, you subdivide your matrix into nxn tiles, (where nxn fits neatly in your L1 cache). You can also store another level of tiles to fit into a higher level cache. The advantage this has over a space-filling curve is that indices can be fairly quick to compute. One reference is included in the paper here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.8959
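A sketch of the bricking idea with a single level of tiles. The tile size (for example one chosen so that a tile fits in L1 cache) is a run-time parameter here, and the image dimensions are assumed to be multiples of it; the class name `BrickedImage` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "bricked" image: bricks of tile x tile pixels are stored one after
// another in row-major brick order, and the pixels inside each brick are contiguous
// (row-major within the brick). Assumes rows and cols are multiples of tile.
class BrickedImage {
public:
    BrickedImage(std::size_t rows, std::size_t cols, std::size_t tile)
        : tile_(tile), bricks_per_row_(cols / tile), data_(rows * cols, 0.0f) {}

    // Flat index of pixel (r, c). If tile were a compile-time power of two,
    // these divisions and remainders would reduce to shifts and masks.
    std::size_t index(std::size_t r, std::size_t c) const {
        const std::size_t br = r / tile_, bc = c / tile_;   // which brick
        const std::size_t ir = r % tile_, ic = c % tile_;   // position inside the brick
        return (br * bricks_per_row_ + bc) * tile_ * tile_ + ir * tile_ + ic;
    }

    float& operator()(std::size_t r, std::size_t c) { return data_[index(r, c)]; }
    float operator()(std::size_t r, std::size_t c) const { return data_[index(r, c)]; }

private:
    std::size_t tile_, bricks_per_row_;
    std::vector<float> data_;
};
```

A convolution window smaller than a brick then touches at most four bricks, each of which is contiguous in memory.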
This sounds like something that could be helped by an R-tree, or one of its variants. There is nothing like that in the C++ Standard Library, but Boost.Geometry provides an R-tree (boost::geometry::index::rtree). I'd take a look at that before writing my own.
It is not possible to "linearize" a 2D structure into a 1D structure and keep the proximity relation unchanged in both directions. This is one of the fundamental topological properties of the world.
Having said that, it is true that the standard row-wise or column-wise storage order normally used for 2D arrays is not the best one when you need to preserve proximity (as much as possible). You can get better results by using discrete approximations of fractal curves (space-filling curves).
Z-order curve is a popular one for this application: http://en.wikipedia.org/wiki/Z-order_(curve)
Keep in mind though that regardless of which approach you use, there will always be elements that violate your distance requirement.
You could think of your 2D matrix as a big spiral, starting at the center and progressing to the outside. Unwind the spiral, and store the data in that order, and distance between addresses at least vaguely approximates Euclidean distance between the points they represent. While it won't be very exact, I'm pretty sure you can't do a whole lot better either. At the same time, I think even at very best, it's going to be of minimal help to your convolution code.
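One way to produce that centre-outward spiral ordering, as a sketch: walk legs of length 1, 1, 2, 2, 3, 3, ... while turning right, down, left, up, and keep only the positions that fall inside the matrix. The result is the storage order, so element `order[k]` would live at `data[k]`; random access would additionally need a lookup table from (row, col) to k (not shown). The function name is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Cells of a rows x cols grid in the order visited by a square spiral starting at
// the centre cell. Steps that leave the grid are skipped, so for non-square grids
// consecutive entries are not always adjacent cells.
std::vector<std::pair<int, int>> spiral_order(int rows, int cols) {
    std::vector<std::pair<int, int>> order;
    const std::size_t total = static_cast<std::size_t>(rows) * cols;
    if (total == 0) return order;
    int r = rows / 2, c = cols / 2;
    const int dr[4] = {0, 1, 0, -1};   // right, down, left, up
    const int dc[4] = {1, 0, -1, 0};
    order.emplace_back(r, c);
    for (int leg = 1, dir = 0; order.size() < total; ++leg) {
        // Each leg length is used twice before it grows by one.
        for (int rep = 0; rep < 2; ++rep, dir = (dir + 1) % 4) {
            for (int s = 0; s < leg; ++s) {
                r += dr[dir];
                c += dc[dir];
                if (r >= 0 && r < rows && c >= 0 && c < cols) order.emplace_back(r, c);
            }
        }
    }
    return order;
}
```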
The answer is no. Think about it - memory is 1D. Your matrix is 2D. You want to squash that extra dimension in - with no loss? It's not going to happen.
What's more important is that once you get a certain distance away, it takes the same time to load into cache. If you have a cache miss, it doesn't matter if it's 100 away or 100000. Fundamentally, you cannot get more contiguous/better performance than a simple array, unless you want to get an LRU for your array.
I think you're forgetting that the CPU doesn't walk through memory on foot :) so the distance is pretty much irrelevant.
It's random access memory, so really you have to figure out what operations you need to do, and optimize the accesses for that.
You need to convert the addresses from memory space back to the original array space to accomplish this. Also, you've stressed distance only, which may still cause you some problems (there is no direction).
If I have an R x C array and two cells at locations [r,c] and [c,r], the distance from some arbitrary point, say [0,0], is identical. And there's no way you're going to make one memory address hold two things, unless you've got one of those fancy new qubit machines.
However, you can take into account that in a row-major R x C array, each row is C * sizeof(yourdata) bytes long. Conversely, for any element offset within the bounds of the array (offset = (address - base) / sizeof(yourdata)), the original coordinates are
r = (offset / C)
c = (offset % C)
so
r1 = (offset1 / C)
r2 = (offset2 / C)
c1 = (offset1 % C)
c2 = (offset2 % C)
dr = r1 - r2
dc = c1 - c2
dist = sqrt(dr*dr + dc*dc)
(this assumes zero-based arrays)
(combine these steps to make it run more efficiently)
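The same decomposition as a small C++ function, a sketch that works with element offsets rather than raw byte addresses (pointer subtraction already divides by the element size). The function names are hypothetical. The values in the test illustrate the question's point: offsets 3 and 4 are adjacent in memory but sqrt(10) cells apart in a 4-column array.

```cpp
#include <cmath>
#include <cstddef>

// Euclidean distance, in grid cells, between two elements of a row-major
// array with C columns, given their zero-based element offsets.
double grid_distance(std::size_t offset1, std::size_t offset2, std::size_t C) {
    const double dr = static_cast<double>(offset1 / C) - static_cast<double>(offset2 / C);
    const double dc = static_cast<double>(offset1 % C) - static_cast<double>(offset2 % C);
    return std::sqrt(dr * dr + dc * dc);
}

// Element offset of p from the array start; pointer subtraction counts elements, not bytes.
template <typename T>
std::size_t element_offset(const T* base, const T* p) {
    return static_cast<std::size_t>(p - base);
}
```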
For a lot more ideas, look at any 2D image manipulation code that uses a calculated value called 'stride' (the distance in memory between the starts of consecutive rows), which is basically an indicator that the code is converting back and forth between memory addresses and array coordinates.
This is not exactly related to closeness, but it might help. It certainly helps to minimize disk accesses.
One way to get better "closeness" is to tile the image. If your convolution kernel is smaller than a tile, you touch at most 4 tiles. You can recursively tile in bigger sections so that locality improves further. A Stokes-like argument (at least I think it's Stokes), or some calculus of variations, can show that for rectangles the best tile shape (meaning for examination of arbitrary sub-rectangles) is a smaller rectangle of the same aspect ratio.
Quick intuition: think about a square. If you tile the larger square with smaller squares, the fact that a square encloses the maximal area for a given perimeter means that square tiles have minimal border length. When you transform the large square, I think you can show that you should transform the tile the same way. (You might also be able to show this with a simple multivariate differentiation.)
The classic example is zooming in on spy satellite data images and convolving it for enhancement. The extra computation to tile is really worth it if you keep the data around and you go back to it.
It's also really worth it for the different compression schemes such as cosine transforms. (That's why, when you download an image, it frequently comes up in smaller and smaller squares until the final resolution is reached.)
There are a lot of books on this area and they are helpful.
I'm trying to find/write a function that would perform the same operation as imlincomb(). However, I am having trouble finding such functions in C++ that don't use Matlab API functions, other than in the Intel Performance Primitives library, and I don't really want to purchase a license for it unless my application really has to take advantage of it. What would be an easy way of implementing it, or are there any standard functions that make the job a lot easier?
Thanks in advance.
There's definitely nothing of the sort in any standard C++ package. You might be able to use something in LAPACK, but I think you'd be better off writing your own. It's a fairly simple function: each output pixel is independent and depends only on the input pixels at the same coordinates. In pseudocode:
for each row y in [0, height-1]
    for each column x in [0, width-1]
        for each color channel c in (R, G, B)
            output[y][x][c] = 0
            for each input i
                output[y][x][c] += weight[i] * input[i][y][x][c]
Of course, the exact formulation depends on how exactly your images are stored (3D array, 2D array, or 1D array, and be careful about the order of your dimensions!).
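A C++ version of the pseudocode, as a sketch with a few assumptions: all images are 8-bit, have the same size and the same memory layout (any channel count), and are stored as flat std::vector<std::uint8_t>. Because each output element depends only on the input elements at the same position, the loops over y, x and c collapse into one loop over the flat buffer. Like MATLAB's imlincomb, it accepts an optional constant offset and rounds and saturates the result to the output's integer range. `lincomb` and `Image8` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Image8 = std::vector<std::uint8_t>;   // hypothetical: H*W*channels bytes, same layout for all images

// out = sum_i weights[i] * inputs[i] + offset, computed per element in double,
// then rounded and clamped to [0, 255].
Image8 lincomb(const std::vector<const Image8*>& inputs,
               const std::vector<double>& weights, double offset = 0.0) {
    if (inputs.empty() || inputs.size() != weights.size())
        throw std::invalid_argument("need one weight per input image");
    const std::size_t n = inputs[0]->size();
    for (const Image8* img : inputs)
        if (img->size() != n) throw std::invalid_argument("images must have the same size");

    Image8 out(n);
    for (std::size_t p = 0; p < n; ++p) {
        double acc = offset;
        for (std::size_t i = 0; i < inputs.size(); ++i) acc += weights[i] * (*inputs[i])[p];
        out[p] = static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, std::round(acc))));
    }
    return out;
}
```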