I have an unsorted vector of eigenvalues and a related matrix of eigenvectors. I'd like to sort the columns of the matrix with respect to the sorted set of eigenvalues. (e.g., if eigenvalue[3] moves to eigenvalue[2], I want column 3 of the eigenvector matrix to move over to column 2.)
I know I can sort the eigenvalues in O(N log N) via std::sort. Without rolling my own sorting algorithm, how do I make sure the matrix's columns (the associated eigenvectors) follow along with their eigenvalues as the latter are sorted?
Typically just create a structure something like this:
struct eigen {
    double value;     // the eigenvalue
    double *vector;   // pointer to the associated eigenvector (matrix column)
    bool operator<(eigen const &other) const {
        return value < other.value;
    }
};
Alternatively, just put the eigenvalue/eigenvector into an std::pair -- though I'd prefer eigen.value and eigen.vector over something.first and something.second.
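For instance, a minimal usage sketch (the setup here is my own illustration; in practice each vector member would point at the corresponding column of your eigenvector storage):

#include <algorithm>
#include <vector>

// assumes the eigen struct above is in scope
int main() {
    std::vector<eigen> pairs;              // one entry per eigenpair, filled elsewhere
    // pairs[i].value  holds eigenvalue i
    // pairs[i].vector points at column i of the eigenvector matrix
    std::sort(pairs.begin(), pairs.end()); // uses eigen::operator<
}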
I've done this a number of times in different situations. Rather than sorting the array, just create a new array that has the sorted indices in it.
For example, you have a length-n array (vector) evals, and a 2D n x n array evects. Create a new array index that contains the values [0, n-1].
Then rather than accessing evals as evals[i], you access it as evals[index[i]], and instead of evects[i][j], you access it as evects[index[i]][j].
Now you write your sort routine to sort the index array rather than the evals array, so instead of index looking like {0, 1, 2, ..., n-1}, its entries end up in increasing order of the corresponding values in the evals array.
So after sorting, if you do this:
for (int i=0;i<n;++i)
{
cout << evals[index[i]] << endl;
}
you'll get a sorted list of evals.
This way you can sort anything that's associated with the evals array without actually moving memory around. This matters when n gets large: you don't want to be shuffling the columns of the evects matrix.
Basically, the i-th smallest eval is located at evals[index[i]], and it corresponds to the index[i]-th evect.
Edited to add: here's a comparison functor I've written to work with std::sort to do what I just said:
template <class DataType, class IndexType>
class SortIndicesInc
{
protected:
    DataType* mData;
public:
    SortIndicesInc(DataType* Data) : mData(Data) {}
    bool operator()(const IndexType& i, const IndexType& j) const
    {
        return mData[i] < mData[j];
    }
};
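For example, a minimal usage sketch (assuming the class above is in scope; the sample data is my own):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> evals = {3.0, 1.0, 2.0};
    std::vector<int> index = {0, 1, 2};

    std::sort(index.begin(), index.end(),
              SortIndicesInc<double, int>(&evals[0]));

    for (int i = 0; i < (int)index.size(); ++i)
        std::cout << evals[index[i]] << '\n';   // prints 1 2 3
}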
The solution depends purely on how you store your eigenvector matrix.
The best performance while sorting will be achieved if you can implement swap(evector1, evector2) so that it only rebinds the pointers and the real data is left unchanged.
This could be done using something like double*, or probably something more complicated, depending on your matrix implementation.
If done this way, swap(...) wouldn't affect your sorting operation performance.
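A minimal sketch of that idea (the Column handle here is my own illustration, not tied to any particular matrix class): swapping two columns only rebinds pointers, so the eigenvector data itself never moves.

#include <utility>

struct Column {
    double* data;   // points at the first element of one eigenvector
};

// swap only exchanges the pointers; the underlying data stays put
void swap(Column& a, Column& b) {
    std::swap(a.data, b.data);
}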
The idea of conglomerating your vector and matrix is probably the best way to do it in C++. I am thinking about how I would do it in R and seeing if that can be translated to C++. In R it's very easy, simply evec<-evec[,order(eval)]. Unfortunately, I don't know of any built in way to perform the order() operation in C++. Perhaps someone else does, in which case this could be done in a similar way.
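One way to emulate order() in C++ (a sketch of my own, building on the index-sorting idea from the earlier answers) is to sort an index vector with a comparator that looks into eval:

#include <algorithm>
#include <numeric>
#include <vector>

// returns a permutation such that eval[idx[0]] <= eval[idx[1]] <= ...
std::vector<std::size_t> order(const std::vector<double>& eval) {
    std::vector<std::size_t> idx(eval.size());
    std::iota(idx.begin(), idx.end(), 0);   // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(),
              [&eval](std::size_t a, std::size_t b) { return eval[a] < eval[b]; });
    return idx;
}

The columns of evec can then be copied out in the order given by idx, which mirrors evec<-evec[,order(eval)].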
What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen on the internet is using an array where each cell contains the coefficient and the cell number is the degree, but that is not efficient, because for a polynomial like 6x^14+x+5 we would have zeros all throughout the cells from 1 to 13. I've already looked at some solutions with vectors and linked lists, but is there any other way to effectively tackle this problem without the use of std::vector or std::list?
Unless there is a compelling reason to act otherwise (this is a programming assignment where you are required to use C-style arrays), you should use a std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in an std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
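For example, a small sketch of that pair-based representation (my own illustration): 6x^14 + x + 5 needs only three entries.

#include <utility>
#include <vector>

// each pair is (power, coefficient)
std::vector<std::pair<int, int>> poly = { {14, 6}, {1, 1}, {0, 5} };   // 6x^14 + x + 5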
A sparse polynomial can be represented with a map, where a zero coefficient is represented by a nonexistent key. Here is an example of such a class:
#include <map>
// example of a sparse integer polynomial
class SparsePolynomial {
    std::map<int, int> coeff;
public:
    int& operator[](const int& degree);   // alternative interface, discussed below
    int get(int degree);
    void update(int degree, int val);
};
Whenever you try to get or update the coefficient of a term, its existence in the map is checked. Every time a coefficient is updated, we check whether the new value is zero. Hence, the size of the map can always be kept minimal.
We can replace these two methods with operator[]. However, in that case, we would not be able to check for zero during an update operation, thus the storage would not be as efficient as using two separate methods for access and update.
int SparsePolynomial::get(int degree){
    if (coeff.find(degree) == coeff.end()){
        return 0;
    }else{
        return coeff[degree];
    }
}

void SparsePolynomial::update(int degree, int val){
    if (val == 0){
        std::map<int,int>::iterator it = coeff.find(degree);
        if (it!=coeff.end()){
            coeff.erase(it);
        }
    }else{
        coeff[degree]=val;
    }
}
While this method gives us more efficient storage, it requires more time for access and update than a vector does. However, in the case of a sparse polynomial, the difference can be small. Given a std::map of size N, the average search complexity of an element is O(log N). Suppose you have a sparse polynomial with degree d and N non-zero coefficients. If N is much smaller than d, then the access and update times are small enough not to notice.
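A short usage sketch of the class above (assuming the class and its method definitions are in scope):

int main() {
    SparsePolynomial p;        // represents 6x^14 + x + 5
    p.update(14, 6);
    p.update(1, 1);
    p.update(0, 5);
    int c = p.get(14);         // c == 6; the map holds only three entries
    p.update(14, 0);           // the degree-14 entry is erased again
    return c;
}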
I have a matrix class as follows (some parts are omitted for clarity):
template <typename T> class CMatrix{
protected:
vector<T>* m_matrix;
public:
void SetCellValue(unsigned int row,unsigned int col,T value){ m_matrix->at(row*m_column+col)=value;}
T& GetCellValue(unsigned int row,unsigned int column) const{return m_matrix->at(row*m_column+column);}
I would like to have a function to sort the matrix based on a chosen column. Say if the matrix is:
2 3
1 4
After sorting based on 1st column it should look like:
1 4
2 3
Basically, since 1 < 2, we performed a row exchange. I know that if m_matrix were a 2D vector, std::sort would have worked. Is it possible to sort a matrix stored in a 1D std::vector based on a chosen column?
The following worked very well for a 1D data type, but I could not tweak it to work with a matrix:
template <typename T> class Sorter{
    bool m_IsAscending;
public:
    Sorter() {m_IsAscending=true;}
    void SortAscending() {m_IsAscending=true;}
    void SortDescending(){m_IsAscending=false;}
    bool operator()(T i, T j){
        if(m_IsAscending) return (i<j); else return (i>j);
    }
};
The solution is very easy. Remember that std::sort takes begin and end iterators. So all you have to do is split your matrix into parts and sort them individually:
for(long i = 0; i < num_of_columns; i++)
{
std::sort(m_matrix->begin()+num_of_rows*i, m_matrix->begin()+num_of_rows*(i+1));
}
This will sort all individual columns independently. If you want to sort only one column, don't use a loop, and choose an i that is the column number you want to sort.
Caveats:
This will work if your matrix is flattened in column-major order. Otherwise, if it's in row-major order, all you have to do is transpose the matrix, sort it with the code above, and transpose it back. I guess this is the only way to go if you want to avoid writing your own sorting function. However, if all you want is to sort a single column, and your matrix is in row-major order, then it's much cheaper to just copy that column to a new vector, sort it, and copy it back.
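As a rough sketch of that last point (the names and the flat std::vector storage are assumptions of this sketch, not part of the CMatrix class above):

#include <algorithm>
#include <cstddef>
#include <vector>

// sort column `col` of a row-major rows x cols matrix stored flat in `m`
void sort_one_column(std::vector<int>& m, std::size_t rows, std::size_t cols, std::size_t col) {
    std::vector<int> tmp(rows);
    for (std::size_t r = 0; r < rows; ++r)
        tmp[r] = m[r * cols + col];      // gather the strided column
    std::sort(tmp.begin(), tmp.end());
    for (std::size_t r = 0; r < rows; ++r)
        m[r * cols + col] = tmp[r];      // scatter it back, sorted
}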
Btw, I don't understand why m_matrix is a pointer... that's very bad practice and is a welcome invitation to memory leaks (unless you're using a smart pointer to wrap it, such as std::unique_ptr).
Hope this helps. Cheers!
This is a programming problem I come across very often and was wondering whether there is a data structure, either in the C++ STL or one I can implement myself which provides both random and sequential access.
An example of why I might need this:
Say there are n types of items, (n = 1000000, for example), and there's a fixed number of each type of item (for example, 0 or 10)
I store these items into an array, where the array index represents the type of the item, and the value represents how many items of that given type are there
Now, I have an algorithm which iterates over all EXISTING items. To obtain these items, it is very wasteful to iterate over the entire array when almost all the entries are 0, except for, e.g., Array[99999] and Array[999999].
Normally, I solve this by using a linked list which saves the indices of all the nonzero array entries. I implement the standard operations in this way:
Insert(int t):
1) If Array[t] == 0, LinkedList.push_back(t);
2) Array[t]++;
Delete(int t):
1) If Array[t] == 1, find and remove t from LinkedList;
2) Array[t]--;
If I want O(1) complexity for the deletion operation, I make the array store containers instead of integers. Each container contains an integer and a pointer to the respective element of the LinkedList, so I don't have to search through the list.
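A minimal sketch of that container layout (the Slot struct and the insert_item/delete_item names are my own; the number of types is just an example): each slot keeps the item count plus an iterator into the list of nonzero types, so removal from the list is O(1).

#include <list>
#include <vector>

struct Slot {
    int count = 0;
    std::list<int>::iterator pos;   // valid only while count > 0
};

std::vector<Slot> slots(1000000);   // one slot per item type
std::list<int> nonzero;             // types whose count is > 0

void insert_item(int t) {
    if (slots[t].count == 0)
        slots[t].pos = nonzero.insert(nonzero.end(), t);
    ++slots[t].count;
}

void delete_item(int t) {
    if (slots[t].count == 1)
        nonzero.erase(slots[t].pos);   // O(1) thanks to the stored iterator
    --slots[t].count;
}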
I would love to know whether there is a data structure which formalizes/improves this approach, or whether there's a better way to do this altogether.
Given the following requirements:
Random access
Fast lookups
Fast insertions
Fast removals
Avoid wasted space
then you probably want something called a sparse array. Sparse arrays are not part of the standard library, so you'll have to emulate your own, using a std::map or std::unordered_map. In a sparse array, only non-zero elements occupy space in the collection.
An unordered_map will have O(1) average lookups, insertions, and removals, but does not provide ordered iteration. A map will generally have slower operations, but will provide ordered iteration. I'm oversimplifying when I say std::map is slower, as it depends on the number of elements and usage patterns (a topic probably already discussed in another question).
If you must absolutely have both O(1) lookups and ordered iteration, then you can combine both a map and an unordered_map and keep them in sync. At that point, you'll want to consider using Boost.MultiIndex.
Here's a rough sketch showing how you can implement your own sparse vector class:
class SparseVector
{
public:
    int get(size_t index) const
    {
        auto kv = map_.find(index);
        return (kv == map_.end()) ? 0 : kv->second;
    }

    void put(size_t index, int value)
    {
        if (value == 0)
            map_.erase(index);
        else
            map_[index] = value;   // overwrite any existing value (emplace would not)
    }

    // etc...

private:
    std::unordered_map<size_t, int> map_;
};
In such a sparse vector class, you can overload operator[] if you wish to allow something like sparseVec[42] = 123.
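One hedged way to do that while keeping the "zero means absent" invariant is to return a small proxy from operator[] that forwards writes to put() and reads to get(). Here is a sketch that repeats the class above with that addition (the Proxy name is my own):

#include <cstddef>
#include <unordered_map>

class SparseVector
{
public:
    int get(size_t index) const
    {
        auto kv = map_.find(index);
        return (kv == map_.end()) ? 0 : kv->second;
    }

    void put(size_t index, int value)
    {
        if (value == 0)
            map_.erase(index);
        else
            map_[index] = value;
    }

    // proxy so that sparseVec[42] = 123 still erases entries assigned zero
    class Proxy
    {
    public:
        Proxy(SparseVector& v, size_t i) : v_(v), i_(i) {}
        Proxy& operator=(int value) { v_.put(i_, value); return *this; }  // writes go through put()
        operator int() const { return v_.get(i_); }                       // reads go through get()
    private:
        SparseVector& v_;
        size_t i_;
    };

    Proxy operator[](size_t index) { return Proxy(*this, index); }

private:
    std::unordered_map<size_t, int> map_;
};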
Linear algebra libraries, such as Eigen or Boost.uBlas, already provide templates for sparse vectors and sparse matrices.
I have a data structure like this:
struct X {
float value;
int id;
};
a vector of those (size N, think 100000), sorted by value (it stays constant during the execution of the program):
std::vector<X> values;
Now, I want to write a function
void subvector(std::vector<X> const& values,
std::vector<int> const& ids,
std::vector<X>& out /*,
helper data here */);
that fills the out parameter with the sorted subset of values selected by the passed ids (size M < N, about 0.8 times N), and does so fast. Memory is not an issue, and this will be done repeatedly, so building lookup tables (the helper data in the function parameters) or anything else that is done only once is entirely OK.
My solution so far:
Build lookuptable lut containing id -> offset in values (preparation, so constant runtime)
create std::vector<X> tmp, size N, filled with invalid ids (linear in N)
for each id, copy values[lut[id]] to tmp[lut[id]] (linear in M)
loop over tmp, copying items to out (linear in N)
This is linear in N (as N is bigger than M), but the temporary vector and the repeated copying bug me. Is there a way to do it quicker than this? Note that M will be close to N, so approaches that are O(M log N) are unfavourable.
Edit: http://ideone.com/xR8Vp is a sample implementation of the algorithm mentioned above, to make the desired output clear and to show that it's doable in linear time. The question is about the possibility of avoiding the temporary vector or speeding it up in some other way; anything that is not linear is not faster :).
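For reference, here is a rough sketch of the approach described above (the -1 "invalid id" marker and the precomputed lut parameter are assumptions of this sketch; struct X is the one defined earlier):

#include <cstddef>
#include <vector>

// lut[id] gives the offset of that id in `values`; it is built once up front
void subvector(std::vector<X> const& values,
               std::vector<int> const& ids,
               std::vector<X>& out,
               std::vector<std::size_t> const& lut)
{
    X invalid;
    invalid.value = 0.0f;
    invalid.id = -1;
    std::vector<X> tmp(values.size(), invalid);        // linear in N
    for (std::size_t k = 0; k < ids.size(); ++k) {     // linear in M
        std::size_t off = lut[ids[k]];
        tmp[off] = values[off];
    }
    out.clear();
    out.reserve(ids.size());
    for (std::size_t i = 0; i < tmp.size(); ++i)       // linear in N
        if (tmp[i].id != -1)
            out.push_back(tmp[i]);
}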
An alternative approach you could try is to use a hash table instead of a vector to look up ids in:
void subvector(std::vector<X> const& values,
               std::unordered_set<int> const& ids,
               std::vector<X>& out) {
    out.clear();
    out.reserve(ids.size());
    for(std::vector<X>::const_iterator i = values.begin(); i != values.end(); ++i) {
        if(ids.find(i->id) != ids.end()) {
            out.push_back(*i);
        }
    }
}
This runs in linear time since unordered_set::find is constant expected time (assuming that we have no problems hashing ints). However I suspect it might not be as fast in practice as the approach you described initially using vectors.
Since your vector is sorted, and you want a subset of it sorted the same way, I assume we can just slice out the chunk you want without rearranging it.
Why not just use find_if() twice? Once to find the start of the range you want and once to find the end of the range. This will give you the start and end iterators of the sub-vector. Construct a new vector using those iterators; one of the std::vector constructor overloads takes two iterators.
That or the partition algorithm should work.
If I understood your problem correctly, you are actually trying to create a sorting algorithm that is linear in the input size M.
That is NOT possible.
Your current approach is to have a sorted list of possible values.
This takes time linear in the number of possible values N (theoretically, given that the map lookup takes O(1) time).
The best you could do is to sort the values you found via the map with a fast comparison sort (O(M log M), e.g. quicksort or mergesort) for small values of M, and perhaps do the linear scan for bigger values of M.
For example, if N is 100000 and M is 100 it is much faster to just use a sorting algorithm.
I hope you can understand what I say. If you still have questions I will try to answer them :)
Edit (from a comment):
I will further explain what I mean.
Say you know that your numbers will range from 1 to 100.
You have them sorted somewhere (actually they are "naturally" sorted) and you want to get a subset of them in sorted form.
If it were possible to do this faster than O(N) or O(M log M), sorting algorithms would simply use this method to sort.
E.g., given the set of numbers {5,10,3,8,9,1,7}, even knowing that they are a subset of the sorted set {1,2,3,4,5,6,7,8,9,10}, you still can't sort them faster than O(N) (N = 10) or O(M log M) (M = 7).
There are a couple of other posts about sorting a vector A based on values in another vector B. Most of the other answers tell to create a struct or a class to combine the values into one object and use std::sort.
Though I'm curious about the performance of such solutions as I need to optimize code which implements bubble sort to sort these two vectors. I'm thinking to use a vector<pair<int,int>> and sort that.
I'm working on a blob-tracking application (image analysis) where I try to match previously tracked blobs against newly detected blobs in video frames where I check each of the frames against a couple of previously tracked frames and of course the blobs I found in previous frames. I'm doing this at 60 times per second (speed of my webcam).
Any advice on optimizing this is appreciated. The code I'm trying to optimize can be shown here:
http://code.google.com/p/projectknave/source/browse/trunk/knaveAddons/ofxBlobTracker/ofCvBlobTracker.cpp?spec=svn313&r=313
important: I forgot to mention that the size of the vectors will never be bigger than 5; they mostly have only 3 items in them and will be unsorted (maybe I could even hardcode it for 3 items?)
Thanks
C++ provides lots of options for sorting, from the std::sort algorithm to sorted containers like std::map and std::set. You should always try to use these as your first solution, and only try things like "optimised bubble sorts" as a last resort.
I implemented this a while ago. Also, I think you mean ordering a vector B in the same way as the sorted values of A.
Index contains the sorting order of data.
/** Sorts a vector and returns the indices of the sorted values
 * \param Index Contains the indices of the sorted values in the original vector
 * \param data  The vector to be sorted
 */
template<class T>
void paired_sort(vector<unsigned int> & Index, const vector<T> & data)
{
    // a vector of pairs that will contain each value and its index in the original array
    vector<pair<T,unsigned int>> IndexedPair;
    IndexedPair.resize(data.size());
    for(unsigned int i=0;i<IndexedPair.size();++i)
    {
        IndexedPair[i].first = data[i];
        IndexedPair[i].second = i;
    }
    sort(IndexedPair.begin(),IndexedPair.end());
    Index.resize(data.size());
    for(size_t i = 0; i < Index.size(); ++i) Index[i] = IndexedPair[i].second;
}