What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen on the internet is using an array where each cell contains the coefficient and the cell number is the degree, but that is not efficient: for a polynomial like 6x^14+x+5 we would store zeros in every cell from 1 to 13. I've already looked at some solutions with vectors and linked lists, but is there any other way to tackle this problem effectively, without using std::vector or std::list?
Unless there is a compelling reason to do otherwise (for example, this is a programming assignment where you are required to use C-style arrays), you should use a std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in a std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few non-zero coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
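For illustration, a minimal sketch of that pair-based representation (the names are illustrative, not from the question):

#include <utility>
#include <vector>

// 6x^14 + x + 5 stored as (power, coefficient) pairs; no zero terms are stored
std::vector<std::pair<int, int>> poly = { {14, 6}, {1, 1}, {0, 5} };

// evaluate at x, touching only the non-zero terms
long long evaluate(const std::vector<std::pair<int, int>>& p, long long x)
{
    long long result = 0;
    for (const auto& term : p) {
        long long power = 1;
        for (int i = 0; i < term.first; ++i)
            power *= x;                 // naive exponentiation; fine for a sketch
        result += term.second * power;
    }
    return result;
}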
A sparse polynomial can be represented with a map, where a zero coefficient is represented by a nonexistent key. Here is an example of such a class:
#include <map>

// example of a sparse integer polynomial
class SparsePolynomial {
    std::map<int,int> coeff;   // degree -> non-zero coefficient
public:
    int& operator[](const int& degree);   // alternative interface, discussed below
    int get(int degree);
    void update(int degree, int val);
};
Whenever you get or update the coefficient of a term, its existence in the map is checked first. Every time a coefficient is updated, the new value is checked for zero. Hence, the size of the map can always be kept minimal.
We can replace these two methods with operator[]. However, in that case, we would not be able to check for zero during an update operation, thus the storage would not be as efficient as using two separate methods for access and update.
int SparsePolynomial::get(int degree) {
    if (coeff.find(degree) == coeff.end()) {
        return 0;
    } else {
        return coeff[degree];
    }
}
void SparsePolynomial::update(int degree, int val) {
    if (val == 0) {
        std::map<int,int>::iterator it = coeff.find(degree);
        if (it != coeff.end()) {
            coeff.erase(it);
        }
    } else {
        coeff[degree] = val;
    }
}
While this method gives more efficient storage, access and update take more time than with a vector. However, in the case of a sparse polynomial, the difference can be small. Given a std::map of size N, the average cost of searching for an element is O(log N). Suppose you have a sparse polynomial of degree d with N non-zero coefficients: if N is much smaller than d, the access and update times will be small enough not to notice.
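For illustration, a short usage sketch of the class above (it assumes the member functions are public, as in the definition given):

#include <iostream>

int main() {
    SparsePolynomial p;                 // starts out as the zero polynomial
    p.update(14, 6);                    // 6x^14
    p.update(1, 1);                     // + x
    p.update(0, 5);                     // + 5
    p.update(1, 0);                     // setting a coefficient to zero erases its entry
    std::cout << p.get(14) << ' ' << p.get(13) << '\n';   // prints "6 0"
}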
I currently have a solution to this problem, but I feel it's not as efficient as it could be, so I want to see if there is a faster method.
I have two arrays (std::vectors, for example). Both arrays contain only unique integer values that are sorted but sparse in value, e.g. 1, 4, 12, 13, ... What I want to ask is: is there a fast way to find the INDEX into either array where the values are the same? For example, array1 has values 1, 4, 12, 13 and array2 has values 2, 12, 14, 16; the first matching value (12) is at index 1 in array2. The index into the array is what is important, as I have other arrays that contain data that will use this "matching" index.
I am not confined to using arrays; maps are possible too. I am only comparing the two arrays once, and they will not be reused after the first matching pass. There can be a small to large number of values (300,000+) in either array, and the two arrays DO NOT always have the same number of values (that would make things much easier).
The worst case is a linear search of one array for each element of the other, O(N^2). Using a map would get me O(log N) per lookup, but I would still have to convert one array into a map of value/index pairs.
What I currently have, to avoid any container conversions, is this: loop over the smaller of the two arrays. Compare the current element of the small array (array1) with the current element of the large array (array2). If the array1 element value is larger than the array2 element value, increment the index into array2 until it is no longer larger (a while loop). Then, if the array1 element value is smaller than the array2 element, go to the next loop iteration and begin again. Otherwise they must be equal, and I have my index into either array of the matching value.
So in this loop I am at best O(N) if all values have matches and at worst O(2N) if none match. I am wondering if there is something faster out there. It's hard to know for sure how often the two arrays will match, but I would say I lean toward most of the values having matches rather than not.
I hope I explained the problem well enough and I appreciate any feedback or tips on improving this.
Code example:
std::vector<int> array1 = {4,6,12,34};
std::vector<int> array2 = {1,3,6,34,40};

for (unsigned int i = 0, z = 0; i < array1.size(); i++)
{
    int value1 = array1[i];
    while (z < array2.size() && value1 > array2[z])
        z++;
    if (z >= array2.size())
        break;  // reached end of array2
    if (value1 < array2[z])
        continue;
    // we have a match, i and z indices have the same value
}
The result will be matching indexes [1, 3] for array1 and [2, 3] for array2.
I wrote an implementation of this function using an algorithm that performs better than the trivial linear merge on sparse distributions.
For distributions that are similar†, it has O(n) complexity, but for ranges where the distributions differ greatly it should perform below linear, approaching O(log n) in optimal cases. However, I wasn't able to prove that the worst case isn't O(n log n); on the other hand, I haven't been able to find such a worst case either.
I templated it so that any type of range can be used, such as sub-ranges or raw arrays. Technically it works with non-random-access iterators as well, but the complexity is much greater, so it's not recommended. I think it should be possible to modify the algorithm to fall back to linear search in that case, but I haven't bothered.
† By similar distribution, I mean that the pair of arrays have many crossings. By crossing, I mean a point where you would switch from one array to another if you were to merge the two arrays together in sorted order.
#include <algorithm>
#include <iterator>
#include <utility>

// helper structure for the search
template<class Range, class Out>
struct search_data {
    // is there any clearer way to get an iterator that might be either
    // a Range::const_iterator or const T*?
    using iterator = decltype(std::cbegin(std::declval<Range&>()));
    iterator curr;
    const iterator begin, end;
    Out out;
};

template<class Range, class Out>
auto init_search_data(const Range& range, Out out) {
    return search_data<Range, Out>{
        std::begin(range),
        std::begin(range),
        std::end(range),
        out,
    };
}

template<class Range, class Out1, class Out2>
void match_indices(const Range& in1, const Range& in2, Out1 out1, Out2 out2) {
    auto search_data1 = init_search_data(in1, out1);
    auto search_data2 = init_search_data(in2, out2);

    // initial order is arbitrary
    auto lesser = &search_data1;
    auto greater = &search_data2;

    // if either range is exhausted, we are finished
    while(lesser->curr != lesser->end
          && greater->curr != greater->end) {
        // difference of first values in each range
        auto delta = *greater->curr - *lesser->curr;
        if(!delta) { // matching value was found
            // store both results and increment the iterators
            *lesser->out++ = std::distance(lesser->begin, lesser->curr++);
            *greater->out++ = std::distance(greater->begin, greater->curr++);
            continue; // then start a new iteration
        }
        if(delta < 0) { // set the order of ranges by their first value
            std::swap(lesser, greater);
            delta = -delta; // delta is always positive after this
        }

        // next crossing cannot be farther than the delta
        // this assumption has the following pre-requisites:
        // range is sorted, values are integers, values in the range are unique
        auto range_left = std::distance(lesser->curr, lesser->end);
        auto upper_limit =
            std::min(range_left, static_cast<decltype(range_left)>(delta));

        // exponential search for a sub-range where the value at the upper bound
        // is greater than the target, and the value at the lower bound is lesser
        auto target = *greater->curr;
        auto lower = lesser->curr;
        auto upper = std::next(lower, upper_limit);
        for(auto step = decltype(upper_limit){1};
            step < std::distance(lower, upper); step *= 2) {
            auto guess = std::next(lower, step);
            if(*guess >= target) {
                upper = guess;
                break;
            }
            lower = guess;
        }

        // skip all values in lesser
        // that are less than the least value in greater
        lesser->curr = std::lower_bound(lower, upper, target);
    }
}
#include <iostream>
#include <vector>

int main() {
    std::vector<int> array1 = {4,6,12,34};
    std::vector<int> array2 = {1,3,6,34};

    std::vector<std::size_t> indices1;
    std::vector<std::size_t> indices2;

    match_indices(array1, array2,
                  std::back_inserter(indices1),
                  std::back_inserter(indices2));

    std::cout << "indices in array1: ";
    for(std::size_t i : indices1)
        std::cout << i << ' ';

    std::cout << "\nindices in array2: ";
    for(std::size_t i : indices2)
        std::cout << i << ' ';

    std::cout << std::endl;
}
Since the arrays are already sorted, you can use something very much like the merge step of mergesort: look at the head element of each array and discard the lower one (the next element becomes the new head). Stop when you find a match (or when either array becomes exhausted, indicating no match).
This is O(n) and the fastest you can do for arbitrary distributions. With certain clustered distributions a "skip ahead" approach could be used rather than always looking at the next element. This could result in better-than-O(n) running times for certain distributions. For example, given the arrays 1,2,3,4,5 and 10,11,12,13,14, an algorithm could determine there were no matches to be found in as few as one comparison (5 < 10).
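For illustration, a minimal sketch of that merge-style scan, stopping at the first match (the names are illustrative):

#include <cstddef>
#include <utility>
#include <vector>

// returns {index1, index2} of the first common value, or {-1, -1} if there is none
std::pair<long, long> first_match(const std::vector<int>& a, const std::vector<int>& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j])
            return { static_cast<long>(i), static_cast<long>(j) };
        if (a[i] < b[j]) ++i;   // discard the smaller head element
        else             ++j;
    }
    return { -1, -1 };          // one array exhausted: no match
}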
What is the range of the stored numbers?
I mean, you say that the numbers are integers, sorted, and sparse (i.e. non-sequential), and that there may be more than 300,000 of them, but what is their actual range?
The reason that I ask is that, if there is a reasonably small upper limit, u (say, u = 500,000), the fastest and most expedient solution might be to just use the values as indices. Yes, you might be wasting memory, but is 4*u bytes really a lot of memory? This depends on your application and your target platform (e.g. if this is for a memory-constrained embedded system, it's less likely to be a good idea than if you have a laptop with 32GiB RAM).
Of course, if the values are more-or-less evenly spread over 0 to 2^31-1, this crude idea isn't attractive, but maybe there are properties of the input values that you can exploit other than simply their range. You might be able to hand-write a fairly simple hash function.
Another thing worth considering is whether you actually need to retrieve the index quickly, or whether it is enough to be able to tell quickly that a value exists in the other array. Whether or not a value exists requires only one bit, so you could have a bitmap over the range of the input values using 32x less memory (i.e. mask off the 5 LSBs and use them as a bit position, then shift the value 5 places right and use that as an array index).
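For illustration, a rough sketch of that bitmap idea, assuming the values are non-negative and below a known bound u:

#include <cstdint>
#include <vector>

// one bit per possible value: roughly u/8 bytes instead of 4*u
std::vector<std::uint32_t> build_bitmap(const std::vector<int>& values, int u)
{
    std::vector<std::uint32_t> bits((u + 31) / 32, 0);
    for (int v : values)
        bits[v >> 5] |= 1u << (v & 31);   // 5 LSBs pick the bit, the rest pick the word
    return bits;
}

bool contains(const std::vector<std::uint32_t>& bits, int v)
{
    return (bits[v >> 5] >> (v & 31)) & 1u;
}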
Finally, a hybrid approach might be worth considering, where you decide how much memory you're prepared to use (say you decide on 256KiB, which corresponds to 64Ki 4-byte integers) and then use that as a lookup table into much smaller sub-problems. Say you have 300,000 values whose LSBs are pretty evenly distributed. Then you could use the 16 LSBs as indices into a lookup table of lists that are (on average) only 4 or 5 elements long, which you can then search by other means. A couple of years ago, I worked on some simulation software that had ~200,000,000 cells, each with a cell id; some utility functionality used a binary search to identify cells by id. We were able to speed it up significantly and non-intrusively with this strategy. Not a perfect solution, but a great improvement. (If the LSBs are not evenly distributed, maybe that's a property you can exploit, or maybe you can choose a range of bits that are, or do a bit of hashing.)
I guess the upshot is “consider some kind of hashing”, even the “identity hash” or simple masking/modulo with a little “your solution doesn't have to be perfectly general” on the side and some “your solution doesn't have to be perfectly space efficient” sauce on top.
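For instance, one concrete hashed variant (an unordered_map from value to index for one array, then a single probing pass over the other; the names are illustrative):

#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// collect (index in a, index in b) pairs of matching values
std::vector<std::pair<std::size_t, std::size_t>>
hashed_matches(const std::vector<int>& a, const std::vector<int>& b)
{
    std::unordered_map<int, std::size_t> pos;   // value -> index in a
    pos.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        pos.emplace(a[i], i);

    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t j = 0; j < b.size(); ++j) {
        auto it = pos.find(b[j]);
        if (it != pos.end())
            out.emplace_back(it->second, j);
    }
    return out;
}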
This is a programming problem I come across very often, and I was wondering whether there is a data structure, either in the C++ STL or one I can implement myself, which provides both random and sequential access.
An example of why I might need this:
Say there are n types of items (n = 1,000,000, for example), and there's a fixed number of each type of item (for example, 0 or 10).
I store these items in an array, where the array index represents the type of the item, and the value represents how many items of that given type there are.
Now, I have an algorithm which iterates over all EXISTING items. To obtain these items, it is very wasteful to iterate over the entire array when all the entries are 0 except for, say, Array[99999] and Array[999999].
Normally, I solve this by using a linked list which saves the indices of all the nonzero array entries. I implement the standard operations in this way:
Insert(int t):
1) If Array[t] == 0, LinkedList.push_back(t);
2) Array[t]++;
Delete(int t):
1) If Array[t] == 1, find and remove t from LinkedList;
2) Array[t]--;
If I want O(1) complexity for the deletion operation, I make the array store containers instead of integers. Each container contains an integer and a pointer to the respective element of the LinkedList, so I don't have to search through the list.
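A compact sketch of this bookkeeping, with the list iterator stored next to each count (the names are illustrative):

#include <list>
#include <vector>

struct Counts {
    struct Entry { int count = 0; std::list<int>::iterator pos; };
    std::vector<Entry> slots;   // indexed by item type
    std::list<int> nonzero;     // types that currently have count > 0

    explicit Counts(int n) : slots(n) {}

    void insert(int t) {
        if (slots[t].count++ == 0)
            slots[t].pos = nonzero.insert(nonzero.end(), t);
    }
    void remove(int t) {
        if (--slots[t].count == 0)
            nonzero.erase(slots[t].pos);   // O(1): no search through the list
    }
};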
I would love to know whether there is a data structure which formalizes/improves this approach, or whether there's a better way to do this altogether.
Given the following requirements:
Random access
Fast lookups
Fast insertions
Fast removals
Avoid wasted space
then you probably want something called a sparse array. Sparse arrays are not part of the standard library, so you'll have to emulate your own, using a std::map or std::unordered_map. In a sparse array, only non-zero elements occupy space in the collection.
A std::unordered_map will have O(1) average-case lookups, insertions, and removals, but does not provide ordered iteration. A std::map will generally have slower operations, but will provide ordered iteration. I'm oversimplifying things when I say std::map is slower, as it depends on the number of elements and usage patterns (a topic probably already discussed in another question).
If you absolutely must have both O(1) lookups and ordered iteration, then you can combine a map and an unordered_map and keep them in sync. At that point, you'll want to consider using Boost.MultiIndex.
Here's a rough sketch showing how you can implement your own sparse vector class:
#include <cstddef>
#include <unordered_map>

class SparseVector
{
public:
    int get(std::size_t index) const
    {
        auto kv = map_.find(index);
        return (kv == map_.end()) ? 0 : kv->second;
    }

    void put(std::size_t index, int value)
    {
        if (value == 0)
            map_.erase(index);
        else
            map_[index] = value;   // overwrites any existing value
    }

    // etc...

private:
    std::unordered_map<std::size_t, int> map_;
};
In such a sparse vector class, you can overload operator[] if you wish to allow something like sparseVec[42] = 123.
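For instance, a rough sketch of such an operator[] using a small proxy type (illustrative only; it assumes these members are added to the public section of the SparseVector above):

class Proxy
{
public:
    Proxy(SparseVector& v, std::size_t i) : vec_(v), index_(i) {}
    operator int() const { return vec_.get(index_); }   // reads go through get()
    Proxy& operator=(int value)                          // writes go through put()
    {
        vec_.put(index_, value);
        return *this;
    }
private:
    SparseVector& vec_;
    std::size_t index_;
};

Proxy operator[](std::size_t index) { return Proxy(*this, index); }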
Linear algebra libraries, such as Eigen or Boost.uBlas, already provide templates for sparse vectors and sparse matrices.
I'm a student working on a small project for a high performance computing course, so efficiency is a key issue.
Let's say that I have a vector of N floats and I want to remove the n smallest and the n biggest elements. There are two simple ways of doing this:
A
sort in ascending order // O(NlogN)
remove the last n elements // O(1)
invert elements order // O(N)
remove the last n elements // O(1)
B
sort in ascending order // O(NlogN)
remove the last n elements // O(1)
remove the first n elements // O(N)
In A, inverting the element order requires swapping all the elements, while in B, removing the first n elements requires moving all the others to occupy the positions left empty. Using std::remove would give the same problem.
If I could remove the first n elements for free, then solution B would be cheaper. That should be easy to achieve if, instead of a vector (i.e. an array with some spare space after vector::end()), I had a container with some free space before vector::begin() as well.
So the question is: does an array-like container (i.e. contiguous memory, no linked lists) that allows O(1) insertion/removal at both ends already exist in some library (STL, Boost)?
If not, do you think that there are better solutions than creating such a data structure?
Have you thought of using std::partition with a custom functor like the example below:
#include <iostream>
#include <vector>
#include <algorithm>

template<typename T>
class greaterLess {
    T low;
    T up;
public:
    greaterLess(T const &l, T const &u) : low(l), up(u) {}
    bool operator()(T const &e) const { return !(e < low || e > up); }
};

int main()
{
    std::vector<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};

    auto it = std::partition(v.begin(), v.end(), greaterLess<double>(2.0, 5.0));
    v.erase(it, v.end());

    for (auto i : v) std::cout << i << " ";
    std::cout << std::endl;

    return 0;
}
This way you would erase elements from your vector in O(N) time.
Try boost::circular_buffer:
It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.
Having looked at the source, it seems (and is only logical) that the data is kept as a contiguous memory block.
The one caveat is that the buffer has a fixed capacity, and after it is exhausted, elements will get overwritten. You can either detect such cases yourself and resize the buffer manually, or use boost::circular_buffer_space_optimized with a humongous declared capacity, since it won't allocate memory it doesn't need.
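For illustration, a minimal sketch of trimming both ends with it (it assumes the buffer already holds the values in sorted order; the function name is illustrative):

#include <cstddef>
#include <boost/circular_buffer.hpp>

// drop the n smallest and n largest values from a sorted buffer
void drop_extremes(boost::circular_buffer<float>& buf, std::size_t n)
{
    for (std::size_t i = 0; i < n && buf.size() >= 2; ++i) {
        buf.pop_front();   // constant time at the front
        buf.pop_back();    // constant time at the back
    }
}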
To shrink and grow a vector at both ends, you can use the idea of slices, reserving extra memory at the front and back ahead of time to expand into, if efficient growth is needed.
Simply make a class with not only a length but also indices for the first and last elements, plus a suitably sized vector, to create a window of data onto the underlying block of stored floats. A C++ class can provide inlined functions for things like deleting items, addressing into the array, finding the nth largest value, or shifting the slice values down or up to insert new elements while maintaining sorted order. Should no spare elements be available, dynamically allocating a new, larger float store permits continued growth at the cost of an array copy.
A circular buffer is designed as a FIFO, with new elements added at the end and removed from the front, and with no insertion in the middle; a self-defined class can also (trivially) support array subscript values different from 0..N-1.
Due to memory locality, the avoidance of excessive indirection through pointer chains, and the pipelining of subscript calculations on a modern processor, a solution based on an array (or a vector) is likely to be the most efficient, despite element copying on insertion. A deque would be suitable, but it does not guarantee contiguous storage.
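For illustration, a bare-bones sketch of such a windowed vector (the names are illustrative, not a library class):

#include <cstddef>
#include <utility>
#include <vector>

class FloatWindow {
    std::vector<float> data_;   // the stored floats, owned once
    std::size_t first_ = 0;     // index of the first live element
    std::size_t last_  = 0;     // one past the last live element
public:
    explicit FloatWindow(std::vector<float> v)
        : data_(std::move(v)), last_(data_.size()) {}

    std::size_t size() const { return last_ - first_; }
    float&       operator[](std::size_t i)       { return data_[first_ + i]; }
    const float& operator[](std::size_t i) const { return data_[first_ + i]; }

    void drop_front(std::size_t n) { first_ += n; }   // O(1), no elements move
    void drop_back(std::size_t n)  { last_  -= n; }   // O(1)
};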
Additional supplementary info: researching classes that provide slices finds some plausible alternatives to evaluate:
A) std::slice, which works with std::valarray (producing std::slice_array)
B) Boost.Range
Hope this is the kind of specific information you were hoping for. In general, a simpler, clearer solution is more maintainable than a tricky one. I would expect slices and ranges over sorted data sets to be quite common, for example when filtering experimental data where "outliers" are excluded as faulty readings.
I think a good solution should actually be O(N log N) plus 2xO(1), with any binary searches costing O(log N + 1) when filtering on outlying values rather than deleting a fixed number of small or large values. It also matters that the constant hidden in the "O" is relatively small; sometimes an O(1) algorithm can in practice be slower than an O(N) one for practical values of N.
As a complement to #40two's answer: before partitioning the array, you will need to find the partitioning pivots, i.e. the nth smallest number and the nth greatest number in the unsorted array.
There is a discussion of that on SO: How to find the kth largest number in unsorted array
There are several algorithms to solve this problem. Some are deterministic O(N); one of them is a variation on finding the median (median of medians). There are also randomized algorithms with O(N) average-case complexity.
A good source book for those algorithms is Introduction to Algorithms.
So eventually, your code will run in O(N) time.
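For example, a sketch combining std::nth_element with erasure (it assumes 2n < v.size(); the function name is illustrative):

#include <algorithm>
#include <cstddef>
#include <vector>

// removes the n smallest and n largest values; the order of the rest is unspecified
void trim_extremes(std::vector<float>& v, std::size_t n)
{
    // place the n smallest values in the first n slots, O(N) on average
    std::nth_element(v.begin(), v.begin() + n, v.end());
    // among the rest, place the n largest values in the last n slots
    std::nth_element(v.begin() + n, v.end() - n, v.end());
    // drop both blocks; only the middle block is shifted, once
    v.erase(v.end() - n, v.end());
    v.erase(v.begin(), v.begin() + n);
}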
I am given
struct point
{
    int x;
    int y;
};
and the table of points:
point tab[MAX];
The program should return the minimal distance between the centers of gravity of any possible pair of subsets from tab. A subset can be of any size (of course >= 1 and < MAX).
I am obliged to write this program using recursion.
So my function will be of type int, because I have to return an int.
I declare the variable min globally (because during the recursion I have to compare some values against this min):
int min = 0;
My function should certainly take the number of elements added so far, the sum of the Y coordinates, and the sum of the X coordinates:
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I will be glad for any further help.
I thought about another table of bools which I pass as a parameter to determine whether or not I took a value from the table. Still, my problem is how to implement this; I do not know how even to start.
I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
// SubsetIterator and subsetDistance are hypothetical helpers, described below
int min_distance = INT_MAX;
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
    SubsetIterator si2(&si1, tab);
    while (si2.hasNext())
    {
        int d = subsetDistance(tab, si1.subset(), si2.subset());
        if (d < min_distance)
        {
            min_distance = d;
        }
    }
}
The SubsetIterators can be simple binary counters capable of counting up to 2^MAX, where a 1 bit indicates membership in the subset. Yes, it's quadratic in the number of subsets (and therefore exponential in MAX), but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than running through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. Simple bitwise operations can reveal which these are. To be even more efficient, you could use Gray coding instead of plain binary counting to store the membership bitmap. This would guarantee that at each iteration exactly one value enters and/or leaves the subset. Minimal work.
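For illustration, a compact sketch of the brute-force enumeration with plain bitmasks and precomputed sums (it reuses the point struct from the question, compares squared distances to avoid the square root, and is only viable for small MAX; the names are illustrative):

#include <vector>

// smallest squared distance between the centroids of any two distinct non-empty subsets
double min_centroid_dist_sq(const point* tab, int n)
{
    const int total = 1 << n;                     // number of subset bitmasks
    std::vector<double> sx(total), sy(total), cnt(total);
    for (int mask = 1; mask < total; ++mask) {
        int low = mask & -mask;                   // lowest set bit of the mask
        int idx = 0;
        while (!((low >> idx) & 1)) ++idx;        // index of that bit
        sx[mask]  = sx[mask ^ low] + tab[idx].x;  // reuse the sums of the smaller subset
        sy[mask]  = sy[mask ^ low] + tab[idx].y;
        cnt[mask] = cnt[mask ^ low] + 1;
    }
    double best = 1e300;
    for (int a = 1; a < total; ++a)
        for (int b = a + 1; b < total; ++b) {
            double dx = sx[a] / cnt[a] - sx[b] / cnt[b];
            double dy = sy[a] / cnt[a] - sy[b] / cnt[b];
            double d  = dx * dx + dy * dy;
            if (d < best) best = d;
        }
    return best;
}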
I have an unsorted vector of eigenvalues and a related matrix of eigenvectors. I'd like to sort the columns of the matrix with respect to the sorted set of eigenvalues. (e.g., if eigenvalue[3] moves to eigenvalue[2], I want column 3 of the eigenvector matrix to move over to column 2.)
I know I can sort the eigenvalues in O(N log N) via std::sort. Without rolling my own sorting algorithm, how do I make sure the matrix's columns (the associated eigenvectors) follow along with their eigenvalues as the latter are sorted?
Typically just create a structure something like this:
struct eigen {
    double value;
    double *vector;   // pointer to the start of the associated eigenvector
    bool operator<(eigen const &other) const {
        return value < other.value;
    }
};
Alternatively, just put the eigenvalue/eigenvector into an std::pair -- though I'd prefer eigen.value and eigen.vector over something.first and something.second.
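For illustration, a short sketch of building and sorting such pairs (it assumes the eigenvectors are stored as columns of a column-major n*n array; the names are illustrative):

#include <algorithm>
#include <vector>

// pair each eigenvalue with a pointer to its column, then sort;
// only the pointers move, not the column data
std::vector<eigen> sorted_pairs(const double* evals, double* evects, int n)
{
    std::vector<eigen> pairs(n);
    for (int i = 0; i < n; ++i) {
        pairs[i].value  = evals[i];
        pairs[i].vector = evects + i * n;   // start of the i-th column
    }
    std::sort(pairs.begin(), pairs.end());  // uses eigen::operator<
    return pairs;
}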
I've done this a number of times in different situations. Rather than sorting the array, just create a new array that has the sorted indices in it.
For example, you have a length-n array (vector) evals, and a 2D n x n array evects. Create a new array index that contains the values [0, n-1].
Then, rather than accessing evals as evals[i], you access it as evals[index[i]], and instead of evects[i][j], you access evects[index[i]][j].
Now you write your sort routine to sort the index array rather than the evals array, so instead of index looking like {0, 1, 2, ..., n-1}, the values in the index array will put the values of the evals array in increasing order.
So after sorting, if you do this:
for (int i = 0; i < n; ++i)
{
    cout << evals[index[i]] << endl;
}
you'll get a sorted list of evals.
This way you can sort anything that's associated with that evals array without actually moving memory around. This is important when n gets large: you don't want to be moving around the columns of the evects matrix.
Basically, the i'th smallest eval will be located at evals[index[i]], and that corresponds to the index[i]'th evect.
Edited to add: here's a comparison functor that I've written to work with std::sort to do what I just said:
template <class DataType, class IndexType>
class SortIndicesInc
{
protected:
    DataType* mData;
public:
    SortIndicesInc(DataType* Data) : mData(Data) {}
    bool operator()(const IndexType& i, const IndexType& j) const
    {
        return mData[i] < mData[j];
    }
};
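A brief usage sketch (it assumes evals is a std::vector<double>; the names are illustrative):

#include <algorithm>
#include <numeric>
#include <vector>

// returns the permutation that lists the evals in increasing order
std::vector<int> sorted_order(const std::vector<double>& evals)
{
    std::vector<int> index(evals.size());
    std::iota(index.begin(), index.end(), 0);   // fill with 0, 1, ..., n-1
    std::sort(index.begin(), index.end(),
              SortIndicesInc<const double, int>(evals.data()));
    return index;
}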
The solution depends purely on the way you store your eigenvector matrix.
The best performance while sorting will be achieved if you can implement swap(evector1, evector2) so that it only rebinds the pointers and the real data is left unchanged.
This could be done using something like double*, or probably something more complicated, depending on your matrix implementation.
If done this way, swap(...) wouldn't affect your sorting operation performance.
The idea of conglomerating your vector and matrix is probably the best way to do it in C++. I am thinking about how I would do it in R and seeing whether that can be translated to C++. In R it's very easy: simply evec <- evec[, order(eval)]. Unfortunately, I don't know of any built-in way to perform the order() operation in C++. Perhaps someone else does, in which case this could be done in a similar way.