C++ multidimensional arrays

I was thinking about writing code that creates a Pascal triangle. I've done it, but then I thought about doing it better. One idea came to mind, but I couldn't find a proper answer for it. Is it possible to create an array that looks like this?
[1]|[1][1]|[1][2][1]|[1][3][3][1]|[1][4][6][4][1]| and so on? So my [1] would be (0,0), and [1][2][1] would be the elements of cells (2,0), (2,1), (2,2). I would be grateful for any advice.

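You can implement a triangular array on top of a one-dimensional array. A fixed-size version may look like this:
#include <cstddef>
#include <stdexcept>

template<typename T, std::size_t N>
struct TriangleArray {
    // Row i holds i + 1 elements, so element (i, j) requires j <= i < N.
    T& element(std::size_t i, std::size_t j)
    {
        if (i >= N || j > i)
            throw std::out_of_range("incorrect index");
        // Rows 0..i-1 occupy i * (i + 1) / 2 slots before row i starts.
        return container[(i + 1) * i / 2 + j];
    }
private:
    T container[(N + 1) * N / 2];
};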
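If the number of rows is only known at runtime, the same packing formula works with a std::vector as backing store. The sketch below is an illustration, not part of the original answer: the class name DynamicTriangle and the helper fill_pascal are hypothetical, and the triangle is filled with the usual Pascal rule C(i, j) = C(i-1, j-1) + C(i-1, j).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical runtime-sized variant of TriangleArray: rows 0..rows-1,
// row i holds i + 1 elements, all packed into one std::vector.
template <typename T>
class DynamicTriangle {
public:
    explicit DynamicTriangle(std::size_t rows)
        : rows_(rows), data_(rows * (rows + 1) / 2) {}

    T& at(std::size_t i, std::size_t j) {
        if (i >= rows_ || j > i)
            throw std::out_of_range("incorrect index");
        return data_[i * (i + 1) / 2 + j];  // same packing formula
    }

    std::size_t rows() const { return rows_; }

private:
    std::size_t rows_;
    std::vector<T> data_;
};

// Hypothetical helper: fills the triangle with Pascal's triangle.
template <typename T>
void fill_pascal(DynamicTriangle<T>& tri) {
    for (std::size_t i = 0; i < tri.rows(); ++i) {
        tri.at(i, 0) = 1;  // both ends of every row are 1
        tri.at(i, i) = 1;
        for (std::size_t j = 1; j < i; ++j)
            tri.at(i, j) = tri.at(i - 1, j - 1) + tri.at(i - 1, j);
    }
}
```

For example, after `DynamicTriangle<int> t(5); fill_pascal(t);`, `t.at(4, 2)` is 6, and `t.at(2, 3)` throws because column 3 does not exist in row 2.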

No, it's not possible with a built-in array. In an array, all the elements must have the same type. Two-dimensional arrays are arrays of arrays, which means that in a multidimensional array all the rows must have the same length. You should probably use a
std::vector<std::vector<int> >
here. Or use a one-dimensional array and the logic to compute the 1-D position from the 2-D index:
index = row*(row+1)/2 + column.
See iterate matrix without nested loop if you want the reverse indexing.
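A minimal sketch of the std::vector<std::vector<int> > option: each inner vector is one row, so row r simply holds r + 1 values and no index arithmetic is needed. The function name pascal_jagged is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Builds the first `rows` rows of Pascal's triangle as a jagged
// vector of vectors: row r has exactly r + 1 elements.
std::vector<std::vector<int>> pascal_jagged(std::size_t rows) {
    std::vector<std::vector<int>> tri;
    tri.reserve(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        std::vector<int> row(r + 1, 1);  // both ends are 1
        for (std::size_t c = 1; c < r; ++c)
            row[c] = tri[r - 1][c - 1] + tri[r - 1][c];
        tri.push_back(row);
    }
    return tri;
}
```

Access is then `tri[row][col]`, e.g. `pascal_jagged(5)[4][2]` is 6.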
Edit: fixed my formula, which was off by one. Here is a check in Python:
The following index function takes row and col and computes the corresponding index in a one-dimensional array using my formula (// is integer division, so the results stay integers in Python 3):
>>> index = lambda row, col: row*(row+1)//2 + col
Here are the coordinate pairs
>>> [[(i,j) for j in range(i+1)] for i in range(5)]
[[(0, 0)],
[(1, 0), (1, 1)],
[(2, 0), (2, 1), (2, 2)],
[(3, 0), (3, 1), (3, 2), (3, 3)],
[(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]]
I'm now checking that the corresponding indices form the sequence of integers starting from 0 (the indentation of the printout is mine):
>>> [[index(i,j) for j in range(i+1)] for i in range(5)]
[[0],
[1, 2],
[3, 4, 5],
[6, 7, 8, 9],
[10, 11, 12, 13, 14]]
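The same check in C++, together with one possible reverse mapping from a flat index back to (row, col). The reverse function is an illustrative sketch, not taken from the answer: it finds the largest row whose first index row*(row+1)/2 does not exceed k, and the column is the remainder. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>

// Forward mapping from the answer: (row, col) -> position in a 1-D array.
std::size_t tri_index(std::size_t row, std::size_t col) {
    return row * (row + 1) / 2 + col;
}

// Reverse mapping: flat index k -> (row, col).
std::pair<std::size_t, std::size_t> tri_coords(std::size_t k) {
    std::size_t row = 0;
    // Advance while the first index of the next row is still <= k.
    while ((row + 1) * (row + 2) / 2 <= k)
        ++row;
    return {row, k - row * (row + 1) / 2};
}
```

A linear scan over rows is enough for small triangles; for very large ones a closed form with a square root could replace the loop.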

The nicest thing would be to wrap the whole thing in a class called PascalTriangle and implement it along the following lines:
#include <cassert>
#include <vector>

class PascalTriangle
{
private:
    std::vector<std::vector<int> > m_data;

    std::vector<int> CalculateRow(int row_index) const
    {
        // left as an exercise :)
    }

public:
    explicit PascalTriangle(int num_rows) :
        m_data()
    {
        assert(num_rows >= 0);
        for (int row_index = 0; row_index < num_rows; ++row_index)
        {
            m_data.push_back(CalculateRow(row_index));
        }
    }

    int operator()(int row_index, int column_index) const
    {
        assert(row_index >= 0 && row_index < static_cast<int>(m_data.size()));
        // row r has r + 1 entries, so column_index may equal row_index
        assert(column_index >= 0 && column_index <= row_index);
        return m_data[row_index][column_index];
    }
};
Now here comes the catch: this approach allows you to perform lazy evaluation. Consider the following case: you might not always need each and every value. For example, you may only be interested in the 5th row. Then why store the other, unused values?
Based on this idea, here's an advanced version of the previous class:
class PascalTriangle
{
private:
    int m_num_rows;

    std::vector<int> CalculateRow(int row_index) const
    {
        // left as an exercise :)
    }

public:
    explicit PascalTriangle(int num_rows) :
        m_num_rows(num_rows)
    {
        assert(num_rows >= 0);
        // nothing is done here!
    }

    int operator()(int row_index, int column_index) const
    {
        assert(row_index >= 0 && row_index < m_num_rows);
        // row r has r + 1 entries, so column_index may equal row_index
        assert(column_index >= 0 && column_index <= row_index);
        return CalculateRow(row_index)[column_index];
    }
};
Notice that the public interface of the class remains exactly the same, yet its internals are completely different. Such are the advantages of proper encapsulation. You effectively centralise error handling and optimisation points.
I hope these ideas inspire you to think more about the operations you want to perform with your Pascal triangle, because they will dictate the most appropriate data structure.
Edit: by request, here are some more explanations:
In the first version, m_data is a vector of vectors. Each contained std::vector<int> represents a row in the triangle.
The operator() function is a syntactical helper, allowing you to access PascalTriangle objects like this:
PascalTriangle my_triangle(10);
int i = my_triangle(3, 2);
assert makes sure that your code does not operate on illegal values, e.g. a negative row count or a row index outside the triangle. But this is just one possible error reporting mechanism. You could also use exceptions, or error return values, or the Fallible idiom (std::optional). See past Stack Overflow questions for which error reporting mechanism to use when. This is a pure software-engineering aspect and has nothing to do with maths, but as you can imagine, it's, well, very important in software :)
CalculateRow returns a std::vector<int> representing the row specified by row_index. To implement it correctly, you'll need some maths. This is what I just found on Google: http://www.mathsisfun.com/pascals-triangle.html
In order to apply the maths, you'll want to know how to calculate n! in C++. There have been a lot of past Stackoverflow questions on this, for example here: Calculating large factorials in C++
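Note that factorials overflow a 32-bit int very quickly (13! already exceeds 2^31 - 1). One way to sidestep them, swapped in here as an alternative to the factorial route, is the multiplicative rule C(n, k+1) = C(n, k) * (n - k) / (k + 1), whose division is always exact. A minimal sketch of a CalculateRow-style helper, assuming n >= 0; the name calculate_row is hypothetical, and the intermediate product only fits in long long up to roughly n = 60.

```cpp
#include <vector>

// Hypothetical CalculateRow helper: returns row n of Pascal's triangle
// (n + 1 entries) without computing any factorials.
// C(n, k + 1) = C(n, k) * (n - k) / (k + 1); the division is exact
// because C(n, k) * (n - k) == C(n, k + 1) * (k + 1).
std::vector<long long> calculate_row(int n) {
    std::vector<long long> row(n + 1);
    row[0] = 1;
    for (int k = 0; k < n; ++k)
        row[k + 1] = row[k] * (n - k) / (k + 1);
    return row;
}
```

For instance, `calculate_row(4)` yields 1, 4, 6, 4, 1.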
Note that with the class approach, you can easily switch to another implementation later on. (You can even take it to the extreme and switch to a specific calculation algorithm based on the triangle height, without the users of the class ever noticing anything! See how powerful proper encapsulation can be?)
In the second version of the class, there is no permanent data storage anymore. CalculateRow is called only if and when needed, but the client of the class doesn't know this. As an additional, possibly performance-improving measure, you could remember rows you have already calculated, for example by adding a private std::map<int, std::vector<int> > member variable whose int key is the row index and whose values are the rows. Every CalculateRow call would then first check whether the result is already there, and store newly calculated rows before returning them:
private:
    mutable std::map<int, std::vector<int> > m_cache;

    std::vector<int> CalculateRow(int row_index) const
    {
        // find the element at row_index:
        std::map<int, std::vector<int> >::const_iterator cache_iter =
            m_cache.find(row_index);
        // is it there?
        if (cache_iter != m_cache.end())
        {
            // return its value, no need to calculate it again:
            return cache_iter->second;
        }
        std::vector<int> result;
        // actual calculation of result left as an exercise :)
        m_cache[row_index] = result;
        return result;
    }
By the way, this would also be a nice application of the new C++11 auto keyword. For example, you'd then just write auto cache_iter = m_cache.find(row_index);
One more edit: I made m_cache mutable, because otherwise the code wouldn't compile: CalculateRow is a const member function (i.e. it shouldn't change an object of the class from the client's point of view). This is a typical idiom for cache member variables.
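Putting the pieces of this answer together, a compilable sketch of the lazy, caching variant might look like the following. Assumptions: the class is renamed LazyPascalTriangle, values are long long instead of int, rows are computed with the additive rule from the previous (cached) row, and cached_rows() is a hypothetical helper that only exists to show when the cache is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

class LazyPascalTriangle {
public:
    explicit LazyPascalTriangle(int num_rows) : m_num_rows(num_rows) {
        assert(num_rows >= 0);
        // nothing is computed here
    }

    long long operator()(int row_index, int column_index) const {
        assert(row_index >= 0 && row_index < m_num_rows);
        assert(column_index >= 0 && column_index <= row_index);
        return CalculateRow(row_index)[column_index];
    }

    // Number of rows computed so far (for illustration only).
    std::size_t cached_rows() const { return m_cache.size(); }

private:
    int m_num_rows;
    // mutable so that the const CalculateRow can fill the cache
    mutable std::map<int, std::vector<long long>> m_cache;

    const std::vector<long long>& CalculateRow(int row_index) const {
        auto cache_iter = m_cache.find(row_index);
        if (cache_iter != m_cache.end())
            return cache_iter->second;  // cache hit: no recalculation
        std::vector<long long> result(row_index + 1, 1);
        if (row_index >= 2) {
            // std::map references stay valid when other keys are inserted
            const std::vector<long long>& prev = CalculateRow(row_index - 1);
            for (int c = 1; c < row_index; ++c)
                result[c] = prev[c - 1] + prev[c];
        }
        return m_cache[row_index] = result;
    }
};
```

Asking for row 5 computes and caches rows 1 to 5 once; later queries on those rows are plain map lookups.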

Related

Correct data structure for user-specified integer mapping into rows of a matrix

I have a C++ class that manipulates an NxM matrix. The rows individually are meaningful, but the C++ contiguous indexing [0,1,2,...,N-1] is not. The users find it preferable to choose an indexing which has meaning to them, e.g., for a 3 row matrix, the user may wish to have the integer -3 label row zero, -1 label row 1, and 3 label row 2.
I may assume that 1) the labels are integers, 2) the labels are monotonically increasing, and 3) the number of rows is not huge. I may not assume the labels are contiguous, or even evenly spaced. The pseudocode is below:
template<typename T>
class Foo {
public:
Foo(std::vector<int> labels, int columns) {
m_.resize(labels.size()*columns);
}
void update(int label, T value) {
// map label to index, update the entry in the matrix:
int idx = ...;
m_[idx] = value;
}
std::vector<T> get_row(int label) {
// Map label
}
private:
// A matrix:
std::vector<T> m_;
// What datastructure should I use here?
SomeDataStructure label_to_row_;
};
The call to update must be extremely fast. What is the best datastructure to use to quickly map the label to the row of the matrix?
Theoretically speaking, hash maps are the fastest containers for what you're trying to achieve, with O(1) average lookup complexity. But in practice, there are a couple of things you can do.
First of all, you can have multiple implementations using different data structures and pick one of them at runtime based on the given labels (using abstract classes or similar techniques). You can do this with the structures I propose below.
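As a baseline, the hash-map option could be sketched as below; it places no assumptions on spacing or range. The class and method names (HashLabelIndex, row_of) are hypothetical; only std::unordered_map is standard.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each user label to its matrix row.
// Average O(1) lookup; labels need not be contiguous or evenly spaced.
class HashLabelIndex {
public:
    explicit HashLabelIndex(const std::vector<int>& labels) {
        map_.reserve(labels.size());
        for (std::size_t row = 0; row < labels.size(); ++row)
            map_.emplace(labels[row], row);
    }
    // Throws std::out_of_range for a label that was never registered.
    std::size_t row_of(int label) const { return map_.at(label); }

private:
    std::unordered_map<int, std::size_t> map_;
};
```

With the question's example labels -3, -1, 3, `row_of(-1)` returns 1; update(label, value) would then write to row row_of(label).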
If you know that the range of data is small (or you can detect it at runtime), then the problem is easy. Just create a vector whose size covers the range of the data and store each label's position in it:
#include <algorithm>
#include <vector>

std::vector<int> indices = {/*data*/};
auto minmax = std::minmax_element(indices.begin(), indices.end());
int min = *minmax.first, max = *minmax.second, range = max - min;
// range + 1 slots: the largest label maps to slot max - min == range
std::vector<int> index_map(range + 1);
for (size_t i = 0; i < indices.size(); ++i) index_map[indices[i] - min] = i;
In other words, index_map[label - min] holds the row for label, so each lookup costs one subtraction and one array access.
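Wrapped into a small class, the direct-table idea might look like this. It is a sketch under the stated assumptions (non-empty labels, small max - min); the name DenseLabelIndex and the -1 "not a label" marker are illustrative choices, not part of the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical dense lookup: table_[label - min_] holds the row of `label`.
class DenseLabelIndex {
public:
    explicit DenseLabelIndex(const std::vector<int>& labels) {
        // assumes labels is non-empty and max - min is small
        auto mm = std::minmax_element(labels.begin(), labels.end());
        min_ = *mm.first;
        // max - min + 1 slots; -1 marks values that are not labels
        table_.assign(static_cast<std::size_t>(*mm.second - min_) + 1, -1);
        for (std::size_t row = 0; row < labels.size(); ++row)
            table_[labels[row] - min_] = static_cast<int>(row);
    }
    // One subtraction and one array access; no bounds check, for speed.
    int row_of(int label) const { return table_[label - min_]; }

private:
    int min_ = 0;
    std::vector<int> table_;
};
```

For labels -3, -1, 3 the table has 7 slots, and an unused in-range value such as 0 maps to -1.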
If your range of data is large but the minimum spacing between labels is also larger than 1, you can use the previous method with a small modification:
#include <algorithm>
#include <limits>
#include <vector>

std::vector<int> indices = {/*data*/};
auto minmax = std::minmax_element(indices.begin(), indices.end());
int min = *minmax.first, max = *minmax.second, range = max - min;
// Assuming indices are sorted ascending and hold at least two distinct labels
int diff = std::numeric_limits<int>::max();
for (size_t i = 0; i + 1 < indices.size(); ++i) diff = std::min(diff, indices[i + 1] - indices[i]);
// diff can't be zero, since the labels are distinct
std::vector<int> index_map(range / diff + 1);
for (size_t i = 0; i < indices.size(); ++i) index_map[(indices[i] - min) / diff] = i;
Here we find the minimum spacing between indices and divide by that.
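A self-contained sketch of the strided table, assuming sorted, distinct, non-empty labels. Distinct labels cannot share a slot: consecutive labels differ by at least diff, so their quotients (label - min) / diff differ by at least 1. The struct name StridedLabelIndex is hypothetical, and row_of is only meaningful for labels passed to the constructor.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct StridedLabelIndex {
    int min_label = 0;
    int diff = 1;
    std::vector<int> table;  // -1 marks slots that belong to no label

    explicit StridedLabelIndex(const std::vector<int>& labels) {
        min_label = labels.front();
        diff = std::numeric_limits<int>::max();
        for (std::size_t i = 0; i + 1 < labels.size(); ++i)
            diff = std::min(diff, labels[i + 1] - labels[i]);
        if (labels.size() < 2) diff = 1;  // single label: any stride works
        table.assign(static_cast<std::size_t>((labels.back() - min_label) / diff) + 1, -1);
        for (std::size_t row = 0; row < labels.size(); ++row)
            table[(labels[row] - min_label) / diff] = static_cast<int>(row);
    }

    int row_of(int label) const { return table[(label - min_label) / diff]; }
};
```

For labels 0, 10, 25, 40 the smallest gap is 10, so the table needs only 5 slots instead of 41.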
Use a third-party hash map that is optimized further (using vectorization, multi-threading, and other methods).
Maybe you can try a weaker but faster hash function, since the number of labels is not large.
I'll add to the list if I think of anything else.

Vectorize a Symmetric Matrix

I would like to write a function with the following signature
VectorXd vectorize (const MatrixXd&);
which returns the contents of a symmetric matrix in VectorXd form, without repeated elements. For example,
int n = 3; // n may be much larger in practice.
MatrixXd sym(n, n);
sym << 9, 2, 3,
2, 8, 4,
3, 4, 7;
std::cout << vectorize(sym) << std::endl;
should return:
9
2
3
8
4
7
The order of elements within the returned vector is not important, provided it is systematic. What is important for my purposes is to return the data of sym without the repeated elements, because sym is always assumed to be symmetric. That is, I want to return the elements of the upper or lower triangular "view" of sym in VectorXd form.
I have naively implemented vectorize with nested for loops, but this function may be called very often within my program (over 1 million times). My question is thus: what is the most computationally efficient way to write vectorize? I was hoping to use Eigen's triangularView, but I do not see how.
Thank you in advance.
Regarding efficiency, you could write a single for loop with column-wise (and thus vectorized) copies:
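#include <Eigen/Dense>
using namespace Eigen;

// Assumes mat is square; packs the lower triangle column by column.
VectorXd vectorize(const MatrixXd& mat)
{
    VectorXd res(mat.rows()*(mat.cols()+1)/2);
    Index size = mat.rows();
    Index offset = 0;
    for(Index j=0; j<mat.cols(); ++j) {
        // copy rows j..n-1 of column j in one vectorized operation
        res.segment(offset,size) = mat.col(j).tail(size);
        offset += size;
        size--;
    }
    return res;
}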
In practice, I expect the compiler has already fully vectorized your nested loop, so the speed should be roughly the same.
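To make the index pattern of the column-wise copy explicit, here is a plain C++ illustration without Eigen. It assumes the n x n matrix is stored column-major in a flat std::vector<double> (Eigen's default layout); for each column j it appends rows j..n-1, exactly as `mat.col(j).tail(size)` does. The function name vectorize_lower is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// colmajor holds an n x n matrix column by column.
std::vector<double> vectorize_lower(const std::vector<double>& colmajor, std::size_t n) {
    std::vector<double> res;
    res.reserve(n * (n + 1) / 2);
    for (std::size_t j = 0; j < n; ++j) {
        const double* col = colmajor.data() + j * n;  // start of column j
        res.insert(res.end(), col + j, col + n);      // rows j..n-1 of column j
    }
    return res;
}
```

For the question's 3x3 example this returns 9, 2, 3, 8, 4, 7, matching the expected output.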

C++ sort with 'tweaked' compare functor

I have a class functor (too complex to implement as a lambda), but to strip the idea down, I want to ensure the functor satisfies the Compare predicate. The issue is, I want all values larger than (1) to yield ascending order, but to place all values of (1) at the 'end' - e.g., treated as 'larger' values.
e.g., {2, 2, 2, 3, 3, 3, 4, 5, 6, 6, ..., 1, 1, 1}
The function object is implemented as a struct to extract arguments from a (complicated) object reference it is constructed with, but the important part is the method in the function object. To simplify:
bool operator () (unsigned i, unsigned j)
{
if (i == 1) return false; // (1 >= x)
if (j == 1) return true; // (x <= 1)
return (i < j);
}
This appears to work as expected with std::sort and std::stable_sort. But I'm still not convinced it correctly satisfies the criteria for Compare, in terms of strict weak ordering. Note that, under this ordering, x <= 1 in all cases, that is, for all i, j >= 1. Clearly, (1, 1) => false.
Is my 'tweaked' functor correct, even as it places values of (1) at the end? That is (1) has been handled to be interpreted as greater than values x > 1? Or have I just been lucky with my sort implementations?
As I should have clarified, the value (0) does not occur. I originally had this in a comment for the (very clever) accepted answer but mistakenly deleted it.
If you can define a bijective operation in which the comparison is total/weak order then you are fine.
It turns out that for your type (unsigned) this bijection is simply subtracting 2 (and adding 2 to invert it):
bool operator()(unsigned i, unsigned j) const {
    return (i-2) < (j-2); // unsigned arithmetic wraps around below 0, so 1 - 2 becomes the maximum value
}
Well, that also depends what you want to do with zero.
This relies on 1 - 2 == std::numeric_limits<unsigned>::max(), so when you "compare" e.g. 1 with x you get std::numeric_limits<unsigned>::max() < x - 2, which is false even if x is also 1. (A 0, if there were one, would wrap to std::numeric_limits<unsigned>::max() - 1 and therefore sort after every value >= 2 but before the 1s.)
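A quick way to convince yourself is to sort with the shifted comparator. The struct name OnesLast and the wrapper sort_ones_last are hypothetical; the comparison itself is the one from the answer.

```cpp
#include <algorithm>
#include <vector>

// Subtracting 2 with unsigned wraparound maps 2, 3, 4, ... to 0, 1, 2, ...
// and maps 1 to the maximum unsigned value, so a plain < on the shifted
// values is a strict weak ordering that puts every 1 after all larger values.
struct OnesLast {
    bool operator()(unsigned i, unsigned j) const { return (i - 2) < (j - 2); }
};

std::vector<unsigned> sort_ones_last(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end(), OnesLast{});
    return v;
}
```

For example, sorting 3, 1, 2, 6, 1, 4, 2 gives 2, 2, 3, 4, 6, 1, 1.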

How to get intersection of two Arrays

I have two integer arrays
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
And I have to get the intersection of these arrays,
i.e. the output should be 2, 6, 7.
I am thinking of solving it by saving array A in a data structure; then I want to compare all the elements till size A or B, and then I will get the intersection.
Now I have a problem: I need to first store the elements of array A in a container.
Should I do it like this?
int size = sizeof(A)/sizeof(int);
That gives me the size, but after that I want to access all the elements too and store them in a container.
Here is the code which I am using to find the intersection:
#include <iostream>
using namespace std;

int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};

int main()
{
    int sizeA = sizeof(A)/sizeof(int);
    int sizeB = sizeof(B)/sizeof(int);
    // each loop bound must match the array it indexes: i for A, j for B
    for (int i = 0; i < sizeA; ++i)
    {
        for (int j = 0; j < sizeB; ++j)
        {
            if (A[i] == B[j])
            {
                cout << "Element is -->" << A[i] << endl;
            }
        }
    }
    return 0;
}
Just use a hash table:
#include <unordered_set> // needs C++11 or TR1
// ...
unordered_set<int> setOfA(A, A + sizeA);
Then you can just check for every element in B, whether it's also in A:
for (int i = 0; i < sizeB; ++i) {
if (setOfA.find(B[i]) != setOfA.end()) {
cout << B[i] << endl;
}
}
The expected runtime is O(sizeA + sizeB).
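A self-contained variant of the hash-table answer that returns the intersection instead of printing it. One deviation, named plainly: it erases each match from the set, so a value repeated in B is reported only once. The function name intersect_hash is hypothetical.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

std::vector<int> intersect_hash(const int* A, std::size_t sizeA,
                                const int* B, std::size_t sizeB) {
    std::unordered_set<int> setOfA(A, A + sizeA);  // expected O(sizeA)
    std::vector<int> result;
    for (std::size_t i = 0; i < sizeB; ++i) {
        if (setOfA.erase(B[i]) > 0)  // found in A: report it once
            result.push_back(B[i]);
    }
    return result;
}
```

With the question's arrays the result is 2, 7, 6 (in B's order).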
You can sort the two arrays
sort(A, A+sizeA);
sort(B, B+sizeB);
and use a merge-like algorithm to find their intersection:
#include <vector>
...
std::vector<int> intersection;
int idA=0, idB=0;
while(idA < sizeA && idB < sizeB) {
if (A[idA] < B[idB]) idA ++;
else if (B[idB] < A[idA]) idB ++;
else { // => A[idA] == B[idB], we have a common element
intersection.push_back(A[idA]);
idA ++;
idB ++;
}
}
The time complexity of this part of the code is linear. However, due to the sorting of the arrays, the overall complexity becomes O(n * log n), where n = max(sizeA, sizeB).
The additional memory required for this algorithm is optimal (equal to the size of the intersection).
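The standard library already packages this sort-then-merge walk as std::set_intersection, which needs sorted input ranges. A minimal sketch (the wrapper name intersect_sorted is hypothetical); note that with duplicates it keeps each common value min(countA, countB) times.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

std::vector<int> intersect_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    // Same linear merge-like walk as the loop above.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```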
saving array A in a data structure
Arrays are data structures; there's no need to save A into one.
I want to compare all the elements till size A or B and then I will get the intersection
This is extremely vague but isn't likely to yield the intersection; notice that you must examine every element in both A and B but "till size A or B" will ignore elements.
What approach should I follow to get the size of an unknown-size array and store it in a container?
It isn't possible to deal with arrays of unknown size in C unless they have some end-of-array sentinel that allows counting the number of elements (as is the case with NUL-terminated character arrays, commonly referred to in C as "strings"). However, the sizes of your arrays are known because their compile-time sizes are known. You can calculate the number of elements in such arrays with a macro:
#define ARRAY_ELEMENT_COUNT(a) (sizeof(a)/sizeof *(a))
...
int *ptr = new sizeof(A);
[Your question was originally tagged [C], and my comments below refer to that]
This isn't valid C -- new is a C++ keyword.
If you wanted to make copies of your arrays, you could simply do it with, e.g.,
int Acopy[ARRAY_ELEMENT_COUNT(A)];
memcpy(Acopy, A, sizeof A);
or, if for some reason you want to put the copy on the heap,
int* pa = malloc(sizeof A);
if (!pa) {
    /* handle out-of-memory, e.g. report the error and return */
}
memcpy(pa, A, sizeof A);
/* After you're done using pa: */
free(pa);
[In C++ you would use new and delete]
However, there's no need to make copies of your arrays in order to find the intersection, unless you need to sort them (see below) but also need to preserve the original order.
There are a few ways to find the intersection of two arrays. If the values fall within the range 0-63, you can use two 64-bit unsigned integers (e.g. uint64_t; note that unsigned long is only 32 bits on some platforms), set the bits corresponding to the values in each array, then use & (bitwise "and") to find the intersection. If the values aren't in that range but the difference between the largest and smallest is < 64, you can use the same method but subtract the smallest value from each value to get the bit number. If the range is not that small but the number of distinct values is <= 64, you can maintain a lookup table (array, binary tree, hash table, etc.) that maps the values to bit numbers and a 64-element array that maps bit numbers back to values.
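The first case (all values in 0-63) can be sketched in C++ as follows; the C version is the same with uint64_t from <stdint.h>. The helper names mask_of and intersect_bits are hypothetical, and the code assumes every value is in [0, 63].

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit v of the mask records whether value v occurs in the array.
std::uint64_t mask_of(const int* a, std::size_t n) {
    std::uint64_t m = 0;
    for (std::size_t i = 0; i < n; ++i)
        m |= std::uint64_t{1} << a[i];  // assumes 0 <= a[i] <= 63
    return m;
}

std::vector<int> intersect_bits(const int* A, std::size_t sizeA,
                                const int* B, std::size_t sizeB) {
    std::uint64_t common = mask_of(A, sizeA) & mask_of(B, sizeB);
    std::vector<int> result;
    for (int v = 0; v < 64; ++v)
        if (common & (std::uint64_t{1} << v))
            result.push_back(v);  // ascending order, duplicates collapsed
    return result;
}
```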
If your arrays may contain more than 64 distinct values, there are two effective approaches:
1) Sort each array and then compare them element by element to find the common values -- this algorithm resembles a merge sort.
2) Insert the elements of one array into a fast lookup table (hash table, balanced binary tree, etc.), and then look up each element of the other array in the lookup table.
Sort both arrays (e.g., qsort()) and then walk through both arrays one element at a time.
Where there is a match, add it to a third array. The intersection can be no larger than the smaller of the two input arrays, so sizing the third array one element larger than the smaller input leaves room for a terminator; use a negative or other "dummy" value as that terminator.
When walking through the input arrays, if the current value in the first array is larger than the current value in the second, advance the index of the second array, and vice versa. On a match, advance both.
When you're done walking through both arrays, your third array has your answer, up to the terminator value.

Best way to extract a subvector from a vector?

Suppose I have a std::vector (let's call it myVec) of size N. What's the simplest way to construct a new vector consisting of a copy of elements X through Y, where 0 <= X <= Y <= N-1? For example, myVec [100000] through myVec [100999] in a vector of size 150000.
If this cannot be done efficiently with a vector, is there another STL datatype that I should use instead?
vector<T>::const_iterator first = myVec.begin() + 100000;
vector<T>::const_iterator last = myVec.begin() + 101000;
vector<T> newVec(first, last);
Constructing the new vector is O(M), where M is the number of copied elements, but if you need a copy there isn't really a better way.
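As a reusable sketch of the iterator-range constructor, using the question's inclusive X..Y convention (hence the + 1 on the end iterator). The helper name subvector and its bounds check are illustrative additions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copies elements X through Y inclusive into a new vector.
template <typename T>
std::vector<T> subvector(const std::vector<T>& v, std::size_t X, std::size_t Y) {
    if (X > Y || Y >= v.size())
        throw std::out_of_range("subvector: need X <= Y < v.size()");
    // The end iterator points one past Y.
    return std::vector<T>(v.begin() + X, v.begin() + Y + 1);
}
```

For the question's example, `subvector(myVec, 100000, 100999)` returns 1000 elements.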
Just use the vector constructor.
std::vector<int> data;  // "data()" here would declare a function instead
// Load Z elements into data so that Z > Y > X
std::vector<int> sub(data.data() + 100000, data.data() + 101000);
This discussion is pretty old, but the simplest one isn't mentioned yet, with list-initialization:
vector<int> subvector = {big_vector.begin() + 3, big_vector.end() - 2};
It requires C++11 or above.
Example usage:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main(){
vector<int> big_vector = {5,12,4,6,7,8,9,9,31,1,1,5,76,78,8};
vector<int> subvector = {big_vector.begin() + 3, big_vector.end() - 2};
cout << "Big vector: ";
for_each(big_vector.begin(), big_vector.end(),[](int number){cout << number << ";";});
cout << endl << "Subvector: ";
for_each(subvector.begin(), subvector.end(),[](int number){cout << number << ";";});
cout << endl;
}
Result:
Big vector: 5;12;4;6;7;8;9;9;31;1;1;5;76;78;8;
Subvector: 6;7;8;9;9;31;1;1;5;76;
Use the std::vector<T>(input_iterator, input_iterator) constructor; in your case: foo = std::vector<T>(myVec.begin() + 100000, myVec.begin() + 101000);
These days, we use spans! So you would write:
#include <gsl/span>
...
auto start_pos = 100000;
auto length = 1000;
auto span_of_myvec = gsl::make_span(myvec);
auto my_subspan = span_of_myvec.subspan(start_pos, length);
to get a span of 1000 elements of the same type as myvec's. Or a more terse form:
auto my_subspan = gsl::make_span(myvec).subspan(100000, 1000);
(but I don't like this as much, since the meaning of each numeric argument is not entirely clear; and it gets worse if the length and start_pos are of the same order of magnitude.)
Anyway, remember that this is not a copy, it's just a view of the data in the vector, so be careful. If you want an actual copy, you could do:
std::vector<T> new_vec(my_subspan.begin(), my_subspan.end());
Notes:
gsl stands for Guidelines Support Library. For more information about gsl, see: http://www.modernescpp.com/index.php/c-core-guideline-the-guidelines-support-library.
There are several gsl implementations. For example: https://github.com/martinmoene/gsl-lite
C++20 provides an implementation of span. You would use std::span and #include <span> rather than #include <gsl/span>.
For more information about spans, see: What is a "span" and when should I use one?
std::vector has a gazillion constructors, it's super-easy to fall into one you didn't intend to use, so be careful.
If both are not going to be modified (no adding/deleting items - modifying existing ones is fine as long as you pay heed to threading issues), you can simply pass around data.begin() + 100000 and data.begin() + 101000, and pretend that they are the begin() and end() of a smaller vector.
Or, since vector storage is guaranteed to be contiguous, you can simply pass around a 1000 item array:
T *arrayOfT = &data[0] + 100000;
size_t arrayOfTLength = 1000;
Both of these techniques take constant time, but they require that data is not resized in a way that triggers a reallocation, which would invalidate the iterators or the pointer.
You didn't mention what type your std::vector<...> myVec holds, but if it's a trivially copyable type (e.g. a plain int or a simple struct without owning pointers) and you want the best efficiency, you can do a direct memory copy (which I think may be faster than the other answers, although std::copy on such types often compiles down to the same memmove). Here is a general example for std::vector<type> myVec, where type in this case is int:
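A small illustration of the pointer-plus-length technique: the callee sees "a smaller vector" without any copying. The function name sum_range is hypothetical; any function taking (pointer, length) works the same way.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Reads `length` elements starting at `first`; nothing is copied.
long long sum_range(const int* first, std::size_t length) {
    return std::accumulate(first, first + length, 0LL);
}
```

Called as `sum_range(data.data() + 100000, 1000)`, it works on elements 100000 to 100999 of data; a push_back that reallocates data would leave that pointer dangling.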
#include <cstring> // memcpy
typedef int type; //choose your custom type/struct/class
int iFirst = 100000; //first index to copy
int iLast = 101000; //last index + 1
int iLen = iLast - iFirst;
std::vector<type> newVec;
newVec.resize(iLen); //pre-allocate the space needed to write the data directly
memcpy(&newVec[0], &myVec[iFirst], iLen*sizeof(type)); //write directly to destination buffer from source buffer
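You could just use insert:
vector<type> myVec(n_elements);  // parentheses: n_elements elements, not a one-element list
vector<type> newVec;
// myVec.begin() + Y + 1 so that element Y itself is included
newVec.insert(newVec.begin(), myVec.begin() + X, myVec.begin() + Y + 1);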
You can use STL copy with O(M) performance when M is the size of the subvector.
Suppose there are two vectors.
vector<int> vect1{1, 2, 3, 4};
vector<int> vect2;
Method 1. Using the copy function. copy(first_iterator, last_iterator, back_inserter(destination)) takes 3 arguments: the first iterator of the old vector, the last iterator of the old vector, and a back_inserter for the new vector, which inserts the values at its back.
// Copying vector by copy function
copy(vect1.begin(), vect1.end(), back_inserter(vect2));
Method 2. Using the assign function. assign(first_iterator, last_iterator) gives the new vector the same values as the old one. It takes 2 arguments: the first and the last iterator of the old vector.
//Copying vector by assign function
vect2.assign(vect1.begin(), vect1.end());
The only way to project a collection that is not linear time is to do so lazily, where the resulting "vector" is actually a subtype which delegates to the original collection. For example, Scala's List#subseq method creates a sub-sequence in constant time. However, this only works if the collection is immutable and if the underlying language sports garbage collection.
Maybe the array_view/span in the GSL library is a good option.
Here is also a single file implementation: array_view.
Copy elements from one vector to another easily
In this example, I am using a vector of pairs to make it easy to understand
vector<pair<int, int> > v(n);
//we want half of the elements in vector a and the other half in vector b
vector<pair<int, int> > a(v.begin(), v.begin()+n/2);
vector<pair<int, int> > b(v.begin()+n/2, v.end());
//if v = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
//then a = [(1, 2), (2, 3)]
//and b = [(3, 4), (4, 5), (5, 6)]
//if v = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
//then a = [(1, 2), (2, 3), (3, 4)]
//and b = [(4, 5), (5, 6), (6, 7)]
As you can see, you can easily copy elements from one vector to another. If you want to copy the elements from index 10 up to (but not including) index 16, for example, then you would use
vector<pair<int, int> > a(v.begin()+10, v.begin()+16);
and if you want the elements from index 10 up to some index counted from the end, then
vector<pair<int, int> > a(v.begin()+10, v.end()-5);
Hope this helps; just remember that in the last case v.begin()+10 must not be past v.end()-5.
Yet another option:
Useful for instance when moving between a thrust::device_vector and a thrust::host_vector, where you cannot use the constructor.
std::vector<T> newVector;
newVector.reserve(1000);
std::copy_n(&vec[100000], 1000, std::back_inserter(newVector));
The complexity should also be O(N).
You could combine this with the top answer's code:
vector<T>::const_iterator first = myVec.begin() + 100000;
vector<T>::const_iterator last = myVec.begin() + 101000;
std::copy(first, last, std::back_inserter(newVector));
vector::assign may be another solution
// note: size1 <= src.size() && size2 <= src.size()
std::vector<int> sub1, sub2;  // assign replaces the contents anyway
sub1.assign(src.begin(), src.begin() + size1);
sub2.assign(src.begin(), src.begin() + size2);
Posting this late just for others; I bet the original poster is done by now.
For simple datatypes no copy is needed, just revert to good old C code methods.
std::vector <int> myVec;
int *p;
// Add some data here and set start, then
p=myVec.data()+start;
Then pass the pointer p and a len to anything needing a subvector.
Note: len must satisfy len <= myVec.size() - start.