Efficiently inserting column/row into a matrix stored in row/col-major vector in-place - c++

It's not hard to efficiently insert a row or column into a matrix stored in a row- or col-major (respectively) vector. The problem of inserting a row into a col-major vector or a column into a row-major vector is slightly more interesting.
For example, given a 2x3 matrix stored in row-major in vector:
1 2 3 <=> 1 2 3 4 5 6
4 5 6
and a column 7 8 that is inserted after column 1 of the original matrix (i.e. at column index 1), we get:
1 7 2 3 <=> 1 7 2 3 4 8 5 6
4 8 5 6
[Inserting a row into a col-major vector is similar.]
The sample setup in C++:
auto m = 2; // #rows
auto n = 3; // #cols
// row-major vector
auto x = std::vector<double>{1,2,3,4,5,6};
auto const colIndex = 1;
auto const col = std::vector<double>{7,8};
// insert column {7,8} into the 2nd position
// =>{1,7,2,3,4,8,5,6}
There could be various options to achieve this algorithmically and in C++, but we're looking for the efficiency and scalability to large matrices and multiple inserts.
The first obvious option that I can think of is to use std::vector<double>::insert to insert new elements to the correct positions:
//option 1: insert in-place
x.reserve(m*(n+1));
for(auto i = 0; i < col.size(); i++)
x.insert(begin(x) + colIndex + i * (n + 1), col[i]);
This is valid but extremely slow even for moderate data sizes because of the resizing and shifting on each iteration.
Another, more direct option is to create another vector, populate all the columns in the ranges [0, colIndex), colIndex, (colIndex, n], and swap it with the original vector:
// option 2: temp vec and swap
{
auto tmp = std::vector<double>(m*(n+1));
for(auto i = 0; i < m; i++)
{
for(auto j = 0; j < colIndex; j++)
tmp[j + i * (n + 1)] = x[j + i * n];
tmp[colIndex + i * (n + 1)] = col[i];
for(auto j = colIndex + 1; j < n + 1; j++)
tmp[j + i * (n + 1)] = x[(j - 1) + i * n];
}
std::swap(tmp, x);
};
This is much faster than option 1, but it requires extra space for the matrix copy and iterates over all elements.
Are there any other ways to achieve this that would beat the above in speed/space or both?
Example code on ideone: https://ideone.com/iXrPfF

This version is likely to be much faster, especially at scale, and could be the basis for further micro-optimization (if [and only if] really necessary):
// one-time reallocation of the vector to get space for the new column
x.resize(x.size() + col.size());
// we'll start shifting elements over from the right
double *from = x.data() + m * n; // one past the last original element
const double *src = col.data() + m; // one past the end of the new column
double *to = from + m;
size_t R = n - colIndex; // number of cols right of the insert
size_t L = colIndex; // number of cols left of the insert
while (to != &x[0]) {
for (size_t i = 0; i < R; ++i) *(--to) = *(--from);
*(--to) = *(--src); // insert value from new column
for (size_t i = 0; i < L; ++i) *(--to) = *(--from);
}
This doesn't require any temporary allocation beyond the one-time resize, and aside from possible micro-optimizations of the loop it's probably about as fast as it gets. To understand how it works, we can start by observing that the bottom-right element of the original matrix is shifted m elements to the right in the source vector. Working backwards from the last element, at some point a value from the new column gets inserted, and subsequent elements from the source vector are then shifted only m - 1 elements to the right. Using that logic we simply construct a 3-phase loop that works from right to left on the source vector. The loop iterates m times, once for each row. The three phases of the loop, corresponding to its three lines of code, are:
Shift row elements that are "to the right" of the insertion point.
Insert the row value from the new column.
Shift row elements that are "to the left" of the insertion point (shifting one less place than in phase 1).
There's also serious room for improvement in the naming of the variables, and the algorithm should certainly be encapsulated in its own function with proper input parameters. One possible signature would be:
void insert_column(std::vector<double>& matrix,
size_t rows, size_t columns, size_t insertBefore,
const std::vector<double>& column);
From here there's further room for improvement in making it generic using templates.
And from there, you might observe that the algorithm has possible application beyond matrices. What's really happening is that you're "zippering" two vectors together with a skip and an offset (i.e., starting at element i, insert an element from B into A after every n'th element).
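As a minimal sketch of that encapsulation, the right-to-left shift can be written against the signature suggested above. It uses indices instead of raw pointers and adds assert-based precondition checks; both are choices of this sketch rather than part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts `column` before column index `insertBefore` of a rows x columns
// row-major matrix, shifting elements from right to left in place.
void insert_column(std::vector<double>& matrix,
                   std::size_t rows, std::size_t columns, std::size_t insertBefore,
                   const std::vector<double>& column)
{
    assert(matrix.size() == rows * columns);
    assert(column.size() == rows);
    assert(insertBefore <= columns);

    matrix.resize(rows * (columns + 1));              // at most one reallocation
    std::size_t from = rows * columns;                // one past the last old element
    std::size_t to = matrix.size();                   // one past the last new slot
    const std::size_t right = columns - insertBefore; // columns right of the insert
    const std::size_t left = insertBefore;            // columns left of the insert

    for (std::size_t r = rows; r-- > 0;) {
        for (std::size_t i = 0; i < right; ++i) matrix[--to] = matrix[--from];
        matrix[--to] = column[r];                     // value from the new column
        for (std::size_t i = 0; i < left; ++i) matrix[--to] = matrix[--from];
    }
}
```

With the sample setup from the question, insert_column(x, 2, 3, 1, col) turns {1,2,3,4,5,6} into {1,7,2,3,4,8,5,6}.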

so what I would go with is something like (completely untested (tm))
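// inserts the new column at the front of each row
const std::size_t rowSize = x.size() / col.size() + 1; // length of a row after the insert
x.resize(x.size() + col.size());
for (std::size_t processed = 0; processed < col.size(); ++processed) {
    const std::size_t remaining = col.size() - processed; // placeholder items still unused
    // shift the elements of the last unprocessed row (starting at the end)
    // to their new location
    auto end = x.end() - processed * rowSize;
    auto middle = end - remaining;       // first placeholder item
    auto start = middle - (rowSize - 1); // first element of the old row
    std::rotate(start, middle, end);
    // replace one of the default value items to be the new value
    x[x.size() - rowSize * (1 + processed)] = col[col.size() - processed - 1];
}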
The idea being that you go from
[1,2,3,4,5,6] & adding [a,b,c]
Resize:
[1,2,3,4,5,6,x,x,x]
First loop shift:
[1,2,3,4,x,x,x,5,6]
First loop replace
[1,2,3,4,x,x,c,5,6]
Second loop shift
[1,2,x,x,3,4,c,5,6]
and so on.
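Since std::rotate is linear in the length of the rotated range, and each matrix element only ever gets moved once, the matrix data itself is moved in linear time. Each rotate also carries the remaining placeholder items along, which adds roughly col.size()^2 / 2 extra moves in total.
This differs from your option #1 in that every time you inserted, you had to move everything afterwards, meaning that the last elements are shifted col.size() times.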

An alternate solution is to transpose, insert, and transpose again. However, the in-place transpose is non-trivial (https://en.wikipedia.org/wiki/In-place_matrix_transposition). See the implementation here https://stackoverflow.com/a/9320349
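To make the idea concrete, here is a rough sketch that uses a simple out-of-place transpose instead of the in-place algorithms linked above, so it does not save any memory. It only shows that, once the matrix is transposed, the new column becomes a single contiguous insert. The helper names transpose and insert_column_via_transpose are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-of-place transpose of a rows x cols row-major matrix (hypothetical helper).
std::vector<double> transpose(const std::vector<double>& a,
                              std::size_t rows, std::size_t cols)
{
    std::vector<double> t(a.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            t[c * rows + r] = a[r * cols + c];
    return t;
}

std::vector<double> insert_column_via_transpose(const std::vector<double>& x,
                                                std::size_t rows, std::size_t cols,
                                                std::size_t colIndex,
                                                const std::vector<double>& col)
{
    assert(x.size() == rows * cols && col.size() == rows && colIndex <= cols);
    // After transposing, each original column is a contiguous run of `rows` values.
    std::vector<double> t = transpose(x, rows, cols);
    // The new column is therefore one contiguous block insert.
    t.insert(t.begin() + static_cast<std::ptrdiff_t>(colIndex * rows),
             col.begin(), col.end());
    // t is now (cols + 1) x rows; transpose back to rows x (cols + 1).
    return transpose(t, cols + 1, rows);
}
```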

Related

Construct mirror vector around the centre element in c++

I have a for-loop that is constructing a vector with 101 elements, using (let's call it equation 1) for the first half of the vector, with the centre element using equation 2, and the latter half being a mirror of the first half.
Like so,
double fc = 0.25;
const double PI = 3.1415926;
// initialise vectors
int M = 50;
int N = 101;
std::vector<double> fltr;
fltr.resize(N);
std::vector<int> mArr;
mArr.resize(N);
// Creating vector mArr of 101 elements, going from -50 to +50
int count;
for(count = 0; count < N; count++)
mArr[count] = count - M;
// using these elements, enter in to equations to form vector 'fltr'
int n;
for(n = 0; n < M+1; n++)
// for elements 0 to 50 --> use equation 1
fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
// for element 51 --> use equation 2
fltr[M] = fc/PI;
This part of the code works fine and does what I expect, but for elements 52 to 101, I would like to mirror around element 51 (the output value of equation 2).
For a basic example;
1 2 3 4 5 6 0.2 6 5 4 3 2 1
This is what I have so far, but it just outputs 0's as the elements:
for(n = N; n > M; n--){
for(i = 0; n < M+1; i++)
fltr[n] = fltr[i];
}
I feel like there is an easier way to mirror part of a vector but I'm not sure how.
I would expect the values to plot like this:
After you have inserted the middle element, you can get a reverse iterator to the mid point and copy that range back into the vector through std::back_inserter. The vector is named vec in the example.
auto rbeg = vec.rbegin(), rend = vec.rend();
++rbeg;
copy(rbeg, rend, back_inserter(vec));
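One caveat: std::back_inserter calls push_back, which may reallocate the vector and invalidate rbeg and rend while the copy is still running. A minimal sketch that sidesteps this by reserving the final capacity first and copying by index (the function name append_mirror is made up for illustration):

```cpp
#include <cstddef>
#include <vector>

// Appends a mirror image of all elements before the last one,
// e.g. {1, 2, 3, 0.2} becomes {1, 2, 3, 0.2, 3, 2, 1}.
void append_mirror(std::vector<double>& vec)
{
    if (vec.empty()) return;
    const std::size_t half = vec.size() - 1; // elements left of the centre
    vec.reserve(vec.size() + half);          // no reallocation during the loop
    for (std::size_t i = half; i-- > 0;)
        vec.push_back(vec[i]);
}
```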
Let's look at your code:
for(n = N; n > M; n--)
for(i = 0; n < M+1; i++)
fltr[n] = fltr[i];
And let's make things shorter: N = 5, M = 3,
array is 1 2 3 0 0 and should become 1 2 3 2 1
We start your first outer loop with n = 3, pointing us to the first zero. Then, in the inner loop, we set i to 0 and call fltr[3] = fltr[0], leaving us with the array as
1 2 3 1 0
We could now continue, but it should be obvious that this first assignment was useless.
With this I want to give you a simple way how to go through your code and see what it actually does. You clearly had something different in mind. What should be clear is that we do need to assign every part of the second half once.
What your code does is for each value of n to change the value of fltr[n] M times, ending with setting it to fltr[M] in any case, regardless of what value n has. The result should be that all values in the second half of the array are now the same as the center, in my example it ends with
1 2 3 3 3
Note that there is also a direct error: starting with n = N and then accessing fltr[n]. N is out of bounds for an array of size N.
To give you a very simple working solution:
for(int i=0; i<M; i++)
{
fltr[N-i-1] = fltr[i];
}
N-i-1 is the mirrored address of i (i = 0 -> N-i-1 = 101-0-1 = 100, last valid address in an array with 101 entries).
Now, I saw several guys answering with a more elaborate code, but I thought that as a beginner, it might be beneficial for you to do this in a very simple manner.
Other than that, as @Pzc already said in the comments, you could do this assignment in the loop where the data is generated.
Another thing, with your code
for(n = 0; n < M+1; n++)
// for elements 0 to 50 --> use equation 1
fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
// for element 51 --> use equation 2
fltr[M] = fc/PI;
I have two issues:
First, the indentation makes it look like fltr[M]=.. would be in the loop. Don't do that, not even if this should have been a mistake when you wrote the question and is not like this in the code. This will lead to errors in the future. Indentation is important. Using the auto-indentation of your IDE is an easy way to go. And try to use brackets, even if it is only one command.
Second, n < M+1 as a condition includes the center. The center is located at address 50, and 50 < 50+1. You haven't seen any problem because you overwrite it after the loop, but in a different situation this can easily produce errors.
There are other small things I'd change, and I recommend that, when your code works, you post it on CodeReview.
Let's use std::iota, std::transform, and std::copy instead of raw loops:
const double fc = 0.25;
constexpr double PI = 3.1415926;
const std::size_t M = 50;
const std::size_t N = 2 * M + 1;
std::vector<double> mArr(M);
std::iota(mArr.rbegin(), mArr.rend(), 1.); // = [M, M - 1, ..., 1]
const auto fn = [=](double m) { return std::sin((fc * m) + M) / ((m + M) * PI); };
std::vector<double> fltr(N);
std::transform(mArr.begin(), mArr.end(), fltr.begin(), fn);
fltr[M] = fc / PI;
std::copy(fltr.begin(), fltr.begin() + M, fltr.rbegin());

Why do we make n-1 iterations in bubble sort algorithm

The most common form of the bubble sort algorithm has two for loops, the inner one running from j=0 until j < n-i-1. I assume we subtract i because when we reach the last element we don't compare it, since there is no element after it. But why do we use n-1? Why don't we run the outer loop from i=0 until i < n and the inner one from j=0 until j < n-i? Could someone explain it to me? Tutorials on the internet do not emphasize this.
for (int i = 0; i < n - 1; i++) // Why do we have n-1 here?
{
swapped = false;
for (int j = 0; j < n - i - 1; j++)
{
countComparisons++;
if (arr[j] > arr[j + 1])
{
countSwaps++;
swap(&arr[j], &arr[j + 1]);
swapped = true;
}
}
}
For example, if I have an array with 6 elements, why do I only need to make 5 iterations?
Because a swap requires at least two elements.
So if you have 6 elements, you only need to consider 5 consecutive pairs.
For comparison purposes in an array, two adjacent cells are needed; in an array of 6 elements, you do 5 comparisons only; in an array of 10 elements, 9 comparisons, and so on:
[Figure: array and comparisons between adjacent cells]
So for 7 elements, just 6 comparisons are done, hence the general rule of n-1 in the outer for loop
About the n-1-i expression, remember that the highest (or lowest, depending on the ordering criterion) value in the bubble sort goes to the last position in the array after the first cycle, so there is no need to compare that value with anything else, therefore the array has to be "shortened" 1 cell at a time, and the value of i in the outer loop is the counter responsible for that in the inner loop:
5 | 3 | 9 | 20 | elements (n) = 4
After the first cycle (i = 0), 20 has reached its correct position within the array (using ascending order), leaving us with an array of 3 elements to compare; in the next cycle, i will be equal to 1, and as n-1 remains the same, we need to subtract i in that expression to "shorten" the array:
n-1-i = 4-1-1 = 2, which is the index of the last element in that new array as well as the quantity of comparisons needed.
Hope it helps!
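Another way to see the n-1 bound: every pass puts at least one more element into its final place, so after n-1 passes the one remaining element must already be in place too. A small sketch (the function name and the pass counter are additions for illustration) that counts passes and stops early when a pass makes no swap:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort that returns the number of outer passes it actually ran.
std::size_t bubble_sort(std::vector<int>& arr)
{
    const std::size_t n = arr.size();
    std::size_t passes = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {         // at most n - 1 passes
        bool swapped = false;
        ++passes;
        for (std::size_t j = 0; j + 1 < n - i; ++j) { // the last i cells are already final
            if (arr[j] > arr[j + 1]) {
                std::swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;                          // nothing moved: already sorted
    }
    return passes;
}
```

For a reversed array of 6 elements this runs exactly 5 passes, the worst case; an already sorted array needs a single pass.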

Shifting elements in array by N elements

I have an array where each 'element' is composed of 4 consecutive values. Upon update I move the array by 4 values towards the end and insert 4 new values in the beginning.
Shift:
int m = 4;
for (int i = _vsize - 1; i + 1 - m != 0; i--){
_varray[i] = std::move(_varray[i - m]);
}
Insertion:
memcpy(&_varray[0], glm::value_ptr(new_element), 4 * sizeof(float));
where new_element is of type glm::vec4 containing said 4 new values.
Any suggestions on how to improve this?
(Right now I'm only shifting by one element, but I want the flexibility of being able to shift, say, 8 times without having to put this in a loop)
Thank you.
You can try std::copy_backward. You want to copy a range of values to another range in the same container. Since the ranges overlap and you are copying to the right you can't use regular std::copy but must use std::copy_backward instead.
int m = 4; // make this a multiple of your 'element' size
std::copy_backward(&_varray[0], &_varray[_vsize - m], &_varray[_vsize]);
There is also std::move_backward, but it makes no difference here because moving a float is the same as copying it.
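As a self-contained illustration, here is a sketch that applies the same call to a std::vector<float> and then writes the new values at the front. The function name push_front_block and the use of a vector instead of the raw _varray are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Shifts `values` right by block.size() slots (the tail falls off)
// and writes `block` into the freed slots at the front.
void push_front_block(std::vector<float>& values, const std::vector<float>& block)
{
    const std::size_t m = block.size();
    assert(m <= values.size());
    if (m == 0) return; // nothing to shift
    // The ranges overlap and we copy to the right, so copy from the back.
    std::copy_backward(values.begin(),
                       values.end() - static_cast<std::ptrdiff_t>(m),
                       values.end());
    std::copy(block.begin(), block.end(), values.begin());
}
```

Because the shift distance is just block.size(), shifting by 8 values instead of 4 only requires passing a longer block.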

Efficiently computing vector combinations

I'm working on a research problem out of curiosity, and I don't know how to program the logic that I've in mind. Let me explain it to you:
I've four vectors, say for example,
v1 = 1 1 1 1
v2 = 2 2 2 2
v3 = 3 3 3 3
v4 = 4 4 4 4
Now what I want to do is to add them combination-wise, that is,
v12 = v1+v2
v13 = v1+v3
v14 = v1+v4
v23 = v2+v3
v24 = v2+v4
v34 = v3+v4
Up to this step it is just fine. The problem is that I now want to add to each of these vectors one vector from v1, v2, v3, v4 that has not been added to it before. For example:
v3 and v4 haven't been added to v12, so I want to create v123 and v124. Similarly for all the other vectors:
v12 should become:
v123 = v12+v3
v124 = v12+v4
v13 should become:
v132 // This should not occur because I already have v123
v134
v14 should become:
v142 // Cannot occur because I've v124 already
v143 // Cannot occur
v23 should become:
v231 // Cannot occur
v234 ... and so on.
It is important that I do not do all at one step at the start. Like for example, I can do (4 choose 3) 4C3 and finish it off, but I want to do it step by step at each iteration.
How do I program this?
P.S.: I'm trying to work on an modified version of an apriori algorithm in data mining.
In C++, given the following routine:
template <typename Iterator>
inline bool next_combination(const Iterator first,
Iterator k,
const Iterator last)
{
/* Credits: Thomas Draper */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator itr1 = first;
Iterator itr2 = last;
++itr1;
if (last == itr1)
return false;
itr1 = last;
--itr1;
itr1 = k;
--itr2;
while (first != itr1)
{
if (*--itr1 < *itr2)
{
Iterator j = k;
while (!(*itr1 < *j)) ++j;
std::iter_swap(itr1,j);
++itr1;
++j;
itr2 = k;
std::rotate(itr1,j,last);
while (last != j)
{
++j;
++itr2;
}
std::rotate(k,itr2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
You can then proceed to do the following:
int main()
{
unsigned int vec_idx[] = {0,1,2,3,4};
const std::size_t vec_idx_size = sizeof(vec_idx) / sizeof(unsigned int);
{
// All unique combinations of two vectors, for example, 5C2
std::size_t k = 2;
do
{
std::cout << "Vector Indicies: ";
for (std::size_t i = 0; i < k; ++i)
{
std::cout << vec_idx[i] << " ";
}
}
while (next_combination(vec_idx,
vec_idx + k,
vec_idx + vec_idx_size));
}
std::sort(vec_idx,vec_idx + vec_idx_size);
{
// All unique combinations of three vectors, for example, 5C3
std::size_t k = 3;
do
{
std::cout << "Vector Indicies: ";
for (std::size_t i = 0; i < k; ++i)
{
std::cout << vec_idx[i] << " ";
}
}
while (next_combination(vec_idx,
vec_idx + k,
vec_idx + vec_idx_size));
}
return 0;
}
Note 1: Because of the iterator oriented interface for the next_combination routine, any STL container that supports forward iteration via iterators can also be used, such as std::vector, std::deque and std::list just to name a few.
Note 2: This problem is well suited for the application of memoization techniques. In this problem, you can create a map and fill it in with vector sums of given combinations. Prior to computing the sum of a given set of vectors, you can lookup to see if any subset of the sums have already been calculated and use those results. Though you're performing summation which is quite cheap and fast, if the calculation you were performing was to be far more complex and time consuming, this technique would definitely help bring about some major performance improvements.
I think this problem can be solved by marking which combinations have occurred.
My first thought is that you may use a 3-dimension array to mark what combination has happened. But that is not very good.
How about a bit-array (such as an integer) for flagging? Such as:
Num 1 = 2^0 for vector 1
Num 2 = 2^1 for vector 2
Num 4 = 2^2 for vector 3
Num 8 = 2^3 for vector 4
When you compose vectors, just add up their representative numbers. For example, vector 124 will have the value: 1 + 2 + 8 = 11. This value is unique for every combination.
This is just my thought. Hope it helps you someway.
EDIT: Maybe I wasn't clear enough about my idea. I'll try to explain it more clearly:
1) Assign each vector a representative number. This number is the id of the vector, and it's unique. Moreover, the sum of every subset of those numbers is unique, which means that if the sum of k representative numbers is M, we can easily tell which vectors take part in the sum.
We do that by assigning 2^0 to vector 1, 2^1 to vector 2, 2^2 to vector 3, and so on.
For every M = sum (2^x + 2^y + 2^z + ... ) = (2^x OR 2^y OR 2^z OR ...), we know that the vectors (x + 1), (y + 1), (z + 1), ... take part in the sum. This can easily be checked by writing the number in binary.
For example, we know that:
2^0 = 1 (binary)
2^1 = 10 (binary)
2^2 = 100 (binary)
...
So if the sum is 10010 (binary), we know that vector(number: 10) and vector(number: 10000) take part in the sum.
Better still, the sum can be calculated with the "OR" operator, which is also easy to see if you write the numbers in binary.
2) Using the above facts, every time before you compute the sum of your vectors, you can add/OR their representative numbers first and keep track of the results in something like a lookup array. If the sum already exists in the lookup array, you can skip it. That way you can solve the problem.
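A minimal sketch of this bookkeeping, assuming at most 32 vectors so that a mask fits in a std::uint32_t. The function name next_level and the use of std::set as the lookup structure are choices made for this illustration. Starting from the single-vector masks, each call produces the next step (pairs, then triples, and so on) without duplicates:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Extends every combination in `level` by one vector it does not yet contain.
// Bit i of a mask being set means "vector i + 1 takes part in the sum".
std::vector<std::uint32_t> next_level(const std::vector<std::uint32_t>& level,
                                      unsigned vectorCount)
{
    std::set<std::uint32_t> seen;         // plays the role of the lookup array
    std::vector<std::uint32_t> result;
    for (std::uint32_t mask : level) {
        for (unsigned i = 0; i < vectorCount; ++i) {
            const std::uint32_t bit = std::uint32_t{1} << i;
            if (mask & bit) continue;     // vector already part of this combination
            const std::uint32_t combined = mask | bit;
            if (seen.insert(combined).second) // true only the first time we see it
                result.push_back(combined);
        }
    }
    return result;
}
```

Calling it on {1, 2, 4, 8} yields the six pairs; calling it again on that result yields the four triples, which matches 4C2 and 4C3. The actual vector sum for a new mask can then be computed as the stored sum of the old mask plus the one added vector.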
Maybe I am misunderstanding, but isn't this equivalent to generating all subsets (power set) of 1, 2, 3, 4 and then for each element of the power set, summing the vector? For instance:
//This is pseudo C++ since I'm too lazy to type everything
//push back the vectors or pointers to vectors, etc.
vector< vector< int > > v = v1..v4;
//Populate a vector with 1 to 4
vector< int > n = 1..4
//Function that generates the power set {nil, 1, (1,2), (1,3), (1,4), (1,2,3), etc.
vector< vector < int > > power_vec = generate_power_set(n);
//One might want to make a string key by doing a Perl-style join of the subset together by a comma or something...
map< vector < int >,vector< int > > results;
//For each subset, we sum the original vectors together
for subset_iter over power_vec{
vector<int> result;
//Assumes all the vectors have the same length, can be modified carefully if not.
//resize (not reserve) so that result[ii] exists and starts at zero
result.resize(length(v1));
for ii=0 to length(v1){
for iter over subset from subset_iter{
result[ii]+=v[iter][ii];
}
}
results[*subset_iter] = result;
}
If that is the idea you had in mind, you still need a power set function, but that code is easy to find if you search for power set. For example,
Obtaining a powerset of a set in Java.
1. Maintain a list of all 4C2 combinations of two values.
2. Create a vector of sets such that each set consists of elements from the original vectors combined with the 4C2 elements. Iterate over the original vectors and, for each one, create a set with the elements from step 1. Maintain a vector of sets and add the result to it only if the set is not already present.
3. Sum up the vector of sets you obtained in step 2.
But as you indicated, the easiest is 4C3.
Here is something written in Python. You can adapt it to C++.
import itertools
l1 = ['v1','v2','v3','v4']
res = []
for e in itertools.combinations(l1,2):
    res.append(e)
fin = []
for e in res:
    for l in l1:
        aset = set((e[0],e[1],l))
        if aset not in fin and len(aset) == 3:
            fin.append(aset)
print fin
This would result:
[set(['v1', 'v2', 'v3']), set(['v1', 'v2', 'v4']), set(['v1', 'v3', 'v4']), set(['v2', 'v3', 'v4'])]
This is the same result as 4C3.

All possible combinations of length 8 in a 2d array

I've been trying to solve a problem in combinations. I have a 6x6 matrix and I'm trying to find all combinations of length 8 in the matrix.
I have to move from neighbor to neighbor from each row, column position, and I wrote a recursive program which generates the combinations, but the problem is that it generates a lot of duplicates as well and hence is inefficient. I would like to know how I could eliminate calculating duplicates and save time.
int a[6][6]={{1,2,3,4,5,6},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
{2,3,4,5,6,7},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
};
void genSeq(int row,int col,int length,int combi)
{
if(length==8)
{
printf("%d\n",combi);
return;
}
combi = (combi * 10) + a[row][col];
if((row-1)>=0)
genSeq(row-1,col,length+1,combi);
if((col-1)>=0)
genSeq(row,col-1,length+1,combi);
if((row+1)<6)
genSeq(row+1,col,length+1,combi);
if((col+1)<6)
genSeq(row,col+1,length+1,combi);
if((row+1)<6&&(col+1)<6)
genSeq(row+1,col+1,length+1,combi);
if((row-1)>=0&&(col+1)<6)
genSeq(row-1,col+1,length+1,combi);
if((row+1)<6&&(col-1)>=0)
genSeq(row+1,col-1,length+1,combi);
if((row-1)>=0&&(col-1)>=0)
genSeq(row-1,col-1,length+1,combi);
}
I was also thinking of writing a dynamic program, basically recursion with memoization. Is it a better choice? If yes, then I'm not clear on how to implement it in recursion. Have I really hit a dead end with this approach?
Thank you
Edit
Eg result
12121212,12121218,12121219,12121211,12121213.
The restrictions are that you have to move to a neighbor from any point, and you have to start from each position in the matrix, i.e. each row, col. You can move one step at a time, i.e. right, left, up, down and both diagonal directions. Check the if conditions.
i.e.
if you're in (0,0) you can move to either (1,0) or (1,1) or (0,1), i.e. three neighbors.
if you're in (2,2) you can move to eight neighbors.
and so on...
To eliminate duplicates you can convert 8-digit sequences into 8-digit integers and put them in a hashtable.
Memoization might be a good idea. You can memoize for each cell in the matrix all possible combinations of length 2-7 that can be achieved from it. Going backwards: first generate for each cell all sequences of 2 digits. Then based on that of 3 digits etc.
UPDATE: code in Python
# original matrix
lst = [
[1,2,3,4,5,6],
[8,9,1,2,3,4],
[5,6,7,8,9,1],
[2,3,4,5,6,7],
[8,9,1,2,3,4],
[5,6,7,8,9,1]]
# working matrix; wrk[i][j] contains a set of all possible paths of length k which can end in lst[i][j]
wrk = [[set() for i in range(6)] for j in range(6)]
# for the first (0th) iteration initialize with single step paths
for i in range(0, 6):
    for j in range(0, 6):
        wrk[i][j].add(lst[i][j])
# run iterations 1 through 7
for k in range(1,8):
    # create new empty wrk matrix for the next iteration
    nw = [[set() for i in range(6)] for j in range(6)]
    for i in range(0, 6):
        for j in range(0, 6):
            # the next gen. wrk[i][j] is going to be based on the current wrk paths of its neighbors
            ns = set()
            if i > 0:
                for p in wrk[i-1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if i < 5:
                for p in wrk[i+1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if j > 0:
                for p in wrk[i][j-1]:
                    ns.add(10**k * lst[i][j] + p)
            if j < 5:
                for p in wrk[i][j+1]:
                    ns.add(10**k * lst[i][j] + p)
            nw[i][j] = ns
    wrk = nw
# now build final set to eliminate duplicates
result = set()
for i in range(0, 6):
    for j in range(0, 6):
        result |= wrk[i][j]
print len(result)
print result
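The layered memoization described above could look roughly like this in C++. This is a sketch under a few assumptions: cell values are single digits, all eight neighbours count (as in the question, whereas the Python code above only visits the four orthogonal ones), and the names Grid, Layer and distinct_paths are made up for illustration.

```cpp
#include <set>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;
using Layer = std::vector<std::vector<std::set<long long>>>;

// Returns the distinct digit sequences of `length` cells that can be formed by
// stepping between 8-connected neighbours, each encoded as a decimal integer.
std::set<long long> distinct_paths(const Grid& grid, int length)
{
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    // paths[r][c] holds every distinct sequence that ends at cell (r, c)
    Layer paths(rows, std::vector<std::set<long long>>(cols));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            paths[r][c].insert(grid[r][c]);

    for (int step = 1; step < length; ++step) {
        Layer next(rows, std::vector<std::set<long long>>(cols));
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                for (int dr = -1; dr <= 1; ++dr)
                    for (int dc = -1; dc <= 1; ++dc) {
                        const int nr = r + dr, nc = c + dc;
                        if ((dr == 0 && dc == 0) || nr < 0 || nr >= rows ||
                            nc < 0 || nc >= cols)
                            continue;
                        // extend every path ending at (r, c) by the neighbour's digit
                        for (long long p : paths[r][c])
                            next[nr][nc].insert(p * 10 + grid[nr][nc]);
                    }
        paths = std::move(next);
    }

    std::set<long long> result; // merging the per-cell sets removes the rest of the duplicates
    for (const auto& row : paths)
        for (const auto& cell : row)
            result.insert(cell.begin(), cell.end());
    return result;
}
```

Because each cell stores a set, a sequence that can reach a cell along several routes is kept only once per step, which is where the saving over the plain recursion comes from.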
There are LOTS of ways to do this. Going through every combination is a perfectly reasonable first approach. It all depends on your requirements. If your matrix is small, and this operation isn't time sensitive, then there's no problem.
I'm not really an algorithms guy, but I'm sure there are really clever ways of doing this that someone will post after me.
Also, in Java when using CamelCase, method names should start with a lowercase character.
int a={{1,2,3,4,5,6},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
{2,3,4,5,6,7},
{8,9,1,2,3,4},
{5,6,7,8,9,1},
}
By length you mean summation of combination of matrix elements resulting 8. i.e., elements to sum up 8 with in row itself and with the other row elements. From row 1 = { {2,6}, {3,5}, } and now row 1 elements with row 2 and so on. Is that what you are expecting ?
You can treat your matrix as a one-dimensional array; it makes no difference here (just "place" the rows one after another). For a one-dimensional array you can write a function like this (assuming you should print the combinations):
f(i, n) prints all combinations of length n using elements a[i] ... a[last].
It should skip some elements from a[i] to a[i + k - 1] (for all possible k), print a[i + k] and make a recursive call f(i + k + 1, n - 1).
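A minimal sketch of such an f(i, n), with the caveat that it collects the combinations into a vector instead of printing them and ignores the neighbour restriction from the question. The name combinations and the extra parameters current and out are additions for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Collects every combination of n elements chosen from a[i] .. a[last],
// keeping the original order. `current` holds the elements picked so far.
void combinations(const std::vector<int>& a, std::size_t i, std::size_t n,
                  std::vector<int>& current, std::vector<std::vector<int>>& out)
{
    if (n == 0) {
        out.push_back(current);
        return;
    }
    // Skipping a[i] .. a[k - 1] and picking a[k]; stop while n elements remain.
    for (std::size_t k = i; k + n <= a.size(); ++k) {
        current.push_back(a[k]);
        combinations(a, k + 1, n - 1, current, out);
        current.pop_back();
    }
}
```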