"Normalize" a 2D Vector in C++ using lambda - c++

I am implementing a lambda to row normalize a 2D vector in C++. Consider the simple case of a 3x3 matrix.
1 0 1
0 1 0
0 1 1
My normalization factor is the sum of the non-zero entries in the row; each entry is then divided by this factor. For instance, the first row has 2 non-zero entries summing to 2, so I divide each entry by 2. The row-normalized matrix is:
1/2 0 1/2
0 1 0
0 1/2 1/2
The relevant normalization code is shown here (note MAX_SIZE = 3). There is a syntax error in the lambda capture list.
for(int i = 0; i < MAX_SIZE ; i++)
{
transform(matrix[i].begin(),matrix[i].end(),matrix.begin(), [matrix[i].begin()](int x){
return distance(matrix[i].begin(),lower_bound(matrix[i].begin(),matrix[i].end(),x))});
}
Am I missing anything here?

A lambda capture list in C++ can only name variables to capture, and matrix[i].begin() is not a name; it is a temporary value. You can either give it a name in the enclosing scope, or (since C++14) use an init-capture such as [it = matrix[i].begin()]. Much of the surrounding code is missing, so I invented a working version of the code for you to dissect:
#include <algorithm>
#include <cstdio>
#include <numeric> // std::accumulate

template<int N>
void normalize(double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](double (&row)[N]) {
            double sum = std::accumulate(std::begin(row), std::end(row), 0.0);
            std::transform(std::begin(row), std::end(row), std::begin(row),
                [sum](double x) { return x / sum; });
        });
}
template<int N>
void print(const double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](const double (&row)[N]) {
            std::for_each(std::begin(row), std::end(row),
                [](double x) { std::printf(" %3.1f", x); });
            std::putchar('\n');
        });
}

int main() {
    double mat[3][3] = {
        { 1, 0, 1 },
        { 0, 1, 0 },
        { 0, 1, 1 },
    };
    std::puts("Matrix:");
    print(mat);
    normalize(mat);
    std::puts("Normalized:");
    print(mat);
    return 0;
}
Here is the output:
Matrix:
1.0 0.0 1.0
0.0 1.0 0.0
0.0 1.0 1.0
Normalized:
0.5 0.0 0.5
0.0 1.0 0.0
0.0 0.5 0.5
This code is a bit weird, as far as C++ code goes, because it uses lambdas for everything instead of loops (or a mix of for loops and higher-order functions). But you can see that by giving each row its own variable (named row) we make it very easy to loop over that row instead of writing matrix[i] everywhere.
The odd-looking array-parameter syntax double (&mat)[N][N] avoids pointer decay, which lets us use begin() and end() in the function body (they don't work if the parameters decay to pointers).


C++ add values of a vector that might contain NaN values

I am a C++ noob.
What I am trying to do is sum the values of a vector of doubles (let's call it x) and ignore any values that are NaN. I tried to look this up, but I couldn't find anything specifically referencing what would happen if a vector contains any NaN values.
E.g.:
// let's say x = [1.0, 2.0, 3.0, nan, 4.0]
y = sum(x) // y should be equal to 10.0
Would the accumulate function work here, or would it return NaN if x contains a NaN? Would a for loop work with a condition checking whether the value is NaN (and if so, how do I check for NaN? In Python, the language I know best, this kind of check is not always straightforward).
std::isnan returns true if the passed floating-point value is not a number. You have to add this check to any function that should skip NaNs in its calculations. For example, for sum:
#include <array>
#include <cmath>
#include <iostream>

constexpr auto sum(auto list) { // C++20 abbreviated function template
    typename decltype(list)::value_type result = 0;
    for (const auto& i : list) {
        if (!std::isnan(i)) { // <- crucial check here
            result += i;
        }
    }
    return result;
}

Demo:

int main() {
    auto list = std::array{ 1.0f, 2.0f, 3.0f, NAN };
    std::cout << sum(list); // prints out 6
}
You could use std::accumulate with a custom summation operation:
const std::vector<double> myVector{1.0, 2.0, 3.0, std::nan("42"), 4.0};
auto nansum = [](const double a, const double b)
{
    return a + (std::isnan(b) ? 0 : b);
};
auto mySum = std::accumulate(myVector.begin(), myVector.end(), 0.0, nansum);

Sorting in C++ of a sparse matrix in COO format

I use sparse matrices in COO format in my program. The COO format uses 3 separate vectors to represent the matrix: rowindex, colindex and values. I need to sort the matrix first by rowindex and then by colindex. For example, if the vectors contain:
rowindex = [1 2 2 1 0 2 0 1 0 2 1 2]
colindex = [7 7 2 1 3 9 8 6 6 0 3 4]
values = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2]
(meaning that element [1,7] in the matrix has a value of 0.1, element [2,7] has a value of 0.2, element [2,2] has a value of 0.3, etc) the matrix after sorting should be:
rowindex = [0 0 0 1 1 1 1 2 2 2 2 2]
colindex = [3 6 8 1 3 6 7 0 2 4 7 9]
values = [0.5 0.9 0.7 0.4 1.1 0.8 0.1 1.0 0.3 1.2 0.2 0.6]
I left some more spaces in the desired result to (hopefully) better show what I would like to achieve.
Can this be achieved somehow:
Using the available sort functions in C++
Without using additional memory (e.g. additional vectors), as the sparse matrices I use are huge and almost take up all memory
Without having to resort to representing the matrix as an array of structs (where I know that the sort() function can be used).
Some answers I found about sorting multiple vectors perform the sort according to the values of only one of the vectors; they do not address sorting elements that have the same value in the first vector according to the second vector.
Generally it is possible to sort sparse matrices given in COO format, but not with your constraints.
Using the available sort functions in C++: basically not possible, because the existing sort functions in the standard library only work on one range. Even by capturing the other vectors in the predicate's closure and coming up with some complex lambda, or by putting everything in a functor, it is not meaningfully feasible.
Allowing additional space would make the problem feasible and easy to solve.
The same holds for constraint 3.
So you need to compromise, or use non-standard C++ libraries.
After thinking more about the issue, I decided to follow a different path. Instead of reading the whole sparse matrix from the corresponding file and then sorting it, I now sort it while reading it. Each element read from the file is directly inserted into the correct position. For anyone interested, the part of the program that performs the sorting follows. Works correctly for the cases I have tested.
row = read value from file (zero-based indexing)
col = read value from file (zero-based indexing)
val = read value from file
/*
* Identify the easy cases:
* - Insertion of the first element or
* - The new element must be inserted at the end of the vectors
*
* The second case could be handled by the 'else', but handling it this way
* avoids more expensive searches by equal_range() and upper_bound().
*/
if ((rowindex.empty()) ||
    (row > rowindex.back()) ||
    ((row == rowindex.back()) && (col > colindex.back()))) {
rowindex.push_back(row);
colindex.push_back(col);
values.push_back(val);
} else {
/*
* Find the elements of the same row as the element being inserted into the matrix.
*
* If this is the first element in a specific row, the two iterators returned by equal_range()
* point to the first element of the next larger row.
*
* If there are already other elements in the same row, the two iterators returned by equal_range()
* point to the first element of the row and the first element of the next larger row.
*
* Using the iterators also calculate indices to the elements returned by equal_range().
* These are used to index the corresponding elements in the other two vectors representing
* the sparse matrix (colindex and values).
*/
const auto p = equal_range(rowindex.begin(), rowindex.end(), row);
const auto index_of_first = p.first - rowindex.begin();
const auto index_of_last = p.second - rowindex.begin();
/*
* Create iterators to point to the corresponding elements in colindex.
*/
const auto first = next(colindex.begin(), index_of_first);
const auto last = next(colindex.begin(), index_of_last);
/*
* Find the correct position where the new element must be inserted and perform the corresponding
* insertions into the three vectors representing the sparse matrix.
*/
auto col_pos_it = upper_bound(first, last, col);
auto pos = col_pos_it - colindex.begin();
colindex.insert(col_pos_it, col);
auto row_pos_it = next(rowindex.begin(), pos);
rowindex.insert(row_pos_it, row);
auto val_pos_it = next(values.begin(), pos);
values.insert(val_pos_it, val);
}
C++23 adds std::ranges::views::zip, so that you can write something like the following.
#include <algorithm>
#include <ranges>
#include <vector>
#include <iostream>
int main()
{
    std::vector rowindex{ 1, 2, 2, 1, 0, 2, 0, 1, 0, 2, 1, 2 };
    std::vector colindex{ 7, 7, 2, 1, 3, 9, 8, 6, 6, 0, 3, 4 };
    std::vector values{ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2 };

    auto el_view{ std::views::zip(rowindex, colindex, values) };
    std::ranges::sort(el_view); // lexicographic: by row, then by column

    for (auto [r, c, v] : el_view)
        std::cout << r << ' ' << c << ' ' << v << '\n';
}

std::min, std::max and autoDiff

When implementing the ReLU function for AutoDiff, one of the natural methods to use is the std::max function; other implementations (conditional statements) work correctly, but an attempt to implement it with std::max returns only 0 over the whole range.
On input vector:
dual in[] = { -3.0, -1.5, -0.1, 0.1, 1.5, 3.0 }
the derivative call in the form
derivative(ReLU,wrt(y),at(y)) where y = in[i]
gives proper results if ReLU is implemented with:
dual ReLU_ol(dual x) {
    return (x > 0) * x; // ok; autodiff gives x > 0 ? 1 : 0
}

dual ReLU_if(dual x) {
    if (x > 0.0) {
        return x;
    } else {
        return 0.0;
    }
    // ok; autodiff gives x > 0 ? 1 : 0
}
that is (regarding derivative) one if x > 0 and zero elsewhere.
When ReLU is implemented in the form:
dual ReLU_max(dual x) {
    return std::max(0.0, (double)x); // gives an erroneous derivative
}
As a result, I get zero over the whole range.
I expected that std::max (or std::min) would be handled correctly by automatic differentiation.
Am I doing something wrong, or am I misunderstanding something?
[Plot omitted: ReLU and its derivative d/dx as calculated with AutoDiff; the purple and blue lines overlap; results for ReLU_ol are not plotted.]

Interpolation search?

I have a uniform 1D grid with the values {0.1, 0.22, 0.35, 0.5, 0.78, 0.92}. These values are equally spaced at positions 0 to 5, as follows:
value 0.1 0.22 0.35 0.5 0.78 0.92
|_________|_________|_________|_________|_________|
position 0 1 2 3 4 5
Now I would like to extract the interpolated value at, say, position 2.3, which should be
val(2.3) = val(2)*(3-2.3) + val(3)*(2.3-2)
         = 0.35*0.7 + 0.5*0.3
         = 0.3950
So how should I do this in an optimized way in C++? I am on Visual Studio 2017.
I can think of a binary search, but are there any std methods or a better way to do the job? Thanks.
You can take the integer part of the interpolation position and use it to index the two values you need to interpolate between. There is no need for a binary search, since you always know between which two values you are interpolating. You only need to watch out for positions that fall outside the grid, if that can ever happen.
This only works if the values are mapped to integer indices starting at zero.
#include <cmath> // std::lerp (C++20)
#include <vector>

float get(const std::vector<float>& val, float p)
{
    // let's assume p is always valid, so it can be used as an index
    const int a = static_cast<int>(p); // round down
    const float t = p - a;
    return std::lerp(val[a], val[a + 1], t);
}
Edit:
std::lerp is a C++20 feature. If you are on an earlier version, you can use the following implementation, which should be good enough:
float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

Constructing fractions Interview challenge

I recently came across the following interview question and was wondering whether a dynamic programming approach would work, and/or whether there is some mathematical insight that would make the solution easier. It is very similar to how IEEE 754 doubles are constructed.
Question:
There is a vector V of N double values, where the value at the ith index of the vector is equal to 1/2^(i+1), e.g. 1/2, 1/4, 1/8, 1/16, etc.
You are to write a function that takes one double r as input, where 0 < r < 1, and outputs to stdout the indexes of V that, when summed, give a value closer to r than any other combination of indexes from V.
Furthermore, the number of indexes should be a minimum, and in the event there are two solutions, the solution closest to zero should be preferred.
void getIndexes(std::vector<double>& V, double r)
{
    ....
}

int main()
{
    std::vector<double> V;
    // populate V...
    double r = 0.3;
    getIndexes(V, r);
    return 0;
}
Note: it seems a few SO'ers aren't in the mood to read the question completely, so let's all note the following:
The solution, i.e. the sum, may be larger than r; hence any strategy that incrementally subtracts fractions from r until it hits zero or near zero is wrong.
There are values of r for which there are two solutions, that is |r-s0| == |r-s1| with s0 < s1; in this case s0 should be selected. This makes the problem slightly more difficult, as knapsack-style solutions tend to greedily overestimate first.
If you believe this problem is trivial, you most likely haven't understood it. Hence it would be a good idea to read the question again.
EDIT (Matthieu M.): 2 examples for V = {1/2, 1/4, 1/8, 1/16, 1/32}
r = 0.3, S = {1, 3}
r = 0.256652, S = {1}
Algorithm
Consider a target number r and a set F of fractions {1/2, 1/4, ... 1/(2^N)}. Let the smallest fraction, 1/(2^N), be denoted P.
Then the optimal sum will be equal to:
S = P * round(r/P)
That is, the optimal sum S will be some integer multiple of the smallest fraction available, P. The maximum error, err = r - S, is ± 1/2 * 1/(2^N). No better solution is possible because this would require the use of a number smaller than 1/(2^N), which is the smallest number in the set F.
Since the fractions F are all power-of-two multiples of P = 1/(2^N), any integer multiple of P can be expressed as a sum of the fractions in F. To obtain the list of fractions that should be used, encode the integer round(r/P) in binary and read off 1 in the kth binary place as "include the kth fraction in the solution".
Example:
Take r = 0.3 and F as {1/2, 1/4, 1/8, 1/16, 1/32}.
Multiply the entire problem by 32.
Take r = 9.6, and F as {16, 8, 4, 2, 1}.
Round r to the nearest integer.
Take r = 10.
Encode 10 as a binary integer (five places)
10 = 0b 0 1 0 1 0   ( 8 + 2 )
        ^ ^ ^ ^ ^
        | | | | |
        | | | | 1
        | | | 2
        | | 4
        | 8
        16
Associate each binary bit with a fraction.
   = 0b 0 1 0 1 0   ( 1/4 + 1/16 = 0.3125 )
        ^ ^ ^ ^ ^
        | | | | |
        | | | | 1/32
        | | | 1/16
        | | 1/8
        | 1/4
        1/2
Proof
Consider transforming the problem by multiplying all the numbers involved by 2**N so that all the fractions become integers.
The original problem:
Consider a target number r in the range 0 < r < 1, and a list of fractions {1/2, 1/4, ..., 1/(2**N)}. Find the subset of the list of fractions that sums to S such that error = r - S is minimised.
Becomes the following equivalent problem (after multiplying by 2**N):
Consider a target number r in the range 0 < r < 2**N and a list of integers {2**(N-1), 2**(N-2), ... , 4, 2, 1}. Find the subset of the list of integers that sums to S such that error = r - S is minimised.
Choosing powers of two that sum to a given number (with as little error as possible) is simply the binary encoding of an integer. This problem therefore reduces to binary encoding of an integer.
Existence of solution: Any positive floating point number r, 0 < r < 2**N, can be cast to an integer and represented in binary form.
Optimality: The maximum error in the integer version of the solution is the round-off error of ±0.5. (In the original problem, the maximum error is ±0.5 * 1/2**N.)
Uniqueness: for any positive (floating point) number there is a unique integer representation and therefore a unique binary representation. (Possible exception of 0.5 = see below.)
Implementation (Python)
This function converts the problem to the integer equivalent, rounds off r to an integer, then reads off the binary representation of r as an integer to get the required fractions.
def conv_frac(r, N):
    # Convert to the equivalent integer problem.
    R = r * 2**N
    S = int(round(R))
    # Convert integer S to an N-bit binary representation (a character string
    # of 1's and 0's). Note the use of [2:] to trim the leading '0b' and of
    # zfill() to zero-pad to the required length.
    bin_S = bin(S)[2:].zfill(N)
    nums = list()
    for index, bit in enumerate(bin_S):
        k = index + 1
        if bit == '1':
            print("%i : 1/%i or %f" % (index, 2**k, 1.0 / (2**k)))
            nums.append(1.0 / (2**k))
    S = sum(nums)
    e = r - S
    print("""
Original number `r`     : %f
Number of fractions `N` : %i (smallest fraction 1/%i)
Sum of fractions `S`    : %f
Error `e`               : %f
""" % (r, N, 2**N, S, e))
Sample output:
>>> conv_frac(0.3141, 10)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500
8 : 1/512 or 0.001953

Original number `r`     : 0.314100
Number of fractions `N` : 10 (smallest fraction 1/1024)
Sum of fractions `S`    : 0.314453
Error `e`               : -0.000353

>>> conv_frac(0.30, 5)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500

Original number `r`     : 0.300000
Number of fractions `N` : 5 (smallest fraction 1/32)
Sum of fractions `S`    : 0.312500
Error `e`               : -0.012500
Addendum: the 0.5 problem
If r * 2**N ends in 0.5, then it could be rounded up or down. That is, there are two possible representations as a sum-of-fractions.
If, as in the original problem statement, you want the representation that uses fewest fractions (i.e. the least number of 1 bits in the binary representation), just try both rounding options and pick whichever one is more economical.
Perhaps I am dumb...
The only trick I can see here is that the sum of (1/2)^(i+1) for i in [0, n) tends to 1 as n tends to infinity. This simple fact proves that (1/2)^i is always greater than the sum of (1/2)^j for j in [i+1, n), whatever n is.
So, when looking for our indices, it does not seem we have much choice. Let's start with i = 0:
either r is greater than 2^-(i+1), and thus we need it,
or it is smaller, and we need to choose whether 2^-(i+1) or the sum of 2^-j for j in [i+2, N] is closer (deferring to the latter in case of equality).
The only step that could be costly is obtaining the sum, but it can be precomputed once and for all (or even computed lazily).
#include <algorithm> // std::reverse
#include <cmath>     // std::fabs
#include <vector>

// The resulting vector contains at index i the sum of 2^-j for j in [i+1, N]
// and is padded with one 0 to get the same length as `v`.
static std::vector<double> partialSums(std::vector<double> const& v) {
    std::vector<double> result;
    // When summing doubles, we need to start with the smaller ones
    // because of the limited precision of the representation...
    double sum = 0;
    for (auto it = v.rbegin(); it != v.rend(); ++it) {
        sum += *it;
        result.push_back(sum);
    }
    result.pop_back(); // there is a +1 offset in the indexes of the result
    std::reverse(result.begin(), result.end());
    result.push_back(0); // pad the vector to have the same length as `v`
    return result;
}

// The resulting vector contains the elected indexes.
static std::vector<size_t> getIndexesImpl(std::vector<double> const& v,
                                          std::vector<double> const& ps,
                                          double r)
{
    std::vector<size_t> indexes;
    for (size_t i = 0, max = v.size(); i != max; ++i) {
        if (r >= v[i]) {
            r -= v[i];
            indexes.push_back(i);
            continue;
        }
        // We favor the closest to 0 in case of equality,
        // which is the sum of the tail as per the theorem above.
        if (std::fabs(r - v[i]) < std::fabs(r - ps[i])) {
            indexes.push_back(i);
            return indexes;
        }
    }
    return indexes;
}

std::vector<size_t> getIndexes(std::vector<double>& v, double r) {
    std::vector<double> const ps = partialSums(v);
    return getIndexesImpl(v, ps, r);
}
The code runs (with some debug output) at ideone. Note that for 0.3 it gives:
0.3:
1: 0.25
3: 0.0625
=> 0.3125
which is slightly different from the other answers.
At the risk of downvotes, this problem seems rather straightforward. Just start with the largest and smallest numbers you can produce out of V, adjust each index in turn until you have the two closest possible answers, then evaluate which one is the better answer.
Here is untested code (in a language that I don't write):
#include <cmath>
#include <iostream>
#include <vector>

void printIndexes(const std::vector<int>& ind); // defined below

void getIndexes(std::vector<double>& V, double r)
{
    double v_lower = 0;
    double v_upper = 1.0 - std::pow(0.5, V.size());
    std::vector<int> index_lower;
    std::vector<int> index_upper;

    if (v_upper <= r)
    {
        // The answer is trivial.
        for (int i = 0; i < (int)V.size(); i++)
            std::cout << i;
        return;
    }

    for (int i = 0; i < (int)V.size(); i++)
    {
        if (v_lower + V[i] <= r)
        {
            v_lower += V[i];
            index_lower.push_back(i);
        }
        if (r <= v_upper - V[i])
            v_upper -= V[i];
        else
            index_upper.push_back(i);
    }

    if (r - v_lower < v_upper - r)
        printIndexes(index_lower);
    else if (v_upper - r < r - v_lower)
        printIndexes(index_upper);
    else if (index_upper.size() < index_lower.size())
        printIndexes(index_upper);
    else
        printIndexes(index_lower);
}

void printIndexes(const std::vector<int>& ind)
{
    for (size_t i = 0; i < ind.size(); i++)
    {
        std::cout << ind[i];
    }
}
Did I get the job! :D
(Please note, this is horrible code that relies on our knowing exactly what V has in it...)
I will start by saying that I do believe this problem is trivial...
(waits until all stones have been thrown)
Yes, I did read the OP's edit saying that I have to re-read the question if I think so. Therefore I might be missing something that I fail to see; in that case, please excuse my ignorance and feel free to point out my mistakes.
I don't see this as a dynamic programming problem. At the risk of sounding naive, why not keep two estimations of r while searching for indices, namely an under-estimation and an over-estimation? After all, if r does not equal any sum that can be computed from elements of V, it will lie between two such sums. Our goal is to find these sums and report which is closer to r.
I threw together some quick-and-dirty Python code that does the job. The answer it reports is correct for the two test cases the OP provided. Note that the return is structured such that at least one index is always returned, even if the best estimation is no indices at all.
def estimate(V, r):
    lb = 0  # under-estimation (lower bound)
    lbList = []
    ub = 1 - 0.5**len(V)  # over-estimation = sum of all elements of V
    ubList = list(range(len(V)))

    # calculate the closest under-estimation and over-estimation
    for i in range(len(V)):
        if r == lb + V[i]:
            return (lbList + [i], lb + V[i])
        elif r == ub:
            return (ubList, ub)
        elif r > lb + V[i]:
            lb += V[i]
            lbList += [i]
        elif lb + V[i] < ub:
            ub = lb + V[i]
            ubList = lbList + [i]

    return (ubList, ub) if ub - r < r - lb else (lbList, lb) if lb != 0 else ([len(V) - 1], V[len(V) - 1])

# populate V
N = 5  # number of elements
V = []
for i in range(1, N + 1):
    V += [0.5**i]

# test
r = 0.484375  # this value is equidistant from both the under- and over-estimation
print("r:", r)
result = estimate(V, r)
print("Indices:", result[0])
print("Estimate:", result[1])
Note: after finishing writing my answer I noticed that this answer follows the same logic. Alas!
I don't know if you have test cases; try the code below. It is a dynamic programming approach.
1] exp: given the smallest fraction 1/2^i in V, find i. E.g. 1/32 returns 5.
2] max: 10^exp.
3] Create an array of size max+1 to hold all possible sums of the elements of V. Actually the array holds the indexes, since that's what you want.
4] Dynamically compute the sums (all invalid entries remain null).
5] The final while loop finds the nearest correct answer.
Here is the code:
import java.util.ArrayList;
import java.util.List;

public class Subset {
    public static List<Integer> subsetSum(double[] V, double r) {
        int exp = exponent(V);
        int max = (int) Math.pow(10, exp);
        // list to hold all possible sums of the elements in V
        List<Integer>[] indexes = new ArrayList[max + 1];
        indexes[0] = new ArrayList<>(); // base case
        // dynamically compute the sums
        for (int x = 0; x < V.length; x++) {
            int u = (int) (max * V[x]);
            for (int i = max; i >= u; i--) if (null != indexes[i - u]) {
                List<Integer> tmp = new ArrayList<>(indexes[i - u]);
                tmp.add(x);
                indexes[i] = tmp;
            }
        }
        // find the best answer
        int i = (int) (max * r);
        int j = i;
        while (null == indexes[i] && null == indexes[j]) {
            i--; j++;
        }
        return indexes[i] == null || indexes[i].isEmpty() ? indexes[j] : indexes[i];
    }// subsetSum

    private static int exponent(double[] V) {
        double d = V[V.length - 1];
        int i = (int) (1 / d);
        String s = Integer.toString(i, 2);
        return s.length() - 1;
    }// exponent

    public static void main(String[] args) {
        double[] V = {1/2., 1/4., 1/8., 1/16., 1/32.};
        double r = 0.6, s = 0.3, t = 0.256652;
        System.out.println(subsetSum(V, r)); // [0, 3, 4]
        System.out.println(subsetSum(V, s)); // [1, 3]
        System.out.println(subsetSum(V, t)); // [1]
    }
}// class
Here are results of running the code:
For 0.600000 get 0.593750 => [0, 3, 4]
For 0.300000 get 0.312500 => [1, 3]
For 0.256652 get 0.250000 => [1]
For 0.700000 get 0.687500 => [0, 2, 3]
For 0.710000 get 0.718750 => [0, 2, 3, 4]
This solution implements a polynomial-time approximation algorithm. The output of the program is the same as the outputs of the other solutions.
#include <math.h>
#include <stdio.h>
#include <vector>
#include <algorithm>

void populate(std::vector<double> &vec, int count)
{
    double val = .5;
    vec.clear();
    for (int i = 0; i < count; i++) {
        vec.push_back(val);
        val *= .5;
    }
}

void remove_values_with_large_error(const std::vector<double> &vec, std::vector<double> &res,
                                    double r, double max_error)
{
    std::vector<double>::const_iterator iter;
    double min_err, err;

    min_err = 1.0;
    for (iter = vec.begin(); iter != vec.end(); ++iter) {
        err = fabs(*iter - r);
        if (err < max_error) {
            res.push_back(*iter);
        }
        min_err = std::min(err, min_err);
    }
}

void find_partial_sums(const std::vector<double> &vec, std::vector<double> &res, double r)
{
    std::vector<double> svec, tvec, uvec;
    std::vector<double>::const_iterator iter;
    int step = 0;

    svec.push_back(0.);
    for (iter = vec.begin(); iter != vec.end(); ++iter) {
        step++;
        printf("step %d, svec.size() %zu\n", step, svec.size());
        tvec.clear();
        // add the current element to every partial sum so far
        // (std::bind2nd was removed in C++17; a lambda does the same job)
        std::transform(svec.begin(), svec.end(), back_inserter(tvec),
                       [v = *iter](double d) { return d + v; });
        uvec.clear();
        uvec.insert(uvec.end(), svec.begin(), svec.end());
        uvec.insert(uvec.end(), tvec.begin(), tvec.end());
        sort(uvec.begin(), uvec.end());
        uvec.erase(unique(uvec.begin(), uvec.end()), uvec.end());
        svec.clear();
        remove_values_with_large_error(uvec, svec, r, *iter * 4);
    }

    sort(svec.begin(), svec.end());
    svec.erase(unique(svec.begin(), svec.end()), svec.end());
    res.clear();
    res.insert(res.end(), svec.begin(), svec.end());
}

double find_closest_value(const std::vector<double> &sums, double r)
{
    std::vector<double>::const_iterator iter;
    double min_err, res, err;

    min_err = fabs(sums.front() - r);
    res = sums.front();
    for (iter = sums.begin(); iter != sums.end(); ++iter) {
        err = fabs(*iter - r);
        if (err < min_err) {
            min_err = err;
            res = *iter;
        }
    }
    printf("found value %lf with err %lf\n", res, min_err);
    return res;
}

void print_indexes(const std::vector<double> &vec, double value)
{
    std::vector<double>::const_iterator iter;
    int index = 0;

    printf("indexes: [");
    for (iter = vec.begin(); iter != vec.end(); ++iter, ++index) {
        if (value >= *iter) {
            printf("%d, ", index);
            value -= *iter;
        }
    }
    printf("]\n");
}

int main(int argc, char **argv)
{
    std::vector<double> vec, sums;
    double r = .7;
    int n = 5;
    double value;

    populate(vec, n);
    find_partial_sums(vec, sums, r);
    value = find_closest_value(sums, r);
    print_indexes(vec, value);
    return 0;
}
Sort the vector and search for the closest fraction available to r. Store that index, subtract the value from r, and repeat with the remainder of r. Iterate until r is reached, or no such index can be found.
Example:
0.3 - the biggest value available would be 0.25 (index 1). The remainder is now 0.05.
0.05 - the biggest value available would be 0.03125; the remainder will be 0.01875.
etc.
Every step would be an O(log N) search in a sorted array; the number of steps would also be O(log N), so the total complexity would be O((log N)^2).
This is not a dynamic programming question.
The output should rather be a vector of ints (indexes), not a vector of doubles.
This might be off by 0-2 in exact values; this is just the concept:
A) Output index zero while r0 (r minus the index values already output) is bigger than 1/2.
B) Inspect the internal representation of the r0 double:
x (first bit shift) = -Exponent; // the bigger the exponent, the smaller the numbers (the bigger the x in 1/2^x you begin with)
Then inspect the bit representation of the fraction part of the float in a cycle with this body
(direction depends on little/big endianness):
{
    if (bit is 1)
        output index x;
    x++;
}
The complexity of each step is constant, so overall it is O(n), where n is the size of the output.
To paraphrase the question, what are the one bits in the binary representation of r (after the binary point)? N is the 'precision', if you like.
In Cish pseudo-code
for (int i = 0; i < N; i++) {
    if (r > V[i]) {
        print(i);
        r -= V[i];
    }
}
You could add an extra test for r == 0 to terminate the loop early.
Note that this gives the least binary number closest to r, i.e. the one closer to zero if there are two equally 'right' answers.
If the Nth digit was a one, you'll need to add 1 to the 'binary' number obtained and check both candidates against the original r. (Hint: construct vectors a[N], b[N] of 'bits', setting '1' bits instead of 'print'ing above. Set b = a and do a manual add, digit by digit from the end of b, until you stop carrying. Convert both to double and choose whichever is closer.)
Note that a[] <= r <= a[] + 1/2^N and that b[] = a[] + 1/2^N.
The 'least number of indexes [sic]' requirement is a red herring.