I have the following Python code written with NumPy:
> r = 3
> y, x = numpy.ogrid[-r : r + 1, -r : r + 1]
> mask = numpy.sqrt(x**2 + y**2)
> mask
array([[4.24264, 3.60555, 3.16228, 3.00000, 3.16228, 3.60555, 4.24264],
[3.60555, 2.82843, 2.23607, 2.00000, 2.23607, 2.82843, 3.60555],
[3.16228, 2.23607, 1.41421, 1.00000, 1.41421, 2.23607, 3.16228],
[3.00000, 2.00000, 1.00000, 0.00000, 1.00000, 2.00000, 3.00000],
[3.16228, 2.23607, 1.41421, 1.00000, 1.41421, 2.23607, 3.16228],
[3.60555, 2.82843, 2.23607, 2.00000, 2.23607, 2.82843, 3.60555],
[4.24264, 3.60555, 3.16228, 3.00000, 3.16228, 3.60555, 4.24264]])
Now I am building the same mask in Eigen, where I need to broadcast a row vector and a column vector against each other. Unfortunately, Eigen does not allow this directly, so I made the following workaround:
int len = 1 + 2 * r;
MatrixXf mask = MatrixXf::Zero(len, len);
ArrayXf squared_yx = ArrayXf::LinSpaced(len, -r, r).square();
mask = (mask.array().colwise() + squared_yx) +
(mask.array().rowwise() + squared_yx.transpose());
mask = mask.cwiseSqrt();
cout << "mask" << endl << mask << endl;
4.24264 3.60555 3.16228 3 3.16228 3.60555 4.24264
3.60555 2.82843 2.23607 2 2.23607 2.82843 3.60555
3.16228 2.23607 1.41421 1 1.41421 2.23607 3.16228
3 2 1 0 1 2 3
3.16228 2.23607 1.41421 1 1.41421 2.23607 3.16228
3.60555 2.82843 2.23607 2 2.23607 2.82843 3.60555
4.24264 3.60555 3.16228 3 3.16228 3.60555 4.24264
It works, but I wonder if there is another, shorter way to do it. So my question is: how do I broadcast a row vector and a column vector in Eigen C++?
System Info
Tool    Version
Eigen   3.3.7
GCC     9.4.0
Ubuntu  20.04.4 LTS
I think the easiest approach (as in: most readable), is replicate.
int r = 3;
int len = 1 + 2 * r;
const auto& squared_yx = Eigen::ArrayXf::LinSpaced(len, -r, r).square();
const auto& bcast = squared_yx.replicate(1, len);
Eigen::MatrixXf mask = (bcast + bcast.transpose()).sqrt();
Note that what you do is numerically unstable (for large r); the hypot function exists to work around these issues. So even your Python code could be better:
r = 3
y, x = numpy.ogrid[-r : r + 1, -r : r + 1]
mask = numpy.hypot(x, y)
To achieve the same in Eigen, do something like this:
const auto& yx = Eigen::ArrayXf::LinSpaced(len, -r, r);
const auto& bcast = yx.replicate(1, len);
Eigen::MatrixXf mask = bcast.binaryExpr(bcast.transpose(),
[](float x, float y) noexcept -> float {
return std::hypot(x, y);
});
Eigen's documentation on binaryExpr is currently broken, so this is hard to find.
To be fair, you will probably never run into stability issues in this particular case because you will run out of memory first. However, I'd still like to point this out because seeing a naive sqrt(x**2 + y**2) is always a bit of a red flag. Also, in Python hypot might still be worth it from a performance standpoint because it reduces the number of temporary memory allocations and function calls.
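To make the overflow concern concrete, here is a small Python sketch that uses only the standard math module. The helper name naive_norm and the magnitude 1e200 are assumptions picked for illustration: squaring such a value exceeds the range of a double even though the true result is representable.

```python
import math

def naive_norm(x, y):
    # x * x overflows to inf for large |x|, so the square root is inf as well
    return math.sqrt(x * x + y * y)

big = 1e200
print(naive_norm(big, big))   # inf
print(math.hypot(big, big))   # about 1.414e200, still finite
```

For moderate inputs both functions agree, which is why the problem is easy to miss.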
BinaryExpr
The documentation on binaryExpr is missing, I assume because the parser has trouble with Eigen's C++ code. In any case, one can find it indirectly as CwiseBinaryOp and similarly CwiseUnaryOp, CwiseNullaryOp and CwiseTernaryOp.
The use looks a bit weird but is pretty simple. It takes a functor (either a struct with operator(), a function pointer, or a lambda) and applies this element-wise.
The unary operation makes this pretty clear. If Eigen::Array.sin() didn't exist, you could write this to achieve exactly the same effect:
array.unaryExpr([](double x) -> double { return std::sin(x); })
The binary and ternary versions take one or two more Eigen expressions as the second and third argument to the function. That's what I did above. The nullary version is explained in the documentation in its own chapter.
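For readers who think in Python, the following pure-Python sketch mimics what the binary version computes for the mask above. The function name binary_expr is hypothetical and only imitates the idea of applying a functor to corresponding coefficients; it is not Eigen's API and does not reproduce its lazy evaluation.

```python
import math

def binary_expr(lhs, rhs, func):
    # Apply func to each pair of corresponding entries of two equally shaped 2-D lists
    return [[func(a, b) for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(lhs, rhs)]

r = 3
yx = list(range(-r, r + 1))
bcast = [[v] * len(yx) for v in yx]            # like yx.replicate(1, len)
bcast_t = [list(col) for col in zip(*bcast)]   # like bcast.transpose()
mask = binary_expr(bcast, bcast_t, math.hypot)
print(mask[r][r], round(mask[0][0], 5))        # 0.0 4.24264
```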
Use of auto
Eigen is correct to warn about auto, but only in the sense that you have to know what you are doing. It is important to realize that auto on an Eigen expression just keeps the expression around. It does not evaluate it into a vector or matrix.
This is fine and very useful if you want to compose a complex expression that would be hard to read when put in a single statement. In my code above, there are no temporary memory allocations and no floating point computations take place until the final expression is assigned to the matrix.
As long as the programmer knows that these are expressions and not final matrices, everything is fine.
I think the main take-away is that use of auto with Eigen should be limited to short-lived (as in: inside a single function) scalar expressions. Any coding style that uses auto for everything will quickly break or be hard to read with Eigen. But it can be used safely and make the code more readable in the process without sacrificing performance in the same way as evaluating into matrices would.
As for why I chose const auto& instead of auto or const auto: Mostly force of habit that is unrelated to the task at hand. I mostly do it for instances like this:
const Eigen::Vector& Foo::get_bar();
void quz(Foo& foo)
{
const auto& bar = foo.get_bar();
}
Here, bar will remain a reference whereas auto would create a copy. If the return type is later changed to return by value, everything stays valid:
Eigen::Vector Foo::get_bar();
void quz(Foo& foo)
{
const auto& bar = foo.get_bar();
}
Now a copy is created anyway. But everything continues to work because binding the return value to a const reference extends the lifetime of the temporary object. So this may look like a dangling reference, but it is not.
I have an expression whose outcome is a real number, but which is composed of imaginary terms (that cancel one another). A significantly simpler example than the one I am considering would be something like
z = a + 1/[sqrt(a-b) - a] - f[sqrt(a-b)] = a
where a and b are real numbers and f is some function that satisfies the above expression. It will not surprise you that in some cases, say for b > a (which does not always occur, but can), the above expression returns nan, since some of its terms are imaginary.
Sometimes, it is possible to work out the algebra and write out this not-really-complex expression using real numbers only. However, in my case, the algebra is very messy (so messy that even Matlab's symbolic package and Mathematica are unable to trivially simplify).
I am wondering if there is some other way to work out expressions you know to be real, but are partly imaginary.
PS: not important for this question, but for more info about the expression I am dealing with, please see another question I previously asked.
tl;dr for the comment thread:
If you know you're doing something that will involve imaginary numbers, just use std::complex.
You can't avoid getting NaN if you insist on asking for a real result to something (sqrt, say) that you know will have an imaginary component. There is no real answer it can give you.
At the end of your computation, if imag(result) is zero (or within a suitable epsilon), then your imaginary parts cancelled out and you have a real(result).
As a concrete example:
#include <complex>
#include <iostream>
int main()
{
std::complex<double> a{-5, 0}; // -5 + 0i
std::complex<double> b{ 5, 0}; // +5 + 0i
auto z = b + sqrt(a) - sqrt(a);
std::cout << "z = " << real(z) << " + " << imag(z) << "i\n";
}
prints
z = 5 + 0i
With your new example
z = a + 1/(sqrt(a-b) - a) - f(sqrt(a-b)) = a
it'll be useful to make a of type std::complex in the first place, and to use a complex 1+0i for the numerator as well. This is because of the way overloaded operators are resolved:
using cx = std::complex<double>;
cx f(cx); // whatever this does, it also needs to handle complex inputs
cx foo(cx a, cx b)
{
return a + cx{1}/(sqrt(a-b) - a) - f(sqrt(a-b));
}
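The same workflow can be sketched in Python with the standard cmath module. The function f below is a hypothetical stand-in, chosen only so that the expression reduces to a; unlike the f in the question it also receives a as an argument. The tolerance 1e-12 is likewise an assumption.

```python
import cmath

def f(s, a):
    # Hypothetical stand-in chosen so that the whole expression reduces to a
    return 1 / (s - a)

def foo(a, b):
    s = cmath.sqrt(a - b)   # complex result even when a - b < 0
    return a + 1 / (s - a) - f(s, a)

z = foo(2.0, 5.0)           # b > a, so sqrt(a - b) is purely imaginary
if abs(z.imag) < 1e-12:     # imaginary parts cancelled within the tolerance
    print("real result:", z.real)
```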
I am trying to solve a nonlinear problem over the real numbers using Z3. I need Z3 to generate multiple solutions.
In the problem domain, precision is not a critical issue; I need just one or two digits after the decimal point. So I would like Z3 not to explore the whole search space of real numbers, in order to minimize the time needed to find multiple solutions.
I am trying to replace the real numbers with floating-point numbers. I read the fpa example in the c_api.c file, but I found it a bit confusing.
For example, assume that I want to convert the reals in the following code:
#include <iostream>
#include <z3++.h>
using namespace z3;
int main() {
    config cfg;
    cfg.set("auto_config", true);
    context con(cfg);
    expr x = con.real_const("x");
    expr y = con.real_const("y");
    solver sol(con);
    sol.add(x*y > 10);
    std::cout << sol.check() << "\n";
    std::cout << sol.get_model() << "\n";
}
I tried the following code but it didn't work
config cfg;
cfg.set("auto_config", true);
context con(cfg);
expr sign = con.bv_const("sig", 1);
expr exp = con.bv_const("exp", 10);
expr sig = con.bv_const("sig", 10);
expr x = to_expr(con, Z3_mk_fpa_fp(con, sign, exp, sig));
expr y = to_expr(con, Z3_mk_fpa_fp(con, sign, exp, sig));
solver sol(con);
sol.add(x*y > 10);
std::cout << sol.check() << "\n";
and the output is:
Assertion failed: false, file c:\users\rehab\downloads\z3-master\z3-master\src\api\c++\z3++.h, line 1199
My questions are:
Are there any detailed examples or code snippets about using fpa in the C++ API? It is not clear to me how to convert the fpa example from the C API to the C++ API.
What's wrong in the above code conversion?
I'm not sure if using floats is the best way to go for your problem. But it sounds like you tried all other options and non-linearity is getting in your way. Note that even if you model your problem with floats, floating-point arithmetic is quite tricky and the solver may have a hard time finding satisfying models. Furthermore, solutions may be far off from the actual results due to numerical instability.
Using C
Leaving all those aside, the correct way to code your query using the C api would be (assuming we use 32-bit single-precision floats):
#include <stdio.h>
#include <z3.h>
int main(void) {
Z3_config cfg = Z3_mk_config();
Z3_context ctx = Z3_mk_context(cfg);
Z3_solver s = Z3_mk_solver(ctx);
Z3_solver_inc_ref(ctx, s);
Z3_del_config(cfg);
Z3_sort float_sort = Z3_mk_fpa_sort(ctx, 8, 24);
Z3_symbol s_x = Z3_mk_string_symbol(ctx, "x");
Z3_symbol s_y = Z3_mk_string_symbol(ctx, "y");
Z3_ast x = Z3_mk_const(ctx, s_x, float_sort);
Z3_ast y = Z3_mk_const(ctx, s_y, float_sort);
Z3_symbol s_x_times_y = Z3_mk_string_symbol(ctx, "x_times_y");
Z3_ast x_times_y = Z3_mk_const(ctx, s_x_times_y, float_sort);
Z3_ast c1 = Z3_mk_eq(ctx, x_times_y, Z3_mk_fpa_mul(ctx, Z3_mk_fpa_rne(ctx), x, y));
Z3_ast c2 = Z3_mk_fpa_gt(ctx, x_times_y, Z3_mk_fpa_numeral_float(ctx, 10, float_sort));
Z3_solver_assert(ctx, s, c1);
Z3_solver_assert(ctx, s, c2);
Z3_lbool result = Z3_solver_check(ctx, s);
switch(result) {
case Z3_L_FALSE: printf("unsat\n");
break;
case Z3_L_UNDEF: printf("undef\n");
break;
case Z3_L_TRUE: { Z3_model m = Z3_solver_get_model(ctx, s);
if(m) Z3_model_inc_ref(ctx, m);
printf("sat\n%s\n", Z3_model_to_string(ctx, m));
break;
}
}
return 0;
}
When run, this prints:
sat
x_times_y -> (fp #b0 #xbe #b10110110110101010000010)
y -> (fp #b0 #xb5 #b00000000000000000000000)
x -> (fp #b0 #x88 #b10110110110101010000010)
These are single-precision floating point numbers; you can read about them in wikipedia for instance. In more conventional notation, they are:
x_times_y -> 1.5810592e19
y -> 1.8014399e16
x -> 877.6642
This is quite tricky to use, but it is what you asked for.
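If you want to check the conversion from the (fp sign exponent significand) triples to conventional notation yourself, a short Python sketch using only the standard struct module can do it. The helper name decode_fp32 is an assumption for illustration; it simply reassembles the three bit fields of an IEEE 754 single-precision value.

```python
import struct

def decode_fp32(sign, exponent, significand_bits):
    # Reassemble the 1 + 8 + 23 bit fields and reinterpret them as a float
    bits = (sign << 31) | (exponent << 23) | int(significand_bits, 2)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

x = decode_fp32(0, 0x88, "10110110110101010000010")
y = decode_fp32(0, 0xB5, "0" * 23)
print(x, y)   # roughly 877.6642 and 1.8014399e16
```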
Using Python
I'd heartily recommend using the Python API to at least see what the solver is capable of before investing into such complicated C code. Here's how it would look in Python:
from z3 import *
x = FP('x', FPSort(8, 24))
y = FP('y', FPSort(8, 24))
s = Solver()
s.add(x*y > 10)
s.check()
print(s.model())
When run, this prints:
[y = 1.32167303562164306640625,
x = 1.513233661651611328125*(2**121)]
Perhaps not what you expected, but it is a valid model indeed.
Using Haskell
Just to give you a taste of simplicity, here's how the same problem can be expressed using the Haskell bindings (It's just a mere one liner!)
Prelude Data.SBV> sat $ \x y -> fpIsPoint x &&& fpIsPoint y &&& x * y .> (10::SFloat)
Satisfiable. Model:
s0 = 5.1129496e28 :: Float
s1 = 6.6554557e9 :: Float
Summary
Note that floating-point also has issues regarding NaN/Infinity values, so you might have to avoid those explicitly. (This is what the Haskell expression did by using the fpIsPoint predicate. Coding it in Python or C would require more code, but is surely doable.)
It should be emphasized that literally any other binding to Z3 (Python, Haskell, Scala, what have you) will give you a better experience than what you'll get with C/C++/Java. (Even direct coding in SMTLib would be nicer.)
So, I heartily recommend using some higher-level interface (Python is a good one: It is easy to learn), and once you are confident with the model and how it works, you can then start coding the same in C if necessary.
Consider two vectors, A and B, of size n, 7 <= n <= 23. Both A and B consist of -1s, 0s and 1s only.
I need a fast algorithm which computes the inner product of A and B.
So far I've thought of storing the signs and values in separate uint32_ts using the following encoding:
sign 0, value 0 → 0
sign 0, value 1 → 1
sign 1, value 1 → -1.
The C++ implementation I've thought of looks like the following:
struct ternary_vector {
uint32_t sign, value;
};
int inner_product(const ternary_vector & a, const ternary_vector & b) {
uint32_t psign = a.sign ^ b.sign;
uint32_t pvalue = a.value & b.value;
psign &= pvalue;
pvalue ^= psign;
return __builtin_popcount(pvalue) - __builtin_popcount(psign);
}
This works reasonably well, but I'm not sure whether it is possible to do it better. Any comment on the matter is highly appreciated.
I like having the 2 uint32_t, but I think your actual calculation is a bit wasteful
Just a few minor points:
I'm not sure about the reference (getting a and b by const &) - this adds a level of indirection compared to putting them on the stack. When the code is this small (a couple of clocks maybe) this is significant. Try passing by value and see what you get
__builtin_popcount can be, unfortunately, very inefficient. I've used it myself, but found that even a very basic implementation I wrote was far faster than this. However - this is dependent on the platform.
Basically, if the platform has a hardware popcount implementation, __builtin_popcount uses it. If not - it uses a very inefficient replacement.
The one serious problem here is the reuse of the psign and pvalue variables for the positive and negative vectors. You are doing neither your compiler nor yourself any favors by obfuscating your code in this way.
Would it be possible for you to encode your ternary state in a std::bitset<2> and define the product in terms of and? For example, if your ternary types are:
1 = P = (1, 1)
0 = Z = (0, 0)
-1 = M = (1, 0) or (0, 1)
I believe you could define their product as:
1 * 1 = 1 => P * P = P => (1, 1) & (1, 1) = (1, 1) = P
1 * 0 = 0 => P * Z = Z => (1, 1) & (0, 0) = (0, 0) = Z
1 * -1 = -1 => P * M = M => (1, 1) & (1, 0) = (1, 0) = M
Then the inner product could start by taking the and of the bits of the elements and... I am working on how to add them together.
Edit:
My foolish suggestion did not consider that (-1)(-1) = 1, which cannot be handled by the representation I proposed. Thanks to #user92382 for bringing this up.
Depending on your architecture, you may want to optimize away the temporary bit vectors -- e.g. if your code is going to be compiled to FPGA, or laid out to an ASIC, then a sequence of logical operations will be better in terms of speed/energy/area than storing and reading/writing to two big buffers.
In this case, you can do:
int inner_product(const ternary_vector & a, const ternary_vector & b) {
return __builtin_popcount( a.value & b.value & ~(a.sign ^ b.sign))
- __builtin_popcount( a.value & b.value & (a.sign ^ b.sign));
}
This will lay out very well -- the (a.value & b.value & ... ) can enable/disable an XOR gate, whose output splits into two signed accumulators, with the first pathway NOTed before accumulation.
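As a sanity check of the sign/value encoding and the two-popcount formula, here is a Python sketch that compares the bit trick against a plain inner product for every pair of ternary vectors of a small length. The helper names encode and popcount and the choice n = 4 are assumptions made for the test; they are not part of the original C++ code.

```python
from itertools import product

def encode(vec):
    # Bit i of value is set for nonzero entries, bit i of sign for -1 entries
    sign = sum(1 << i for i, v in enumerate(vec) if v == -1)
    value = sum(1 << i for i, v in enumerate(vec) if v != 0)
    return sign, value

def popcount(x):
    return bin(x).count("1")

def inner_product(a, b):
    nonzero = a[1] & b[1]   # positions where both entries are nonzero
    differ = a[0] ^ b[0]    # positions where the signs differ
    return popcount(nonzero & ~differ) - popcount(nonzero & differ)

n = 4
for u, v in product(product((-1, 0, 1), repeat=n), repeat=2):
    expected = sum(p * q for p, q in zip(u, v))
    assert inner_product(encode(u), encode(v)) == expected
```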
What would be the most efficient algorithm to solve a linear equation in one variable given as a string input to a function? For example, for input string:
"x + 9 - 2 - 4 + x = - x + 5 - 1 + 3 - x"
The output should be 1.
I am considering using a stack and pushing each string token onto it as I encounter spaces in the string. If the input was in polish notation then it would have been easier to pop numbers off the stack to get to a result, but I am not sure what approach to take here.
It is an interview question.
Solving the linear equation is (I hope) extremely easy for you once you've worked out the coefficients a and b in the equation a * x + b = 0.
So, the difficult part of the problem is parsing the expression and "evaluating" it to find the coefficients. Your example expression is extremely simple, it uses only the operators unary -, binary -, binary +. And =, which you could handle specially.
It is not clear from the question whether the solution should also handle expressions involving binary * and /, or parentheses. I'm wondering whether the interview question is intended:
to make you write some simple code, or
to make you ask what the real scope of the problem is before you write anything.
Both are important skills :-)
It could even be that the question is intended:
to separate those with lots of experience writing parsers (who will solve it as fast as they can write/type) from those with none (who might struggle to solve it at all within a few minutes, at least without some hints).
Anyway, to allow for future more complicated requirements, there are two common approaches to parsing arithmetic expressions: recursive descent or Dijkstra's shunting-yard algorithm. You can look these up, and if you only need the simple expressions in version 1.0 then you can use a simplified form of Dijkstra's algorithm. Then once you've parsed the expression, you need to evaluate it: use values that are linear expressions in x and interpret = as an operator with lowest possible precedence that means "subtract". The result is a linear expression in x that is equal to 0.
If you don't need complicated expressions then you can evaluate that simple example pretty much directly from left-to-right once you've tokenised it[*]:
x
x + 9
// set the "we've found minus sign" bit to negate the first thing that follows
x + 7 // and clear the negative bit
x + 3
2 * x + 3
// set the "we've found the equals sign" bit to negate everything that follows
3 * x + 3
3 * x - 2
3 * x - 1
3 * x - 4
4 * x - 4
Finally, solve a * x + b = 0 as x = - b/a.
[*] example tokenisation code, in Python:
def tokenise(input):
    acc = None  # number currently being accumulated, if any
    for idx, ch in enumerate(input):
        if ch in '1234567890':
            if acc is None: acc = 0
            acc = 10 * acc + int(ch)
            continue
        if acc is not None:
            yield acc
            acc = None
        if ch in '+-=x':
            yield ch
        elif ch == ' ':
            pass
        else:
            raise ValueError('illegal character "%s" at %d' % (ch, idx))
    if acc is not None:
        yield acc  # flush a number that ends the input
Alternative example tokenisation code, also in Python, assuming there will always be spaces between tokens as in the example. This leaves token validation to the parser:
return input.split()
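The left-to-right evaluation traced above, with its "minus sign" and "equals sign" bits, could be sketched as follows. This is only an illustration under the same assumptions as the simple example: tokens are separated by spaces, the minus sign is the ASCII '-', and the only operators are +, - and =. The function name solve_linear is hypothetical.

```python
def solve_linear(equation):
    a = b = 0              # running coefficients of a * x + b = 0
    negate_next = False    # "we've found a minus sign" bit
    after_equals = False   # "we've found the equals sign" bit
    for tok in equation.split():
        if tok == '+':
            negate_next = False
        elif tok == '-':
            negate_next = True
        elif tok == '=':
            after_equals = True
            negate_next = False
        else:
            sign = -1 if negate_next else 1
            if after_equals:
                sign = -sign   # everything right of '=' is subtracted
            if tok == 'x':
                a += sign
            else:
                b += sign * int(tok)
            negate_next = False
    return -b / a

print(solve_linear("x + 9 - 2 - 4 + x = - x + 5 - 1 + 3 - x"))   # 1.0
```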
OK, here is some simple pseudocode that you could use to solve this problem:
function(stringToParse){
    arrayOfTokens = stringToParse.match(RegexMatching);
    foreach(arrayOfTokens as token)
    {
        //now step through the tokens and determine what they are
        //and store the necessary information.
    }
    //Use the above information to do the arithmetic.
    //count the number of times a variable appears positive and negative
    //do the arithmetic.
    //add up the numbers both positive and negative.
    //return the result.
}
The first step is to parse the string and identify the various tokens (numbers, variables and operators), so that an expression tree can be built that gives each operator its proper precedence.
Regular expressions can help, but they are not the only method (grammar parsers like boost::spirit are good too, and you can even roll your own: it's all "find and recurse").
The tree can then be reduced by evaluating the nodes whose operations involve only constants and by grouping the operations that involve the variable.
This continues recursively until only a variable node and a constant node remain.
At that point the solution can be calculated trivially.
These are basically the same principles that lead to an interpreter or a compiler.
Consider:
from operator import add, sub
def ab(expr):
    a, b, op = 0, 0, add
    for t in expr.split():
        if t == '+': op = add
        elif t == '-': op = sub
        elif t == 'x': a = op(a, 1)
        else : b = op(b, int(t))
    return a, b
Given an expression like 1 + x - 2 - x... this converts it to a canonical form ax+b and returns a pair of coefficients (a,b).
Now, let's obtain the coefficients from both parts of the equation:
le, ri = equation.split('=')
a1, b1 = ab(le)
a2, b2 = ab(ri)
and finally solve the trivial equation a1*x + b1 = a2*x + b2:
x = (b2 - b1) / (a1 - a2)
Of course, this only solves this particular example, without operator precedence or parentheses. To support the latter you'll need a parser, presumably a recursive descent one, which would be simpler to code by hand.
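A minimal sketch of such a recursive descent parser is shown below. It is an illustration rather than a complete solution: the grammar (+, -, *, unary minus, parentheses, the single variable x), the class name Parser and the function names are all assumptions. Every sub-expression is evaluated to a pair (a, b) meaning a*x + b, and multiplying two terms that both contain x is rejected as non-linear.

```python
import re
from fractions import Fraction

def tokenize(text):
    # Numbers, the variable x, operators and parentheses; other characters are skipped
    return re.findall(r"\d+|[x+\-*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                      # expr := term (('+' | '-') term)*
        a, b = self.term()
        while self.peek() in ('+', '-'):
            sign = 1 if self.take() == '+' else -1
            ta, tb = self.term()
            a, b = a + sign * ta, b + sign * tb
        return a, b

    def term(self):                      # term := unary ('*' unary)*
        a, b = self.unary()
        while self.peek() == '*':
            self.take()
            ra, rb = self.unary()
            if a != 0 and ra != 0:
                raise ValueError("x * x is not linear")
            a, b = a * rb + ra * b, b * rb
        return a, b

    def unary(self):                     # unary := '-' unary | atom
        if self.peek() == '-':
            self.take()
            a, b = self.unary()
            return -a, -b
        return self.atom()

    def atom(self):                      # atom := NUMBER | 'x' | '(' expr ')'
        tok = self.take()
        if tok == 'x':
            return 1, 0
        if tok == '(':
            value = self.expr()
            if self.take() != ')':
                raise ValueError("missing ')'")
            return value
        if tok is not None and tok.isdigit():
            return 0, int(tok)
        raise ValueError("unexpected token %r" % (tok,))

def solve(equation):
    left, right = equation.split('=')
    a1, b1 = Parser(tokenize(left)).expr()
    a2, b2 = Parser(tokenize(right)).expr()
    return Fraction(b2 - b1, a1 - a2)    # from a1*x + b1 = a2*x + b2

print(solve("2 * (x + 3) = 10 - x * 2"))   # 1
```

Using Fraction keeps the result exact when the solution is not an integer.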
I'm trying to achieve something like the following in C++:
class MyVector; // 3 component vector class
MyVector const kA = /* ... */;
MyVector const kB = /* ... */;
MyVector const kC = /* ... */;
MyVector const kD = /* ... */;
// I'd like to shorten the remaining lines, ideally making it readable but less code/operations.
MyVector result = kA;
MyVector const kCMinusD = kC - kD;
if(kCMinusD.X <= 0)
{
result.X = kB.X;
}
if(kCMinusD.Y <= 0)
{
result.Y = kB.Y;
}
if(kCMinusD.Z <= 0)
{
result.Z = kB.Z;
}
Paraphrasing the code into English, I have four 'known' vectors. Two of the vectors have values that I may or may not want in my result, and whether I want them or not is contingent on a branch based on the components of two other vectors.
I feel like I should be able to simplify this code with some matrix math and masking, but I can't wrap my head around it.
For now I'm going with the branch, but I'm curious to know if there's a better way that still would be understandable, and less code-verbose.
Edit:
In reference to Mark's comment, I'll explain what I'm trying to do here.
This code is an excerpt from some spring physics I'm working on. The components are as follows:
kC is the spring's current length, and kD is the minimum spring length.
kA and kB are two sets of spring tensions, which may differ per component (i.e., a different spring tension along X, Y, or Z). kA is the spring's tension if it's not fully compressed, and kB is the spring's tension if it IS fully compressed.
I'd like to build up a resultant 'vector' that is simply the amalgamation of kA and kB, dependent on whether the spring is compressed or not.
Depending on the platform you're on, the compiler might be able to optimize statements like
result.x = (kC.x > kD.x) ? kA.x : kB.x;
result.y = (kC.y > kD.y) ? kA.y : kB.y;
result.z = (kC.z > kD.z) ? kA.z : kB.z;
using fsel (floating point select) instructions or conditional moves. Personally, I think the code looks nicer and more concise this way too, but that's subjective.
If the code is really performance critical, and you don't mind changing your vector class to be 4 floats instead of 3, you could use SIMD (e.g. SSE on Intel platforms, VMX on PowerPC) to do the comparison and select the answers. If you went ahead with this, it would look like this (in pseudocode):
// Set each component of mask to be either 0x0 or 0xFFFFFFFF depending on the comparison
MyVector4 mask = vec_compareLessThan(kC, kD);
// Sets each component of result to either kA or kB's component, depending on whether the bits are set in mask
result = vec_select(kA, kB, mask);
This takes a while getting used to, and it might be less readable initially, but you eventually get used to thinking in SIMD mode.
The usual caveats apply, of course - don't optimize before you profile, etc.
If your vector elements are ints, you can do:
MyVector result;
MyVector const kCMinusD = kC - kD;
int mask = kCMinusD.X >> 31; // -1 (all bits set) if negative, else 0; assumes 32-bit int and arithmetic shift
result.X = (kB.X & mask) | (kA.X & ~mask);
mask = kCMinusD.Y >> 31;
result.Y = (kB.Y & mask) | (kA.Y & ~mask);
mask = kCMinusD.Z >> 31;
result.Z = (kB.Z & mask) | (kA.Z & ~mask);
(note this handles the == 0 case differently, not sure if you care)
If your vector elements are doubles instead of ints, you can do something similar as the sign bit is in the same place, you just have to convert to integers, do the mask, and convert back.
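The integer version of the mask trick can be tried out quickly in Python, whose integers behave like two's complement numbers under shifts and bitwise operators. The function name select_by_sign and the sample values are assumptions for illustration; the sketch assumes differences that fit in 32 bits, and like the code above it picks the kA component when the difference is exactly zero.

```python
def select_by_sign(diff, if_negative, otherwise):
    # For |diff| < 2**31 the shift yields -1 (all bits set) when diff < 0, else 0
    mask = diff >> 31
    return (if_negative & mask) | (otherwise & ~mask)

kA, kB = (10, 20, 30), (1, 2, 3)
kC, kD = (5, 2, 7), (3, 4, 7)
result = tuple(select_by_sign(c - d, b, a)
               for a, b, c, d in zip(kA, kB, kC, kD))
print(result)   # (10, 2, 30)
```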
If you're seeking a clean expression in source more than a runtime optimization, you might consider solving this problem from the "toolbox" point of view. So let's say that on MyVector you defined sign, gt (greater-than), and le (less-than-or-equal-to). Then in two lines:
MyVector const kSignCMinusD = (kC - kD).sign();
result = kSignCMinusD.gt(0) * kA + kSignCMinusD.le(0) * kB;
With operator overloading:
MyVector const kSignCMinusD = (kC - kD).sign();
result = (kSignCMinusD > 0) * kA + (kSignCMinusD <= 0) * kB;
For inspiration, here's the MATLAB function reference. And obviously there are many C++ vector libraries to choose from with such functions.
You can always go in and optimize further if profiling shows it necessary. But often the biggest performance issues are how well you can see the big picture and reuse intermediate computations.
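To show the "toolbox" idea end to end, here is a small Python sketch with plain tuples standing in for MyVector. The helpers gt, le and blend and the sample values are hypothetical; the comparisons return 0/1 masks that are multiplied into kA and kB, mirroring the two-line expression above.

```python
def gt(vec, threshold):
    # 1 where the component is greater than threshold, else 0
    return tuple(int(v > threshold) for v in vec)

def le(vec, threshold):
    # 1 where the component is less than or equal to threshold, else 0
    return tuple(int(v <= threshold) for v in vec)

def blend(mask_a, a, mask_b, b):
    # Component-wise mask_a * a + mask_b * b
    return tuple(ma * x + mb * y for ma, x, mb, y in zip(mask_a, a, mask_b, b))

kA, kB = (10.0, 20.0, 30.0), (1.0, 2.0, 3.0)
kC, kD = (5.0, 2.0, 7.0), (3.0, 4.0, 7.0)
diff = tuple(c - d for c, d in zip(kC, kD))
result = blend(gt(diff, 0), kA, le(diff, 0), kB)
print(result)   # (10.0, 2.0, 3.0)
```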
Since you are only doing a subtraction, you can rewrite it as below (kB is chosen when kC - kD <= 0, i.e. when kD >= kC):
MyVector result;
result.x = kD.x >= kC.x ? kB.x : kA.x;
result.y = kD.y >= kC.y ? kB.y : kA.y;
result.z = kD.z >= kC.z ? kB.z : kA.z;