Incorrect variable range behavior in RooFit - c++

The RooFit package allows me to import some TTree branches, but it constrains values included in those branches between the min and max value set by a RooRealVar. For instance:
RooRealVar t1("t1", "Some variable", 0.0, 1.0);
RooDataSet data("data", "My Dataset", RooArgSet(t1), Import(*myttree));
so long as the TTree myttree contains a branch called t1. This is all well and good until values land very close to the edges of the range. My particular problem occurs because I have a variable like t1 which maps to some variable with an exponential distribution. I'm trying to fit this distribution, but the fits fail for values of t1 ~ 0.0. My solution was to change the range a bit to cut off potential events where the stored value of t1 in the tree is actually zero or close to it (all the following code is run in the ROOT interpreter, but I have confirmed that compiled code behaves the same way):
root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);
However, note the following annoying behavior:
root[1] t1 = 0.000998;
root[2] t1.getVal();
(double) 0.0010000000
// this is correct, as 0.000998 < 0.001 so RooFit set it as the lower limit
root[3] t1 = 0.000999;
root[4] t1.getVal();
(double) 0.00099900000
// this is incorrect.
Yes, the extra zero is printed in the second case, which I also don't understand, but I'm mostly concerned with the failure to recognize that 0.000999 < 0.001. When I compare these values later in an if-statement, I find that C++ can tell the difference. Everything appears to be double precision here, and I've been tracing through the code to see where the precision error crops up. Correct me if I'm wrong, but even a float should hold these numbers precisely enough for this comparison, right? What's going on here? If this is some floating point error problem, what's the best way to resolve it? I have several events with values like t1 = 0.000999874, and changing the bounds to something like 0.0001 doesn't really help either: there are still events which live on this edge.
Edit: I want to emphasize that while this is probably a floating point problem, it really shouldn't be. For instance, the following code works:
root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);
root[1] t1 = 0.000999;
root[2] t1.getVal() < 0.001;
(bool) true

Well everyone, I found the answer (and I hate it). It actually has very little to do with floating point arithmetic, and I'm honestly not sure why the code was written this way. From the source code that determines if a value is "in range":
bool RooAbsRealLValue::inRange(double value, const char* rangeName, double* clippedValPtr) const
{
// double range = getMax() - getMin() ; // ok for +/-INIFINITY
double clippedValue(value);
bool isInRange(true) ;
const RooAbsBinning& binning = getBinning(rangeName) ;
double min = binning.lowBound() ;
double max = binning.highBound() ;
// test this value against our upper fit limit
if(!RooNumber::isInfinite(max) && value > (max+1e-6)) {
clippedValue = max;
isInRange = false ;
}
// test this value against our lower fit limit
if(!RooNumber::isInfinite(min) && value < min-1e-6) {
clippedValue = min ;
isInRange = false ;
}
if (clippedValPtr) *clippedValPtr=clippedValue ;
return isInRange ;
}
As we can see here, RooFit doesn't actually check if min < val < max or even min <= val <= max but rather min - 1e-6 < value < max + 1e-6! I couldn't find a single place where this was documented explicitly, but I'm even more concerned that there is a separate implementation of inRange which takes a variable name (or comma separated list of variable names) and returns a result which is incompatible with the prior implementation:
bool RooAbsRealLValue::inRange(const char* name) const
{
const double val = getVal() ;
const double epsilon = 1e-8 * fabs(val) ;
if (!name || name[0] == '\0') {
const auto minMax = getRange(nullptr);
return minMax.first - epsilon <= val && val <= minMax.second + epsilon;
}
const auto& ranges = ROOT::Split(name, ",");
return std::any_of(ranges.begin(), ranges.end(), [val,epsilon,this](const std::string& range){
const auto minMax = this->getRange(range.c_str());
return minMax.first - epsilon <= val && val <= minMax.second + epsilon;
});
}
Here, we can see the creation of an epsilon = 1e-8 * fabs(val) rather than the arbitrary 1e-6 given in the first definition. This comparison also uses <= rather than <. It should be noted that the method used to filter trees when imported in this way uses the first implementation (source here).
Somewhere along the way (I'm not entirely sure where actually), some of these arbitrary comparisons lead to the following paradoxical behavior:
root[0] RooRealVar t1("t1", "Some variable", 0.001, 1.0);
root[1] t1 = 0.001 - 1e-6;
(RooAbsArg &) RooRealVar::t1 = 0.000999 L(0.001 - 1)
root[2] t1 = 0.001 - 1e-8 * 0.001;
(RooAbsArg &) RooRealVar::t1 = 0.001 L(0.001 - 1)
root[3] t1 = 0.00099999999;
(RooAbsArg &) RooRealVar::t1 = 0.001 L(0.001 - 1)
I would classify this as a bug. Under no circumstances should 0.00099900000 be classified as within the range of (0.001 - 1) while 0.00099999999 is not!
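To make the disagreement between the two checks concrete, here is a minimal sketch that reproduces both tolerance rules on plain doubles, without ROOT. The function names are hypothetical, and the infinity handling and named-range lookup of the real methods are left out; only the comparisons quoted above are mirrored.

```cpp
#include <cmath>

// Absolute tolerance, mirroring the 1e-6 slack in the three-argument inRange().
bool in_range_absolute(double value, double min, double max) {
    return !(value > max + 1e-6) && !(value < min - 1e-6);
}

// Relative tolerance, mirroring epsilon = 1e-8 * fabs(val) in inRange(const char*).
bool in_range_relative(double value, double min, double max) {
    const double epsilon = 1e-8 * std::fabs(value);
    return min - epsilon <= value && value <= max + epsilon;
}

// Strict check with no tolerance at all, e.g. for filtering events by hand.
bool in_range_strict(double value, double min, double max) {
    return min <= value && value <= max;
}
```

With min = 0.001 and max = 1.0, a value such as 0.0009995 passes in_range_absolute but fails both in_range_relative and in_range_strict, which is the kind of edge event described in the question. Applying a strict cut to the data before handing it to RooFit is one possible way to sidestep the tolerance.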

Related

Why is there a loop in this division as multiplication code?

I got the JS code below from an archive of Hacker's Delight (view the source)
The code takes in a value (such as 7) and spits out a magic number to multiply with. Then you bitshift to get the result. I don't remember assembly or much math, so I'm sure I'm wrong, but I can't find the reason why I'm wrong
From my understanding you could get a magic number by writing ceil(1/divide * 1<<32) (or <<64 for 64bit values, but you'd need bigger ints). If you multiply an integer with imul you'd get the result in one register and the remainder in another. The result register is magically the correct result of a division with this magic number from my formula
I wrote some C++ code to show what I mean. However I only tested with the values below. It seems correct. The JS code has a loop and more and I was wondering, why? Am I missing something? What values can I use to get an incorrect result that the JS code would get correctly? I'm not very good at math so I didn't understand any of the comments
#include <cstdio>
#include <cassert>
int main(int argc, char *argv[])
{
auto test_divisor = 7;
auto test_value = 43;
auto a = test_value*test_divisor;
auto b = a-1; //One less test
auto magic = (1ULL<<32)/test_divisor;
if (((1ULL<<32)%test_divisor) != 0) {
magic++; //Round up
}
auto answer1 = (a*magic) >> 32;
auto answer2 = (b*magic) >> 32;
assert(answer1 == test_value);
assert(answer2 == test_value-1);
printf("%llu %llu\n", answer1, answer2); // both answers are unsigned long long
}
JS code from Hacker's Delight:
var two31 = 0x80000000
var two32 = 0x100000000
function magic_signed(d) { with(Math) {
if (d >= two31) d = d - two32// Treat large positive as short for negative.
var ad = abs(d)
var t = two31 + (d >>> 31)
var anc = t - 1 - t%ad // Absolute value of nc.
var p = 31 // Init p.
var q1 = floor(two31/anc) // Init q1 = 2**p/|nc|.
var r1 = two31 - q1*anc // Init r1 = rem(2**p, |nc|).
var q2 = floor(two31/ad) // Init q2 = 2**p/|d|.
var r2 = two31 - q2*ad // Init r2 = rem(2**p, |d|).
do {
p = p + 1;
q1 = 2*q1; // Update q1 = 2**p/|nc|.
r1 = 2*r1; // Update r1 = rem(2**p, |nc|.
if (r1 >= anc) { // (Must be an unsigned
q1 = q1 + 1; // comparison here).
r1 = r1 - anc;}
q2 = 2*q2; // Update q2 = 2**p/|d|.
r2 = 2*r2; // Update r2 = rem(2**p, |d|.
if (r2 >= ad) { // (Must be an unsigned
q2 = q2 + 1; // comparison here).
r2 = r2 - ad;}
var delta = ad - r2;
} while (q1 < delta || (q1 == delta && r1 == 0))
var mag = q2 + 1
if (d < 0) mag = two32 - mag // Magic number and
shift = p - 32 // shift amount to return.
return mag
}}
In the C++ code:
auto magic = (1ULL<<32)/test_divisor;
magic is an integer because both (1ULL<<32) and test_divisor are integers.
The algorithm requires incrementing magic under certain conditions, which is what the next conditional statement does.
The multiplication also yields integers:
auto answer1 = (a*magic) >> 32;
auto answer2 = (b*magic) >> 32;
That is all the C++ code needs.
In the JS code:
All variables are declared with var; there are no data types.
There is no integer division and no integer multiplication.
Bitwise operations are awkward and not well suited to this algorithm.
Numeric data is held in Number and BigInt, which do not behave like a C int or a C unsigned long long.
Hence the algorithm uses a loop to iteratively add and compare until the division and multiplication agree to the nearest integer.
Both versions try to implement the same algorithm and both should give the same answer, but the JS version is buggy and non-standard.
While there are many issues with the JS version, I will highlight only three:
(1) In the loop, while trying to find the best power of 2, we have these two statements:
p = p + 1;
q1 = 2*q1; // Update q1 = 2**p/|nc|.
This simply increments a counter and multiplies a number by 2, which is a left shift in C++.
The C++ version does not need this rigmarole.
(2) The while condition has two equality comparisons on the right-hand side of ||:
while (q1 < delta || (q1 == delta && r1 == 0))
Equality comparisons are fragile in floating-point calculations [[ e.g. check "Math.sqrt(2)*Math.sqrt(0.5) == 1": even though this should be true mathematically, it evaluates to false ]]. Note, though, that every quantity in this loop is an integer-valued Number far below 2**53, so these particular comparisons are exact and the right-hand side of || can still be true.
(3) The JS version returns only the variable mag, but the caller is also supposed to obtain (and use) shift, which is only available through a global variable. Inconsistent and bad!
Comparing the two, the C++ version is more standard, but the point is to use int64_t, with a known number of bits, rather than auto.
First, I think ceil(1/divide * 1<<32) can, depending on the divisor, have cases where the result is off by one. So you don't need a loop, but sometimes you need a corrective factor.
Secondly the JS code seems to allow for other shifts than 32: shift = p - 32 // shift amount to return. But then it never returns that. So not sure what is going on there.
Why not implement the JS code in C++ as well and then run a loop over all int32_t and see if they give the same result? That shouldn't take too long.
And when you find a d where they differ you can then test a / d for all int32_t a using both magic numbers and compare a / d, a * m_ceil and a * m_js.
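The brute-force comparison suggested above can be sketched as follows. This is a minimal illustration, assuming unsigned 32-bit operands and a plain rounded-up reciprocal ceil(2^32 / d) with a fixed shift of 32, as in the question; the function names are hypothetical and this is not the Hacker's Delight routine.

```cpp
#include <cstdint>

// Rounded-up reciprocal from the question: ceil(2^32 / d), for d >= 1.
std::uint64_t naive_magic(std::uint32_t d) {
    const std::uint64_t two32 = 1ULL << 32;
    return (two32 + d - 1) / d;
}

// Quotient estimate: multiply by the magic number, keep the high 32 bits.
std::uint32_t naive_div(std::uint32_t n, std::uint32_t d) {
    return static_cast<std::uint32_t>((n * naive_magic(d)) >> 32);
}

// First n in [start, end] where the estimate differs from n / d, or -1 if none.
std::int64_t first_mismatch(std::uint32_t d, std::uint32_t start, std::uint32_t end) {
    for (std::uint64_t n = start; n <= end; ++n) {
        const auto n32 = static_cast<std::uint32_t>(n);
        if (naive_div(n32, d) != n32 / d) {
            return static_cast<std::int64_t>(n);
        }
    }
    return -1;
}
```

For d = 7 the magic number is 613566757, and 7 * 613566757 exceeds 2^32 by 3. That excess accumulates with n, so first_mismatch(7, 0, UINT32_MAX) reports 1431655770: the estimate gives 204522253 while the true quotient is 204522252. Since that n also fits in a positive int32_t, the same input breaks the signed version in the question. A varying shift, which the loop in the JS code searches for, is one way to handle such divisors.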

Improve numerical accuracy of LineLine intersection method (3D)

I coded the following LineLine intersection method:
double LineLineIntersection(
const Eigen::Vector3d& origin1,
const Eigen::Vector3d& ray1,
const Eigen::Vector3d& origin2,
const Eigen::Vector3d& ray2)
{
if(std::abs((origin1 - origin2).norm() - 1.0) < 0.000001) return 0;
auto n1 = (origin2 - origin1).cross(ray2);
auto n2 = ray1.cross(ray2);
// Use this to test whether the vectors point in the same or opposite directions
auto n = n2.normalized();
// If n2 is the 0 vector or if the cross products are not colinear, no solution exists
if(n2.norm() < 0.00001 || std::abs(std::abs(n1.dot(n)) - n1.norm()) > 0.000001)
return std::numeric_limits<double>::infinity();
return n1.dot(n) / n2.dot(n);
}
The theory for how this works is explained here. However, the page has a mistake: taking the absolute value keeps only the magnitude and erases the direction. Instead, the dot product with the direction of the cross product must be taken. That way the distance can be either positive or negative depending on whether the vectors point in the same direction or not.
This technically works but I am running into big numerical errors. For example in one of my tests I am getting:
The difference between i1.x() and Eigen::Vector3d(-0.581, 1.232, 0).x() is 0.0024061184231309873, which exceeds 0.001, where
i1.x() evaluates to -0.58340611842313095,
Eigen::Vector3d(-0.581, 1.232, 0).x() evaluates to -0.58099999999999996, and
0.001 evaluates to 0.001.
An error bigger than 0.001 is huge. What can I do to make the method more accurate?
This is the value of i1: -0.583406 1.23237 0. Sorry for not including it earlier.
You're using the type "double"; try changing it to "long double", or to "__float128" if it exists in your version of G++. You could also use "BigDecimal" in Java for better accuracy, or arbitrary-precision arithmetic in Python.
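Independently of the floating-point type, the same formula can be evaluated without normalizing the cross product and with a tolerance that scales with the inputs. The sketch below is an assumption-laden illustration, not the asker's method: it uses a hypothetical Vec3 struct instead of Eigen so that it is self-contained, the 1e-24 threshold is an arbitrary choice, and it skips the check that the two lines actually meet, so for skew lines it returns the parameter of the point on line 1 closest to line 2.

```cpp
#include <cmath>
#include <limits>

struct Vec3 { double x, y, z; };  // hypothetical stand-in for Eigen::Vector3d

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns t such that origin1 + t * ray1 is the point on line 1 nearest to line 2,
// or +infinity when the directions are (numerically) parallel.
double line_param(const Vec3& origin1, const Vec3& ray1,
                  const Vec3& origin2, const Vec3& ray2) {
    const Vec3 n1 = cross(sub(origin2, origin1), ray2);
    const Vec3 n2 = cross(ray1, ray2);
    const double denom = dot(n2, n2);                        // |ray1 x ray2|^2
    const double scale = dot(ray1, ray1) * dot(ray2, ray2);  // |ray1|^2 |ray2|^2
    if (denom <= 1e-24 * scale) {                            // relative parallel test
        return std::numeric_limits<double>::infinity();
    }
    return dot(n1, n2) / denom;  // equals n1.dot(n) / n2.dot(n) with n = n2 / |n2|
}
```

For example, line_param({0,0,0}, {1,0,0}, {2,-1,0}, {0,1,0}) yields 2, the parameter of the intersection point (2, 0, 0). Rounding error in double arithmetic is normally many orders of magnitude below 0.001, so a discrepancy of that size may be worth checking against the inputs and the expected values as well.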

Variable changes on its own in C++

I have a loop going through an array trying to find which index is a string. It should solve for what that value should be.
I can't figure out why, but as soon as the if statements start i becomes 1 which gives my code an error.
I'm not very fluent in C++.
for(int i = 0; i < 4; i++) {
if(auto value = std::get_if<std::string>(&varArr[i])) {
solvedIndex = i;
auto value0 = std::get_if<float>(&varArr[0]);
auto value1 = std::get_if<float>(&varArr[1]);
auto value2 = std::get_if<float>(&varArr[2]);
auto value3 = std::get_if<float>(&varArr[3]);
//i changes to 1 when this if starts??
if(i = 0) {
solvedVar = (*value3 / *value1) * *value2;
} else if (i = 1) {
solvedVar = *value3 / (*value0 / *value2);
} else if (i = 2) {
solvedVar = *value0 / (*value3 / *value1);
} else {
solvedVar = *value1 * (*value0 / *value2);
}
break;
}
}
Note that these variables are declared above. Also, varArr is filled with values:
std::variant<std::string, float> varArr[4];
int solvedIndex;
float solvedVar;
As has been noted, in your if statements, you are using the assignment operator (=) but want the equality comparison operator (==). For your variable i the first if statement sets i equal to 0 and if(0) is the same as if(false). So your program goes to the first else-if which sets i equal to 1 and if(1) evaluates to true. Your code then finishes the block within else if (i = 1) {...} and then ends.
That's because operator= is the assignment operator in C++ (and most languages, actually). That changes the value of the variable to the value on the other side. So, for instance:
x = 0
will change the value of x to 0. Doesn't matter if it's in an if statement. It will always change the value to 0 (or whatever the right hand side value is).
What you are looking for is operator==, which is the comparison (aka relational) operator in C++. That asks the question "Are these two things equal?" So, for instance:
x == 0
asks if x is equal to 0.
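The difference can be seen in a stripped-down version of the if chain from the question. This is only an illustrative sketch with hypothetical function names; the extra parentheses around the assignments are there to silence compiler warnings about exactly this mistake.

```cpp
// Uses assignment: i is overwritten and the condition tests the assigned value.
int branch_with_assignment(int i) {
    if ((i = 0)) {          // assigns 0, so the condition is false
        return 0;
    } else if ((i = 1)) {   // assigns 1, so the condition is true
        return 1;
    } else if ((i = 2)) {   // never reached
        return 2;
    }
    return 3;
}

// Uses comparison: i is left alone and the matching branch is taken.
int branch_with_comparison(int i) {
    if (i == 0) {
        return 0;
    } else if (i == 1) {
        return 1;
    } else if (i == 2) {
        return 2;
    }
    return 3;
}
```

Whatever value is passed in, branch_with_assignment returns 1, which matches the observation that i "becomes 1" as soon as the if statements start. branch_with_comparison returns the branch that corresponds to the argument.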

Comparing floating-point numbers (floats or doubles) for min/max

How do I compare two floats to a certain degree of accuracy? I know that floats and doubles carry very slight imprecisions in any programming language, and these may be enough to make a comparison such as a < b return a different result than it should.
I was solving a problem from the UVa Online Judge which gave me a Wrong Answer many times. It took a few float values as input, each given to 2 decimal places. I figured out a makeshift solution, which was to split the input and convert it to ints, but I would prefer not to do that every time.
So my question is: what's the best way to check whether a float a is less (or greater) than a float b, provided that the inputs a and b are given correct to n decimal places, in this case 2?
The language I prefer is C++.
Use std::numeric_limits<T>::epsilon() to check whether two numbers are almost equal. If you want to know whether one is greater/less you should also take into account the relative tolerance.
#include <cmath>
#include <limits>
template < typename T >
bool fuzzy_compare(T a, T b)
{
return std::abs(a - b) < std::numeric_limits<T>::epsilon();
}
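The remark about relative tolerance can be sketched as follows. std::numeric_limits<T>::epsilon() is the gap between 1 and the next representable value, so using it as an absolute tolerance only suits values near 1. The version below scales it by the magnitude of the operands; the function names and the default factor of 4 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// True when a and b differ by no more than a few units of rounding error
// relative to the larger of the two magnitudes.
template <typename T>
bool almost_equal(T a, T b, T factor = T(4)) {
    const T scale = std::max(std::abs(a), std::abs(b));
    return std::abs(a - b) <= factor * std::numeric_limits<T>::epsilon() * scale;
}

// True only when a is smaller than b by more than the tolerance.
template <typename T>
bool definitely_less(T a, T b, T factor = T(4)) {
    return a < b && !almost_equal(a, b, factor);
}
```

With these helpers, almost_equal(0.1 + 0.2, 0.3) holds even though 0.1 + 0.2 == 0.3 does not, and definitely_less(0.3, 0.1 + 0.2) is false. A purely relative tolerance is a poor fit when one operand is exactly zero, in which case an additional absolute tolerance is usually needed.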
Just use math:
#define PREC 0.01 // 1/pow(10, n), with n = 2
float c = a - b;
if (std::fabs(c) < PREC) { // std::fabs from <cmath>; plain abs may pick the int overload
printf("a equals b");
} else if (c < 0) {
printf("b is greater than a");
} else {
printf("a is greater than b");
}
Use the setprecision() manipulator. The number you put between the parentheses determines how many digits after the decimal point are included in the output (when combined with std::fixed). Be sure to include the <iomanip> header.
Comparing floats is always tricky. Here is a more complicated example, showing why you should use std::numeric_limits<T>::epsilon().
The first line returns true, but the second returns false (on my machine).
#include <cmath>
#include <cstdint>
#include <iostream>
using float64_t = double; // alias assumed here; the original snippet did not define float64_t
float64_t CalculateEpsilon ()
{
float64_t l_AllowedInaccuracy = 1; // 1.1, 0.9
int32_t significantDecimalPlaces = 2;
return ( l_AllowedInaccuracy * pow ( 0.1, significantDecimalPlaces ) );
}
bool IsEqual ( float64_t lhs, float64_t rhs )
{
float64_t l_Epsilon = CalculateEpsilon ();
float64_t l_Delta = std::abs ( lhs - rhs );
return l_Delta <= l_Epsilon;
}
int main ()
{
std::cout << IsEqual ( 107.35999999999999, 107.350 ); //returns true
std::cout << IsEqual ( 107.359999999999999, 107.350 ); //returns false
return 0;
}
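For inputs that are known to carry exactly two decimal places, the asker's integer workaround can be done without splitting the input text. The following is a minimal sketch under that assumption, with hypothetical function names: each value is scaled by 100 and rounded to the nearest integer, after which ordinary integer comparison is exact.

```cpp
#include <cmath>

// Converts a value with at most two decimal places into a whole number of hundredths.
long long to_hundredths(double value) {
    return std::llround(value * 100.0);
}

// Compares two such values exactly, after conversion to hundredths.
bool less_two_decimals(double a, double b) {
    return to_hundredths(a) < to_hundredths(b);
}
```

Rounding matters here: 0.29 * 100.0 is slightly below 29 in double arithmetic, so truncating would give 28, while std::llround gives 29. The approach assumes the scaled values fit comfortably in a long long and that the inputs really are limited to two decimals.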

Equalities in C/C++

In C++, the usual way of determining if some value, x, is between two limits is to:
//This is (A)
double x = 0.0;
double lower = -1.0;
double upper = +1.0;
if(x > lower && x < upper){
// Do some stuff
}
But today I discovered by accident that I can do this:
// This is (B)
double x = 0.0;
double lower = -1.0;
double upper = +1.0;
if(lower < x < upper){
// Do some stuff
}
It seems to work fine, but I've never heard of this being done before, with "lower < x < upper". Does this produce executable code as you would expect it to? I.e., is (A) equivalent to (B)?
I think a lot of people won't know about this, and I suspect that might be because the compiler interprets (A) differently to (B). Is this right?
No, A and B are not equivalent, you cannot do this.
Or, obviously you can (as you discovered) but you're not doing what you think you're doing.
You're evaluating (lower < x) < upper, i.e. the value of lower < x (which is false or true, but which converts to int for the comparison) is compared to upper.
See this table of operator precedence for more information.
They are definitely not equivalent. Your expression lower < x < upper will first evaluate lower < x to either true or false, and then do true < upper or false < upper respectively.
It doesn't work fine. In fact, it's dead wrong, and only works by accident.
lower < x < upper is parsed as (lower < x) < upper; (lower < x) has type bool, and its value is either true or false, depending on the value of x. To compare that bool value to upper the compiler converts the bool to a double (the type of upper) with the value 1.0 for true and 0.0 for false.
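The parse described above can be checked directly. In this small sketch (function names are hypothetical) the chained form is written with the parentheses the compiler effectively inserts, next to the intended range test.

```cpp
// How the compiler reads "lower < x < upper": the bool result of (lower < x)
// is converted to 0.0 or 1.0 and then compared with upper.
bool chained(double lower, double x, double upper) {
    return (lower < x) < upper;
}

// The intended range test.
bool in_range(double lower, double x, double upper) {
    return lower < x && x < upper;
}
```

With lower = -1.0 and upper = 1.0, chained reports true for x = -5.0 (false becomes 0.0, and 0.0 < 1.0), although -5.0 is outside the range. It reports false for x = 0.0 (true becomes 1.0, and 1.0 < 1.0 is false), although 0.0 is inside the range. in_range gives the expected answer in both cases.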
Well, yes. In either cases x is between the value range.
For example:
lower = 4;
upper = 9;
x = 7;
If you do: 7 > 4 && 7 < 9 is the same as saying 4 < 7 < 9.
This is basic arithmetics, by the way.