I need to round integers to be the nearest multiple of another integer. Examples for results in the case of multiples of 100:
36->0
99->100
123->100
164->200
and so on.
I came up with the following code, which works but feels "dirty":
int RoundToMultiple(int toRound, int multiple)
{
    return (toRound + (multiple / 2)) / multiple * multiple;
}
This counts on the truncating properties of integer division to make it work.
Can I count on this code to be portable? Are there any compiler setups where this will fail to give me the desired result? If there are, how can I achieve the same results in a portable way?
If needed for a better answer, it can be assumed that multiples will be powers of 10 (including multiples of 1). Numbers can also be assumed to all be positive.
Yes, you can count on this code to be portable. N4296 (which is the latest open draft of C++14) says in section 5.6 [expr.mul]:
For integral operands the / operator yields the algebraic quotient with any fractional part discarded. [Footnote: This is often called truncation towards zero.]
This is not a new feature of the latest C++, it could be relied on in C89 too.
The only caveat is that if toRound is negative, you need to subtract the offset instead of adding it.
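A minimal sketch of that sign-aware adjustment (assuming, as the question allows, that multiple is positive):
int RoundToMultiple(int toRound, int multiple)
{
    // Push the offset toward the sign of toRound, so truncation
    // toward zero still lands on the nearest multiple.
    const int offset = (toRound >= 0) ? multiple / 2 : -(multiple / 2);
    return (toRound + offset) / multiple * multiple;
}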
An alternative approach is:
#include <cmath>  // std::lround

int RoundToMultiple(int toRound, int multiple)
{
    const auto ratio = static_cast<double>(toRound) / multiple;
    const auto iratio = std::lround(ratio);
    return iratio * multiple;
}
This avoids messy +/- offsets, but performance will be worse, and there are problems if toRound is so large that it can't be held precisely in a double. (OTOH, if this is for output, then I suspect multiple will be similarly large in this case, so you will be alright.)
The C++ standard explicitly specifies the behavior of integer division thusly:
[expr.mul]
For integral operands the / operator yields the algebraic quotient with any fractional part discarded.
A.k.a. truncation towards zero. This is as portable as it gets.
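A quick demonstration of what truncation towards zero means, particularly for negative operands:
#include <iostream>

int main()
{
    std::cout << (7 / 2) << '\n';   // 3   (3.5 truncated toward zero)
    std::cout << (-7 / 2) << '\n';  // -3  (-3.5 truncated toward zero, not floored to -4)
    std::cout << (-7 % 2) << '\n';  // -1  (the remainder keeps the dividend's sign)
}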
Though - as mentioned by others - integral division behaves as you expect, maybe the following solution looks "less weird" (still opinion-based).
Concerning a solution that converts an int to a double: I personally feel that this is too expensive just for the sake of rounding, but maybe someone can convince me that my feeling is wrong.
Anyway, by using just integral operators, the following solution makes the discussion on whether a double's mantissa can always hold every int superfluous:
int RoundToMultiple(int toRound, int multiple) {
    toRound += multiple / 2;
    return toRound - (toRound % multiple);
}
If you also wanted to include negative values, the code could be slightly adapted as follows (including tests):
#include <stdio.h>

int RoundToMultiple(int toRound, int multiple) {
    toRound += toRound < 0 ? -multiple / 2 : multiple / 2;
    return toRound - (toRound % multiple);
}

int main(int argc, char const *argv[])
{
    int tests[] = { 36, 99, 123, 164, -36, -99, -123, -164, 0 };
    int expectedResults[] = { 0, 100, 100, 200, 0, -100, -100, -200, 0 };
    int i = 0;
    int test = 0, result = 0, expectedResult = 0;
    do {
        test = tests[i];
        result = RoundToMultiple(test, 100);
        expectedResult = expectedResults[i];
        printf("test %d: %d==%d ? %s\n", test, result, expectedResult, (expectedResult == result ? "OK" : "NOK!"));
        i++;
    } while (test != 0);
}
Related
Here's my code:
int oupt = 0;
for (int i = 0, power = number.length() - 1; i < number.length(); i++, power--) {
    oupt += ((int)number[i] - 48) * pow(5, power);
}
string output = std::to_string(oupt);
Here, number is a string holding the number I want to convert from base 5 to base 10, for example "414". When I run the code I get 108, when I know the answer is 109; for all other base-5 inputs the output is also off by one. When I print ((int)number[i]-48)*pow(5, power) for each digit, I get output that makes sense: 100, 5, 4, which add up to 109. Yet the final output is 108, and this is true for all my conversions from base 5 to base 10. Please help me.
Any toolset that doesn't give you an exact double value for pow with integral arguments ought to be considered defective.
But alas the C++ standard permits pow to give you an approximation beyond that endemic in double precision floating point arithmetic. Under some standards - such as IEEE754 - some functions (e.g. sqrt) and the arithmetic operators are required to return the best floating point number possible, but that rule does not apply to pow. It allows compiler and C++ standard library vendors to be lazy and implement pow(x, y) as exp(y * log(x)) for example. And that can undershoot the correct value, and the truncation to an integral type makes the problem pernicious.
This is what's happening to you here.
Putting std::round around the pow result will probably work.
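For example, a minimal patch to the loop from the question (same structure, just snapping pow's result to the nearest integer before the truncating conversion):
#include <cmath>

int oupt = 0;
for (int i = 0, power = number.length() - 1; i < number.length(); i++, power--) {
    // std::round repairs results like 24.999999... before the conversion to int truncates them
    oupt += (number[i] - '0') * static_cast<int>(std::round(std::pow(5, power)));
}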
If you have to roll your own text-to-integer conversion, it can be done much more simply:
const int base = 5;
int oupt = 0;
for (int i = 0; i < number.length(); ++i) {
    oupt *= base;
    oupt += number[i] - '0'; // works for all character encodings
}
std::string output = std::to_string(oupt);
Note: no floating-point math used or needed.
But it's even easier to use the standard library:
int oupt = std::stoi(number, nullptr, base);
I am using the addition in log space equation described on the Wikipedia log probability article, but I am getting underflow when computing the exp of very large, negative, logarithms. As a result, my program crashes.
Example inputs are a = -2 and b = -1033.4391885529124.
My code, implemented straight from the Wikipedia article, looks like this:
#include <algorithm>  // std::min, std::max
#include <cmath>      // std::isinf, std::log2, std::exp2
#include <limits>     // std::numeric_limits

double log_sum(double a, double b)
{
    double min_ab = std::min(a, b);
    a = std::max(a, b);
    b = min_ab;
    if (std::isinf(a) && std::isinf(b)) {
        return -std::numeric_limits<double>::infinity();
    } else if (std::isinf(a)) {
        return b;
    } else if (std::isinf(b)) {
        return a;
    } else {
        return a + std::log2(1 + std::exp2(b - a));
    }
}
I've come up with the following ideas, but can't decide which is best:
Check for out-of-range inputs before evaluation.
Disable (somehow) the exception, and flush or clamp the output after evaluation.
Implement custom log and exp functions that do not throw exceptions and automatically flush or clamp the results.
Some other ways?
Additionally, I'd be interested to know what effect the choice of the logarithm base has on the computation. I chose base two because I believed that other log bases would be calculated from log_n(x) = log_2(x) / log_2(n), and would suffer from precision loss due to the division. Is that correct?
According to http://en.cppreference.com/w/cpp/numeric/math/exp:
For IEEE-compatible type double, overflow is guaranteed if 709.8 < arg, and underflow is guaranteed if arg < -708.4
So you can't prevent an underflow. However:
If a range error occurs due to underflow, the correct result (after rounding) is returned.
So there shouldn't be any program crash - "just" a loss of precision.
However, notice that
1 + exp(n)
will lose precision much sooner, i.e. already at n = -53. This is because the next representable number after 1.0 is 1.0 + 2^-52.
So the loss of precision inside exp is far smaller than the precision lost when adding 1.0 + exp(...).
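A two-line check showing where that cutoff sits (assuming IEEE-754 double):
#include <cmath>
#include <iostream>

int main()
{
    // exp2(-52) is exactly the spacing between 1.0 and the next double, so it survives;
    // exp2(-53) is a half-spacing tie, which round-to-nearest-even resolves back to 1.0.
    std::cout << ((1.0 + std::exp2(-52)) > 1.0) << '\n';  // prints 1
    std::cout << ((1.0 + std::exp2(-53)) > 1.0) << '\n';  // prints 0
}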
The problem here is accurately computing the expression log(1+exp(x)) without intermediate under/overflow. Fortunately, Martin Maechler (one of the R core developers) details how to do it in section 3 of this vignette.
He uses natural base functions: it should be possible to translate it to base-2 by appropriately scaling the functions, but it uses the log1p function in one part, and I'm not aware of any math library which supplies a base-2 variant.
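For reference, a natural-base C++ transcription of the vignette's log1pexp recipe might look like the sketch below. The cutoffs -37, 18, and 33.3 are the ones Maechler derives for double precision; the infinity handling from the question's code is omitted for brevity.
#include <algorithm>
#include <cmath>

// log(1 + exp(x)) without intermediate overflow or underflow,
// following the case analysis in Maechler's vignette.
double log1pexp(double x)
{
    if (x <= -37.0) return std::exp(x);              // exp(x) is tiny, so log1p(t) ~= t
    if (x <= 18.0)  return std::log1p(std::exp(x));  // direct evaluation is safe here
    if (x <= 33.3)  return x + std::exp(-x);         // log(1 + e^x) ~= x + e^-x
    return x;                                        // e^-x vanishes below double precision
}

// Natural-base log(exp(a) + exp(b)), built on log1pexp.
double log_sum_e(double a, double b)
{
    const double hi = std::max(a, b);
    const double lo = std::min(a, b);
    return hi + log1pexp(lo - hi);
}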
The choice of base is unlikely to have any effect on accuracy (or performance), and most reasonable math libraries are able to give sub-1-ulp guarantees for both functions (i.e. you will get one of the two floating point values closest to the exact answer).
A pretty common approach is to break the floating point number up into its base-2 exponent k and significand 1+f, such that 1/sqrt(2) < 1+f < sqrt(2), and then use a polynomial approximation to compute log(1+f). Due to some mathematical quirks (basically, the fact that the 2nd term of the Taylor series can be represented exactly) it turns out to be more accurate to do this in the natural base rather than base-2, so a typical implementation will look like:
log(x) = k*log2 + p(f)
log2(x) = k + p(f)*invlog2
(e.g. see log and log2 in openlibm), so there is no real benefit to using one over the other.
I stumbled onto this code here.
Generators doubleSquares(int value)
{
    Generators result;
    for (int i = 0; i <= std::sqrt(value / 2); i++)         // 1
    {
        double j = std::sqrt(value - std::pow(i, 2));       // 2
        if (std::floor(j) == j)                             // 3
            result.push_back( { i, static_cast<int>(j) } ); // 4
    }
    return result;
}
Am I wrong to think that //3 is dangerous?
This code is not guaranteed by the C++ standard to work as desired.
Some low-quality math libraries do not return correctly rounded values for pow, even when the inputs have integer values and the mathematical result can be exactly represented. sqrt may also return an inaccurate value, although this function is easier to implement and so less commonly suffers from defects.
Thus, it is not guaranteed that j is exactly an integer when you might expect it to be.
In a good-quality math library, pow and sqrt will always return correct results (zero error) when the mathematical result is exactly representable. If you have a good-quality C++ implementation, this code should work as desired, up to the limits of the integer and floating-point types used.
Improving the Code
This code has no reason to use pow; std::pow(i, 2) should be i*i. This results in exact arithmetic (up to the point of integer overflow) and completely avoids the question of whether pow is correct.
Eliminating pow leaves just sqrt. If we know the implementation returns correct values, we can accept the use of sqrt. If not, we can use this instead:
for (int i = 0; i*i <= value/2; ++i)
{
    int j = std::round(std::sqrt(value - i*i));
    if (j*j + i*i == value)
        result.push_back( { i, j } );
}
This code only relies on sqrt to return a result accurate within .5, which even a low-quality sqrt implementation should provide for reasonable input values.
There are two different, but related, questions:
Is j an integer?
Is j likely to be the result of a double calculation whose exact result would be an integer?
The quoted code asks the first question. It is not correct for asking the second question. More context would be needed to be certain which of the two should be asked.
If it is the second question, you cannot depend on floor alone. Consider a double that is greater than 2.99999999999 but less than 3. It could be the result of a calculation whose exact value would be 3. Its floor is 2, and it is greater than its floor by almost 1. You would need to test for closeness to the result of std::round instead.
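In code, the second question might be asked like this (the tolerance eps is application-specific; the value here is only a placeholder):
#include <cmath>

// "Is j close to some integer?" rather than "is j exactly an integer?"
bool nearly_integral(double j, double eps = 1e-9)
{
    return std::fabs(j - std::round(j)) < eps;
}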
I would say it is dangerous. One should always test for "equality" of floating point numbers by comparing the difference between the two numbers with an acceptably small number, e.g.:
#include <math.h>
...
if (fabs(std::floor(j) - j) < eps) {
...
... where eps is a number that is acceptably small for one's purpose. This approach is essential unless one can guarantee that the operations return exact results, which may be true for some cases (e.g. IEEE-754-compliant systems) but the C++ standard does not require that this be true. See, for instance Cross-Platform Issues With Floating-Point Arithmetics in C++.
Does anyone know of an open-source C or C++ library with functions implementing every integer division mode one might want? Possible behaviors (for positive result):
round_down, round_up,
round_to_nearest_with_ties_rounding_up,
round_to_nearest_with_ties_rounding_down,
round_to_nearest_with_ties_rounding_to_even,
round_to_nearest_with_ties_rounding_to_odd
with each (aside from round-to-even and round-to-odd) having two variants:
// (round relative to 0; -divide(-x, y) == divide(x, y))
negative_mirrors_positive,
// (round relative to -Infinity; divide(x + C*y, y) == divide(x, y) + C)
negative_continuous_with_positive
I know how to write it, but surely someone has done so already?
As an example, if we assume (as is common and is mandated in C++11) that built-in signed integral division rounds towards zero, and that built-in modulus is consistent with this, then
int divide_rounding_up_with_negative_mirroring_positive(int dividend, int divisor) {
    // div+mod is often a single machine instruction.
    const int quotient = dividend / divisor;
    const int remainder = dividend % divisor;
    // This ?:'s condition equals whether the quotient is positive,
    // but we compute it without depending on the quotient for speed
    // (instruction-level parallelism with the divide).
    const int adjustment = (((dividend < 0) == (divisor < 0)) ? 1 : -1);
    if (remainder != 0) {
        return quotient + adjustment;
    }
    else {
        return quotient;
    }
}
Bonus points: work for multiple argument types; fast; optionally return modulus as well; do not overflow for any argument values (except division by zero and MIN_INT/-1, of course).
If I don't find such a library, I'll write one in C++11, release it, and link to it in an answer here.
So, I wrote something. The implementation is the typical ugly template-and-bitwise code, but it works well. Usage:
divide(dividend, divisor, rounding_strategy<...>())
where rounding_strategy<round_up, negative_mirrors_positive> is an example strategy; see list of variants in my question or in the source code. https://github.com/idupree/Lasercake/blob/ee2ce96d33cad10d376c6c5feb34805ab44862ac/data_structures/numbers.hpp#L80
depending only on C++11 [*], with unit tests (using Boost Test framework) starting at https://github.com/idupree/Lasercake/blob/ee2ce96d33cad10d376c6c5feb34805ab44862ac/tests/misc_utils_tests.cpp#L38
It is polymorphic, decent speed, and does not overflow, but doesn't currently return modulus.
[*] (and on boost::make_signed and boost::enable_if_c, which are trivial to replace with std::make_signed and std::enable_if, and on our caller_error_if() which can be replaced with assert() or if(..){throw ..} or deleted. You can ignore and delete the rest of the file assuming you're not interested in the other things there.)
Each divide_impl's code can be adapted to C by replacing each T with e.g. int and T(CONSTANT) with CONSTANT. In the case of the round_to_nearest_* variant, you'd either want to make the rounding kind be a runtime argument or create six copies of the code (one for each distinct rounding variation it handles). The code relies on '/' rounding towards zero, which is common and also specified by C11 (std draft N1570 6.5.5.6) as well as C++11. For C89/C++98 compatibility, it could use stdlib.h div()/ldiv() which are guaranteed to round towards zero (see http://www.linuxmanpages.com/man3/div.3.php , http://en.cppreference.com/w/cpp/numeric/math/div )
I'm currently implementing a hash table in C++ and I'm trying to make a hash function for floats...
I was going to treat floats as integers by padding the decimal numbers, but then I realized that I would probably reach the overflow with big numbers...
Is there a good way to hash floats?
You don't have to give me the function directly, but I'd like to see/understand different concepts...
Notes:
I don't need it to be really fast, just evenly distributed if possible.
I've read that floats should not be hashed because of the speed of computation. Can someone confirm/explain this and give me other reasons why floats should not be hashed? I don't really understand why (besides the speed).
It depends on the application, but most of the time floats should not be hashed, because hashing is used for fast lookup of exact matches, and most floats are the result of calculations that produce a float which is only an approximation to the correct answer. The usual way to check for floating-point equality is to check whether it is within some delta (in absolute value) of the correct answer. This type of check does not lend itself to hashed lookup tables.
EDIT:
Normally, because of rounding errors and inherent limitations of floating point arithmetic, if you expect that floating point numbers a and b should be equal to each other because the math says so, you need to pick some relatively small delta > 0, and then you declare a and b to be equal if abs(a-b) < delta, where abs is the absolute value function. For more detail, see this article.
Here is a small example that demonstrates the problem:
float x = 1.0f;
x = x / 41;
x = x * 41;
if (x != 1.0f)
{
    std::cout << "ooops...\n";
}
Depending on your platform, compiler and optimization levels, this may print ooops... to your screen, meaning that the mathematical equation x / y * y = x does not necessarily hold on your computer.
There are cases where floating point arithmetic produces exact results, e.g. reasonably sized integers and rationals with power-of-2 denominators.
If your hash function did the following you'd get some degree of fuzziness on the hash lookup
#include <cstring>  // memcpy

unsigned int Hash( float f )
{
    unsigned int ui;
    memcpy( &ui, &f, sizeof( float ) );
    return ui & 0xfffff000;
}
This way you'll mask off the 12 least significant bits, allowing for a degree of uncertainty... It really depends on your application, however.
You can use the std hash; it's not bad:
std::size_t myHash = std::hash<float>{}(myFloat);
unsigned hash(float x)
{
    // anonymous union: write the float, read the same bytes back as unsigned
    union
    {
        float f;
        unsigned u;
    };
    f = x;
    return u;
}
Technically undefined behavior, but most compilers support this. Alternative solution:
unsigned hash(float x)
{
    return (unsigned&)x;  // reinterpret the float's storage as unsigned
}
Both solutions depend on the endianness of your machine, so for example on x86 and SPARC, they will produce different results. If that doesn't bother you, just use one of these solutions.
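If the undefined behavior bothers you, a memcpy-based bit copy expresses the same idea in well-defined C++, and mainstream compilers optimize it to the same code:
#include <cstring>

unsigned hash(float x)
{
    static_assert(sizeof(unsigned) == sizeof(float), "needs a 32-bit unsigned");
    unsigned u;
    std::memcpy(&u, &x, sizeof u);  // well-defined type punning
    return u;
}
In C++20 the same bit copy can be written as std::bit_cast<unsigned>(x).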
You can of course represent a float as an int type of the same size to hash it, however this naive approach has some pitfalls you need to be careful of...
Simply converting to a binary representation is error prone, since values which are equal won't necessarily have the same binary representation.
An obvious case: -0.0 won't match 0.0 for example. *
Further, simply converting to an int of the same size won't give a very even distribution, which is often important (when implementing a hash/set that uses buckets, for example).
Suggested steps for implementation:
filter out non-finite cases (NaN, inf) and (0.0, -0.0; whether you need to do this explicitly or not depends on the method used).
convert to an int of the same size (that is, use a union for example to represent the float as an int, not simply cast to an int).
re-distribute the bits (intentionally vague here!); this is basically a speed vs quality tradeoff. But if you have many values in a small range you probably don't want them to end up in a similar range too.
*: You may want to check for (NaN and -NaN) too. How to handle those exactly depends on your use case (you may want to ignore the sign for all NaNs, as CPython does).
Python's _Py_HashDouble is a good reference for how you might hash a float, in production code (ignore the -1 check at the end, since that's a special value for Python).
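Putting the steps above together, a minimal C++ sketch; the finalizer constants are the widely published MurmurHash3 32-bit ones, and the exact mixing strategy is my own choice here, not CPython's:
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

std::uint32_t hash_float(float x)
{
    // step 1: canonicalize, so every NaN lands in one bucket and -0.0f folds into +0.0f
    if (std::isnan(x)) x = std::numeric_limits<float>::quiet_NaN();
    if (x == 0.0f)     x = 0.0f;
    // step 2: bit copy into an integer of the same size (not a value cast)
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    // step 3: redistribute the bits (MurmurHash3's 32-bit finalizer)
    u ^= u >> 16; u *= 0x85ebca6bu;
    u ^= u >> 13; u *= 0xc2b2ae35u;
    u ^= u >> 16;
    return u;
}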
If you're interested, I just made a hash function that uses floating point and can hash floats. It also passes SMHasher (which is the main bias test for non-crypto hash functions). It's a lot slower than normal non-cryptographic hash functions due to the float calculations.
I'm not sure if tifuhash will become useful for all applications, but it's interesting to see a simple floating point function pass both PractRand and SMHasher.
The main state update function is very simple, and looks like:
function q( state, val, numerator, denominator ) {
    // Continued Fraction mixed with Egyptian fraction "Continued Egyptian Fraction"
    // with denominator = val + pos / state[1]
    state[0] += numerator / denominator;
    state[0] = 1.0 / state[0];
    // Standard Continued Fraction with a_i = val, b_i = (a_i-1) + i + 1
    state[1] += val;
    state[1] = numerator / state[1];
}
Anyway, you can get it on npm
Or you can check out the github
Usage is simple:
const tifu = require('tifuhash');
const message = 'The medium is the message.';
const number = 333333333;
const float = Math.PI;
console.log( tifu.hash( message ),
             tifu.hash( number ),
             tifu.hash( float ),
             tifu.hash( ) );
There's a demo of some hashes on runkit here https://runkit.com/593a239c56ebfd0012d15fc9/593e4d7014d66100120ecdb9
Side note: I think that in the future, floating point (possibly big arrays of floating-point calculations) could be a useful way to make more computationally demanding hash functions. A weird side effect I discovered of using floating point is that the hashes are target-dependent, and I surmise they could be used to fingerprint the platforms they were calculated on.
Because of the IEEE byte ordering, the Java Float.hashCode() and Double.hashCode() do not give good results. This problem is well known and can be addressed by this scrambler:
class HashScrambler {
    /**
     * https://sites.google.com/site/murmurhash/
     */
    static int murmur(int x) {
        x ^= x >>> 13;  // unsigned (logical) shift, matching the original C code
        x *= 0x5bd1e995;
        return x ^ (x >>> 15);
    }
}
You then get a good hash function, which also allows you to use Float and Double in hash tables. But you need to write your own hash table that allows a custom hash function.
Since in a hash table you also need to test for equality, you need exact equality to make it work. Maybe the latter is what President James K. Polk intends to address?