integer, parameter :: m = -2147483648
Leads to a compiler error: "Integer too big for its kind", but
integer, parameter :: m = -2147483647 - 1
seems to work and produce the correct results. I presume this is because the compiler checks 2147483648 before negating it and overflows the integer type. While the subtract 1 hack appears to do the desired initialization, is there a "best practice" method to use?
The issue is that the Fortran standard defines integers in terms of "model numbers", which have a symmetric interval. GFortran (which you seem to be using based on the error message) does not allow integer literals which are not Fortran model numbers, even though two's complement hardware can represent such numbers. This check is done during the parsing stage, and thus there is no error message when an expression such as "-huge(0) - 1" is constant folded later on during the compilation process.
With GFortran, you can disable this check with -fno-range-check.
As you suspect, the compiler is interpreting -2147483648 as a unary negation of the value +2147483648, and that intermediate value is too large for a signed 32-bit integer.
As for best practices, every definition of INT_MIN I've seen is expressed in terms of -INT_MAX - 1:
glibc
dietlibc
linux
I conclude that the best practice is to rely on compilers' constant folding to do the right thing, rather than to express this constant directly.
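The same idiom can be sketched in C++. This is an illustration, not a copy of any of the cited library headers. my_int_min is a hypothetical name, and the second check assumes a 32-bit int (it is guarded by an INT_MAX test, so it passes trivially elsewhere).

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// Same idiom as the C library headers cited above: spell the most negative
// int as -INT_MAX - 1, so that no literal ever exceeds the positive range.
constexpr int my_int_min = -INT_MAX - 1;  // hypothetical name

// True on two's complement platforms, where min() == -max() - 1.
static_assert(my_int_min == std::numeric_limits<int>::min(),
              "-INT_MAX - 1 should equal INT_MIN here");

// When int is 32 bits, the literal 2147483648 does not fit in int, so it is
// given a wider type (long or long long) and the unary minus is applied to
// that wider value: -2147483648 is not an int expression at all.
static_assert(INT_MAX != 2147483647 ||
                  !std::is_same<decltype(-2147483648), int>::value,
              "-2147483648 is a negated wider literal, not an int");
```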
2147483648 doesn't fit: the maximum 32-bit integer is 2147483647. You need to use an integer*8 variable, a 64-bit integer that can hold larger numbers.
OK, I know that there have been many questions about the pow function and casting its result to int, but I couldn't find an answer to this rather specific question.
OK, this is the C code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int i = 5;
int j = 2;
double d1 = pow(i,j);
double d2 = pow(5,2);
int i1 = (int)d1;
int i2 = (int)d2;
int i3 = (int)pow(i,j);
int i4 = (int)pow(5,2);
printf("%d %d %d %d",i1,i2,i3,i4);
return 0;
}
And this is the output: "25 25 24 25". Notice that only in the third case, where the arguments to pow are not literals, do we get a wrong result, probably caused by rounding errors. The same thing happens without explicit casting. Could somebody explain what happens in these four cases?
I'm using Code::Blocks on Windows 7, with the MinGW gcc compiler that came with it.
The result of the pow operation is 25.0000 plus or minus some bit of rounding error. If the rounding error is positive or zero, 25 will result from the conversion to an integer. If the rounding error is negative, 24 will result. Both answers are correct.
What is most likely happening internally is that in one case a higher-precision, 80-bit FPU value is being used directly, and in the other case the result is being written from the FPU to memory (as a 64-bit double) and then read back in (converting it to a slightly different 80-bit value). This can make a microscopic difference in the final result, which is all it takes to change a 25.0000000001 to a 24.999999997.
Another possibility is that your compiler recognizes the constants passed to pow and does the calculation itself, substituting the result for the call to pow. Your compiler may use an internal arbitrary-precision math library or it may just use one that's different.
This is caused by a combination of two problems:
The implementation of pow you are using is not high quality. Floating-point arithmetic is necessarily approximate in many cases, but good implementations take care to ensure that simple cases such as pow(5, 2) return exact results. The pow you are using is returning a result that is less than 25 by an amount greater than 0 but less than or equal to 2^-49. For example, it might be returning 25 - 2^-50.
The C implementation you are using sometimes uses a 64-bit floating-point format and sometimes uses an 80-bit floating-point format. As long as the number is kept in the 80-bit format, it retains the complete value that pow returned. If you convert this value to an integer, it produces 24, because the value is less than 25 and conversion to integer truncates; it does not round. When the number is converted to the 64-bit format, it is rounded. Converting between floating-point formats rounds, so the result is rounded to the nearest representable value, 25. After that, conversion to integer produces 25.
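The truncation-versus-rounding point above can be seen in a small C++ sketch. Here just_below_25 is a hypothetical stand-in for a pow result that came back slightly below 25 (it is not what MinGW's pow actually returns), and std::lround is one possible alternative to a plain cast when you want the nearest integer.

```cpp
#include <cmath>

// Illustration only: the largest double below 25.0 stands in for a pow()
// result that came back slightly too small.
inline double just_below_25() { return std::nextafter(25.0, 0.0); }

// A cast truncates toward zero, so any value in [24, 25) becomes 24...
inline int truncated(double d) { return static_cast<int>(d); }

// ...whereas std::lround rounds to the nearest integer (ties away from zero),
// which recovers 25 as long as the error is smaller than 0.5.
inline long rounded(double d) { return std::lround(d); }
```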
The compiler may switch formats whenever it is “convenient” in some sense. For example, there are a limited number of registers with the 80-bit format. When they are full, the compiler may convert some values to the 64-bit format and store them in memory. The compiler may also rearrange expressions or perform parts of them at compile-time instead of run-time, and these can affect the arithmetic performed and the format used.
It is troublesome when a C implementation mixes floating-point formats, because users generally cannot predict or control when the conversions between formats occur. This leads to results that are not easily reproducible and interferes with deriving or controlling numerical properties of software. C implementations can be designed to use a single format throughout and avoid some of these problems, but your C implementation is apparently not so designed.
To add to the other answers here: just generally be very careful when working with floating point values.
I highly recommend reading this paper (even though it is a long read):
http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf
Skip to section 3 for practical examples, but don't neglect the previous chapters!
I'm fairly sure this can be explained by "intermediate rounding" and the fact that pow is not simply looping j times multiplying by i, but calculating exp(log(i)*j) as a floating-point calculation. Intermediate rounding may well convert 24.999999999996 into 25.000000000; even arbitrary storing and reloading of the value may cause differences in this sort of behaviour, so depending on how the code is generated, it may make a difference to the exact result.
And of course, in some cases, the compiler may even "know" what pow actually achieves, and replace the calculation with a constant result.
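If both operands are integers, one way to sidestep the floating-point issue entirely is not to call pow at all. The following is a minimal sketch using exponentiation by squaring, a swapped-in technique rather than something the answers above propose; ipow is a hypothetical name, and overflow is not checked.

```cpp
// Hypothetical helper (not part of <cmath>): exact integer power by
// exponentiation by squaring. Assumes the result fits in long long;
// overflow is not detected.
long long ipow(long long base, unsigned exp)
{
    long long result = 1;
    while (exp != 0) {
        if (exp & 1u) {
            result *= base;   // this bit of exp contributes base^(2^k)
        }
        exp >>= 1;
        if (exp != 0) {
            base *= base;     // square only if another bit remains
        }
    }
    return result;
}
```

With this, ipow(5, 2) is exactly 25 no matter how the compiler handles intermediate floating-point values.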
I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:
1. If all the bits in an integer type are zero, does the integer as a whole represent zero?
2. If any bit in an integer type is one, does the integer as a whole represent non-zero? (If this is a "yes", then some representations like sign-and-magnitude would be additionally restricted.)
3. Is there a guaranteed way to check if any bit is not set?
4. Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2: I know how to set, for example, the 5th bit (see #5) in some variable x, and if I'd like to check a variable y to see if its 5th bit is 1, I would like to know if if (x & y) will work, because as I understand it, this relies on the value of the representation and not on whether or not that bit is actually 1 or 0.)
5. Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high bit and c = c ^ (c << 1) for the low bit, assuming I'm not making any assumptions I shouldn't be, given these questions.)
6. If the answer to #1 is "no", how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?
I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").
Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.
(1) If all the bits in an integer type are zero, does the integer as a whole represent zero?
Yes, the bit pattern consisting of all zeroes always represents 0:
The representations of integral types shall define values by use of a pure binary numeration system.49 [§3.9.1/7]
49 A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. In fact, signed magnitude is specifically allowed:
[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ] [§3.9.1/7]
(3) Is there a guaranteed way to check if any bit is not set?
I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with a bit pattern of all ones. For an unsigned number this representation is guaranteed, but converting from unsigned to signed yields an implementation-defined value if the number is unrepresentable:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]
(4) Is there a guaranteed way to check if any bit is set?
I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.
(5) Is there a guaranteed way to set the left-most and/or right-most bits?
Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]
You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.
The language has always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.
As for padding bits, things are (or were) a bit more complicated. The original specification of the C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could have produced an invalid trap representation. That is, in the original C (and even in C99 at first), using memset(..., 0, ...) on integer types did not guarantee that the objects would receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigenda for C99. Now an all-zero bit pattern in an integer object (involving all bits, including padding ones) is required to represent a valid zero value.
That is, in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only with that C99 technical corrigendum, not with the original C99 text.
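A minimal C++ sketch of the memset idiom described above. zero_ints is a hypothetical helper, and relying on it in C++ assumes the implementation gives all-zero bytes the value 0, as mainstream platforms do.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: clears n ints by writing all-zero bytes with memset.
// Per the answer above this is guaranteed in C since the C99 corrigendum;
// the sketch assumes the C++ implementation in use behaves the same way.
void zero_ints(int* p, std::size_t n)
{
    std::memset(p, 0, n * sizeof *p);
}
```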
You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.
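The unsigned char view described above can be sketched as follows. object_bits is a hypothetical helper; it lists bytes in memory order and the bits of each byte from most to least significant, so the output depends on the platform's byte order and on any padding bits.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Returns the object representation of obj as '0'/'1' characters, one group
// of CHAR_BIT characters per byte (in memory order), groups separated by
// spaces. Within each byte the most significant bit comes first.
template <typename T>
std::string object_bits(const T& obj)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        for (int b = CHAR_BIT - 1; b >= 0; --b) {
            out += ((bytes[i] >> b) & 1u) ? '1' : '0';
        }
        if (i + 1 < sizeof(T)) {
            out += ' ';
        }
    }
    return out;
}
```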
If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that; in particular, type punning through a union as described there doesn't seem to be well defined in C++.
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 have to compare equal, they do not have to have the same representation.
Good luck finding a machine nowadays that uses signed magnitude or one's complement.
If you want your brain to explode, consider this: if you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "big-endian" vs. "little-endian". We all (hopefully) know that.
But it is worse: each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways in which a truly bizarre implementation could map the bits of a 32-bit integer onto an array of four 8-bit unsigned chars. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering in a real computer).
If all the bits in an integer type are zero, does the integer as a whole represent zero?
Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:
Not necessarily, it could be a trap representation. See C99 6.2.6.1:
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)
The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).
and
The values of any padding bits are unspecified
and
44) Some combinations of padding bits might generate trap representations
If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
and
If the sign bit is zero, it shall not affect the resulting value.
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.
Is there a guaranteed way to check if any bit is not set?
If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.
Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation
If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.
Is there a guaranteed way to check if any bit is set?
The answer is essentially the same as to the previous question.
Is there a guaranteed way to set the left-most and/or right-most bits?
Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.
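A sketch of that approach for unsigned int, counting the value bits of UINT_MAX by shifting; the helper names are hypothetical. It shifts 1u rather than 1, since shifting a signed 1 into the sign position runs into the signed-shift rules quoted earlier in this thread.

```cpp
#include <climits>

// Counts the value bits of unsigned int by shifting UINT_MAX right until it
// becomes zero (std::numeric_limits<unsigned>::digits gives the same count).
int unsigned_value_bits()
{
    int n = 0;
    for (unsigned m = UINT_MAX; m != 0; m >>= 1) {
        ++n;
    }
    return n;
}

// Hypothetical helpers: OR in the left-most (most significant) or the
// right-most (least significant) value bit of an unsigned int.
unsigned set_leftmost_value_bit(unsigned v)
{
    return v | (1u << (unsigned_value_bits() - 1));
}

unsigned set_rightmost_value_bit(unsigned v)
{
    return v | 1u;
}
```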
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.
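The shifting-mask loop can be sketched for unsigned int like this (value_bits is a hypothetical name). Because the type is unsigned, the mask simply wraps to 0 after passing the top value bit, which ends the loop; for a signed type this approach would not cover the sign bit, as noted above.

```cpp
#include <string>

// Hypothetical helper: lists the value bits of an unsigned int, most
// significant first, by testing a one-bit mask that is shifted left until
// it falls off the top (unsigned shifts are reduced modulo 2^n, giving 0).
std::string value_bits(unsigned v)
{
    std::string out;
    for (unsigned mask = 1u; mask != 0; mask <<= 1) {
        out.insert(out.begin(), (v & mask) ? '1' : '0');
    }
    return out;
}
```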
For bit manipulation you could make a struct with eight 1-bit unsigned bit-fields and point a pointer of that struct type at your char. That way you can easily access each bit. But the compiler will probably do masking under the hood, so I think it is only a cleaner way for the programmer. You must check that your compiler doesn't change the order of the fields when doing this (the allocation order of bit-fields is implementation-defined).
yourstruct* pChar = (yourstruct*)(&c);
pChar->Bit7 = 1;
Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc): the "greatest common denominator" (vs. the latest/greatest cx11 "standard").
Q: If all the bits in an integer type are zero, does the integer as a whole represent zero?
A: Yes
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
A: Yes. This includes the sign bit, for a signed int.
I'm frankly not familiar with "magnitude"
Q: Is there a guaranteed way to check if any bit is not set?
A: "And'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to check if any bit is set?
A: Again, "and'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to set the left-most and/or right-most bits?
A: I believe you should always have an "INT_MAX" available for all implementations/all architectures to determine the leftmost bit.
I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.
IMHO...
On my platform, the following prints 9223372036854775808:
double x = 1e19;
std::cout << static_cast<unsigned __int64>(x) << '\n';
I tried Boost.NumericConversion, but got the same result.
Splitting x into 2 equal part, then adding together converted halves give the correct result. But I need a generic solution to use in a template code.
Thank you in advance.
EDIT:
This problem shows up on Visual Studio 2008, but not MinGW. Casting 4.0e9 into unsigned long works fine.
Seems like it works well with gcc, but it is problematic in Visual Studio. See Microsoft's answer regarding this issue:
Our floating-point to integer conversions are always done to a signed integer. In this particular case we use FIST instruction which generates 800..00 as you described. Therefore, there is no defined behavior for converting to unsigned 64-bit integer values which are larger than largest 64-bit signed integer.
So you can only convert the numbers in the signed 64-bit integer range: −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 (-2^63~2^63-1).
The behavior of your compiler does not conform to C99, which requires that positive values always be converted correctly when possible. It only allows deviating from that for negative values.
The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1).
For your template code, you might just test whether your value is greater than static_cast< double >(UINT64_MAX/2) and do the repair work that you are already doing. If this only concerns testing for constants, the test should be optimized out where it is not relevant.
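A sketch of the threshold-and-repair idea as a template, under the assumption (taken from the Microsoft reply quoted above) that only double-to-signed conversions are reliable on the affected compiler. float_to_unsigned is a hypothetical name, and out-of-range or NaN inputs are not handled beyond a debug assert.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: converts a non-negative double to unsigned type U
// using only double -> signed conversions.
template <typename U>
U float_to_unsigned(double x)
{
    static_assert(std::is_unsigned<U>::value, "U must be an unsigned type");
    using S = typename std::make_signed<U>::type;

    // 2^(digits-1): the first value that no longer fits in the signed type.
    const U half = U(1) << (std::numeric_limits<U>::digits - 1);
    const double half_d = static_cast<double>(half);  // exact: a power of 2
    assert(x >= 0.0 && x < 2.0 * half_d);              // precondition

    if (x < half_d) {
        return static_cast<U>(static_cast<S>(x));      // fits in S directly
    }
    // For half_d <= x < 2 * half_d the subtraction is exact (Sterbenz lemma),
    // and the remainder fits in S; add the high part back as an unsigned.
    return static_cast<U>(static_cast<U>(static_cast<S>(x - half_d)) + half);
}
```

On a conforming compiler a direct static_cast to the unsigned type is already correct for in-range values; the detour only matters for the behavior described in the quoted reply.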
In C++ we can make primitives unsigned. But they are always positive. Is there also a way to make unsigned negative variables? I know the word unsigned means "without sign", so also not a minus (-) sign. But I think C++ must provide it.
No. unsigned can only contain nonnegative numbers.
If you need a type that only represent negative numbers, you need to write a class yourself, or just interpret the value as negative in your program.
(But why do you need such a type?)
Unsigned integers can only be non-negative. From 3.9.1 paragraph 3 of the 2003 C++ standard:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
The main purpose of the unsigned integer types is to support modulo arithmetic. From 3.9.1 paragraph 4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
You are free, of course, to treat them as negative if you wish, you'll just have to keep track of that yourself somehow (perhaps with a Boolean flag).
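A short C++ sketch of both points: unsigned arithmetic wrapping modulo 2^n, and a hypothetical FlaggedMagnitude struct that tracks the sign separately with a Boolean flag, as suggested above. It assumes long long is wider than unsigned int, which holds on common platforms.

```cpp
#include <climits>

// Unsigned arithmetic is modulo 2^n: going below zero wraps around to the
// top of the range, so an unsigned value never becomes negative.
unsigned decrement(unsigned u)
{
    return u - 1u;
}

// Hypothetical wrapper: an unsigned magnitude plus a Boolean flag that says
// whether the value is to be treated as negative.
struct FlaggedMagnitude {
    unsigned magnitude;
    bool negative;

    long long value() const
    {
        const long long m = static_cast<long long>(magnitude);
        return negative ? -m : m;
    }
};
```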
I think you are thinking it the wrong way.
If you want a negative number, then you must not declare the variable as unsigned.
If your problem is the value range and you want that one extra bit, you could use a "bigger" data type (a 64-bit integer, for example).
If you are working with legacy code, creating a new struct can solve your problem, but that follows from your specific situation; it is not something C++ itself should handle.
Don't be fooled by the name: unsigned is often misunderstood as non-negative, but the rules for the language are different... probably a better name would have been "bitmask" or "modulo_integer".
If you think that unsigned means non-negative, then the implicit conversion rules, for example, are total nonsense: why should the difference between two non-negative values be non-negative? Why should the sum of a non-negative value and an integer be non-negative?
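The conversion rules this answer complains about can be shown in a few lines. The function names are hypothetical, and most compilers will warn about the signed/unsigned comparison, which is rather the point.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// The difference of two unsigned values is itself unsigned, so
// "smaller - larger" wraps around instead of going negative.
unsigned difference(unsigned a, unsigned b)
{
    return a - b;
}

// When int meets unsigned in a comparison, the int is converted to unsigned:
// -1 becomes UINT_MAX, so this returns false for every u.
bool minus_one_is_less(unsigned u)
{
    return -1 < u;
}

// The same rules apply to vector::size(): for an empty vector, size() - 1
// is not -1 but the largest value of the size type.
std::size_t last_index(const std::vector<int>& v)
{
    return v.size() - 1;
}
```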
It's very unfortunate that the C++ standard library itself fell into that misunderstanding: for example, vector.size() is unsigned (absurd if you read it as the language itself does, in terms of bitmask or modulo_integer). This choice for sizes has more to do with the old 16-bit days than with unsigned-ness, and in my opinion it was a terrible choice that we are still paying for in bugs.
But I think C++ must provide it.
Why? You think that C++ must provide the ability to make non-negative numbers negative? That's silly.
Unsigned values in C++ are defined to always be non-negative. They even wrap around — rather than underflowing — at zero! (And the same at the other end of the range)
To retrieve the smallest value I have to use numeric_limits<int>::min().
I suppose the smallest int is -2147483648, and tests on my machine showed this result.
But some C++ references, like the Open Group Base Specifications and cplusplus.com, define it with the value -2147483647.
I ask this question because in my implementation of the negaMax framework (game tree search) the value minimal integer * (-1) has to be well defined.
Yes, with minimal int = (numeric_limits<int>::min() + 2) I am on the safe side in any case, so my question is more theoretical, but I think it is nevertheless quite interesting.
If a value is represented as sign-and-magnitude instead of two's complement, the sign bit being one with all other bits as zero is equivalent to -0. In sign-and-magnitude the maximum positive integer and negative integer are the same magnitude. Two's complement is able to represent one more negative value because it doesn't have the same symmetry.
The value of numeric_limits<int>::min() is implementation-defined. That's why it can differ between implementations. You shouldn't rely on any concrete minimal value.
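For the negaMax case in the question, a sketch that does not depend on the concrete value of min(): check whether -v is representable, and use -max() as the "minus infinity" sentinel so it can always be negated. negation_is_safe and kMinusInfinity are hypothetical names.

```cpp
#include <limits>

// Negating v is well defined only if -v is representable. That holds exactly
// when v >= -max(), whatever the representation; on two's complement it
// rules out min() alone.
constexpr bool negation_is_safe(int v)
{
    return v >= -std::numeric_limits<int>::max();
}

// Hypothetical "minus infinity" for a negamax search: using -max() instead
// of min() keeps -kMinusInfinity == max() representable on every platform.
constexpr int kMinusInfinity = -std::numeric_limits<int>::max();
static_assert(-kMinusInfinity == std::numeric_limits<int>::max(),
              "the sentinel can always be negated");
```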
On cplusplus.com you forgot to read the qualifier
min. magnitude*
This is not necessarily the actual value of the constant in any particular compiler or system, it may be equal or greater in magnitude than this.
From the cplusplus.com link you posted (emphasis mine):
The following panel shows the different constants and their guaranteed minimal magnitudes (positive numbers may be greater in value, and negative numbers may be less in value). Any particular compiler implementation may define integral types with greater magnitudes than those shown here
Numeric limits are always system and compiler defined, try running with a 64bit compiler and system, you may see totally different numbers.
C++ uses two's complement for signed integers. Thus the smallest signed integer is represented by 100..00 (usually 32 bits).
Simply shifting 1<<(sizeof(int)*8-1) should give you the smallest signed integer.
Obviously for unsigned integers, the smallest is 0.
edit: you can read more here
edit2: apparently C++ doesn't necessarily use two's complement, my mistake