This is a very basic question, please don't mind, but I need to ask it. Adding two integers:
#include <iostream>
using namespace std;

int main()
{
    int a, b, c;
    cout << "Enter a: ";
    cin >> a;
    cout << "\nEnter b: ";
    cin >> b;
    cout << a << "\n" << b << "\n";
    c = a + b;
    cout << "\n" << c;
    return 0;
}
If I give a = 2147483648, then b automatically takes a value of 4046724 (note that cin never prompts for b), and the result c is 7433860.
If int is 2^32 and the first bit is the MSB (sign bit), then it becomes 2^31, so
c= 2^31+2^31
c=2^(31+31)
is this correct?
So how do I implement c = a + b for a = 2147483648 and b = 2147483648, and should c be an integer or a double integer?
When you perform any sort of input operation, you must always include an error check! For the stream operator, it could look like this:
int n; // requires #include <iostream> and #include <cstdlib>
if (!(std::cin >> n)) { std::cerr << "Error!\n"; std::exit(-1); }
// ... rest of program
If you do this, you'll see that your initial extraction of a already fails, so whatever values are read afterwards are not well defined.
The reason the extraction fails is that the literal token "2147483648" does not represent a value of type int on your platform (it is too large), no different from, say, "1z" or "Hello".
The real danger in programming is to assume silently that an input operation succeeds when often it doesn't. Fail as early and as noisily as possible.
The int type is signed and therefore its maximum value is 2^31 - 1 = 2147483648 - 1 = 2147483647.
Even if you used an unsigned integer, its maximum value would be 2^32 - 1 = a + b - 1 for the values of a and b you give.
For the arithmetic you are doing, you would be better off using long long, which is signed and has a maximum value of 2^63 - 1, or unsigned long long, which has a maximum value of 2^64 - 1 but is unsigned.
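As a minimal sketch of that suggestion (keeping the structure of the original program), reading directly into long long avoids the overflow for these inputs:
#include <iostream>

int main()
{
    long long a, b;                  // at least 64 bits, so 2147483648 fits
    std::cout << "Enter a: ";
    if (!(std::cin >> a)) { std::cerr << "Bad input for a\n"; return 1; }
    std::cout << "Enter b: ";
    if (!(std::cin >> b)) { std::cerr << "Bad input for b\n"; return 1; }
    long long c = a + b;             // 2147483648 + 2147483648 = 4294967296
    std::cout << c << "\n";
    return 0;
}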
c= 2^31+2^31
c=2^(31+31)
is this correct?
No, but you're right that the result takes more than 31 bits. In this case the result takes 32 bits (whereas 2^(31+31) would take 62 bits). You're confusing multiplication with addition: 2^31 * 2^31 = 2^(31+31).
Anyway, the basic problem you're asking about is called overflow. There are a few options: you can detect it and report it as an error, detect it and redo the calculation in such a way as to get the answer, or just use data types that allow you to do the calculation correctly no matter what the input values are.
Signed overflow in C and C++ is technically undefined behavior, so detection consists of figuring out what input values will cause it (because if you do the operation and then look at the result to see if overflow occurred, you may have already triggered undefined behavior and you can't count on anything). Here's a question that goes into some detail on the issue: Detecting signed overflow in C/C++
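As a rough sketch of the "detect before it happens" option (shown only for non-negative operands to keep it short; a full check also needs the negative cases):
#include <climits>

// Returns true if a + b would overflow int; assumes a >= 0 and b >= 0.
bool add_would_overflow(int a, int b)
{
    return a > INT_MAX - b;   // rearranged so the check itself cannot overflow
}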
Alternatively, you can just perform the operation using a data type that won't overflow for any of the input values. For example, if the inputs are ints then the correct result for any pair of ints can be stored in a wider type such as (depending on your implementation) long or long long.
int a, b;
// ...
long long c = (long long)a + (long long)b;  // long long is at least 64 bits, so the sum cannot overflow
If int is 32 bits then it can hold any value in the range [-2^31, 2^31-1]. So the smallest value obtainable would be -2^31 + -2^31 which is -2^32. And the largest value obtainable is 2^31 - 1 + 2^31 - 1 which is 2^32 - 2. So you need a type that can hold these values and every value in between. A single extra bit would be sufficient to hold any possible result of addition (a 33-bit integer would hold any integer from [-2^32,2^32-1]).
Or, since double can probably represent every integer you need (a 64-bit IEEE 754 floating point data type can represent integers up to 53 bits exactly) you could do the addition using doubles as well (though adding doubles may be slower than adding longs).
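A quick sketch of that double-based alternative (exact here, because any sum of two 32-bit ints needs at most 33 bits, well within a double's 53-bit mantissa):
int a, b;
// ... read a and b ...
double c = static_cast<double>(a) + static_cast<double>(b);  // exact for any pair of 32-bit ints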
If you have a library that offers arbitrary precision arithmetic you could use that as well.
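For example, if Boost happens to be available, its multiprecision library provides an arbitrary-precision integer type; the snippet below is only an illustration of the idea:
#include <boost/multiprecision/cpp_int.hpp>
#include <iostream>

int main()
{
    boost::multiprecision::cpp_int a = 2147483648;   // grows as needed, never overflows
    boost::multiprecision::cpp_int b = 2147483648;
    boost::multiprecision::cpp_int c = a + b;
    std::cout << c << "\n";                          // prints 4294967296
    return 0;
}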
Related
For some values (like 9) it works perfectly, but for most (like 7, 19 or 6) it subtracts 1 from the returned (binary) value.
#include <iostream>
#include <cmath>
using namespace std;

int decimaltobinary(int);

int main()
{
    int num;
    cout << "Enter the number: ";
    cin >> num;
    cout << num << " in decimal = " << decimaltobinary(num) << " in binary.";
    return 0;
}

int decimaltobinary(int num)
{
    int remainder, i = 0, binary = 0;
    while (num != 0)
    {
        remainder = num % 2;
        num = num / 2;
        binary = binary + remainder * pow(10, i);
        i++;
    }
    return binary;
}
There are two main problems with the shown code:
The shown code attempts to build a binary version of the input number in decimal producing, for example, the result of 111 for the number 7. That's an integer value of one hundred and eleven.
On a 32 bit platform, a 32 bit integer means that the largest number that can be "converted" this way is 1023. 1024 is 10000000000 in binary, and storing that as the decimal number ten billion already exceeds the capacity of a 32 bit integer. An unsigned 32 bit integer's maximum value is 4294967295 (and roughly half that for a plain, signed int), but either signed or unsigned, you're out of gas at this point.
Any use of pow() with two integer values is automatically broken by default, because pow() is floating point math, and floating point math is inexact. Integer exponentiation is not what pow() really does. Here's what pow() does: a) it takes the natural logarithm of its first parameter, b) multiplies the result from step a by its 2nd parameter, c) raises e to the power resulting from step b. Does this sound like something you expected to do here?
And since pow() takes floating point parameters, and the result is a floating point, the end result of the shown code is a bunch of needless conversions between floating point and integer values, and non-specific rounding errors as a result of imprecise floating point exponential math.
But the main flaw in the shown code is the attempt to use plain ints to assemble a decimal representation of a binary value, which simply doesn't have enough digits for this. Switching to long long int won't be much of a help: counting things off on my fingers, you'll only be able to go up to somewhere slightly north of half a million that way. A completely different approach must be taken for the described programming task.
Your problem is that binary+remainder*pow(10,i) is all done in floating-point arithmetic and only converted to int at the assignment. Since pow is not exact, you may get a result slightly below the exact value, in which case the conversion truncates it and makes it 1 less than the desired result.
While there are various better ways to achieve your goal, the immediate fix is to use std::round() and then cast the result to int:
binary=binary+remainder*int(round(pow(10,i)));
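One of those better ways (just a sketch, keeping the same "decimal digits that look like binary" output and therefore the same range limitation) is to build the place value with an integer multiplier, so no floating point is involved at all:
int decimaltobinary(int num)
{
    int binary = 0, place = 1;        // place is 1, 10, 100, ... instead of pow(10, i)
    while (num != 0)
    {
        binary = binary + (num % 2) * place;
        place = place * 10;
        num = num / 2;
    }
    return binary;
}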
Why does cin fail when I enter a number like 3999999999, but works for smaller numbers like 5?
#include <iostream>

int main()
{
    int n;
    std::cin >> n;
    if (std::cin.fail())
        std::cout << "Something sucks!";
    else
        std::cout << n;
    return 0;
}
Try:
std::cout << std::numeric_limits<int>::max() << std::endl; // requires you to #include <limits>
int on your system is likely a 32-bit signed two's complement number, which means the max value it can represent is 2,147,483,647. Your number, 3,999,999,999, is larger than that, and can't be properly represented by int. cin fails, alerting you of the problem.
long may be a 64-bit integer on your system, and if it is, try that. You need a 64-bit integer to represent 3,999,999,999. Alternatively, you can use an unsigned int, which will be able to represent numbers as large as 4,294,967,295 (again, on the typical system). Of course, this means you can't represent negative numbers, so it's a trade-off.
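A minimal sketch of that change (using long long, which is guaranteed to be at least 64 bits):
#include <iostream>

int main()
{
    long long n;                          // wide enough for 3999999999
    if (std::cin >> n)
        std::cout << n;
    else
        std::cout << "Something sucks!";  // still fails for non-numeric input
    return 0;
}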
That's probably because the big value is too big to fit into a variable of int type on your system. Try just assigning the big value to your variable and see what happens.
To fix it, change the type of the variable to a wider type, like long or long long.
The maximal number representable by an int depends on the number of bits used to represent it. Usually it's 32 bits, i.e. the maximal number is 2147483647, which is smaller than the number you typed.
When the formatted input functions fail in some form, they set std::ios_base::failbit and leave the original value in the argument unchanged. There are different failures, and overflow is considered one sort of failure. I recall a discussion about different errors being indicated in different ways, but I don't recall the outcome. The best bet may be to make sure errno is cleared before calling any of the input functions and to check errno for ERANGE to distinguish an overflow from a format error.
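If you need to tell the two cases apart explicitly, one common approach (a sketch, not the only way) is to read the token as a string and convert it with std::strtol, which sets errno to ERANGE on overflow:
#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <string>

int main()
{
    std::string token;
    std::cin >> token;

    errno = 0;
    char* end = nullptr;
    long value = std::strtol(token.c_str(), &end, 10);

    if (end == token.c_str() || *end != '\0')
        std::cout << "Format error\n";          // no digits, or trailing garbage
    else if (errno == ERANGE)
        std::cout << "Out of range for long\n"; // overflow or underflow
    else
        std::cout << value << "\n";
    return 0;
}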
I have some small problems regarding (implicit) type conversion in C++.
1. float to int
float f = 554344.76;
int x1 = f;
std::cout << x1 << std::endl;
Prints 554344 (rounding down or cutting off decimal places), but when replacing it with float f = 5543444.76; it prints 5543445 (rounding up). Why does it round down in the first case and round up in the second? On top of that, for much larger numbers it produces completely weird results (e.g. 5543444675.76 turns into 5543444480). Why?
What is the difference between int x1 = f; and long int x2 = f;?
2. Long int to float
long int li;
float x3 = li;
std::cout << x3 << std::endl;
A solution to an exercise says that the value is rounded down and results in incorrect values for large numbers. If I try long int li = 5435; it is not rounded down. Or is the meaning that long int li = 5435.56; is rounded down? Second, why does it result in incorrect values for large numbers? I think long int and float have the same number of bits.
3. char to double
char c = 130;
double x4 = c;
std::cout << x4 << std::endl;
Why does this result in -126 while char c = 100; provides the correct value?
4. int to char
int i = 200;
char x5 = i;
std::cout << x5 << std::endl;
This prints nothing (no output). Why? I think up to 255 the result should be correct because char can store values up to 255.
Edit: One question per post, please. Here I answer #3 and #4
I think up to 255 the result should be correct because char can store values up to 255.
This is an incorrect assumption. (Demo)
If a char is signed and 8 bits (1 byte), it only has a maximum value of 127, so 130 cannot be represented; the result of the conversion is implementation-defined (on most platforms it wraps to a negative value).
Whether a char is signed or not is implementation dependent, but it usually is. It will always be 1 byte long, but "1 byte" is allowed to be implementation-dependent, although it's almost universally going to be 8 bits.
In fact, if you reference any ASCII table, it only goes up to 127 before you get into "extended" ASCII, which on most platforms needs a wide character type to display.
So your code in #3 and #4 stores values that don't fit in a char.
You should have even gotten a warning about it when you tried char c = 130:
warning: overflow in implicit constant conversion
A float usually does not have enough precision to fully represent 5543444.76. Your float is likely storing the value 5543445.0. The cast to int is not where the rounding occurs; casting from floating point to int always truncates the decimal. Try using a double instead of float, or assign the value directly to an int, to illustrate the difference.
Many of a float's bits are used to represent the sign and exponent, so it cannot accurately represent all values an int of the same size can. Again, this is a problem of precision: the least significant digits have to be discarded, causing unexpected results that look like rounding errors. Consider scientific notation: you can represent a very large range of values using few digits, but only a few significant digits are tracked; the less important digits are dropped.
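A small illustration of that precision difference (the printed values assume the usual IEEE 754 float and double):
#include <iomanip>
#include <iostream>

int main()
{
    float  f = 5543444.76f;   // nearest float is 5543445.0
    double d = 5543444.76;    // double has enough precision for this value
    std::cout << std::fixed << std::setprecision(2)
              << f << "\n"    // prints 5543445.00
              << d << "\n";   // prints 5543444.76
    return 0;
}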
char may be signed or unsigned; it depends on your platform. It appears that char is a signed 8-bit type on your platform, meaning it can only represent values from -128 to 127, and clearly 130 exceeds that limit. Since the converted value cannot be represented, the result is implementation-defined; on your platform it wraps to -126.
char variables don't print their numeric value when passed to std::cout; they print the character associated with that value. See this table. Note that since the value 200 exceeds the maximum value a char can represent on your platform, the stored value is implementation-defined, and the resulting character may have no obvious visual representation.
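If the numeric value is what you want printed, a common fix (sketched here with a value that fits in a char on any platform) is to cast to int first:
#include <iostream>

int main()
{
    char x5 = 100;
    std::cout << x5 << "\n";                     // prints the character 'd'
    std::cout << static_cast<int>(x5) << "\n";   // prints the number 100
    return 0;
}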
float has only 24 bits of precision (23 stored explicitly). 5543444.76 needs more bits than that, so it gets rounded to the closest value that does fit.
You have uninitialized li, so it's undefined behavior. Perhaps you should edit the question to show the real code you are wondering about.
char is quite often signed. 130 can't be represented as a signed char. This is implementation-defined behavior (converting an out-of-range value to a signed type is not undefined, but the result is up to the implementation); in practice on a PC CPU the compiler takes the 8 lowest bits of 130 and shoves them into the 8-bit signed char, and that sets the 8th bit, resulting in a negative value.
Same thing as #3 above: 200 does not fit in a signed 8-bit integer, which your char probably is. Instead it ends up setting the sign bit, resulting in a negative value.
The nearest number to 5,543,444.76 that an IEEE 754 32-bit float can represent is 5,543,445.0. See this IEEE 754 converter. So f is equal to 5543445.0f, and it is truncated to 5543445 when converted to an integer (the fractional part, here zero, is simply discarded).
Even though on your specific system a float and a long int may have the same size, not all values of one can be represented by the other. For instance, 0.5f cannot be represented as a long int. Similarly, 100000002 cannot be represented as a float: the nearest IEEE 754 32-bit floats are 100000000.0f and 100000008.0f.
In general, you need to read about floating point representation. Wikipedia is a good start.
char may be signed char or unsigned char depending on the system you're on. In the (8-bit) signed char case, 130 cannot be represented. An out-of-range conversion occurs (whose result is implementation-defined), and it most probably wraps to -126 (note that 130 + 126 = 256). On the other hand, 100 is a perfectly valid value for a signed char.
In the Extended ASCII Table, 200 maps to È. If your system does not handle extended ASCII (if it's configured with UTF-8 for instance), or if you've got no font to represent this character, you'll see no output. If you're on a system with char defined as signed char, the stored value is implementation-defined anyway (typically it wraps to -56).
In C or C++ it is said that the maximum number a size_t (an unsigned int data type) can hold is the same as casting -1 to that data type. for example see Invalid Value for size_t
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 is set to form a negative number). Then 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK, the binary representation of -1 as an int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting an int to unsigned? Or, how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C, ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is just as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
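A tiny demonstration of that guarantee (assuming <cstdint> for SIZE_MAX and <cstddef> for std::size_t):
#include <cstddef>    // std::size_t
#include <cstdint>    // SIZE_MAX
#include <iostream>

int main()
{
    static_assert(static_cast<std::size_t>(-1) == SIZE_MAX,
                  "follows from the modulo-2^n rules for unsigned types");
    std::cout << static_cast<std::size_t>(-1) << "\n";   // prints SIZE_MAX
    return 0;
}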
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 64-bit systems), the bit-patterns at memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value, is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is that the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
Because the bit pattern of an int holding -1 is FFFFFFFF in hexadecimal, i.e. 11111111111111111111111111111111 in binary.
In an int, the first (most significant) bit signifies whether the value is negative.
In an unsigned int, that bit is just another value bit, because an unsigned int cannot be negative; the extra bit is what lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary) or FFFFFFFF (hexadecimal) is the biggest number it can store.
Unsigned ints are not recommended here because if a calculation goes below zero, the value wraps around to the biggest number.