2's complement binary expected output - c++

I'm studying for my exam, and I would like to check my answers to this question:
Suppose binary values are signed 8-bit values in two's-complement format, with a decimal range from -128 to 127. Which of the following statements are true/false?
1) 11111111 > 01111111
I think this is false because the first digit represents the sign, so we're comparing a negative value to a positive value.
2) (11111111 + 11111111) > (00000001 - 00000010)
I'm not so sure about this one because I don't know what happens when it overflows. I think the computer just drops the extra digit, so the left-hand side is something like -128 - 128 = -256. The right-hand side is 1 - 2 = -1, which is represented as 10000001. This means the inequality, in decimal, becomes -256 > -1, which is false. But again, I am not so sure about this one.
3) (10000000 / 00000100) == 11100000
The first part is -0/4 and the second part is non-zero, so would it be false?
Also, these are only sample problems, and I would like to practice/explore on my own. Is there any way in which I can write a C++ program to see the expected output of questions of this form?
Thank you.

Regarding playing around with this in C++, there are a bunch of bit manipulation libraries out there (just searching for "C++ bit manipulation library" will turn up lots of results). As a few examples:
https://github.com/Chris--A/BitBool
https://www.slac.stanford.edu/comp/unix/gnu-info/libg++_23.html
You may or may not find one with a built-in notion of two's complement. Regardless, it could be a useful exercise to implement that kind of functionality yourself on top of what one of these libraries provides, for example by subclassing something they provide or by writing your own class that encapsulates an existing library type as a data member.
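For the specific exam-style statements above, you don't even need a library: a short program using std::int8_t can evaluate each one directly. Below is a minimal sketch (the byte helper is just an illustrative name, not a standard function); note that converting an out-of-range value back to int8_t is only guaranteed to wrap two's-complement style since C++20, though mainstream compilers have long behaved that way:

#include <cstdint>
#include <iostream>

// Interpret an 8-bit pattern as a two's-complement signed byte.
std::int8_t byte(int bits) { return static_cast<std::int8_t>(bits); }

int main() {
    std::cout << std::boolalpha;
    // 1) 11111111 > 01111111, i.e. -1 > 127
    std::cout << (byte(0b11111111) > byte(0b01111111)) << '\n';   // false
    // 2) (11111111 + 11111111) > (00000001 - 00000010), i.e. -2 > -1
    std::cout << (byte(byte(0b11111111) + byte(0b11111111))
                  > byte(byte(1) - byte(2))) << '\n';             // false
    // 3) (10000000 / 00000100) == 11100000, i.e. -128 / 4 == -32
    std::cout << (byte(byte(0b10000000) / byte(0b00000100))
                  == byte(0b11100000)) << '\n';                   // true
}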

Related

Why do signed negative integers start at the lowest value?

I can't really explain this question in words alone (probably why I can't find an answer), so I'll try to give as much detail as I can. This isn't really a practical question, I'm just curious.
So let's say we have a signed 8-bit int.
sign | value bits | sign = 0 | sign = 1
?    | 0000000    | (+)0     | (-)128
?    | 1111111    | (+)127   | (-)1
I don't understand why it works this way; can someone explain? In my head, it makes more sense for the value to be the same and for the sign to just put a plus or minus in front, so to me it looks backwards.
There are a couple of systems for signed integers.
One of them, sign-magnitude, is exactly what you expect: a part that says how big the number is, and a bit that either leaves the number positive or negates it. That makes the sign bit really special, substantially different from the other bits. For example:
sign-magnitude representation
0_0000000 = 0
0_0000001 = 1
1_0000001 = -1
1_0000000 = -0
This has some uncomfortable side effects, the main one being that it no longer corresponds to unsigned arithmetic in a useful way: if you add two sign-magnitude integers as if they were unsigned, weird things happen (e.g. -0 + 1 = -1). That has far-reaching consequences: addition, subtraction, equality tests, and multiplication all need special signed versions; multiplication and division by powers of two in no way correspond to bit shifts (except accidentally); and since it has no clear correlation to Z/2^k Z, it is not immediately clear how it behaves algebraically. Also, -0 exists as a separate thing from 0, which is weird and causes different kinds of trouble depending on your semantics for it, but always some trouble.
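To see that concretely, here is a small C++ sketch (sm_value is an illustrative decoder, not a standard function) showing the -0 + 1 = -1 anomaly when sign-magnitude patterns are added as if they were unsigned:

#include <cstdint>
#include <iostream>

// Decode an 8-bit sign-magnitude pattern: the top bit is the sign,
// the remaining 7 bits are the magnitude.
int sm_value(std::uint8_t bits) {
    int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}

int main() {
    std::uint8_t neg_zero = 0b10000000;   // -0 in sign-magnitude
    std::uint8_t sum = neg_zero + 1;      // plain unsigned addition
    std::cout << sm_value(sum) << '\n';   // prints -1, not +1
}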
The most common system by far is two's complement, where the sign bit does not mean "times 1 or times -1" but "add 0 or add -2^k". As with one's complement, the sign bit is largely a normal bit (except with respect to division and right shift). For example:
two's complement representation (8bit)
00000000 = 0 (no surprises there)
10000000 = -128
01111111 = 127
11111111 = -1 (= -128 + 127)
etc
Now note that 11111111 + 00000001 = 0 in unsigned 8-bit arithmetic anyway, and -1 + 1 = 0 is clearly desirable (in fact it is the definition of -1). So what it comes down to, at least for addition, subtraction, multiplication, and left shift, is plain old unsigned arithmetic - you just print the numbers differently. Of course some operators still need special signed versions. Since it corresponds to unsigned arithmetic so closely, you can reason about additions and multiplications as if you are in Z/2^k Z with total confidence. It does have a slight oddity comparable to the existence of negative zero, namely the existence of a negative number with no positive counterpart (-128 in the 8-bit case).
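As a quick C++ illustration of that correspondence (a sketch, nothing library-specific; the conversion of 0xFF to int8_t is guaranteed to wrap only since C++20), the same 8-bit addition can be done unsigned and then reinterpreted as signed:

#include <cstdint>
#include <iostream>

int main() {
    // 11111111 + 00000001 wraps to 0 in unsigned 8-bit arithmetic...
    std::uint8_t u = static_cast<std::uint8_t>(0xFFu + 0x01u);
    // ...and reinterpreting 11111111 as two's complement gives -1,
    // so the very same bit-level operation also computes -1 + 1 = 0.
    std::int8_t s = static_cast<std::int8_t>(0xFF);            // -1
    std::cout << unsigned(u) << ' ' << (int(s) + 1) << '\n';   // prints "0 0"
}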
The idea of making the value the same and just putting a plus or minus in front is a known one, called a signed magnitude representation (or a similar expression). The two major problems usually cited with signed magnitude representation are that there are two zeros (plus and minus), and that integer arithmetic becomes more complicated in the computer algorithm.
A popular alternative for computers is a two's complement representation, which is what you are asking about. This representation makes arithmetic algorithms simpler, but looks weird when you imagine plotting the binary values along a number line, as you are doing. Two's complement also has a single zero, which takes care of the first major problem.
The signed number representations article in Wikipedia has comparison tables illustrating signed magnitude, two's complement, and three other representation systems of values in a decimal number line from -11 to +16 and in a binary value chart from 0000 to 1111.

ADT Integer class questions

I am pretty new to programming and I have to do an Abstract Data Type (ADT) for integer numbers.
I've browsed the web for some tips, examples, and tutorials, but I couldn't find anything useful, so I hope I will get some answers here.
I thought a lot about how I should format the ADT that stores my integer, and I'm thinking of something like this:
int length; // stores the number of digits (a limit, since these numbers can grow without bound)
int* digits; // stores the digits of my number, with the dimension equal to length
Now, I'm confused about how I should tackle the sign representation. Is it OK to hold the sign in a char, something like char sign?
But then comes the question of what to do when I have to add and multiply two integers, and what about the cases where I have overflows in these operations?
So, if some of you have some ideas about how I should represent the number (the format) and how I should do the multiply and add, I would be very grateful. I don't need any code, I'm in the learning stage; just some ideas. Thank you.
One good way to do this is to store the sign as a bool (e.g. bool is_neg;). That way it's completely clear what that data means (versus a char, where it's not entirely clear).
You might want to store each digit in an unsigned short (or, if you want to be precise about the size, uint16_t). Then, when you do a multiply of two digits, you can just multiply them as unsigned ints (uint32_t); the low 16 bits are your result, and the overflow is in the high 16 bits. You can then add this to the result array fairly easily. You know that the multiplication of an n-bit number by a k-bit number is at most n + k bits long, so you can preallocate your array to that size and worry about removing extra zeros later.
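A sketch of that digit multiply (DigitProduct and mul_digits are illustrative names, not from any library): widening to 32 bits guarantees the product of two 16-bit digits always fits.

#include <cstdint>

// Product of two base-2^16 digits, split into the result digit
// (low 16 bits) and the carry into the next position (high 16 bits).
struct DigitProduct { std::uint16_t low; std::uint16_t carry; };

DigitProduct mul_digits(std::uint16_t a, std::uint16_t b) {
    std::uint32_t p = std::uint32_t(a) * std::uint32_t(b); // fits: < 2^32
    return { static_cast<std::uint16_t>(p & 0xFFFFu),
             static_cast<std::uint16_t>(p >> 16) };
}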
Hope this helps, and let me know if you want more tips.
The first design decision you have to make is the choice of a base.
You seem to lean towards plain decimal. It could be unpacked (one full byte per digit, in numerical or ASCII representation) or packed digit pairs (Binary-Coded Decimal, two four-bit digits per byte).
Other schemes are more convenient for faster operations: a base that is a power of 2 or a power of 10, fitting in a byte, a short, an int...
Powers of 10 have the benefit that conversion to and from base 10 can be done word by word.
Addition is an easy matter: add the words in pairs and handle the carries. Same for subtraction, with borrows.
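A minimal sketch of that addition in C++, assuming base-2^32 words stored least significant first (the function name add is illustrative):

#include <cstdint>
#include <vector>

// Schoolbook addition of two magnitudes: add word pairs, propagate carry.
std::vector<std::uint32_t> add(const std::vector<std::uint32_t>& a,
                               const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> sum;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
        std::uint64_t t = carry;
        if (i < a.size()) t += a[i];
        if (i < b.size()) t += b[i];
        sum.push_back(static_cast<std::uint32_t>(t)); // keep the low word
        carry = t >> 32;                              // carry to next word
    }
    return sum;
}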
Multiplies are a whole different story if you care about efficiency. The long multiplication method taught at school can be used, but it requires length1 x length2 elementary operations. For long numbers, more efficient methods such as Karatsuba multiplication are preferred (http://en.wikipedia.org/wiki/Multiplication_algorithm#Karatsuba_multiplication). They are also more complex.

How do I find the largest integer fully supported by hardware arithmetic?

I am implementing a BigInt class that must support arbitrary-precision operations on integers.
Quote from "The Algorithm Design Manual" by S. Skiena:
What base should I do [editor's note: arbitrary-precision] arithmetic in? - It is perhaps simplest to implement your own high-precision arithmetic package in decimal, and thus represent each integer as a string of base-10 digits. However, it is far more efficient to use a higher base, ideally equal to the square root of the largest integer supported fully by hardware arithmetic.
How do I find the largest integer supported fully by hardware arithmetic? If I understand correctly, since my machine is an x64-based PC, the largest integer fully supported should be 2^64 (http://en.wikipedia.org/wiki/X86-64 - Architectural features: 64-bit integer capability), so I should use base 2^32. But is there a way in C++ to get this size programmatically, so I can typedef my base_type to it?
You might be searching for std::uintmax_t and std::intmax_t.
static_cast<unsigned>(-1) is the maximum unsigned int, i.e. all bits set to 1. Is that what you are looking for?
You can also use std::numeric_limits<unsigned>::max() or UINT_MAX; all of these yield the same result, namely the maximum value that can be stored in an unsigned type.
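A quick sketch showing that these agree, plus one way to derive a digit width programmatically in the spirit of Skiena's square-root advice (the half-width computation is my own illustration, not from the quoted text):

#include <climits>
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
    std::cout << static_cast<unsigned>(-1) << '\n';
    std::cout << std::numeric_limits<unsigned>::max() << '\n';
    std::cout << UINT_MAX << '\n';  // all three print the same value

    // Half the bits of uintmax_t: a digit whose square still fits in it.
    constexpr int half_bits = std::numeric_limits<std::uintmax_t>::digits / 2;
    std::cout << "suggested digit width: " << half_bits << " bits\n";
}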
int (and, by extension, unsigned int) is the "natural" size for the architecture. So a type that has half the bits of an int should work reasonably well. Beyond that, you really need to configure for the particular hardware; the type of the storage unit and the type of the calculation unit should be typedefs in a header and their type selected to match the particular processor. Typically you'd make this selection after running some speed tests.
INT_MAX doesn't help here; it tells you the largest value that can be stored in an int, which may or may not be the largest value that the hardware can support directly. Similarly, INTMAX_MAX is no help, either; it tells you the largest value that can be stored as an integral type, but doesn't tell you whether operations on such a value can be done in hardware or require software emulation.
Back in the olden days, the rule of thumb was that operations on ints were done directly in hardware, and operations on longs were done as multiple integer operations, so operations on longs were much slower than operations on ints. That's no longer a good rule of thumb.
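Following the suggestion above of putting the storage and calculation types in a header, a hypothetical sketch (storage_t and calc_t are illustrative names) might pick a calculation type twice the width of the storage type, so a digit product can never overflow it:

#include <cstdint>

// Hypothetical configuration header: the storage unit holds one "digit",
// and the calculation unit is wide enough to hold a full digit product
// plus a carry, so multiplication never overflows it.
using storage_t = std::uint16_t;
using calc_t    = std::uint32_t;

static_assert(sizeof(calc_t) >= 2 * sizeof(storage_t),
              "calc_t must be at least twice as wide as storage_t");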
Things are not so black and white. There are many issues here, and you may have other things worth considering. I've now written two variable-precision tools (in MATLAB: VPI and HPF), and I've chosen different approaches in each. It also matters whether you are writing an integer form or a high-precision floating-point form.
The difference is that integers can grow without bound in the number of digits, but if you are doing a floating-point implementation with a user-specified number of digits, you always know the number of digits in the mantissa; it is fixed.
First of all, it is simplest to use a single integer for each decimal digit. This makes many things work nicely, so I/O is easy, though it is a bit inefficient in terms of storage. Adds and subtracts are easy, and if you use integers for each digit, then even multiplies are easy. In MATLAB, for example, conv is pretty fast, though it is still O(n^2). (I think gmp uses an FFT multiply, so faster yet.)
But assuming you use a basic conv multiply, you need to worry about overflows for numbers with a huge number of digits. Suppose, for example, that I store decimal digits as 8-bit signed integers. Using conv, followed by carries, I can do a multiply. Take the number 9999:
N = repmat(9,1,4)
N =
9 9 9 9
conv(N,N)
ans =
81 162 243 324 243 162 81
Thus even to form the product 9999*9999, I'd need to be careful, as the digits will overflow an 8-bit signed integer. If I'm using 16-bit integers to accumulate the convolution products, then a multiply between a pair of 1000-digit integers can cause an overflow.
N = repmat(9,1,1000);
max(conv(N,N))
ans =
81000
So if you are worried about the possibility of millions of digits, you need to watch out.
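For reference (and since the question is about C++), here is the same conv-then-carry multiply as a C++ sketch with illustrative names; the 64-bit accumulators keep each column sum, at most 81 * n for base-10 digits, far from overflow:

#include <cstdint>
#include <vector>

// Multiply two base-10 digit arrays (least significant digit first):
// first accumulate the convolution columns, then propagate carries.
std::vector<std::uint8_t> mul(const std::vector<std::uint8_t>& a,
                              const std::vector<std::uint8_t>& b) {
    std::vector<std::uint64_t> col(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            col[i + j] += std::uint64_t(a[i]) * b[j];   // the conv step
    std::vector<std::uint8_t> out(col.size());
    std::uint64_t carry = 0;
    for (std::size_t k = 0; k < col.size(); ++k) {      // the carry step
        carry += col[k];
        out[k] = static_cast<std::uint8_t>(carry % 10);
        carry /= 10;
    }
    while (out.size() > 1 && out.back() == 0) out.pop_back();
    return out;
}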
One alternative is to use what I call migits, essentially working in a higher base than 10. Thus by using base 1000000 and doubles to store the elements, I can store 6 decimal digits per element. A convolution will still cause overflows for larger numbers though.
N = repmat(999999,1,10000);
log2(max(conv(N,N)))
ans =
53.151
Thus a convolution between two sets of base-1000000 migits that are 10000 migits long (60000 decimal digits) will pass the point (2^53) beyond which a double can no longer represent every integer exactly.
So again, if you will use numbers with millions of digits, beware. A nice thing about using a higher base of migits with a convolution-based multiply is that, since the conv operation is O(n^2), going from base 10 to base 100 gives you a 4-to-1 speedup, and going to base 1000 yields a 9-to-1 speedup in the convolutions.
Finally, the use of a base other than 10 for migits makes it logical to implement guard digits (for floats). In floating-point arithmetic, you should never trust the least significant bits of a computation, so it makes sense to keep a few digits hidden in the shadows. When I wrote my HPF tool, I therefore gave the user control over how many digits would be carried along. This is not an issue for integers, of course.
There are many other issues. I discuss them in the docs carried with those tools.

Why is numeric_limits<int>::min() defined differently?

To retrieve the smallest value I have to use numeric_limits<int>::min().
I suppose the smallest int is -2147483648, and tests on my machine showed this result.
But some C++ references like the Open Group Base Specifications and
cplusplus.com define it with the value -2147483647.
I ask this question because in my implementation of the negamax framework (game-tree search)
the value (minimal integer) * (-1) has to be well defined.
Yes, with minimal int = (numeric_limits<int>::min() + 2) I am on the safe side in any case,
so my question is more theoretical, but I think it is nevertheless quite interesting.
If a value is represented as sign-and-magnitude instead of two's complement, the pattern with the sign bit set and all other bits zero is -0. In sign-and-magnitude, the maximum positive and negative integers have the same magnitude; two's complement can represent one more negative value because it doesn't have that symmetry.
The value of numeric_limits<int>::min() is implementation-defined. That's why it can differ. You shouldn't rely on any concrete minimal value.
On cplusplus.com you overlooked the qualifier:
min. magnitude*
This is not necessarily the actual value of the constant in any particular compiler or system; it may be equal to or greater in magnitude than this.
From the cplusplus.com link you posted (emphasis mine):
The following panel shows the different constants and their guaranteed minimal magnitudes (positive numbers may be greater in value, and negative numbers may be less in value). Any particular compiler implementation may define integral types with greater magnitudes than those shown here
Numeric limits are always system- and compiler-defined; try running with a 64-bit compiler and system, and you may see totally different numbers.
C++ uses two's complement for signed integers. Thus the smallest signed integer is represented by 100..00 (usually 32 bits).
Simply shifting 1<<(sizeof(int)*8-1) should give you the smallest signed integer.
Obviously for unsigned integers, the smallest is 0.
edit: you can read more here
edit2: apparently C++ doesn't necessarily use two's complement, my mistake

How can I set all bits to '1' in a binary number of an unknown size?

I'm trying to write a function in assembly (but let's keep the question language-agnostic).
How can I use bitwise operators to set all bits of a passed in number to 1?
I know that I can use the bitwise "or" with a mask of the bits I wish to set, but I don't know how to construct the mask based on a binary number of size N.
~(x & 0)
x & 0 will always result in 0, and ~ will flip all the bits to 1s.
Set it to 0, then flip all the bits to 1 with a bitwise-NOT.
You're going to find that in assembly language you have to know the size of a "passed in number". And in assembly language it really matters which machine the assembly language is for.
Given that information, you might be asking either
How do I set an integer register to all 1 bits?
or
How do I fill a region in memory with all 1 bits?
To fill a register with all 1 bits, on most machines the efficient way takes two instructions:
Clear the register, using either a special-purpose clear instruction, or load immediate 0, or xor the register with itself.
Take the bitwise complement of the register.
Filling memory with 1 bits then requires 1 or more store instructions...
You'll find a lot more bit-twiddling tips and tricks in Hank Warren's wonderful book Hacker's Delight.
Set it to -1. This is usually represented by all bits being 1.
Set x to 1
While x < number:
    x = x * 2
Answer = number OR (x - 1)
The code assumes your input is called "number". It should work fine for positive values. Note that for negative values in two's complement, the attempted operation makes no sense, as the high bit will always already be one.
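Here is that idea as a C++ sketch (set_low_bits is an illustrative name; it assumes an unsigned input whose top bit is clear, since otherwise x *= 2 would wrap to zero and the loop would never terminate):

// Set every bit at or below the highest set bit of number.
unsigned set_low_bits(unsigned number) {
    unsigned x = 1;
    while (x < number)
        x *= 2;               // first power of two >= number
    return number | (x - 1);  // x - 1 masks all bits below x
}
// set_low_bits(5) == 7, set_low_bits(8) == 15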
Use T(~T(0)), where T is the type name (if we are talking about C++).
This prevents the unwanted promotion to int if the type is smaller than int.
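A quick sketch of why the outer cast matters for a type smaller than int:

#include <cstdint>
#include <iostream>

int main() {
    using T = std::uint8_t;
    T all_ones = T(~T(0));  // ~T(0) promotes to int; the outer T(...)
                            // truncates back to exactly 8 set bits
    std::cout << std::hex << unsigned(all_ones) << '\n';  // prints ff
}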