Print the integral value of a very long binary representation - c++

Let's say you have a very long binary word (more than 64 bits) that represents an unsigned integral value, and you would like to print the actual number. We're talking C++, so let's assume you start off with a bool[], a std::vector<bool> or a std::bitset, and end up with a std::string or some kind of std::ostream, whatever your solution prefers. But please only use the core language and the STL.
Now, I suspect you must evaluate it chunk-wise to get intermediate results that are small enough to store away, preferably in base 10, as in x·10^k. I could figure out how to assemble the number from that point. But since there is no chunk width that corresponds to a base of 10, I don't know how to do it. Of course, you can start with any other chunk width, let's say 3, to get intermediates of the form x·(2^3)^k, and then convert them to base 10, but this leads to x·10^(3·k·lg 2), which obviously has a non-integer (floating-point) exponent and so isn't of any help.
Anyway, I'm exhausted of this math-crap and I would appreciate a thoughtful suggestion.
Yours sincerely,
Armin

I'm going to assume you already have some sort of bignum division/modulo function to work with, because implementing such a thing is a complete nightmare.
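#include <algorithm>
#include <iterator>
#include <ostream>
#include <string>

class bignum {
public:
    bignum(unsigned value = 0);
    bignum(const bignum& rhs);
    bignum(bignum&& rhs);
    // Divides *this in place by denominator and stores the remainder in out_modulo.
    void divide(const bignum& denominator, bignum& out_modulo);
    explicit operator bool() const;      // true while the value is non-zero
    explicit operator unsigned() const;
};

std::ostream& operator<<(std::ostream& out, bignum value) {
    std::string backwards;
    bignum remainder;
    do {
        value.divide(10, remainder);  // peel off the least significant decimal digit
        backwards.push_back(static_cast<char>(unsigned(remainder) + '0'));
    } while (value);
    // The digits were produced least significant first, so print them reversed.
    std::copy(backwards.rbegin(), backwards.rend(), std::ostream_iterator<char>(out));
    return out;
}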
If rounding is an option, it should be fairly trivial to convert most bignums to a double instead, which would be a LOT faster. Namely, copy the 64 most significant bits into a 64-bit unsigned integer (std::uint64_t; unsigned long is not guaranteed to be that wide), convert that to a double, and then multiply by 2.0 raised to the number of significant bits minus 64. (I say significant bits because you have to skip any leading zeros.) Bear in mind that a double keeps only 53 bits of precision, so only the leading 15 to 17 decimal digits are meaningful.
So if you have 150 significant bits, copy the top 64 into the integer, convert that to a double, and multiply that by std::pow(2.0, 150-64) ~ 7.74e+25 to get the result. If you only have 40 significant bits, padding with zeros on the right still works: copy the 40 bits into the most significant end of the integer, convert that to a double, and multiply that by std::pow(2.0, 40-64) ~ 5.96e-8 to get the result!
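A minimal sketch of that rounding approach, assuming the bits arrive most significant first in a std::vector<bool>. The name to_double is made up for illustration, and std::ldexp stands in for the multiplication by std::pow(2.0, n) since it scales by a power of two directly:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Approximates the value of a bit sequence (most significant bit first).
// Hypothetical helper; the result is rounded to the 53-bit precision of a double
// and becomes infinity if the value exceeds the range of a double.
double to_double(const std::vector<bool>& bits) {
    // Skip leading zeros so that only significant bits are counted.
    std::size_t first = 0;
    while (first < bits.size() && !bits[first]) ++first;
    const std::size_t significant = bits.size() - first;
    if (significant == 0) return 0.0;

    // Pack the top 64 significant bits, padding with zeros on the right.
    std::uint64_t top = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        top <<= 1;
        if (i < significant && bits[first + i]) top |= 1u;
    }
    // value ~ top * 2^(significant - 64)
    return std::ldexp(static_cast<double>(top), static_cast<int>(significant) - 64);
}
```

With 150 significant bits the scale factor is 2^86, and with 40 significant bits it is 2^-24, matching the two cases described above.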
Edit
Oli Charlesworth posted a link to the wikipedia page on Double Dabble which blows the first algorithm I showed out of the water. Don't I feel silly.
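For reference, here is a rough sketch of double dabble (shift-and-add-3), using only the standard library. It assumes the bits are stored most significant first in a std::vector<bool>; the function name to_decimal and the one-BCD-digit-per-byte layout are my own choices for readability, not part of the algorithm:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Converts bits (most significant first) to a decimal string with double dabble.
std::string to_decimal(const std::vector<bool>& bits) {
    // digits[0] is the least significant decimal digit, one BCD digit per byte.
    // n bits never need more than n/3 + 1 decimal digits.
    std::vector<std::uint8_t> digits(bits.size() / 3 + 1, 0);
    for (bool bit : bits) {
        // Add 3 to every digit that is 5 or more, so the doubling below
        // carries into the next decimal digit instead of exceeding 9.
        for (auto& d : digits)
            if (d >= 5) d += 3;
        // Shift the whole BCD register left by one bit, feeding in the new bit.
        std::uint8_t carry = bit ? 1 : 0;
        for (auto& d : digits) {
            const std::uint8_t shifted = static_cast<std::uint8_t>((d << 1) | carry);
            carry = shifted >> 4;   // bit that leaves this 4-bit digit
            d = shifted & 0x0F;
        }
    }
    // Emit the digits most significant first, skipping leading zeros.
    std::string out;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it)
        if (!out.empty() || *it != 0) out.push_back(static_cast<char>('0' + *it));
    return out.empty() ? "0" : out;
}
```

No bignum division is needed at all: each input bit costs one pass over the decimal digits, so the whole conversion is quadratic in the number of bits.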

Related

Getting the high part of signed integer multiplication in C++

How can I manually calculate the high part of a signed multiplication in C++? It is like "Getting the high part of 64 bit integer multiplication" (which covers unsigned only), but how do I calculate the carry/borrow?
I do not mean casting to a larger type (that's simple), but a truly manual calculation, so that it also works with int128_t.
My goal is to write a template function that always returns the correct high-part for signed and unsigned arguments (u/int8..128_t):
template <typename Type>
constexpr Type mulh(const Type& op1, const Type& op2) noexcept
{
if constexpr (std::is_signed_v<Type>) return ???;
else return see link;
}
It seems that you are implementing things that are usually available as compiler builtin functions.
Implementing those in standard C++ results in less efficient code. That can still be fun as a mental exercise, but then why ask us to spoil the whole thing?
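If you can do unsigned, then you could convert each signed operand to its magnitude, x=(x<0)?x*-1:x, get the result, and then fix up the sign afterwards: z=((x<0)!=(y<0))?z*-1:z. Two caveats: the magnitude of the most negative value does not fit in the signed type, so compute it in the unsigned type, and the final negation has to be applied to the whole double-width product, so the high part also depends on whether the low part is zero.
This works because the magnitude of a signed integer of arbitrary length always fits into an unsigned integer of the same length, and you know that if both numbers are negative or both positive the answer will be positive, while if only one of them is negative it will be negative.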
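An alternative that avoids the sign juggling is the well-known identity that, modulo 2^N, the signed high part equals the unsigned high part minus b when a is negative and minus a when b is negative. Below is a sketch under that assumption, fixed to 32 bits to keep it short; mulhu32 and mulhs32 are hypothetical names, and the same pattern should carry over to wider types by splitting into half-width pieces in the same way:

```cpp
#include <cstdint>

// Unsigned high half of a 32x32 product, built from 16-bit halves
// so that no wider type is needed.
constexpr std::uint32_t mulhu32(std::uint32_t a, std::uint32_t b) noexcept {
    const std::uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    const std::uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    const std::uint32_t ll = a_lo * b_lo;
    const std::uint32_t lh = a_lo * b_hi;
    const std::uint32_t hl = a_hi * b_lo;
    const std::uint32_t hh = a_hi * b_hi;
    // Middle column; at most 3 * 0xFFFF, so it cannot overflow.
    const std::uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

// Signed high half: correct the unsigned result for each negative operand.
constexpr std::int32_t mulhs32(std::int32_t a, std::int32_t b) noexcept {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    std::uint32_t hi = mulhu32(ua, ub);
    if (a < 0) hi -= ub;  // unsigned arithmetic wraps modulo 2^32 by definition
    if (b < 0) hi -= ua;
    // Converting back is modular since C++20 (implementation-defined before,
    // though two's complement on mainstream compilers).
    return static_cast<std::int32_t>(hi);
}
```

The correction terms come from writing a negative a as ua - 2^32: the extra 2^32 multiplies b and therefore lands exactly in the high word.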

Own Big Float in C++

I want to write my own variable "type" as a homework in C++. It should be an arbitrarily long float. I was thinking of structure like...
Code:
class bigFloat
{
public:
bigFloat(arguments);
~bigFloat();
private:
std::vector<char> before; // numbers before decimal point
std::vector<char> after; // numbers after decimal point
int pos; // position of decimal point
};
For example, for a number like 3.1415:
before = '3'; after = '1415'; pos = 1;
If that makes sense to you... BUT the assignment wants me to save some memory, which I don't, because every digit I store takes about 1 byte, which is too much, I guess.
Question:
How would you represent those arbitrarily long numbers?
(Sorry for my bad english, I hope the post makes sense)
If you need to preserve memory, all that means is that you need to use memory as efficiently as possible. In other words, given the value you're storing, you shouldn't waste bytes.
Example:
255 doesn't need 32 bits
I think your vector of chars is fine. If you're allowed to use a C++11 compiler, I'd probably change that to a vector of uint8_t and make sure when I'm storing the value that I can store a value from 0 to 255 in a vector of size 1.
However, that's not the end of it. From the sounds of it, what you're after is an arbitrary number of significant digits. However, for a true float representation, you also need to allocate storage for the base and exponent, after deciding what the base will be for your type. There is also the question of whether you want your exponent to be arbitrarily long too. Let's assume so.
So, I'd probably use something like this for members of your class:
//Assuming a base of 10.
static uint8_t const base = 10;
std::vector<uint8_t> digits_before_decimal;
std::vector<uint8_t> digits_after_decimal;
std::vector<uint8_t> exponent;
std::bitset<1> sign;
It is then a matter of implementing the various operators for your type and testing various scenarios to make sure your solution works.
If you really want to be thorough, you could use a simple testing framework to make sure that problems you fix along the way, stay fixed.
In memory, it will essentially look like a binary representation of the number.
For example:
65535 will be: before_decimal =<0xff,0xff>, after_decimal vector is empty
255.255 will be: before_decimal =<0xff>, after_decimal=<0xff>
255255 will be: before_decimal =<0x03,0xe5,0x17>, after_decimal vector is empty
255255.0 will be: before_decimal =<0x03,0xe5,0x17>, after_decimal: <0>
As others have mentioned, you don't really need two vectors for before and after the decimal. However, I'm using two in my answer because it makes it easier to understand and you don't have to keep track of the decimal. The memory requirements of two vs one vector really aren't that different when you're dealing with a long string of digits.
I should also note that using an integer to record position of the decimal point limits your number of digits to 2 billion, which is not an arbitrarily long number.
UPDATE: If this actually is homework, I would check with whoever has given you the homework if you need to support any floating point special cases, the simplest of which would be NaNs. There are other special cases too, but trying to implement all of them will very quickly turn this from a homework assignment into a thesis. Good luck :)
Don't use two separate vectors for before and after. You need the whole mantissa to perform arithmetic operations.
Actually, your pos is the exponent. Name it accordingly. The exponent is signed, by the way.
You need a sign for the mantissa.
I recommend storing the mantissa as a rational fraction. You need two numbers: a numerator and a denominator. Then you can divide without round-off.
It's better to store the numbers as arbitrary-length ints instead of arrays of digits.
PS. I wrote such a calculator a long time ago. To illustrate my answer, here is the declaration of its number class:
class CNumber
{
// ctors, methods....
char cSign; // sign of mantissa
CString strNumer; // numerator of mantissa
CString strDenom; // denominator of mantissa
char cExpSign; // sign of exponent
CString strExp; // exponent
};
I used MFC. CString is the standard string class there.
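To make the "one mantissa plus a signed exponent" idea concrete, here is a small sketch that splits a plain decimal literal into sign, mantissa and exponent. DecimalFloat and parse_decimal are hypothetical names, and the mantissa is kept as a std::string only for readability; a memory-conscious version would pack it into an arbitrary-length integer as suggested above:

```cpp
#include <cstddef>
#include <string>

// Hypothetical layout: value = (negative ? -1 : 1) * digits * 10^exponent.
struct DecimalFloat {
    bool negative = false;
    std::string digits;  // whole mantissa without a decimal point, e.g. "31415"
    long exponent = 0;   // signed power of ten, e.g. -4
};

// Parses plain literals such as "3.1415" or "-0.5".
// Assumes well-formed input: no validation and no "e" notation.
DecimalFloat parse_decimal(const std::string& text) {
    DecimalFloat out;
    std::size_t i = 0;
    if (i < text.size() && (text[i] == '-' || text[i] == '+')) {
        out.negative = (text[i] == '-');
        ++i;
    }
    bool seen_point = false;
    for (; i < text.size(); ++i) {
        if (text[i] == '.') { seen_point = true; continue; }
        // Leading zeros carry no information, so they are not stored.
        if (!(out.digits.empty() && text[i] == '0')) out.digits.push_back(text[i]);
        // Every digit after the point moves the exponent down by one.
        if (seen_point) --out.exponent;
    }
    if (out.digits.empty()) {  // the value is zero
        out.digits = "0";
        out.exponent = 0;
        out.negative = false;
    }
    return out;
}
```

For 3.1415 this yields the mantissa 31415 with exponent -4, so the position of the decimal point is carried entirely by the signed exponent.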

Big Integer class in C++. How to push digits in an array of unsigned long integers?

I am writing a simple big integer library for exercise. I would like to use it in a simple implementation of RSA. I have read all the previous threads but I have not found an answer to my question. I am just at the beginning of the project and I have read the best choice to represents all the digits of the big integer should be to represent them using an array of unsigned long numbers, so it should be something like this:
class BigInteger
{
public:
BigInteger(const std::string &digits);
private:
std::vector <unsigned long> _digits;
};
The problem is that I don't know how to implement the constructor of the class. I think I should convert every character of the string and save it in the array in a way which minimizes the overall memory used by the array because every character is 1 byte long while an unsigned long is at least 4 bytes long. Should I push a group of 4 characters at a time to avoid wasting every unsigned long digit memory? Could you give me an example or some suggestions?
Thank you.
Before thinking about how to push digits, think about how to implement
the four basic operations. What you want to do in the constructor from
string is to convert the string to the internal representation, whatever
that is, and to do so, you have to be able to multiply by 10 (supposing
decimal) and add.
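As a sketch of what "multiply by 10 and add" can look like in the constructor: the version below assumes 32-bit limbs (std::uint32_t rather than unsigned long, so that a 64-bit intermediate can hold the carry), least significant limb first, and input consisting only of the characters '0' to '9'. The limbs() accessor is there only so the result can be inspected:

```cpp
#include <cstdint>
#include <string>
#include <vector>

class BigInteger {
public:
    // Assumes `digits` contains only '0'..'9' (no sign, no validation).
    explicit BigInteger(const std::string& digits) {
        for (char c : digits) {
            // value = value * 10 + digit, carried limb by limb.
            std::uint64_t carry = static_cast<std::uint64_t>(c - '0');
            for (std::uint32_t& limb : limbs_) {
                const std::uint64_t t = std::uint64_t{limb} * 10u + carry;
                limb = static_cast<std::uint32_t>(t);  // low 32 bits stay in place
                carry = t >> 32;                       // high bits move to the next limb
            }
            if (carry != 0) limbs_.push_back(static_cast<std::uint32_t>(carry));
        }
    }

    // Least significant limb first; zero is represented by an empty vector.
    const std::vector<std::uint32_t>& limbs() const { return limbs_; }

private:
    std::vector<std::uint32_t> limbs_;
};
```

Note that no characters are grouped or packed directly: the string is simply consumed one decimal digit at a time, and the binary limbs fill up on their own.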
As @James Kanze correctly points out, conversions to and from string are not the main design issue, and you should leave them until the end. If you focus on simplifying the interface with the outside world you might end up with a design that is easy to serialize but a nightmare to work with.
On the particular problem at hand, a common approach to dealing with big numbers efficiently is to use only half of the bits in each storage unit (if your unsigned long is 32 bits, use only the lower 16 bits). Having the spare space in every unit allows you to operate separately on each element without having to deal with overflows, and then normalize the result by moving the carry-out (the high bits) into the next element. A simplified pseudo-code approach to the sum (ignoring sizes and mostly everything else) would be:
bignumber& bignumber::operator+=( bignumber const & rhs ) {
    // ensure that there is enough space
    for ( int i = 0; i < size(); ++i ) {
        data[ i ] += rhs.data[ i ]; // might break invariant but won't overflow
    }
    normalize(); // fix the invariant
    return *this;
}
// Common idiom: implement operator+ in terms of operator+= on the first argument
// (copied by value)
bignumber operator+( bignumber lhs, bignumber const & rhs ) {
    lhs += rhs;
    return lhs;
}
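A runnable take on the same idea might look like the sketch below. It assumes 32-bit storage units holding 16-bit digits, least significant first; the struct layout and the body of normalize() are my guesses at what the pseudo-code leaves out:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct bignumber {
    // Invariant: every element holds a 16-bit digit (< 65536),
    // least significant digit first.
    std::vector<std::uint32_t> data;

    // Restores the invariant by pushing everything above bit 15 into the
    // next element.
    void normalize() {
        std::uint32_t carry = 0;
        for (std::uint32_t& d : data) {
            d += carry;
            carry = d >> 16;
            d &= 0xFFFFu;
        }
        if (carry != 0) data.push_back(carry);
    }

    bignumber& operator+=(const bignumber& rhs) {
        // Ensure that there is enough space.
        if (data.size() < rhs.data.size()) data.resize(rhs.data.size(), 0);
        for (std::size_t i = 0; i < rhs.data.size(); ++i)
            data[i] += rhs.data[i];  // may exceed 16 bits, cannot overflow 32
        normalize();
        return *this;
    }
};
```

The spare upper half is what allows the element-wise loop to ignore carries entirely; they are all resolved in a single pass afterwards.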

Emulated Fixed Point Division/Multiplication

I'm writing a fixed-point class, but have run into a bit of a snag... I am not sure how to emulate the multiplication and division portions. I took a very rough stab at the division operator, but I am sure it's wrong. Here's what it looks like so far:
class Fixed
{
Fixed(short int _value, short int _part) :
value(long(_value + (_part >> 8))), part(long(_part & 0x0000FFFF)) {};
...
inline Fixed operator -() const // example of some of the bitwise it's doing
{
return Fixed(-value - 1, (~part)&0x0000FFFF);
};
...
inline Fixed operator / (const Fixed & arg) const // example of how I'm probably doing it wrong
{
long int tempInt = value<<8 | part;
long int tempPart = tempInt;
tempInt /= arg.value<<8 | arg.part;
tempPart %= arg.value<<8 | arg.part;
return Fixed(tempInt, tempPart);
};
long int value, part; // members
};
I... am not a very good programmer, haha!
The class's part is 16 bits wide (but expressed as a 32-bit long, since I imagine it'd need the room for possible overflows before they're fixed), and the same goes for value, which is the integer part. When the 'part' goes over 0xFFFF in one of its operations, the highest 16 bits are added to 'value', and then the part is masked so that only its lowest 16 bits remain. That's done in the init list.
I hate to ask, but if anyone would know where I could find documentation for something like this, or even just the 'trick' or how to do those two operators, I would be very happy for it! I am a dimwit when it comes to math, and I know someone has had to do/ask this before, but searching google has for once not taken me to the promised land...
As Jan says, use a single integer. Since it looks like you're specifying 16 bit integer and fractional parts, you could do this with a plain 32 bit integer.
The "trick" is to realise what happens to the "format" of the number when you do operations on it. Your format would be described as 16.16. When you add or subtract, the format stays the same. When you multiply, you get 32.32 -- So you need a 64 bit temporary value for the result. Then you do a >>16 shift to get down to 48.16 format, then take the bottom 32 bits to get your answer in 16.16.
I'm a little rusty on the division -- In DSP, where I learned this stuff, we avoided (expensive) division wherever possible!
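Here is a small sketch of the 16.16 format bookkeeping described above, storing the whole number in a single 32-bit integer. The names fixed16, fx_from_int, fx_mul and fx_div are made up for illustration, and the division (pre-scaling the dividend by 2^16) is my assumption about the usual approach rather than something this answer spells out:

```cpp
#include <cstdint>

using fixed16 = std::int32_t;  // hypothetical alias for a signed 16.16 value

// Only valid for n in [-32768, 32767]; larger values overflow the 16 integer bits.
constexpr fixed16 fx_from_int(int n) { return static_cast<fixed16>(n * 65536); }

// 16.16 * 16.16 gives a 32.32 product in a 64-bit temporary; shifting right
// by 16 yields 48.16, and the cast keeps the bottom 32 bits as 16.16.
// (>> on a negative value is arithmetic on mainstream compilers and
// guaranteed to be so since C++20.)
constexpr fixed16 fx_mul(fixed16 a, fixed16 b) {
    return static_cast<fixed16>((static_cast<std::int64_t>(a) * b) >> 16);
}

// Pre-scaling the dividend by 2^16 makes the quotient come out in 16.16.
// b must not be zero.
constexpr fixed16 fx_div(fixed16 a, fixed16 b) {
    return static_cast<fixed16>((static_cast<std::int64_t>(a) * 65536) / b);
}
```

For instance, 1.5 is stored as 98304 and 2.25 as 147456; their 64-bit product shifted right by 16 is 221184, which is 3.375 in 16.16.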
I'd recommend using one integer value instead of separate whole and fractional parts. Then addition and subtraction are directly their integral counterparts, and you can simply use 64-bit support, which all common compilers have these days:
Multiplication:
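Fixed operator*(const Fixed &other) const {
    // the 64-bit product has 32 fractional bits; shift back down to 16
    return Fixed(((int64_t)value * (int64_t)other.value) >> 16);
}
Division:
Fixed operator/(const Fixed &other) const {
    // pre-scale the dividend so the quotient keeps 16 fractional bits
    return Fixed(((int64_t)value << 16) / (int64_t)other.value);
}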
64-bit integers are available as follows:
On gcc, stdint.h (or cstdint, which places them in the std:: namespace) should be available, so you can use the types I mentioned above. Otherwise it's long long on 32-bit targets and long on 64-bit targets.
On Windows, it's always long long or __int64.
To get things up and running, first implement the (unary) inverse(x) = 1/x, and then implement a/b as a*inverse(b). You'll probably want to represent the intermediates as a 32.32 format.

Converting floating point to fixed point

In C++, what's the generic way to convert any floating point value (float) to fixed point (int, 16:16 or 24:8)?
EDIT: For clarification, fixed-point values have two parts: an integer part and a fractional part. The integer part can be represented by a signed or unsigned integer data type. The fractional part is represented by an unsigned integer data type.
Let's make an analogy with money for the sake of clarity. The fractional part may represent cents, a fractional part of a dollar. The range of the 'cents' data type would be 0 to 99. If an 8-bit unsigned integer were used for the fractional part instead, it would be split into 256 equal parts.
I hope that clears things up.
Here you go:
// A signed fixed-point 16:16 class
class FixedPoint_16_16
{
short intPart;
unsigned short fracPart;
public:
FixedPoint_16_16(double d)
{
*this = d; // calls operator=
}
FixedPoint_16_16& operator=(double d)
{
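double whole = std::floor(d); // needs <cmath>; floor keeps the fraction non-negative
intPart = static_cast<short>(whole);
fracPart = static_cast<unsigned short>(
(std::numeric_limits<unsigned short>::max() + 1.0) * (d - whole)); // needs <limits>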
return *this;
}
// Other operators can be defined here
};
EDIT: Here's a more general class based on another common way to deal with fixed-point numbers (and which KPexEA pointed out):
template <class BaseType, size_t FracDigits>
class fixed_point
{
const static BaseType factor = 1 << FracDigits;
BaseType data;
public:
fixed_point(double d)
{
*this = d; // calls operator=
}
fixed_point& operator=(double d)
{
data = static_cast<BaseType>(d*factor);
return *this;
}
BaseType raw_data() const
{
return data;
}
// Other operators can be defined here
};
fixed_point<int, 8> fp1(1.5); // Will be signed 24:8 (if int is 32-bits)
fixed_point<unsigned int, 16> fp2(1.5); // Will be unsigned 16:16 (if int is 32-bits)
A cast from float to integer will throw away the fractional portion so if you want to keep that fraction around as fixed point then you just multiply the float before casting it. The below code will not check for overflow mind you.
If you want 16:16
double f = 1.2345;
int n;
n=(int)(f*65536);
if you want 24:8
double f = 1.2345;
int n;
n=(int)(f*256);
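Since the snippets above do not check for overflow, here is a sketch of a checked variant. The name to_fixed and its out-parameter interface are hypothetical, it targets a 32-bit result, and it rounds to nearest instead of truncating the way a plain cast does, which is a design choice rather than a requirement:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Converts x to signed fixed point with `frac_bits` fractional bits
// (16 for 16:16, 8 for 24:8). Returns false, leaving `out` untouched,
// if x is not finite or the scaled value does not fit in 32 bits.
bool to_fixed(double x, int frac_bits, std::int32_t& out) {
    if (!std::isfinite(x)) return false;
    // ldexp multiplies by 2^frac_bits; round picks the nearest integer.
    const double scaled = std::round(std::ldexp(x, frac_bits));
    if (scaled < static_cast<double>(std::numeric_limits<std::int32_t>::min()) ||
        scaled > static_cast<double>(std::numeric_limits<std::int32_t>::max()))
        return false;
    out = static_cast<std::int32_t>(scaled);
    return true;
}
```

With 16 fractional bits, 1.2345 scales to 80904.192 and is stored as 80904, while 40000.0 is rejected because it needs more than 16 integer bits.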
Edit: My first comment applied to the answer before Kevin's edit, but I'll leave it here for posterity. Answers change so quickly here sometimes!
The problem with Kevin's approach is that with Fixed Point you are normally packing into a guaranteed word size (typically 32bits). Declaring the two parts separately leaves you to the whim of your compiler's structure packing. Yes you could force it, but it does not work for anything other than 16:16 representation.
KPexEA is closer to the mark by packing everything into an int, although I would use "signed long" to try to be explicit about 32 bits. Then you can use his approach for generating the fixed-point value, and bit slicing to extract the component parts again. His suggestion also covers the 24:8 case.
( And everyone else who suggested just static_cast.....what were you thinking? ;) )
I accepted the answer from the guy who wrote the best one, but what I really used was the code from a related question that points here.
It used templates and was easy to ditch dependencies on the boost lib.
This is fine for converting from floating point to integer, but the O.P. also wanted fixed point.
Now how you'd do that in C++, I don't know (C++ not being something I can think in readily). Perhaps try a scaled-integer approach, i.e. use a 32 or 64 bit integer and programmatically allocate the last, say, 6 digits to what's on the right hand side of the decimal point.
There isn't any built in support in C++ for fixed point numbers. Your best bet would be to write a wrapper 'FixedInt' class that takes doubles and converts them.
As for a generic method to convert... the int part is easy enough, just grab the integer part of the value and store it in the upper bits... decimal part would be something along the lines of:
for (int i = 1; i <= precision; i++)
{
float weight = 1.f / (float)(1 << i); // fraction bit i is worth 2^-i
if (decimal_part >= weight)
{
decimal_part -= weight;
fixint_value |= (1 << (precision - i));
}
}
although this is likely to contain bugs still