How can I get the source code for rand()? (C++)

I'm new to programming, and I want to know exactly what rand() does.
Searching only yields examples of its usage; none explain, step by step, how the function generates a random number. They treat rand() as a black box.
Is there a resource that will let me see exactly what rand() does? This is all open source stuff, isn't it? I'll settle for the disassembly if there's no source.
I know it returns a random number, but how does it generate that number? I want to see each step.
Thank you.

Here is the current glibc implementation:
/* Return a random integer between 0 and RAND_MAX. */
int
rand (void)
{
  return (int) __random ();
}
That's not much help, but __random eventually calls __random_r:
/* If we are using the trivial TYPE_0 R.N.G., just do the old linear
congruential bit. Otherwise, we do our fancy trinomial stuff, which is the
same in all the other cases due to all the global variables that have been
set up. The basic operation is to add the number at the rear pointer into
the one at the front pointer. Then both pointers are advanced to the next
location cyclically in the table. The value returned is the sum generated,
reduced to 31 bits by throwing away the "least random" low bit.
Note: The code takes advantage of the fact that both the front and
rear pointers can't wrap on the same call by not testing the rear
pointer if the front one has wrapped. Returns a 31-bit random number. */
int
__random_r (buf, result)
     struct random_data *buf;
     int32_t *result;
{
  int32_t *state;

  if (buf == NULL || result == NULL)
    goto fail;

  state = buf->state;
  if (buf->rand_type == TYPE_0)
    {
      int32_t val = state[0];
      val = ((state[0] * 1103515245) + 12345) & 0x7fffffff;
      state[0] = val;
      *result = val;
    }
  else
    {
      int32_t *fptr = buf->fptr;
      int32_t *rptr = buf->rptr;
      int32_t *end_ptr = buf->end_ptr;
      int32_t val;

      val = *fptr += *rptr;
      /* Chucking least random bit.  */
      *result = (val >> 1) & 0x7fffffff;
      ++fptr;
      if (fptr >= end_ptr)
        {
          fptr = state;
          ++rptr;
        }
      else
        {
          ++rptr;
          if (rptr >= end_ptr)
            rptr = state;
        }
      buf->fptr = fptr;
      buf->rptr = rptr;
    }
  return 0;

 fail:
  __set_errno (EINVAL);
  return -1;
}

This was 10 seconds of googling:
gcc implementation of rand()
How is the rand()/srand() function implemented in C
implementation of rand()
...
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01206.html
http://www.gnu.org/software/libc/manual/html_node/Pseudo_002dRandom-Numbers.html
I was going to list the actual searches, but seeing this is clearly a duplicate, I'll just vote to close it as one.

You can browse the source code for different implementations of the C standard.
The question has been answered before, you might find what you're looking for at What common algorithms are used for C's rand()?
That answer provides code for glibc's implementation of rand()

The simplest reasonably good pseudo-random number generators are Linear Congruential Generators (LCGs). These are iterations of a formula such as
X_{n+1} = (a * X_n + c) modulo m
The constants a, c, and m are chosen to give unpredictable sequences; X_0 is the random seed value. Many other algorithms exist, but this is probably enough to get you going.
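A minimal sketch of such a generator, using the classic example constants (a = 1103515245, c = 12345, and m = 2^32 via unsigned wraparound); the values are illustrative, not a recommendation:

#include <cstdint>

// Minimal LCG: X_{n+1} = (a * X_n + c) mod m. With 32-bit unsigned
// arithmetic, the "mod m" for m = 2^32 happens automatically on overflow.
struct Lcg {
    std::uint32_t state; // X_n; initialize this with the seed

    std::uint32_t next() {
        state = 1103515245u * state + 12345u; // a * X_n + c, mod 2^32
        return state;
    }
};

Seeding is just Lcg g{42}; and each call to g.next() advances the sequence by one step.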
Really good pseudo-random number generators are more complex, such as the Mersenne Twister.

I guess THIS is what you are looking for. It contains a detailed explanation of the random function, and a simple C program to understand the algorithm.
Edit:
You should check THIS as well. A possible duplicate.

Well, I believe rand is from the C standard library, not the C++ standard library. There is no single implementation of either library; there are several.
You could go somewhere like this page to view the source code for glibc, the C library used on most Linux distributions. For glibc you'd find it in source files under stdlib, such as rand.c and random.c.
A different implementation, such as uClibc, might be easier to read. Try here, under the libc/stdlib folder.

Correct me if I'm wrong, but although this answer points to part of the implementation, I found that there is more to the rand() used in stdlib, which comes from glibc. In version 2.32, obtained from here, the stdlib folder contains a random.c file which explains that a simple linear congruential algorithm is used. The folder also has rand.c and rand_r.c, which show you more of the source code. stdlib.h in the same folder shows the values used for macros like RAND_MAX.
/* An improved random number generation package. In addition to the
standard rand()/srand() like interface, this package also has a
special state info interface. The initstate() routine is called
with a seed, an array of bytes, and a count of how many bytes are
being passed in; this array is then initialized to contain
information for random number generation with that much state
information. Good sizes for the amount of state information are
32, 64, 128, and 256 bytes. The state can be switched by calling
the setstate() function with the same array as was initialized with
initstate(). By default, the package runs with 128 bytes of state
information and generates far better random numbers than a linear
congruential generator. If the amount of state information is less
than 32 bytes, a simple linear congruential R.N.G. is used.
Internally, the state information is treated as an array of longs;
the zeroth element of the array is the type of R.N.G. being used
(small integer); the remainder of the array is the state
information for the R.N.G. Thus, 32 bytes of state information
will give 7 longs worth of state information, which will allow a
degree seven polynomial. (Note: The zeroth word of state
information also has some other information stored in it; see setstate
for details). The random number generation technique is a linear
feedback shift register approach, employing trinomials (since there
are fewer terms to sum up that way). In this approach, the least
significant bit of all the numbers in the state table will act as a
linear feedback shift register, and will have period 2^deg - 1
(where deg is the degree of the polynomial being used, assuming
that the polynomial is irreducible and primitive). The higher order
bits will have longer periods, since their values are also
influenced by pseudo-random carries out of the lower bits. The
total period of the generator is approximately deg*(2**deg - 1); thus
doubling the amount of state information has a vast influence on the
period of the generator. Note: The deg*(2**deg - 1) is an
approximation only good for large deg, when the period of the shift
register is the dominant factor. With deg equal to seven, the
period is actually much longer than the 7*(2**7 - 1) predicted by
this formula. */

Related

Generating random visual noise using For loop

I'm starting to get into the mild depths of C, using arduinos and such like, and just wanted some advice on how I'm generating random noise using a For loop.
The important bit:
void testdrawnoise() {
  int j = 0;
  for (uint8_t i=0; i<display.width(); i++) {
    if (i == display.width()-1) {
      j++;
      i=0;
    }
    M = random(0, 2);           // Random 0/1
    display.drawPixel(i, j, M); // (Width, Height, Pixel on/off)
    display.refresh();
  }
}
The function draws pixels one by one across the screen, moving to the next line down once i has reached display.width()-1. Whether a pixel appears on (black) or off (white) is determined by M.
The code is working fine, but I feel like it could be done better, or at least neater, and perhaps more efficiently.
Input and critiques greatly appreciated.
First of all, your loop never ends and increments j without bound, so after you have filled the screen once, you keep looping outside the screen height. Although your library does bounds checking, it's certainly not a productive use of the CPU to keep looping without doing useful work until j overflows and wraps back to zero.
Also, signed overflow is undefined behavior in C++, so you are technically on shaky grounds (I originally thought that Arduino always compiles with -fwrapv which guarantees wraparound on signed integer overflow, but apparently I was mistaken).
Given that the library you are using keeps the whole framebuffer in memory and sends it all on refresh calls, it doesn't make much sense to re-send it at each pixel - especially since the frame transmission is probably going to be by far the slowest part of this loop. So, you can move it out of the loop.
Putting this together (plus caching width and height and using the simpler overload of random), you can change this to:
void testdrawnoise() {
  int w = display.width(), h = display.height();
  for (int j=0; j<h; ++j) {
    for (int i=0; i<w; ++i) {
      display.drawPixel(i, j, random(2));
    }
  }
  display.refresh();
}
(if your screen dimensions are smaller than 256 on AVR Arduinos you may gain something by changing all those int to byte, but don't take my word for it)
Notice that this will do it just once; you can put it into your loop() function or in an infinite loop to make it keep generating random patterns.
This is what you can do with the provided interface; now, going into undocumented territory we can go faster.
As stated above, the library you are using keeps the whole framebuffer in memory, packed (as expected) at 8 bits per byte, in a single global variable named sharpmem_buffer, initialized with a malloc of the obvious size.
It should also be noted that, when you ask for a random bit in your code, the PRNG generates a full 31-bit random number and takes just the low bit. Why waste all the other perfectly good random bits?
At the same time, when you call drawPixel, the library performs a series of boolean operations on the corresponding byte in memory to set just the bit you asked for without touching the rest of the bits. Quite stupid, given that you are going to overwrite the other ones with random anyway.
So, putting together these two facts, we can do something like:
void testdrawnoise() {
  // access the buffer defined in another .cpp
  extern byte *sharpmem_buffer;
  byte *ptr = sharpmem_buffer; // pointer to current position
  // end position
  byte *end = ptr + display.width()*display.height()/8;
  for (; ptr!=end; ++ptr) {
    // store a full byte of random
    *ptr = random(256);
  }
  display.refresh();
}
which, excluding the refresh() time, should be at the very least 8 times faster than the previous version (I actually expect significantly more, given that not only does the core of the loop execute 1/8th of the iterations, it's also way simpler - no function calls besides random, no branches, no boolean operations on memory).
On AVR Arduinos the only point that can be optimized further is probably the RNG - we are still using only 8 bits of a 31-bit (if it is actually 31 bits? the Arduino documentation is, as usual, bad at providing useful technical information) RNG, so we could probably generate 3 bytes of random out of a single RNG call, or 4 if we switched to a hand-rolled LCG that didn't mess with the sign bit. On ARM Arduinos, in this last case, we could even gain something by performing full 32-bit stores in memory instead of writing single bytes.
However, these further optimizations are (1) tedious to write (if you have to handle screens where the number of pixels is not multiple of 24/32) and (2) probably not particularly profitable, given that most of the time will be spent in transmission over the SPI anyway. Worth mentioning them anyway, as they may be useful in other cases where there's no transmission bottleneck to slow everything down.
Given that OP's MCU is actually a Cortex M0 (so, a 32 bit ARM), it's worth trying to make it even faster using a full 32 bit PRNG and 32 bit stores.
As said above, built-in random returns a signed value, and it's not exactly clear what range it provides; for this reason, we'll have to roll our own PRNG that is guaranteed to provide 32 full bits of randomness.
A decent and very fast PRNG that provides 32 random bits with minimal state is xorshift; we'll just use the xorshift32 straight from Wikipedia, as we don't really need the improved "*" or "+" versions (nor we really care about having a bigger period provided by the larger counterparts).
struct XorShift32 {
  uint32_t state = 0x12345678;

  uint32_t next() {
    uint32_t x = state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    state = x;
    return x;
  }
};

XorShift32 xorShift;
Now we can rewrite testdrawnoise():
void testdrawnoise() {
  int size = display.width()*display.height();
  // access the buffer defined in another .cpp
  extern byte *sharpmem_buffer;
  /*
    we can access the framebuffer as if it was an array of 32-bit words;
    this is fine, since it was alloc-ed with malloc, which guarantees memory
    aligned for the most restrictive built-in type, and the library only
    uses it with byte pointers, so there should be no strict aliasing problem
  */
  uint32_t *ptr = (uint32_t *)sharpmem_buffer;
  /*
    notice that the division is an integer division, which truncates; so, we
    are filling the framebuffer up to the last multiple of 4 bytes; with
    "strange" sizes we may be leaving out up to 3 bytes (see later)
  */
  uint32_t *end = ptr + size/32;
  for (; ptr!=end; ++ptr) {
    // store a full 32-bit word of random
    *ptr = xorShift.next();
  }
  // now to fill the possibly missing last three bytes
  // pick up where we left off
  byte *final_ptr = (byte *)end;
  byte *final_end = sharpmem_buffer + size/8;
  // generate 32 random bits; it's ok, we'll need at most 24
  uint32_t r = xorShift.next();
  for (; final_ptr!=final_end; ++final_ptr) {
    // take the lower 8 bits
    *final_ptr = r;
    // throw away the bits we used, shift in the upper ones
    r = r >> 8;
  }
  display.refresh();
}

Pseudo-random function in C?

I am reading The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie. I cannot understand the function given in the book for generating pseudo-random numbers. It is like this:
unsigned long int next = 1;

int rand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

void srand(unsigned int seed)
{
    next = seed;
}
I also tried myself, but I only came up with the following observations:
65536 = 2^16, one more than the largest 16-bit unsigned value
32768 = 2^15, one more than the largest 16-bit signed value
but I am not able to figure out the whole process.
This is a book written by legends and I want to understand it.
Please, if anybody can help me figure this out, I will feel very fortunate.
Pseudo-random number generators are a very complex subject. You could study them for years and get a PhD on the topic. As commented, read also about linear congruential generators (your code looks like the example given in some C standards).
In C on POSIX systems, you have random(3) (and also lrand48(3), sort of obsolete); in C++11 you have <random>.
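For example, a minimal usage sketch of C++11's <random> header:

#include <iostream>
#include <random>

int main() {
    std::mt19937 gen(12345);                           // Mersenne Twister engine, seeded
    std::uniform_int_distribution<int> dist(0, 32767); // same range as the book's rand()
    for (int i = 0; i < 5; ++i)
        std::cout << dist(gen) << '\n';                // well-distributed pseudo-random ints
}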
The /65536 operation might be compiled as >>16, a right shift by 16 bits.
The %32768 operation could be optimized into a bitmask (the same as &0x7fff), keeping the 15 least significant bits.
This hasn't an accepted answer yet, so let's try one.
As noted by Basile Starynkevitch, what is implemented here is a pseudo-random number generator (RNG) from the class of linear congruential generators (LCGs). These in general take the form of a sequence
X := (a * X + c) mod m
The starting value for X is called the seed (same as in the code). Of course c < m and a < m, and often also c << a. Both a and m are usually large numbers chosen so that the whole sequence does reasonably well in the spectral test, but you probably don't have to care about that to understand the basic mode of operation. If you are a little bit into number theory, you will probably see that the sequence repeats after a while (it is periodic).
Random numbers are generated by first seeding X with a starting value. For each generated number, the sequence is advanced and a subset of the bits of X is returned.
In the code from the question, a = 1103515245, c = 12345, and
m is implicitly pow(2, 8 * sizeof(unsigned long)) by virtue of unsigned integer overflow. These are also the values that ISO/IEC 9899, i.e. the C language standard, suggests.
With this known, the first pitfall is probably this statement:
return (unsigned int)(next / 65536) % 32768;
Kernighan and Ritchie probably thought that using only simple arithmetic is more readable and more portable than using bit masks and bit shifts. The above is equivalent to
return (unsigned int)(next >> 16) & 0x7fff
which selects bits 16-30 from next. You get back a pseudo-random number in the range [0;32767]. The bit range is also the one suggested in the C standard.
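A quick test sketch (mine, not from the book) to convince yourself of the equivalence:

#include <cassert>

int main()
{
    unsigned long next = 1;
    for (int i = 0; i < 1000000; ++i) {
        next = next * 1103515245 + 12345;
        // division/modulo by powers of two agree with shift/mask
        // for unsigned operands
        assert((unsigned int)(next / 65536) % 32768
               == ((unsigned int)(next >> 16) & 0x7fff));
    }
    return 0;
}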
WARNING: It is well known that this LCG, while widely deployed because it is noted in the standard, does not produce very good pseudo-random numbers (the version in glibc is even worse). Distinctively, it is absolutely unsafe to use for cryptographic applications. With so few random bits, I would not even use it for any Monte Carlo method, because results may be severely skewed by the quality of the RNG.
So in short: Try to understand it: yes, you are welcome. Use it for anything: no.

Time complexity of the Russian peasant multiplication algorithm?

I want to know the time complexity of this piece of code, which is an implementation of Russian peasant multiplication:
unsigned long long int russian(unsigned long long int a, unsigned long long int b) {
    unsigned long long int res = 0;
    while (b > 0) {
        if (b & 1)
            res = res + a;
        a <<= 1;
        b >>= 1;
    }
    return res % mod; // mod is a global defined elsewhere
}
As far as I know, its time complexity is either log2(b) or log2(a), depending upon our choice of a or b. Any expert comments?
The time complexity of the piece of code you supplied is, of course, O(1), because there is an upper bound on how long it can take and will never exceed that upper bound on any inputs.
Presumably, that's not the answer to the question you actually mean to ask. There are actually several different things you might be actually interested in, and they all actually have different answers.
(also, since you seem to be trying to do a modular multiply, you really should be reducing all relevant quantities inside the loop so that you don't overflow, and so that you can use - instead of %)
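A sketch of that suggestion (the name mulmod is mine; it assumes mod < 2^63, so that neither the doubling of a nor the sums below can wrap around):

// Russian peasant multiplication with reduction inside the loop:
// a and res always stay below mod, so a subtraction replaces the %.
unsigned long long mulmod(unsigned long long a, unsigned long long b,
                          unsigned long long mod) {
    unsigned long long res = 0;
    a %= mod;
    while (b > 0) {
        if (b & 1) {
            res = res + a;
            if (res >= mod) res -= mod; // res < 2*mod, one subtraction suffices
        }
        a <<= 1;                        // cannot wrap while mod < 2^63
        if (a >= mod) a -= mod;
        b >>= 1;
    }
    return res;
}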
You might be interested in having a precise estimate of the wall-clock time. Obtaining this will actually require gathering some empirical data, but it will probably look something like
A + B*bitlength(b) + C*popcount(b)
(popcount is the number of 1s in the binary expansion) for some constants A, B, and C. However, CPU hardware is actually rather complicated, and it might actually be extremely involved to get a good estimate for the third term above, since branch prediction hardware might do some odd things.
And A, B, and C probably aren't even constants; they will depend to some extent on whether this function gets inlined, and the sort of code surrounding the places where it's used.
Now, you might want a more abstract answer where b can be of arbitrary size, rather than constrained to be the size of an unsigned long long, and want to count the number of arithmetic operations. This is very clearly just the bit length of b, or as the comments indicate, O(lg(b)). (where lg is the log base 2)
Now, you might actually be interested not just in the arithmetic operations, but their cost. And might be interested in a being of arbitrary size rather than constrained to be an unsigned long long. A useful unit of measure would be bit operations. e.g. doing a left-shift by 1 on an N-bit number ought to cost O(N) bit operations.
I'm pretty sure the loop works out to O(lg(a)lg(b)+lg(b)^2) bit operations. (this doesn't include the % operation you do afterwards)

Is it possible to roll a significantly faster version of sqrt

In an app I'm profiling, I found that in some scenarios this function is able to take over 10% of total execution time.
I've seen discussion over the years of faster sqrt implementations using sneaky floating-point trickery, but I don't know if such things are outdated on modern CPUs.
The MSVC++ 2008 compiler is being used, for reference... though I'd assume sqrt is not going to add much overhead.
See also here for similar discussion on modf function.
EDIT: for reference, this is one widely-used method, but is it actually much quicker? How many cycles is SQRT anyway these days?
Yes, it is possible even without trickery:
sacrifice accuracy for speed: the sqrt algorithm is iterative; re-implement it with fewer iterations (see the sketch after this list).
lookup tables: either just for the start point of the iteration, or combined with interpolation to get you all the way there.
caching: are you always sqrting the same limited set of values? if so, caching can work well. I've found this useful in graphics applications where the same thing is being calculated for lots of shapes the same size, so results can be usefully cached.
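To illustrate the first point, here is a minimal sketch (the bit-level initial guess and the name sqrt_one_step are mine; it assumes IEEE-754 floats and x > 0):

#include <cstdint>
#include <cstring>

// Crude sqrt: halve the float's exponent via bit manipulation to get a
// rough first guess, then refine with a single Newton-Raphson step
// y = 0.5*(y + x/y). Each extra step roughly doubles the correct digits,
// so accuracy can be traded for speed by choosing the iteration count.
float sqrt_one_step(float x) {
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);  // grab the float's bit pattern
    i = (i >> 1) + 0x1fc00000u;     // halve the exponent: exact for even powers of two
    float y;
    std::memcpy(&y, &i, sizeof y);
    return 0.5f * (y + x / y);      // one Newton-Raphson refinement
}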
Hello from 11 years in the future.
Considering this still gets occasional votes, I thought I'd add a note about performance, which now even more than then is dramatically limited by memory accesses. You absolutely must use a realistic benchmark (ideally, your whole application) when optimising something like this - the memory access patterns of your application will have a dramatic effect on solutions like lookup tables and caches, and just comparing 'cycles' for your optimised version will lead you wildly astray: it is also very difficult to assign program time to individual instructions, and your profiling tool may mislead you here.
On a related note, consider using simd/vectorised instructions for calculating square roots, like _mm512_sqrt_ps or similar, if they suit your use case.
Take a look at section 15.12.3 of Intel's optimisation reference manual, which describes approximation methods, with vectorised instructions, which would probably translate pretty well to other architectures too.
There's a great comparison table here:
http://assemblyrequired.crashworks.org/timing-square-root/
Long story short, SSE2's sqrtss is about 2x faster than the FPU's fsqrt, and an approximation + iteration is about 4x faster than that (8x overall).
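The "approximation + iteration" approach usually means rsqrtss plus one Newton-Raphson step; a sketch of that pattern (assumes SSE and x > 0):

#include <xmmintrin.h>

// sqrt(x) = x * (1/sqrt(x)): start from the hardware's ~12-bit rsqrtss
// estimate and refine it with one Newton-Raphson step.
float fast_sqrt_sse(float x) {
    __m128 vx = _mm_set_ss(x);
    __m128 r  = _mm_rsqrt_ss(vx);  // approximate 1/sqrt(x), ~12 bits
    // Newton step for rsqrt: r = r * (1.5 - 0.5 * x * r * r)
    __m128 half_x = _mm_mul_ss(_mm_set_ss(0.5f), vx);
    __m128 rr     = _mm_mul_ss(r, r);
    r = _mm_mul_ss(r, _mm_sub_ss(_mm_set_ss(1.5f), _mm_mul_ss(half_x, rr)));
    return _mm_cvtss_f32(_mm_mul_ss(vx, r));  // multiply back by x
}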
Also, if you're trying to take a single-precision sqrt, make sure that's actually what you're getting. I've heard of at least one compiler that would convert the float argument to a double, call double-precision sqrt, then convert back to float.
You're very likely to gain more speed improvements by changing your algorithms than by changing their implementations: Try to call sqrt() less instead of making calls faster. (And if you think this isn't possible - the improvements for sqrt() you mention are just that: improvements of the algorithm used to calculate a square root.)
Since it is used very often, it is likely that your standard library's implementation of sqrt() is nearly optimal for the general case. Unless you have a restricted domain (e.g., if you need less precision) where the algorithm can take some shortcuts, it's very unlikely someone comes up with an implementation that's faster.
Note that, since that function uses 10% of your execution time, even if you manage to come up with an implementation that only takes 75% of the time of std::sqrt(), this will still only bring your execution time down by 2.5%. For most applications users wouldn't even notice, except with a stopwatch.
How accurate do you need your sqrt to be? You can get reasonable approximations very quickly: see Quake3's excellent inverse square root function for inspiration (note that the code is GPL'ed, so you may not want to integrate it directly).
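For reference, the trick is widely documented; a sketch of it in portable form (the original is GPL'ed, so this is the published bit pattern rewritten with memcpy instead of the original's undefined pointer-punning):

#include <cstdint>
#include <cstring>

// Quake III-style fast inverse square root: reinterpret the float's bits,
// apply the famous magic constant to get an initial 1/sqrt(x) estimate,
// then refine with one Newton-Raphson step (relative error ~0.2%).
float fast_rsqrt(float x) {
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);
    i = 0x5f3759df - (i >> 1);          // initial guess via the magic constant
    float y;
    std::memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson iteration
    return y;
}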
Don't know if you fixed this, but I've read about it before, and it seems that the fastest thing to do is replace the sqrt function with an inline assembly version;
you can see a description of a load of alternatives here.
The best is this snippet of magic:
double inline __declspec (naked) __fastcall sqrt(double n)
{
    _asm fld qword ptr [esp+4]
    _asm fsqrt
    _asm ret 8
}
It's about 4.7x faster than the standard sqrt call with the same precision.
Here is a fast way with a lookup table of only 8KB. The error is ~0.5% of the result. You can easily enlarge the table, thus reducing the error. It runs about 5 times faster than the regular sqrt().
// LUT for fast sqrt of floats. The table consists of 2 parts: half for sqrt(X) and half for sqrt(2X).
const int nBitsForSQRTprecision = 11;                   // Use only the 11 most significant bits of the float's 23-bit mantissa. Using 15 bits instead produces less error but takes more memory.
const int nUnusedBits = 23 - nBitsForSQRTprecision;     // Amount of bits we will disregard
const int tableSize = (1 << (nBitsForSQRTprecision+1)); // 2^nBits * 2 because we have 2 halves of the table.
static short sqrtTab[tableSize];
static unsigned char is_sqrttab_initialized = FALSE;    // Once initialized, will be true

// Table of precalculated sqrt() for future fast calculation. Approximates the exact value with an error of about 0.5%.
// Note: To access the bits of a float in C quickly we must misuse pointers.
// More info in: http://en.wikipedia.org/wiki/Single_precision
void build_fsqrt_table(void){
    unsigned short i;
    float f;
    UINT32 *fi = (UINT32*)&f;

    if (is_sqrttab_initialized)
        return;

    const int halfTableSize = (tableSize>>1);
    for (i=0; i < halfTableSize; i++){
        *fi = 0;
        *fi = (i << nUnusedBits) | (127 << 23); // Build a float with the bit pattern i as mantissa, and an exponent of 0, stored as 127
        // Take the square root then strip the first 'nBitsForSQRTprecision' bits of the mantissa into the table
        f = sqrtf(f);
        sqrtTab[i] = (short)((*fi & 0x7fffff) >> nUnusedBits);
        // Repeat the process, this time with an exponent of 1, stored as 128
        *fi = 0;
        *fi = (i << nUnusedBits) | (128 << 23);
        f = sqrtf(f);
        sqrtTab[i+halfTableSize] = (short)((*fi & 0x7fffff) >> nUnusedBits);
    }
    is_sqrttab_initialized = TRUE;
}

// Calculation of a square root. Divide the exponent of the float by 2 and sqrt() its mantissa using the precalculated table.
float fast_float_sqrt(float n){
    if (n <= 0.f)
        return 0.f;              // On 0 or negative, return 0.
    UINT32 *num = (UINT32*)&n;
    short e;                     // Exponent
    e = (*num >> 23) - 127;      // In 'float' the exponent is stored with 127 added.
    *num &= 0x7fffff;            // Leave only the mantissa

    // If the exponent is odd, we have to look it up in the second half of the lookup table, so we set the high bit.
    const int halfTableSize = (tableSize>>1);
    const int secondHalphTableIdBit = halfTableSize << nUnusedBits;
    if (e & 0x01)
        *num |= secondHalphTableIdBit;
    e >>= 1;                     // Divide the exponent by two (note that in C the shift operators are sign-preserving for signed operands)

    // Do the table lookup, based on the quantized mantissa, then reconstruct the result back into a float
    *num = ((sqrtTab[*num >> nUnusedBits]) << nUnusedBits) | ((e + 127) << 23);
    return n;
}

How to implement big int in C++

I'd like to implement a big int class in C++ as a programming exercise—a class that can handle numbers bigger than a long int. I know that there are several open source implementations out there already, but I'd like to write my own. I'm trying to get a feel for what the right approach is.
I understand that the general strategy is get the number as a string, and then break it up into smaller numbers (single digits for example), and place them in an array. At this point it should be relatively simple to implement the various comparison operators. My main concern is how I would implement things like addition and multiplication.
I'm looking for a general approach and advice as opposed to actual working code.
A fun challenge. :)
I assume that you want integers of arbitrary length. I suggest the following approach:
Consider the binary nature of the datatype "int". Think about using simple binary operations to emulate what the circuits in your CPU do when they add things. If you are interested in more depth, consider reading this Wikipedia article on half-adders and full-adders. You'll be doing something similar to that, and you could go down to as low a level as that - but being lazy, I thought I'd just forgo that and find an even simpler solution.
But before going into any algorithmic details about adding, subtracting, multiplying, let's find some data structure. A simple way, is of course, to store things in a std::vector.
template< class BaseType >
class BigInt
{
    typedef BaseType BT;
protected:
    std::vector< BaseType > value_;
};
You might want to consider if you want to make the vector of a fixed size and if to preallocate it. Reason being that for diverse operations, you will have to go through each element of the vector - O(n). You might want to know offhand how complex an operation is going to be and a fixed n does just that.
But now to some algorithms on operating on the numbers. You could do it on a logic-level, but we'll use that magic CPU power to calculate results. But what we'll take over from the logic-illustration of Half- and FullAdders is the way it deals with carries. As an example, consider how you'd implement the += operator. For each number in BigInt<>::value_, you'd add those and see if the result produces some form of carry. We won't be doing it bit-wise, but rely on the nature of our BaseType (be it long or int or short or whatever): it overflows.
Surely, if you add two (non-negative) numbers, the result must be no smaller than the greater of the two, right? If it is smaller, then the result overflowed.
template< class BaseType >
BigInt< BaseType >& BigInt< BaseType >::operator += (BigInt< BaseType > const& operand)
{
    BT count, carry = 0;
    for (count = 0; count < std::max(value_.size(), operand.value_.size()); count++)
    {
        BT op0 = count < value_.size() ? value_.at(count) : 0,
           op1 = count < operand.value_.size() ? operand.value_.at(count) : 0;
        BT digits_result = op0 + op1 + carry;
        if (digits_result - carry < std::max(op0, op1))
        {
            BT carry_old = carry;
            carry = digits_result;
            digits_result = (op0 + op1 + carry) >> sizeof(BT)*8; // NOTE [1]
        }
        else carry = 0;
    }
    return *this;
}
// NOTE 1: I did not test this code. And I am not sure if this will work; if it does
//         not, then you must restrict BaseType to be the second biggest type
//         available, i.e. a 32-bit int when you have a 64-bit long. Then use
//         a temporary or a cast to the mightier type and retrieve the upper bits.
//         Or you do it bitwise. ;-)
The other arithmetic operations are analogous. Heck, you could even use the STL functors std::plus and std::minus, std::multiplies and std::divides, ..., but mind the carry. :) You can also implement multiplication and division by using your plus and minus operators, but that's very slow, because it would recalculate results you already calculated in prior calls to plus and minus in each iteration. There are a lot of good algorithms out there for this simple task; use Wikipedia or the web.
And of course, you should implement standard operators such as operator<< (just shift each value in value_ to the left by n bits, starting at value_.size()-1... oh, and remember the carry :) and operator< - you can even optimize a little here, checking the rough number of digits with size() first. And so on. Then make your class useful by befriending std::ostream's operator<<.
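A sketch of that comparison idea, assuming the class above declares a matching operator< and that value_ keeps digits least significant first with no leading zero digits:

template< class BaseType >
bool BigInt< BaseType >::operator < (BigInt< BaseType > const& rhs) const
{
    if (value_.size() != rhs.value_.size())          // cheap digit-count check first
        return value_.size() < rhs.value_.size();
    for (std::size_t i = value_.size(); i-- > 0; )   // most significant digit first
        if (value_[i] != rhs.value_[i])
            return value_[i] < rhs.value_[i];
    return false;                                    // the two numbers are equal
}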
Hope this approach is helpful!
Things to consider for a big int class:
Mathematical operators: +, -, /, *, %. Don't forget that your class may be on either side of the operator, that the operators can be chained, and that one of the operands could be an int, float, double, etc.
I/O operators: >>, <<. This is where you figure out how to properly create your class from user input, and how to format it for output as well.
Conversions/Casts: Figure out what types/classes your big int class should be convertible to, and how to properly handle the conversion. A quick list would include double and float, and may include int (with proper bounds checking) and complex (assuming it can handle the range).
There's a complete section on this: [The Art of Computer Programming, vol.2: Seminumerical Algorithms, section 4.3 Multiple Precision Arithmetic, pp. 265-318 (ed.3)]. You may find other interesting material in Chapter 4, Arithmetic.
If you really don't want to look at another implementation, have you considered what it is you are out to learn? There are innumerable mistakes to be made and uncovering those is instructive and also dangerous. There are also challenges in identifying important computational economies and having appropriate storage structures for avoiding serious performance problems.
A Challenge Question for you: How do you intend to test your implementation, and how do you propose to demonstrate that its arithmetic is correct?
You might want another implementation to test against (without looking at how it does it), but it will take more than that to be able to generalize without expecting an excruciating level of testing. Don't forget to consider failure modes (out-of-memory problems, out of stack, running too long, etc.).
Have fun!
Addition would probably have to be done with the standard linear-time algorithm,
but for multiplication you could try the Karatsuba algorithm: http://en.wikipedia.org/wiki/Karatsuba_algorithm
Once you have the digits of the number in an array, you can do addition and multiplication exactly as you would do them longhand.
Don't forget that you don't need to restrict yourself to 0-9 as digits; i.e., use bytes as digits (0-255) and you can still do longhand arithmetic the same as you would for decimal digits. You could even use an array of long.
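A minimal sketch of that longhand multiplication with byte digits (the function name and the vector representation are illustrative assumptions; digits are stored least significant first):

#include <cstdint>
#include <vector>

// Schoolbook multiplication on base-256 digits: each digit pair feeds one
// column, and carries propagate upward exactly as in decimal longhand.
std::vector<std::uint8_t> multiply(std::vector<std::uint8_t> const& a,
                                   std::vector<std::uint8_t> const& b)
{
    std::vector<std::uint8_t> out(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint32_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            std::uint32_t cur = out[i + j]
                              + std::uint32_t(a[i]) * b[j] + carry;
            out[i + j] = std::uint8_t(cur & 0xff); // keep the low digit
            carry = cur >> 8;                      // push the rest upward
        }
        out[i + b.size()] = std::uint8_t(carry);   // final carry of this row
    }
    while (out.size() > 1 && out.back() == 0)      // trim leading zero digits
        out.pop_back();
    return out;
}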
I'm not convinced using a string is the right way to go. Though I've never written such code myself, I think that using an array of a base numeric type might be a better solution. The idea is that you'd simply extend what you've already got the same way the CPU extends a single bit into an integer.
For example, if you have a structure
typedef struct {
    int high, low;
} BiggerInt;
You can then manually perform native operations on each of the "digits" (high and low, in this case), being mindful of overflow conditions:
BiggerInt add( const BiggerInt *lhs, const BiggerInt *rhs ) {
    BiggerInt ret;
    /* Ideally, you'd want a better way to check for overflow conditions */
    if ( rhs->high > INT_MAX - lhs->high ) {
        /* With a variable-length (a real) BigInt, you'd allocate some more room here */
    }
    ret.high = lhs->high + rhs->high;
    if ( rhs->low < INT_MAX - lhs->low ) {
        /* No overflow */
        ret.low = lhs->low + rhs->low;
    }
    else {
        /* Overflow */
        ret.high += 1;
        ret.low = lhs->low - ( INT_MAX - rhs->low ); /* Right? */
    }
    return ret;
}
It's a bit of a simplistic example, but it should be fairly obvious how to extend to a structure that had a variable number of whatever base numeric class you're using.
Use the algorithms you learned in 1st through 4th grade.
Start with the ones column, then the tens, and so forth.
Like others said, do it the old-fashioned longhand way, but stay away from doing it all in base 10. I'd suggest doing it all in base 65536, and storing things in an array of longs.
If your target architecture supports BCD (binary-coded decimal) representation of numbers, you can get some hardware support for the longhand multiplication/addition that you need to do. Getting the compiler to emit BCD instructions is something you'll have to read up on...
The Motorola 68K series chips had this. Not that I'm bitter or anything.
My start would be to have an arbitrarily sized array of integers, using 31 bits and the 32nd as overflow.
The starter op would be ADD, and then MAKE-NEGATIVE, using 2's complement. After that, subtraction flows trivially, and once you have add/sub, everything else is doable.
There are probably more sophisticated approaches. But this would be the naive approach from digital logic.
You could try implementing something like this:
http://www.docjar.org/html/api/java/math/BigInteger.java.html
You'd only need 4 bits for a single digit 0-9, so an int value would allow up to 8 digits each. I decided I'd stick with an array of chars; I use double the memory, but for me it's only being used once.
Also, storing all the digits in a single int over-complicates things, and if anything it may even slow things down.
I don't have any speed tests, but looking at the Java version of BigInteger it seems like it's doing an awful lot of work.
For me, I do the below:
// Number = 100,000.00, Number Digits = 32, Decimal Digits = 2.
BigDecimal *decimal = new BigDecimal("100000.00", 32, 2);
*decimal += "1000.99";
cout << decimal->GetValue(0x1 | 0x2) << endl; // Format and show decimals.
// Prints: 101,000.99
The computer hardware provides facility of storing integers and doing basic arithmetic over them; generally this is limited to integers in a range (e.g. up to 2^{64}-1). But larger integers can be supported via programs; below is one such method.
Using Positional Numeral System (e.g. the popular base-10 numeral system), any arbitrarily large integer can be represented as a sequence of digits in base B. So, such integers can be stored as an array of 32-bit integers, where each array-element is a digit in base B=2^{32}.
We already know how to represent integers using numeral-system with base B=10, and also how to perform basic arithmetic (add, subtract, multiply, divide etc) within this system. The algorithms for doing these operations are sometimes known as Schoolbook algorithms. We can apply (with some adjustments) these Schoolbook algorithms to any base B, and so can implement the same operations for our large integers in base B.
To apply these algorithms for any base B, we will need to understand them further and handle concerns like:
what is the range of various intermediate values produced during these algorithms.
what is the maximum carry produced by the iterative addition and multiplication.
how to estimate the next quotient-digit in long-division.
(Of course, there can be alternate algorithms for doing these operations).
Some algorithm/implementation details can be found here (initial chapters), here (written by me) and here.
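As a small sketch of the addition step in base B = 2^32 (digits stored least significant first; a 64-bit accumulator comfortably holds one column sum plus carry, since (2^32 - 1) + (2^32 - 1) + 1 < 2^64):

#include <cstdint>
#include <vector>

// Schoolbook addition in base 2^32: add column by column, keep the low
// 32 bits as the result digit, and carry the high part into the next column.
std::vector<std::uint32_t> add(std::vector<std::uint32_t> const& a,
                               std::vector<std::uint32_t> const& b)
{
    std::vector<std::uint32_t> sum;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
        std::uint64_t column = carry;
        if (i < a.size()) column += a[i];
        if (i < b.size()) column += b[i];
        sum.push_back(std::uint32_t(column)); // low 32 bits = this digit
        carry = column >> 32;                 // at most 1
    }
    if (carry)
        sum.push_back(std::uint32_t(carry));
    return sum;
}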
Subtract 48 (the ASCII code of '0') from each character of your integer string to get the digits of the large number,
then perform the basic mathematical operations on that digit array.
Otherwise, I can provide a complete solution.