Sum signed 32-bit int with unsigned 64bit int - c++

In my application, I receive two signed 32-bit ints and I have to store them. I have to create a sort of counter and I don't know when it will be reset, but I'll receive big values frequently. Because of that, I decided to store these values in two unsigned 64-bit ints.
The following could be a simple version of the counter.
struct Counter
{
unsigned int elementNr;
unsigned __int64 totalLen1;
unsigned __int64 totalLen2;
void UpdateCounter(int len1, int len2)
{
if(len1 > 0 && len2 > 0)
{
++elementNr;
totalLen1 += len1;
totalLen2 += len2;
}
}
};
I know that if a smaller type is cast to a bigger one (e.g. int to long) there should be no issues. However, going from a 32-bit representation to a 64-bit representation and from signed to unsigned at the same time is something new for me.
Reading around, I understood that len1 should be expanded from 32 bits to 64 bits with sign extension applied. Because unsigned int and signed int have the same rank (Section 4.13), the latter should be converted.
If len1 stores a negative value, converting from signed to unsigned will return a wrong value, which is why I check for positivity at the beginning of the function. However, for positive values, there should be no issues, I think.
For clarity I could rewrite UpdateCounter(int len1, int len2) like this:
void UpdateCounter(int len1, int len2)
{
if(len1 > 0 && len2 > 0)
{
++elementNr;
__int64 tmp = len1;
totalLen1 += static_cast<unsigned __int64>(tmp);
tmp = len2;
totalLen2 += static_cast<unsigned __int64>(tmp);
}
}
Might there be some side effects that I have not considered?
Is there another better and safer way to do that?

A little background, just for reference: binary operators such as arithmetic addition work on operands of the same type (the specific CPU instruction they are translated to depends on the number representation, which must be the same for both operands).
When you write something like this (using fixed width integer types to be explicit):
int32_t a = <some value>;
uint64_t sum = 0;
sum += a;
As you already know, this involves an implicit conversion, more specifically an integral conversion performed as part of the usual arithmetic conversions, which are governed by integer conversion rank.
So the expression sum += a; is equivalent to sum += static_cast<uint64_t>(a);, that is, a is converted because it has the lesser rank.
Let's see what happens in this example:
int32_t a = 60;
uint64_t sum = 100;
sum += static_cast<uint64_t>(a);
std::cout << "a=" << static_cast<uint64_t>(a) << " sum=" << sum << '\n';
The output is:
a=60 sum=160
So all is ok, as expected. Let's see what happens when adding a negative number:
int32_t a = -60;
uint64_t sum = 100;
sum += static_cast<uint64_t>(a);
std::cout << "a=" << static_cast<uint64_t>(a) << " sum=" << sum << '\n';
The output is:
a=18446744073709551556 sum=40
The result is 40 as expected: converting a negative value to an unsigned type is defined modulo 2^64, and unsigned arithmetic wraps around in the same way (note: unsigned integer overflow is not undefined behaviour), so all is ok, of course as long as you ensure that the sum does not become negative.
Coming back to your question, you won't have any surprises if you always add positive numbers, or at least ensure that the sum never becomes negative... until you reach the maximum representable value std::numeric_limits<uint64_t>::max() (2^64-1 = 18446744073709551615 ~ 1.8E19).
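The two outputs above can also be checked at compile time. The following is a minimal sketch; the helper name toU64 is hypothetical and only wraps the static_cast used in the examples:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the same conversion as static_cast<uint64_t>(a) above.
// Signed-to-unsigned conversion yields the value modulo 2^64.
constexpr std::uint64_t toU64(std::int32_t v)
{
    return static_cast<std::uint64_t>(v);
}

// A positive value is unchanged.
static_assert(toU64(60) == 60u, "positive values convert unchanged");
// -60 becomes 2^64 - 60, i.e. max() - 59.
static_assert(toU64(-60) == std::numeric_limits<std::uint64_t>::max() - 59u,
              "negative values wrap modulo 2^64");
// Adding the wrapped value to 100 wraps again and gives 40.
static_assert(std::uint64_t{100} + toU64(-60) == 40u, "100 + (-60) == 40");
```

If any of these assumptions did not hold, the translation unit would fail to compile.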
If you continue to add numbers indefinitely sooner or later you'll reach that limit (this is valid also for your counter elementNr).
You'll overflow the 64 bit unsigned integer by adding 2^31-1 (2147483647) every millisecond for approximately three months, so in this case it may be advisable to check:
#include <limits>
//...
void UpdateCounter(const int32_t len1, const int32_t len2)
{
if( len1>0 )
{
if( static_cast<decltype(totalLen1)>(len1) <= std::numeric_limits<decltype(totalLen1)>::max()-totalLen1 )
{
totalLen1 += len1;
}
else
{// Would overflow!!
// Do something
}
}
}
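Since the counter in the question has two totals, the same check could be factored into a small helper and called once per total. This is only a sketch: the name addChecked and the choice to report failure through a bool are assumptions, not part of the original code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds a strictly positive len to total.
// Returns false (and leaves total untouched) if len is not positive
// or if the addition would exceed the uint64_t range.
bool addChecked(std::uint64_t& total, std::int32_t len)
{
    if (len <= 0)
        return false;

    const std::uint64_t value = static_cast<std::uint64_t>(len);
    if (value > std::numeric_limits<std::uint64_t>::max() - total)
        return false; // would overflow

    total += value;
    return true;
}
```

UpdateCounter could then call addChecked(totalLen1, len1) and addChecked(totalLen2, len2) and decide what to do when either returns false.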
When I have to accumulate numbers and I don't have particular requirements about accuracy, I often use double, because the maximum representable value is incredibly high (std::numeric_limits<double>::max() is 1.79769E+308) and to reach overflow I would need to add 2^32-1 = 4294967295 every picosecond for 1E+279 years.
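The accuracy caveat matters in practice: assuming the usual IEEE 754 double with a 53-bit significand, integers are represented exactly only up to 2^53, and beyond that a small addend can be lost entirely. A minimal sketch (the function name incrementIsLost is hypothetical):

```cpp
// Hypothetical helper: returns true when adding 1.0 does not change sum,
// i.e. when the accumulator has grown past the range where double can
// represent consecutive integers (assumes IEEE 754 double).
bool incrementIsLost(double sum)
{
    // volatile forces the result to be rounded to double before comparing.
    volatile double next = sum + 1.0;
    return next == sum;
}
```

For example, incrementIsLost(9007199254740992.0) (that is, 2^53) is expected to return true under that assumption, while the same call with 2^53 - 1 returns false.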

Related

Non consistent results when calculating the same integer with different expressions

The arithmetic result for the same expression leads to different outcomes depending on whether I define an integer in one line or use several steps:
int main() {
unsigned long long veryBigIntOneLine = ((255*256+255)*256+255)*256+255;
unsigned long long veryBigInt = 255;
veryBigInt *= 256;
veryBigInt += 255;
veryBigInt *= 256;
veryBigInt += 255;
veryBigInt *= 256;
veryBigInt += 255;
unsigned long long veryBigIntParanthesis = (((((255*256)+255)*256)+255)*256)+255;
unsigned long long fourthInt = 256;
fourthInt *= 256;
fourthInt *= 256;
fourthInt *= 256;
--fourthInt;
cout << "veryBigIntOneLine: " << veryBigIntOneLine << endl;
cout << "veryBigInt: " << veryBigInt << endl;
cout << "veryBigIntParanthesis: " << veryBigIntParanthesis << endl;
cout << "fourthInt: " << fourthInt << endl;
return 0;
}
They should all describe the same number, 256^4-1 (or 2^32-1), but the outcome is different:
veryBigIntOneLine: 18446744073709551615
veryBigInt: 4294967295
veryBigIntParanthesis: 18446744073709551615
fourthInt: 4294967295
4294967295 is the expected answer (as it is given for all four expressions by the Google calculator).
Also 18446744073709551615 is probably not an exact result of what is computed as I get an overflow warning at compilation time for both one line expressions (even when I tried with type __int128). It is actually 2^64-1, which is the max value for unsigned long long with my compiler (veryBigIntOneLine+1 gives 0).
The initialization expression ((255*256+255)*256+255)*256+255 suffers from signed integer overflow, which is undefined behavior, as well as from an implicit conversion of signed int to unsigned. The step-by-step calculations avoid those problems because the right-hand operand is implicitly converted to unsigned long long.
Simply using appropriate literals will fix those issues:
unsigned long long veryBigIntOneLine{ ((255ull*256ull+255ull)*256ull+255ull)*256ull+255ull}; // 4294967295
This is because you are not using unsigned long long literals. If you want literals to match your definitions you need to use:
255ULL + 256ULL * 255ULL + ...
The ULL suffix is very important if you create numbers that are 64 bits wide. In C, without the suffix a number may be 64, 32 or even just 16 bits (even bytes on some CRAY systems were 64 bits, which also means your code would have worked just fine on one of those systems).
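One way to keep the whole expression in unsigned long long, sketched here under the assumption that the goal is to pack four byte values into one number, is to route the arithmetic through a function whose parameters already have the wide type. The name packBytes is hypothetical.

```cpp
// Hypothetical helper: combines four byte values into one number.
// Because every parameter is unsigned long long, no intermediate result
// is ever computed in (signed) int, so no signed overflow can occur.
constexpr unsigned long long packBytes(unsigned long long b3,
                                       unsigned long long b2,
                                       unsigned long long b1,
                                       unsigned long long b0)
{
    return ((b3 * 256ull + b2) * 256ull + b1) * 256ull + b0;
}

// Checked at compile time: the value the question expected.
static_assert(packBytes(255, 255, 255, 255) == 4294967295ull,
              "256^4 - 1 fits without overflow");
```

The int literals 255 in the call are converted to unsigned long long when they are passed, before any multiplication happens.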

Output a 15 digit number

I have a program that is supposed to find the sum of all the numbers between 1 and 75 in the Fibonacci sequence that are divisible by three. I got the program working properly; the only problem I am having is displaying such a large number. I am told that the answer should be 15 digits. I have tried long long, long double, unsigned long long int and none of those produce the right output (they produce a negative number).
code:
long fibNum(int kth, int nth);
int main()
{
int kTerm;
int nTerm;
kTerm = 76;
nTerm = 3;
std::cout << fibNum(kTerm, nTerm) << std::endl;
system("pause");
return 0;
}
long fibNum(int kth, int nth)
{
int term[100];
long firstTerm;
long secondTerm;
long exactValue;
int i;
term[1] = 1;
term[2] = 1;
exactValue = 0;
do
{
firstTerm = term[nth - 1];
secondTerm = term[nth - 2];
term[nth] = (firstTerm + secondTerm);
nth++;
}
while(nth < kth);
for(i = 1; i < kth; i++)
{
if(term[i] % 3 == 0)
{
term[i] = term[i];
}
else
term[i] = 0;
exactValue = term[i] + exactValue;
}
return exactValue;
}
I found out that the problem has to do with the array. The array cannot store the 47th term which is 10 digits. Now I have no idea what to do
Type long long is guaranteed to be at least 64 bits (and is exactly 64 bits on every implementation I've seen). Its maximum value, LLONG_MAX, is at least 2^63-1, or 9223372036854775807, which is 19 decimal digits -- so long long is more than big enough to represent 15-digit numbers.
Just use type long long consistently. In your code, you have one variable of type long double, which has far more range than long long but may have less precision (which could make it impossible to determine whether a given number is a multiple of 3.)
You could also use unsigned long long, whose upper bound is at least 2^64-1, but either long long or unsigned long long should be more than wide enough for your purposes.
Displaying a long long value in C++ is straightforward:
long long num = some_value;
std::cout << "num = " << num << "\n";
Or if you prefer printf for some reason, use the "%lld" format for long long, "%llu" for unsigned long long.
(For integers too wide to fit in 64 bits, there are software packages that handle arbitrarily large integers; the most prominent is GNU's GMP. But you don't need it for 15-digit integers.)
What you can do is take char s[15] and int i = 14, k, and then run a while loop for as long as n != 0 (where n holds the sum to print).
In the body of the while loop:
k=n%10;
s[i]=k+48;
n=n/10;
i--;
The array cannot store the 47th term which is 10 digits.
This indicates that your architecture has a type long with just 32 bits. That is common on 32-bit architectures. 32 bits cover 9-digit numbers and low 10-digit numbers: to be precise, up to 2,147,483,647 for long and 4,294,967,295 for unsigned long.
Just change your long types to long long or unsigned long long, including the return type of fibNum. That would easily cover 18 digits.
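As an illustration of using long long consistently, here is a minimal sketch that avoids the array altogether and keeps only the last two terms. The function name sumFibDivisibleBy is hypothetical, and the sequence is assumed to start with F(1) = F(2) = 1, as in the question's code.

```cpp
// Hypothetical helper: sums the Fibonacci terms F(1)..F(lastTerm) that are
// divisible by divisor, with F(1) = F(2) = 1.
// Every intermediate value is a long long, so the 47th term (10 digits)
// and the later terms fit. lastTerm should stay below roughly 90 so that
// the next term computed inside the loop still fits in 63 bits.
long long sumFibDivisibleBy(int lastTerm, long long divisor)
{
    long long prev = 0; // F(0)
    long long curr = 1; // F(1)
    long long sum = 0;

    for (int i = 1; i <= lastTerm; ++i) {
        if (curr % divisor == 0)
            sum += curr;
        const long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return sum;
}
```

Calling std::cout << sumFibDivisibleBy(75, 3) << '\n'; would then print the sum for the first 75 terms directly, with no special formatting needed.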

Find largest unsigned int .... Why doesn't this work?

Couldn't you initialize an unsigned int and then increment it until it doesn't increment anymore? That's what I tried to do and I got a runtime error "Timeout." Any idea why this doesn't work? Any idea how to do it correctly?
#include <iostream>
int main() {
unsigned int i(0), j(1);
while (i != j) {
++i;
++j;
}
std::cout << i;
return 0;
}
Unsigned arithmetic is defined as modulo 2^n in C++ (where n is
the number of bits). So when you increment the maximum value,
you get 0.
Because of this, the simplest way to get the maximum value is to
use -1:
unsigned int i = -1;
std::cout << i;
(If the compiler gives you a warning, and this bothers you, you
can use 0U - 1, or initialize with 0, and then decrement.)
Since i will never be equal to j, you have an infinite loop.
Additionally, this is a very inefficient method for determining the maximum value of an unsigned int. numeric_limits gives you the result without looping for 2^(16, 32, 64, or however many bits are in your unsigned int) iterations. If you didn't want to do that, you could write a much smaller loop:
unsigned int shifts = sizeof(unsigned int) * 8; // or CHAR_BIT from <climits> instead of 8
unsigned int maximum_value = 1;
for (unsigned int i = 1; i < shifts; ++i)
{
maximum_value <<= 1;
++maximum_value;
}
Or simply do
unsigned int maximum = (unsigned int)-1;
i will always be different than j, so you have entered an endless loop. If you want to take this approach, your code should look like this:
unsigned int i(0), j(1);
while (i < j) {
++i;
++j;
}
std::cout << i;
return 0;
Notice I changed it to while (i<j). Once j overflows i will be greater than j.
When an overflow happens, the value doesn't just stay at the highest, it wraps back around to the lowest possible number.
i and j will never be equal to each other. When an unsigned integral value reaches its maximum, adding 1 to it yields the minimum value, which is 0.
For example if to consider unsigned char then its maximum is 255. After adding 1 you will get 0.
So your loop is infinite.
I assume you're trying to find the maximum value that an unsigned integer can store (65,535 in decimal if unsigned int is 16 bits wide; 4,294,967,295 if it is 32 bits wide). The reason that the program times out is that when the int hits the maximum value it can store, it "goes off the end" and wraps to 0. With 16 bits, when j wraps around to 0, i will be 65,535.
This means that the way you're going about it, i would NEVER equal j, and the loop would run indefinitely. If you changed it to what Damien has, you'd have i == 65,535; j equal to 0.
Couldn't you initialize an unsigned int and then increment it until it doesn't increment anymore?
No. Unsigned arithmetic is modular, so it wraps around to zero after the maximum value. You can carry on incrementing it forever, as your loop does.
Any idea how to do it correctly?
unsigned int max = -1; // or
unsigned int max = std::numeric_limits<unsigned int>::max();
or, if you want to use a loop to calculate it, change your condition to (j != 0) or (i < j) to stop when j wraps. Since i is one behind j, that will contain the maximum value. Note that this might not work for signed types - they give undefined behaviour when they overflow.
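The loop variant with the (j != 0) condition can be sketched as a template so it can be tried on a narrow type, where it finishes instantly. The name maxByWrapping is hypothetical.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper: finds the maximum of an unsigned type by counting
// until j (which runs one step ahead of i) wraps around to 0.
template <typename T>
T maxByWrapping()
{
    static_assert(std::is_unsigned<T>::value,
                  "only unsigned types wrap; signed overflow is undefined");
    T i = 0;
    T j = 1;
    while (j != 0) { // j becomes 0 right after holding the maximum
        ++i;
        ++j;
    }
    return i; // i is one behind j, so it holds the maximum
}
```

For instance, maxByWrapping<unsigned char>() should agree with std::numeric_limits<unsigned char>::max(). For unsigned int the loop still terminates, but only after about 2^32 iterations, which is why numeric_limits or -1 is the practical choice.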

Binary-Decimal Negative bit set

How can I tell if a binary number is negative?
Currently I have the code below. It works fine converting to Binary. When converting to decimal, I need to know if the left most bit is 1 to tell if it is negative or not but I cannot seem to figure out how to do that.
Also, instead of making my Bin2 function print 1's and 0's, how can I make it return an integer? I didn't want to store it in a string and then convert to int.
EDIT: I'm using 8 bit numbers.
int Bin2(int value, int Padding = 8)
{
for (int I = Padding; I > 0; --I)
{
if (value & (1 << (I - 1)))
std::cout<< '1';
else
std::cout<<'0';
}
return 0;
}
int Dec2(int Value)
{
//bool Negative = (Value & 10000000);
int Dec = 0;
for (int I = 0; Value > 0; ++I)
{
if(Value % 10 == 1)
{
Dec += (1 << I);
}
Value /= 10;
}
//if (Negative) (Dec -= (1 << 8));
return Dec;
}
int main()
{
Bin2(25);
std::cout<<"\n\n";
std::cout<<Dec2(11001);
}
You are checking for negative value incorrectly. Do the following instead:
bool Negative = (value & 0x80000000); //It will work for 32-bit platforms only
Or maybe just compare it with 0:
bool Negative = (value < 0);
Why don't you just compare it to 0? That should work fine, and you almost certainly can't do this more efficiently than the compiler does.
I am entirely unclear whether this is what the OP is looking for, but it's worth a shot:
If you know you have a value in a signed int that is supposed to be representing a signed 8-bit value, you can pull it apart, store it in a signed 8-bit value, then promote it back to a native int signed value like this:
#include <stdio.h>
int main(void)
{
// signed integer, value is 245. 8bit signed value is (-11)
int num = 0xF5;
// pull out the low 8 bits, storing them in a signed char.
signed char ch = (signed char)(num & 0xFF);
// now let the signed char promote to a signed int.
int res = ch;
// finally print both.
printf("%d ==> %d\n",num, res);
// do it again for an 8 bit positive value
// this time with just direct casts.
num = 0x70;
printf("%d ==> %d\n", num, (int)((signed char)(num & 0xFF)));
return 0;
}
Output
245 ==> -11
112 ==> 112
Is that what you're trying to do? In short, the code above will take the 8bits sitting at the bottom of num, treat them as a signed 8-bit value, then promote them to a signed native int. The result is you can now "know" not only whether the 8-bits were a negative number (since res will be negative if they were), you also get the 8-bit signed number as a native int in the process.
On the other hand, if all you care about is whether the 8th bit is set in the input int, and is supposed to denote a negative value state, then why not just :
int IsEightBitNegative(int val)
{
return (val & 0x80) != 0;
}
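Both parts of the question can also be sketched with plain arithmetic, without going through signed char. The names signExtend8 and binaryDigits are hypothetical; binaryDigits returns the bits as a decimal-looking integer (so 25 becomes 11001), which is one possible reading of "return an integer".

```cpp
// Hypothetical helper: interprets the low 8 bits of value as an 8-bit
// two's complement number. Bit 7 carries weight -128, bits 0..6 their
// usual positive weights.
int signExtend8(int value)
{
    const int low = value & 0xFF;
    return (low & 0x7F) - (low & 0x80);
}

// Hypothetical variant of Bin2: instead of printing, builds an int whose
// decimal digits are the bits of value (at most 8 bits, so it fits in int).
int binaryDigits(int value, int bits = 8)
{
    // Converting to unsigned is well defined for negative values too.
    const unsigned low = static_cast<unsigned>(value) & 0xFFu;
    int result = 0;
    for (int i = bits - 1; i >= 0; --i)
        result = result * 10 + static_cast<int>((low >> i) & 1u);
    return result;
}
```

With these, signExtend8(0xF5) gives -11 and signExtend8(0x70) gives 112, matching the output shown above.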

computing permutation of specific bits in a number

As part of my master's thesis, I get a number (e.g. 5 bits) with 2 significant bits (2nd and 4th). This means, for example, x1x0x, where $x \in \{0,1\}$ (x could be 0 or 1) and 1, 0 are bits with fixed values.
My first task is to compute all the combinations of the number given above, $2^3 = 8$ of them. This is called the S_1 group.
Then I need to compute the 'S_2' group, which is all the combinations of the two numbers x0x0x and x1x1x (this means one mismatch in the significant bits). This should give us $\binom{2}{1} \cdot 2^3 = 2 \cdot 2^3 = 16$ numbers.
EDIT
Each number, x1x1x and x0x0x, is different from the Original number, x1x0x, at one significant bit.
Last group, S_3, is of course two mismatches from the significant bits, this means, all the numbers which pass the form x0x1x, 8 possibilities.
The computation could be computed recursively or independently, that is not a problem.
I would be happy if someone could give a starting point for these computations, since what I have is not so efficient.
EDIT
Maybe I chose my words wrongly when using "significant bits". What I meant to say is that at specific places in a five-bit number the bits are fixed. Those places are what I called specific bits.
EDIT
I have already seen 2 answers and it seems I should have been clearer. What I am more interested in is finding the numbers x0x0x, x1x1x and x0x1x, bearing in mind that this is a simple example. In reality, the group S_1 (in this example x1x0x) would be built from numbers at least 12 bits long and could contain 11 significant bits. Then I would have 12 groups...
If something is still not clear please ask ;)
#include <vector>
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;
int main()
{
string format = "x1x0x";
unsigned int sigBits = 0;
unsigned int sigMask = 0;
unsigned int numSigBits = 0;
for (unsigned int i = 0; i < format.length(); ++i)
{
sigBits <<= 1;
sigMask <<= 1;
if (format[i] != 'x')
{
sigBits |= (format[i] - '0');
sigMask |= 1;
++numSigBits;
}
}
unsigned int numBits = format.length();
unsigned int maxNum = (1 << numBits);
vector<vector<unsigned int> > S;
for (unsigned int i = 0; i <= numSigBits; i++)
S.push_back(vector<unsigned int>());
for (unsigned int i = 0; i < maxNum; ++i)
{
unsigned int changedBits = (i & sigMask) ^ sigBits;
unsigned int distance = 0;
for (unsigned int j = 0; j < numBits; j++)
{
if (changedBits & 0x01)
++distance;
changedBits >>= 1;
}
S[distance].push_back(i);
}
for (unsigned int i = 0; i <= numSigBits; ++i)
{
cout << dec << "Set with distance " << i << endl;
vector<unsigned int>::iterator iter = S[i].begin();
while (iter != S[i].end())
{
cout << hex << showbase << *iter << endl;
++iter;
}
cout << endl;
}
return 0;
}
sigMask has a 1 where all your specific bits are. sigBits has a 1 wherever your specific bits are 1. changedBits has a 1 wherever the current value of i is different from sigBits. distance counts the number of bits that have changed. This is about as efficient as you can get without precomputing a lookup table for the distance calculation.
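An alternative that generates each group directly, instead of scanning all 2^numBits values and counting, is to enumerate the subsets of the fixed positions (which bits to flip) and, for each, the subsets of the free positions. The sketch below uses the well-known (s - 1) & mask subset-enumeration trick; the function name groupByMismatches is hypothetical, and it assumes numBits is below 32 and that sigBits only has bits set inside sigMask.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical helper: groups[d] receives every numBits-wide value whose
// fixed positions (sigMask) differ from sigBits in exactly d places.
// Assumes numBits < 32 and (sigBits & ~sigMask) == 0.
std::vector<std::vector<unsigned>> groupByMismatches(unsigned sigMask,
                                                     unsigned sigBits,
                                                     unsigned numBits)
{
    const unsigned all = (1u << numBits) - 1u;
    const unsigned freeMask = all & ~sigMask;
    const std::size_t numSig = std::bitset<32>(sigMask).count();
    std::vector<std::vector<unsigned>> groups(numSig + 1);

    // Outer loop: every subset of the fixed positions, i.e. which of the
    // fixed bits are flipped relative to the pattern.
    unsigned flip = sigMask;
    while (true) {
        const std::size_t distance = std::bitset<32>(flip).count();

        // Inner loop: every assignment of the free ('x') positions.
        unsigned freeBits = freeMask;
        while (true) {
            groups[distance].push_back((sigBits ^ flip) | freeBits);
            if (freeBits == 0)
                break;
            freeBits = (freeBits - 1) & freeMask;
        }

        if (flip == 0)
            break;
        flip = (flip - 1) & sigMask;
    }
    return groups;
}
```

For the x1x0x example, groupByMismatches(0x0A, 0x08, 5) yields groups of 8, 16 and 8 values; index 0 corresponds to S_1 in the question, index 1 to S_2 and index 2 to S_3.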
Of course, it doesn't actually matter what the fixed-bit values are, only that they're fixed. xyxyx, where y is fixed and x isn't, will always yield 8 potentials. The potential combinations of the two groups where y varies between them will always be a simple multiplication- that is, for each state that the first may be in, the second may be in each state.
Use bit logic.
//x1x1x
if ((test_byte & 0b01010) == 0b01010) //--> implies that the positions where the mask has 1s hold 1s (binary literals need C++14)
There's probably a number-theoretic solution, but this is very simple.
This needs to be done with a fixed-width integer type. Some dynamic languages (Python, for example) will extend bits out if they think it's a good idea.
This is not hard, but it is time consuming, and TDD would be particularly appropriate here.