Conversion from unsigned long long to unsigned int in C++

If I write the following code in C/C++:
using namespace std::chrono;
auto time = high_resolution_clock::now();
unsigned long long foo = time.time_since_epoch().count();
unsigned bar = foo;
Which bits of foo are dropped in the conversion to unsigned int? Is there any way I can enforce only the least significant bits to be preserved?
Alternatively, is there a simple way to hash foo into an unsigned int size? (This would be preferred, but I'm doing all of this in an initializer list.)
EDIT: Just realized that preserving the least significant bits could still allow looping. I guess hashing is what I'd be going for then.
2nd EDIT: To clarify what I responded in a comment, I am seeding std::default_random_engine inside a loop and do not want an overflow to cause seed values to repeat. I am looking for a simple way to hash unsigned long long into unsigned int.

Arithmetic with unsigned integral types in C and in C++ deals with out of range values by modular reduction; e.g. if unsigned int is a 32 bit type, then when assigned a value out of range, it reduces the value modulo 2^32 and stores the least nonnegative representative.
In particular, that is exactly what the standard mandates when assigning from a larger to a smaller unsigned integral type.
Note also that the standard doesn't guarantee the size of unsigned int — if you need, for example, a type that is definitely exactly 32 bits wide, use uint32_t.
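A minimal sketch of the rule above, assuming a hypothetical helper name narrow_to_u32 (it is not from the question). It shows that the conversion keeps exactly the 32 least significant bits, which is the same as reducing modulo 2^32:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a 64-bit unsigned value to a 32-bit one.
// The standard defines this conversion as reduction modulo 2^32,
// so only the 32 least significant bits survive.
std::uint32_t narrow_to_u32(std::uint64_t value)
{
    std::uint32_t low = static_cast<std::uint32_t>(value);
    // The explicit mask yields the same result as the conversion.
    assert(low == (value & 0xFFFFFFFFu));
    return low;
}
```

For example, narrow_to_u32(0x100000005ULL) gives 5, because the bit above position 31 is dropped.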

Good news: you're safe.
I just ran time.time_since_epoch().count() and got:
1,465,934,400
You've got a while until you see a repeated value, since numeric_limits<unsigned int>::max() is:
4,294,967,295
Copying to an integral type of a smaller size drops the excess higher-order bits.
So if you just do static_cast<unsigned int>(foo), you won't get a repeated value for roughly 136 years: numeric_limits<unsigned int>::max() / 60U / 60U / 24U / 365U
This assumes count() advances once per second, as the value above suggests; a clock that counts nanoseconds wraps a 32-bit value after about 4.3 seconds.
PS You won't care if you get a repeat by then.

Replying to the question in edit two, you can seed based on the delta between when you started and when the seed is needed. E.g.:
std::default_random_engine generator;
typedef std::chrono::high_resolution_clock myclock;
myclock::time_point beginning = myclock::now();
for (int i = 0; i < 128; i++) // or whatever type of loop you're using
{
// obtain a seed from the timer
myclock::duration d = myclock::now() - beginning;
unsigned seed = d.count();
generator.seed(seed);
// TODO: magic
}

Okay, I've settled on using std::hash.
std::hash<long long> seeder;
auto seed = seeder(time.time_since_epoch().count());
Since I am doing this in an initializer list, this can be put together in one line and passed to std::default_random_engine (but that looks pretty ugly).
This is quite a hit to performance, but at least it reduces the chance of seeds repeating.
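One thing to keep in mind: std::hash returns a std::size_t, which on many platforms is wider than unsigned int, so the hashed value may still be truncated when it is narrowed to an unsigned seed. As a rough alternative, here is a sketch of a plain XOR fold, assuming a hypothetical helper name fold_to_u32. It is not a strong hash; it only makes sure the upper 32 bits also influence the result:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fold a 64-bit value into 32 bits by XOR-ing the
// upper half into the lower half, so every input bit affects the result.
// This is a simple fold, not a cryptographic or high-quality hash.
std::uint32_t fold_to_u32(std::uint64_t value)
{
    std::uint32_t high = static_cast<std::uint32_t>(value >> 32);
    std::uint32_t low = static_cast<std::uint32_t>(value);
    // Sanity check: the two halves recompose the original value.
    assert(((static_cast<std::uint64_t>(high) << 32) | low) == value);
    return high ^ low;
}
```

The result could then be passed to generator.seed(...) in place of the truncated count.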

Related

What data type is used to store intermediate calculations while executing a program in C++?

I was trying to do the following calculation, but found that it does not yield the correct result.
When my computer evaluates a*b, what data type is used to hold the intermediate result before the modulus is taken? How is that data type decided?
Please also let me know the source of the information.
#include <iostream>
using namespace std;
int main()
{
long long int a=1000000000000000000; // 18 zeroes
long long int b=1000000000000000000;
long long int c=1000000007;
long long int d=(a*b)%c;
cout<<a<<"\n"<<b<<"\n"<<c<<"\n"<<d;
}
Edit1: This code also gives incorrect output
#include <iostream>
using namespace std;
int main()
{
int a=1000000000; // 9 zeroes
int b=1000000000;
long long int c=1000000007;
long long int d=a*b%c;
cout<<a<<"\n"<<b<<"\n"<<c<<"\n"<<d;
}
How is the data type in which it stores the result decided?
The rules are fairly complicated and convoluted in general, but in this particular case it's simple: a*b is of type long long, and since a*b overflows, the program has undefined behavior. (In the second snippet, a*b is of type int and overflows in the same way.)
You can use the equivalent formula to compute the correct result (without overflowing):
(a * b) % c == ((a % c) * (b % c)) % c
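A minimal sketch of that formula, assuming a hypothetical helper name mul_mod. It relies on non-negative operands and on (c - 1) * (c - 1) fitting in long long, which holds for c = 1000000007 because c * c is only about 1e18:

```cpp
#include <cassert>

// Hypothetical helper: (a * b) % c without overflowing the intermediate product.
// Assumes a >= 0, b >= 0, c > 0, and that (c - 1) * (c - 1) fits in long long.
long long mul_mod(long long a, long long b, long long c)
{
    assert(a >= 0 && b >= 0 && c > 0);
    long long ra = a % c; // 0 <= ra < c
    long long rb = b % c; // 0 <= rb < c
    return (ra * rb) % c; // product is below c * c, so it fits
}
```

With the values from the question, both operands reduce to 49 first, so the product that actually gets computed is tiny. For a modulus close to the long long limit this is not enough, and a wider type or a big-number library would be needed.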
Could you also suggest on how to decide for mixed data types and post
about your source of information
Of some interest: https://en.cppreference.com/w/cpp/language/implicit_conversion The standard rules are unfortunately even more complicated.
As some suggestions:
never mix unsigned and signed.
pay attention: types smaller than int will be promoted to int or unsigned int.
for a type T equal to or larger than int, T op T will have type T. This is what you should be aiming for in your expressions (i.e. have both operands of the same type: int, long or long long).
avoid unsigned types. Unfortunately that's impossible with the current Standard Library design (std::size_t sigh)
avoid long as its width differs between current major compilers and platforms
if you care about the width of the integer data type, avoid int, long, long long and such, and always use fixed-width integer types (std::int32_t, std::int64_t, etc.). Completely ignore that technically those types are optional.
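To make a few of these suggestions concrete, here is a small sketch. The static_asserts check the result types the compiler picks, and widened_product is a hypothetical helper showing how to get the second snippet from the question to multiply in long long. It assumes long long is at least twice as wide as int, which is the case on mainstream platforms:

```cpp
#include <cassert>
#include <type_traits>

// Operands narrower than int are promoted to int before arithmetic.
static_assert(std::is_same<decltype(short(1) + short(1)), int>::value,
              "short + short has type int");
// Mixed int and long long: the int operand is converted to long long.
static_assert(std::is_same<decltype(1 + 1LL), long long>::value,
              "int + long long has type long long");
// Mixed signed and unsigned int: the signed operand is converted to unsigned.
static_assert(std::is_same<decltype(1 + 1u), unsigned int>::value,
              "int + unsigned has type unsigned int");

// Hypothetical helper: widen one operand *before* multiplying, so the
// product is computed in long long rather than in int.
long long widened_product(int a, int b)
{
    long long result = static_cast<long long>(a) * b;
    // Dividing back recovers b whenever a is nonzero, i.e. nothing was lost.
    assert(a == 0 || result / a == b);
    return result;
}
```

Writing static_cast<long long>(a) * b % c in the question's second program would follow the same idea.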
My understanding is that long long has to be at least 64 bits wide, but each 1000000000000000000 is a 60-bit number, so a*b would need about 120 bits, which is more than long long can hold. Perhaps you were thinking that the 1000000000000000000 was binary?

How to get negative remainder with remainder operator on size_t?

Consider the following code sample:
#include <iostream>
#include <string>
int main()
{
std::string str("someString"); // length 10
int num = -11;
std::cout << num % str.length() << std::endl;
}
Running this code on http://cpp.sh, I get 5 as a result, while I was expecting it to be -1.
I know that this happens because the type of str.length() is size_t which is an implementation dependent unsigned, and because of the implicit type conversions that happen with binary operators that cause num to be converted from a signed int to an unsigned size_t (more here);
this causes the negative value to become a positive one and messes up the result of the operation.
One could think of addressing the problem with an explicit cast to int:
num % (int)str.length()
This might work but it's not guaranteed, for instance in the case of a string with length larger than the maximum value of int. One could reduce the risk using a larger type, like long long, but what if size_t is unsigned long long? Same problem.
How would you address this problem in a portable and robust way?
Since C++11, you can just cast the result of length to std::string::difference_type.
To address "But what if the size is too big?":
That won't happen on 64 bit platforms and even if you are on a smaller one: When was the last time you actually had a string that took up more than half of total RAM? Unless you are doing really specific stuff (which you would know), using the difference_type is just fine; quit fighting ghosts.
Alternatively, just use int64_t, that's certainly big enough. (Though maybe looping over one on some 32 bit processors is slower than int32_t, I don't know. Won't matter for that single modulus operation though.)
(Fun fact: Even some prominent committee members consider littering the standard library with unsigned types a mistake, for reference see
this panel at 9:50, 42:40, 1:02:50 )
Before C++11, the sign of the result of % with negative operands was implementation-defined; for well-defined behavior, use std::div plus one of the casts described above.
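The cast suggested in this answer can be sketched as follows, assuming a hypothetical helper name rem_by_length and a non-empty string whose length fits in difference_type:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: remainder of num by the length of str, computed in a
// signed type so that a negative num keeps its sign (C++11 and later).
// Assumes str is non-empty and its length fits in difference_type.
std::string::difference_type rem_by_length(int num, const std::string& str)
{
    assert(!str.empty());
    auto length = static_cast<std::string::difference_type>(str.length());
    return num % length; // both operands are signed now
}
```

With num = -11 and the string "someString" from the question, this gives the -1 that was expected.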
We know that
-a % b == -(a % b)
So you could write something like this:
#include <cstdlib> // std::llabs
template<typename T, typename T2>
constexpr T safeModulo(T a, T2 b)
{
// Take the remainder of |a|, then restore the sign of a.
return (a >= 0 ? 1 : -1) * static_cast<T>(std::llabs(a) % b);
}
This won't overflow in practically all cases. Consider this call:
safeModulo(num, str.length());
If std::size_t is implemented as an unsigned long long, then T2 -> unsigned long long and T -> int.
As pointed out in the comments, using std::llabs instead of std::abs is important, because if a is the smallest possible value of int, removing the sign will overflow. Promoting a to a long long just before won't result in this problem, as long long has a larger range of values.
Now static_cast<int>(std::llabs(a) % b) always yields a value whose magnitude is no larger than that of a, so casting it to int never overflows/underflows. Even if std::llabs(a) gets converted to unsigned long long, it doesn't matter, because it is already non-negative, and so the value is unchanged (i.e. it didn't overflow/underflow).
Because of the property stated above, if a is negative, multiply the result with -1 and you get the correct result.
The only case where it results in undefined behavior is when a is std::numeric_limits<long long>::min(), as removing the sign overflows a, resulting in undefined behavior. There is probably another way to implement the function, I'll think about it.
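One possible way around that last corner case, as a sketch only: compute the magnitude in unsigned arithmetic, where wrap-around is well defined. The name signed_rem is hypothetical, and the sketch assumes b is nonzero and std::size_t is at most 64 bits wide:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alternative to safeModulo: the magnitude of a is computed in
// unsigned arithmetic, which is well defined even for the smallest value.
// Assumes b != 0 and that std::size_t is at most 64 bits wide.
std::int64_t signed_rem(std::int64_t a, std::size_t b)
{
    assert(b != 0);
    std::uint64_t ua = static_cast<std::uint64_t>(a);
    // For negative a, 0 - ua wraps modulo 2^64 and yields |a|.
    std::uint64_t magnitude = (a < 0) ? (std::uint64_t{0} - ua) : ua;
    std::uint64_t r = magnitude % b; // 0 <= r <= |a|
    if (a >= 0 || r == 0) {
        return static_cast<std::int64_t>(r);
    }
    // Negate without overflow: r - 1 always fits in int64_t here.
    return -static_cast<std::int64_t>(r - 1) - 1;
}
```

The result has the sign of a, like the built-in % on signed operands, and the function stays defined for the most negative 64-bit value as well.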

long and int not enough and double wouldn't work

I am using C++, and I've heard and experienced that the maximum values that can be stored in an int and a long are the same.
But my problem is that I need to store a number that exceeds the maximum value of a long variable. A double variable is large enough.
But using a double prevents me from using the % operator, which I need in order to write my function more easily, and sometimes there seems to be no other way than using it.
So could you kindly tell me a way to achieve this?
It depends on the purpose. For a better answer, give us more context
Have a look at (unsigned) long long or GMP
You can use type long long int or unsigned long long int
To find the maximum value that an integral type can hold, you can use the following construction, for example
std::numeric_limits<long long>::max();
To use it you have to include header <limits>
So, you want to compute the modulo of large integers. It's 99% likely you're doing encryption, which is hard stuff. Your question kind of implies that maybe you should look for some off-the-shelf solution for your top-level problem (the encryption).
Anyway, the standard answer is otherwise to use a library for large-precision integers, such as GNU MP.
#include <cmath>
int main ()
{
double max_uint = 4294967295.0;
double max1 = max_uint + 2.0;
double max2 = (max1 + 1.0) * (max_uint + 1.0);
double f = fmod(max2,max1);
return 0;
}
max1 and max2 are both over unsigned int limit, and fmod returns correct max2 % max1 result, which is also over unsigned int limit: f == max_uint + 1.0.
Edit:
good hint from anatolyg: this method works only for integers up to 2^53. This is because the significand of a double has 53 bits of precision (52 stored plus one implicit), and larger integers are representable only with precision loss. E.g. 2^80 could be == (2^80)+1 and == (2^80)+2 and so on. The larger the integers, the greater the imprecision, because the gaps between representable integers widen there.
But if you just need 20 extra bits compared to a 32-bit int, and have no other way to achieve this with a built-in integral type (with which the regular % will be faster, I think), then you can use this...
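To see where the precision limit bites, here is a sketch with two hypothetical helpers, is_exact_in_double and fmod_checked. It assumes double is an IEEE 754 binary64 type, which carries 53 significant bits:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: true if n survives a round trip through double.
// Assumes double is an IEEE 754 binary64 type with a 53-bit significand.
bool is_exact_in_double(std::uint64_t n)
{
    double d = static_cast<double>(n);
    // Values at or above 2^64 cannot be converted back safely.
    if (d >= 18446744073709551616.0) {
        return false;
    }
    return static_cast<std::uint64_t>(d) == n;
}

// Hypothetical helper: n % m via std::fmod, valid only when both are exact.
double fmod_checked(std::uint64_t n, std::uint64_t m)
{
    assert(m != 0 && is_exact_in_double(n) && is_exact_in_double(m));
    return std::fmod(static_cast<double>(n), static_cast<double>(m));
}
```

2^53 passes the check while 2^53 + 1 does not, so from that point on fmod may be working on a value that is not the one you meant.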
First, there is a difference between the int and long types,
but to fix your problem you can use
unsigned long long int
Here is a list of sizes you would typically see in C++ (they are implementation-defined; long, for example, is 8 bytes on 64-bit Linux and macOS):
char : 1 byte
short : 2 bytes
int : 4 bytes
long : 4 bytes
long long : 8 bytes
float : 4 bytes
double : 8 bytes
I think this clearly explains why you are experiencing difficulties and gives you a hint on how to solve them

Large Number Issues in C++

I'm working on a relatively simple problem based around adding all the primes under a certain value together. I've written a program that should accomplish this task. I am using long type variables. As I get up into higher numbers (~200/300k), the variable I am using to track the sum becomes negative despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done). Is there some issue with the data type, or am I missing something?
My code is below (in C++) [Vector is basically a dynamic array in case people are wondering]:
bool checkPrime(int number, vector<long> & primes, int numberOfPrimes) {
for (int i=0; i<numberOfPrimes-1; i++) {
if(number%primes[i]==0) return false;
}
return true;
}
long solveProblem10(int maxNumber) {
long sumOfPrimes=0;
vector<long> primes;
primes.resize(1);
int numberOfPrimes=0;
for (int i=2; i<maxNumber; i++) {
if(checkPrime(i, primes, numberOfPrimes)) {
sumOfPrimes=sumOfPrimes+i;
primes[numberOfPrimes]=long(i);
numberOfPrimes++;
primes.resize(numberOfPrimes+1);
}
}
return sumOfPrimes;
}
Signed integers are typically represented in two's complement, which means that the highest-order bit represents the sign. When the sum grows large enough, the highest bit gets set (an integer overflow) and the number becomes negative.
You can resolve this by using an unsigned long (apparently 32 bits on your platform, so it may still overflow with the values you're summing) or by using an unsigned long long (which is at least 64 bits).
the variable I am using to track the sum becomes negative despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done)
longs are signed integers. In C++ and other lower-level languages, integer types have a fixed size. When you add past their maximum, they overflow and, in practice, typically wrap around to negative numbers (formally, signed overflow is undefined behavior in C++). This is a consequence of how two's complement works.
check valid integer values: Variables. Data Types.
You're using signed long, which is 32 bits on many platforms, giving a range of roughly -2 billion to 2 billion. You can either use unsigned long, which covers 0 to roughly 4 billion, or use a 64-bit (un)signed long long.
If you need values bigger than 2^64 (unsigned long long), you will need to use bignum math.
long is probably only 32 bits on your system - use uint64_t for the sum - this gives you a guaranteed 64 bit unsigned integer.
#include <cstdint>
uint64_t sumOfPrimes=0;
You can include header <cstdint> and use type std::uintmax_t instead of long.
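Putting the 64-bit suggestion together, here is a sketch of the summation with a std::uint64_t accumulator. Note that it swaps in a sieve of Eratosthenes rather than the trial division from the question, and sum_primes_below is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rewrite of the summation: a sieve of Eratosthenes with a
// 64-bit unsigned accumulator, so the sum has room for limits in the millions.
std::uint64_t sum_primes_below(std::size_t limit)
{
    if (limit < 3) {
        return 0; // there are no primes below 2
    }
    std::vector<bool> is_composite(limit, false);
    std::uint64_t sum = 0;
    for (std::size_t i = 2; i < limit; ++i) {
        if (is_composite[i]) {
            continue;
        }
        assert(sum <= UINT64_MAX - i); // guard against accumulator overflow
        sum += i;
        // Only sieve when i * i is below limit; this also avoids overflow in i * i.
        if (i <= (limit - 1) / i) {
            for (std::size_t j = i * i; j < limit; j += i) {
                is_composite[j] = true;
            }
        }
    }
    return sum;
}
```

For a limit of two million the sum is already far beyond what a 32-bit long can hold, which is why the accumulator type matters more than the prime test here.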

What's an efficient way to avoid integer overflow converting an unsigned int to int in C++?

Is the following an efficient and problem free way to convert an unsigned int to an int in C++:
#include <limits.h>
void safeConvert(unsigned int passed)
{
int variable = static_cast<int>(passed % (INT_MAX+1));
...
}
Or is there a better way?
UPDATE
As pointed out by James McNellis, it is not undefined to assign an unsigned int > INT_MAX to an int; rather, it is implementation-defined. As such, the question is now specifically about my preference: ensuring that this integer resets to zero when the unsigned int exceeds INT_MAX.
Original Context
I have a number of unsigned int's used as counters, but want to pass them around as integers in a specific case.
Under normal operation these counts will remain within the bounds of INT_MAX. However to avoid running into undefined implementation specific behaviour should the abnormal (but valid) case occur I want some efficient conversion here.
This should also work:
int variable = passed & INT_MAX;
Under normal operation these counts will remain within the bounds of INT_MAX. However to avoid running into undefined behaviour should the abnormal (but valid) case occur I want some efficient conversion here.
Efficient conversion to what? If all the shared values for int and unsigned int correspond, and you want other unsigned values such as INT_MAX + 1 to each have distinct values, then you can only map them onto the negative integer values. This is done by default, and can be explicitly requested with static_cast<int>(my_unsigned). Otherwise, you could map them all to 0, or -1, or INT_MIN, or throw away the high bit... easiest way is simply: if (my_unsigned > INT_MAX) my_unsigned = XXX, or ...my_unsigned &= INT_MAX to clear the high bit. But will the called functions work properly if the int overflows? Perhaps a better solution would be to use 64-bit ints to begin with?
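A sketch of two of the options mentioned above, with hypothetical helper names to_int_wrapped and to_int_saturated. One caveat about the snippet in the question: INT_MAX+1 is evaluated in int and overflows, so the sketch does that arithmetic in unsigned int instead. It assumes UINT_MAX equals 2 * INT_MAX + 1, as on mainstream platforms:

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: clear the high bit, so values above INT_MAX wrap
// around to 0, 1, 2, ... Equivalent to passed % (INT_MAX + 1) with the
// addition done in unsigned arithmetic, where it cannot overflow.
// Assumes UINT_MAX == 2 * INT_MAX + 1.
int to_int_wrapped(unsigned int passed)
{
    const unsigned int int_max = static_cast<unsigned int>(INT_MAX);
    unsigned int masked = passed & int_max;
    assert(masked == passed % (int_max + 1u));
    return static_cast<int>(masked);
}

// Hypothetical helper: clamp instead, so large counters stick at INT_MAX.
int to_int_saturated(unsigned int passed)
{
    const unsigned int int_max = static_cast<unsigned int>(INT_MAX);
    return static_cast<int>(passed > int_max ? int_max : passed);
}
```

Which of the two is appropriate depends on what the called functions expect; wrapping matches the "reset to zero" preference from the update, while saturating keeps the counter monotonic.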