C++: Big Integers - c++

I am writing a lexer as part of a compiler project and I need to detect if an integer is larger than what can fit in an int so I can print an error. Is there a C++ standard library for big integers that could fit this purpose?

The Standard C library functions for converting number strings to integers (the strtol family) are supposed to detect numbers which are out of range, and set errno to ERANGE to indicate the problem.
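A minimal sketch of that check, assuming decimal literals only. The helper name parse_int_checked is made up for illustration; strtol, errno and ERANGE are the standard pieces. Since strtol reports range errors for long, not int, the sketch compares against INT_MIN and INT_MAX as well.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: returns true and stores the value in `out` only if
// `text` is a complete decimal integer that fits in an int.
bool parse_int_checked(const char* text, int& out)
{
    errno = 0;                                     // strtol only sets errno, it never clears it
    char* end = nullptr;
    const long v = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') return false; // no digits, or trailing junk
    if (errno == ERANGE) return false;             // does not even fit in a long
    if (v < INT_MIN || v > INT_MAX) return false;  // fits in a long but not in an int
    out = static_cast<int>(v);
    return true;
}
```

A lexer would call this on the token text and report an error when it returns false.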

You could probably use libgmp. However, I think for your purpose, it's just unnecessary.
If you, for example, parse your numbers to a 32-bit unsigned int, you
parse the first at most 9 decimal digits (that's floor(32*log(2)/log(10))). If there are no more digits, the number is OK.
take the next digit. If the number you got / 10 is not equal to the number from the previous step, the number is bad.
if you have more digits (eg. more than 9+1), the number is bad.
else the number is good.
Be sure to skip any leading zeros etc.
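The steps above could be sketched roughly like this. The function name fits_in_uint32 is hypothetical, and the sketch assumes the input has already been checked to contain only the characters '0' to '9'. It relies on unsigned arithmetic wrapping modulo 2^32, which is what makes the "divide by 10 and compare" test work.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper; `digits` is assumed to hold only '0'..'9'.
bool fits_in_uint32(const std::string& digits, std::uint32_t& out)
{
    std::size_t pos = 0;
    while (pos + 1 < digits.size() && digits[pos] == '0') ++pos; // skip leading zeros
    const std::size_t len = digits.size() - pos;
    if (len == 0 || len > 10) return false;       // empty, or more than 9+1 digits

    std::uint32_t value = 0;
    const std::size_t safe = len < 9 ? len : 9;   // the first 9 digits can never overflow
    for (std::size_t i = 0; i < safe; ++i)
        value = value * 10 + static_cast<std::uint32_t>(digits[pos + i] - '0');

    if (len == 10) {
        // Unsigned arithmetic wraps modulo 2^32, so an overflow shows up as
        // next / 10 no longer matching the previous value.
        const std::uint32_t next =
            value * 10 + static_cast<std::uint32_t>(digits[pos + 9] - '0');
        if (next / 10 != value) return false;
        value = next;
    }
    out = value;
    return true;
}
```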

libgmp is a general solution, though maybe a bit heavyweight.
For a lighter-weight lexical analyzer, you could treat it as a string: trim leading zeros, then if it's longer than 10 digits, it's too long; if shorter, it's OK; if exactly 10 digits, string-compare it to the limit: it must be less than 2^31 = 2147483648 (signed) or 2^32 = 4294967296 (unsigned). Keep in mind that -2^31 is a legal value but 2^31 isn't. Also keep in mind the syntax for octal and hexadecimal constants.
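Here is one way the string approach might look for a signed 32-bit int. The name literal_fits_in_int32 and the separate `negative` flag are assumptions made for the sketch; it handles decimal digits only and leaves octal and hexadecimal syntax to the caller.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `digits` is the magnitude only (no sign), decimal only.
bool literal_fits_in_int32(std::string digits, bool negative)
{
    // Trim leading zeros, keeping a single "0" if nothing else is left.
    const std::size_t first = digits.find_first_not_of('0');
    digits = (first == std::string::npos) ? "0" : digits.substr(first);

    // -2^31 is representable, +2^31 is not.
    const std::string limit = negative ? "2147483648" : "2147483647";
    if (digits.size() != limit.size())
        return digits.size() < limit.size();
    // Same length, so lexicographic order matches numeric order.
    return digits <= limit;
}
```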

To everyone suggesting atoi:
My atoi() implementation does not set errno.
My atoi() implementation does not return INT_MIN or INT_MAX on overflow.
We cannot rely on sign reversal. Consider 0x4000...0.
*2 and the negative bit is set.
*4 and the value is zero.
With base-10 numbers our next digit would multiply this by 10.
This is all nuts. Unless your lexer is parsing gigs of numerical data, stop the premature optimization already. It only leads to grief.
This approach may be inefficient, but it's adequate for your needs:
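#include <cstdlib>
#include <sstream>
// Round trip: the text fits in an int only if printing the parsed value reproduces it.
bool fits(const char * p) // e.g. p = "1234567890123"
{
    int i = atoi( p );
    std::ostringstream o;
    o << i;
    return o.str() == p;
}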
Or, leveraging the stack:
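#include <cstdio>
#include <cstdlib>
#include <cstring>
bool fits(const char * p) // e.g. p = "1234567890123"
{
    int i = atoi( p );
    char buffer [ 12 ]; // sign + 10 digits + terminator for a 32-bit int
    snprintf( buffer, sizeof buffer, "%d", i );
    return strcmp(buffer,p) == 0;
}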

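How about this: use strtol, and check for overflow and underflow. (atol reports no error, and its behaviour is undefined when the value is out of range, so strtol is the safer call.)
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <iostream>
#include <string>
int main()
{
    std::string str;
    std::cin >> str;
    errno = 0;
    // strtol saturates at LONG_MIN/LONG_MAX and sets errno to ERANGE.
    long v = std::strtol(str.c_str(), nullptr, 10);
    if (v < INT_MIN || (errno == ERANGE && v == LONG_MIN)) {
        std::cout << "Underflow" << std::endl;
    } else if (v > INT_MAX || (errno == ERANGE && v == LONG_MAX)) {
        std::cout << "Overflow" << std::endl;
    } else {
        std::cout << "In range" << std::endl;
    }
}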

You might want to check out GMP if you want to be able to deal with such numbers.

In your lexer as you parse the integer string you must multiply by 10 before you add each new digit (assuming you're parsing from left to right). If that running total suddenly becomes negative, you've exceeded the precision of the integer.
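One caveat: signed integer overflow is undefined behaviour in C and C++, so the total is not guaranteed to turn negative. A safer variant of the same idea is to test before multiplying. A small sketch (append_digit is a made-up name, and the running total is assumed to be non-negative):

```cpp
#include <climits>

// Hypothetical helper: appends one decimal digit (0-9) to a non-negative
// running total, refusing if the result would not fit in an int.
bool append_digit(int& total, int digit)
{
    // total * 10 + digit <= INT_MAX  <=>  total <= (INT_MAX - digit) / 10
    if (total > (INT_MAX - digit) / 10) return false;
    total = total * 10 + digit;
    return true;
}
```

The lexer would call this once per digit, left to right, and report the literal as too large the first time it returns false.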

If your language (like C) supports compile-time evaluation of expressions, then you might need to think about that, too.
Stuff like this:
#define N 2147483643 // This is 2^31-5, i.e. close to the limit.
int toobig = N + N;
GCC will catch this, saying "warning: integer overflow in expression", but of course no individual literal is overflowing. This might be more than you require; I just thought I'd point it out as something real compilers do in this department.

You can check to see if the number is higher or lower than INT_MAX or INT_MIN respectively. You would need to #include <limits.h>

Related

C++ while loop is not reading -1 from my file

I am writing a program that reads numbers from a .txt file and outputs a respective amount of asterisks (for even integers) and dollar signs (for odd integers). For example, a 3 would output $$$ and 2 would output **. The program works fine, except for when it reads the number -1. Other negative numbers work just fine, except for -1 for some reason..
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
#include <Windows.h>
using namespace std;
int main()
{
int value, even, odd;
ifstream infile;
infile.open("lab6_input.txt");
while (infile >> value)
{
if (value % 2 == 0)
cout << string(abs(value), "*$"[value % 2]) << endl;
else
cout << string(abs(value), "*$"[value % 2]) << endl;
value++;
}
infile.close();
system("pause");
return 0;
}
Here is my output: https://imgur.com/a/favqrLv
The last number in the output is a -1, but it just displays an empty space.
Your code seems a bit odd. Your variables are not initialized, even and odd are not even used. Your if statement is unnecessary because you have in both cases the same code.
To your question:
You should use abs(value) twice.
Try
while(infile >> value){
cout << string(abs(value), "*$"[abs(value) % 2]) << endl;
value++;
}
The problem lies here
"*$"[value % 2]
In C++, the result of the modulo operator applied to a negative number is negative or zero (since C++11 the result takes the sign of the dividend; before that it was implementation-defined). So, when value is negative and odd, value % 2 is -1 and that expression causes undefined behavior by accessing the array (string literal) out of bounds, at index -1.
You could solve the issue taking the absolute value of value or of the result, but consider writing a free function like the following, instead.
constexpr bool is_odd(int x)
{
return x % 2;
}
It will better express the intent and will help the compiler to optimize your code (see e.g. here), because it's like you are asking
Tell me if value is divisible by two (the remainder of its division by 2 is zero) or not.
Which is different from
Give me the remainder of the division of value by 2
You may have noted, in the linked Compiler Explorer page, that the compilers end up using a simple
and edi, 1
instead of performing an actual modulo operation. This is because what you really need is the least significant bit, and you could directly use in your code
value & 1
Note, though, that the Standard (C++17, as I'm writing) doesn't mandate a particular binary representation for type int (C++20 will require two's complement), so the previous expression would be implementation-defined (and wrong, if you happen to find a still-working ones' complement implementation).
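Putting that together, the loop body could be reduced to something like the following sketch. bar_for is a hypothetical helper name; the point is that the parity test yields a bool, so nothing is ever indexed with a negative remainder.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

constexpr bool is_odd(int x)
{
    return x % 2 != 0;   // true for -1 as well, since -1 % 2 == -1
}

// Hypothetical helper: builds the row of '$' (odd) or '*' (even) for one value.
std::string bar_for(int value)
{
    const std::size_t count = static_cast<std::size_t>(std::abs(value));
    return std::string(count, is_odd(value) ? '$' : '*');
}
```

Inside the while loop this would be used as cout << bar_for(value) << endl;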

How long long is represented in memory?

I am not an advanced C++ programmer, but I have been using C++ for a long time now, and I love playing with it. Lately I was thinking about ways to maximize a variable programmatically, so I tried bitwise operators to fill a variable with 1's. Then there's the signed and unsigned issue. My knowledge of memory representation is not very good. However, I ended up writing the following code, which works for both signed and unsigned short, int and long (although int and long are basically the same). Unfortunately, for long long, the program fails.
So, what is going on behind the scenes for long long? How is it represented in memory? Besides, is there a better way to achieve the same thing in C++?
#include <bits/stdc++.h>
using namespace std;
template<typename T>
void Maximize(T &val, bool isSigned)
{
int length = sizeof(T) * 8;
cout << "\nlength = " << length << "\n";
// clearing
for(int i=0; i<length; i++)
{
val &= 0 << i;
}
if(isSigned)
{
length--;
}
val = 1 << 0;
for(int i=1; i<length; i++)
{
val |= 1 << i;
cout << "\ni = " << i << "\nval = " << val << "\n";
}
}
int main()
{
long long i;
Maximize(i, true);
cout << "\n\nsizeof(i) = " << sizeof(i) << " bytes" << "\n";
cout << "i = " << i << "\n";
return 0;
}
The basic issue with your code is in the statements
val &= 0 << i;
and
val |= 1 << i;
in the case that val is longer than an int.
In the first expression, 0 << i is (most likely) always 0, regardless of i (technically, it suffers from the same undefined behaviour described below, but you will not likely encounter the problem.) So there was no need for the loop at all; all of the statements do the same thing, which is to zero out val. Of course, val = 0; would have been a simpler way of writing that.
The issue with 1 << i is that the constant literal 1 is an int (because it is small enough to be represented as an int, and int is the narrowest representation used for integral constants). Since 1 is an int, so is 1 << i. If i is greater than or equal to the number of value bits in an int, that expression has undefined behaviour, so in theory the result could be anything. In practice, however, the result is likely to be the same width as an int, so only the low-order bits will be affected.
It is certainly possible to convert the 1 to type T (although in general, you might need to be cautious about corner cases when T is signed), but it is easier to convert the 1 to an unsigned type at least as wide as T by using the maximum-width unsigned integer type defined in cstdint, uintmax_t:
val |= std::uintmax_t(1) << i;
In real-world code, it is common to see the assumption that the widest integer type is long long:
val |= 1ULL << i;
which will work fine if the program never attempts to instantiate the template with an extended integer type.
Of course, this is not the way to find the largest value for an integer type. The correct solution is to #include <limits> and then use the appropriate specialization of std::numeric_limits<T>::max().
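For comparison, here is a sketch of the shifting approach with the uintmax_t fix applied. The function name max_by_shifting is made up, and instead of the isSigned flag it uses std::numeric_limits<T>::digits, which is the number of value bits (it excludes the sign bit for signed types). In real code std::numeric_limits<T>::max() alone is enough.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: sets every value bit of T, one shift at a time.
template <typename T>
T max_by_shifting()
{
    T val = 0;
    for (int i = 0; i < std::numeric_limits<T>::digits; ++i) {
        // Shift a uintmax_t, not an int, so the shift is wide enough for any T.
        const T bit = static_cast<T>(std::uintmax_t(1) << i);
        val = static_cast<T>(val | bit);
    }
    return val;
}
```

For every standard integer type this should agree with std::numeric_limits<T>::max().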
C++ allows only one representation for positive (and unsigned) integers, and three possible representations for negative signed integers. Positive and unsigned integers are simply represented as a sequence of bits in binary notation. There may be padding bits as well, and signed integers have a single sign bit which must be 0 in the case of positive integers, so there is no guarantee that there are 8*sizeof(T) useful bits in the representation, even if the number of bits in a byte is known to be 8 (and, in theory, it could be larger). [Note 1]
The sign bit for negative signed integers is always 1, but there are three different formats for the value bits. The most common is "two's complement", where the value bits interpreted as a positive number would be exactly 2^k more than the actual value of the number, where k is the number of value bits. (This is equivalent to assigning a weight of -2^k to the sign bit, which is why it is called two's complement.)
Another alternative is "one's complement", in which the value bits are all inverted individually. This differs by exactly one from two's-complement representation.
The third allowable alternative is "sign-magnitude", in which the value bits are precisely the absolute value of the negative number. This representation is frequently used for floating point values, but only rarely used in integer values.
Both sign-magnitude and one's complement suffer from the disadvantage that there is a bit pattern which represents "negative 0". On the other hand, two's complement representation has the feature that the magnitude of the most negative representable value is one larger than the magnitude of the most positive representable value, with the result that both -x and x/-1 can overflow, leading to undefined behaviour.
Notes
[Note 1] I believe that it is theoretically possible for padding to be inserted between the value bits and the sign bit, but I certainly do not know of any real-world implementation with that feature. However, the fact that attempting to shift a 1 into the sign bit position is undefined behaviour makes it incorrect to assume that the sign bit is contiguous with the value bits.
I was thinking about ways to maximize a variable programmatically.
You are trying to reinvent the wheel. The C++ standard library already has this functionality: std::numeric_limits<T>::max()
// x any kind of numeric type: any integer or any floating point value
x = std::numeric_limits<decltype(x)>::max();
This is also better since you will not rely on undefined behavior.
As harold commented, the solution is to use T(1) << i instead of 1 << i. Also as Some programmer dude mentioned, long long is represented as consecutive bytes (typically 8 bytes) with sign bit at the MSB if it is signed.

Output a floating point value with a constant number of characters in C++

I'm having trouble doing what I think should be a fairly simple task in C++. I'm trying to output a floating point value to be written into a log file. The log file has 7 characters designated for the number output, but I'm finding it to be a little nontrivial to get a constant 7 character output over a wide range of values of different magnitudes, signs, and precisions (eg: 1, -0.60937, 0.60937, 0.009371, -0.009371). I've got a somewhat hacked way to kinda do it:
int desiredPrecision = 6;
if (runningAvg < 0)
desiredPrecision--;
if (std::abs((long) runningAvg) < 1)
desiredPrecision--;
else
theFile << std::showpoint;
theFile.precision (desiredPrecision);
theFile.fill('0');
theFile.setf(std::ios_base::left, std::ios_base::adjustfield);
theFile.width(7);
theFile << runningAvg << std::endl;
But this way seems extremely hacky to me. It works with numbers like:
-0.60937 (outputs: -0.6094)
-1.7 (-1.7000)
-1 (-1.0000)
0.60937 (0.60937)
0.00937 (0.00937)
but it breaks with
0.009371 (0.009371)
and
-0.009371 (-0.009371)
Now, I could add another level of if-else statements to deal with small magnitude numbers, but that just seems to be adding to the level of hackiness, and not a clean way to do it. I've played a bit with fprintf, but it seems like it is more concerned with a strict mathematical definition of precision, whereas in this application I care more about restricting the width of the field to 7 characters at all times. (I can also rely on these numbers never being so large that I'll overflow 6 characters plus a sign)
Am I missing something obvious here? Anyone have any tips for a less hacked way to achieve this?
Don't know how to do this with iostream stuff but I think that the ?printf format string you're looking for is one of these:
%.4f for negative numbers and %07.5f for positive numbers.
%+.4f (positive numbers will have a leading +)
% .4f (positive numbers will have a leading space)
The easiest way to do it is to print into a string and chop the string.
You are really doing text processing/report generation, not floating point number handling, so treat it as a formatting problem.
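A rough sketch of the print-and-chop idea. The helper name fixed_width is made up, and it assumes (as the question says) that the value never needs more than the field width for its sign and integer part. Note that chopping truncates the last digit instead of rounding it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats x with plenty of decimals, then keeps the
// first `width` characters.
std::string fixed_width(double x, std::size_t width = 7)
{
    char buf[64];
    // Ask for `width` decimals so there are always enough characters to chop.
    std::snprintf(buf, sizeof buf, "%.*f", static_cast<int>(width), x);
    std::string s(buf);
    s.resize(width);   // chop: truncates, does not round
    return s;
}
```

With this, -0.009371 comes out as -0.0093 and 0.009371 as 0.00937, both 7 characters.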
If you are not concerned with precision (or concerned enough to use the same precision for all numbers), you could use the same "%7.0e" format for all numbers.
Example
#include <stdio.h>
static const char* format = "%7.0e";
int main() {
double a[] = {1, -0.60937, 0.60937, 0.009371, -0.009371,
-1, -1.2e8, 1e-4, 1e-5, -1.5e-321, 0/.0, 1/0.};
for (unsigned i = 0; i < sizeof(a) / sizeof(*a); ++i) {
printf(format, a[i]);
puts("");
if (snprintf(0,0, format, a[i]) != 7)
return 1;
}
}
Output
  1e+00
 -6e-01
  6e-01
  9e-03
 -9e-03
 -1e+00
 -1e+08
  1e-04
  1e-05
-2e-321
    nan
    inf

C++ count the number of digits of a double

I want to do what the title says, like this:
int number1;
cin>>number1;
num1len=log10(number1)+1;
cout<<"num of digits is "<<num1len<<"\n";
but when the number has 11 or more digits, the answer is always 7 (6+1).
Does anyone know why, or what I am doing wrong?
Floating-point data types, including double, store approximations. What you're finding by calling log10 is the number of places to the left of the decimal point, which the approximation can throw off by at most one.
The question you asked, how to find the number of decimal digits in a number stored in binary floating-point, is meaningless. The number 7.1 has two decimal digits, however its approximate floating-point representation doesn't use decimal digits at all. To preserve the number of decimal digits, you'd need some decimal representation, not the C++ double data type.
Of course, all of this is applicable only to double, per the question title. Your code snippet doesn't actually use double.
What is 'wrong' is the maximum value which can be stored in a (signed) int :
#include <iostream>
#include <limits>
int main()
{
std::cout << std::numeric_limits<int>::max() << std::endl;
}
Gives me :
2147483647
You are running past the unsigned 32-bit boundary ... your number of 11 digits or more exceeds 0xFFFFFFFF, and so wraps around.
You need to use either unsigned long long or double for your number1 variable:
#include <iostream>
#include <cstdlib>
#include <cmath>
int
main ( int argc, char * argv[] )
{
unsigned long long num; // or double, but note comments below
std::cin >> num;
std::cout << "Number of digits in " << num << " is " << ( (int) std::log10 ( num ) + 1 ) << std::endl;
return 0;
}
Those large numbers will print in scientific notation by default when you send them to std::cout if you choose to use double as your data type, so you would want to throw some formatting in there. If you use an unsigned long long instead, they will print as they were entered, but you have to be sure that your platform supports unsigned long long.
EDIT: As mentioned by others, use of floating point values has other implications to consider, and is most likely not what you are ultimately trying to achieve. AFAIK, the integral type on a platform that yields the largest positive value is unsigned long long, so depending on the values you are looking to work with, see if that is available to you for use.
Others have pointed out that floating point numbers are approximations, so you can't really get an accurate count of digits in it.
But... you can get something approximate by writing it out to a std::stringstream object, then converting it to a std::string, and getting the length of that string. You'll of course have to deal with the fact that there may be non-digit characters in the string (like a minus sign, decimal point, E for exponent, etc.). Also, the number of digits you obtain in this manner depends on the formatting options you choose when writing to the stringstream object. But assuming that you know what formatting options you'd like to use, you can get the number of digits subject to these options.
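Something along these lines, as a sketch only. count_printed_digits is a hypothetical name; it uses the stream's default formatting (six significant digits) and stops at the exponent marker so that exponent digits are not counted.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts the digit characters in the default stream
// output of x, ignoring sign, decimal point and any exponent part.
std::size_t count_printed_digits(double x)
{
    std::ostringstream out;
    out << x;                       // default formatting, 6 significant digits
    const std::string text = out.str();

    std::size_t count = 0;
    for (char c : text) {
        if (c == 'e' || c == 'E') break;   // stop before the exponent
        if (std::isdigit(static_cast<unsigned char>(c))) ++count;
    }
    return count;
}
```

Changing the stream's precision or switching to std::fixed changes the result, which is the dependence on formatting options mentioned above.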

Maximum Width of a Printed Double in C++

I was wondering, how many characters long is the longest double printed using fprintf? My guess is wrong.
Thanks in advance.
Twelve would be a bit of an underestimate. On my machine, the following results in a 317 character long string:
#include <limits>
#include <cstdio>
#include <cstring>
int main()
{
double d = -std::numeric_limits<double>::max();
char str[2048] = "";
std::sprintf(str, "%f", d);
std::size_t length = std::strlen(str);
}
Using %e results in a 14 character long string.
Who knows. The Standard doesn't say how many digits of precision a double provides other than saying it (3.9.1.8) "provides at least as much precision as float," so you don't really know how many characters you'll need to sprintf an arbitrary value. Even if you did know how many digits your implementation provided, there's still the question of exponential formatting, etc.
But there's a MUCH bigger question here. Why the heck would you care? I'm guessing it's because you're trying to write something like this:
double d = ...;
int MAGIC_NUMBER = ...;
char buffer[MAGIC_NUMBER];
sprintf(buffer, "%f", d);
This is a bad way to do this, precisely because you don't know how big MAGIC_NUMBER should be. You can pick something that should be big enough, like 14 or 128k, but then the number you picked is arbitrary, not based on anything but a guess that it will be big enough. Numbers like MAGIC_NUMBER are, not surprisingly, called Magic Numbers. Stay away from them. They will make you cry one day.
Instead, there are a lot of ways to do this string formatting without having to care about buffer sizes, digits of precision, etc., that let you just get on with the business of programming. Streams are one:
#include <iostream>
#include <sstream>
#include <string>
double d = ...;
std::stringstream ss;
ss << d;
std::string s = ss.str();
std::cout << s;
...Boost.Format is another:
#include <boost/format.hpp>
#include <iostream>
#include <string>
double d = ... ;
std::string s = (boost::format("%1%") % d).str();
std::cout << s;
It's defined in <limits>:
std::cout << std::numeric_limits<double>::digits << "\n";
std::cout << std::numeric_limits<double>::digits10 << "\n";
Definition:
digits: number of digits (in radix base) in the mantissa
Equivalent to FLT_MANT_DIG, DBL_MANT_DIG or LDBL_MANT_DIG.
digits10: Number of digits (in decimal base) that can be represented without change.
Equivalent to FLT_DIG, DBL_DIG or LDBL_DIG for floating types.
See: http://www.cplusplus.com/reference/std/limits/numeric_limits/
Of course when you print stuff to a stream you can use the stream manipulators to limit the size of the output.
You can decide it yourself:
double a=1.1111111111111111111111111111111111111111111111111;
printf("%1.15lf\n", a);
return 0;
./a.out
1.111111111111111
So you can print more than 12 characters.
If your machine uses IEEE 754 doubles (which is fairly widespread now), then the binary precision is 53 bits; the decimal equivalent is approximately 15.95 digits (53 * log10(2)), so you can usually rely on 15 decimal digits of precision.
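The arithmetic can be checked in a couple of lines. This is only a sketch and the function name is made up; the 53 and the 15 in the comments assume IEEE 754 doubles.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: mantissa bits converted to decimal digits.
// With IEEE 754 doubles this is 53 * log10(2), roughly 15.95.
double decimal_digits_of_precision()
{
    return std::numeric_limits<double>::digits * std::log10(2.0);
}
```

std::numeric_limits<double>::digits10 is the conservative whole-number version of the same figure, 15 on such a machine.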
Consult Double precision floating-point format for a brief discussion.
For a much more in-depth study, the canonical paper is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It gets cited here whenever binary floating point discussions pop up, and is worth a weekend of careful reading.