Error subtracting hex constant when it ends in an 'E' [duplicate] - c++

This question already has an answer here:
Why doesn't "0xe+1" compile?
int main()
{
    0xD-0; // Fine
    0xE-0; // Fails
}
The second line fails to compile on both clang and gcc. A hex constant ending in any other digit (0-9, A-D, F) is fine.
Error:
<source>:4:5: error: unable to find numeric literal operator 'operator""-0'
    4 |     0xE-0;
      |     ^~~~~
I have a fix (adding a space after the constant, before the minus sign), so I'd mainly like to know why this happens. Is it something to do with the compiler thinking there's an exponent here?
https://godbolt.org/z/MhGT33PYP

Actually, this behaviour is mandated by the C++ standard (and documented), as strange as it may seem. It is a consequence of how C++ source is split into preprocessing tokens (a.k.a. pp-tokens).
If we look closely at how the compiler generates a token for numbers:
A preprocessing number is made up of a digit, optionally preceded by a period, and may be followed by letters, underscores, digits, periods, and any one of: e+ e- E+ E-.
According to this, the compiler reads 0x, then E-, and treats the E- as part of the number: E- is allowed inside a numeric pp-token, and there is no space before or between the E and the - (this is why adding a space is an easy fix).
This means that 0xE-0 is taken in as a single preprocessing token. In other words, the compiler interprets it as one number, instead of two numbers 0xE and 0 and an operation -. Therefore, the compiler is expecting E to represent an exponent for a floating-point literal.
Now let's take a look at how C++ interprets floating-point literals. Look at the section under "Examples". It gives this curious code sample:
<< "\n0x1p5 " << 0x1p5 // double
<< "\n0x1e5 " << 0x1e5 // integer literal, not floating-point
E is interpreted as part of the integer literal, and does not make the number a hexadecimal floating literal! Therefore, the compiler recognizes 0xE as a single hexadecimal integer. Then there is the -0, which is technically part of the same preprocessing token, and therefore is not an operator followed by another integer. Uh oh. The token is now invalid, as there is no -0 suffix.
And so the compiler reports an error.
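For illustration, here is a minimal sketch of the workarounds; anything that stops the pp-number after 0xE, such as whitespace or parentheses, will do:

int main()
{
    0xD-0;   // fine: "D-" cannot continue a pp-number, so this is three tokens
    0xE - 0; // fine: the space ends the pp-number after 0xE
    (0xE)-0; // fine: the ")" cannot be part of a pp-number
    // 0xE-0; // error: lexed as the single pp-token "0xE-0"
}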

Related

Visual Studio C++ C2022. Too big for character error occurs when trying to print a Unicode character

When I try to print a Unicode character to the console, Visual Studio gives me an error. How do I fix this and get Visual Studio to print the Unicode character?
#include <iostream>

int main() {
    std::cout << "\x2713";
    return 0;
}
Quite simply, \x2713 is too large for a single character. If you wanted two characters, you need to write \x27\x13. If you wanted the wide character, you need to prefix the literal with L, i.e. L"\x2713", and then use std::wcout instead of std::cout.
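As a minimal sketch of the wide-character fix (assuming your terminal and locale can actually display the glyph):

#include <iostream>

int main() {
    std::wcout << L"\x2713"; // wide string literal: \x2713 fits in a wchar_t
    return 0;
}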
Note, from the C++20 standard (draft) [lex.ccon]/7 (emphasis mine):
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character-literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character-literals with no prefix) or wchar_t (for character-literals prefixed by L).
Essentially, the compiler may treat that character however it wants; g++ issues a warning, while MSVC (for me) reports a compile error (clang also treats it as an error).
\xNNN (any positive number of hex digits) means a single byte whose value is given by NNN; unless in a string literal or character literal prefixed by L, in which case it means a wchar_t whose value is given by NNN.
If you are looking to encode a Unicode code point, the syntax is \uNNNN (exactly 4 digits) or \UNNNNNNNN (exactly 8 digits). Note that this is the code point, not a UTF representation.
Using the u or U forms instead of L avoids portability problems due to wchar_t having different size on different platforms.
To get well-defined behaviour you can manually specify the encoding of a string literal, e.g.:
std::cout << u8"\u2713" << std::endl;
which will encode the code point as UTF-8. Of course you still need a UTF-8 aware terminal to see meaningful output.
Without an encoding prefix, it is up to the compiler (I think) how the code point is encoded.
See:
Escape sequences
String literal

Error in getting ASCII of character in C++

I saw this question : How to convert an ASCII char to its ASCII int value?
The most voted answer (https://stackoverflow.com/a/15999291/14911094) states the solution as:
Just do this:
int(k)
But I am having issues with this.
My code is:
std::cout << char(144) << std::endl;
std::cout << (int)(char(144)) << std::endl;
std::cout << int('É') << std::endl;
Now the output comes as:
É
-112
-55
Now I can understand the first line, but what is happening on the second and the third lines?
Firstly, how can an ASCII value be negative, and secondly, how can it be different for the same character?
Also, as far as I have tested, this is not some random garbage from memory, as it stays the same every time I run the program. If I change it to 145:
æ
-111
The output also changes by 1, so my guess is that this is due to some kind of overflow.
But I cannot see exactly why, as I am converting to int, and that should be enough (4 bytes) to store the result.
Can anyone suggest a solution?
If your platform is using ASCII for the character encoding (most do these days), then bear in mind that ASCII is only a 7 bit encoding.
It so happens that char is a signed type on your platform. (The signedness or otherwise of char doesn't matter for ASCII as only the first 7 bits are required.)
Hence char(144) gives you a char with a value of -112 (144 - 256 = -112). (You have a 2's complement char type on your platform: from C++14 you can assume that, but you can't in C.)
The third line implies that that character (which is not in the ASCII set) has a value of -55.
int(static_cast<unsigned char>('É'))
would force it to a positive value on all but the most exotic of platforms.
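A minimal sketch of that cast in context (the exact values shown assume a platform with a signed, 8-bit, 2's complement char):

#include <iostream>

int main() {
    char c = char(144);
    std::cout << int(c) << '\n';                              // -112: the value was sign-extended
    std::cout << int(static_cast<unsigned char>(c)) << '\n';  // 144: forced non-negative first
}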
The C++ standard only guarantees that characters in the basic execution character set [1] have non-negative encodings. Characters outside that basic set may have negative encodings - it depends on the locale.
[1] Upper- and lowercase Latin alphabet, decimal digits, most punctuation, and control characters like tab, newline, form feed, etc.

Valid syntax of calling pseudo-destructor for a floating constant

Consider the following demonstrative program.
#include <iostream>

int main()
{
    typedef float T;
    0.f.T::~T();
}
This program is compiled by Microsoft Visual Studio Community 2019.
But clang and gcc issue an error like this
prog.cc:7:5: error: unable to find numeric literal operator 'operator""f.T'
    7 |     0.f.T::~T();
      |     ^~~~~
If the expression is written as ( 0.f ).T::~T(), then all three compilers compile the program.
So a question arises: is this record 0.f.T::~T() syntactically valid? And if not, then what syntactical rule is broken?
The parsing of numerical tokens is quite crude, and allows many things that aren't actually valid numbers. In C++98, the grammar for a "preprocessing number", found in [lex.ppnumber], is
pp-number:
    digit
    . digit
    pp-number digit
    pp-number nondigit
    pp-number e sign
    pp-number E sign
    pp-number .
Here, a "nondigit" is any character that can be used in an identifier, other than digits, and a "sign" is either + or -. Later standards would expand the definition to allow single quotes (C++14), and sequences of the form p-, p+, P-, P+ (C++17).
The upshot is that, in any version of the standard, while a preprocessing number is required to start with a digit, or a period followed by a digit, after that an arbitrary sequence of digits, letters, and periods may follow. Using the maximal munch rule, it follows that 0.f.T::~T(); is required to be tokenized as 0.f.T :: ~ T ( ) ;, even though 0.f.T isn't a valid numerical token.
Thus, the code is not syntactically valid.
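As a small sketch, either parentheses or whitespace keeps the period out of the pp-number, so both of these forms tokenize (and compile) correctly:

int main()
{
    typedef float T;
    // 0.f.T::~T(); // ill-formed: lexed as the single pp-token "0.f.T"
    (0.f).T::~T();  // OK: the ")" ends the pp-token after 0.f
    0.f .T::~T();   // OK: the whitespace ends the pp-token after 0.f
}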
A user-defined literal suffix, ud-suffix, is an identifier. An identifier is a sequence of letters (including some non-ASCII characters), underscores, and digits that does not start with a digit. The period character is not included.
Therefore it is a compiler bug: the non-identifier sequence f.T is being treated as an identifier.
The 0. is a fractional-constant, which can be followed by an optional exponent, then either a ud-suffix (for a user-defined literal) or a floating-point-suffix (one of fFlL). The f could be considered a ud-suffix as well, but since it matches another literal suffix it should be treated as that and not as a UDL. A ud-suffix is defined in the grammar as an identifier.

C++ - Reading a double, followed by a character, from cin

I'm trying to read a double, followed by a character, from cin using the snippet:
#include <iostream>
using namespace std;

int main() {
    double d;
    char c;
    while (1) {
        cin >> d >> c;
        cout << d << c << endl;
    }
}
The peculiar thing is that it works for some characters, but not for others. For example, it works for "2g", "2h", but fails for "2a", "2b", "2x" ...:
mwmbp:ppcpp mwisse$ ./a.out
2a
0
2b
0
2c
0
2g
2g
2h
2h
2i
0h
2x
0h
2z
2z
As pointed out by one of you, it does indeed work for integers. Do you know why it doesn't work for doubles? I have as yet been unable to find information on how cin interprets its input.
This is currently an open bug in LLVM: https://llvm.org/bugs/show_bug.cgi?id=17782
Way back in 2014 it was reassigned from Howard Hinnant to Marshall Clow; since then... well, don't hold your breath on this getting fixed any time soon.
EDIT:
The istream extraction operator internally uses num_get::do_get, which sequentially performs these tasks for a double:
1. Selects a conversion specifier; for double that's %lg.
2. Tests for an empty input stream.
3. Checks whether the next character in the stream is contained in the ctype or numpunct facets.
4. Tests whether scanf would allow the character obtained in 3 to be appended to the input field, given the conversion specifier obtained in 1; if so, 3 is repeated, and if not, 5 is performed on the input field without this character.
5. Reads the double from the accepted input field with:
   scanf prior to C++11
   strtold in C++11 and C++14
   strtod from C++17 onward
6. If 5 fails, failbit is assigned to the istream's iostate; if 5 succeeded, the result is assigned to the double.
7. If any thousands separators were allowed into the input field by the numpunct facet in 3, their positions are evaluated; if any of them violate the grouping rules of the facet, failbit is assigned to the istream's iostate.
8. If the input field used in 5 was empty, eofbit is assigned to the istream's iostate.
That's a lot to say that, for a double, you're really concerned with scanf's %lg conversion specifier's rules for extracting a double (which internally will depend upon strtof's constraints):
- An optional plus or minus character
- One of the following:
  - "INF" or "INFINITY" (case-insensitive)
  - "NAN" (case-insensitive)
  - "0x" or "0X", an input field of hexadecimal digits optionally containing a decimal-point character, optionally followed by a "p" or "P", a plus or minus sign, and a decimal exponent
  - An input field of decimal digits optionally containing a decimal-point character, optionally followed by an "e" or "E", a plus or minus sign, and a non-empty exponent
Note that if your locale defines any other expression as an acceptable floating-point input field, that is also accepted. So if you've added some special sauce to the istream you're working with, that may be where the problem lies. Outside of that, neither a trailing "a", "b", nor "x" is an accepted suffix for the %lg conversion specifier, so either your implementation is not compliant or there's something else you're not telling us.
Here is a live example of your inputs succeeding on gcc 5.1, which is compliant: http://ideone.com/nGGW0L
Since the problem is caused by a bug (or feature, depending on your point of view) in libc++, it seems that the easiest way to avoid it is to use libstdc++ instead, until a fix is in place. If you're running on a Mac, add -stdlib=libstdc++ to your compile flags. g++ -stdlib=libstdc++ test.cpp will correctly compile the code given in this post.
Libc++ appears to have other, similar bugs, one of which I posted here: Trying to read lines from an ASCII file using C++, Ubuntu vs Mac...?, before learning about these different libraries.
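If switching standard libraries isn't an option, one workaround (a sketch, not the library's own mechanism) is to read each token as a string and split it yourself; strtod stops at the first character that cannot be part of the number:

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    std::string tok;
    while (std::cin >> tok) {                      // e.g. "2a"
        char* end = nullptr;
        double d = std::strtod(tok.c_str(), &end); // parses "2", leaves end at 'a'
        char c = *end;                             // the trailing character, '\0' if none
        std::cout << d << c << '\n';
    }
}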

Possible work around "Invalid Octal Digit" in char when typing Alt Keys

I am writing a program that implements the quadratic formula. My only problem is the actual formatting of the output. The Alt codes that allow me to type the plus-minus sign and square-root symbol are giving me some problems.
The problem exists within
cout << 0-b << char(241) << char(251) << char(0178);
The last char, meant to produce the squared symbol (²), triggers the invalid octal digit error. Is there a way around this, or will I have to settle for simply writing "x^2"?
You should just remove the leading 0 from 0178. A leading zero on a numeric constant is automatically treated as octal and 8 is not a valid octal digit.
In addition, the superscript-2 character you're referring to is decimal 178, U+00B2. Another way would be to just use '\xb2' in your code.
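A sketch of the corrected line, assuming your console's code page actually maps these values to ±, √, and ² (that mapping is platform-dependent, and b here is a hypothetical coefficient):

#include <iostream>

int main() {
    double b = 1.0; // hypothetical coefficient from the quadratic formula
    std::cout << 0 - b << char(241) << char(251) << "\xb2"; // ±, √, then ² (0xB2 == 178)
}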
Of course, you also have to be certain that whatever is interpreting that output stream knows about the Unicode characters that you're trying to output. This probably depends on your terminal program or console. If it doesn't, you may have to resort to hacks like (x^2) or, even worse, monstrosities like:
      3     2
    3x  - 7x  + 42x - 1
y = -------------------
            12