This is a homework question I just can't seem to get correct. It is related to a C++ programming course. The question is given below, fill in the blank.
The char and int data types have a direct equivalence, with the value of the int being based on the _________ coding scheme.
The exam has asked a poorly-worded question! char and int do NOT have a direct equivalence - but a char can be interpreted as an int, usually using the "ASCII" coding scheme ("American Standard Code for Information Interchange"). But even that's not universal - there's also EBCDIC and others.
But try "ASCII".
Edit
According to the C standard, the character encoding doesn't have to be ASCII. It does impose one ordering rule, though, and the letters get weaker treatment than is often assumed:
The representations for '0' to '9' must be consecutive and in that order, which makes conversion to int easy.
The representations for 'A' to 'Z' are not required to be consecutive or even ascending. In practice they ascend in both ASCII and EBCDIC, but in EBCDIC they are not consecutive.
The same holds for 'a' to 'z'. Common encodings keep a constant difference between upper and lower case, but the standard does not promise it (and lower case can come before upper case, as in EBCDIC).
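The digit rule is the part that code can lean on under any encoding. A minimal sketch, where `digit_value` is a hypothetical helper name and not something from the standard library:

```cpp
#include <cassert>

// Hypothetical helper: numeric value of a decimal digit character.
// Relies only on the guarantee that '0'..'9' are consecutive and ascending.
inline int digit_value(char c) {
    assert(c >= '0' && c <= '9');  // caller must pass a digit
    return c - '0';
}

// Checked at compile time; holds for ASCII, EBCDIC, and any other
// conforming execution character set.
static_assert('9' - '0' == 9, "decimal digits are consecutive");
```

No comparable static_assert can be written portably for letters, for example 'Z' - 'A' == 25 fails on EBCDIC.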
What is the order followed by comparisons between characters in C++? I have noticed that 'z' > '1'. I am trying to find a link to the order rationale C++ follows for all characters, or any generic material reference in case this is a widely known order (similar to alphabetical order for lowercase letters).
On most implementations you are likely to meet, each character corresponds to its ASCII value, and comparing characters compares those values. This table shows the order, assuming your platform uses ASCII: https://www.ascii-code.com/
Only '0' to '9' are guaranteed to be encoded consecutively. The values of other characters are, according to the language specification, implementation-defined. However, almost all x86 C compilers use ASCII.
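Put differently, comparing two characters just compares their integer codes, so the outcome for anything other than digits depends on the encoding. A small sketch of that, where `code_less` is a hypothetical name chosen for illustration:

```cpp
#include <cassert>

// Hypothetical helper: orders two chars by their code in the execution
// character set. Both are viewed as unsigned char so the result does not
// depend on whether plain char is signed.
inline bool code_less(char a, char b) {
    return static_cast<unsigned char>(a) < static_cast<unsigned char>(b);
}

// The only ordering every implementation must honor: digits ascend.
static_assert('0' < '1' && '1' < '9', "decimal digits are ordered");
```

On an ASCII system code_less('1', 'z') is true, which matches the observation that 'z' > '1'. On an EBCDIC system it is false, because there 'z' is 0xA9 and '1' is 0xF1.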
So I realize that assuming ASCII encoding can get you in trouble, but I'm never really sure how much trouble you can get into by subtracting characters. I'd like to know what relatively common scenarios can cause any of the following to evaluate to false.
Given:
std::string test = "B";
char m = 'M';
A) (m-'A')==12
B) (test[0]-'D') == -2
Also, does the answer change for lowercase values (changing the 77 to 109 ofc)?
Edit: Digit subtraction answers this question for char digits, by saying the standard says '2'-'0'==2 must hold for all digits 0-9, but I want to know if it holds for a-z and A-Z, which section 2.3 of the standard is unclear on in my reading.
Edit 2: Removed ASCII specific content, to focus question more clearly (sorry #πάντα-ῥεῖ for a content changing edit, but I feel it is necessary). Essentially the standard seems to imply some ordering of characters for the basic set, but some encodings do not maintain that ordering, so what's the overriding principle?
In other words, when are chars in C/C++ not stored in ASCII?
Neither C nor C++ has any notion of the actual character encoding used by the target system. The only convention is that character literals like 'A' match the current encoding.
You could as well deal with EBCDIC encoded characters and the code looks the same as for ASCII characters.
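If you need a letter's position in the alphabet without assuming the letters are contiguous, one option is to look it up in a string instead of subtracting 'A'. This is only a sketch; `letter_index` is a hypothetical helper, not a standard function:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: 0-based position of an upper-case letter in the
// alphabet, or -1 if c is not one. The lookup string is encoded by the
// compiler in the execution character set, so this works for ASCII and
// EBCDIC alike.
inline int letter_index(char c) {
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (c == '\0') return -1;  // strchr would otherwise match the terminator
    const char* p = std::strchr(alphabet, c);
    return p ? static_cast<int>(p - alphabet) : -1;
}
```

With this, letter_index('M') is 12 and letter_index('B') - letter_index('D') is -2 regardless of encoding, whereas m - 'A' == 12 only holds where 'A' to 'M' happen to be contiguous.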
I have searched for this for quite some time before posting this question. The answer to it should be fairly easy though, since I am an ultra-beginner atm.
I have a char* in which I want a user to put some digits (over 20), that in turn can be called upon specifically.
This is what I've tried:
char* digits = GetString();
int prime = digits[0];
When I verify whether this worked with printf I find prime to have become 0.
printf("prime:%d\ndigits:%c\n",prime, digits[0]);
Why would this be and what could I do to make this work?
Edit: Is it perhaps easier to make an int array and use GetLongLong?
Neither C nor C++ guarantees what value will be used to encode the character '0', but both guarantee that digits will be contiguous and ordered, so (for example) digits[0] - 48 may or may not work, but digits[0] - '0' is guaranteed to work (presuming that digits[0] actually holds a digit, of course).
The precise requirement in the C++ standard (§2.3/3) is:
In both the source and execution basic character sets, the value of each character after 0 in the
above list of decimal digits shall be one greater than the value of the previous.
At least as of C99, the C standard has identical wording, but at §5.2.1/3.
In ASCII, the character zero ('0') has the numeric value 48, '1' is 49, and so on.
You may find this a useful idiom for getting the numeric value of a digit from its character code.
int prime = digits[0] - '0';
You may also find looking at man ascii informative (or similar man page if you use some other charset).
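To apply the same idiom to a whole string of digits, something along these lines should work. It is a sketch only: `digit_values` is a hypothetical helper, and std::string stands in for whatever GetString() returns.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: turns "2357" into {2, 3, 5, 7}.
// Characters that are not decimal digits are skipped.
inline std::vector<int> digit_values(const std::string& text) {
    std::vector<int> out;
    for (char c : text) {
        // isdigit requires a value representable as unsigned char
        if (std::isdigit(static_cast<unsigned char>(c))) {
            out.push_back(c - '0');  // guaranteed to give 0..9
        }
    }
    return out;
}
```

After that, digit_values(input)[0] gives the first digit as a number, which is what int prime = digits[0]; was presumably meant to do.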
Sometimes I need to define a char which represents a non-alphanumeric char.
What is the correct way to define its value in C++?
Is using EOF or char_traits<char>::eof() a good choice?
You're reading too much into the word char.
At the end of the day, it is little more than a size. In this case, usually 8 bits. Shorts are usually 16 (and you can wear them on the beach), ints can be 32 or something else, and longs can be 64 (or ints, or a quick conversation with the relevant authorities on the beach as to why you lost both pairs of shorts).
The correct way to define a value in C++ basically comes down to the maximum value the type can hold. char_traits<char>::eof() is indeed a good constant, but out of context it means very little.
EOF is not a char value; it's an int value that's returned by some functions to indicate that no valid character data could be obtained. If you're looking for a value to store in a char object, EOF is definitely not a good choice.
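To illustrate why the end-of-file marker lives in an int-like type and not in char, here is a rough sketch that reads a stream through int_type. The name `count_non_alnum` is made up for this example:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the non-alphanumeric characters in a stream.
inline int count_non_alnum(std::istream& in) {
    using traits = std::char_traits<char>;
    int count = 0;
    // get() returns int_type, which can hold every char value plus eof(),
    // so no valid character is ever mistaken for end-of-file.
    for (traits::int_type ch = in.get();
         !traits::eq_int_type(ch, traits::eof());
         ch = in.get()) {
        const char c = traits::to_char_type(ch);
        if (!std::isalnum(static_cast<unsigned char>(c))) ++count;
    }
    return count;
}

// Convenience overload for in-memory text.
inline int count_non_alnum(const std::string& text) {
    std::istringstream in(text);
    return count_non_alnum(in);
}
```

Storing the result of get() in a plain char would lose that distinction, since whatever char value EOF converts to is also a value a valid character may have.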
If your only requirement is to store some non-alphanumeric value in a char object (and you don't care which), just choose something. Any punctuation character will do.
char example = '*';
char another_example = '?';
char yet_another_example = '\b'; // backspace
This assumes I'm understanding your question correctly. As stated:
Sometimes I need to define a char which represents a non-alphanumeric char.
it's not at all clear what you mean. What exactly do you mean by "represents"? If you're looking for some arbitrary non-alphanumeric character, see above. If you're looking for some arbitrary value that merely indicates that you should have a non-alphanumeric character in some particular place, you can pick anything you like, as long as you use it consistently.
For example, "DD-DD" might be a template representing two decimal digits, followed by a hyphen, followed by two more decimal digits -- but only if you establish and follow a convention that says that's what it means.
Please update your question to make it clear what you're asking.
How can I find out what the current charset is in C++?
In a console application (WinXP) I am getting negative values for some characters (like äöüé) with
(int)mystring[a]
and this surprises me. I was expecting the values to be between 127 and 256.
So is there something like GetCharset() or SetCharset() in C++?
It depends on how you look at the value you have at hand. char can be signed (e.g. on Windows) or unsigned, as on some other systems. So what you should do is print the value as unsigned to get what you are asking for.
So far, C++ is charset-agnostic. For the Windows console specifically, you can use GetConsoleOutputCP.
Look at std::numeric_limits<char>::min() and max(). Or CHAR_MIN and CHAR_MAX if you don't like typing, or if you need an integer constant expression.
If CHAR_MAX == UCHAR_MAX and CHAR_MIN == 0 then chars are unsigned (as you expected). If CHAR_MAX != UCHAR_MAX and CHAR_MIN < 0 they are signed (as you're seeing).
Section 3.9.1/1 of the standard ensures that there are no other possibilities: "... a plain char can take on either the same values as a signed char or an unsigned char; which one is implementation-defined."
This tells you whether char is signed or unsigned, and that's what's confusing you. You certainly can't call anything to modify it: from the POV of a program it's baked into the compiler even if the compiler has ways of changing it (GCC certainly does: -fsigned-char and -funsigned-char).
The usual way to deal with this is if you're going to cast a char to int, cast it through unsigned char first. So in your example, (int)(unsigned char)mystring[a]. This ensures you get a non-negative value.
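Both the signedness check and the cast can be sketched in a few lines. The names `char_is_signed` and `char_code` are invented for this example:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// True when plain char is signed on this implementation.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;
static_assert(char_is_signed == (CHAR_MIN < 0), "both checks agree");

// Hypothetical helper: the code of a char as a non-negative int
// in the range 0..UCHAR_MAX, whatever the signedness of plain char.
inline int char_code(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}
```

With 8-bit chars, a byte such as 0xE4 (ä in ISO Latin 1) then comes out as 228, not -28.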
It doesn't actually tell you what charset your implementation uses for char, but I don't think you need to know that. On Microsoft compilers, the answer is essentially that commonly-used character encoding "ISO-8859-mutter-mutter". This means that chars with 7-bit ASCII values are represented by that value, while values outside that range are ambiguous, and will be interpreted by a console or other recipient according to how that recipient is configured. ISO Latin 1 unless told otherwise.
Properly speaking, the way characters are interpreted is locale-specific, and the locale can be modified and interrogated using a whole bunch of stuff towards the end of the C++ standard that personally I've never gone through and can't advise on ;-)
Note that if there's a mismatch between the charset in effect, and the charset your console uses, then you could be in for trouble. But I think that's separate from your issue: whether chars can be negative or not is nothing to do with charsets, just whether char is signed.
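Plain char is signed by default on many common platforms (x86 with MSVC or GCC, for example), although the standard leaves this implementation-defined.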
Try this.
cout << (int)(unsigned char) mystring[a] << endl; // the cast to int makes cout print a number, not the character
The only guarantees that the standard provides are for members of the basic character set:
2.2 Character sets
3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
Further, the type char is supposed to hold:
3.9.1 Fundamental types
1 Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set.
So there are no guarantees that you will get the correct value for the characters you have mentioned. However, try to use an unsigned int to hold this value (for all practical purposes, it never makes sense to use a signed type to hold char values if you are going to print them or pass them around).