I have searched for this for quite some time before posting this question. The answer to it should be fairly easy though, since I am an ultra-beginner atm.
I have a char* in which I want a user to put some digits (more than 20), which I then want to be able to access individually.
This is what I've tried:
char* digits = GetString();
int prime = digits[0];
When I verify whether this worked with printf I find prime to have become 0.
printf("prime:%d\ndigits:%c\n",prime, digits[0]);
Why would this be and what could I do to make this work?
Edit: Is it perhaps easier to make an int array and use GetLongLong?
Neither C nor C++ guarantees what value will be used to encode the character 0, but both guarantee that the digits will be contiguous and ordered, so (for example) digits[0]-48 may or may not work, but digits[0] - '0' is guaranteed to work (presuming that digits[0] actually holds a digit, of course).
The precise requirement in the C++ standard (§2.3/3) is:
In both the source and execution basic character sets, the value of each character after 0 in the
above list of decimal digits shall be one greater than the value of the previous.
At least as of C99, the C standard has identical wording, but at §5.2.1/3.
In ASCII, the character zero ('0') has the numeric value 48, '1' is 49, and so on.
You may find this a useful idiom for getting the numeric value from the character value.
int prime = digits[0] - '0';
You may also find looking at man ascii informative (or similar man page if you use some other charset).
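If you need all of the digits (the question mentions more than 20 of them), the same subtraction works in a loop. Here is a minimal sketch using a hard-coded array in place of whatever GetString() returns:

#include <cstdio>
#include <cstring>

int main() {
    const char* digits = "29384756102938475610"; // stand-in for GetString()'s result
    size_t len = std::strlen(digits);
    for (size_t i = 0; i < len; ++i) {
        int value = digits[i] - '0'; // portable: '0'..'9' are guaranteed contiguous
        std::printf("digits[%zu] = %c -> %d\n", i, digits[i], value);
    }
    return 0;
}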
Related
This is a homework question I just can't seem to get correct. It is related to a C++ programming course. The question is given below, fill in the blank.
The char and int data types have a direct equivalence, with the value of the int being based on the _________ coding scheme.
The exam has asked a poorly-worded question! char and int do NOT have a direct equivalence - but a char can be interpreted as an int, usually using the "ASCII" coding scheme ("American Standard Code for Information Interchange"). But even that's not universal - there's also EBCDIC and others.
But try "ASCII".
Edit
According to the C standard, the character encoding doesn't have to be ASCII. The standard itself imposes only one ordering rule, and common encodings follow a couple more (a short sketch follows this list):
The representations of '0' to '9' must be consecutive and in that order, to make calculations easy when converting to int (this is the rule the standard actually requires).
The representations of 'A' to 'Z' are ascending in the encodings you will meet in practice, which makes calculations for sorting easy (note: not necessarily consecutive - in EBCDIC, for example, they're not).
The representations of 'a' to 'z' follow the same pattern, and the difference between upper and lower case is the same for every letter (note that lower case can come before upper case).
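A small sketch of both points above: the integer you get from a letter depends on the coding scheme, while digit arithmetic is portable (the values in the comments assume ASCII and EBCDIC respectively):

#include <iostream>

int main() {
    char letter = 'A';
    char digit  = '7';
    // The integer value of a letter depends on the encoding:
    // 65 in ASCII, 193 in EBCDIC.
    std::cout << "'A' as int: " << static_cast<int>(letter) << '\n';
    // Digit arithmetic works everywhere, because '0'..'9' must be consecutive.
    std::cout << "'7' - '0' = " << digit - '0' << '\n';
    return 0;
}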
Sometimes I need to define a char which represents a non-alphanumeric char.
What is the correct way to define its value in C++?
Is using EOF or char_traits<char>::eof() a good choice?
You're reading too much into the word char.
At the end of the day, it is little more than a size. In this case, at least 8 bits. Shorts are 16 (and you can wear them on the beach), ints can be 32 or something else, and longs can be 64 (or ints, or a quick conversation with the relevant authorities on the beach as to why you lost both pairs of shorts).
The correct way to define a value in C++ basically comes down to the range of values the type can hold. char_traits<char>::eof() is indeed a well-defined constant, but out of context it means very little.
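If you want to see the actual sizes and limits on your own platform, a quick sketch:

#include <climits>
#include <iostream>
#include <limits>

int main() {
    std::cout << "bits per char: " << CHAR_BIT << '\n';
    std::cout << "sizeof(short): " << sizeof(short) << '\n';
    std::cout << "sizeof(int):   " << sizeof(int) << '\n';
    std::cout << "sizeof(long):  " << sizeof(long) << '\n';
    std::cout << "char range:    "
              << static_cast<int>(std::numeric_limits<char>::min()) << " to "
              << static_cast<int>(std::numeric_limits<char>::max()) << '\n';
    return 0;
}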
EOF is not a char value; it's an int value that's returned by some functions to indicate that no valid character data could be obtained. If you're looking for a value to store in a char object, EOF is definitely not a good choice.
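That is why the classic input loop keeps the result of getchar() in an int and only compares it with EOF; a minimal sketch:

#include <cstdio>

int main() {
    int c; // int, not char: EOF lies outside the range of every valid character
    while ((c = std::getchar()) != EOF) {
        std::putchar(c);
    }
    return 0;
}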
If your only requirement is to store some non-alphanumeric value in a char object (and you don't care which), just choose something. Any punctuation character will do.
char example = '*';
char another_example = '?';
char yet_another_example = '\b'; // backspace
This assumes I'm understanding your question correctly. As stated:
Sometimes I need to define a char which represents a non-alphanumeric char.
it's not at all clear what you mean. What exactly do you mean by "represents"? If you're looking for some arbitrary non-alphanumeric character, see above. If you're looking for some arbitrary value that merely indicates that you should have a non-alphanumeric character in some particular place, you can pick anything you like, as long as you use it consistently.
For example, "DD-DD" might be template representing two decimal digits, followed by a hyphen, followed by two more decimal digits -- but only if you establish and follow a convention that says that's what it means.
Please update your question to make it clear what you're asking.
I'm currently doing a project where I need to create two data structures that will be used to contain strings. One of them has to be a form of linked list, and I've been advised to separate the words out into separate lists inside of it for each letter of the alphabet. I'm required to think about efficiency, so I have an array of Head pointers of size 26, and I want to convert the first character of the given word into an integer so I can use it as the subscript, such as:
//a string called s is passed as a parameter to the function
int i = /*some magic happens here*/ s.substr(0, 1);
currentPointer = heads[i]; //then I start iterating through the list
I've been searching around and all I seem to have found is how to convert number characters in strings into integers, not letter characters, and I'm wondering how on earth I can get this working without resorting to a huge and ugly set of if statements.
When you set i to the value of the first character, you get its ASCII value.
So i is outside your 0-25 range: see man ascii.
You can reduce it by subtracting the ASCII value of the first letter of the alphabet (be careful with the case).
std::string s("zoro");
int i = s[0];
std::cout << "Ascii value : " << i << " Reduced : " << i - 'a' << std::endl;
which produces the ASCII value 122 for 'z' and 25 for the reduced value, as expected.
I think you are confusing values with representations. "Ten" "10" and "1 1 1 1 1 1 1 1 1 1" are all the same value, just represented differently.
I've been searching around and all I seem to have found is how to convert number characters that are in strings into integers, and not letter characters
There's no difference. Characters are always represented by integers anyway. It's just a matter of representation. Just present the value the way you want.
By the way, this is a key concept programmers have to understand. So it's worth spending some time thinking about it.
A classic example of this misunderstanding is a question like "I have a variable i that has some value in decimal. How can I make it store a value in hex?" Of course that makes no sense: i stores values, and hex and decimal are representations. If you have ten cars, you have ten cars, not ten in decimal or hex. If i has the value ten, then the value ten is in i, not a representation of ten in decimal or hex.
Of course, when you display the value stored in i, you have to choose how to represent it. You can display it as ten, or 10, or | | | | | | | | | |, or whatever.
And you might have a string that has a representation of the value "ten" in hex, and you might need to assign that value to a variable. That requires converting from a representation to the value it represents.
There are input and output functions that read and write values in various representations.
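For example, iostreams let you choose the representation on output without changing the stored value; a small sketch:

#include <iostream>

int main() {
    int i = 10; // the value ten, however we choose to display it
    std::cout << "decimal: " << std::dec << i << '\n'; // 10
    std::cout << "hex:     " << std::hex << i << '\n'; // a
    std::cout << "octal:   " << std::oct << i << '\n'; // 12
    return 0;
}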
I suspect you want to convert digits stored in strings as characters to integers, e.g. character '9' to integer 9.
In order to do so:
char c = '9';
int x = c - '0';
That will work regardless of whether you have a computer using ASCII or EBCDIC...
In this case, you don't seem to need atoi or itoa (neither is going to do anything very useful with, for example, 'J'). You just want something like:
int i = tolower(s[0])-'a';
In theory that's not portable -- if there's any chance of the code being used on a machine that uses EBCDIC (i.e., an IBM or compatible mainframe) you'll want to use something like 'z'-'a'+1 as the size of your array, since it won't be exactly 26 (EBCDIC includes some other characters inserted between some letters, so the letters are in order but not contiguous).
Probably more importantly, if you want to support languages other than English, things change entirely in a hurry -- you might have a different number of letters than 26, they might not all be contiguous, etc. For such a case your basic design is really the problem. Rather than fixing that one line of code, you probably need to redesign almost completely.
As an aside, however, there's a pretty good chance that a linked list isn't a very good choice here.
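With those caveats in mind, here is a hedged sketch of the index computation. The name bucket_index is mine, and it assumes the word starts with an English letter and that 'a'..'z' are contiguous (true for ASCII, not for EBCDIC, as noted above):

#include <cctype>
#include <string>

int bucket_index(const std::string& s) {
    // Cast to unsigned char before calling the <cctype> functions:
    // passing a negative char value to them is undefined behaviour.
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!std::isalpha(first)) {
        return -1; // caller decides what to do with non-letters
    }
    return std::tolower(first) - 'a'; // 0 for 'a'/'A', ..., 25 for 'z'/'Z'
}

// usage: currentPointer = heads[bucket_index(s)]; after checking for -1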
I've got a string value of the form 10123X123456 where 10 is the year, 123 is the day number within the year, and the rest is unique system-generated stuff. Under certain circumstances, I need to add 400 to the day number, so that the number above, for example, would become 10523X123456.
My first idea was to substring those three characters, convert them to an integer, add 400 to it, convert them back to a string and then call replace on the original string. That works.
But then it occurred to me that the only character I actually need to change is the third one, and that the original value would always be 0-3, so there would never be any "carrying" problems. It further occurred to me that the ASCII code points for the numbers are consecutive, so adding the number 4 to the character "0", for example, would result in "4", and so forth. So that's what I ended up doing.
My question is, is there any reason that won't always work? I generally avoid "ASCII arithmetic" on the grounds that it's not cross-platform or internationalization friendly. But it seems reasonable to assume that the code points for numbers will always be sequential, i.e., "4" will always be 1 more than "3". Anybody see any problem with this reasoning?
Here's the code.
string input = "10123X123456";
input[2] += 4;
//Output should be 10523X123456
From the C++ standard, §2.2/3:
In both the source and execution basic character sets, the value of each character after 0 in the
above list of decimal digits shall be one greater than the value of the previous.
So yes, if you're guaranteed to never need a carry, you're good to go.
The C++ language definition requires that the code-point values of the numerals be consecutive. Therefore, ASCII arithmetic is perfectly acceptable here.
Always keep in mind that if this string is generated by something you do not entirely control (such as users or a third-party system), something can and will go wrong with it (see Murphy's laws).
So I think you should at least add some validation before doing this, as in the sketch below.
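A sketch of what that validation might look like (the function name add400 and the exact checks are mine; adjust them to whatever your real format guarantees):

#include <stdexcept>
#include <string>

std::string add400(std::string input) {
    // Expected layout: YYDDDX... with the day's hundreds digit at index 2.
    if (input.size() < 6 || input[5] != 'X')
        throw std::invalid_argument("unexpected format: " + input);
    if (input[2] < '0' || input[2] > '3')
        throw std::invalid_argument("day number out of range: " + input);
    input[2] += 4; // '0'..'3' becomes '4'..'7'; no carry is possible
    return input;
}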
It sounds like altering the string as you describe is easier than parsing the number out in the first place. So if your algorithm works (and it certainly does what you describe), I wouldn't consider it premature optimization.
Of course, after you add 400, it's no longer a day number, so you couldn't apply this process recursively.
And, <obligatory Year 2100 warning>.
A very long time ago I saw some x86 processor instructions for ASCII and BCD arithmetic.
Those are AAA (ASCII Adjust for Addition), AAS (subtraction), AAM (multiplication), and AAD (division).
But even if you are not sure about the target platform, you can refer to the specification of the character set you are using; you will find that the first 128 characters match ASCII in virtually every character set in use today (for Unicode, that is the first block, Basic Latin).
How can I find out what the current charset is in C++?
In a console application (WinXP) I am getting negative values for some characters (like äöüé) with
(int)mystring[a]
and this surprises me. I was expecting the values to be between 127 and 256.
So is there something like GetCharset() or SetCharset() in c++?
It depends on how you look at the value you have at hand. char can be signed (e.g. on Windows) or unsigned, as on some other systems. So, what you should do is print the value as unsigned to get what you are asking for.
C++ is, so far, character-set agnostic. For the Windows console specifically, you can use GetConsoleOutputCP.
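A minimal, Windows-only sketch of that call:

#include <windows.h>
#include <iostream>

int main() {
    // Prints the code page the console currently uses for output,
    // e.g. 850 or 1252 on many Western European systems.
    std::cout << "console output code page: " << GetConsoleOutputCP() << '\n';
    return 0;
}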
Look at std::numeric_limits<char>::min() and max(). Or CHAR_MIN and CHAR_MAX if you don't like typing, or if you need an integer constant expression.
If CHAR_MAX == UCHAR_MAX and CHAR_MIN == 0 then chars are unsigned (as you expected). If CHAR_MAX != UCHAR_MAX and CHAR_MIN < 0 they are signed (as you're seeing).
The standard (3.9.1/1) ensures that there are no other possibilities: "... a plain char can take on either the same values as a signed char or an unsigned char; which one is implementation-defined."
This tells you whether char is signed or unsigned, and that's what's confusing you. You certainly can't call anything to modify it: from the POV of a program it's baked into the compiler even if the compiler has ways of changing it (GCC certainly does: -fsigned-char and -funsigned-char).
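A quick sketch of that check:

#include <climits>
#include <iostream>
#include <limits>

int main() {
    // Both tests report the same thing; use whichever you prefer.
    std::cout << std::boolalpha
              << "char is signed (numeric_limits): "
              << std::numeric_limits<char>::is_signed << '\n'
              << "char is signed (CHAR_MIN < 0):   " << (CHAR_MIN < 0) << '\n';
    return 0;
}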
The usual way to deal with this is if you're going to cast a char to int, cast it through unsigned char first. So in your example, (int)(unsigned char)mystring[a]. This ensures you get a non-negative value.
It doesn't actually tell you what charset your implementation uses for char, but I don't think you need to know that. On Microsoft compilers, the answer is essentially that commonly-used character encoding "ISO-8859-mutter-mutter". This means that chars with 7-bit ASCII values are represented by that value, while values outside that range are ambiguous, and will be interpreted by a console or other recipient according to how that recipient is configured. ISO Latin 1 unless told otherwise.
Properly speaking, the way characters are interpreted is locale-specific, and the locale can be modified and interrogated using a whole bunch of stuff towards the end of the C++ standard that personally I've never gone through and can't advise on ;-)
Note that if there's a mismatch between the charset in effect, and the charset your console uses, then you could be in for trouble. But I think that's separate from your issue: whether chars can be negative or not is nothing to do with charsets, just whether char is signed.
chars are signed by default on most common platforms.
Try this to see the non-negative value:
cout << (int)(unsigned char)mystring[a] << endl;
The only guarantees that the standard provides are for members of the basic character set:
2.2 Character sets
3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.
Further, the type char is supposed to hold:
3.9.1 Fundamental types
1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set.
So, there is no guarantee that you will get the values you expect for the characters you have mentioned. However, try using an unsigned type to hold the value (for all practical purposes, it rarely makes sense to use a signed type to hold char values if you are going to print them or pass them around).