C++: Converting/casting string letters to integer values (ASCII)

I'm currently doing a project where I need to create two data structures that will be used to contain strings. One of them has to be a form of linked list, and I've been advised to separate the words out into separate lists inside of it, one for each letter of the alphabet. I'm required to think about efficiency, so I have an array of head pointers of size 26, and I want to convert the first character of the word given into an integer so I can use it as the subscript, such as:
//a string called s is passed as a parameter to the function
int i = /*some magic happens here*/ s.substr(0, 1);
currentPointer = heads[i]; //then I start iterating through the list
I've been searching around and all I've seemed to have found is how to convert number characters that are in strings into integers, and not letter characters, and am wondering how on earth I can get this working without resorting to a huge and ugly set of if statements.

When you are setting i to the value of the first character, you are getting the ASCII value.
So i is outside your 0-25 range; see man ascii.
You can reduce it by subtracting the ASCII value of the first letter of the alphabet. (Be careful with the case.)
std::string s("zoro");
int i = s[0];
std::cout << "Ascii value : " << i << " Reduced : " << i - 'a' << std::endl;
This produces the ASCII value 122 for 'z' and 25 for the reduced value, as expected.

I think you are confusing values with representations. "Ten", "10", and "1 1 1 1 1 1 1 1 1 1" are all the same value, just represented differently.
I've been searching around and all I've seemed to have found is how to convert number characters that are in strings into integers, and not letter characters
There's no difference. Characters are always represented by integers anyway. It's just a matter of representation. Just present the value the way you want.
By the way, this is a key concept programmers have to understand. So it's worth spending some time thinking about it.
A classic example of this misunderstanding is a question like "I have a variable i that has some value in decimal. How can I make it store a value in hex?" Of course that makes no sense: it stores values, and hex and decimal are representations. If you have ten cars, you have ten cars, not ten in decimal or hex. If i has the value ten, then the value ten is in i, not a decimal or hex representation of ten.
Of course, when you display the value stored in i, you have to choose how to represent it. You can display it as ten, or 10, or | | | | | | | | | |, or whatever.
And you might have a string that has a representation of the value "ten" in hex, and you might need to assign that value to a variable. That requires converting from a representation to the value it represents.
There are input and output functions that read and write values in various representations.
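For example, here is a small illustrative sketch (mine, not part of the original answer) showing the same value written out in two representations, and a hex representation held in a string converted back into a value:
#include <iostream>
#include <sstream>
#include <string>

int main() {
    int i = 10;                          // the value ten, however it was written in the source
    std::cout << i << '\n';              // decimal representation: 10
    std::cout << std::hex << i << '\n';  // hex representation: a

    std::string hexText = "a";           // a *representation* of ten, stored as text
    int value = 0;
    std::istringstream(hexText) >> std::hex >> value;  // representation -> value
    std::cout << std::dec << value << '\n';            // prints 10
}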

I suspect you want to convert digits stored in strings as characters to integers, e.g. character '9' to integer 9.
In order to do so:
char c = '9';
int x = c - '0';
That will work regardless of whether you have a computer using ASCII or EBCDIC...
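Extending the same idiom, here is a minimal sketch (my own, assuming the string contains only decimal digits) that converts a whole digit string into an integer by hand:
#include <iostream>
#include <string>

int main() {
    std::string digits = "9042";             // assumed to hold only '0'..'9'
    int value = 0;
    for (char c : digits)
        value = value * 10 + (c - '0');      // shift previous digits left, append the new one
    std::cout << value << '\n';              // prints 9042
}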

In this case, you don't seem to need atoi or itoa (neither is going to do anything very useful with, for example, J). You just want something like:
int i = tolower(s[0])-'a';
In theory that's not portable -- if there's any chance of the code being used on a machine that uses EBCDIC (i.e., an IBM or compatible mainframe) you'll want to use something like 'z'-'a'+1 as the size of your array, since it won't be exactly 26 (EBCDIC has other characters inserted between some of the letters, so the letters are in order but not contiguous).
Probably more importantly, if you want to support languages other than English, things change entirely in a hurry -- you might have a different number of letters than 26, they might not all be contiguous, etc. For such a case your basic design is really the problem. Rather than fixing that one line of code, you probably need to redesign almost completely.
As an aside, however, there's a pretty good chance that a linked list isn't a very good choice here.
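To tie this back to the original question, here is a minimal sketch of computing and using the bucket index; the Node type and heads array are hypothetical stand-ins, and it assumes ASCII and English letters only:
#include <cctype>
#include <string>

struct Node {
    std::string word;
    Node* next = nullptr;
};

Node* heads['z' - 'a' + 1] = {};   // 26 buckets on ASCII; larger (with gaps) on EBCDIC

// Returns the bucket index for a word, or -1 if it doesn't start with a letter.
int bucketIndex(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return -1;
    return std::tolower(static_cast<unsigned char>(s[0])) - 'a';
}

void insert(const std::string& word) {
    int i = bucketIndex(word);
    if (i < 0) return;                    // ignore words that don't start with a letter
    heads[i] = new Node{word, heads[i]};  // push onto the front of that bucket's list
}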

Related

Error in getting ASCII of character in C++

I saw this question: How to convert an ASCII char to its ASCII int value?
The most voted answer (https://stackoverflow.com/a/15999291/14911094) states the solution as :
Just do this:
int(k)
But I am having issues with this.
My code is:
std::cout << char(144) << std::endl;
std::cout << (int)(char(144)) << std::endl;
std::cout << int('É') << std::endl;
Now the output comes as :
É
-112
-55
Now I can understand the first line, but what is happening in the second and the third lines?
Firstly, how can an ASCII value be negative, and secondly, how can it be different for the same character?
Also, as far as I have tested, this is not some random garbage from memory, as it stays the same every time I run the program.
If I change it to 145:
æ
-111
The output also changes by 1, so my guess is that this may be due to some kind of overflow.
But I cannot work out exactly what, as I am converting to int and that should be enough (4 bytes) to store the result.
Can anyone suggest a solution?
If your platform is using ASCII for the character encoding (most do these days), then bear in mind that ASCII is only a 7 bit encoding.
It so happens that char is a signed type on your platform. (The signedness or otherwise of char doesn't matter for ASCII as only the first 7 bits are required.)
Hence char(144) gives you a char with a value of -112. (You have a 2's complement char type on your platform: from C++14 you can assume that, but you can't in C).
The third line implies that that character (which is not in the ASCII set) has a value of -55.
int(unsigned char('É'))
would force it to a positive value on all but the most exotic of platforms.
The C++ standard only guarantees that characters in the basic execution character set¹ have non-negative encodings. Characters outside that basic set may have negative encodings - it depends on the locale.
¹ The upper- and lowercase Latin alphabet, decimal digits, most punctuation, and control characters like tab, newline, form feed, etc.
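As a quick illustrative sketch (mine; it assumes a signed 8-bit char and a Latin-1-style encoding, as on the asker's platform), casting through unsigned char yields the non-negative value:
#include <iostream>

int main() {
    char c = char(144);                          // implementation-defined; -112 with a signed char
    std::cout << (int)c << '\n';                 // -112: the value is sign-extended when widened
    std::cout << int((unsigned char)c) << '\n';  // 144: reinterpret the byte as unsigned first
}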

Converting char to int in C

I have searched for this for quite some time before posting this question. The answer to it should be fairly easy though, since I am an ultra-beginner atm.
I have a char* in which I want a user to put some digits (over 20), that in turn can be called upon specifically.
This is what I've tried:
char* digits = GetString();
int prime = digits[0];
When I verify whether this worked with printf I find prime to have become 0.
printf("prime:%d\ndigits:%c\n",prime, digits[0]);
Why would this be and what could I do to make this work?
Edit: Is it perhaps easier to make an int array and use GetLongLong?
Neither C nor C++ guarantees what value will be used to encode the character '0', but both guarantee that the digits will be contiguous and ordered, so (for example) digits[0]-48 may or may not work, but digits[0] - '0' is guaranteed to work (presuming that digits[0] actually holds a digit, of course).
The precise requirement in the C++ standard (§2.3/3) is:
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
At least as of C99, the C standard has identical wording, but at §5.2.1/3.
In ASCII, the character zero ('0') has the numeric value 48, '1' is 49, and so on.
You may find this a useful idiom to get the numeric value from the character value:
int prime = digits[0] - '0';
You may also find looking at man ascii informative (or similar man page if you use some other charset).
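Here is a minimal sketch of that idiom applied to the whole input (written in C++ for consistency with the rest of the page; the CS50 GetString helper from the question is replaced by a hard-coded stand-in):
#include <iostream>
#include <string>

int main() {
    std::string digits = "31415926535897932384";  // stand-in for the user's input

    // Each character minus '0' gives that digit's numeric value, per the quoted guarantee.
    int prime = digits[0] - '0';
    std::cout << "prime: " << prime << '\n';       // prints 3, not the character code 51

    for (std::size_t k = 0; k < digits.size(); ++k)
        std::cout << "digit " << k << ": " << digits[k] - '0' << '\n';
}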

Unsigned byte for 10 conflicts with newline

Is there a way to differentiate the first value (which is a number 10 saved as an unsigned char) from the newline character in the following demo code?
#include <iostream>

int main() {
    unsigned char ch1(10), ch2('\n');
    std::cout << (int)ch1 << " " << (int)ch2 << std::endl;
}
The output is
10 10
I want to write such characters to a file as unsigned bytes, but I also want the newline character to be distinguishable from the number 10 when read back at a later time.
Any suggestions?
regards,
Nikhil
There is no way. You write the same byte, and preserve no other information.
You need to think of another way of encoding your values, or reserve one value as your sentinel (like 255 or 0). Of course, you need to be sure that this value is not present in your input.
Another possibility is to use one byte value as a 'special' escape character for your control codes, similar to how '\' is used to give special meaning to 'n' in '\n'. But it makes all parsing more complicated, as your values may now be one or two bytes long. Unless you are under tight memory pressure, I would advise storing values as their string representation; this is usually more readable.
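For illustration, here is a minimal sketch of such an escape scheme (my own, not from the answer), using 0xFF as a hypothetical escape byte:
#include <cstdint>
#include <vector>

const std::uint8_t ESC = 0xFF;  // hypothetical escape byte

// Escape any data byte that would be ambiguous, then terminate the record with a bare newline.
std::vector<std::uint8_t> encode(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> out;
    for (std::uint8_t b : data) {
        if (b == '\n' || b == ESC)   // these bytes would be ambiguous, so prefix them
            out.push_back(ESC);
        out.push_back(b);
    }
    out.push_back('\n');             // a bare newline marks the end of the record
    return out;
}

// Reverse the escaping: ESC means "take the next byte literally"; a bare newline ends the record.
std::vector<std::uint8_t> decode(const std::vector<std::uint8_t>& stream) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (stream[i] == ESC && i + 1 < stream.size())
            out.push_back(stream[++i]);
        else if (stream[i] == '\n')
            break;
        else
            out.push_back(stream[i]);
    }
    return out;
}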
No, a char of value 10 is a newline. Take a look at an ASCII table. You'll see that the text "10" would be two different chars ('1' and '0', with values 49 and 48 respectively).

Possible work around "Invalid Octal Digit" in char when typing Alt Keys

I am writing a program that executes the quadratic formula. My only problem is the actual formatting of the program. The alt keys that allow me to type the plus-minus sign and square root symbol are giving me some problems.
The problem exists within
cout<< 0-b << char(241) << char(251) << char(0178);
The last char, meant to produce the squared symbol (²), reports the invalid octal digit error. Is there a way around this, or will I have to settle for simply writing "x^2"?
You should just remove the leading 0 from 0178. A leading zero on a numeric constant is automatically treated as octal and 8 is not a valid octal digit.
In addition, the superscript-2 character you're referring to is decimal 178, U+00B2. Another way would be to just use '\xb2' in your code.
Of course, you also have to be certain that whatever is interpreting that output stream knows about the Unicode characters that you're trying to output. This probably depends on your terminal program or console. If it doesn't, you may have to resort to hacks like (x^2) or, even worse, monstrosities like:
      3     2
    3x  - 7x  + 42x - 1
y = -------------------
            12
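As an illustrative sketch (mine, and it assumes the console is set up for UTF-8 output), the special characters can be written with Unicode escapes instead of Alt-key characters:
#include <iostream>

int main() {
    int b = 7;  // hypothetical coefficient, just for the demo
    // \u00b1 is the plus-minus sign, \u221a the square-root sign, and \u00b2 the
    // superscript two; with a UTF-8 execution charset they print as ± √ ².
    std::cout << -b << " \u00b1 \u221a" << "x\u00b2" << '\n';
}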

Is this an acceptable use of "ASCII arithmetic"?

I've got a string value of the form 10123X123456 where 10 is the year, 123 is the day number within the year, and the rest is unique system-generated stuff. Under certain circumstances, I need to add 400 to the day number, so that the number above, for example, would become 10523X123456.
My first idea was to substring those three characters, convert them to an integer, add 400 to it, convert them back to a string and then call replace on the original string. That works.
But then it occurred to me that the only character I actually need to change is the third one, and that the original value would always be 0-3, so there would never be any "carrying" problems. It further occurred to me that the ASCII code points for the numbers are consecutive, so adding the number 4 to the character "0", for example, would result in "4", and so forth. So that's what I ended up doing.
My question is, is there any reason that won't always work? I generally avoid "ASCII arithmetic" on the grounds that it's not cross-platform or internationalization friendly. But it seems reasonable to assume that the code points for numbers will always be sequential, i.e., "4" will always be 1 more than "3". Anybody see any problem with this reasoning?
Here's the code.
string input = "10123X123456";
input[2] += 4;
//Output should be 10523X123456
From the C++ standard, section 2.2.3:
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
So yes, if you're guaranteed to never need a carry, you're good to go.
The C++ language definition requires that the code-point values of the numerals be consecutive. Therefore, ASCII arithmetic is perfectly acceptable.
Always keep in mind that if this is generated by something you do not entirely control (such as users or a third-party system), something can and will go wrong with it (see Murphy's law).
So I think you should at least add some validation before doing so, as in the sketch below.
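For example, a minimal sketch of such a validation (the function name and error handling are my own suggestion, not part of the question):
#include <cassert>
#include <stdexcept>
#include <string>

// Adds 400 to the embedded day number by bumping the hundreds digit.
std::string addFourHundred(std::string input) {
    if (input.size() < 3 || input[2] < '0' || input[2] > '3')
        throw std::invalid_argument("day number not in 0-399: " + input);
    input[2] += 4;   // '0'..'3' become '4'..'7'; the digits are contiguous, so no carry
    return input;
}

int main() {
    assert(addFourHundred("10123X123456") == "10523X123456");
}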
It sounds like altering the string as you describe is easier than parsing the number out in the first place. So if your algorithm works (and it certainly does what you describe), I wouldn't consider it premature optimization.
Of course, after you add 400, it's no longer a day number, so you couldn't apply this process recursively.
And, <obligatory Year 2100 warning>.
A very long time ago I saw some x86 processor instructions for ASCII and BCD arithmetic.
Those are AAA (ASCII Adjust After Addition), AAS (after subtraction), AAM (after multiplication), and AAD (before division).
But even if you are not sure about the target platform, you can refer to the specification of the character set you are using; I expect you'll find that the first 128 ASCII characters keep the same meaning in most common character sets (in Unicode they form the first block, Basic Latin).