What does str[i] - 'a' mean? [duplicate] - c++

This question already has answers here:
What's the meaning of subtracting 'a' from elements of char array
(5 answers)
Closed 1 year ago.
If we were to be iterating through a string str, and inside the for loop was a line like str[i] - 'a'... what does that exactly mean? str[i] would be returning a character from the string str, and then we would be subtracting 'a' from it? I'm just confused by that.

Assuming ASCII (or any encoding in which lowercase letters are in a compact sequence) it is the number (starting at 0) of the lowercase letter in str[i] (the ith position in str).
Very bad coding, unless you are absolutely sure it is a lowercase letter, and not e.g. LATIN-1 or some other 8-bit encoding in which you have lowercase letters like 'ñ' or 'á' or even 'ß' which are strewn about the high --first bit one-- space).

According to this The bracket operator [] of the std::string type returns a reference to a char.
Since (signed) char can be natively interpreted by the compiler as an 8-bit signed integer in the range -128 to 127, then all due mathematical operations can be made.
An example of common use AFAIK: subtracting or operating with chars on a (text) string can be used for adding salt to some cryptography algorithm.

Related

C++ Turning Character types into int type

So I read and was taught that subtracting '0' from my given character turns it into an int, however my Visual Studio isn't recognizing that here, saying a value of type "const char*" cannot be used to initialize an entity of type int in C++ programming here.
bigint::bigint(const char* number) : bigint() {
int number1 = number - '0'; // error code
for (int i = 0; number1 != 0 ; ++i)
{
digits[i] = number1 % 10;
number1 /= 10;
digits[i] = number1;
}
}
The goal of the first half is to simply turn the given number into a type int. The second half is outputting that number backwards with no leading zeroes. Please note this function is apart of the class declared given in a header file here:
class bigint {
public:
static const int MAX_DIGITS = 50;
private:
int digits[MAX_DIGITS];
public:
// constructors
bigint();
bigint(int number);
bigint(const char * number);
}
Is there any way to convert the char parameter to an int so I can then output an int? Without using the std library or strlen, since I know there is a way to use the '0' char but I can't seem to be doing it right.
You can turn a single character in the range '0'..'9' into a single digit 0..9 by subtracting '0', but you cannot turn a string of characters into a number by subtracting '0'. You need a parsing function like std::stoi() to do the conversion work character-by-character.
But that's not what you need here. If you convert the string to a number, you then have to take the number apart. The string is already in pieces, so:
bigint::bigint(const char* number) : bigint() {
while (number) // keep looping until we hit the string's null terminator
{
digits[i] = number - '0'; // store the digit for the current character
number++; // advance the string to the next character
}
}
There could be some extra work involved in a more advanced version, such as sizing digits appropriately to fit the number of digits in number. Currently we have no way to know how many slots are actually in use in digits, and this will lead to problems later when the program has to figure out where to stop reading digits.
I don't know what your understanding is, so I will go over everything I see in the code snippet.
First, what you're passing to the function is a pointer to a char, with const keyword making the char immutable or "read only" if you prefer.
A char is actually a 8-bit sized 1 integer. It can store a numerical value in binary form, which can be also interpreted as a character.
Fundamental types - cppreference.com
Standard also expects char to be a "type for character representation". It could be represented in ASCII code, but it could be something else like EBCDIC maybe, I'm not sure. For future reference just remember that ASCII is not guaranteed, although you're likely to never use a system where it's no ASCII (if I'm correct). But it's not so much that char is somehow enforcing encoding - it's the functions that you pass those chars and char pointers to, that interpret their content as characters in ASCII encoding, while on some obscure or legacy platforms they could actually interpret them as characters in some less common encoding. Standard however demands that encoding used has this property: codes for characters '0' to '9' are subsequent, and thus '9' - '0' means: subtract code of '0' from code of '9'. The result is 9, because code for '9' is 9 positions from code for '0' in ASCII. Ranges 'a'-'z' and 'A'-'Z' have this quality as well, in case you need that, but it's a little bit trickier if your input is in base higher than 10, like a popular base of 16 called hexadecimal.
A pointer stores an address, so the most basic functionality for it is to "point" to a variable. But it can be used in various ways, one of which, very frequent in C, is to store address of the beginning of an array of variables of the same type. Those could be chars. We could interpret such an array as a line of text, or a string (a concept, not to be confused with C++ specific string class).
Since a pointer does not contain information on length or end of such an array, we need to get that information across to the function we pass the pointer to. Sometimes we can just provide the length, sometimes we provide the end pointer. When dealing with "lines of text" or c-style strings, we use (and c standard library functions expect) what is callled a null-terminated string. In such a string, the first char after the last one used for a line is a null, which is, to simplify, basically a 0. A 0, but not a '0'.
So what you're passing to the function, and what you interpret as, say 416, is actually a pointer to a place in memory where '4' is econded and stored as a number, followed by '1' and then '6', taking up three bytes. And depending on how you obtained this line of text, '6' is probably followed by a NULL, that is - a zero.
NULL - cppreference.com
Conversion of such a string to a number first requires a data type able to hold it. In case of 416 it could be anything from short upwards. If you wanted to do that on your own, you would need to iterate over entire line of text and add the numbers multiplied by proper powers of 10, take care of signedness too and maybe check if there are any edge cases. You could however use a standard function like int atoi (const char * str);
atoi - cplusplus.com
Now, that would be nice of course, but you're trying to work with "bigints". However you define them, it means your class' purpose is to deal with numbers to big to be stored in built-in types. So there is no way you can convert them just like that.
What you're trying to do right now seems to be a constructor that creates a bigint out of number represented as a c style string. How shall I put it... you want to store your bigint internally as an array of it's digits in base 10 (a good choice for code simplicity, readability and maintainability, as well as interoperation with base 10 textual representation, but it doesn't make efficient use of memory and processing power.) and your input is also an array of digits in base 10, except internally you're storing numbers as numbers, while your input is encoded characters. You need to:
sanitize the input (you need criteria for what kind of input is acceptable, fe. if there can be any leading or trailing whitespace, can the number be followed by any non-numerical characters to be discarded, how to represent signedness, is + for positive numbers optional or forbidden etc., throw exception if the input is invalid.
convert whatever standard you enforce for your input into whatever uniform standard you employ internally, fe. strip leading whitespace, remove + sign if it's optional and you don't use it internally etc.
when you know which positions in your internal array correspond with which positions in the input string, you can iterate over it and copy every number, decoding it first from ASCII.
A side note - I can't be sure as to what exactly it is that you expect your input to be, because it's only likely that it is a textual representation - as it could just as easily be an array of unencoded chars. Of course it's obviously the former, which I know because of your post, but the function prototype (the line with return type and argument types) does not assure anyone about that. Just another thing to be aware of.
Hope this answer helped you understand what is happening there.
PS. I cannot emphasize strongly enough that the biggest problem with your code is that even if this line worked:
int number1 = number - '0'; // error code
You'd be trying to store a number on the order of 10^50 into a variable capable of holding on the order of 10^9
The crucial part in this problem, which I have a vague feeling you may have found on spoj.com is that you're handling BIGints. Integers too big to be stored in a trivial manner.
1 ) The standard does not actually require for char to be this size directly, but indirectly it requires for it to be at least 8 bits, possibly more on obscure platforms. And yes, I think there were some platforms where it was indeed over 8 bits. Same thing with pointers that may behave strange on obscure architectures.

How to mix hexadecimal char and normal char in string literal in C++? [duplicate]

This question already has answers here:
How to properly add hex escapes into a string-literal?
(3 answers)
Limit the Length of a Hexadecimal Escape Sequence in a C-String [duplicate]
(1 answer)
Closed 4 years ago.
Is it possible to mix '\xfd' and 'a' in a single string literal?
For example:
unsigned char buff1[] = "\xfda";
unsigned char buff1[] = "\x0f\x0015899999999";
VC++2015 reports:
Error C2022 '-1717986919': too big for character
As mentioned by the other answer '\xfda' is considered as a single hex character literal. To get a string literal with '\xfd' and 'a' you need to split the string.
"\xfd" "a"
Adjacent string literal tokens are concatenated, which means that for example "ab" "cd" is the same as "abcd".
You will not be able to do so using a hex character literal in a single string. [lex.ccon]/8 states
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_­t (for character literals prefixed by L). [ Note: If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed. — end note ]
emphasis mine
This means '\xfda' is considered a single hex character literal since all of its digits are valid hex digits. What you can do is use multiple string literals that will be concatenated for you to break it up like
unsigned char buff1[] = "\xfd" "a";
Another option would be to switch to using an octal literal if you want 'a' to be part of the string. That would be "\375a".
Not possible, as explained well in NathanOliver's answer. But there is also no need you can simply use two literals:
unsigned char buff1[] = "\x0f\x00""15899999999";

C++ casting a single element of a string

I have recently tried to convert an element of a string (built from digits only) to int and in order to avoid bizzare results (like 49 instead of 1) I had to use 'stringstream' (not even knowing what it is and how it works) instead of int variable = static_cast<int>(my_string_name[digit_position]);.
According to what I've read about these 'streams' the type is irrelevant when you use them so I guess that was the case. The question is: What type did I actually want to convert FROM and why didn't it work? I thought that a string element is a char but apparently not if the conversion didn't work.
I thought that a string element is a char but apparently not if the conversion didn't work.
Yes, it is a char. However, the value of this char is not how it is rendered on the screen. This actually depends on the encoding of the string. For example, if your string is encoded in ASCII, you can see that the "bizarre" 49 value that you got must have been represented as a '1' character. Building on that, you can subtract the ASCII code of the character '0' to get the numeric values of these characters. However, be very careful: this greatly depends on your encoding. It is possible that your string may use multi-byte characters when you can't even index them this naively.
I believe that what you are looking for is not a cast, but the std::stoi (http://en.cppreference.com/w/cpp/string/basic_string/stol) function, which converts a string representation of an integer to int.
char values are already numeric, such a static_cast won't help you much here. What you actually need is:
int variable = my_string_name[digit_position] - '0';
For ASCII values the digits '0' - '9' are encoded as decimal numbers from 49 - 59. So to get the numerical value that is represented by the digit we need to substract the 1st digits value of 49.
To make the code portable for other character encoding tables (e.g. EBCDIC) the '0' is substracted.
Assume you have a string: std::string myString("1");. You could access the first element which is type char and you convert it to type int via static_cast. However, your expectation of what will happen is incorrect.
Upon looking at the ASCII table for number 1 you will see it has a base 10 integer value of 49, which is why the cast gives the output you're seeing. Instead you could use a conversion function like atoi (e.g., atoi(&myString[0]);).
Your "bizzare results" aren't actually that bizzare. When accessing a single character of a std::string like so : my_string_name[digit_position] you're getting a char. Putting that char in an int will convert '1' to an int. Looking at http://www.asciitable.com/ we can see that 49 is the representation for '1'. To get the actual number you should do int number = my_string_name[digit_position] - '0';
You are working with the ASCII char code for a number, not an actual number. However because the ASCII char codes are laid out in order:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
So if you are converting a single character and you're character is in base-10 then you can use:
int variable = my_string_name[digit_position] - '0'
But this depends upon the use of character codes which are in sequence. Specifically, if there were any other characters interspersed here this wouldn't work at all (I don't know any character code mapping that doesn't have these listed sequentially.)
But to guarantee the conversion you should use stoi:
int variable = stoi(my_string_name.substr(digit_position, 1))

subtracting characters of numbers

I am trying to understand what is going on with the code:
cout << '5' - '3';
Is what I am printing an int? Why does it automatically change them to ints when I use the subtraction operator?
In C++ character literals just denote integer values.
A basic literal like '5' denotes a char integer value, which with almost all extant character encodings is 48 + 5 (because the character 0 is represented as value 48, and the C++ standard guarantees that the digit values are consecutive, although there's no such guarantee for letters).
Then, when you use them in an arithmetic expression, or even just write +'5', the char values are promoted to int. Or less imprecisely, the “usual arithmetic conversions” kick in, and convert up to the nearest type that is int or *higher that can represent all char values. This change of type affects how e.g. cout will present the value.
* Since a char is a single byte by definition, and since int can't be less than one byte, and since in practice all bits of an int are value representation bits, it's at best only in the most pedantic formal that a char can be converted up to a higher type than int. If that possibility exists in the formal, then it's pure language lawyer stuff.
What you're doing here is subtracting the value for ASCII character '5' from the value for ASCII character '3'. So '5' - '3' is equivalent to 53 - 51 which results in 2.
The ASCII value of character is here
Every character in C programming is given an integer value to represent it. That integer value is known as ASCII value of that character. For example: ASCII value of 'a' is 97. For example: If you try to store character 'a' in a char type variable, ASCII value of that character is stored which is 97.
Subtracting between '5' and '3' means subtracting between their ASCII value. So, replace cout << '5' - '3'; with their ASCII value cout << 53 - 51;. Because Every character in C programming is given an integer value to represent it.
There is a subtraction operation between two integer number, so, it prints a integer 2

Printing char by integer qualifier

I am trying to execute the below program.
#‎include‬ "stdio.h"
#include "string.h"
void main()
{
char c='\08';
printf("%d",c);
}
I'm getting the output as 56 . But for any numbers other than 8 , the output is the number itself , but for 8 the answer is 56.
Can somebody explain ?
A characters that begins with \0 represents Octal number, is the base-8 number system, and uses the digits 0 to 7. So \08 is invalid representation of octal number because 8 ∉ [0, 7], hence you're getting implementation-defined behavior.
Probably your compiler recognize a Multibyte Character '\08' as '\0' one character and '8' as another and interprets as '\08' as '\0' + '8' which makes it '8'. After looking at the ASCII table, you'll note that the decimal value of '8' is 56.
Thanks to #DarkDust, #GrijeshChauhan and #EricPostpischil.
The value '\08' is considered to be a multi-character constant, consisting of \0 (which evaluates to the number 0) and the ASCII character 8 (which evaluates to decimal 56). How it's interpreted is implementation defined. The C99 standard says:
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
So if you would assign '\08' to something bigger than a char, like int or long, it would even be valid. But since you assign it to a char you're "chopping off" some part. Which part is probably also implementation/machine dependent. In your case it happens to gives you value of the 8 (the ASCII character which evaluates to the number 56).
Both GCC and Clang do warn about this problem with "warning: multi-character character constant".
\0 is used to represent octal numbers in C/C++. Octal base numbers are from 0->7 so \08 is a multi-character constant, consisting of \0, the compiler interprets \08 as \0 + 8, which makes it '8' whose ascii value is 56 . Thats why you are getting 56 as output.
As other answers have said, these kind of numbers represent octal characters (base 8). This means that you have to write '\010' for 8, '\011' for 9, etc.
There are other ways to write your assign:
char c = 8;
char c = '\x8'; // hexadecimal (base 16) numbers