Beginner: tolower in a vector - c++

So I have a code for counting different characters in a textfile and I dont quite understand what the bottom codeline in the following section do:
string linje;
int nLetters = 'Z' - 'A' +1;
vector<int> bokstaver(nLetters, 0);
int antallTegn = 0;
while(getline(inputfile, linje)){
for(char tegn:linje){
if(isLetter(tegn)){
antallTegn++;
bokstaver[tolower(tegn)-'a']++;
I know it converts the tegn variable to a lower case but I dont understand why we have to subtract 'a'.

Characters are represented by integers in computers. Each integer value represent a character (this gets more complicated with UNICODE, but that's beyond the scope of this question). So, 'a' has a numerical value, as does tolower(tegn).
Numbers can be thought as a line of numbers, where the value is the position of the number on that line. Similarly, number-encoded characters can be thought of as characters in a line where their numerical value is their position.
A number line:
0 1 2 3 4
A character line:
, . - a b c d
Subtraction of two numbers is analogous to their distance on the number line. Similarly, subtracting two characters is their distance on the character line.
So, bokstaver is an array whose indices are positions on the character line, offset by the position of the character 'a'.

Characters are represented using character codes in computers.
For example, character code representing for a is 0x61 (97) in ASCII code.
Therefore, you have to subtract 'a' (character code for a) in order to convert the character code in the input string to index for the vector starting with zero.
This method will work when character codes for alphabets are continuous. ASCII code satisfies this condition.

Related

Slicing string character correctly in C++

I'd like to count number 1 in my input, for example,111 (1+1+1) must return 3and
101must return 2 (1+1)
To achieve this,I developed sample code as follows.
#include <iostream>
using namespace std;
int main(){
string S;
cout<<"input number";
cin>>S;
cout<<"S[0]:"<<S[0]<<endl;
cout<<"S[1]:"<<S[1]<<endl;
cout<<"S[2]:"<<S[2]<<endl;
int T = (int) (S[0]+S[1]+S[2]);
cout<<"T:"<<T<<endl;
return 0;
}
But when I execute this code I input 111 for example and my expected return is 3 but it returned 147.
[ec2-user#ip-10-0-1-187 atcoder]$ ./a.out
input number
111
S[0]:1
S[1]:1
S[2]:1
T:147
What is the wrong point of that ? I am totally novice, so that if someone has opinion,please let me know. Thanks
It's because S[0] is a char. You are adding the character values of these digits, rather than the numerical value. In ASCII, numerical digits start at value 48. In other words, each of your 3 values are exactly 48 too big.
So instead of doing 1+1+1, you're doing 49+49+49.
The simplest way to convert from character value to digit is to subtract 48, which is the value of 0.
e.g, S[0] - '0'.
Since your goal is to count the occurrences of a character, it makes no sense to sum the characters together. I recommend this:
std::cout << std::ranges::count(S, '1');
To explain the output that you get, characters are integers whose values represent various symbols (and non-printable control characters). The value that represents the symbol '1' is not 1. '1'+'1'+'1' is not '3'.

What does '0' mean in a subtraction? [duplicate]

This question already has answers here:
C++- Adding or subtracting '0' from a value
(4 answers)
Closed 3 years ago.
class Complex
{
public:
int a,b;
void input(string s)
{
int v1=0;
int i=0;
while(s[i]!='+')
{
v1=v1*10+s[i]-'0'; // <<---------------------------here
i++;
}
while(s[i]==' ' || s[i]=='+'||s[i]=='i')
{
i++;
}
int v2=0;
while(i<s.length())
{
v2=v2*10+s[i]-'0';
i++;
}
a=v1;
b=v2;
}
};
This is a class complex and the function input inputs string and convert it into integers a and b of class complex.
what is the requirement of subtracting '0' in this code
The characters representing the digits, '0' thru '9' have values that are (and must be) sequential. For example, in the ASCII character set the '0' character is encoded with the value 48 (decimal), '1' is 49, '2' is 50 and so on, until '9', which is 57. Other encoding systems may use different actual values for the digits (for example, in EBCDIC, '0' is 240 and '9' is 249), but the C standard requires that they are sequentially congruent. From §5.2.1 of the C11 (ISO/IEC 9899:201x) Draft:
In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
Thus, when you subtract the '0' character from another character that represents a digit, you get the numerical value of that digit (rather than its encoded value).
So, in the code:
int a = '6' - '0';
the value of the a will be 6 (and similarly for other digits).
The reason for not just using a value of (say) 48, rather than writing '0' is that the former would only work on systems that use that particular (i.e. ASCII) character encoding, whereas the latter will work on any compliant system.
"What does '0' means in c++" - The symbol '0' designates a single character (constant) with the value 0, which, when interpreted as an ASCII character (which it will be) has the numerical value 0x30 (or 48 in decimal). So, you are basically just subtracting 48.
I dont quite understand the logic of this function but I hope this will help:
'0' is a character literal for 0 in ASCII. The [] operator of string returns a character. So most likely s[i] - '0' is supposed to get you the digit stored in s[i] as a character. Example: '3' -'0' = 3. Note lack of ' around the 3.
The C and C++ standards require that the characters '0'..'9' be
contiguous and increasing. So to convert one of those characters to
the digit that it represents you subtract '0' and to convert a digit
to the character that represents it you add '0'.
In this case the goal is to convert the character in the integer digit that represent.

Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of the string is effected with the presence of a '\0' character. I searched all over in SO and could not get the answer still.
Here is the snippet.
int main()
{
std::string a = "123123\0shai\0";
std::cout << a.length();
}
http://ideone.com/W6Bhfl
The output in this case is
6
Where as the same program with a different string having numerals instead of characters
int main()
{
std::string a = "123123\0123\0";
std::cout << a.length();
}
http://ideone.com/mtfS50
gives an output of
8
What exactly is happening under the hood? How does presence of a '\0' character change the behavior?
The sequence \012 when used in a string (or character) literal is an octal escape sequence. It's the octal number 12 which corresponds to the ASCII linefeed ('\n') character.
That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).
It would have been very clear if you tried to print the contents of the string.
Octal sequences are one to three digits long, and the compiler will use as many digits as possible.
If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.

Casting an int to a char. Not storing the correct value

I'm trying to store a number as a character in a char vector named code
code->at(i) = static_cast<char>(distribution(generator));
However it is not storing the way I think it should
for some shouldn't '\x4' be the ascii value for 4? if not how do I achieve that result?
Here's another vector who's values were entered correctly.
You are casting without actually converting the int to a char. You need:
code->at(i) = distribution(generator) + '0';
No. \xN does not give you the ASCII code for the character N.
\xN is the ASCII character† whose code is N (in hexadecimal form).
So, when you write '\x4', you get the [unprintable] character with the ASCII code 4. Upon conversion to an integer, this value is still 4.
If you wanted the ASCII character that looks like 4, you'd write '\x34' because 34 is 4's ASCII code. You could also get there using some magic, based on numbers in ASCII being contiguous and starting from '0':
code->at(i) = '0' + distribution(generator);
† Ish.

what does that mean, C programm for RLE

I am new to C so I do not understand what is happening in this line:
out[counter++] = recurring_count + '0';
What does +'0' mean?
Additionally, can you please help me by writing comments for most of the code? I don't understand it well, so I hope you can help me. Thank you.
#include "stdafx.h"
#include "stdafx.h"
#include<iostream>
void encode(char mass[], char* out, int size)
{
int counter = 0;
int recurring_count = 0;
for (int i = 0; i < size - 1; i++)
{
if (mass[i] != mass[i + 1])
{
recurring_count++;
out[counter++] = mass[i];
out[counter++] = recurring_count + '0';
recurring_count = 0;
}
else
{
recurring_count++;
}
}
}
int main()
{
char data[] = "yyyyyyttttt";
int size = sizeof(data) / sizeof(data[0]);
char * out = new char[size + 1]();
encode(data, out, size);
std::cout << out;
delete[] out;
std::cin.get();
return 0;
}
It adds the character encoding value of '0' to the value in recurring_count. If we assume ASCII encoded characters, that means adding 48.
This is common practice for making a "readable" digit from a integer value in the range 0..9 - in other words, convert a single digit number to an actual digit representation in a character form. And as long as all digits are "in sequence" (only digits between 0 and 9), it works for any encoding, not just ASCII - so a computer using EBCDIC encoding would still have the same effect.
recurring_count + '0' is a simple way of converting the int recurring_count value into an ascii character.
As you can see over on wikipedia the ascii character code of 0 is 48. Adding the value to that takes you to the corresponding character code for that value.
You see, computers may not really know about letters, digits, symbols; like the letter a, or the digit 1, or the symbol ?. All they know is zeroes and ones. True or not. To exist or not.
Here's one bit: 1
Here's another one: 0
These two are only things that a bit can be, existence or absence.
Yet computers can know about, say, 5. How? Well, 5 is 5 only in base 10; in base 4, it would be a 11, and in base 2, it would be 101. You don't have to know about the base 4, but let's examine the base 2 one, to make sure you know about that:
How would you represent 0 if you had only 0s and 1s? 0, right? You probably would also represent the 1 as 1. Then for 2? Well, you'd write 2 if you could, but you can't... So you write 10 instead.
This is exactly analogous to what you do while advancing from 9 to 10 in base 10. You cannot write 10 inside a single digit, so you rather reset the last digit to zero, and increase the next digit by one. Same thing while advancing from 19 to 20, you attempt to increase 9 by one, but you can't, because there is no single digit representation of 10 in base 10, so you rather reset that digit, and increase the next digit.
This is how you represent numbers with just 0s and 1s.
Now that you have numbers, how would you represent letters and symbols and character-digits, like the 4 and 3 inside the silly string L4M3 for example? You could map them; map them so, for example, that the number 1 would from then on represent the character A, and then 2 would represent B.
Of course, it would be a little problematic; because when you do that the number 1 would represent both the number 1 and the character A. This is exactly the reason why if you write...
printf( "%d %c", 65, 65 );
You will have the output "65 A", provided that the environment you're on is using ASCII encoding, because in ASCII 65 has been mapped to represent A when interpreted as a character. A full list can be found over there.
In short
'A' with single quotes around delivers the message that, "Hey, this A over here is to receive whatever the representative integer value of A is", and in most environments it will just be 65. Same for '0', which evaluates to 48 with ASCII encoding.