How many bytes does a string take? A char? - c++

I'm doing a review of my first semester C++ class, and I think I'm missing something. How many bytes does a string take up? A char?
The examples we were given are, some being character literals and some being strings:
'n', "n", '\n', "\n", "\\n", ""
I'm particularly confused by the usage of newlines in there.

#include <iostream>
int main()
{
std::cout << sizeof 'n' << std::endl; // 1
std::cout << sizeof "n" << std::endl; // 2
std::cout << sizeof '\n' << std::endl; // 1
std::cout << sizeof "\n" << std::endl; // 2
std::cout << sizeof "\\n" << std::endl; // 3
std::cout << sizeof "" << std::endl; // 1
}
Single quotes indicate characters.
Double quotes indicate C-style strings with an invisible NUL terminator.
\n (line break) is only a single char and so is \\ (backslash). \\n is just a backslash followed by n.

'n': not a string, but a literal char, one byte, the character code for the letter n.
"n": string, two bytes, one for n and one for the null character every string has at the end.
"\n": two bytes, as \n stands for "new line" which takes one byte, plus one byte for the null char.
'\n': same as the first, literal char, not a string, one byte.
"\\n": three bytes: one for \, one for n and one for the null character
"": one byte, just the null character.

A char, by definition, takes up one byte.
Literals using ' are char literals; literals using " are string literals.
A string literal is implicitly null-terminated, so it will take up one more byte than the observable number of characters in the literal.
\ is the escape character and \n is a newline character.
Put these together and you should be able to figure it out.
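To make the rules above concrete, here is a minimal sketch that checks each literal from the question at compile time. The helper name visible_length is made up for illustration; it only wraps std::strlen to show that the terminator is counted by sizeof but not by strlen.

```cpp
#include <cstddef>
#include <cstring>

// sizeof on a string literal counts the hidden terminator.
static_assert(sizeof('n') == 1, "a char literal is one byte in C++");
static_assert(sizeof("n") == 2, "'n' plus the terminator");
static_assert(sizeof('\n') == 1, "an escape sequence is still one char");
static_assert(sizeof("\n") == 2, "newline plus the terminator");
static_assert(sizeof("\\n") == 3, "backslash, 'n', terminator");
static_assert(sizeof("") == 1, "only the terminator");

// Hypothetical helper: number of characters before the terminator.
std::size_t visible_length(const char* s) {
    return std::strlen(s);
}
```

For any of these literals, sizeof is visible_length plus one.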

The following will take x consecutive chars in memory:
'n' - 1 char (type char)
"n" - 2 chars (above plus zero character) (type const char[2])
'\n' - 1 char
"\n" - 2 chars
"\\n" - 3 chars ('\', 'n', and zero)
"" - 1 char
edit: formatting fixed
edit2: I've written something very stupid, thanks Mooing Duck for pointing that out.

The number of bytes a string takes up is equal to the number of characters in the string plus 1 (the terminator), times the number of bytes per character. The number of bytes per character can vary. It is 1 byte for a regular char type.
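As a rough sketch of that formula, the hypothetical helper literal_bytes below multiplies the array length of a literal (characters plus terminator) by the size of its character type. The wide and Unicode literal prefixes are standard C++11; the exact size of wchar_t varies by platform, so the checks use sizeof rather than fixed numbers.

```cpp
#include <cstddef>

// Hypothetical helper: bytes occupied by a string literal.
// N already includes the terminator, so this is
// (characters + 1) * bytes per character.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_bytes(const CharT (&)[N]) {
    return N * sizeof(CharT);
}

static_assert(literal_bytes("n") == 2, "char: 2 * 1");
static_assert(literal_bytes(u"n") == 2 * sizeof(char16_t), "UTF-16 literal");
static_assert(literal_bytes(U"n") == 2 * sizeof(char32_t), "UTF-32 literal");
static_assert(literal_bytes(L"n") == 2 * sizeof(wchar_t), "wide literal");
```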
All your examples are one character long except for the second to last, which is two, and the last, which is zero. (Some are of type char and only define a single character.)

'n' -> One char. A char is always 1 byte. This is not a string.
"n" -> A string literal, containing one n and one terminating NULL char. So 2 bytes.
'\n' -> One char. A char is always 1 byte. This is not a string.
"\n" -> A string literal, containing one \n and one terminating NULL char. So 2 bytes.
"\\n" -> A string literal, containing one \, one n, and one terminating NULL char. So 3 bytes.
"" -> A string literal, containing one terminating NULL char. So 1 byte.

You appear to be referring to string constants. And distinguishing them from character constants.
A char is one byte on all architectures. A character constant uses the single quote delimiter '.
A string is a contiguous sequence of characters with a trailing NUL character to identify the end of string. A string uses double quote characters '"'.
Also, you introduce the C string constant expression syntax which uses backslashes to indicate special characters. \n is one character in a string constant.
So for the examples 'n', "n", '\n', "\n":
'n' is one character
"n" is a string with one character, but it takes two characters of storage (one for the letter n and one for the NUL)
'\n' is one character, the newline (ctrl-J on ASCII based systems)
"\n" is one character plus a NUL.
I leave the others to puzzle out based on those.
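One way to check the remaining cases is to dump the bytes a literal actually occupies. This is only a sketch: hex_bytes is a made-up helper name, and the expected values in the usage note assume an ASCII-based execution character set.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render the n bytes starting at p as hex,
// separated by spaces.
std::string hex_bytes(const void* p, std::size_t n) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Called as hex_bytes("\\n", sizeof "\\n"), this yields "5c 6e 00" on an ASCII system: the backslash, the letter n, and the NUL.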

'n' - 0x6e
"n" - 0x6e00
'\n' - 0x0a
"\n" - 0x0a00
"\\n" - 0x5c6e00
"" - 0x00

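It depends on the encoding: a character like T takes 1 byte in UTF-8 and 2 bytes in UTF-16 (in C++ a char itself is always 1 byte; UTF-16 text uses char16_t). It doesn't matter whether the byte is 00000001 or 10000000: a full byte is reserved for the character once it is declared, and if the char changes, that storage is updated with the new value.
The characters of a string take up the number of characters between "" times the bytes per character; a C-style string literal adds one terminating null character on top of that.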
example: 11111111 is a filled byte,
UTF8 char T = 01010100 (1 byte)
UTF16 char T = 01010100 00000000 (2 bytes)
UTF8 string "coding" = 011000110110111101100100011010010110111001100111 (6 bytes)
UTF16 string "coding" = 011000110000000001101111000000000110010000000000011010010000000001101110000000000110011100000000 (12 bytes)
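UTF8 \n as typed in source (backslash, n) = 0101110001101110 (2 bytes), but the compiler turns the escape into a single newline, 00001010 (1 byte)
UTF16 \n as typed in source = 01011100000000000110111000000000 (4 bytes), compiled to 00001010 00000000 (2 bytes)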
Note: Every space and every character you type takes up 1-2 bytes, but there is so much memory available that unless you are writing code for a computer or game console from the early 90s with 4 MB or less, you shouldn't worry about bytes with regard to strings or chars.
Things that are more of a problem are heavy computations with floats, decimals, or doubles, and calling a random-number function in a loop or update method. Those are better run once at startup or on a fixed time step and averaged over the time span.

Related

When should I use single quotes and double quotes in C or C++ programming?
In C and in C++ single quotes identify a single character, while double quotes create a string literal. 'a' is a single a character literal, while "a" is a string literal containing an 'a' and a null terminator (that is a 2 char array).
In C++ the type of a character literal is char, but note that in C, the type of a character literal is int, that is sizeof 'a' is 4 in an architecture where ints are 32bit (and CHAR_BIT is 8), while sizeof(char) is 1 everywhere.
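The C++ half of that statement can be checked at compile time. The sketch below is C++ only (the C behaviour cannot be shown in the same translation unit), and promoted_size is a made-up helper name.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype('a'), char>::value,
              "in C++, 'a' has type char");
static_assert(sizeof('a') == 1, "so sizeof 'a' is 1");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// In arithmetic the char is promoted to int, which is the size that
// C reports for the literal itself.
static_assert(std::is_same<decltype('a' + 0), int>::value,
              "integral promotion to int");

// Hypothetical helper so the promoted size can be inspected at run time.
constexpr std::size_t promoted_size() { return sizeof('a' + 0); }
```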
Some compilers also implement an extension, that allows multi-character constants. The C99 standard says:
6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
This could look like this, for instance:
const uint32_t png_ihdr = 'IHDR';
The resulting constant (in GCC, which implements this) has the value you get by taking each character and shifting it up, so that 'I' ends up in the most significant bits of the 32-bit value. Obviously, you shouldn't rely on this if you are writing platform independent code.
Single quotes are characters (char), double quotes are null-terminated strings (arrays of char, usually handled through a char *).
char c = 'x';
const char *s = "Hello World";
'x' is an integer, representing the numerical value of the letter x in the machine's character set
"x" is an array of characters, two characters long, consisting of 'x' followed by '\0'
I was poking around with stuff like int cc = 'cc'; and it turns out that it's basically a byte-wise copy into an integer. The way to look at it is that 'cc', which is two c's, is copied into the lower 2 bytes of the integer cc. If you are looking for trivia, then
printf("%d %d", 'c', 'cc'); would give:
99 25443
that's because 25443 = 99 + 256*99
So 'cc' is a multi-character constant and not a string.
Cheers
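The arithmetic behind 25443 can be reproduced without relying on the implementation-defined multi-character constant itself. This is a sketch that assumes ASCII ('c' is 99) and the GCC ordering described above, with the first character in the higher byte; pack_two is a made-up helper name.

```cpp
// Hypothetical helper: combine two character codes the way GCC is
// described to combine a two-character constant.
constexpr int pack_two(char hi, char lo) {
    return (static_cast<unsigned char>(hi) << 8) |
           static_cast<unsigned char>(lo);
}

// Assumes an ASCII execution character set, where 'c' is 99.
static_assert(pack_two('c', 'c') == 99 * 256 + 99, "99 + 256*99");
static_assert(pack_two('c', 'c') == 25443, "matches the printf output");
```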
Single quotes are for a single character. Double quotes are for a string (array of characters). You can use single quotes to build up a string one character at a time, if you like.
char myChar = 'A';
char myString[] = "Hello Mum";
char myOtherString[] = { 'H','e','l','l','o','\0' };
single quote is for character;
double quote is for string.
In C, single-quotes such as 'a' indicate character constants whereas "a" is an array of characters, always terminated with the \0 character
Double quotes are for string literals, e.g.:
char str[] = "Hello world";
Single quotes are for single character literals, e.g.:
char c = 'x';
EDIT As David stated in another answer, in C the type of a character literal is int.
A single quote is used for character, while double quotes are used for strings.
For example...
printf("%c \n",'a');
printf("%s","Hello World");
Output
a
Hello World
If you swap them and use a single quote for the string and double quotes for the character, this will be the result:
printf("%c \n","a");
printf("%s",'Hello World');
Output:
For the first line, you will get a garbage or unexpected value, or you may get output like this:
�
For the second statement, you will see nothing. One more thing: if you have more statements after this, they will also give you no result.
Note: PHP language gives you the flexibility to use single and double-quotes easily.
Use single quote with single char as:
char ch = 'a';
here 'a' is a char constant and is equal to the ASCII value of char a.
Use double quote with strings as:
char str[] = "foo";
here "foo" is a string literal.
It's okay to use "a" but it's not okay to use 'foo'
Single quotes are denoting a char, double denote a string.
In Java, it is also the same.
While I'm sure this doesn't answer what the original asker asked, in case you end up here looking for single quote in literal integers like I have...
C++14 added the ability to add single quotes (') in the middle of number literals to add some visual grouping to the numbers.
constexpr int oneBillion = 1'000'000'000;
constexpr int binary = 0b1010'0101;
constexpr int hex = 0x12'34'5678;
constexpr double pi = 3.1415926535'8979323846'2643383279'5028841971'6939937510;
In C and C++, single quotes denote a character ('a') whereas double quotes denote a string ("Hello"). The difference is that a character can hold any value but only one letter, digit, etc. A string can hold any number of them.
But also remember that there is a difference between '1' and 1.
If you type
cout<<'1'<<endl<<1;
The output would be the same, but not in this case:
cout<<int('1')<<endl<<int(1);
This time the first line would be 49, because converting a character to an int yields its character code, and the ASCII code for '1' is 49.
Similarly, if you do:
string s="Hi";
s+=49; // This will add "1" to the string
s+="1"; // This will also add "1" to the string
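A small sketch of the difference between the character '1' and the number 1, assuming ASCII, where '0' is 48 and '1' is 49. The helper name append_both is made up for illustration.

```cpp
#include <string>

// Assumes ASCII: the character '1' has code 49, not 1.
static_assert('1' == 49, "ASCII code of '1'");
static_assert('1' - '0' == 1, "digit value, valid in any C++ character set");

// Hypothetical helper: appends the character with code 49 and then
// the one-character string "1".
std::string append_both() {
    std::string s = "Hi";
    s += static_cast<char>(49);  // the character '1' under ASCII
    s += "1";                    // the string "1"
    return s;
}
```

Both appends add the same character, so the result is "Hi11".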
Different ways to declare a char / string:
#include <cstdio>
int main()
{
char char_simple = 'a'; // 1 byte: -128 to 127 or 0 to 255, depending on the platform
signed char char_signed = 'a'; // 1 byte: -128 to 127
unsigned char char_u = 'a'; // 1 byte: 0 to 255
// double quote is for string.
char string_simple[] = "myString"; // 9 bytes: 8 characters plus '\0'
char string_simple_2[] = {'m', 'y', 'S', 't', 'r', 'i', 'n', 'g', '\0'}; // same contents, spelled out
char string_fixed_size[9] = "myString"; // needs room for the terminator
const char *string_pointer = "myString"; // pointer to the literal
char string_pointer_2 = *"myString"; // only the first character, 'm'
// sizeof yields a size_t, so print it with %zu
printf("char = %zu\n", sizeof(char_simple));
printf("char_signed = %zu\n", sizeof(char_signed));
printf("char_u = %zu\n", sizeof(char_u));
printf("string_simple[] = %zu\n", sizeof(string_simple));
printf("string_simple_2[] = %zu\n", sizeof(string_simple_2));
printf("string_fixed_size[9] = %zu\n", sizeof(string_fixed_size));
printf("string_pointer = %zu\n", sizeof(string_pointer));
printf("string_pointer_2 = %zu\n", sizeof(string_pointer_2));
}

Base64 encoded String too big, trailing characters truncated in c++

I have an image which I have to convert to base64. After the conversion, below is its value:
"
and so on...
This is quite a big value. I need to put this in a char data[] like below:
char sPostData[21070] = "{ \"image\" : \"<base64 encoded value>\" , \"name\": \"dev\"}";
but it throws this error:
Error C2026 string too big, trailing characters truncated
How can I resolve it?
The Microsoft compiler imposes a limit of 16380 single-byte characters for a string literal. The documentation says
Prior to adjacent strings being concatenated, a string cannot be longer than 16380 single-byte characters.
Break the string into adjacent chunks, something like
char data[] = "a whole bunch of characters"
"a whole bunch more characters"
" and even more characters";
According to the documentation for that error, there is a limit of 16380 bytes in a character array (characters for narrow strings, fewer for Unicode).
Character string pointers (const char *) have a different limit, 65535 bytes.
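The chunking approach relies on the compiler joining adjacent string literals into one. A minimal sketch, in which the two short base64 pieces are placeholders rather than real image data and joined_text is a made-up helper name:

```cpp
#include <string>

// Adjacent string literals are concatenated by the compiler, so a long
// value can be written as several shorter pieces.
// "QUJD" and "REVG" are placeholder chunks, not the asker's data.
const char joined[] =
    "{ \"image\" : \""
    "QUJD"
    "REVG"
    "\" , \"name\": \"dev\"}";

// Hypothetical helper returning the assembled text.
std::string joined_text() { return joined; }
```

Each piece stays under the per-literal limit while the array ends up holding the whole text plus a single terminator.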

C++ character array doesn't take inputs of more than 4 characters

I'm trying to make a char array in C++ that will store a limited number of characters that I set (in this case 5). My program looks like this:
char name[5];
cout << "Enter 5 character name: ";
cin.getline(name, 5);
cout << name;
I defined a char array named "name" and set it to store only 5 characters, but whenever I run the program and try to enter anything more than 4 characters, the program truncates anything longer than 4 characters. This happens even if I change the number of characters in the char definition or use a cin statement.
I think it's because of the null terminator character. It might also depend on the version of your C++ compiler. Anyway, reading the reference here http://www.cplusplus.com/reference/istream/istream/getline/...
It says:
Extracts characters from the stream as unformatted input and stores them into s as a c-string, until either the extracted character is the delimiting character, or n characters have been written to s (including the terminating null character).
Since that count includes the terminating null character, a place is reserved for it at the very end of the character array. That's why, with a 5-character char array, cin.getline fills in only 4.
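The effect of the count passed to getline can be seen without typing input by reading from a string stream. This is a sketch: read_name is a made-up helper, and it assumes capacity is at least 1.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: read one line from `input` into a buffer of
// `capacity` bytes using istream::getline and return what was stored.
// Assumes capacity >= 1.
std::string read_name(const std::string& input, std::size_t capacity) {
    std::istringstream in(input);
    std::string buffer(capacity, '\0');
    // getline stores at most capacity - 1 characters plus the terminator.
    in.getline(&buffer[0], static_cast<std::streamsize>(capacity));
    return std::string(buffer.c_str());
}
```

With the input "Alice", a capacity of 5 stores only "Alic", while a capacity of 6 stores all five letters. To hold a 5-character name, the array needs 6 elements.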

Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of the string is affected by the presence of a '\0' character. I searched all over SO and still could not find the answer.
Here is the snippet.
int main()
{
std::string a = "123123\0shai\0";
std::cout << a.length();
}
http://ideone.com/W6Bhfl
The output in this case is
6
Whereas the same program with a different string, having numerals instead of letters,
int main()
{
std::string a = "123123\0123\0";
std::cout << a.length();
}
http://ideone.com/mtfS50
gives an output of
8
What exactly is happening under the hood? How does presence of a '\0' character change the behavior?
The sequence \012 when used in a string (or character) literal is an octal escape sequence. It's the octal number 12 which corresponds to the ASCII linefeed ('\n') character.
That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).
It would have been very clear if you tried to print the contents of the string.
Octal sequences are one to three digits long, and the compiler will use as many digits as possible.
If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.
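A short sketch of the octal parsing described above, plus one way to keep the \0 separate by splitting the literal (escape sequences are translated before adjacent literals are joined). The helper names are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: \012 is parsed as one octal escape (linefeed),
// followed by the character '3'.
std::string octal_version() {
    return "123123\0123\0";
}

// Hypothetical helper: splitting the literal keeps \0 as its own
// character, so construction from a C string stops after "123123".
std::string split_version() {
    return "123123\0" "123\0";
}
```

The first string has length 8 and equals "123123\n3"; the second has length 6.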

Understanding Endianness - a variable value

I'm using a piece of code (found elsewhere on this site) that checks endianness at runtime.
static bool isLittleEndian()
{
short int number = 0x1;
char *numPtr = (char*)&number;
std::cout << numPtr << std::endl;
std::cout << *numPtr << std::endl;
return (numPtr[0] == 1);
}
When in debug mode, the value numPtr looks like this: 0x7fffffffe6ee "\001"
I assume the first hexadecimal part is the pointer's memory address, and the second part is the value it holds. I know that \0 is null termination in old-style C++, but why is it at the front? Is it to do with endianness?
On a little-endian machine: 01 the first byte and therefore least significant (byte place 0), and \0 the second byte/final byte (byte place 1)?
In addition, the cout statements do not print the pointer address or its value. Reasons for this?
The others have given you a clear answer to what "\000" means, so this is an answer to your question:
On a little-endian machine: 01 the first byte and therefore least significant (byte place 0), and \0 the second byte/final byte (byte place 1)?
Yes, this is correct. If you look at a value like 0x1234, it consists of two bytes, the high part 0x12 and the low part 0x34. The term "little endian" means that the low part is stored first in memory:
addr: 0x34
addr+1: 0x12
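That layout can be inspected by copying the object representation into a byte array. A minimal sketch, with made-up helper names; on a big-endian machine the first stored byte would be 0x12 instead.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: the byte stored at the lowest address of a
// 16-bit value.
unsigned char first_stored_byte(std::uint16_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy the raw bytes out
    return bytes[0];
}

// Hypothetical helper: little endian means the low part comes first.
bool is_little_endian() {
    return first_stored_byte(0x1234) == 0x34;
}
```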
Did you know that the term "endian" predates the computer industry? It was originally used by Jonathan Swift in his book Gulliver's Travels, where it described whether people ate their eggs from the pointy or the round end.
the easiest way to check for endianness is to let the system do it for you:
if (htonl(0xFFFF0000)==0xFFFF0000) printf("Big endian");
else printf("Little endian");
That's not a \0 followed by "01", it's the single character \001, which represents the number 1 in octal. That's the only byte "in" your string. There's another byte after it with the value zero, but you don't see that since it's treated as the string terminator.
For starters: this type of function is totally worthless: on a machine
where sizeof(int) is 4, there are 24 possible byte orders. Most, of
course, don't make sense, but I've seen at least three. And endianness
isn't the only thing which affects integer representation. If you have
an int, and you want to get the low order 8 bits, use intValue &
0xFF, for the next 8 bits, (intValue >> 8) & 0xFF.
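The shift-and-mask approach just described can be sketched as follows. It works on the value rather than on memory, so the result does not depend on byte order. byte_at is a made-up helper name.

```cpp
#include <cstdint>

// Hypothetical helper: byte number `index` of `value`, counted from the
// least significant byte.
constexpr unsigned byte_at(std::uint32_t value, unsigned index) {
    return (value >> (8 * index)) & 0xFFu;
}

static_assert(byte_at(0x12345678u, 0) == 0x78, "low order 8 bits");
static_assert(byte_at(0x12345678u, 1) == 0x56, "next 8 bits");
static_assert(byte_at(0x12345678u, 3) == 0x12, "high order 8 bits");
```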
With regards to your precise question: I presume what you are describing
as "looks like this" is what you see in the debugger, when you break at
the return. In this case, numPtr is a char* (an unsigned char
const* would make more sense), so the debugger assumes a C style
string. The 0x7fffffffe6ee is the address; what follows is what the
debugger sees as a C style string, which it displays as a string, i.e.
"...". Presumably, your platform is a traditional little-endian
(Intel); the pointer to the C style string sees the sequence (numeric
values) of 1, 0. The 0 is of course the equivalent of '\0', so it
considers this a one character string, with that one character having
the encoding of 1. There is no printable character with an encoding of
one, and it doesn't correspond to any of the normal escape sequences
(e.g. '\n', '\t', etc.) either. So the debugger outputs it using
the octal escape sequence, a '\' followed by 1 to 3 octal digits.
(The traditional '\0' is just a special case of this; a '\' followed
by a single octal digit.) And it outputs 3 digits, because (probably)
it doesn't want to look ahead to ensure that the next character isn't an
octal digit. (If the sequence were the two bytes 1, 49, for example,
49 is '1' in the usual encodings, and if it output only a single byte
for the octal encoding of 1, the results would be "\11", which is a
single character string—corresponding in the usual encodings to
'\t'.) So you get: an opening " (this is a string), then \001 (the
first character has an encoding of 1 and no displayable
representation), and a closing " (that's the end of the string).
The "\001" you are seeing is just one byte. It's probably octal notation, which needs three digits to properly express the (decimal) values 0 to 255.
The \0 isn't a NUL, the debugger is showing you numPtr as a string, the first character of which is \001 or control-A in ASCII. The second character is \000, which isn't displayed because NULs aren't shown when displaying strings. The two character string version of 'number' would appear as "\000\001" on a big-endian machine, instead of "\001\000" as it appears on little-endian machines.
In addition, the cout statements do not print the pointer address or
its value. Reasons for this?
Because chars and char pointers are treated differently than integers when it comes to printing.
When you print a char, it prints the character from whatever character set is being used. Usually, this is ASCII, or some superset of ASCII. The value 0x1 in ASCII is non-printing.
When you print a char pointer, it doesn't print the address, it prints it as a null-terminated string.
To get the results you desire, cast your char pointer to a void pointer, and cast your char to an int.
std::cout << (void*)numPtr << std::endl;
std::cout << (int)*numPtr << std::endl;
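The same difference can be checked with a string stream instead of std::cout. This is only a sketch with made-up helper names; the "65" mentioned in the usage note assumes ASCII.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: text produced by streaming a char directly.
std::string as_character(char c) {
    std::ostringstream out;
    out << c;  // writes the character itself
    return out.str();
}

// Hypothetical helper: text produced after casting the char to int.
std::string as_number(char c) {
    std::ostringstream out;
    out << static_cast<int>(c);  // writes the numeric value
    return out.str();
}
```

as_number('\001') gives "1", while as_character('\001') gives a one-character string holding the non-printing control-A. For a printable char such as 'A', the two results are "65" and "A".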