Character Array, \0 - c++

I know that the \0 at the end of a character array is a must if you use the array with functions that expect a \0, like cout; otherwise unexpected random characters appear.
My question is: if I use the character array only in my own functions, reading it char by char, do I need to store the \0 at the end?
Also, is it a good idea to fill in only some characters and leave holes in the array?
Consider the following:
char chars[5];
chars[1] = 15;
chars[2] = 17;
chars[3] = 'c';
//code using the chars[1] and chars[3], but never using the chars
int y = chars[1]+chars[3];
cout << chars[3] << " is " << y;
Does the code above risk unexpected errors?
EDIT: edited the example.

The convention of storing a trailing char(0) at the end of an array of chars has a name: it's called a 'C string'. It has nothing to do with char specifically: if you are using wide characters, a wide C string is terminated with a wchar_t(0).
So it's absolutely fine to use char arrays without trailing zeroes if what you are using is just an array of chars and not a C string.
char dirs[4] = { 'n', 's', 'e', 'w' };
for (size_t i = 0; i < 4; ++i) {
fprintf(stderr, "dir %zu = %c\n", i, dirs[i]); // %zu is the printf conversion for size_t
std::cout << "dir " << i << " = " << dirs[i] << '\n';
}
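A minimal sketch of working with a char array that is not a C string: the length travels alongside the pointer, so no terminator is needed. The function name count_char is made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: counts how often `target` occurs in the first `len`
// elements of `data`. The loop is bounded by `len`, never by a '\0'.
std::size_t count_char(const char* data, std::size_t len, char target) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == target) {
            ++hits;
        }
    }
    return hits;
}
```

Called as count_char(dirs, sizeof dirs, 'e'), it inspects exactly the four elements of dirs and nothing beyond them.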
Note that '\0' is char(0), that is it has a numeric, integer value of 0.
char x[] = { 'a', 'b', 'c', '\0' };
produces the same array as
char x[] = { 'a', 'b', 'c', 0 };
Your second question is unclear, though
//code using the chars[1] and chars[3], but never using the chars
int y = chars[1]+chars[3];
cout << chars[3] << " is " << y;
Leaving gaps is fine, as long as you're sure your code is aware that they are uninitialized. If it is not, then consider the following:
char chars[4]; // I don't initialize this.
chars[1] = '1';
chars[3] = '5';
int y = chars[1] + chars[3];
std::cout << "y = " << y << '\n';
// prints 102, because y is an int and '1' is 49 and '5' is 53
// later
for (size_t i = 0; i < sizeof(chars); ++i) {
std::cout << "chars[" << i << "] = " << chars[i] << '\n';
}
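If reading the gaps is a possibility, one option is to value-initialize the array so that every unset element is 0. A small sketch of that idea; the function name is made up for illustration.

```cpp
// Hypothetical example: the empty braces zero every element before the
// individual assignments, so the loop never reads an indeterminate value.
int sum_with_zeroed_gaps() {
    char chars[4] = {};
    chars[1] = '1';
    chars[3] = '5';

    int total = 0;
    for (char c : chars) {
        total += c;   // chars[0] and chars[2] contribute 0
    }
    return total;     // equals '1' + '5'
}
```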
Remember:
char one = 1;
char asciiCharOne = '1';
are not the same. one has an integer value of 1, while asciiCharOne has an integer value of 49.
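To get from the character representation back to the number, subtracting '0' is the usual idiom. A short sketch, with digit_value as an assumed helper name:

```cpp
// Hypothetical helper: converts a digit character ('0'..'9') to its numeric
// value, or returns -1 for anything else. The C++ standard guarantees that
// the digit characters are contiguous, so this does not depend on ASCII.
int digit_value(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return -1;
}
```

With this, digit_value('1') yields the integer 1, which is the value that char one = 1 stores directly.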
Lastly: if you are really looking to store integer numeric values rather than their character representations, you may want to look at the C++11 fixed-width integer types in <cstdint>: uint8_t for an 8-bit unsigned value, int8_t for an 8-bit signed value.

Running off the end of a character array because it has no terminating \0 means accessing memory that does not belong to the array. That produces undefined behavior. Often that looks like random characters, but that's a rather benign symptom; some are worse.
As for not including it because you don't need it, sure. There's nothing magic that says that an array of char has to have a terminating \0.
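When a buffer may or may not contain a terminator, the search for it can be bounded by the buffer's capacity so it never runs off the end. A sketch under that assumption; bounded_length is a hypothetical name (it behaves like the POSIX strnlen, which is not part of standard C++).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the index of the first '\0' within the first
// `capacity` bytes of `buf`, or `capacity` if there is none. std::memchr
// never looks past `capacity`, so an unterminated array is handled safely.
std::size_t bounded_length(const char* buf, std::size_t capacity) {
    const void* hit = std::memchr(buf, '\0', capacity);
    if (hit == nullptr) {
        return capacity;
    }
    return static_cast<std::size_t>(static_cast<const char*>(hit) - buf);
}
```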

To me it looks like you use the array not for strings, but as an array of numbers, so yes it is ok not to use '\0' in the array.
Since you are using it to store numbers, consider the uint8_t or int8_t types from stdint.h. They are typically typedefs for unsigned char and signed char, but using them makes it clearer that the array holds numbers and not a string.
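A brief sketch of an array of small numbers using std::uint8_t. The helper names are invented for illustration. Because uint8_t is typically a typedef for unsigned char, streaming it with << would still print a character, so the value is converted to a wider type before formatting.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: adds up `len` small numbers stored as bytes.
unsigned sum_bytes(const std::uint8_t* data, std::size_t len) {
    unsigned total = 0;
    for (std::size_t i = 0; i < len; ++i) {
        total += data[i];   // each byte is promoted to an integer before adding
    }
    return total;
}

// Hypothetical helper: formats a byte as decimal text, not as a character.
std::string as_decimal(std::uint8_t value) {
    return std::to_string(static_cast<unsigned>(value));
}
```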
cout << chars[3] << " is " << y; is not undefined behaviour because you access the element at position 3 from the array, that element is inside the array and is a char, so everything is fine.
EDIT:
Also, I know it is not in your question, but since we are here: using char instead of int for numbers can be deceiving. On most architectures it does not improve performance and can actually slow things down, mainly because of how memory is addressed and because the processor works with 4-byte or 8-byte operands anyway. The only gain is storage size, which matters for data stored on disk; unless you are working with really huge arrays or with limited RAM, use int in memory as well.

Related

How can I convert 1 element of an int array to a string/char

When I try to write for example arr[0] = 'y'; and I try to print it it will print "121" because 121 is 'y' in the ASCII table. How can I convert it so the array element will replace it with an actual 'y'?
int example [2] = {16,2};
How do I convert for example 16 to the letter 'y' so if I print the whole array it'd print "y2" and not 1212?
Print format:
int r[2] = {12,43};
for(int i=0; i<2;i++){
cout << r[i];
}
Arrays are homogeneous. All elements of the array have the same type. When you have an array of int, then all elements have the type int.
When you insert an int into a character stream, the output will be a number, with equivalent format to num_put::put().
So, if you want to see output like y, then you must insert either a character, or a character string. If you want to output one object like a character, and another like an integer, then those objects must have a different type.
Characters are integers (with special treatment), and convertible to and from int (although not all int values are representable as char). Example using such conversion:
int example [2] = {'y',2};
std::cout << char(example[0]) << example[1];
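The same conversion can be wrapped in a function that builds the text instead of printing it. This is only a sketch; letter_then_number is an assumed name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: inserts the first value as a character and the
// second as a number. The cast selects the char overload of operator<<.
std::string letter_then_number(int letter_code, int number) {
    std::ostringstream out;
    out << static_cast<char>(letter_code) << number;
    return out.str();
}
```

letter_then_number('y', 2) produces the text y2, whereas inserting both values as int would produce the numeric code of 'y' followed by 2.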
An easy way to associate letters with numbers is to use std::map:
std::map<int, char> conversions;
//...
conversions[16] = 'y';
//...
std::cout << conversions[16] << std::endl;
You could also use an array of char:
static const char conversion_array[] =
{'F', ..., 'y', ...};
std::cout << conversion_array[16] << std::endl;
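One detail worth knowing about the std::map approach: operator[] inserts a default entry when the key is missing. A lookup through find avoids that. The sketch below assumes a hypothetical helper letter_for with a caller-supplied fallback character.

```cpp
#include <map>

// Hypothetical helper: returns the letter associated with `key`, or
// `fallback` when the table has no entry. Unlike operator[], find() does
// not modify the map.
char letter_for(const std::map<int, char>& table, int key, char fallback) {
    std::map<int, char>::const_iterator it = table.find(key);
    return it == table.end() ? fallback : it->second;
}
```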
short my_short = 12;
char my_character = static_cast<char>(my_short);
std::cout << my_character << std::endl;
Will print a char
std::ostream has an overloaded operator<< for chars, which allows you to print the char instead of printing an integer

Accessing array by char in C++

Usually, I access an array in C++ by the syntax foo[2], where 2 is the index of an array.
In the below code. I didn't understand how this code is giving output and access this array by index 'b', 'c'. I am confused it is array index or something else.
int count[256] = {0};
count['b'] = 2;
cout << count['b'] << endl; //output 2
cout << count['c'] << endl; //output 0
Output
2
0
Remember that in c++ characters are represented as numbers. Take a look at this ascii table. http://www.asciitable.com
According to this the character 'b' is represented 98 and 'c' as 99. Therefore what your program is really saying is...
int count[256] = {0};
count[98] = 2;
cout << count[98] << endl; //output 2
cout << count[99] << endl; //output 0
Also, in case you don't know, initializing an array with = {0} zero-initializes every element, which is why count['c'] is 0.
C and C++ have no dedicated 8-bit / 1-byte integer type other than char. We simply use the char type to represent a single (signed or unsigned) byte, and you can even put signed or unsigned in front of char. char really is just another integer type that we happen to use to express characters. You can also do the following.
char b = 98;
char c = 99;
char diff = c - b; //diff is now 1
Type char is actually an integral type. Every char value represented by a character literal has an underlying integral value it corresponds to in a given code page, which is probably an ASCII table. When you do:
count['b'] = 2;
you actually do:
count[98] = 2;
as character 'b' corresponds to an integral value of 98, character 'c' corresponds to an integral value of 99 and so on. To illustrate, the following statement:
char c = 'b';
is equivalent of:
char c = 98;
Here c has the same underlying value, it's the representation that differs.
Because characters are always represented by integers in the computer, they can be used as array indices.
You can verify by this:
char ch = 'b';
count[ch] = 2;
int i = ch;
cout << i << endl;
cout << count[i] << endl;
Usually the output is 98 2, but the first number may vary depending on the encoding of your environment.
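A typical use of this indexing trick is a character frequency table. The following is a sketch, assuming 8-bit chars so that 256 slots cover every value; count_chars is an invented name. The cast to unsigned char matters because plain char may be signed, and a negative index would be out of bounds.

```cpp
#include <array>
#include <string>

// Hypothetical helper: counts occurrences of each character value in `text`.
// Assumes char is 8 bits wide, so 256 counters are enough.
std::array<int, 256> count_chars(const std::string& text) {
    std::array<int, 256> counts{};   // value-initialized: every counter is 0
    for (char c : text) {
        ++counts[static_cast<unsigned char>(c)];
    }
    return counts;
}
```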

why the results are so different?

I don't know why the similar code has a great difference? the first code outputs normally, but the second code outputs some unrecognizable characters. Who can explain it for me?
Thks
#include <iostream>
using namespace std;
int main(){
char a[5] = { 'A', 'B', 'C', 'D' };
cout << a + 1 << endl;
char b[5] = {'a','b','c','d','e'};
cout << b+1 << endl;
return 0;
}
Both expressions a+1 and b+1 decay into a char*, which << then treats as a NUL-terminated string, but only a is NUL-terminated. Accessing b as a NUL-terminated string causes undefined behavior, which in your case seems to be printing garbage after the first few characters. (Note that I originally said both were not NUL-terminated, but then I noticed that you had only 4 characters in the initializer for a but specified a size of 5. That means the 5th element would be zero-initialized, effectively NUL-terminating a.)
If you want to print them correctly without causing undefined behavior, make sure they are NUL-terminated:
int main(){
char a[5] = { 'A', 'B', 'C', 'D' }; // Works as-is, but not good form
cout << a + 1 << endl;
char b[6] = {'a','b','c','d','e', '\0'}; // Needed NUL-terminated, but still not the best way
cout << b+1 << endl;
return 0;
}
Or as eigenchris noted in a comment, you could rely on the compiler to NUL-terminate it for you by using a string constant instead:
char a[] = "ABCD";
char b[] = "abcde"; // Probably the best way to do this.
Since you send char * arguments to cout, like a or b, c style strings are expected. This means also the zero termination character is expected for each string. So the following will work:
char a[5] = { 'A', 'B', 'C', 'D', '\0' };
cout << a + 1 << endl;
char b[6] = {'a', 'b', 'c', 'd', 'e', '\0'};
cout << b + 1 << endl;
The reason this happens with b is that by specifying all 5 characters you leave no room for the zero character. The 0 should have been the sixth character.
When you cout << a+1, the output will be "BCD", because the character after 'D' is a nul (\0).
This is because you specified the array-size of 5, but only gave it 4 values. The remaining, unspecified value will be set to 0.
When you cout << b+1, you'll get "bcdeXXXX", which will continue until a nul is found.
(the XXX will be unpredictable characters based on whatever is in memory.)
This is because you specified the value of all 5 characters, and the memory beyond that is undefined. It might be nuls, it might be random values left over from an earlier program. There is no way to know for sure. But cout will continue printing until it encounters a nul \0, or causes a segmentation/access violation by reading an inaccessible memory address.
That is why you get random garbage on the second output.
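If the array has to stay unterminated, the length can be passed explicitly instead of relying on a nul. A minimal sketch, with tail_from as a made-up helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the elements from `offset` up to `len` into a
// std::string. The (pointer, count) constructor copies exactly that many
// characters and does not look for a '\0'.
std::string tail_from(const char* data, std::size_t len, std::size_t offset) {
    if (offset >= len) {
        return std::string();
    }
    return std::string(data + offset, len - offset);
}
```

For the five-element b from the question, tail_from(b, sizeof b, 1) gives "bcde" without reading past the array.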

Does the null character get automatically inserted?

Is the C++ compiler supposed to automatically insert the null character after the end of the char array? The following prints "Word". So does the C++ compiler automatically insert a null character at the last position in the array?
int main()
{
char x[] = {'W', 'o', 'r', 'd'};
cout << x << endl;
return 0;
}
No, it doesn't. Your array x has only 4 elements. You are passing std::cout a pointer to char, which is not the beginning of a null-terminated string. This is undefined behaviour.
As an example, on my platform, similar code printed Word?, with a spurious question mark.
In C/C++ string literals are automatically appended with terminating zero. So if you will write for example
char x[] = "Word";
then the size of the array will be equal to 5 and the last element of the array will contain '\0'.
When the following declaration is used, as you showed
char x[] = {'W', 'o', 'r', 'd'};
then the compiler allocates exactly the same number of elements as the number of the initializers. That is in this case the array will have only 4 elements.
However if you would write the following way
char x[5] = {'W', 'o', 'r', 'd'};
then the fifth element will contain the terminating zero because it has no corresponding initializer and will be zero-initialized.
Also take into account that the following declaration is valid in C but invalid in C++
char x[4] = "Word";
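The element counts of the three valid forms discussed above can be checked at compile time. This is a sketch only; element_count and the array names are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}

constexpr char from_literal[] = "Word";                 // 'W','o','r','d','\0'
constexpr char from_list[]    = { 'W', 'o', 'r', 'd' }; // no terminator
constexpr char padded[5]      = { 'W', 'o', 'r', 'd' }; // last element zeroed

static_assert(element_count(from_literal) == 5, "literal adds a terminator");
static_assert(element_count(from_list) == 4, "brace list adds nothing");
static_assert(padded[4] == '\0', "missing initializer means zero");
```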
That is undefined behavior. It happened to print "Word", but it might just as well have crashed.
No it doesn't. You yourself have to do that if you want to use it as a string or you want to use library functions for strings for that.
No, in this case you are just creating an array that is not null-terminated. Any read that relies on null termination will not stop at the end of the array, because there is no null there to find.
char x[] = "Word";
In this case 5 bytes would be allocated for x. Null at the end.
No, it doesn't. If you want it inserted automatically, use a string constant:
const char *x = "Word";
//OR
std::string const s = "Word";
const char *x = s.c_str();
//OR
char x[] = { "Word" }; //braces around the string literal are optional
The simplest way to verify this is to look at the memory address for x using a debugger and monitor it and the next few bytes following it. Stop the debugger after the initialization and examine the memory contents.
No, if you want to use the char array as a string use this form of the initializer
char x[] = { "Word" };
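Although it is not guaranteed, on my implementation a \0 seems to follow the array even though I did not request one. Keep in mind that reading noNull[2] and beyond is out of bounds, so the test below is itself undefined behavior and its result cannot be relied on.
In my test, noNull[2] was always \0: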
char noNull[] = {'H', 'i'};
for (int i = 2; i < 5; ++i) {
if (noNull[i] == '\0') {
std::cout << "index " << i << " is null" << std::endl;
} else {
std::cout << "index " << i << " is not null" << std::endl;
}
}
output
index 2 is null
index 3 is not null
index 4 is not null

Initializing an unsigned char array with hex values in C++

I would like to initialize an unsigned char array with 16 hex values. However, I don't seem to know how to properly initialize/access those values. When I try to access them as I might want to intuitively, I'm getting no value at all.
This is my output
The program was run with the following command: 4
Please be a value! -----> p
Here's some plaintext
when run with the code below -
int main(int argc, char** argv)
{
int n;
if (argc > 1) {
n = std::stoi(argv[1]); // n is an int, so parse an integer
} else {
std::cerr << "Not enough arguments\n";
return 1;
}
char buff[100];
sprintf(buff,"The program was run with the following command: %d",n);
std::cout << buff << std::endl;
unsigned char plaintext[16] =
{0x0f, 0xb0, 0xc0, 0x0f,
0xa0, 0xa0, 0xa0, 0xa0,
0x00, 0x00, 0xa0, 0xa0,
0x00, 0x00, 0x00, 0x00};
unsigned char test = plaintext[1]^plaintext[2];
std::cout << "Please be a value! -----> " << test << std::endl;
std::cout << "Here's some plaintext " << plaintext[3] << std::endl;
return 0;
}
By way of context, this is part of a group project for school. We are ultimately trying to implement the Serpent cipher, but keep on getting tripped up by unsigned char arrays. Our project specification says that we must have two functions that take what would be Byte arrays in Java. I assume the closest relative in C++ is an unsigned char[]. Otherwise I would use vector. Elsewhere in the code I've implemented a setKey function which takes an unsigned char array, packs its values into 4 long long ints (the key needs to be 256 bits) and performs various bit-shifting and xor operations on those ints to generate the keys necessary for the cryptographic algorithm. Hope that's enough background on what I'm looking to do. I'm guessing I'm just overlooking some basic C++ functionality here. Thanks for any and all help!
On most platforms a char is an 8-bit value. Whether plain char is signed (typically -128 <= n <= +127) or unsigned (0 <= n <= 255) is implementation-defined. It is frequently used to store character representations in different encodings, and commonly - in Western, Roman-alphabet installations - char is used to hold ASCII or UTF-8 encoded values. 'Encoded' means the symbols/letters in the character set have been assigned numeric values. Think of the periodic table as an encoding of elements, so that 'H' (Hydrogen) is encoded as 1, Germanium as 32. In the ASCII (and UTF-8) tables, position 32 represents the character we call "space".
When you use operator << on a char value, the default behavior is to assume you are passing it a character encoding, e.g. an ASCII character code. If you do
char c = 'z';
char d = 122;
char e = 0x7A;
char f = '\x7a';
std::cout << c << d << e << f << "\n";
All four assignments are equivalent. 'z' is a shortcut/syntactic-sugar for char(122), 0x7A is hex for 122, and '\x7a' is an escape that forms the ascii character with a value of 0x7a or 122 - i.e. z.
Where many new programmers go wrong is that they do this:
char n = 8;
std::cout << n << std::endl;
This does not print "8"; it prints the character at position 8 in the ASCII table, which is the non-printing backspace control character.
Think for a moment:
char n = 8; // stores the value 8
char n = a; // what does this store?
char n = '8'; // why is this different than the first line?
Let's rewind a moment: when you store 120 in a variable, it can represent the ASCII character 'x', but ultimately what is being stored is just the numeric value 120, plain and simple.
Specifically: when you pass 122 to a function that will ultimately use it to look up a font entry from a character set using the Latin1, ISO-8859-1, UTF-8 or similar encodings, then 122 means 'z'.
At the end of the day, char is just one of the standard integer types. It can typically store values -128 <= n <= +127, and it can trivially be promoted to a short, int, long or long long, etc.
While it is generally used to denote characters, it also frequently gets used as a way of saying "I'm only storing very small values" (such as integer percentages).
int incoming = 5000;
int outgoing = 4000;
char percent = char(outgoing * 100 / incoming);
If you want to print the numeric value, you simply need to promote it to a different value type:
std::cout << (unsigned int)test << "\n";
std::cout << unsigned(test) << "\n"; // a function-style cast needs a single-word type name
or the preferred C++ way
std::cout << static_cast<unsigned int>(test) << "\n";
I think (it's not completely clear what you are asking) that the answer is as simple as this
std::cout << "Please be a value! -----> " << static_cast<unsigned>(test) << std::endl;
If you want to output the numeric value of a char or unsigned char, you have to cast it to an int or unsigned first.
Not surprisingly, by default, chars are output as characters not integers.
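For byte arrays such as the plaintext buffer in the question, a small formatting helper keeps the cast in one place. The sketch below is an assumption about what is wanted (two hex digits per byte); to_hex is a made-up name.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders `len` bytes as lowercase hex, two digits each.
// Each byte is cast to unsigned so that operator<< formats it as a number
// rather than as a character.
std::string to_hex(const unsigned char* data, std::size_t len) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i) {
        out << std::setw(2) << static_cast<unsigned>(data[i]);
    }
    return out.str();
}
```

Applied to the question's test value, 0xb0 ^ 0xc0 is 0x70, so to_hex gives "70". In ASCII, 0x70 is the letter p, which is exactly what the original program printed.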
BTW this funky code
char buff[100];
sprintf(buff,"The program was run with the following command: %d",n);
std::cout << buff << std::endl;
is more simply written as
std::cout << "The program was run with the following command: " << n << std::endl;
std::cout and std::cin always treat a char variable as a character.
If you want to input or output it as an int, you must do the conversion manually, as shown below.
std::cin >> int_var; c = int_var;
std::cout << (int)c;
If you use scanf or printf, there is no such problem, because the format specifier ("%d", "%c", "%s") tells the function how to convert the data (integer, char, string).
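The read-through-an-int approach can be sketched with a string stream standing in for std::cin. The function name and the range check are assumptions added for illustration.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: parses decimal text into an int first, then stores it
// in a char only if it fits. Extracting directly into a char would read a
// single character instead of a number.
bool read_number_into_char(const std::string& text, char& out) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> value)) {
        return false;   // not a number
    }
    if (value < std::numeric_limits<char>::min() ||
        value > std::numeric_limits<char>::max()) {
        return false;   // does not fit in a char
    }
    out = static_cast<char>(value);
    return true;
}
```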