I have a very simple question: why is the output of this code the way it is?
I am using Dev-C++ 5.11 with TDM-GCC 4.9.2 64-bit
#include <iostream>
using namespace std;
int main()
{
char *ptr;
char Str[] = "abcdefg";
ptr = Str;
ptr += 8;
cout << ptr;
return 0;
}
I would expect the code to print an empty line.
For some reason, there seems to be a space character at position 7; you can detect that by changing ptr += 8; to ptr += 7;.
but what is weirder to me is that there are 3 more characters that can't be displayed unless you jump beyond the array limit by 2, which in this case means adding 8 to the pointer. The characters are: "H,(a weird filled square),p"
I would expect the code to print an empty line.
That expectation is misguided. The behaviour of the program is undefined.
For some reason, there seems to be a space character at position 7
There is not. There is a null terminator at position 7.
but what is weirder to me is that there are 3 more characters that can't be displayed unless you jump beyond the array limit by 2 ...
The behaviour of accessing an array outside of its bounds is undefined.
You cannot expect an empty line when you try to access memory beyond your array.
At position 7 you have the '\0'.
C strings are terminated by this character and it is also used by the printing function to know when it should stop printing.
At position 8 you are beyond this character, and the behavior of the program is undefined because the memory you are accessing might contain anything.
The characters that you see printed are just whatever bytes happen to lie in memory beyond the string. They might change from run to run, or the program might crash.
Character 'a' is at position 0, character 'g' is at position 6, and the terminator is at position 7; you should not access memory outside of this region unless you are trying to hack something.
Before I continue, here's the code:
#include <iostream>
using namespace std;
int main() {
char array[] = {'a','b','c'};
cout << array << endl;
return 0;
}
My system:
VisualStudio 2019, default C++ settings
Using Debug build instead of release
When I run this code sample, I get something like this in my console output:
abcXXXXXXXXX
Those X's represent seemingly random characters. I know they're from existing values in memory at that address, but I don't understand why I'm getting 12 bytes back instead of the three from my array.
Now, I know that if I were doing this with ints, which are four bytes long, maybe this would make sense, but sizeof(array) returns three (i.e. three bytes long; I know the sizeof(array) / sizeof(array[0]) trick). And when I do try it with ints, I'm even more confused because I get some four-byte hex number instead (maybe a memory address?)
This may be some trivial question, I'm sorry, but I'm just trying to figure out why it behaves like this. No vectors please, I'm trying to stay as non-STL as possible here.
cout takes this char array and addresses it as a null-terminated string.
Since the terminating character in this array is not the null character (i.e., char(0)), it attempts to print until encountering the null character.
At this point, it attempts to read memory outside of the array which you have allocated, and technically, anything could happen.
For example, there can be different data in that memory every time the function is called, or the memory access operation may even be illegal, depending on the address where array was allocated at the time the function was called.
So the behavior of your program is undefined (and, in practice, often non-deterministic).
Create a console application with the following code (renaming f to your entry point):
#include <iostream>
void f(){
char a[5][5];
std::cin>>a[0]>>a[1]>>a[2]>>a[3]>>a[4];
for (int y = 0; y<5; y++)std::cout<<a[y]<<'\n';
}
and input 5 lines of 5 characters such as :
abcde
abcde
abcde
abcde
abcde
I expected the output to be identical to the input or throw an error, but instead I got:
abcdeabcdeabcdeabcdeabcde
abcdeabcdeabcdeabcde
abcdeabcdeabcde
abcdeabcde
abcde
When investigated using the debugger, each a[y] value is equal to abcde and not the displayed output.
What on earth is going on here? Why is this happening, and is there a way to stop it?
Is it related to the
Stack around the variable 'a' was corrupted
Error that gets thrown after it std::couts?
I'm well aware of other ways to get the desired output using nested loops, but I'm wondering if there's a way to iterate only the outer dimension so it uses fewer characters - this is for a code golf challenge. It makes quite a difference:
for(int y=0;y<5;y++)std::cout<<a[y]<<'\n';
vs
for(int y=0;y<5;y++){for(int x=0;x<5;x++)std::cout<<a[y][x];std::cout<<'\n';}
The problem is caused by the fact that you are trying to store "abcde" in a char array with 5 elements. You need at least one more element in the array to hold the terminating null character.
As a consequence, your program has undefined behavior. We can try to make sense of the output but it's futile.
Use
char a[5][6]; // Anything greater than 5 will work for your input
If you don't want your code to be tied to a hard coded size, you can use std::string.
std::string a[5];
A C-string is a sequence of characters that ends with a null terminator. That means "abcde" is actually 6 characters long: the 5 you see plus the null terminator.
Since you only allocated enough space for the input without the null terminator, putting the string into the array writes off the end of the array, which is undefined behavior. What you need is
char a[5][6];
As that will have enough space for the 5 characters plus the null terminator.
I'm pretty inexperienced in c++, and I wrote the following code to see how characters and strings work.
#include "stdio.h"
#include <iostream>
#include <string>
using namespace std;
int main()
{
char asdf[] = "hello";
char test[5] = {'h','e','l','l','o'};
cout << test;
}
I expected it to output "hello", but instead I got "hellohello", which is really puzzling to me. I did some experimenting:
If I change the asdf to another string of a different length, it outputs "hello" normally.
If I change the amount of characters in test it outputs "hello" normally.
I thought this only happened when the two were the same length, but when I change them both to "hell" it seems to output "hell" normally.
To make things more confusing, when I asked a friend to run this code on their computer, it outputted "hello" and then a random character.
I'm running a fresh install of code blocks on Ubuntu. Anyone have any idea what is going on here?
This is undefined behaviour.
Raw char* or char[] strings in C and C++ must be null-terminated. That is, the string needs to end with a '\0' character. Your test[5] does not do that, so the function printing the output continues after the last o, because it is still looking for the null terminator.
Due to how the strings are stored on the stack (the stack usually grows towards lower addresses), the next bytes it encounters are those of asdf[], to which you assigned "hello". This is what the memory layout looks like; the arrow indicates the direction in which memory addresses (think pointers) increase:
    ---->
+-------------------
|hellohello\0 ...
+-------------------
      \_ asdf
 \_ test
Now in C++ and C, string literals like "hello" are null-terminated implicitly, so the compiler writes a hidden '\0' behind the end of the string. The output function continues to print the contents of asdf char-by-char until it reaches that hidden '\0' and then it stops.
If you were to remove asdf, you would likely see a bit of garbage after the first hello and then possibly a segmentation fault. But this is undefined behaviour, because you are reading out of the bounds of the test array. This also explains why it behaves differently on different systems: for example, some compilers may decide to lay out the variables in a different order on the stack, so that on your friend's system, test is actually lower on the stack (remember, lower on the stack means at a higher address):
    ---->
+-------------------
|hello\0hello ...
+-------------------
        \_ test
 \_ asdf
Now when you print the contents of test, it will print hello char-by-char, then continue reading the memory until a \0 is found. The contents of ... are highly specific to the architecture and runtime used, possibly even phase of the moon and time of day (not entirely serious), so that on your friend's machine it prints a "random" character and then stops.
You can fix this by adding a '\0' or 0 to your test array (you will need to change the size to 6). However, using const char test[] = "hello"; is the sanest way to solve this.
You have to terminate your test array with an ASCII 0 char. What happens now is that in memory it is adjacent to your asdf string, so since test isn't terminated, the << will just continue until it meets the ASCII 0 at the end of asdf.
In case you wonder: when filling asdf, this ASCII 0 is added automatically.
The reason for this is that C style strings need the null character to mark the end of the string.
As you have not put this into the array test, it will just keep printing characters until it finds one. In your case the array asdf happens to follow test in memory, but this cannot be guaranteed.
Instead change the code to this:
char test[] = {'h','e','l','l','o', 0};
cout is printing all characters starting from the beginning of the given address (test here, or &test[0] in equivalent notation) up to the point where it finds a null terminator. As you haven't put a null terminator into the test array, it will continue to print until it accidentally finds one in memory. From this point on, what happens is undefined behavior.
Last character should be '\0' to indicate end of string.
char test[6] = {'h','e','l','l','o','\0'};
Unless there is an overload of operator<< for a reference to an array of 5 chars, the array will "decay" to a pointer to char and be treated as a C-style string by the operator. C-style strings are by convention terminated with a 0 char, which your array is lacking. Therefore the operator continues outputting the bytes in memory, interpreting them as printable chars. It just so happens that on the stack, the two arrays were adjacent, so the operator ran into asdf's memory area, outputting those chars and finally encountering the implicit 0 char which is at the end of "hello". If you omit the other declaration, it's likely that your program will crash, namely if the next 0 byte comes later than the memory boundary of your program.
It is undefined behavior to access memory outside an object (here: test) through a pointer to that object.
Character sequences need a null terminator (\0).
char asdf[] = "hello"; // OK: String literals have '\0' appended at the end
char test[5] = {'h','e','l','l','o'}; // Oops, not null terminated. UB
Corrected:
char test[6] = {'h','e','l','l','o','\0'}; // OK
//        ^                         ^^^^
using namespace std;
char str1[10],str2[10];
cin.getline(str1,14);
cin.getline(str2,10);
cout<<strlen(str1)<<'\t'<<strlen(str2);
The Output of the above code was as follows-
1234567890123
bye
13 3
How could be the length of str1 greater than 10?
It can't. You overran your buffer and overwrote memory outside of the array. Your program happened not to crash or teleport a cat into your monitor before it found a '\0' no earlier than 13 bytes in memory from the start of your 10-element array.
The behaviour on overrunning a char array is undefined. To be clear, you need to ensure there is sufficient space for your data and a \0 string terminator, else the behaviour of cout will be undefined.
The compiler is allowed to do anything if it encounters this.
Your output is a common manifestation, but you must not rely on such behaviour.
It happens because getline is likely to be using the space reserved for str2.
But this is undefined behaviour; it could do anything (likely a segfault, access violation, or whatever it is called on your OS).
When I run the example code, the wordLength is 7 (hence the output 7). But my char array gets some really weird characters in the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
Desired output is: aaaaaaa... What is the garbage behind it?? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength +1];
//add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop and
wordchar[wordLength] = '\0';
after the for loop, because C strings are null-terminated.
Without this it keeps on printing until it finds the first null character, printing all the garbage values along the way.
You omitted the trailing zero; that's the cause.
In C and C++ the whole ecosystem determines string length by assuming a trailing zero ('\0' or simply 0 numerically). This is different from, for example, Pascal strings, where the memory representation starts with a number that tells how many of the following characters make up the string.
So if you have string content that you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise strstr and other string functions can run off the end and keep reading (or, for writing functions such as strcat, modifying) the memory that follows. Without a trailing zero, strlen will also give a false result, because it counts until it encounters the first zero.
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. The exploit takes advantage of the side effect that functions run off the end and manipulate things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question) you are better off using std::string and its member functions instead of C-style manipulation.