I have learned that cin.getline works like this:
cin.getline(destination string, number of characters to put into the string);
So assume this program:
char s1[8]="Hellopo";
cin.getline(s1,5);
cout<<s1<<endl;
This was my input: hhhhhhhhhhhhh
This was the program's output: hhhh
I have 2 concerns in this program.
1-) I set the program to get 5 characters from what the user entered and store them in s1. When I ran the program, it only printed 4 characters.
2-) I also expected the program to continue printing the rest of s1 after it printed what it got from the user, but it stopped after hhhh.
Please explain these two concerns to me. Thank you.
std::cin.getline will store four characters plus a null terminator in this case (five characters in total), and std::cout will stop printing at the first null terminator it finds.
From istream::getline():
count-1 characters have been extracted (in which case setstate(failbit) is executed).
This means that if you specify 5, only 4 characters will be read. And:
...it then stores a null character CharT() into the next successive location of the array
So a null character will be inserted after the fourth character, and the array s1 will have these contents:
'h' == s1[0]
'h' == s1[1]
'h' == s1[2]
'h' == s1[3]
0 == s1[4]
The operator<< will stop printing a char* when the first null character is found.
The fifth character is the 0-terminator. getline(buffer,n) stores up to n bytes including the 0-terminator in the buffer. And then cout << s1; stops at the 0-terminator.
The fifth character is the null terminator, which marks the end of the string.
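A minimal sketch of the behaviour described in these answers, assuming a std::istringstream stands in for std::cin so the input is fixed. The helper name read_with_getline is made up for illustration; only istream::getline itself comes from the question, and count is assumed to be at least 1.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: calls istream::getline(buf, count) on fixed input and
// returns what ended up in the buffer. 'failed' reports whether failbit is set.
// Assumes count >= 1.
std::string read_with_getline(const std::string& input, std::streamsize count,
                              bool& failed)
{
    std::istringstream in(input);            // stands in for std::cin
    std::vector<char> buf(static_cast<std::size_t>(count), 'X');
    in.getline(buf.data(), count);           // stores at most count-1 chars + '\0'
    failed = in.fail();                      // set when the line did not fit
    return std::string(buf.data());          // reading stops at the first '\0'
}
```

With the input from the question and a count of 5, this should return "hhhh" and report failbit, because the line was longer than the buffer allows.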
Related
char* s = "123";
std::cout << s[s[3]] << std::endl; // prints 1
std::cout << s[3] << std::endl; // prints nothing?
I tried running the following snippet and the first print statement outputs 1 while the second outputs (seemingly) nothing. What is going on when the pointer is dereferenced using the length of the char pointer array here?
It is unclear why you are using the value of the character at index 3 (s[3]) to index the string again. But in any case, the key point here is that you're using a char to index an array. This means that the char is used as a number, the conversion happening most likely using the ASCII character encoding.
The reason nothing is printed when you print s[3] is that s points to a character array of length 4 whose last character is the null terminator, i.e. the number 0. The null terminator identifies the end of the string. It is not a printable character, because it is not meant to be printed: it has no glyph associated with it, so nothing appears.
Of course, you can see now that s[s[3]] is nothing but s[0], which is the character "1".
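To make the double indexing concrete, here is a small sketch. The helper index_by_char is hypothetical, and it is up to the caller to make sure the inner value is a valid index into the string.

```cpp
#include <cstddef>

// Hypothetical helper: uses the character stored at s[index] as a second index.
// The cast avoids a negative index on platforms where plain char is signed.
// The caller must ensure that the resulting index is within the array.
char index_by_char(const char* s, std::size_t index)
{
    unsigned char inner = static_cast<unsigned char>(s[index]);
    return s[inner];
}
```

For "123", s[3] is the terminator with value 0, so the call with index 3 reads s[0].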
Edited question
I understood my mistake in the code I had given in the original question, and the characters I was getting are garbage characters. Although, I still have a few questions about garbage characters in C:
Why can't the character (image) be copied?
Do garbage characters have some pattern? Meaning that can you predict for an empty string what character can come, for an empty integer what will come, and so on.
When a variable is declared, why does it have a garbage character instead of being blank? Is there a specific reason of storing it with a garbage character?
For a string which is not null-terminated, will the same garbage character be printed on every OS? If yes, which one?
Are there the same garbage characters on every OS? Or are they different?
Is there a way to print these characters on the stdout buffer in C / C++?
If you see carefully in the character (image), there are some characters and numbers in it. Do they represent something?
Is there a list of garbage characters which can be printed in C / C++?
Original Question
Title of original question: Mysterious character output in C
I had come across this code in K & R:
int scanline (char str [], int lim) /* Line will be read in 'str []', while lim is the maximum characters to be read */
{
    int c, len, j; /* 'len' will have the length of the read string */
    j = 0; /* Initializing 'j' */
    for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len) /* Reading a character one by one, till the user enters '\n', and checking for failure of 'getchar' */
    {
        if (len < (lim -2)) /* Checking that string entered has not gone beyond its boundaries. '-2' for '\n' and '\0' */
        {
            str [j] = c; /* Copying read character into 'string [j]' */
            ++ j; /* Incrementing 'j' by 1 */
        }
    }
    if (c == '\n') /* Checking if user has finished inputting the line */
    {
        str [j] = c; /* Copying newline into string */
        ++j;
        ++ len;
    }
    return len; /* Returning number of characters read */
}
In the K & R, it is known as getline, but I made changes, added comments, and thus defined it as scanline. To test this, I made a demo program:
#include <mocl/cancel.h>
int main (int argc, char **argv)
{
int len;
char str [50];
len = scanline (str, 50);
printf ("len = %d\n str = %s\n", len, str);
return 0;
}
The required headers and the function was in my own library, cancel.h. Then when I compiled my program, it was successful. Although, when I ran the executable, I got unexpected output (I cannot type it as I get a character which when I copy, it just gets pasted as 'm'):
The mysterious character is which when I copy, gets copied as the letter m. Also, when I run my program with different inputs, I get different mysterious outputs:
In another case, I get perfect output, just that a blank line is printed:
I had also come across this question, in which the user gets the same symbol.
What have I done till now?
I searched a lot, and I could not find any clue about this character, but if you look carefully at the second image, I get more characters when I enter "hi this is ashish". One of them is the slash, and one is (image). But I get another character (image). I found this link, which showed how to reproduce it and explained it, although I could not understand the explanation. When you run the code given there, you get a lot of characters, and one of them is (image). Even the author of that article could not copy it, and has not posted it. So here's the output:
That was the actual output, as that's not clear, here's a cut out version:
So basically I got to know that both the characters and are extended characters from a string. At that point, I actually figured out what was causing the problem in scanline.
The lines
if (c == '\n') /* Checking if user has finished inputting the line */
{
str [j] = c; /* Copying newline into string */
++j;
++ len;
}
were causing the problem, since they copy a newline into the string. Removing them worked, but I'm not sure why, as doing that was just a guess. I searched but still could not find the reason.
My Questions
How does removing those lines make the program work properly?
What are the characters and ? What are they supposed to do and how did they appear over here?
Are there any more characters like this?
Why can't those characters be copied?
Is it Undefined Behavior?
There's some confusion here regarding the term garbage characters. What it refers to is any byte that resides in a variable that wasn't assigned in some well-defined way. The character A can be a garbage character if it happens to appear in (for example) a block of memory returned by malloc or an uninitialized char variable.
This is distinct from unprintable characters which are any character that does not have a well-defined representation when printed as characters. For example, ASCII codes 0 - 31 and 127 (0 - 1F and 7F hex) are control characters and therefore unprintable. There are also multibyte characters for which a particular terminal may not know how to render them.
To get into your specific questions:
Why can't the character (image) be copied?
As an unprintable character, its screen representation is not well defined. So attempting to copy and paste it from a terminal will yield unexpected results.
Do garbage characters have some pattern? Meaning that can you
predict for an empty string what character can come, for an empty
integer what will come, and so on.
The nature of garbage characters is that their contents are undefined. Trying to predict what uninitialized data will contain is a futile effort. The same piece of code compiled with two different compilers (or the same compiler with different optimization settings) can have completely different contents for any uninitialized data.
The standard doesn't say what values should go there, so implementations are free to handle it however they want. They could choose to leave whatever values happen to be at those memory addresses, they could choose to write 0 to all addresses, or they could choose to write the values 0, 1, 2, 3, etc. in sequence. In other words, the contents are undefined.
When a variable is declared, why does it have a garbage character
instead of being blank? Is there a specific reason of storing it with
a garbage character?
Global variables and static local variables are initialized with all bytes zero, which is what the standard dictates. That is something that is done easily at compile time. Local variables on the other hand reside on the stack. So their values are whatever happens to be on the stack at the time the function is called.
Here's an interesting example:
#include <string.h>

void f1()
{
    char str[10];
    strcpy(str, "hello");
}

int main()
{
    f1();
    f1();
    return 0;
}
Here is what a particular implementation might do:
The first time f1 is called, the local variable str is uninitialized. Then strcpy is called, which copies in the string "hello". This takes up the first 6 bytes of the variable (5 for the string and 1 for the null terminator). The remaining 4 bytes are still garbage. When this function returns, the memory that the variable str resided at is free to be used for some other purpose.
Now f1 gets called again immediately after the first call. Since no other function was called, the stack for this invocation of f1 happens to sit at the exact same place as the last invocation. So if you were to examine str at this time, you would find it contains h, e, l, l, o, and a null byte (i.e. the string "hello") for the first 6 bytes. But, this string is garbage. It wasn't specifically stored there. If some other function was called before calling f1 a second time, most likely those values would not be there.
Again, garbage means the contents are undefined. The compiler doesn't explicitly put "garbage" (or unprintable characters) in variables.
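If you want predictable contents instead of garbage, one option is to initialize the local variable explicitly. A minimal sketch (the function name is made up for illustration):

```cpp
#include <cstddef>

// Returns true if every byte of an explicitly zero-initialized local array is 0.
// "= {}" initializes all elements to zero, so nothing is left indeterminate.
bool zero_initialized_is_clean()
{
    char str[10] = {};
    for (std::size_t i = 0; i < sizeof str; ++i) {
        if (str[i] != '\0') return false;
    }
    return true;
}
```

Without the "= {}", the same loop would be reading indeterminate values and no particular result could be relied on.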
For a string which is not null-terminated, will the same garbage
character be printed on every OS? If yes, which one?
Here's one of those places you're confusing garbage and unprintable. In your specific case, the garbage character happens to be unprintable, but it doesn't have to be. Here's another example:
void f3()
{
char str1[5], str2[5];
strcpy(str1, "hello");
strcpy(str2, "test");
printf("str1=%s\n", str1);
}
Let's suppose the compiler decides to place str2 immediately after str1 in memory (although it doesn't have to). The first call to strcpy will write the string "hello" into str1, but this variable doesn't have enough room for the null terminating byte. So it gets written to the next byte in memory, which happens to be the first byte of str2. Then when the next call to strcpy runs, it puts the string "test" in str2, but in doing so it overwrites the null terminating byte put there when str1 was written to.
Then when printf gets called, you'll get this as output:
str1=hellotest
When printing str1, printf looks for the null terminator, but there isn't one inside of str1. So it keeps reading until it finds one. In this case there happens to be another string right after it, so it prints that as well until it finds the null terminator that was properly stored in that string.
But again, this behavior is undefined. A seemingly minor change in this function could result in str2 appearing in memory first. The compiler is free to do as it wishes in this regard, so there's no way to predict what will happen.
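For contrast, here is a sketch of a copy that cannot run past a 5-byte buffer. It uses std::snprintf rather than strcpy; that swap and the helper name bounded_copy are my own choices for illustration, not something from the code above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: copies src into a 5-byte buffer without overflowing it.
// std::snprintf writes at most sizeof(buf) - 1 characters and always terminates.
std::string bounded_copy(const char* src)
{
    char buf[5];
    std::snprintf(buf, sizeof buf, "%s", src);
    return std::string(buf);
}
```

With "hello" as input the result is truncated to "hell", which loses a character but keeps the terminator inside the array.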
Are there the same garbage characters on every OS? Or are they
different?
I believe you're actually referring to unprintable characters in this case. This really depends on the character set of the OS and/or terminal in question. For example, Chinese characters are represented with multiple bytes. If your terminal can't print Chinese characters, you'll see some type of code similar to what you saw for each of the bytes. But if it can, it will display it in a well-defined manner.
Is there a way to print these characters on the stdout buffer in C /
C++?
Not as characters. You can however print out their numerical representations. For example:
void f4()
{
char c;
printf("c=%02hhX\n", (unsigned char)c);
}
The contents of c are undefined, but the above will print whatever value happens to be there in hexadecimal format.
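The same idea extends to a whole buffer. A sketch, assuming the bytes have actually been assigned (so the values are well defined); to_hex is a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats n bytes as two-digit uppercase hex values
// separated by spaces, e.g. "41 7F".
std::string to_hex(const char* data, std::size_t n)
{
    std::string out;
    char piece[3];
    for (std::size_t i = 0; i < n; ++i) {
        // Convert through unsigned char so bytes >= 0x80 do not sign-extend.
        std::snprintf(piece, sizeof piece, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        if (i != 0) out += ' ';
        out += piece;
    }
    return out;
}
```

Printing the result shows every byte, printable or not, without depending on how the terminal renders it.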
If you see carefully in the character (image),
there are some characters and numbers in it. Do they represent
something?
Some terminals will display unprintable characters by printing a box containing the Unicode codepoint of the character so the reader can know what it is.
Unicode is a standard for text where each character is assigned a numerical code point. Besides the typical set of characters in the ASCII range, Unicode also defines other characters, such as accented letters, other alphabets like Greek, Hebrew, Cyrillic, Chinese, and Japanese, as well as various symbols. Because there are thousands of characters defined by Unicode, multiple bytes are needed to represent them. The most common encoding for Unicode is UTF-8, which allows regular ASCII characters to be encoded with one byte, and other characters to be encoded with two or more bytes as needed.
In this case, the codepoint in question is 007F. This is the DELETE control character, which is typically generated when the Delete key is pressed. Since this is a control character, your terminal is displaying it as a box with the Unicode point for the character instead of attempting to "print" it.
Is there a list of garbage characters which can be printed in C /
C++?
Again, assuming you really mean unprintable characters here, that has more to do with the terminal that's displaying the characters than with the language. Generally, control characters are unprintable, while certain multibyte characters may or may not display properly depending on the font / character set of the terminal.
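If you want to check in code whether a single byte is printable, the standard library has std::isprint. A sketch, assuming the default "C" locale (the wrapper name is_printable is made up):

```cpp
#include <cctype>

// Returns true if c has a printable representation in the current C locale.
// The cast to unsigned char is required: passing a negative char is undefined.
bool is_printable(char c)
{
    return std::isprint(static_cast<unsigned char>(c)) != 0;
}
```

In the "C" locale this reports control characters such as DELETE (0x7F), the null terminator and the newline as not printable.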
For starters, the function returns an incorrect value of len. Let's assume that lim is equal to 2.
In this case nothing will be written to the array in the loop, due to the condition
if (len < (lim -2))
However, after the first iteration of the loop len will be increased.
for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len)
                                                    ^^^^^
In the second iteration again nothing will be written to the array, due to the same condition
if (len < (lim -2))
but len will be increased.
for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len)
                                                    ^^^^^
Thus nothing will be written to the array, but len will keep increasing until, for example, the newline character is encountered.
So the function is invalid. Moreover, the function is supposed to append a terminating zero to the string it reads, but it does not. So you may not output the character array as a string.
The function can be written the following way
int scanline( char str [], int lim )
{
    int len = 0;
    int c;
    while ( len < lim - 1 && ( c = getchar () ) != EOF && c != '\n' )
    {
        str[len++] = c;
    }
    if ( len < lim - 1 && c == '\n' ) str[len++] = c;
    if ( len < lim ) str[len++] = '\0';
    return len;
}
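To try this kind of logic without typing input, here is a sketch of a variant that reads from a std::istream instead of getchar. The name scanline_from is hypothetical, and unlike the function above it returns the number of characters stored without counting the terminator.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding fixed input
#include <string>

// Hypothetical stream-based variant of scanline.
// Stores at most lim - 1 characters (including a trailing '\n' if it fits),
// always appends '\0', and returns the number of characters stored.
int scanline_from(std::istream& in, char str[], int lim)
{
    if (lim <= 0) return 0;          // no room even for the terminator
    int len = 0;
    char ch;
    while (len < lim - 1 && in.get(ch)) {
        str[len++] = ch;
        if (ch == '\n') break;       // keep the newline, then stop
    }
    str[len] = '\0';
    return len;
}
```

Because the array is always terminated, the result can be passed to printf with %s without printing whatever happens to follow it in memory.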
Here is a code snippet. I'm confused as to how the buffering internally works.
while(true)
{
cout << "Enter a character: ";
cin.ignore(3, '\n');
ch = cin.get(); // ch is char type
cout << "char: ch: " << ch << endl;
}
As I understand it, cin.ignore(3, '\n') ignores the first three characters, and then get() reads the next character. Up to that point it's fine. Since I put this in a while loop, I was trying to check the behavior of ignore() and get(). For instance, the output I got was
Enter a character: abcd
char: ch: d
Enter a character: efgh
char: ch: e
Enter a character: ijkl
char: ch: i
Enter a character: mnopq
char: ch: m
Enter a character: char: ch: q
Enter a character:
Just to check the buffering, I intentionally gave 4 characters instead of 1. In the first case it's fine and I understand the result. From the second input on, ignore doesn't seem to work. When I entered 5 characters, I didn't understand the behavior either.
Need explanation on this. :)
According to documentation of std::cin.ignore(streamsize n = 1, int delim = EOF):
Extracts characters from the input sequence and discards them, until either n characters have been extracted, or one compares equal to delim.
http://www.cplusplus.com/reference/istream/istream/ignore/
You are putting abcd\n onto stdin. Your first ignore(3,'\n') removes abc and your get() fetches d. \n remains in the buffer.
Then you add efgh\n to the buffer, which now contains \nefgh\n. Your next ignore() discards either 3 characters or everything up to and including a newline, whichever comes first. Since the newline is first in the buffer, only the newline is ignored.
You probably want to empty the stdin buffer before asking for more input. You can achieve this either by modifying your get() call, or by adding a second ignore() call before asking for more input.
cin.ignore(3, '\n') ignores up to three characters, stopping after it finds the end of a line (i.e. a \n character).
After the first line of input, the buffer will contain 5 characters, abcd\n. So ignore ignores abc, and get gets d, leaving \n.
After the second line, it contains \nefgh\n. So ignore just ignores the end-of-line character, and get returns e.
If you want to discard the rest of line after extracting the character, then use ignore again:
cin.ignore(numeric_limits<streamsize>::max(), '\n');
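Here is a sketch that replays the first two lines of input from the question, assuming a std::istringstream stands in for std::cin. The helper run_reads is made up for illustration; it shows the loop with and without the extra ignore call.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: repeats "ignore(3, '\n'); get()" on fixed input and
// returns the characters that get() produced. With discard_rest set, the rest
// of the line is thrown away after each get().
std::string run_reads(const std::string& input, int iterations, bool discard_rest)
{
    std::istringstream in(input);            // stands in for std::cin
    std::string result;
    for (int i = 0; i < iterations; ++i) {
        in.ignore(3, '\n');
        char ch;
        if (!in.get(ch)) break;              // no more input
        result += ch;
        if (discard_rest) {
            in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
    }
    return result;
}
```

For the input abcd\nefgh\n and two iterations, the plain loop yields d and e, while the version that discards the rest of the line yields d and h.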
I have a question about the difference between these two pieces of code:
char buffer5[5];
cin.get(buffer5, 5);
cout << buffer5;
cin.get(buffer5, 5);
cout << buffer5;
and
char buffer4;
while (cin.get(buffer4))
{
cout << buffer4;
}
In the first piece of code, the code gets 5 characters and puts it in buffer5. However, because you press enter, a newline character isn't put into the stream when calling get(), so the program will terminate and will not ask you for another round of 5 characters.
In the second piece of code, cin.get() waits for input on the input stream, so the loop doesn't just terminate (I think). Let's say I input "Apple" into the input stream. This will put 5 characters into the input stream, and the loop will print all of them to the output. However, unlike the first piece of code, it does not stop even after two inputs, and I can keep entering input.
Why is it that I can continuously input character sequences into the terminal in the second piece of code and not the first?
First off, "pressing enter" has no special meaning to the IOStreams beyond entering a newline character (\n) into the input sequence (note, when using text streams the platform specific end of line sequences are transformed into a single newline character). When entering data on a console, the data is normally line buffered by the console and only forwarded to the program when pressing enter (typically this can be turned off but the details of this are platform specific and irrelevant to this question anyway).
With this out of the way, let's turn our attention to the behavior of s.get(buffer, n) for an std::istream s and a pointer buffer to an array of at least n characters. The description of what this does is quite simple: it calls s.get(buffer, n, s.widen('\n')). Since we are talking about std::istream and you probably haven't changed the std::locale, we can assume that s.widen('\n') just returns '\n', i.e., the call is equivalent to s.get(buffer, n, '\n'), where '\n' is called a delimiter, and the question becomes what this function does.
Well, this function extracts up to m = 0 < n? n - 1: 0 characters, stopping when either m is reached or when the next character is identical to the delimiter, which is left in the stream (you'd use std::istream::getline() if you wanted the delimiter to be extracted). Each extracted character is stored in the corresponding location of buffer, and if 0 < n a null character is stored in the location following the last stored character. If no character is extracted, std::ios_base::failbit is set.
OK, with this we should have all the ingredients of the riddle in place: when you entered at least one character but fewer than 5 characters, the first call to get() succeeded and left the newline character as the next character in the buffer. The next attempt to get() more characters immediately found the delimiter, stored no character, and indicated failure by setting std::ios_base::failbit. It is easy to verify this theory:
#include <iostream>

int main()
{
    char buffer[5];
    for (int count(0); std::cin; ++count) {
        if (std::cin.get(buffer, 5)) {
            std::cout << "get[" << count << "]='" << buffer << "'\n";
        }
        else {
            std::cout << "get[" << count << "] failed\n";
        }
    }
}
If you enter no character, the first call to std::cin.get() fails. If you enter 1 to 4 characters, the first call succeeds but the second one fails. If you enter more than 4 characters, the second call also succeeds, etc. There are several ways to deal with the potentially stuck newline character:
Just use std::istream::getline() which behaves the same as std::istream::get() but also extracts the delimiter if this is why it stopped reading. This may chop one line into multiple reads, however, which may or may not be desired.
To avoid the limitation of a fixed line length, you could use std::getline() together with an std::string (i.e., std::getline(std::cin, string)).
After a successful get() you could check if the next character is a newline using std::istream::peek() and std::istream::ignore() it when necessary.
Which of these approaches meets your needs depends on what you are trying to achieve.
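As an illustration of the peek()/ignore() approach, here is a sketch that reads fixed input from a std::istringstream (standing in for std::cin). The helper read_chunks is hypothetical. Note that an empty line still makes get() fail and ends the loop, so this is not a complete line reader.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads fixed input in chunks of up to 4 characters with
// get(buffer, 5), skipping the newline that get() leaves behind.
std::vector<std::string> read_chunks(const std::string& input)
{
    std::istringstream in(input);            // stands in for std::cin
    std::vector<std::string> chunks;
    char buffer[5];
    while (in.get(buffer, 5)) {
        chunks.push_back(buffer);
        if (in.peek() == '\n') {
            in.ignore();                     // drop the stuck newline
        }
    }
    return chunks;
}
```

For the input abc\ndefghij\n this collects "abc", "defg" and "hij": the newline after each line is skipped, so the following get() does not fail on it.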
String manipulation problem
http://www.ideone.com/qyTkL
In the above program (given in the book C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, Exercise 3.14), the character array is allocated with len + 1 elements:
char *pc2 = new char[ len + 1];
http://www.ideone.com/pGa6c
However, in this program I have allocated the character array with only len elements:
char *pc2 = new char[ len ];
Why do we need to make the new string one character longer when we get the same result either way? Please explain.
Note that the programs I have shown here are altered slightly; they are not exactly the same as in the book.
To store a string of length n in C, you need n+1 chars. This is because a string in C is simply an array of chars terminated by the null character \0. Thus, the memory that stores the string "hello" looks like
'h' 'e' 'l' 'l' 'o' '\0'
and consists of 6 chars even though the word hello is only 5 letters long.
The inconsistency you're seeing could be a semantic one; some would say that length of the word hello is len = 5, so we need to allocate len+1 chars, while some would say that since hello requires 6 chars we should say its length (as a C string) is len=6.
Note, by the way, that the C way of storing strings is not the only possible one. For example, one could store a string as an integer (giving the string's length) followed by characters. (I believe this is what Pascal does?). If one doesn't use a length field such as this, one needs another way to know when the string stops. The C way is that the string stops whenever a null character is reached.
To get a feel for how this works, you might want to try the following:
char* string = "hello, world!";
printf("%s\n", string);
char* string2 = "hello\0, world!";
printf("%s\n", string2);
(The assignment char* string = "foo"; is just a shorthand way of creating an array with 4 elements, and giving the first the value 'f', the second 'o', the third 'o', and the fourth '\0').
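The difference between the storage a string needs and its length can also be checked directly. A small sketch; the two function names are made up for illustration:

```cpp
#include <cstddef>
#include <cstring>

// sizeof on a string literal counts every byte, including the implicit '\0'.
std::size_t bytes_in_hello() { return sizeof("hello"); }

// strlen counts characters up to, but not including, the first '\0'.
std::size_t visible_length(const char* s) { return std::strlen(s); }
```

So "hello" occupies 6 bytes but has length 5, and for "hello\0, world!" strlen also stops at 5 because of the embedded null character.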
It's a convention that the string is terminated by an extra null character so whoever allocates storage has to allocate len + 1 characters.
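Allocating only len characters does cause a problem: the terminating null character is written past the end of the array. Sometimes, though, the allocator rounds the requested size up for alignment, so a few spare bytes follow the array and the problem stays hidden.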
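A sketch of the len + 1 convention in a copy routine. The helper duplicate and the use of std::unique_ptr are my own choices for illustration, not taken from the book's program.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: duplicates a C string into freshly allocated storage.
// len + 1 characters are requested so the terminating '\0' has a place to go.
std::unique_ptr<char[]> duplicate(const char* src)
{
    std::size_t len = std::strlen(src);
    std::unique_ptr<char[]> copy(new char[len + 1]);
    std::memcpy(copy.get(), src, len + 1);   // copies the characters and the '\0'
    return copy;
}
```

With new char[len] instead, the same memcpy would write one byte past the end of the allocation, which is undefined behavior even if it appears to work.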