Why does ^z have two ASCII codes? - c++

When I type Ctrl+Z at the beginning of a string, its ASCII code is zero, but when I type it at the end of a string, its ASCII code is 26.
For example:
^zhi --> ASCII code of ^z = 0
but
Hi^z --> ASCII code of ^z = 26
Why is this?

Ctrl-Z is a "Substitute character":
https://en.wikipedia.org/wiki/Substitute_character.
A substitute character (␚) is a control character that is used in the
place of a character that is recognized to be invalid or erroneous, or
that cannot be represented on a given device. It is also used as an
escape sequence in some programming languages.
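Because of this special role, what Ctrl-Z turns into depends on the context: the terminal, the operating system and the input library can each treat it specially. On Windows consoles, for instance, Ctrl-Z typed at the start of a line is commonly interpreted as end-of-file rather than being passed to the program as character 26.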
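Rather than guessing, it can help to print the code of every character the program actually receives. The following is a minimal sketch, not part of the original answer; the function names `format_codes` and `describe_next_line` are hypothetical. If the read itself fails (for example because the console reported end-of-file), the second function says so instead of reporting a value that was never read.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Formats the numeric code of every character in s, separated by spaces.
std::string format_codes(const std::string& s) {
    std::ostringstream out;
    bool first = true;
    for (char c : s) {
        if (!first) out << ' ';
        // Cast through unsigned char so that bytes >= 0x80 print as 128..255
        // instead of negative numbers on platforms where plain char is signed.
        out << static_cast<int>(static_cast<unsigned char>(c));
        first = false;
    }
    return out.str();
}

// Reads one line from `in` and returns the codes of its characters.
// If the read fails (e.g. the input source reported end-of-file),
// it reports that instead of a stale or default value.
std::string describe_next_line(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) {
        return "no line read (end-of-file or error)";
    }
    return format_codes(line);
}
```

In a real program you would include `<iostream>`, print `describe_next_line(std::cin)` and type the two inputs from the question to see what each one delivers.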

Related

Fortran formatted IO and the Null character

I wonder how Fortran's I/O is expected to behave when a string contains a NULL character, ACHAR(0).
The actual task is to fill an ASCII file with blocks of exactly eight characters. The strings are read from a binary file and may contain non-printing characters.
I tried gfortran 4.8 and 8.1, and f2c. If there is a NULL character in the string, the format specifier FORMAT(A8) does not write eight characters.
Give the following F77 code a try:
c     Print a string of eight characters surrounded by dashes
  100 FORMAT('-',A8,'-')
c     Works fine if empty or any other combination of printing chars
      write(*,100) ''
c     In case of a short string, blanks are padded
      write(*,100) '345678'
c     A NULL character does something I did not expect
      write(*,100) '123'//ACHAR(0)//'4567'
c     Not even position editing helps
  101 FORMAT('-',A8,T10,'x')
      write(*,101) '123'//ACHAR(0)//'4567'
      end
My output is:
-        -
-  345678-
-1234567-
-1234567x
Is this expected behavior? Any idea how to get the output eight characters wide in any case?
When using the edit descriptor A8, the field width is eight: on output, exactly eight characters are written.
In your example, it isn't the writing of the characters that contradicts your expectations but how your terminal displays them.
You can examine the output further with tools like hexdump, or you can write to an internal file and look at arbitrary substrings.
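If hexdump is not available, a few lines of C++ can play the same role. This is a minimal sketch, assuming you build a small filter whose `main` prints `hex_bytes(std::cin)` and pipe the Fortran program's output through it; the name `hex_bytes` is hypothetical. A NUL byte then shows up as `00` even though the terminal prints nothing for it.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns every byte read from `in` as two lowercase hex digits,
// separated by spaces. Non-printing bytes such as NUL become visible.
std::string hex_bytes(std::istream& in) {
    static const char digits[] = "0123456789abcdef";
    std::ostringstream out;
    bool first = true;
    char c;
    while (in.get(c)) {
        unsigned char b = static_cast<unsigned char>(c);
        if (!first) out << ' ';
        out << digits[b >> 4] << digits[b & 0x0F];
        first = false;
    }
    return out.str();
}
```

For the third line of the example, the record `-123`, NUL, `4567-` would appear as ten bytes, confirming that A8 wrote all eight characters. Note that std::cin is in text mode, so on Windows the line endings may be translated before the filter sees them.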
Yes, that is expected. If there is a null character, the display of the string on the screen can stop there: the characters are still sent, but the terminal does not have to show them.
Note that C uses the null character to terminate strings, and the OS may interpret the strings it receives with the same convention. The Fortran standard allows non-printable characters to be interpreted in processor-dependent ways, where the processor is the whole complex of the compiler, the execution environment (the OS and the programs running in it) and the hardware.

Regex - Unicode combining character sequence \x - text terminal

In this PDF document, section VI, "Other Special Characters", says:
e. ANSCII or ANSI codes
1. Codes that control appearance of a text terminal
2. 0xA9 = \xA9
I can't understand "appearance of a text terminal".
What does it mean?
Presumably the author means terminal attributes such as text and background color, character set, and character attributes (bold, underlined, blinking, inverse).
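As an illustration of such codes, the ANSI/ECMA-48 "Select Graphic Rendition" sequences have the form ESC [ n m and switch attributes like those listed above on and off. A minimal sketch, assuming a terminal that understands these sequences (older Windows consoles may not); the names `Sgr` and `styled` are hypothetical.

```cpp
#include <string>

// A few SGR parameter numbers from ECMA-48 / ANSI X3.64.
enum class Sgr {
    Reset = 0, Bold = 1, Underline = 4, Blink = 5, Inverse = 7,
    FgRed = 31, BgGreen = 42
};

// Wraps `text` in ESC [ <n> m ... ESC [ 0 m, so the attribute
// is switched back off after the text.
std::string styled(const std::string& text, Sgr attr) {
    return "\x1b[" + std::to_string(static_cast<int>(attr)) + "m" +
           text + "\x1b[0m";
}
```

Printing `styled("warning", Sgr::FgRed)` on a compatible terminal changes how the word looks; the bytes themselves are ordinary characters that the terminal interprets.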

Weird ASCII/Unicode Character

Peter Thiel's CS183 Notes has a filename with the ASCII string "Peter Thiel's CS183.pdf", or at least that is how it prints out in Windows Explorer. However, while debugging my program, I noticed that the ' character isn't the plain apostrophe: it has an (unsigned char) value of 146, not the expected 39.
To test whether it was a bug in my program, I renamed the file, erased the character and retyped the apostrophe. Sure enough, this time my program displayed the correct value. I reasoned therefore that it must be a Unicode character (since I don't see it in the ASCII table). However, it isn't a multibyte character, because the next byte in the string is an 's'.
Can someone help explain what's going on here?
Your mistake is believing this string is ASCII.
If you are using a Windows machine with character encoding CP-1252 (see http://en.wikipedia.org/wiki/Windows-1252), then your "code" 146 is a kind of quote (see the table on the Wikipedia page).
Byte 146 is the right single quotation mark in the Windows code page CP1252; it is not that character in ASCII, ISO-8859-1 or any Unicode encoding (in Unicode, the right single quotation mark is U+2019).
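To make the difference concrete, here is a minimal sketch of converting CP-1252 text to UTF-8, where byte 146 becomes the three bytes E2 80 99 (U+2019). It is deliberately partial: only the four curly quotes 0x91 to 0x94 are mapped from the 0x80 to 0x9F range, and every other byte in that range becomes U+FFFD. The function names are hypothetical; on Windows, a real program would more likely use the system conversion APIs.

```cpp
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of a code point below U+10000.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Partial CP-1252 -> UTF-8 conversion. Bytes 0x00-0x7F and 0xA0-0xFF have
// the same code point as in Latin-1, so they map to U+0000-U+00FF directly.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        std::uint32_t cp;
        switch (b) {
            case 0x91: cp = 0x2018; break;  // left single quotation mark
            case 0x92: cp = 0x2019; break;  // right single quotation mark (146)
            case 0x93: cp = 0x201C; break;  // left double quotation mark
            case 0x94: cp = 0x201D; break;  // right double quotation mark
            default:   cp = (b >= 0x80 && b <= 0x9F) ? 0xFFFD : b;  // unmapped here
        }
        append_utf8(out, cp);
    }
    return out;
}
```

This also explains the observation in the question: in CP-1252 the quote is one byte (so the next byte is 's'), while in UTF-8 the same character needs three bytes.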
It's a Right single quotation mark instead of a Single quote:
http://www.ascii-code.com/
Like you said, 39 is a Single quote, but the file must have been named using a Right single quotation mark, decimal value 146 in the Windows Latin-1 extended characters, CP-1252.

Will cin recognize \n typed in from keyboard as a newline character?

I am a beginner in C++, so I'm sorry if this question sounds stupid.
I made this little program to help me get familiar with the properties of cin:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string next;
    cout << "Enter your input.\n";
    cin >> next;
    cout << next;
    return 0;
}
When I typed \n on the keyboard, the program printed \n back.
Also, when I changed the variable next from a string to a char and gave it the same input as above, only a \ was printed.
My question is: why don't I get a new line instead? Doesn't cin recognize \n typed on the keyboard as a newline character? Or does that only apply to cout?
\n is an escape sequence in C++; when it appears in a character constant or a string literal, the two character sequence is replaced by the single character representing a new line in the default basic encoding (almost always 0x0A in modern systems). C++ defines a number of such escape sequences, all starting with a \.
Input is mapped differently, and in many cases it depends on the device. When reading from the keyboard, most systems buffer a full line and only return characters from it once the Enter key has been pressed. What the Enter key sends to a C++ program may vary, and whether the file has been opened in text mode or binary mode can make a difference as well: in text mode, the C++ library should negotiate with the OS to ensure that the Enter key always results in the single character represented by \n. (std::cin is always opened in text mode.) Whether the keyboard driver does something special with \ depends on the driver, but most don't. C++ never does anything special with \ when reading from a keyboard (and \n has no special meaning in C++ source code outside of string literals and character constants).
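The behavior described in the question can be reproduced without a keyboard. In this sketch, an std::istringstream stands in for the keyboard (an assumption made so the example is repeatable): it holds exactly the two characters a user produces by typing a backslash and then n. The helper names are hypothetical.

```cpp
#include <sstream>
#include <string>

// Extracts the typed input into a std::string, as the question's program does.
std::string read_as_string(const std::string& typed) {
    std::istringstream in(typed);
    std::string next;
    in >> next;  // reads up to the next whitespace: both characters
    return next;
}

// Extracts the same input into a single char.
char read_as_char(const std::string& typed) {
    std::istringstream in(typed);
    char next = '\0';
    in >> next;  // reads one non-whitespace character: only the backslash
    return next;
}
```

In the source code, the user's input has to be written as "\\n" (an escaped backslash followed by n, two characters), whereas "\n" is a single newline character; no conversion from one to the other happens during input.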
If you need your program to treat a typed \n as a newline character on input, you can check this out:
https://stackoverflow.com/a/2508814/815812
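One way to do that is to read the line as usual and translate the escape sequences yourself. This is a minimal sketch, not the approach in the linked answer; the helper `unescape` is hypothetical and handles only \n, \t and \\, copying any other backslash sequence through unchanged.

```cpp
#include <string>

// Replaces the two-character sequences \n, \t and \\ typed by a user
// with the single characters they denote in a C++ literal.
std::string unescape(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            const char next = in[i + 1];
            if (next == 'n')  { out += '\n'; ++i; continue; }
            if (next == 't')  { out += '\t'; ++i; continue; }
            if (next == '\\') { out += '\\'; ++i; continue; }
        }
        out += in[i];  // ordinary character or unrecognized sequence
    }
    return out;
}
```

Typical use: read the input with `std::getline(std::cin, line)` and then print `unescape(line)`. Reading the whole line with getline, rather than `cin >> next`, also keeps embedded spaces.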
What Michael says is perfectly correct.
You can try it out in a similar way.
Technically speaking, this depends on things outside your program, but assuming your terminal simply passes the individual bytes corresponding to the '\' and 'n' characters (which I think any sane one will), then the behavior you're seeing is expected.
"\n" is nothing more than a shortcut added to the programming language and environment to let you more easily represent the newline character (ASCII line feed). In source code it is not a character itself, just a notation telling the compiler to produce the non-printable character that, in text-mode input, corresponds to pressing the Enter key.
Let's say you're in Notepad or a similar editor and you press the Tab key: the cursor moves to the next tab stop. Typing "\t" just enters the literal characters "\" and "t". Internally, whoever wrote Notepad had to specify what it should do when the user pressed Tab, and did so with a mnemonic like:
if(key == '\t') {
// tab over
}

How to detect a tab in a text file?

Is detecting tabs the same as detecting spaces? For example, to detect a space I would just compare the character with the space's ASCII code.
For a tab, do I have to search for the '\t' character in the file, or is there some other way?
if('\t' == myChar)
This would work, and it is better than comparing against 9, because the numeric value of a tab depends on the character set (it is 9 in ASCII but 5 in EBCDIC), not on a guarantee of the language.
Assuming you are working with ASCII data, you can just search for a byte with value '\t' (9) in the text file. Tabs are represented as a single byte in text files and most libraries for reading files don't do anything special with those bytes.
A tab is just another character, so you can check for its ASCII value if you want.
Although a tab is displayed as several spaces in an editor (often 4 or 8), it is actually stored as a single character ('\t', as you mentioned) in the file. In ASCII-compatible encodings, both the space character and the tab character take up one byte. So basically, you are correct in your assumption.
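Putting the answers together, detecting tabs works exactly like detecting spaces: compare each character with a literal. This is a minimal sketch; the names `WhitespaceCounts` and `count_tabs_and_spaces` are hypothetical, and the stream overload is meant for something like an std::ifstream opened on the text file.

```cpp
#include <istream>
#include <iterator>
#include <string>
#include <string_view>

struct WhitespaceCounts {
    long tabs = 0;
    long spaces = 0;
};

// Compares each character with the '\t' and ' ' literals instead of
// hard-coding 9 or 32, so the code does not assume an ASCII character set.
WhitespaceCounts count_tabs_and_spaces(std::string_view text) {
    WhitespaceCounts counts;
    for (char c : text) {
        if (c == '\t') {
            ++counts.tabs;
        } else if (c == ' ') {
            ++counts.spaces;
        }
    }
    return counts;
}

// Reads the whole stream (e.g. an opened std::ifstream) and counts in it.
WhitespaceCounts count_tabs_and_spaces(std::istream& in) {
    std::string all((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());
    return count_tabs_and_spaces(all);
}
```

The tab is one character in the data however wide the editor draws it, so a single comparison per character is enough; no special handling is needed when reading the file.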