printf data type specifier complex question - c++

printf("\e[2J\e[0;0H");
What does this line mean?
Can I know what to learn and from where to understand this statement?

"\e" as an escape sequence is not part of the C standard.
A number of compilers treat the otherwise undefined behavior as a character with the value of 27 - the ASCII escape character.
Alternative well defined code:
//printf("\e[2J\e[0;0H");
printf("\x1B[2J\x1b[0;0H");
printf("\033[2J\033[0;0H");
#define ESC "\033"
printf(ESC "[2J" ESC "[0;0H");
The escape character introduces ANSI escape sequences, as explained well in Mickael B.'s answer. Some terminals implement a subset of these sequences.

They are ANSI escape sequences
These sequences define functions that change display graphics, control cursor movement, and reassign keys.
Each sequence starts with \e[ (the escape character followed by [), and the characters that follow define what should happen.
2J: clears the terminal
Esc[2J Erase Display:
Clears the screen and moves the cursor to the home position (line 0, column 0).
0;0H moves the cursor to the top-left corner of the screen (positions are 1-based; most terminals treat 0 as 1)
Esc[Line;ColumnH Cursor Position:
Moves the cursor to the specified position (coordinates).
See also:
console_codes - Linux console escape and control sequences
List of ANSI color escape sequences
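To make the structure of the two sequences concrete, here is a minimal sketch that builds them from their parts. The helper names (erase_display, cursor_position, clear_screen) are made up for illustration, and it assumes a terminal that understands ANSI sequences.

```cpp
#include <cstdio>
#include <string>

// ESC is ASCII 27; "\033" (octal) and "\x1B" (hex) are the standard spellings.
const std::string ESC = "\033";

// Build "ESC[2J": erase the whole display.
std::string erase_display() {
    return ESC + "[2J";
}

// Build "ESC[row;colH": move the cursor. Rows and columns count from 1.
std::string cursor_position(int row, int col) {
    return ESC + "[" + std::to_string(row) + ";" + std::to_string(col) + "H";
}

// Clear the screen and put the cursor in the top-left corner.
void clear_screen() {
    std::fputs((erase_display() + cursor_position(1, 1)).c_str(), stdout);
    std::fflush(stdout);
}
```

Calling clear_screen() writes the same bytes as the well-defined printf variants above, except that it asks for position 1;1 explicitly instead of 0;0.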

Related

Why does ^z have two ASCII codes?

When I put the control key Ctrl + Z at the beginning of the string, its ASCII code is zero, but when I put it at the end of a string, it has an ASCII code of 26.
Ex:
^zhi --> ASCII ^z=0
But
Hi^z --> ASCII ^z=26
Why is this?
Ctrl-Z is a "Substitute character":
https://en.wikipedia.org/wiki/Substitute_character.
A substitute character (␚) is a control character that is used in the
place of a character that is recognized to be invalid or erroneous, or
that cannot be represented on a given device. It is also used as an
escape sequence in some programming languages.
As such, it can translate to different outputs in different contexts.

Regex - Unicode combining character sequence \x - text terminal

In this PDF document, section VI (Other Special Characters) says:
e. ANSCII or ANSI codes
1. Codes that control appearance of a text terminal
2. 0xA9 = \xA9
I can't understand "appearance of a text terminal".
What does it mean?
Presumably the author meant terminal attributes like text and background color, character set, character attributes (bold, underscored, blinking, inverse) etc.
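As an illustration of such attributes, the sketch below wraps text in an SGR ("Select Graphic Rendition") sequence. The function name styled is made up for this example; the codes used (1 bold, 4 underline, 5 blink, 7 inverse, 31 red foreground, 0 reset) are the common ANSI ones, and whether they have a visible effect depends on the terminal.

```cpp
#include <string>

// Wrap text in ESC[<code>m ... ESC[0m.
// ESC[<code>m switches an attribute on, ESC[0m resets all attributes.
std::string styled(const std::string& text, int sgr_code) {
    return "\033[" + std::to_string(sgr_code) + "m" + text + "\033[0m";
}
```

For example, printing styled("warning", 31) should show the word in red on a terminal that supports color, and styled("title", 1) should show it in bold.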

determine whether a unicode character is fullwidth or halfwidth in C++

I'm writing a terminal (console) application that is supposed to wrap arbitrary unicode text.
Terminals usually use a monospaced (fixed width) font, so wrapping text is barely more than counting characters, checking whether a word fits into a line, and acting accordingly.
Problem is that there are fullwidth characters in the Unicode table that take up the width of 2 characters in a terminal.
Counting these would see 1 unicode character, but the printed character is 2 "normal" (halfwidth) characters wide, breaking the wrapping routine as it is not aware of chars that take up twice the width.
As an example, this is a fullwidth character (U+3004, the JIS symbol)
〄
12
It does not take up the full width of 2 characters here although it's preformatted, but it does use twice the width of a western character in a terminal.
To deal with this, I have to distinguish between fullwidth or halfwidth characters, but I cannot find a way to do so in C++. Is it really necessary to know all fullwidth characters in the unicode table to get around the problem?
You should use ICU u_getIntPropertyValue with the UCHAR_EAST_ASIAN_WIDTH property.
For example:
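#include <unicode/uchar.h>  // ICU: u_getIntPropertyValue, UCHAR_EAST_ASIAN_WIDTH

// True if c occupies two terminal cells (East Asian Fullwidth or Wide).
bool is_fullwidth(UChar32 c) {
    int width = u_getIntPropertyValue(c, UCHAR_EAST_ASIAN_WIDTH);
    return width == U_EA_FULLWIDTH || width == U_EA_WIDE;
}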
Note that if your graphics library supports combining characters then you'll have to consider those as well when determining how many cells a sequence uses; for example e followed by U+0301 COMBINING ACUTE ACCENT will only take up 1 cell.
There's no need to build the tables yourself; that has already been done in Markus Kuhn's wcwidth.c:
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
The same code is used in terminal emulators such as xterm and konsole, and quite likely others.
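To show how wide and combining characters feed into a wrapping routine, here is a self-contained sketch that counts terminal cells. The names cell_width and display_width are made up, and the ranges are a small, deliberately incomplete subset picked for illustration; real code should rely on ICU or a complete wcwidth table as described above.

```cpp
#include <string>

// Cells used by one code point. Incomplete illustration table:
// control characters and many other special cases are not handled.
int cell_width(char32_t c) {
    if (c >= 0x0300 && c <= 0x036F) return 0;          // combining diacritical marks
    if ((c >= 0x1100 && c <= 0x115F) ||                 // Hangul Jamo initial consonants
        (c >= 0x2E80 && c <= 0xA4CF && c != 0x303F) ||  // CJK ... Yi
        (c >= 0xAC00 && c <= 0xD7A3) ||                 // Hangul syllables
        (c >= 0xF900 && c <= 0xFAFF) ||                 // CJK compatibility ideographs
        (c >= 0xFF00 && c <= 0xFF60) ||                 // fullwidth forms
        (c >= 0xFFE0 && c <= 0xFFE6)) {                 // fullwidth signs
        return 2;
    }
    return 1;
}

// Total number of terminal cells a UTF-32 string occupies.
int display_width(const std::u32string& text) {
    int cells = 0;
    for (char32_t c : text) {
        cells += cell_width(c);
    }
    return cells;
}
```

With this table, U+3004 from the question counts as 2 cells, and e followed by U+0301 counts as 1 cell. A wrapping routine would compare display_width of each word against the remaining cells in the line instead of comparing character counts.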

Meaning of \r on linux systems

I'm looking at some linux specific code which is outputting the likes of:
\r\x1b[J>
to standard output.
I understand that <ESC>[J represents deleting the contents of the screen from the current line down, but what does \r do here?
I'm also seeing the following:
>user_input\n\r>
where user_input is the text entered by the user. But what is the purpose of the \r here?
The character '\r' is carriage return. It returns the cursor to the start of the line.
It is often used in conjunction with newline ('\n') in Internet protocols to mark the end of a line (most standards specify it as "\r\n", but some also accept the pair in the wrong order). On Windows the carriage-return newline pair is also used as end-of-line. On the old Macintosh operating system (before OS X) a single carriage return was used instead of newline as end-of-line, while UNIX and UNIX-like systems (like Linux and OS X) use a single newline.
The control character \r moves the caret (a.k.a. text cursor) to the leftmost position within the current line.
From Wikipedia
Systems based on ASCII or a compatible character set use either LF
(Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r',
0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF,
'\r\n', 0x0D0A). These characters are based on printer commands: The
line feed indicated that one line of paper should feed out of the
printer thus instructed the printer to advance the paper one line, and
a carriage return indicated that the printer carriage should return to
the beginning of the current line. Some rare systems, such as QNX
before version 4, used the ASCII RS (record separator, 0x1E, 30 in
decimal) character as the newline character.
FWIW - this is a part of carriage control - from mainframe control words to Windows/UNIX/FORTRAN carriage control. Carriage control can be implemented at a language level like FORTRAN does, or system-wide like UNIX and Windows do.
\n arose from limitations of early PDP user "interfaces" - the tty terminal. Go to a museum if you want to see one.
A very simple point: The difference between \n and \r is explained above. But all of these explanations are really saying that carriage control is implementation dependent.
The [J is part of an ANSI escape sequence, which defines what happens on a "standards conforming tty terminal".
DOS used to have ANSI.SYS to provide: colors, underline, bold using those sequences.
http://ascii-table.com/ansi-escape-sequences.php is a good reference for the question: what does some odd-looking string in the output do?
\r is carriage return. Similarly \n is linefeed.
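To see what "return to the start of the line" means in practice, here is a small sketch that simulates how a terminal draws a single line. The function render_line is made up for this illustration and only models \r and ordinary characters, not escape sequences or \n.

```cpp
#include <string>

// Simulate one terminal line: '\r' moves the cursor back to column 0
// without erasing anything; any other character overwrites the cell under
// the cursor and advances the cursor by one.
std::string render_line(const std::string& output) {
    std::string line;
    std::string::size_type cursor = 0;
    for (char ch : output) {
        if (ch == '\r') {
            cursor = 0;
        } else {
            if (cursor < line.size()) {
                line[cursor] = ch;
            } else {
                line.push_back(ch);
            }
            ++cursor;
        }
    }
    return line;
}
```

render_line("12345\rab") yields "ab345": the old text after the cursor stays on screen. That is presumably why the code in the question follows \r with <ESC>[J, which erases the leftovers before the new > prompt is drawn.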

Will cin recognize \n typed in from keyboard as a newline character?

I am a beginner in C++, so I'm sorry if this question sounds stupid.
I made this little program to help me get familiar with the properties of cin:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string next;
    cout << "Enter your input.\n";
    cin >> next;
    cout << next;
    return 0;
}
When I typed in \n from the keyboard input, I was returned \n.
Also, when I changed the variable next from a string to a character and gave it the same input as above, I was returned only a \.
My question is: Why don't I get a new line instead? Doesn't cin recognize \n typed in from the keyboard as a newline character? Or does that only apply to cout?
\n is an escape sequence in C++; when it appears in a character constant or a string literal, the two character sequence is replaced by the single character representing a new line in the default basic encoding (almost always 0x0A in modern systems). C++ defines a number of such escape sequences, all starting with a \.
Input is mapped differently, and in many cases, depending on the device. When reading from the keyboard, most systems will buffer a full line, and only return characters from it when the Enter key has been pressed; what the Enter key sends to a C++ program may vary, and whether the file has been opened in text mode or binary mode can make a difference as well. In text mode, the C++ library should negotiate with the OS to ensure that the Enter key always results in the single character represented by \n. (std::cin is always opened in text mode.) Whether the keyboard driver does something special with \ or not depends on the driver, but most don't. C++ never does anything special with \ when inputting from a keyboard (and \n has no special meaning in C++ source code outside of string literals and character constants).
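The behaviour from the question can be reproduced without a keyboard by feeding the same two characters through a string stream. This is only a sketch: read_token and read_char are made-up helpers that mirror the two variants of cin >> next in the question.

```cpp
#include <sstream>
#include <string>

// Read one whitespace-delimited token, as `cin >> next` does for a string.
std::string read_token(std::istream& in) {
    std::string next;
    in >> next;
    return next;
}

// Read one character, as `cin >> next` does when next is a char.
char read_char(std::istream& in) {
    char next = '\0';
    in >> next;
    return next;
}
```

Given a stream constructed as std::istringstream typed("\\n"), which holds a backslash followed by n just like the typed input, read_token(typed) returns a two-character string and read_char on a fresh copy of the stream returns only the backslash. No newline is ever produced, because escape sequences are translated by the compiler in source code, not by the stream at run time.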
If you need your program to recognize \n as a new line character at input you can check this out:
https://stackoverflow.com/a/2508814/815812
What Michael says is perfectly correct.
You can try it out in a similar way.
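If the program itself should treat a typed \n as a newline, it has to do the translation explicitly after reading. Below is one possible sketch, not necessarily what the linked answer does; the function name unescape and the small set of supported escapes (\n, \t, \\) are assumptions made for this example.

```cpp
#include <string>

// Replace the two-character sequences \n, \t and \\ in user input with the
// single characters they name. Unknown sequences are copied unchanged.
std::string unescape(const std::string& input) {
    std::string result;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        if (input[i] == '\\' && i + 1 < input.size()) {
            char next = input[i + 1];
            if (next == 'n')  { result += '\n'; ++i; continue; }
            if (next == 't')  { result += '\t'; ++i; continue; }
            if (next == '\\') { result += '\\'; ++i; continue; }
        }
        result += input[i];
    }
    return result;
}
```

In the program from the question, cout << unescape(next); would then print an actual line break when the user types \n.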
Technically speaking, this depends on things outside your program, but assuming your terminal simply passes the individual bytes corresponding to the '\' and 'n' characters (which I think any sane one will), then the behavior you're seeing is expected.
"\n" is nothing more than a shorthand that the programming language provides so you can easily write the newline character in source code. The two characters \ and n are not the newline themselves; they tell the compiler to generate the single non-printable character that corresponds to pressing the Enter key.
Let's say you're in Notepad or whatever and you press the Tab key. It tabs over a spot. Typing "\t" just enters the literal characters "\" and "t". Internally, whoever wrote Notepad had to say what it should do when the user pressed Tab, and he did so by using the mnemonic like
if (key == '\t') {
    // tab over
}