Why use "\t" instead of "<press TAB>" for strings? [duplicate] - c++

This question already has answers here:
Is it "bad practice" to use tab characters in string literals?
(2 answers)
Closed 7 years ago.
I just tried compiling some C++ code that used the literal Tab character (as in I pressed the TAB key on my keyboard) inside a string. And, to my surprise, it compiled fine. In retrospect I guess that makes sense, since it's a character just like any other.
cout << "Tab: [TAB]";
Before now I've always used \t to define tabs in strings.
cout << "Tab: [\t]";
Obviously code using literal TABs in strings would suffer greatly in readability, but is there a technical reason to use \t other than convention?

but is there a technical reason to use \t other than convention?
Sure, there are technical reasons. Using
cout << "Tab: [\t]";
would work independently of the target system's actual character encoding.
The source file could use a different encoding than the target system does, e.g. UTF-8 for the source but EBCDIC on the target system.
'\t' will always be translated to the correct character code, though, so the code is portable.
Also, as mentioned in merlin2011's comment, many IDEs will simply replace a typed TAB with a specific number of spaces, so the literal tab may never make it into the file.

Related

In C++ language identifiers can contain unicode characters? [duplicate]

This question already has answers here:
Unicode Identifiers and Source Code in C++11?
(5 answers)
Closed 2 years ago.
Until now I believed that C++ identifiers could contain only letters of the English alphabet, digits and underscores, but I am reading the standard and cppreference.com, and they say that Unicode characters and other special characters can be used as well. Is that true? Or is it true only for some compilers?
https://timsong-cpp.github.io/cppwp/lex.name
https://en.cppreference.com/w/cpp/language/identifiers
According to translation phase 1, the compiler can map each "character" in a source file to either a character in the basic source character set, or to an escaped Unicode character.
How this mapping is performed, and how the escaped Unicode character is handled, is implementation-defined, i.e. it's up to the compiler implementation whether to handle Unicode characters at all. Some do, some don't. You must read the documentation for your compiler to learn what it does. And if you use Unicode characters in your source, you need to be aware that they might not work reliably (or at all) if the source is compiled with other compilers.
In summary: if you want to write portable code, use only the basic source character set.

"warning: multi-character character constant [-Wmultichar]" and program does not work

So, I am writing a program to translate runes to the English alphabet.
It gives me the warning "warning: multi-character character constant [-Wmultichar]".
Here is the code (not from the program itself, but it has the same problem); the code is C++:
string s = "ᛡᚣ"; // string of two UTF-8 letters (runes)
if(s.at(0) == 'ᛡ')
cout<<"YES";
But the warning is not the main problem. The problem is that when I run it, it does not output "YES". In the actual program, when I try to translate runes to the alphabet, it just runs and outputs a bunch of endl line breaks instead of translating the runes (basically it does nothing).
P.S. I tried different compilers. In Visual Studio an error popped up: "Debug Assertion Failed!" "Expression: string subscript out of range".
Other compilers just do nothing. I even tried building the program with Unicode escapes like "\u16B3" instead of the characters, but the result is the same. What should I do? Do I need a specific library for UTF-8? Please help.
If you look at the representation of the characters in your std::string, you'll see that each of the characters uses multiple bytes (three each, in UTF-8), hence the warning: 'ᛡ' is a multi-character constant of type int, not a single char, so s.at(0), which is only the first byte, will not compare equal to it on common compilers. When dealing with Unicode you'll either need to use something with 32 bits to represent individual code points, or use multiple bytes for each code point. Working with code points is probably sufficient here, but it does rely on the characters not using combining characters.
Comparing Unicode strings isn't entirely trivial (and I don't know all the rules). When representing the data using UTF-8 you'll need to compare byte sequences. In addition, you need to make sure that your Unicode strings are normalized: some strings have different valid representations. For example, the u-umlaut in my name can be represented with a code point for u-umlaut or with a code point for u followed by a combining diaeresis. In your code I'd guess you could use
std::string expect("ᛡ");
if (expect.size() <= s.size() && s.substr(0, expect.size()) == expect)
std::cout << "YES\n";

Literal "or" in c++ program? [duplicate]

This question already has answers here:
When were the 'and' and 'or' alternative tokens introduced in C++?
(8 answers)
Closed 8 years ago.
I was translating a C++ function I wrote some time ago into Python when I noticed that my C++ code contains the following lines:
if(MIsScaledOut()) {
if(DataType()==UnknownDataType or DataType()==h)
Descriptor = Descriptor + DataTypeString() + "OverM";
}
There's an or in there! This was probably because I had previously translated the code from Python and forgot to switch to ||.
This code compiles in various OSes, with various compilers, and I've never seen a problem with it. Is this standard, or have I just gotten lucky so far, and this is something I should worry about?
After remembering the right word to google, I now see that "or" is listed as a C++ keyword, along with similar keywords such as "and" that I'd never seen (noticed?) before in C++. These exist because some character encodings lack punctuation characters required by the traditional operator spellings: {, }, [, ], #, \, ^, |, ~.
As @mafso points out, the alternative "spelled out" versions can be used in C by including the <iso646.h> header, which defines them as macros.
The question this one has been marked a duplicate of also points out the existence of digraphs and trigraphs, which can be used to substitute for the missing characters (trigraphs were removed in C++17). (That question also says "everybody knows about" them. Obviously, I did not...)

Why C++ standard reserves the formfeed character?

From the following question I learned the meaning of \f in C++:
Escape sequence \f - form feed - what exactly is it?
But why does C++ have this form-feed character? Is there any C++ program that runs on a typewriter?
The form-feed character is not only for typewriters; printers (or printer drivers) traditionally recognize it as the instruction to stop the current page and advance to the next.
There are a number of characters that can be expressed with notation like \t, \b, etc. There is nothing magic about this list. It is simply a way to insert characters with values below the space character (0x20) into a string. The list has a primarily historical basis: these characters are used in old programs, and why should those stop compiling? The negative impact of this feature is minimal to none.
The language itself does not put any meaning into these chars. They are simply placed into a string or char constant. The program deals with them.

Why must C/C++ string literal declarations be single-line?

Is there any particular reason that multi-line string literals such as the following are not permitted in C++?
string script =
"
Some
Formatted
String Literal
";
I know that multi-line string literals may be created by putting a backslash before each newline.
I am writing a programming language (similar to C) and would like to allow the easy creation of multi-line strings (as in the above example).
Is there any technical reason for avoiding this kind of string literal? Otherwise I would have to use a python-like string literal with a triple quote (which I don't want to do):
string script =
"""
Some
Formatted
String Literal
""";
Why must C/C++ string literal declarations be single-line?
The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.
There are, of course, ways around this. You can use line splicing:
const char* script = "\
 Some\n\
 Formatted\n\
 String Literal\n\
";
If the \ is the very last character on a line, the backslash and the following newline are removed (line splicing, which happens in translation phase 2, before preprocessing).
Or, you can use string literal concatenation:
const char* script =
" Some\n"
" Formatted\n"
" String Literal\n";
Adjacent string literals are concatenated in translation phase 6, after preprocessing, so these end up as a single string literal at compile time.
Using either technique, the string literal ends up as if it were written:
const char* script = " Some\n Formatted\n String Literal\n";
One has to consider that C was not written to be an "applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no Emacs or Vim, and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation was not a primary concern for someone writing an OS at that point in time. The traditional set of Unix scripting tools such as awk and sed (among MANY others) is a testament to the fact that they weren't using C for significant string manipulation.
Additional considerations: it was not uncommon in the early 70s (when C was written) to submit your programs on PUNCH CARDS and come back the next day for the results. Would it have eaten up extra processing time to compile a program with multiline string literals? Not really; it can actually be less work for the compiler. But in most cases you were going to come back the next day anyhow. And nobody filling out punch cards was going to put large amounts of unneeded text into their programs.
In a modern environment, there is probably no reason not to include multiline string literals other than the designer's preference. Grammatically speaking, forbidding them is probably simpler, because you don't have to take linefeeds into consideration when parsing a string literal.
In addition to the existing answers, you can work around this using C++11's raw string literals, e.g.:
#include <iostream>
#include <string>
int main() {
std::string str = R"(a
b)";
std::cout << str;
}
/* Output:
a
b
*/
[n3290: 2.14.5/4]: [ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]
Though non-normative, this note and the example that follows it in [n3290: 2.14.5/5] serve to complement the indication in the grammar that the production r-char-sequence may contain newlines (whereas the production s-char-sequence, used for normal string literals, may not).
Others have mentioned some excellent workarounds, I just wanted to address the reason.
The reason is simply that C was created at a time when processing was at a premium and compilers had to be simple and as fast as possible. These days, if C were to be updated (I'm looking at you, C1X), it would be quite possible to do exactly what you want. It's unlikely, however, mostly for historical reasons: such a change could require extensive rewrites of compilers, and so will likely be rejected.
The C preprocessor works on a line-by-line basis, but with lexical tokens. That means that the preprocessor understands that "foo" is a token. If C were to allow multi-line literals, however, the preprocessor would be in trouble. Consider:
"foo
#ifdef BAR
bar
#endif
baz"
The preprocessor isn't able to mess with the inside of a token - but it's operating line-by-line. So how is it supposed to handle this case? The easy solution is to simply forbid multiline strings entirely.
Actually, you can break it up thus:
string script =
"\n"
" Some\n"
" Formatted\n"
" String Literal\n";
Adjacent string literals are concatenated by the compiler.
String literals can span multiple lines, but each line has to be quoted individually:
string script =
" \n"
" Some \n"
" Formatted \n"
" String Literal ";
I am writing a programming language (similar to C) and would like to let write multi-line strings easily (like in above example).
There is no reason why you couldn't create a programming language that allows multi-line strings.
For example, the Vedit Macro Language (a C-like scripting language for the VEDIT text editor) allows multi-line strings:
Reg_Set(1,"
Some
Formatted
String Literal
")
It is up to you how you define your language syntax.
You can also do:
string useMultiple = "this "
"is "
"a string in C.";
Just place one literal after another, with only whitespace in between; no special characters are needed.
Literal declarations don't have to be single-line.
GPUImage inlines multiline shader code. Check out its SHADER_STRING macro.