I'm trying to learn about string literals and the like, and I've been playing around with them. I'm currently unable to wcout a string that is the concatenation of two string literals, the first of which carries the s suffix ("..."s).
std::string concat = "Hello, "s + "World!";
It doesn't have any compiler errors if I cast to a string or make a call to a string constructor to concatenate them.
I'm also having trouble getting wcout to actually output unicode characters. I use cout elsewhere in the code.
constexpr wchar_t* surname = L"shirts \u0444 \u1300";
outputs shirts but no unicode characters when I wcout << surname; If I just cout surname I get hex.
Edit: thanks to comments I have understood the problem of wcout. I didn't realize it would only work with wstring and I was avoiding ordinary cout due to having read something about not mixing the two that I have yet to fully understand.
I still can't get the symbols to print out in wchar_t* which just outputs ordinary ascii characters.
Thanks for the swift replies thus far!
wcout works for normal chars marked with u8 but nothing else it seems. Several wcout statements just aren't outputting anything after the shirt fail, I moved them before it and they were printed out but they were hex rather than characters as expected. So far only normal char* have worked. This is such a headache...
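As for the missing Unicode console output, you may have to set the locale first (std::setlocale is declared in <clocale>), that is:
std::setlocale(LC_ALL, "");
const wchar_t* surname = L"shirts \u0444 \u1300"; // pointer to const: a string literal is not modifiable
std::wcout << surname;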
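For the first part of the question, here is a minimal sketch of how the s suffix behaves for narrow and wide literals. The helper names make_narrow and make_wide are made up for illustration, and the suffix needs C++14 or later.

```cpp
#include <string>

// Hypothetical helper: narrow literal with the s suffix.
std::string make_narrow() {
    using namespace std::string_literals;  // enables the "..."s suffix
    return "Hello, "s + "World!";          // std::string + const char* -> std::string
}

// Hypothetical helper: wide literal with the s suffix.
std::wstring make_wide() {
    using namespace std::string_literals;
    return L"Hello, "s + L"World!";        // L"..."s is a std::wstring
}
```

The result of make_narrow() can be sent to std::cout and the result of make_wide() to std::wcout; std::wcout has no overload for std::string, which is why the original line only compiles with a narrow stream.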
Why does the following work:
string input = "a long string of text pasted from a .txt file";
But this version does not?
string input =
"
some
large
string ";
I thought C++ doesn't care about whitespace.
You can do something like this. It's called a raw string literal:
string input =
R"(
some
large
string )";
This will include the newline characters as well. The general format is R"delimiter(string-literal)delimiter", where the delimiter is optional and is empty here.
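A small sketch of what a raw string literal actually contains. The function names are hypothetical, and the delimiter xy is an arbitrary choice to show how the text itself can contain )".

```cpp
#include <string>

// Hypothetical helper: a raw string literal spanning three source lines.
// The line breaks in the source become '\n' characters in the string.
std::string raw_text() {
    return R"(some
large
string)";
}

// Hypothetical helper: with a delimiter, the text may contain )" itself.
std::string raw_with_delimiter() {
    return R"xy(say "(hi)" twice)xy";
}
```

Note that any indentation placed before "large" or "string" in the source would also end up in the string, since nothing inside a raw literal is skipped.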
For the most part, no, it does not care about whitespace. But there are exceptions, and string literals are one of them.
The rule is that an ordinary string literal cannot span multiple lines. But adjacent literals are automatically concatenated, so you can just do
const char string[] = "very "
"long "
"string";
and it will be equivalent to
const char string[] = "very long string";
I am not sure about the origin of the rule. I suspect it was meant to avoid confusion about whether the newline should be part of the string or not (it's not, unless written explicitly as \n). Or maybe it is just a grammar/parser matter. Compiling C/C++ is fairly complicated and happens in multiple phases (see cppreference), and string literals already get plenty of special treatment.
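To check the equivalence claimed above, a short sketch (the names joined and joined_string are made up for this example):

```cpp
#include <string>

// Three adjacent literals; the compiler merges them into one.
const char joined[] = "very "
                      "long "
                      "string";

// The merged literal has exactly one terminating '\0', like the single-literal form.
static_assert(sizeof(joined) == sizeof("very long string"), "literals were merged");

// Hypothetical helper so the result can be compared as a std::string.
std::string joined_string() {
    return joined;
}
```

No newline is inserted between the pieces; if one is wanted, it has to be written as \n inside one of the literals.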
Why does the C++ standard allow the following?
#include <iostream>
#include <string>
int main()
{
std::string s(10, '\0'); // s.length() now is 10
std::cout << "string is " << s << ", length is " << s.length() << std::endl;
s.append(5, '\0'); // s.length() now is 15
std::cout << "string is " << s << ", length is " << s.length() << std::endl;
// the same with += char and push_back
// but:
s += "hello"; // s.length() is now 20, but only "hello" is visible when printed
std::cout << "string is " << s << ", length is " << s.length() << std::endl;
return 0;
}
Why does it add 0 and count it?
It looks like broken integrity of string, doesn't it? But I checked standard and it is correct behavior.
Why does the standard allow the following?
Because the people designing C++ strings decided that such things should be allowed. I'm not sure whether anyone who was part of the team that designed C++ strings is on SO... But since you yourself say that the standard allows it, that's the way it is, and I doubt it's about to change.
It's sometimes quite practical to have a string that can contain "anything". I can think of a few instances when I've had to work around the fact that C style strings can't contain zero-bytes. Along with the fact that long C style strings take a long time to find the length of, the main benefit of C++ strings is that they are not restricted to "what you can put in them" - that's a good thing in my book.
I'm not sure what the problem is here.
Adding '\0' in the middle of a std::string changes nothing: the null character is treated like any other. The only thing that can change is what happens if you pass .c_str() to a function that accepts null-terminated strings. But then the problem is not with .c_str(), only with the function that treats '\0' specially.
If you want to know how many characters this string has when treated as a null-terminated string, use
size_t len = strlen(s.c_str());
Note that it's O(n) operation, because that's how strlen works.
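A sketch of the difference between the two lengths, rebuilding the string from the question. The helper names make_padded and c_length are hypothetical.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper reproducing the string built in the question.
std::string make_padded() {
    std::string s(10, '\0');  // ten null characters
    s.append(5, '\0');        // five more, fifteen in total
    s += "hello";             // appends 5 characters; the literal's terminator is not copied
    return s;
}

// Length as a C function sees it: counting stops at the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here make_padded().size() reports every stored character, while c_length() stops immediately because the very first character is already a null.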
If you ask why += operator doesn't add the implicit null character of string literal "hello" to the string, I say the reverse (adding it) is unclear and definitely not what you want 99% of the time. On the other hand, if you want to add '\0' to your string, just append it like a buffer:
char buffer[] = "Hello";
s.append(buffer, sizeof(buffer));
or (even better) drop the char arrays and null-terminated strings altogether and use C++-style replacements like std::string as NTS-replacement, std::vector<char> as contiguous buffer, std::vector as dynamic array with pointers replacement, and std::array (C++11) as standard C array replacement.
Also (as mentioned by @AdamRosenfield in the comments), your string after adding "hello" does in fact have 20 characters; it's probably only that your terminal doesn't print nulls.
The NUL character '\0' is the terminator for C-style strings, not for std::string. std::string does rely on it when it is given a const char pointer, so that it can find the end of the C-style string. Otherwise, it is treated just like any other character.
std::string is more of a container for characters than anything else and \0 is a character. As a real world example, take a look at the CreateProcess function in Windows. The lpEnvironment parameter takes a null-terminated block of null-terminated strings (i.e. A=1\0B=2\0C=3\0\0). If you're building a block it's convenient to use an std::string.
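As an illustration of building such a block, here is a minimal sketch. The function name make_environment_block is made up, and actually passing the result to CreateProcess is not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins entries into a block such as "A=1\0B=2\0C=3\0\0".
std::string make_environment_block(const std::vector<std::string>& entries) {
    std::string block;
    for (const std::string& entry : entries) {
        block += entry;
        block.push_back('\0');  // terminator for this entry
    }
    block.push_back('\0');      // extra terminator that ends the whole block
    return block;
}
```

Because std::string tracks its own length, the embedded nulls are stored like any other character and block.size() covers the whole block.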
I was coding something and my code didn't work correctly in some situations, so I decided to write some output to a file for debugging. The program just concatenates some characters from a string (without going out of bounds) and prints them to the file. It has no error reporting of any kind, and the input string is just a bunch of random characters. But I get some junk in the output, such as:
f::xsgetn error reading the file
sgetn error reading the file
ilebuf::xsgetn error reading the file
(I removed program's output and this is just the extra stuff.)
As far as I know, if there are any errors, an exception must be thrown. What happens and how can I fix it?
The same thing happens when I print the output using standard output. All used libraries are standard libraries (eg. iostream, fstream, etc.)
PS: For some reasons, I can't publish all the code. Here is the part that creates the output and passes it to the stream: (tri is a string and is defined previously. center is an integer and is inside the bounds of the string. fout is a previously defined file stream.)
string op = "" + tri[center];
fout << center << "<>" << op << endl;
Since tri is a string, tri[center] is a char.
The type of "" is const char[1], and an array can't be added to a char.
Instead it is implicitly converted to const char*, which can be added to a char.
Unfortunately for you, the result is that the integer value of tri[center] is added to that pointer as an offset, rather than being concatenated as a string. The memory that the resulting pointer refers to doesn't contain what you're looking for; it happens to contain other static strings such as "error reading the file".
To fix it, use
string op = string("") + tri[center];
instead.
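Two ways of getting a one-character std::string without involving a string literal at all, as a sketch. The function names are hypothetical, and at() is used here only so that a bad index throws instead of reading out of bounds.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a one-character string from tri[center].
std::string char_at_as_string(const std::string& tri, std::size_t center) {
    // std::string(1, c) constructs a string holding one copy of c.
    return std::string(1, tri.at(center));
}

// Same result via concatenation that starts from a std::string, so that
// operator+ for strings is chosen rather than pointer arithmetic.
std::string char_at_concat(const std::string& tri, std::size_t center) {
    return std::string() + tri.at(center);
}
```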
I encountered the same problem in another program, where I had written:
str += blablabla + "#";
and I saw some unrelated characters being printed. I fixed it this way:
str = str + blablabla + "#";
and it worked!
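The += operator itself is not at fault, though. In the first version, blablabla + "#" is evaluated on its own before the +=, so if blablabla is a char or an integer it becomes pointer arithmetic on the literal, as described above. In the second version, str + blablabla is evaluated first and already yields a std::string.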
string s="abcdefghijklmnopqrstuvwxyz";
char f[]=" " (s.substr(s.length()-10,9)).c_str() " ";
I want to get the last 9 characters of s and add " " to the beginning and the end of the substring, and store it as a char[]. I don't understand why this doesn't work even though char f[]=" " "a" " " does.
Is (s.substr(s.length()-10,9)).c_str() not a string literal?
No, it's not a string literal. String literals always have the form "<content>" or expand to that (macros, like __FILE__ for example).
Just use another std::string instead of char[].
std::string f = " " + s.substr(s.size()-10, 9) + " ";
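Note that s.substr(s.size()-10, 9) yields "qrstuvwxy", which stops one character short of the end. If the last nine characters are what is wanted, a sketch along these lines may be closer (pad_last_n is a made-up name):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: the last n characters of s (or all of s if it is
// shorter), with one space added on each side.
std::string pad_last_n(const std::string& s, std::size_t n) {
    const std::size_t count = std::min(n, s.size());
    return " " + s.substr(s.size() - count) + " ";
}
```

Clamping count to s.size() avoids the unsigned wrap-around that s.size() - n would produce for short strings.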
First, consider whether you should be using cstrings. In C++, generally, use string.
However, if you want to use cstrings: the concatenation "abc" "123" -> "abc123" is performed on adjacent literal tokens while the source is being translated, so it cannot be applied to the result of string::c_str(), which only exists at run time. Instead, the easiest way is to construct a new string and take the .c_str() of that:
string s="abcdefghijklmnopqrstuvwxyz";
const char* f = (string(" ") + s.substr(s.length()-10,9) + " ").c_str();
(EDIT: You know what, on second thought, that's a really bad idea. The temporary string is destroyed at the end of this statement, so f is left dangling and using it is undefined behaviour (it can segfault). Just don't use cstrings unless you're prepared to mess with strcpy and all that ugly stuff. Seriously.)
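If a character buffer really is needed, one option is to copy the bytes into something that owns them. This is only a sketch; to_c_buffer is a hypothetical name and std::vector<char> is one possible owner among several.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies text plus its terminating '\0' into an owning buffer.
std::vector<char> to_c_buffer(const std::string& text) {
    const char* p = text.c_str();
    return std::vector<char>(p, p + text.size() + 1);  // +1 copies the terminator
}
```

The returned vector keeps the characters alive for as long as it exists, and its data() can be handed to functions expecting a char*.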
If you want to use strings instead, consider something like the following:
#include <sstream>
...
string s="abcdefghijklmnopqrstuvwxyz";
stringstream tmp;
tmp << " " << s.substr(s.length()-10,9) << " ";
string f = tmp.str();
@Xeo tells you how to solve your problem. Here's some complementary background on how string literals are handled in the compilation process.
From section A.12 Preprocessing of The C Programming language:
Escape sequences in character constants and string literals (Pars. A.2.5.2, A.2.6) are
replaced by their equivalents; then adjacent string literals are concatenated.
It's the preprocessor, not the compiler proper, that is responsible for the concatenation. (You asked for a C++ answer; C++ treats adjacent string literals the same way as C.) The preprocessor has only limited knowledge of the C/C++ language; the (s.substr(s.length()-10,9)).c_str() part is not evaluated at the preprocessing stage.
I spent about 4 hours yesterday trying to fix this issue in my code. I simplified the problem to the example below.
The idea is to store a string in a stringstream ending with std::ends, then retrieve it later and compare it to the original string.
#include <sstream>
#include <iostream>
#include <string>
int main( int argc, char** argv )
{
const std::string HELLO( "hello" );
std::stringstream testStream;
testStream << HELLO << std::ends;
std::string hi = testStream.str();
if( HELLO == hi )
{
std::cout << HELLO << "==" << hi << std::endl;
}
return 0;
}
As you can probably guess, the above code when executed will not print anything out.
Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the == operator to fail.
My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.
The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.
std::ends inserts a null character into the stream. Getting the content as a std::string will retain that null character and create a string with that null character at the respective positions.
So indeed a std::string can contain embedded null characters. The following std::string contents are different:
ABC
ABC\0
A binary zero is not whitespace. But it's also not printable, so you won't see it (unless your terminal displays it specially).
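A sketch that makes the extra character visible by its effect on the length. The helper names with_ends and strip_trailing_nulls are made up for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes text followed by std::ends and returns the stream contents.
std::string with_ends(const std::string& text) {
    std::ostringstream out;
    out << text << std::ends;  // std::ends inserts one '\0' character
    return out.str();
}

// Hypothetical helper: removes trailing '\0' characters explicitly,
// since whitespace trimming does not touch them.
std::string strip_trailing_nulls(std::string s) {
    while (!s.empty() && s.back() == '\0') {
        s.pop_back();
    }
    return s;
}
```

With this, with_ends("hello") is six characters long, and stripping the trailing null gives back a string that compares equal to "hello".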
Comparing using strcmp will interpret the content of a std::string as a C string when you pass .c_str(). It will say
Hmm, the characters before the first \0 (terminating null character) are ABC, so I take it the string is ABC
And thus, it will not see any difference between the two above. You are probably having this issue:
std::stringstream s;
s << "hello";
s.seekp(0);
s << "b";
assert(s.str() == "b"); // will fail!
The assert will fail, because the sequence that the stringstream uses is still the old one that contains "hello". All you did was overwrite the first character. You want to do this:
std::stringstream s;
s << "hello";
s.str(""); // reset the sequence
s << "b";
assert(s.str() == "b"); // will succeed!
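The two snippets can be put side by side as functions to see what each stream ends up holding. The function names are hypothetical, and the extra clear() call is an addition that only matters if the stream had error flags set from earlier use.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rewinds the write position without resetting the contents.
std::string overwrite_without_reset() {
    std::stringstream s;
    s << "hello";
    s.seekp(0);  // move the write position to the start; contents are kept
    s << "b";    // overwrites only the first character
    return s.str();
}

// Hypothetical helper: replaces the underlying sequence before writing again.
std::string overwrite_after_reset() {
    std::stringstream s;
    s << "hello";
    s.str("");   // replace the underlying sequence with an empty one
    s.clear();   // also clear any error flags
    s << "b";
    return s.str();
}
```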
Also read this answer: How to reuse an ostringstream
std::ends simply inserts a null character. Traditionally, strings in C and C++ are terminated with a null (ASCII 0) character; however, std::string doesn't rely on that terminator to know where its contents end. Anyway, stepping through your code point by point, we see a few interesting things going on:
int main( int argc, char** argv )
{
The string literal "hello" is a traditional zero-terminated string constant. We copy its five characters (without the terminator) into the std::string HELLO.
const std::string HELLO( "hello" );
std::stringstream testStream;
We now put the string HELLO into the stream (its five characters, with no trailing 0), followed by a null character which is put there by std::ends.
testStream << HELLO << std::ends;
We extract a copy of what we put into the stream (the characters of "hello" plus the one null character from std::ends).
std::string hi = testStream.str();
We then compare the two strings using operator == on the std::string class. This operator (probably) compares the lengths of the string objects first, including however many trailing null characters they hold. Note that the std::string class does not treat a null as the end of its contents; put another way, it allows the string to contain null characters, so the trailing null character is treated as part of the string hi.
Since hi has one trailing null and HELLO has none, the comparison fails.
if( HELLO == hi )
{
std::cout << HELLO << "==" << hi << std::endl;
}
return 0;
}
Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.
Reason being, the length is different by one trailing null character.
My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.
strcmp is different from std::string: it dates from the early days when strings were always terminated with a null, so when it reaches the trailing null in hi it stops looking.
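The two kinds of comparison can be sketched side by side; the function names are made up for illustration.

```cpp
#include <cstring>
#include <string>

// Compares the way operator== does: lengths and every character, '\0' included.
bool equal_as_std_strings(const std::string& a, const std::string& b) {
    return a == b;
}

// Compares the way C functions do: only up to the first '\0' in each string.
bool equal_as_c_strings(const std::string& a, const std::string& b) {
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}
```

For "hello" and a six-character string holding "hello" plus a null, the first function reports a difference and the second does not.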
The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.
Sometimes it is a good idea to understand the underlying representation.
You're adding a null character to the stream with std::ends. When you initialize hi with str(), that null character is kept, so hi is one character longer than HELLO. The strings are different. strcmp doesn't compare std::strings, it compares char* (it's a C function).
std::ends adds a null terminator, (char)'\0'. You'd use it with the deprecated strstream classes, to add the null terminator.
You don't need it with stringstream, and in fact it screws things up, because to stringstream the null terminator isn't "the special null terminator that ends a string"; it's just another character, the zeroth character. stringstream just adds it, and that increases the character count (in your case) to six, and makes the comparison to "hello" fail.
I think a good way to compare strings is to use the std::find method. Do not mix C functions and std::string methods!