For the following code:
#include <iostream>
#include <locale>
int main()
{
std::locale loc = std::locale()
.combine<std::numpunct<char>>(std::locale("en_US.UTF-8"));
std::cout << loc.name() << '\n';
std::cout << (std::locale() == loc);
}
When compiled with gcc and clang, the output is C and 1 (https://godbolt.org/z/q8fT4oqj3). But cppreference says that combine() will return a new, nameless locale.
I am totally confused:
If combine() returns a nameless locale, then why is the locale's name still 'C'?
If operator== returns 1, then how to distinguish between them?
It looks like it should return "*" according to the standard:
template <class Facet> locale combine(const locale& other) const;
Effects: Constructs a locale incorporating all facets from *this except for that one facet of other that is identified by Facet.
Returns: The newly created locale.
Throws: runtime_error if has_facet<Facet>(other) is false.
Remarks: The resulting locale has no name.
basic_string<char> name() const;
Returns: The name of *this, if it has one; otherwise, the string "*". If *this has a name, then locale(name().c_str()) is
equivalent to *this. Details of the contents of the resulting string
are otherwise implementation-defined.
So it looks like a bug.
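Whatever name() and operator== report, the two locales can still be told apart by querying the facet that combine() swapped in. The following is a minimal sketch, not the asker's code: it assumes a hypothetical facet `apostrophe_punct` in place of the `en_US.UTF-8` numpunct so that it does not depend on which system locales are installed, and the helper names `make_combined` and `thousands_sep_of` are made up for illustration.

```cpp
#include <locale>
#include <string>

// Hypothetical facet used only for illustration: groups digits with '\''.
struct apostrophe_punct : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '\''; }
    std::string do_grouping() const override { return "\3"; }
};

// Build a locale that is the classic locale except for numpunct<char>,
// which is taken from a donor locale via combine().
std::locale make_combined() {
    std::locale donor(std::locale::classic(), new apostrophe_punct);
    return std::locale::classic().combine<std::numpunct<char>>(donor);
}

// Report the thousands separator a locale would use for char streams.
char thousands_sep_of(const std::locale& loc) {
    return std::use_facet<std::numpunct<char>>(loc).thousands_sep();
}
```

Comparing `thousands_sep_of(make_combined())` with `thousands_sep_of(std::locale::classic())` shows the difference without relying on the locale name.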
I want to do the same thing as std::quoted with a custom type, but I am worried about misuse of this kind of API with a temporary rvalue. After some digging into std::quoted, I discovered the following problem:
To be efficient, std::quoted has to store a const reference or a pointer to avoid a deep copy of the source object, but there is no mechanism to prevent the quoted result from being stored. If we store it, then destroy the source object, and finally stream the stored result, we access a dangling reference or pointer.
The example below tries to illustrate the problem:
#include <string>
#include <iostream>
#include <iomanip>
class String
{
public:
explicit String(const std::string & s) : _s(s) {std::cout << "String\n";}
~String() { std::cout << "~String\n"; _s = "ERROR TRY ACCESS DELETED STRING";}
const std::string & getS() const {return _s;}
private:
std::string _s;
};
int main()
{
std::cout << std::quoted(String("test").getS()) << '\n';
std::cout << '\n';
auto q = std::quoted(String("test").getS());
std::cout << q << '\n';
return 0;
}
This example prints:
String
"test"
~String
String
~String
"p DELETED STRING"
We can see the same problem with gcc(trunk) and clang(trunk).
This is to be expected. Or rather: this is not to be expected to work.
From cppreference:
Allows insertion and extraction of quoted strings, such as the ones
found in CSV or XML.
When used in an expression out << quoted(s, delim, escape), where out
is an output stream with char_type equal to CharT and, for overloads
2-4, traits_type equal to Traits, behaves as a
FormattedOutputFunction, which inserts into out a sequence of
characters seq constructed as follows:
a) First, the character delim is added to the sequence
b) Then every character from s, except if the next character to output equals delim or equals escape (as determined by the stream's traits_type::eq), then first appends an extra copy of escape
c) In the end, delim is appended to seq once more
And
Return value
Returns an object of unspecified type such that the described behavior
takes place.
And the "described behavior" is the above: "When used in an expression out << quoted(s, delim, escape), where out [...]", the string s is written to the stream. Once that string s no longer exists, you cannot output it to a stream.
Actually, what is written on cppreference is what you can also find in the standard, just slightly reworded, unfortunately without adding much clarity. From the standard [quoted.manip#2]:
Returns: An object of unspecified type such that if out is an instance of basic_ostream with member type char_type the same as charT and with member type traits_type, which in the second and third forms is the same as traits, then the expression out << quoted(s, delim, escape) behaves as a formatted output function of out. This forms a character sequence seq, initially consisting of the following elements: [...]
Note that it only states what happens in an expression out << quoted(s, delim, escape). And that's the only usage you can rely on. Maybe the note helps to clarify:
[Note 1: Quoted manipulators provide string insertion and extraction
of quoted strings (for example, XML and CSV formats). Quoted
manipulators are useful in ensuring that the content of a string with
embedded spaces remains unchanged if inserted and then extracted via
stream I/O. — end note]
std::quoted is an I/O manipulator. Its return value is not meant to be used for anything other than being passed to a stream's operator<< or operator>>.
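If the quoted text has to outlive the source string, one option is to stream the manipulator immediately and keep the resulting std::string instead of the manipulator object. This is only a sketch of that idea; the helper name `quoted_copy` is made up for illustration and is not part of the standard library.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Stream the manipulator right away and return an owning std::string,
// so nothing refers to the argument once the call returns.
std::string quoted_copy(const std::string& s) {
    std::ostringstream out;
    out << std::quoted(s);  // used only inside this full expression
    return out.str();
}
```

A call such as `quoted_copy(String("test").getS())` is fine, because the temporary String lives until the end of the full expression that contains the call, and the returned std::string owns its characters.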
The following piece of code compiles and runs without errors and with the expected output:
#include <string>
#include <iostream>
using namespace std;
string getString()
{
char s[] = "Hello world!";
return s;
}
int main()
{
cout << getString() << endl;
}
My question is, will this always work? Ordinarily if you return a C-string that was declared locally you can run into some undefined behavior, but in this case is that still a problem since it is run through the string constructor and (presumably) copied into dynamic memory?
return s;
That line is equivalent to:
return std::string(s);
And that will make a copy of the string, so it's fine.
reference: http://en.cppreference.com/w/cpp/string/basic_string/basic_string (constructor #5)
Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s.
Edit: One more detail. You mention
copied into dynamic memory?
And the answer is maybe, perhaps, it doesn't really matter.
The semantics of std::string do not specify this; they only guarantee that the string can be copied, moved, and accessed in a consistent manner. How it achieves this is up to the library implementor.
In fact, popular implementations of std::string use something called the "Small String Optimization", where strings under a certain length are stored within the string object itself.
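The copy made by the constructor can be observed directly, wherever the implementation happens to store the characters. A small sketch (the function name `copy_then_scribble` is made up for illustration):

```cpp
#include <string>

// Construct a std::string from a local array, then modify the array.
// The string keeps its own copy of the characters.
std::string copy_then_scribble() {
    char buf[] = "Hello world!";
    std::string result(buf);  // copies the characters out of buf
    buf[0] = 'J';             // changes only the local array
    return result;            // still holds "Hello world!"
}
```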
Consider the following code snippet:
#include <iostream>
#include <string>
int main() {
std::string foo;
foo = -1; // why is the compiler not complaining about this?
std::cout << "1" << std::endl;
std::cout << foo << std::endl;
std::cout << "2" << std::endl;
}
Actual output (both ideone.com C++14 mode and GCC 4.8.4):
<no output>
Questions:
Why did the code snippet compile at all?
Commenting out foo = -1, I get the correct stdout (1 and 2). What has the compiler compiled with foo = -1; that causes the subsequent couts to fail?
foo = -1;
resolves to std::string::operator=(char) since -1 is an int and int can, in theory, be converted to a char.
It's not clear to me what the standard says when the int does not represent a valid char. It looks like in your implementation, the program crashes.
Update
From the C++11 Standard (emphasis mine):
3.9.1 Fundamental types
1 Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character
object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values.
It appears that you'll have to consult your compiler's documentation to find out whether it allows char objects to hold negative values and, if it does, how it treats such objects.
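Besides the documentation, the signedness of char can also be queried from the program itself. A minimal sketch, with hypothetical helper names:

```cpp
#include <limits>

// True when plain char is a signed type on this implementation.
bool char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Store -1 in a char and read the value back as an int.
// The result is negative if char is signed and positive otherwise.
int minus_one_round_trip() {
    char c = static_cast<char>(-1);
    return static_cast<int>(c);
}
```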
char is an integral type in C++. std::string defines an assignment operator:
std::string& operator=(char);
Since int converts to char freely in this context, no diagnostic is given. (It's funny how best intentions pave the road to Hell, innit?)
Since (char)-1 is probably not a valid member of the execution character set on your platform, the stream enters an error state and will stay there, outputting nothing, until the error bit is cleared.
EDIT this is a bug of ideone. If the output stream contains an "illegal" character, the entire stream is not shown, even the parts produced and flushed before the bad character. Use another online compiler to check.
These are the operator= overloads for the string class :-
basic_string& operator=(const basic_string& str);
basic_string& operator=(basic_string&& str) noexcept(allocator_traits<Allocator>::propagate_on_container_move_assignment::value || allocator_traits<Allocator>::is_always_equal::value);
basic_string& operator=(const charT* s);
basic_string& operator=(charT c);
basic_string& operator=(initializer_list<charT>);
Hope that makes sense why that compiled fine.
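To see that the operator=(charT) overload is the one picked, the result of the assignment can be inspected: the string ends up holding exactly one character. A small sketch, assuming an ASCII-compatible execution character set for the value 65; the function name `assign_from_int` is made up for illustration.

```cpp
#include <string>

// Assign an int to a std::string. The operator=(char) overload is chosen
// after an implicit int -> char conversion, so the result has length 1.
std::string assign_from_int(int value) {
    std::string s;
    s = value;  // compiles; some compilers warn only with -Wconversion
    return s;
}
```

With this helper, `assign_from_int(65)` yields "A" on an ASCII platform, and `assign_from_int(-1)` yields a one-character string whose single element is (char)-1.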
Now coming onto your question as to why there is no output. I have tweaked the code a little :-
#include <iostream>
#include <string>
int main() {
std::string foo;
foo = -1; // why is the compiler not complaining about this?
char ch = 65;
std::cout << "1" << std::endl;
std::cout << foo << std::endl;
std::cout << ch << std::endl;
//change ch to -1 ... ascii
ch = -1;
std::cout << ch << std::endl;
std::cout << "2" << std::endl;
}
Can you guess what the output is ? Yup think in terms of ascii :-
1
A
2
That's exactly the reason why you don't have the output for -1.
Compiler - MinGW - std=c++14 -- not sure why IDEONE messes up the complete output stream in your case.
I'm experimenting with the iostreams / locale numeric facet and I've hit something quite curious:
The "canonical example" of using the std::num_put facet to directly format a number goes like this:
std::string f(double value) {
using namespace std;
stringstream ss;
num_put<char> const& npf = use_facet<num_put<char> >(ss.getloc());
npf.put(/*out=*/ss, /*format=*/ss, /*fill=*/' ', value);
return ss.str();
}
The first parameter to put is the thing where the output is written to.
We can also have code like this and it works:
std::string g(double value) {
using namespace std;
stringstream ss;
typedef char* CharBufOutIt;
num_put<char, CharBufOutIt> const& npf = use_facet<num_put<char, CharBufOutIt> >(ss.getloc());
char big_enough_buf[100];
npf.put(/*out=*/big_enough_buf, /*format=*/ss, /*fill=*/' ', value);
return big_enough_buf;
}
The second parameter to put() is a stream object that determines the specific formatting to be applied. The second parameter is not modified at all. (Not in my implementation, and not according to what the docs describe this parameter to be for.)
However, the signature of put looks like this:
iter_type put( iter_type out, std::ios_base& str, char_type fill, long double v ) const;
That is, it takes the ios_base object by non-const reference, even though it would appear that it should really take it by const reference.
Am I missing something? Is this just a (historical?) peculiarity in the C++ iostreams spec? Has this ever been discussed by the C++ std committee?
From the Standard (22.4.2.2.2), one stage of put is specified as follows:
Stage 3:
If str.width() is nonzero and the number of charT’s in the sequence after stage 2 is less than str.width(), then enough fill characters are added to the sequence at the position indicated for padding to bring the length of the sequence to str.width(). str.width(0) is called.
Also, str.width(0) calls width declared without const (see this link):
streamsize ios_base::width (streamsize wide);
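The width reset can be observed by setting a width on the stream, calling put() directly, and reading the width back afterwards. This is a sketch under the assumption that the stream uses the default locale; `put_result` and `put_with_width` are hypothetical names.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

struct put_result {
    std::string text;       // characters written by put()
    std::streamsize width;  // ss.width() observed after put() returned
};

// Format a double through num_put with a field width set on the stream.
put_result put_with_width(double value, std::streamsize width) {
    std::ostringstream ss;
    ss.width(width);
    const std::num_put<char>& npf =
        std::use_facet<std::num_put<char>>(ss.getloc());
    // put() pads to ss.width() and then calls ss.width(0), which is why
    // it needs a non-const ios_base&.
    npf.put(std::ostreambuf_iterator<char>(ss), ss, ' ', value);
    return {ss.str(), ss.width()};
}
```

Calling `put_with_width(3.5, 10)` gives a text padded to ten characters, while the width read back from the stream afterwards is zero.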
cppreference says std::ctype provides character classification based on the classic "C" locale. Is this even true when we create a locale like this:
std::locale loc(std::locale("en_US.UTF8"), new std::ctype<char>);
Will the facet of loc still classify characters based on the "C" locale or the Unicode one? If it classifies by the former, why do we even specify the locale name as "en_US.UTF8"?
The standard requires the default-constructed std::ctype<char> to match the minimal "C" locale via §22.4.1.3.3[facet.ctype.char.statics]/1
static const mask* classic_table() noexcept;
Returns: A pointer to the initial element of an array of size table_size which represents the classifications of characters in the "C" locale
The classification member function is() is defined in terms of table(), which is defined in terms of classic_table() unless another table was provided to the ctype<char>'s constructor.
I've updated cppreference to match these requirements more properly (it was saying "C" for std::ctype<wchar_t> too)
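The "unless another table was provided" case can be sketched as follows: a facet derived from std::ctype<char> copies the classic table, changes a couple of entries, and passes the result to the base constructor. The names `comma_is_space` and `split_with_facet` are hypothetical, and the choice of ',' and ' ' is only for illustration.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: starts from the classic table, then treats ',' as
// whitespace and ' ' as an ordinary character.
struct comma_is_space : std::ctype<char> {
    static const mask* make_table() {
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] |= space;
        table[' '] &= ~space;
        return table.data();
    }
    explicit comma_is_space(std::size_t refs = 0)
        : ctype(make_table(), false, refs) {}
};

// Split a line with operator>>, which consults ctype<char> of the
// stream's locale to decide what counts as whitespace.
std::vector<std::string> split_with_facet(const std::string& line) {
    std::istringstream in(line);
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> fields;
    for (std::string field; in >> field;) fields.push_back(field);
    return fields;
}
```

With this facet installed, `split_with_facet("a b,c")` splits at the comma only, so the space stays inside the first field.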
To answer your second question, the locale constructed with std::locale loc(std::locale("en_US.UTF8"), new std::ctype<char>); will use the ctype facet you specified (and, therefore, "C") to classify narrow characters, but it's redundant: narrow character classification of a plain std::locale("en_US.UTF8") (at least in GNU implementation) is exactly the same:
#include <iostream>
#include <cassert>
#include <locale>
int main()
{
std::locale loc1("en_US.UTF8");
const std::ctype_base::mask* tbl1 =
std::use_facet<std::ctype<char>>(loc1).table();
std::locale loc2(std::locale("en_US.UTF8"), new std::ctype<char>);
const std::ctype_base::mask* tbl2 =
std::use_facet<std::ctype<char>>(loc2).table();
for(size_t n = 0; n < 256; ++n)
assert(tbl1[n] == tbl2[n]);
}
From what I read in the working draft I have for C++11 N3376 §22.4.1.1, std::ctype<char> is supposed to do this:
Class ctype encapsulates the C library <cctype> features. istream members
are required to use ctype<> for character classing during input parsing.
The specializations required in Table 81 (22.3.1.1.1), namely ctype<char> and
ctype<wchar_t>, implement character classing appropriate to the
implementation’s native character set.
There is no mention of the C locale anywhere in there; chances are cppreference is referring to the functions found in <cctype>.
In C, the C locale applies until you change the locale with setlocale() from <locale.h>. The same is likely true of C++, though you also probably have other mechanisms for setting the locale.
Your statement appears to create a locale; it is not clear that it makes that locale into the default locale, though. That locale can then be used to specify the comparison:
ISO/IEC 14882:2011 (the C++ 2011 standard) has section 22.3 entitled Locales. It says, in part:
// 22.3.3, convenience interfaces:
template <class charT> bool isspace (charT c, const locale& loc);
template <class charT> bool isprint (charT c, const locale& loc);
Also, a little later in the standard, it says:
22.3.1.5 locale static members [locale.statics]
static locale global(const locale& loc);
1 Sets the global locale to its argument.
2 Effects: Causes future calls to the constructor locale() to return a copy of the argument. If the
argument has a name, does
std::setlocale(LC_ALL, loc.name().c_str());
otherwise, the effect on the C locale, if any, is implementation-defined. No library function other
than locale::global() shall affect the value returned by locale(). [ Note: See 22.6 for data race
considerations when setlocale is invoked. — end note ]
3 Returns: The previous value of locale().
So I have to change the global locale before running the above line if I want std::ctype to classify based on "en_US.UTF8"?
What I read from the initial quote is that, given your loc you could write:
if (isspace(ch, loc)) { ... }
specifying the locale to be used explicitly. If you don't want to do that, then you need to call std::locale::global(loc) to set the global locale, so unadorned invocations of isspace() will work:
if (isspace(ch)) { ... }
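The explicit-locale form can be sketched like this; the helper name `count_spaces` is made up for illustration, and the locale is simply whatever the caller passes in.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Count whitespace characters according to the ctype facet of loc,
// using the two-argument std::isspace from <locale>.
std::size_t count_spaces(const std::string& text, const std::locale& loc) {
    std::size_t n = 0;
    for (char ch : text) {
        if (std::isspace(ch, loc)) ++n;
    }
    return n;
}
```

Passing `std::locale::classic()` classifies by the "C" rules, while passing the `loc` from the question uses whichever ctype<char> facet that locale carries.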