I tried to study the libstdc++ source code (GCC 7.2) recently and was confused. Probably I have missed something important, but I started to think that it is impossible to implement a basic_string class that fully complies with the C++ standard.
Here is a problem that I have faced.
basic_string should be able to accept a custom allocator class as a template parameter.
An allocate method is required as part of the allocator.
allocate is allowed to return objects of a user-defined type that "acts like a pointer to the allocated data". Let's call it my_pointer.
my_pointer is only required to satisfy the requirements of NullablePointer and RandomAccessIterator. All other requirements are optional. According to the standard, we may be unable to cast my_pointer to the CharT* type (another basic_string template parameter), because that conversion is optional.
On the other hand, a const CharT* c_str() method must be provided as part of the standard, so we have to know how to perform this cast.
The last two points conflict, and I don't know how to resolve this.
Hope that you can help me to figure it out.
Thank you!
There are several requirements from the standard that, taken together, always ensure the conversion is possible, at least indirectly:
Given basic_string<charT, traits, Allocator>, the standard requires charT and allocator_traits<Allocator>::value_type to be equal.
allocator_traits<Allocator>::pointer is required to be either Allocator::pointer or Allocator::value_type*.
In the former case, given an Allocator::pointer p, *p is required to be an Allocator::value_type&.
In the latter case, everything is trivial.
Allocator::pointer is required to be a contiguous iterator, which requires that, given a contiguous iterator q and an integer n, *(q + n) == *(addressof(*q) + n).
Given an Allocator a, a.allocate(n) is required to return an Allocator::pointer.
Combining everything together, it means the following is always correct:
template<typename charT, typename traits = /* ... */, typename Allocator = /* ... */>
class basic_string
{
    typename std::allocator_traits<Allocator>::pointer _data;
    // ...
public:
    const charT* c_str() const { return std::addressof(*_data); }
    // ...
};
Where _data possibly stores the result of a previous call to Allocator::allocate.
Related
Item 18 of Scott Meyers's book Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library says to avoid vector<bool>, as it's not an STL container and it doesn't really hold bools.
The following code:
vector<bool> v;
bool *pb = &v[0];
will not compile, violating a requirement of STL containers.
Error:
cannot convert 'std::vector<bool>::reference* {aka std::_Bit_reference*}' to 'bool*' in initialization
The return type of vector<T>::operator[] is supposed to be T&, so why is vector<bool> a special case?
What does vector<bool> really consist of?
The Item further says:
deque<bool> v; // is a STL container and it really contains bools
Can this be used as an alternative to vector<bool>?
Can anyone please explain this?
For space-optimization reasons, the C++ standard (as far back as C++98) explicitly calls out vector<bool> as a special standard container where each bool uses only one bit of space rather than one byte as a normal bool would (implementing a kind of "dynamic bitset"). In exchange for this optimization it doesn't offer all the capabilities and interface of a normal standard container.
In this case, since you can't take the address of a bit within a byte, things such as operator[] can't return a bool&, but instead return a proxy object that allows you to manipulate the particular bit in question. Since this proxy object is not a bool&, you can't assign its address to a bool* like you could with the result of such an operator call on a "normal" container. In turn, this means that bool *pb = &v[0]; isn't valid code.
On the other hand, deque doesn't have any such specialization called out, so each bool takes a byte and you can take the address of the value returned from operator[].
Finally note that the MS standard library implementation is (arguably) suboptimal in that it uses a small chunk size for deques, which means that using deque as a substitute isn't always the right answer.
The problem is that vector<bool> returns a proxy reference object instead of a true reference, so that C++98-style code like bool* p = &v[0]; won't compile. However, modern C++11 with auto p = &v[0]; can be made to compile if operator& also returns a proxy pointer object. Howard Hinnant has written a blog post detailing the algorithmic improvements possible when using such proxy references and pointers.
Scott Meyers has a long Item 30 in More Effective C++ about proxy classes. You can come a long way to almost mimic the builtin types: for any given type T, a pair of proxies (e.g. reference_proxy<T> and iterator_proxy<T>) can be made mutually consistent in the sense that reference_proxy<T>::operator&() and iterator_proxy<T>::operator*() are each other's inverse.
However, at some point one needs to map the proxy objects back to behave like T* or T&. For iterator proxies, one can overload operator->() and access the template T's interface without reimplementing all the functionality. However, for reference proxies, you would need to overload operator.(), and that is not allowed in current C++ (although Sebastian Redl presented such a proposal at BoostCon 2013). You can make a verbose work-around like a .get() member inside the reference proxy, or implement all of T's interface inside the reference (this is what is done for vector<bool>::bit_reference), but this will either lose the built-in syntax or introduce user-defined conversions that do not have built-in semantics for type conversions (you can have at most one user-defined conversion per argument).
TL;DR: no, vector<bool> is not a container, because the Standard requires a real reference, but it can be made to behave almost like a container, at least much more closely with C++11 (auto) than in C++98.
vector<bool> stores boolean values in compressed form, using only one bit per value (not 8 bits, as a bool[] array does). It is not possible to return a reference to a bit in C++, so there is a special helper type, a "bit reference", which provides an interface to a particular bit in memory and lets you use the standard operators and casts.
Many consider the vector<bool> specialization to be a mistake.
The paper "Deprecating Vestigial Library Parts in C++17" contains a proposal to reconsider the vector<bool> partial specialization:
There has been a long history of the bool partial specialization of std::vector not satisfying the container requirements, and in particular, its iterators not satisfying the requirements of a random access iterator. A previous attempt to deprecate this container was rejected for C++11, N2204.
One of the reasons for rejection is that it is not clear what it would mean to deprecate a particular specialization of a template. That could be addressed with careful wording. The larger issue is that the (packed) specialization of vector<bool> offers an important optimization that clients of the standard library genuinely seek, but would no longer be available. It is unlikely that we would be able to deprecate this part of the standard until a replacement facility is proposed and accepted, such as N2050. Unfortunately, there are no such revised proposals currently being offered to the Library Evolution Working Group.
Look at how it is implemented: the STL builds heavily on templates, and therefore the headers contain the actual code.
For instance, look at the libstdc++ implementation here.
Also interesting, even though not an STL-conforming bit vector, is llvm::BitVector from here.
The essence of llvm::BitVector is a nested class called reference plus suitable operator overloading that makes BitVector behave similarly to vector<bool>, with some limitations. The code below is a simplified interface showing how BitVector hides a class called reference to make the real implementation behave almost like a real array of bool without using one byte for each value.
class BitVector {
public:
    class reference {
    public:
        reference& operator=(reference t);
        reference& operator=(bool t);
        operator bool() const;
    };
    reference operator[](unsigned Idx);
    bool operator[](unsigned Idx) const;
};
This code has the following nice properties:
BitVector b(10, false); // size 10, default false
BitVector::reference x = b[5]; // this is the type you really get back
bool y = b[5]; // implicitly converted to bool
assert(b[5] == false); // converted to bool
assert(b[6] == b[7]); // bool operator==(const reference &, const reference &);
b[5] = true; // assignment on reference
assert(b[5] == true); // and actually it does work.
This code actually has a flaw; try to run:
std::for_each(&b[5], &b[6], some_func); // address of a reference proxy is not an iterator
It will not work, because assert((&b[5] - &b[3]) == (5 - 3)); would fail (within llvm::BitVector).
This is the very simple LLVM version. std::vector<bool> also has working iterators, so the loop for (auto i = b.begin(), e = b.end(); i != e; ++i) will work, and so will std::vector<bool>::const_iterator.
However, there are still limitations in std::vector<bool> that make it behave differently in some cases.
This comes from http://www.cplusplus.com/reference/vector/vector-bool/
Vector of bool: This is a specialized version of vector, which is used for elements of type bool and optimizes for space.
It behaves like the unspecialized version of vector, with the following changes:
- The storage is not necessarily an array of bool values; the library implementation may optimize storage so that each value is stored in a single bit.
- Elements are not constructed using the allocator object; their value is directly set on the proper bit in the internal storage.
- Member function flip and a new signature for member swap.
- A special member type, reference, a class that accesses individual bits in the container's internal storage with an interface that emulates a bool reference. Conversely, member type const_reference is a plain bool.
- The pointer and iterator types used by the container are not necessarily pointers or conforming iterators, although they shall simulate most of their expected behavior.
These changes provide a quirky interface to this specialization and favor memory optimization over processing (which may or may not suit your needs). In any case, it is not possible to instantiate the unspecialized template of vector for bool directly. Workarounds to avoid this range from using a different type (char, unsigned char) or container (like deque) to using wrapper types or further specializing for specific allocator types.
bitset is a class that provides similar functionality for fixed-size arrays of bits.
As a followup to this question, the default allocator (std::allocator<T>) is required to implement construct as follows (according to [default.allocator]):
template <class U, class... Args>
void construct(U* p, Args&&... args);
Effects: ::new((void *)p) U(std::forward<Args>(args)...)
That is, always value-initialization. The result of this is that std::vector<POD> v(num), for any POD type, will value-initialize num elements, which is more expensive than default-initializing num elements.
Why didn't† std::allocator provide a default-initializing additional overload? That is, something like (borrowed from Casey):
template <class U>
void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value)
{
    ::new (static_cast<void*>(p)) U;
}
Was there a reason to prefer value-initialization in all cases? It seems surprising to me that this breaks the usual C++ rule that we only pay for what we use.
†I assume such a change is impossible going forward, given that currently std::vector<int> v(100) will give you 100 0s, but I'm wondering why that is the case... given that one could just as easily have required std::vector<int> v2(100, 0) in the same way that there are differences between new int[100] and new int[100]{}.
In C++03, the allocator's construct member took two arguments, a pointer and a value, which was used to perform copy-initialization:
20.1.6 Table 34
a.construct(p,t)
Effect:
::new((void*)p) T(t)
construct taking two parameters can be traced back to 1994 (pg. 18). As you can see, in Stepanov's original concepts it wasn't part of the allocator interface (it wasn't supposed to be configurable) and was present just as a wrapper over placement new.
The only way to know for sure would be to ask Stepanov himself, but I suppose the reason was the following: if you want to construct something, you want to initialize it with a specific value. And if you want your integers uninitialized, you can just omit the construct call, since it is not needed for POD types. Later, construct and other related functions were bundled into allocators, and containers were parametrized on them, introducing some loss of control over initialization for the end user.
So it seems that the lack of default-initialization is there for historical reasons: nobody thought about its importance when C++ was standardized, and later versions of the Standard would not introduce a breaking change.
The following is bad:
vector<const int> vec;
The problem is that the template type needs to be assignable. The following code compiles [EDIT: in Visual Studio 2010], demonstrating a problem with the above:
vector<const int> vec;
vec.push_back(6);
vec[0] += 4;
With more complicated types, this can be a serious problem.
My first question is whether there is a reason for this behavior. It seems to me like it might be possible to make const containers that disallow the above and non-const containers that allow it.
Second, is there a way to make containers that function in this way?
Third, what is actually happening here (with a user type)? I realize it is undefined behavior, but how is the STL even compiling this at all?
The reason std::vector<T const> isn't allowed is that the objects in a vector may need to be reshuffled, e.g., when inserting somewhere other than the end. Now, the member std::vector<T>::push_back(T const& v) is conceptually equivalent to (leaving the allocator template parameter out, as it is irrelevant for this discussion)
template <typename T>
void std::vector<T>::push_back(T const& v) {
    this->insert(this->end(), v);
}
which seems to be how it is implemented on some implementations. Now, this operation would require, in general, that some objects be moved and, thus, that the T argument be assignable. It seems that the standard library shipping with MSVC++ doesn't delegate the operation but does all the necessary handling, i.e., resizing the array and moving the objects appropriately when running out of space, inside push_back(). It isn't quite clear what the requirements on the type T are to be able to use push_back().
In principle, a container supporting both T const and an insert() operation in the middle would be possible, though: nothing requires the internal storage to be T rather than typename std::remove_const<T>::type while exposing a T& in the interface. It is necessary to be a bit careful about the const version of operations like operator[](), because just using T const& as the return type when T is some type S const would result in the type S const const. In C++ 2003 this would be an error; in C++ 2011 I think the consts are just collapsed. To be safe, you could use typename std::add_const<T>::type&.
I like to know how things work, and as such I have been delving deep into the C++ standard library. Something occurred to me the other day.
It is required that containers (for example, std::vector<int, std::allocator<int>>) use the allocator specified for all allocations. Specifically, the standard says:
23.1.8
Copy constructors for all container types defined in this clause copy
an allocator argument from their respective first parameters. All
other constructors for these container types take an Allocator&
argument (20.1.5), an allocator whose value type is the same as the
container’s value type. A copy of this argument is used for any memory
allocation performed, by these constructors and by all member
functions, during the lifetime of each container object. In all
container types defined in this clause, the member get_allocator()
returns a copy of the Allocator object used to construct the
container.
Also, later in the standard it says (in a few different spots, but I'll pick one) things like this:
explicit deque(size_type n, const T& value = T(), const Allocator& = Allocator());
Effects: Constructs a deque with n copies of value,
using the specified allocator.
OK, so on to my question.
Let's take std::vector as an example. The natural and efficient way to implement something like:
vector<T, A>::vector(const vector& x)
might look something like this:
template <class T, class A>
vector<T, A>::vector(const vector& x) {
    pointer p = alloc_.allocate(x.size());
    std::uninitialized_copy(x.begin(), x.end(), p);
    first_ = p;
    last_  = p + x.size();
    end_   = p + x.size();
}
Specifically, we allocate some memory and then copy-construct all the members in place, not bothering to do something like new value_type[x.size()], because that would default-construct the array only to overwrite it!
But this doesn't use the allocator to do the copy construction...
I could manually write a loop which does something like this:
while (first != last) {
    alloc_.construct(&*dest++, *first++);
}
But that's a waste; it's nearly identical to std::uninitialized_copy, the only difference being that it uses the allocator instead of placement new.
So, would you consider it an oversight that the standard doesn't have the (seemingly obvious to me) set of functions like these:
template <class In, class For, class A>
For uninitialized_copy(In first, In last, For dest, A &a);
template <class In, class Size, class For, class A>
For uninitialized_copy_n(In first, Size count, For dest, A &a);
template <class For, class T, class A>
void uninitialized_fill(For first, For last, const T& x, A &a);
template <class For, class Size, class T, class A>
void uninitialized_fill_n(For first, Size count, const T& x, A &a);
I would imagine that these types of functions (even though they are trivial to implement manually... until you try to make them exception-safe) would prove fairly useful for people who want to implement their own containers and make efficient use of copy construction while using allocators.
Thoughts?
I'm not sure whether we could call it an "oversight", per se.
No, you can't provide your own allocator to these specialised algorithms. But then there are other things that the standard doesn't contain, either.
@MarkB identifies a very good reason that the standard shouldn't do this (that the range has no knowledge of the container's allocator). I'd go so far as to say it's just an inherent limitation.
You can always re-invent uninitialized_copy for your needs, knowing what the allocator should be. It's just a two-line for loop.
If these functions were free-functions, I can't see any way that the compiler could detect allocator mismatches since the allocator type isn't retained by the iterator. This in turn could result in a variety of hard-to-find problems.
Yes, I think it is a (big) oversight, because information about the allocator is otherwise lost. The allocator is, under the current protocol, the only one that knows exactly how to construct an object in memory.
Boost now includes alloc_construct and alloc_destroy (https://www.boost.org/doc/libs/1_72_0/libs/core/doc/html/core/alloc_construct.html), which can at least help a bit in implementing generic versions of uninitialized_copy/fill/etc. that take an allocator.