std::string and multiple concatenations


Consider this snippet, and suppose that a, b, c and d are non-empty strings.
std::string a, b, c, d;
d = a + b + c;
When computing the sum of those three std::string instances, standard library implementations create a first temporary std::string object, copy the contents of a and b into its internal buffer, then perform the same operation between that temporary string and c.
A fellow programmer pointed out that, instead of this behaviour, operator+(std::string, std::string) could be defined to return a std::string_helper.
This object’s sole role would be to defer the actual concatenations until the moment it is converted into a std::string. Obviously, operator+(std::string_helper, std::string) would be defined to return the same helper, which would "keep in mind" the fact that it has an additional concatenation to carry out.
Such a behaviour would save the CPU cost of creating n-1 temporary objects, allocating their buffers, copying them, etc. So my question is: why doesn’t it already work like that? I can’t think of any drawback or limitation.
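To make the idea concrete, here is a minimal sketch of such a deferred-concatenation helper. The name concat_helper and its interface are assumptions made for illustration; nothing like it exists in the standard library. It also uses a std::vector to remember the operands, which itself allocates, so it only illustrates the deferral, not an optimal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): it records the
// operands of a chain of '+' and only concatenates when converted to a string.
class concat_helper {
public:
    explicit concat_helper(const std::string& first) { parts_.push_back(&first); }

    // "Keep in mind" one more operand; no characters are copied here.
    friend concat_helper operator+(concat_helper h, const std::string& next) {
        h.parts_.push_back(&next);
        return h;
    }

    // The deferred work: size the result once, then copy each operand once.
    operator std::string() const {
        std::size_t total = 0;
        for (const std::string* p : parts_) total += p->size();
        std::string result;
        result.reserve(total);
        for (const std::string* p : parts_) result += *p;
        return result;
    }

private:
    std::vector<const std::string*> parts_;  // operands must outlive the helper
};

// Usage sketch: the helper is introduced explicitly, because operator+ for two
// std::string objects is already provided by the library.
void example() {
    std::string a = "foo", b = "bar", c = "baz";
    std::string d = concat_helper(a) + b + c;  // one conversion at the end
    assert(d == "foobarbaz");
}
```

Because the helper only stores pointers, every operand has to stay alive until the conversion happens, which is one more way such a design could surprise users.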

why doesn’t it already work like that?
I can only speculate about why it was originally designed like that. Perhaps the designers of the string library simply didn't think of it; perhaps they thought the extra type conversion (see below) might make the behaviour too surprising in some situations. It is one of the oldest C++ libraries, and a lot of wisdom that we take for granted simply didn't exist in past decades.
As to why it hasn't been changed to work like that: it could break existing code, by adding an extra user-defined type conversion. Implicit conversions can only involve at most one user-defined conversion. This is specified by C++11, 13.3.3.1.2/1:
A user-defined conversion sequence consists of an initial standard conversion sequence followed by a user-defined conversion followed by a second standard conversion sequence.
Consider the following:
struct thingy {
thingy(std::string);
};
void f(thingy);
f(some_string + another_string);
This code is fine if the type of some_string + another_string is std::string. That can be implicitly converted to thingy via the conversion constructor. However, if we were to change the definition of operator+ to give another type, then it would need two conversions (string_helper to string to thingy), and so would fail to compile.
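The one-conversion rule can be checked at compile time with std::is_convertible, which tests implicit conversion. In this sketch, string_helper is a hypothetical stand-in for the proposed return type, not a real library type.

```cpp
#include <string>
#include <type_traits>

struct thingy {
    thingy(std::string) {}
};

// Hypothetical stand-in for the proposed return type of operator+.
struct string_helper {
    operator std::string() const { return std::string(); }
};

// One user-defined conversion (the thingy constructor): allowed implicitly.
static_assert(std::is_convertible<std::string, thingy>::value,
              "std::string converts to thingy");

// string_helper -> std::string is also a single user-defined conversion.
static_assert(std::is_convertible<string_helper, std::string>::value,
              "string_helper converts to std::string");

// string_helper -> std::string -> thingy would need two, so it is rejected.
static_assert(!std::is_convertible<string_helper, thingy>::value,
              "string_helper does not convert implicitly to thingy");
```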
So, if the speed of string building is important, you'll need to use alternative methods like concatenation with +=. Or, according to Matthieu's answer, don't worry about it because C++11 fixes the inefficiency in a different way.

The obvious answer: because the standard doesn't allow it. It impacts code by introducing an additional user-defined conversion in some cases: if C is a type with a user-defined constructor taking a std::string, then it would make:
C obj = stringA + stringB;
illegal.

It depends.
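In C++03, it is true that there may be a slight inefficiency there (comparable to Java and C# as they use string interning by the way). This can be alleviated using:
d = ((std::string("") += a) += b) += c;
which is not really... idiomatic. The parentheses matter: += is right-associative, so without them the expression would append to b and a themselves.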
In C++11, operator+ is overloaded for rvalue references. Meaning that:
d = a + b + c;
is effectively evaluated as:
d.assign(std::move(operator+(a, b).append(c)));
which is (nearly) as efficient as you can get.
The only inefficiency left in the C++11 version is that the memory is not reserved once and for all at the beginning, so there may be up to two reallocations and copies (one for each appended string). Still, because appending is amortized O(1), unless c is much longer than b, at worst a single reallocation and copy should take place. And of course, we are talking about POD copies here (so a memcpy call).
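If that last reallocation matters, the buffer can be sized by hand before appending. A minimal sketch, where concat3 is a hypothetical helper name chosen for this example:

```cpp
#include <cassert>
#include <string>

// Concatenates three strings with at most one allocation for the result.
std::string concat3(const std::string& a, const std::string& b, const std::string& c) {
    std::string result;
    result.reserve(a.size() + b.size() + c.size());  // size the buffer up front
    result += a;  // the appends fit in the reserved capacity,
    result += b;  // so none of them needs to reallocate
    result += c;
    return result;
}

void example() {
    std::string d = concat3("alpha-", "beta-", "gamma");
    assert(d == "alpha-beta-gamma");
    assert(d.capacity() >= d.size());
}
```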

Sounds to me like something like this already exists: std::stringstream.
Only you have << instead of +. Just because operator+ exists for std::string, that doesn't make it the most efficient option.
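For reference, a small sketch of the stream-based version; join_with_stream is a hypothetical name. Streams carry formatting and locale machinery of their own, so whether this beats plain concatenation is something to measure rather than assume.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds the result in a stream buffer; str() copies it out at the end.
std::string join_with_stream(const std::string& a, const std::string& b, const std::string& c) {
    std::ostringstream out;
    out << a << b << c;
    return out.str();
}

void example() {
    assert(join_with_stream("a", "b", "c") == "abc");

    // Unlike operator+, the stream also formats non-string values.
    std::ostringstream out;
    out << "The answer is " << 6 * 7;
    assert(out.str() == "The answer is 42");
}
```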

I think if you use +=, it will be a little faster:
d += a;
d += b;
d += c;
It should be faster, as it doesn't create temporary objects. Or simply this:
d.append(a).append(b).append(c); //same as above: i.e using '+=' 3 times.

The main reason for not doing a string of individual + concatenations, and especially not doing that in a loop, is that it has O(n^2) complexity.
A reasonable alternative with O(n) complexity is to use a simple string builder, like
template< class Char >
class ConversionToString
{
public:
// Visual C++ 10.0 has some DLL linking problem with other types:
CPP_STATIC_ASSERT((
std::is_same< Char, char >::value || std::is_same< Char, wchar_t >::value
));
typedef std::basic_string< Char > String;
typedef std::basic_ostringstream< Char > OutStringStream;
// Just a default implementation, not particularly efficient.
template< class Type >
static String from( Type const& v )
{
OutStringStream stream;
stream << v;
return stream.str();
}
static String const& from( String const& s )
{
return s;
}
};
template< class Char, class RawChar = Char >
class StringBuilder;
template< class Char, class RawChar >
class StringBuilder
{
private:
typedef std::basic_string< Char > String;
typedef std::basic_string< RawChar > RawString;
RawString s_;
template< class Type >
static RawString fastStringFrom( Type const& v )
{
return ConversionToString< RawChar >::from( v );
}
static RawChar const* fastStringFrom( RawChar const* s )
{
assert( s != 0 );
return s;
}
static RawChar const* fastStringFrom( Char const* s )
{
assert( s != 0 );
CPP_STATIC_ASSERT( sizeof( RawChar ) == sizeof( Char ) );
return reinterpret_cast< RawChar const* >( s );
}
public:
enum ToString { toString };
enum ToPointer { toPointer };
String const& str() const { return reinterpret_cast< String const& >( s_ ); }
operator String const& () const { return str(); }
String const& operator<<( ToString ) { return str(); }
RawChar const* ptr() const { return s_.c_str(); }
operator RawChar const* () const { return ptr(); }
RawChar const* operator<<( ToPointer ) { return ptr(); }
template< class Type >
StringBuilder& operator<<( Type const& v )
{
s_ += fastStringFrom( v );
return *this;
}
};
template< class Char >
class StringBuilder< Char, Char >
{
private:
typedef std::basic_string< Char > String;
String s_;
template< class Type >
static String fastStringFrom( Type const& v )
{
return ConversionToString< Char >::from( v );
}
static Char const* fastStringFrom( Char const* s )
{
assert( s != 0 );
return s;
}
public:
enum ToString { toString };
enum ToPointer { toPointer };
String const& str() const { return s_; }
operator String const& () const { return str(); }
String const& operator<<( ToString ) { return str(); }
Char const* ptr() const { return s_.c_str(); }
operator Char const* () const { return ptr(); }
Char const* operator<<( ToPointer ) { return ptr(); }
template< class Type >
StringBuilder& operator<<( Type const& v )
{
s_ += fastStringFrom( v );
return *this;
}
};
namespace narrow {
typedef StringBuilder<char> S;
} // namespace narrow
namespace wide {
typedef StringBuilder<wchar_t> S;
} // namespace wide
Then you can write efficient and clear things like …
using narrow::S;
std::string a = S() << "The answer is " << 6*7;
foo( S() << "Hi, " << username << "!" );

Related

Can you write a copy constructor for a union with const members?

Suppose I have a struct that contains a union with const members, like so:
struct S
{
    // Members
    const enum { NUM, STR } type;
    union
    {
        const int a;
        const std::string s;
    };
    // Constructors
    S(int t_a) : type(NUM), a(t_a) {}
    S(const std::string & t_s) : type(STR), s(t_s) {}
};
So far, so good. But now say I want to write a copy-constructor for this type.
It doesn't seem like this involves doing anything nefarious, but since I need to initialize the const members in member initializers I don't see how to do this based on logic that depends on the type member.
Questions:
Is it possible to write this constructor?
If not, is this essentially a syntactic oversight, or is there some fundamental reason that the language can't support such a thing?

Yes, it is possible to write a copy constructor here. Actually, this is already done inside the std::variant implementation, which must support const types among others. So your class S can be replaced with
using S = std::variant<const int, const std::string>;
But if for some reason you cannot use std::variant, then the copy constructor can be written using the std::construct_at function (C++20) as follows:
#include <memory> // std::construct_at (C++20)
#include <string>
struct S {
    const enum { NUM, STR } type;
    union {
        const int a;
        const std::string s;
    };
    S(int t_a) : type(NUM), a(t_a) {}
    S(const std::string & t_s) : type(STR), s(t_s) {}
    S(const S & rhs) : type(rhs.type) {
        if ( type == NUM ) std::construct_at( &a, rhs.a );
        if ( type == STR ) std::construct_at( &s, rhs.s );
    }
    ~S() {
        if ( type == STR ) s.~basic_string();
    }
};
int main() {
    S s(1);
    S u = s;
    S v("abc");
    S w = v;
}
Demo: https://gcc.godbolt.org/z/TPe8onhWs
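For the std::variant route mentioned at the start of this answer, a short C++17 sketch shows the copy constructor at work with const alternatives. The alternatives are selected explicitly with std::in_place_index here to keep the example unambiguous; that choice belongs to this sketch, not to the answer above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

using S = std::variant<const int, const std::string>;

// Copy construction is available; copy assignment is not, because the
// const alternatives cannot be assigned to.
static_assert(std::is_copy_constructible<S>::value, "S can be copy-constructed");
static_assert(!std::is_copy_assignable<S>::value, "S cannot be copy-assigned");

void example() {
    S num(std::in_place_index<0>, 1);
    S str(std::in_place_index<1>, "abc");

    S num_copy = num;  // the copy constructor dispatches on the active alternative
    S str_copy = str;

    assert(num_copy.index() == 0 && std::get<0>(num_copy) == 1);
    assert(str_copy.index() == 1 && std::get<1>(str_copy) == "abc");
}
```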

Rambling on std::allocator

I recently became interested in std::allocator, thinking it might solve an issue I had with a design decision in some C++ code.
Now I've read some documentation about it, watched some videos, like Andrei Alexandrescu's one at CppCon 2015, and I now basically understand I shouldn't use them, because they're not designed to work the way I think allocators might work.
That being said, before realising this, I wrote some test code to see how a custom subclass of std::allocator could work.
Obviously, it didn't work as expected... : )
So the question is not about how allocators should be used in C++, but I'm just curious to learn exactly why my test code (provided below) is not working.
Not because I want to use custom allocators. Just curious to see the exact reason...
typedef std::basic_string< char, std::char_traits< char >, TestAllocator< char > > TestString;
int main( void )
{
TestString s1( "hello" );
TestString s2( s1 );
s1 += ", world";
std::vector< int, TestAllocator< int > > v;
v.push_back( 42 );
return 0;
}
Complete code for TestAllocator is provided at the end of this question.
Here I'm simply using my custom allocator with some std::basic_string, and with std::vector.
With std::basic_string, I can see an instance of my allocator is actually created, but no method is called...
So it just looks like it's not used at all.
But with std::vector, my own allocate method is actually being called.
So why is there a difference here?
I did try with different compilers and C++ versions.
Looks like the old GCC versions, with C++98, do call allocate on my TestString type, but not the new ones with C++11 and later.
Clang also doesn't call allocate.
So just curious to see an explanation about these different behaviours.
Allocator code:
template< typename _T_ >
struct TestAllocator
{
public:
typedef _T_ value_type;
typedef _T_ * pointer;
typedef const _T_ * const_pointer;
typedef _T_ & reference;
typedef const _T_ & const_reference;
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
typedef std::true_type propagate_on_container_move_assignment;
typedef std::true_type is_always_equal;
template< class _U_ >
struct rebind
{
typedef TestAllocator< _U_ > other;
};
TestAllocator( void ) noexcept
{
std::cout << "CTOR" << std::endl;
}
TestAllocator( const TestAllocator & other ) noexcept
{
( void )other;
std::cout << "CCTOR" << std::endl;
}
template< class _U_ >
TestAllocator( const TestAllocator< _U_ > & other ) noexcept
{
( void )other;
std::cout << "CCTOR" << std::endl;
}
~TestAllocator( void )
{
std::cout << "DTOR" << std::endl;
}
pointer address( reference x ) const noexcept
{
return std::addressof( x );
}
pointer allocate( size_type n, std::allocator< void >::const_pointer hint = 0 )
{
pointer p;
( void )hint;
std::cout << "allocate" << std::endl;
p = new _T_[ n ]();
if( p == nullptr )
{
throw std::bad_alloc() ;
}
return p;
}
void deallocate( _T_ * p, std::size_t n )
{
( void )n;
std::cout << "deallocate" << std::endl;
delete[] p;
}
const_pointer address( const_reference x ) const noexcept
{
return std::addressof( x );
}
size_type max_size() const noexcept
{
return size_type( ~0 ) / sizeof( _T_ );
}
void construct( pointer p, const_reference val )
{
( void )p;
( void )val;
std::cout << "construct" << std::endl;
}
void destroy( pointer p )
{
( void )p;
std::cout << "destroy" << std::endl;
}
};
template< class _T1_, class _T2_ >
bool operator ==( const TestAllocator< _T1_ > & lhs, const TestAllocator< _T2_ > & rhs ) noexcept
{
( void )lhs;
( void )rhs;
return true;
}
template< class _T1_, class _T2_ >
bool operator !=( const TestAllocator< _T1_ > & lhs, const TestAllocator< _T2_ > & rhs ) noexcept
{
( void )lhs;
( void )rhs;
return false;
}

std::basic_string can be implemented using the small buffer optimization (a.k.a. SBO or SSO in the context of strings) - this means that it internally stores a small buffer that avoids allocations for small strings. This is very likely the reason your allocator is not being used.
Try changing "hello" to a longer string (more than 32 characters) and it will probably invoke allocate.
Also note that the C++11 standard forbids std::string to be implemented in a COW (copy-on-write) fashion - more information in this question: "Legality of COW std::string implementation in C++11"
The Standard forbids std::vector to make use of the small buffer optimization: more information can be found in this question: "May std::vector make use of small buffer optimization?".
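The effect is easy to observe with a stripped-down allocator that only counts calls to allocate. CountingAllocator and allocations_for are hypothetical names for this sketch, and the exact size of the small buffer depends on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Counts calls to allocate() across all CountingAllocator instances.
static int g_allocations = 0;

// Hypothetical minimal C++11 allocator; allocator_traits supplies the rest.
template <class T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() noexcept {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> > CountedString;

// Returns how many times allocate() ran while building a string of `length` chars.
int allocations_for(std::size_t length) {
    g_allocations = 0;
    CountedString s(length, 'x');
    return g_allocations;
}

void example() {
    // 100 characters exceed the small buffer of the common implementations,
    // so the allocator has to be used.
    assert(allocations_for(100) >= 1);
    // allocations_for(5) is typically 0 when the small buffer optimization applies.
}
```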

Convert std::string to ci_string

I used this approach to create a case-insensitive typedef for string. Now, I'm trying to convert a std::string to ci_string. All of the following throw compiler errors:
std::string s {"a"};
ci_string cis {s};
ci_string cis (s);
ci_string cis {(ci_string)s};
ci_string cis ((ci_string)s);
ci_string cis = s;
I spent some time trying to figure out how to overload the = operator, and I attempted to use static_cast and dynamic_cast without success. How can I do this?

Your two types are different, so you cannot use the constructor with a regular std::string. But your string is still able to copy a C string, so this should work:
std::string s{"a"};
ci_string cis{ s.data() }; // or s.c_str(), they are the same

std::string and ci_string are unrelated types. Why would static_cast or dynamic_cast be able to convert them? Remember: Two different instantiations of the same template are unrelated types and are potentially completely incompatible.
Give up on the idea of overloading operator= or on some magic that performs the conversion automatically. You have two unrelated types. But they both offer member functions that you can use to copy the char elements from one to the other.
Just write a simple conversion function that takes advantage of the fact that both std::string and ci_string have their value_type defined as char, and appropriately use one of std::basic_string's constructors, either one which takes a pointer to raw data or one which takes two iterators which form a range.
Here is a complete example:
#include <cctype>   // std::toupper
#include <cstddef>  // std::size_t
#include <string>
#include <iostream>
struct ci_char_traits : public std::char_traits<char> {
    static bool eq(char c1, char c2) { return std::toupper(c1) == std::toupper(c2); }
    static bool ne(char c1, char c2) { return std::toupper(c1) != std::toupper(c2); }
    static bool lt(char c1, char c2) { return std::toupper(c1) < std::toupper(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        while( n-- != 0 ) {
            if( std::toupper(*s1) < std::toupper(*s2) ) return -1;
            if( std::toupper(*s1) > std::toupper(*s2) ) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    // Returns a pointer to the first match, or a null pointer if there is none.
    static const char* find(const char* s, std::size_t n, char a) {
        for( ; n != 0; --n, ++s ) {
            if( std::toupper(*s) == std::toupper(a) ) return s;
        }
        return nullptr;
    }
};
typedef std::basic_string<char, ci_char_traits> ci_string;
ci_string to_ci_string(std::string const& src)
{
    return ci_string(src.begin(), src.end());
    // or:
    // return ci_string(src.c_str());
}
int main()
{
    std::string s {"a"};
    auto cis = to_ci_string(s);
    std::cout << cis.c_str() << "\n";
}

Template Class to differentiate object type?

Can I use C++ template classes to differentiate object types? Or what should I use?
E.g. I have a class Synonym, and it can be of type Statement, Procedure, etc. I have functions that accept these synonyms and evaluate them depending on their type. So I was thinking it would be nice if I could do something like:
enum Types { Statement, Procedure, Variable, ... };
template <typename Types>
class Synonym { ... }
void evaluate(Synonym<Statement> s, Synonym<Variable> v) { do something }
^ so that I can do this ... instead of checking the type in function like:
void evaluate(Synonym s, Synonym v) {
assert(s.type == Statement);
assert(v.type == Variable);
// also would like to eliminate things like: (if possible)
switch(s.type) {
case XXX: doSomething ...
case YYY: doAnotherThing ...
}
}

You could create a function template and then specialize it:
template<typename Type>
void evaluate (Type t) {}
template<>
void evaluate<Statement>( Statement s)
{}
This way, when you pass a Statement, the compiler picks that specialization, and you can implement different behaviour for each type. Note that this requires Statement to be a type rather than an enumerator.
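The Synonym<Statement> syntax from the question can also be kept by making the enumerator a non-type template parameter. This is a sketch under assumptions: SynonymKind, the name member and the returned strings are invented for illustration.

```cpp
#include <cassert>
#include <string>

enum class SynonymKind { Statement, Procedure, Variable };

// The kind is part of the type, so Synonym<Statement> and Synonym<Variable>
// are distinct types and cannot be mixed up at a call site.
template <SynonymKind K>
struct Synonym {
    std::string name;
};

// Accepts exactly (statement, variable); other combinations fail to compile.
std::string evaluate(const Synonym<SynonymKind::Statement>& s,
                     const Synonym<SynonymKind::Variable>& v) {
    return "statement " + s.name + " uses variable " + v.name;
}

// A second overload takes the place of one case of the run-time switch.
std::string evaluate(const Synonym<SynonymKind::Procedure>& p,
                     const Synonym<SynonymKind::Variable>& v) {
    return "procedure " + p.name + " uses variable " + v.name;
}

void example() {
    Synonym<SynonymKind::Statement> s{"s1"};
    Synonym<SynonymKind::Procedure> p{"main"};
    Synonym<SynonymKind::Variable> v{"x"};
    assert(evaluate(s, v) == "statement s1 uses variable x");
    assert(evaluate(p, v) == "procedure main uses variable x");
}
```

The asserts and the switch from the question disappear because overload resolution does the checking at compile time. The trade-off is that the kind must be known statically, so this does not help when synonyms of mixed kinds are stored in one container.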

I think a variant with the visitor pattern would be well suited here. Have a look at Boost.Variant: http://www.boost.org/doc/libs/1_51_0/doc/html/variant.html. The last example there (reproduced below in expanded form) shows a visitor implementation. There are also other variant and visitor implementations; std::any and Loki are also options. I personally like Loki, but that is probably just because I'm a huge fan of Alexandrescu.
#include "boost/variant.hpp"
#include <iostream>
#include <string>
class ToLengthVisitor : public boost::static_visitor<int>
{
public:
int operator()(int i) const
{
return i;
}
int operator()(const std::string & str) const
{
return str.length();
}
int operator()(const char * str) const
{
const char * temp = str;
while(*temp != '\0') temp++;
return temp-str;
}
};
int main()
{
typedef boost::variant< int, std::string, const char * > MyVariant;
MyVariant u(std::string("hello world"));
std::cout << u; // output: hello world
MyVariant cu(boost::get<std::string>(u).c_str());
int result = boost::apply_visitor( ToLengthVisitor(), u );
std::cout << result; // output: 11 (i.e., length of "hello world")
result = boost::apply_visitor( ToLengthVisitor(), cu );
std::cout << result; // output: 11 (i.e., length of "hello world")
}

Can I write the contents of vector<bool> to a stream directly from the internal buffer?

I know vector< bool > is "evil", and dynamic_bitset is preferred (bitset is not suitable), but I am using C++ Builder 6 and I don't really want to pursue the Boost route for such an old version. I tried:
int RecordLen = 1;
int NoBits = 8;
std::ofstream Binary( FileNameBinary );
vector< bool > CaseBits( NoBits, 0 );
Binary.write( ( const char * ) & CaseBits[ 0 ], RecordLen);
but the results are incorrect. I suspect that the implementation may mean this is a stupid thing to try, but I don't know.

Operator[] for vector <bool> doesn't return a reference (because bits are not addressable), so taking the return value's address is going to be fraught with problems. Have you considered std::deque <bool>?

The vector<bool> specialization's operator[] does not return a reference to bool.
See here, bottom of the page.
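Since the internal buffer cannot be reached portably, one way around it is to pack the bits into bytes yourself and write those. A minimal sketch, where pack_bits and write_bits are hypothetical helper names and the least-significant-bit-first order is an arbitrary choice made for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Packs the bits into bytes, least significant bit first, so the byte layout
// is decided by this code rather than by the vector<bool> implementation.
std::vector<unsigned char> pack_bits(const std::vector<bool>& bits) {
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            bytes[i / 8] = static_cast<unsigned char>(bytes[i / 8] | (1u << (i % 8)));
        }
    }
    return bytes;
}

// Writes the packed bytes to any output stream (open files in binary mode).
void write_bits(std::ostream& out, const std::vector<bool>& bits) {
    const std::vector<unsigned char> bytes = pack_bits(bits);
    if (!bytes.empty()) {
        out.write(reinterpret_cast<const char*>(&bytes[0]),
                  static_cast<std::streamsize>(bytes.size()));
    }
}

void example() {
    std::vector<bool> bits(8, false);
    bits[0] = true;
    bits[3] = true;
    std::ostringstream out;
    write_bits(out, bits);
    const std::string written = out.str();
    assert(written.size() == 1);
    assert(static_cast<unsigned char>(written[0]) == 0x09);  // bits 0 and 3 set
}
```

When writing to a file, passing std::ios::binary to the std::ofstream constructor avoids newline translation of the packed bytes on platforms that perform it.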

It's too late for me to decide how compliant this is, but it works for me: give the bitvector a custom allocator to alias the bits to your own buffer.
Can someone weigh in with whether the rebound allocator inside the vector is required to be copy-constructed from the one passed in? Works on GCC 4.2.1. I seem to recall that the functionality is required for C++0x, and since it's not incompatible with anything in C++03 and is generally useful, support may already be widespread.
Of course, it's implementation-defined whether bits are stored forwards or backwards or left- or right-justified inside whatever storage vector<bool> uses, so take great care.
#include <vector>
#include <iostream>
#include <iomanip>
using namespace std;
template< class T >
struct my_alloc : allocator<T> {
template< class U > struct rebind {
typedef my_alloc<U> other;
};
template< class U >
my_alloc( my_alloc<U> const &o ) {
buf = o.buf;
}
my_alloc( void *b ) { buf = b; }
// noncompliant with C++03: no default constructor
T *allocate( size_t, const void *hint=0 ) {
return static_cast< T* >( buf );
}
void deallocate( T*, size_t ) { }
void *buf;
};
int main() {
unsigned long buf[ 2 ];
vector<bool, my_alloc<bool> > blah( 128, false, my_alloc<bool>( buf ) );
blah[3] = true;
blah[100] = true;
cerr << hex << setw(16) << buf[0] << " " << setw(16) << buf[1] << endl;
}
