std::string implementation and expression templates - c++

It seems that std::string - because it doesn't use expression templates - has a O(n^2) complexity instead of a possibly O(n) complexity for some operations like concatenation. Same thing with a std::stringstream class when you have to insert many elements.
I would like to understand this, at least if someone could have some good links about this point that would be great.

Concatenating multiple strings together has different complexities in C++ depending how it is done. I believe the situation that you're thinking of is:
string result = string("Hello, ") + username + "! " +
"Welcome to " + software_product + ".";
which concatenates 6 strings. The first string is copied 5 times, the second is copied 4 times, and so forth. As Leonid Volnitsky notes in his answer, the exact bound for this Θ(NM), where M is the number of concatenation operations, and N is the total length of the strings being concatenated. We can also call this O(N^2) when M <= N. Note that it's not guaranteed that M <= N, because you can increase M without increasing N by trying to concatenate the empty string.
Expression templates could help speed up this use case, though it would cause problems with auto and decltype type inference in C++11, as well as with template type inference in C++98. All of these would deduce the type of
auto result = string("Hello, ") + username + "! " +
"Welcome to " + software_product + ".";
to be the lazily-evaluated string template type that was used to make the expression template magic happen. Other reasons why expression templates are not a great idea include Leonid Volnitsky's answer about this slowing compilation time. It would probably also increase the size of your compiled binary.
Instead, there are other solutions in C++ could be used to get Θ(N) concatenation:
string result = "Hello, ";
result += username;
result += "! ";
result += "Welcome to ";
result += software_product;
result += ".";
In this version, the string is modified in place, and while data that has already been copied into result sometimes needs to be recopied, C++ strings are typically implemented as dynamic arrays that allocate new space exponentially, so that the insertion of each new character takes amortized constant time, leading to overall Θ(N) behavior for repeated concatenation.
The following is a way of doing the same thing, almost on one line. It uses the same principle internally, but also supports converting non-string types to strings using << overloading.
stringstream result;
result << "Hello, " << username << "! " << "Welcome to " << software_product
<< ".";
// do something with result.str()
Lastly, the C++ standard library doesn't include this, but one could define the following function with some stringstream magic inside it. The implementation is left as an exercise for the reader.
template <typename... Items>
std::string concat(std::string const& a, std::string const& b, Items&&... args)
You can then call concat for repeated concatenation on one line in O(N) time:
string result = concat("Hello, ", username, "! ", "Welcome to ",
software_product, ".");
Presumably, all of these are better solutions than messing up type inference by creating an expression template type.

If we have total length of strings expression N, and M concatenation operations then complexity should be O(NM).
It is definitely possible to speed up it up with expression templates. It was not done probably because of their complexity. Compilation speed would be slower too - it will need M-recursive types.

I find hard to believe that a concatenation would go up to O(n^2), but here goes some answer related to your question:
The C++ standard doesn't specify implementation details, and only
specifies complexity requirements in some cases. The only complexity
requirements on std::string operations are that size(), max_size(),
operator[], swap(), c_str() and data() are all constant time. The
complexity of anything else depends on the choices made by whoever
implemented the library you're using.
See reference: C++ string::find complexity

How can it be O(N^2)? String concatenation is:
reallocation if necessary (constant amortized time);
finding the end of the first string (constant time, since they are counted strings);
character copy from the second string (O(N)).
I don't see how templates can have anything to do with this.

Related

Strlen for strings in c++

I have converted c++ to c type string and working with strlen but it is not working.
#include<bits/stdc++.h>
using namespace std;
int main(){
string s("Hello");
s.c_str();
cout<<strlen(s);
}
s.c_str();
This code has no side effects on s actually regarding a conversion or such. You want to process the result further.
Instead of
cout<<strlen(s);
you want to have either
cout<<strlen(s.c_str());
or
cout<<s.size();
Where the latter is the certainly more efficient, because the standard requires a time complexity of O(1) for std::string::size(), where strlen() cannot guarantee a better time complexity than O(strlen(s)).
If you want to waste clock-cycles in your processor:
cout << strlen(s.c_str());
will count every character up to the first nul character in s.
If you just want to know how long the string is:
cout << s.length();
or
cout << s.size();
will give you that (and with an algorithm that is O(1) rather than the strlen algorithm that is O(n) - meaning that a million character long string takes one million times longer to measure the length of than a single character string). [Of course, if you have a std::string that contains a nul character in the middle, you may find that one of these methods is "right" and the other is "wrong" because it either gives too long or too short a length - a C style string is not allowed to contain a nul character in the middle of the string, it is only allowed as a termination]
Why use strlen when the string class does the business?
Please read http://en.cppreference.com/w/cpp/string/basic_string and not the size/length method
Otherwise do as the #πάντα ῥεῖ suggests

How to cleanly use: const char* and std::string?

tl:dr
How can I concatenate const char* with std::string, neatly and
elegantly, without multiple function calls. Ideally in one function
call and have the output be a const char*. Is this impossible, what
is an optimum solution?
Initial Problem
The biggest barrier I have experienced with C++ so far is how it handles strings. In my opinion, of all the widely used languages, it handles strings the most poorly. I've seen other questions similar to this that either have an answer saying "use std::string" or simply point out that one of the options is going to be best for your situation.
However this is useless advice when trying to use strings dynamically like how they are used in other languages. I cannot guaranty to always be able to use std::string and for the times when I have to use const char* I hit the obvious wall of "it's constant, you can't concatenate it".
Every solution to any string manipulation problem I've seen in C++ requires repetitive multiple lines of code that only work well for that format of string.
I want to be able to concatenate any set of characters with the + symbol or make use of a simple format() function just how I can in C# or Python. Why is there no easy option?
Current Situation
Standard Output
I'm writing a DLL and so far I've been output text to cout via the << operator. Everything has been going fine so far using simple char arrays in the form:
cout << "Hello world!"
Runtime Strings
Now it comes to the point where I want to construct a string at runtime and store it with a class, this class will hold a string that reports on some errors so that they can be picked up by other classes and maybe sent to cout later, the string will be set by the function SetReport(const char* report). So I really don't want to use more than one line for this so I go ahead and write something like:
SetReport("Failure in " + __FUNCTION__ + ": foobar was " + foobar + "\n"); // __FUNCTION__ gets the name of the current function, foobar is some variable
Immediately of course I get:
expression must have integral or unscoped enum type and...
'+': cannot add two pointers
Ugly Strings
Right. So I'm trying to add two or more const char*s together and this just isn't an option. So I find that the main suggestion here is to use std::string, sort of weird that typing "Hello world!" doesn't just give you one of those in the first place but let's give it a go:
SetReport(std::string("Failure in ") + std::string(__FUNCTION__) + std::string(": foobar was ") + std::to_string(foobar) + std::string("\n"));
Brilliant! It works! But look how ugly that is!! That's some of the ugliest code I've every seen. We can simplify to this:
SetReport(std::string("Failure in ") + __FUNCTION__ + ": foobar was " + std::to_string(foobar) + "\n");
Still possibly the worst way I've every encounter of getting to a simple one line string concatenation but everything should be fine now right?
Convert Back To Constant
Well no, if you're working on a DLL, something that I tend to do a lot because I like to unit test so I need my C++ code to be imported by the unit test library, you will find that when you try to set that report string to a member variable of a class as a std::string the compiler throws a warning saying:
warning C4251: class 'std::basic_string<_Elem,_Traits,_Alloc>' needs to have dll-interface to be used by clients of class'
The only real solution to this problem that I've found other than "ignore the warning"(bad practice!) is to use const char* for the member variable rather than std::string but this is not really a solution, because now you have to convert your ugly concatenated (but dynamic) string back to the const char array you need. But you can't just tag .c_str() on the end (even though why would you want to because this concatenation is becoming more ridiculous by the second?) you have to make sure that std::string doesn't clean up your newly constructed string and leave you with garbage. So you have to do this inside the function that receives the string:
const std::string constString = (input);
m_constChar = constString.c_str();
Which is insane. Because now I traipsed across several different types of string, made my code ugly, added more lines than should need and all just to stick some characters together. Why is this so hard?
Solution?
So what's the solution? I feel that I should be able to make a function that concatenates const char*s together but also handle other object types such as std::string, int or double, I feel strongly that this should be capable in one line, and yet I'm unable to find any examples of it being achieved. Should I be working with char* rather than the constant variant, even though I've read that you should never change the value of char* so how would this help?
Are there any experienced C++ programmers who have resolved this issue and are now comfortable with C++ strings, what is your solution? Is there no solution? Is it impossible?
The standard way to build a string, formatting non-string types as strings, is a string stream
#include <sstream>
std::ostringstream ss;
ss << "Failure in " << __FUNCTION__ << ": foobar was " << foobar << "\n";
SetReport(ss.str());
If you do this often, you could write a variadic template to do that:
template <typename... Ts> std::string str(Ts&&...);
SetReport(str("Failure in ", __FUNCTION__, ": foobar was ", foobar, '\n'));
The implementation is left as an exercise for the reader.
In this particular case, string literals (including __FUNCTION__) can be concatenated by simply writing one after the other; and, assuming foobar is a std::string, that can be concatenated with string literals using +:
SetReport("Failure in " __FUNCTION__ ": foobar was " + foobar + "\n");
If foobar is a numeric type, you could use std::to_string(foobar) to convert it.
Plain string literals (e.g. "abc" and __FUNCTION__) and char const* do not support concatenation. These are just plain C-style char const[] and char const*.
Solutions are to use some string formatting facilities or libraries, such as:
std::string and concatenation using +. May involve too many unnecessary allocations, unless operator+ employs expression templates.
std::snprintf. This one does not allocate buffers for you and not type safe, so people end up creating wrappers for it.
std::stringstream. Ubiquitous and standard but its syntax is at best awkward.
boost::format. Type safe but reportedly slow.
cppformat. Reportedly modern and fast.
One of the simplest solution is to use an C++ empty string. Here I declare empty string variable named _ and used it in front of string concatenation. Make sure you always put it in the front.
#include <cstdio>
#include <string>
using namespace std;
string _ = "";
int main() {
char s[] = "chararray";
string result =
_ + "function name = [" + __FUNCTION__ + "] "
"and s is [" + s + "]\n";
printf( "%s", result.c_str() );
return 0;
}
Output:
function name = [main] and s is [chararray]
Regarding __FUNCTION__, I found that in Visual C++ it is a macro while in GCC it is a variable, so SetReport("Failure in " __FUNCTION__ "; foobar was " + foobar + "\n"); will only work on Visual C++. See: https://msdn.microsoft.com/en-us/library/b0084kay.aspx and https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
The solution using empty string variable above should work on both Visual C++ and GCC.
My Solution
I've continued to experiment with different things and I've got a solution which combines tivn's answer that involves making an empty string to help concatenate long std::string and character arrays together and a function of my own which allows single line copying of that std::string to a const char* which is safe to use when the string object leaves scope.
I would have used Mike Seymour's variadic templates but they don't seem to be supported by the Visual Studio 2012 I'm running and I need this solution to be very general so I can't rely on them.
Here is my solution:
Strings.h
#ifndef _STRINGS_H_
#define _STRINGS_H_
#include <string>
// tivn's empty string in the header file
extern const std::string _;
// My own version of .c_str() which produces a copy of the contents of the string input
const char* ToCString(std::string input);
#endif
Strings.cpp
#include "Strings.h"
const std::string str = "";
const char* ToCString(std::string input)
{
char* result = new char[input.length()+1];
strcpy_s(result, input.length()+1, input.c_str());
return result;
}
Usage
m_someMemberConstChar = ToCString(_ + "Hello, world! " + someDynamicValue);
I think this is pretty neat and works in most cases. Thank you everyone for helping me with this.
As of C++20, fmtlib has made its way into the ISO standard but, even on older iterations, you can still download and use it.
It gives similar capabilities as Python's str.format()(a), and your "ugly strings" example then becomes a relatively simple:
#include <fmt/format.h>
// Later on, where code is allowed (inside a function for example) ...
SetReport(fmt::format("Failure in {}: foobar was {}\n", __FUNCTION__, foobar));
It's much like the printf() family but with extensibility and type safety built in.
(a) But, unfortunately, not its string interpolation feature (use of f-strings), which has the added advantage of putting the expressions in the string at the place where they're output, something like:
set_report(f"Failure in {__FUNCTION__}: foobar was {foobar}\n");
If fmtlib ever got that capability, I'd probably wet my pants in excitement :-)

Does an empty string contain an empty string in C++?

Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
int postition = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ is giving a exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string it would had no problem printing it out.
It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.
C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
bool contains = string("").find(string("")) != string::npos;
cout << "\"\" contains \"\": "
<< boolalpha << contains;
}
Output: "" contains "": true
Demo
It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.
The first thing that is unclear is whether you are talking about std::string or null terminated C strings, the second thing is why should it matter?. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use small object optimizations, in those implementations there will be memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character)
empty string doesn't contain anything - it's EMPTY. :)
Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take String empty = ""; that is declaring a string literal that is empty, if you want a string literal to represent a string literal that is empty you would need String representsEMpty = """"; but of course, you need to escape it, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about you amendment, it could be possible that your 'opponent' meant was that an 'empty' std::string still has an internal storage for characters which is itself empty of characters. That would be an implementation detail I am sure, it could perhaps just keep a certain size (say 10) array of characters 'just incase', so it will technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.
Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.

C++ std::string append vs push_back()

This really is a question just for my own interest I haven't been able to determine through the documentation.
I see on http://www.cplusplus.com/reference/string/string/ that append has complexity:
"Unspecified, but generally up to linear in the new string length."
while push_back() has complexity:
"Unspecified; Generally amortized constant, but up to linear in the new string length."
As a toy example, suppose I wanted to append the characters "foo" to a string. Would
myString.push_back('f');
myString.push_back('o');
myString.push_back('o');
and
myString.append("foo");
amount to exactly the same thing? Or is there any difference? You might figure that append would be more efficient because the compiler would know how much memory is required to extend the string the specified number of characters, while push_back may need to secure memory each call?
In C++03 (for which most of "cplusplus.com"'s documentation is written), the complexities were unspecified because library implementers were allowed to do Copy-On-Write or "rope-style" internal representations for strings. For instance, a COW implementation might require copying the entire string if a character is modified and there is sharing going on.
In C++11, COW and rope implementations are banned. You should expect constant amortized time per character added or linear amortized time in the number of characters added for appending to a string at the end. Implementers may still do relatively crazy things with strings (in comparison to, say std::vector), but most implementations are going to be limited to things like the "small string optimization".
In comparing push_back and append, push_back deprives the underlying implementation of potentially useful length information which it might use to preallocate space. On the other hand, append requires that an implementation walk over the input twice in order to find that length, so the performance gain or loss is going to depend on a number of unknowable factors such as the length of the string before you attempt the append. That said, the difference is probably extremely Extremely EXTREMELY small. Go with append for this -- it is far more readable.
I had the same doubt, so I made a small test to check this (g++ 4.8.5 with C++11 profile on Linux, Intel, 64 bit under VmWare Fusion).
And the result is interesting:
push :19
append :21
++++ :34
Could be possible this is because of the string length (big), but the operator + is very expensive compared with the push_back and the append.
Also it is interesting that when the operator only receives a character (not a string), it behaves very similar to the push_back.
For not to depend on pre-allocated variables, each cycle is defined in a different scope.
Note : the vCounter simply uses gettimeofday to compare the differences.
TimeCounter vCounter;
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.push_back('a');
vTest.push_back('b');
vTest.push_back('c');
}
vCounter.stop();
cout << "push :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.append("abc");
}
vCounter.stop();
cout << "append :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest += 'a';
vTest += 'b';
vTest += 'c';
}
vCounter.stop();
cout << "++++ :" << vCounter.elapsed() << endl;
}
Add one more opinion here.
I personally consider it better to use push_back() when adding characters one by one from another string. For instance:
string FilterAlpha(const string& s) {
string new_s;
for (auto& it: s) {
if (isalpha(it)) new_s.push_back(it);
}
return new_s;
}
If using append()here, I would replace push_back(it) with append(1,it), which is not that readable to me.
Yes, I would also expect append() to perform better for the reasons you gave, and in a situation where you need to append a string, using append() (or operator+=) is certainly preferable (not least also because the code is much more readable).
But what the Standard specifies is the complexity of the operation. And that is generally linear even for append(), because ultimately each character of the string being appended (and possible all characters, if reallocation occurs) needs to be copied (this is true even if memcpy or similar are used).

What is wrong with this string assignment?

string s="abcdefghijklmnopqrstuvwxyz"
char f[]=" " (s.substr(s.length()-10,9)).c_str() " ";
I want to get the last 9 characters of s and add " " to the beginning and the end of the substring, and store it as a char[]. I don't understand why this doesn't work even though char f[]=" " "a" " " does.
Is (s.substr(s.length()-10,9)).c_str() not a string literal?
No, it's not a string literal. String literals always have the form "<content>" or expand to that (macros, like __FILE__ for example).
Just use another std::string instead of char[].
std::string f = " " + s.substr(s.size()-10, 9) + " ";
First, consider whether you should be using cstrings. In C++, generally, use string.
However, if you want to use cstrings, the concatenation of "abc" "123" -> "abc123" is a preprocessor operation and so cannot be used with string::c_str(). Instead, the easiest way is to construct a new string and take the .c_str() of that:
string s="abcdefghijklmnopqrstuvwxyz"
char f[]= (string(" ") + s.substr(s.length()-10,9) + " ").c_str();
(EDIT: You know what, on second thought, that's a really bad idea. The cstring should be deallocated after the end of this statement, so using f can cause a segfault. Just don't use cstrings unless you're prepared to mess with strcpy and all that ugly stuff. Seriously.)
If you want to use strings instead, consider something like the following:
#include <sstream>
...
string s="abcdefghijklmnopqrstuvwxyz"
stringstream tmp;
tmp << " " << s.substr(s.length()-10,9) << " ";
string f = tmp.str();
#Xeo tells you how to solve your problem. Here's some complimentary background on how string literals are handled in the compilation process.
From section A.12 Preprocessing of The C Programming language:
Escape sequences in character constants and string literals (Pars. A.2.5.2, A.2.6) are
replaced by their equivalents; then adjacent string literals are concatenated.
It's the Preprocessor, not the compiler, who's responsible for the concatenation. (You asked for a C++ answer. I expect that C++ treats string literals the same way as C). The preprocessor has only a limited knowledge of the C/C++ language; the (s.substr(s.length()-10,9)).c_str() part is not evaluated at the preprocessor stage.