I'm finding standard string addition to be very slow so I'm looking for some tips/hacks that can speed up some code I have.
My code is basically structured as follows:
inline void add_to_string(string data, string &added_data) {
if(added_data.length()<1) added_data = added_data + "{";
added_data = added_data+data;
}
int main()
{
int some_int = 100;
float some_float = 100.0;
string some_string = "test";
string added_data;
added_data.reserve(1000*64);
for(int ii=0;ii<1000;ii++)
{
//variables manipulated here
some_int = ii;
some_float += ii;
some_string.assign(ii%20,'A');
//then we concatenate the strings!
stringstream fragment;
fragment<<some_int <<","<<some_float<<","<<some_string;
add_to_string(fragment.str(),added_data);
}
return 0;
}
Doing some basic profiling, I'm finding that a ton of time is being used in the for loop. Are there some things I can do that will significantly speed this up? Will it help to use c strings instead of c++ strings?
String addition is not the problem you are facing. std::stringstream is known to be slow due to its design. On every iteration of your for-loop the stringstream is responsible for at least 2 allocations and 2 deletions. The cost of each of these 4 operations is likely more than that of the string addition.
Profile the following and measure the difference:
std::string stringBuffer;
for(int ii=0;ii<1000;ii++)
{
//variables manipulated here
some_int = ii;
some_float += ii;
some_string.assign(ii%20,'A');
//then we concatenate the strings!
char buffer[128];
sprintf(buffer, "%i,%f,%s",some_int,some_float,some_string.c_str());
stringBuffer = buffer;
add_to_string(stringBuffer ,added_data);
}
Ideally, replace sprintf with _snprintf or the equivalent supported by your compiler.
As a rule of thumb, use stringstream for formatting by default and switch to the faster and less safe functions like sprintf, itoa, etc. whenever performance matters.
Edit: that, and what didierc said: added_data += data;
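A minimal sketch of how the two points above could be combined, assuming a C++11 compiler where std::snprintf is available (older MSVC only has _snprintf, which behaves differently on truncation). The helper name append_record is made up for illustration; it formats into a stack buffer and appends the bytes in place, so no temporary std::string is created:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one record into a stack buffer and appends it
// to `out` in place.
inline void append_record(std::string &out, int some_int, float some_float,
                          const std::string &some_string) {
    char buffer[128];
    // snprintf writes at most sizeof(buffer) bytes, terminator included, and
    // returns the length the full output would have had.
    const int n = std::snprintf(buffer, sizeof(buffer), "%d,%f,%s", some_int,
                                static_cast<double>(some_float),
                                some_string.c_str());
    if (n < 0) return;  // formatting error: append nothing
    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof(buffer)) len = sizeof(buffer) - 1;  // output was truncated
    out.append(buffer, len);
}
```

Inside the loop this would be called as append_record(added_data, some_int, some_float, some_string). Whether it is actually faster on a given platform is something to profile rather than assume.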
You can save lots of string operations if you do not call add_to_string in your loop.
I believe this does the same (although I am not a C++ expert and do not know exactly what stringstream does):
stringstream fragment;
for(int ii=0;ii<1000;ii++)
{
//variables manipulated here
some_int = ii;
some_float += ii;
some_string.assign(ii%20,'A');
//then we concatenate the strings!
fragment<<some_int<<","<<some_float<<","<<some_string;
}
// inlined add_to_string call without the if-statement ;)
added_data = "{" + fragment.str();
I see you used the reserve method on added_data, which should help by avoiding multiple reallocations of the string as it grows.
You should also use the += string operator where possible:
added_data += data;
I think that the above should save some significant time by avoiding unnecessary copies back and forth of added_data in a temporary string when doing the concatenation.
This += operator is a simpler version of the string::append method, it just copies data directly at the end of added_data. Since you made the reserve, that operation alone should be very fast (almost equivalent to a strcpy).
But why go through all this, when you are already using a stringstream to handle input? Keep it all in there to begin with!
The stringstream class is indeed not very efficient.
You may have a look at the stringstream class for more information on how to use it, if necessary, but your solution of using a string as a buffer seems to avoid that class speed issue.
At any rate, stay away from any attempt at reimplementing the speed critical code in pure C unless you really know what you are doing. Some other SO posts support the idea of doing it, but I think it's best (read: safer) to rely as much as possible on the standard library, which will be enhanced over time, and take care of many corner cases you (or I) wouldn't think of. If your input data format is set in stone, then you might start thinking about taking that road, but otherwise it's premature optimization.
If you start added_data with a "{", you would be able to remove the if from your add_to_string method: the if gets executed exactly once, when the string is empty, so you might as well make it non-empty right away.
In addition, your add_to_string makes a copy of the data; this is not necessary, because it does not get modified. Accepting the data by const reference should speed things up for you.
Finally, changing your added_data from std::string to std::stringstream should let you append to it in a loop, without the intermediary stringstream that gets created, copied, and thrown away on each iteration of the loop.
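The first two suggestions above (seeding the string with "{" and passing data by const reference) can be sketched roughly as follows. The build function is a hypothetical driver that mirrors the loop from the question; only add_to_string comes from the original code:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// `data` is only read, so take it by const reference and append in place.
inline void add_to_string(const std::string &data, std::string &added_data) {
    added_data += data;
}

// Hypothetical driver mirroring the loop in the question.
std::string build(int count) {
    float some_float = 100.0f;
    std::string some_string = "test";
    std::string added_data;
    if (count > 0) added_data.reserve(static_cast<std::size_t>(count) * 64);
    added_data += '{';  // seed once, so the loop needs no emptiness check
    for (int ii = 0; ii < count; ii++) {
        some_float += ii;
        some_string.assign(ii % 20, 'A');
        std::stringstream fragment;
        fragment << ii << "," << some_float << "," << some_string;
        add_to_string(fragment.str(), added_data);
    }
    return added_data;
}
```

This still creates one stringstream per iteration, so it only removes the extra string copies; the third suggestion would remove the per-iteration stream as well.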
Please have a look at Twine used in LLVM.
A Twine is a kind of rope, it represents a concatenated string using a
binary-tree, where the string is the preorder of the nodes. Since the
Twine can be efficiently rendered into a buffer when its result is used,
it avoids the cost of generating temporary values for intermediate string
results -- particularly in cases when the Twine result is never
required. By explicitly tracking the type of leaf nodes, we can also avoid
the creation of temporary strings for conversions operations (such as
appending an integer to a string).
It may be helpful in solving your problem.
How about this approach?
This is a DevPartner for MSVC 2010 report.
string newstring = stringA & stringB;
I don't think strings are slow; it's the conversions that can make it slow,
and maybe your compiler, which might check variable types for mismatches.
I want to create a function that will take a string and an integer as parameters and return a string that contains the string parameter repeated the given number of times.
For example:
std::string MakeDuplicate( const std::string& str, int x )
{
...
}
Calling MakeDuplicate( "abc", 3 ); would return "abcabcabc".
I know I can do this just by looping x number of times but I'm sure there must be a better way.
I don't see a problem with looping, just make sure you do a reserve first:
std::string MakeDuplicate( const std::string& str, int x )
{
std::string newstr;
newstr.reserve(str.length()*x); // prevents multiple reallocations
// loop...
return newstr;
}
At some point it will have to be a loop. You may be able to hide the looping in some fancy language idiom, but ultimately you're going to have to loop.
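For reference, a minimal sketch of the reserve-then-loop version. Returning an empty string for a non-positive count is an assumption, since the question does not say what should happen in that case:

```cpp
#include <cstddef>
#include <string>

std::string MakeDuplicate(const std::string &str, int x) {
    std::string newstr;
    if (x <= 0) return newstr;  // assumption: non-positive counts yield ""
    // One allocation up front, so the appends below never reallocate.
    newstr.reserve(str.length() * static_cast<std::size_t>(x));
    for (int i = 0; i < x; ++i) {
        newstr += str;
    }
    return newstr;
}
```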
For small 'x' a simple loop is your friend. For large 'x' and relatively short 'str' we can think of a "smarter" solution by reusing the already concatenated string.
std::string MakeDuplicate( const std::string& str, unsigned int x ) {
std::string newstr;
if (x>0) {
unsigned int y = 2;
newstr.reserve(str.length()*x);
newstr.append(str);
while (y<x) {
newstr.append(newstr);
y*=2;
}
newstr.append(newstr.c_str(), (x-y/2)*str.length());
}
return newstr;
}
Or something like that :o) (I think it can be written in a nicer way but idea is there).
EDIT: I was interested myself and did some tests comparing three solutions on my notebook with Visual Studio (reuse version, simple loop with preallocation, simple copy&loop-1 without preallocation). Results as expected: for small x (<10) the preallocation version is generally fastest, no preallocation was a tiny bit slower, and for larger x the speedup of the 'reuse' version is really significant (roughly log n versus n append calls). Nice, I just can't think of any real problem that could use it :o)
There is an alternative to a loop, its called recursion, and of recursion tail-recursion is the nicest variety since you can theoretically do it till the end of time -- just like a loop :D
p.s., tail-recursion is often syntactic sugar for a loop. However, C++ does not guarantee tail-call elimination, so the compiler may leave the recursion unoptimised and you might run out of stack (but if you wrote a recursion that runs out of memory then you have bigger problems) :D
more downvotes please !!
recursion is obviously not a construct used in computer science for the same job as looping
Edit: Solutions must compile against Microsoft Visual Studio 2012.
I want to use a known string length to declare another string of the same length.
The reasoning is the second string will act as a container for operation done to the first string which must be non volatile with regards to it.
e.g.
const string messy = "a bunch of letters";
string dostuff(string sentence) {
string organised NNN????? // Idk, just needs the same size.
for ( x = 0; x < NNN?; x++) {
organised[x] = sentence[x]++; // Doesn't matter what this does.
}
}
In both cases above, the declaration and the exit condition, the NNN? stands for the length of 'messy'.
How do I discover the length at compile time?
std::string has two constructors which could fit your purposes.
The first, a copy constructor:
string organised(sentence);
The second, a constructor which takes a character and a count. You could initialize a string with a temporary character.
string organised(sentence.length(), '_');
Alternatively, you can:
Use an empty string and append (+=) text to it as you go along, or
Use a std::stringstream for the same purpose.
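The stringstream may be more convenient when mixing types, though whether it is more efficient than appending to a string depends on the implementation, so measure if it matters.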
Overall, I would prefer the copy constructor if the length is known.
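As a rough sketch of the count-and-character constructor in use: the length is taken from the argument at run time, so no compile-time constant is needed. The upper-casing is only a placeholder operation chosen for illustration, and taking the argument by const reference is likewise an assumption about what the caller wants:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

std::string dostuff(const std::string &sentence) {
    // Same length as the input, filled with a placeholder character.
    std::string organised(sentence.length(), '_');
    for (std::size_t x = 0; x < sentence.length(); ++x) {
        // Placeholder operation (an assumption): upper-case each character.
        organised[x] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(sentence[x])));
    }
    return organised;
}
```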
std::string isn't a compile time type (it can't be a constexpr), so you can't use it directly to determine the length at compile time.
You could initialize a constexpr char[] and then use sizeof on that:
constexpr char messychar[] = "a bunch of letters";
// - 1 to avoid including NUL terminator which std::string doesn't care about
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
const string messy(messychar);
and use that, but frankly, that's pretty ugly; the length would be compile time, but organised would need to use the count and char constructor that would still be performed on each call, allocating and initializing only to have the contents replaced in the loop.
While it's not compile time, you'd avoid that initialization cost by just using reserve and += to build the new string, which with the constexpr length could be done in an ugly but likely efficient way as:
constexpr char messychar[] = "a bunch of letters";
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
// messy itself may not be needed, but if it is, it's initialized optimally
// by using the compile time calculated length, so there is no need to scan for
// NUL terminators, and it can reserve the necessary space in the initial alloc
const string messy(messychar, messylen);
string dostuff(string sentence) {
string organised;
organised.reserve(messylen);
for (size_t x = 0; x < messylen; x++) {
organised += sentence[x]++; // Doesn't matter what this does.
}
return organised;
}
This avoids setting organised's values more than once, allocating more than once (well, possibly twice if initial construction performs it) per call, and only performs a single read/write pass of sentence, no full read followed by read/write or the like. It also makes the loop constraint a compile time value, so the compiler has the opportunity to unroll the loop (though there is no guarantee of this, and even if it happens, it may not be helpful).
Also note: In your example, you mutate sentence, but it's accepted by value, so you're mutating the local copy, not the caller copy. If mutation of the caller value is required, accept it by reference, and if mutation is not required, accept by const reference to avoid a copy on every call (I understand the example code was filler, just mentioning this).
Consider first that the total amount of data that will be stored in the output string will almost certainly be small, so I doubt any of these have a noticeable effect on performance. My primary goal is to find a way to concatenate a series of const char*'s of unknown size that doesn't look terrible while also keeping efficiency in mind. Below are the results of my search:
Method 1:
std::string str = std::string(array1) + array2 + array3;
Method 2:
std::string str(array1);
str += array2;
str += array3;
I decided on the first method as it is short and concise. If I'm not mistaken, both methods will invoke the same series of operations. Without optimization, the compiler would first create a temporary string and internally allocate some amount of space for its buffer >= sizeof(array1). If that buffer is sufficiently large, the additional + operations will not require any new allocations. Finally, if move semantics are supported, then the buffers of the temporary and the named str are swapped.
Are there any other ways to perform such an operation that also look nice and don't incur terrible overhead?
Remember that, for arrays, sizeof(array) returns the actual size of its argument only if it has been declared as an array of explicit size (and you wrote 'series of const char*'s of unknown size'); for a pointer it returns the size of the pointer itself. So, assuming you want a universal solution, strlen() should be considered instead.
I don't think you can avoid all additional operations. In the case of many concatenations, the best solution would be to allocate a buffer that is large enough to store all the concatenated strings.
We can deduce that the best version of append() in this case is:
string& append (const char* s, size_t n);
Why? Because the reference says: 'If s does not point to an array long enough (...), it causes undefined behavior'. So we can assume that internally no additional checks take place (especially additional strlen() calls). That is good: since you are completely sure that the values passed to append() are correct, you can avoid unnecessary overhead.
Now, the actual concatenation can be done like this:
const size_t len_1 = strlen(array_1);
const size_t len_2 = strlen(array_2);
const size_t len_3 = strlen(array_3);
std::string target_string;
//Preallocate enough space for all arrays. Only one reallocation takes place.
target_string.reserve(len_1 + len_2 + len_3);
target_string.append(array_1, len_1);
target_string.append(array_2, len_2);
target_string.append(array_3, len_3);
I do not know if this solution 'looks good' in your opinion, but it's definitely clear and is optimized for this use case.
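If the call site matters more than the implementation, the same steps could be wrapped in a small helper so the caller stays a one-liner. This is only a sketch; concat3 is a hypothetical name and it assumes all three pointers are non-null, NUL-terminated strings:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: concatenates three C strings with one reservation.
std::string concat3(const char *a, const char *b, const char *c) {
    const std::size_t len_a = std::strlen(a);
    const std::size_t len_b = std::strlen(b);
    const std::size_t len_c = std::strlen(c);
    std::string target;
    target.reserve(len_a + len_b + len_c);  // single allocation up front
    target.append(a, len_a);                // lengths already known, no rescan
    target.append(b, len_b);
    target.append(c, len_c);
    return target;
}
```

The call site then reads std::string str = concat3(array1, array2, array3);, which is about as compact as Method 1.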
This is really just a question for my own interest that I haven't been able to answer from the documentation.
I see on http://www.cplusplus.com/reference/string/string/ that append has complexity:
"Unspecified, but generally up to linear in the new string length."
while push_back() has complexity:
"Unspecified; Generally amortized constant, but up to linear in the new string length."
As a toy example, suppose I wanted to append the characters "foo" to a string. Would
myString.push_back('f');
myString.push_back('o');
myString.push_back('o');
and
myString.append("foo");
amount to exactly the same thing? Or is there any difference? You might figure that append would be more efficient because the compiler would know how much memory is required to extend the string the specified number of characters, while push_back may need to secure memory each call?
In C++03 (for which most of "cplusplus.com"'s documentation is written), the complexities were unspecified because library implementers were allowed to do Copy-On-Write or "rope-style" internal representations for strings. For instance, a COW implementation might require copying the entire string if a character is modified and there is sharing going on.
In C++11, COW and rope implementations are banned. You should expect constant amortized time per character added or linear amortized time in the number of characters added for appending to a string at the end. Implementers may still do relatively crazy things with strings (in comparison to, say std::vector), but most implementations are going to be limited to things like the "small string optimization".
In comparing push_back and append, push_back deprives the underlying implementation of potentially useful length information which it might use to preallocate space. On the other hand, append requires that an implementation walk over the input twice in order to find that length, so the performance gain or loss is going to depend on a number of unknowable factors such as the length of the string before you attempt the append. That said, the difference is probably extremely Extremely EXTREMELY small. Go with append for this -- it is far more readable.
I had the same doubt, so I made a small test to check this (g++ 4.8.5 with C++11 profile on Linux, Intel, 64 bit under VMware Fusion).
And the result is interesting:
push :19
append :21
++++ :34
It is possible that this is because of the (large) string length, but the operator + is very expensive compared with push_back and append.
It is also interesting that when the operator only receives a character (not a string), it behaves very similarly to push_back.
So as not to depend on pre-allocated variables, each loop is defined in a different scope.
Note : the vCounter simply uses gettimeofday to compare the differences.
TimeCounter vCounter;
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.push_back('a');
vTest.push_back('b');
vTest.push_back('c');
}
vCounter.stop();
cout << "push :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest.append("abc");
}
vCounter.stop();
cout << "append :" << vCounter.elapsed() << endl;
}
{
string vTest;
vCounter.start();
for (int vIdx=0;vIdx<1000000;vIdx++) {
vTest += 'a';
vTest += 'b';
vTest += 'c';
}
vCounter.stop();
cout << "++++ :" << vCounter.elapsed() << endl;
}
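The TimeCounter class is not shown above. A comparable harness can be sketched with std::chrono from C++11; the names build_push, build_append and time_us are made up for this sketch, and the numbers it produces will not be directly comparable with the ones quoted above:

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Builds "abc" repeated n times, one character at a time.
std::string build_push(int n) {
    std::string s;
    for (int i = 0; i < n; ++i) {
        s.push_back('a');
        s.push_back('b');
        s.push_back('c');
    }
    return s;
}

// Builds the same string with one append call per iteration.
std::string build_append(int n) {
    std::string s;
    for (int i = 0; i < n; ++i) {
        s.append("abc");
    }
    return s;
}

// Returns the elapsed microseconds for a single call of f(n).
template <typename F>
long long time_us(F f, int n) {
    const auto start = std::chrono::steady_clock::now();
    const std::string result = f(n);
    const auto stop = std::chrono::steady_clock::now();
    // Read the result so the work is less likely to be optimized away.
    volatile std::size_t sink = result.size();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

A call such as time_us(build_push, 1000000) can then be compared with time_us(build_append, 1000000), ideally over several runs and with optimizations enabled.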
Add one more opinion here.
I personally consider it better to use push_back() when adding characters one by one from another string. For instance:
string FilterAlpha(const string& s) {
string new_s;
for (auto& it: s) {
if (isalpha(it)) new_s.push_back(it);
}
return new_s;
}
If using append() here, I would replace push_back(it) with append(1,it), which is not that readable to me.
Yes, I would also expect append() to perform better for the reasons you gave, and in a situation where you need to append a string, using append() (or operator+=) is certainly preferable (not least also because the code is much more readable).
But what the Standard specifies is the complexity of the operation. And that is generally linear even for append(), because ultimately each character of the string being appended (and possibly all characters, if reallocation occurs) needs to be copied (this is true even if memcpy or similar are used).
I have some code that I had to write to replace a function that was literally used thousands of times. The problem with the function was that it returned a pointer to a statically allocated buffer and was ridiculously problematic. I was finally able to prove that intermittent high-load errors were caused by this bad practice.
The function I was replacing has a signature of char * paddandtruncate(char *,int), char * paddandtruncate(float,int), or char * paddandtruncate(int,int). Each function returned a pointer to a statically allocated buffer which was overwritten on subsequent calls.
I had three constraints:
Code had to be replaceable with no impact on the callers.
Very little time to fix the issue.
Acceptable performance.
I wanted some opinion on the style and possible refactoring ideas.
The system is based upon fixed width fields padded with spaces, and has some architectural issues. These are not addressable since the size of the project is around 1,000,000 lines.
I was at first planning on allowing the data to be changed after creation, but thought that immutable objects offered a more secure solution.
using namespace std;
class SYSTEM_DECLSPEC CoreString
{
private:
friend ostream & operator<<(ostream &os,CoreString &cs);
stringstream m_SS ;
float m_FltData ;
long m_lngData ;
long m_Width ;
string m_strData ;
string m_FormatedData;
bool m_Formated ;
stringstream SS ;
public:
CoreString(const string &InStr,long Width):
m_Formated(false),
m_Width(Width),
m_strData(InStr)
{
long OldFlags = SS.flags();
SS.fill(' ');
SS.width(Width);
SS.flags(ios::left);
SS<<InStr;
m_FormatedData = SS.str();
}
CoreString(long longData , long Width):
m_Formated(false),
m_Width(Width),
m_lngData(longData)
{
long OldFlags = SS.flags();
SS.fill('0');
SS.precision(0);
SS.width(Width);
SS.flags(ios::right);
SS<<longData;
m_FormatedData = SS.str();
}
CoreString(float FltData, long width,long lPerprecision):
m_Formated(false),
m_Width(width),
m_FltData(FltData)
{
long OldFlags = SS.flags();
SS.fill('0');
SS.precision(lPerprecision);
SS.width(width);
SS.flags(ios::right);
SS<<FltData;
m_FormatedData = SS.str();
}
CoreString(const string &InStr):
m_Formated(false),
m_strData(InStr)
{
long OldFlags = SS.flags();
SS.fill(' ');
SS.width(32);
SS.flags(ios::left);
SS<<InStr;
m_FormatedData = SS.str();
}
public:
operator const char *() {return m_FormatedData.c_str();}
operator const string& () const {return m_FormatedData;}
const string& str() const ;
};
const string& CoreString::str() const
{
return m_FormatedData;
}
ostream & operator<<(ostream &os,CoreString &cs)
{
os<< cs.m_FormatedData;
return os;
}
If you really do mean "no impact on the callers", your choices are very limited. You can't return anything that needs to be freed by the caller.
At the risk of replacing one bad solution with another, the quickest and easiest solution might be this: instead of using a single static buffer, use a pool of them and rotate through them with each call of your function. Make sure the code that chooses a buffer is thread safe.
It sounds like the system is threaded, right? If it was simply a matter of it not being safe to call one of these functions again while you're still using the previous output, it should behave the same way every time.
Most compilers have a way to mark a variable as "thread-local data" so that it has a different address depending on which thread is accessing it. In gcc it's __thread, in VC++ it's __declspec(thread).
If you need to be able to call these functions multiple times from the same thread without overwriting the results, I don't see any complete solution but to force the caller to free the result. You could use a hybrid approach, where each thread has a fixed number of buffers, so that callers could make up to N calls without overwriting previous results, regardless of what other threads are doing.
The code you've posted has one huge problem: if a caller assigns the return value to a const char *, the compiler will make a silent conversion and destroy your temporary CoreString object. Now your pointer will be invalid.
I don't know how the callers are going to be using this, but allocating buffers using new into auto_ptr<>s might work. It may satisfy criterion 1 (I can't tell without seeing the using code), and could be a pretty fast fix. The big issue is that it uses dynamic memory a lot, and that will slow things down. There are things you can do, using placement new and the like, but that may not be quick to code.
If you can't use dynamic storage, you're limited to non-dynamic storage, and there really isn't much you can do without using a rotating pool of buffers or thread-local buffers or something like that.
The "intermittent high-load errors" are caused by race conditions where one thread tramples on the static buffer before another thread has finished using it, right?
So switch to using an output buffer per thread, using whatever thread-local storage mechanism your platform provides (Windows, I'm thinking).
There's no synchronisation contention, no interference between threads, and based on what you've said about the current implementation rotating buffers, almost certainly the calling code doesn't need to change at all. It can't be relying on the same buffer being used every time, if the current implementation uses multiple buffers.
I probably wouldn't design the API this way from scratch, but it implements your current API without changing it in a significant way, or affecting performance.