Performance optimization for std::string - c++

When I ran some performance tests in my app, I noticed a difference between the following two versions of the code (Visual Studio 2010).
Slower version
while(heavyloop)
{
if(path+node+"/" == curNode)
{
do something
}
}
This causes some extra mallocs, because the resulting string has to be generated on every iteration.
In order to avoid these mallocs, I changed it in the following way:
std::string buffer;
buffer.reserve(500); // Big enough to hold all combinations without the need of malloc
while(heavyloop)
{
buffer = path;
buffer += node;
buffer += "/";
if(buffer == curNode)
{
do something
}
}
While the second version looks a bit more awkward than the first, it's still readable enough. What I was wondering, though, is whether this kind of optimization is an oversight on the part of the compiler, or whether it always has to be done manually. Since it only changes the order of allocations, I would expect the compiler to be able to figure it out on its own. On the other hand, certain conditions have to be met for it to really be an optimization, and these may not necessarily be fulfilled; but if they are not, the code would at least perform as well as the first version. Are newer versions of Visual Studio better in this regard?
A more complete version which shows the difference (SSCCE):
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
std::string gen_random(std::string &oString, const int len)
{
static const char alphanum[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
oString = "";
for (int i = 0; i < len; ++i)
{
oString += alphanum[rand() % (sizeof(alphanum) - 1)];
}
return oString;
}
int main(int argc, char *argv[])
{
clock_t start = clock();
std::string s = "/";
size_t adds = 0;
size_t subs = 0;
size_t max_len = 0;
s.reserve(100000);
for(size_t i = 0; i < 1000000; i++)
{
std::string t1;
std::string t2;
if(rand() % 2)
{
// Slow version
//s += gen_random(t1, (rand() % 15)+3) + "/" + gen_random(t2, (rand() % 15)+3);
// Fast version
s += gen_random(t1, (rand() % 15)+3);
s += "/";
s += gen_random(t2, (rand() % 15)+3);
adds++;
}
else
{
subs++;
size_t pos = s.find_last_of("/", s.length()-1);
if(pos != std::string::npos)
s.resize(pos);
if(s.length() == 0)
s = "/";
}
if(max_len < s.length())
max_len = s.length();
}
std::cout << "Elapsed: " << clock() - start << std::endl;
std::cout << "Added: " << adds << std::endl;
std::cout << "Subtracted: " << subs << std::endl;
std::cout << "Max: " << max_len << std::endl;
return 0;
}
On my system I get about 1 second difference between the two (tested with gcc this time but there doesn't seem to be any notable difference to Visual Studio there):
Elapsed: 2669
Added: 500339
Subtracted: 499661
Max: 47197
Elapsed: 3417
Added: 500339
Subtracted: 499661
Max: 47367

Your slow version may be rewritten as
while(heavyloop)
{
std::string tempA = path + node;
std::string tempB = tempA + "/";
if(tempB == curNode)
{
do something
}
}
This is not an exact equivalent, but it makes the temporary objects more visible.
Note the two temporary objects, tempA and tempB. They are created because operator+ for std::string always returns a new std::string object. This is how std::string is designed. A compiler won't be able to optimize them away.
There is a technique in C++ called expression templates that addresses this issue, but again, it is done at the library level.
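As a rough illustration of the expression-template idea, here is a minimal sketch. The names Lazy, Concat and join_path are made up for this example and are not part of any standard or existing library. The operator+ overloads only record what should be joined; the actual string is built once, after a single reserve.

```cpp
#include <cstddef>
#include <string>

// Hypothetical leaf operand: refers to an existing string without copying it.
struct Lazy {
    const std::string& ref;
    std::size_t size() const { return ref.size(); }
    void append_to(std::string& out) const { out += ref; }
};

// Hypothetical node for "lhs + rhs"; nothing is concatenated when it is built.
template <typename L, typename R>
struct Concat {
    L lhs;
    R rhs;
    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }
    // Materialize the whole expression: one reserve, then appends only.
    std::string str() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        return out;
    }
};

inline Concat<Lazy, Lazy> operator+(Lazy a, Lazy b) {
    return Concat<Lazy, Lazy>{a, b};
}

template <typename L, typename R>
Concat<Concat<L, R>, Lazy> operator+(Concat<L, R> a, Lazy b) {
    return Concat<Concat<L, R>, Lazy>{a, b};
}

// Builds path + node + "/" with at most one allocation for the result.
std::string join_path(const std::string& path, const std::string& node) {
    const std::string slash = "/";
    return (Lazy{path} + Lazy{node} + Lazy{slash}).str();
}
```

The operands are held by reference, so the expression has to be materialized with str() before any of the referenced strings goes out of scope.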

For class types (like std::string) there is no requirement that the conventional relationship between operator + and operator += be honoured like you expect. There is certainly no requirement that a = a + b and a += b have the same net effect, since operator=(), operator+() and operator+=() can all potentially be implemented individually, and not work together in tandem.
As such, a compiler would be semantically incorrect if it replaced
if(path+node+"/" == curNode)
with
std::string buffer = path;
buffer += node;
buffer += "/";
if (buffer == curNode)
If there were some constraint in the standard, for example a fixed relationship between overloaded operator+() and overloaded operator+=(), then the two fragments of code would have the same net effect. However, there is no such constraint, so the compiler is not permitted to make such substitutions. Doing so could change the meaning of the code.
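To make the point about independent operators concrete, here is a small sketch with a made-up type (Tag is hypothetical and exists only for this illustration). Both operators are legal C++, yet a + b and a += b give different results, so rewriting one form into the other would change the program.

```cpp
#include <string>

// Hypothetical type: operator+ and operator+= deliberately disagree.
struct Tag {
    std::string text;
    Tag& operator+=(const Tag& other) {
        text += other.text;             // plain append
        return *this;
    }
};

Tag operator+(const Tag& a, const Tag& b) {
    return Tag{a.text + "," + b.text};  // inserts a separator
}

// Combines a and b through operator+.
std::string via_plus(const Tag& a, const Tag& b) {
    return (a + b).text;
}

// Combines a and b through operator+= on a copy of a.
std::string via_plus_assign(Tag a, const Tag& b) {
    a += b;
    return a.text;
}
```

For Tag{"x"} and Tag{"y"}, via_plus yields "x,y" while via_plus_assign yields "xy".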

path+node+"/" allocates a temporary std::string to compare with curNode; that is simply how the expression is implemented in C++.

Related

std::string characters somehow turned into Unicode/ASCII numbers

I have a function ls() which parses a vector of string and puts it into a comma-separated list, wrapped within parentheses ():
std::string ls(std::vector<std::string> vec, std::string wrap="()", std::string sep=", ") {
std::string wrap_open, wrap_close;
wrap_open = std::to_string(wrap[0]);
wrap_close = std::to_string(wrap[1]);
std::string result = wrap_open;
size_t length = vec.size();
if (length > 0) {
if (length == 1) {
result += vec[0];
result += wrap_close;
}
else {
for (int i = 0; i < vec.size(); i++) {
if (i == vec.size() - 1) {
result += sep;
result += vec[i];
result += wrap_close;
}
else if (i == 0) {
result += vec[i];
}
else {
result += sep;
result += vec[i];
}
}
}
}
else {
result += wrap_close;
}
return result;
}
If I pass this vector
std::vector<std::string> vec = {"hello", "world", "three"};
to the ls() function, I should get this string:
std::string parsed_vector = ls(vec);
// AKA
std::string result = "(hello, world, three)"
The parsing works fine, however the characters in the wrap string turn into numbers when printed.
std::cout << result << std::endl;
Will result in the following:
40hello, world, three41
When it should instead result in this:
(hello, world, three)
The ( is turned into 40, and the ) is turned into 41.
My guess is that the characters are being turned into the Unicode/ASCII number values or something like that, I do not know how this happened or what to do.
The problem here is that std::to_string converts a number to a string. There is no overload for char, so the char is promoted to int and you end up converting its ASCII value to a string:
wrap_open = std::to_string(wrap[0]);
wrap_close = std::to_string(wrap[1]);
Instead, you could simply do:
std::string wrap_open(1, wrap[0]);
std::string wrap_close(1, wrap[1]);
Note that you can greatly simplify your function by using std::ostringstream:
std::ostringstream oss;
oss << wrap[0];
for (size_t i = 0; i < vec.size(); i++)
{
if (i != 0) oss << sep;
oss << vec[i];
}
oss << wrap[1];
return oss.str();
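For reference, the stream-based body above could be wrapped into a complete function roughly like this. It is only a sketch: it takes its arguments by const reference instead of by value, and it assumes that wrap always holds at least two characters.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Joins the elements of vec with sep and wraps them in wrap[0] and wrap[1].
// Assumption: wrap contains at least two characters.
std::string ls(const std::vector<std::string>& vec,
               const std::string& wrap = "()",
               const std::string& sep = ", ") {
    std::ostringstream oss;
    oss << wrap[0];                     // streamed as a character, not a number
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (i != 0) oss << sep;         // separator before every element but the first
        oss << vec[i];
    }
    oss << wrap[1];
    return oss.str();
}
```

Called with the vector from the question, this returns "(hello, world, three)", and an empty vector gives "()".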
I won't comment on how you could improve the function (although passing a vector by value as a function argument is rarely a good idea), but here is how to fix your current issue:
std::string ls(std::vector<std::string> vec, std::string wrap = "()", std::string sep = ", ") {
std::string wrap_open, wrap_close;
wrap_open = wrap.at(0); //<----
wrap_close = wrap.at(1); //<----
std::string result = wrap_open;
size_t length = vec.size();
if (length > 0) {
... // Rest of the code
You don't need std::to_string here. std::string has an assignment operator that takes a single char, so assigning wrap.at(0) gives you a string holding just that one character.
I recommend reading up on std::string; it is apparent that you aren't using the full potential of the standard library.
EDIT: After discussing the usage of .at() vs [] operator in the comments. I've decided to add the bit into this answer:
The main difference between .at() and [] is bounds checking. .at() performs a bounds check and throws a std::out_of_range exception if the index is out of range. The [] operator (IMHO) is present in STL containers for backwards compatibility (imagine refactoring old C code into a C++ project). The point is that it behaves like you would expect [] to behave and doesn't do any bounds checking.
In general I recommend using .at(), especially for beginners and especially if you are relying on human input. An uncaught exception produces an easy-to-understand error, while an unchecked [] will either produce weird values or a RAV (read access violation), depending on the type stored in the container, and from experience beginners usually have a harder time debugging this.
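A small sketch of the bounds-checking behaviour described above. The helper name at_throws is made up for this example; it only reports whether .at() rejected the index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s.at(i) throws std::out_of_range, false if the access succeeds.
bool at_throws(const std::string& s, std::size_t i) {
    try {
        (void)s.at(i);                  // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With the default wrap string "()", index 1 is accepted and index 2 is rejected. Reading the same out-of-range position with [] on a container gives no such diagnostic.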
Bear in mind that this is just the opinion of one programmer and opinions may vary (as is visible in the discussion).
Hope it helps!

boost::string_ref is slower than std::string copy

I have implemented an algorithm that looks up the suffix part of a string after a delimiter, 1) using an ordinary string copy and 2) using boost::string_ref. I profiled the code and, counter-intuitively, the boost::string_ref version performed worse.
std::string algorithm1(const std::string &mine, const std::string& path) {
size_t startPos = mine.find(path);
std::string temp;
if (startPos != mine.npos) {
startPos += path.size();
temp = mine.substr(startPos);
} else {
std::cout << "not found" << std::endl;
return "";
}
return temp;
}
boost::string_ref algorithm2(boost::string_ref mine, boost::string_ref path) {
size_t startPos = mine.find(path);
boost::string_ref temp;
if (startPos != mine.npos) {
startPos += path.size();
temp = mine.substr(startPos);
} else {
std::cout << "not found" << std::endl;
}
return temp;
}
int main(int argc, char* argv[])
{
std::cout << "entered" << std::endl;
const std::string mine = "sth/aStr/theSuffixWeDesire";
const std::string path = "/aStr/";
for (size_t i = 0; i < 10; i++)
{
assert (algorithm1(mine, path) == "theSuffixWeDesire");
assert (algorithm2(mine, path) == "theSuffixWeDesire");
}
return 0;
}
When I profiled the code with uftrace, I got the following for each iteration:
algorithm1 takes 2,083 ns and internally calls std::find
algorithm2 takes 11,835 ns, with most of the time spent in std::search
I can think of three possible reasons:
1) CPU cache hits in algorithm1, whereas algorithm2 calls into the boost library, which causes CPU cache misses and results in lower speed.
2) Std operations are optimized by the compiler whereas boost operations are not.
3) std::search, which boost uses, is not a good substitute for std::find.
Which of these explanations is the most plausible? This was unexpected behavior and made me doubt using boost::string_ref in my codebase.
My system has gcc 5.4.0 (C++14), boost 1.66, no compiler optimization flags (beginning with -O).
Thank you,

Treat c++ string as a pointer

Disclaimer -- I'm coming from a pretty strictly C background.
How can the STL and std::string instead of char* be used to go about jumping around a string like this (admittedly nonsensical) example?
const char* s = "XXXXXXXXhello";
while (*s == 'X')
s++;
s += 2;
std::cout << --(--s); //prints `hello`
If what you want to do is modify the object, and get rid of "h":
std::string s = "hello";
s = s.substr(1); // position = 1, length = everything (npos)
std::cout << s; //"ello"
or
std::string s = "hello";
s.erase(0, 1); // position = 0, length = 1
std::cout << s; //"ello"
I prefer @José's second answer, and there is no need to modify the original string:
std::cout << s.substr(1) << std::endl;
To the edited question:
std::string s = "hello world!";
std::cout << s.substr(s.find_first_of('e')) << std::endl; // == "ello world!"
Reference for std::string:
http://www.cplusplus.com/reference/string/string/?kw=string
If you want to jump around without modifying the string, you can either use indices or an iterator:
std::string const s("XXXXXXXXhello");
int idx = 0;
while ( s[idx] == 'X' )
++idx;
idx += 2;
std::cout << &s[idx -= 2] << '\n';
The iterator version:
auto it = s.begin();
while (*it == 'X')
++it;
it += 2;
std::cout << &*it << '\n';
Since C++11 it is guaranteed that std::string is stored with a null terminator in place, so you can use & to output the tail of the string. In older code bases you will need to index from s.c_str() instead.
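If the goal is only to skip a leading run of one character, std::string's own search functions can do the scanning. The sketch below uses a made-up helper name, skip_leading. Unlike the iterator loop above, it does not dereference end() when the string consists entirely of the skipped character.

```cpp
#include <string>

// Returns the part of s that follows any leading run of `skip` characters.
// Returns an empty string when s contains nothing but `skip` characters.
std::string skip_leading(const std::string& s, char skip) {
    const std::string::size_type pos = s.find_first_not_of(skip);
    return pos == std::string::npos ? std::string() : s.substr(pos);
}
```

For the string from the question, skip_leading(s, 'X') returns "hello".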

C++ SDL 2.0 - Importing multiple textures using a loop

I don't know whether this is possible. I have used this technique in other languages, but I am struggling to use it in C++. I have 10 images that I am trying to load into an array using a loop, like so:
for (int i = 0; i < 10; i++)
{
Sprite[i] = IMG_LoadTexture(renderer, "Graphics/Player" + i + ".png");
}
This however does not seem to work in C++ so I was wondering what I am doing wrong, or what can I do to get the same result without having to load each image individually like so:
Sprite[0] = IMG_LoadTexture(renderer, "Graphics/Player0.png");
My error is: "Expression must have integral or unscoped enum type"
Thanks for any help =)
You cannot do this:
"This is my number: " + (int)4 + "!";
This is illegal. You will get an error either for applying operator+ to a const char* and a const char[SOME_INT_GOES_HERE], or for using operator+ to add an int to a std::string. Things just don't work that way.
You'd either have to use C (e.g. snprintf()) or a string stream. Here's my test code for isolating your problem:
#include <iostream>
#include <string>
int main()
{
int a = 1;
std::string str = "blah";
std::string end = "!";
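//std::string hello = str + a + end;// GIVES AN ERROR: no operator+ for std::string and int
std::string hello = "blah" + a + "!";// ALSO AN ERROR: "blah" + a is pointer arithmetic, and the resulting pointer cannot be added to "!"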
//const char* c_str = "blah" + a + "end";
//std::cout << c_str << std::endl;
std::cout << hello << std::endl;
return 0;
}
Here's an alternative solution using string streams.
#include <iostream>
#include <string>
#include <sstream>
int main()
{
int i = 0;
std::string str;
std::stringstream ss;
while (i < 10)
{
//Send text to string stream.
ss << "text" << i;
//Set string to the text inside string stream
str = ss.str();
//Print out the string
std::cout << str << std::endl;
//ss.clear() doesn't work here: it only resets the error flags,
//not the contents. Passing an empty std::string() to ss.str()
//sets the string stream to an empty string.
ss.str(std::string());
//Remember to increment the variable inside of while{}
++i;
}
}
Alternatively, you can use std::to_string() if you're using C++11 (which just requires -std=c++11), but std::to_string() is broken on some compilers (e.g. regular MinGW). Either switch to another flavor where it works (e.g. MinGW-w64) or just write your own to_string() function using string streams behind the scenes.
snprintf() may be the fastest way of doing such a thing, but for safer C++ and better style, it is recommended you use a non-C way of doing things.
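For completeness, a minimal sketch of the std::to_string() route, assuming a C++11 compiler where it works. The helper name player_texture_path is made up for this example; the path layout is the one from the question.

```cpp
#include <string>

// Builds the file name for one sprite, e.g. index 3 gives "Graphics/Player3.png".
std::string player_texture_path(int index) {
    return "Graphics/Player" + std::to_string(index) + ".png";
}
```

In the loop from the question, the result would be passed on as player_texture_path(i).c_str(), since IMG_LoadTexture expects a const char*.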
I had a similar problem and I solved it this way:
#include <iostream>
#include <string>
using namespace std;
string inttostr(int integer); // defined below, declared here so main() can call it
int main() {
string line;
for (int i = 0; i < 10; i++) {
line = "Graphics/Player" + inttostr(i) + ".png"; //I wrote inttostr function because built in inttostr functions messed up my program (see below)
// c_str() returns a null-terminated const char*, so no manual copy into a char array is needed
Sprite[i] = IMG_LoadTexture(renderer, line.c_str()); // Sprite and renderer are declared elsewhere, as in the question
}
}
string inttostr(int integer) { //I know it isn't the best way to convert integer to string, but it works
string charakter;
int swap;
bool negativ = false;
if (integer < 0) {
integer = -integer;
negativ = true;
}
if (integer == 0) {
charakter = "0";
}
while (integer >= 1) {
swap = integer % 10;
integer = integer / 10;
charakter = char(swap + 48) + charakter;
}
if (negativ) {
charakter = "-" + charakter;
}
return charakter;
}

Reverse order of hex std::string

I'm working with an old program and need help swapping the byte order of a hex string.
Yes, a string... as in:
string hexString = "F07D0079";
string hexString2 = "F07F";
I need the strings to look like:
79007DF0 and
7FF0, respectively.
For the love of god, I don't know why they're stored in strings, but they are.
This is a little-endian/big-endian issue, but since the values are in strings I can't use the standard functions to reverse the order, can I?
Is there an easy way to do this?
std::string swapValues(string originalHex)
{
string swappedHex;
//what to do here.
return swappedHex;
}
First check that the length is even (if it hasn't already been sanitised):
assert(hex.length() % 2 == 0);
Then reverse the string:
std::reverse(hex.begin(), hex.end());
Now the bytes are in the correct order, but the digits within each are wrong, so we need to swap them back:
for (auto it = hex.begin(); it != hex.end(); it += 2) {
std::swap(it[0], it[1]);
}
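Put together, the three steps above could look like the following sketch. The function name swap_hex_bytes is made up for this example, and the input is assumed to contain an even number of hex digits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Reverses the byte order of a hex string, keeping the two digits of each byte together.
std::string swap_hex_bytes(std::string hex) {
    assert(hex.length() % 2 == 0);            // whole bytes only
    std::reverse(hex.begin(), hex.end());     // bytes in order, digits within each byte flipped
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        std::swap(hex[i], hex[i + 1]);        // restore the digit order inside each byte
    }
    return hex;
}
```

For the inputs from the question, "F07D0079" becomes "79007DF0" and "F07F" becomes "7FF0".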
I might use the append member function.
std::string reverse_pairs(std::string const & src)
{
assert(src.size() % 2 == 0);
std::string result;
result.reserve(src.size());
for (std::size_t i = src.size(); i != 0; i -= 2)
{
result.append(src, i - 2, 2);
}
return result;
}
(As an exercise in extensibility, you can make the "2" a parameter, too.)
If you want to do it in-place, you can use std::rotate in a loop.
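One possible way to write the in-place variant with std::rotate is sketched below. The function name is made up for this example, and the input is assumed to have an even length. Each pass moves the last pair of the string to the front of the part that has not been processed yet.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Reverses the order of two-character pairs in place.
// Assumption: hex.size() is even.
void reverse_pairs_in_place(std::string& hex) {
    for (std::size_t i = 0; i + 2 < hex.size(); i += 2) {
        // Rotate the last pair to position i; the pairs before i are already final.
        std::rotate(hex.begin() + i, hex.end() - 2, hex.end());
    }
}
```

Every rotation shifts the remaining tail of the string, so this does more character moves than the reverse-and-swap approach; the benefit is that no second string is needed.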
I wouldn't bother with something overly clever for this:
std::string swapValues(const std::string& o)
{
std::string s(o.length(), '\0'); // std::string has no size-only constructor, so pass a fill character
if (s.length() == 4) {
s[0] = o[2];
s[1] = o[3];
s[2] = o[0];
s[3] = o[1];
return s;
}
if (s.length() == 8) {
// left as an exercise
}
throw std::logic_error("You got to be kidding me...");
}
There should be library functions available (a naive string manipulation might be no good):
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>
#include <arpa/inet.h>
int main() {
std::string hex32 = "F07D0079";
std::string hex16 = "F07F";
std::uint32_t u32 = std::strtoul(hex32.c_str(), 0, 16);
std::uint16_t u16 = std::strtoul(hex16.c_str(), 0, 16);
// Here we would need to know the endian of the sources.
u32 = ntohl(u32);
u16 = ntohs(u16);
std::cout << std::hex << u32 << ", " << u16 << '\n';
}
Linux/Little Endian
Any function operating on the strings must know the byte order of the target platform (hence there is no general solution).