Avoid memory reallocation on given example - c++

The function gets called very frequently, so I am trying to reduce memory reallocation and similar overhead. What bothers me is the vector and the int, but I can't move them outside the function, otherwise I get std::bad_alloc. So far I have:
void callbString(const std_msgs::String::ConstPtr& msg)
{
vector<string> cbstrVec;
int cbtype;
//get string and split into vector
string str = (msg->data.c_str());
if(str.empty()) return;
str.erase(0,1);
boost::split(cbstrVec, str, boost::is_any_of(" "));
stringstream(cbstrVec[2])>>cbtype;
c.setvec(cbstrVec,cbtype); //takes (vector<string>,int)
}

Have you profiled the application, and is this code really the bottleneck? If so, then:
If you are using a C++11 compiler, you could do the following. If you don't have a C++11 compiler, remove the thread_local, but then you'll have to take care of thread safety if there is any chance that this routine will be called from multithreaded code.
void callbString(const std_msgs::String::ConstPtr& msg)
{
static thread_local vector<string> cbstrVec;
static thread_local std::string str;
int cbtype = 0; //int is super cheap
cbstrVec.clear(); //destroys the elements but keeps the vector's capacity
//get string and split into vector
str = msg->data.c_str(); //reuses str's buffer when it is already large enough
if(str.empty()) return;
str.erase(0,1);
boost::split(cbstrVec, str, boost::is_any_of(" "));
if(cbstrVec.size() < 3) return; //cbstrVec[2] below must exist
stringstream(cbstrVec[2])>>cbtype;
//c.setvec(cbstrVec,cbtype); //takes (vector<string>,int)
c.setvec(std::move(cbstrVec),cbtype); //avoids a copy, but hands cbstrVec's buffer over to c
}
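cbstrVec and str can have their memory reused, since cbstrVec.clear() doesn't deallocate the memory allocated by the vector, and reassigning str reuses its internal storage in a good STL implementation. Note, however, that std::move(cbstrVec) transfers the vector's buffer to c and leaves cbstrVec (typically) empty with no capacity, so with the move only str's storage is really reused; the move trades that reuse for not copying every element into setvec's by-value parameter.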
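If you want the token strings themselves to keep their buffers between calls, you can split by assigning into the strings already stored in the vector instead of clearing it. This is a minimal sketch assuming a single-character delimiter; splitReuse is a hypothetical helper, not part of Boost or of the question's code. Like boost::split with its default settings, consecutive delimiters produce empty tokens.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `in` on `delim` into `out`, assigning into the
// strings already stored in `out` so their buffers (and the vector's own
// capacity) are reused from one call to the next. Returns the token count.
std::size_t splitReuse(const std::string& in, char delim,
                       std::vector<std::string>& out)
{
    std::size_t count = 0;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = in.find(delim, start);
        const std::size_t end = (pos == std::string::npos) ? in.size() : pos;
        if (count < out.size())
            out[count].assign(in, start, end - start); // reuse existing storage
        else
            out.emplace_back(in, start, end - start);  // grow only when needed
        ++count;
        if (pos == std::string::npos)
            break;
        start = pos + 1;
    }
    out.resize(count); // drop leftover tokens from a longer previous input
    return count;
}
```

In the callback, cbstrVec would then be filled with something like `if (splitReuse(str, ' ', cbstrVec) < 3) return;`, and passed to setvec without std::move if you want its storage to survive until the next call.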

Related

Error when creating a string in a function of class

I created a class named Employee; in its private section I have a Name stored as a string. Here is my class declaration:
class Employee
{
string Name;
public:
Employee();
void SetName(string);
void StringToEmployee(string);
~Employee();
};
This is the definition of the StringToEmployee(string) method:
void Employee::StringToEmployee(string s)
{
char *first = s, *end = s+strlen(s), *last = NULL;
last = find(first, end, ',');
string temp(first, last- first);
SetName(temp);
}
The error occurs when I step to the line string temp(first, last - first) in the debugger. It seems the compiler does not allow me to construct a new string in a method: I also tried string temp; followed by temp.assign(first, last - first), and the error still remains. How can I create a new string in a method?
You should be using iterators and taking advantage of the features of the standard library, rather than raw pointers and C-style string functions. Not only will this give you more idiomatic and easier to understand C++ code, but it will also implicitly resolve many of your errors.
First, the implementation of StringToEmployee should be rewritten as follows:
void Employee::StringToEmployee(std::string s)
{
const std::string temp(s.begin(),
std::find(s.begin(), s.end(), ','));
SetName(temp);
}
But since you are not modifying the s parameter and do not need a copy of it, you should pass it by constant reference:
void Employee::StringToEmployee(const std::string& s)
{
const std::string temp(s.begin(),
std::find(s.begin(), s.end(), ','));
SetName(temp);
}
Also, you should consider redesigning your Employee class. Currently, you have a default constructor that creates an invalid Employee object, and then you have member functions that allow you to turn that invalid Employee object into a valid one by setting its members. Instead, you could have a constructor that does all of this initialization for you, in one step. Not only would your code be cleaner and easier to understand, but it would be more efficient, too!
Perhaps something like:
class Employee
{
std::string Name; // name of this employee
public:
Employee(const std::string& name); // create Employee with specified name
void SetName(const std::string& newName); // change this employee's name
~Employee();
};
Employee::Employee(const std::string& name)
: Name(name.begin(), std::find(name.begin(), name.end(), ','))
{ }
void Employee::SetName(const std::string& newName)
{
Name = std::string(newName.begin(), std::find(newName.begin(), newName.end(), ','));
}
Employee::~Employee()
{ }
A couple of quick notes:
You'll see that I always explicitly write out std:: whenever I use a class from the standard library's namespace. This is a really good habit to get into, and it's not really that hard to type an extra 5 characters. It's particularly important because using namespace std; is a really bad habit to get into.
I pass objects (like strings) that I don't need to modify or have a copy of inside of the method by constant reference. This is both easier to reason about, and also potentially more efficient (because it avoids unnecessary copies).
Inside of the constructor, I have used what may appear to be a funny-looking syntax, involving a colon and some parentheses. This is called a member initialization list, and it's something you should get used to seeing. It's the standard way for a class's constructor to initialize its member variables.
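For reference, here is a compilable sketch of that class in one piece. The GetName accessor and the explicit keyword are additions for illustration only, and the destructor is left to the compiler because it has nothing to do.

```cpp
#include <algorithm>
#include <string>

class Employee
{
    std::string Name; // name of this employee
public:
    // Takes text such as "Smith,42" and keeps the part before the first
    // comma (or the whole string when there is no comma).
    explicit Employee(const std::string& name)
        : Name(name.begin(), std::find(name.begin(), name.end(), ','))
    { }

    void SetName(const std::string& newName)
    {
        Name = std::string(newName.begin(),
                           std::find(newName.begin(), newName.end(), ','));
    }

    // Hypothetical accessor, added only so the result can be inspected.
    const std::string& GetName() const { return Name; }
};
```

Usage: `Employee e("Smith,42");` leaves `e.GetName()` equal to "Smith".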
For some reason you are trying to assign a std::string to a char*.
Judging from the rest of your code, you want to work with a raw char array, so you need to initialize first and end with correct pointers, like this:
char *first = &s[0], *end = (&s[0]) + strlen(s.c_str()), *last = NULL;
And this part:
string temp(first, last- first);
is not wrong in itself: last - first is a std::ptrdiff_t (an integer), not a pointer, so this call selects the std::string(const char*, size_t) constructor. It only works, however, if first and last really point into the same buffer, which is why the pointer initialization above matters.
As you can see, this approach is error-prone. I recommend rewriting this part of the code using iterators, like this:
void Employee::StringToEmployee(string s)
{
auto found = find(s.begin(), s.end(), ',');
string temp(s.begin(), found);
SetName(temp);
}
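If you do want to keep the pointer-based version, a sketch like the following works, assuming both pointers are taken from the string's own buffer (nameBeforeComma is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Both pointers come from the same buffer, so `last - first` is a valid
// length and the std::string(const char*, size_type) constructor is used.
std::string nameBeforeComma(const std::string& s)
{
    const char* first = s.c_str();
    const char* end = first + s.size();
    const char* last = std::find(first, end, ','); // == end if there is no comma
    return std::string(first, static_cast<std::size_t>(last - first));
}
```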

Implementing a String class with implicit conversion to char* (C++)

It might not be advisable, according to what I have read in a couple of places (and that's probably the reason std::string doesn't do it already), but in a controlled environment and with careful usage, I think it might be OK to write a string class that can be implicitly converted to a proper writable char buffer when needed by third-party library methods (which take only char* as an argument), and that still behaves like a modern string with methods like Find(), Split(), SubString() etc. While I can try to implement the other usual string manipulation methods later, I first wanted to ask about an efficient and safe way to do this main task. Currently, we have to allocate a char array of roughly the maximum size of the char* output expected from the third-party method, pass it there, then convert the returned char* to a std::string to be able to use its convenient methods, then again pass its (const char*) result to another method using string.c_str(). This is both lengthy and makes the code look a little messy.
Here is my very initial implementation so far:
MyString.h
#pragma once
#include<string>
using namespace std;
class MyString
{
private:
bool mBufferInitialized;
size_t mAllocSize;
string mString;
char *mBuffer;
public:
MyString(size_t size);
MyString(const char* cstr);
MyString();
~MyString();
operator char*() { return GetBuffer(); }
operator const char*() { return GetAsConstChar(); }
const char* GetAsConstChar() { InvalidateBuffer(); return mString.c_str(); }
private:
char* GetBuffer();
void InvalidateBuffer();
};
MyString.cpp
#include "MyString.h"
MyString::MyString(size_t size)
:mAllocSize(size)
,mBufferInitialized(false)
,mBuffer(nullptr)
{
mString.reserve(size);
}
MyString::MyString(const char * cstr)
:MyString()
{
mString.assign(cstr);
}
MyString::MyString()
:MyString((size_t)1024)
{
}
MyString::~MyString()
{
if (mBufferInitialized)
delete[] mBuffer;
}
char * MyString::GetBuffer()
{
if (!mBufferInitialized)
{
mBuffer = new char[mAllocSize]{ '\0' };
mBufferInitialized = true;
}
if (mString.length() > 0)
memcpy(mBuffer, mString.c_str(), mString.length());
return mBuffer;
}
void MyString::InvalidateBuffer()
{
if (mBufferInitialized && mBuffer && strlen(mBuffer) > 0)
{
mString.assign(mBuffer);
mBuffer[0] = '\0';
}
}
Sample usage (main.cpp)
#include "MyString.h"
#include <iostream>
void testSetChars(char * name)
{
if (!name)
return;
//This length is not known to us, but the maximum
//return length is known for each function.
char str[] = "random random name";
strcpy_s(name, strlen(str) + 1, str);
}
int main(int, char*)
{
MyString cs("test initializer");
cout << cs.GetAsConstChar() << '\n';
testSetChars(cs);
cout << cs.GetAsConstChar() << '\n';
getchar();
return 0;
}
Now, I plan to call InvalidateBuffer() in almost all the methods before doing anything else. Some of my questions are:
Is there a better way to do it in terms of memory/performance and/or safety, especially in C++ 11 (apart from the usual move constructor/assignment operators which I plan to add to it soon)?
I had initially implemented the 'buffer' using a std::vector of chars, which was easier to implement and more C++-like, but I was concerned about performance. The GetBuffer() method would just return a pointer to the beginning of the resized vector of chars. Do you think there are any major pros/cons of using a vector instead of char* here?
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
Is it too much overkill, compared to the usual way of passing a char array pointer, converting it to a std::string and doing our work with that? The third-party function calls expecting char* arguments are used heavily in the code, and I plan to completely replace both char* and std::string with this new string if it works.
Thank you for your patience and help!
If I understood you correctly, you want this to work:
mystring foo;
c_function(foo);
// use the filled foo
with a c_function like ...
void c_function(char * dest) {
strcpy(dest, "FOOOOO");
}
Instead, I propose this (ideone example):
template<std::size_t max>
struct string_filler {
char data[max+1];
std::string & destination;
string_filler(std::string & d) : destination(d) {
data[0] = '\0'; // paranoia
}
~string_filler() {
destination = data;
}
operator char *() {
return data;
}
};
and using it like:
std::string foo;
c_function(string_filler<80>{foo});
This way you provide a "normal" buffer to the C function, with a maximum size that you specify (which you need to know anyway; otherwise calling the function would be unsafe). The temporary is destroyed at the end of the full-expression containing the function call, that is, after the function has returned, and its destructor then copies the buffer (using the std::string assignment operator) into storage managed by the std::string.
Addressing your questions:
Do you think there are any major pros/cons of using a vector instead of char* here?
Yes: using a vector frees you from manual memory management. This is a huge pro.
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
A union is a bad idea. How do you know which member is currently active? You need a flag outside of the union. Do you really want every string to carry that around? Instead, look at what the standard library does: it uses templates (std::basic_string<CharT>, of which std::string and std::wstring are instantiations) to provide this abstraction.
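A minimal sketch of that idea, assuming a hypothetical BasicMyString template that wraps std::basic_string<CharT>; the character type is fixed at compile time, so no union and no runtime flag are needed. Only a couple of members are shown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class template, parameterized on the character type in the
// same way as std::basic_string.
template <typename CharT>
class BasicMyString
{
public:
    BasicMyString() = default;
    explicit BasicMyString(const CharT* s) : str_(s) {}

    // Position of the first `ch`, or npos if it does not occur.
    std::size_t Find(CharT ch) const { return str_.find(ch); }
    const CharT* c_str() const { return str_.c_str(); }
    std::size_t size() const { return str_.size(); }

private:
    std::basic_string<CharT> str_;
};

using MyString = BasicMyString<char>;     // narrow version
using MyWString = BasicMyString<wchar_t>; // wide version
```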
Is it too much overkill [..]
Writing a string class? Yes, way too much.
What you want to do already exists. For example with this plain old C function:
/**
* Write n characters into buffer.
* n can't be more than size
* Return number of written characters
*/
ssize_t fillString(char * buffer, ssize_t size);
Since C++11:
std::string str;
// Resize string to be sure to have memory
str.resize(80);
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
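or, without resizing first, when str already has a non-zero size:
// str is an existing std::string; its current size is used as the buffer size
if (!str.empty()) // an empty string has no writable characters to fill
{
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
}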
But before C++11, std::string isn't guaranteed to be stored in a single chunk of contiguous memory, so you have to go through a std::vector<char> first:
std::vector<char> v;
// Resize the vector to be sure to have memory
v.resize(80);
ssize_t newSize = fillString(&v[0], v.size());
std::string str(v.begin(), v.begin() + newSize);
You can easily combine this with something like Daniel's proposal.
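A self-contained sketch of the C++11 variant. fillString here is a hypothetical stand-in for the third-party C function (it writes a fixed text and returns the number of characters written), and std::ptrdiff_t is used instead of the POSIX ssize_t to keep the example portable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C function: writes at most `size`
// characters into `buffer` (no terminator) and returns how many it wrote.
std::ptrdiff_t fillString(char* buffer, std::ptrdiff_t size)
{
    const char text[] = "random random name";
    const std::ptrdiff_t n = std::min<std::ptrdiff_t>(size, sizeof(text) - 1);
    std::memcpy(buffer, text, static_cast<std::size_t>(n));
    return n;
}

// Size the string, let the C function write into it, then shrink it to the
// number of characters actually written.
std::string readName(std::size_t maxLen)
{
    std::string str;
    str.resize(maxLen);
    if (str.empty())
        return str; // nothing to fill
    const std::ptrdiff_t written =
        fillString(&str[0], static_cast<std::ptrdiff_t>(str.size()));
    str.resize(static_cast<std::size_t>(written));
    return str;
}
```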

c++ return pointer vs. return a local object with std::move

I have to create a function which creates a set of characters for me. I am not sure which of the following approaches I should prefer.
As far as I understood I should not use createSet1 because if something goes wrong before returning s it will leak.
set<char>* createSet1(){
set<char>* s = new set<char>;
//does something
return s;
}
set<char> createSet2(){
set<char> s;
//does something
return std::move(s);
}
unique_ptr<set<char>> createSet3(){
unique_ptr<set<char>> s(new set<char>);
//does something
return s;
}
I would be happy if someone could explain which one I should prefer and why.
None of the above:
std::set<char> createSet() {
std::set<char> s;
// do something
return s;
}
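There is no reason to dynamically allocate the set: RVO will kick in and remove the copy for you, without your having to pay the cost of the extra dynamic allocation and the management of that memory. Note also that return std::move(s); (as in createSet2) is counterproductive: it prevents the named return value optimization, and since C++11 a local variable returned by name is already moved when the copy is not elided.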
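A small runnable sketch of returning the set by value; the chars parameter is hypothetical and only stands in for the "do something" part of the question.

```cpp
#include <set>
#include <string>

// Returns the set by value: the local is either elided or moved, so there
// is no manual memory management and nothing can leak.
std::set<char> createSet(const std::string& chars)
{
    std::set<char> s;
    for (char ch : chars)
        s.insert(ch); // duplicates are ignored by std::set
    return s;         // plain `return s;`, no std::move
}
```

Usage: `std::set<char> s = createSet("hello");` yields the four characters e, h, l, o.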
Now for the concrete problem of a set of characters, you might be better off not using a set at all, but rather a std::vector correctly sized:
#include <limits>
#include <vector>
class CharSet {
std::vector<bool> d_data; // std::vector<bool> quirks are fine here
void set(char ch, bool value) {
d_data[static_cast<unsigned char>(ch)] = value;
}
public:
CharSet() : d_data(std::numeric_limits<unsigned char>::max()+1) {}
void set(char ch) { set(ch,true); }
void unset(char ch) { set(ch,false); }
bool isset(char ch) const {
return d_data[static_cast<unsigned char>(ch)];
}
};
The advantage of this approach is that the main cost of a std::set is the dynamic allocation of its nodes: each inserted element gets its own allocation, whereas the std::vector is allocated once (for a small enough vector). The memory of the std::vector<bool> is going to be roughly 32 bytes (256 bits), which is comparable to a single node of a std::set on a 64-bit architecture. You could even make it a std::vector<char> to avoid the quirks of std::vector<bool>; it would take 256 bytes, which is the cost of just a few nodes in the set.
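Taking the same idea one step further, a std::bitset (an alternative not mentioned above) has its size fixed at compile time, so it needs no dynamic allocation at all. A sketch with the same interface as the CharSet class above:

```cpp
#include <bitset>
#include <climits>

// Alternative sketch: one bit per possible char value, stored inline.
class CharSet {
    std::bitset<(1u << CHAR_BIT)> d_data; // 256 bits with 8-bit chars
public:
    void set(char ch)   { d_data.set(static_cast<unsigned char>(ch)); }
    void unset(char ch) { d_data.reset(static_cast<unsigned char>(ch)); }
    bool isset(char ch) const {
        return d_data.test(static_cast<unsigned char>(ch));
    }
};
```

The cast to unsigned char keeps negative char values (on platforms where char is signed) inside the valid index range, just as in the vector-based version.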
std::set is a container that manages its own memory, so you can return it by value without problems; you can modify your first proposal to:
set<char> createSet1(){
set<char> s;
//does something with s
return s;
}
Hope this helps

How to make this code less memory leak prone?

As an introduction, note that I am a Java programmer still getting used to the memory management issues in C++.
We have a base class that is used to encode objects to a string of ASCII characters. Essentially, the class uses a stringstream class member to convert different datatypes into one long string, and then returns a char* to the caller that contains the encoded object data.
In testing for memory leaks, I see that the implementation we are using is prone to leaking memory, because the user always has to remember to delete the return value of the method. Below is an excerpt of the relevant parts of the code:
char* Msg::encode()
{
// clear any data from the stringstream
clear();
if (!onEncode()) {
return 0;
}
// need to convert stringstream to char*
string encoded = data.str();
// need to copy the stringstream to a new char* because
// stringstream.str() goes out of scope when method ends
char* encoded_copy = copy(encoded);
return encoded_copy;
}
bool Msg::onEncode(void)
{
encodeNameValue(TAG(MsgTags::TAG_USERID), companyName);
encodeNameValue(TAG(MsgTags::TAG_DATE), date);
return true;
}
bool EZXMsg::encodeNameValue(string& name, int value)
{
if(empty(value))
{
return true;
}
// data is stringstream object
data << name << TAG_VALUE_SEPARATOR << value << TAG_VALUE_PAIRS_DELIMITER;
return true;
}
char* copy(string& source) {
char *a=new char[source.length() +1];
a[source.length()]=0;
memcpy(a,source.c_str(),source.length());
return a;
}
UPDATE
Well, I should have been more precise about how the result of encode() is consumed. It is passed to boost::asio::async_write, and the program crashes because, I believe, the string goes out of scope before async_write completes. It seems I need to copy the returned string to a class member that stays alive for the lifetime of the class that sends the message (?).
This is how the encode() method is actually used (after I changed its return type to string):
void iserver_client::send(ezx::iserver::EZXMsg& msg) {
string encoded = msg.encode();
size_t bytes = encoded.length();
boost::asio::async_write(socket_, boost::asio::buffer(encoded, bytes), boost::bind(&iserver_client::handle_write, this, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred));
}
It looks like the proper way to do this is to maintain a queue/list/vector of the strings to async write. As noted here (and also in the boost chat_client sample). (But that is a separate issue.)
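A minimal sketch of that queue approach in standard C++ only. FakeSocket is a hypothetical stand-in for the socket and async_write: it just remembers the buffer it was given and reads it later, when the "completion" happens. Client mirrors the write queue of Boost's chat_client example; a std::deque is used because push_back does not move existing elements, so the buffer handed to the pending write stays valid.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket with an asynchronous write.
struct FakeSocket {
    const char* pendingData = nullptr;
    std::size_t pendingSize = 0;
    std::string delivered; // everything "sent" so far

    void startWrite(const char* data, std::size_t size) {
        pendingData = data; // only the pointer is kept, like asio::buffer
        pendingSize = size;
    }
    // Simulates completion: the buffer is read only now, after send() returned.
    bool completeWrite() {
        if (!pendingData) return false;
        delivered.append(pendingData, pendingSize);
        pendingData = nullptr;
        return true;
    }
};

// The client owns each encoded string until its write has completed.
class Client {
public:
    explicit Client(FakeSocket& s) : socket_(s) {}

    void send(std::string encoded) {
        const bool idle = writeQueue_.empty();
        writeQueue_.push_back(std::move(encoded)); // existing elements stay put
        if (idle) startNext();
    }

    // Plays the role of the completion handler (handle_write).
    void onWriteComplete() {
        writeQueue_.pop_front(); // the finished message can be freed now
        if (!writeQueue_.empty()) startNext();
    }

private:
    void startNext() {
        const std::string& front = writeQueue_.front();
        socket_.startWrite(front.data(), front.size());
    }

    FakeSocket& socket_;
    std::deque<std::string> writeQueue_;
};
```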
For this question:
In your copy function you return a pointer to heap memory, so a caller can easily leak it. I think you should not use this copy function; instead, you can simply do this in your encode function:
return data.str();
If you need a char pointer, you can use the string member function c_str() (which returns a const char*), like this:
string ss("hello world");
const char *p = ss.c_str();
If you use a string object with automatic storage (a local variable), you will not leak memory.
You could just return a std::string. You have one there anyway:
string Msg::encode()
{
// clear any data from the stringstream
clear();
if (!onEncode()) {
return string{};
}
return data.str();
}
Then the caller would look like:
Msg msg;
msg.userID = 1234;
send(msg.encode().c_str());
The only way of achieving "automatic" deletion is with a stack variable (at some level) going out of scope. In fact, this is in general the only way of guaranteeing deletion even in case of an exception, for example.
As others mentioned, std::string works just fine, since the character buffer is owned by the stack-allocated string, whose destructor frees it.
This will not work in general, for example with non-char* types.
RAII (Resource Acquisition Is Initialization) is a useful idiom for dealing with issues such as memory management, lock acquisition/release, etc.
A good solution would be to use Boost's scoped_array as follows:
{
Msg msg;
msg.userID = 1234;
scoped_array<char> encoded(msg.encode());
send(encoded.get());
// delete[] automatically called on char *
}
scoped_ptr works similarly for non-array types.
FYI: You should have used delete[] encoded to match new char[source.length() +1]
While using a std::string works adequately for your specific problem, the general solution is to return a std::unique_ptr instead of a raw pointer.
std::unique_ptr<char[]> Msg::encode() {
:
return std::unique_ptr<char[]>(encoded_copy);
}
The user will then get a new unique_ptr when they call it:
auto encoded = msg.encode();
send(encoded.get());
and the memory will be freed automatically when encoded goes out of scope and is destroyed.
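A complete sketch of the question's copy() helper rewritten this way; the signature change (taking a const reference and returning std::unique_ptr<char[]>) is an adaptation for illustration.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Returns an owning pointer: delete[] runs automatically when the
// unique_ptr goes out of scope, so callers cannot forget to free it.
std::unique_ptr<char[]> copy(const std::string& source)
{
    std::unique_ptr<char[]> a(new char[source.length() + 1]);
    std::memcpy(a.get(), source.c_str(), source.length() + 1); // includes '\0'
    return a;
}
```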

Implicit new and delete operator killing performance

I am using Very Sleepy to profile my application, and it shows that 25% and 23% of the time spent in my function goes to new and delete, respectively. I don't understand where these calls happen. Can someone tell me where they occur in my code?
inline void FixParser(fixmessage& tokenMap, const std::string& str) {
static seperator sep_delim("\x01");
static seperator sep_equal("=");
static std::string error("ERROR: ");
static FixKey fix_Key;
static tokenizer token_equal(error);
static tokenizer token_delim(error);
static tokenizer::iterator itr;
token_delim.assign(str, sep_delim);
int key;
try {
for(tokenizer::iterator it = token_delim.begin();
it != token_delim.end(); ++it) {
token_equal.assign(*it, sep_equal);
itr = token_equal.begin();
key = boost::lexical_cast<int>(*itr);
if(fix_Key.keys.find(key) == fix_Key.keys.end()) continue;
++itr;
const std::string& value(*itr);
tokenMap.insert(std::pair<int, std::string>(key, value));
}
} catch(boost::bad_lexical_cast &) {
std::cerr << error << str << std::endl;
return;
}
}
Please forgive the use of static variables; they will be removed later and placed in a struct.
One note: there are lots of strings being copied. Each string will incur a call to new to grab memory and delete to release it.
If performance is at a premium and you can keep str alive for as long as the tokens are needed, you might want to use indexes instead: represent each token as a pair of indexes (begin, end) into str rather than as a full-blown string. This is obviously more error-prone.
Also, tokenMap allocates one node per entry in the map, so if you have many entries there will be many nodes (and thus many calls to new to create them). You might want to use a deque instead and sort the items once you're done, unless you really need what map offers (automatic deduplication).
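Both ideas can be sketched together as follows. FixField and parseFix are hypothetical names; the sketch stores (key, value begin, value end) records in a reused std::vector (a contiguous alternative to the deque suggested above), parses the key by hand instead of with boost::lexical_cast, and sorts once at the end. Unlike a map it keeps duplicate keys, and it does not check for integer overflow.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A FIX tag plus the position of its value inside the original message,
// so no per-token std::string is allocated. `msg` must outlive the indexes.
struct FixField {
    int key;
    std::size_t valueBegin;
    std::size_t valueEnd;
};

// Parses "key=value\x01key=value\x01..." into `out` (reused between calls),
// then sorts by key. Returns false on a malformed field.
bool parseFix(const std::string& msg, std::vector<FixField>& out)
{
    out.clear(); // keeps the vector's capacity
    std::size_t pos = 0;
    while (pos < msg.size()) {
        std::size_t fieldEnd = msg.find('\x01', pos);
        if (fieldEnd == std::string::npos) fieldEnd = msg.size();
        const std::size_t eq = msg.find('=', pos);
        if (eq == std::string::npos || eq >= fieldEnd || eq == pos)
            return false; // no '=' in this field, or an empty key
        int key = 0;
        for (std::size_t i = pos; i < eq; ++i) {
            if (msg[i] < '0' || msg[i] > '9') return false; // non-numeric key
            key = key * 10 + (msg[i] - '0');
        }
        out.push_back(FixField{key, eq + 1, fieldEnd});
        pos = fieldEnd + 1;
    }
    std::sort(out.begin(), out.end(),
              [](const FixField& a, const FixField& b) { return a.key < b.key; });
    return true;
}
```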
Bikeshedded version, removing most static variables (I could not help myself):
inline void FixParser(fixmessage& tokenMap, const std::string& str) {
static seperator sep_delim("\x01");
static seperator sep_equal("=");
static FixKey const fix_Key;
try {
tokenizer token_delim(str, sep_delim);
// avoid computing token_delim.end() at each iteration
for(tokenizer::iterator it = token_delim.begin(), end = token_delim.end();
it != end; ++it)
{
tokenizer token_equal(*it, sep_equal);
tokenizer::iterator itr = token_equal.begin();
int const key = boost::lexical_cast<int>(*itr);
if(fix_Key.keys.find(key) == fix_Key.keys.end()) continue;
++itr;
tokenMap.insert(std::make_pair(key, *itr));
}
} catch(boost::bad_lexical_cast &) {
std::cerr << "ERROR: " << str << std::endl;
return;
}
}
Make sure you are testing the Release build, not the Debug version. Debug builds use different versions of new and delete that help detect memory leaks at the expense of speed, and Debug builds don't optimise much (if at all).
I'd look at boost::lexical_cast. In its simplest form it simply uses streams. It probably does a lot of allocations.
The statics may be the problem. How many times are you calling FixParser?
Every time you call it, the token_delim and token_equal objects have their assign methods called, and if these are implemented like a vector assign, then the memory backing the sequence may be destroyed and then reallocated every time FixParser is called to assign the new entry.