How to use a std::string without copying? - c++

I have a class say,
class Foo
{
public:
void ProcessString(std::string &buffer)
{
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
void Bar(std::string &buffer)
{
// perform other operations on "std::string" buffer
}
void Baz(std::string &buffer)
{
// perform other operations on "std::string" buffer
}
};
This class tries to use a std::string buffer to perform operations on it using various methods under these conditions:
I don't want to pass a copy of std::string which I already have.
I don't want to create multiple objects of this class.
For example:
// Once an object is created
Foo myObject;
// We could pass many different std::string's to same method without copying
std::string s1, s2, s3;
myObject.ProcessString(s1);
myObject.ProcessString(s2);
myObject.ProcessString(s3);
I could store the string as a class member so that the other functions can use it.
But it seems we cannot have a reference class member std::string &buffer because it can only be initialized from constructor.
I could use a pointer to std::string i.e. std::string *buffer and use it as a class member and then pass the addresses of s1, s2, s3.
class Foo
{
public:
void ProcessString(std::string *buf)
{
// Save pointer
buffer = buf;
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
void Bar()
{
// perform other operations on "std::string" buffer
}
void Baz()
{
// perform other operations on "std::string" buffer
}
private:
std::string *buffer;
};
Or, the other way could be to pass each function a reference to the std::string buffer, just as shown in the first example above.
Both ways seem like slightly ugly workarounds for using a std::string without copying: I have rarely seen a std::string used through a pointer, or the same argument passed to every function of a class.
Is there a better way around this, or is what I'm doing just fine?

Keeping a reference or a pointer in your object to a string that the object does not own is dangerous. It makes it easy to run into nasty undefined behaviour.
Look at the following legal example (Bar is public):
myObject.ProcessString(s1); // start with s1 and keep its address
myObject.Bar(); // works with s1 (using address previously stored)
Look at the following UB:
if (is_today) {
myObject.ProcessString(string("Hello")); // uses a temporary string, destroyed at the end of this statement
} // end of block
else {
string tmp = to_string(1234); // create a block variable
myObject.ProcessString(tmp); // call the main function
} // !! end of block: tmp is destroyed
myObject.Bar(); // expects to work with pointer, but in reality use an object that was already destroyed !! => UB
These errors are very nasty because, when you read the calling code, everything seems fine and well managed. The problem is hidden by the automatic destruction of block-scope variables.
So if you really want to avoid copying the string, you could use a pointer as you envisaged, but you should only use this pointer in functions called directly by ProcessString(), and make those functions private.
In all other cases, I'd strongly suggest reconsidering your position and choosing one of the following:
Keep a local copy of the string in the object that uses it.
Or use a string& parameter in every member function that needs it. This avoids the copies but leaves the caller responsible for managing the string's lifetime properly.
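A minimal sketch of the "pointer used only inside ProcessString()" idea, assuming Bar() and Baz() simply append to the string (that behaviour, the buffer_ member name and the demo() helper are made up for illustration):

```cpp
#include <cassert>
#include <string>

class Foo {
public:
    // The only public entry point: the caller's string outlives this call.
    void ProcessString(std::string& buffer) {
        buffer_ = &buffer;   // valid only while ProcessString is running
        Bar();
        Baz();
        buffer_ = nullptr;   // never keep the address after returning
    }

private:
    // Private helpers: they can only run while buffer_ is set.
    void Bar() {
        assert(buffer_ != nullptr);
        buffer_->append("-bar");
    }
    void Baz() {
        assert(buffer_ != nullptr);
        buffer_->append("-baz");
    }

    std::string* buffer_ = nullptr;  // non-owning, set per call
};

// Hypothetical helper: one object reused for several strings, no copies made.
std::string demo() {
    Foo foo;
    std::string s1 = "one";
    std::string s2 = "two";
    foo.ProcessString(s1);
    foo.ProcessString(s2);
    return s1 + "," + s2;
}
```

Because Bar() and Baz() are private, nobody outside the class can call them after the string has gone away. Note that this sketch does not reset the pointer if a helper throws; a small guard object would be needed for that.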

You basically need to answer this question: who owns the string? Does Foo own the string? Does the external caller own the string? Or do they both share ownership of the string?
"Owning" the string means that the lifetime of the string is tied to it. So if Foo owns the string, the string will stop existing when Foo stops existing or destroys it. Shared ownership is far more complicated, but we can make it simpler by saying that the string will exist as long as any of the owners keep it.
Each situation has a different answer:
Foo owns the string: Copy the string into Foo, then let the member methods mutate it.
External resource owns the string: Foo should never hold a reference to the string outside of its own stack, since the string could be destroyed without its knowledge. This means that it needs to be passed by reference to every method that uses it and does not own it, even if the methods are in the same class.
Shared ownership: Use a shared_ptr when creating the string, then pass that shared_ptr to every instance that shares ownership. You then copy the shared_ptr to a member variable, and methods access it. This has much higher overhead than passing by reference, but if you want shared ownership it is one of the safest ways to do so.
There are actually several other kinds of ways to model ownership, but they tend to be more esoteric. Weak ownership, transferable ownership, etc.
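The three main situations could look roughly like this. The class names OwningFoo, BorrowingFoo and SharingFoo and the Shout() operation are invented for the sketch; only the ownership pattern matters:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// 1. The object owns the string: it keeps its own copy and mutates that.
class OwningFoo {
public:
    explicit OwningFoo(std::string text) : text_(std::move(text)) {}
    void Shout() { text_ += "!"; }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// 2. The caller owns the string: nothing is stored, every method takes a reference.
class BorrowingFoo {
public:
    void Shout(std::string& text) const { text += "!"; }
};

// 3. Shared ownership: the string lives as long as any holder of the shared_ptr.
class SharingFoo {
public:
    explicit SharingFoo(std::shared_ptr<std::string> text) : text_(std::move(text)) {
        assert(text_ != nullptr);
    }
    void Shout() { *text_ += "!"; }
private:
    std::shared_ptr<std::string> text_;
};
```

In case 1 the caller's string is never touched, in case 2 the caller's string is modified in place, and in case 3 every holder of the shared_ptr sees the modification.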

Your requirements are:
1. I don't want to pass a copy of std::string which I already have.
2. I don't want to create multiple objects of this class.
Passing by reference solves 1.
Using static solves 2. A static member function does not belong to any object, so you can call it directly on the class instead of through an object.
For example,
class Foo
{
public:
static void ProcessString(std::string &s)
{
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
};
when you call this method, it would be something like this:
std::string s1, s2, s3;
Foo::ProcessString(s1);
Foo::ProcessString(s2);
Foo::ProcessString(s3);
Going one step further, if you want only one instance of this class, you can look at the singleton design pattern.
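A rough sketch of the singleton variant, using a function-local static. The Instance() and calls() names and the call counter are assumptions added for illustration:

```cpp
#include <cassert>
#include <string>

class Foo {
public:
    // Function-local static: constructed on first use, one instance per program.
    static Foo& Instance() {
        static Foo instance;
        return instance;
    }

    // Copying is disabled so that no second instance can appear.
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    void ProcessString(std::string& buffer) {
        buffer += "!";   // placeholder for the real processing
        ++calls_;
    }
    int calls() const { return calls_; }

private:
    Foo() = default;
    int calls_ = 0;
};

// Usage sketch: every call goes through the same object, strings are passed by reference.
void Example() {
    std::string s1 = "a";
    Foo::Instance().ProcessString(s1);
    assert(s1 == "a!");
    assert(&Foo::Instance() == &Foo::Instance());
}
```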

Related

Is modifying data passed by const shared_ptr& Ok?

I've come across a situation where I needed to set a property to a library object by means of a setter accepting const shared_ptr reference. Here is a simple example:
// library class
class WidgetProxy {
public:
void setName(const std::shared_ptr<std::string>& name);
// more methods
};
Suspecting nothing, I used it like this:
WidgetProxy widgetProxy(...);
auto name = std::make_shared<std::string>("Turing");
widgetProxy.setName(name);
// continue using `name`
Then I've found out that name had become empty after setName() call. Luckily, library source code was available and I was able to examine the implementation. It was roughly the following:
class WidgetImpl {
public:
void setName(std::string name)
{
name_ = std::move(name);
}
private:
std::string name_;
};
void WidgetProxy::setName(const std::shared_ptr<std::string>& name)
{
widgetImpl_.setName(std::move(*name));
}
So setName() moves out the string wrapped by the shared_ptr which is formally not prohibited since shared_ptr template argument is std::string and not const std::string.
My questions:
Is it a normal design to implement WidgetProxy::setName() like this?
Should a library user normally expect such behavior when they see a const shared_ptr<T>& function parameter?
Upd: The posted code snippets are much simplified. In the library there is a different type in place of std::string. I have also omitted checks for pointer validity.
Is it a normal design to implement setName() like this?
This implementation style is OK:
void setName(std::string name)
{
name_ = std::move(name);
}
The string is first copied by the function call, and the copied string is moved to the class member. The resulting code is as efficient as passing a reference to a string and then copying it to the data member.
This one is not, and I do not recommend it.
void WidgetProxy::setName(const std::shared_ptr<std::string>& name)
{
widgetImpl_.setName(std::move(*name));
}
For two reasons. 1: Why require a std::shared_ptr if the pointer is not kept? 2: The net result of the operation is that the string owned by the shared_ptr is moved from, which leaves it in a valid but unspecified (typically empty) state. This affects all the other holders of the shared_ptr, some of which may need the value of the original string.
A more correct way to write this function, and the associated function call:
void WidgetProxy::setName(std::string name)
{
widgetImpl_.setName(std::move(name));
}
// call as:
if (strPtr)
proxy.setName(*strPtr); // with strPtr being a std::shared_ptr<std::string>
Should a library user normally expect such behavior when they see a const shared_ptr<T>& function parameter?
No. This is a terrible way of coding a library. If the caller wishes to keep the string for any reason, he must create a shared_ptr with a copy of the original string. Plus, the library code does not even check if the shared_ptr holds a valid pointer! Very, very naughty.
You misunderstand what this means:
class WidgetProxy {
public:
void setName(const std::shared_ptr<std::string>& name);
};
setName takes a reference to a possibly mutable shared pointer that it does not have permission to modify. This shared pointer refers to a mutable string.
This means that within setName, whenever control leaves the code that is visible to the compiler, the pointer held by name, and whether it is valid at all, could change (and you should check that it has not).
The value pointed to by this non-mutable view of a possibly mutable shared pointer is fully mutable. You have full permission to modify it.
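To make that concrete, here is a small sketch with two hypothetical free functions, Rename() and Length(). The const in the parameter type protects the pointer, not the string:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// `const` applies to the shared_ptr itself, not to the string it points to.
void Rename(const std::shared_ptr<std::string>& name) {
    assert(name != nullptr);
    *name = "changed";   // compiles: the pointee is a non-const std::string
    // name.reset();     // would not compile: the pointer itself is const here
}

// With a const pointee, the function can read the string but not modify it.
std::size_t Length(const std::shared_ptr<const std::string>& name) {
    assert(name != nullptr);
    // *name = "changed"; // would not compile: the pointee is const
    return name->size();
}
```

A caller holding a std::shared_ptr<std::string> can pass it to both functions; for Length() it is converted to a std::shared_ptr<const std::string> that shares ownership of the same string.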
Some alternatives:
class WidgetProxy {
public:
void setName(std::shared_ptr<std::string> name);
};
This is a local shared pointer to a mutable string. It can only be modified locally, unless you leak references to it. The data it refers to can be manipulated by any other code, and must be assumed to be modified whenever local context is left. It will, however, remain a valid pointer over the lifetime of the setName function unless you personally clear it.
class WidgetProxy {
public:
void setName(std::shared_ptr<std::string const> name);
};
This is a local shared pointer to a string you do not have mutation rights to. Someone else with a shared pointer to it could modify it, if it is actually mutable, at any point where you leave local code, and should be presumed to be doing so.
class WidgetProxy {
public:
void setName(std::string name);
};
This is a local copy of a buffer of characters that nobody else can modify within the function, and that you own.
class WidgetProxy {
public:
void setName(std::string const& name);
};
This is a reference to a possibly mutable external std::string which must be presumed to be changed every time you leave local code in the function.
Personally, I see no reason why WidgetProxy is taking an argument by shared_ptr or const&. It doesn't use the shared-ness of the argument, nor does it want the value to be remotely changed on it. It is a "sink" argument that it will consume, and the cost of moving the object is low.
WidgetProxy::setName should take a std::string. Sink arguments of cheap-to-move data should take by-value. And use of smart pointers here seems like a horrid idea; why complicate your life with shared_ptr?
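A sketch of such a by-value sink, using a simplified stand-in class called Widget (not the library's actual WidgetProxy) with a name() accessor added for the example:

```cpp
#include <cassert>
#include <string>
#include <utility>

class Widget {
public:
    // Sink parameter: take by value, then move into the member.
    void setName(std::string name) { name_ = std::move(name); }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Usage sketch: the caller chooses between copying and giving the string away.
void Example() {
    Widget widget;

    std::string keep = "Turing";
    widget.setName(keep);              // copies; `keep` is untouched
    assert(keep == "Turing");
    assert(widget.name() == "Turing");

    std::string give = "Lovelace";
    widget.setName(std::move(give));   // moves; do not rely on `give` afterwards
    assert(widget.name() == "Lovelace");
}
```

The decision to give up the string is visible at the call site as std::move, instead of being hidden inside the library.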
It is perfectly fine for WidgetImpl::setName() to be implemented in this manner, as it is moving from local parameter.
It is simply a bug to implement WidgetProxy::setName this way, because you can't realistically expect the object managed by shared_ptr to be movable.

C++ Should I use a raw pointer (instead of a smart one) if I am pointing / referencing / making an alias?

I'm creating a string parser class and there's lots of sub-sub private member functions that all need access to an input. I want to avoid having to put the input as a parameter of every function e.g.
string out = func(input) { sub_func(input) { sub_sub_func(input) } }
I keep hearing raw pointers are bad, but still not getting a clear answer to this specific situation of "simply using a pointer to refer to something". I could do
string m_str; // declared as a private member
func(string& input) { m_str = move(input); } // member function
or
string& m_str; // declared as a private member
myclass(string input) : m_str(input) {} // class constructor
but what I want to do is
string* m_str; // declared as a private member
func(string& input) { m_str = &input; } // member function
QUESTION
Do I need to set m_str to nullptr before myclass object goes out of scope (i.e. put this into the class destructor)?
Should I use unique_ptr instead of a raw pointer?
For your first question, no you don't need to "reset" any member variables. The object is destructed and should not be used again.
For the second question, it depends. Most of the time you can look at the new smart pointers not as pointers, but from a resource ownership perspective: Can a resource have multiple simultaneous owners (std::shared_ptr), or only one owner at a time (std::unique_ptr)? If you are not going to transfer ownership then there is no need to use a smart pointer really, except as a nice auto-deleting pointer. The bigger question you should ask yourself is, do you need to use pointers? Very often the answer to that is "no".
No, you don't need to assign a pointer type to nullptr on destruction.
Your "or" case will give you a dangling reference once input goes out of scope. That's undefined behaviour.
As for using std::unique_ptr, decide on a case by case basis. std::shared_ptr might be a better choice if you want to have more than one thing "owning" the pointer.
Since you are trying to parse a std::string, have you considered passing iterators to your functions? Those could simply point to the position in the string that your parser is currently reading.
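For illustration, a tiny iterator-based sketch. The "A+B" grammar and the ReadNumber() and ParseSum() names are invented here; the point is only that the helpers share a position in the input instead of a copy of it:

```cpp
#include <cassert>
#include <cctype>
#include <string>

typedef std::string::const_iterator Iter;

// Reads consecutive digits starting at `it` and advances `it` past them.
int ReadNumber(Iter& it, Iter end) {
    int value = 0;
    while (it != end && std::isdigit(static_cast<unsigned char>(*it))) {
        value = value * 10 + (*it - '0');
        ++it;
    }
    return value;
}

// Parses "A+B" (hypothetical grammar) without copying the input.
int ParseSum(const std::string& input) {
    Iter it = input.begin();
    const Iter end = input.end();
    const int left = ReadNumber(it, end);
    assert(it != end && *it == '+');   // sketch only: real code would report an error
    ++it;
    return left + ReadNumber(it, end);
}
```

The iterators are only valid while the caller's string is alive and unmodified, so they should stay local to the parse call rather than being stored.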
BTW, I wouldn't be happy with the design of functions within functions. Consider creating a class with these functions as private member functions. Then you can have the string as a class member, and every function can simply use the same string instance.
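Such a class might look like the following sketch. WordParser and its whitespace-splitting behaviour are assumptions, chosen only to show private helpers sharing one member string:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser that splits its input on whitespace.
class WordParser {
public:
    explicit WordParser(std::string input) : input_(std::move(input)) {}

    std::vector<std::string> Parse() {
        std::vector<std::string> words;
        pos_ = 0;
        SkipSpaces();
        while (pos_ < input_.size()) {
            words.push_back(ReadWord());
            SkipSpaces();
        }
        return words;
    }

private:
    static bool IsSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    void SkipSpaces() {
        while (pos_ < input_.size() && IsSpace(input_[pos_])) ++pos_;
    }
    std::string ReadWord() {
        const std::size_t start = pos_;
        while (pos_ < input_.size() && !IsSpace(input_[pos_])) ++pos_;
        assert(pos_ > start);   // callers only invoke this on a non-space character
        return input_.substr(start, pos_ - start);
    }

    std::string input_;     // owned copy: no dangling reference possible
    std::size_t pos_ = 0;   // current read position shared by the helpers
};
```

If the caller no longer needs the original string, passing std::move(text) to the constructor avoids the copy while the parser still owns its input.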

C++ Destructor Called Multiple Times

I'm making a little wrapper class for sqlite. To get data to/from the database I have a class called SQLiteValue. When binding data for a query SQLiteValue instances get created on the stack and passed around a few functions. A skeleton outline of the class is below.
class SQLiteValue : public SQLiteObject
{
private:
// stores a pointer to the data contained (could be of varying types)
union
{
int* i;
double* d;
std::string* s;
std::wstring* ws;
BYTE* b;
} pdata;
int type;
public:
SQLiteValue(const char* val);
SQLiteValue(const wchar_t* val);
// ... and so on for varying types
virtual ~SQLiteValue();
};
The object gets created by one of several overloaded constructors. The constructors instantiate a "member" of pdata based on their type. This is the important thing for this class. Now, the problem. I have the constructors overloaded so I get clean method calls and don't need to explicitly call SQLiteValue(xxx). As such I don't really want to use references for functions, so I define them like.
void BindValue(const char* name, SQLiteValue value)
query->BindValue(":username", "user2"); // the "clean" method call
Declaring them like this causes a new object to be instantiated every time (or something similar?) I call a function and so the destructor frees memory allocated for pdata. This is bad.
What I'd like to know is this. Is there a better way to achieve what I'm trying to do whilst retaining my clean method calls? At the moment I have private functions which operate by reference which solves the issue, but I don't really like this method. It would be easy for me to forget the reference and I'd end up tracking down this same issue again.
Thanks.
Change BindValue to take parameter by const reference.
void BindValue(const char* name, const SQLiteValue &value)
This is a situation where rvalue references can help. They don't reduce the number of constructor and destructor calls, but they let a move constructor or move assignment operator (one taking an rvalue reference, &&) "steal" the internal resources of a temporary instance. See details here: http://blogs.msdn.com/b/vcblog/archive/2009/02/03/rvalue-references-c-0x-features-in-vc10-part-2.aspx
A move constructor just transfers the other instance's internal resources to "this" instance and resets the other instance's resources to 0. So, instead of allocating, copying and releasing, it just copies a pointer or handle. The SQLiteValue temporary created from "user2" in your code is such an rvalue.
This can be applied with any C++ compiler that implements the C++0x standard (since published as C++11).
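A simplified sketch of the idea, using a made-up Value class that owns a single heap-allocated std::string instead of the union in the question. It also defines a copy constructor: when a class like this is passed by value with only the compiler-generated copy, two objects end up deleting the same pointer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for SQLiteValue that owns one heap-allocated string.
class Value {
public:
    // Non-explicit on purpose, to keep the "clean" call syntax from the question.
    Value(const char* text) : data_(new std::string(text)) {}

    // Copy constructor: deep copy, so each object deletes its own allocation.
    Value(const Value& other)
        : data_(other.data_ ? new std::string(*other.data_) : nullptr) {}

    // Move constructor: steal the pointer and leave the source empty.
    Value(Value&& other) noexcept : data_(other.data_) { other.data_ = nullptr; }

    // Copy-and-swap assignment handles both copy and move assignment.
    Value& operator=(Value other) noexcept {
        std::swap(data_, other.data_);
        return *this;
    }

    ~Value() { delete data_; }

    bool empty() const { return data_ == nullptr; }
    const std::string& text() const {
        assert(data_ != nullptr);
        return *data_;
    }

private:
    std::string* data_;
};

// Hypothetical by-value function, like BindValue in the question.
std::string Bind(Value value) { return value.text(); }
```

With this in place, Bind("user2") builds a temporary from the literal and needs no deep copy, while passing a named Value copies it safely.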

Returning strings by reference cpp

Forgive me if this has been asked before, I am sure it has but I couldn't find an answer I was happy with.
I am coming to cpp from a heavy Java background and would like to understand when to return a reference/pointer to an object rather than a copy.
for the following class definition:
class SpaceShip {
string name;
WeaponSystem weaponSystem; //represents some object, this is just an example, I dont have this type of object at all in my program
int hull;
string GetName() const {
return name;
}
WeaponSystem GetWeaponSystem() const {
return weaponSystem;
}
int GetHull() const {
return hull;
}
};
I know that returning a copy of things is expensive, I would think this means I want to avoid returning something like a string or weaponSystem by value, but an int by value is ok.
Is this right? I also know that I need to be aware of where things live in memory. Does returning a reference to something in this class mean danger down the line if this object is destroyed and something still holds a reference to its name?
On your last point, you definitely need to be a lot more careful about resource management in C++ than in Java. In particular, you need to decide when an object is no longer needed. Returning by reference has the effect of aliasing the returned object. This is not noticeable when the object you are sharing is immutable, but unlike Java's Strings, C++ strings are mutable. Therefore, if you return name by value and then rename your SpaceShip, the caller would still see the old name even after the renaming. If you return by reference, however, the caller will see the change as soon as the SpaceShip is renamed.
When you deal with copying complex objects, you can decide how much is copied by providing a custom implementation of a copy constructor. If you decide to provide a copy constructor, don't forget the rule of three, and implement the other two as well (the copy assignment operator and the destructor).
It "works" but you should have
const string& GetName() const {
It may also be beneficial to have the following also
const WeaponSystem& GetWeaponSystem() const {
Also, class members are private by default, so your accessor functions are private as written.
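Putting those points together, a sketch with a public const-reference getter. The constructor and the Rename() setter are assumptions added so the difference between a copy and an alias can be shown:

```cpp
#include <cassert>
#include <string>
#include <utility>

class SpaceShip {
public:
    explicit SpaceShip(std::string name) : name_(std::move(name)) {}

    // Returns a reference to the member: no copy is made.
    const std::string& GetName() const { return name_; }

    // Hypothetical setter, not in the original class.
    void Rename(std::string name) { name_ = std::move(name); }

private:
    std::string name_;
};

// Usage sketch: a copy is a snapshot, a reference is an alias.
void Example() {
    SpaceShip ship("Rocinante");
    std::string snapshot = ship.GetName();      // copy made by the caller
    const std::string& alias = ship.GetName();  // refers to the member itself
    ship.Rename("Tachi");
    assert(snapshot == "Rocinante");
    assert(alias == "Tachi");
}
```

The alias is only usable while the ship object is alive; a caller that needs the name for longer should store a copy.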
What you have to know is that every getter of your class should have a prototype like this:
const <type> &className::getXXX() const
{
...
}
and every setter one like this:
void className::setXXX(const <type> &)
{
...
}
Use references whenever possible.
Sometimes, with complex objects, you can use pointers. That depends on your code structure.

Storing local variable in std::map

I have a class Message and a class Cache.
In Message::processMessage() I create an instance of another class, CacheRef (not shown below),
then I call Cache::cacheData(cacheRef).
Now, in the Cache class, I have a map whose key type is CacheRef. I store the ref that I passed to cacheData() in this map.
class Message
{
private:
Key m_key;
public:
void processMessage(int a, int b, Cache *pCache)
{
CacheRef ref(a, b, m_key); // CacheRef is a class defined in the same file
// some char *data - do processing and fill it
pCache->cacheData(ref, data);
}
};
class Cache
{
public:
void cacheData(CacheRef &ref, const char* data)
{
CacheDir *dir;
std::map<CacheRef, CacheDir*>::iterator it = m_dirs.find(ref);
if(it == m_dirs.end())
{
dir = new CacheDir();
m_dirs.insert(std::make_pair(ref, dir));
}
}
std::map<CacheRef, CacheDir*> m_dirs; // CacheDir is some class defined in the same file
};
Now, the code is working absolutely fine. But I have this concern (not sure!) that I am storing a local variable in the map, which will cease to exist as soon as processMessage() exits. So, am I accessing invalid memory, and is it just by luck that this code is working?
If this is wrong, what is the best way to achieve this behaviour?
I don't have boost on my system, so can't use shared_ptr for anything.
Because the 1st template parameter is a CacheRef (and not a reference or pointer to a CacheRef) then ref will be copied into the map when you do the insert. Hence, you won't be storing a reference to a local stack variable.
As long as there is an appropriate copy constructor or assignment operator for CacheRef then this will work ok.
As Stephen Doyle pointed out, you are actually storing a copy of the CacheRef in the map, not a reference to the one passed to the cacheData() method.
Whether this causes a problem or not depends on the definition of the CacheRef class. If, for example, a CacheRef holds a pointer or a reference to the Key passed to the constructor, you will end up with an invalid pointer once the Message instance is destroyed.
By the way, since you are storing dynamically allocated objects of CacheDir in Cache::m_dirs, you should make sure to delete all values in the map in the Cache::~Cache() destructor to avoid memory leaks.
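A self-contained sketch of both points, with simplified stand-ins for CacheRef and CacheDir (their members here are invented, and processMessage is written as a free function). The map copies the key, and storing CacheDir by value instead of by pointer means there is nothing to delete and no need for shared_ptr:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the classes in the question.
struct CacheRef {
    int a;
    int b;
    // std::map needs an ordering on its key type.
    bool operator<(const CacheRef& other) const {
        return a != other.a ? a < other.a : b < other.b;
    }
};

struct CacheDir {
    std::string data;
};

class Cache {
public:
    void cacheData(const CacheRef& ref, const char* data) {
        // operator[] copies `ref` into the map and creates the value if the key is new.
        m_dirs[ref].data = data;
    }
    std::size_t size() const { return m_dirs.size(); }
    const std::string& dataFor(const CacheRef& ref) const {
        std::map<CacheRef, CacheDir>::const_iterator it = m_dirs.find(ref);
        assert(it != m_dirs.end());
        return it->second.data;
    }
private:
    std::map<CacheRef, CacheDir> m_dirs;  // values stored directly: nothing to delete
};

void processMessage(int a, int b, Cache& cache) {
    CacheRef ref = {a, b};            // local variable
    cache.cacheData(ref, "payload");
}                                     // ref is destroyed here; the map keeps its own copy
```

After processMessage() returns, looking up an equal key still finds the entry, because the map never referred to the local variable.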