C++ String pointers

In my previous app I had an object like this:
class myType
{
public:
    int a;
    std::string b;
};
It had a lot of instances scattered everywhere, and they were passed to nearly every function.
The app was slow. Profiling showed that 95% of the time was spent in the string allocator function.
I know how to work with the object above, but not how to work with string pointers.
class myType
{
public:
    int a;
    std::string* b;
};
They told me to use pointers as above.
How much faster is it with a string pointer?
What is copied when I copy the object?
How do I do the following with the class that uses the pointer:
Access the string value
Modify the string value without modifying the one in the object (copy?)
What else changes in general if I use string pointers?

It will actually probably be slower - you still need to create and copy the strings, but now you have the overhead of dynamic allocation on top. My guess is that you are copying your objects around too much - whenever you call a function, your myType objects should be passed as const references, wherever possible, not by value:
void f( const myType & mt ) {
// stuff
}
If you actually need to change mt, use a non-const reference; this is also less expensive than passing a value and returning a new value with modified fields.
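To make the cost of passing by value visible, here is a minimal sketch that counts copies of a stand-in for myType. The counter `g_copies`, the counting copy constructor, and the functions `lengthByValue`/`lengthByRef` are assumptions added for illustration; they are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical global counter, used only to observe copies.
static int g_copies = 0;

// Stand-in for the question's myType, with a copy constructor that counts copies.
class myType
{
public:
    int a = 0;
    std::string b;

    myType() = default;
    myType(const myType& other) : a(other.a), b(other.b) { ++g_copies; }
};

// By value: each call copy-constructs the parameter, including its string
// (a heap allocation whenever b is too long for the small-string buffer).
std::size_t lengthByValue(myType mt) { return mt.b.size(); }

// By const reference: no copy is made, and the function still cannot modify mt.
std::size_t lengthByRef(const myType& mt) { return mt.b.size(); }
```

Calling `lengthByValue(m)` on an existing object raises `g_copies` by one; `lengthByRef(m)` leaves it unchanged.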

I think using a pointer like this is a bad idea. Instead, look at how your myType is being used. In particular, instead of this:
void foo(myType a)
{
// ...
}
Consider this:
void foo(myType const &a)
{
// ...
}
In the former case, a copy of myType has to be created to pass to foo(); in the latter, no copy is needed, since a reference is passed instead. It's marked const so that you can be sure foo() doesn't try to modify it, which gives you (almost) the same behaviour as the first method.
There are probably other things you could change, but my guess is that this change would give you the most bang for your buck (and it's a pretty mechanical change, so there is hopefully little chance of introducing problems).

To add to the other answers, you also should be making sure any member functions of that class that pass a string are passing by const reference. For example, say your class constructor definition looks like this:
myType::myType(int a, string b)
Use this instead:
myType::myType(int a, const string& b)
So basically, go through all your function parameters throughout your project, and change string to const string&, and myType to const myType&. This alone should fix the majority of your performance issues.
Note about dynamically allocating the string and passing it as a pointer: this is not a good idea. Although it may lighten the performance load somewhat, you become extremely vulnerable to memory leaks, which make debugging a nightmare (and leaks can hurt performance far more unpredictably than the original slowness). As a general rule, I strongly discourage passing naked pointers. There's almost always a better, safer alternative.
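If the real goal is for many copies of myType to share one string (which is what the question's `string*` would do), one safer alternative to a naked pointer is a smart pointer. The sketch below is an assumption, not something the answers propose: it keeps the text in a `std::shared_ptr<const std::string>`, so copying a myType copies only the pointer (plus a reference-count update), and "modifying" the string gives just that one object a new string. The member functions `getB`, `setB` and `address` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class myType
{
public:
    int a = 0;

    myType(int a_, std::string b_)
        : a(a_), b(std::make_shared<std::string>(std::move(b_))) {}

    // Read access: a reference to the shared string, no copy.
    const std::string& getB() const { return *b; }

    // Modify without affecting other objects: build a new string and point only
    // this object at it; copies made earlier keep the old value.
    void setB(std::string newValue)
    {
        b = std::make_shared<std::string>(std::move(newValue));
    }

    // Exposed only so the example can show that copies share the same string.
    const std::string* address() const { return b.get(); }

private:
    // Copying myType copies this pointer, not the characters.
    std::shared_ptr<const std::string> b;
};
```

The string is freed automatically when the last myType pointing at it is destroyed, so there is nothing to leak.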

Related

C++ When does it make sense to pass a const struct parameter by value vs. reference?

I've seen a similar question to this, but I'd like some clarification...
Assuming a basic C++ class:
class MyClass
{
public:
    struct SomeData
    {
        std::wstring name;
        std::vector<int> someValues;
    };
    void DoSomething(const SomeData data);
};
I understand that data will be passed as const to DoSomething and that is ok since data will not be modified in any way by the function...but I am used to seeing & specified with const parameters to ensure that they are passed by reference, e.g.
void DoSomething(const SomeData& data);
That seems more efficient to me. If we omit the &, then isn't data being passed by value to DoSomething? I'm not sure why it would ever be preferable to pass a const parameter by value when you can pass by reference and avoid the copy occurring?
Pass by value/reference and const-correctness are two different concepts, but they are often used together.
Pass by Value
void DoSomething (SomeData data);
Pass by value is used when the object is cheap to copy, or when you do not want to keep references to objects owned by someone else. The function works on its own copy, so if it (for example, a member function) needs to keep the data, it can simply store that copy.
Pass by reference
void DoSomething (SomeData& data);
Use pass by reference whenever copying the struct could cost noticeable performance. If the function (for example, a member function) keeps a pointer or reference to the argument, it now points to a foreign object, one it does not own. Keeping pointers to foreign objects means you must be aware of that object's lifetime and of when it goes out of scope. More importantly, later changes to the foreign object are visible through your pointer.
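A small sketch of what each style lets a member function keep. `SomeData` is the struct from the question; `CopyHolder` and `PointerHolder` are hypothetical classes made up for the illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct SomeData
{
    std::wstring name;
    std::vector<int> someValues;
};

// Pass by value: the member function keeps its own copy, so later changes
// to the caller's object are not seen here.
class CopyHolder
{
public:
    void DoSomething(SomeData data) { stored = std::move(data); }
    const SomeData& get() const { return stored; }
private:
    SomeData stored;
};

// Pass by reference: keeping the address means later changes to the caller's
// object are visible, and that object must outlive this holder.
class PointerHolder
{
public:
    void DoSomething(SomeData& data) { stored = &data; }
    const SomeData& get() const { return *stored; }
private:
    SomeData* stored = nullptr;
};
```

If the caller changes `name` after both calls, `CopyHolder` still reports the old name while `PointerHolder` reports the new one.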
const correctness
void DoSomething (const SomeData data); // const for local copy
void DoSomething (const SomeData& data); // const for callers object
Adding const to pass by value or pass by reference means the function does not change the parameter. Whether & is present decides which object is protected: the function's local copy or the caller's object. const is a very helpful tool in C++: it documents APIs, provides compile-time safety, and can allow more compiler optimizations.
Read this article.
The biggest problem with void DoSomething(const SomeData data) is that it conflates interface and implementation. From the caller's point of view, the const doesn't change anything; the function receives a copy anyway, and the original object is not modified. What the implementation does or does not do with its own function-internal copy should not concern the caller and should thus not be expressed in the interface.
The const does make the implementation more const-correct if the copy is not changed, but leaking implementation details into the interface is a high price to pay. I recommend not using void DoSomething(const SomeData data).
As always, performance gains or losses should not be overestimated here. It's more about semantics and conventions.
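One way to keep the implementation-side benefit without putting const in the interface relies on a language rule: top-level const on a by-value parameter is not part of the function's type, so a declaration without it and a definition with it refer to the same function. A minimal sketch; `CountValues` is a hypothetical function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

struct SomeData
{
    std::wstring name;
    std::vector<int> someValues;
};

// Interface (e.g. in a header): no const, because it means nothing to callers.
std::size_t CountValues(SomeData data);

// Implementation: const appears only here, as a local detail. This defines
// the same function as the declaration above.
std::size_t CountValues(const SomeData data)
{
    // data.someValues.push_back(0);  // would not compile: the local copy is const
    return data.someValues.size();
}

// Both parameter spellings produce the same function type.
static_assert(std::is_same<void(SomeData), void(const SomeData)>::value,
              "top-level const on a parameter is dropped from the function type");
```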
Passing a const value is mostly informational for the caller: it shows intent. This is important for making code easy to read, understand and maintain.
It might also be possible for the compiler to add some extra optimizations if it knows that the function doesn't modify its argument. For example it might cause the compiler to not perform a copy at all.
This:
void DoSomething(const SomeData data);
is a bit unusual, because while it does not change anything for the caller (who can pass a const or non-const value), it restricts what the function can do internally. There's not a lot of value in that, and it's not commonly done.
Passing by reference (const or not) is more efficient if the value is expensive to copy, which as a rule of thumb includes values larger than about two pointers on the target platform. In other words, if SomeData were a struct containing two integers, it would probably be more efficient to pass it by value. But if it contains a std::map or some other large data, it is better to pass it by reference.
An exception: if the function is going to copy the value anyway, it is better to take it by value, because the value might then be moved instead of copied if the caller allows it (for example, by passing a temporary or using std::move).
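A sketch of that exception, using a hypothetical `Big` type that counts its copies and moves (the counters are an assumption for illustration). `Holder` takes its argument by value and moves it into the member; `RefHolder` takes `const Big&` and therefore has to copy.

```cpp
#include <cassert>
#include <utility>

// Hypothetical expensive type that records how it was constructed.
struct Big
{
    static int copies;
    static int moves;
    Big() = default;
    Big(const Big&) { ++copies; }
    Big(Big&&) noexcept { ++moves; }
};
int Big::copies = 0;
int Big::moves = 0;

// Takes the value it is going to keep anyway: an lvalue is copied once into the
// parameter, an rvalue is moved, and the parameter is then moved into the member.
class Holder
{
public:
    explicit Holder(Big b) : stored(std::move(b)) {}
private:
    Big stored;
};

// Const reference version: the member has to be copy-constructed, even when the
// caller passes a temporary or uses std::move.
class RefHolder
{
public:
    explicit RefHolder(const Big& b) : stored(b) {}
private:
    Big stored;
};
```

With a named `Big x`, `Holder h(x)` costs one copy and one move, `Holder h(std::move(x))` costs two moves and no copies, while `RefHolder r(std::move(x))` still costs a copy.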

C++ Move Constructor With Pointers (*&& Syntax)

I would like to know if there are any problems with the construction <type>*&& in C++. Let me give a concrete example.
Say we have a class that should be constructed from an array. We would usually do something like this:
class Things
{
public:
    Things(const ThingType* arrayOfThings, int sizeOfArray)
        : myArray(new ThingType[sizeOfArray])
    {
        for (int i = 0; i < sizeOfArray; i++)
            myArray[i] = arrayOfThings[i];
    }
private:
    ThingType* myArray;
};
This is fine if we want to preserve arrayOfThings, because we are doing a deep copy of it. Moreover, by using const we are ensuring it won't be modified inside the constructor.
But suppose our program has a lot of statements like this one:
Things myThings(new ThingType[9001] {thing_0, ... , thing_9000}, 9001);
This might seem weird, but it can happen that the huge ThingType array is returned from a function as an rvalue.
In that case, we don't care about preserving the pointer passed as a parameter. In fact, we definitely don't want to do a deep copy of it, because it would be a huge waste of time to preserve something we are about to destroy anyway.
One possible solution would be to add another constructor that handles the case of a non-const rvalue ThingType pointer, just as a general move constructor handles the case of a non-const rvalue instance of the class:
public:
Things(ThingType*&& arrayOfThings, int sizeOfArray)
: myArray(arrayOfThings)
{
arrayOfThings = NULL;
}
This seems to be solving the problem for me, but I did not find much information about the <type>*&& construction seen above. Is it kosher, or will I be sent to the dungeons for mixing pointers and references?
After some time I believe I found satisfactory - although far from optimal - solutions. Since no one answered the question, I will share the best workarounds I found.
As Justin pointed out, using ThingType*&& can lead to trouble when we take the address of a variable of ThingType (but not of a ThingType pointer, as he implied in his answer) in the constructor call.
If t is a ThingType, the expression &t is an rvalue of type ThingType*, so Things myThing(&t, ...) will call the move constructor, with the result that myThing.myArray points to t. This is not what we would want in most cases.
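The overload-resolution pitfall can be checked without building the whole class. In this sketch, the two `pick` overloads are hypothetical stand-ins for the two `Things` constructors and simply report which one the compiler chose.

```cpp
#include <cassert>
#include <string>

struct ThingType { int value = 0; };

// Stand-in for Things(const ThingType*, int): the deep-copy constructor.
std::string pick(const ThingType*, int) { return "copy"; }

// Stand-in for Things(ThingType*&&, int): the "move" constructor.
std::string pick(ThingType*&&, int) { return "move"; }
```

`pick(&t, 1)` selects the "move" overload, because `&t` is an rvalue pointer and binding it to `ThingType*&&` is a better match than converting it to `const ThingType*`. A named pointer variable is an lvalue, so it can only bind to the copy overload.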
One solution is to use vectors instead of arrays, as illustrated below:
// Copy
explicit Things(const std::vector<ThingType>& vectorOfThings)
: myVector(vectorOfThings)
{ }
// Move
explicit Things(std::vector<ThingType>&& vectorOfThings)
: myVector(std::move(vectorOfThings))
{ }
The move constructor would then be used in situations like this:
Things myThings(vector<ThingType>{thing_0, ... , thing_9000});
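A compilable version of the vector-based class, assuming `ThingType` is default-constructible (a placeholder struct is used here). The `data()` and `size()` accessors are additions so the example can check whether the buffer was copied or taken over.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct ThingType { int id = 0; };  // placeholder element type

class Things
{
public:
    // Copy: deep-copies the caller's vector, which is left untouched.
    explicit Things(const std::vector<ThingType>& vectorOfThings)
        : myVector(vectorOfThings) {}

    // Move: takes over the caller's buffer; no elements are copied.
    explicit Things(std::vector<ThingType>&& vectorOfThings)
        : myVector(std::move(vectorOfThings)) {}

    const ThingType* data() const { return myVector.data(); }
    std::size_t size() const { return myVector.size(); }

private:
    std::vector<ThingType> myVector;
};
```

Passing a named vector picks the copy constructor; passing a temporary or `std::move(v)` picks the move constructor, which reuses the same heap buffer.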
Although this solves the problem, it is not feasible if we depend on an API that returns raw pointers, or if we don't want to give up arrays. In that case, we can use smart pointers to solve the problem.
Suppose we have a function ThingType* generateArray() which we want to use to initialize our object of type Things. The first thing we should do is wrap this function with another function that returns a smart pointer instead.
unique_ptr<ThingType[]> generateSmartPointer()
{
return unique_ptr<ThingType[]>(generateArray());
}
Here I used a unique_ptr, but this could change depending on the implementation.
Now we add a new constructor to Things, taking an instance of unique_ptr as an argument. This will act as the move constructor for arrays of ThingType:
Things(unique_ptr<ThingType[]> thingsPointer, int sizeOfArray)
: myArray(thingsPointer.release()), size(sizeOfArray)
{ }
unique_ptr::release() returns the array pointer and, at the same time, makes the unique pointer give up ownership of it, so the array is not deleted when the unique pointer is destroyed. Things then has to delete[] myArray itself, for example in its destructor.
And that's it. These are the two best solutions I found to this problem, and while they are far from perfect, they have worked so far considering the objectives for each implementation.
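A self-contained sketch of the smart-pointer approach. `generateArray` here is a hypothetical placeholder for the raw-pointer API, taking a size so the example can run. One design difference from the answer: instead of calling `release()` into a raw `ThingType*` member (which leaves `Things` responsible for `delete[]`), this version keeps the `std::unique_ptr<ThingType[]>` as the member, so the array is freed automatically and `Things` becomes move-only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct ThingType { int id = 0; };

// Hypothetical legacy API that returns a new[]-allocated array.
ThingType* generateArray(std::size_t n) { return new ThingType[n]; }

// Wrapper: take ownership as soon as the raw pointer appears.
std::unique_ptr<ThingType[]> generateSmartPointer(std::size_t n)
{
    return std::unique_ptr<ThingType[]>(generateArray(n));
}

class Things
{
public:
    // Takes ownership; a named unique_ptr has to be passed with std::move.
    Things(std::unique_ptr<ThingType[]> thingsPointer, std::size_t sizeOfArray)
        : myArray(std::move(thingsPointer)), size(sizeOfArray) {}

    ThingType& operator[](std::size_t i) { return myArray[i]; }
    std::size_t count() const { return size; }

private:
    std::unique_ptr<ThingType[]> myArray;  // calls delete[] automatically
    std::size_t size;
};
```

Because the parameter is a `unique_ptr`, a caller cannot accidentally hand over the address of a stack object the way `&t` could with `ThingType*&&`.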

Passing const references to functions

I was watching a video and saw this code:
class Dog {
public:
Dog() : age(3), name("dummy") {}
void setAge(const int &a) { age = a; }
private:
int age;
std::string name;
};
I was curious about the function signature for setAge because I've never used const as a function parameter. I've looked at several related answers, but none that seemed to answer my question.
In such an elementary example, it's hard to see the benefit of passing a const reference to a function.
Is there any reason why you'd want to make the reference a const? The only application I could think of is in embedded programming when making a copy of a variable could waste precious space.
Are there any simple examples, perhaps, where the impact is seen easily of passing a const reference?
Consider the following three examples:
(i) void setAge(int &a) { age = a; }
(ii) void setAge(const int &a) { age = a; }
(iii) void setAge(int a) { age = a; }
Further think of your class as an encapsulated object, that is the outside world in general doesn't know what's going on inside.
Then, using case (i), the caller cannot know that a has not been changed afterwards.
int a=3;
dog.setAge(a); //case (i): what is "a" afterwards?
One does not know what value a holds after the function call -- in fact, the function signature tells the caller that a change of a is likely to occur.
On the other hand, with variant (ii) you again pass the object by reference, that is, you do not make a copy but tell the function the memory address where it can access the parameter. In contrast to case (i), you now assure the caller that "nothing is going to happen to your parameter". That is, the caller can safely keep working with the parameter afterwards and be assured it still has the same value as before (at least in principle, since bad things like a const_cast might still happen inside the function).
Finally, in case (iii), a copy of the int is made and used inside the function. For built-in types like int, that is in fact the preferred way to pass function parameters. However, it can be inefficient if the object is expensive to copy.
With regard to the whole const-correctness topic, see here.
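The three variants side by side, in a sketch where case (i) is written (hypothetically) to clamp negative ages by writing back to the caller's variable, something its signature permits. The names `setAgeRef`, `setAgeConstRef`, `setAgeValue` and `getAge` are made up so all three can live in one class. Only (ii) and (iii) accept a literal such as `dog.setAgeValue(7)`, because a non-const lvalue reference cannot bind to a temporary.

```cpp
#include <cassert>
#include <string>

class Dog
{
public:
    Dog() : age(3), name("dummy") {}

    // (i) Non-const reference: nothing stops the function from writing back.
    //     This version changes the caller's variable when it is negative.
    void setAgeRef(int& a) { if (a < 0) a = 0; age = a; }

    // (ii) Const reference: the caller's variable cannot be changed here.
    void setAgeConstRef(const int& a) { age = a; }

    // (iii) By value: the usual choice for int; the function gets its own copy.
    void setAgeValue(int a) { age = a; }

    int getAge() const { return age; }

private:
    int age;
    std::string name;
};
```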
Making a copy of a variable wastes precious space and wastes precious time, which is why avoiding unnecessary copies is something to worry about in any kind of programming, not just embedded one. For this reason using const T & references for passing read-only parameters has always been an idiom in C++.
However, it is usually reserved for passing in "heavy" objects, i.e. objects that are relatively expensive to copy. Using it with scalar types (as in your example) makes little or no sense.
Consider an object which has a large memory footprint. You need to pass it to a function that will only extract some information without changing the object in any way. A const reference is a good candidate in such a case.
std::vector can be an example, a Matrix object is another example.
void setAge(const int &a) { age = a; }
You are passing a const int reference: you don't intend to modify the value a inside setAge.
void setAge(int &a) { age = a; }
You are passing a non-const int reference, so the signature allows setAge to modify the caller's a.
Doing const correctness is good practice here because it gives a better interface definition: the first signature conveys the message very clearly, while the second one is vague by comparison.
You can read more about const correctness here https://isocpp.org/wiki/faq/const-correctness#overview-const

In c++11, is there ever still a need to pass in a reference to an object that will accept the output of a function?

Prior to C++11, if I had a function that operated on large objects, my instinct would be to write functions with this kind of prototype.
void f(A &return_value, A const &parameter_value);
(Here, return_value is just a blank object which will receive the output of the function. A is just some class which is large and expensive to copy.)
In C++11, taking advantage of move semantics, the default recommendation (as I understand it) is the more straightforward:
A f(A const &parameter_value);
Is there ever still a need to do it the old way, passing in an object to hold the return value?
Others have covered the case where A might not have a cheap move constructor. I'm assuming your A does. But there is still one more situation where you might want to pass in an "out" parameter:
If A is some type like vector or string and it is known that the "out" parameter already has resources (such as memory) that can be reused within f, then it makes sense to reuse that resource if you can. For example consider:
void get_info(std::string&);
bool process_info(const std::string&);
void
foo()
{
std::string info;
for (bool not_done = true; not_done;)
{
info.clear();
get_info(info);
not_done = process_info(info);
}
}
vs:
std::string get_info();
bool process_info(const std::string&);
void
foo()
{
for (bool not_done = true; not_done;)
{
std::string info = get_info();
not_done = process_info(info);
}
}
In the first case, capacity will build up in the string as the loop executes, and that capacity is then potentially reused on each iteration of the loop. In the second case a new string is allocated on every iteration (neglecting the small string optimization buffer).
Now this isn't to say that you should never return std::string by value. Just that you should be aware of this issue and apply engineering judgment on a case by case basis.
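A rough way to observe the capacity reuse described above: the sketch refills one string through an out parameter, as in the first loop, and counts how often its capacity had to grow. `get_info` is a hypothetical stand-in that appends 1000 characters, and the result relies on common library behavior (`clear()` keeping the capacity), which the standard does not strictly promise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical producer: fills the caller's string (the "out" parameter).
void get_info(std::string& info)
{
    info.append(1000, 'x');
}

// Refills one string `iterations` times and counts how often its capacity
// changed, i.e. how often a new buffer had to be allocated.
int count_capacity_changes(int iterations)
{
    std::string info;
    int changes = 0;
    for (int i = 0; i < iterations; ++i)
    {
        std::size_t before = info.capacity();
        info.clear();          // length becomes 0; the capacity is normally kept
        get_info(info);
        if (info.capacity() != before)
            ++changes;
    }
    return changes;
}
```

On mainstream standard libraries only the first iteration grows the buffer; the return-by-value loop would instead allocate a fresh string on every iteration.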
An object can be large and expensive to copy, and yet be one for which move semantics cannot improve on copying. Consider:
struct A {
std::array<double,100000> m_data;
};
It may not be a good idea to design your objects this way, but if you have an object of this type for some reason and you want to write a function to fill the data in then you might do it using an out param.
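A sketch of the out-parameter approach for that `A`. Because `std::array<double, 100000>` owns no heap memory, moving an `A` copies all of its elements (about 800 KB with 8-byte doubles); the `static_assert` records why, since `A` is trivially copyable. `fill_with` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

struct A {
    std::array<double, 100000> m_data;
};

// A owns no heap memory, so "moving" it copies every element:
// move semantics cannot help here.
static_assert(std::is_trivially_copyable<A>::value,
              "A is trivially copyable, so its move constructor is a plain copy");

// Out-parameter version: writes into an A the caller already owns (and can
// reuse), so no temporary A is created or copied.
void fill_with(A& out, double value)
{
    out.m_data.fill(value);
}
```

Given its size, the caller would typically keep such an object in static storage or on the heap rather than on the stack.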
It depends: does your compiler support return value optimization (RVO), and is your function f designed to be able to use the RVO your compiler supports?
If so, then yes, by all means return by value. You will gain nothing at all by passing a mutable parameter, and you'll gain a great deal of code clarity by doing it this way. If not, then you have to investigate the definition of A.
For some types, a move is nothing more than a copy. If A doesn't contain anything that is actually worth moving (pointers transferring ownership and so forth), then you're not going to gain anything by moving. A move isn't free, after all; it's simply a copy that knows that anything owned by the original is being transferred to the copy. If the type doesn't own anything, then a move is just a copy.

Returning strings by reference in C++

Forgive me if this has been asked before, I am sure it has but I couldn't find an answer I was happy with.
I am coming to cpp from a heavy Java background and would like to understand when to return a reference/pointer to an object rather than a copy.
for the following class definition:
class SpaceShip {
string name;
WeaponSystem weaponSystem; //represents some object, this is just an example, I don't have this type of object at all in my program
int hull;
string GetName() const {
return name;
}
WeaponSystem GetWeaponSystem() const {
return weaponSystem;
}
int GetHull() const {
return hull;
}
};
I know that returning a copy of things is expensive, I would think this means I want to avoid returning something like a string or weaponSystem by value, but an int by value is ok.
Is this right? I also know that I need to be aware of where things live in memory: does returning a reference to something in this class mean danger down the line if this object is destroyed and something still holds a reference to its name?
On your last point, you definitely need to be a lot more careful about resource management in C++ than in Java. In particular, you need to decide when an object is no longer needed. Returning by reference makes the caller's reference an alias of the returned object. This is not noticeable when the object you are sharing is immutable, but unlike Java's Strings, C++ strings are mutable. Therefore, if you return name by value and then rename your SpaceShip, the caller will see the old name even after the renaming. If you return by reference, however, the caller will see the change as soon as the SpaceShip is renamed.
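A minimal sketch of that aliasing difference. `GetName` follows the question; `GetNameCopy` and `Rename` are hypothetical members added for the comparison, and the accessors are declared `public` because class members are private by default.

```cpp
#include <cassert>
#include <string>
#include <utility>

class SpaceShip {
public:
    explicit SpaceShip(std::string n) : name(std::move(n)) {}

    // Returns a reference: no copy, but the caller sees later renames, and the
    // reference dangles if the SpaceShip is destroyed first.
    const std::string& GetName() const { return name; }

    // Returns a copy: may allocate for long names, but is an independent snapshot.
    std::string GetNameCopy() const { return name; }

    void Rename(std::string newName) { name = std::move(newName); }

private:
    std::string name;
};
```

After `ship.Rename(...)`, a `const std::string&` obtained earlier from `GetName()` shows the new name, while a string obtained from `GetNameCopy()` keeps the old one.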
When you deal with copying complex objects, you can decide how much is copied by providing a custom implementation of a copy constructor. If you decide to provide a copy constructor, don't forget the rule of three, and define the other two members as well (the copy assignment operator and the destructor).
It "works" but you should have
const string& GetName() const {
It may also be beneficial to have the following:
const WeaponSystem& GetWeaponSystem() const {
Also, members of a class are private by default, so your accessor functions are private.
The thing you have to know is that every getter of your class should have a prototype like this:
const <type> &className::getXXX() const
{
...
}
and every setter should look like this:
void className::setXXX(const <type> &)
{
...
}
Use references when possible.
Sometimes, with complex objects, you can use pointers. That depends on your code structure.