Questions concerning value classes and vectors - c++

More C++ learning questions. I've been using vectors primarily with raw pointers with a degree of success, however, I've been trying to play with using value objects instead. The first issue I'm running into is compile error in general. I get errors when compiling the code below:
class FileReference {
public:
FileReference(const char* path) : path(string(path)) {};
const std::string path;
};
int main(...) {
std::vector<FileReference> files;
// error C2582: 'operator =' function is unavailable in 'FileReference'
files.push_back(FileReference("d:\\blah\\blah\\blah"));
}
Q1: I'm assuming it's because I somehow specified a const path and/or didn't define an assignment operator - why wouldn't a default operator work? Does const even win me anything here?
Q2: Secondly, in a vector of these value objects, are my objects memory-safe (meaning, will they get automatically deleted for me)? I read here that vectors by default get allocated on the heap -- so does that mean I need to "delete" anything?
Q3: Thirdly, to prevent copying of the entire vector, I have to create a parameter that passes the vector as a reference like:
// static
FileReference::Query(const FileReference& reference, std::vector<FileReference>& files) {
// push stuff into the passed in vector
}
What's the standard for returning large objects that I don't want to die when the function dies? Would I benefit from using a shared_ptr here or something like that?

If any member variables are const, then a default assignment operator can't be created; the compiler doesn't know what you would want to happen. You would have to write your own operator overload, and figure out what behaviour you want. (For this reason, const member variables are often less useful than one might first think.)
So long as you're not taking ownership of raw memory or other resources, then there's nothing to clean up. A std::vector always correctly deletes its contained elements when its lifetime ends, so long as they in turn always correctly clean up their own resources. And in your case, your only member variable is a std::string, which also looks after itself. So you're completely safe.
You could use a shared pointer, but unless you do profiling and identify a bottleneck here, I wouldn't worry about it. In particular, you should read about copy elision, which the compiler can do in many circumstances.
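To illustrate the last point, here is a minimal sketch of returning the vector by value instead of filling an out-parameter. FileRef, query and the file names are hypothetical stand-ins, not part of the question's code. The compiler is usually allowed to elide the copy of the returned local, and with C++11 or later the vector is moved at worst.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's FileReference (no const member).
struct FileRef {
    std::string path;
};

// Builds the result locally and returns it by value. The vector's heap
// buffer is handed to the caller; the elements are not copied one by one.
std::vector<FileRef> query(const std::string& dir) {
    std::vector<FileRef> files;
    files.push_back(FileRef{dir + "\\a.txt"});  // made-up file names
    files.push_back(FileRef{dir + "\\b.txt"});
    return files;
}
```

A caller would simply write std::vector<FileRef> files = query("d:\\blah"); and the result outlives the function without any shared_ptr.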

Elements in a vector must be assignable. From section 23.2.4, Class template vector, of the C++ standard:
...the stored object shall meet the requirements of Assignable.
Having a const member makes the class unassignable.
As the elements are being stored by value, they will be destructed when the vector is destroyed or when they are removed from the vector. If the elements were raw pointers, then they would have to be explicitly deleted.
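The effect of the const member can be checked at compile time. The sketch below assumes C++11 (for the type trait); ConstPath, PlainPath and makeFiles are hypothetical names, and the accessor-based FileReference is one possible way to keep the path read-only for users while leaving the class assignable.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical pair of types that differ only in the const qualifier.
struct ConstPath { const std::string path; };
struct PlainPath { std::string path; };

// The const member suppresses the implicit copy assignment operator.
static_assert(!std::is_copy_assignable<ConstPath>::value,
              "a const member blocks assignment");
static_assert(std::is_copy_assignable<PlainPath>::value,
              "a plain member keeps the class assignable");

// Read-only for callers, still assignable: the member is private and
// non-const, and only a const accessor is exposed.
class FileReference {
public:
    explicit FileReference(const char* p) : path_(p) {}
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

std::vector<FileReference> makeFiles() {
    std::vector<FileReference> files;
    files.push_back(FileReference("d:\\blah\\blah\\blah"));
    return files;
}
```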

Related

Make struct object thread safe

I have a function call
void moveMeToThread(UnsafeStruct *ptr)
{
// do stuff with ptr
}
Now I want to move moveMeToThread to a different thread, so I do not want anyone creating an object of UnsafeStruct on the stack and I also want memory of all UnsafeStruct objects made on the heap to be freed automatically. Anyone have an elegant way to do this?
Sounds like you'd like to make a heap-only class. There are many ways to force this:
you might make private ctors (all of them!) and create a static create() function that returns a pointer (sometimes called named ctor)
you might make dtor private
The latter technically does not save you from placement new to a suitable memory block, but otherwise protects from sensible coding mistakes and is way more compatible with algorithms and containers. E.g. you can still copy such an object via copy ctor outside the class, which is not possible if you make all ctors private (which is a requirement for the first version).
You might do this:
template<typename T>
class HeapOnly
{
public:
T t;
operator T&() { return t; }
operator const T&() const { return t; }
private:
~HeapOnly();
};
void moveMeToThread(HeapOnly<UnsafeStruct> *ptr)
{ /* ... */ }
int main()
{
HeapOnly<UnsafeStruct> *ptr =
new HeapOnly<UnsafeStruct>{/* args to UnsafeStruct */};
moveMeToThread(ptr);
}
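For comparison, the first option (private constructors plus a static create() function) might look like the sketch below. It is not the answer's code: the int payload, value() and consume() are hypothetical, and returning std::unique_ptr is an added assumption that addresses the "freed automatically" part of the question.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

class UnsafeStruct {
public:
    // Named constructor: the only way to obtain an instance, always on the heap.
    static std::unique_ptr<UnsafeStruct> create(int value) {
        return std::unique_ptr<UnsafeStruct>(new UnsafeStruct(value));
    }

    // No copies, so nobody can copy-construct an automatic variable either.
    UnsafeStruct(const UnsafeStruct&) = delete;
    UnsafeStruct& operator=(const UnsafeStruct&) = delete;

    int value() const { return value_; }

private:
    explicit UnsafeStruct(int value) : value_(value) {}
    int value_;  // hypothetical payload
};

// Code outside the class cannot construct the type directly.
static_assert(!std::is_constructible<UnsafeStruct, int>::value,
              "constructor is private");

// Hypothetical consumer: takes ownership, so the object is freed when
// the unique_ptr parameter goes out of scope.
int consume(std::unique_ptr<UnsafeStruct> ptr) {
    return ptr->value();
}
```

Ownership is handed over with consume(std::move(p)), after which p is null in the caller.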
Small note: there's no such thing as a (call/parameter) stack in the C++ standard. It only appears in the Itanium ABI (and potentially in other ABIs). The standard speaks of variables with automatic storage duration for what's commonly referred to as 'on the stack', but nothing prevents an implementation from allocating the memory on the stack (as long as the compiler can prove that the object's lifetime cannot extend beyond the stack unwinding - this works e.g. if it's allocated before static initialization). It might be completely unimportant in your case, but in security-related code, where buffer overflow is important, you can't strictly enforce not having objects allocated from the same stack this way (and thus it's suggested to do a runtime check) - but you can still enforce that the object is allocated via new (or the given static member function).
I do not want anyone creating an object of UnsafeStruct on the stack.
I want to forbid it because if someone creates an object on the stack and sends it on the thread, it can cause a crash(dangling pointer)
Do you also want to prevent anybody from creating int variables on the stack? Because if somebody creates an int variable on the stack, and if they allow a reference or a pointer to it to outlive the stack frame that contains the variable, then their program could crash.
Seriously.
That problem is older than C++. That problem has existed since the very first edition of the C programming language. Every C and C++ programmer has to learn not to do that. Always have. Always will.
In some languages (e.g., Java, Python), no object of any kind can be allocated anywhere except the garbage-collected heap. Variables can only hold references to objects, and dangling references are impossible. Programmers in those languages expect an assignment a=b to copy an object reference. That is, after the assignment, a and b both refer to the same object.
That's not the C++ way. C++ programmers expect that if some type T is publicly constructible, they are allowed to declare one wherever they want. And when they see a=b, they think of that assignment operator as copying a value. They expect that after the assignment, a and b still are two different objects that both have (in some sense) the same "value."
You will find more people saying positive things about your library if you design it to work in that same way.

Is this the right way to return a struct in a parameter?

I made the following method in a C++/CLI project:
void GetSessionData(CDROM_TOC_SESSION_DATA& data)
{
auto state = CDROM_TOC_SESSION_DATA{};
// ...
data = state;
}
Then I use it like this in another method:
CDROM_TOC_SESSION_DATA data;
GetSessionData(data);
// do something with data
It does work, returned data is not garbage, however there's something I don't understand.
Question:
C++ is supposed to clean up state when it has exited its scope, so data is a copy of state, correct?
And in what exactly it is different from the following you see on many examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data); // signature should be GetSessionData(CDROM_TOC_SESSION_DATA *data)
Which one makes more sense to use or is the right way ?
Reference:
CDROM_TOC_SESSION_DATA
Using a reference vs a pointer for an out parameter is really more of a matter of style. Both function equally well, but some people feel that the explicit & when calling a function makes it more clear that the function may modify the parameter it was passed.
i.e.
doAThing(someObject);
// It's not clear that doAThing accepts a reference and
// therefore may modify someObject
vs
doAThing(&someObject);
// It's clear that doAThing accepts a pointer and it's
// therefore possible for it to modify someObject
Note that 99% of the time the correct way to return a class/struct type is to just return it. i.e.:
MyType getObject()
{
MyType object{};
// ...
return object;
}
Called as
auto obj = getObject();
In the specific case of CDROM_TOC_SESSION_DATA it likely makes sense to use an out parameter, since the class contains a flexible array member. That means that the parameter is almost certainly a reference/pointer to the beginning of some memory buffer that's larger than sizeof(CDROM_TOC_SESSION_DATA), and so must be handled in a somewhat peculiar way.
C++ is supposed to clean up state when it has exited its scope, so
data is a copy of state, correct ?
In the first example, the statement
data = state
presumably copies the value of state into local variable data, which is a reference to the same object that is identified by data in the caller's scope (because those are the chosen names -- they don't have to match). I say "presumably" because in principle, an overloaded assignment operator could do something else entirely. In any library you would actually want to use, you can assume that the assignment operator does something sensible, but it may be important to know the details, so you should check.
The lifetimes of local variables data and state end when the method exits. They will be cleaned up at that point, and no attempt may be made to access them thereafter. None of that affects the caller's data object.
And in what exactly it is different from the following you see on many
examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data);
Not much. Here the caller passes a pointer instead of a reference. GetSessionData must be declared appropriately for that, and its implementation must explicitly dereference the pointer to access the caller's data object, but the general idea is the same for most intents and purposes. Pointer and reference are similar mechanisms for indirect access.
Which one makes more sense to use or is the right way ?
It depends. Passing a reference is generally a bit more idiomatic in C++, and it has the advantage that the method does not have to worry about receiving a null or invalid pointer. On the other hand, passing a pointer is necessary if the function has C linkage, or if you need to accommodate the possibility of receiving a null pointer.
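The two styles can be put side by side in a small sketch. SessionInfo and both function names are hypothetical stand-ins (the real CDROM_TOC_SESSION_DATA is not used here), and the values written are made up.

```cpp
#include <cassert>

// Hypothetical plain struct standing in for the session data.
struct SessionInfo {
    int firstSession;
    int lastSession;
};

// Reference out-parameter: the callee can assume it refers to an object.
void getByReference(SessionInfo& out) {
    out = SessionInfo{1, 3};
}

// Pointer out-parameter: the callee has to handle a null pointer itself.
bool getByPointer(SessionInfo* out) {
    if (out == nullptr) {
        return false;
    }
    *out = SessionInfo{1, 3};
    return true;
}
```

At the call site the difference is only getByReference(data) versus getByPointer(&data); in both cases the caller's object is written through an indirection.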

Questions and Verifications on immutable [string] objects c++

I've been doing some reading on immutable strings in general and in c++, here, here, and I think I have a decent understanding of how things work. However I have built a few assumptions that I would just like to run by some people for verification. Some of the assumptions are more general than the title would suggest:
1. While a const string in c++ is the closest thing to an immutable string in STL, it is only locally immutable and therefore doesn't experience the benefit of being a smaller object. So it has all the trimmings of a standard string object but it can't access all of the member functions. This means that it doesn't create any optimization in the program over non-const? But rather just protects the object from modification? I understand that this is an important attribute but I'm simply looking to know what it means to use this.
2. I'm assuming that an object's member functions exist only once in read-only memory, and how is probably implementation specific, so does a const object have a separate location in memory? Or are the member functions limited in another way? If there are only 'const string' objects and no non-const strings in a code base, does the compiler leave out the inaccessible functions?
3. I recall hearing that each string literal is stored only once in read-only memory in c++, however I don't find anything on this here. In other words, if I use some string literal multiple times in the same program, each instance references the same location in memory. I'm going to assume no, but would two string objects initialized by the same string literal point to the same string until one is modified?
I apologize if I have included too many disjunct thoughts in the same post, they are all related to me as string representation and just learning how to code better.
As far as I know, std::string cannot assume that the input string is a read-only constant string from your data segment. Therefore, point (3) does not apply. It will most likely allocate a buffer and copy the string in the buffer.
Note that C++ (like C) has a const qualifier that is checked at compile time. It is a good idea to use it for two reasons: (a) it will help you find bugs, since a statement such as a = 5; fails to compile if a is declared const; (b) the compiler may be able to optimize the code more easily (it may otherwise not be able to figure out that the object is constant).
However, C++ has a special cast to remove the const-ness of a variable. So our a variable can be cast and assigned a value as in const_cast<int&>(a) = 5;. A std::string can also get its const-ness removed. (Note that C does not have a special cast, but it offers the exact same behavior: * (int *) &a = 5) Be aware that this compiles, but writing through the result is undefined behavior if the object itself was declared const; it is only well defined when the const was added later, for example by a const reference to a non-const object.
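A minimal sketch of the well-defined use of const_cast follows; bump is a hypothetical helper. It is only safe because the caller's object is not itself declared const.

```cpp
#include <cassert>

// Casts away the const that the reference added and modifies the object.
// Well defined only if the referenced object was NOT declared const;
// calling this on a genuinely const int would be undefined behavior.
void bump(const int& r) {
    const_cast<int&>(r) += 1;
}
```

Calling bump(x) on a plain int x = 1; leaves x equal to 2.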
Are all class members defined in the final binary?
No. std::string, like most of the STL, uses templates. Templates are compiled once per unit (your .o object files) and the linker will remove duplicates automatically. So if you look at the size of all the .o files and add them up, the final output will be a lot smaller.
That also means only the functions that are used in a unit are compiled and saved in the object file. Any other functions "disappear". That being said, often function A calls function B, so B will be defined, even if you did not explicitly call it.
On the other hand, because these are templates, very often the functions get inlined. But that is a choice by the compiler, not the language or the STL (although you can use the inline keyword for fun; the compiler has the right to ignore it anyway).
Smaller objects... No, in C++ an object has a very specific size that cannot change. Otherwise the sizeof(something) would vary from place to place and C/C++ would go berserk!
Static strings that are saved in read-only data sections, however, can be optimized. If the linker/compiler are good enough, they will be able to merge the same string in a single location. These are just plain char * or wchar_t *, of course. The Microsoft compiler has been able to do that one for a while now.
Yet, the const on a string does not always force your string to be put in a read-only data section. That will generally depend on your command line option. C++ may have corrected that, but I think C still puts everything in a read/write section unless you use the correct command line option. That's something you need to test to make sure (your compiler is likely to do it, but without testing you won't know.)
Finally, although std::string may not use it, C++ offers a quite interesting keyword called mutable. If you heard about it, you would know that a variable member can be marked as mutable and that means even const functions can modify that variable member. There are two main reasons for using that keyword: (1) you are writing a multi-thread program and that class has to be multi-thread safe, in that case you mark the mutex as mutable, very practical; (2) you want to have a buffer used to cache a computed value which is costly, that buffer is only initialized when that value is requested to not waste time otherwise, that buffer is made mutable too.
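Both uses of mutable can be combined in one small sketch. Samples and its members are hypothetical names chosen for illustration; the class caches a sum the first time it is requested and guards the cache with a mutex, all from const member functions.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <utility>
#include <vector>

class Samples {
public:
    explicit Samples(std::vector<int> values) : values_(std::move(values)) {}

    // Logically const: the observable value never changes, only the cache.
    long sum() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!cached_) {
            sum_ = std::accumulate(values_.begin(), values_.end(), 0L);
            cached_ = true;
            ++computations_;
        }
        return sum_;
    }

    // How many times the sum was actually computed (for demonstration).
    int computations() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return computations_;
    }

private:
    std::vector<int> values_;
    mutable std::mutex mutex_;      // (1) lockable from const functions
    mutable bool cached_ = false;   // (2) lazily filled cache
    mutable long sum_ = 0;
    mutable int computations_ = 0;
};
```

Calling sum() twice on a const Samples computes the total once and serves the second call from the cache.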
Therefore the "immutable" concept is only really something that you can count on at a higher level. In practice, reality is often quite different. For example, before C++11 a std::string c_str() function was allowed to reallocate the buffer to add the necessary '\0' terminator, yet that function is marked as being const:
const CharT* c_str() const;
Actually, such an implementation is free to allocate a completely different buffer, copy its existing data to that buffer and return that bare pointer. That means internally the std::string could allocate many buffers to store large strings (instead of using realloc() which can be costly.)
One thing, though... on implementations that use copy-on-write strings (older libstdc++, for example; C++11 and later effectively rule this out), when you copy string A into string B (B = A;) the string data does not get copied. Instead A and B will share the same data buffer. Once you modify A or B, and only then, the data gets copied. This means calling a function which accepts a string by copy does not waste that much time:
int func(std::string a)
{
...
if(some_test)
{
// deep copy only happens here
a += "?";
}
}
std::string b;
func(b);
On a copy-on-write implementation, the characters of string b do not get copied at the time func() gets called. And if func() never modifies 'a', the string data remains the same all along. This is often referred to as a shallow copy or copy on write.
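Where the library string is not copy-on-write, passing an lvalue by value does copy the characters. A sketch of the usual C++11 alternative is shown below: the caller uses std::move to hand over the buffer when it no longer needs the string. This swaps in move semantics for the copy-on-write behavior described above, and countWithSuffix is a hypothetical function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Takes its own string by value and modifies it freely.
std::size_t countWithSuffix(std::string a) {
    a += "?";
    return a.size();
}
```

countWithSuffix(b) copies b and leaves it untouched, whereas countWithSuffix(std::move(b)) transfers b's contents into the parameter and leaves b in a valid but unspecified state.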

Why is this allocation of a vector not accepted?

So I have declared a vector in my class header like this:
...
private:
vector<Instruction> instructions;
...
Then in the .cpp implementation in the constructor, I try to initialize it like this:
instructions = new vector<Instruction>();
Xcode tells me: No viable overloaded '='
I am basically trying to get this class to behave like I would expect in Java, where instances of the class retain this vector. That's why I wanted to dynamically allocate it using new, so as to make sure that it doesn't get lost on the stack or something. Any help would be appreciated with this, thanks so much.
In order to do what you're trying to do the instructions = new vector<Instruction>() line is entirely unnecessary. Simply remove it. The vector will automatically get default-constructed when an instance of your class gets constructed.
An alternative is to make instructions into a pointer, but there doesn't appear to be any reason to do this here.
When you write
vector<Instruction> instructions;
you already have instantiated instructions to whatever memory model the user of your class is using e.g.
class YourClass
{
vector<Instruction> instructions;
};
...
int main()
{
YourClass class1; // stack
std::unique_ptr<YourClass> class2(new YourClass); // heap
...
}
In your class, you declare a std::vector<Instruction>. new vector<Instruction>(); returns you a std::vector<Instruction>*.
operator new returns a pointer, so you have a type mismatch.
The real issue is the fact that you are doing it at all. Do you have a good reason for dynamically allocating that vector? I doubt it, just omit that entirely as it will be allocated along with instances of your type.
You have a member value but you try to initialize it from a vector<Instruction>*. Initialize it from vector<Instruction> or change the declaration to a pointer. If you go down the second route, you need to observe the rule of three.
You might also want to get a decent C++ book from this list.
Also, I think you have a using namespace std; in your header which is bad.
Do not use new in C++ unless you know what you are doing. (Which you do not, currently.)
Instead use automatic objects. You already defined instructions to be an automatic object. You just need to init it as if it were one:
class wrgxl {
public:
wrgxl()
: instructions() // this initializes the vector using its default constructor
{
// nothing needed here
}
...
private:
vector<Instruction> instructions;
...
};
The initialization of instructions in the constructor's initialization list is optional, though, if you only want to call the default constructor anyway. So in this case, this would be enough:
wrgxl()
{
}
If you wanted to dynamically allocate a vector, you would need to make instructions a pointer to a vector. But this rarely ever makes sense, since the vector already allocates its data dynamically, but wraps this, so you do not have to deal with the ugly details resulting from this.
One of those details is that, if you have a dynamically allocated object in a class, you will then have to worry about destruction, copy construction, and copy assignment for that class.
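To make that cost concrete, here is a sketch of what the pointer variant would require. Program, add and size are hypothetical names, and Instruction is reduced to a single made-up field. With a plain vector member, none of the three special member functions would have to be written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Instruction { int opcode; };  // hypothetical minimal definition

class Program {
public:
    Program() : instructions_(new std::vector<Instruction>()) {}

    // Copy constructor: deep-copies the pointed-to vector.
    Program(const Program& other)
        : instructions_(new std::vector<Instruction>(*other.instructions_)) {}

    // Copy assignment: allocate the copy first, then release the old vector.
    Program& operator=(const Program& other) {
        if (this != &other) {
            std::vector<Instruction>* copy =
                new std::vector<Instruction>(*other.instructions_);
            delete instructions_;
            instructions_ = copy;
        }
        return *this;
    }

    // Destructor: the class owns the allocation and must free it.
    ~Program() { delete instructions_; }

    void add(int opcode) { instructions_->push_back(Instruction{opcode}); }
    std::size_t size() const { return instructions_->size(); }

private:
    std::vector<Instruction>* instructions_;
};
```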
As Kerrek already pointed out, you will need to have a good C++ book in order to properly learn C++. Make your pick.
I think you are confusing C++'s with C#'s syntax.
First, unlike in many languages, variables allocated on the stack (such as yours), are initialized by calling the default constructor, so I suspect that what you are doing is unnecessary.
Second, in order to do what you are trying to do, you use the following syntax:
instructions = vector<Instruction>();
however, as I said, this is likely redundant (and wasteful on a non-optimizing compiler as it might call both the constructor and the assignment operator). A much better way to do this is found in sbi's answer.
Third, unlike in C#, the new operator allocates memory on the heap and returns a pointer to the newly allocated data. Your variable instructions is not a pointer, thus the error.

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.
void unleashMonkeyFish()
{
MonkeyFish * monkey_fish = new MonkeyFish();
std::string localname = "Wanda";
monkey_fish->setName(localname);
monkey_fish->go();
}
In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.
Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:
std::string name;
std::string & name;
When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:
void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );
I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?
What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?
Basically, I want to avoid these scenarios:
I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
I don't want to copy the string if I can help it.
I would prefer not to new localname
What prototype and data member combination should I use?
CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometime several times in a single day.)
EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.
One way to do this is to have your string
std::string name;
As the data-member of your object. And then, in the unleashMonkeyFish function create a string like you did, and pass it by reference like you showed
void setName( const std::string & parameter_name ) {
name = parameter_name;
}
It will do what you want - creating one copy to copy the string into your data-member. It's not like it has to re-allocate a new buffer internally if you assign another string. Probably, assigning a new string just copies a few bytes. std::string has the capability to reserve bytes. So you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (I have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a c-string. They say they have a copy-on-write string, which would explain that behavior).
The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects - they manage their own stuff. Classes have a destructor that they use to free allocated resources once objects die, and std::string has one too. In my opinion, you should not worry about having that std::string local in the function. It will not do anything noticeable to your performance anyway most likely. Some std::string implementations (msvc++ afaik) have a small-buffer optimization: For up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.
Edit:
As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):
void setName(std::string parameter_name) {
name.swap(parameter_name);
}
The reason that this is better, is that now the caller knows that the argument is being copied. Return value optimization and similar optimizations can now be applied easily by the compiler. Consider this case, for example
obj.setName("Mr. " + things.getName());
If you had the setName take a reference, then the temporary created in the argument would be bound to that reference, and within setName it would be copied, and after it returns, the temporary would be destroyed - which was a throw-away product anyway. This is only suboptimal, because the temporary itself could have been used, instead of its copy. Having the parameter not a reference will make the caller see that the argument is being copied anyway, and make the optimizer's job much easier - because it wouldn't have to inline the call to see that the argument is copied anyway.
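With C++11 and later, the same by-value idea is commonly written with std::move instead of swap. The sketch below is that variant, not the answer's exact code; getName is a hypothetical accessor added so the result can be inspected.

```cpp
#include <cassert>
#include <string>
#include <utility>

class MonkeyFish {
public:
    // The parameter is the caller's copy (or the caller's temporary,
    // constructed in place); moving it into the member avoids a second copy.
    void setName(std::string parameter_name) {
        name = std::move(parameter_name);
    }
    const std::string& getName() const { return name; }

private:
    std::string name;
};
```

A call such as fish.setName(std::string("Mr. ") + "Wanda") then reuses the temporary's buffer, while fish.setName(localname) makes exactly one copy and leaves localname intact.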
For further explanation, read the excellent article BoostCon09/Rvalue-References
If you use the following method declaration:
void setName( const std::string & parameter_name );
then you would also use the member declaration:
std::string name;
and the assignment in the setName body:
name = parameter_name;
You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).
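A short sketch shows what a reference member implies. NameTag is a hypothetical class: the reference has to be bound in the constructor's initializer list, it can never be re-bound afterwards, and it merely aliases a string owned by someone else.

```cpp
#include <cassert>
#include <string>

class NameTag {
public:
    // The reference member must be initialized here; there is no way to
    // "set" it later from a setName()-style function.
    explicit NameTag(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }

private:
    const std::string& name_;  // aliases the caller's string, owns nothing
};
```

If the referenced string is a local that goes out of scope first, as localname would in unleashMonkeyFish(), the member is left dangling, which is why the std::string member is the safer choice.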
Finally, your std::string implementation probably uses reference counted strings anyway, so no copy of the actual string data is being made in the assignment. If you're that concerned about performance, you had better be intimately familiar with the STL implementation you are using.
Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.
Ok, so storing a reference to an object is perfectly legit, but obviously you must be aware of the scope of that object. Much easier to pass the string by reference, then copy to the class member variable. Unless the string is very large, or you're performing this operation a lot (and I mean a lot, a lot) then there's really no need to worry.
Can you clarify exactly why you don't want to copy the string?
Edit
An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. Then get a new MonkeyName by requesting one from the pool (sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkeyName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.
... hope that made some sense :)
This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in a way such that it lives at least as long as every pointer at it.
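A sketch of the reference-counting approach is given below. It assumes std::shared_ptr from C++11 rather than the Boost version named above (the two behave alike for this purpose), and the member layout of MonkeyFish is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class MonkeyFish {
public:
    // Shares ownership of the string; the characters are never copied.
    void setName(std::shared_ptr<const std::string> name) {
        name_ = std::move(name);
    }
    // Precondition (assumed for this sketch): setName() was called first.
    const std::string& name() const { return *name_; }

private:
    std::shared_ptr<const std::string> name_;
};

MonkeyFish unleashMonkeyFish() {
    auto localname = std::make_shared<const std::string>("Wanda");
    MonkeyFish fish;
    fish.setName(localname);
    return fish;  // the string stays alive: fish still holds a reference
}
```

The local shared_ptr disappears when unleashMonkeyFish() returns, but the string itself is only destroyed once the last owner lets go of it.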
Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.
When the compiler sees ...
std::string localname = "Wanda";
... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.
You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.
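The static-table idea could be sketched as follows, under the stated assumption that every possible name is known at compile time. kNames and its contents are made up for illustration.

```cpp
#include <cassert>

// Hypothetical table of all names a MonkeyFish may ever have.
// String literals have static storage duration, so the pointers stay valid.
static const char* const kNames[] = {"Wanda", "Fred", "Ginger"};

class MonkeyFish {
public:
    explicit MonkeyFish(const char* name) : name_(name) {}

    // Only re-points the member; no characters are copied. The argument
    // must point at storage that outlives the fish, such as kNames.
    void setName(const char* name) { name_ = name; }
    const char* name() const { return name_; }

private:
    const char* name_;
};
```

Renaming a fish with fish.setName(kNames[1]) is then a single pointer assignment.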
As a simple rule of thumb, store your data as a copy within a class, pass and return data by (const) reference, and use reference-counting pointers wherever possible.
I'm not so concerned about copying a few thousand bytes of string data, until such time that the profiler says it is a significant cost. OTOH I do care that the data structures that hold several tens of MB of data don't get copied.
In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:
class MonkeyFish {
public:
void setName( const std::string & parameter_name ) { name = parameter_name; }
private:
std::string name;
};
This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)
You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).
I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.
If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.
As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.
Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string by reference to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.